str.extract - extract string between two strings in pandas


I have a text column that looks like:

http://start.blabla.com/landing/fb603?&mkw...

I want to extract “start.blabla.com” which is always between:

http://

and:

/landing/

namely: start.blabla.com

Solution:

The extract method accepts a regular expression with at least one capture group, so you need to match and capture the characters after http:// other than /, one or more times. It can be done with: <code>df.col.str.extract(r'http://([^/]+)/landing')</code>

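Here [^/]+ is a negated character class that matches one or more characters other than /.

By contrast, the regex from the original attempt matches http:/, then zero or more / symbols, as few as possible, and then /landing. Nothing in it can consume the host name, so it never matches.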
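The approach can be tried end to end with a minimal sketch like the one below. The sample rows, the output column name host, and the choice of expand=False (which returns a Series instead of a one-column DataFrame) are assumptions made for illustration; only the column name col and the idea of capturing the text between http:// and /landing come from the text above.

```python
import pandas as pd

# Hypothetical sample data; only the first URL shape comes from the question.
df = pd.DataFrame({
    "col": [
        "http://start.blabla.com/landing/fb603",
        "http://other.example.org/landing/xy12",
        "no url here",
    ]
})

# One capture group: one or more non-"/" characters between
# "http://" and "/landing".
# expand=False returns a Series rather than a one-column DataFrame.
df["host"] = df["col"].str.extract(r"http://([^/]+)/landing", expand=False)

print(df["host"].tolist())
```

Rows that do not match the pattern get a missing value (NaN) in the new column, so they can be filtered out afterwards with dropna() if needed.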

str.extract_-_extract_string_betwen_two_strings_in_pandas.1481715632.txt.gz · Last modified: 2016/12/14 12:40 by vincenzo