str.extract - extract string between two strings in pandas

I have a text column that looks like:

http://start.blabla.com/landing/fb603?&mkw...

I want to extract “start.blabla.com”, which always sits between:

http://

and:

/landing/

Solution:

df.col.str.extract(r'http://([^/]+)/landing')

The regex matches the literal http://, then captures ([^/]+), and then requires /landing to follow.

Here [^/]+ is a negated character class that matches 1+ occurrences of characters other than /, so the captured group stops at the first slash after the host name.
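A minimal sketch of the solution on a small frame, to show what str.extract returns. The DataFrame contents and the new column name "host" are assumptions made up for illustration; only the first URL shape comes from the question, with its truncated query string left out.

```python
import pandas as pd

# Hypothetical sample data (not from the original question, apart from the
# first URL's host and /landing/ path).
df = pd.DataFrame({
    "col": [
        "http://start.blabla.com/landing/fb603",
        "http://other.example.org/landing/abc",
        "no url here",
    ]
})

# One capture group: everything between "http://" and the next "/",
# provided that "/landing" follows immediately.
pattern = r"http://([^/]+)/landing"

# expand=False returns a Series when the pattern has a single capture group;
# with expand=True the result is a one-column DataFrame instead.
df["host"] = df["col"].str.extract(pattern, expand=False)

print(df["host"])
```

Rows that do not match the pattern get a missing value rather than raising an error, so it may be worth checking df["host"].isna() before using the result.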

str.extract_-_extract_string_betwen_two_strings_in_pandas.txt · Last modified: 2016/12/14 22:29 by vincenzo