Parsing URLs in a pandas DataFrame column and getting the value at a specific index

Problem description

I have a pandas DataFrame with a url column. The data looks like this:

row               url
1      'https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese'
2      'https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/'
3      'https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/'

I only need the value at the second index, i.e. cooking, holiday-recipes, and so on.
Desired output:

row               url
1               cooking
2               holiday-recipes
3               kitchen-tools

My idea was to parse each URL into separate columns and then drop the columns I don't need. This is the code:

df['protocol'],df['domain'],df['path']=zip(*df['url'].map(urlparse(df['url']).urlsplit))

The error message is: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Is there a better way to solve this? How do I get the value at a specific index?

Answers

Is this what you are looking for?

df['url'] = df['url'].str.split('/').str[3]
print(df)

   row              url
0    1          cooking
1    2  holiday-recipes
2    3    kitchen-tools
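
To see why index 3 is the right one here, it may help to look at what splitting a single URL on '/' produces. This is only a small sketch using the first URL from the question; it assumes every URL starts with a scheme such as https://, which is what pushes the wanted segment to position 3.

```python
# One URL taken from the question's sample data
url = "https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese"

# Splitting on "/" yields the scheme, an empty string (from "//"),
# the domain, and then the path segments in order
parts = url.split("/")
print(parts)
# ['https:', '', 'www.delish.com', 'cooking', 'recipe-ideas', 'recipes', 'four-cheese']

# Position 3 is therefore the first path segment after the domain
print(parts[3])  # cooking
```

If some URLs were stored without the scheme, the wanted value would sit at a different position, so this approach relies on the column being consistently formatted.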

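Another way is to use a regular expression that captures the run of lowercase letters and hyphens immediately following com/:

df['url'] = df['url'].str.extract(r'((?<=com/)[a-z-]+)', expand=False)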



               url
0          cooking
1  holiday-recipes
2    kitchen-tools
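
The attempt in the question most likely fails because urlparse is handed the whole Series rather than one string at a time. A possible way to keep the URL-parsing idea is to parse each value separately, roughly as sketched below. The helper name first_path_segment is made up for this illustration and is not part of pandas or urllib.

```python
from urllib.parse import urlsplit


def first_path_segment(url):
    """Return the first non-empty path segment of a URL, or None if there is none."""
    # urlsplit works on a single string, so it must be called once per URL
    path = urlsplit(url).path
    # Drop empty pieces caused by leading or trailing slashes
    segments = [s for s in path.split("/") if s]
    return segments[0] if segments else None


# Sample URLs from the question
urls = [
    "https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese",
    "https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/",
    "https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/",
]
print([first_path_segment(u) for u in urls])
# ['cooking', 'holiday-recipes', 'kitchen-tools']
```

On the DataFrame itself this could be applied per row with df['url'].map(first_path_segment). Unlike the regular expression, it does not depend on the domain ending in com.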
