Python: Get the number between specific start and end words in a DataFrame string column

Question

I want to extract, as an integer, the total cost that appears between the words "total" and "USD" in a string column.

Example DataFrame:

    id   name    lastname   message 
0   1   John    Doe        John have 100 USD,so he buy 5 eggs which total cost 10 USD
1   2   Mar     Aye        Mar have 10 USD,he just buy a banana from another shop for 16 USD

So the final result should be:

    id   name    lastname   message                                                             total
0   1   John    Doe        John have 100 USD,so he buy 5 eggs which total cost 10 USD         10
1   2   Mar     Aye        Mar have 10 USD,he just buy a banana from another shop for 16 USD  0

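Answer

You can use a regular expression to capture the digits that appear between "total" and "USD".

The code below captures the first run of digits after "total" (if there are several, only the first is taken), replaces rows with no match by 0, and converts the column to int. Accepting floats would need a small change to the pattern, but since the result should be an int that is not required here.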

df['total'] = df['message'].str.extract(r'total.*?(\d+).*?USD').fillna(0).astype(int)
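
For a copy-paste check, the following minimal sketch rebuilds the example DataFrame from the question and applies the same pattern. It assumes pandas is installed. The `extracted` variable name, the `expand=False` argument, and the use of `pd.to_numeric` instead of casting the strings directly are choices made for this illustration, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["John", "Mar"],
    "lastname": ["Doe", "Aye"],
    "message": [
        "John have 100 USD,so he buy 5 eggs which total cost 10 USD",
        "Mar have 10 USD,he just buy a banana from another shop for 16 USD",
    ],
})

# The raw string keeps \d intact; expand=False returns a Series
# instead of a one-column DataFrame.
extracted = df["message"].str.extract(r"total.*?(\d+).*?USD", expand=False)

# Rows without a match are NaN: convert to numbers, fill with 0, cast to int.
df["total"] = pd.to_numeric(extracted).fillna(0).astype(int)

print(df[["id", "total"]])
```

The second message contains no "total", so the pattern does not match that row and it falls back to 0.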

Result:

    id   name    lastname   message                                                             total
0   1   John    Doe        John have 100 USD,so he buy 5 eggs which total cost 10 USD         10
1   2   Mar     Aye        Mar have 10 USD,he just buy a banana from another shop for 16 USD  0
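
If decimal amounts ever need to be accepted, one possible adjustment is to widen the capture group so that it allows an optional fractional part. This is only a sketch under assumptions: the sample messages and the names `messages`, `pattern`, and `amounts` are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical sample messages, including one decimal amount.
messages = pd.Series([
    "eggs which total cost 10.50 USD",
    "total cost 7 USD",
    "a banana from another shop for 16 USD",
])

# (\d+(?:\.\d+)?) captures an integer part plus an optional decimal part.
pattern = r"total.*?(\d+(?:\.\d+)?).*?USD"

# Convert matches to floats; rows without a match become 0.0.
amounts = pd.to_numeric(messages.str.extract(pattern, expand=False)).fillna(0.0)

print(amounts)
```

The inner group is non-capturing (`(?:...)`), so the pattern still has a single capture group and `str.extract` still yields one value per row.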