Problem description
<subject> <action> <object> @ <price> ... // The sentence can continue
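I want to extract these values from a sentence of that form.
Constraints:
- The subject is always Bob or Alice.
- The action is bought or sold.
- The object can be any word of 1-7 letters. // 4apples should return NULL
- The price can be a float or an integer.
- There may be text before the subject, but it is guaranteed not to contain Bob/Alice.
- There may or may not be a space after @.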
Example:
Hi there,Bob sold apples @2.0 dollars each
Expected output:
Subject: Bob
Action: sold
Object: apples
Price: 2.0
Currently, I do it like this:
#!/usr/bin/env python3
# A space follows the comma so that split(" ") yields 'alice' as its own token
sentence = "Hi there, alice sold apples @2.0 dollars each"
sentence = sentence.lower()
if 'alice' in sentence or 'bob' in sentence:
    s_list = sentence.split(" ")
    s_idx = -1
    if 'bob' in sentence:
        s_idx = s_list.index('bob')
    elif 'alice' in sentence:
        s_idx = s_list.index('alice')
    if s_idx > -1:
        Subject = s_list[s_idx]
        Action = s_list[s_idx+1]
        Object = s_list[s_idx+2]  # more if/else to validate Object constraints
        Price = s_list[s_idx+3]  # more if/else to extract 2.0 if we get @2.0
        print("Subject: {},Action: {},Object: {},Price: {}".format(Subject,Action,Object,Price))
How can I do this better? Possibly using re.
Solution
You can use a regular expression with a named capture group for each element:
import re
sentence = "Hi there,alice sold apples @2.0 dollars each"
# Raw string so that \s, \d and \. reach the regex engine unchanged
values = re.search(r'(?P<subject>bob|alice)\s+(?P<action>bought|sold)\s+(?P<object>[A-Za-z]{1,7})\s+@\s*(?P<price>\d+(?:\.\d+)?)', sentence)
if values:
    Subject = values['subject']
    Action = values['action']
    Object = values['object']
    Price = values['price']
    # Print only on a match; otherwise the names above are undefined
    print("Subject: {},Action: {},Object: {},Price: {}".format(Subject,Action,Object,Price))
This outputs:
Subject: alice,Action: sold,Object: apples,Price: 2.0
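To check the pattern against the constraints in the question, here is a minimal sketch that wraps the same regular expression in a helper. The names PATTERN and extract are hypothetical and chosen only for illustration; the helper returns a dict of the named groups, or None when the sentence does not fit the format.

```python
import re

# Same pattern as in the answer, compiled once (PATTERN is an illustrative name)
PATTERN = re.compile(
    r'(?P<subject>bob|alice)\s+(?P<action>bought|sold)\s+'
    r'(?P<object>[A-Za-z]{1,7})\s+@\s*(?P<price>\d+(?:\.\d+)?)'
)

def extract(sentence):
    """Return the named groups as a dict, or None if nothing matches."""
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None

print(extract("Hi there,alice sold apples @2.0 dollars each"))
print(extract("alice sold 4apples @2.0"))    # object starts with a digit
print(extract("bob bought pineapples @ 3"))  # object longer than 7 letters
```

The last two calls return None: the object group must begin right after the whitespace that follows the action, and it must be followed by whitespace, so neither a leading digit nor an eighth letter can be skipped over.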
Note that you may want to pass the re.I flag to re.search so that either bob or Bob (or Sold or sold, etc.) will match; in that case you can replace [A-Za-z] in the object capture group with [a-z].
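As a rough sketch of the case-insensitive variant, assuming you also want the price as a number: PATTERN_I and extract_ci are hypothetical names, and the conversion to float is an optional extra that is not part of the answer above.

```python
import re

# re.I makes bob|alice, bought|sold and [a-z] match regardless of case
PATTERN_I = re.compile(
    r'(?P<subject>bob|alice)\s+(?P<action>bought|sold)\s+'
    r'(?P<object>[a-z]{1,7})\s+@\s*(?P<price>\d+(?:\.\d+)?)',
    re.I,
)

def extract_ci(sentence):
    """Case-insensitive extraction; returns a dict or None."""
    match = PATTERN_I.search(sentence)
    if match is None:
        return None
    fields = match.groupdict()
    fields["price"] = float(fields["price"])  # assumption: a numeric price is wanted
    return fields

print(extract_ci("Hi there,Bob Sold Apples @ 2 dollars each"))
```

The captured text keeps the casing of the input, so call .lower() or .capitalize() on the fields if you need a normalized form.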