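Problem description
- An acronym starts with an opening parenthesis followed by uppercase letters or digits, for example '(ABC', '(ABC)', '(ABC-2A)' or '(ABC-1)'.
But NOT a parenthesized word that starts with an uppercase letter followed by lowercase letters, for example '(Bobby)' or '(Bob to the beach..)' --> this is the part I am struggling with.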
import re
text = ['(ABC went to the beach','The girl (ABC-2A) is walking','The dog (Bobby) is being walked','They are there (ABC)' ]
for string in text:
    cleaned_acronyms = re.sub(r'\([A-Z]*\)?','',string)
    print(cleaned_acronyms)
#current output:
>> 'went to the beach' #Correct
>> 'The girl -2A) is walking' #Not correct
>> 'The dog obby) is being walked' #Not correct
>> 'They are there' #Correct
#desired & correct output:
>> 'went to the beach'
>> 'The girl is walking'
>> 'The dog (Bobby) is being walked' #(Bobby) is NOT an acronym (uppercase+lowercase)
>> 'They are there'
Solutions
Use \([A-Z\-0-9]{2,}\)? in the following context:
import re
text = ['(ABC went to the beach','The girl (ABC-2A) is walking','The dog (Bobby) is being walked','They are there (ABC)' ]
for string in text:
    cleaned_acronyms = re.sub(r'\([A-Z\-0-9]{2,}\)?','',string)
    print(cleaned_acronyms)
I got these results:
' went to the beach'
'The girl  is walking'
'The dog (Bobby) is being walked'
'They are there '
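To see why the quantifier matters, here is a minimal sketch that runs the pattern from the question next to the one above. The helper name strip_acronyms is hypothetical and not part of the original answer, and collapsing the leftover whitespace is an added assumption about how the final strings should look.

```python
import re

# Pattern from the question: `*` allows zero letters, so '(B' in '(Bobby)' matches.
LOOSE = re.compile(r'\([A-Z]*\)?')
# Pattern from this answer: at least two characters from the class are required.
STRICT = re.compile(r'\([A-Z\-0-9]{2,}\)?')


def strip_acronyms(pattern, s):
    """Hypothetical helper: remove matches, then collapse leftover whitespace."""
    return ' '.join(pattern.sub('', s).split())


text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking',
        'The dog (Bobby) is being walked', 'They are there (ABC)']

for s in text:
    # Left column: question's pattern; right column: this answer's pattern.
    print(strip_acronyms(LOOSE, s), '|', strip_acronyms(STRICT, s))
```

With `{2,}`, '(B' alone is too short to match, which is what keeps '(Bobby)' intact.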
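Try a negative lookahead:
\((?![A-Z][a-z])[A-Z\d-]+\)?\s*
- \( - A literal opening parenthesis.
- (?![A-Z][a-z]) - A negative lookahead asserting that the position is not followed by an uppercase letter and then a lowercase letter.
- [A-Z\d-]+ - Match 1+ uppercase letters, digits or hyphens.
- \)? - An optional literal closing parenthesis.
- \s* - 0+ whitespace characters.
A sample Python script: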
import re
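text = ['(ABC went to the beach','The girl (ABC-2A) is walking','The dog (Bobby) is being walked','They are there (ABC)' ]
for string in text:
    # re.sub takes the replacement as its second argument
    cleaned_acronyms = re.sub(r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*','',string)
    print(cleaned_acronyms)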
Prints:
went to the beach
The girl is walking
The dog (Bobby) is being walked
They are there
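The pattern breakdown above can also be kept next to the code. The following is a sketch of the same expression compiled with re.VERBOSE; the name ACRONYM and the final .strip() call are additions for illustration, not part of the original answer.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries its comment.
ACRONYM = re.compile(r"""
    \(                # literal opening parenthesis
    (?![A-Z][a-z])    # not followed by an uppercase and then a lowercase letter
    [A-Z\d-]+         # one or more uppercase letters, digits or hyphens
    \)?               # optional closing parenthesis
    \s*               # whitespace after the acronym, removed with it
""", re.VERBOSE)

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking',
        'The dog (Bobby) is being walked', 'They are there (ABC)']

# .strip() (an added assumption) drops the space left before a trailing acronym.
cleaned = [ACRONYM.sub('', s).strip() for s in text]
for s in cleaned:
    print(s)
```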
Use the pattern \([A-Z0-9\-]+\)
For example:
import re
text = ['ABC went to the beach','The girl (ABC-2A) is walking','The dog (Bobby) is being walked','They are there (ABC)' ]
ptrn = re.compile(r"\([A-Z0-9\-]+\)")
for i in text:
    print(ptrn.sub("",i))
Output:
ABC went to the beach
The girl  is walking
The dog (Bobby) is being walked
They are there
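One thing to keep in mind with this pattern is that the closing parenthesis is mandatory, so an unclosed acronym such as the question's '(ABC went to the beach' would be left as it is. A small check, assuming the question's original strings as input:

```python
import re

ptrn = re.compile(r"\([A-Z0-9\-]+\)")

# The closing parenthesis is required, so an unclosed '(ABC' does not match.
unclosed = ptrn.sub("", "(ABC went to the beach")
print(repr(unclosed))

# A fully parenthesized acronym is removed; the space before it remains.
closed = ptrn.sub("", "They are there (ABC)")
print(repr(closed))

# A capitalized word is kept because lowercase letters are outside the class.
name = ptrn.sub("", "The dog (Bobby) is being walked")
print(repr(name))
```

If unclosed acronyms need to be removed as well, making the closing parenthesis optional with \)? is one option, at the cost of needing a guard against matches like '(B' in '(Bobby)'.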