Problem description
I want to replace many of the values in a product variant column.
Big Ben Personalized Products AVENGERS – Stark / 2 set 2
BigBen Personalized Products Expendables – Statham / 2 set 2
BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 5 set 2
BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 3 set 1
Personalized Toy 5 set 1
BIG BEN Personalized Machine 20.00% Off Auto renew (Versand jeden 3 Monate) Kids Toy / 3 set 1
BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Kids Toy / 5 set 1
BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Adults Toy / 5 set 1
BigBen Personalized Products 20.00% Off Auto renew (Versand jeden 5 Monate) Adults Toy / 5 set
Many of the product variants actually have the same value. This is what I tried so far:
df["product_variant"] = df["product_variant"].str.replace('BigBen Personalized', '', case=False)
df["product_variant"] = df["product_variant"].str.replace('Big Ben Personalized ', '', case=False)
df["product_variant"] = df["product_variant"].str.replace('Auto renew', '', case=False)
I expect the data row by row to look more like this:
AVENGERS - Stark (2 set)
Expendables - Statham (2 set)
Adults Toy (5 set)
Toy (5 set)
Kids Toy (3 set)
Kids Toy (5 set)
Adults Toy (5 set)
Kids Toy (5 set)
Adults Toy (3 set)
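Solution
One option is to create a specific pattern for these examples with 2 capture groups.
For most items, first match everything up to and including Products, or up to just before Adults or Kids.
- In group 1, capture the part that comes before the /
- In group 2, capture 1 or more digits followed by set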
Example pattern
^(?:big\s*ben personalized (?:products\s+)?(?:.*?(?=Adult|Kids))?|personalized\s+)(\w+(?: \w+)*(?: – \w+(?: \w+)*)?)(?: /)? (\d+ set)\b.*
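To see what each group captures, the pattern can be tried on a few of the sample strings with Python's built-in re module. This is only a minimal sketch: the names PATTERN, samples and captured are illustrative, and the pattern is split across three string literals purely for readability.

```python
import re

# Same pattern as above, compiled case-insensitively.
PATTERN = re.compile(
    r"^(?:big\s*ben personalized (?:products\s+)?(?:.*?(?=Adult|Kids))?|personalized\s+)"
    r"(\w+(?: \w+)*(?: – \w+(?: \w+)*)?)"   # group 1: the variant name
    r"(?: /)? (\d+ set)\b.*",               # group 2: digits followed by "set"
    re.IGNORECASE,
)

# Three of the sample values from the question.
samples = [
    "Big Ben Personalized Products AVENGERS – Stark / 2 set 2",
    "BIG BEN Personalized Machine 20.00% Off Auto renew (Versand jeden 3 Monate) Kids Toy / 3 set 1",
    "Personalized Toy 5 set 1",
]

captured = []
for text in samples:
    m = PATTERN.match(text)
    # groups() returns (group 1, group 2); None marks a string that did not match.
    captured.append(m.groups() if m else None)
    print(captured[-1])
```

For these three strings, group 1 holds the variant name (for example Kids Toy) and group 2 holds the set size (for example 3 set).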
In the replacement, use the two capture groups: \1 (\2)
import pandas as pd
# Sample values from the question
items = [
"Big Ben Personalized Products AVENGERS – Stark / 2 set 2","BigBen Personalized Products Expendables – Statham / 2 set 2","BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 5 set 2","BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 3 set 1","Personalized Toy 5 set 1","BIG BEN Personalized Machine 20.00% Off Auto renew (Versand jeden 3 Monate) Kids Toy / 3 set 1","BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Kids Toy / 5 set 1","BigBen Personalized Toy 20.00% Off Auto renew (Versand jeden 2 Monate) Adults Toy / 5 set 1","BigBen Personalized Products 20.00% Off Auto renew (Versand jeden 5 Monate) Adults Toy / 5 set "
]
df = pd.DataFrame(items,columns=["product_variant"])
df["product_variant"] = df["product_variant"].replace(
r'(?i)^(?:big\s*ben personalized (?:products\s+)?(?:.*?(?=Adult|Kids))?|personalized\s+)(\w+(?: \w+)*(?: – \w+(?: \w+)*)?)(?: /)? (\d+ set)\b.*',r'\1 (\2)',regex=True
)
print(df)
Output
product_variant
0 AVENGERS – Stark (2 set)
1 Expendables – Statham (2 set)
2 Adults Toy (5 set)
3 Adults Toy (3 set)
4 Toy (5 set)
5 Kids Toy (3 set)
6 Kids Toy (5 set)
7 Adults Toy (5 set)
8 Adults Toy (5 set)
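Because the pattern is tailored to these examples, values that do not match it are left unchanged by replace. A hedged sketch of one way to spot such rows is shown here; the row "Some other product" and the names PATTERN, matched, cleaned and unmatched are hypothetical and only serve as illustration.

```python
import pandas as pd

# Same pattern as in the answer, with the inline case-insensitive flag.
PATTERN = (
    r"(?i)^(?:big\s*ben personalized (?:products\s+)?(?:.*?(?=Adult|Kids))?|personalized\s+)"
    r"(\w+(?: \w+)*(?: – \w+(?: \w+)*)?)(?: /)? (\d+ set)\b.*"
)

df = pd.DataFrame(
    {
        "product_variant": [
            "BigBen Personalized Toy 20.00% Off Auto renew Adults Toy / 5 set 2",
            "Personalized Toy 5 set 1",
            "Some other product",  # hypothetical row that the pattern does not cover
        ]
    }
)

# True where the pattern matches from the start of the string.
matched = df["product_variant"].str.match(PATTERN)

# Rewrite matching rows; non-matching rows keep their original value.
df["cleaned"] = df["product_variant"].replace(PATTERN, r"\1 (\2)", regex=True)

# Rows that still need attention.
unmatched = df.loc[~matched, "product_variant"].tolist()
print(df["cleaned"].tolist())
print(unmatched)
```

Listing the unmatched rows makes it easier to decide whether the pattern needs another alternative or whether those values should be handled separately.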