Removing characters from a DataFrame in Python

Problem description

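I want to replace part of the strings in some columns of a table. For example, I want to remove b"SET and b"MULTISET from the df columns. How can I do this? The data and the output I need are shown below.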

columns = ['cust_id','cust_name','vehicle','details','bill'] 
df = pd.DataFrame(data=t,columns=columns)
df
    
        cust_id     cust_name                   vehicle                             details                                                 bill
0   101         b"SET{'Tom','C'}"           b"MULTISET{'Toyota','Cruiser'}"     b"ROW('Street 1','12345678','NewYork,US')"             1200.00
1   102         b"SET{'Rachel','Green'}"    b"MULTISET{'Ford','se'}"            b"ROW('Street 2','12344444','Florida,US')"             2400.00
2   103         b"SET{'Chandler','Bing'}"   b"MULTISET{'Dodge','mpv'}"          b"ROW('Street 1','12345555','Georgia,US')"             601.10 

Required output

    cust_id     cust_name                   vehicle                             details                                         bill
0   101         {'Tom','C'}                 {'Toyota','Cruiser'}            ('Street 1',US')               1200.00
1   102         {'Rachel','Green'}          {'Ford','se'}                   ('Street 2',US')               2400.00
2   103         {'Chandler','Bing'}         {'Dodge','mpv'}                 ('Street 1',US')               601.10 

Solution

Here is one possible solution:

  • Define the columns of interest
columns = ['cust_name','vehicle','details']
  • Use a regular expression to extract the text from the opening { or ( through the closing } or ), delimiters included
regex_ = r"([\{|\(].*[\}|\)])"
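  • Putting it together: str.decode('ascii') converts the column values from bytes to str, and str.extract keeps only the part matched by the regex group
columns = ['cust_name','vehicle','details']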

regex_ = r"([\{|\(].*[\}|\)])"

for col in columns:
    df[col] = df[col].str.decode('ascii').str.extract(regex_)

   cust_id            cust_name  ...                                details    bill
0      101          {'Tom','C'}  ...  ('Street 1','12345678','NewYork,US')  1200.0
1      102   {'Rachel','Green'}  ...  ('Street 2','12344444','Florida,US')  2400.0
2      103  {'Chandler','Bing'}  ...  ('Street 1','12345555','Georgia,US')   601.1
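
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The question never shows the contents of t, so the rows below are an assumption rebuilt from the printed table, with the three text columns stored as bytes. The pattern is a slightly simplified variant of the one above: it drops the | inside the character classes, where it would be matched as a literal character rather than acting as alternation. expand=False is passed so that str.extract returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Assumption: the original `t` is not shown in the question, so these rows
# are rebuilt from the printed table, with the text columns held as bytes.
t = [
    (101, b"SET{'Tom','C'}", b"MULTISET{'Toyota','Cruiser'}",
     b"ROW('Street 1','12345678','NewYork,US')", 1200.00),
    (102, b"SET{'Rachel','Green'}", b"MULTISET{'Ford','se'}",
     b"ROW('Street 2','12344444','Florida,US')", 2400.00),
    (103, b"SET{'Chandler','Bing'}", b"MULTISET{'Dodge','mpv'}",
     b"ROW('Street 1','12345555','Georgia,US')", 601.10),
]
df = pd.DataFrame(
    data=t,
    columns=['cust_id', 'cust_name', 'vehicle', 'details', 'bill'],
)

# Capture everything from the first "{" or "(" to the last "}" or ")".
pattern = r"([{(].*[})])"

for col in ['cust_name', 'vehicle', 'details']:
    # Decode bytes -> str, then keep only the captured group.
    # expand=False makes extract return a Series for a single group.
    df[col] = df[col].str.decode('ascii').str.extract(pattern, expand=False)

print(df.to_string())
```

Because .* is greedy, the match runs to the last closing delimiter in each value, which is what is wanted here since every value contains exactly one {...} or (...) block.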