How to add a missing character to values in a pandas DataFrame column

Problem description

I have the following pandas DataFrame:

import pandas as pd

d = {'col1': ["1", "2", "3", "4"], 'col2': ["5%", "6", "7%", "8%"]}
df = pd.DataFrame(data=d)
df

   col1  col2
0     1    5%
1     2     6
2     3    7%
3     4    8%

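Some rows in col2 may be missing the % sign at the end of the number, and I don't know in advance which rows are affected. I need to make sure every number in col2 ends with a % sign.

Is there a way to do this in Python without iterating over the DataFrame?

Solutions

Try NumPy's np.where: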

import numpy as np
df["col2"] = np.where(df.col2.str.endswith("%"), df.col2, df.col2.add("%"))

  col1 col2
0    1   5%
1    2   6%
2    3   7%
3    4   8%

Alternatively, you can use a list comprehension; they are quite efficient, especially for strings:

df['col2'] = [f"{entry}%" if not entry.endswith("%") else entry 
              for entry in df.col2]
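
The list-comprehension idea can also be wrapped in a small reusable helper. This is only a minimal sketch: the function name ensure_suffix is hypothetical (it is not part of pandas), and it assumes every entry is a string with no missing values.

```python
import pandas as pd


def ensure_suffix(values, suffix="%"):
    """Return a list in which every string ends with `suffix` (hypothetical helper)."""
    # Keep entries that already end with the suffix; append it to the rest.
    return [v if v.endswith(suffix) else v + suffix for v in values]


df = pd.DataFrame({"col1": ["1", "2", "3", "4"],
                   "col2": ["5%", "6", "7%", "8%"]})
df["col2"] = ensure_suffix(df["col2"])
print(df)
```

Assigning the returned list back to the column works because it has the same length and order as the original column.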

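Another option is np.select:

import numpy as np
# First condition: the value already contains "%"; second: it does not
condition = [df.col2.str.contains('%'), ~df.col2.str.contains('%')]
# Matching choices: keep the value as is, or append "%"
choices = [df.col2, df.col2 + "%"]
df.col2 = np.select(condition, choices)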

  col1 col2
0    1   5%
1    2   6%
2    3   7%
3    4   8%

Similar to sammywemmy's answer. In cases like this, np.where() is usually my first choice:

# Append '%' where it is missing
df['col2'] = np.where(~(df['col2'].str.contains('%')), df['col2'] + '%', df['col2'])

# Equivalent form with the condition and branches swapped
df['col2'] = np.where((df['col2'].str.contains('%')), df['col2'], df['col2'] + '%')

If the '%' signs only ever appear at the right-hand end, you can strip them and then append one to every value.

df['col2'] = df['col2'].str.rstrip('%')+'%'
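
As a quick sanity check, the approaches above can be run side by side on the sample data. This is a sketch rather than a benchmark; the variable names (via_where, via_listcomp, via_rstrip, edge) are illustrative, and all values are assumed to be strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["1", "2", "3", "4"],
                   "col2": ["5%", "6", "7%", "8%"]})
col = df["col2"]

# np.where: keep values that already end with "%", append it otherwise
via_where = pd.Series(np.where(col.str.endswith("%"), col, col + "%"),
                      index=col.index)

# List comprehension doing the same check in plain Python
via_listcomp = pd.Series([v if v.endswith("%") else f"{v}%" for v in col],
                         index=col.index)

# Strip any trailing "%" and then append exactly one
via_rstrip = col.str.rstrip("%") + "%"

expected = ["5%", "6%", "7%", "8%"]
print(via_where.tolist() == expected,
      via_listcomp.tolist() == expected,
      via_rstrip.tolist() == expected)

# The approaches differ on unusual input: rstrip collapses repeated signs,
# while the endswith check leaves such a value untouched.
edge = pd.Series(["9%%"])
print((edge.str.rstrip("%") + "%").tolist())  # ['9%']
```

On clean data like the sample, all three give the same column. Note also that str.contains('%') matches a '%' anywhere in the string, whereas str.endswith('%') only checks the final character.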