Trouble replacing empty strings with NaN using pandas.DataFrame.replace()

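I have a pandas DataFrame in which some observations are empty strings, and I want to replace them with NaN (np.nan).

I managed to replace most of these empty strings using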

df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)

However, I am still finding empty strings. For example, when I run

sub_df = df[df['OBJECT_COL'] == '']
sub_df.replace(r'\s+', np.nan, regex = True)
print(sub_df['OBJECT_COL'] == '')

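the output is True for every row.

Should I try a different approach? Is there a way to inspect the encoding of these cells, in case my .replace() has no effect because the encoding is unusual?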

Answer:
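Before changing the pattern, it may help to check one detail of the snippet in the question: replace() returns a new object by default, so calling it without assigning the result leaves sub_df as it was. The sketch below illustrates this with a small stand-in frame; the data and the names unchanged and replaced are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question.
df = pd.DataFrame({"OBJECT_COL": ["", "   ", "somevalue"]})
sub_df = df[df["OBJECT_COL"] == ""]

# replace() returns a new object; the result is discarded here,
# so sub_df still holds the empty strings.
sub_df.replace(r"^\s*$", np.nan, regex=True)
unchanged = bool((sub_df["OBJECT_COL"] == "").all())

# Assigning the result back keeps the replacement.
sub_df = sub_df.replace(r"^\s*$", np.nan, regex=True)
replaced = bool(sub_df["OBJECT_COL"].isna().all())

print(unchanged, replaced)
```

Note also that \s+ requires at least one whitespace character, so on its own it cannot match a truly empty string.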

Another option.

sub_df.replace(r'^\s+$', np.nan, regex=True)

Or, to replace both empty strings and whitespace-only records

sub_df.replace(r'^\s*$', np.nan, regex=True)
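The difference between the two patterns comes down to the quantifier: + needs at least one whitespace character, while * also accepts none. A minimal sketch using Python's re module directly (the sample strings are made up for illustration):

```python
import re

# Illustrative inputs: empty, whitespace-only, and two ordinary values.
samples = ["", "   ", "somevalue", "some value"]

one_or_more = re.compile(r"^\s+$")   # whitespace-only, at least one character
zero_or_more = re.compile(r"^\s*$")  # whitespace-only or completely empty

# Map each sample to (matches ^\s+$, matches ^\s*$).
results = {
    s: (bool(one_or_more.search(s)), bool(zero_or_more.search(s)))
    for s in samples
}

for s, (plus, star) in results.items():
    print(repr(s), plus, star)
```

Only the second pattern matches the empty string, which is why it covers both cases.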

Alternative:

Use apply() with a lambda function.

sub_df.apply(lambda x: x.str.strip()).replace('', np.nan)
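The .str accessor only works on string-like columns, so this approach assumes every column holds text. If the frame also has numeric columns, one possible variation is to restrict the cleanup to object columns. This is a sketch under that assumption; the frame and the names text_cols and cleaned are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a text column and a numeric column.
df = pd.DataFrame(
    {
        "col_A": pd.Series(["", "  ", "somevalue"], dtype=object),
        "col_B": [1, 2, 3],
    }
)

# Only the object-dtype columns are stripped; numeric columns are left alone.
text_cols = df.select_dtypes(include="object").columns

cleaned = df.copy()
cleaned[text_cols] = (
    cleaned[text_cols].apply(lambda s: s.str.strip()).replace("", np.nan)
)

print(cleaned)
```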

Just an illustrative example:

>>> import numpy as np
>>> import pandas as pd

A sample DataFrame containing an empty string and whitespace-only strings.

>>> sub_df
        col_A
0
1
2   somevalue
3  othervalue
4

Applying the solutions under the different conditions:

1) Best solution:

>>> sub_df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
        col_A
0         NaN
1         NaN
2   somevalue
3  othervalue
4         NaN
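One caveat with the unanchored \s+ pattern: as far as I can tell, when the replacement value is not a string, pandas replaces the whole cell as soon as the pattern matches anywhere in it, so a valid value that contains a space would also become NaN. The sketch below compares it with the anchored pattern; the data is made up, and the column is built with an explicit object dtype as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical column where one valid value contains an internal space.
df = pd.DataFrame(
    {"col_A": pd.Series(["", "  ", "some value", "othervalue"], dtype=object)}
)

# Unanchored: any cell containing whitespace is replaced entirely.
unanchored = df.replace(r"\s+", np.nan, regex=True).replace("", np.nan)

# Anchored: only cells that are empty or whitespace-only are replaced.
anchored = df.replace(r"^\s*$", np.nan, regex=True)

unanchored_nan = unanchored["col_A"].isna().tolist()
anchored_nan = anchored["col_A"].isna().tolist()

print(unanchored_nan)
print(anchored_nan)
```

If the data can contain values with internal spaces, the anchored pattern is probably the safer choice.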

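2) This works, but not for both cases: whitespace-only strings are replaced, while the truly empty string in row 0 is left untouched: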

>>> sub_df.replace(r'^\s+$', np.nan, regex=True)
        col_A
0
1         NaN
2   somevalue
3  othervalue
4         NaN

3) This works for both cases.

>>> sub_df.replace(r'^\s*$', np.nan, regex=True)
        col_A
0         NaN
1         NaN
2   somevalue
3  othervalue
4         NaN

4) This also works for both cases.

>>> sub_df.apply(lambda x: x.str.strip()).replace('', np.nan)
        col_A
0         NaN
1         NaN
2   somevalue
3  othervalue
4         NaN
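The four variants can be compared in one self-contained script. The frame below is a reconstruction of the sample: judging by the outputs above, row 0 is assumed to be an empty string and rows 1 and 4 whitespace-only. The names results, nan_rows and leftover_empty are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumed contents: row 0 is an empty string, rows 1 and 4 hold only spaces.
sub_df = pd.DataFrame(
    {"col_A": pd.Series(["", " ", "somevalue", "othervalue", "  "], dtype=object)}
)

# The four approaches, keyed by the numbering used in the illustration.
results = {
    1: sub_df.replace(r"\s+", np.nan, regex=True).replace("", np.nan),
    2: sub_df.replace(r"^\s+$", np.nan, regex=True),
    3: sub_df.replace(r"^\s*$", np.nan, regex=True),
    4: sub_df.apply(lambda x: x.str.strip()).replace("", np.nan),
}

# Row labels that ended up as NaN, and how many empty strings survived.
nan_rows = {k: v.index[v["col_A"].isna()].tolist() for k, v in results.items()}
leftover_empty = {k: int((v["col_A"] == "").sum()) for k, v in results.items()}

for k in results:
    print(k, nan_rows[k], leftover_empty[k])
```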
