Replacing special characters with "N/A" in Python

Problem description

I want to change every row that contains only emoji (for example, df['Comments'][2]) to N/A.

df['Comments'][:6]
0                                                          nice
1                                                       Insane3
2                                                          ??❤️
3                                                @bertelsen1986
4                       20 or 30 mm rise on the Renthal Fatbar?
5                                     Luckily I have one to ???

The following code does not return the output I expect:

df['Comments'].replace(';',':','!','*',np.NaN)

Expected output:

df['Comments'][:6]
0                                                          nice
1                                                       Insane3
2                                                          nan
3                                                @bertelsen1986
4                       20 or 30 mm rise on the Renthal Fatbar?
5                                     Luckily I have one to ???

Solution

You can detect rows that contain only emoji by iterating over the Unicode characters in each row, using the emoji and unicodedata packages:

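import unicodedata
import numpy as np
# emoji >= 2.0 exposes EMOJI_DATA; older releases exposed UNICODE_EMOJI instead
from emoji import EMOJI_DATA

# A plain dict of lists stands in for the DataFrame in this demonstration
df = {}
df['Comments'] = ["Test","Hello ?","???"]

for i in range(len(df['Comments'])):
    pure_emoji = True
    # Normalize first so composed and decomposed forms compare the same way
    for unicode_char in unicodedata.normalize('NFC',df['Comments'][i]):
        if unicode_char not in EMOJI_DATA:
            pure_emoji = False
            break
    if pure_emoji:
        df['Comments'][i] = np.nan  # np.NaN was removed in NumPy 2.0
print(df['Comments'])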
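
The loop above checks one code point at a time, so its result depends on which single code points the installed emoji release lists. As a rough alternative, here is a dependency-free sketch that relies only on unicodedata. It is a heuristic, not the emoji package's definition of an emoji: it assumes that an emoji-only comment consists of characters in the Unicode category "So" (Symbol, other), plus the zero-width joiner, variation selector-16, and the skin-tone modifiers. The names is_emoji_only and EMOJI_JOINERS are made up for this example, and None stands in for np.nan.

```python
import unicodedata

# Code points that glue emoji sequences together but are not symbols themselves.
EMOJI_JOINERS = {"\u200d", "\ufe0f"}  # zero-width joiner, variation selector-16


def is_skin_tone(ch):
    """True for the five Fitzpatrick skin-tone modifiers (U+1F3FB..U+1F3FF)."""
    return "\U0001F3FB" <= ch <= "\U0001F3FF"


def is_emoji_only(text):
    """Heuristic: True if text is non-blank and made only of symbol-like characters."""
    stripped = "".join(text.split())  # drop all whitespace
    if not stripped:
        return False  # treat empty or blank comments as ordinary text
    return all(
        ch in EMOJI_JOINERS
        or is_skin_tone(ch)
        or unicodedata.category(ch) == "So"
        for ch in stripped
    )


comments = ["nice", "Insane3", "\u2764\ufe0f", "@bertelsen1986", "Hello \U0001F44B"]
# Replace emoji-only entries with a missing-value marker, keep everything else
cleaned = [None if is_emoji_only(c) else c for c in comments]
print(cleaned)
```

With a real DataFrame, the same helper could probably be applied as df['Comments'].mask(df['Comments'].map(is_emoji_only)), which sets the matching rows to NaN. Keycap sequences such as digit emoji are not covered by this heuristic, and a few non-emoji symbols in category "So" (for example the degree sign) would be treated as emoji.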

For the function (remove_emoji), see https://stackoverflow.com/a/61839832/6075699

Try this.
First install the emoji library: pip install emoji

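import re
import emoji
import numpy as np

# demojize turns each emoji into a :shortcode:; if nothing is left once the shortcodes are removed, the row was emoji only
df.Comments.apply(lambda x: x if (re.sub(r'(:[!_\-\w]+:)','',emoji.demojize(x)) != "") else np.nan)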
0                         nice
1                      Insane3
2                          NaN
3               @bertelsen1986
4    Luckily I have one to ???
Name: a, dtype: object
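
The demojize approach works in two steps: each emoji becomes a :shortcode:, then the regex deletes every shortcode and the row is kept only if some text remains. The second step can be sketched on its own with the standard library. The input strings below are hand-written to look like demojize output, so the shortcode names are illustrative assumptions, and is_only_shortcodes is a hypothetical helper name. Unlike the one-liner above, this sketch also strips whitespace, so emoji separated by spaces still count as emoji only.

```python
import re

# Same pattern as in the answer: a colon-delimited shortcode such as :red_heart:
SHORTCODE = re.compile(r'(:[!_\-\w]+:)')


def is_only_shortcodes(demojized):
    """True if nothing but whitespace is left after removing all shortcodes."""
    return SHORTCODE.sub('', demojized).strip() == ''


samples = [
    "nice",
    ":red_heart:",
    ":red_heart: :fire:",
    "Luckily I have one to :thumbs_up:",
]
# Keep ordinary comments, replace shortcode-only ones with None
cleaned = [None if is_only_shortcodes(s) else s for s in samples]
print(cleaned)
```

One caveat of this design: a comment in which a user literally typed something like :smile: is indistinguishable from a converted emoji, and an empty comment is also reported as emoji only.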
