Removing ASCII characters from a string in Python

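Problem description

I want to remove the special characters from a string, but I have not succeeded. Can you help me?

The data shows two backslashes ("\\"), but only one ("\") appears when it is printed. Why is that?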

Updated data:

data = [{
            "data": "0\\x1e\\x82*.extractdomain.com\\x82\\x0ctest.extractdomain.com","name": "subjectAltName"
        }]

re.sub("[^\x20-\x7E]","",data[0]["data"])

Solutions

Try this.

clean_text = ' '.join(re.findall(r"[^\W]+",text))

Edit: or this.

custom_translation = {130: None,22: None}
print(text.translate(custom_translation))

The post has since been edited and the text changed, so this solution no longer works. The old text was:

text = '0:\x82 test test test\x82\x16testtesttest'

Newer solution:

custom_translation = {22: None,49: None,50: None,54: None,56: None,92: None,120: None}
print(text.translate(custom_translation))

txt = "0:\\x82 test test test\\x82\\x16testtesttest"
x = re.sub("\\\\(?:x16|x82)","",txt)

As a generalization for such characters:

x = re.sub("\\\\(?:x\\w\\w)","",txt)

Output:

0: test test testtesttesttest
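A slightly stricter variant of the generalization above, offered only as a sketch. It assumes the unwanted sequences are always a literal backslash, an `x`, and exactly two hex digits. The name `strip_literal_hex_escapes` is made up for illustration.

```python
import re

# Matches a literal backslash followed by "x" and exactly two hex digits.
LITERAL_HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")


def strip_literal_hex_escapes(s):
    """Remove literal backslash-x-hex-hex sequences (hypothetical helper)."""
    return LITERAL_HEX_ESCAPE.sub("", s)


txt = "0:\\x82 test test test\\x82\\x16testtesttest"
print(strip_literal_hex_escapes(txt))  # 0: test test testtesttesttest
```

Unlike `\w\w`, the `[0-9a-fA-F]{2}` class leaves a sequence such as `\xzz` alone, because it is not a valid hex escape.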

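Keep in mind:

In short, to match a literal backslash, one has to write '\\\\' as the RE string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. In REs that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand.

The solution is to use Python's raw string notation for regular expressions; backslashes are not handled in any special way in a string literal prefixed with 'r', so r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Regular expressions will often be written in Python code using this raw string notation.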
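A small sketch of the point made in the quote: the plain and raw spellings of the pattern are the same regex, and both match one literal backslash. The variable names are chosen for illustration only.

```python
import re

s = "a\\b"  # three characters: a, backslash, b
assert len(s) == 3

# Both literals produce the same two-character pattern: backslash, backslash.
plain = "\\\\"
raw = r"\\"
assert plain == raw

print(re.sub(plain, "/", s))  # a/b
print(re.sub(raw, "/", s))    # a/b

# A raw r"\n" is backslash + n; a plain "\n" is a single newline character.
print(len(r"\n"), len("\n"))  # 2 1
```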

For more examples, see "The Backslash Plague" in the Python Regular Expression HOWTO.

The error is in the declaration of text: you escaped the \ twice, so you are writing a plain \ rather than an escaped hex character.

text = '0:\x82 test test test\x82\x16testtesttest'

print(re.sub("[^\x20-\x7E]", "", text))

Prints: 0: test test testtesttesttest
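To make the difference between the two declarations concrete, here is a minimal comparison. The names `escaped_once` and `escaped_twice` are illustrative and do not come from the question.

```python
import re

escaped_once = "0:\x82 test"    # "\x82" is a single character, code point 0x82
escaped_twice = "0:\\x82 test"  # backslash, "x", "8", "2": four printable characters

print(len(escaped_once), len(escaped_twice))  # 8 11

pattern = "[^\x20-\x7E]"
# The real control character is removed ...
print(repr(re.sub(pattern, "", escaped_once)))   # '0: test'
# ... but the doubly escaped text is all printable ASCII, so nothing changes.
print(repr(re.sub(pattern, "", escaped_twice)))  # '0:\\x82 test'
```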

Try this approach

import re


def delete_punc(s):
    # Split on whitespace; only the first two words are used.
    s1 = s.split()

    # Keep only the ASCII letters of each word.
    match_pattern1 = re.findall(r'[a-zA-Z]', s1[0])
    match_pattern2 = re.findall(r'[a-zA-Z]', s1[1])

    listToStr1 = ''.join(match_pattern1)
    listToStr2 = ''.join(match_pattern2)

    return listToStr1 + ' ' + listToStr2

# Raw string so that \d is not treated as an invalid escape sequence.
print(delete_punc(r"He3l?/l!o W{o'r[l9\d)"))

Output

Hello World

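The string appears to contain \x escapes that have themselves been escaped, which produces doubled backslashes. Perhaps you received the data in this form, or some earlier processing corrupted it. You can remove the doubled backslashes by encoding the string to bytes and then decoding it with the unicode-escape codec. After that, your regex will work.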

>>> s = "0\\x1e\\x82*.extractdomain.com\\x82\\x0ctest.extractdomain.com"
>>> fixed = s.encode('latin-1').decode('unicode-escape')
>>> fixed
'0\x1e\x82*.extractdomain.com\x82\x0ctest.extractdomain.com'
>>> re.sub("[^\x20-\x7E]", "", fixed)
'0*.extractdomain.comtest.extractdomain.com'
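The two steps above can be wrapped into one helper and applied to the data from the question. This is a sketch, not a definitive implementation: the name `clean` is hypothetical, and it assumes every value contains only code points below 256, so that latin-1 can encode it.

```python
import re

# Anything outside the printable ASCII range 0x20-0x7E.
NON_PRINTABLE = re.compile(r"[^\x20-\x7E]")


def clean(value):
    """Undo doubled escapes, then drop everything outside printable ASCII."""
    # Assumption: value only holds code points below 256 (latin-1 encodable).
    unescaped = value.encode("latin-1").decode("unicode-escape")
    return NON_PRINTABLE.sub("", unescaped)


data = [{
    "data": "0\\x1e\\x82*.extractdomain.com\\x82\\x0ctest.extractdomain.com",
    "name": "subjectAltName",
}]

for entry in data:
    print(clean(entry["data"]))  # 0*.extractdomain.comtest.extractdomain.com
```

Note that the two domain names end up joined together, because the separators between them were the control characters that got removed.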
