UnicodeEncodeError: 'ascii' codec can't encode character '\u200b' in position 12: ordinal not in range(128), even with a u prefix on every string

Problem description

My HTTP request code is as follows:

#-*- coding:utf-8 -*-

import http.client
import urllib.parse

host = u"​www.cloudflare.com"
url = u"%s:80" % host
conn = http.client.HTTPConnection(url)

method = u"GET"

request_url = u"https://%s" % host


headers = {
    u"Host": host,
    u"Accept": u"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    u"Accept-Encoding": u"gzip,deflate,br",
    u"User-Agent": u"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/86.0.4240.75 Safari/537.36",  # "Android-ALI-Moblie 1.3.0",
    u"Content-Type": u"application/x-www-form-urlencoded;charset=UTF-8",
    u"Cookie": u"spreadCode=789789; md5Password=true; JSESSIONID=25DE5EBD2C30D10A505FA70B64D8EA03",
    u"Accept-Language": u"zh-CN,zh;q=0.9,en;q=0.8",
    u"Cache-Control": u"max-age=0",
    u"Connection": u"keep-alive",
    u"Upgrade-Insecure-Requests": u"1",
}

conn.request(method=method, url=request_url, headers=headers)  # this line raises the error

response = conn.getresponse()

print(response.status,response.reason)

response_data = response.read()
print(response_data)

response_headers = response.getheaders()
print(response_headers)

response_head_cookie = response.getheader('Set-Cookie')
print(response_head_cookie)

conn.close()

When I run the code, I get this error:

File "/Users/dele/Desktop/TestPython/httpconnection/test_httpconnection.py", line 38, in <module>
...
self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u200b' in position 12: ordinal not in range(128)

I also searched and found these posts: questions/15084194

Everyone says to add a u prefix to every string, but I still get the error.


EDIT-01

I changed the code to the following:

import http.client

host = "www.cloudflare.com"
url = "%s:80" % host
conn = http.client.HTTPConnection(url)

method = "GET"

request_url = "https://%s" % host

headers = {
    "Host": "​www.cloudflare.com",
    "Accept-Encoding": "gzip",
    "User-Agent": "Android-ALI-Moblie 1.3.0",
    "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
    "Connection": "Keep-Alive",
}

conn.request(method=method, url=request_url, headers=headers)

response = conn.getresponse()

print(response.status,response.reason)

response_data = response.read()
print(response_data)

response_headers = response.getheaders()
print(response_headers)

response_head_cookie = response.getheader('Set-Cookie')
print(response_head_cookie)

conn.close()

But I still get this error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u200b' in position 0: ordinal not in range(256)

Solution

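\u200b is the "zero-width space" character; it looks like somewhere in your code you inadvertently pasted this character (which is basically invisible) into headers. In Python 3 every str is already Unicode, so the u prefix changes nothing; the problem is the character itself, which ASCII cannot represent.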
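To make the hidden character visible, here is a minimal sketch that rebuilds the request line by hand. The `request_line` format is an assumption modeled on the "GET <url> HTTP/1.1" line that an HTTP client sends, and the zero-width space is written as an escape so it shows up in the listing.

```python
# The zero-width space is written as an escape so it is visible in this listing.
host = "\u200bwww.cloudflare.com"

visible_length = len("www.cloudflare.com")
actual_length = len(host)  # one character longer than what the eye sees

# ascii() escapes every non-ASCII character, which exposes the hidden one.
escaped_host = ascii(host)

# Assumed shape of the request line built from the method and the URL.
request_line = "GET https://%s HTTP/1.1" % host

try:
    request_line.encode("ascii")
    error_position = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    error_position = exc.start

print(actual_length - visible_length, escaped_host, error_position)
```

The reported offset is 12 because "GET " and "https://" together occupy the first twelve positions, which matches the position named in the traceback.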

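Try iterating over the headers and running each item through .encode('ascii', errors='xmlcharrefreplace'); that should show you where the extra character is hiding, and you can then delete it.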
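The check described above could be sketched roughly as follows. The helper name `reveal_hidden_characters` is hypothetical, and the `headers` dict is a shortened copy of the one in the question with the zero-width space written as an escape.

```python
# Shortened copy of the question's headers; "\u200b" stands in for the pasted invisible character.
headers = {
    "Host": "\u200bwww.cloudflare.com",
    "Accept-Encoding": "gzip",
    "Connection": "Keep-Alive",
}


def reveal_hidden_characters(headers):
    """Return the headers whose name or value contains non-ASCII characters.

    Non-ASCII characters are replaced by XML character references
    (for example &#8203; for U+200B) so that they become visible.
    """
    report = {}
    for name, value in headers.items():
        escaped_name = name.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
        escaped_value = value.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
        # A difference means at least one character had to be escaped.
        if escaped_name != name or escaped_value != value:
            report[escaped_name] = escaped_value
    return report


report = reveal_hidden_characters(headers)
print(report)
```

Any header that appears in the report contains a character that cannot be sent as ASCII; 8203 is the decimal form of 0x200B.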

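Edit: found it; your host variable begins with a \u200b character. You need to delete the invisible character that sits between the opening quote and "www" in host = u"www.cloudflare.com".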
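If retyping the string by hand is not practical, for instance when the value is pasted in from elsewhere, one possible approach is to strip such characters programmatically. This is only a sketch: the helper name `strip_zero_width` and the particular set of characters in `ZERO_WIDTH` are assumptions, not an exhaustive list of invisible code points.

```python
# Assumed set of invisible characters to drop: zero-width space, zero-width
# non-joiner, zero-width joiner, and the byte order mark.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"

# Translation table that maps each of those characters to None (i.e. deletes it).
_DELETE_ZERO_WIDTH = {ord(ch): None for ch in ZERO_WIDTH}


def strip_zero_width(text):
    """Return text with the characters listed in ZERO_WIDTH removed."""
    return text.translate(_DELETE_ZERO_WIDTH)


dirty_host = "\u200bwww.cloudflare.com"  # escape written out so it is visible
clean_host = strip_zero_width(dirty_host)

# The cleaned value now encodes without raising UnicodeEncodeError.
encoded_host = clean_host.encode("ascii")
print(clean_host, encoded_host)
```

Deleting the character in the source file remains the simpler fix here; a helper like this is mainly useful for input that the script does not control.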