Problem description
My http.client code is as follows:
#-*- coding:utf-8 -*-
import http.client
import urllib.parse
host = u"www.cloudflare.com"
url = u"%s:80" % host
conn = http.client.HTTPConnection(url)
method = u"GET"
request_url = u"https://%s" % host
headers = {
    u"Host": host,
    u"Accept": u"text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    u"Accept-Encoding": u"gzip,deflate,br",
    u"User-Agent": u"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/86.0.4240.75 Safari/537.36",  # "Android-ALI-Moblie 1.3.0",
    u"Content-Type": u"application/x-www-form-urlencoded;charset=UTF-8",
    u"Cookie": u"spreadCode=789789; md5Password=true; JSESSIONID=25DE5EBD2C30D10A505FA70B64D8EA03",
    u"Accept-Language": u"zh-CN,zh;q=0.9,en;q=0.8",
    u"Cache-Control": u"max-age=0",
    u"Connection": u"keep-alive",
    u"Upgrade-Insecure-Requests": u"1",
}
conn.request(method=method, url=request_url, headers=headers)  # this line raises the error
response = conn.getresponse()
print(response.status,response.reason)
response_data = response.read()
print(response_data)
response_headers = response.getheaders()
print(response_headers)
response_head_cookie = response.getheader('Set-Cookie')
print(response_head_cookie)
conn.close()
When I run the code, I get this error:
  File "/Users/dele/Desktop/TestPython/httpconnection/test_httpconnection.py", line 38, in <module>
...
self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u200b' in position 12: ordinal not in range(128)
I also searched related posts: questions/15084194
They all say to prefix every string with u, but I still get the error.
Edit-01
import http.client
host="www.cloudflare.com"
url="%s:80" % host
conn = http.client.HTTPConnection(url)
method = "GET"
request_url = "https://%s" % host
headers = {
    "Host": "www.cloudflare.com",
    "Accept-Encoding": "gzip",
    "User-Agent": "Android-ALI-Moblie 1.3.0",
    "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
    "Connection": "Keep-Alive",
}
conn.request(method=method, url=request_url, headers=headers)
response = conn.getresponse()
print(response.status,response.reason)
response_data = response.read()
print(response_data)
response_headers = response.getheaders()
print(response_headers)
response_head_cookie = response.getheader('Set-Cookie')
print(response_head_cookie)
conn.close()
But I still get this error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u200b' in position 0: ordinal not in range(256)
Solution
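\u200b is the zero-width space, a character that is essentially invisible. It looks like you accidentally pasted it somewhere in your code, and it ended up in headers.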
Try iterating over the headers and running each item through .encode('ascii', errors='xmlcharrefreplace'). That should show you where the extra character is hiding, so you can delete it.
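A rough sketch of that check follows. The headers dict is a shortened stand-in, not the full one from the question, and the zero-width space is written as the escape \u200b so it is visible in the source. The helper name find_non_ascii is hypothetical.

```python
def find_non_ascii(headers):
    """Return (header name, which part, escaped text) for every header
    name or value that contains a non-ASCII character."""
    suspicious = []
    for name, value in headers.items():
        for label, text in (("name", name), ("value", value)):
            # xmlcharrefreplace turns each non-ASCII character into &#NNNN;,
            # which makes invisible characters show up in the output.
            escaped = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
            if escaped != text:
                suspicious.append((name, label, escaped))
    return suspicious


# Assumed sample data: the Host value starts with a zero-width space.
headers = {
    "Host": "\u200bwww.cloudflare.com",
    "Accept-Encoding": "gzip",
}

for name, label, escaped in find_non_ascii(headers):
    print(f"{name!a} ({label}): {escaped}")
```

With this sample data the Host value is reported with &#8203; in front of the hostname; 8203 is the decimal code point of U+200B.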
Edit: found it. Your host variable begins with a \u200b character. You need to delete the invisible character between host = u" and www.cloudflare.com".
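If the string comes from a copy-paste source you do not control, one option is to strip invisible characters before using the value instead of retyping it. This is only a sketch and not part of the fix above: strip_invisible is a hypothetical helper, and it assumes that dropping every character in Unicode category Cf (format characters, which includes U+200B) is acceptable for your data.

```python
import unicodedata


def strip_invisible(text):
    """Remove Unicode format characters (category Cf), such as the
    zero-width space U+200B, from text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Assumed sample value: the hostname with a leading zero-width space.
host = strip_invisible("\u200bwww.cloudflare.com")

print(ascii(host))    # ascii() would show any remaining non-ASCII character as an escape
host.encode("ascii")  # no longer raises UnicodeEncodeError
```

Deleting the stray character from the source file is still the cleaner fix. A filter like this mostly helps when the value arrives at runtime.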