lxml cannot parse XML (with an encoding other than utf-8?) [python]

My code:

import re
import requests
from lxml import etree

url = 'http://weixin.sogou.com/gzhjs?openid=oIWsFt__d2wSBKMfQtkFfeVq_u8I&ext=2JjmXOu9jMsFW8Sh4E_XmC0DOkcPpGX18Zm8qPG7F0L5ffrupfFtkDqSOm47Bv9U'

r = requests.get(url)

items = r.json()['items']

> Without encode('utf-8'):

etree.fromstring(items[0]) outputs:

ValueError                                Traceback (most recent call last)
<ipython-input-69-cb8697498318> in <module>()
----> 1 etree.fromstring(items[0])

lxml.etree.pyx in lxml.etree.fromstring (src\lxml\lxml.etree.c:68121)()

parser.pxi in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:102435)()

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

> With encode('utf-8'):

etree.fromstring(items[0].encode('utf-8')) outputs:

File "<string>", line unknown
XMLSyntaxError: CData section not finished
鎶楀啺鎶㈤櫓鎹锋姤:闃冲寳I绾挎, line 1, column 281

I don't know how to parse this XML.

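Solution

As a workaround, you can remove the encoding attribute from the XML declaration before passing the string to etree.fromstring. Without the declaration, lxml accepts the Unicode string directly: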

# Strip the first encoding="..." attribute (the one in the XML declaration).
xml = re.sub(r'\bencoding="[-\w]+"', '', items[0], count=1)
root = etree.fromstring(xml)
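
The regex above removes the first encoding="..." it finds anywhere in the string. A slightly more defensive variant, sketched below, only touches the leading <?xml ...?> declaration. The helper name strip_encoding_declaration and the sample document are made up for illustration and are not part of the original answer; the sample assumes the declaration names an encoding such as gbk.

```python
import re

# Matches the encoding pseudo-attribute inside an XML declaration,
# written with either double or single quotes.
ENCODING_ATTR = re.compile(r'''\s+encoding=(["'])[-\w.]+\1''')


def strip_encoding_declaration(xml_text):
    """Remove encoding="..." from a leading <?xml ...?> declaration, if any.

    Hypothetical helper: text without a leading declaration is returned as-is.
    """
    if not xml_text.lstrip().startswith("<?xml"):
        return xml_text
    # Split at the end of the declaration so element attributes that
    # happen to be called "encoding" are never modified.
    head, sep, tail = xml_text.partition("?>")
    return ENCODING_ATTR.sub("", head, count=1) + sep + tail


# Illustrative input; the real items[0] comes from the JSON response.
sample = '<?xml version="1.0" encoding="gbk"?><item><title>hello</title></item>'
cleaned = strip_encoding_declaration(sample)
```

The cleaned string can then be passed to etree.fromstring in place of xml in the snippet above.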

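Update after seeing @Lea's comment on the question:

Encode the string to bytes and give the parser an explicit encoding, so that it overrides whatever the XML declaration says: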

# 'items' is a list of XML strings; encode a single element, not the list.
xml = r.json()['items'][0].encode('utf-8')
root = etree.fromstring(xml, parser=etree.XMLParser(encoding='utf-8'))
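
To try the parser override without hitting the network, here is a minimal sketch. It assumes the declared encoding (gbk here) differs from the bytes actually passed in, which would be consistent with the garbled text in the XMLSyntaxError above but is not confirmed by the question. The function name parse_with_forced_encoding and the sample document are hypothetical.

```python
from lxml import etree


def parse_with_forced_encoding(xml_text, encoding="utf-8"):
    """Encode a Unicode XML string and parse it with that same encoding.

    The parser's encoding argument overrides the encoding named in the
    XML declaration, so a mismatching declaration no longer matters.
    """
    data = xml_text.encode(encoding)
    parser = etree.XMLParser(encoding=encoding)
    return etree.fromstring(data, parser=parser)


# Illustrative document: declares gbk, but we will hand lxml UTF-8 bytes.
# "\u6807\u9898" is the two-character Chinese word for "title".
sample = (
    '<?xml version="1.0" encoding="gbk"?>'
    "<item><title><![CDATA[\u6807\u9898]]></title></item>"
)
root = parse_with_forced_encoding(sample)
title = root.findtext("title")
```

With the real response, the call would be parse_with_forced_encoding(items[0]).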
