问题描述
您需要有BeautifulSoup。
from BeautifulSoup import BeautifulStonesoup
import cgi
def HTMLEntitiesToUnicode(text):
"""Converts HTML entities to unicode. For example '&' becomes '&'."""
text = unicode(BeautifulStonesoup(text, convertEntities=BeautifulStonesoup.ALL_ENTITIES))
return text
def unicodetoHTMLEntities(text):
"""Converts unicode to HTML entities. For example '&' becomes '&'."""
text = cgi.escape(text).encode('ascii', 'xmlcharrefreplace')
return text
text = "&, ®, <, >, ¢, £, ¥, €, §, ©"
uni = HTMLEntitiesToUnicode(text)
htmlent = unicodetoHTMLEntities(uni)
print uni
print htmlent
# &, ®, <, >, ¢, £, ¥, €, §, ©
# &, ®, <, >, ¢, £, ¥, €, §, ©
解决方法
如何在Python中将HTML实体转换为Unicode,反之亦然?