Storing lxml.etree._ElementTree objects in a DataFrame: TypeError: can't pickle lxml.etree._ElementTree objects

Problem description

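I am trying to store lxml.etree._ElementTree objects in a DataFrame. Unfortunately, pandas cannot pickle these objects. Is there a way to store them in a DataFrame, or is there another way to store all of the information in a single file with good read/write speed and a reasonable file size?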

Here is an example that reproduces the error:

import pandas as pd

import lxml
from lxml import etree

s = '''<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>'''

doc = etree.fromstring(s)
root = etree.ElementTree(doc)

df = pd.DataFrame(data=[["name1", "date1", root]], columns=["name", "date", "root"])
df.to_pickle(r"D:\test\test.pkl")
# TypeError: can't pickle lxml.etree._ElementTree objects

Traceback:

Traceback (most recent call last):

  File "<...>", line 2, in <module>
    df.to_pickle(r"D:\test\test.pkl")

  File "...\Anaconda\envs\...\lib\site-packages\pandas\core\generic.py", line 2771, in to_pickle
    to_pickle(self, path, compression=compression, protocol=protocol)

  File "...\Anaconda\envs\...\lib\site-packages\pandas\io\pickle.py", line 76, in to_pickle
    f.write(pickle.dumps(obj, protocol=protocol))

TypeError: can't pickle lxml.etree._ElementTree objects
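
The last frame of the traceback is a plain pickle.dumps call, which suggests the failure comes from pickle rather than from pandas. The following is a minimal sketch that tries to pickle a tree directly, without a DataFrame. The variable names are illustrative only, and the exact exception wording may differ between Python and lxml versions.

```python
import pickle

from lxml import etree

# Build a small tree, as in the example above
tree = etree.ElementTree(etree.fromstring("<note><to>Tove</to></note>"))

try:
    pickle.dumps(tree)
    tree_pickled = True
except (TypeError, pickle.PicklingError) as exc:
    # Expected path: lxml trees wrap C-level data that pickle cannot serialize
    tree_pickled = False
    print(f"{type(exc).__name__}: {exc}")
```

If this fails in the same way, any storage format that relies on pickle will hit the same error until the trees are converted to something picklable.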

Solution

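For future readers, the following fixes it. The lxml trees themselves cannot be pickled, but their serialized XML can, so convert the column to bytes before saving and parse it back into trees after loading: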

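# Before saving: serialize each tree to UTF-8 XML bytes, which pickle can handle
df["root"] = df["root"].map(lambda x: etree.tostring(x, encoding='utf8', method='xml'))
df.to_pickle(r"D:\test\test.pkl")

# After loading: parse the bytes back into elements and wrap them in ElementTree objects
df = pd.read_pickle(r"D:\test\test.pkl")
df["root"] = df["root"].map(etree.fromstring).map(etree.ElementTree)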
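
For anyone who wants to check the round trip end to end, here is a self-contained sketch of the same approach. The helper names trees_to_bytes and bytes_to_trees are hypothetical (they are not part of pandas or lxml), and a temporary directory stands in for the D:\test path used above.

```python
import os
import tempfile

import pandas as pd
from lxml import etree


def trees_to_bytes(column):
    """Serialize each ElementTree in a Series to UTF-8 XML bytes (hypothetical helper)."""
    return column.map(lambda t: etree.tostring(t, encoding="utf8", method="xml"))


def bytes_to_trees(column):
    """Parse XML bytes back into ElementTree objects (hypothetical helper)."""
    return column.map(etree.fromstring).map(etree.ElementTree)


xml = "<note><to>Tove</to><from>Jani</from></note>"
tree = etree.ElementTree(etree.fromstring(xml))
df = pd.DataFrame([["name1", "date1", tree]], columns=["name", "date", "root"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.pkl")
    # Store a copy whose "root" column holds bytes instead of trees
    stored = df.assign(root=trees_to_bytes(df["root"]))
    stored.to_pickle(path)
    loaded = pd.read_pickle(path)

# Rebuild the trees from the stored bytes
loaded["root"] = bytes_to_trees(loaded["root"])
restored_tree = loaded.loc[0, "root"]
print(restored_tree.getroot().findtext("to"))
```

Using assign leaves the original DataFrame untouched, so the in-memory trees remain usable after saving. Note that the restored trees are new objects parsed from the XML text, not the original instances.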