Problem description
I have a string returned from a GET request that contains XBRL content. How can I parse it with XBRLParser? The code is as follows:
from xbrl import XBRLParser
import base64
# data comes from the GET response (not shown)
decoded = base64.decodebytes(data[0].text.encode())  # decoded holds the XBRL content (as bytes)
# data = decoded.find('xbrl')
# dom = xml.dom.minidom.parseString(decoded) # or xml.dom.minidom.parseString(xml_string)
# pretty_xml_as_string = dom.toprettyxml()
# print(pretty_xml_as_string)
xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse(decoded)
# Fails with:
#   File "/Users/~/Downloads/venv/lib/python3.7/site-packages/xbrl/xbrl.py", line 64, in parse
#     file_handler = open(file_handle)
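For reference, base64.decodebytes returns bytes, and a string-based parser needs a str, so the payload has to be decoded once more. A minimal sketch of that round trip, assuming the response text is the XBRL document base64-encoded as UTF-8 (the sample payload below is made up):

```python
import base64

# Hypothetical stand-in for data[0].text: an XBRL document, base64-encoded
# and carried as text in the response.
xbrl_text = '<xbrl><dei:DocumentType>10-K</dei:DocumentType></xbrl>'
payload = base64.b64encode(xbrl_text.encode('utf-8')).decode('ascii')

decoded = base64.decodebytes(payload.encode())  # bytes
xbrl_string = decoded.decode('utf-8')           # str, ready for a string-based parser

print(type(decoded).__name__, type(xbrl_string).__name__)  # bytes str
```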
I used DOM only to show that xml.dom.minidom has a parseString function, which is exactly what I need, but for XBRLParser.
Solution
After working through xbrl.py (the source of the xbrl library), I came up with this solution:
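In the parse function of xbrl.py, I commented out the lines that open the XML file before it is read and passed to XBRLPreprocessedFile. Now parse passes its argument straight to XBRLPreprocessedFile without opening it. In XBRLPreprocessedFile, I changed the line xbrl_string = self.fh.read() to xbrl_string = self.fh, because I am passing a string rather than a file.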
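An alternative that may avoid patching the library: the original check in parse (the hasattr(file_handle, 'read') lines that are commented out in the modified parse function) uses any object with a read() method directly and passes everything else to open() as a path, which is the error in the question. A minimal sketch of that dispatch, using a hypothetical get_handle helper, shows that io.StringIO satisfies the check. Assuming the installed library still has this check and XBRLPreprocessedFile only calls read() on it, xbrl_parser.parse(io.StringIO(decoded.decode('utf-8'))) should work unmodified; this has not been verified against the library itself.

```python
import io


def get_handle(file_handle):
    """Hypothetical helper mirroring the original check in parse:
    objects with a read() method are used as-is, anything else is
    opened as a file path."""
    if not hasattr(file_handle, 'read'):
        return open(file_handle)
    return file_handle


xbrl_string = '<xbrl><context id="c1"/></xbrl>'

# Wrapping the string in io.StringIO gives it a read() method, so it takes
# the "file handle" branch instead of being passed to open() as a path.
handle = get_handle(io.StringIO(xbrl_string))
print(handle.read() == xbrl_string)  # True
```

Passing xbrl_string itself to get_handle would try to open a file named after the XML content and raise an OSError, matching the traceback above.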
In my code, I defined a Custom class the same way it is defined in xbrl.py, and passed decoded.decode('utf-8') to parse.
My code:
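import base64
import re

from xbrl import XBRLParser


class Custom(object):
    """Simple container for custom tags, defined as in xbrl.py."""

    def __init__(self):
        pass

    def __call__(self):
        return self.__dict__.items()


# data comes from the GET response (not shown)
decoded = base64.decodebytes(data[0].text.encode())
xbrl_parser = XBRLParser()
# Pass a str (not bytes) to the patched parse
xbrl = xbrl_parser.parse(decoded.decode("utf-8"))

# Find all custom tags: prefixed names whose prefix is not us-gaap, dei, xbrll or xbrldi
custom_obj = Custom()
custom_data = xbrl.find_all(re.compile(r'^((?!(us-gaap|dei|xbrll|xbrldi)).)*:\s*', re.IGNORECASE | re.MULTILINE))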
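To see what the custom-tag pattern selects, it can be tried on plain tag names with re.search, which is how a compiled pattern passed as a name filter is matched. The tag names below are hypothetical examples. The negative lookahead is checked at every position before the colon, so a name is kept only if it has a prefix and no part of that prefix starts with us-gaap, dei, xbrll or xbrldi.

```python
import re

# Same pattern as in the code above.
CUSTOM_TAG = re.compile(r'^((?!(us-gaap|dei|xbrll|xbrldi)).)*:\s*',
                        re.IGNORECASE | re.MULTILINE)

# Hypothetical tag names, lowercased as an HTML-style parser would report them.
names = ["us-gaap:assets", "dei:documenttype", "abc:customrevenue", "context"]
custom = [n for n in names if CUSTOM_TAG.search(n)]
print(custom)  # ['abc:customrevenue']
```

Note that a prefix such as abcdei is also rejected, because "dei" appears inside it before the colon.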
The modified parse function in xbrl.py:
def parse(self, file_handle):
    """
    parse is the main entry point for an XBRLParser. It originally took a
    file handle; in this modified version it takes the XBRL content as a string.
    """
    xbrl_obj = XBRL()

    # Original code: if no file handle was given, open the path ourselves.
    # Commented out so that a string can be passed straight through.
    # if not hasattr(file_handle, 'read'):
    #     file_handler = open(file_handle)
    # else:
    #     file_handler = file_handle

    # Store the headers
    xbrl_file = XBRLPreprocessedFile(file_handle)
    xbrl = soup_maker(xbrl_file.fh)
    # file_handler.close()
    xbrl_base = xbrl.find(name=re.compile("xbrl*:*"))

    if xbrl.find('xbrl') is None and xbrl_base is None:
        raise XBRLParserException('The xbrl file is empty!')

    # lookahead to see if we need a custom leading element
    lookahead = xbrl.find(name=re.compile("context", re.IGNORECASE | re.MULTILINE)).name
    if ":" in lookahead:
        self.xbrl_base = lookahead.split(":")[0] + ":"
    else:
        self.xbrl_base = ""

    return xbrl
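The lookahead step at the end of parse takes the name of the first tag matching "context" and stores its namespace prefix (including the colon) in self.xbrl_base, or an empty string if the tag is unprefixed. The same logic as a hypothetical standalone function, for illustration only:

```python
def detect_prefix(tag_name):
    """Hypothetical helper reproducing the lookahead step in parse:
    return the namespace prefix with its colon, or "" if there is none."""
    if ":" in tag_name:
        return tag_name.split(":")[0] + ":"
    return ""


print(detect_prefix("xbrli:context"))  # xbrli:
print(repr(detect_prefix("context")))  # ''
```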
The change in the XBRLPreprocessedFile function in xbrl.py:
# before
xbrl_string = self.fh.read()
# after
xbrl_string = self.fh
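With this change XBRLPreprocessedFile accepts only strings; a real file handle would then be passed along unread, and later string processing would likely fail. A minimal sketch of a line that keeps both inputs working, written as a hypothetical standalone function (inside xbrl.py it would read xbrl_string = self.fh.read() if hasattr(self.fh, 'read') else self.fh):

```python
import io


def read_xbrl_source(fh):
    """Hypothetical variant of the changed line in XBRLPreprocessedFile:
    read from file-like objects, pass strings through unchanged."""
    if hasattr(fh, 'read'):
        return fh.read()
    return fh


content = '<xbrl></xbrl>'
print(read_xbrl_source(content) == content)               # True
print(read_xbrl_source(io.StringIO(content)) == content)  # True
```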