Problem description
Suppose I want to parse a Microsoft 10-Q SEC XBRL file:
library('xml2')
url <- "https://www.sec.gov/Archives/edgar/data/789019/000156459021002316/msft-10q_20201231_htm.xml"
xml <- read_xml(url)
xml_find_all(xml,"./us-gaap:EarningsPerShareBasic")
# {xml_nodeset (10)}
# [1] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20201001_20201231" decimals="2" id="F_000099" unitRef="U_iso4217USD_x ...
# [2] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20191001_20191231" decimals="2" id="F_000100" unitRef="U_iso4217USD_x ...
# [3] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20200701_20201231" decimals="2" id="F_000101" unitRef="U_iso4217USD_x ...
# [4] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20190701_20191231" decimals="2" id="F_000102" unitRef="U_iso4217USD_x ...
# [5] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_us-gaapChangeInAccountingEstimateByTypeAxis_us-gaapServiceLifeMember_ ...
# [6] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_us-gaapChangeInAccountingEstimateByTypeAxis_us-gaapServiceLifeMember_ ...
# [7] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20201001_20201231" decimals="2" id="F_000517" unitRef="U_iso4217USD_x ...
# [8] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20191001_20191231" decimals="2" id="F_000518" unitRef="U_iso4217USD_x ...
# [9] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20200701_20201231" decimals="2" id="F_000519" unitRef="U_iso4217USD_x ...
# [10] <us-gaap:EarningsPerShareBasic contextRef="C_0000789019_20190701_20191231" decimals="2" id="F_000520" unitRef="U_iso4217USD_x ...
As shown above, most US XBRL tags carry a namespace prefix; here us-gaap: denotes the accounting standard. However, some xml2 functions, such as:
xml_name(xml_find_all(xml,"./us-gaap:EarningsPerShareBasic"))
# [1] "EarningsPerShareBasic" "EarningsPerShareBasic" "EarningsPerShareBasic" "EarningsPerShareBasic" "EarningsPerShareBasic"
# [6] "EarningsPerShareBasic" "EarningsPerShareBasic" "EarningsPerShareBasic" "EarningsPerShareBasic" "EarningsPerShareBasic"
and
xml_find_first(xml,"./us-gaap:EarningsPerShareBasic")
# {xml_node}
# <EarningsPerShareBasic contextRef="C_0000789019_20201001_20201231" decimals="2" id="F_000099" unitRef="U_iso4217USD_xbrlishares">
drop that prefix from their results. For example, suppose I list every tag and search for the ones containing "earnings":
nodes <- xml_find_all(xml,"./*")
tags <- xml_name(nodes)
grep("earnings",tags,ignore.case = TRUE,value=TRUE)
Because xml_name(nodes) strips the prefix, grep does not return the actual, prefixed tag names.
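One possible workaround, offered as a sketch rather than a confirmed answer: xml_name() accepts an ns argument, and when it is given the prefix-to-URI map produced by xml_ns(), the returned names are qualified with their prefixes. The snippet below uses a small made-up XBRL-like document so that it runs without network access; the namespace URIs, the context id, and the values are placeholders and are not taken from the Microsoft filing.

```r
library(xml2)

# Hypothetical stand-in for an XBRL instance document (placeholder data).
doc <- read_xml('
<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2020-01-31"
      xmlns:dei="http://xbrl.sec.gov/dei/2019-01-31">
  <us-gaap:EarningsPerShareBasic contextRef="c1" decimals="2">2.05</us-gaap:EarningsPerShareBasic>
  <us-gaap:EarningsPerShareDiluted contextRef="c1" decimals="2">2.03</us-gaap:EarningsPerShareDiluted>
  <dei:EntityRegistrantName contextRef="c1">Example Corp</dei:EntityRegistrantName>
</xbrl>')

# Named character vector mapping prefixes to namespace URIs.
ns <- xml_ns(doc)

# All direct children of the root element.
nodes <- xml_find_all(doc, "./*")

# Passing ns makes xml_name() return qualified names (prefix:local-name).
tags <- xml_name(nodes, ns = ns)

# The search now sees the prefixed tag names.
earnings_tags <- grep("earnings", tags, ignore.case = TRUE, value = TRUE)
print(unique(earnings_tags))
```

Applied to the filing from the question, the same idea would presumably read tags <- xml_name(nodes, ns = xml_ns(xml)). Note that xml_ns() assigns generated prefixes such as d1 to default namespaces, so elements without an explicit prefix come back under that generated name.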