Converting an Atom / OData XML file to an OData JSON file with Python

Problem description

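I have been converting a PowerShell script to Python so that I can download list files from SharePoint. Most of the code is done and runs fine. However, when I download a file from SharePoint to a local drive with a .json extension, the file content is not what I expect.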

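The SharePoint list response has content-type: application/atom+xml;type=Feed;charset=utf-8, so it is XML. Because I could not save the content in JSON format directly, I download the file as .xml and convert it to .json with the xmltodict Python package. That part works.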

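My actual question: how can we download the content as JSON, or convert the XML file to a JSON file without the attribute types, tags, namespaces and so on? We need the file in the same format as the output of the PowerShell script shown below: no tags, only key-value pairs.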

I am only sharing sample file content rather than the whole file, because it contains sensitive data.

This is the content of the SharePoint URL in Atom XML / OData XML format:

<?xml version="1.0" encoding="utf-8"?><Feed xml:base="https://myorg.sharepoint.com/sites/pwaeng/_api/" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

<d:Created m:type="Edm.DateTime">2018-05-09T21:21:03Z</d:Created><d:AuthorId m:type="Edm.Int32">1344</d:AuthorId><d:EditorId m:type="Edm.Int32">1344</d:EditorId><d:OData__UIVersionString>1.0</d:OData__UIVersionString><d:Attachments m:type="Edm.Boolean">false</d:Attachments><d:GUID m:type="Edm.Guid">9ef38bd1-a098-4610-98a4-dbf7488a5a27</d:GUID></m:properties></content></entry></Feed>
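One way to get plain key/value pairs straight from this XML, without xmltodict, is to walk the Atom feed with the standard library's ElementTree: read every m:properties block, drop the namespace from each field name, and use the m:type attribute to turn values into JSON numbers and booleans. This is a minimal sketch that assumes the standard OData v3 metadata namespace URI (".../dataservices/metadata", lower case). It only maps a few common Edm types, keeps everything else (DateTime, Guid, ...) as a string, and does not handle complex (nested) field types. The names convert_value and atom_feed_to_items are hypothetical.

```python
import xml.etree.ElementTree as ET

# OData v3 metadata namespace used by SharePoint Atom feeds (assumed)
M_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
INT_TYPES = {"Edm.Byte", "Edm.Int16", "Edm.Int32", "Edm.Int64"}
FLOAT_TYPES = {"Edm.Single", "Edm.Double", "Edm.Decimal"}


def convert_value(elem):
    """Turn one <d:Field m:type="..."> element into a plain Python value."""
    if elem.get("{%s}null" % M_NS) == "true":
        return None
    edm_type = elem.get("{%s}type" % M_NS, "Edm.String")
    text = elem.text or ""
    if edm_type in INT_TYPES:
        return int(text)
    if edm_type in FLOAT_TYPES:
        return float(text)
    if edm_type == "Edm.Boolean":
        return text == "true"
    # Edm.String, Edm.DateTime, Edm.Guid, ... are kept as text
    return text


def atom_feed_to_items(xml_bytes):
    """Return one flat {field_name: value} dict per <m:properties> block."""
    root = ET.fromstring(xml_bytes)
    items = []
    for props in root.iter("{%s}properties" % M_NS):
        item = {}
        for field in props:
            # "{namespace}Created" -> "Created"
            name = field.tag.rsplit("}", 1)[-1]
            item[name] = convert_value(field)
        items.append(item)
    return items
```

With the response from the code further down, json.dumps({"value": atom_feed_to_items(response.content)}) would give a file shaped roughly like the PowerShell one, minus the odata.* metadata keys.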

This is the JSON data after the Python conversion:

{"Feed": {"@xml:base": "https://myorg.sharepoint.com/sites/pwaeng/_api/","@xmlns": "http://www.w3.org/2005/Atom","@xmlns:d": "http://schemas.microsoft.com/ado/2007/08/dataservices","d:Created": {"@m:type": "Edm.DateTime","#text": "2018-05-09T21:21:03Z"},"d:AuthorId": {"@m:type": "Edm.Int32","#text": "1344"},"d:EditorId": {"@m:type": "Edm.Int32","d:OData__UIVersionString": "1.0","d:Attachments": {"@m:type": "Edm.Boolean","#text": "false"},"d:GUID": {"@m:type": "Edm.Guid","#text": "9ef38bd1-a098-4610-98a4-dbf7488a5a27"}}}}}}
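If you prefer to keep xmltodict, the dictionary it returns can be post-processed instead: drop every "@..." attribute key, strip the "d:" / "m:" prefix from element names, and collapse {"@m:type": ..., "#text": ...} leaves into typed values. A minimal sketch, assuming xmltodict's default settings (prefixes kept as in the document, "@" for attributes, "#text" for text). Only a few Edm types are converted, m:null="true" becomes None, and typed and strip_odata_markup are hypothetical helper names.

```python
INT_TYPES = {"Edm.Byte", "Edm.Int16", "Edm.Int32", "Edm.Int64"}
FLOAT_TYPES = {"Edm.Single", "Edm.Double", "Edm.Decimal"}


def typed(text, edm_type):
    """Convert the text of an OData leaf according to its m:type."""
    if edm_type in INT_TYPES:
        return int(text)
    if edm_type in FLOAT_TYPES:
        return float(text)
    if edm_type == "Edm.Boolean":
        return text == "true"
    return text


def strip_odata_markup(node):
    """Remove xmltodict's '@attr' keys and 'prefix:' parts, typing leaf values."""
    if isinstance(node, list):
        return [strip_odata_markup(n) for n in node]
    if not isinstance(node, dict):
        return node  # plain text, or None for an empty element
    if node.get("@m:null") == "true":
        return None
    children = {k: v for k, v in node.items()
                if not k.startswith("@") and k != "#text"}
    if not children:
        # Leaf such as {"@m:type": "Edm.Int32", "#text": "1344"}
        if "#text" not in node:
            return None
        return typed(node["#text"], node.get("@m:type", "Edm.String"))
    # Container element: drop its attributes, strip "d:"/"m:" from child names
    return {k.split(":", 1)[-1]: strip_odata_markup(v)
            for k, v in children.items()}
```

json.dumps(strip_odata_markup(data_dict)) still nests the fields under the feed, entry, content and properties keys. When the feed has several entries xmltodict returns "entry" as a list (passing force_list=("entry",) to xmltodict.parse makes that consistent), so you would typically collect each entry's ["content"]["properties"] into a list.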

JSON file downloaded by PowerShell:

{"odata.metadata":"https://myorg.sharepoint.com/sites/pwaeng/_api/$metadata#SP.ListData.Program_x0020_RisksListItems","value":[{"odata.type":"SP.Data.Program_x0020_RisksListItem","odata.id":"a878d166-c19d-4c16-82b4-e150e7e49626","odata.etag":"\"2\"","odata.editLink":"Web/Lists

"Created":"2018-05-09T21:21:03Z","AuthorId":1344,"EditorId":1344,"OData__UIVersionString":"1.0","Attachments":false,"GUID":"9ef38bd1-a098-4610-98a4-dbf7488a5a27"}]}

Below is part of the Python code. I have tried most of the options but have not got the output I need.

    import json
    import os

    import requests
    import xmltodict
    # Office365-REST-Python-Client
    from office365.runtime.auth.authentication_context import AuthenticationContext
    from office365.runtime.http.request_options import RequestOptions

    # webAbsoluteURL, List, ListFolder, Filepath, date, userName and Password
    # are defined earlier in the script
    listURL = webAbsoluteURL + "/_api/web/lists/GetByTitle('" + List + "')/items"

    count = 0
    fileName = "file_" + ListFolder.strip() + "_" + str(count) + "_" + date
    # os.path.join uses the right path separator on both Windows and Linux
    xml_output = os.path.join(Filepath, fileName + ".xml")
    json_output = os.path.join(Filepath, fileName + ".json")

    # Sign in with user credentials; authenticate_request() adds the
    # resulting auth cookie to options.headers
    ctx_auth = AuthenticationContext(webAbsoluteURL)
    ctx_auth.acquire_token_for_user(userName, Password)
    options = RequestOptions(webAbsoluteURL)
    ctx_auth.authenticate_request(options)

    # Header variants I also tried (commented out):
    # options.headers = {'accept': 'text/html,application/xhtml+xml,application/xml',
    #                    'content-type': 'application/atom+xml;type=Feed;charset=utf-8',
    #                    'X-RequestForceAuthentication': 'true'}
    # headers = {'accept': 'application/json;odata=verbose',
    #            'content-type': 'application/json;odata=verbose',
    #            'X-RequestForceAuthentication': 'true'}

    # requests' timeout is in seconds, not milliseconds
    response = requests.get(listURL, headers=options.headers,
                            allow_redirects=True, timeout=60)
    response.raise_for_status()

    # Save the raw Atom XML response
    with open(xml_output, 'wb') as file_save:
        file_save.write(response.content)

    # XML -> dict -> JSON (attributes and namespace prefixes are kept)
    with open(xml_output, 'r', encoding="UTF-8") as xml_file:
        data_dict = xmltodict.parse(xml_file.read())  # , attr_prefix='')

    with open(json_output, 'w', encoding="UTF-8") as json_file:
        json.dump(data_dict, json_file)

Solution

I found a solution: instead of using an XML-to-JSON parser (xmltodict.parse etc.), the simple fix is to append "?&$format=json" to the end of the web URL.

XML_DATA_URL = https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/Tasks

JSON_FORMAT_URL = https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/Tasks?&$format=json
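Appending the string by hand gets awkward when the URL already carries query options such as $select, $filter or $top. A small helper built on urllib.parse can add $format=json to whatever query is already there. This is a sketch; with_json_format is a hypothetical name, and it replaces any $format value that is already present.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_json_format(url):
    """Return url with $format=json in its query, replacing any existing $format."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "$format"]
    query.append(("$format", "json"))
    # safe="$" keeps "$format" readable instead of "%24format"
    return urlunsplit(parts._replace(query=urlencode(query, safe="$")))
```

For example, with_json_format("https://myorg.sharepoint.com/sites/pwaeng/_api/projectdata/Tasks") returns the same URL with "?$format=json" appended.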

However, this does not work for URLs of the following type:

https://myorg.sharepoint.com/sites/pwaeng/_api/web/lists/GetByTitle('Program Risks')/items

https://myorg.sharepoint.com/sites/pwaeng/_api/web/lists/GetByTitle('Program Risks')/items?&$format=json
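As far as I know, the SharePoint REST endpoints under _api/web do not honour $format and instead choose between JSON and Atom from the request's Accept header, whereas the projectdata feed accepts $format=json as described above. So for the list URL, a likely fix is to keep the auth headers from authenticate_request() and add an Accept header such as application/json;odata=minimalmetadata (odata=nometadata returns only the field values, odata=verbose the older verbose shape). The sketch below only builds the URL and headers; list_items_request is a hypothetical helper, and which metadata level reproduces the PowerShell file exactly is an assumption to check against your own output.

```python
from urllib.parse import quote

# Metadata levels assumed to be accepted by SharePoint's JSON responses
ODATA_LEVELS = ("nometadata", "minimalmetadata", "verbose")


def list_items_request(web_url, list_title, auth_headers, metadata="minimalmetadata"):
    """Build (url, headers) asking SharePoint for list items as JSON instead of Atom."""
    if metadata not in ODATA_LEVELS:
        raise ValueError("unknown metadata level: " + metadata)
    # Double single quotes (OData string literal) and percent-encode spaces etc.
    title = quote(list_title.replace("'", "''"), safe="")
    url = "%s/_api/web/lists/GetByTitle('%s')/items" % (web_url.rstrip("/"), title)
    headers = dict(auth_headers)  # keep the cookie set by authenticate_request()
    headers["Accept"] = "application/json;odata=" + metadata
    return url, headers
```

Used with the code from the question, something like url, headers = list_items_request(webAbsoluteURL, List, options.headers) followed by requests.get(url, headers=headers, timeout=60) and json.dump(response.json(), json_file) should write JSON directly. Note that /items returns results in pages; in these JSON formats the next page's URL is usually given in an "odata.nextLink" key.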

If anyone has suggestions, please add a comment here.
