Save a REST API GET response as a JSON document

Problem description

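I am using the code below in PySpark to read from a REST API, write the response to a JSON document, and save the file to Azure Data Lake Gen2. The code works when the response contains no blank data, but when I try to retrieve all of the data I get the following error.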

Error message: ValueError: Some of types cannot be determined after inferring
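This error comes from PySpark's schema inference: without an explicit schema, Spark infers a type for each field from the values it sees, and a field that is None (JSON null) in every record is left without a type. The full response probably contains such a field even though the two sample records shown below do not. A minimal pure-Python sketch for finding those fields, assuming the response is a list of flat dicts; null_only_fields and the "Discount" key in the sample are hypothetical:

```python
from typing import Any, Dict, List


def null_only_fields(records: List[Dict[str, Any]]) -> List[str]:
    """Return keys that never carry a non-None value in any record.

    Hypothetical helper: Spark cannot infer a type for such a field.
    Only top-level None values are checked here.
    """
    seen: Dict[str, bool] = {}  # key -> True once a non-None value is seen
    for record in records:
        for key, value in record.items():
            seen[key] = seen.get(key, False) or value is not None
    # Keys come out in order of first appearance.
    return [key for key, has_value in seen.items() if not has_value]


# Hypothetical records: "Discount" is null everywhere it appears.
sample = [
    {"ProductID": "156528", "Discount": None},
    {"ProductID": "126789", "Discount": None, "SaleDate": None},
    {"ProductID": "126790", "SaleDate": "2015-02-01T16:43:18.247"},
]
print(null_only_fields(sample))  # ['Discount']
```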

Code

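import requests
from pyspark.sql import Row

# Assumes an existing SparkSession named `spark`
response = requests.get('https://apiurl.com/demo/api/v3/data', auth=('user', 'password'))
response.raise_for_status()  # stop early on HTTP errors
data = response.json()  # list of dicts, one per product
# No schema is given, so Spark infers one from the values in `data`
df = spark.createDataFrame([Row(**i) for i in data])
df.show()
df.write.mode("overwrite").json("wasbs://<file_system>@<storage-account-name>.blob.core.windows.net/demo/data")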

Response:

[
    {
        "ProductID": "156528","ProductType": "Home Improvement","Description": "","SaleDate": "0001-01-01T00:00:00","UpdateDate": "2015-02-01T16:43:18.247"
    },{
        "ProductID": "126789","ProductType": "Pharmacy","UpdateDate": "2015-02-01T16:43:18.247"
    }
]
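The two records do not share the same keys: the second one has no Description or SaleDate, so the rows built by Row(**i) have different fields. A short pure-Python sketch, using the sample response above, that lists every column a schema has to cover and what each record is missing:

```python
import json

response_text = """[
  {"ProductID": "156528", "ProductType": "Home Improvement", "Description": "",
   "SaleDate": "0001-01-01T00:00:00", "UpdateDate": "2015-02-01T16:43:18.247"},
  {"ProductID": "126789", "ProductType": "Pharmacy",
   "UpdateDate": "2015-02-01T16:43:18.247"}
]"""
records = json.loads(response_text)

# Union of keys in order of first appearance (dicts keep insertion order).
all_fields = list(dict.fromkeys(key for record in records for key in record))

# Keys each record lacks relative to that union.
missing = [[f for f in all_fields if f not in record] for record in records]

print(all_fields)
# ['ProductID', 'ProductType', 'Description', 'SaleDate', 'UpdateDate']
print(missing)
# [[], ['Description', 'SaleDate']]
```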

I tried to fix it by defining an explicit schema, as shown below.

from pyspark.sql.types import StructType, StructField, StringType
# One nullable StringType field per column in the response
fields = ["ProductID", "ProductType", "Description", "SaleDate", "UpdateDate"]
schema = StructType([StructField(name, StringType(), True) for name in fields])
df = spark.createDataFrame([[None] * len(fields)], schema=schema)
df.show()

I'm not sure how to create the DataFrame and write the data to a JSON document.

Solution

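You can pass both the data and schema variables to spark.createDataFrame(), and Spark will create the DataFrame. With an explicit schema Spark no longer has to infer column types, and keys missing from a record are filled with null.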

Example:

from pyspark.sql.types import StructType, StructField, StringType

# Assumes an existing SparkSession named `spark`, as in the question.


data=[
    {
        "ProductID": "156528","ProductType": "Home Improvement","Description": "","SaleDate": "0001-01-01T00:00:00","UpdateDate": "2015-02-01T16:43:18.247"
    },{
        "ProductID": "126789","ProductType": "Pharmacy","UpdateDate": "2015-02-01T16:43:18.247"
    }
]

schema = StructType([
    StructField("ProductID", StringType(), True),
    StructField("ProductType", StringType(), True),
    StructField("Description", StringType(), True),
    StructField("SaleDate", StringType(), True),
    StructField("UpdateDate", StringType(), True),
])


df = spark.createDataFrame(data,schema=schema)

df.show()
# Keys missing from the second record (Description, SaleDate) come out as null:
#+---------+----------------+-----------+-------------------+--------------------+
#|ProductID|     ProductType|Description|           SaleDate|          UpdateDate|
#+---------+----------------+-----------+-------------------+--------------------+
#|   156528|Home Improvement|           |0001-01-01T00:00:00|2015-02-01T16:43:...|
#|   126789|        Pharmacy|       null|               null|2015-02-01T16:43:...|
#+---------+----------------+-----------+-------------------+--------------------+