Problem description
I get the following JSON response for a distance matrix, collected with the code below:
import requests
import json
payload = {
    "origins": [{"latitude": 54.6565153,"longitude": -1.6802816},{"latitude": 54.6365153,"longitude": -1.6202816}],#surgery
    "destinations": [{"latitude": 54.6856522,"longitude": -1.2183634},{"latitude": 54.5393295,"longitude": -1.2623914},{"latitude": 54.5393295,"longitude": -1.2623914}],#oa - up to 625 entries
    "travelMode": "driving","startTime": "2014-04-01T11:59:59+01:00","timeUnit": "second"
}
headers = {"Content-Length": "497","Content-Type": "application/json"}
paramtr = {"key": "INSERT_KEY_HERE"}
r = requests.post('https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix',data = json.dumps(payload),params = paramtr,headers = headers)
data = r.json()["resourceSets"][0]["resources"][0]
and I am trying to flatten it into:
destinations.latitude, destinations.longitude, origins.latitude, origins.longitude, departureTime, destinationIndex, originIndex, totalWalkDuration, travelDistance, travelDuration
from:
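{'__type': 'DistanceMatrix:http://schemas.microsoft.com/search/local/ws/rest/v1',
 'destinations': [{'latitude': 54.6856522,'longitude': -1.2183634},
                  {'latitude': 54.5393295,'longitude': -1.2623914},
                  {'latitude': 54.5393295,'longitude': -1.2623914}],
 'errorMessage': 'Request completed.',
 'origins': [{'latitude': 54.6565153,'longitude': -1.6802816},
             {'latitude': 54.6365153,'longitude': -1.6202816}],
 'results': [{'departureTime': '/Date(1396349159000-0700)/','destinationIndex': 0,'originIndex': 0,'totalWalkDuration': 0,'travelDistance': 38.209,'travelDuration': 3082},
             {'departureTime': '/Date(1396349159000-0700)/','destinationIndex': 1,'originIndex': 0,'totalWalkDuration': 0,'travelDistance': 40.247,'travelDuration': 2708},
             {'departureTime': '/Date(1396349159000-0700)/','destinationIndex': 2,'originIndex': 0,'totalWalkDuration': 0,'travelDistance': 40.247,'travelDuration': 2708},
             {'departureTime': '/Date(1396349159000-0700)/','destinationIndex': 0,'originIndex': 1,'totalWalkDuration': 0,'travelDistance': 34.857,'travelDuration': 2745},
             {'departureTime': '/Date(1396349159000-0700)/','destinationIndex': 1,'originIndex': 1,'totalWalkDuration': 0,'travelDistance': 36.895,'travelDuration': 2377},
             {'departureTime': '/Date(1396349159000-0700)/','destinationIndex': 2,'originIndex': 1,'totalWalkDuration': 0,'travelDistance': 36.895,'travelDuration': 2377}]}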
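The best I have managed so far is:
json_normalize(outtie,record_path="results",meta="origins")
but the result still contains the nested origins and destinations, so they do not get attached as columns. I also tried removing the type to see whether that made a difference, and tried max_level= and record_prefix='_', but to no avail.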
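Solution
- I don't think this is an appropriate problem for flatten_json, although that is useful for less thoughtfully constructed JSON objects.
- The list of destinations corresponds to the list of results, which means that once they are normalized they will have the same index.
- The dataframes can be concatenated correctly because they have corresponding indices.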
import pandas as pd
# create a dataframe for results and origins
res_or = pd.json_normalize(data,record_path=['results'],meta=[['origins']])
# create a dataframe for destinations
dest = pd.json_normalize(data,record_path=['destinations'],record_prefix='dest_')
# normalize the origins column in res_or
orig = pd.json_normalize(res_or.origins).rename(columns={'latitude': 'origin_lat','longitude': 'origin_long'})
# concat the dataframes
df = pd.concat([res_or,orig,dest],axis=1).drop(columns=['origins'])
# display(df)
                departureTime  destinationIndex  originIndex  totalWalkDuration  travelDistance  travelDuration  origin_lat  origin_long  dest_latitude  dest_longitude
0  /Date(1396349159000-0700)/                 0            0                  0          38.209            3082   54.656515    -1.680282      54.685652       -1.218363
1  /Date(1396349159000-0700)/                 1            0                  0          40.247            2708   54.656515    -1.680282      54.539330       -1.262391
2  /Date(1396349159000-0700)/                 2            0                  0          40.247            2708   54.656515    -1.680282      54.539330       -1.262391
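Update for the new sample data
- The records contain the indices for destinations and origins, so it is easy to create a separate dataframe for each key and then .merge the dataframes.
  - The indices of dest and orig correspond to destinationIndex and originIndex in results.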
# create three separate dataframes
results = pd.json_normalize(data,record_path=['results'])
dest = pd.json_normalize(data,record_path=['destinations'],record_prefix='dest_')
orig = pd.json_normalize(data,record_path=['origins'],record_prefix='orig_')
# merge them on the matching index columns
df = pd.merge(results,dest,left_on='destinationIndex',right_index=True)
df = pd.merge(df,orig,left_on='originIndex',right_index=True)
# display(df)
                departureTime  destinationIndex  originIndex  totalWalkDuration  travelDistance  travelDuration  dest_latitude  dest_longitude  orig_latitude  orig_longitude
0  /Date(1396349159000-0700)/                 0            0                  0          38.209            3082      54.685652       -1.218363      54.656515       -1.680282
1  /Date(1396349159000-0700)/                 1            0                  0          40.247            2708      54.539330       -1.262391      54.656515       -1.680282
2  /Date(1396349159000-0700)/                 2            0                  0          40.247            2708      54.539330       -1.262391      54.656515       -1.680282
3  /Date(1396349159000-0700)/                 0            1                  0          34.857            2745      54.685652       -1.218363      54.636515       -1.620282
4  /Date(1396349159000-0700)/                 1            1                  0          36.895            2377      54.539330       -1.262391      54.636515       -1.620282
5  /Date(1396349159000-0700)/                 2            1                  0          36.895            2377      54.539330       -1.262391      54.636515       -1.620282
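The index-based join used in the merge step does not depend on pandas. As a minimal sketch, assuming the response has the same shape as the sample in the question, the same lookup can be written in plain Python. The helper name join_results, the trimmed-down sample dict, and the dest_/orig_ prefixes are illustrative choices here, not part of any API.

```python
def join_results(data):
    """Attach the matching destination and origin coordinates to each result.

    Assumes `data` has the shape of the sample response: every entry in
    `results` carries `destinationIndex` and `originIndex`, which are
    positions in the `destinations` and `origins` lists.
    """
    rows = []
    for result in data["results"]:
        dest = data["destinations"][result["destinationIndex"]]
        orig = data["origins"][result["originIndex"]]
        row = dict(result)  # copy so the input is left untouched
        row.update({f"dest_{k}": v for k, v in dest.items()})
        row.update({f"orig_{k}": v for k, v in orig.items()})
        rows.append(row)
    return rows


# Hypothetical, shortened sample with the same structure as the response.
sample = {
    "destinations": [
        {"latitude": 54.6856522, "longitude": -1.2183634},
        {"latitude": 54.5393295, "longitude": -1.2623914},
    ],
    "origins": [
        {"latitude": 54.6565153, "longitude": -1.6802816},
        {"latitude": 54.6365153, "longitude": -1.6202816},
    ],
    "results": [
        {"destinationIndex": 0, "originIndex": 0, "travelDistance": 38.209, "travelDuration": 3082},
        {"destinationIndex": 1, "originIndex": 1, "travelDistance": 36.895, "travelDuration": 2377},
    ],
}

rows = join_results(sample)
for row in rows:
    print(row)
```

If a dataframe is still wanted, the resulting list of dicts can be passed to pd.DataFrame(rows).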
I ran into a similar situation before. The best I came up with was a recursive function that builds an OrderedDict, which I then loop over.
import collections

def flatten(data,sep="_"):
    obj = collections.OrderedDict()
    # walk lists and dicts recursively, building keys like "destinations_0_latitude"
    def recurse(temp,parent_key=""):
        if isinstance(temp,list):
            for i in range(len(temp)):
                recurse(temp[i],parent_key + sep + str(i) if parent_key else str(i))
        elif isinstance(temp,dict):
            for key,value in temp.items():
                recurse(value,parent_key + sep + key if parent_key else key)
        else:
            obj[parent_key] = temp
    recurse(data)
    return obj
When you iterate over it, your data will look like this (excerpt):
for key,value in flatten(a).items():
    print(key,value)
destinations_0_latitude 54.6856522
destinations_0_longitude -1.2183634
destinations_1_latitude 54.5393295
destinations_1_longitude -1.2623914
destinations_2_latitude 54.5393295
destinations_2_longitude -1.2623914
The reason I use a separator is that it gives you flexibility, so you can use
key.split("_")
['destinations','0','latitude'] 54.6856522
['destinations','0','longitude'] -1.2183634
After that, you can easily adapt your conditions, for example:
if key.split("_")[2] == "latitude":
    pass  # do something...
if key.endswith("latitude"):
    pass  # do something...
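One caveat when indexing into key.split("_"): a top-level key such as errorMessage splits into a single part, so [2] raises IndexError, and a key that itself contains the separator (for example __type) splits into unexpected pieces. Below is a minimal sketch of a guarded lookup, assuming the flatten function from this answer (repeated so the block runs on its own). The helper name leaf_values and the small sample dict are hypothetical.

```python
import collections


def flatten(data, sep="_"):
    """Same recursive flattening as in the answer, repeated for self-containment."""
    obj = collections.OrderedDict()

    def recurse(temp, parent_key=""):
        if isinstance(temp, list):
            for i, item in enumerate(temp):
                recurse(item, parent_key + sep + str(i) if parent_key else str(i))
        elif isinstance(temp, dict):
            for key, value in temp.items():
                recurse(value, parent_key + sep + key if parent_key else key)
        else:
            obj[parent_key] = temp

    recurse(data)
    return obj


def leaf_values(flat, group, field, sep="_"):
    """Collect values whose key looks like '<group><sep><index><sep><field>'."""
    values = []
    for key, value in flat.items():
        parts = key.split(sep)
        # Check the length first so short keys such as 'errorMessage' are
        # skipped instead of raising IndexError on parts[2].
        if len(parts) == 3 and parts[0] == group and parts[2] == field:
            values.append(value)
    return values


# Hypothetical, shortened sample in the shape of the response.
sample = {
    "errorMessage": "Request completed.",
    "destinations": [
        {"latitude": 54.6856522, "longitude": -1.2183634},
        {"latitude": 54.5393295, "longitude": -1.2623914},
    ],
}

flat = flatten(sample)
dest_lats = leaf_values(flat, "destinations", "latitude")
print(dest_lats)
```

The length check is the only change in behaviour compared with the bare [2] lookup; key.endswith("latitude") avoids the IndexError as well, but it cannot tell destinations from origins.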