Comparing two dataframes based on the difference between their latitude and longitude values

Problem

I am trying to compare the latitude and longitude coordinates in two dataframes. If the difference in latitude_fuze

Here is the code I am testing:

lat1 = df_result['latitude_fuze']
lon1 = df_result['longitude_fuze']
lat2 = df_airports['latitude_air']
lon2 = df_airports['longitude_air']

fuze_rows=range(df_result.shape[0])
air_rows=range(df_airports.shape[0])

for r in fuze_rows:
    lat = df_result.loc[r,lat1]
    max_lat = lat + .01
    min_lat = lat - .01
    lon = df_result.loc[r,lon1]
    max_lon = lon + .01
    min_lon = lon - .01
    for a in air_rows:
        if (min_lat <= df_airports.loc[a,lat2] <= max_lat) and (min_lon <= df_airports.loc[a,lon2] <= max_lon):
            df_result['Type'] = 'Airport'

Here are two sample dataframes:

# Import pandas library 
import pandas as pd 
  
# initialize list of lists 
data = [['NY','UnionDale','Nassau','40.72','-73.59'],['NY','NY','New York','40.76','73.98']] 
  
# Create the pandas DataFrame 
df_result = pd.DataFrame(data,columns = ['state','city','county','latitude_fuze','longitude_fuze']) 
# print dataframe. 
df_result

And...

data = [['New York','JFK','40.64','-73.78'],['Los Angeles','LAX','33.94','-118.41'],['Chicago','ORD','41.98','-87.90'],['San Francisco','SFO','37.62','-122.38']] 
  
# Create the pandas DataFrame 
df_airports = pd.DataFrame(data,columns = ['municipality_name','airport_code','latitude_air','longitude_air']) 
# print dataframe. 
df_airports

When I run this code, I get this error:

KeyError: "None of [Float64Index([40.719515,40.719515,40.75682,\n               40.75682,40.7646,\n              ...\n                40.0006,40.0006,\n                40.0006,39.742417,39.742417],\n             dtype='float64',length=1720)] are in the [index]"

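I am open to using KNN or the haversine method if that would work better. I am not looking for a distance here, but for similarity between the latitude and longitude numbers. If I do need to calculate a distance to make this work, please let me know. Thanks, everyone.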

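Solution

I'm not sure which approach you need, because I'm not 100% clear on what you are trying to do. However, something like this may help get your current approach working: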

# join the two dataframes - must be the same length
df = pd.concat([df_result,df_airports],axis=1)

# cast latitudes and longitudes to numeric
cols = ["latitude_fuze","latitude_air","longitude_fuze","longitude_air"]
df[cols] = df[cols].apply(pd.to_numeric,errors='coerce',axis=1)

# create a mask where our conditions are met (difference between lat fuze and lat air < 0.1 and difference between long fuze and long air < 0.1)
mask = ((abs(df["latitude_fuze"] - df["latitude_air"]) < 0.1) & (abs(df["longitude_fuze"] - df["longitude_air"]) < 0.1))

# fill the type column
df.loc[mask,'Type'] = "Airport"
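
A note on the error and on comparing every row against every airport. The KeyError most likely comes from `df_result.loc[r,lat1]`: `lat1` is the whole `latitude_fuze` Series, so `.loc` treats its values as column labels, whereas `df_result.loc[r,'latitude_fuze']` would look up a single cell. Also, `pd.concat(..., axis=1)` only compares row i of one frame with row i of the other. If the goal is to flag a row when any airport is close, a minimal sketch using NumPy broadcasting could look like the following. The function name `flag_near_airport` and the `tol` parameter are made up for illustration, and the tolerance is left as a parameter because the question uses 0.01 while the snippet above uses 0.1.

```python
import numpy as np
import pandas as pd

# Sample data from the question (coordinates are stored as strings)
df_result = pd.DataFrame(
    [['NY', 'UnionDale', 'Nassau', '40.72', '-73.59'],
     ['NY', 'NY', 'New York', '40.76', '73.98']],
    columns=['state', 'city', 'county', 'latitude_fuze', 'longitude_fuze'])
df_airports = pd.DataFrame(
    [['New York', 'JFK', '40.64', '-73.78'],
     ['Los Angeles', 'LAX', '33.94', '-118.41'],
     ['Chicago', 'ORD', '41.98', '-87.90'],
     ['San Francisco', 'SFO', '37.62', '-122.38']],
    columns=['municipality_name', 'airport_code', 'latitude_air', 'longitude_air'])


def flag_near_airport(places, airports, tol=0.01):
    """Hypothetical helper: return a copy of `places` with Type='Airport'
    on rows that have at least one airport within `tol` degrees in both
    latitude and longitude."""
    # Convert the string columns to floats; unparseable values become NaN
    lat = pd.to_numeric(places['latitude_fuze'], errors='coerce').to_numpy()
    lon = pd.to_numeric(places['longitude_fuze'], errors='coerce').to_numpy()
    air_lat = pd.to_numeric(airports['latitude_air'], errors='coerce').to_numpy()
    air_lon = pd.to_numeric(airports['longitude_air'], errors='coerce').to_numpy()

    # Broadcasting (n_places, 1) against (n_airports,) yields an
    # (n_places, n_airports) grid of pairwise differences
    close = ((np.abs(lat[:, None] - air_lat) <= tol)
             & (np.abs(lon[:, None] - air_lon) <= tol))

    # A row is flagged if any airport in its grid row is close enough
    near = close.any(axis=1)
    out = places.copy()
    out['Type'] = ['Airport' if hit else None for hit in near]
    return out


flagged = flag_near_airport(df_result, df_airports, tol=0.01)
```

This compares raw degree differences, which matches the "similarity of the numbers" requirement in the question. It is not a distance: a degree of longitude covers less ground the farther you are from the equator, so a haversine calculation would be the thing to reach for if an actual radius in kilometres is needed. The pairwise grid also grows as rows times airports, so very large frames may need to be processed in chunks.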