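How do I pass columns from two DataFrames to a haversine function?

Problem

I'm new to working with latitude and longitude. I found a haversine function that looks useful. I tried to feed columns from two DataFrames into it, but I get an error.

Here is the function: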

import numpy as np

lon1 = df["longitude_fuze"]
lat1 = df["latitude_fuze"]
lon2 = df["longitude_air"]
lat2 = df["latitude_air"]

# haversine
from math import radians,cos,sin,asin,sqrt
def haversine(lon1,lat1,lon2,lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    km = 6367 * c
    return km

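I'm trying to add the result as a column in the DataFrame, like this:

df['haversine_dist'] = haversine(lon1, lat1, lon2, lat2)

The function definition runs fine, but when I try to call it I get this error: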

df['haversine_dist'] = haversine(lon1, lat1, lon2, lat2)
Traceback (most recent call last):

  File "<ipython-input-38-cc7e470610ee>", line 1, in <module>
    df['haversine_dist'] = haversine(lon1, lat1, lon2, lat2)

  File "<ipython-input-37-f357b0fc2e88>", line 16, in haversine
    lon1, lat1, lon2, lat2])

  File "C:\Users\ryans\anaconda3\lib\site-packages\pandas\core\series.py", line 129, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")

TypeError: cannot convert the series to <class 'float'>

Here are the two DataFrames I'm testing with.

# Import pandas library 
import pandas as pd 
  
# initialize list of lists 
data = [['NY','Uniondale','Nassau','40.72','-73.59'],['NY','NY','New York','40.76','73.98']]
  
# Create the pandas DataFrame 
df_result = pd.DataFrame(data,columns = ['state','city','county','latitude_fuze','longitude_fuze']) 
# print dataframe. 
df_result


data = [['New York','JFK','40.63','-73.60'],
        ['New York',None,'40.64','-73.78'],  # airport code not given for this row
        ['Los Angeles','LAX','33.94','-118.41'],
        ['Chicago','ORD','40.98','73.90'],
        ['San Francisco','SFO','40.62','73.38']]
  
# Create the pandas DataFrame 
df_airports = pd.DataFrame(data,columns = ['municipality_name','airport_code','latitude_air','longitude_air']) 
# print dataframe. 
df_airports

I found the function at this link:

https://kanoki.org/2019/12/27/how-to-calculate-distance-in-python-and-pandas-using-scipy-spatial-and-distance-functions/

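Solution

This happens because you are passing whole Series to the function, while it needs single values.

# Each of the variables below holds a whole Series, not a single number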
lon1 = df["longitude_fuze"]
lat1 = df["latitude_fuze"]
lon2 = df["longitude_air"]
lat2 = df["latitude_air"]
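To see the failure in isolation, here is a minimal sketch. The Series `s` and its values are made up for illustration. `math.radians` accepts one number, so it tries to turn the whole Series into a single float and fails, whereas `np.radians` works element-wise:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical two-row Series standing in for df["longitude_fuze"]
s = pd.Series([-73.59, -73.98])

try:
    math.radians(s)  # math functions expect one scalar, not a Series
    scalar_call_failed = False
except TypeError as exc:
    scalar_call_failed = True
    print(f"TypeError: {exc}")

# NumPy's version is vectorized and returns one value per row
in_radians = np.radians(s)
print(in_radians)
```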

Instead, you can pick the value at a specific position, for example position 0:

lon1 = df["longitude_fuze"].iloc[0]
lat1 = df["latitude_fuze"].iloc[0]
lon2 = df["longitude_air"].iloc[0]
lat2 = df["latitude_air"].iloc[0]

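With these values you can now call your function (note that this writes the same single distance into every row of the column):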

df['haversine_dist'] = haversine(lon1,lat1,lon2,lat2)

Or, if you want to evaluate it for every row of these columns, you can do it in a loop:

for i in df.index:
    # df.index yields labels, so look the values up by label with .loc
    lon1 = df.loc[i,"longitude_fuze"]
    lat1 = df.loc[i,"latitude_fuze"]
    lon2 = df.loc[i,"longitude_air"]
    lat2 = df.loc[i,"latitude_air"]

    df.loc[i,'haversine_dist'] = haversine(lon1, lat1, lon2, lat2)
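If you'd rather not write the loop by hand, a row-wise `apply` does the same job. This is only a sketch: it assumes all four coordinate columns live in one DataFrame and are already floats, and the sample values are made up.

```python
from math import radians, cos, sin, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


# Made-up sample: all four coordinate columns in one frame, stored as floats
df = pd.DataFrame({
    "longitude_fuze": [-73.59, -73.98],
    "latitude_fuze": [40.72, 40.76],
    "longitude_air": [-73.78, -73.78],
    "latitude_air": [40.64, 40.64],
})

# axis=1 hands the lambda one row at a time, so each lookup is a scalar
df["haversine_dist"] = df.apply(
    lambda row: haversine(
        row["longitude_fuze"], row["latitude_fuze"],
        row["longitude_air"], row["latitude_air"],
    ),
    axis=1,
)
print(df)
```

This still calls the Python function once per row, so it is about convenience rather than speed.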

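I see two problems here:

  1. The longitudes and latitudes are still strings in the DataFrame, so you will probably run into data type problems.

  2. The haversine implementation used here does not work with array-like objects for longitude and latitude.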
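A quick way to confirm the first point is to look at the column dtypes. The frame below is a made-up stand-in for the one in the question, and `pd.to_numeric` is just one conversion option I'm showing for illustration:

```python
import math

import pandas as pd

# Made-up frame mimicking the question: coordinates stored as strings
df = pd.DataFrame({
    "latitude_fuze": ["40.72", "40.76"],
    "longitude_fuze": ["-73.59", "-73.98"],
})

print(df.dtypes)  # the coordinate columns are text, not float64

try:
    math.radians(df["latitude_fuze"].iloc[0])  # a str, so this raises
    string_input_failed = False
except TypeError:
    string_input_failed = True

# pd.to_numeric is one way to convert; errors="coerce" turns values that
# cannot be parsed into NaN instead of raising
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```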


The data type problem is easy to solve with a type conversion. For example, you can use astype:

lon1 = df["longitude_fuze"].astype(float)

Or better still, change the types directly in your DataFrame:

dt_dict = {"longitude_fuze": float, "latitude_fuze": float, "longitude_air": float, "latitude_air": float}
df = df.astype(dt_dict)

As for a haversine function that supports array-like arguments: since it is fairly simple, I'd suggest re-implementing it so that it is compatible with numpy. I went ahead and did that for you:

import numpy as np

def haversine_array(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(lambda x: x/360.*(2*np.pi), [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a)) 
    km = 6367 * c
    return km

Putting it all together, you now get:

import pandas as pd
import numpy as np

def haversine_array(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(lambda x: x/360.*(2*np.pi), [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a)) 
    km = 6367 * c
    return km

# initialize list of lists 
data = [['NY','Uniondale','Nassau','40.72','-73.59'],['NY','NY','New York','40.76','73.98']]
  
# Create the pandas DataFrame 
df_result = pd.DataFrame(data,columns = ['state','city','county','latitude_fuze','longitude_fuze']) 
data = [['New York','JFK','40.63','-73.60'],
        ['New York',None,'40.64','-73.78'],  # airport code not given for this row
        ['Los Angeles','LAX','33.94','-118.41'],
        ['Chicago','ORD','40.98','73.90'],
        ['San Francisco','SFO','40.62','73.38']]
df_airports = pd.DataFrame(data,columns = ['municipality_name','airport_code','latitude_air','longitude_air'])

# note the conversion to float

lon1 = df_result["longitude_fuze"].astype(float)
lat1 = df_result["latitude_fuze"].astype(float)
lon2 = df_airports['longitude_air'].astype(float)
lat2 = df_airports['latitude_air'].astype(float)

# using the haversine implementation above
# (pandas aligns the Series on their index, so row i of df_result
# is paired with row i of df_airports)

df_result['haversine_dist'] = haversine_array(lon1, lat1, lon2, lat2)
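One caveat: the two DataFrames have different numbers of rows, and pandas pairs the Series up by index label. If what you actually want is the distance from each location to every airport (that is my guess at the intent, not something stated in the question), NumPy broadcasting can build the full distance matrix. The sketch below uses made-up float data and hypothetical frame names `places` and `airports`:

```python
import numpy as np
import pandas as pd


def haversine_array(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; inputs in decimal degrees, broadcastable."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Made-up sample data: 2 locations and 3 airports, coordinates as floats
places = pd.DataFrame({
    "city": ["Uniondale", "New York"],
    "latitude_fuze": [40.72, 40.76],
    "longitude_fuze": [-73.59, -73.98],
})
airports = pd.DataFrame({
    "airport_code": ["JFK", "LAX", "ORD"],
    "latitude_air": [40.64, 33.94, 41.98],
    "longitude_air": [-73.78, -118.41, -87.90],
})

# Shape (2, 1) against shape (3,) broadcasts to a (2, 3) distance matrix
dist = haversine_array(
    places["longitude_fuze"].to_numpy()[:, None],
    places["latitude_fuze"].to_numpy()[:, None],
    airports["longitude_air"].to_numpy(),
    airports["latitude_air"].to_numpy(),
)

# For each place, pick the closest airport and its distance
nearest = dist.argmin(axis=1)
places["nearest_airport"] = airports["airport_code"].to_numpy()[nearest]
places["haversine_dist"] = dist.min(axis=1)
print(places)
```

Converting to plain arrays with `to_numpy()` sidesteps index alignment entirely, which is why the row counts no longer need to match.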

Hope that helps!