Problem description
I have a pandas DataFrame column in which the latitude values are stored as strings:
0 47º 58,46 N
1 48º 06,8 N
2 NaN
3 47º 58,1 N
4 48º 05,0 N
Code:
parts = df["Latitud"].str.extract('(\d+)º\s(\d*.\d*).([N|S|E|W])',expand=True) #(\d+)º\s(\d*.\d*).(.)
df["latitude"] = (parts[0].astype(int) + parts[1].astype(float) / 60 ) * parts[3].map({'N':1,'S':-1,'E': 1,'W':-1})
Error:
ValueError: cannot convert float NaN to integer
How can I skip the NaN values?
Solution
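The error is raised by the astype(int) call: a missing value is stored as a float NaN, and an integer cannot represent it. Below is a minimal reproduction, assuming a hypothetical two-element Series (not the original data) with one numeric string and one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one numeric string and one missing value.
s = pd.Series(["47", np.nan], dtype=object)

try:
    s.astype(int)          # NaN cannot be represented as an integer
    error = None
except ValueError as exc:
    error = str(exc)

print(error)

# A float column can hold NaN, so this conversion succeeds.
as_float = s.astype(float)
print(as_float.tolist())
```

So either the NaN rows have to be excluded before converting to int, or the conversion has to target a float type.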
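Let's skip the NaN values, as requested. Note that str.extract numbers the three capture groups 0 to 2, so the hemisphere letter is in parts[2], and that the decimal comma has to be replaced with a dot before the minutes can be converted to float: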
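# Boolean mask of the rows that actually contain a latitude string
notna = df['Latitud'].notna()
# Extract degrees, minutes and hemisphere from the non-null rows only
parts = df.loc[notna, "Latitud"].str.extract(r'(\d+)º\s(\d*[.,]?\d*)\s([NSEW])', expand=True)
# Convert to decimal degrees; rows excluded by the mask stay NaN
df.loc[notna, 'latitude'] = (parts[0].astype(int) + parts[1].str.replace(',', '.', regex=False).astype(float) / 60) * parts[2].map({'N': 1, 'S': -1, 'E': 1, 'W': -1})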
Output:
       Latitud   latitude
0  47º 58,46 N  47.974333
1   48º 06,8 N  48.113333
2          NaN        NaN
3   47º 58,1 N  47.968333
4   48º 05,0 N  48.083333
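As a possible alternative, the mask can be avoided by converting with pd.to_numeric instead of astype(int), because NaN simply propagates through float arithmetic. The following is only a sketch under assumptions: the DataFrame is rebuilt here from the five sample rows above, and the regular expression is a slightly stricter variant of the one in the question, not the original author's pattern.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to be representative).
df = pd.DataFrame(
    {"Latitud": ["47º 58,46 N", "48º 06,8 N", np.nan, "47º 58,1 N", "48º 05,0 N"]}
)

# Capture groups: 0 = degrees, 1 = minutes (decimal comma or dot), 2 = hemisphere.
# Rows that are NaN yield NaN in every extracted column.
parts = df["Latitud"].str.extract(r"(\d+)º\s*(\d+(?:[.,]\d+)?)\s*([NSEW])")

# pd.to_numeric keeps missing values as NaN instead of raising.
degrees = pd.to_numeric(parts[0])
minutes = pd.to_numeric(parts[1].str.replace(",", ".", regex=False))

# North/East are positive, South/West negative; NaN maps to NaN.
sign = parts[2].map({"N": 1, "S": -1, "E": 1, "W": -1})

df["latitude"] = (degrees + minutes / 60) * sign
print(df)
```

The trade-off is that the degrees end up as floats rather than integers, which does not matter here because the final result is a float anyway.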