Problem description
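I have a bike-sharing dataset. Each station's record has a lat and a long. Sample data is shown below. I want to find every group of 3 stations that are close to one another by coordinates, and sum the count for each such group (the 3 nearest points).
I know how to compute the distance between two points, but I don't know how to write code that finds each subset of 3 nearest stations.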
Code for calculating the distance between two points:
from math import cos, asin, sqrt, pi

def distance(lat1, lon1, lat2, lon2):
    # Haversine great-circle distance in kilometres (12742 km is Earth's diameter).
    p = pi / 180  # degrees-to-radians factor
    a = 0.5 - cos((lat2 - lat1) * p) / 2 + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
    return 12742 * asin(sqrt(a))
Data:
    start_station_name         start_station_latitude  start_station_longitude  count
0   Schous plass               59.920259               10.760629                2
1   Pilestredet                59.926224               10.729625                4
2   Kirkeveien                 59.933558               10.726426                8
3   Hans Nielsen Hauges plass  59.939244               10.774319                0
4   Fredensborg                59.920995               10.750358                8
5   Marienlyst                 59.932454               10.721769                9
6   Sofienbergparken nord      59.923229               10.766171                3
7   Stensparken                59.927140               10.730981                4
8   Vålerenga                  59.908576               10.786856                6
9   Schous plass trikkestopp   59.920728               10.759486                5
10  Griffenfeldts gate         59.933703               10.751930                4
11  Hallénparken               59.931530               10.762169                8
12  Alexander Kiellands Plass  59.928058               10.751397                3
13  Uranienborgparken          59.922485               10.720896                2
14  Sommerfrydhagen            59.911453               10.776072                1
15  Vestkanttorvet             59.924403               10.713069                8
16  Bislettgata                59.923834               10.734638                9
17  Biskop Gunnerus' gate      59.912334               10.752292                1
18  Botanisk Hage sør          59.915282               10.769620                1
19  Hydroparken                59.914145               10.715505                1
20  Bøkkerveien                59.927375               10.796015                1
What I want:
closest                              count_sum
Schous plass,Pilestredet,Kirkeveien  14
...
Solution
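You can try every possible combination of three stations with itertools.combinations() and keep the triple with the smallest total pairwise distance. With n stations this checks n(n-1)(n-2)/6 triples, which is fine for a few dozen stations but grows quickly.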
from itertools import combinations

# Assumes `data` is a sequence of dict-like station records,
# one per row of the table in the question.
best = (float('inf'), None)
for combination in combinations(data, 3):
    total_distance = 0
    # Sum the three pairwise distances within the triple.
    for idx_1, idx_2 in [(0, 1), (1, 2), (0, 2)]:
        total_distance += distance(
            combination[idx_1]['start_station_latitude'],
            combination[idx_1]['start_station_longitude'],
            combination[idx_2]['start_station_latitude'],
            combination[idx_2]['start_station_longitude'],
        )
    if total_distance < best[0]:
        best = (total_distance, combination)
print(f'Best combination is {best[1]}, total distance: {best[0]}')
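To see the brute-force search run end to end, here is a minimal self-contained sketch on five rows copied from the question's sample data. It also sums count for the winning triple, since that is the count_sum the question asks for. The `stations` list layout and the `closest_triple` helper name are assumptions made for illustration, not part of the original answer.

```python
from itertools import combinations
from math import asin, cos, pi, sqrt

def distance(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres (same formula as in the question)."""
    p = pi / 180
    a = 0.5 - cos((lat2 - lat1) * p) / 2 + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
    return 12742 * asin(sqrt(a))

# A few rows from the question's sample data: (name, latitude, longitude, count).
stations = [
    ("Schous plass", 59.920259, 10.760629, 2),
    ("Pilestredet", 59.926224, 10.729625, 4),
    ("Sofienbergparken nord", 59.923229, 10.766171, 3),
    ("Stensparken", 59.927140, 10.730981, 4),
    ("Schous plass trikkestopp", 59.920728, 10.759486, 5),
]

def closest_triple(rows):
    """Return (total_distance, triple) for the triple with the smallest pairwise-distance sum."""
    def perimeter(triple):
        # Sum of the three pairwise distances inside the triple.
        return sum(distance(a[1], a[2], b[1], b[2]) for a, b in combinations(triple, 2))
    return min(((perimeter(t), t) for t in combinations(rows, 3)), key=lambda pair: pair[0])

total, triple = closest_triple(stations)
names = ",".join(row[0] for row in triple)
count_sum = sum(row[3] for row in triple)
print(names, count_sum, round(total, 3))
```

On these five rows the winning triple is Schous plass, Sofienbergparken nord and Schous plass trikkestopp, with a count_sum of 10. Note that this returns only the single closest triple; producing one row per group, as in the desired output, would need a further rule for how the remaining stations are grouped.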
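Keep in mind that there is still room for optimization, for example caching the distance between two stations:
from functools import lru_cache

@lru_cache(maxsize=None)
def distance(lat1, lon1, lat2, lon2):
    p = pi / 180
    ...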
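One caveat with the caching suggestion: lru_cache keys on the exact argument tuple, so distance(A, B) and distance(B, A) are stored as separate entries. A possible alternative, sketched here under that assumption, is to compute each unordered pair once up front and look it up by station index. The `coords` list, the `pair_dist` dictionary and the `triple_cost` helper are hypothetical names introduced for this example.

```python
from itertools import combinations
from math import asin, cos, pi, sqrt

def distance(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres (same formula as in the question)."""
    p = pi / 180
    a = 0.5 - cos((lat2 - lat1) * p) / 2 + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
    return 12742 * asin(sqrt(a))

# Hypothetical input: (latitude, longitude) per station, taken from the sample rows.
coords = [
    (59.920259, 10.760629),  # Schous plass
    (59.926224, 10.729625),  # Pilestredet
    (59.923229, 10.766171),  # Sofienbergparken nord
    (59.920728, 10.759486),  # Schous plass trikkestopp
]

# Compute each unordered pair exactly once; keys are (i, j) with i < j.
pair_dist = {
    (i, j): distance(*coords[i], *coords[j])
    for i, j in combinations(range(len(coords)), 2)
}

def triple_cost(i, j, k):
    """Sum of the three pairwise distances; expects i < j < k."""
    return pair_dist[i, j] + pair_dist[j, k] + pair_dist[i, k]

# combinations() over a range yields index tuples already sorted ascending.
best = min(combinations(range(len(coords)), 3), key=lambda t: triple_cost(*t))
print(best, triple_cost(*best))
```

With this layout the haversine formula runs once per pair rather than once per pair per triple, and the search loop only does dictionary lookups.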