Problem description
I am currently performing reverse geocoding as follows:
import json
import time

from shapely.geometry import shape, Point

with open('districts.json') as f:
    districts = json.load(f)
# file also kept at https://raw.githubusercontent.com/Thevesh/display/master/districts.json

def reverse_geocode(lon, lat):
    point = Point(lon, lat)  # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return [feature['properties']['ADM1_EN'], feature['properties']['ADM2_EN']]
    return ['', '']

start_time = time.time()
for i in range(1000):
    test = reverse_geocode(103, 3)
print('----- Code ran in ' + "{:.3f}".format(time.time() - start_time) + ' seconds -----')
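Reverse geocoding 1000 points takes about 13 seconds, which is fine.
However, I will need to reverse geocode 10 million coordinate pairs for a task, which, assuming linear complexity, would take about 130k seconds (1.5 days). Not good.
The obvious inefficiency in this algorithm is that it iterates over the entire set of polygons every time it classifies a point, which is a huge waste of time.
How can I improve this code? To process 10 million pairs in a time acceptable for the task, I need to run 1k pairs in 1 second.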
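The estimate and the target above can be checked with a few lines of arithmetic. This is only a sketch of the linear extrapolation stated in the question; the function name `projected_seconds` is made up for illustration.

```python
def projected_seconds(seconds_per_batch: float, batch_size: int, total_points: int) -> float:
    """Linear extrapolation: total time if every batch costs the same."""
    return seconds_per_batch * total_points / batch_size


TOTAL_POINTS = 10_000_000

# Measured: about 13 seconds per 1000 points.
current_seconds = projected_seconds(13.0, 1000, TOTAL_POINTS)
# Target: 1 second per 1000 points.
target_seconds = projected_seconds(1.0, 1000, TOTAL_POINTS)

current_days = current_seconds / 86_400       # seconds in a day
target_hours = target_seconds / 3_600         # seconds in an hour
required_speedup = current_seconds / target_seconds

print(f"current: {current_seconds:.0f} s (~{current_days:.1f} days)")
print(f"target:  {target_seconds:.0f} s (~{target_hours:.1f} hours)")
print(f"required speedup: {required_speedup:.0f}x")
```

In other words, the target corresponds to roughly a 13x speedup over the measured run.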
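Solution
I came up with this algorithm using parallelism.
If possible, let me know whether it turns out to be useful for your purposes. Keep in mind that this is an amateur algorithm and needs tuning.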
import concurrent.futures
import json
import time

from shapely.geometry import shape, Point

with open('districts.json') as f:
    districts = json.load(f)

def reverse_geocode(lon: float, lat: float) -> list:
    point = Point(lon, lat)  # lon/lat
    for feature in districts['features']:
        polygon = shape(feature['geometry'])
        if polygon.contains(point):
            return [feature['properties']['ADM1_EN'], feature['properties']['ADM2_EN']]
    return ['', '']

if __name__ == '__main__':
    time_start = time.time()
    with concurrent.futures.ProcessPoolExecutor() as process:
        # Keep the futures so results (and any worker exceptions) can be retrieved.
        futures = [process.submit(reverse_geocode, 103, 3) for _ in range(1000)]
        results = [future.result() for future in futures]
    # Stop the clock only after every task has finished.
    time_end = time.time()
    print(f'\nFinished in {round(time_end - time_start, 2)} seconds')
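Independently of parallelism, the cost the question points at (rebuilding and testing every polygon for every point) might be reduced by precomputing each polygon's bounding box once and running the full containment test only when the point falls inside that box. The following is a minimal, dependency-free sketch of that idea, not the code above: it swaps shapely's `contains` for a plain even-odd ray-casting test, and `DistrictIndex`, `bounding_box`, `point_in_ring` and the toy squares are hypothetical names and data used only for illustration. It assumes one outer ring per district and ignores holes and multipolygons, which real district boundaries would need.

```python
from typing import List, Optional, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]        # list of (lon, lat) vertices
BBox = Tuple[float, float, float, float]    # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(ring: Ring) -> BBox:
    """Smallest axis-aligned rectangle that contains the ring."""
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return min(lons), min(lats), max(lons), max(lats)


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Even-odd ray casting: count edges crossed by a ray going east."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


class DistrictIndex:
    """Hypothetical helper: bounding boxes are computed once, at build time."""

    def __init__(self, districts: List[Tuple[str, Ring]]) -> None:
        self._entries = [(name, ring, bounding_box(ring)) for name, ring in districts]

    def locate(self, lon: float, lat: float) -> Optional[str]:
        for name, ring, (min_lon, min_lat, max_lon, max_lat) in self._entries:
            # Cheap rejection: skip polygons whose box cannot contain the point.
            if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
                continue
            if point_in_ring(lon, lat, ring):
                return name
        return None


# Toy data (assumption): two adjacent squares standing in for districts.
toy_districts = [
    ("A", [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]),
    ("B", [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)]),
]
index = DistrictIndex(toy_districts)
print(index.locate(1.0, 1.0), index.locate(3.0, 1.0), index.locate(5.0, 5.0))
```

With shapely, the same idea would amount to calling `shape(feature['geometry'])` once per feature outside the loop and checking `polygon.bounds` before `polygon.contains(point)`. How much this helps on districts.json has not been measured here.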