Vectorizing a geocoding loop in Python

Problem description

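I have a very long list of addresses that I need to geocode into coordinates, and I am using geopy in Python to do it. I wrote a loop that looks up the coordinates for each observation. The loop also accounts for the fact that connection timeouts sometimes occur (so it retries the geocoding) and that sometimes no coordinates can be found (it returns None). The problem is that it is very slow: I managed to geocode 1,000 observations in half an hour, so I wonder whether there is a way to speed it up, for example by vectorizing it.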

I could reduce the wait time between retries, but then more attempts would fail.

Here is a sample of the code:

import pandas as pd
from geopy.geocoders import Nominatim
import numpy as np
import time

geolocator = Nominatim(user_agent = 'local_agent')

def geocode_address(address):
    g = geolocator.geocode(address)
    return g

def try_address(address,attempts_remaining,wait_time):
    g = geocode_address(address)
    if g is None and attempts_remaining > 0:
        # Back off, then retry with double the wait and return the retry's result
        time.sleep(wait_time)
        return try_address(address,attempts_remaining-1,wait_time*2)
    return g

start_index = 0
# How often the program prints the status of the running program
status_rate = 100
# How many times the program tries to geocode an address before it gives up
attempts_to_geocode = 2
# Time it delays each time it does not find an address
wait_time = 3

# Variables used in the main for loop 
results = []
Failed = 0
total_Failed = 0
progress = len(df) - start_index

for i,address in enumerate(df["address"]):
    # Print how many addresses have been processed so far and how many of them failed.
    if (start_index + i) % status_rate == 0:
        total_Failed += Failed
        print("Completed {} of {}. Failed {} for this section and {} in total."
              .format(i + start_index,progress,Failed,total_Failed))
        Failed = 0
    # Try geocoding the address; every row is [address, latitude, longitude, status]
    try:
        g = try_address(address,attempts_to_geocode,wait_time)
        if g is None:
            results.append([address,"","","None"])
            print("Gave up on address: " + address)
            Failed += 1
        else:
            # The geocoder configured above is Nominatim, so label the row accordingly
            results.append([address,g.latitude,g.longitude,"Nominatim"])
    # If we failed with an error such as a timeout, wait 5 seconds and try the address once more
    except Exception as e:
        print("Failed with error {} on address {}. Will try again.".format(e,address))
        try:
            time.sleep(5)
            g = geocode_address(address)
            if g is None:
                print("Did not find it.")
                results.append([address,"","","None"])
                Failed += 1
            else:
                print("Successfully found it.")
                results.append([address,g.latitude,g.longitude,"Nominatim"])
        except Exception as e:
            print("Failed with error {} on address {} again.".format(e,address))
            Failed += 1
            results.append([address,"","","Error"])

Solution

No verified solution to this problem has been found yet.
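
In the absence of a confirmed answer, one direction that may be worth trying is sketched below. Vectorization is unlikely to help here, because each lookup is a network request rather than a numeric operation, and the public Nominatim service's usage policy allows at most one request per second. What can reduce the total time is sending fewer requests. The sketch geocodes each distinct address only once and retries only when an exception (such as a timeout) is raised. The function name `geocode_unique`, its parameters, and the decision not to retry a None result (assumed here to mean "address not found" rather than a transient failure) are illustrative assumptions, not part of the original code or of geopy.

```python
import time

def geocode_unique(addresses, geocode, max_retries=2, wait_time=1.0, sleep=time.sleep):
    """Geocode each distinct address once and reuse the result for duplicates.

    `geocode` is any callable that takes an address and returns a result or
    None (for example geolocator.geocode). It is passed in as an argument so
    the sketch can be exercised without network access. All names here are
    hypothetical.
    """
    cache = {}
    for address in addresses:
        if address in cache:
            continue  # duplicate address: no extra request needed
        delay = wait_time
        result = None
        for attempt in range(max_retries + 1):
            try:
                result = geocode(address)
                break  # got an answer (possibly None = not found); stop retrying
            except Exception:
                # Assumed transient error such as a timeout: back off, then retry
                if attempt < max_retries:
                    sleep(delay)
                    delay *= 2
        cache[address] = result
    # Return one result per input address, in the original order
    return [cache[a] for a in addresses]
```

With the data from the question, this could be called as `geocode_unique(df["address"].tolist(), geolocator.geocode)`. How much time it saves depends entirely on how many addresses are repeated. geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which wraps a geocoding function with a minimum delay between calls and built-in retries, and may be able to replace the hand-written waiting logic.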
