Nominatim fails to geocode addresses that are formatted differently

Problem description

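I have a program that scrapes cafe names and addresses from a food website, geocodes them, and stores them in a spatial PostgreSQL table. This works well for addresses formatted like 115 Plenty Road, Preston, but fails when it reaches addresses formatted like Shop 9 Corner Cramer and Mary Street, Preston.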

Because the geocoord column is NOT NULL, these failures break the database import.
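One way to keep the NOT NULL constraint and still finish the import is to split the scraped rows before inserting them: rows with both coordinates go into the table, and the rest are set aside for logging or a second geocoding attempt. This is a minimal sketch, assuming rows shaped like the fortable tuples the scraper builds, (name, address, long, lat); partition_rows is a hypothetical helper, not part of psycopg2 or any other library.

```python
from typing import List, Optional, Tuple

# Hypothetical row shape, mirroring the scraper's fortable tuples:
# (cafe name, address, longitude, latitude); coordinates are None on failure.
Row = Tuple[str, str, Optional[float], Optional[float]]


def partition_rows(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (rows that have both coordinates, rows missing either one)."""
    geocoded: List[Row] = []
    missing: List[Row] = []
    for row in rows:
        _, _, lon, lat = row
        if lon is None or lat is None:
            missing.append(row)  # would violate NOT NULL, keep out of the INSERT
        else:
            geocoded.append(row)
    return geocoded, missing
```

Only the first list would then be passed to the INSERT; the second shows exactly which addresses Nominatim could not resolve.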

Is there anything I can do to help Nominatim understand addresses that contain something like Warehouse 1 or Shop 2 instead of a plain street address?
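A common workaround, offered here as a heuristic rather than anything Nominatim documents, is to normalise each address before sending it: drop premises prefixes such as Shop 9 or Warehouse 1, which Nominatim is unlikely to match, and reduce a "Corner X and Y Street" intersection, which it may not resolve well, to a single street. The prefix list, the regular expressions, and normalise_address are assumptions built only from the examples in the question; the result locates the street, not the exact shop.

```python
import re

# Assumed premises prefixes; a token containing a digit must follow
# (e.g. "Shop 9", "Unit 2A"), so a name like "Level Crossing Road" is left alone.
UNIT_PREFIX = re.compile(
    r"^(?:(?:shop|warehouse|unit|suite|level|tenancy)\s+\w*\d\w*[\s,/]*)+",
    re.IGNORECASE,
)
# "Corner Cramer and Mary Street" -> groups ("Cramer", "Mary Street")
CORNER = re.compile(r"^(?:cnr|corner)\.?\s+(.+?)\s+(?:and|&)\s+(.+)$", re.IGNORECASE)


def normalise_address(address: str) -> str:
    """Heuristically rewrite an address into a form Nominatim is more likely
    to find: strip unit prefixes and keep one street of a corner address."""
    parts = [p.strip() for p in address.split(",")]
    street = UNIT_PREFIX.sub("", parts[0]).strip()
    corner = CORNER.match(street)
    if corner:
        first, second = corner.group(1), corner.group(2)
        second_words = second.split()
        # "Cramer and Mary Street": the street type is shared, so borrow it.
        if len(first.split()) == 1 and len(second_words) > 1:
            first = f"{first} {second_words[-1]}"
        street = first
    # Drop empty pieces, e.g. when the whole first part was "Warehouse 1".
    return ", ".join(p for p in [street, *parts[1:]] if p)
```

For example, "Shop 9 Corner Cramer and Mary Street, Preston" becomes "Cramer Street, Preston", while "115 Plenty Road, Preston" passes through unchanged. In the scraper, the geocode call could receive normalise_address(address) instead of the raw string.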

To see it work, pass the parameters melbourne and thornbury; to see it fail, pass melbourne and preston.

Here is the scraper code:

import psycopg2
from config import config
from bs4 import BeautifulSoup
import requests
from requests import get
import geopandas
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

#get cafe names,addresses and geocoords for user parameters

def scrapecafes(city,area):

    url = f"https://www.broadsheet.com.au/{city}/guides/best-cafes-{area}"
    response = requests.get(url,timeout=5)

    soup = BeautifulSoup(response.content,"html.parser") #parse the page once and reuse it

    #names
    cafeNames = soup.find_all('h2',attrs={"class":"venue-title"}) #scrape the names
    cafeNamesClean = [cafe.text.strip() for cafe in cafeNames] #clean the names

    #addresses
    cafeAddresses = soup.find_all(attrs={"class":"address-content"}) #scrape the addresses
    cafeAddressesClean = [address.text.strip() for address in cafeAddresses] #clean the addresses

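    ##geocode addresses
    locator = Nominatim(user_agent="myGeocoder")
    geocode = RateLimiter(locator.geocode,min_delay_seconds=1) #at most one request per second
    lat = []
    long = []

    for address in cafeAddressesClean:
        #use the rate-limited wrapper; it returns None when nothing matches
        #(and, by default, when the request itself errors)
        location = geocode(address.strip().replace(',',''))
        if location is None:
            #keep the lists aligned with cafeNamesClean instead of aborting the loop
            long.append(None)
            lat.append(None)
        else:
            long.append(location.longitude)
            lat.append(location.latitude)

    #zip up to be added to database table
    fortable = list(zip(cafeNamesClean,cafeAddressesClean,long,lat))
    print(fortable)
    return fortable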
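Even with cleaner input some addresses will not match, so the loop can try progressively coarser queries and only record None when all of them fail. This is a sketch under stated assumptions, not a geopy feature: first_hit and fallback_queries are hypothetical helpers, and geocode is any callable that returns an object with longitude and latitude attributes (such as a geopy Location) or None, for instance the RateLimiter wrapper created in scrapecafes.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def fallback_queries(address: str) -> List[str]:
    """Full address first, then the address minus its first comma-separated
    part (often a "Warehouse 1" style prefix), then the last part (suburb)."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    if not parts:
        return []
    queries = [", ".join(parts)]
    if len(parts) > 2:
        queries.append(", ".join(parts[1:]))
    if len(parts) > 1:
        queries.append(parts[-1])
    return queries


def first_hit(
    queries: Iterable[str],
    geocode: Callable[[str], object],
) -> Tuple[Optional[float], Optional[float]]:
    """Try each query in order and return (longitude, latitude) of the first
    match, or (None, None) if every query fails."""
    for query in queries:
        try:
            location = geocode(query)
        except Exception:  # e.g. a timeout when a bare locator.geocode is passed
            location = None
        if location is not None:
            return location.longitude, location.latitude
    return None, None
```

Inside scrapecafes, the loop body could become lon, la = first_hit(fallback_queries(address), geocode) followed by the two appends. Because a bare suburb such as Preston also exists outside Australia, it is probably worth restricting the search, for example by passing lambda q: geocode(q, country_codes="au") as the callable; country_codes is a parameter of geopy's Nominatim.geocode, and RateLimiter forwards keyword arguments to the wrapped function. A suburb-only hit gives the suburb's centre rather than the cafe, which may or may not be acceptable for the table.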
