Problem description
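I have a program that scrapes cafe names and addresses from a food website, geocodes them, and stores them in a table in a spatial PostgreSQL database. This works well for addresses that the site formats like "115 Plenty Road, Preston", but it fails on addresses formatted like "Shop 9 Corner Cramer and Mary Street, Preston".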
Because the geocoord column is NOT NULL, the addresses that fail to geocode break the database import.
Is there anything I can do to help Nominatim understand addresses that contain something like "Warehouse 1" or "Shop 2" instead of a plain street address?
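One possible approach, sketched here under assumptions, is to strip the unit designator before querying and fall back to the simplified string only when the original address returns nothing. The prefix list (shop, warehouse, unit, suite, level) and the helper names simplify_address, candidate_queries and geocode_with_fallback are hypothetical, and whether Nominatim resolves the simplified corner address has not been verified.

```python
import re

# Assumed prefixes; adjust to whatever the scraped site actually uses.
_UNIT_PREFIX = re.compile(
    r"^\s*(?:shop|warehouse|unit|suite|level)\s+\w+\s*,?\s*", re.IGNORECASE
)
_CORNER_PREFIX = re.compile(r"^\s*(?:corner|cnr)\.?\s+(?:of\s+)?", re.IGNORECASE)


def simplify_address(address):
    """Drop a leading unit designator and corner marker from an address."""
    simplified = _UNIT_PREFIX.sub("", address.strip())
    simplified = _CORNER_PREFIX.sub("", simplified)
    return simplified


def candidate_queries(address):
    """Return queries to try in order: the original, then the simplified form."""
    original = address.strip()
    simplified = simplify_address(original)
    return [original] if simplified == original else [original, simplified]


def geocode_with_fallback(geocode, address):
    """Try each candidate with the given geocode callable; first hit or None."""
    for query in candidate_queries(address):
        location = geocode(query)
        if location is not None:
            return location
    return None
```

Here geocode can be any callable that returns None on a miss, such as a geopy RateLimiter wrapped around Nominatim.geocode. A result found through the simplified query locates the street rather than the individual shop, so it may be worth flagging those rows.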
To see it work, pass the parameters melbourne, thornbury. To see it fail, pass melbourne, preston.
Here is the scraper code:
import psycopg2
from config import config
from bs4 import BeautifulSoup
import requests
import geopandas
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# Get cafe names, addresses and geocoordinates for the user-supplied parameters
def scrapecafes(city, area):
    url = f"https://www.broadsheet.com.au/{city}/guides/best-cafes-{area}"
    response = requests.get(url, timeout=5)
    soup = BeautifulSoup(response.content, "html.parser")

    # Scrape and clean the names
    cafeNames = soup.find_all('h2', attrs={"class": "venue-title"})
    cafeNamesClean = [cafe.text.strip() for cafe in cafeNames]

    # Scrape and clean the addresses
    cafeAddresses = soup.find_all(attrs={"class": "address-content"})
    cafeAddressesClean = [address.text for address in cafeAddresses]

    # Geocode the addresses, waiting at least one second between requests
    locator = Nominatim(user_agent="myGeocoder")
    geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
    lat = []
    long = []
    for address in cafeAddressesClean:
        # geocode() returns None when Nominatim finds no match, so check
        # each address separately instead of wrapping the whole loop in try/except
        location = geocode(address.strip().replace(',', ''))
        long.append(location.longitude if location else None)
        lat.append(location.latitude if location else None)

    # Zip up to be added to the database table
    fortable = list(zip(cafeNamesClean, cafeAddressesClean, long, lat))
    print(fortable)
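Since the geocoord column is NOT NULL, rows without coordinates could be set aside before the insert rather than left to fail in the database. The following is a minimal sketch that assumes rows shaped like the fortable tuples above (name, address, longitude, latitude). The helper name split_geocoded is hypothetical, and the sample coordinates are placeholders, not real geocoding results.

```python
def split_geocoded(rows):
    """Separate (name, address, long, lat) rows into geocoded and failed lists."""
    geocoded, failed = [], []
    for row in rows:
        _name, _address, lon, lat = row
        if lon is None or lat is None:
            failed.append(row)
        else:
            geocoded.append(row)
    return geocoded, failed


# Illustrative rows only; the coordinates are placeholders.
sample_rows = [
    ("Cafe A", "115 Plenty Road, Preston", 145.0, -37.7),
    ("Cafe B", "Shop 9 Corner Cramer and Mary Street, Preston", None, None),
]
ready_for_insert, needs_review = split_geocoded(sample_rows)
```

Only ready_for_insert would then be passed to the database insert, while needs_review can be logged or retried with a simplified address. Using this with the scraper assumes scrapecafes is changed to return fortable instead of only printing it.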