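Problem description
The output is a CSV file listing businesses with their name, address, phone number, and coordinates. For some reason, coordinates are generated for only some of the rows. When I take the addresses that came back without coordinates and run them through geopy on their own in a single run, the coordinates are found, so geopy can potentially resolve all of them, but for some reason it sometimes skips an address. I thought the API call might need more time, so I added threading, but that did not solve the problem.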
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import threading
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="[email protected]")
main_list = []

def extract(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/89.0.4389.114 Safari/537.36'}
    r = requests.get(url,headers=headers)
    soup = BeautifulSoup(r.content,'html.parser')
    return soup.find_all('div',class_ = 'listing__content__wrap--flexed jsGoToMp')

def transform(articles):
    for item in articles:
        name = item.find('a',class_ ='listing__name--link listing__link jsListingName').text
        try:
            street = item.find('span',{'itemprop':'streetAddress'}).text
        except:
            street = ''
        try:
            city = item.find('span',{'itemprop':'addressLocality'}).text
        except:
            city = ''
        try:
            province = item.find('span',{'itemprop':'addressRegion'}).text
        except:
            province = ''
        try:
            postCode = item.find('span',{'itemprop':'postalCode'}).text
        except:
            postCode = ''
        try:
            phone = item.find('li',class_ = 'mlr__submenu__item').text.strip()
        except:
            phone = ''
        try:
            def search_geo():
                global location
                location = geolocator.geocode(street + ' ' + city)
                print(street + ' ' + city)
            thread = threading.Thread(target=search_geo)
            thread.start()
            thread.join()
            slatitude = location.latitude
        except:
            slatitude = ''
        try:
            thread = threading.Thread(target=search_geo)
            thread.start()
            thread.join()
            slongitude = location.longitude
        except:
            slongitude = ''
        business = {
            'name': name,'street': street,'city': city,'province': province,'postCode': postCode,'phone': phone,'slongitude': slongitude,'slatitude': slatitude
        }
        main_list.append(business)
    return

def load():
    df = pd.DataFrame(main_list)
    df.to_csv('repairshopsbc',index=False)

for x in range(1,2):
    print(f'Getting page {x}')
    articles = extract(f'https://www.yellowpages.ca/search/si/{x}/car+repair/British+Columbia+BC')
    transform(articles)
    time.sleep(5)

load()
print('Saved to CSV')
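
A few things in the script above may be hiding the real failure. It geocodes each listing twice (once for latitude, once for longitude), the bare `except:` turns every error into an empty string, and an exception raised inside a thread never reaches the calling code, so `location` can be left undefined or still hold the previous listing's result. In addition, geopy's `geocode()` returns `None` when it finds no match. The following is a minimal sketch of a single lookup per address that makes those cases visible. The helper name `lookup_coordinates` is hypothetical, and it takes the geocoding function as a parameter so it can be tried without network access.

```python
from typing import Callable, Optional, Tuple


def lookup_coordinates(
    geocode: Callable[[str], object],
    street: str,
    city: str,
) -> Tuple[Optional[float], Optional[float]]:
    """Geocode one address with a single call and return (latitude, longitude).

    `geocode` is any callable that takes a query string and returns either
    None or an object with `latitude` and `longitude` attributes
    (for example, `geolocator.geocode` from geopy).
    """
    query = f"{street} {city}".strip()
    if not query:
        # Nothing was scraped for this listing, so there is nothing to look up.
        return None, None

    location = geocode(query)
    if location is None:
        # No match found: report it instead of silently writing an empty cell.
        print(f"No geocoding result for: {query!r}")
        return None, None

    # Both values come from the same response, so only one request is made.
    return location.latitude, location.longitude
```

Inside `transform`, this would be called as `slatitude, slongitude = lookup_coordinates(geolocator.geocode, street, city)`, without a thread, since `thread.join()` waits for the call to finish anyway.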
Solution
No verified solution to this problem has been found yet.
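
This is not a confirmed fix, but one direction worth trying: intermittent misses are consistent with requests being sent too quickly or timing out. Nominatim's public usage policy asks for at most one request per second, and any timeout or service error raised by the geocoder is swallowed by the script's bare `except:`. The sketch below spaces out calls and retries failures using only the standard library. The name `throttle_and_retry` and its defaults are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


def throttle_and_retry(
    func: Callable[[str], object],
    min_delay_seconds: float = 1.0,
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[str], Optional[object]]:
    """Wrap `func` so calls are spaced out and failed calls are retried."""
    # Monotonic timestamp of the previous call, kept in a list so the
    # inner function can update it.
    last_call = [float("-inf")]

    def wrapper(query: str) -> Optional[object]:
        for attempt in range(max_retries + 1):
            # Wait until at least min_delay_seconds have passed since the last call.
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                sleep(wait)
            last_call[0] = time.monotonic()
            try:
                return func(query)
            except Exception as exc:  # e.g. a timeout or a service error
                # Log the failure so it is visible, then try again.
                print(f"Attempt {attempt + 1} failed for {query!r}: {exc}")
        return None  # every attempt failed

    return wrapper
```

It would be used as `geocode = throttle_and_retry(geolocator.geocode)`, after which `geocode(street + ' ' + city)` replaces the direct call. geopy also ships its own helper for this purpose, `RateLimiter` in `geopy.extra.rate_limiter`, which accepts `min_delay_seconds` and `max_retries` arguments. Using it instead of a hand-rolled wrapper is probably the better choice in real code.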