Why doesn't Geopy generate all coordinates when scraping Yellow Pages?

Problem description

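The output is a CSV file listing businesses with their name, address, phone number, and coordinates. For some reason only some of the coordinates are generated. The addresses that come back empty do return coordinates when I run geopy on them individually, so geopy can potentially find all of them, yet it sometimes skips them. I thought the API call might need more time and added threading, but that did not solve the problem.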


import requests
from bs4 import BeautifulSoup
import pandas as pd
import time
import threading
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="[email protected]")

main_list = []

def extract(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/89.0.4389.114 Safari/537.36'}
    r = requests.get(url,headers=headers)
    soup = BeautifulSoup(r.content,'html.parser')
    return soup.find_all('div',class_ = 'listing__content__wrap--flexed jsGoToMp')

def transform(articles):
    for item in articles:
        name = item.find('a',class_ ='listing__name--link listing__link jsListingName').text
        try:
            street = item.find('span',{'itemprop':'streetAddress'}).text
        except:
            street = ''
        try:
            city = item.find('span',{'itemprop':'addressLocality'}).text
        except:
            city = ''
        try:   
            province = item.find('span',{'itemprop':'addressRegion'}).text
        except:
            province = ''
        try:
            postCode = item.find('span',{'itemprop':'postalCode'}).text
        except:
            postCode = ''
        try:
            phone = item.find('li',class_ = 'mlr__submenu__item').text.strip()
        except:
            phone = ''
        try:
            
            def search_geo():
                global location
                location = geolocator.geocode(street + ' ' + city)
            print(street + ' ' + city)
            thread = threading.Thread(target=search_geo)
            thread.start()
            thread.join()
            slatitude = location.latitude
        except:
            slatitude = ''
        try:
            thread = threading.Thread(target=search_geo)
            thread.start()
            thread.join()
            slongitude = location.longitude
        except:
            slongitude = ''

        business = {
            'name': name,'street': street,'city': city,'province': province,'postCode': postCode,'phone': phone,'slongitude': slongitude,'slatitude': slatitude
        }
        main_list.append(business)
    return

def load():
    df = pd.DataFrame(main_list)
    df.to_csv('repairshopsbc',index=False)

for x in range(1,2):
    print(f'Getting page {x}')
    articles = extract(f'https://www.yellowpages.ca/search/si/{x}/car+repair/British+Columbia+BC')
    transform(articles)
    time.sleep(5)

load()
print('Saved to CSV')
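
A note on the threading in the script above, offered as a possible explanation and not a confirmed diagnosis: starting a thread and immediately calling `join()` still waits for the lookup, so it adds no extra time. In addition, an exception raised inside a `threading.Thread` target does not reach the `try`/`except` around `join()`, so the global `location` keeps whatever value the previous listing left behind. The sketch below reproduces this with `fake_geocode`, a hypothetical stand-in that needs no network access; `lookup` and `thread_errors` are likewise illustrative names. It assumes Python 3.8 or later for `threading.excepthook`.

```python
import threading

location = None        # shared global, as in the script above
thread_errors = []     # exceptions that were raised inside worker threads

# Record thread exceptions instead of letting them vanish (Python 3.8+).
threading.excepthook = lambda args: thread_errors.append(args.exc_value)


def fake_geocode(address):
    """Hypothetical stand-in for geolocator.geocode; fails for one address."""
    if address == "bad address":
        raise RuntimeError("simulated timeout")
    return (49.0, -123.0)


def lookup(address):
    """Mimic the script: run the lookup in a thread, then read the global."""
    def search_geo():
        global location
        location = fake_geocode(address)

    try:
        thread = threading.Thread(target=search_geo)
        thread.start()
        thread.join()
        return location
    except Exception:
        return ""  # not reached here: the thread's exception never arrives


first = lookup("good address")
second = lookup("bad address")   # the thread fails, so `location` is stale

print(first, second)             # both are the coordinates of the first address
print(thread_errors)             # the failure was only visible to the hook
```

If this is what happens in the real run, a failed request would silently reuse the previous listing's coordinates, while an address with no match would set `location` to `None` and end up as an empty cell.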

Solution

No effective solution to this problem has been found yet.
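
A few things could still plausibly explain the gaps and may be worth checking: the public Nominatim server asks for at most one request per second, the script geocodes every listing twice, and `geocode` returns `None` when nothing matches the query. The following is a minimal sketch under those assumptions, not a verified fix. The helper names `make_geocoder` and `coordinates_for` are hypothetical, and the caller is assumed to supply its own user agent string.

```python
def make_geocoder(user_agent):
    """Build a Nominatim geocode function limited to one call per second."""
    # Imported here so coordinates_for can be exercised without geopy.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent, timeout=10)
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def coordinates_for(geocode, street, city, cache):
    """Return (latitude, longitude) for an address, or ('', '') if not found."""
    query = f"{street} {city}".strip()
    if not query:
        return ("", "")
    if query not in cache:
        location = geocode(query)  # one request yields both values
        if location is None:
            print(f"No result for: {query}")  # make misses visible
            cache[query] = ("", "")
        else:
            cache[query] = (location.latitude, location.longitude)
    return cache[query]
```

In `transform`, a call such as `slatitude, slongitude = coordinates_for(geocode, street, city, cache)` could stand in for both threaded blocks, with `geocode = make_geocoder(...)` and `cache = {}` created once before the loop. Printing the queries that return nothing should show whether the empty cells come from unmatched addresses or from failed requests.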
