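Problem description
I cannot for the life of me figure out how to get the elements of this list (which are themselves lists) to print on multiple lines when I write them to a file. I scrape the titles from a website and then scrape the links. The end goal (for your insight) is to pair each title with its link in a format like this:
<a href='www.mywebsite.com/curry-recipe'>Curry Recipe</a>
For now, though, the problem is that although I eventually got desfinalList to look right, for example: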
[['Curry Recipe','www.originalwebsite.com/curry-recipe'],['Pancake Recipe','www.originalwebsite.com/pancake-recipe']]
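I cannot seem to write it to a file without everything ending up on a single line. With text wrapping turned on it is manageable, but I would prefer it spread over multiple lines.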
import requests
from bs4 import BeautifulSoup

# Module-level lists used by the function (not shown in the original post;
# assumed to start empty)
desTitleList = []
desLinkList = []
desList3 = []
desfinalList = []

def OFDdesserts():
    urlA = 'https://olivesfordinner.com/category/dessert/page/{}'
    for i in range(2,5):
        url = urlA.format(i)
        response = requests.get(url)
        htmlText = response.text
        soup = BeautifulSoup(htmlText,'lxml')
        links = soup.find_all('article')
        for title in links[0:12]:
            titleActual = title.get('aria-label')
            if 'Giveaway' not in titleActual:
                hyperL = title.find('header',class_ = 'entry-header').a['href']
                if titleActual not in desTitleList:
                    desTitleList.append(titleActual)
                    desLinkList.append(hyperL)
    desList3.append([[x,y] for x,y in zip(desTitleList,desLinkList)])
    #erase duplicates
    for item in desList3:
        if item not in desfinalList:
            desfinalList.append(item)
    #write the file
    for elem in desfinalList:
        with open('recipes/desserts.txt','w') as f:
            f.write('\n \n'.join(map(str,desfinalList)))
    print('just added something yummy to desserts!')
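Solution
It is better to use the .select() method, which takes CSS selectors (the same style of selector jQuery uses), and then call str() on each link to get it as HTML: <a href="....">anchor</a>.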
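To see what .select() and str() do without hitting the network, here is a minimal sketch that runs against a small inline HTML string. The markup, the example.com URLs, and the helper name extract_link_tags are all made up for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup invented for this example; the real page may differ.
SAMPLE_HTML = """
<div class="content">
  <article><h2><a class="title-link" rel="bookmark" href="https://example.com/curry-recipe">Curry Recipe</a></h2></article>
  <article><h2><a class="title-link" rel="bookmark" href="https://example.com/giveaway">Giveaway: Free Stuff</a></h2></article>
  <article><h2><a class="title-link" rel="bookmark" href="https://example.com/pancake-recipe">Pancake Recipe</a></h2></article>
</div>
"""

def extract_link_tags(html):
    """Return each non-giveaway title link as an HTML string with only href kept."""
    soup = BeautifulSoup(html, 'html.parser')
    link_tags = []
    # CSS selector: <a> inside <h2> inside <article> inside an element with class "content"
    for link in soup.select('.content article h2 a'):
        if 'Giveaway' in link.text:
            continue
        # Drop every attribute except href so the output stays minimal
        link.attrs = {'href': link.attrs['href']}
        link_str = str(link)  # str() renders the tag back to HTML
        if link_str not in link_tags:
            link_tags.append(link_str)
    return link_tags

link_tags = extract_link_tags(SAMPLE_HTML)
print('\n\n'.join(link_tags))
```

Each entry in link_tags is already a finished string, so joining with '\n\n' puts one anchor per line with a blank line between them.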
import requests
from bs4 import BeautifulSoup

def OFDdesserts():
    urlA = 'https://...../page/{}'
    linkTags = []
    for i in range(2,5):
        url = urlA.format(i)
        response = requests.get(url)
        soup = BeautifulSoup(response.text,'html.parser')
        links = soup.select('.content article h2 a')
        for link in links:
            if 'Giveaway' in link.text:
                continue
            # clean tag (i) from anchor text, if the anchor contains one
            if link.i is not None:
                link.i.extract()
            # clean link attributes (keep only href)
            link.attrs = {'href': link.attrs['href']}
            linkStr = str(link)
            # skip duplicates
            if linkStr not in linkTags:
                linkTags.append(linkStr)
    #write the file, one link per line with a blank line in between
    with open('desserts.txt','w') as f:
        f.write('\n\n'.join(linkTags))
    print('just added something yummy to desserts!')
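If you would rather keep the [title, link] pairs from the question, the single-line output most likely comes from how the list is built: desList3.append([...]) adds the whole list of pairs as one element, so str() turns that element into one long line and join() has nothing to separate. A minimal sketch of the alternative is to iterate over the pairs themselves and format each one. The helper names pairs_to_anchor_lines and write_lines are hypothetical, and the sample data is copied from the question.

```python
from html import escape

# Sample data copied from the question
desfinalList = [
    ['Curry Recipe', 'www.originalwebsite.com/curry-recipe'],
    ['Pancake Recipe', 'www.originalwebsite.com/pancake-recipe'],
]

def pairs_to_anchor_lines(pairs):
    """Turn [title, link] pairs into one <a> tag string per pair."""
    return [
        "<a href='{}'>{}</a>".format(escape(link, quote=True), escape(title))
        for title, link in pairs
    ]

def write_lines(path, lines):
    """Write the lines to path, separated by a blank line."""
    with open(path, 'w') as f:
        f.write('\n\n'.join(lines))

lines = pairs_to_anchor_lines(desfinalList)
write_lines('desserts.txt', lines)
```

Here html.escape only guards against characters such as & or < in a title; for the sample data it leaves the text unchanged.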