Retrieving text from HTML tags with a specific class and storing it in an array in Python

Question

I'm building a Discord bot from the ground up. The first piece I'm tackling is scraping with BeautifulSoup. My current code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page, 'html.parser')
pbe_titles = soup.find_all('h1', attrs={'class': 'news-title'})
for tag in pbe_titles:
    print(tag.text.strip())

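So far, this does exactly what I need.
It retrieves the text inside every tag with the "news-title" class, i.e. <h1 class="news-title">text here</h1>, and prints the text of each such tag. Now I want to store all the titles BeautifulSoup finds in an array so that I can send them to my Discord client.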

soup = BeautifulSoup(page, 'html.parser')
pbe_titles = soup.find_all('h1', attrs={'class': 'news-title'})
for tag in pbe_titles:
    totalTags = [tag.text.strip()]


@client.event
async def on_message(message):
    if message.author == client.user:
        return

    if message.content.startswith('$show'):
        await message.channel.send(totalTags)

client.run(os.getenv('TOKEN'))

The problem is that totalTags = [tag.text.strip()] only gives me one of the titles, not all of them. But if I keep print(tag.text.strip()) instead, it prints 15+ titles. What am I doing wrong with my array?
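A minimal sketch of what is going on, with a hard-coded list of made-up strings standing in for the scraped tag texts so it runs without fetching a page:

```python
# Made-up stand-ins for the texts of the <h1 class="news-title"> tags.
titles_on_page = ["First title", "Second title", "Third title"]

# Pattern from the question: the name is rebound to a new
# one-element list on every pass, so only the last title survives.
only_last = []
for text in titles_on_page:
    only_last = [text]
print(only_last)  # ['Third title']

# Accumulating pattern: create the list once, then append to it.
totalTags = []
for text in titles_on_page:
    totalTags.append(text)
print(totalTags)  # ['First title', 'Second title', 'Third title']
```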

Answer

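Your loop rebinds totalTags to a brand-new one-element list on every pass, so after the loop it only holds the last title. In Python, what you want is a single list that is built up across the loop. One way is to factor the scraping into a generator function and convert its output to a list, like this: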

from bs4 import BeautifulSoup

def get_titles(soup):
    pbe_titles = soup.find_all('h1', attrs={'class': 'news-title'})
    for tag in pbe_titles:
        yield tag.text.strip()

soup = BeautifulSoup(page, 'html.parser')
titles = list(get_titles(soup))
print(titles)
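To try the generator without fetching a real page, here is a self-contained sketch with a hard-coded HTML string (the markup and titles are made up for illustration). It also shows soup.select with a CSS selector and get_text(strip=True), which are standard BeautifulSoup alternatives to find_all plus .text.strip():

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page.
page = """
<h1 class="news-title"> Patch notes </h1>
<h1 class="other">Not a news title</h1>
<h1 class="news-title">
  Server maintenance
</h1>
"""

def get_titles(soup):
    # Yield the stripped text of every <h1 class="news-title">.
    for tag in soup.find_all('h1', attrs={'class': 'news-title'}):
        yield tag.text.strip()

soup = BeautifulSoup(page, 'html.parser')
titles = list(get_titles(soup))
print(titles)  # ['Patch notes', 'Server maintenance']

# Same lookup expressed as a CSS selector.
same_titles = [tag.get_text(strip=True) for tag in soup.select('h1.news-title')]
```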

Or you can simply use a list comprehension:

titles = [tag.text.strip()
          for tag in soup.find_all('h1', attrs={'class': 'news-title'})]
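Passing the list straight to message.channel.send, as in the question's on_message handler, sends its Python repr (something like ['a', 'b']). Joining the titles with newlines reads better, and Discord caps a single message's content at 2000 characters, so 15+ long titles may need to be split. A minimal sketch; chunk_titles is a hypothetical helper, and truncating a single overlong title is just one possible choice:

```python
DISCORD_LIMIT = 2000  # max characters in one Discord message's content

def chunk_titles(titles, limit=DISCORD_LIMIT):
    """Join titles with newlines, splitting into messages of at most `limit` chars."""
    chunks = []
    current = ""
    for title in titles:
        title = title[:limit]  # hypothetical choice: truncate one overlong title
        candidate = f"{current}\n{title}" if current else title
        if len(candidate) > limit:
            # Adding this title would overflow: close the current message.
            chunks.append(current)
            current = title
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Inside on_message you would then loop over chunk_titles(titles) and await message.channel.send(chunk) for each piece.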
