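Problem description
I want to add the text of each table row to a list, but only if the row contains text. I would like to do this with a list comprehension.
This is what I tried: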
listt2 = [s.span.text for s in soup.find_all('tr') if s.span.text]
This is the error:
listt2 = [s.span.text for s in soup.find_all('tr') if s.span.text]
AttributeError: 'NoneType' object has no attribute 'text'
This is a row that has a span:
<tr>
<td colspan="2" class="cell--section-end cell--link cell--link__icon">
<a data-analytics="[Competitions] - German Bundesliga" href="/football/german-bundesliga/event/26301018" class="cell--link__link cell-text">
<i class="i accordion__title-icon--green accordion__title-icon--right" data-char=""></i> <b class="cell-text__line cell-text__line--icon">
<span class="competitions-team-name js-ev-desc">1. FC Köln v 1899 Hoffenheim</span>
</b>
</a>
</td>
</tr>
This is another row, one without a span:
<tr>
<td colspan="5" class="group-header">
Sat 14:30 </td>
</tr>
Solution
If you only want the <tr> tags that contain a <span> tag, you can use the following list comprehension:
listt2 = [s.span.text for s in soup.select('tr:has(span)') if s.span.text]
Edit:
from bs4 import BeautifulSoup
html_doc = '''<tr>
<td colspan="2" class="cell--section-end cell--link cell--link__icon">
<a data-analytics="[Competitions] - German Bundesliga" href="/football/german-bundesliga/event/26301018" class="cell--link__link cell-text">
<i class="i accordion__title-icon--green accordion__title-icon--right" data-char=""></i> <b class="cell-text__line cell-text__line--icon">
<span class="competitions-team-name js-ev-desc">1. FC Köln v 1899 Hoffenheim</span>
</b>
</a>
</td>
</tr>'''
soup = BeautifulSoup(html_doc,'html.parser')
listt2 = [s.span.text for s in soup.select('tr:has(span)') if s.span.text]
print(listt2)
Prints:
['1. FC Köln v 1899 Hoffenheim']
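The snippet above contains only the row that has a span. As a minimal sketch, here is the same selector run against both kinds of row from the question. The markup is trimmed down for illustration, and the surrounding <table> wrapper and the variable names are assumptions rather than part of the original page:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page in the question: one row with a <span>,
# one header row without. The real markup has more attributes.
html_doc = """
<table>
  <tr><td><a href="#"><b><span>1. FC Köln v 1899 Hoffenheim</span></b></a></td></tr>
  <tr><td colspan="5" class="group-header">Sat 14:30</td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# 'tr:has(span)' keeps only rows that contain a <span> somewhere inside,
# so row.span is never None in the comprehension.
rows_with_span = soup.select("tr:has(span)")
names = [row.span.text for row in rows_with_span if row.span.text]
print(names)  # ['1. FC Köln v 1899 Hoffenheim']
```

The header row ("Sat 14:30") is filtered out by the selector itself, so the AttributeError cannot occur.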

You just need to check that span is not None before checking span.text.
listt2 = [s.span.text for s in soup.find_all('tr') if s.span is not None and s.span.text]
Because of short-circuiting, s.span.text is never evaluated when s.span is None: in that case s.span is not None is False, and False and * is always False.
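To make the short-circuit behaviour concrete, here is a small sketch that runs the guarded comprehension over both kinds of row. The HTML is a simplified version of the markup in the question, and the <table> wrapper and variable names are illustrative assumptions:

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the rows in the question.
html_doc = """
<table>
  <tr><td><a href="#"><b><span>1. FC Köln v 1899 Hoffenheim</span></b></a></td></tr>
  <tr><td colspan="5" class="group-header">Sat 14:30</td></tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
rows = soup.find_all("tr")

# tag.span returns None when the tag has no <span> descendant.
missing_span = [row.span is None for row in rows]
print(missing_span)  # [False, True]

# The left operand of 'and' is checked first; when it is False,
# row.span.text on the right is never touched, so no AttributeError.
listt2 = [row.span.text for row in rows if row.span is not None and row.span.text]
print(listt2)  # ['1. FC Köln v 1899 Hoffenheim']
```

Swapping the two conditions would bring the error back, because row.span.text would then be evaluated before the None check.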