How to exclude specific elements from soup.select?

Problem description

I select elements with soup.select('.c-w a'). Inside .c-w there are .c-s elements whose links I do not want included in this selection.

from bs4 import BeautifulSoup

txt = '''
<div class="c-w">
  <div class="c-s">
    <a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
</div></div>
'''
soup = BeautifulSoup(txt,'html.parser')

for a in soup.select('.c-w a'):
    a['href'] = 'entry://'

The result is:

<div class="c-w">
<div class="c-s">
<a href="entry://"><img class="soundpng" src="file://sound.png"/></a>
</div></div>

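My goal is to leave the links under .c-s (i.e. .c-s a) out of the replacement. In other words, when the search reaches a .c-s element, it should skip that element and carry on with the others. Could you explain in detail how to achieve this?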

Solution

Based on your comment, you can use .find_parent() to check whether the <a> tag is inside a tag with class="c-s":

from bs4 import BeautifulSoup

txt = '''
<div class="c-w">
  <div class="c-s">
    <a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
  </div>

  <div>
    <a href="THIS I WANT TO REPLACE">...</a>
  </div>
</div>
'''
soup = BeautifulSoup(txt,'html.parser')

for a in soup.select('.c-w a'):
    if a.find_parent(class_='c-s'):
        continue
    a['href'] = 'entry://'

print(soup.prettify())

Prints:

<div class="c-w">
 <div class="c-s">
  <a href="sound://english-french/sound/M000001099.mp3">
   <img class="soundpng" src="file://sound.png"/>
  </a>
 </div>
 <div>
  <a href="entry://">
   ...
  </a>
 </div>
</div>
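As an alternative, the exclusion can be moved into the selector itself. In BeautifulSoup 4.7 and later, select() is backed by the soupsieve library, which supports the CSS Level 4 form of :not() with complex selectors, so `a:not(.c-s a)` should match only links that are not descendants of a .c-s element. This is a sketch assuming a reasonably recent bs4/soupsieve install; it is not part of the original answer.

```python
from bs4 import BeautifulSoup

txt = '''
<div class="c-w">
  <div class="c-s">
    <a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
  </div>

  <div>
    <a href="THIS I WANT TO REPLACE">...</a>
  </div>
</div>
'''
soup = BeautifulSoup(txt, 'html.parser')

# Links under .c-w, minus any link that is itself a descendant of .c-s
for a in soup.select('.c-w a:not(.c-s a)'):
    a['href'] = 'entry://'

print(soup.prettify())
```

For several classes, a selector list inside :not(), such as `.c-w a:not(.c-s a, .c-v a)`, should behave the same way as the find_parent(class_=[...]) check in the edit below.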

Edit: To exclude both .c-s and .c-v, you can do the following:

from bs4 import BeautifulSoup

txt = '''
<div class="c-w">
  <div class="c-s">
    <a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
  </div>

  <div class="c-v">
    <a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
  </div>

  <div>
    <a href="THIS I WANT TO REPLACE">...</a>
  </div>
</div>
'''
soup = BeautifulSoup(txt,'html.parser')

for a in soup.select('.c-w a'):
    if a.find_parent(class_=['c-s','c-v']):
        continue
    a['href'] = 'entry://'

print(soup.prettify())

Prints:

<div class="c-w">
 <div class="c-s">
  <a href="sound://english-french/sound/M000001099.mp3">
   <img class="soundpng" src="file://sound.png"/>
  </a>
 </div>
 <div class="c-v">
  <a href="sound://english-french/sound/M000001099.mp3">
   <img class="soundpng" src="file://sound.png"/>
  </a>
 </div>
 <div>
  <a href="entry://">
   ...
  </a>
 </div>
</div>
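The question described the goal as a search that skips .c-s once it reaches it. That can also be written as an explicit tree walk that never descends into excluded containers, which handles several classes at once. A minimal sketch, assuming the markup above; `iter_links_skipping` is a hypothetical helper, not a BeautifulSoup API.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

txt = '''
<div class="c-w">
  <div class="c-s">
    <a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
  </div>

  <div class="c-v">
    <a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
  </div>

  <div>
    <a href="THIS I WANT TO REPLACE">...</a>
  </div>
</div>
'''
soup = BeautifulSoup(txt, 'html.parser')


def iter_links_skipping(node, skip_classes):
    """Hypothetical helper: yield <a> tags below `node`, pruning any
    subtree whose root carries one of `skip_classes`."""
    for child in node.children:
        if not isinstance(child, Tag):
            continue  # text nodes, comments, etc.
        if skip_classes.intersection(child.get('class', [])):
            continue  # do not descend into excluded containers
        if child.name == 'a':
            yield child
        yield from iter_links_skipping(child, skip_classes)


for container in soup.select('.c-w'):
    for a in iter_links_skipping(container, {'c-s', 'c-v'}):
        a['href'] = 'entry://'

print(soup.prettify())
```

Unlike calling find_parent() for every link, this visits each node once and never enters the excluded subtrees. Note that it only looks below each .c-w, so a c-s class on an ancestor of .c-w would be ignored here, whereas the find_parent() version would skip those links.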
