Problem description
I use soup.select('.c-w a') to select elements. Inside c-w there are c-s elements that I do not want included in this selection.
from bs4 import BeautifulSoup
txt = '''
<div class="c-w">
<div class="c-s">
<a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
</div></div>
'''
soup = BeautifulSoup(txt,'html.parser')
for a in soup.select('.c-w a'):
    a['href'] = 'entry://'
The result is:
<div class="c-w">
<div class="c-s">
<a href="entry://"><img class="soundpng" src="file://sound.png"/></a>
</div></div>
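My goal is to leave the .c-s a links out of the replacement. In other words, when the search reaches c-s, it should skip that element and continue with the others. Could you explain in detail how to achieve this?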
Solution
Based on your comment, you can use .find_parent() to check whether an <a> tag sits inside a tag with class="c-s":
from bs4 import BeautifulSoup
txt = '''
<div class="c-w">
<div class="c-s">
<a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
</div>
<div>
<a href="THIS I WANT TO REPLACE">...</a>
</div>
</div>
'''
soup = BeautifulSoup(txt,'html.parser')
for a in soup.select('.c-w a'):
    # skip links nested inside an element with class "c-s"
    if a.find_parent(class_='c-s'):
        continue
    a['href'] = 'entry://'
print(soup.prettify())
Prints:
<div class="c-w">
 <div class="c-s">
  <a href="sound://english-french/sound/M000001099.mp3">
   <img class="soundpng" src="file://sound.png"/>
  </a>
 </div>
 <div>
  <a href="entry://">
   ...
  </a>
 </div>
</div>
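The check works because .find_parent() returns the nearest matching ancestor, or None when no ancestor matches, so the result can be used directly in an if. A minimal sketch to illustrate this; the ids "inside" and "outside" are hypothetical and exist only to make the two links easy to pick out:

```python
from bs4 import BeautifulSoup

html = '''
<div class="c-w">
  <div class="c-s"><a id="inside" href="#">x</a></div>
  <div><a id="outside" href="#">y</a></div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

inside = soup.find('a', id='inside')    # hypothetical id, for illustration only
outside = soup.find('a', id='outside')  # hypothetical id, for illustration only

# find_parent() walks up the tree and returns the first ancestor that matches
inside_parent = inside.find_parent(class_='c-s')
# no ancestor of this link has class "c-s", so the result is None (falsy)
outside_parent = outside.find_parent(class_='c-s')

print(inside_parent is not None)
print(outside_parent is None)
```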
Edit: to exclude both .c-s and .c-v, you can do the following:
from bs4 import BeautifulSoup
txt = '''
<div class="c-w">
<div class="c-s">
<a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
</div>
<div class="c-v">
<a href="sound://english-french/sound/M000001099.mp3"><img class="soundpng" src="file://sound.png"/></a>
</div>
<div>
<a href="THIS I WANT TO REPLACE">...</a>
</div>
</div>
'''
soup = BeautifulSoup(txt,'html.parser')
for a in soup.select('.c-w a'):
    # a list passed to class_ matches an ancestor with any of these classes
    if a.find_parent(class_=['c-s','c-v']):
        continue
    a['href'] = 'entry://'
print(soup.prettify())
Prints:
<div class="c-w">
 <div class="c-s">
  <a href="sound://english-french/sound/M000001099.mp3">
   <img class="soundpng" src="file://sound.png"/>
  </a>
 </div>
 <div class="c-v">
  <a href="sound://english-french/sound/M000001099.mp3">
   <img class="soundpng" src="file://sound.png"/>
  </a>
 </div>
 <div>
  <a href="entry://">
   ...
  </a>
 </div>
</div>
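As a possible alternative, the exclusion can be moved into the selector itself. This is only a sketch, and it assumes a Beautiful Soup version whose select() is backed by Soup Sieve (4.7 or later) with support for complex selectors inside :not(). The shortened sound:// URLs are placeholders for this example, not the original ones:

```python
from bs4 import BeautifulSoup

txt = '''
<div class="c-w">
  <div class="c-s"><a href="sound://a.mp3">s</a></div>
  <div class="c-v"><a href="sound://b.mp3">v</a></div>
  <div><a href="THIS I WANT TO REPLACE">...</a></div>
</div>
'''
soup = BeautifulSoup(txt, 'html.parser')

# :not() with a selector list drops links located anywhere under .c-s or .c-v
# (assumes Soup Sieve's CSS level 4 :not() support)
for a in soup.select('.c-w a:not(.c-s a, .c-v a)'):
    a['href'] = 'entry://'

hrefs = [a['href'] for a in soup.find_all('a')]
print(hrefs)
```

If the installed version rejects the selector, the .find_parent() loop above remains the safer choice.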