Problem description
My HTML looks like this:
URL 1:
<div class="ABC" 11-22-11="1">
<h2 class="single-col-row-heading" 11-22-11="1">About</h2>
<p 11-22-11="1">TEXT1;</p>
</div>
URL 2:
<div class="ABC" 11-22-11="1">
<h2 class="single-col-row-heading" 11-22-11="1">About</h2>
<p 11-22-11="1">TEXT2;</p>
<p 11-22-11="1">TEXT3</p>
<p 11-22-11="1">TEXT4</p>
<p 11-22-11="1">TEXT5</p>
</div>
URL 3:
<div class="ABC" 11-22-11="1">
<h2 class="single-col-row-heading" 11-22-11="1">About</h2>
<p 11-22-11="1">TEXT6;</p>
<p 11-22-11="1">TEXT7</p>
<p 11-22-11="1">TEXT8</p>
<p 11-22-11="1">TEXT9</p>
</div>
URL 4:
<div class="ABC" 11-22-11="1">
<h2 class="single-col-row-heading" 11-22-11="1">About</h2>
<p 11-22-11="1">TEXT10;</p>
</div>
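My code runs together the content of every URL whose div has a length greater than 5. How can I make it print each URL's content and then move to a new line before the next URL?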
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

contents = []
with open('xyz.csv','r') as csvf:
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)
for url in contents:
    page = urlopen(url[0]).read()
    soup = bs(page,"html.parser")
    for item in soup.find_all("div",class_="ABC"):
        for content in item.find_all({'p','li'}): ## some urls have <li> tags
            if len(item) == 5:
                content = content.text
                print(content)
            else:
                print(content.text,sep='',end=' ',flush=False)
The output of my code is:
TEXT1
TEXT2. TEXT3. TEXT4. TEXT5. TEXT6. TEXT7. TEXT8. TEXT9.
TEXT10
However, the desired output is:
TEXT1
TEXT2 TEXT3 TEXT4 TEXT5
TEXT6 TEXT7 TEXT8 TEXT9
TEXT10
Solution
No working solution to this problem has been posted yet.
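One possible approach, offered as a sketch rather than a confirmed fix: in Beautiful Soup, `len(tag)` is the number of the tag's direct children, whitespace text nodes included. If the markup is laid out as shown in the question, a div with a single `<p>` has length 5, so its text is printed with a newline, while the longer divs take the `else` branch, where `end=' '` never emits a line break. Collecting the texts of each div first and joining them once avoids the length check altogether. The helper names `about_lines` and `print_about_lines` are made up for illustration, and the code assumes that the first column of `xyz.csv` holds the URL.

```python
import csv
from urllib.request import urlopen

from bs4 import BeautifulSoup


def about_lines(html, div_class="ABC"):
    """Return one string per matching <div>, its <p>/<li> texts joined by spaces."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for div in soup.find_all("div", class_=div_class):
        # Gather every paragraph or list item of this div, in document order.
        parts = [tag.get_text(strip=True) for tag in div.find_all(["p", "li"])]
        # Skip empty tags so they do not leave double spaces in the line.
        lines.append(" ".join(part for part in parts if part))
    return lines


def print_about_lines(csv_path="xyz.csv"):
    """Fetch every URL listed in the CSV and print one line per matching <div>."""
    with open(csv_path, newline="") as csvf:
        for row in csv.reader(csvf):
            if not row:
                continue  # skip blank CSV rows
            page = urlopen(row[0]).read()  # assumes the URL is in the first column
            for line in about_lines(page):
                print(line)
```

Calling `print_about_lines()` would then print each div's text on its own line, because `print` adds the newline once per div instead of once per paragraph. If a page nests `<p>` inside `<li>`, `find_all` returns both tags and the text would appear twice, so that case would need extra handling.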