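Problem description
I am trying to extract the text between tags from several locally downloaded .html files and write it to a CSV file. Calling print(title) shows the full list of titles, but as soon as I try to export to CSV, the file contains only one entry.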
import glob
import os
import lxml
import csv
from bs4 import BeautifulSoup

path = "C:\\Users\\user1\\Downloads\\lksd\\"
for infile in glob.glob(os.path.join(path, "*.html")):
    markup = (infile)
    soup = BeautifulSoup(open(markup, "r").read(), 'lxml')
    title = soup.find_all('title')
    title.append(title)
    print([title])

with open('output2.csv', 'w') as myfile:
    writer = csv.writer(myfile)
    writer.writerows((title))
Any suggestions?
Solution
What is happening?
Inside the loop, you append title to itself:
title = soup.find_all('title')
title.append(title)
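The effect can be reproduced without any HTML at all. The following is a minimal sketch that uses plain Python lists as stand-ins for what soup.find_all('title') returns, on the assumption that the result behaves like an ordinary list; the sample data and the names per_file_results and all_titles are hypothetical.

```python
# Stand-ins for the per-file results of soup.find_all('title') (hypothetical data).
per_file_results = [["Title A"], ["Title B"], ["Title C"]]

for result in per_file_results:
    title = result        # rebinding: the previous file's result is no longer referenced
    title.append(title)   # the list now contains a reference to itself

print(len(title))         # 2: the last title plus the self-reference
print(title[0])           # Title C
print(title[1] is title)  # True

# Accumulating into a list created before the loop keeps every result.
all_titles = []
for result in [["Title A"], ["Title B"], ["Title C"]]:
    all_titles.append(result)

print(len(all_titles))    # 3
```

After the first loop, title only knows about the last file, which matches the single entry seen in the CSV.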
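Because title is reassigned on every iteration, only the last file's result is left once the loop ends. Try defining an empty list outside the loop and appending each title to that list instead.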
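...
titleList = []  # collects the result from every file
for infile in glob.glob(os.path.join(path, "*.html")):
    with open(infile, "r") as html_file:
        soup = BeautifulSoup(html_file.read(), 'lxml')
    title = soup.find_all('title')
    titleList.append(title)  # append to the shared list, not to title itself

# Write once, after the loop, so every collected row ends up in the file.
# newline='' avoids blank lines between rows on Windows.
with open('output2.csv', 'w', newline='') as myfile:
    writer = csv.writer(myfile)
    writer.writerows(titleList)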
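One more detail that may be worth checking is the shape of what gets passed to writerows: it expects an iterable of rows, and each row is itself a sequence of cells. If you later switch to plain title strings (for example via soup.title.get_text(strip=True), assuming every file has a title tag), wrap each string in a list or tuple, otherwise the string is split into one cell per character. The sketch below shows only the CSV side, using the standard library; the helper write_title_rows, the header row, and the sample rows are hypothetical.

```python
import csv
import io


def write_title_rows(rows, stream):
    """Write (filename, title) pairs to an open text stream, one pair per CSV row."""
    writer = csv.writer(stream)
    writer.writerow(["file", "title"])  # header row (an assumption; drop it if unwanted)
    writer.writerows(rows)              # every item must itself be a sequence of cells


# Hypothetical data standing in for titles collected in the loop.
rows = [("a.html", "First page"), ("b.html", "Second, with comma")]

buffer = io.StringIO()
write_title_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, pass a file opened with open('output2.csv', 'w', newline='') as the stream.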