Problem description
I'm having trouble organizing a CSV file that contains URLs and downloading the image at each of those URLs.
https://i.imgur.com/w1slgf6.png
It has been hell, but this is what I'm aiming for:
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import csv

# BeautifulSoup4 findAll src from img
print ('Downloading URLs to file')
sleep(1)
with open('output.csv','w',newline='\n',encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(srcs)

print ('Downloading images to folder')
sleep(1)
filename = "output"
with open("{0}.csv".format(filename),'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.request.urlretrieve(splitted_line[1],"img_" + str(i) + ".png")
            print ("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print ("No result for {0}".format(splitted_line[0]))
Solution
Based on the limited information you provided, I think this is the code you need:
import requests

with open('output.csv','r') as file:
    oldfile = file.read()
linkslist = oldfile.replace("\n","") # Your file is wrongly split by newlines, so I remove them
# Split on commas; strip stray whitespace (such as "\r") and drop empty entries
links = [link.strip() for link in linkslist.split(",") if link.strip()]

# Write all your links to a new file, one per line. This could be combined with the loop below,
# but I think opening the file and making requests at the same time would make it slower
with open('new.csv','w') as file:
    for link in links:
        file.write(link + "\n")

for i, link in enumerate(links):
    response = requests.get(link) # Fetch the image
    # Replace the name with whatever you want for the picture; the index keeps each file from overwriting the last
    with open("img_{0}.png".format(i),"wb") as file:
        file.write(response.content)
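The comments in the code explain each step; if you have any questions, just ask. I haven't tested it because I don't have your exact CSV format, but it should work.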
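Since the exact CSV layout is unknown, here is a more defensive sketch of the same idea. It is not the code above, only one possible variant under stated assumptions: it uses the standard-library `csv` and `urllib.request` modules instead of `requests`, treats every non-empty cell in the file as a URL, and borrows the `img_<index>.png` naming from the question. The function names `write_links`, `read_links`, `fetch_bytes` and `download_images` are hypothetical helpers made up for this illustration.

```python
import csv
import urllib.request
from pathlib import Path


def write_links(links, csv_path):
    """Write one URL per row so the file can be read back row by row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for link in links:
            writer.writerow([link])


def read_links(csv_path):
    """Return every non-empty cell, whether URLs sit in one row or many."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [cell.strip() for row in csv.reader(f) for cell in row if cell.strip()]


def fetch_bytes(url):
    """Download a URL and return its body (raises OSError/ValueError on failure)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def download_images(links, out_dir=".", fetch=fetch_bytes):
    """Save each link as img_<index>.png and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, link in enumerate(links):
        try:
            data = fetch(link)
        except (OSError, ValueError) as exc:
            # One bad URL should not stop the remaining downloads
            print("No result for {0}: {1}".format(link, exc))
            continue
        path = out_dir / "img_{0}.png".format(i)
        path.write_bytes(data)
        saved.append(path)
    return saved
```

With these helpers, something like `download_images(read_links("output.csv"))` would read the links back and save the images to the current folder. The `fetch` parameter is there only so the download step can be swapped out, for example to try the logic without network access.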