Python: tidy a CSV file of URLs, one URL per line, and download the image from each URL

Problem description

I'm having trouble tidying up a CSV file of URLs and downloading the image from each URL.

https://i.imgur.com/w1slgf6.png

This has been hell, but the goals are:

  1. Write each image's src to a csv file, one URL per line.
  2. And download every image.

from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import csv




# BeautifulSoup4 findAll src from img
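# NOTE: `srcs` is never defined in the snippet; a minimal sketch of what the
# comment above describes might look like this (the page URL below is a
# placeholder, not something from the original post):
driver = webdriver.Chrome()
driver.get('https://example.com/gallery')  # placeholder page to scrape
soup = BeautifulSoup(driver.page_source, 'html.parser')
srcs = [img['src'] for img in soup.find_all('img') if img.get('src')]
driver.quit()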




print ('Downloading URLs to file')
sleep(1)
with open('output.csv','w',newline='\n',encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(srcs)  # bug: this writes every URL into a single row, not one per line




print ('Downloading images to folder')
sleep(1)

filename = "output"

with open("{0}.csv".format(filename),'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.request.urlretrieve(splitted_line[1],"img_" + str(i) + ".png")
            print ("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print ("No result for {0}".format(splitted_line[0]))

Solution

Based on the limited information you provided, I think this is the code you need:

import requests

with open('output.csv','r') as file:
    oldfile = file.read()
linkslist = oldfile.replace("\n","") # your file is split across lines incorrectly, so strip the newlines first
links = linkslist.split(",")
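# for example, if output.csv holds one run-together row such as
# "https://a.example/1.png,https://a.example/2.png" (hypothetical URLs),
# links is now ['https://a.example/1.png', 'https://a.example/2.png']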

with open('new.csv','w') as file: # write all your links to a new file; this could be combined with the loop below, but I think opening the file and making requests at the same time would slow it down
    for link in links:
        file.write(link + "\n")

for i, link in enumerate(links):
    if not link:  # skip empty entries left over from the split
        continue
    response = requests.get(link)  # this is to save the image
    # use the index so every image gets a unique name; reusing one fixed
    # filename would overwrite the previous image on each iteration
    with open("img_{0}.png".format(i), "wb") as file:
        file.write(response.content)

See the explanatory comments in the code, and ask if you have any questions. I haven't tested it since I don't have your exact CSV format, but it should work.
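For completeness: the root cause is that writer.writerow(srcs) in the question writes every URL into a single row. A minimal sketch of writing one URL per row at the source (assuming srcs is the list of image URLs collected during scraping) would avoid the cleanup step entirely:

import csv

with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for src in srcs:
        writer.writerow([src])  # writerow takes a sequence, so a one-element list gives one URL per row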