How to load Scrapy start_urls from a CSV file?

Problem description

I am trying to scrape multiple URLs read from a CSV file (all in one column). However, the code does not return anything. Thanks, Nicole

import scrapy
from scrapy.http import HtmlResponse
from scrapy.http import Request
import csv

scrapurls = ""

def get_urls_from_csv():
    with open("produktlink_test.csv",'rbU') as csv_file:
        data = csv.reader(csv_file)
        scrapurls = []
        for row in data:
            scrapurls.append(column)
            return scrapurls

class Getlinksgalaxusspider(scrapy.Spider):
    name = 'getlinksgalaxus'
    allowed_domains = []
    
    # This is where we define our target domains
    start_urls = scrapurls

    def parse(self, response):
        ...

Solution

Previous Answer: How to loop through multiple URLs to scrape from a CSV file in Scrapy?
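The helper in the question has several problems. It is never called, so `start_urls` stays bound to the empty string `scrapurls = ""` and Scrapy has nothing to request. The loop appends an undefined name `column` instead of a value from `row`. `return` sits inside the loop, so at most one row would ever be read. Finally, `'rbU'` is a Python 2 style mode; in Python 3, `csv.reader` needs a file opened in text mode, ideally with `newline=""`. A minimal corrected sketch, assuming the URLs are in the first column and there is no header row (the `path` parameter is an addition, not in the original):

```python
import csv


def get_urls_from_csv(path):
    """Return the URLs found in the first column of a CSV file."""
    urls = []
    # Python 3: open in text mode with newline="" as the csv docs recommend
    with open(path, newline="", encoding="utf-8") as csv_file:
        for row in csv.reader(csv_file):
            # Skip blank lines; take the first (and only) column
            if row and row[0].strip():
                urls.append(row[0].strip())
    # Return only after the loop has consumed every row
    return urls
```

The function then has to be called where the class attribute is defined, e.g. `start_urls = get_urls_from_csv("produktlink_test.csv")` inside the spider class.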

In addition, it is better to put all of your methods inside the Scrapy spider and create the requests explicitly in start_requests.