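Problem description
I'm new to Scrapy. I downloaded some files with the code below. I'd like to change the names of the downloaded files, but I don't know how.
Any help would be greatly appreciated.
My spider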
import scrapy
from scrapy.loader import ItemLoader
from demo_downloader.items import DemodownloaderItem


class FileDownloader(scrapy.Spider):
    name = "file_downloader"

    def start_requests(self):
        urls = [
            "https://www.data.gouv.fr/en/datasets/bases-de-donnees-annuelles-des-accidents-corporels-de-la-circulation-routiere-annees-de-2005-a-2019/#_"
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Each resource on the dataset page is rendered as one card
        for link in response.xpath('//article[@class = "card resource-card "]'):
            name = link.xpath('.//h4[@class="ellipsis"]/text()').extract_first()
            # Keep only CSV resources (and skip cards without a title)
            if name and ".csv" in name:
                loader = ItemLoader(item=DemodownloaderItem(), selector=link)
                absolute_url = link.xpath(".//a[@class = 'btn btn-sm btn-primary']//@href").extract_first()
                loader.add_value("file_urls", absolute_url)
                loader.add_value("files", name)
                yield loader.load_item()
items.py
from scrapy.item import Field, Item


class DemodownloaderItem(Item):
    file_urls = Field()
    files = Field()
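With the stock FilesPipeline enabled, Scrapy downloads every URL in `file_urls` and, once an item's downloads finish, overwrites `files` with one result dict per download (url, path, checksum and so on), so the title the spider puts in `files` does not survive into the output. By default the stored path is `full/` plus the SHA1 hash of the request URL plus an extension, which is why the saved files get unreadable names. The sketch below roughly approximates that naming rule with the standard library only; `default_style_path` is a hypothetical helper, not Scrapy's own code, and it omits the MIME-type guessing Scrapy does for unknown extensions.

```python
import hashlib
import mimetypes
from pathlib import PurePosixPath


def default_style_path(url: str) -> str:
    """Roughly mimic the stock FilesPipeline path for a URL (simplified sketch)."""
    # The file name is the SHA1 hex digest of the full request URL
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    # Keep the URL's suffix only when it is a known file extension
    ext = PurePosixPath(url).suffix
    if ext not in mimetypes.types_map:
        ext = ""
    # Paths are relative to FILES_STORE
    return f"full/{digest}{ext}"
```

Two different URLs therefore always land in two different, hash-named files, regardless of the title shown on the page.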
pipelines.py
from itemadapter import ItemAdapter


class DemodownloaderPipeline:
    def process_item(self, item, spider):
        return item
settings.py
BOT_NAME = 'demo_downloader'
SPIDER_MODULES = ['demo_downloader.spiders']
NEWSPIDER_MODULE = 'demo_downloader.spiders'
ROBOTSTXT_OBEY = False
ITEM_PIPELINES = {
    'scrapy.pipelines.files.FilesPipeline': 1
}
DOWNLOAD_TIMEOUT = 1200
FILES_STORE = "C:\\Users\\EL\\Desktop\\work\\demo_downloader"
MEDIA_ALLOW_REDIRECTS = True
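If file names are controlled by a FilesPipeline subclass, that subclass has to replace the stock pipeline in ITEM_PIPELINES, otherwise Scrapy keeps using the default naming. In this settings fragment, `demo_downloader.pipelines.RenamingFilesPipeline` is a hypothetical import path that assumes such a subclass is defined in the project's pipelines.py; the other settings stay as they are.

```python
# settings.py (sketch): register a hypothetical FilesPipeline subclass
# in place of 'scrapy.pipelines.files.FilesPipeline'
ITEM_PIPELINES = {
    "demo_downloader.pipelines.RenamingFilesPipeline": 1,
}
```

The stock pipeline should not be listed alongside the subclass, or each file would be downloaded twice under two different names.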