Python and Scrapy output: "\r\n\t\t\t\t\t\t\t\t"

Problem description

I am learning to scrape with Scrapy, and some of my code produces strange output that I don't understand. Can someone explain why I am getting a bunch of "\r\n\t\t\t\t\t\t\t\t"?

I found the following solution on Stack Overflow: Remove an '\n\t\t\t'-element from list

But I would like to know what causes it.

Here is the code that causes my problem. The strip method from the link above fixes it, but as I said, I don't know where these characters come from.

import scrapy
import logging
import re

class CitySpider(scrapy.Spider):
    name = 'city'
    allowed_domains = ['www.a-tembo.nl']
    start_urls = ['https://www.a-tembo.nl/themas/category/city/']

    def parse(self,response):
        titles = response.xpath("//div[@class='hikashop_category_image']/a")
        
        for title in titles:
            series = title.xpath(".//@title").get()
            link = title.xpath(".//@href").get()

            #absolute_url = f"https://www.a-tembo.nl{link}"
            #absolute_url = response.urljoin(link)

            yield response.follow(link,callback=self.parse_title)

    def parse_title(self,response):
        rows = response.xpath("//table[@class='hikashop_products_table adminlist table']/tbody/tr")

        for row in rows:
            product_code = row.xpath(".//span[@class='hikashop_product_code']/text()").get()
            product_name = row.xpath(".//span[@class='hikashop_product_name']/a/text()").get()

            yield {
                "Product_code": product_code,
                "Product_name": product_name,
            }

Solution

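Sequences like \n are called escape characters: \r is a carriage return, \n is a new line, and \t is a tab. They are how Python displays the whitespace characters contained in a string. Websites are full of them, because the HTML source is usually indented and split across lines, although you won't see them unless you inspect the page source. An XPath text() query returns the text node exactly as it appears in that source, so the line break and tabs that follow an opening tag end up in your output. If you want to learn more about escape characters in Python, you can read here. I hope that answers your question.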
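To make the cause concrete, here is a minimal sketch using only the standard library. The markup in SAMPLE_HTML is made up for illustration (it is not copied from the site in the question), and TextCollector and clean_text are hypothetical helpers, not part of Scrapy. It shows that the whitespace is already inside the text node, and that stripping removes it:

```python
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical markup, indented with tabs the way many generated pages are.
SAMPLE_HTML = (
    '<span class="hikashop_product_code">\r\n'
    "\t\t\t\t\t\t\t\tABC-123\r\n"
    "\t\t\t\t\t\t\t</span>"
)


class TextCollector(HTMLParser):
    """Collects raw text nodes, roughly what an XPath text() query returns."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def clean_text(value: Optional[str]) -> Optional[str]:
    """Strip surrounding whitespace; pass None through unchanged,
    because Scrapy's .get() returns None when nothing matches."""
    return value.strip() if value is not None else None


collector = TextCollector()
collector.feed(SAMPLE_HTML)
collector.close()
raw = "".join(collector.chunks)

print(repr(raw))              # the text node, whitespace included
print(repr(clean_text(raw)))  # the same text with the whitespace removed
```

In the spider from the question, the same idea would be to pass product_code and product_name through a helper like clean_text before yielding them. Another option, assuming the element contains only the text you want, is the XPath function normalize-space(), for example row.xpath("normalize-space(.//span[@class='hikashop_product_code'])").get(), which trims and collapses whitespace inside the query itself.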

python python-3.x scrapy scrapy-shell web-scraping
