Python: Scrapy, collecting all the text from a selector's children

Problem description

I'm trying to scrape the description of an eBay listing, and this is my approach so far:

def parse_description(self, response):
    description = response.css('div#ds_div*::text').get()
    yield {
        "description": description
    }

The idea is to get the text of every tag under .css('div#ds_div'), but I get this error:

    "Expected selector, got %s" % (peek,))
  File "<string>", line None
cssselect.parser.SelectorSyntaxError: Expected selector, got <DELIM '*' at 10>

An example URL I'm trying to scrape: https://www.ebay.co.uk/itm/Vintage-Toastmaster-Chrome-Toaster-Model-D182-4-Slice-Wide-Slot-Nos/114677725765?hash=item1ab3533a45:g:ui8AAOSw-jpgBbFS Where am I going wrong?

Solution

The error means that the selector is invalid:

div#ds_div*::text

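If you put a space between div#ds_div and *, the selector becomes valid, as you mentioned in the comments. The space is the CSS descendant combinator, so div#ds_div * matches every element nested inside the div.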

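Looking at the link, there is a second problem: the text you are trying to retrieve sits inside an iframe whose ID is desc_ifr. The iframe is a separate document, so its content is not part of the listing page's response.

If you want to scrape the content of this iframe, look at the iframe's src attribute and scrape that URL instead of the one in your question. Then you can do: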

response.css('div#ds_div p::text').get()
