Problem description
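I am trying to scrape the dates of articles from a Google search. However, I think I am stuck on finding the right XPath to do that. I tried to find it through the browser's developer tools (Inspect), but got //*[@id="rso"]/div[3]/div/div[2]/div/span/span[1], which doesn't work.
The closest I have come to the dates is this: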
library(rvest)
library(dplyr)
web1 <- read_html("https://www.google.at/search?q=uk+house+prices&source=lnt&tbs=qdr:m&sa=X&ved=2ahUKEwin8NynhMjsAhUmQkEAHTqzBygQpwV6BAgVEB0&biw=927&bih=722")
web1 %>%
  html_nodes(xpath = '//div/div/div/div/div[not(div)]') %>%
  html_text()
[1] "Search options"
[2] "Any country"
[3] "Any Language"
[4] "Last month"
[5] "All results"
[6] "01.10.2020 · Why record UK house prices could be falling again soon. Analysis by Hanna Ziady, CNN Business. Updated 11:49 AM ET, Thu October 1, 2020. London UK ..."
[7] "Is the UK housing market about to crash?"
<...>
[31] "08.10.2020 · Which? explains what could happen to house prices after the Brexit transition period ends, including advice and predictions from mortgage and property experts."
The only thing I need is the dates (01.10.2020, 08.10.2020).
How can I extract the dates from Google's SERP?
Solution
No working solution has been posted for this question yet.
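As a starting point, here is a minimal sketch of one possible approach. It is not a confirmed answer. It assumes the text vector from the question is already available, and that each dated snippet starts with a date in dd.mm.yyyy form, as in entries [6] and [31] of the output. Under that assumption no more precise XPath is needed: the dates can be pulled out of the text with a regular expression. The vector `results` is a shortened copy of the output shown in the question, and `extract_dates` is a hypothetical helper name, not part of rvest.

```r
# Shortened sample of the character vector returned by html_text() in the
# question (entries abbreviated from the output shown there).
results <- c(
  "Search options",
  "Last month",
  "01.10.2020 · Why record UK house prices could be falling again soon.",
  "Is the UK housing market about to crash?",
  "08.10.2020 · Which? explains what could happen to house prices"
)

# Hypothetical helper: return the leading dd.mm.yyyy date of every string
# that starts with one. Strings without a leading date are dropped.
extract_dates <- function(x) {
  pattern <- "^[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}"
  hits <- regexpr(pattern, x)   # match position per element, -1 if none
  regmatches(x, hits)           # keeps only the matched substrings
}

dates <- extract_dates(results)
print(dates)

# Optional: convert the strings to Date objects for sorting or filtering.
parsed <- as.Date(dates, format = "%d.%m.%Y")
print(parsed)
```

With the real page, the same helper could be appended to the pipeline from the question, for example `web1 %>% html_nodes(xpath = '//div/div/div/div/div[not(div)]') %>% html_text() %>% extract_dates()`. Results whose snippet does not begin with a date in this exact format are skipped, so the pattern may need adjusting for other Google domains or date formats.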