Using XPath in R to find text that contains umlauts

Problem description

I want to use text() to identify nodes whose text contains umlauts ("Umlaute").

library(xml2)
library(rvest)
doc <- "<p>Über uns </p>" %>% xml2::read_html()
grepl(pattern = "Über uns",x = as.character(doc))
grepl(pattern = "Über uns",x = doc)

Question:

How can I extract the node that contains the text "Über uns"?

What I tried:

https://forum.fhem.de/index.php?topic=96254.0

Java XPath umlaut/vowel parsing

# does not work
xp <- paste0("//*[contains(text(),'Über uns')]")
html_nodes(x = doc,xpath = xp)

# does not work    
xp <- paste0("//*[translate(text(),'Ü','U') = 'Uber uns']")
html_nodes(x = doc,xpath = xp)

# does not work
xp <- paste0("//*[contains(text(),'&Uuml;ber uns')]")
html_nodes(x = doc,xpath = xp)


# this works, but I wonder if there is a solution with XPath alone
doc2 <- doc %>% 
  as.character() %>% 
  gsub(pattern = "Ü",replacement = "Ue") %>% 
  xml2::read_html()

xp <- paste0("//*[contains(text(),'Ueber uns')]")
html_nodes(x = doc2,xpath = xp)
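
One possible direction, offered as a sketch rather than a confirmed fix. Two things stand out in the attempts above. First, the paragraph text is "Über uns " with a trailing space, so the translate() comparison against 'Uber uns' cannot be equal unless the text is passed through normalize-space(). Second, XPath does not decode HTML entities, so '&Uuml;' is compared as literal characters. The sketch below additionally assumes that the contains() attempt fails because the XPath string reaches libxml2 in a non-UTF-8 native encoding (for example on Windows); that is an assumption, not something the question confirms. The names target, xp_contains and xp_exact are illustrative only.

```r
library(xml2)

# "\u00dc" is the Unicode escape for the U-umlaut; it yields a UTF-8 string
# no matter how the script file itself is saved.
target <- "\u00dcber uns"

# Same markup as in the question. encoding = "UTF-8" is stated explicitly
# (assumption: the input really is UTF-8).
doc <- read_html(paste0("<p>", target, " </p>"), encoding = "UTF-8")

# contains() variant, with the expression forced to UTF-8 before it is
# handed to the XPath engine.
xp_contains <- enc2utf8(paste0("//*[contains(text(), '", target, "')]"))
hits_contains <- xml_find_all(doc, xp_contains)

# Exact-match variant: normalize-space() strips the trailing blank so the
# comparison after translate() can succeed.
xp_exact <- enc2utf8(
  "//*[translate(normalize-space(text()), '\u00dc', 'U') = 'Uber uns']"
)
hits_exact <- xml_find_all(doc, xp_exact)

xml_name(hits_contains)
xml_text(hits_exact)
```

If the nodes are still not found with this, the encoding assumption probably does not apply, and it would be worth checking Encoding(xp) and the encoding of the source document.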
