Cannot parse from RSS

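Problem description

This is what the RSS looks like: https://reddit.0qz.fun/r/dankmemes/top.json

My script parses "title", "description", and the other item tags in the RSS perfectly, but it will not parse "content:encoded".

I tried: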

item.getChild("content:encoded").getText();

This:

item.getChild("encoded").getText();

And this (found on Stack Overflow):

item.getChild("http://purl.org/rss/1.0/modules/content/","encoded").getText();

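But nothing works... Can you help me?

Solution

The namespace matters for getChild and similar methods to parse the content successfully.

Your third example is close, but the arguments are in the wrong order, and you need to pass a namespace object obtained from XmlService.getNamespace rather than a raw string. (The signature is getChild(string, namespace), not getChild(string, string).)

This is tricky because the namespace should be included for some elements and not others. I am not an XML expert, so I do not know whether this is the expected behavior. The minimal example script below does use getChild to find and log the text of the <content:encoded> element, but I could only work out when to include or exclude the namespace by trial and error. (If anyone has further insight into this, please let me know in the comments.)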

function logContentEncoded() {
  const result = UrlFetchApp.fetch("https://reddit.0qz.fun/r/dankmemes/top.json");
  const document = XmlService.parse(result.getContentText());
  const root = document.getRootElement();
  const namespace = XmlService.getNamespace("http://purl.org/rss/1.0/modules/content/");
  const channel = root.getChild("channel"); // fails if namespace is included
  const item = channel.getChild("item"); // fails if namespace is included
  const encoded = item.getChild("encoded",namespace); // fails if namespace is EXCLUDED
  console.log(encoded.getText());
}
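
A likely explanation for the trial-and-error result, assuming the feed is standard RSS 2.0: elements such as channel and item are not in any namespace, while encoded belongs to the namespace that the feed binds to the content prefix, so only that lookup needs a namespace argument. The sketch below extends the script above to every item rather than only the first. The function names getAllEncoded and logAllContentEncoded and the constant CONTENT_NS_URI are made up for illustration; the XmlService and UrlFetchApp calls are the same ones used above, plus getChildren.

```javascript
// Namespace URI that the feed binds to the "content" prefix.
const CONTENT_NS_URI = "http://purl.org/rss/1.0/modules/content/";

// Returns the text of <content:encoded> for every <item> in the feed.
// Items without that element yield null instead of throwing.
function getAllEncoded(xmlText) {
  const root = XmlService.parse(xmlText).getRootElement();
  const contentNs = XmlService.getNamespace(CONTENT_NS_URI);
  const channel = root.getChild("channel");   // assumed to be in no namespace
  const items = channel.getChildren("item");  // assumed to be in no namespace
  const results = [];
  for (const item of items) {
    const encoded = item.getChild("encoded", contentNs);
    // getChild returns null when no matching child exists.
    results.push(encoded === null ? null : encoded.getText());
  }
  return results;
}

// Hypothetical entry point: fetch the feed and log each item's content.
function logAllContentEncoded() {
  const response = UrlFetchApp.fetch("https://reddit.0qz.fun/r/dankmemes/top.json");
  getAllEncoded(response.getContentText()).forEach((text) => console.log(text));
}
```

Running logAllContentEncoded from the Apps Script editor should log one entry per item. The null guard is there because calling getText() on a missing element would otherwise throw.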

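Alternatively, add this library to your project: 1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw

You can scrape the page. With this code, for example, you can get the content of the first <content:encoded> tag.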

function getDataFromJson() {
  var url = "https://reddit.0qz.fun/r/dankmemes/top.json";
  var fromText = '<content:encoded>';
  var toText = '</content:encoded>';
  
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser
  .data(content)
  .from(fromText)
  .to(toText)
  .build();
  Logger.log(scraped);
  return scraped;
}
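
If you would rather not add a library, the same "text between two markers" idea can be sketched with plain string methods. This is not the library's implementation, only a minimal equivalent for the first match; the helper name extractBetween is hypothetical.

```javascript
// Returns the text between the first occurrence of fromText and the next
// occurrence of toText, or null if either marker is not found.
function extractBetween(text, fromText, toText) {
  const start = text.indexOf(fromText);
  if (start === -1) return null;
  const contentStart = start + fromText.length;
  const end = text.indexOf(toText, contentStart);
  if (end === -1) return null;
  return text.slice(contentStart, end);
}
```

In Apps Script it could be called as extractBetween(UrlFetchApp.fetch(url).getContentText(), '<content:encoded>', '</content:encoded>'). Because this works on the raw text, any CDATA wrapper or escaped entities inside the tag are returned unchanged, unlike with the XmlService approach.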

google-apps-script urlfetch xml-parsing
