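Problem description
I am trying to parse an HTML page fetched from the web and use Jsoup to retrieve data from a table. The page I want to parse contains several tables.
How can I do this?
Here is a sample page I want to parse:
https://www.cpu-world.com/info/AMD/AMD_A4-Series.html
Edit: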
try {
    // Iterates through the rows of the spec table on
    // https://www.cpu-world.com/cpus/K10/AMD-A4-Series%20A4-3300.html
    URL url = new URL("https://www.cpu-world.com/cpus/K10/AMD-A4-Series%20A4-3300.html");
    // Fetch and parse the page with a 3000 ms timeout
    Document doc = Jsoup.parse(url, 3000);
    // spec_table is the name of the class associated with the table
    Elements table = doc.select("table.spec_table");
    Elements rows = table.select("tr");
    Iterator<Element> rowIterator = rows.iterator();
    // Skip the first (header) row, if there is one
    if (rowIterator.hasNext()) {
        rowIterator.next();
    }
    boolean wasMatch = false;
    // Loop through the remaining rows
    while (rowIterator.hasNext()) {
        Element row = rowIterator.next();
        Elements cols = row.select("td");
        // Rows without <td> cells would make cols.get(0) throw, so skip them
        if (cols.isEmpty()) {
            continue;
        }
        String rowName = cols.get(0).text();
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
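I have been reading tutorials and documentation, but I can't seem to figure out how to navigate the page to extract the data I need. I understand HTML and CSS, but I am only just learning Jsoup.
(I tagged this as Android because that is where I am using the Java code. I suppose there was no need to be that specific.)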
Solution
This looks like what you're after; the complete program is in the edit below.
Let me know if this isn't what you're after.
Edit: code and result:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.io.IOException;
import java.net.URL;

public class CpuWorld {
    public static void main(String[] args) throws IOException {
        // Detail page for a single CPU. main declares IOException (which also
        // covers MalformedURLException), so no try/catch is needed and url can never be null.
        URL url = new URL("https://www.cpu-world.com/CPUs/K10/AMD-A4-Series%20A4-3300.html");
        // Fetch and parse the page with a 3000 ms timeout
        Document doc = Jsoup.parse(url, 3000);
        // Find the table row with a cell containing "Model number",
        // then take the link inside the bold text of that row
        Elements modelLink = doc.select("table tr:has(td:contains(Model number)) td b a");
        String modelNumber = modelLink.text();
        String modelUrl = modelLink.attr("href");
        System.out.println(modelNumber + " : " + modelUrl);
    }
}
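To see what the selector in the program above matches without hitting the network, here is a minimal sketch that runs it against an inline HTML fragment. The class name SelectorSketch and the sample markup are made up for illustration and are not copied from cpu-world.com; only the selector string comes from the answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorSketch {

    // Hypothetical fragment that only imitates a two-row spec table.
    static final String SAMPLE_HTML =
          "<table>"
        + "<tr><td>Family</td><td><b><a href=\"/family.html\">AMD A4-Series</a></b></td></tr>"
        + "<tr><td>Model number</td><td><b><a href=\"/a4-3300.html\">A4-3300</a></b></td></tr>"
        + "</table>";

    // Selects the link in the row that has a cell containing "Model number".
    static Elements modelLink(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("table tr:has(td:contains(Model number)) td b a");
    }

    // Text of the matched link.
    static String modelNumber(String html) {
        return modelLink(html).text();
    }

    // Raw href attribute of the matched link.
    static String modelHref(String html) {
        return modelLink(html).attr("href");
    }

    public static void main(String[] args) {
        System.out.println(modelNumber(SAMPLE_HTML) + " : " + modelHref(SAMPLE_HTML));
    }
}
```

The tr:has(td:contains(...)) part picks the row by its label, so the lookup does not depend on the row's position in the table. On a page where the spec table is nested inside a layout table, the outer row would match as well, so a more specific table selector may be needed there.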
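Edit:
This is crazier than a box of frogs, but here we go... I'll let you put two and two together to iterate over the URLs and fetch the individual details you want: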
A4-3300 : https://www.cpu-world.com/CPUs/K10/AMD-A4-Series%20A4-3300.html
Process finished with exit code 0
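For the "put two and two together" step on a page with several tables, one possible shape is sketched below. It is not the code from the answer: the class name MultiTableSketch, the sample markup, and the base URI https://example.com are assumptions chosen so that the example runs offline. It walks every table, skips rows without td cells, and collects the first-column link of each remaining row.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class MultiTableSketch {

    // Hypothetical markup standing in for a page that contains several tables.
    static final String SAMPLE_HTML =
          "<table class=\"models\">"
        + "<tr><th>Model</th><th>Cores</th></tr>"
        + "<tr><td><a href=\"/CPUs/a.html\">A4-3300</a></td><td>2</td></tr>"
        + "<tr><td><a href=\"/CPUs/b.html\">A4-3400</a></td><td>2</td></tr>"
        + "</table>"
        + "<table class=\"other\">"
        + "<tr><td><a href=\"/CPUs/c.html\">A4-3420</a></td><td>2</td></tr>"
        + "</table>";

    // Returns "name : absolute URL" for the first-column link of every data row.
    static List<String> collectLinks(String html, String baseUri) {
        // The base URI lets absUrl() resolve relative hrefs.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element table : doc.select("table")) {
            for (Element row : table.select("tr")) {
                Elements cells = row.select("td");
                if (cells.isEmpty()) {
                    continue; // header row: only <th> cells
                }
                Element link = cells.get(0).selectFirst("a[href]");
                if (link == null) {
                    continue; // first cell has no link
                }
                result.add(link.text() + " : " + link.absUrl("href"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : collectLinks(SAMPLE_HTML, "https://example.com")) {
            System.out.println(line);
        }
    }
}
```

Against a live page, the document would come from Jsoup.parse(url, 3000) as in the answer, and each collected URL could then be fetched in turn to read the individual details. Narrowing doc.select("table") to a class selector limits the loop to the tables of interest.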