Problem description
Document doc = Jsoup.parse(url1, 3 * 1000);
// At this point I have already parsed the HTML and found and analysed all the H2 headings.
// Now I want to go further and analyse all H4 headings within one H2 section.
String subHead = "A h2 heading";
print("Printing h4 titles of : " + subHead);
Elements sibHead; // stores all elements between this H2 title and the next
String bodySelect = ("h2");
Elements kpageE = kpage.select(bodySelect);
for (Element e : kpageE) {
    String estring = e.text();
    print(estring + "--------------------------------------------");
    if (estring.contentEquals(subHead)) {
        // This prints all elements in the H2 title section, but I want only the H4 titles
        sibHead = e.nextElementSiblings();
        for (Element ei : sibHead) {
            String eistr = ei.text();
            print(eistr);
        }
    }
}
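I have already parsed the HTML and obtained a list of all H2 elements. Now I want specific elements that lie between one H2 element and the next H2 element; more specifically, I want all the H4 elements.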
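Solution
With Jsoup, you can use the getElementsByTag method of the Document class, which retrieves all elements with a given tag name. Called on the Document, it returns every matching element on the page.
Here is a usage example: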
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class App {
    public static void main(String[] args) {
        try {
            // Fetch and parse the page
            Document doc = Jsoup.connect("https://inscription.devlab.umontp.fr/").get();
            // Collect every <h4> element in the document
            Elements h4elements = doc.getElementsByTag("h4");
            for (Element h4 : h4elements) {
                System.out.println(h4.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
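
To restrict the result to the H4 headings that sit between one H2 and the next, one possible approach is to walk the siblings of the matching H2 and stop at the following H2. The sketch below is only an illustration under a few assumptions: the H2 headings are siblings of each other (not wrapped in separate containers), and the class name H4UnderH2, the method h4TitlesUnder, and the sample HTML are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class H4UnderH2 {

    // Returns the text of every <h4> found after the <h2> whose text equals
    // subHead and before the next sibling <h2>. Assumes <h2> headings are siblings.
    static List<String> h4TitlesUnder(Document doc, String subHead) {
        List<String> titles = new ArrayList<>();
        for (Element h2 : doc.select("h2")) {
            if (!h2.text().equals(subHead)) {
                continue;
            }
            // Walk forward through the following siblings of the matching <h2>
            for (Element sib = h2.nextElementSibling(); sib != null; sib = sib.nextElementSibling()) {
                if (sib.tagName().equals("h2")) {
                    break; // reached the next section
                }
                // select() matches the sibling itself as well as its descendants
                titles.addAll(sib.select("h4").eachText());
            }
            break; // only the first matching <h2> is handled
        }
        return titles;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, used instead of a live URL
        String html = "<h2>A h2 heading</h2><h4>First</h4><p>text</p>"
                + "<h4>Second</h4><h2>Other</h2><h4>Third</h4>";
        Document doc = Jsoup.parse(html);
        System.out.println(h4TitlesUnder(doc, "A h2 heading"));
    }
}
```

With the sample markup above, only the two H4 headings in the first section are collected; the H4 under the second H2 is skipped. If the page nests each H2 inside its own container element, the sibling walk would need to start from that container instead.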