How to find specific elements within a larger element in HTML using Java

Problem description

        // url1 is a java.net.URL; the timeout is 3 seconds. print(...) is my own helper (not shown).
        Document doc = Jsoup.parse(url1, 3 * 1000);
        // At this point I have already parsed the HTML, found all the h2 headings, and analysed them.
        // Now I want to go further and analyse all h4 headings within one h2 section.
        String subHead = "A h2 heading";
        print("Printing h4 titles of : " + subHead);
        Elements sibHead; // meant to hold all elements between this h2 title and the next
        String bodySelect = "h2";
        Elements kpageE = doc.select(bodySelect);
        for (Element e : kpageE) {
            String estring = e.text();
            print(estring + "--------------------------------------------");
            if (estring.contentEquals(subHead)) {
                // This returns every sibling element after the h2, but I want only the h4 titles.
                sibHead = e.nextElementSiblings();

                for (Element ei : sibHead) {
                    String eistr = ei.text();
                    print(eistr);
                }
            }
        }

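I have already parsed the HTML and obtained a list of all h2 elements. Now I want specific elements that lie between one h2 element and the next h2 element; more precisely, I want all the h4 elements.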

Solution

With Jsoup, you can use the getElementsByTag method of the Document class (inherited from Element), which retrieves all elements with a given tag name.

Here is a usage example:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class App {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("https://inscription.devlab.umontp.fr/").get();
            Elements h4elements = doc.getElementsByTag("h4");
            for (Element h4 : h4elements) {
                System.out.println(h4.text());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
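
The example above lists every h4 in the page, whereas the question asks only for the h4 elements between one h2 and the next. One possible way to narrow it down is to walk the siblings of the matching h2 and stop at the next h2. This is a minimal sketch, not a definitive implementation: the class name SectionHeadings, the helper h4TitlesUnder, and the inline sample HTML are all made up for illustration, and it assumes the h2 and h4 elements are siblings under the same parent.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SectionHeadings {

    // Hypothetical helper: returns the text of every h4 that follows the first h2
    // whose text equals sectionTitle, stopping at the next h2.
    // Assumption: the h2 and h4 elements share the same parent element.
    public static List<String> h4TitlesUnder(Document doc, String sectionTitle) {
        List<String> titles = new ArrayList<>();
        for (Element h2 : doc.select("h2")) {
            if (!h2.text().equals(sectionTitle)) {
                continue;
            }
            // Walk forward one sibling at a time until the next h2 or the end.
            for (Element sibling = h2.nextElementSibling();
                 sibling != null && !sibling.tagName().equals("h2");
                 sibling = sibling.nextElementSibling()) {
                if (sibling.tagName().equals("h4")) {
                    titles.add(sibling.text());
                }
            }
            break; // only the first matching h2 section is scanned
        }
        return titles;
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String html = "<h2>A h2 heading</h2><h4>First</h4><p>text</p><h4>Second</h4>"
                + "<h2>Another heading</h2><h4>Third</h4>";
        Document doc = Jsoup.parse(html);
        System.out.println(h4TitlesUnder(doc, "A h2 heading")); // prints [First, Second]
    }
}
```

If the h4 elements are nested inside wrapper elements such as a div rather than being direct siblings of the h2, this loop will miss them; in that case calling sibling.select("h4") on each sibling may be a better fit.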
