Jsoupを使ってみた
File input = new File("/tmp/input.html"); Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/"); Elements links = doc.select("a[href]"); // a with href Elements pngs = doc.select("img[src$=.png]"); // img with src ending .png Element masthead = doc.select("div.masthead").first(); // div with class=masthead Elements resultLinks = doc.select("h3.r > a"); // direct a after h3
Use selector-syntax to find elements: jsoup Java HTML parser