Java HTML parser library implementing the WHATWG HTML5 specification for parsing, manipulating, and sanitizing HTML and XML documents.
CSS selector engine for finding and filtering elements using familiar CSS syntax, plus bulk operations on element collections. jsoup supports most standard CSS selectors, including combinators and structural pseudo-classes, along with its own text-matching extensions such as :contains and :matches.
Find elements using CSS selectors with comprehensive syntax support.
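As a concrete illustration of the lookup methods in this section, the sketch below runs select, selectFirst, is, and closest against a tiny document. It assumes a reasonably recent jsoup release on the classpath (closest is not available in old versions); the class name LookupDemo and the sample markup are made up for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class LookupDemo {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<div id='menu'><ul>"
      + "<li><a href='/home'>Home</a></li>"
      + "<li><a href='/docs'>Docs</a></li>"
      + "</ul></div>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        Elements links = doc.select("a");            // all matching descendants
        Element first = doc.selectFirst("a");        // first match only
        Element missing = doc.selectFirst("table");  // no <table> in the sample, so null

        System.out.println("links: " + links.size());
        System.out.println("first link text: " + first.text());
        System.out.println("table found: " + (missing != null));
        System.out.println("first is a[href]: " + first.is("a[href]"));
        // closest() walks up from the element itself to the nearest match
        System.out.println("closest div id: " + first.closest("div").id());
    }
}
```

Because selectFirst and closest return null when nothing matches, callers typically null-check the result before chaining further calls.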
/**
* Find descendant elements that match the CSS selector.
* @param cssQuery CSS selector query
* @return Elements collection of matching elements
*/
public Elements select(String cssQuery);
/**
* Find the first descendant element that matches the CSS selector.
* @param cssQuery CSS selector query
* @return first matching Element, or null if none found
*/
public Element selectFirst(String cssQuery);
/**
* Test if this element matches the CSS selector.
* @param cssQuery CSS selector query
* @return true if element matches selector
*/
public boolean is(String cssQuery);
/**
* Find the closest ancestor element that matches the CSS selector.
* @param cssQuery CSS selector query
* @return closest matching ancestor Element, or null if none found
*/
public Element closest(String cssQuery);
Usage Examples:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
Document doc = Jsoup.parse(html);
// Basic selectors
Elements paragraphs = doc.select("p");
Elements divs = doc.select("div");
Element firstLink = doc.selectFirst("a");
// Class and ID selectors
Elements highlighted = doc.select(".highlight");
Element header = doc.selectFirst("#header");
Elements navLinks = doc.select("nav a");
// Attribute selectors
Elements links = doc.select("a[href]");
Elements externalLinks = doc.select("a[href^=http]");
Elements images = doc.select("img[alt]");
// Pseudo-selectors
Elements firstChildren = doc.select("li:first-child");
Elements evenRows = doc.select("tr:nth-child(even)");
Elements hasText = doc.select("p:contains(important)");
jsoup supports a large subset of the CSS selector specification, including complex selectors and structural pseudo-classes, plus its own text-matching pseudo-selectors such as :contains and :matches.
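To make the selector syntax concrete, here is a small sketch that counts matches for several query styles against one sample document. The class name SelectorSyntaxDemo, the count helper, and the markup are hypothetical; only Jsoup.parse and select are jsoup calls.

```java
import org.jsoup.Jsoup;

class SelectorSyntaxDemo {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<h1>Title</h1>"
      + "<p class='lead'>Intro</p>"
      + "<p>Read the <a href='https://example.com/guide.pdf'>guide</a></p>"
      + "<p>Or go <a href='/local'>home</a></p>";

    // Number of elements in the sample document that match the query.
    static int count(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "p.lead",            // tag plus class
            "a[href^=https]",    // attribute prefix
            "a[href$=.pdf]",     // attribute suffix
            "a[href~=exam.le]",  // attribute regex: '.' matches any character
            "h1 + p",            // adjacent sibling
            "h1 ~ p",            // general sibling
            "p:has(a)",          // paragraphs that contain a link
            "p:contains(intro)"  // case-insensitive text match
        };
        for (String q : queries) {
            System.out.println(q + " -> " + count(q));
        }
    }
}
```

The `a[href~=exam.le]` query is included to show that jsoup evaluates `~=` as a regular expression rather than as a whole-word match.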
Selector Types:
// Tag selectors
doc.select("p"); // All <p> elements
doc.select("div"); // All <div> elements
// Class selectors
doc.select(".class-name"); // Elements with class="class-name"
doc.select("p.highlight"); // <p> elements with class="highlight"
// ID selectors
doc.select("#element-id"); // Element with id="element-id"
// Attribute selectors
doc.select("[href]"); // Elements with href attribute
doc.select("[href=value]"); // Elements where href equals "value"
doc.select("[href^=http]"); // Elements where href starts with "http"
doc.select("[href$=.pdf]"); // Elements where href ends with ".pdf"
doc.select("[href*=example]"); // Elements where href contains "example"
doc.select("[href~=word]"); // Elements where href matches the regex "word" (jsoup treats ~= as a regex match)
// Combinators
doc.select("div p"); // <p> elements inside <div> (descendant)
doc.select("div > p"); // <p> elements directly inside <div> (child)
doc.select("h1 + p"); // <p> immediately after <h1> (adjacent sibling)
doc.select("h1 ~ p"); // <p> after <h1> at same level (general sibling)
// Pseudo-selectors
doc.select(":first-child"); // First child elements
doc.select(":last-child"); // Last child elements
doc.select(":nth-child(2n)"); // Even-numbered children
doc.select(":nth-child(odd)"); // Odd-numbered children
doc.select(":contains(text)"); // Elements containing text
doc.select(":matches(regex)"); // Elements whose text matches regex
doc.select(":empty"); // Elements with no children
doc.select(":has(selector)"); // Elements containing matches for selector
Usage Examples:
// Complex selectors
Elements tableHeaders = doc.select("table tr:first-child th");
Elements requiredInputs = doc.select("input[required]");
Elements checkedBoxes = doc.select("input[type=checkbox][checked]"); // jsoup has no :checked pseudo-class; match the attribute instead
// Text content selectors
Elements warnings = doc.select("p:contains(warning)");
Elements phoneNumbers = doc.select("span:matches(\\d{3}-\\d{3}-\\d{4})");
// Structural selectors
Elements oddRows = doc.select("tr:nth-child(odd)");
Elements lastItems = doc.select("li:last-child");
Elements emptyDivs = doc.select("div:empty");
// Relational selectors
Elements linksInNav = doc.select("nav a");
Elements directChildren = doc.select("ul > li");
Elements nextSiblings = doc.select("h2 + p");
Elements class extends ArrayList<Element> and provides bulk operations on collections of elements.
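Since Elements is an ordinary list, it can be iterated like any other List while also offering the bulk text and HTML methods. The following is a minimal sketch under that assumption; BulkTextDemo and its markup are invented for illustration.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class BulkTextDemo {
    // Hypothetical sample markup: two paragraphs with text, one empty.
    static final String HTML = "<p>One</p><p>Two <b>bold</b></p><p></p>";

    static Elements paragraphs() {
        return Jsoup.parse(HTML).select("p");
    }

    public static void main(String[] args) {
        Elements ps = paragraphs();

        // Plain List behaviour: size() and for-each both work.
        System.out.println("paragraphs: " + ps.size());
        for (Element p : ps) {
            System.out.println("own html: " + p.html());
        }

        // eachText() skips elements that have no text at all.
        List<String> texts = ps.eachText();
        System.out.println("texts: " + texts);
        System.out.println("any text: " + ps.hasText());

        // The setter form replaces the inner HTML of every element.
        ps.html("<em>replaced</em>");
        System.out.println("after html(): " + ps.eachText());
    }
}
```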
/**
* Get the combined text content of all elements.
* @return concatenated text from all elements
*/
public String text();
/**
* Get list of text content from each element.
* @return List of text strings
*/
public List<String> eachText();
/**
* Test if any element has non-empty text content.
* @return true if any element has text
*/
public boolean hasText();
/**
* Get the combined inner HTML of all elements.
* @return concatenated HTML content
*/
public String html();
/**
* Set inner HTML content on all elements.
* @param html HTML content to set
* @return this Elements for chaining
*/
public Elements html(String html);
/**
* Get the combined outer HTML of all elements.
* @return concatenated outer HTML
*/
public String outerHtml();
Usage Examples:
Elements paragraphs = doc.select("p");
// Text operations
String allText = paragraphs.text();
List<String> individualTexts = paragraphs.eachText();
boolean hasAnyText = paragraphs.hasText();
// HTML operations
String combinedHtml = paragraphs.html();
paragraphs.html("<strong>New content</strong>");
String outerHtml = paragraphs.outerHtml();
Perform attribute operations on all elements in a collection.
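One detail worth seeing in code is that the getter form of attr reads from the first element that actually has the attribute, while the setter form writes to every element. The sketch below illustrates this; BulkAttrDemo and the sample links are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

class BulkAttrDemo {
    // Hypothetical sample markup: the first link has no href.
    static final String HTML =
        "<a>plain</a>"
      + "<a href='/one' title='first'>one</a>"
      + "<a href='/two'>two</a>";

    static Elements links() {
        return Jsoup.parse(HTML).select("a");
    }

    public static void main(String[] args) {
        Elements links = links();

        // Getter: value from the first element that has the attribute.
        System.out.println("first href: " + links.attr("href"));
        // No element has this attribute, so the result is an empty string.
        System.out.println("missing: '" + links.attr("data-missing") + "'");
        // eachAttr() only includes elements that carry the attribute.
        System.out.println("all hrefs: " + links.eachAttr("href"));

        // Setter: applied to every element in the collection.
        links.attr("rel", "nofollow");
        System.out.println("rel count: " + links.eachAttr("rel").size());

        links.removeAttr("title");
        System.out.println("any title left: " + links.hasAttr("title"));
    }
}
```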
/**
* Get attribute value from the first element that has the attribute.
* @param attributeKey attribute name
* @return attribute value, or an empty string if no element has the attribute
*/
public String attr(String attributeKey);
/**
* Set attribute on all elements.
* @param attributeKey attribute name
* @param attributeValue attribute value
* @return this Elements for chaining
*/
public Elements attr(String attributeKey, String attributeValue);
/**
* Get list of attribute values from all elements.
* @param attributeKey attribute name
* @return List of attribute values
*/
public List<String> eachAttr(String attributeKey);
/**
* Test if any element has the specified attribute.
* @param attributeKey attribute name
* @return true if any element has the attribute
*/
public boolean hasAttr(String attributeKey);
/**
* Remove attribute from all elements.
* @param attributeKey attribute name to remove
* @return this Elements for chaining
*/
public Elements removeAttr(String attributeKey);
Usage Examples:
Elements links = doc.select("a");
// Attribute operations
String firstHref = links.attr("href");
links.attr("target", "_blank"); // Set on all links
List<String> allHrefs = links.eachAttr("href");
links.removeAttr("title");
// Check for attributes
boolean anyHasClass = links.hasAttr("class");
Manipulate CSS classes on all elements in a collection.
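A short sketch of how the class operations behave per element may help: toggleClass flips the class independently on each element, and hasClass reports true if any element carries it. ClassDemo and its markup are illustrative assumptions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ClassDemo {
    // Hypothetical sample markup: only the second button starts as active.
    static final String HTML =
        "<button class='btn'>A</button>"
      + "<button class='btn active'>B</button>";

    static Elements buttons() {
        return Jsoup.parse(HTML).select("button");
    }

    public static void main(String[] args) {
        Elements buttons = buttons();

        buttons.addClass("primary");      // added to both buttons
        buttons.toggleClass("active");    // A gains it, B loses it

        for (Element b : buttons) {
            System.out.println(b.text() + " -> " + b.className());
        }

        // True when at least one element in the collection has the class.
        System.out.println("any active: " + buttons.hasClass("active"));

        buttons.removeClass("btn");
        System.out.println("any btn: " + buttons.hasClass("btn"));
    }
}
```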
/**
* Add CSS class to all elements.
* @param className class name to add
* @return this Elements for chaining
*/
public Elements addClass(String className);
/**
* Remove CSS class from all elements.
* @param className class name to remove
* @return this Elements for chaining
*/
public Elements removeClass(String className);
/**
* Toggle CSS class on all elements.
* @param className class name to toggle
* @return this Elements for chaining
*/
public Elements toggleClass(String className);
/**
* Test if any element has the specified CSS class.
* @param className class name to test
* @return true if any element has the class
*/
public boolean hasClass(String className);
Usage Examples:
Elements buttons = doc.select("button");
// Class operations
buttons.addClass("btn");
buttons.addClass("btn-primary");
buttons.removeClass("disabled");
buttons.toggleClass("active");
boolean anyActive = buttons.hasClass("active");
Work with form element values across collections.
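The distinction between a control's value and its checked state is easy to miss, so here is a minimal sketch showing that val only touches the value, while the checked state lives in a separate attribute. FormValueDemo and the form markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class FormValueDemo {
    // Hypothetical sample form, used only for illustration.
    static final String HTML =
        "<form>"
      + "<input type='text' name='q' value='old'>"
      + "<input type='checkbox' name='agree' value='yes'>"
      + "</form>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();
        Elements textInputs = doc.select("input[type=text]");
        Elements checkboxes = doc.select("input[type=checkbox]");

        System.out.println("before: " + textInputs.val());
        textInputs.val("");                       // clears the value attribute
        System.out.println("after: '" + textInputs.val() + "'");

        // val() changes what the checkbox would submit, not whether it is ticked.
        checkboxes.val("accepted");
        System.out.println("checked after val(): " + checkboxes.hasAttr("checked"));

        // Ticking the box means adding the checked attribute.
        checkboxes.attr("checked", "checked");
        System.out.println("checked after attr(): " + checkboxes.hasAttr("checked"));
    }
}
```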
/**
* Get form value from the first element.
* @return form value from first element
*/
public String val();
/**
* Set form value on all elements.
* @param value new form value
* @return this Elements for chaining
*/
public Elements val(String value);
Usage Examples:
Elements textInputs = doc.select("input[type=text]");
Elements checkboxes = doc.select("input[type=checkbox]");
// Form operations
String firstValue = textInputs.val();
textInputs.val(""); // Clear all text inputs
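checkboxes.val("checked"); // Sets value="checked" on every checkbox; it does not tick them
checkboxes.attr("checked", "checked"); // Ticks every checkbox by adding the checked attribute
Further filter and navigate element collections.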
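The sketch below shows the filtering methods side by side, with attention to what each one returns: not yields a new collection and leaves the original intact, first and last return a single Element (or null), and eq returns a possibly empty Elements. FilterDemo and the sample links are made up for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class FilterDemo {
    // Hypothetical sample markup: one external link, two internal links.
    static final String HTML =
        "<a href='https://x.org'>x</a>"
      + "<a href='/a'>a</a>"
      + "<a href='/b'>b</a>";

    static Elements links() {
        return Jsoup.parse(HTML).select("a");
    }

    public static void main(String[] args) {
        Elements all = links();

        // not() returns a new collection; 'all' keeps its three elements.
        Elements internal = all.not("[href^=http]");
        System.out.println(internal.size() + " internal of " + all.size());

        Element first = all.first();       // single Element, null when empty
        Element last = all.last();
        System.out.println(first.text() + " ... " + last.text());

        Elements second = all.eq(1);       // one-element collection
        Elements outOfRange = all.eq(10);  // empty collection, no exception
        System.out.println(second.text() + ", out of range empty: " + outOfRange.isEmpty());

        System.out.println("any external: " + all.is("[href^=http]"));
    }
}
```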
/**
* Find elements within this collection that match the selector.
* @param cssQuery CSS selector
* @return Elements collection of matches
*/
public Elements select(String cssQuery);
/**
* Find first element in collection that matches the selector.
* @param cssQuery CSS selector
* @return first matching Element, or null if none
*/
public Element selectFirst(String cssQuery);
/**
* Get the elements of this collection that do not match the selector.
* This collection itself is not modified.
* @param cssQuery CSS selector
* @return new Elements containing only the non-matching elements
*/
public Elements not(String cssQuery);
/**
* Test if any element in collection matches the selector.
* @param cssQuery CSS selector
* @return true if any element matches
*/
public boolean is(String cssQuery);
/**
* Get element at specified index as single-element collection.
* @param index zero-based index
* @return Elements containing element at index
*/
public Elements eq(int index);
/**
* Get the first element of the collection.
* @return first Element, or null if the collection is empty
*/
public Element first();
/**
* Get the last element of the collection.
* @return last Element, or null if the collection is empty
*/
public Element last();
Usage Examples:
Elements allLinks = doc.select("a");
// Further filtering
Elements externalLinks = allLinks.select("[href^=http]");
Elements internalLinks = allLinks.not("[href^=http]");
Element firstExternal = allLinks.selectFirst("[href^=http]");
// Collection operations
Element firstLink = allLinks.first(); // null if allLinks is empty
Element lastLink = allLinks.last(); // null if allLinks is empty
Elements thirdLink = allLinks.eq(2); // empty if there are fewer than three links
// Testing
boolean hasExternal = allLinks.is("[href^=http]");
Navigate to sibling elements from a collection.
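As a rough sketch of sibling navigation, the example below starts from the first and last items of a three-item list and walks in both directions. SiblingDemo and the list markup are illustrative assumptions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class SiblingDemo {
    // Hypothetical sample list, used only for illustration.
    static final String HTML =
        "<ul><li>1</li><li class='x'>2</li><li>3</li></ul>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();
        Elements first = doc.select("li:first-child");
        Elements last = doc.select("li:last-child");

        System.out.println("next of first: " + first.next().text());
        System.out.println("all after first: " + first.nextAll().size());

        // The filtered form keeps the immediate sibling only if it matches.
        System.out.println("next .x: " + first.next(".x").size());
        System.out.println("next .nope: " + first.next(".nope").size());

        System.out.println("prev of last: " + last.prev().text());
        System.out.println("all before last: " + last.prevAll().size());

        // Applied to every <li>: the last item has no next sibling.
        System.out.println("next of each li: " + doc.select("li").next().size());
    }
}
```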
/**
* Get immediate next sibling elements.
* @return Elements collection of next siblings
*/
public Elements next();
/**
* Get filtered immediate next sibling elements.
* @param cssQuery CSS selector filter
* @return Elements collection of matching next siblings
*/
public Elements next(String cssQuery);
/**
* Get all following sibling elements.
* @return Elements collection of following siblings
*/
public Elements nextAll();
/**
* Get filtered following sibling elements.
* @param cssQuery CSS selector filter
* @return Elements collection of matching following siblings
*/
public Elements nextAll(String cssQuery);
/**
* Get immediate previous sibling elements.
* @return Elements collection of previous siblings
*/
public Elements prev();
/**
* Get filtered immediate previous sibling elements.
* @param cssQuery CSS selector filter
* @return Elements collection of matching previous siblings
*/
public Elements prev(String cssQuery);
/**
* Get all previous sibling elements.
* @return Elements collection of previous siblings
*/
public Elements prevAll();
/**
* Get filtered previous sibling elements.
* @param cssQuery CSS selector filter
* @return Elements collection of matching previous siblings
*/
public Elements prevAll(String cssQuery);
Usage Examples:
Elements listItems = doc.select("li");
// Sibling navigation
Elements nextItems = listItems.next();
Elements nextHeaders = listItems.next("h2, h3");
Elements allFollowing = listItems.nextAll();
Elements previousItems = listItems.prev();
Elements allPrevious = listItems.prevAll();
Navigate to parent elements from a collection.
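In jsoup, parents() returns every ancestor of the selected elements, not only the direct parent. The sketch below narrows those ancestors to a particular tag by testing each one with is; the ancestorDivs helper, the class name, and the markup are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class AncestorsDemo {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<div id='outer'><section>"
      + "<span class='highlight'>hit</span>"
      + "</section></div>";

    // Ancestors of the highlighted spans that are themselves <div> elements.
    static List<Element> ancestorDivs(Document doc) {
        List<Element> divs = new ArrayList<>();
        for (Element ancestor : doc.select("span.highlight").parents()) {
            if (ancestor.is("div")) {
                divs.add(ancestor);
            }
        }
        return divs;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Elements ancestors = doc.select("span.highlight").parents();

        // Includes <section>, <div>, and the <body>/<html> added by the parser.
        for (Element a : ancestors) {
            System.out.println("ancestor: " + a.tagName());
        }
        System.out.println("div ancestors: " + ancestorDivs(doc).size());

        // Caution: select() also searches below each ancestor, so it can
        // return elements that are not ancestors, such as the span itself.
        System.out.println("select finds span: " + !ancestors.select("span").isEmpty());
    }
}
```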
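/**
* Get all ancestors (parents, grandparents, and so on) of the elements.
* @return Elements collection of unique ancestor elements
*/
public Elements parents();
Usage Example:
Elements spans = doc.select("span.highlight");
Elements parentElements = spans.parents();
// select() searches each ancestor and its descendants, so the result may include divs that are not themselves ancestors
Elements parentDivs = spans.parents().select("div");
Modify DOM structure for all elements in a collection.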
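The following sketch chains several structural edits and then checks their effect by re-querying the document. It is only an illustration under assumed markup; DomEditDemo and the class names in the HTML are invented.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class DomEditDemo {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<div class='content'><p>a</p><p>b</p></div>"
      + "<div class='ad'>buy now</div>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();
        Elements paragraphs = doc.select("p");

        // Each paragraph gets its own appended marker and its own wrapper.
        paragraphs.append("<span>*</span>");
        paragraphs.wrap("<section></section>");
        System.out.println("wrapped: " + doc.select("section > p").size());

        // unwrap() removes the wrappers and lifts their children back up.
        doc.select("section").unwrap();
        System.out.println("sections left: " + doc.select("section").size());

        // remove() detaches the elements from the document entirely.
        doc.select(".ad").remove();
        System.out.println("ads left: " + doc.select(".ad").size());

        // empty() keeps the element but drops all of its child nodes.
        doc.select("div.content").empty();
        System.out.println(doc.body().html());
    }
}
```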
/**
* Parse and append HTML to all elements.
* @param html HTML to append
* @return this Elements for chaining
*/
public Elements append(String html);
/**
* Parse and prepend HTML to all elements.
* @param html HTML to prepend
* @return this Elements for chaining
*/
public Elements prepend(String html);
/**
* Insert HTML before all elements.
* @param html HTML to insert
* @return this Elements for chaining
*/
public Elements before(String html);
/**
* Insert HTML after all elements.
* @param html HTML to insert
* @return this Elements for chaining
*/
public Elements after(String html);
/**
* Wrap all elements with HTML.
* @param html HTML to wrap with
* @return this Elements for chaining
*/
public Elements wrap(String html);
/**
* Remove wrapper elements but keep their children.
* @return this Elements for chaining
*/
public Elements unwrap();
/**
* Remove all child nodes from all elements.
* @return this Elements for chaining
*/
public Elements empty();
/**
* Remove all elements from the DOM.
* @return this Elements for chaining
*/
public Elements remove();
Usage Examples:
Elements paragraphs = doc.select("p");
// Bulk modifications
paragraphs.addClass("paragraph");
paragraphs.append("<span class='marker'>*</span>");
paragraphs.wrap("<div class='content'></div>");
// Remove elements
Elements ads = doc.select(".advertisement");
ads.remove();
// Clear content
Elements containers = doc.select(".container");
containers.empty();
Extract specific types of nodes from element collections.
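A point that is easy to overlook is that comments(), textNodes(), and dataNodes() look only at the direct children of the selected elements, not at deeper descendants. The sketch below demonstrates this with a nested comment; NodeTypesDemo and its markup are hypothetical.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

class NodeTypesDemo {
    // Hypothetical sample markup: one comment directly in the div, one nested in a <p>.
    static final String HTML =
        "<div>text<!-- note --><p>nested<!-- deep --></p></div>"
      + "<form id='f'><input name='q'></form>"
      + "<script>var x = 1;</script>";

    static Document parse() {
        return Jsoup.parse(HTML);
    }

    public static void main(String[] args) {
        Document doc = parse();
        Elements divs = doc.select("div");

        // Only the comment that is a direct child of the div is returned.
        List<Comment> comments = divs.comments();
        System.out.println("comments: " + comments.size());

        // Likewise, the text inside the nested <p> is not included.
        List<TextNode> texts = divs.textNodes();
        System.out.println("text nodes: " + texts.size());

        // forms() keeps the selected elements that are FormElement instances.
        List<FormElement> forms = doc.select("form").forms();
        System.out.println("forms: " + forms.size());

        // Script content is stored in DataNode children.
        List<DataNode> data = doc.select("script").dataNodes();
        System.out.println("script data: " + data.get(0).getWholeData());
    }
}
```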
/**
* Get the FormElement forms from the selection.
* @return List of the selected elements that are forms
*/
public List<FormElement> forms();
/**
* Get Comment nodes that are direct children of the selected elements.
* @return List of Comment nodes
*/
public List<Comment> comments();
/**
* Get TextNode nodes that are direct children of the selected elements.
* @return List of TextNode nodes
*/
public List<TextNode> textNodes();
/**
* Get DataNode nodes that are direct children of the selected elements.
* @return List of DataNode nodes
*/
public List<DataNode> dataNodes();
Usage Examples:
Elements divs = doc.select("div");
// Get specialized collections
List<FormElement> forms = doc.select("form").forms(); // only <form> elements are FormElements
List<Comment> comments = divs.comments();
List<TextNode> textNodes = divs.textNodes();
List<DataNode> scriptData = doc.select("script").dataNodes();
// Work with text nodes
for (TextNode textNode : textNodes) {
    String text = textNode.text();
    if (text.trim().isEmpty()) {
        textNode.remove(); // Remove empty text nodes
    }
}
This comprehensive CSS selection API provides powerful, jQuery-like capabilities for finding, filtering, and manipulating HTML elements using familiar CSS selector syntax.