tessl/maven-org-jsoup--jsoup

Java HTML parser library implementing the WHATWG HTML5 specification for parsing, manipulating, and sanitizing HTML and XML documents.

CSS Selection

CSS selector engine for finding and filtering elements using familiar CSS syntax, plus bulk operations on element collections. jsoup supports a broad subset of CSS selectors, including combinators and structural pseudo-classes, along with jsoup-specific pseudo-selectors such as :contains, :matches, and :has.

Capabilities

Element Selection

Find elements using CSS selectors. These methods are defined on Element, so they are also available on Document.

/**
 * Find elements that match the CSS selector, searching this element and its descendants.
 * @param cssQuery CSS selector query
 * @return Elements collection of matching elements
 */
public Elements select(String cssQuery);

/**
 * Find the first element that matches the CSS selector, searching this element and its descendants.
 * @param cssQuery CSS selector query
 * @return first matching Element, or null if none found
 */
public Element selectFirst(String cssQuery);

/**
 * Test if this element matches the CSS selector.
 * @param cssQuery CSS selector query
 * @return true if element matches selector
 */
public boolean is(String cssQuery);

/**
 * Find the closest element that matches the CSS selector, starting with this element and walking up its ancestors.
 * @param cssQuery CSS selector query
 * @return closest matching ancestor Element, or null if none found
 */
public Element closest(String cssQuery);

Usage Examples:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.parse(html);

// Basic selectors
Elements paragraphs = doc.select("p");
Elements divs = doc.select("div");
Element firstLink = doc.selectFirst("a");

// Class and ID selectors
Elements highlighted = doc.select(".highlight");
Element header = doc.selectFirst("#header");
Elements navLinks = doc.select("nav a");

// Attribute selectors
Elements links = doc.select("a[href]");
Elements externalLinks = doc.select("a[href^=http]");
Elements images = doc.select("img[alt]");

// Pseudo-selectors
Elements firstChildren = doc.select("li:first-child");
Elements evenRows = doc.select("tr:nth-child(even)");
Elements hasText = doc.select("p:contains(important)");
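The examples above do not exercise `is` or `closest`, or the null return of `selectFirst`. The following is a minimal sketch of those three behaviors; the markup and the class name `SelectionBasics` are hypothetical and exist only for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectionBasics {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<div id=\"header\"><nav>"
        + "<a class=\"highlight\" href=\"/home\">Home</a>"
        + "<a href=\"https://example.com\">External</a>"
        + "</nav></div><p>Body text</p>";

    // Number of <a> elements nested anywhere inside <nav>.
    static int navLinkCount() {
        return Jsoup.parse(HTML).select("nav a").size();
    }

    // is() tests a single element against a selector.
    static boolean firstLinkIsHighlighted() {
        Element first = Jsoup.parse(HTML).selectFirst("a");
        return first != null && first.is(".highlight");
    }

    // closest() walks up from the element until a match is found.
    static String closestDivId() {
        Element first = Jsoup.parse(HTML).selectFirst("a");
        Element container = first.closest("div");
        return container == null ? null : container.id();
    }

    // selectFirst() returns null when nothing matches, so guard before use.
    static boolean missingTableIsNull() {
        Document doc = Jsoup.parse(HTML);
        return doc.selectFirst("table") == null;
    }

    public static void main(String[] args) {
        System.out.println(navLinkCount());
        System.out.println(firstLinkIsHighlighted());
        System.out.println(closestDivId());
        System.out.println(missingTableIsNull());
    }
}
```

Because `selectFirst` and `closest` can return null, a null check before dereferencing the result is usually the safer pattern.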

Advanced CSS Selectors

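jsoup supports most CSS3 selectors, including combinators and structural pseudo-classes, and adds its own pseudo-selectors such as :contains, :matches, and :has. State-based pseudo-classes such as :checked and :hover are not supported.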

Selector Types:

// Tag selectors
doc.select("p");              // All <p> elements
doc.select("div");            // All <div> elements

// Class selectors  
doc.select(".class-name");    // Elements with class="class-name"
doc.select("p.highlight");    // <p> elements with class="highlight"

// ID selectors
doc.select("#element-id");    // Element with id="element-id"

// Attribute selectors
doc.select("[href]");         // Elements with href attribute
doc.select("[href=value]");   // Elements where href equals "value"
doc.select("[href^=http]");   // Elements where href starts with "http"
doc.select("[href$=.pdf]");   // Elements where href ends with ".pdf"
doc.select("[href*=example]"); // Elements where href contains "example"
doc.select("[href~=word]");   // Elements where href matches the regular expression "word"

// Combinators
doc.select("div p");          // <p> elements inside <div> (descendant)
doc.select("div > p");        // <p> elements directly inside <div> (child)
doc.select("h1 + p");         // <p> immediately after <h1> (adjacent sibling)
doc.select("h1 ~ p");         // <p> after <h1> at same level (general sibling)

// Pseudo-selectors
doc.select(":first-child");   // First child elements
doc.select(":last-child");    // Last child elements
doc.select(":nth-child(2n)"); // Even-numbered children
doc.select(":nth-child(odd)"); // Odd-numbered children
doc.select(":contains(text)"); // Elements whose text contains "text" (case-insensitive)
doc.select(":matches(regex)"); // Elements whose text matches the regular expression
doc.select(":empty");         // Elements with no children
doc.select(":has(selector)"); // Elements containing matches for selector
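To make a few of the selector types above concrete, here is a small sketch that counts matches against a hypothetical list. The markup, the class name `SelectorSyntaxDemo`, and the helper `count` are assumptions made for this example, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorSyntaxDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<ul>"
        + "<li><a href=\"/docs/a.pdf\">Spec</a></li>"
        + "<li><a href=\"https://example.com/page\">Site</a></li>"
        + "<li>No link</li>"
        + "</ul>";

    // Helper (not a jsoup API): number of elements matching the query.
    static int count(String query) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(query).size();
    }

    public static void main(String[] args) {
        System.out.println(count("ul > li"));            // child combinator
        System.out.println(count("li:has(a)"));          // <li> containing an <a>
        System.out.println(count("a[href$=.pdf]"));      // attribute ends with
        System.out.println(count("a[href~=^https?:]"));  // attribute matches a regex
        System.out.println(count("li:contains(spec)"));  // case-insensitive text search
    }
}
```

Note that `[attr~=...]` takes a regular expression in jsoup, which differs from the whitespace-separated word match that `~=` denotes in standard CSS.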

Usage Examples:

// Complex selectors
Elements tableHeaders = doc.select("table tr:first-child th");
Elements requiredInputs = doc.select("input[required]");
Elements checkedBoxes = doc.select("input[type=checkbox][checked]");  // jsoup has no :checked pseudo-class

// Text content selectors
Elements warnings = doc.select("p:contains(warning)");
Elements phoneNumbers = doc.select("span:matches(\\d{3}-\\d{3}-\\d{4})");

// Structural selectors
Elements oddRows = doc.select("tr:nth-child(odd)");
Elements lastItems = doc.select("li:last-child");
Elements emptyDivs = doc.select("div:empty");

// Relational selectors
Elements linksInNav = doc.select("nav a");
Elements directChildren = doc.select("ul > li");
Elements nextSiblings = doc.select("h2 + p");

Elements Collection Operations

The Elements class extends ArrayList<Element> and provides bulk operations on collections of elements.

/**
 * Get the combined text content of all elements.
 * @return concatenated text from all elements
 */
public String text();

/**
 * Get list of text content from each element.
 * @return List of text strings
 */
public List<String> eachText();

/**
 * Test if any element has non-empty text content.
 * @return true if any element has text
 */
public boolean hasText();

/**
 * Get the combined inner HTML of all elements.
 * @return concatenated HTML content
 */
public String html();

/**
 * Set inner HTML content on all elements.
 * @param html HTML content to set
 * @return this Elements for chaining
 */
public Elements html(String html);

/**
 * Get the combined outer HTML of all elements.
 * @return concatenated outer HTML
 */
public String outerHtml();

Usage Examples:

Elements paragraphs = doc.select("p");

// Text operations
String allText = paragraphs.text();
List<String> individualTexts = paragraphs.eachText();
boolean hasAnyText = paragraphs.hasText();

// HTML operations
String combinedHtml = paragraphs.html();
paragraphs.html("<strong>New content</strong>");
String outerHtml = paragraphs.outerHtml();
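The difference between the combined and per-element getters is easiest to see on a tiny input. This sketch assumes a hypothetical two-paragraph document; the class name `BulkTextDemo` is illustrative only.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class BulkTextDemo {
    // Hypothetical sample markup, used only for this illustration.
    static Elements paragraphs() {
        return Jsoup.parse("<p>One</p><p>Two <b>bold</b></p>").select("p");
    }

    // text() joins the text of every element into one string.
    static String combinedText() {
        return paragraphs().text();
    }

    // eachText() keeps one entry per element that has text.
    static List<String> perElementText() {
        return paragraphs().eachText();
    }

    // html(String) replaces the inner HTML of every element in the collection.
    static String textAfterReplace() {
        Elements ps = paragraphs();
        ps.html("<em>x</em>");
        return ps.text();
    }

    public static void main(String[] args) {
        System.out.println(combinedText());
        System.out.println(perElementText());
        System.out.println(textAfterReplace());
    }
}
```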

Bulk Attribute Operations

Perform attribute operations on all elements in a collection.

/**
 * Get attribute value from the first element that has the attribute.
 * @param attributeKey attribute name
 * @return attribute value from the first element that has it, or an empty string if none do
 */
public String attr(String attributeKey);

/**
 * Set attribute on all elements.
 * @param attributeKey attribute name
 * @param attributeValue attribute value
 * @return this Elements for chaining
 */
public Elements attr(String attributeKey, String attributeValue);

/**
 * Get list of attribute values from all elements.
 * @param attributeKey attribute name
 * @return List of attribute values
 */
public List<String> eachAttr(String attributeKey);

/**
 * Test if any element has the specified attribute.
 * @param attributeKey attribute name
 * @return true if any element has the attribute
 */
public boolean hasAttr(String attributeKey);

/**
 * Remove attribute from all elements.
 * @param attributeKey attribute name to remove
 * @return this Elements for chaining
 */
public Elements removeAttr(String attributeKey);

Usage Examples:

Elements links = doc.select("a");

// Attribute operations
String firstHref = links.attr("href");
links.attr("target", "_blank");  // Set on all links
List<String> allHrefs = links.eachAttr("href");
links.removeAttr("title");

// Check for attributes
boolean anyHasClass = links.hasAttr("class");
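The getter and setter variants of `attr` behave differently across a collection: the getter reads from one element, the setter writes to all. A minimal sketch, assuming hypothetical markup in which only some links carry `href` or `title`; the class name `BulkAttrDemo` is illustrative.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class BulkAttrDemo {
    // Hypothetical sample markup: the second link has no href, only the third has a title.
    static final String HTML =
        "<a href=\"/a\">A</a><a>B</a><a href=\"/c\" title=\"t\">C</a>";

    static Elements links(Document doc) {
        return doc.select("a");
    }

    // Getter: value from the first element that actually has the attribute.
    static String firstTitle() {
        return links(Jsoup.parse(HTML)).attr("title");
    }

    // eachAttr() skips elements that lack the attribute.
    static List<String> allHrefs() {
        return links(Jsoup.parse(HTML)).eachAttr("href");
    }

    // Setter: applied to every element in the collection.
    static int relCountAfterSet() {
        Document doc = Jsoup.parse(HTML);
        links(doc).attr("rel", "nofollow");
        return doc.select("a[rel=nofollow]").size();
    }

    // removeAttr() strips the attribute from every element.
    static boolean anyTitleAfterRemove() {
        Elements all = links(Jsoup.parse(HTML));
        all.removeAttr("title");
        return all.hasAttr("title");
    }

    public static void main(String[] args) {
        System.out.println(firstTitle());
        System.out.println(allHrefs());
        System.out.println(relCountAfterSet());
        System.out.println(anyTitleAfterRemove());
    }
}
```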

Bulk CSS Class Operations

Manipulate CSS classes on all elements in a collection.

/**
 * Add CSS class to all elements.
 * @param className class name to add
 * @return this Elements for chaining
 */
public Elements addClass(String className);

/**
 * Remove CSS class from all elements.
 * @param className class name to remove
 * @return this Elements for chaining
 */
public Elements removeClass(String className);

/**
 * Toggle CSS class on all elements.
 * @param className class name to toggle
 * @return this Elements for chaining
 */
public Elements toggleClass(String className);

/**
 * Test if any element has the specified CSS class.
 * @param className class name to test
 * @return true if any element has the class
 */
public boolean hasClass(String className);

Usage Examples:

Elements buttons = doc.select("button");

// Class operations
buttons.addClass("btn");
buttons.addClass("btn-primary");
buttons.removeClass("disabled");
buttons.toggleClass("active");

boolean anyActive = buttons.hasClass("active");
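`toggleClass` is applied to each element independently, so a mixed collection ends up flipped rather than uniform. The sketch below illustrates this with two hypothetical buttons; the class name `ToggleClassDemo` is an assumption for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ToggleClassDemo {
    // Hypothetical sample markup: only the first button starts out active.
    static final String HTML =
        "<button class=\"active\">A</button><button>B</button>";

    // After toggling, the class moves from button A to button B.
    static String activeTextAfterToggle() {
        Document doc = Jsoup.parse(HTML);
        doc.select("button").toggleClass("active");
        return doc.select("button.active").text();
    }

    // hasClass() is true if any element in the collection has the class.
    static boolean anyActiveAfterToggle() {
        Document doc = Jsoup.parse(HTML);
        Elements buttons = doc.select("button");
        buttons.toggleClass("active");
        return buttons.hasClass("active");
    }

    public static void main(String[] args) {
        System.out.println(activeTextAfterToggle());
        System.out.println(anyActiveAfterToggle());
    }
}
```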

Bulk Form Operations

Work with form element values across collections.

/**
 * Get form value from the first element.
 * @return form value from first element
 */
public String val();

/**
 * Set form value on all elements.
 * @param value new form value
 * @return this Elements for chaining
 */
public Elements val(String value);

Usage Examples:

Elements textInputs = doc.select("input[type=text]");
Elements checkboxes = doc.select("input[type=checkbox]");

// Form operations
String firstValue = textInputs.val();
textInputs.val("");  // Clear all text inputs
checkboxes.attr("checked", "checked");  // Check all checkboxes (val() would only change the value attribute)
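`val` reads and writes the submitted value of a form control, which is separate from a checkbox's checked state. The sketch below shows both on a hypothetical form; the class name `FormValueDemo` and the field names are assumptions for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class FormValueDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<form>"
        + "<input type=\"text\" name=\"q\" value=\"old\">"
        + "<input type=\"checkbox\" name=\"c\" value=\"yes\">"
        + "<textarea name=\"t\">note</textarea>"
        + "</form>";

    // val() on a collection reads from the first element.
    static String firstTextValue() {
        return Jsoup.parse(HTML).select("input[type=text]").val();
    }

    // val(String) writes the value attribute of each input.
    static String valueAttrAfterSet() {
        Document doc = Jsoup.parse(HTML);
        doc.select("input[type=text]").val("new");
        return doc.selectFirst("input[type=text]").attr("value");
    }

    // Checking a box means setting the checked attribute; the value is unchanged.
    static boolean checkedWithOriginalValue() {
        Elements boxes = Jsoup.parse(HTML).select("input[type=checkbox]");
        boxes.attr("checked", "checked");
        return boxes.hasAttr("checked") && "yes".equals(boxes.val());
    }

    // For a textarea, val() returns its text content.
    static String textareaValue() {
        return Jsoup.parse(HTML).select("textarea").val();
    }

    public static void main(String[] args) {
        System.out.println(firstTextValue());
        System.out.println(valueAttrAfterSet());
        System.out.println(checkedWithOriginalValue());
        System.out.println(textareaValue());
    }
}
```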

Collection Filtering and Traversal

Further filter and navigate element collections.

/**
 * Find elements that match the selector, searching each element in this collection and its descendants.
 * @param cssQuery CSS selector
 * @return Elements collection of matches
 */
public Elements select(String cssQuery);

/**
 * Find the first element that matches the selector, searching each element in this collection and its descendants.
 * @param cssQuery CSS selector
 * @return first matching Element, or null if none
 */
public Element selectFirst(String cssQuery);

/**
 * Filter out elements that match the selector. This collection is not modified.
 * @param cssQuery CSS selector
 * @return new Elements containing only the elements that do not match
 */
public Elements not(String cssQuery);

/**
 * Test if any element in collection matches the selector.
 * @param cssQuery CSS selector
 * @return true if any element matches
 */
public boolean is(String cssQuery);

/**
 * Get element at specified index as single-element collection.
 * @param index zero-based index
 * @return Elements containing element at index
 */
public Elements eq(int index);

/**
 * Get the first element in the collection.
 * @return first Element, or null if the collection is empty
 */
public Element first();

/**
 * Get the last element in the collection.
 * @return last Element, or null if the collection is empty
 */
public Element last();

Usage Examples:

Elements allLinks = doc.select("a");

// Further filtering
Elements externalLinks = allLinks.select("[href^=http]");
Elements internalLinks = allLinks.not("[href^=http]");
Element firstExternal = allLinks.selectFirst("[href^=http]");

// Collection operations
Element firstLink = allLinks.first();  // null if the collection is empty
Element lastLink = allLinks.last();    // null if the collection is empty
Elements thirdLink = allLinks.eq(2);

// Testing
boolean hasExternal = allLinks.is("[href^=http]");
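Two details of the filtering methods are easy to miss: `not` returns a new collection rather than shrinking the original, and `first`/`last` return a single Element while `eq` returns an Elements. A minimal sketch with hypothetical links; the class name `FilterDemo` is illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class FilterDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<a href=\"https://a.example\">A</a>"
        + "<a href=\"/b\">B</a>"
        + "<a href=\"http://c.example\">C</a>";

    static Elements allLinks() {
        return Jsoup.parse(HTML).select("a");
    }

    static int externalCount() {
        return allLinks().select("[href^=http]").size();
    }

    static int internalCount() {
        return allLinks().not("[href^=http]").size();
    }

    // The original collection keeps its size after not().
    static int sizeAfterNot() {
        Elements all = allLinks();
        all.not("[href^=http]");
        return all.size();
    }

    // first() and last() return Element; eq() returns an Elements wrapper.
    static String firstLastAndSecond() {
        Elements all = allLinks();
        return all.first().text() + all.last().text() + all.eq(1).text();
    }

    public static void main(String[] args) {
        System.out.println(externalCount());
        System.out.println(internalCount());
        System.out.println(sizeAfterNot());
        System.out.println(firstLastAndSecond());
    }
}
```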

Sibling Navigation

Navigate to sibling elements from a collection.

/**
 * Get immediate next sibling elements.
 * @return Elements collection of next siblings
 */
public Elements next();

/**
 * Get filtered immediate next sibling elements.
 * @param cssQuery CSS selector filter
 * @return Elements collection of matching next siblings
 */
public Elements next(String cssQuery);

/**
 * Get all following sibling elements.
 * @return Elements collection of following siblings
 */
public Elements nextAll();

/**
 * Get filtered following sibling elements.
 * @param cssQuery CSS selector filter
 * @return Elements collection of matching following siblings
 */
public Elements nextAll(String cssQuery);

/**
 * Get immediate previous sibling elements.
 * @return Elements collection of previous siblings
 */
public Elements prev();

/**
 * Get filtered immediate previous sibling elements.
 * @param cssQuery CSS selector filter
 * @return Elements collection of matching previous siblings
 */
public Elements prev(String cssQuery);

/**
 * Get all previous sibling elements.
 * @return Elements collection of previous siblings
 */
public Elements prevAll();

/**
 * Get filtered previous sibling elements.
 * @param cssQuery CSS selector filter
 * @return Elements collection of matching previous siblings
 */
public Elements prevAll(String cssQuery);

Usage Examples:

Elements listItems = doc.select("li");

// Sibling navigation
Elements nextItems = listItems.next();
Elements nextActive = listItems.next("li.active");  // next siblings that match the filter
Elements allFollowing = listItems.nextAll();
Elements previousItems = listItems.prev();
Elements allPrevious = listItems.prevAll();
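As a rough illustration of the immediate versus "all" variants, the sketch below navigates a hypothetical three-item list. The class name `SiblingDemo` and the `x` class are assumptions for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SiblingDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<ul><li>1</li><li class=\"x\">2</li><li>3</li></ul>";

    static Document doc() {
        return Jsoup.parse(HTML);
    }

    // next(): only the immediately following sibling of each element.
    static String nextOfFirst() {
        return doc().select("li:first-child").next().text();
    }

    // nextAll(): every following sibling.
    static int nextAllOfFirstCount() {
        return doc().select("li:first-child").nextAll().size();
    }

    // next(query): the immediate sibling is kept only if it matches.
    static int nextMatching(String query) {
        return doc().select("li:first-child").next(query).size();
    }

    // prevAll(): every preceding sibling.
    static int prevAllOfLastCount() {
        return doc().select("li:last-child").prevAll().size();
    }

    // On a multi-element collection, next() collects one sibling per element.
    static String nextOfEveryItem() {
        return doc().select("li").next().text();
    }

    public static void main(String[] args) {
        System.out.println(nextOfFirst());
        System.out.println(nextAllOfFirstCount());
        System.out.println(nextMatching("li.x"));
        System.out.println(prevAllOfLastCount());
        System.out.println(nextOfEveryItem());
    }
}
```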

Parent Navigation

Navigate to parent elements from a collection.

/**
 * Get all ancestors (parents, grandparents, and so on) of the elements, without duplicates.
 * @return Elements collection of ancestor elements
 */
public Elements parents();

Usage Example:

Elements spans = doc.select("span.highlight");
Elements parentElements = spans.parents();
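// Keep only the ancestors that are <div>; select("div") would also search the ancestors' descendants
Elements parentDivs = spans.parents().not(":not(div)");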
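Since `parents()` returns every ancestor up to `<html>`, and `select` on a collection also searches descendants, narrowing the ancestors to a given tag needs some care. The sketch below uses a plain loop with `Element.is`, which is one straightforward way to do it; the markup and the class name `AncestorDemo` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class AncestorDemo {
    // Hypothetical sample markup: div#other is not an ancestor of the span.
    static final String HTML =
        "<div id=\"outer\"><section><div id=\"inner\">"
        + "<span class=\"highlight\">x</span>"
        + "</div></section></div><div id=\"other\"></div>";

    static Elements spans() {
        return Jsoup.parse(HTML).select("span.highlight");
    }

    // Ancestors here: div#inner, section, div#outer, body, html.
    static int ancestorCount() {
        return spans().parents().size();
    }

    // Keep only the ancestors that are themselves <div> elements.
    static List<String> divAncestorIds() {
        List<String> ids = new ArrayList<>();
        for (Element ancestor : spans().parents()) {
            if (ancestor.is("div")) {
                ids.add(ancestor.id());
            }
        }
        return ids;
    }

    // select() searches inside the ancestors too, so div#other is included.
    static int divsFoundBySelect() {
        return spans().parents().select("div").size();
    }

    public static void main(String[] args) {
        System.out.println(ancestorCount());
        System.out.println(divAncestorIds());
        System.out.println(divsFoundBySelect());
    }
}
```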

Bulk DOM Modification

Modify DOM structure for all elements in a collection.

/**
 * Parse and append HTML to all elements.
 * @param html HTML to append
 * @return this Elements for chaining
 */
public Elements append(String html);

/**
 * Parse and prepend HTML to all elements.
 * @param html HTML to prepend
 * @return this Elements for chaining
 */
public Elements prepend(String html);

/**
 * Insert HTML before all elements.
 * @param html HTML to insert
 * @return this Elements for chaining
 */
public Elements before(String html);

/**
 * Insert HTML after all elements.
 * @param html HTML to insert
 * @return this Elements for chaining
 */
public Elements after(String html);

/**
 * Wrap all elements with HTML.
 * @param html HTML to wrap with
 * @return this Elements for chaining
 */
public Elements wrap(String html);

/**
 * Remove each matched element from the DOM and move its children up into its parent.
 * @return this Elements for chaining
 */
public Elements unwrap();

/**
 * Remove all child nodes from all elements.
 * @return this Elements for chaining
 */
public Elements empty();

/**
 * Remove all elements from the DOM.
 * @return this Elements for chaining
 */
public Elements remove();

Usage Examples:

Elements paragraphs = doc.select("p");

// Bulk modifications
paragraphs.addClass("paragraph");
paragraphs.append("<span class='marker'>*</span>");
paragraphs.wrap("<div class='content'></div>");

// Remove elements
Elements ads = doc.select(".advertisement");
ads.remove();

// Clear content
Elements containers = doc.select(".container");
containers.empty();
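The structural operations are applied per element: `wrap` gives each element its own wrapper, and `unwrap` removes the matched elements while leaving their children in place. A minimal sketch on a hypothetical container; the class name `DomEditDemo` is illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DomEditDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<div class=\"container\"><p>One</p><p class=\"ad\">Ad</p></div>";

    // Each <p> gets its own wrapper, so two wrappers are created.
    static int wrapperCount() {
        Document doc = Jsoup.parse(HTML);
        doc.select("p").wrap("<div class='content'></div>");
        return doc.select("div.content > p").size();
    }

    // unwrap() removes the wrappers; the paragraphs return to the container.
    static int directParagraphsAfterUnwrap() {
        Document doc = Jsoup.parse(HTML);
        doc.select("p").wrap("<div class='content'></div>");
        doc.select("div.content").unwrap();
        return doc.select("div.container > p").size();
    }

    // remove() detaches the matched elements from the document.
    static int paragraphsAfterRemove() {
        Document doc = Jsoup.parse(HTML);
        doc.select(".ad").remove();
        return doc.select("p").size();
    }

    // empty() deletes the child nodes but keeps the element itself.
    static int childNodesAfterEmpty() {
        Document doc = Jsoup.parse(HTML);
        doc.select(".container").empty();
        return doc.selectFirst(".container").childNodeSize();
    }

    public static void main(String[] args) {
        System.out.println(wrapperCount());
        System.out.println(directParagraphsAfterUnwrap());
        System.out.println(paragraphsAfterRemove());
        System.out.println(childNodesAfterEmpty());
    }
}
```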

Specialized Collections

Extract specific types of nodes from element collections.

/**
 * Get the elements in this collection that are forms. Descendants are not searched.
 * @return List of FormElement instances from the selection
 */
public List<FormElement> forms();

/**
 * Get Comment nodes that are direct children of the selected elements.
 * @return List of Comment nodes
 */
public List<Comment> comments();

/**
 * Get TextNode nodes that are direct children of the selected elements.
 * @return List of TextNode nodes
 */
public List<TextNode> textNodes();

/**
 * Get DataNode nodes that are direct children of the selected elements.
 * @return List of DataNode nodes
 */
public List<DataNode> dataNodes();

Usage Examples:

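import java.util.List;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.FormElement;
import org.jsoup.nodes.TextNode;

Elements divs = doc.select("div");

// Get specialized collections
List<FormElement> forms = doc.select("form").forms();  // forms() only returns elements that are themselves forms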
List<Comment> comments = divs.comments();
List<TextNode> textNodes = divs.textNodes();
List<DataNode> scriptData = doc.select("script").dataNodes();

// Work with text nodes (direct children of the selected divs)
for (TextNode textNode : textNodes) {
    if (textNode.isBlank()) {
        textNode.remove();  // Remove whitespace-only text nodes
    }
}
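The node accessors only look at direct children of the selected elements, and `forms()` only returns elements of the selection that are forms. The following sketch checks both on a hypothetical fragment; the class name `NodeTypeDemo` and the markup are assumptions for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class NodeTypeDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<div>Hello <b>bold</b><!-- note --></div>"
        + "<form id=\"f\"></form>"
        + "<script>var x = 1;</script>";

    static Document doc() {
        return Jsoup.parse(HTML);
    }

    // Only "Hello " is a direct child of the div; "bold" belongs to <b>.
    static int directTextNodeCount() {
        return doc().select("div").textNodes().size();
    }

    static String commentData() {
        return doc().select("div").comments().get(0).getData().trim();
    }

    // Script content is stored as a DataNode rather than a TextNode.
    static String scriptData() {
        return doc().select("script").dataNodes().get(0).getWholeData();
    }

    // forms() filters the selection itself; it does not search descendants.
    static int formCount(String query) {
        return doc().select(query).forms().size();
    }

    public static void main(String[] args) {
        System.out.println(directTextNodeCount());
        System.out.println(commentData());
        System.out.println(scriptData());
        System.out.println(formCount("form"));
        System.out.println(formCount("div"));
    }
}
```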

Together, these methods provide a jQuery-like API for finding, filtering, and manipulating HTML elements with CSS selector syntax.

