# DOM Manipulation

jsoup's `Document`, `Element`, and `Node` classes provide methods for traversing, modifying, and extracting content from parsed HTML. The API is jQuery-like, so most DOM operations are short, chainable calls.

## Capabilities

### Document Operations

The `Document` class extends `Element` and represents the root of the HTML document tree.

```java { .api }
/**
 * Get the document's head element.
 * @return head Element; recent jsoup versions create it if missing rather than returning null
 */
public Element head();

/**
 * Get the document's body element.
 * @return body Element; recent jsoup versions create it if missing rather than returning null
 */
public Element body();

/**
 * Get the document title text.
 * @return title text from title element
 */
public String title();

/**
 * Set the document title.
 * @param title new title text
 */
public void title(String title);

/**
 * Create a new Element with the given tag name.
 * @param tagName element tag name
 * @return new Element instance
 */
public Element createElement(String tagName);

/**
 * Get the document's base URI.
 * @return base URI string
 */
public String location();

/**
 * Get the HTTP connection used to fetch this document.
 * @return Connection object; if the document was not fetched via HTTP,
 *         recent jsoup versions return a new default Connection rather than null
 */
public Connection connection();

/**
 * Create empty document shell with basic HTML structure.
 * @param baseUri base URI for the document
 * @return new Document with html, head, and body elements
 */
public static Document createShell(String baseUri);
```

**Usage Examples:**

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.parse("<html><head><title>Old Title</title></head><body></body></html>");

// Access document structure
Element head = doc.head();
Element body = doc.body();

// Modify document title
String currentTitle = doc.title(); // "Old Title"
doc.title("New Title");

// Create and add new elements
Element newParagraph = doc.createElement("p");
newParagraph.text("Hello World");
body.appendChild(newParagraph);

// Access the connection associated with the document. It carries the original
// request and response only when the document was fetched over HTTP
// (e.g. Jsoup.connect(url).get()); doc.location() is empty for this parsed string.
Connection conn = doc.connection();
if (conn != null && !doc.location().isEmpty()) {
    System.out.println("Document was fetched from: " + doc.location());
}

// Create empty document shell
Document emptyDoc = Document.createShell("https://example.com");
```

### Element Content Operations

Manipulate element text content and HTML content.

```java { .api }
/**
 * Get the combined text content of this element and its descendants.
 * @return text content with whitespace normalized
 */
public String text();

/**
 * Set the text content of this element (removes all child elements).
 * @param text new text content
 * @return this Element for chaining
 */
public Element text(String text);

/**
 * Test if this element has non-empty text content.
 * @return true if element has text content
 */
public boolean hasText();

/**
 * Get the inner HTML content of this element.
 * @return HTML content inside this element
 */
public String html();

/**
 * Set the inner HTML content of this element.
 * @param html new HTML content
 * @return this Element for chaining
 */
public Element html(String html);

/**
 * Get the outer HTML of this element including the element itself.
 * @return complete HTML representation
 */
public String outerHtml();

/**
 * Get combined data content for script and style elements.
 * @return data content
 */
public String data();
```

**Usage Examples:**

```java
Element paragraph = doc.selectFirst("p");

// Text operations
String text = paragraph.text();
paragraph.text("New text content");
boolean hasText = paragraph.hasText();

// HTML operations
String innerHtml = paragraph.html();
paragraph.html("<strong>Bold text</strong>");
String outerHtml = paragraph.outerHtml();

// Data content (for script/style elements)
Element script = doc.selectFirst("script");
String scriptContent = script.data();
```
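
To make the difference between `text()`, `html()`, and `outerHtml()` concrete, here is a minimal sketch. The class name `ContentAccessSketch`, the helper `firstParagraph()`, and the sample markup are hypothetical and chosen only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ContentAccessSketch {
    /** Parses a small hypothetical fragment and returns its first paragraph. */
    static Element firstParagraph() {
        Document doc = Jsoup.parse("<div><p>Hello <b>world</b></p></div>");
        return doc.selectFirst("p");
    }

    public static void main(String[] args) {
        Element p = firstParagraph();

        System.out.println(p.text());      // text only, markup dropped
        System.out.println(p.html());      // children serialized as HTML
        System.out.println(p.outerHtml()); // same, wrapped in the <p> tag itself

        // text(String) replaces all children with a single text node
        p.text("Plain <text>");
        System.out.println(p.children().size()); // no child elements remain
        System.out.println(p.html());            // markup characters are escaped on output
    }
}
```

Because `text(String)` treats its argument as literal text, it is the safer choice for untrusted strings; `html(String)` parses its argument as markup.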
163
164
### Attribute Operations
165
166
Manipulate element attributes and properties.
167
168
```java { .api }
169
/**
170
* Get an attribute value by key.
171
* @param attributeKey attribute name
172
* @return attribute value, or empty string if not set
173
*/
174
public String attr(String attributeKey);
175
176
/**
177
* Set an attribute value.
178
* @param attributeKey attribute name
179
* @param attributeValue attribute value
180
* @return this Element for chaining
181
*/
182
public Element attr(String attributeKey, String attributeValue);
183
184
/**
185
* Test if this element has the specified attribute.
186
* @param attributeKey attribute name
187
* @return true if attribute exists
188
*/
189
public boolean hasAttr(String attributeKey);
190
191
/**
192
* Remove an attribute from this element.
193
* @param attributeKey attribute name to remove
194
* @return this Element for chaining
195
*/
196
public Element removeAttr(String attributeKey);
197
198
/**
199
* Get all attributes of this element.
200
* @return Attributes collection
201
*/
202
public Attributes attributes();
203
204
/**
205
* Get data-* attributes as a Map.
206
* @return Map of data attribute keys to values
207
*/
208
public Map<String, String> dataset();
209
210
/**
211
* Get absolute URL for an attribute (if it contains a relative URL).
212
* @param attributeKey attribute name
213
* @return absolute URL, or empty string if not found or not a URL
214
*/
215
public String absUrl(String attributeKey);
216
```

**Usage Examples:**

```java
import java.util.Map;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Attributes;

Element link = doc.selectFirst("a");

// Attribute operations
String href = link.attr("href");
link.attr("href", "https://newlink.com");
link.attr("target", "_blank");

boolean hasClass = link.hasAttr("class");
link.removeAttr("target");

// Get all attributes
Attributes attrs = link.attributes();
for (Attribute attr : attrs) {
    System.out.println(attr.getKey() + "=" + attr.getValue());
}

// Data attributes
Element div = doc.selectFirst("div[data-id]");
Map<String, String> data = div.dataset();
String dataId = data.get("id"); // Gets data-id value

// Absolute URLs
String absoluteUrl = link.absUrl("href");
```
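
`absUrl` can only resolve a relative URL when the document has a base URI. The following sketch illustrates this; the class `AbsUrlSketch`, the helper `resolve`, and the example URLs are assumptions made for illustration, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlSketch {
    /** Resolves the href of the first link against the given (hypothetical) base URI. */
    static String resolve(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.selectFirst("a");
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/docs/intro.html\">Intro</a>";

        // With a base URI, the relative href is resolved against it
        System.out.println(resolve(html, "https://example.com/guide/"));

        // Without a base URI there is nothing to resolve against,
        // so absUrl falls back to an empty string
        System.out.println(resolve(html, ""));
    }
}
```

When parsing HTML from a string, passing the base URI to `Jsoup.parse(html, baseUri)` is what makes `absUrl` useful for relative links.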

### Element Hierarchy Navigation

Navigate the DOM tree structure to find parent, child, and sibling elements.

```java { .api }
/**
 * Get the parent element of this element.
 * @return parent Element, or null if this is root
 */
public Element parent();

/**
 * Get all ancestor elements of this element.
 * @return Elements collection of ancestors
 */
public Elements parents();

/**
 * Get direct child elements of this element.
 * @return Elements collection of child elements
 */
public Elements children();

/**
 * Get the number of direct child elements.
 * @return count of child elements
 */
public int childrenSize();

/**
 * Get a child element by index.
 * @param index zero-based index
 * @return child Element at index
 * @throws IndexOutOfBoundsException if index is invalid
 */
public Element child(int index);

/**
 * Get the first child element.
 * @return first child Element, or null if no children
 */
public Element firstElementChild();

/**
 * Get the last child element.
 * @return last child Element, or null if no children
 */
public Element lastElementChild();
```

**Usage Examples:**

```java
Element paragraph = doc.selectFirst("p");

// Parent navigation
Element parent = paragraph.parent();
Elements ancestors = paragraph.parents();

// Child navigation
Elements children = paragraph.children();
int childCount = paragraph.childrenSize();

if (childCount > 0) {
    Element firstChild = paragraph.child(0);
    Element lastChild = paragraph.lastElementChild();
}

// Find specific ancestor
Element bodyAncestor = paragraph.parents().select("body").first();
```

### Sibling Navigation

Navigate between sibling elements at the same level.

```java { .api }
/**
 * Get the next sibling element.
 * @return next sibling Element, or null if none
 */
public Element nextElementSibling();

/**
 * Get the previous sibling element.
 * @return previous sibling Element, or null if none
 */
public Element previousElementSibling();

/**
 * Get all sibling elements (excluding this element).
 * @return Elements collection of siblings
 */
public Elements siblingElements();

/**
 * Get the index of this element among its siblings.
 * @return zero-based index among element siblings
 */
public int elementSiblingIndex();
```

**Usage Examples:**

```java
Element listItem = doc.selectFirst("li");

// Sibling navigation
Element nextItem = listItem.nextElementSibling();
Element prevItem = listItem.previousElementSibling();
Elements allSiblings = listItem.siblingElements();

int position = listItem.elementSiblingIndex();
System.out.println("This is list item #" + (position + 1));
```

### DOM Modification

Add, remove, and modify DOM structure.

```java { .api }
/**
 * Add a child node at the end of this element's children.
 * @param child Node to add
 * @return this Element for chaining
 */
public Element appendChild(Node child);

/**
 * Add a child node at the beginning of this element's children.
 * @param child Node to add
 * @return this Element for chaining
 */
public Element prependChild(Node child);

/**
 * Insert child nodes at the specified index.
 * @param index insertion index
 * @param children nodes to insert
 * @return this Element for chaining
 */
public Element insertChildren(int index, Collection<? extends Node> children);

/**
 * Create and append a new child element.
 * @param tagName tag name for new element
 * @return the new child Element
 */
public Element appendElement(String tagName);

/**
 * Create and prepend a new child element.
 * @param tagName tag name for new element
 * @return the new child Element
 */
public Element prependElement(String tagName);

/**
 * Add text content at the end of this element.
 * @param text text to append
 * @return this Element for chaining
 */
public Element appendText(String text);

/**
 * Add text content at the beginning of this element.
 * @param text text to prepend
 * @return this Element for chaining
 */
public Element prependText(String text);
```

**Usage Examples:**

```java
import java.util.Arrays;

Element container = doc.selectFirst("div");

// Add child elements
Element newParagraph = doc.createElement("p");
newParagraph.text("New paragraph");
container.appendChild(newParagraph);

// Create and add in one step
Element header = container.appendElement("h2");
header.text("Section Header");

// Add text content
container.appendText("Additional text");
container.prependText("Prefix text");

// Insert at specific position
Element span = doc.createElement("span");
span.text("Inserted span");
container.insertChildren(1, Arrays.asList(span));
```

### HTML Insertion

Insert HTML content relative to elements.

```java { .api }
/**
 * Parse and append HTML content to this element.
 * @param html HTML to parse and append
 * @return this Element for chaining
 */
public Element append(String html);

/**
 * Parse and prepend HTML content to this element.
 * @param html HTML to parse and prepend
 * @return this Element for chaining
 */
public Element prepend(String html);

/**
 * Parse and insert HTML before this element.
 * @param html HTML to parse and insert
 * @return this Element for chaining
 */
public Element before(String html);

/**
 * Parse and insert HTML after this element.
 * @param html HTML to parse and insert
 * @return this Element for chaining
 */
public Element after(String html);

/**
 * Wrap this element with the provided HTML.
 * @param html HTML to wrap around this element
 * @return this Element for chaining
 */
public Element wrap(String html);

/**
 * Remove this element from the DOM but keep its children.
 * @return first child that replaced this element, or null
 */
public Node unwrap();
```

**Usage Examples:**

```java
Element paragraph = doc.selectFirst("p");

// Insert HTML content
paragraph.append("<strong>Bold text</strong>");
paragraph.prepend("<em>Italic text</em>");

// Insert relative to element
paragraph.before("<hr>");
paragraph.after("<br><br>");

// Wrap element
paragraph.wrap("<div class='wrapper'></div>");

// Unwrap (remove wrapper but keep content)
Element wrapper = doc.selectFirst(".wrapper");
wrapper.unwrap();
```

### Element Removal and Clearing

Remove elements and clear content.

```java { .api }
/**
 * Remove this element from the DOM (inherited from Node).
 */
public void remove();

/**
 * Remove all child nodes from this element.
 * @return this Element for chaining
 */
public Element empty();
```

**Usage Examples:**

```java
// Remove specific elements
Elements ads = doc.select(".advertisement");
ads.remove();

// Clear element content
Element container = doc.selectFirst("#content");
container.empty(); // Removes all children but keeps the container
```

### CSS Class Operations

Manipulate CSS classes on elements.

```java { .api }
/**
 * Get the CSS class attribute value.
 * @return class attribute value
 */
public String className();

/**
 * Set the CSS class attribute from a set of class names.
 * @param classNames new class names
 * @return this Element for chaining
 */
public Element classNames(Set<String> classNames);

/**
 * Get CSS class names as a Set.
 * @return Set of class names
 */
public Set<String> classNames();

/**
 * Add a CSS class name.
 * @param className class name to add
 * @return this Element for chaining
 */
public Element addClass(String className);

/**
 * Remove a CSS class name.
 * @param className class name to remove
 * @return this Element for chaining
 */
public Element removeClass(String className);

/**
 * Toggle a CSS class name.
 * @param className class name to toggle
 * @return this Element for chaining
 */
public Element toggleClass(String className);

/**
 * Test if this element has the specified CSS class.
 * @param className class name to test
 * @return true if element has the class
 */
public boolean hasClass(String className);
```

**Usage Examples:**

```java
Element div = doc.selectFirst("div");

// Class operations
div.addClass("highlight");
div.addClass("active");
div.removeClass("hidden");
div.toggleClass("expanded");

boolean isActive = div.hasClass("active");
Set<String> classes = div.classNames();

// Set all classes at once by writing the class attribute directly
div.attr("class", "new-class another-class");
```

### Form Element Values

Work with form input values.

```java { .api }
/**
 * Get the form element value (input, textarea, select).
 * @return element value
 */
public String val();

/**
 * Set the form element value.
 * @param value new value
 * @return this Element for chaining
 */
public Element val(String value);
```

**Usage Examples:**

```java
// Input elements
Element textInput = doc.selectFirst("input[type=text]");
String currentValue = textInput.val();
textInput.val("New value");

// Textarea
Element textarea = doc.selectFirst("textarea");
textarea.val("New textarea content");

// Select elements
Element select = doc.selectFirst("select");
// Sets the value attribute on the <select> itself; to change which option is
// selected, set the "selected" attribute on the matching <option> instead
select.val("option2");
```

### Element Cloning

Create copies of elements.

```java { .api }
/**
 * Create a deep copy of this element and its descendants.
 * @return cloned Element
 */
public Element clone();

/**
 * Create a shallow copy of this element (no children).
 * @return shallow cloned Element
 */
public Element shallowClone();
```

**Usage Examples:**

```java
Element original = doc.selectFirst("div.template");

// Deep clone (includes all children)
Element fullCopy = original.clone();
fullCopy.attr("id", "copy1");

// Shallow clone (element only, no children)
Element shallowCopy = original.shallowClone();
shallowCopy.text("New content");

// Add clones to document
doc.body().appendChild(fullCopy);
doc.body().appendChild(shallowCopy);
```
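
A clone is detached from the tree and independent of the element it was copied from. The sketch below illustrates both properties; the class `CloneSketch`, the helper `template()`, and the sample markup are hypothetical names used only for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CloneSketch {
    /** Parses a hypothetical template and returns the template element. */
    static Element template() {
        Document doc = Jsoup.parse("<div class=\"template\"><span>item</span></div>");
        return doc.selectFirst("div.template");
    }

    public static void main(String[] args) {
        Element original = template();

        // Deep copy: children are copied too, and the copy has no parent yet
        Element copy = original.clone();
        copy.selectFirst("span").text("changed");
        System.out.println(original.text());       // the original is unaffected
        System.out.println(copy.parent() == null); // detached until appended

        // Shallow copy: same tag and attributes, but no child nodes
        Element shallow = original.shallowClone();
        System.out.println(shallow.className() + " / " + shallow.childNodeSize());
    }
}
```

Because the copy starts out detached, it only appears in the document once it is appended somewhere, as in the usage example above.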

## Node Base Class

All DOM objects inherit from `Node`, which provides basic tree navigation and manipulation.

```java { .api }
/**
 * Get the node name (tag name for elements, "#text" for text nodes, etc.).
 * @return node name
 */
public String nodeName();

/**
 * Get child nodes (including text nodes and elements).
 * @return List of child nodes
 */
public List<Node> childNodes();

/**
 * Get the number of child nodes.
 * @return count of child nodes
 */
public int childNodeSize();

/**
 * Get parent node.
 * @return parent Node, or null if root
 */
public Node parentNode();

/**
 * Get the document that contains this node.
 * @return owner Document
 */
public Document ownerDocument();

/**
 * Remove this node from the DOM.
 */
public void remove();

/**
 * Replace this node with another node.
 * @param in replacement node
 */
public void replaceWith(Node in);
```
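
As a rough illustration of how the `Node` view differs from the `Element` view, the sketch below walks the child nodes of a paragraph and swaps one of them for a text node. The class `NodeSketch`, the helper `sampleParagraph()`, and the sample markup are assumptions made for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class NodeSketch {
    /** Parses a hypothetical fragment and returns its paragraph. */
    static Element sampleParagraph() {
        Document doc = Jsoup.parse("<p>Hello <b>world</b>!</p>");
        return doc.selectFirst("p");
    }

    public static void main(String[] args) {
        Element p = sampleParagraph();

        // childNodes() includes text nodes; children() would list only the <b> element
        for (Node node : p.childNodes()) {
            System.out.println(node.nodeName());
        }
        System.out.println(p.childNodeSize() + " nodes, " + p.childrenSize() + " element");

        // Swap the <b> element for a plain text node
        Node bold = p.childNode(1);
        bold.replaceWith(new TextNode("there"));
        System.out.println(p.text());
    }
}
```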

Together, the `Document`, `Element`, and `Node` methods above cover programmatic HTML modification and content extraction.