# HtmlUnit

HtmlUnit is a headless web browser library for Java that models HTML documents and provides an API for programmatic web interaction. It supports form submission, link clicking, JavaScript execution, and DOM manipulation, simulating user browser behavior for automated testing and web scraping.

## Package Information

- **Package Name**: htmlunit
- **Package Type**: maven
- **Language**: Java
- **Installation**: See below for Maven and Gradle

### Maven

Add to your `pom.xml`:

```xml
<dependency>
    <groupId>org.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>4.17.0-SNAPSHOT</version>
</dependency>
```

### Gradle

Add to your `build.gradle`:

```groovy
implementation 'org.htmlunit:htmlunit:4.17.0-SNAPSHOT'
```

## Core Imports

```java
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.BrowserVersion;
```

For form handling:

```java
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlTextInput;
import org.htmlunit.html.HtmlSubmitInput;
import org.htmlunit.html.HtmlSelect;
```

For HTTP requests:

```java
import org.htmlunit.WebRequest;
import org.htmlunit.WebResponse;
import org.htmlunit.HttpMethod;
```

For JavaScript handling:

```java
import org.htmlunit.AlertHandler;
import org.htmlunit.ConfirmHandler;
import org.htmlunit.javascript.JavaScriptErrorListener;
```

For cookie management:

```java
import org.htmlunit.CookieManager;
import org.htmlunit.util.Cookie;
```

## Basic Usage

```java
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlTextInput;
import org.htmlunit.BrowserVersion;

public class BasicUsage {
    public static void main(String[] args) throws Exception {
        // Create web client; try-with-resources closes it when done
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Configure options
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // Navigate to page
            HtmlPage page = webClient.getPage("http://example.com");
            System.out.println("Page title: " + page.getTitleText());

            // Find and fill form (throws ElementNotFoundException if absent)
            HtmlForm form = page.getFormByName("loginForm");
            HtmlTextInput username = form.getInputByName("username");
            username.setValue("myuser");

            // Submit form by clicking its submit input
            HtmlPage result = form.getInputByValue("Login").click();
            System.out.println("Result: " + result.asNormalizedText());
        }
    }
}
```

## Architecture

HtmlUnit is built around several key components:

- **WebClient**: Main entry point managing browser configuration, cookie handling, and page navigation
- **Page Hierarchy**: Type-safe page representations (HtmlPage, TextPage, UnexpectedPage) with full DOM access
- **HTML Elements**: Complete DOM element model with interactive capabilities (forms, links, inputs)
- **JavaScript Engine**: Integrated Rhino-based JavaScript execution with browser API simulation
- **HTTP Layer**: Customizable HTTP connection handling with request/response processing
- **Browser Simulation**: Accurate browser version emulation including user agents and feature support

## Capabilities

### Web Client Management

Core browser functionality including client configuration, page navigation, window management, and resource cleanup. Essential for all web automation tasks.

```java { .api }
public class WebClient implements AutoCloseable {
    public WebClient();
    public WebClient(BrowserVersion browserVersion);
    public <P extends Page> P getPage(String url) throws IOException;
    public <P extends Page> P getPage(URL url) throws IOException;
    public <P extends Page> P getPage(WebRequest request) throws IOException;
    public void close();
    public WebClientOptions getOptions();
    public BrowserVersion getBrowserVersion();
}

public class WebClientOptions {
    public void setJavaScriptEnabled(boolean enabled);
    public boolean isJavaScriptEnabled();
    public void setCssEnabled(boolean enabled);
    public void setThrowExceptionOnScriptError(boolean throwException);
    public void setTimeout(int timeout);
}
```
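
A minimal sketch of the client lifecycle listed above: create, configure, load a page, close. It assumes HtmlUnit's `MockWebConnection` helper is used in place of the network so the example runs offline. The class name `WebClientSketch`, the `http://localhost/demo` URL, and the canned HTML are illustrative and not taken from the original documentation.

```java
import java.net.URL;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class WebClientSketch {

    /** Loads a canned page through a mock connection and returns its title. */
    public static String loadTitle() throws Exception {
        URL url = new URL("http://localhost/demo"); // hypothetical address

        // try-with-resources calls close(), releasing windows and threads
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getOptions().setTimeout(5_000); // milliseconds

            // Assumption: serve a fixed response instead of hitting the network
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(url,
                    "<html><head><title>Demo</title></head><body>Hello</body></html>");
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(url);
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadTitle());
    }
}
```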

[Web Client](./web-client.md)

### Page and DOM Interaction

HTML page representation and DOM manipulation capabilities including element selection, content extraction, and page structure navigation.

```java { .api }
public class HtmlPage extends SgmlPage {
    public DomElement getElementById(String id);
    public DomNodeList<HtmlElement> getElementsByTagName(String name);
    public String getTitleText();
    public String asNormalizedText();
    public List<HtmlForm> getForms();
    public List<HtmlAnchor> getAnchors();
}

public abstract class HtmlElement extends DomElement {
    // Returns the page in the current window after the click is processed
    public <P extends Page> P click() throws IOException;
    public String getAttribute(String name);
    public void setAttribute(String name, String value);
    public String getId();
    public void focus();
}
```
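
The element selection and content extraction calls above might be combined as in the following sketch. It assumes a canned page served through `MockWebConnection`; the class name `DomQuerySketch`, the element id `greeting`, and the link targets are made up for illustration.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.DomElement;
import org.htmlunit.html.HtmlAnchor;
import org.htmlunit.html.HtmlPage;

public class DomQuerySketch {

    /** Returns the heading text followed by the href of each anchor. */
    public static List<String> describePage() throws Exception {
        URL url = new URL("http://localhost/dom"); // hypothetical address
        String html = "<html><head><title>DOM</title></head><body>"
                + "<h1 id='greeting'>Hello</h1>"
                + "<a href='/a'>First</a><a href='/b'>Second</a>"
                + "</body></html>";

        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(url, html);
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(url);
            List<String> result = new ArrayList<>();

            // Look up a single element by its id attribute
            DomElement heading = page.getElementById("greeting");
            result.add(heading.getTextContent());

            // Walk every <a> element on the page
            for (HtmlAnchor anchor : page.getAnchors()) {
                result.add(anchor.getHrefAttribute());
            }
            return result;
        }
    }
}
```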

[Page and DOM](./page-dom.md)

### Form Handling

Comprehensive form interaction including input field manipulation, form submission, and all HTML form element types (text, password, checkbox, radio, select).

```java { .api }
public class HtmlForm extends HtmlElement {
    // Internal API; forms are normally submitted by clicking a submit element
    public void submit(SubmittableElement submitElement);
    public <I extends HtmlInput> I getInputByName(String name) throws ElementNotFoundException;
    public List<HtmlInput> getInputsByName(String name);
    public <I extends HtmlInput> I getInputByValue(String value) throws ElementNotFoundException;
    public HtmlTextArea getTextAreaByName(String name);
    public HtmlSelect getSelectByName(String name);
}

public abstract class HtmlInput extends HtmlElement {
    public String getValue();
    public void setValue(String value);
    public String getName();
    public String getType();
}
```
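
As a rough illustration of the form workflow, the sketch below fills a text field, clicks the submit input, and inspects what was posted. It assumes a mock connection (`MockWebConnection`) that records requests; the class name `FormSketch`, the URLs, and the form markup are hypothetical.

```java
import java.net.URL;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlSubmitInput;
import org.htmlunit.html.HtmlTextInput;
import org.htmlunit.util.NameValuePair;

public class FormSketch {

    /** Submits a canned login form; returns "<result title> <posted username>". */
    public static String submitLogin() throws Exception {
        URL formUrl = new URL("http://localhost/form");   // hypothetical
        URL loginUrl = new URL("http://localhost/login"); // hypothetical
        String formHtml = "<html><body>"
                + "<form name='loginForm' action='/login' method='post'>"
                + "<input type='text' name='username'>"
                + "<input type='submit' value='Login'>"
                + "</form></body></html>";

        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(formUrl, formHtml);
            connection.setResponse(loginUrl,
                    "<html><head><title>Welcome</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(formUrl);
            HtmlForm form = page.getFormByName("loginForm");

            HtmlTextInput username = form.getInputByName("username");
            username.setValue("myuser");

            // Clicking the submit input triggers the form submission
            HtmlSubmitInput submit = form.getInputByValue("Login");
            HtmlPage result = submit.click();

            // The mock connection keeps the last request it received
            String posted = "";
            for (NameValuePair pair : connection.getLastWebRequest().getRequestParameters()) {
                if ("username".equals(pair.getName())) {
                    posted = pair.getValue();
                }
            }
            return result.getTitleText() + " " + posted;
        }
    }
}
```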

[Form Handling](./forms.md)

### HTTP Request and Response

HTTP communication layer providing request customization, response processing, header management, and connection configuration.

```java { .api }
public class WebRequest {
    public WebRequest(URL url);
    public WebRequest(URL url, HttpMethod method);
    public URL getUrl();
    public HttpMethod getHttpMethod();
    public void setRequestBody(String body);
    public void setAdditionalHeader(String name, String value);
    public List<NameValuePair> getRequestParameters();
}

public class WebResponse {
    public int getStatusCode();
    public String getStatusMessage();
    public String getContentAsString();
    public String getContentType();
    public List<NameValuePair> getResponseHeaders();
}
```
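
One way to use `WebRequest` and `WebResponse` together is sketched below: build a POST with a custom header and body, send it through the client, and read the raw response. The endpoint `http://localhost/api/items`, the JSON payloads, and the class name `HttpSketch` are assumptions for illustration, and the response comes from a `MockWebConnection` rather than a real server.

```java
import java.net.URL;

import org.htmlunit.HttpMethod;
import org.htmlunit.MockWebConnection;
import org.htmlunit.Page;
import org.htmlunit.WebClient;
import org.htmlunit.WebRequest;
import org.htmlunit.WebResponse;

public class HttpSketch {

    /** Sends a POST and returns "<status code> <response body>". */
    public static String postJson() throws Exception {
        URL url = new URL("http://localhost/api/items"); // hypothetical endpoint

        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(url, "{\"ok\":true}", "application/json");
            webClient.setWebConnection(connection);

            // Customize the request before sending it
            WebRequest request = new WebRequest(url, HttpMethod.POST);
            request.setAdditionalHeader("Accept", "application/json");
            request.setRequestBody("{\"name\":\"widget\"}");

            // Use the generic Page type; the concrete page class depends
            // on the content type of the response
            Page page = webClient.getPage(request);
            WebResponse response = page.getWebResponse();

            return response.getStatusCode() + " " + response.getContentAsString();
        }
    }
}
```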

[HTTP Handling](./http.md)

### JavaScript Integration

JavaScript engine configuration and event handling including script execution control, error handling, and browser API simulation.

```java { .api }
public interface AlertHandler {
    void handleAlert(Page page, String message);
}

public interface ConfirmHandler {
    boolean handleConfirm(Page page, String message);
}

public interface JavaScriptErrorListener {
    void scriptException(HtmlPage page, ScriptException scriptException);
    void timeoutError(HtmlPage page, long allowedTime, long executionTime);
}
```
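
To show how a handler plugs in, here is a small sketch that records `alert()` messages raised by page scripts. It assumes the handler is registered with `WebClient.setAlertHandler` and that the page comes from a `MockWebConnection`; the class name `AlertSketch` and the script content are illustrative only.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;

public class AlertSketch {

    /** Loads a page whose script calls alert() and returns the captured messages. */
    public static List<String> collectAlerts() throws Exception {
        URL url = new URL("http://localhost/alert"); // hypothetical address
        List<String> alerts = new ArrayList<>();

        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(url,
                    "<html><body><script>alert('hello from script');</script></body></html>");
            webClient.setWebConnection(connection);

            // AlertHandler has a single method, so a lambda is enough
            webClient.setAlertHandler((page, message) -> alerts.add(message));

            // Scripts run while the page loads, which fires the handler
            webClient.getPage(url);
        }
        return alerts;
    }
}
```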

[JavaScript](./javascript.md)

### Cookie Management

Cookie handling and session management including cookie creation, retrieval, and automatic cookie processing for session maintenance.

```java { .api }
public class CookieManager {
    public void setCookiesEnabled(boolean enabled);
    public boolean isCookiesEnabled();
    public Set<Cookie> getCookies();
    public void addCookie(Cookie cookie);
    public void removeCookie(Cookie cookie);
    // Returns true if at least one expired cookie was removed
    public boolean clearExpired(Date date);
}

public class Cookie {
    public Cookie(String domain, String name, String value);
    public String getName();
    public String getValue();
    public String getDomain();
    public String getPath();
    public Date getExpires();
    public boolean isSecure();
    public boolean isHttpOnly();
}
```
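
The following sketch walks through adding, reading, and removing a cookie on the client's `CookieManager`. It uses `getCookie(String)`, a lookup that is not in the listing above, and the class name `CookieSketch` and the cookie values are hypothetical.

```java
import org.htmlunit.CookieManager;
import org.htmlunit.WebClient;
import org.htmlunit.util.Cookie;

public class CookieSketch {

    /** Returns "<stored value> <cookie count after removal>". */
    public static String roundTrip() {
        try (WebClient webClient = new WebClient()) {
            CookieManager cookies = webClient.getCookieManager();

            // Seed a session cookie before any request is made
            cookies.addCookie(new Cookie("localhost", "session", "abc123"));

            // Look the cookie up again by name
            Cookie stored = cookies.getCookie("session");
            String value = stored.getValue();

            // Remove it and confirm the store is empty again
            cookies.removeCookie(stored);
            return value + " " + cookies.getCookies().size();
        }
    }
}
```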

[Cookie Management](./cookies.md)

### Window Management

Browser window and frame management including multiple window handling, window navigation, and frame interactions.

```java { .api }
public interface WebWindow {
    public Page getEnclosedPage();
    public void setEnclosedPage(Page page);
    public String getName();
    public WebWindow getParentWindow();
    public WebWindow getTopWindow();
    public WebClient getWebClient();
}

public class TopLevelWindow implements WebWindow {
    // Main browser windows
}

public class FrameWindow implements WebWindow {
    // Frame and iframe windows
}
```
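
A sketch of frame navigation under some assumptions: the outer page embeds one `<iframe>`, both documents are served by a `MockWebConnection`, and `HtmlPage.getFrames()` is used to reach the frame window. The class name `FrameSketch`, the URLs, and the markup are illustrative.

```java
import java.net.URL;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.FrameWindow;
import org.htmlunit.html.HtmlPage;

public class FrameSketch {

    /** Returns "<inner page title> <true if the frame hangs off the outer window>". */
    public static String innerTitle() throws Exception {
        URL outerUrl = new URL("http://localhost/outer.html"); // hypothetical
        URL innerUrl = new URL("http://localhost/inner.html"); // hypothetical

        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(outerUrl,
                    "<html><head><title>Outer</title></head><body>"
                    + "<iframe name='inner' src='inner.html'></iframe>"
                    + "</body></html>");
            connection.setResponse(innerUrl,
                    "<html><head><title>Inner</title></head><body>Framed</body></html>");
            webClient.setWebConnection(connection);

            HtmlPage outer = webClient.getPage(outerUrl);

            // Each frame or iframe gets its own WebWindow
            FrameWindow frame = outer.getFrames().get(0);
            HtmlPage inner = (HtmlPage) frame.getEnclosedPage();

            // The frame's parent and top window are the window holding the outer page
            boolean nested = frame.getParentWindow() == outer.getEnclosingWindow()
                    && frame.getTopWindow() == outer.getEnclosingWindow();

            return inner.getTitleText() + " " + nested;
        }
    }
}
```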

[Window Management](./windows.md)

### Exception Handling

Error handling and exception management for HTTP errors, JavaScript errors, and element access failures.

```java { .api }
public class FailingHttpStatusCodeException extends RuntimeException {
    public int getStatusCode();
    public String getStatusMessage();
    public WebResponse getResponse();
}

public class ElementNotFoundException extends RuntimeException {
    // Thrown when elements cannot be found
}

public class ScriptException extends RuntimeException {
    // JavaScript execution errors
}
```
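
As an illustration of HTTP error handling, the sketch below catches `FailingHttpStatusCodeException` for a 404 response. It relies on the client's default behavior of throwing on failing status codes and on a canned 404 from `MockWebConnection`; the class name `ErrorSketch` and the URL are hypothetical.

```java
import java.net.URL;
import java.util.Collections;

import org.htmlunit.FailingHttpStatusCodeException;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;

public class ErrorSketch {

    /** Requests a page that answers 404 and returns the reported status code. */
    public static int failingStatus() throws Exception {
        URL url = new URL("http://localhost/missing"); // hypothetical address

        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(url, "<html><body>Not here</body></html>",
                    404, "Not Found", "text/html", Collections.emptyList());
            webClient.setWebConnection(connection);

            try {
                webClient.getPage(url);
                return -1; // not reached while failing status codes throw
            } catch (FailingHttpStatusCodeException e) {
                // The exception carries the status and the full response
                return e.getStatusCode();
            }
        }
    }
}
```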
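
[Exception Handling](./exceptions.md)

## Types

```java { .api }
// BrowserVersion is a final class with predefined constants, not an enum
public final class BrowserVersion implements Serializable {
    public static final BrowserVersion CHROME;
    public static final BrowserVersion FIREFOX;
    public static final BrowserVersion FIREFOX_ESR;
    public static final BrowserVersion EDGE;
    public static final BrowserVersion BEST_SUPPORTED;

    public boolean isChrome();
    public boolean isFirefox();
    public String getUserAgent();
}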

public enum HttpMethod {
    GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE, PATCH
}

public interface Page {
    void initialize() throws IOException;
    void cleanUp();
    WebResponse getWebResponse();
    URL getUrl();
    boolean isHtmlPage();
}

public class NameValuePair {
    public NameValuePair(String name, String value);
    public String getName();
    public String getValue();
}

public interface DomNodeList<T extends DomNode> extends List<T> {
    // Specialized list interface for DOM nodes
    // Implements all List methods for accessing DOM elements
}

public interface WebWindow {
    Page getEnclosedPage();
    void setEnclosedPage(Page page);
    String getName();
    WebWindow getParentWindow();
    WebWindow getTopWindow();
    WebClient getWebClient();
}

public class Cookie {
    public Cookie(String domain, String name, String value);
    public String getName();
    public String getValue();
    public String getDomain();
    public String getPath();
    public Date getExpires();
    public boolean isSecure();
    public boolean isHttpOnly();
}

public class CookieManager {
    public void setCookiesEnabled(boolean enabled);
    public boolean isCookiesEnabled();
    public Set<Cookie> getCookies();
    public void addCookie(Cookie cookie);
    public void removeCookie(Cookie cookie);
    // Returns true if at least one expired cookie was removed
    public boolean clearExpired(Date date);
}

public class FailingHttpStatusCodeException extends RuntimeException {
    public int getStatusCode();
    public String getStatusMessage();
    public WebResponse getResponse();
}

public class ScriptException extends RuntimeException {
    // JavaScript execution errors with detailed error information
}
```