tessl/maven-org-htmlunit--htmlunit

A headless browser for Java programs that provides web automation, form handling, JavaScript execution, and DOM manipulation capabilities.

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: `mavenpkg:maven/org.htmlunit/htmlunit@4.17.x`

To install, run: `npx @tessl/cli install tessl/maven-org-htmlunit--htmlunit@4.17.0`

# HtmlUnit

HtmlUnit is a headless web browser library for Java that models HTML documents and provides an API for programmatic web interaction. It can submit forms, click links, execute JavaScript, and manipulate the DOM, simulating a user's browser session for automated testing and web scraping.

## Package Information

- **Package Name**: `org.htmlunit:htmlunit`
- **Package Type**: maven
- **Language**: Java
- **Installation**: See the Maven and Gradle snippets below

### Maven

Add to your `pom.xml`:

```xml
<dependency>
    <groupId>org.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>4.17.0</version>
</dependency>
```

### Gradle

Add to your `build.gradle`:

```groovy
implementation 'org.htmlunit:htmlunit:4.17.0'
```

## Core Imports

```java
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.BrowserVersion;
```

For form handling:

```java
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlTextInput;
import org.htmlunit.html.HtmlSubmitInput;
import org.htmlunit.html.HtmlSelect;
```

For HTTP requests:

```java
import org.htmlunit.WebRequest;
import org.htmlunit.WebResponse;
import org.htmlunit.HttpMethod;
```

For JavaScript handling:

```java
import org.htmlunit.AlertHandler;
import org.htmlunit.ConfirmHandler;
import org.htmlunit.JavaScriptErrorListener;
```

For cookie management:

```java
import org.htmlunit.CookieManager;
import org.htmlunit.util.Cookie;
```

## Basic Usage

```java
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlTextInput;

public class BasicUsage {
    public static void main(String[] args) throws IOException {
        // Create a web client emulating Chrome; try-with-resources closes it
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Configure options
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            // Navigate to page
            HtmlPage page = webClient.getPage("http://example.com");
            System.out.println("Page title: " + page.getTitleText());

            // Find and fill the form (assumes the page has a form named "loginForm")
            HtmlForm form = page.getFormByName("loginForm");
            HtmlTextInput username = form.getInputByName("username");
            username.setValue("myuser");

            // Submit by clicking the input whose value is "Login"
            HtmlPage result = form.getInputByValue("Login").click();
            System.out.println("Result: " + result.asNormalizedText());
        }
    }
}
```

## Architecture

HtmlUnit is built around several key components:

- **WebClient**: Main entry point; manages browser configuration, cookies, and page navigation
- **Page Hierarchy**: Typed page representations (HtmlPage, TextPage, UnexpectedPage); HTML pages expose the full DOM
- **HTML Elements**: DOM element model with interactive capabilities (forms, links, inputs)
- **JavaScript Engine**: Embedded Rhino-based JavaScript execution with simulated browser APIs
- **HTTP Layer**: Customizable HTTP connection handling with request/response processing
- **Browser Simulation**: Emulation of specific browser versions, including user agent strings and feature support

## Capabilities

### Web Client Management

Core browser functionality: client configuration, page navigation, window management, and resource cleanup. Every automation task starts from a `WebClient`.

```java { .api }
public class WebClient implements AutoCloseable {
    public WebClient();
    public WebClient(BrowserVersion browserVersion);
    public <P extends Page> P getPage(String url) throws IOException;
    public <P extends Page> P getPage(URL url) throws IOException;
    public <P extends Page> P getPage(WebRequest request) throws IOException;
    public void close();
    public WebClientOptions getOptions();
    public BrowserVersion getBrowserVersion();
}

public class WebClientOptions {
    public void setJavaScriptEnabled(boolean enabled);
    public boolean isJavaScriptEnabled();
    public void setCssEnabled(boolean enabled);
    public void setThrowExceptionOnScriptError(boolean throwException);
    public void setTimeout(int timeout); // milliseconds
}
```

[Web Client](./web-client.md)
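
The sketch below walks through a client's lifecycle under two assumptions: it uses HtmlUnit's `MockWebConnection` to serve canned HTML so it runs offline, and `http://example.test/` and the class name `WebClientSketch` are placeholders. Treat it as an illustration, not the canonical setup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class WebClientSketch {

    // Hypothetical URL; MockWebConnection answers it locally
    static final String HOME = "http://example.test/";

    public static String loadTitle() throws IOException {
        // try-with-resources calls close(), which closes all windows of this client
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setTimeout(10_000); // connection timeout in milliseconds

            // Serve a fixed HTML document instead of going to the network
            MockWebConnection connection = new MockWebConnection();
            URL home = URI.create(HOME).toURL();
            connection.setResponse(home,
                    "<html><head><title>Mock Home</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(home);
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(loadTitle()); // prints "Mock Home"
    }
}
```

Against a live site, drop the `MockWebConnection` lines and pass the real URL to `getPage`; try-with-resources guarantees `close()` runs even if navigation throws.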

### Page and DOM Interaction

HTML page representation and DOM manipulation: element selection, content extraction, and navigation of the page structure.

```java { .api }
public class HtmlPage extends SgmlPage {
    public DomElement getElementById(String id);
    public DomNodeList<HtmlElement> getElementsByTagName(String name);
    public String getTitleText();
    public String asNormalizedText();
    public List<HtmlForm> getForms();
    public List<HtmlAnchor> getAnchors();
}

public abstract class HtmlElement extends DomElement {
    public <P extends Page> P click() throws IOException; // returns the page loaded after the click
    public String getAttribute(String name);
    public void setAttribute(String name, String value);
    public String getId();
    public void focus();
}
```

[Page and DOM](./page-dom.md)
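
A minimal sketch of the lookups listed above, run against a small hypothetical document served by `MockWebConnection`; the markup, the URL, and the `DomSketch` name are illustrative assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.DomElement;
import org.htmlunit.html.HtmlAnchor;
import org.htmlunit.html.HtmlPage;

public class DomSketch {

    // Hypothetical document used only for this sketch
    static final String HTML =
            "<html><head><title>Links</title></head><body>"
            + "<p id='intro'>Hello <b>DOM</b></p>"
            + "<a href='/a'>First</a><a href='/b'>Second</a>"
            + "</body></html>";

    public static List<String> inspect() throws IOException {
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            URL url = URI.create("http://example.test/").toURL();
            connection.setResponse(url, HTML);
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(url);
            List<String> out = new ArrayList<>();

            // Lookup by id; the element's text is normalized like a browser renders it
            DomElement intro = page.getElementById("intro");
            out.add(intro.asNormalizedText());

            // All anchors in document order, reading the raw href attribute
            for (HtmlAnchor anchor : page.getAnchors()) {
                out.add(anchor.getHrefAttribute());
            }

            // Tag-name lookup returns a DomNodeList, which is a java.util.List
            out.add(String.valueOf(page.getElementsByTagName("a").size()));
            return out;
        }
    }
}
```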

### Form Handling

Form interaction covering input field manipulation, form submission, and the standard HTML form controls (text, password, checkbox, radio, select).

```java { .api }
public class HtmlForm extends HtmlElement {
    public <P extends Page> P submit() throws IOException;
    public <P extends Page> P submit(SubmittableElement submitElement) throws IOException;
    public <I extends HtmlInput> I getInputByName(String name);        // target type picks the subclass
    public <I extends HtmlInput> List<I> getInputsByName(String name);
    public HtmlTextArea getTextAreaByName(String name);
    public HtmlSelect getSelectByName(String name);
}

public abstract class HtmlInput extends HtmlElement {
    public String getValue();
    public void setValue(String value);
    public String getName();
    public String getType();
}
```

[Form Handling](./forms.md)
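
The following sketch fills a hypothetical sign-up form (text field, checkbox, select) and submits it by clicking its submit button. `MockWebConnection` stands in for the server so the submitted parameters can be inspected; the form markup, field names, and `FormSketch` are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.WebRequest;
import org.htmlunit.html.HtmlCheckBoxInput;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlSelect;
import org.htmlunit.html.HtmlSubmitInput;
import org.htmlunit.html.HtmlTextInput;
import org.htmlunit.util.NameValuePair;

public class FormSketch {

    // Hypothetical sign-up form used only for this sketch
    static final String HTML =
            "<html><body><form name='signup' action='/submit' method='post'>"
            + "<input type='text' name='user'>"
            + "<input type='checkbox' name='news' value='yes'>"
            + "<select name='plan'><option value='free'>Free</option>"
            + "<option value='pro'>Pro</option></select>"
            + "<input type='submit' name='go' value='Sign up'>"
            + "</form></body></html>";

    /** Fills the form, clicks submit, and reports what the mock server received. */
    public static List<String> submit() throws IOException {
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            URL url = URI.create("http://example.test/").toURL();
            connection.setResponse(url, HTML);
            // Any other URL (here: POST /submit) gets this default response
            connection.setDefaultResponse("<html><body>Thanks</body></html>");
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(url);
            HtmlForm form = page.getFormByName("signup");

            // getInputByName is generic, so the declared type selects the input subclass
            HtmlTextInput user = form.getInputByName("user");
            user.setValue("alice");
            HtmlCheckBoxInput news = form.getInputByName("news");
            news.setChecked(true);
            HtmlSelect plan = form.getSelectByName("plan");
            plan.setSelectedAttribute("pro", true);

            // Click the submit button as a user would; returns the resulting page
            HtmlSubmitInput go = form.getInputByName("go");
            HtmlPage result = go.click();

            WebRequest sent = connection.getLastWebRequest();
            List<String> out = new ArrayList<>();
            out.add(sent.getHttpMethod().name());
            for (NameValuePair pair : sent.getRequestParameters()) {
                out.add(pair.getName() + "=" + pair.getValue());
            }
            out.add("response:" + result.asNormalizedText());
            return out;
        }
    }
}
```

Clicking the submit input, rather than submitting the form directly, also fires any `onclick` handler and includes the button's own name/value pair in the request, as a browser would.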

### HTTP Request and Response

HTTP communication layer providing request customization, response processing, header management, and connection configuration.

```java { .api }
public class WebRequest {
    public WebRequest(URL url);
    public WebRequest(URL url, HttpMethod method);
    public URL getUrl();
    public HttpMethod getHttpMethod();
    public void setRequestBody(String body);
    public void setAdditionalHeader(String name, String value);
    public List<NameValuePair> getRequestParameters();
}

public class WebResponse {
    public int getStatusCode();
    public String getStatusMessage();
    public String getContentAsString();
    public String getContentType();
    public List<NameValuePair> getResponseHeaders();
}
```

[HTTP Handling](./http.md)
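
A sketch of a hand-built request, assuming a hypothetical JSON endpoint at `http://example.test/api/items` that `MockWebConnection` answers with `201 Created`; the header names, payloads, and `HttpSketch` are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.Collections;

import org.htmlunit.HttpMethod;
import org.htmlunit.MockWebConnection;
import org.htmlunit.Page;
import org.htmlunit.WebClient;
import org.htmlunit.WebRequest;
import org.htmlunit.WebResponse;

public class HttpSketch {

    /** Returns {status code, status message, response body, request body seen by the mock server}. */
    public static String[] postJson() throws IOException {
        try (WebClient webClient = new WebClient()) {
            // Mock endpoint: answers with 201 Created and a small JSON body
            MockWebConnection connection = new MockWebConnection();
            URL api = URI.create("http://example.test/api/items").toURL();
            connection.setResponse(api, "{\"created\":true}", 201, "Created",
                    "application/json", Collections.emptyList());
            webClient.setWebConnection(connection);

            // Build a POST with a raw body and custom headers
            WebRequest request = new WebRequest(api, HttpMethod.POST);
            request.setAdditionalHeader("Content-Type", "application/json");
            request.setAdditionalHeader("X-Request-Id", "demo-1");
            request.setRequestBody("{\"name\":\"widget\"}");

            // The Page subtype depends on the content type; the WebResponse is always available
            Page page = webClient.getPage(request);
            WebResponse response = page.getWebResponse();

            return new String[] {
                String.valueOf(response.getStatusCode()),
                response.getStatusMessage(),
                response.getContentAsString(),
                connection.getLastWebRequest().getRequestBody()
            };
        }
    }
}
```

Reading everything inside the `try` block keeps the sketch independent of what happens to response content once the client is closed.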

### JavaScript Integration

JavaScript engine configuration and event handling: script execution control, handlers for `alert()` and `confirm()` dialogs, script error reporting, and browser API simulation.

```java { .api }
public interface AlertHandler {
    void handleAlert(Page page, String message);
}

public interface ConfirmHandler {
    boolean handleConfirm(Page page, String message);
}

public interface JavaScriptErrorListener {
    void scriptException(HtmlPage page, ScriptException scriptException);
    void timeoutError(HtmlPage page, long allowedTime, long executionTime);
    // further callbacks: malformedScriptURL, loadScriptError, warn
}
```

[JavaScript](./javascript.md)
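
`AlertHandler` and `ConfirmHandler` each declare a single method, so they can be registered as lambdas. This sketch, built on a hypothetical page served by `MockWebConnection`, records an `alert()` and answers a `confirm()` with `false`, as if the user clicked Cancel; `ScriptHandlersSketch` and the inline script are illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class ScriptHandlersSketch {

    public static List<String> run() throws IOException {
        List<String> events = new ArrayList<>();
        try (WebClient webClient = new WebClient()) {
            // Record alert() calls instead of showing a dialog
            webClient.setAlertHandler((source, message) -> events.add("alert:" + message));
            // Answer every confirm() with false, i.e. "Cancel"
            webClient.setConfirmHandler((source, message) -> {
                events.add("confirm:" + message);
                return false;
            });
            // Don't abort page loading with a ScriptException if a script fails
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            MockWebConnection connection = new MockWebConnection();
            URL url = URI.create("http://example.test/").toURL();
            connection.setResponse(url, "<html><head><title>before</title></head><body><script>"
                    + "alert('hello');"
                    + "document.title = confirm('Delete?') ? 'deleted' : 'kept';"
                    + "</script></body></html>");
            webClient.setWebConnection(connection);

            // The inline script runs while the page loads
            HtmlPage page = webClient.getPage(url);
            events.add("title:" + page.getTitleText());
        }
        return events;
    }
}
```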

### Cookie Management

Cookie handling and session management including cookie creation, retrieval, and automatic cookie processing for session maintenance.

```java { .api }
public class CookieManager {
    public void setCookiesEnabled(boolean enabled);
    public boolean isCookiesEnabled();
    public Set<Cookie> getCookies();
    public void addCookie(Cookie cookie);
    public void removeCookie(Cookie cookie);
    public void clearExpired(Date date);
}

public class Cookie {
    public Cookie(String domain, String name, String value);
    public String getName();
    public String getValue();
    public String getDomain();
    public String getPath();
    public Date getExpires();
    public boolean isSecure();
    public boolean isHttpOnly();
}
```

[Cookie Management](./cookies.md)
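
A minimal sketch of manual cookie management through the client's `CookieManager`, obtained with `WebClient.getCookieManager()`; the domain, cookie name, value, and `CookieSketch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.CookieManager;
import org.htmlunit.WebClient;
import org.htmlunit.util.Cookie;

public class CookieSketch {

    public static List<String> run() {
        List<String> out = new ArrayList<>();
        try (WebClient webClient = new WebClient()) {
            CookieManager cookies = webClient.getCookieManager();
            cookies.setCookiesEnabled(true); // already the default; shown for clarity

            // Hypothetical session cookie for a hypothetical domain
            Cookie session = new Cookie("example.test", "session", "abc123");
            cookies.addCookie(session);

            for (Cookie cookie : cookies.getCookies()) {
                out.add(cookie.getName() + "=" + cookie.getValue());
            }

            // Drop the cookie again, e.g. to simulate a logout
            cookies.removeCookie(session);
            out.add("remaining=" + cookies.getCookies().size());
        }
        return out;
    }
}
```

Cookies that servers set through `Set-Cookie` headers end up in the same manager, so the same calls can inspect a session established by a real login.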

### Window Management

Browser window and frame management including multiple window handling, window navigation, and frame interactions.

```java { .api }
public interface WebWindow {
    public Page getEnclosedPage();
    public void setEnclosedPage(Page page);
    public String getName();
    public WebWindow getParentWindow();
    public WebWindow getTopWindow();
    public WebClient getWebClient();
}

public class TopLevelWindow implements WebWindow {
    // Main browser windows
}

public class FrameWindow implements WebWindow {
    // Frame and iframe windows
}
```

[Window Management](./windows.md)
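
A sketch of frame navigation, assuming a hypothetical outer page with one named `<iframe>`; both documents are served by `MockWebConnection`, and `FrameSketch` is an illustrative name. It uses `HtmlPage.getFrames()` to reach each `FrameWindow` and checks that the outer page lives in a `TopLevelWindow` that is also each frame's parent.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.MockWebConnection;
import org.htmlunit.TopLevelWindow;
import org.htmlunit.WebClient;
import org.htmlunit.WebWindow;
import org.htmlunit.html.FrameWindow;
import org.htmlunit.html.HtmlPage;

public class FrameSketch {

    public static List<String> inspect() throws IOException {
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            URL outer = URI.create("http://example.test/").toURL();
            URL inner = URI.create("http://example.test/inner.html").toURL();
            connection.setResponse(outer, "<html><head><title>Outer</title></head><body>"
                    + "<iframe name='details' src='inner.html'></iframe></body></html>");
            connection.setResponse(inner,
                    "<html><head><title>Inner</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(outer);
            WebWindow top = page.getEnclosingWindow();

            List<String> out = new ArrayList<>();
            out.add("top-level:" + (top instanceof TopLevelWindow));

            // Each iframe gets its own FrameWindow holding the framed page
            for (FrameWindow frame : page.getFrames()) {
                HtmlPage framePage = (HtmlPage) frame.getEnclosedPage();
                out.add(frame.getName() + ":" + framePage.getTitleText());
                out.add("parent-is-top:" + (frame.getParentWindow() == top));
            }
            return out;
        }
    }
}
```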

### Exception Handling

Error handling and exception management for HTTP errors, JavaScript errors, and element access failures.

```java { .api }
public class FailingHttpStatusCodeException extends RuntimeException {
    public int getStatusCode();
    public String getStatusMessage();
    public WebResponse getResponse();
}

public class ElementNotFoundException extends RuntimeException {
    // Thrown when elements cannot be found
}

public class ScriptException extends RuntimeException {
    // JavaScript execution errors
}
```

[Exception Handling](./exceptions.md)
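
A sketch of the two most common failure paths, using `MockWebConnection` to fake a `404` for the hypothetical URL `http://example.test/missing`. `getPage` throws `FailingHttpStatusCodeException` while `throwExceptionOnFailingStatusCode` is enabled (the default), and `getHtmlElementById` throws `ElementNotFoundException` where `getElementById` returns `null`. `ErrorSketch` and the markup are illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.htmlunit.ElementNotFoundException;
import org.htmlunit.FailingHttpStatusCodeException;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class ErrorSketch {

    public static List<String> run() throws IOException {
        List<String> out = new ArrayList<>();
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            URL home = URI.create("http://example.test/").toURL();
            URL missing = URI.create("http://example.test/missing").toURL();
            connection.setResponse(home, "<html><body><p id='present'>here</p></body></html>");
            connection.setResponse(missing, "<html><body>gone</body></html>", 404, "Not Found",
                    "text/html", Collections.emptyList());
            webClient.setWebConnection(connection);

            // true is the default: a 404 like this one raises an exception
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
            try {
                webClient.getPage(missing);
            } catch (FailingHttpStatusCodeException e) {
                out.add(e.getStatusCode() + " " + e.getStatusMessage());
            }

            HtmlPage page = webClient.getPage(home);
            // getElementById signals absence with null ...
            out.add("null-lookup:" + (page.getElementById("absent") == null));
            // ... while getHtmlElementById throws instead
            try {
                page.getHtmlElementById("absent");
            } catch (ElementNotFoundException e) {
                out.add("element-not-found");
            }
        }
        return out;
    }
}
```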

## Types

```java { .api }
// BrowserVersion is a class exposing predefined instances, not an enum
public class BrowserVersion {
    public static final BrowserVersion CHROME;
    public static final BrowserVersion FIREFOX;
    public static final BrowserVersion FIREFOX_ESR;
    public static final BrowserVersion EDGE;
    public static final BrowserVersion BEST_SUPPORTED;

    public boolean isChrome();
    public boolean isFirefox();
    public String getUserAgent();
}

public enum HttpMethod {
    GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE, PATCH
}

public interface Page {
    void initialize();
    void cleanUp();
    WebResponse getWebResponse();
    URL getUrl();
    boolean isHtmlPage();
}

public class NameValuePair {
    public NameValuePair(String name, String value);
    public String getName();
    public String getValue();
}

public interface DomNodeList<T extends DomNode> extends List<T> {
    // List of DOM nodes; supports all java.util.List operations
}
```
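
A short sketch exercising the types above: the predefined `BrowserVersion` constants, `HttpMethod`, and runtime dispatch on the returned `Page`. It assumes a hypothetical `text/plain` resource served by `MockWebConnection`, which HtmlUnit wraps in a `TextPage` (`org.htmlunit.TextPage`) rather than an `HtmlPage`; `TypesSketch` is an illustrative name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.BrowserVersion;
import org.htmlunit.HttpMethod;
import org.htmlunit.MockWebConnection;
import org.htmlunit.Page;
import org.htmlunit.TextPage;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class TypesSketch {

    public static List<String> run() throws IOException {
        List<String> out = new ArrayList<>();

        // Predefined BrowserVersion instances can be queried for family and user agent
        out.add("firefox:" + BrowserVersion.FIREFOX.isFirefox());
        out.add("chrome-ua:" + BrowserVersion.CHROME.getUserAgent().contains("Chrome"));
        out.add("method:" + HttpMethod.valueOf("PATCH"));

        try (WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
            MockWebConnection connection = new MockWebConnection();
            URL notes = URI.create("http://example.test/notes.txt").toURL();
            connection.setResponse(notes, "plain notes", "text/plain");
            webClient.setWebConnection(connection);

            // Declared as Page; the runtime type depends on the response content type
            Page page = webClient.getPage(notes);
            out.add("html:" + page.isHtmlPage());
            if (page instanceof HtmlPage) {
                out.add("title:" + ((HtmlPage) page).getTitleText());
            } else if (page instanceof TextPage) {
                out.add("text:" + ((TextPage) page).getContent());
            }
        }
        return out;
    }
}
```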