tessl/maven-net-sourceforge-htmlunit--htmlunit

A headless browser intended for use in testing web-based applications.

Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/net.sourceforge.htmlunit/htmlunit@2.70.x

To install, run

npx @tessl/cli install tessl/maven-net-sourceforge-htmlunit--htmlunit@2.70.0

# HtmlUnit

HtmlUnit is a headless browser library for Java, used for automated testing and scraping of web applications. It is a pure Java browser implementation that supports HTML, CSS, and JavaScript (executed by a Rhino-based engine), along with form submission, cookie management, SSL certificate handling, and proxy configuration.

## Package Information

- **Package Name**: htmlunit
- **Package Type**: maven
- **Language**: Java
- **GroupId**: net.sourceforge.htmlunit
- **ArtifactId**: htmlunit
- **Installation**: Add to `pom.xml`: `<dependency><groupId>net.sourceforge.htmlunit</groupId><artifactId>htmlunit</artifactId><version>2.70.0</version></dependency>`

## Core Imports

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
```

## Basic Usage

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;

// Create a web client instance
try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    // Configure options
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setCssEnabled(false);

    // Load a web page
    HtmlPage page = webClient.getPage("http://example.com");

    // Find and interact with form elements
    // (these lookups throw ElementNotFoundException when nothing matches)
    HtmlForm form = page.getFormByName("loginForm");
    HtmlTextInput usernameField = form.getInputByName("username");
    usernameField.setValueAttribute("user123");

    // Locate the submit button by its type attribute
    HtmlSubmitInput submitButton = form.getOneHtmlElementByAttribute("input", "type", "submit");
    HtmlPage resultPage = submitButton.click();

    // Extract page content
    String pageTitle = resultPage.getTitleText();
    String pageText = resultPage.asNormalizedText();
} // WebClient implements AutoCloseable
```

## Architecture

HtmlUnit is built around several key components:

- **WebClient**: Main browser automation class that manages windows, connections, and global settings
- **Page Hierarchy**: Different page types (HtmlPage, TextPage, XmlPage) for different content types
- **DOM Tree**: Full DOM implementation with element manipulation and CSS selector support
- **JavaScript Engine**: Mozilla Rhino-based JavaScript execution with browser API simulation
- **HTTP Layer**: Configurable HTTP client with cookie management, authentication, and proxy support
- **Browser Simulation**: Accurate simulation of Chrome, Firefox, IE, and Edge browser behaviors
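
To make the page hierarchy concrete, here is a minimal sketch that loads three resources and reports which `Page` subtype `WebClient` produced for each. It assumes HtmlUnit 2.70.0 is on the classpath and uses `MockWebConnection` to serve canned responses, so nothing touches the network. The `localhost` URLs and the class name `PageTypesSketch` are hypothetical.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;

import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class PageTypesSketch {

    /** Loads three mocked resources and returns the simple class name of each resulting page. */
    public static List<String> pageTypes() throws Exception {
        // Hypothetical URLs; MockWebConnection answers them without network access.
        URL html = new URL("http://localhost/index.html");
        URL text = new URL("http://localhost/notes.txt");
        URL xml = new URL("http://localhost/data.xml");

        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(html,
                "<html><head><title>Demo</title></head><body></body></html>", "text/html");
        connection.setResponse(text, "plain text", "text/plain");
        connection.setResponse(xml, "<root><item/></root>", "text/xml");

        List<String> types = new ArrayList<>();
        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);
            for (URL url : new URL[] {html, text, xml}) {
                // The content type of the response decides which Page subtype is created.
                Page page = webClient.getPage(url);
                types.add(page.getClass().getSimpleName());
            }
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pageTypes());
    }
}
```

In real use the mock connection would be omitted and the same `getPage` call would go through the HTTP layer.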

## Capabilities

### Browser Automation

Core browser automation functionality for loading pages, managing windows, and configuring browser behavior. Essential for web scraping and automated testing.

```java { .api }
public class WebClient implements AutoCloseable {
    public WebClient();
    public WebClient(BrowserVersion browserVersion);
    public WebClient(BrowserVersion browserVersion, String proxyHost, int proxyPort);

    public <P extends Page> P getPage(String url) throws IOException, FailingHttpStatusCodeException;
    public <P extends Page> P getPage(URL url) throws IOException, FailingHttpStatusCodeException;
    public <P extends Page> P getPage(WebRequest request) throws IOException, FailingHttpStatusCodeException;

    public WebClientOptions getOptions();
    public BrowserVersion getBrowserVersion();
    public void close();
}
```

[Browser Automation](./browser-automation.md)
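
As an illustration of client configuration and error handling, the sketch below sets a few options and catches the `FailingHttpStatusCodeException` that `getPage` raises for a 404 response. It assumes HtmlUnit 2.70.0 on the classpath. The response is served by `MockWebConnection`, and the URL and the class name `FailingStatusSketch` are hypothetical.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.util.NameValuePair;

import java.net.URL;
import java.util.Collections;

public class FailingStatusSketch {

    /** Returns the status code reported for a mocked missing page, or -1 if no exception occurred. */
    public static int statusOfMissingPage() throws Exception {
        // Hypothetical URL answered by the mock connection with a 404.
        URL missing = new URL("http://localhost/missing.html");
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(missing, "<html><body>gone</body></html>", 404, "Not Found",
                "text/html", Collections.<NameValuePair>emptyList());

        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            webClient.setWebConnection(connection);
            webClient.getOptions().setJavaScriptEnabled(false);
            webClient.getOptions().setCssEnabled(false);
            // Stated explicitly here so the behavior is visible in the example.
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
            try {
                webClient.getPage(missing);
                return -1;
            } catch (FailingHttpStatusCodeException e) {
                return e.getStatusCode();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(statusOfMissingPage());
    }
}
```

Setting `setThrowExceptionOnFailingStatusCode(false)` instead lets the caller inspect the error page and its `WebResponse` directly.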

### HTML DOM Manipulation

Comprehensive HTML DOM access and manipulation with CSS selectors, XPath queries, and element interaction. Perfect for form automation and content extraction.

```java { .api }
public class HtmlPage extends SgmlPage {
    public DomElement getElementById(String id);
    public List<HtmlElement> getElementsByTagName(String tagName);
    public List<HtmlElement> getElementsByName(String name);
    public List<HtmlElement> getElementsByClassName(String className);

    public <N extends DomNode> N querySelector(String selectors);
    public DomNodeList<DomNode> querySelectorAll(String selectors);

    public <X> X getFirstByXPath(String xpathExpr);
    public <T> List<T> getByXPath(String xpathExpr);

    public String asNormalizedText();
    public String asXml();
    public String getTitleText();
}
```

[HTML DOM Manipulation](./html-dom.md)
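
The three lookup styles (by id, CSS selector, and XPath) can be sketched against a small in-memory page. This assumes HtmlUnit 2.70.0 on the classpath. The markup, the URL, and the class name `DomQuerySketch` are made up for illustration, and `MockWebConnection` stands in for a real server.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class DomQuerySketch {

    /** Queries a mocked page by id, CSS selector, and XPath, collecting what each lookup finds. */
    public static List<String> query() throws Exception {
        URL url = new URL("http://localhost/tasks.html"); // hypothetical URL
        String html = "<html><head><title>Tasks</title></head><body>"
                + "<h1 id='heading'>Today</h1>"
                + "<ul><li class='done'>write</li><li>review</li><li class='done'>ship</li></ul>"
                + "</body></html>";
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(url, html);

        List<String> found = new ArrayList<>();
        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);
            webClient.getOptions().setJavaScriptEnabled(false);
            HtmlPage page = webClient.getPage(url);

            // Lookup by id
            DomElement heading = page.getElementById("heading");
            found.add(heading.getTextContent());

            // Lookup by CSS selector
            for (DomNode item : page.querySelectorAll("ul li")) {
                found.add(item.getTextContent());
            }

            // Lookup by XPath
            List<DomElement> done = page.getByXPath("//li[@class='done']");
            found.add("done=" + done.size());
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(query());
    }
}
```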

### Form Interaction

Form automation capabilities including field input, selection handling, and form submission. Ideal for login automation and data entry workflows.

```java { .api }
public class HtmlForm extends HtmlElement {
    public <P extends Page> P submit() throws IOException;
    public <P extends Page> P submit(SubmittableElement submitElement) throws IOException;
    public void reset();

    public <I extends HtmlInput> I getInputByName(String name);
    public List<HtmlInput> getInputsByName(String name);
    public <I extends HtmlInput> I getInputByValue(String value);
}

public abstract class HtmlInput extends HtmlElement implements SubmittableElement {
    public String getValueAttribute();
    public void setValueAttribute(String value);
    public String getNameAttribute();
    public boolean isDisabled();
    public void setDisabled(boolean disabled);
}
```

[Form Interaction](./forms.md)
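
A login-style submission might look roughly like the sketch below. It fills a text field with `type(...)`, clicks the submit button, and then inspects the request that the form produced. It assumes HtmlUnit 2.70.0 on the classpath. The form markup, the URLs, and the class name `FormSubmitSketch` are hypothetical, and `MockWebConnection` records the request instead of sending it anywhere.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import com.gargoylesoftware.htmlunit.util.NameValuePair;

import java.net.URL;

public class FormSubmitSketch {

    /** Submits a mocked login form and summarizes the request the mock connection received. */
    public static String submitLogin() throws Exception {
        URL loginUrl = new URL("http://localhost/login.html"); // hypothetical URL
        String html = "<html><head><title>Login</title></head><body>"
                + "<form name='loginForm' method='post' action='/session'>"
                + "<input type='text' name='username'>"
                + "<input type='submit' value='Sign in'>"
                + "</form></body></html>";
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(loginUrl, html);
        // Any other URL, including the form action, gets this canned page.
        connection.setDefaultResponse(
                "<html><head><title>Welcome</title></head><body></body></html>");

        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);
            HtmlPage page = webClient.getPage(loginUrl);

            HtmlForm form = page.getFormByName("loginForm");
            HtmlTextInput username = form.getInputByName("username");
            username.type("user123"); // simulates keyboard input

            HtmlSubmitInput submit =
                    form.getOneHtmlElementByAttribute("input", "type", "submit");
            HtmlPage result = submit.click();

            // Inspect what was actually sent.
            WebRequest sent = connection.getLastWebRequest();
            StringBuilder summary =
                    new StringBuilder(sent.getHttpMethod() + " " + result.getTitleText());
            for (NameValuePair pair : sent.getRequestParameters()) {
                summary.append(' ').append(pair.getName()).append('=').append(pair.getValue());
            }
            return summary.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(submitLogin());
    }
}
```

Clicking the submit element, rather than calling a submit method on the form, mirrors what a user does and also fires any click handlers attached to the button.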

### HTTP Communication

HTTP request/response handling with full control over headers, methods, authentication, and connection settings. Essential for API testing and advanced web scraping.

```java { .api }
public class WebRequest {
    public WebRequest(URL url);
    public WebRequest(URL url, HttpMethod submitMethod);

    public URL getUrl();
    public void setUrl(URL url);
    public HttpMethod getHttpMethod();
    public void setHttpMethod(HttpMethod method);

    public String getRequestBody();
    public void setRequestBody(String requestBody);
    public void setAdditionalHeader(String name, String value);
    public Map<String, String> getAdditionalHeaders();
}

public class WebResponse {
    public int getStatusCode();
    public String getStatusMessage();
    public String getContentAsString();
    public String getContentAsString(Charset charset);
    public InputStream getContentAsStream();
    public List<NameValuePair> getResponseHeaders();
    public String getResponseHeaderValue(String headerName);
}
```

[HTTP Communication](./http.md)
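
For API-style calls, a `WebRequest` can be built by hand and passed to `getPage`. The following sketch posts a JSON body with a custom header and reads the status and body from the `WebResponse`. It assumes HtmlUnit 2.70.0 on the classpath. The endpoint, the `X-Demo` header, and the class name `HttpRequestSketch` are hypothetical, and `MockWebConnection` supplies the response.

```java
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;

import java.net.URL;

public class HttpRequestSketch {

    /** Sends a mocked JSON POST and returns "status body header" for inspection. */
    public static String postJson() throws Exception {
        URL endpoint = new URL("http://localhost/api/items"); // hypothetical endpoint
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(endpoint, "{\"created\":true}", "application/json");

        // Build the request explicitly: method, headers, then body.
        WebRequest request = new WebRequest(endpoint, HttpMethod.POST);
        request.setAdditionalHeader("Content-Type", "application/json");
        request.setAdditionalHeader("X-Demo", "sketch");
        request.setRequestBody("{\"name\":\"widget\"}");

        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);
            Page page = webClient.getPage(request);
            WebResponse response = page.getWebResponse();

            // The mock connection keeps the last request, so the header can be checked.
            String sentHeader =
                    connection.getLastWebRequest().getAdditionalHeaders().get("X-Demo");
            return response.getStatusCode() + " " + response.getContentAsString()
                    + " " + sentHeader;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(postJson());
    }
}
```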

### JavaScript Execution

JavaScript engine integration for executing JavaScript code within web pages and handling browser API calls. Critical for modern web application automation.

```java { .api }
public class HtmlPage extends SgmlPage {
    public ScriptResult executeJavaScript(String sourceCode);
    public ScriptResult executeJavaScript(String sourceCode, String sourceName, int startLine);
}

public class ScriptResult {
    public Object getJavaScriptResult();
    public Page getNewPage();
}

public interface JavaScriptErrorListener {
    void scriptException(HtmlPage page, ScriptException scriptException);
    void timeoutError(HtmlPage page, long allowedTime, long executionTime);
    void malformedScriptURL(HtmlPage page, String url, MalformedURLException malformedURLException);
    void loadScriptError(HtmlPage page, URL scriptUrl, Exception exception);
}
```

[JavaScript Execution](./javascript.md)
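
A short sketch of `executeJavaScript` follows. One call reads a value back into Java through `ScriptResult`, and a second call mutates the DOM, which is then observed from the Java side. It assumes HtmlUnit 2.70.0 on the classpath with JavaScript left enabled (the default). The page markup, the URL, and the class name `ScriptSketch` are hypothetical.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.ScriptResult;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.net.URL;

public class ScriptSketch {

    /** Runs two scripts against a mocked page and returns "title/paragraph text". */
    public static String runScripts() throws Exception {
        URL url = new URL("http://localhost/script.html"); // hypothetical URL
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(url,
                "<html><head><title>Demo</title></head>"
                + "<body><p id='out'>before</p></body></html>");

        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);
            HtmlPage page = webClient.getPage(url);

            // Evaluate an expression and read its value back into Java.
            ScriptResult titleResult = page.executeJavaScript("document.title");
            String title = String.valueOf(titleResult.getJavaScriptResult());

            // Mutate the DOM from script, then observe the change through the Java API.
            page.executeJavaScript("document.getElementById('out').textContent = 'after';");
            String text = page.getElementById("out").getTextContent();

            return title + "/" + text;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runScripts());
    }
}
```

The runtime type of `getJavaScriptResult()` depends on the script, so converting with `String.valueOf` avoids assuming a particular class.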

### Window Management

Browser window and frame management for handling pop-ups, iframes, and multi-window scenarios. Required for complex web application navigation.

```java { .api }
public interface WebWindow {
    public String getName();
    public void setName(String name);
    public Page getEnclosedPage();
    public void setEnclosedPage(Page page);
    public WebClient getWebClient();
    public WebWindow getParentWindow();
    public WebWindow getTopWindow();
    public History getHistory();
    public int getInnerHeight();
    public int getInnerWidth();
}

public class TopLevelWindow extends WebWindowImpl {
    // Implementation for top-level browser windows
}

public class DialogWindow extends WebWindowImpl {
    // Implementation for modal dialog windows
}
```

[Window Management](./windows.md)
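
The window relationships can be explored with an iframe. The sketch below loads an outer page, looks up its frame by name, and walks from the frame window to the page it encloses and back up to the top window. It assumes HtmlUnit 2.70.0 on the classpath. The markup, the URLs, and the class name `FrameSketch` are hypothetical, with `MockWebConnection` serving both documents.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.html.FrameWindow;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.net.URL;

public class FrameSketch {

    /** Loads a mocked page with an iframe and describes how the two windows relate. */
    public static String inspectFrame() throws Exception {
        URL mainUrl = new URL("http://localhost/main.html"); // hypothetical URL
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(mainUrl,
                "<html><head><title>Outer</title></head><body>"
                + "<iframe name='inner' src='inner.html'></iframe>"
                + "</body></html>");
        // The iframe source (and anything else) resolves to this canned page.
        connection.setDefaultResponse(
                "<html><head><title>Inner</title></head><body></body></html>");

        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);
            HtmlPage outer = webClient.getPage(mainUrl);

            // Each frame has its own window holding its own page.
            FrameWindow frame = outer.getFrameByName("inner");
            HtmlPage inner = (HtmlPage) frame.getEnclosedPage();

            // From the frame, navigate back up to the top-level window.
            WebWindow top = frame.getTopWindow();
            boolean sameTop = top == outer.getEnclosingWindow();
            String topTitle = ((HtmlPage) top.getEnclosedPage()).getTitleText();

            return inner.getTitleText() + " in " + topTitle + " " + sameTop;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inspectFrame());
    }
}
```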

### Cookie Management

HTTP cookie handling with domain scoping, expiration management, and security flags. Essential for session management and authentication workflows.

```java { .api }
public class CookieManager {
    public void addCookie(Cookie cookie);
    public Set<Cookie> getCookies();
    public Set<Cookie> getCookies(URL url);
    public void clearCookies();
    public boolean isCookiesEnabled();
    public void setCookiesEnabled(boolean enabled);
}

public class Cookie {
    public Cookie(String domain, String name, String value);
    public Cookie(String domain, String name, String value, String path, Date expires, boolean secure);

    public String getName();
    public String getValue();
    public String getDomain();
    public String getPath();
    public Date getExpires();
    public boolean isSecure();
    public boolean isHttpOnly();
}
```

[Cookie Management](./cookies.md)
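
The cookie store can be exercised on its own, without loading a page. This sketch adds a session cookie and a secure cookie with an expiry, reads one back by name, and clears the store. It assumes HtmlUnit 2.70.0 on the classpath. The domain, cookie names, and the class name `CookieSketch` are hypothetical.

```java
import com.gargoylesoftware.htmlunit.CookieManager;
import com.gargoylesoftware.htmlunit.util.Cookie;

import java.util.Date;

public class CookieSketch {

    /** Adds two cookies, reads one back, clears the store, and summarizes each step. */
    public static String manageCookies() {
        CookieManager manager = new CookieManager();

        // Session cookie: no path, expiry, or secure flag given.
        manager.addCookie(new Cookie("example.com", "session", "abc123"));

        // Secure cookie that expires one hour from now.
        Date inOneHour = new Date(System.currentTimeMillis() + 3_600_000L);
        manager.addCookie(new Cookie("example.com", "theme", "dark", "/", inOneHour, true));

        int stored = manager.getCookies().size();
        Cookie theme = manager.getCookie("theme");
        String summary = stored + " " + theme.getValue() + " " + theme.isSecure();

        manager.clearCookies();
        return summary + " " + manager.getCookies().size();
    }

    public static void main(String[] args) {
        System.out.println(manageCookies());
    }
}
```

When working with a browser instance, the same manager is reachable through `webClient.getCookieManager()`.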

## Common Types

```java { .api }
public enum HttpMethod {
    OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, PATCH
}

public class BrowserVersion {
    public static final BrowserVersion CHROME;
    public static final BrowserVersion FIREFOX;
    public static final BrowserVersion FIREFOX_ESR;
    public static final BrowserVersion EDGE;
    public static final BrowserVersion INTERNET_EXPLORER;
    public static final BrowserVersion BEST_SUPPORTED;

    public String getApplicationName();
    public String getApplicationVersion();
    public String getUserAgent();
    public boolean hasFeature(BrowserFeature feature);
}

public class NameValuePair {
    public NameValuePair(String name, String value);
    public String getName();
    public String getValue();
}

public class FailingHttpStatusCodeException extends RuntimeException {
    public int getStatusCode();
    public String getStatusMessage();
    public WebResponse getResponse();
}

public class ElementNotFoundException extends RuntimeException {
    // Thrown when element lookups fail
}
```