A well-tested UTF-8 encoder/decoder written in JavaScript
npx @tessl/cli install tessl/npm-utf8@3.0.0
# UTF-8 JavaScript Library

UTF-8.js is a well-tested UTF-8 encoder/decoder written in JavaScript. It encodes and decodes according to the WHATWG Encoding Standard, handles Unicode scalar values correctly, and throws descriptive errors on malformed input.

## Package Information

- **Package Name**: utf8
- **Package Type**: npm
- **Language**: JavaScript
- **Installation**: `npm install utf8`

## Core Imports

```javascript
const utf8 = require('utf8');
```

For ES modules:

```javascript
// utf8 is published as a CommonJS/UMD module, so the default import
// is the safer form under Node's native ESM loader.
import utf8 from 'utf8';
```

Browser usage:

```html
<script src="utf8.js"></script>
<!-- Creates global utf8 object -->
```

## Basic Usage

```javascript
const utf8 = require('utf8');

// Encode JavaScript string to UTF-8 byte string
const encoded = utf8.encode('Hello, 世界!');
console.log(encoded); // Output: UTF-8 encoded byte string

// Decode UTF-8 byte string back to JavaScript string
const decoded = utf8.decode(encoded);
console.log(decoded); // Output: 'Hello, 世界!'

// Check library version
console.log(utf8.version); // Output: '3.0.0'
```
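
The "byte string" used above is an ordinary JavaScript string in which each character stands for one byte (a code unit from 0x00 to 0xFF). The following is a minimal sketch of moving between that representation and real bytes in Node.js, assuming the built-in `Buffer` and its `latin1` encoding. The helper names `byteStringToBuffer` and `bufferToByteString` are hypothetical and not part of utf8.js.

```javascript
// Hypothetical helpers (not part of utf8.js): convert between a byte string
// and a Node.js Buffer. 'latin1' maps each character 0x00-0xFF to one byte.
function byteStringToBuffer(byteString) {
  return Buffer.from(byteString, 'latin1');
}

function bufferToByteString(buffer) {
  return buffer.toString('latin1');
}

// '\xC2\xA9' is U+00A9 COPYRIGHT SIGN encoded as UTF-8.
const bytes = byteStringToBuffer('\xC2\xA9');
console.log(bytes);                                    // <Buffer c2 a9>
console.log(bytes.toString('utf8'));                   // '©'
console.log(bufferToByteString(bytes) === '\xC2\xA9'); // true
```

A conversion like this is typically only needed at I/O boundaries, for example when writing the encoded result to a file or socket.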

## Capabilities

### UTF-8 Encoding

Encodes JavaScript strings as UTF-8 byte strings with proper Unicode scalar value handling.

```javascript { .api }
/**
 * Encodes any given JavaScript string as UTF-8
 * @param {string} string - JavaScript string to encode as UTF-8
 * @returns {string} UTF-8-encoded byte string
 * @throws {Error} When input contains non-scalar values (lone surrogates)
 */
utf8.encode(string);
```

**Usage Examples:**

```javascript
// Basic ASCII encoding
utf8.encode('Hello');
// → 'Hello'

// Unicode characters
utf8.encode('\xA9'); // U+00A9 COPYRIGHT SIGN
// → '\xC2\xA9'

// Supplementary characters (surrogate pairs)
utf8.encode('\uD800\uDC01'); // U+10001 LINEAR B SYLLABLE B038 E
// → '\xF0\x90\x80\x81'

// Multi-byte Unicode
utf8.encode('世界'); // Chinese characters
// → '\xE4\xB8\x96\xE7\x95\x8C'
```

**Error Handling:**

```javascript
try {
  // This will throw an error due to lone surrogate
  utf8.encode('\uD800'); // High surrogate without matching low surrogate
} catch (error) {
  console.error(error.message); // "Lone surrogate U+D800 is not a scalar value"
}
```
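
When input comes from an untrusted source, it can be useful to check for lone surrogates before calling `utf8.encode` rather than relying on the exception. The sketch below shows one way to do that with plain code-unit inspection. The helper `findLoneSurrogate` is hypothetical and not part of utf8.js.

```javascript
// Hypothetical helper (not part of utf8.js): returns the index of the first
// lone surrogate in `str`, or -1 if every surrogate is properly paired.
function findLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN past the end of the string
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of a valid pair
      } else {
        return i;
      }
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return i;
    }
  }
  return -1;
}

console.log(findLoneSurrogate('Hello'));        // -1
console.log(findLoneSurrogate('\uD800\uDC01')); // -1 (valid pair)
console.log(findLoneSurrogate('a\uD800'));      // 1
console.log(findLoneSurrogate('\uDC01\uD800')); // 0
```

Newer JavaScript runtimes also provide `String.prototype.isWellFormed()`, which answers the same yes/no question without reporting the position.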

### UTF-8 Decoding

Decodes UTF-8 byte strings back to JavaScript strings with malformed input detection.

```javascript { .api }
/**
 * Decodes any given UTF-8-encoded byte string into a JavaScript string
 * @param {string} byteString - UTF-8 encoded byte string to decode
 * @returns {string} Decoded JavaScript string
 * @throws {Error} When malformed UTF-8 is detected
 */
utf8.decode(byteString);
```

**Usage Examples:**

```javascript
// Basic decoding
utf8.decode('\xC2\xA9');
// → '\xA9' (U+00A9 COPYRIGHT SIGN)

// Supplementary characters
utf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01' (U+10001 LINEAR B SYLLABLE B038 E)

// Multi-byte sequences
utf8.decode('\xE4\xB8\x96\xE7\x95\x8C');
// → '世界'
```

**Error Handling:**

```javascript
try {
  // This will throw an error due to malformed UTF-8
  utf8.decode('\xFF\xFE'); // Invalid UTF-8 sequence
} catch (error) {
  console.error(error.message); // "Invalid UTF-8 detected"
}

try {
  // This will throw an error due to incomplete sequence
  utf8.decode('\xC2'); // Incomplete 2-byte sequence
} catch (error) {
  console.error(error.message); // "Invalid byte index"
}
```
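
To make the failure modes above more concrete, here is an illustrative sketch of how 1-byte and 2-byte sequences can be decoded by hand. It is not the utf8.js source: the function name `decodeUpToTwoBytes` and its error messages are made up for this example, and 3-byte and 4-byte sequences are deliberately left out.

```javascript
// Illustrative sketch only, not the utf8.js implementation: decode a byte
// string that contains just 1-byte and 2-byte UTF-8 sequences.
function decodeUpToTwoBytes(byteString) {
  let result = '';
  let i = 0;
  while (i < byteString.length) {
    const byte1 = byteString.charCodeAt(i++) & 0xFF;
    if ((byte1 & 0x80) === 0) {
      // 0xxxxxxx: a single byte is the code point itself.
      result += String.fromCharCode(byte1);
    } else if ((byte1 & 0xE0) === 0xC0) {
      // 110xxxxx 10xxxxxx: a lead byte followed by one continuation byte.
      if (i >= byteString.length) {
        throw new Error('Truncated sequence: missing continuation byte');
      }
      const byte2 = byteString.charCodeAt(i++) & 0xFF;
      if ((byte2 & 0xC0) !== 0x80) {
        throw new Error('Expected a continuation byte (10xxxxxx)');
      }
      // Combine 5 payload bits from the lead byte with 6 from the continuation.
      const codePoint = ((byte1 & 0x1F) << 6) | (byte2 & 0x3F);
      if (codePoint < 0x80) {
        throw new Error('Overlong encoding');
      }
      result += String.fromCharCode(codePoint);
    } else {
      throw new Error('Lead byte not handled by this sketch');
    }
  }
  return result;
}

console.log(decodeUpToTwoBytes('\xC2\xA9') === '\xA9'); // true
```

For `'\xC2\xA9'` the arithmetic is `(0x02 << 6) | 0x29`, which gives 0xA9, matching the first decoding example above.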

### Version Information

Provides the semantic version number of the library.

```javascript { .api }
/**
 * Semantic version number of the utf8 library
 * @type {string}
 */
utf8.version;
```

**Usage Example:**

```javascript
console.log(`Using utf8.js version ${utf8.version}`);
// Output: "Using utf8.js version 3.0.0"
```

## Error Types

The library throws standard JavaScript `Error` objects with descriptive messages:

- **Lone Surrogate Error**: Thrown when `encode()` encounters an unpaired surrogate code unit
- **Invalid UTF-8 Error**: Thrown when `decode()` encounters a malformed sequence, such as a byte that cannot start a valid UTF-8 sequence (for example `\xFF`)
- **Invalid Byte Index Error**: Thrown when `decode()` reaches the end of the input while a multi-byte sequence is still incomplete
- **Invalid Continuation Byte Error**: Thrown when `decode()` expects a continuation byte (`10xxxxxx`) and finds something else
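
Because these are all plain `Error` objects, calling code that wants to react differently to each case has to inspect the message. The sketch below shows one possible approach. The helper `classifyUtf8Error` and its return labels are hypothetical, and the message patterns are assumptions based on the messages quoted in this document; the continuation-byte pattern in particular is inferred from the error's name.

```javascript
// Hypothetical helper (not part of utf8.js). Message patterns are assumptions
// taken from the error messages quoted in this document.
function classifyUtf8Error(error) {
  const message = String(error && error.message);
  if (/^Lone surrogate U\+[0-9A-F]{4} is not a scalar value$/.test(message)) {
    return 'lone-surrogate';
  }
  if (message === 'Invalid UTF-8 detected') return 'invalid-utf8';
  if (message === 'Invalid byte index') return 'invalid-byte-index';
  // Assumed wording, inferred from the error's name.
  if (/^Invalid continuation byte/i.test(message)) {
    return 'invalid-continuation-byte';
  }
  return 'unknown';
}

console.log(classifyUtf8Error(new Error('Invalid byte index'))); // 'invalid-byte-index'
```

Matching on message text is brittle across library versions, so a fallback such as the `'unknown'` label here is worth keeping.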

## Unicode Support

- **Full Unicode Range**: Supports U+0000 to U+10FFFF (complete Unicode range)
- **Surrogate Pair Handling**: Properly handles JavaScript surrogate pairs for supplementary characters
- **Scalar Value Validation**: Enforces Unicode scalar value requirements (no lone surrogates)
- **Standards Compliance**: Implements UTF-8 encoding per WHATWG Encoding Standard
- **Multi-byte Sequences**: Correctly handles 1-4 byte UTF-8 sequences
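
The 1-4 byte range in the list above follows directly from the code point's magnitude. As a rough illustration, the sketch below computes how many bytes UTF-8 uses for a single scalar value. The helper `utf8ByteLength` is hypothetical and not part of utf8.js.

```javascript
// Hypothetical helper (not part of utf8.js): number of bytes UTF-8 uses
// for a single Unicode scalar value.
function utf8ByteLength(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a Unicode code point');
  }
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) {
    throw new RangeError('Surrogates are not scalar values');
  }
  if (codePoint <= 0x7F) return 1;   // U+0000 to U+007F
  if (codePoint <= 0x7FF) return 2;  // U+0080 to U+07FF
  if (codePoint <= 0xFFFF) return 3; // U+0800 to U+FFFF
  return 4;                          // U+10000 to U+10FFFF
}

console.log(utf8ByteLength(0x48));    // 1 ('H')
console.log(utf8ByteLength(0xA9));    // 2 (U+00A9, '\xC2\xA9')
console.log(utf8ByteLength(0x4E16));  // 3 ('世', '\xE4\xB8\x96')
console.log(utf8ByteLength(0x10001)); // 4 ('\xF0\x90\x80\x81')
```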

## Browser Compatibility

The library has been tested and works in:
- Chrome 27+
- Firefox 3+
- Safari 4+
- Opera 10+
- Internet Explorer 6+
- Node.js v0.10.0+
- Various JavaScript engines (Rhino, Narwhal, RingoJS, PhantomJS)