UTF-8 encoder and decoder for robust text processing with validation
npx @tessl/cli install tessl/npm-stablelib--utf8@2.0.00
# @stablelib/utf8

@stablelib/utf8 provides UTF-8 encoding and decoding implemented in TypeScript. It converts between JavaScript strings and UTF-8 byte arrays, validating UTF-16 surrogate pairs when encoding and UTF-8 byte sequences when decoding.

## Package Information

- **Package Name**: @stablelib/utf8
- **Package Type**: npm
- **Language**: TypeScript
- **Installation**: `npm install @stablelib/utf8`

## Core Imports

```typescript
import { encode, decode, encodedLength } from "@stablelib/utf8";
```

For CommonJS:

```javascript
const { encode, decode, encodedLength } = require("@stablelib/utf8");
```

## Basic Usage

```typescript
import { encode, decode, encodedLength } from "@stablelib/utf8";

// Encode a string to UTF-8 bytes
const text = "Hello, 世界! 🌍";
const bytes = encode(text);

// Calculate encoded length without encoding
const length = encodedLength(text);
console.log(length === bytes.length); // true

// Decode UTF-8 bytes back to string
const decoded = decode(bytes);
console.log(decoded === text); // true

// Handle validation errors
try {
  // This will throw for invalid UTF-16 input (a lone high surrogate)
  encode("Invalid surrogate pair: \uD800");
} catch (error) {
  // Caught values are typed `unknown` under strict TypeScript settings
  console.error((error as Error).message); // "utf8: invalid string"
}
```

## Capabilities

### String Encoding

Converts JavaScript strings to UTF-8 byte arrays with validation.

```typescript { .api }
/**
 * Encodes the given string into UTF-8 byte array.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to encode
 * @returns UTF-8 encoded byte array
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encode(s: string): Uint8Array;
```

**Usage Examples:**

```typescript
import { encode } from "@stablelib/utf8";

// Basic ASCII
const ascii = encode("Hello");
// Result: Uint8Array([72, 101, 108, 108, 111])

// Unicode characters
const unicode = encode("こんにちは");
// Result: UTF-8 encoded bytes for the Japanese text (15 bytes, 3 per character)

// Emoji with surrogate pairs
const emoji = encode("🌍");
// Result: UTF-8 encoded bytes for the Earth emoji (4 bytes)

// Error handling
try {
  encode("Invalid: \uD800"); // Lone high surrogate
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}
```

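To make the surrogate-pair case concrete, the sketch below walks through how a UTF-16 pair maps to four UTF-8 bytes. It is an illustration of the standard UTF-8 bit layout, not the library's source; `encodeSurrogatePair` is a hypothetical helper, and the Earth globe emoji U+1F30D is chosen as the sample input.

```typescript
// Hypothetical helper for illustration; not exported by @stablelib/utf8.
// Assumes `high` is in 0xD800..0xDBFF and `low` is in 0xDC00..0xDFFF.
function encodeSurrogatePair(high: number, low: number): number[] {
  // Combine the two UTF-16 code units into a single code point.
  const codePoint = ((high - 0xd800) << 10) + (low - 0xdc00) + 0x10000;
  // Spread the 21-bit code point across four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
  return [
    0xf0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3f),
    0x80 | ((codePoint >> 6) & 0x3f),
    0x80 | (codePoint & 0x3f),
  ];
}

// U+1F30D is stored in a JavaScript string as the pair D83C DF0D.
const earthGlobe = "\uD83C\uDF0D";
const pairBytes = encodeSurrogatePair(earthGlobe.charCodeAt(0), earthGlobe.charCodeAt(1));
console.log(pairBytes.map((b) => b.toString(16)).join(" ")); // f0 9f 8c 8d
```

A lone surrogate has no second code unit to combine with, which is why such input cannot be encoded and is rejected.
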
### Byte Decoding

Converts UTF-8 byte arrays back to JavaScript strings with validation.

```typescript { .api }
/**
 * Decodes the given byte array from UTF-8 into a string.
 * Throws if encoding is invalid.
 * @param arr - The UTF-8 byte array to decode
 * @returns Decoded string
 * @throws Error with message "utf8: invalid source encoding" for invalid UTF-8
 */
function decode(arr: Uint8Array): string;
```

**Usage Examples:**

```typescript
import { decode } from "@stablelib/utf8";

// Basic decoding
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decode(bytes);
// Result: "Hello"

// Unicode decoding
const unicodeBytes = new Uint8Array([227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]);
const unicodeText = decode(unicodeBytes);
// Result: "こんにちは"

// Error handling
try {
  decode(new Uint8Array([0xFF])); // Invalid UTF-8 byte
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid source encoding"
}
```

### Length Calculation

Calculates the number of bytes required to encode a string without performing the actual encoding.

```typescript { .api }
/**
 * Returns the number of bytes required to encode the given string into UTF-8.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to measure
 * @returns Number of bytes needed for UTF-8 encoding
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encodedLength(s: string): number;
```

**Usage Examples:**

```typescript
import { encodedLength, encode } from "@stablelib/utf8";

// Calculate length for memory allocation
const text = "Hello, 世界!";
const length = encodedLength(text);
console.log(length); // 14 bytes (8 ASCII bytes + 2 characters × 3 bytes)

// Verify length matches actual encoding
const encoded = encode(text);
console.log(length === encoded.length); // true

// Check size before encoding to avoid allocating an oversized buffer.
// MAX_BUFFER_SIZE and largeText are placeholders for caller-supplied values.
const MAX_BUFFER_SIZE = 1024 * 1024;
const largeText = text.repeat(1000);
if (encodedLength(largeText) > MAX_BUFFER_SIZE) {
  throw new Error("Text too large to encode");
}
```

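The byte count follows directly from the code point ranges that UTF-8 defines. As a rough sketch of that arithmetic (not the library's implementation; `utf8ByteLength` is a hypothetical name, and the function assumes well-formed UTF-16 input rather than validating it):

```typescript
// Hypothetical helper for illustration; assumes the string has no lone surrogates.
function utf8ByteLength(s: string): number {
  let total = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x80) {
      total += 1; // U+0000..U+007F: one byte
    } else if (unit < 0x800) {
      total += 2; // U+0080..U+07FF: two bytes
    } else if (unit >= 0xd800 && unit <= 0xdbff) {
      total += 4; // high surrogate: the pair encodes U+10000..U+10FFFF
      i++; // skip the low surrogate that completes the pair
    } else {
      total += 3; // remaining BMP code points: three bytes
    }
  }
  return total;
}

// "Hello, 世界!" written with escapes: 8 ASCII characters plus 2 CJK characters.
const sampleGreeting = "Hello, \u4E16\u754C!";
console.log(utf8ByteLength(sampleGreeting)); // 14
```

The count depends on code points rather than on `string.length`, which is why a dedicated length function is needed to size a buffer correctly.
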
## Error Handling

The library validates its input and throws an `Error` with a fixed message when validation fails:

### UTF-16 Validation Errors

Thrown by `encode()` and `encodedLength()` for invalid UTF-16 input:

- **Error Message**: `"utf8: invalid string"`
- **Common Causes**:
  - Lone high surrogate (0xD800-0xDBFF) without matching low surrogate
  - Lone low surrogate (0xDC00-0xDFFF) without preceding high surrogate
  - Invalid surrogate pair sequences

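If it helps to locate the offending code unit before calling `encode()`, a pre-check along the following lines could be used. This is a minimal sketch of a generic well-formedness scan; `findLoneSurrogate` is a hypothetical helper, not part of @stablelib/utf8, and the library's own checks may differ in detail.

```typescript
// Hypothetical helper for illustration; not exported by @stablelib/utf8.
// Returns the index of the first unpaired surrogate, or -1 if the string is well-formed.
function findLoneSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // A high surrogate must be followed immediately by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) {
        return i;
      }
      i++; // valid pair: skip the low surrogate
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return i; // low surrogate with no preceding high surrogate
    }
  }
  return -1;
}

console.log(findLoneSurrogate("ok \uD83C\uDF0D")); // -1
console.log(findLoneSurrogate("bad \uD800")); // 4
console.log(findLoneSurrogate("\uDC00 first")); // 0
```
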
### UTF-8 Validation Errors

Thrown by `decode()` for invalid UTF-8 byte sequences:

- **Error Message**: `"utf8: invalid source encoding"`
- **Common Causes**:
  - Invalid start bytes (0xFE, 0xFF)
  - Incomplete multi-byte sequences
  - Invalid continuation bytes
  - Overlong encodings
  - Invalid code points (surrogate range, above U+10FFFF)

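Each cause in the list above corresponds to a distinct check on the byte stream. The sketch below shows one way those checks can be written, following the general UTF-8 well-formedness rules; `isValidUtf8` is a hypothetical name and this is not the library's source, so treat it as an aid to reading the list rather than a description of `decode()` internals.

```typescript
// Hypothetical validator for illustration; not exported by @stablelib/utf8.
function isValidUtf8(bytes: Uint8Array): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let extra: number; // number of continuation bytes expected
    let min: number; // smallest code point allowed for this sequence length
    let cp: number; // code point accumulated so far
    if (b < 0x80) {
      i++; // ASCII: single byte
      continue;
    } else if ((b & 0xe0) === 0xc0) {
      extra = 1; min = 0x80; cp = b & 0x1f;
    } else if ((b & 0xf0) === 0xe0) {
      extra = 2; min = 0x800; cp = b & 0x0f;
    } else if ((b & 0xf8) === 0xf0) {
      extra = 3; min = 0x10000; cp = b & 0x07;
    } else {
      return false; // invalid start byte (e.g. 0xFE, 0xFF) or stray continuation byte
    }
    if (i + extra >= bytes.length) {
      return false; // incomplete multi-byte sequence
    }
    for (let k = 1; k <= extra; k++) {
      const c = bytes[i + k];
      if ((c & 0xc0) !== 0x80) {
        return false; // invalid continuation byte
      }
      cp = (cp << 6) | (c & 0x3f);
    }
    if (cp < min) return false; // overlong encoding
    if (cp >= 0xd800 && cp <= 0xdfff) return false; // surrogate range
    if (cp > 0x10ffff) return false; // above U+10FFFF
    i += extra + 1;
  }
  return true;
}

console.log(isValidUtf8(new Uint8Array([0xe3, 0x81, 0x93]))); // true (U+3053)
console.log(isValidUtf8(new Uint8Array([0xff]))); // false (invalid start byte)
console.log(isValidUtf8(new Uint8Array([0xe3, 0x81]))); // false (incomplete)
console.log(isValidUtf8(new Uint8Array([0xc0, 0x80]))); // false (overlong)
console.log(isValidUtf8(new Uint8Array([0xed, 0xa0, 0x80]))); // false (surrogate U+D800)
```
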
## Performance Characteristics

- **Zero Dependencies**: No runtime dependencies for maximum compatibility
- **Efficient Encoding**: Single-pass algorithm with pre-calculated buffer allocation
- **Validation**: Comprehensive validation without performance degradation
- **Memory Safe**: Proper bounds checking for all array access
- **TypeScript**: Full type safety with accurate type definitions