tessl/npm-stablelib--utf8

UTF-8 encoder and decoder for robust text processing with validation

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: npmpkg:npm/@stablelib/utf8@2.0.x

To install, run

npx @tessl/cli install tessl/npm-stablelib--utf8@2.0.0

# @stablelib/utf8

@stablelib/utf8 provides robust UTF-8 encoding and decoding functionality implemented in TypeScript. It handles conversion between JavaScript strings and UTF-8 byte arrays with comprehensive validation of both UTF-16 surrogate pairs and UTF-8 byte sequences.

## Package Information

- **Package Name**: @stablelib/utf8
- **Package Type**: npm
- **Language**: TypeScript
- **Installation**: `npm install @stablelib/utf8`

## Core Imports

```typescript
import { encode, decode, encodedLength } from "@stablelib/utf8";
```

For CommonJS:

```javascript
const { encode, decode, encodedLength } = require("@stablelib/utf8");
```

## Basic Usage

```typescript
import { encode, decode, encodedLength } from "@stablelib/utf8";

// Encode a string to UTF-8 bytes
const text = "Hello, 世界! 🌍";
const bytes = encode(text);

// Calculate encoded length without encoding
const length = encodedLength(text);
console.log(length === bytes.length); // true

// Decode UTF-8 bytes back to string
const decoded = decode(bytes);
console.log(decoded === text); // true

// Handle validation errors
try {
  // This will throw for invalid UTF-16 input (a lone high surrogate)
  encode("Invalid surrogate pair: \uD800");
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}
```

## Capabilities

### String Encoding

Converts JavaScript strings to UTF-8 byte arrays with validation.

```typescript { .api }
/**
 * Encodes the given string into UTF-8 byte array.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to encode
 * @returns UTF-8 encoded byte array
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encode(s: string): Uint8Array;
```

**Usage Examples:**

```typescript
import { encode } from "@stablelib/utf8";

// Basic ASCII
const ascii = encode("Hello");
// Result: Uint8Array([72, 101, 108, 108, 111])

// Unicode characters
const unicode = encode("こんにちは");
// Result: 15 bytes, 3 per character

// Emoji with surrogate pairs
const emoji = encode("🌍");
// Result: Uint8Array([240, 159, 140, 141])

// Error handling
try {
  encode("Invalid: \uD800"); // Lone high surrogate
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid string"
}
```
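The multi-byte results in these examples follow from UTF-8's bit layout, which spends one to four bytes per code point. The sketch below illustrates that layout with a hypothetical helper, `encodeCodePoint`, which is not part of @stablelib/utf8 and does not validate its input (for example, it does not reject surrogate code points).

```typescript
// Hypothetical helper for illustration only; not the library's implementation.
// Encodes a single Unicode code point as UTF-8 bytes, without validation.
function encodeCodePoint(cp: number): number[] {
  if (cp < 0x80) {
    return [cp]; // 0xxxxxxx
  }
  if (cp < 0x800) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  }
  if (cp < 0x10000) {
    // 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Formats bytes as lowercase two-digit hex, separated by spaces.
const hex = (bytes: number[]): string =>
  bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join(" ");

console.log(hex(encodeCodePoint(0x48)));    // 48          ("H")
console.log(hex(encodeCodePoint(0x3053)));  // e3 81 93    ("こ")
console.log(hex(encodeCodePoint(0x1f30d))); // f0 9f 8c 8d ("🌍")
```

The emoji is a single code point (U+1F30D) stored as two UTF-16 code units in a JavaScript string, which is why it takes the four-byte branch.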

### Byte Decoding

Converts UTF-8 byte arrays back to JavaScript strings with validation.

```typescript { .api }
/**
 * Decodes the given byte array from UTF-8 into a string.
 * Throws if encoding is invalid.
 * @param arr - The UTF-8 byte array to decode
 * @returns Decoded string
 * @throws Error with message "utf8: invalid source encoding" for invalid UTF-8
 */
function decode(arr: Uint8Array): string;
```

**Usage Examples:**

```typescript
import { decode } from "@stablelib/utf8";

// Basic decoding
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decode(bytes);
// Result: "Hello"

// Unicode decoding
const unicodeBytes = new Uint8Array([227, 129, 147, 227, 130, 147, 227, 129, 171, 227, 129, 161, 227, 129, 175]);
const unicodeText = decode(unicodeBytes);
// Result: "こんにちは"

// Error handling
try {
  decode(new Uint8Array([0xFF])); // Invalid UTF-8 byte
} catch (error) {
  console.error((error as Error).message); // "utf8: invalid source encoding"
}
```
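Because `decode` throws on malformed input, code that handles untrusted bytes may prefer a wrapper that does not throw. The following is a minimal sketch under that assumption: `tryDecode` and `asciiOnlyDecode` are hypothetical names introduced here, and the stand-in decoder exists only so the snippet runs without the package installed.

```typescript
// Any function with the same shape as the library's decode().
type Decoder = (bytes: Uint8Array) => string;

// Hypothetical helper: returns null instead of throwing on invalid input.
function tryDecode(decodeFn: Decoder, bytes: Uint8Array): string | null {
  try {
    return decodeFn(bytes);
  } catch {
    return null;
  }
}

// Stand-in decoder (assumption, for illustration): accepts ASCII only and
// throws on anything else. In real code, pass `decode` from "@stablelib/utf8".
const asciiOnlyDecode: Decoder = (bytes) => {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] >= 0x80) {
      throw new Error("stand-in decoder: non-ASCII byte");
    }
    out += String.fromCharCode(bytes[i]);
  }
  return out;
};

console.log(tryDecode(asciiOnlyDecode, new Uint8Array([72, 101, 108, 108, 111]))); // Hello
console.log(tryDecode(asciiOnlyDecode, new Uint8Array([0xff]))); // null
```

Returning `null` keeps the failure case explicit in the type, so callers have to handle it before using the string.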

### Length Calculation

Calculates the number of bytes required to encode a string without performing the actual encoding.

```typescript { .api }
/**
 * Returns the number of bytes required to encode the given string into UTF-8.
 * Throws if the source string has invalid UTF-16 encoding.
 * @param s - The string to measure
 * @returns Number of bytes needed for UTF-8 encoding
 * @throws Error with message "utf8: invalid string" for invalid UTF-16
 */
function encodedLength(s: string): number;
```

**Usage Examples:**

```typescript
import { encodedLength, encode } from "@stablelib/utf8";

// Calculate length for memory allocation
const text = "Hello, 世界!";
const length = encodedLength(text);
console.log(length); // 14 bytes (10 ASCII bytes + 2 characters at 3 bytes each)

// Verify length matches actual encoding
const encoded = encode(text);
console.log(length === encoded.length); // true

// Performance optimization - check size before encoding
// (MAX_BUFFER_SIZE and largeText are illustrative placeholder values)
const MAX_BUFFER_SIZE = 1024;
const largeText = text.repeat(10);
if (encodedLength(largeText) > MAX_BUFFER_SIZE) {
  throw new Error("Text too large to encode");
}
```
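To show why the length can be known without producing any bytes, the sketch below counts UTF-8 bytes directly from UTF-16 code units. `utf8ByteLength` is a hypothetical helper, not the library's code, and it assumes well-formed input; the real `encodedLength` additionally throws on invalid UTF-16.

```typescript
// Hypothetical helper for illustration; assumes the string has no lone surrogates.
function utf8ByteLength(s: string): number {
  let total = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      total += 1; // ASCII
    } else if (c < 0x800) {
      total += 2; // e.g. Latin-1 supplement, Greek, Cyrillic
    } else if (c >= 0xd800 && c <= 0xdbff) {
      total += 4; // high surrogate: the pair forms one 4-byte sequence
      i++;        // skip the low surrogate that completes the pair
    } else {
      total += 3; // rest of the Basic Multilingual Plane
    }
  }
  return total;
}

console.log(utf8ByteLength("Hello, 世界!")); // 14
console.log(utf8ByteLength("こんにちは"));    // 15
console.log(utf8ByteLength("🌍"));           // 4
```

Knowing the size up front lets an encoder allocate a `Uint8Array` of exactly the right length once, instead of growing a buffer as it goes.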

## Error Handling

The library validates its input on both the encoding and decoding paths and throws an `Error` with one of two fixed messages:

### UTF-16 Validation Errors

Thrown by `encode()` and `encodedLength()` for invalid UTF-16 input:

- **Error Message**: `"utf8: invalid string"`
- **Common Causes**:
  - Lone high surrogate (0xD800-0xDBFF) without matching low surrogate
  - Lone low surrogate (0xDC00-0xDFFF) without preceding high surrogate
  - Invalid surrogate pair sequences
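The causes listed above can be checked ahead of time by scanning the string's UTF-16 code units. The sketch below uses a hypothetical helper, `hasLoneSurrogate`, that applies a strict pairing rule; it is an illustration of the conditions, and the library's own acceptance rules may differ in edge cases.

```typescript
// Hypothetical pre-check, not part of @stablelib/utf8.
// Returns true if the string contains a surrogate that is not part of a
// well-formed high+low pair.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) {
        return true;
      }
      i++; // valid pair, skip its second half
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

console.log(hasLoneSurrogate("Hello 🌍"));     // false
console.log(hasLoneSurrogate("bad: \uD800"));  // true
console.log(hasLoneSurrogate("\uDC00 first")); // true
```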

### UTF-8 Validation Errors

Thrown by `decode()` for invalid UTF-8 byte sequences:

- **Error Message**: `"utf8: invalid source encoding"`
- **Common Causes**:
  - Invalid start bytes (0xFE, 0xFF)
  - Incomplete multi-byte sequences
  - Invalid continuation bytes
  - Overlong encodings
  - Invalid code points (surrogate range, above U+10FFFF)
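Each cause in the list above corresponds to a concrete byte pattern. The sketch below is a hypothetical standalone validator, `isValidUtf8`, written to illustrate those conditions; it is not the library's implementation and reports only a boolean rather than throwing.

```typescript
// Hypothetical validator for illustration; not part of @stablelib/utf8.
function isValidUtf8(bytes: Uint8Array): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    if (b0 < 0x80) {
      i++; // single-byte (ASCII) sequence
      continue;
    }
    let need: number; // number of continuation bytes expected
    let min: number;  // smallest code point legal for this sequence length
    let cp: number;   // code point accumulated so far
    if (b0 >= 0xc0 && b0 < 0xe0) {
      need = 1; min = 0x80; cp = b0 & 0x1f;
    } else if (b0 >= 0xe0 && b0 < 0xf0) {
      need = 2; min = 0x800; cp = b0 & 0x0f;
    } else if (b0 >= 0xf0 && b0 < 0xf8) {
      need = 3; min = 0x10000; cp = b0 & 0x07;
    } else {
      return false; // stray continuation byte or invalid start byte (0xF8-0xFF)
    }
    if (i + need >= bytes.length) {
      return false; // incomplete multi-byte sequence
    }
    for (let k = 1; k <= need; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) {
        return false; // invalid continuation byte
      }
      cp = (cp << 6) | (b & 0x3f);
    }
    if (cp < min) return false;                      // overlong encoding
    if (cp >= 0xd800 && cp <= 0xdfff) return false;  // surrogate range
    if (cp > 0x10ffff) return false;                 // above U+10FFFF
    i += need + 1;
  }
  return true;
}

console.log(isValidUtf8(new Uint8Array([0xe3, 0x81, 0x93])));       // true  ("こ")
console.log(isValidUtf8(new Uint8Array([0xff])));                   // false (invalid start byte)
console.log(isValidUtf8(new Uint8Array([0xe3, 0x81])));             // false (incomplete)
console.log(isValidUtf8(new Uint8Array([0xc0, 0x80])));             // false (overlong)
console.log(isValidUtf8(new Uint8Array([0xed, 0xa0, 0x80])));       // false (surrogate U+D800)
console.log(isValidUtf8(new Uint8Array([0xf4, 0x90, 0x80, 0x80]))); // false (above U+10FFFF)
```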

## Performance Characteristics

- **Zero Dependencies**: No runtime dependencies for maximum compatibility
- **Efficient Encoding**: Single-pass algorithm with pre-calculated buffer allocation
- **Validation**: Comprehensive validation without performance degradation
- **Memory Safe**: Proper bounds checking for all array access
- **TypeScript**: Full type safety with accurate type definitions