tessl/npm-utf8

A well-tested UTF-8 encoder/decoder written in JavaScript

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: npmpkg:npm/utf8@3.0.x

To install, run `npx @tessl/cli install tessl/npm-utf8@3.0.0`.

# UTF-8 JavaScript Library

UTF-8.js is a well-tested UTF-8 encoder/decoder written in JavaScript. It encodes and decodes any Unicode scalar value (U+0000 to U+10FFFF, excluding the surrogate range U+D800 to U+DFFF) as specified by the WHATWG Encoding Standard, and throws an `Error` on invalid input such as lone surrogates or malformed byte sequences.

## Package Information

- **Package Name**: utf8
- **Package Type**: npm
- **Language**: JavaScript
- **Installation**: `npm install utf8`

## Core Imports

```javascript
const utf8 = require('utf8');
```

For ES modules (utf8 3.0.x ships as a CommonJS/UMD script, so Node.js exposes its exports object as the default export):

```javascript
import utf8 from 'utf8';
```

Browser usage:

```html
<script src="utf8.js"></script>
<!-- Creates a global utf8 object -->
```

## Basic Usage

```javascript
const utf8 = require('utf8');

// Encode a JavaScript string as a UTF-8 byte string
// (one character per byte, each with a code unit from 0x00 to 0xFF)
const encoded = utf8.encode('Hello, 世界!');
console.log(encoded.length); // 14 ('世' and '界' take 3 bytes each)

// Decode the UTF-8 byte string back to a JavaScript string
const decoded = utf8.decode(encoded);
console.log(decoded); // Hello, 世界!

// Check the library version
console.log(utf8.version); // 3.0.0
```
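
`utf8.encode()` returns a byte string rather than a `Uint8Array`, so code that writes to a file, socket, or request body usually has to convert between the two. The helpers below are a minimal sketch of that conversion; `byteStringToBytes` and `bytesToByteString` are hypothetical names, not part of utf8.js, and they assume every character of the byte string has a code unit from 0x00 to 0xFF.

```javascript
// Hypothetical helpers (not part of utf8.js) for moving between the
// byte strings used by utf8.encode()/utf8.decode() and Uint8Array.

/**
 * Converts a byte string (one character per byte) to a Uint8Array.
 * @param {string} byteString
 * @returns {Uint8Array}
 */
function byteStringToBytes(byteString) {
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    const code = byteString.charCodeAt(i);
    // A byte string must never contain code units above 0xFF.
    if (code > 0xff) {
      throw new RangeError(`Not a byte string: code unit 0x${code.toString(16)} at index ${i}`);
    }
    bytes[i] = code;
  }
  return bytes;
}

/**
 * Converts a Uint8Array back to a byte string.
 * @param {Uint8Array} bytes
 * @returns {string}
 */
function bytesToByteString(bytes) {
  let result = '';
  for (const byte of bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

// Usage with a byte string equal to utf8.encode('世界')
const bytes = byteStringToBytes('\xE4\xB8\x96\xE7\x95\x8C');
console.log(bytes.join(' ')); // 228 184 150 231 149 140
console.log(bytesToByteString(bytes) === '\xE4\xB8\x96\xE7\x95\x8C'); // true
```

In Node.js, `Buffer.from(byteString, 'latin1')` performs the same byte-string-to-bytes mapping.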

## Capabilities

### UTF-8 Encoding

Encodes a JavaScript string as a UTF-8 byte string: a string in which each character represents one byte (a code unit from 0x00 to 0xFF). Surrogate pairs are combined into a single code point before encoding, and lone surrogates are rejected because they are not Unicode scalar values.

```javascript { .api }
/**
 * Encodes any given JavaScript string as UTF-8
 * @param {string} string - JavaScript string to encode as UTF-8
 * @returns {string} UTF-8-encoded byte string
 * @throws {Error} When input contains non-scalar values (lone surrogates)
 */
utf8.encode(string);
```

**Usage Examples:**

```javascript
// Basic ASCII encoding (1 byte per character)
utf8.encode('Hello');
// → 'Hello'

// Non-ASCII character (2 bytes)
utf8.encode('\xA9'); // U+00A9 COPYRIGHT SIGN
// → '\xC2\xA9'

// Supplementary character written as a surrogate pair (4 bytes)
utf8.encode('\uD800\uDC01'); // U+10001 LINEAR B SYLLABLE B038 E
// → '\xF0\x90\x80\x81'

// Chinese characters (3 bytes each)
utf8.encode('世界');
// → '\xE4\xB8\x96\xE7\x95\x8C'
```

**Error Handling:**

```javascript
try {
  // Throws: U+D800 is a high surrogate with no matching low surrogate
  utf8.encode('\uD800');
} catch (error) {
  console.error(error.message); // "Lone surrogate U+D800 is not a scalar value"
}
```
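
To make the encoding rules concrete, here is a minimal, self-contained sketch of what `utf8.encode()` is documented to do: combine surrogate pairs into code points, reject lone surrogates, and emit 1 to 4 bytes per code point. It illustrates those rules and is not the library's source; `encodeUtf8ByteString` is a hypothetical name.

```javascript
// Hypothetical illustration (not utf8.js source): encode a JavaScript
// string as a UTF-8 byte string, one character per output byte.
function encodeUtf8ByteString(input) {
  let output = '';
  for (let i = 0; i < input.length; i++) {
    let codePoint = input.charCodeAt(i);

    // Combine a high surrogate followed by a low surrogate into one code point.
    if (codePoint >= 0xd800 && codePoint <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        codePoint = (codePoint - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000;
        i++; // the low surrogate has been consumed
      }
    }

    // Any surrogate left at this point is unpaired, so it is not a scalar value.
    if (codePoint >= 0xd800 && codePoint <= 0xdfff) {
      throw new Error(
        `Lone surrogate U+${codePoint.toString(16).toUpperCase()} is not a scalar value`
      );
    }

    if (codePoint <= 0x7f) {
      // 1 byte: 0xxxxxxx
      output += String.fromCharCode(codePoint);
    } else if (codePoint <= 0x7ff) {
      // 2 bytes: 110xxxxx 10xxxxxx
      output += String.fromCharCode(0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f));
    } else if (codePoint <= 0xffff) {
      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      output += String.fromCharCode(
        0xe0 | (codePoint >> 12),
        0x80 | ((codePoint >> 6) & 0x3f),
        0x80 | (codePoint & 0x3f)
      );
    } else {
      // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      output += String.fromCharCode(
        0xf0 | (codePoint >> 18),
        0x80 | ((codePoint >> 12) & 0x3f),
        0x80 | ((codePoint >> 6) & 0x3f),
        0x80 | (codePoint & 0x3f)
      );
    }
  }
  return output;
}

console.log(encodeUtf8ByteString('\xA9') === '\xC2\xA9'); // true
console.log(encodeUtf8ByteString('\uD800\uDC01') === '\xF0\x90\x80\x81'); // true
```

The surrogate-pair formula maps U+D800 U+DC01 to 0x10001, which is why that pair needs four bytes while '\xA9' needs only two.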

### UTF-8 Decoding

Decodes a UTF-8 byte string (one character per byte) back into a JavaScript string. Code points above U+FFFF are returned as surrogate pairs, and malformed input throws an `Error`.

```javascript { .api }
/**
 * Decodes any given UTF-8-encoded string as UTF-8
 * @param {string} byteString - UTF-8 encoded byte string to decode
 * @returns {string} JavaScript string (UTF-8 decoded)
 * @throws {Error} When malformed UTF-8 is detected
 */
utf8.decode(byteString);
```

**Usage Examples:**

```javascript
// Basic decoding
utf8.decode('\xC2\xA9');
// → '\xA9' (U+00A9 COPYRIGHT SIGN)

// Supplementary characters
utf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01' (U+10001 LINEAR B SYLLABLE B038 E)

// Multi-byte sequences
utf8.decode('\xE4\xB8\x96\xE7\x95\x8C');
// → '世界'
```

**Error Handling:**

```javascript
try {
  // Throws: 0xFF cannot start any UTF-8 sequence
  utf8.decode('\xFF\xFE');
} catch (error) {
  console.error(error.message); // "Invalid UTF-8 detected"
}

try {
  // Throws: 0xC2 starts a 2-byte sequence, but the input ends after it
  utf8.decode('\xC2');
} catch (error) {
  console.error(error.message); // "Invalid byte index"
}
```
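
A UTF-8 decoder decides how many bytes a sequence has from the high bits of its first (lead) byte, and every following byte must be a continuation byte of the form `10xxxxxx`. The sketch below shows that classification with the hypothetical helpers `utf8SequenceLength` and `isContinuationByte`; it borrows the `"Invalid UTF-8 detected"` message only for illustration and is not the library's implementation.

```javascript
// Hypothetical helper (not part of utf8.js): number of bytes in the UTF-8
// sequence that starts with `leadByte`, based on its high bits.
function utf8SequenceLength(leadByte) {
  if ((leadByte & 0x80) === 0x00) return 1; // 0xxxxxxx
  if ((leadByte & 0xe0) === 0xc0) return 2; // 110xxxxx
  if ((leadByte & 0xf0) === 0xe0) return 3; // 1110xxxx
  if ((leadByte & 0xf8) === 0xf0) return 4; // 11110xxx
  // 10xxxxxx (a stray continuation byte) or 11111xxx cannot start a sequence.
  throw new Error('Invalid UTF-8 detected');
}

// True when `byte` has the continuation pattern 10xxxxxx.
function isContinuationByte(byte) {
  return (byte & 0xc0) === 0x80;
}

console.log(utf8SequenceLength(0x41)); // 1 ('A')
console.log(utf8SequenceLength(0xc2)); // 2 (lead byte of U+00A9)
console.log(utf8SequenceLength(0xe4)); // 3 (lead byte of '世')
console.log(utf8SequenceLength(0xf0)); // 4 (lead byte of U+10001)
console.log(isContinuationByte(0xa9)); // true
```

Under these rules `\xFF` matches no lead-byte pattern, which is why `'\xFF\xFE'` is rejected outright, while `\xC2` announces a 2-byte sequence that a one-byte input cannot complete.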

### Version Information

Provides the semantic version number of the library.

```javascript { .api }
/**
 * Semantic version number of the utf8 library
 * @type {string}
 */
utf8.version;
```

**Usage Example:**

```javascript
console.log(`Using utf8.js version ${utf8.version}`);
// Output: "Using utf8.js version 3.0.0"
```

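## Error Types

The library throws standard JavaScript `Error` objects; the message identifies the problem:

- **Lone Surrogate Error** (`"Lone surrogate U+XXXX is not a scalar value"`): thrown by `encode()` when the input contains an unpaired surrogate, and by `decode()` when a 3-byte sequence decodes to a surrogate code point (U+D800 to U+DFFF)
- **Invalid UTF-8 Error** (`"Invalid UTF-8 detected"`): thrown by `decode()` when a byte cannot start a valid sequence (for example `\xFF`) or a 4-byte sequence decodes outside U+10000 to U+10FFFF
- **Invalid Byte Index Error** (`"Invalid byte index"`): thrown by `decode()` when the input ends in the middle of a multi-byte sequence
- **Invalid Continuation Byte Error** (`"Invalid continuation byte"`): thrown by `decode()` when a byte that should be a continuation byte (`10xxxxxx`) is not, or when a 2- or 3-byte sequence is overlong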
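
Because every case surfaces as a plain `Error`, callers that need to react differently can only tell them apart by message. The sketch below assumes the message texts listed above stay stable across versions (they are not a documented, versioned contract); `classifyUtf8Error` and its category names are hypothetical.

```javascript
// Hypothetical helper: map a utf8.js error message to a category name.
// Assumes the message wording listed above; anything else maps to 'unknown'.
function classifyUtf8Error(error) {
  const message = error instanceof Error ? error.message : String(error);
  if (/^Lone surrogate U\+[0-9A-F]+ is not a scalar value$/.test(message)) {
    return 'lone-surrogate';
  }
  switch (message) {
    case 'Invalid UTF-8 detected':
      return 'invalid-utf8';
    case 'Invalid byte index':
      return 'invalid-byte-index';
    case 'Invalid continuation byte':
      return 'invalid-continuation-byte';
    default:
      return 'unknown';
  }
}

console.log(classifyUtf8Error(new Error('Lone surrogate U+D800 is not a scalar value'))); // lone-surrogate
console.log(classifyUtf8Error(new Error('Invalid byte index'))); // invalid-byte-index
```

A caller could, for example, treat `'invalid-byte-index'` as "input truncated, wait for more bytes" and the other categories as hard failures.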

## Unicode Support

- **Full Unicode Range**: Encodes and decodes every Unicode scalar value from U+0000 to U+10FFFF; the surrogate code points U+D800 to U+DFFF are not scalar values and are rejected
- **Surrogate Pair Handling**: Combines JavaScript surrogate pairs into a single supplementary code point when encoding, and splits such code points back into surrogate pairs when decoding
- **Scalar Value Validation**: Enforces Unicode scalar value requirements (no lone surrogates)
- **Standards Compliance**: Implements UTF-8 as defined by the WHATWG Encoding Standard; invalid input throws an `Error` instead of being replaced with U+FFFD
- **Multi-byte Sequences**: Correctly handles 1- to 4-byte UTF-8 sequences
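
Modern browsers and current Node.js releases provide `TextEncoder` and `TextDecoder`, the WHATWG Encoding Standard APIs, which can serve as a reference when checking byte strings. The sketch below uses those built-ins, not utf8.js, and the names `referenceEncode` and `referenceDecode` are hypothetical. It also shows two behavioral differences: `TextEncoder` replaces a lone surrogate with U+FFFD (bytes `EF BF BD`) instead of throwing, and `TextDecoder` only throws (a `TypeError`) on malformed input when constructed with `{ fatal: true }`.

```javascript
// Reference check with the built-in WHATWG Encoding API (not utf8.js).
// Produces the same byte-string format that utf8.encode() documents.
function referenceEncode(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  let byteString = '';
  for (const byte of bytes) byteString += String.fromCharCode(byte);
  return byteString;
}

// Decodes a byte string; { fatal: true } makes malformed input throw a TypeError.
function referenceDecode(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

console.log(referenceEncode('\uD800\uDC01') === '\xF0\x90\x80\x81'); // true
console.log(referenceDecode('\xE4\xB8\x96\xE7\x95\x8C')); // 世界

// Difference from utf8.js: a lone surrogate is replaced with U+FFFD, not rejected.
console.log(referenceEncode('\uD800') === '\xEF\xBF\xBD'); // true

// Difference from utf8.js: malformed input raises a TypeError, not a plain Error.
try {
  referenceDecode('\xFF\xFE');
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```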

## Browser Compatibility

The library has been tested and works in:

- Chrome 27+
- Firefox 3+
- Safari 4+
- Opera 10+
- Internet Explorer 6+
- Node.js v0.10.0+
- Various JavaScript engines (Rhino, Narwhal, RingoJS, PhantomJS)