Tessl Tile for npm/unorm@1.6.0

# Unorm

Unorm is a JavaScript Unicode normalization library that provides all four Unicode normalization forms (NFC, NFD, NFKC, NFKD) according to the Unicode 8.0 standard. It serves as both a standalone library and a polyfill for `String.prototype.normalize()` in environments that don't natively support it.

## Package Information

- **Package Name**: unorm
- **Package Type**: npm
- **Language**: JavaScript
- **Installation**: `npm install unorm`

## Core Imports

```javascript
const unorm = require('unorm');
```

For AMD (RequireJS):

```javascript
define(['unorm'], function(unorm) {
    // Use unorm functions
});
```

In browser (global):

```javascript
// Available as global unorm object
unorm.nfc(string);
```

## Basic Usage

```javascript
const unorm = require('unorm');

// Example text with mixed Unicode forms
const text = 'The \u212B symbol invented by A. J. \u00C5ngstr\u00F6m';

// Apply different normalization forms
const nfcText = unorm.nfc(text);    // Canonical composition
const nfdText = unorm.nfd(text);    // Canonical decomposition
const nfkcText = unorm.nfkc(text);  // Compatibility composition
const nfkdText = unorm.nfkd(text);  // Compatibility decomposition

console.log('Original:', text);
console.log('NFC:', nfcText);
console.log('NFD:', nfdText);
console.log('NFKC:', nfkcText);
console.log('NFKD:', nfkdText);

// Using as String.prototype.normalize polyfill
console.log('Polyfill:', text.normalize('NFC'));
```
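
The normalized strings printed above look almost identical on screen, so it can help to inspect their code points instead. The following is a minimal sketch: `codePoints` is a hypothetical helper (not part of unorm), and the fallback to the native method is an assumption made only so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfc, nfd;
try {
    const unorm = require('unorm');
    nfc = (s) => unorm.nfc(s);
    nfd = (s) => unorm.nfd(s);
} catch (err) {
    nfc = (s) => s.normalize('NFC');
    nfd = (s) => s.normalize('NFD');
}

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
    return Array.from(str, (ch) =>
        'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
    );
}

const sign = '\u212B';   // ANGSTROM SIGN
const letter = '\u00C5'; // LATIN CAPITAL LETTER A WITH RING ABOVE

console.log(codePoints(nfc(sign)));     // [ 'U+00C5' ]
console.log(codePoints(nfd(sign)));     // [ 'U+0041', 'U+030A' ]
console.log(nfc(sign) === nfc(letter)); // true
```

The Angstrom sign and the letter Å are canonically equivalent, so both end up as the same code point sequence once normalized.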

## Architecture

Unorm implements Unicode normalization according to Unicode Standard Annex #15 (UAX #15, Unicode Normalization Forms).

### Core Components

- **Normalization Engine**: Unicode character decomposition and composition engine with built-in Unicode data tables
- **Polyfill System**: Automatic detection and implementation of `String.prototype.normalize()` when native support is unavailable
- **Multi-Environment Support**: Works consistently across CommonJS (Node.js), AMD (RequireJS), and browser global contexts

### Unicode Normalization Forms

Unicode normalization addresses the fact that the same text can be represented in multiple ways using different combinations of base characters and combining marks.

**Canonical vs Compatibility:**
- **Canonical**: Deals with different representations of the same abstract character (e.g., é as single codepoint vs. e + combining accent)
- **Compatibility**: Also handles formatting differences and alternative representations (e.g., superscript/subscript digits)

**Decomposition vs Composition:**
- **Decomposition**: Breaks composite characters into base characters plus combining marks
- **Composition**: Combines base characters and marks into single composite characters where possible

**The Four Forms:**
- **NFC** (Canonical Composition): Most common form, produces composed characters when possible
- **NFD** (Canonical Decomposition): Breaks down composed characters, useful for mark removal and analysis
- **NFKC** (Compatibility Composition): Like NFC but also normalizes compatibility characters (superscripts, etc.)
- **NFKD** (Compatibility Decomposition): Most decomposed form, ideal for search and indexing operations
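
The four forms listed above can be compared side by side on one input that contains both kinds of character. This sketch assumes `String.prototype.normalize()` is available, either natively or through unorm's polyfill; calling `unorm.nfc`, `unorm.nfd`, `unorm.nfkc`, and `unorm.nfkd` directly should give the same results for this input.

```javascript
// The input mixes a compatibility character (the "fi" ligature, U+FB01)
// with a canonically decomposable one (é, U+00E9).
const input = '\uFB01\u00E9';

const results = {};
for (const form of ['NFC', 'NFD', 'NFKC', 'NFKD']) {
    results[form] = input.normalize(form);
}

console.log(results.NFC === '\uFB01\u00E9');  // true: already composed, ligature kept
console.log(results.NFD === '\uFB01e\u0301'); // true: é split, ligature kept
console.log(results.NFKC === 'fi\u00E9');     // true: ligature expanded, é composed
console.log(results.NFKD === 'fie\u0301');    // true: ligature expanded, é split
```

Only the two compatibility forms touch the ligature, and only the two decomposed forms split the accented letter.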

### Polyfill Mechanism

On load, the library checks whether `String.prototype.normalize()` exists in the current environment. If it does not, unorm adds the method using `Object.defineProperty()`, with error handling that follows the ECMAScript specification. The `unorm.shimApplied` property reports whether the polyfill was installed.
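
The detect-then-define pattern described above can be sketched as follows. This is a simplified illustration, not unorm's actual source: `installShim`, its parameters, and the error message wording are hypothetical, and the demo installs the method on a stand-in object rather than on `String.prototype`.

```javascript
// Hypothetical helper: add `normalize` to `proto` only when it is missing.
// `forms` maps a form name to a function that normalizes a string.
function installShim(proto, forms) {
    if (typeof proto.normalize === 'function') {
        return false; // Already present, leave it untouched
    }
    Object.defineProperty(proto, 'normalize', {
        enumerable: false,
        configurable: true,
        writable: true,
        value: function normalize(form) {
            'use strict'; // Keeps `this` as null/undefined instead of the global object
            if (this === null || this === undefined) {
                throw new TypeError('normalize called on null or undefined');
            }
            const name = form === undefined ? 'NFC' : String(form);
            if (!Object.prototype.hasOwnProperty.call(forms, name)) {
                throw new RangeError('Invalid normalization form ' + name);
            }
            return forms[name](String(this));
        },
    });
    return true; // Plays the role of shimApplied
}

// Stand-in prototype; it delegates to the native method purely for the demo.
const fakeProto = {};
const applied = installShim(fakeProto, {
    NFC: (s) => s.normalize('NFC'),
});

console.log(applied);                         // true
console.log(installShim(fakeProto, {}));      // false: method already defined
console.log(Object.keys(fakeProto).length);   // 0: the method is non-enumerable
```

Defining the method as non-enumerable keeps it out of `for...in` loops and `Object.keys()`, which is the usual reason for choosing `Object.defineProperty()` over plain assignment.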

## Capabilities

### Canonical Composition (NFC)

Applies canonical decomposition followed by canonical composition to produce a composed form.

```javascript { .api }
/**
 * Normalize string using Canonical Decomposition followed by Canonical Composition
 * @param {string} str - String to normalize
 * @returns {string} NFC normalized string
 */
function nfc(str);
```

**Usage Example:**

```javascript
const unorm = require('unorm');

// Combining characters are composed into single code points when possible
const result = unorm.nfc('a\u0308'); // a + combining diaeresis -> ä (single code point)
console.log(result); // "\u00e4"
```
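
Because NFC decomposes before it composes, it also puts combining marks into a canonical order. The sketch below shows two strings whose marks were typed in different orders ending up identical after normalization. The fallback to the native method is an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfc;
try {
    const unorm = require('unorm');
    nfc = (s) => unorm.nfc(s);
} catch (err) {
    nfc = (s) => s.normalize('NFC');
}

// The same letter and marks, entered in two different orders
const dotThenDiaeresis = 'a\u0323\u0308'; // a + dot below + diaeresis
const diaeresisThenDot = 'a\u0308\u0323'; // a + diaeresis + dot below

console.log(dotThenDiaeresis === diaeresisThenDot);           // false
console.log(nfc(dotThenDiaeresis) === nfc(diaeresisThenDot)); // true
console.log(nfc(diaeresisThenDot) === '\u1EA1\u0308');        // true: ạ + diaeresis
```

No single code point exists for the fully composed character, so the result keeps one combining mark; NFC composes as far as Unicode allows, not always down to one code point.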

### Canonical Decomposition (NFD)

Applies canonical decomposition to produce a decomposed form where composite characters are broken down into base characters plus combining marks.

```javascript { .api }
/**
 * Normalize string using Canonical Decomposition
 * @param {string} str - String to normalize
 * @returns {string} NFD normalized string
 */
function nfd(str);
```

**Usage Example:**

```javascript
const unorm = require('unorm');

// Composite characters are decomposed into base + combining marks
const result = unorm.nfd('\u00E4'); // ä (single code point) -> a + combining diaeresis
console.log(result); // "a\u0308"
```
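
Canonical decomposition is not limited to accented Latin letters. As a further illustration, the sketch below decomposes a precomposed Hangul syllable into its conjoining jamo and then recomposes it with NFC. The fallback to the native method is an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfc, nfd;
try {
    const unorm = require('unorm');
    nfc = (s) => unorm.nfc(s);
    nfd = (s) => unorm.nfd(s);
} catch (err) {
    nfc = (s) => s.normalize('NFC');
    nfd = (s) => s.normalize('NFD');
}

const syllable = '\uD55C'; // 한 (HANGUL SYLLABLE HAN)
const jamo = nfd(syllable);

console.log(jamo.length);                   // 3
console.log(jamo === '\u1112\u1161\u11AB'); // true: leading, vowel, and trailing jamo
console.log(nfc(jamo) === syllable);        // true: NFC reverses the decomposition
```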

### Compatibility Composition (NFKC)

Applies compatibility decomposition followed by canonical composition, replacing compatibility characters (such as subscripts) with their plain equivalents and then recomposing base characters and marks.

```javascript { .api }
/**
 * Normalize string using Compatibility Decomposition followed by Canonical Composition
 * @param {string} str - String to normalize
 * @returns {string} NFKC normalized string
 */
function nfkc(str);
```

**Usage Example:**

```javascript
const unorm = require('unorm');

// Compatibility characters like subscripts are replaced with normal equivalents
const result = unorm.nfkc('CO₂'); // Subscript 2 becomes normal 2
console.log(result); // "CO2"
```
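
Subscripts are only one kind of compatibility character. The sketch below applies NFKC to a few others (full-width letters and digits, a ligature, a superscript). The sample names are illustrative, and the fallback to the native method is an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfkc;
try {
    const unorm = require('unorm');
    nfkc = (s) => unorm.nfkc(s);
} catch (err) {
    nfkc = (s) => s.normalize('NFKC');
}

const samples = {
    fullWidth: '\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13', // full-width "ABC123"
    ligature: 'o\uFB03ce',                             // "office" written with the ffi ligature
    superscript: 'x\u00B2',                            // x followed by superscript two
};

console.log(nfkc(samples.fullWidth));   // "ABC123"
console.log(nfkc(samples.ligature));    // "office"
console.log(nfkc(samples.superscript)); // "x2"
```

The mapping is lossy: after NFKC, "x²" and "x2" can no longer be told apart, so it is usually safer to keep the original text for display and use the normalized form only for matching.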

### Compatibility Decomposition (NFKD)

Applies compatibility decomposition: compatibility characters are replaced with their plain equivalents, and composite characters are broken down into base characters plus combining marks.

```javascript { .api }
/**
 * Normalize string using Compatibility Decomposition
 * @param {string} str - String to normalize
 * @returns {string} NFKD normalized string
 */
function nfkd(str);
```

**Usage Example:**

```javascript
const unorm = require('unorm');

// Useful for search/indexing by removing combining marks
const text = 'Ångström';
const normalized = unorm.nfkd(text);
const withoutMarks = normalized.replace(/[\u0300-\u036F]/g, ''); // Remove combining marks
console.log(withoutMarks); // "Angstrom"
```
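
The range `\u0300-\u036F` used above covers only the Combining Diacritical Marks block. Where the engine supports Unicode property escapes (ES2018 and later, an assumption about the runtime), `\p{M}` matches every character in the Mark category. The sketch below contrasts the two; the fallback to the native method is likewise an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfkd;
try {
    const unorm = require('unorm');
    nfkd = (s) => unorm.nfkd(s);
} catch (err) {
    nfkd = (s) => s.normalize('NFKD');
}

// x with COMBINING RIGHT ARROW ABOVE (U+20D7), plus a precomposed é
const input = 'x\u20D7 and \u00E9';
const decomposed = nfkd(input);

const narrow = decomposed.replace(/[\u0300-\u036F]/g, ''); // one block only
const broad = decomposed.replace(/\p{M}/gu, '');           // all combining marks

console.log(narrow === 'x\u20D7 and e'); // true: U+20D7 lies outside the range and survives
console.log(broad === 'x and e');        // true
```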

### String.prototype.normalize Polyfill

Automatically provides the `String.prototype.normalize()` method when the JavaScript environment has no native implementation.

```javascript { .api }
/**
 * Polyfill for String.prototype.normalize method
 * @param {string} [form="NFC"] - Normalization form: "NFC", "NFD", "NFKC", or "NFKD"
 * @returns {string} Normalized string according to specified form
 * @throws {TypeError} When called on null or undefined
 * @throws {RangeError} When invalid normalization form provided
 */
String.prototype.normalize(form);
```

**Usage Examples:**

```javascript
// When native normalize() isn't available, unorm provides it
require('unorm'); // Automatically adds polyfill if needed

const text = 'café';
console.log(text.normalize('NFC'));  // Native implementation, or unorm's when the polyfill was applied
console.log(text.normalize('NFD'));  // Decomposes é into e + combining accent
console.log(text.normalize('NFKC')); // Same as NFC for this example
console.log(text.normalize('NFKD')); // Same as NFD for this example

// Error handling
try {
    text.normalize('INVALID'); // Throws RangeError
} catch (error) {
    console.error(error.name);    // "RangeError"
    console.error(error.message); // Exact wording differs between the polyfill and native engines
}
```
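
Two details of the signature above are easy to miss: the form defaults to NFC when omitted, and form names are case-sensitive. The sketch below shows both, and checks the error type rather than the message text. It assumes `String.prototype.normalize()` is available, natively or through unorm's polyfill; `tryNormalize` is a hypothetical helper.

```javascript
const decomposed = 'cafe\u0301'; // "café" with a combining acute accent

// Omitting the form is the same as asking for "NFC"
console.log(decomposed.normalize() === decomposed.normalize('NFC')); // true
console.log(decomposed.normalize().length);                          // 4

// Hypothetical helper: report the outcome without relying on message text.
function tryNormalize(str, form) {
    try {
        return { ok: true, value: str.normalize(form) };
    } catch (error) {
        if (error instanceof RangeError) {
            return { ok: false, reason: 'unsupported form' };
        }
        throw error; // Anything else is unexpected
    }
}

console.log(tryNormalize(decomposed, 'NFC').ok);     // true
console.log(tryNormalize(decomposed, 'nfc').reason); // "unsupported form"
```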

### Polyfill Status Detection

A boolean property that reports whether the `String.prototype.normalize` polyfill was applied.

```javascript { .api }
/**
 * Boolean indicating whether String.prototype.normalize polyfill was applied
 * @type {boolean}
 */
unorm.shimApplied;
```

**Usage Example:**

```javascript
const unorm = require('unorm');

if (unorm.shimApplied) {
    console.log('String.prototype.normalize polyfill was applied');
} else {
    console.log('Native String.prototype.normalize is available');
}
```
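
In code that wants to avoid loading unorm when the method already exists, the same check can be made before requiring the library. This is one possible pattern, not something the library prescribes: `ensureNormalize` and its return labels are hypothetical.

```javascript
// Hypothetical helper: make sure String.prototype.normalize exists,
// loading unorm only when the method is missing.
function ensureNormalize() {
    if (typeof String.prototype.normalize === 'function') {
        return 'already-available'; // Native, or a shim installed earlier
    }
    // Loading unorm installs the shim as a side effect.
    // This line throws if the package is not installed.
    const unorm = require('unorm');
    return unorm.shimApplied ? 'shim-installed' : 'unavailable';
}

const status = ensureNormalize();
console.log(status);
console.log('e\u0301'.normalize('NFC') === '\u00E9'); // true once the method exists
```

The trade-off is that the library's built-in Unicode data tables are only loaded in environments that actually need them.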

## Types

```typescript { .api }
/**
 * Main unorm module interface
 */
interface UnormModule {
    /** Canonical Decomposition followed by Canonical Composition */
    nfc: (str: string) => string;
    /** Canonical Decomposition */
    nfd: (str: string) => string;
    /** Compatibility Decomposition followed by Canonical Composition */
    nfkc: (str: string) => string;
    /** Compatibility Decomposition */
    nfkd: (str: string) => string;
    /** Whether String.prototype.normalize polyfill was applied */
    shimApplied: boolean;
}

/**
 * Valid normalization forms for String.prototype.normalize
 */
type NormalizationForm = "NFC" | "NFD" | "NFKC" | "NFKD";
```
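
In plain JavaScript the `NormalizationForm` type has no runtime presence, so a form name that comes from configuration or user input needs an explicit check. The sketch below pairs a runtime guard with a dispatcher that maps a form name onto the matching module function. `isNormalizationForm` and `normalizeWith` are hypothetical helpers, and the fallback object is an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let api;
try {
    api = require('unorm');
} catch (err) {
    api = {
        nfc: (s) => s.normalize('NFC'),
        nfd: (s) => s.normalize('NFD'),
        nfkc: (s) => s.normalize('NFKC'),
        nfkd: (s) => s.normalize('NFKD'),
    };
}

const NORMALIZATION_FORMS = ['NFC', 'NFD', 'NFKC', 'NFKD'];

// Hypothetical runtime guard mirroring the NormalizationForm type.
function isNormalizationForm(value) {
    return typeof value === 'string' && NORMALIZATION_FORMS.includes(value);
}

// Hypothetical dispatcher: "NFKD" -> api.nfkd, and so on.
function normalizeWith(form, str) {
    if (!isNormalizationForm(form)) {
        throw new RangeError('Unsupported normalization form: ' + String(form));
    }
    return api[form.toLowerCase()](str);
}

console.log(isNormalizationForm('NFKC'));                // true
console.log(isNormalizationForm('nfkc'));                // false: names are case-sensitive
console.log(normalizeWith('NFD', '\u00E9') === 'e\u0301'); // true
```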

## Common Use Cases

### Text Search and Indexing

```javascript
const unorm = require('unorm');

function normalizeForSearch(text) {
    // Use NFKD to decompose, then remove combining marks for search
    const decomposed = unorm.nfkd(text);
    return decomposed.replace(/[\u0300-\u036F]/g, ''); // Remove combining marks
}

const searchTerm = normalizeForSearch('café');
// Named haystack rather than document to avoid clashing with the browser global
const haystack = normalizeForSearch('I love café au lait');
console.log(haystack.includes(searchTerm)); // true
```
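
Search is often case-insensitive as well as accent-insensitive. The sketch below extends the same idea with lowercasing and filters a small list. `foldForSearch` and the sample data are hypothetical, and the fallback to the native method is an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfkd;
try {
    const unorm = require('unorm');
    nfkd = (s) => unorm.nfkd(s);
} catch (err) {
    nfkd = (s) => s.normalize('NFKD');
}

// Hypothetical helper: decompose, drop combining marks, then lowercase.
function foldForSearch(text) {
    return nfkd(text)
        .replace(/[\u0300-\u036F]/g, '')
        .toLowerCase();
}

const menu = ['Caf\u00E9 au lait', 'Cr\u00E8me br\u00FBl\u00E9e', 'Espresso'];
const query = foldForSearch('CREME');
const hits = menu.filter((item) => foldForSearch(item).includes(query));

console.log(hits); // [ 'Crème brûlée' ]
```

Letters such as ø and ß have no Unicode decomposition, so this approach leaves them unchanged; matching them against "o" or "ss" would need an additional mapping.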

### String Comparison

```javascript
const unorm = require('unorm');

function compareStrings(str1, str2) {
    // Normalize both strings to same form for accurate comparison
    return unorm.nfc(str1) === unorm.nfc(str2);
}

const text1 = '\u00E9';  // é as a single code point
const text2 = 'e\u0301'; // e + combining acute accent
console.log(compareStrings(text1, text2)); // true
```
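
The same normalize-before-comparing idea applies when strings are used as keys. The sketch below removes duplicates from a list by keying on the NFC form while keeping the first spelling seen. `dedupeNormalized` is a hypothetical helper, and the fallback to the native method is an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfc;
try {
    const unorm = require('unorm');
    nfc = (s) => unorm.nfc(s);
} catch (err) {
    nfc = (s) => s.normalize('NFC');
}

// Hypothetical helper: keep the first value seen for each NFC-normalized key.
function dedupeNormalized(values) {
    const seen = new Map();
    for (const value of values) {
        const key = nfc(value);
        if (!seen.has(key)) {
            seen.set(key, value);
        }
    }
    return Array.from(seen.values());
}

// "Zoë" precomposed, "Zoë" with a combining diaeresis, and plain "Zoe"
const names = ['Zo\u00EB', 'Zoe\u0308', 'Zoe'];
console.log(dedupeNormalized(names).length); // 2
```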

### Data Cleaning

```javascript
const unorm = require('unorm');

function cleanUserInput(input) {
    // Normalize to consistent form and trim
    return unorm.nfc(input.trim());
}

const userInput = '  cafe\u0301  '; // Decomposed é plus stray whitespace
const cleaned = cleanUserInput(userInput);
console.log(cleaned); // "café" with é as a single code point (U+00E9)
```
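
Normalizing before validation matters when a length limit is involved, because the decomposed and composed spellings of the same text have different lengths. The sketch below illustrates this; `cleanAndCheck` and its `maxLength` parameter are hypothetical, and the fallback to the native method is an assumption so the snippet runs without the package installed.

```javascript
// Prefer unorm when it is installed; the native fallback is an assumption
// made only so this sketch runs without the package.
let nfc;
try {
    const unorm = require('unorm');
    nfc = (s) => unorm.nfc(s);
} catch (err) {
    nfc = (s) => s.normalize('NFC');
}

// Hypothetical helper: trim, normalize, then apply a length limit.
function cleanAndCheck(input, maxLength) {
    const cleaned = nfc(input.trim());
    return { cleaned, fits: cleaned.length <= maxLength };
}

const raw = '  cafe\u0301  ';
console.log(raw.trim().length);          // 5: the accent is a separate code unit
console.log(cleanAndCheck(raw, 4).fits); // true: NFC yields 4 code units
```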

## Browser Compatibility

- **Modern Browsers**: Works in all modern browsers
- **Legacy Support**: Requires ES5 features (`Object.defineProperty`)
- **Recommended**: Use [es5-shim](https://github.com/kriskowal/es5-shim) for older browsers
- **Node.js**: Supports Node.js >= 0.4.0
