Tiny JavaScript tokenizer that never fails and is almost spec-compliant
npx @tessl/cli install tessl/npm-js-tokens@9.0.00
# js-tokens

js-tokens is a tiny, regex-powered, lenient JavaScript tokenizer that never fails and is almost spec-compliant. It provides a generator function that turns JavaScript code strings into token objects, making it well suited to syntax highlighting, code formatting, linters, and any application requiring reliable JavaScript tokenization.

## Package Information

- **Package Name**: js-tokens
- **Package Type**: npm
- **Language**: JavaScript (TypeScript definitions included)
- **Installation**: `npm install js-tokens`

## Core Imports

```javascript
const jsTokens = require("js-tokens");
```

For ES modules:

```javascript
import jsTokens from "js-tokens";
```

## Basic Usage

```javascript
const jsTokens = require("js-tokens");

// Basic tokenization
const code = 'JSON.stringify({k:3.14**2}, null /*replacer*/, "\\t")';
const tokens = Array.from(jsTokens(code));

// Extract token values
const tokenValues = tokens.map(token => token.value);
console.log(tokenValues.join("|"));
// Output: JSON|.|stringify|(|{|k|:|3.14|**|2|}|,| |null| |/*replacer*/|,| |"\t"|)

// Loop over tokens
for (const token of jsTokens("hello, !world")) {
  console.log(`${token.type}: ${token.value}`);
}

// JSX tokenization
const jsxCode = '<div>Hello {"world"}!</div>';
const jsxTokens = Array.from(jsTokens(jsxCode, { jsx: true }));
```

## Architecture

js-tokens is built around a single core function with the following key characteristics:

- **Never fails**: Always returns tokens, even for invalid JavaScript, instead of throwing errors
- **Lenient parsing**: Handles incomplete or malformed code gracefully
- **Context-aware**: Differentiates between regex literals and division operators based on the preceding tokens
- **Regex-powered**: Uses optimized regular expressions for fast tokenization
- **Position-preserving**: Concatenating all token values reproduces the original input
- **Near spec compliance**: Almost fully ECMAScript 2024 compliant, with a few deliberate shortcuts

## Capabilities

### JavaScript Tokenization

Core tokenization function that converts JavaScript code strings into detailed token objects with comprehensive type information.

```typescript { .api }
/**
 * Tokenizes JavaScript code into an iterable of token objects
 * @param input - JavaScript code string to tokenize
 * @param options - Optional configuration object
 * @returns Iterable of Token objects for regular JavaScript
 */
function jsTokens(input: string, options?: { jsx?: boolean }): Iterable<Token>;

/**
 * Tokenizes JavaScript code with JSX support
 * @param input - JavaScript/JSX code string to tokenize
 * @param options - Configuration object with jsx: true
 * @returns Iterable of Token and JSXToken objects
 */
function jsTokens(
  input: string,
  options: { jsx: true }
): Iterable<Token | JSXToken>;
```

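
Because the return value is an iterable rather than an array, it can be consumed in a single pass without materializing every token. The sketch below assumes nothing beyond the token shape documented here; `countByType` and the `sampleTokens` generator are hypothetical names, with the generator standing in for a real `jsTokens(code)` call.

```javascript
// Hypothetical helper: tallies tokens per type in one pass over any iterable.
function countByType(tokens) {
  const counts = new Map();
  for (const { type } of tokens) {
    counts.set(type, (counts.get(type) ?? 0) + 1);
  }
  return counts;
}

// Stand-in generator yielding hand-written tokens for `x=1;`.
function* sampleTokens() {
  yield { type: "IdentifierName", value: "x" };
  yield { type: "Punctuator", value: "=" };
  yield { type: "NumericLiteral", value: "1" };
  yield { type: "Punctuator", value: ";" };
}

const counts = countByType(sampleTokens());
console.log(counts.get("Punctuator")); // prints 2
```

In real use the call would look like `countByType(jsTokens(code))`; wrap the result in `Array.from(...)` only when random access to the tokens is needed.
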
### Standard JavaScript Tokens

js-tokens recognizes 16 token types for standard JavaScript code:

```typescript { .api }
type Token =
  | { type: "StringLiteral"; value: string; closed: boolean }
  | { type: "NoSubstitutionTemplate"; value: string; closed: boolean }
  | { type: "TemplateHead"; value: string }
  | { type: "TemplateMiddle"; value: string }
  | { type: "TemplateTail"; value: string; closed: boolean }
  | { type: "RegularExpressionLiteral"; value: string; closed: boolean }
  | { type: "MultiLineComment"; value: string; closed: boolean }
  | { type: "SingleLineComment"; value: string }
  | { type: "HashbangComment"; value: string }
  | { type: "IdentifierName"; value: string }
  | { type: "PrivateIdentifier"; value: string }
  | { type: "NumericLiteral"; value: string }
  | { type: "Punctuator"; value: string }
  | { type: "WhiteSpace"; value: string }
  | { type: "LineTerminatorSequence"; value: string }
  | { type: "Invalid"; value: string };
```

**Key Token Properties:**

- `type`: Token classification (one of the 16 standard types)
- `value`: The actual text content of the token
- `closed`: Boolean property on certain tokens (StringLiteral, NoSubstitutionTemplate, TemplateTail, RegularExpressionLiteral, MultiLineComment, JSXString) indicating whether they are properly terminated

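
A common use of the `type` field is to drop whitespace and comments before further processing. The following is a minimal sketch, assuming the type names listed above; `TRIVIA_TYPES` and `significantTokens` are hypothetical names, and the sample tokens are hand-written to match the `null /*replacer*/,` fragment from the Basic Usage output.

```javascript
// Token types that carry no syntactic meaning for most consumers
// (a hypothetical grouping chosen for this sketch).
const TRIVIA_TYPES = new Set([
  "WhiteSpace",
  "LineTerminatorSequence",
  "MultiLineComment",
  "SingleLineComment",
  "HashbangComment",
]);

// Lazily yields only the tokens whose type is not in TRIVIA_TYPES.
function* significantTokens(tokens) {
  for (const token of tokens) {
    if (!TRIVIA_TYPES.has(token.type)) {
      yield token;
    }
  }
}

// Hand-written tokens for the fragment `null /*replacer*/,`.
const sampleTokens = [
  { type: "IdentifierName", value: "null" },
  { type: "WhiteSpace", value: " " },
  { type: "MultiLineComment", value: "/*replacer*/", closed: true },
  { type: "Punctuator", value: "," },
];

const values = Array.from(significantTokens(sampleTokens), (t) => t.value);
console.log(values); // prints [ 'null', ',' ]
```

Note that the filtered stream is no longer position-preserving, so keep the unfiltered tokens around if the original text has to be rebuilt later.
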
### JSX Tokens

When JSX mode is enabled (`{ jsx: true }`), js-tokens additionally recognizes 5 JSX-specific token types:

```typescript { .api }
type JSXToken =
  | { type: "JSXString"; value: string; closed: boolean }
  | { type: "JSXText"; value: string }
  | { type: "JSXIdentifier"; value: string }
  | { type: "JSXPunctuator"; value: string }
  | { type: "JSXInvalid"; value: string };
```

**JSX Mode Behavior:**
- Returns mixed Token and JSXToken objects as appropriate
- JSX runs can also contain WhiteSpace, LineTerminatorSequence, MultiLineComment, and SingleLineComment tokens
- Switches between outputting runs of Token and runs of JSXToken based on context

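
Since all five JSX token types share the `JSX` prefix, a mixed stream can be split with a simple prefix check. This is a sketch under that assumption; `isJSXToken` and `partitionByMode` are hypothetical helpers, and the sample tokens for `<br />` are hand-written for illustration rather than captured from a real run.

```javascript
// True for the five JSX-specific types, which all start with "JSX".
const isJSXToken = (token) => token.type.startsWith("JSX");

// Hypothetical helper: splits a mixed token stream into JSX and standard tokens.
function partitionByMode(tokens) {
  const jsx = [];
  const standard = [];
  for (const token of tokens) {
    (isJSXToken(token) ? jsx : standard).push(token);
  }
  return { jsx, standard };
}

// Illustrative tokens for `<br />`; the WhiteSpace token shows that
// standard token types can appear inside a JSX run.
const sampleTokens = [
  { type: "JSXPunctuator", value: "<" },
  { type: "JSXIdentifier", value: "br" },
  { type: "WhiteSpace", value: " " },
  { type: "JSXPunctuator", value: "/" },
  { type: "JSXPunctuator", value: ">" },
];

const { jsx, standard } = partitionByMode(sampleTokens);
console.log(jsx.length, standard.length); // prints 4 1
```

A prefix check cannot tell whether a WhiteSpace or comment token sits inside a JSX run; tracking that would require remembering the type of the surrounding tokens.
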
### Error Handling

js-tokens is designed never to throw and to produce meaningful output even for broken input:

- **Invalid JavaScript**: Produces `Invalid` tokens for unrecognized characters
- **Incomplete tokens**: Sets `closed: false` to mark unterminated strings, templates, regular expressions, and multi-line comments
- **JSX errors**: Produces `JSXInvalid` tokens when JSX mode encounters invalid characters
- **Extreme inputs**: The one exception is input extreme enough to hit regex engine limits, which may fail; normal code is handled gracefully

**Example with incomplete tokens:**

```javascript
const tokens = Array.from(jsTokens('"unclosed string\n'));
// First token: { type: "StringLiteral", value: '"unclosed string', closed: false }
// (followed by a LineTerminatorSequence token for the newline)

const regexTokens = Array.from(jsTokens('/unclosed regex\n'));
// First token: { type: "RegularExpressionLiteral", value: '/unclosed regex', closed: false }
```

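
The error signals above (`Invalid`, `JSXInvalid`, and `closed: false`) can be collected into a simple report. The sketch below is one possible approach, not part of js-tokens: `findProblems` is a hypothetical helper that derives each offset by summing the lengths of the preceding token values, which relies on the position-preserving property. The sample tokens are hand-written for the input `x = "abc` followed by a newline.

```javascript
// Hypothetical helper: reports invalid and unterminated tokens with the
// offset (in UTF-16 code units) at which each one starts in the source.
function findProblems(tokens) {
  const problems = [];
  let offset = 0;
  for (const token of tokens) {
    const invalid = token.type === "Invalid" || token.type === "JSXInvalid";
    const unclosed = token.closed === false;
    if (invalid || unclosed) {
      problems.push({
        offset,
        type: token.type,
        reason: invalid ? "invalid" : "unclosed",
      });
    }
    offset += token.value.length;
  }
  return problems;
}

// Hand-written tokens shaped like the output for 'x = "abc\n'.
const sampleTokens = [
  { type: "IdentifierName", value: "x" },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "=" },
  { type: "WhiteSpace", value: " " },
  { type: "StringLiteral", value: '"abc', closed: false },
  { type: "LineTerminatorSequence", value: "\n" },
];

console.log(findProblems(sampleTokens));
// prints [ { offset: 4, type: 'StringLiteral', reason: 'unclosed' } ]
```

Tokens without a `closed` property are never flagged as unclosed, because the check compares against `false` explicitly rather than testing for falsiness.
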
## Types

### Options Configuration

```typescript { .api }
interface TokenizeOptions {
  /** Enable JSX support (default: false) */
  jsx?: boolean;
}
```

### Token Base Properties

All tokens include these base properties:

```typescript { .api }
interface BaseToken {
  /** Token type classification */
  type: string;
  /** Original text content of the token */
  value: string;
}
```

### Closed Property Tokens

Tokens that can be incomplete include a `closed` property:

```typescript { .api }
interface ClosedToken extends BaseToken {
  /** Whether the token is properly closed/terminated */
  closed: boolean;
}
```

**Tokens with `closed` property:**
- StringLiteral
- NoSubstitutionTemplate
- TemplateTail
- RegularExpressionLiteral
- MultiLineComment
- JSXString

**Token Examples:**

```javascript
// Closed string: { type: "StringLiteral", value: '"hello"', closed: true }
// Unclosed string: { type: "StringLiteral", value: '"hello', closed: false }
// Closed regex: { type: "RegularExpressionLiteral", value: '/abc/g', closed: true }
// Unclosed regex: { type: "RegularExpressionLiteral", value: '/abc', closed: false }
```