tessl/npm-js-tokens

Tiny JavaScript tokenizer that never fails and is almost spec-compliant

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
npmpkg:npm/js-tokens@9.0.x

To install, run

npx @tessl/cli install tessl/npm-js-tokens@9.0.0

# js-tokens

js-tokens is a tiny, regex-powered, lenient JavaScript tokenizer that never fails and is almost spec-compliant. It exports a single generator function that turns a string of JavaScript code into a sequence of token objects, which makes it well suited to syntax highlighting, code formatting, linting, and other tools that must tokenize arbitrary, possibly invalid, JavaScript.

## Package Information

- **Package Name**: js-tokens
- **Package Type**: npm
- **Language**: JavaScript (TypeScript definitions included)
- **Installation**: `npm install js-tokens`

## Core Imports

```javascript
const jsTokens = require("js-tokens");
```

For ES modules:

```javascript
import jsTokens from "js-tokens";
```

## Basic Usage

```javascript
const jsTokens = require("js-tokens");

// Basic tokenization
const code = 'JSON.stringify({k:3.14**2}, null /*replacer*/, "\\t")';
const tokens = Array.from(jsTokens(code));

// Extract token values
const tokenValues = tokens.map(token => token.value);
console.log(tokenValues.join("|"));
// Output: JSON|.|stringify|(|{|k|:|3.14|**|2|}|,| |null| |/*replacer*/|,| |"\t"|)

// Loop over tokens
for (const token of jsTokens("hello, !world")) {
  console.log(`${token.type}: ${token.value}`);
}

// JSX tokenization
const jsxCode = '<div>Hello {"world"}!</div>';
const jsxTokens = Array.from(jsTokens(jsxCode, { jsx: true }));
```

## Architecture

js-tokens is built around a single core function with the following key characteristics:

- **Never fails**: Always returns tokens, even for invalid JavaScript, and does not throw on malformed input
- **Lenient parsing**: Handles incomplete or malformed code gracefully
- **Context-aware**: Distinguishes regex literals from division operators based on the preceding tokens
- **Regex-powered**: Uses optimized regular expressions for fast tokenization
- **Position-preserving**: Tokens carry no explicit position fields, but concatenating all token values reconstructs the original input exactly
- **ECMAScript compliant**: Nearly fully ECMAScript 2024 compliant, with minimal shortcuts

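Because token values concatenate back to the input, offsets can be recovered by accumulating value lengths. The following is a minimal sketch, not part of js-tokens: `withOffsets` is a hypothetical helper, and the token array is hand-written in the documented `{ type, value }` shape so the snippet runs without the package installed.

```javascript
// Hypothetical helper (not part of js-tokens): annotates each token with the
// start/end offsets implied by concatenating token values in order.
function* withOffsets(tokens) {
  let offset = 0;
  for (const token of tokens) {
    const start = offset;
    offset += token.value.length;
    yield { ...token, start, end: offset };
  }
}

// Hand-written sample in the documented token shape for the input "a = 1".
const offsetSample = [
  { type: "IdentifierName", value: "a" },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "=" },
  { type: "WhiteSpace", value: " " },
  { type: "NumericLiteral", value: "1" },
];

const positioned = Array.from(withOffsets(offsetSample));
const rebuilt = positioned.map((token) => token.value).join("");

console.log(rebuilt); // a = 1
console.log(positioned[2].start, positioned[2].end); // 2 3
```

With the package installed, `withOffsets(jsTokens(code))` should behave the same way, since the helper relies only on each token's `value`.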

## Capabilities

### JavaScript Tokenization

Core tokenization function that converts JavaScript code strings into detailed token objects with comprehensive type information.

```javascript { .api }
/**
 * Tokenizes JavaScript code into an iterable of token objects
 * @param input - JavaScript code string to tokenize
 * @param options - Optional configuration object
 * @returns Iterable of Token objects for regular JavaScript
 */
function jsTokens(input: string, options?: { jsx?: boolean }): Iterable<Token>;

/**
 * Tokenizes JavaScript code with JSX support
 * @param input - JavaScript/JSX code string to tokenize
 * @param options - Configuration object with jsx: true
 * @returns Iterable of Token and JSXToken objects
 */
function jsTokens(
  input: string,
  options: { jsx: true }
): Iterable<Token | JSXToken>;
```

### Standard JavaScript Tokens

js-tokens recognizes 16 different token types for standard JavaScript code:

```typescript { .api }
type Token =
  | { type: "StringLiteral"; value: string; closed: boolean }
  | { type: "NoSubstitutionTemplate"; value: string; closed: boolean }
  | { type: "TemplateHead"; value: string }
  | { type: "TemplateMiddle"; value: string }
  | { type: "TemplateTail"; value: string; closed: boolean }
  | { type: "RegularExpressionLiteral"; value: string; closed: boolean }
  | { type: "MultiLineComment"; value: string; closed: boolean }
  | { type: "SingleLineComment"; value: string }
  | { type: "HashbangComment"; value: string }
  | { type: "IdentifierName"; value: string }
  | { type: "PrivateIdentifier"; value: string }
  | { type: "NumericLiteral"; value: string }
  | { type: "Punctuator"; value: string }
  | { type: "WhiteSpace"; value: string }
  | { type: "LineTerminatorSequence"; value: string }
  | { type: "Invalid"; value: string };
```

**Key Token Properties:**

- `type`: Token classification (one of the 16 standard types listed above)
- `value`: The exact source text of the token
- `closed`: Boolean present only on tokens that can be left unterminated (StringLiteral, NoSubstitutionTemplate, TemplateTail, RegularExpressionLiteral, MultiLineComment, and, in JSX mode, JSXString); `false` means the token was not properly terminated

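The `type` field is what most consumers branch on. As a rough illustration, the sketch below drops whitespace and comment tokens to leave only the significant ones. `TRIVIA_TYPES` and `significantTokens` are hypothetical names introduced here, and the input is a hand-written token array in the documented shape rather than real tokenizer output.

```javascript
// Type names taken from the Token union above; grouping them as "trivia"
// is this sketch's own choice, not a js-tokens concept.
const TRIVIA_TYPES = new Set([
  "WhiteSpace",
  "LineTerminatorSequence",
  "SingleLineComment",
  "MultiLineComment",
  "HashbangComment",
]);

// Hypothetical helper: keeps only tokens that are not whitespace or comments.
function significantTokens(tokens) {
  return Array.from(tokens).filter((token) => !TRIVIA_TYPES.has(token.type));
}

// Hand-written sample in the documented token shape for "x /* note */ + 1".
const triviaSample = [
  { type: "IdentifierName", value: "x" },
  { type: "WhiteSpace", value: " " },
  { type: "MultiLineComment", value: "/* note */", closed: true },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "+" },
  { type: "WhiteSpace", value: " " },
  { type: "NumericLiteral", value: "1" },
];

const keptValues = significantTokens(triviaSample).map((token) => token.value);
console.log(keptValues.join(" ")); // x + 1
```

Line terminators can matter for automatic semicolon insertion, so a real consumer may prefer to keep `LineTerminatorSequence` tokens.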

### JSX Tokens

When JSX mode is enabled (`{ jsx: true }`), js-tokens additionally recognizes 5 JSX-specific token types:

```typescript { .api }
type JSXToken =
  | { type: "JSXString"; value: string; closed: boolean }
  | { type: "JSXText"; value: string }
  | { type: "JSXIdentifier"; value: string }
  | { type: "JSXPunctuator"; value: string }
  | { type: "JSXInvalid"; value: string };
```

**JSX Mode Behavior:**
- Returns mixed Token and JSXToken objects as appropriate
- JSX runs can also contain WhiteSpace, LineTerminatorSequence, MultiLineComment, and SingleLineComment tokens
- Switches between outputting runs of Token and runs of JSXToken based on context

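Since JSX mode yields a mixed stream, a consumer usually needs to tell the two families apart by `type`. The sketch below is one way to do that. `JSX_TYPES` and `summarizeJSX` are hypothetical names, and the token array is hand-written to approximate what JSX mode might yield for `x = <b>hi</b>`; it is illustrative and not captured tokenizer output.

```javascript
// The five JSX-specific type names listed in the JSXToken union above.
const JSX_TYPES = new Set([
  "JSXString",
  "JSXText",
  "JSXIdentifier",
  "JSXPunctuator",
  "JSXInvalid",
]);

// Hypothetical helper: counts JSX vs. other tokens and collects
// JSX identifier names and JSX text content.
function summarizeJSX(tokens) {
  const summary = { identifiers: [], text: [], jsxCount: 0, otherCount: 0 };
  for (const token of tokens) {
    if (!JSX_TYPES.has(token.type)) {
      summary.otherCount += 1;
      continue;
    }
    summary.jsxCount += 1;
    if (token.type === "JSXIdentifier") summary.identifiers.push(token.value);
    if (token.type === "JSXText") summary.text.push(token.value);
  }
  return summary;
}

// Hand-written, illustrative sample (an assumption, not real output).
const jsxSample = [
  { type: "IdentifierName", value: "x" },
  { type: "WhiteSpace", value: " " },
  { type: "Punctuator", value: "=" },
  { type: "WhiteSpace", value: " " },
  { type: "JSXPunctuator", value: "<" },
  { type: "JSXIdentifier", value: "b" },
  { type: "JSXPunctuator", value: ">" },
  { type: "JSXText", value: "hi" },
  { type: "JSXPunctuator", value: "<" },
  { type: "JSXPunctuator", value: "/" },
  { type: "JSXIdentifier", value: "b" },
  { type: "JSXPunctuator", value: ">" },
];

const jsxSummary = summarizeJSX(jsxSample);
console.log(jsxSummary.identifiers); // [ 'b', 'b' ]
console.log(jsxSummary.jsxCount, jsxSummary.otherCount); // 8 4
```

Checking membership in an explicit set, instead of testing for a `JSX` prefix, keeps the helper tied to the five documented type names.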

### Error Handling

js-tokens does not throw on malformed input and always produces meaningful output:

- **Invalid JavaScript**: Produces "Invalid" tokens for unrecognized characters
- **Incomplete tokens**: Sets `closed: false` on unterminated strings, templates, regex literals, and multi-line comments
- **JSX errors**: Produces "JSXInvalid" tokens when JSX mode encounters invalid characters
- **Extreme inputs**: The underlying regex engine may hit its limits on extreme inputs; normal code is handled gracefully

**Example with incomplete tokens:**

```javascript
const jsTokens = require("js-tokens");

const tokens = Array.from(jsTokens('"unclosed string\n'));
// First token: { type: "StringLiteral", value: '"unclosed string', closed: false }

const regexTokens = Array.from(jsTokens('/unclosed regex\n'));
// First token: { type: "RegularExpressionLiteral", value: '/unclosed regex', closed: false }
```

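Because errors are reported through token data instead of exceptions, a caller that wants diagnostics has to scan for them. The sketch below shows one possible approach. `findProblems` is a hypothetical helper, and the sample array is hand-written from the unclosed-string example above with an assumed trailing line terminator token.

```javascript
// Hypothetical helper: lists tokens that signal malformed input, i.e.
// tokens with closed === false and tokens typed "Invalid" or "JSXInvalid".
// Offsets are derived by accumulating the lengths of token values.
function findProblems(tokens) {
  const problems = [];
  let offset = 0;
  for (const token of tokens) {
    if (token.closed === false) {
      problems.push({ offset, reason: `unclosed ${token.type}`, value: token.value });
    } else if (token.type === "Invalid" || token.type === "JSXInvalid") {
      problems.push({ offset, reason: "unrecognized input", value: token.value });
    }
    offset += token.value.length;
  }
  return problems;
}

// Hand-written sample in the documented token shape (an assumption,
// not captured output) for the input '"unclosed string\n'.
const brokenSample = [
  { type: "StringLiteral", value: '"unclosed string', closed: false },
  { type: "LineTerminatorSequence", value: "\n" },
];

const problems = findProblems(brokenSample);
console.log(problems.length); // 1
console.log(problems[0].reason); // unclosed StringLiteral
```

Comparing against `false` explicitly matters here: most token types have no `closed` property at all, and `undefined` should not be treated as an error.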

## Types

### Options Configuration

```typescript { .api }
interface TokenizeOptions {
  /** Enable JSX support (default: false) */
  jsx?: boolean;
}
```

### Token Base Properties

All tokens include these base properties:

```typescript { .api }
interface BaseToken {
  /** Token type classification */
  type: string;
  /** Original text content of the token */
  value: string;
}
```

### Closed Property Tokens

Tokens that can be incomplete include a `closed` property:

```typescript { .api }
interface ClosedToken extends BaseToken {
  /** Whether the token is properly closed/terminated */
  closed: boolean;
}
```

**Tokens with `closed` property:**
- StringLiteral
- NoSubstitutionTemplate
- TemplateTail
- RegularExpressionLiteral
- MultiLineComment
- JSXString

**Token Examples:**

```javascript
// Closed string: { type: "StringLiteral", value: '"hello"', closed: true }
// Unclosed string: { type: "StringLiteral", value: '"hello', closed: false }
// Closed regex: { type: "RegularExpressionLiteral", value: '/abc/g', closed: true }
// Unclosed regex: { type: "RegularExpressionLiteral", value: '/abc', closed: false }
```