
# Tokenization

Low-level parsing utilities for character-by-character CSS analysis, token extraction, and custom parsing workflows.

## Capabilities

### High-level Tokenization

#### Tokenize Function

Splits a CSS string into an array of string tokens for analysis and custom processing. Each of the following becomes its own token: an identifier, a single punctuation character, or a single whitespace character. Quoted strings and bracketed or parenthesized groups are kept whole.

```javascript { .api }
/**
 * Convert CSS string into array of tokens
 * @param value - CSS string to tokenize
 * @returns Array of string tokens
 */
function tokenize(value: string): string[];
```

**Usage Examples:**

```javascript
import { tokenize } from 'stylis';

// Basic tokenization: whitespace characters are kept as their own tokens,
// while bracketed, parenthesized, and quoted groups stay whole
const tokens = tokenize('h1 h2 h3 [h4 h5] fn(args) "a b c"');
console.log(tokens);
// ['h1', ' ', 'h2', ' ', 'h3', ' ', '[h4 h5]', ' ', 'fn', '(args)', ' ', '"a b c"']

// CSS property tokenization
const propTokens = tokenize('margin: 10px 20px;');
console.log(propTokens);
// ['margin', ':', ' ', '10px', ' ', '20px', ';']

// Complex selector tokenization
const selectorTokens = tokenize('.class:hover > .child[attr="value"]');
console.log(selectorTokens);
// ['.class', ':', 'hover', ' ', '>', ' ', '.child', '[attr="value"]']
```

### Parser State Management

#### State Variables

These module-level variables track the current parsing state. They are exported as live bindings and shared by every tokenizer function, so only one scan can be in progress at a time. Calling `alloc()`, either directly or through `tokenize()`, replaces the state of any scan already under way.

```javascript { .api }
let line: number;       // Current line number (starts at 1)
let column: number;     // Current column number
let length: number;     // Length of the current input string
let position: number;   // Index of the next unread character
let character: number;  // Code of the most recently read character
let characters: string; // Current input string being parsed
```

#### Alloc Function

Initializes the tokenizer state with a new input string. It resets the position to 0 and the line and column counters to 1, then returns a fresh empty array that callers can use as a workspace.

```javascript { .api }
/**
 * Initialize tokenizer state with input string
 * @param value - CSS string to prepare for parsing
 * @returns Empty array (parsing workspace)
 */
function alloc(value: string): any[];
```

#### Dealloc Function

Releases the input string held in the tokenizer state and returns its argument unchanged. This lets a scan end with `return dealloc(result)`.

```javascript { .api }
/**
 * Clean up tokenizer state and return value
 * @param value - Value to return after cleanup
 * @returns The passed value after state cleanup
 */
function dealloc(value: any): any;
```

### Character Navigation

#### Character Reading Functions

Functions for moving through and examining characters in the input stream.

```javascript { .api }
/**
 * Get the code of the character most recently read by next() or prev()
 * @returns Current character code (0 once the end of input is reached)
 */
function char(): number;

/**
 * Step back one position and return the character code found there; right
 * after next(), this is the character next() just returned, which will be read again
 * @returns Character code at the new position (0 if already at the start)
 */
function prev(): number;

/**
 * Read the character at the current position and advance past it
 * @returns Character code read (0 if at end)
 */
function next(): number;

/**
 * Look at the next unread character without advancing position
 * @returns Lookahead character code (0 if at end)
 */
function peek(): number;

/**
 * Get current position in input string
 * @returns Index of the character that next() will read
 */
function caret(): number;
```

#### String Extraction

Returns part of the current input string, from `begin` (inclusive) to `end` (exclusive). Positions are indexes as reported by `caret()`.

```javascript { .api }
/**
 * Extract substring from current parsing context
 * @param begin - Start position (inclusive)
 * @param end - End position (exclusive)
 * @returns Extracted substring
 */
function slice(begin: number, end: number): string;
```

### Token Type Classification

#### Token Function

Classifies character codes into token types for parsing decisions.

```javascript { .api }
/**
 * Get token type for character code
 * @param type - Character code to classify
 * @returns Token type number (0-5)
 */
function token(type: number): number;
```

**Token Type Classifications:**

- **5**: Whitespace tokens (0, 9, 10, 13, 32) - `\0`, `\t`, `\n`, `\r`, space
- **4**: Isolate tokens (33, 43, 44, 47, 62, 64, 126, 59, 123, 125) - `!`, `+`, `,`, `/`, `>`, `@`, `~`, `;`, `{`, `}`
- **3**: Accompanied tokens (58) - `:`
- **2**: Opening delimit tokens (34, 39, 40, 91) - `"`, `'`, `(`, `[`
- **1**: Closing delimit tokens (41, 93) - `)`, `]`
- **0**: Default/identifier tokens

### Specialized Parsing Functions

#### Delimiter Handling

```javascript { .api }
/**
 * Parse delimited content (quotes, brackets, parentheses); call it right
 * after next() has returned the opening delimiter
 * @param type - Opening delimiter character code
 * @returns The whole group, delimiters included, as a trimmed string
 */
function delimit(type: number): string;

/**
 * Advance to the matching delimiter, skipping backslash escapes, quoted
 * strings (when not already inside one), and nested parentheses (when looking for `)`)
 * @param type - Character code to stop at: the closing delimiter (41 for `)`, 93 for `]`) or the quote character
 * @returns Position just past the match, or the end of input if none is found
 */
function delimiter(type: number): number;
```

#### Whitespace Processing

```javascript { .api }
/**
 * Skip a run of whitespace characters and decide whether it is significant
 * @param type - Code of the character preceding the whitespace run
 * @returns ' ' when the run separates tokens that need a space, otherwise ''
 */
function whitespace(type: number): string;
```

#### Escape Sequence Handling

```javascript { .api }
/**
 * Handle CSS escape sequences
 * @param index - Starting position of escape sequence
 * @param count - Maximum characters to process
 * @returns Processed escape sequence
 */
function escaping(index: number, count: number): string;
```

#### Comment Processing

```javascript { .api }
/**
 * Parse a CSS comment (block or line comment)
 * @param type - Code of the second opener character: 42 (`*`) for block, 47 (`/`) for line comments
 * @param index - Position just after the two-character opener
 * @returns Comment string including delimiters (line comments come back in block-comment form)
 */
function commenter(type: number, index: number): string;
```

#### Identifier Extraction

```javascript { .api }
/**
 * Parse CSS identifiers (class names, property names, etc.)
 * @param index - Starting position of identifier
 * @returns Identifier string
 */
function identifier(index: number): string;
```

## AST Node Management

### Node Creation

```javascript { .api }
/**
 * Create AST node object with metadata
 * @param value - Node value/content
 * @param root - Root node reference
 * @param parent - Parent node reference
 * @param type - Node type string
 * @param props - Node properties
 * @param children - Child nodes
 * @param length - Character length
 * @param siblings - Sibling nodes array
 * @returns AST node object (also records the current `line` and `column`)
 */
function node(
  value: string,
  root: object | null,
  parent: object | null,
  type: string,
  props: string[] | string,
  children: object[] | string,
  length: number,
  siblings: object[]
): object;
```

### Node Manipulation

```javascript { .api }
/**
 * Copy AST node with modifications
 * @param root - Source node to copy
 * @param props - Properties to override
 * @returns New AST node with modifications
 */
function copy(root: object, props: object): object;

/**
 * Lift a nested node to the top level: rebuild its chain of ancestors
 * (using copy) with the node as the only child at each level, then append
 * the outermost copy to the top-level ancestor's siblings
 * @param root - Node to lift
 * @returns void (modifies the siblings array in place)
 */
function lift(root: object): void;
```

## Custom Tokenization Examples

### Token Analysis

```javascript
import { alloc, next, char, token, caret, dealloc } from 'stylis';

// Analyze token types in CSS: one entry per character
function analyzeTokens(css) {
  alloc(css);
  const analysis = [];

  while (next()) {
    const charCode = char();
    const tokenType = token(charCode);
    const charStr = String.fromCharCode(charCode);

    analysis.push({
      char: charStr,
      code: charCode,
      type: tokenType,
      position: caret() - 1 // index of the character just read
    });
  }

  return dealloc(analysis);
}
```

### Custom Parser

```javascript
import { alloc, next, peek, char, slice, caret, dealloc } from 'stylis';

// Simple custom property parser: collects `--name: value` pairs.
// Deliberately naive: it does not skip strings, comments, or var(--x) references.
function parseCustomProperties(css) {
  alloc(css);
  const properties = [];

  while (next()) {
    if (char() === 45 && peek() === 45) { // "--"
      const start = caret() - 1; // index of the first "-"

      // Find end of property name
      while (next() && char() !== 58) {} // find ":"
      if (char() !== 58) break;          // no ":" before end of input
      const nameEnd = caret() - 1;       // index of ":"

      // Find end of property value: ";" or end of input
      while (next() && char() !== 59) {} // find ";"
      const valueEnd = char() === 59 ? caret() - 1 : caret();

      properties.push({
        name: slice(start, nameEnd).trim(),
        value: slice(nameEnd + 1, valueEnd).trim()
      });
    }
  }

  return dealloc(properties);
}
```

### Character-by-Character Processing

```javascript
import { alloc, next, char, dealloc } from 'stylis';

// Count specific characters in CSS
function countCharacters(css, targetChar) {
  alloc(css);
  let count = 0;
  const targetCode = targetChar.charCodeAt(0);

  while (next()) {
    if (char() === targetCode) {
      count++;
    }
  }

  return dealloc(count);
}

// Usage
const braceCount = countCharacters('.class { color: red; }', '{'); // 1
const semicolonCount = countCharacters('a: 1; b: 2; c: 3;', ';'); // 3
```

## Error Handling

Tokenization functions do not throw on malformed CSS; they degrade gracefully:

- **Unrecognized characters**: Any character not listed in the classification table has type 0 and is read as part of an identifier
- **Unmatched delimiters**: `delimiter()`, and therefore `delimit()` and `tokenize()`, keeps reading to the end of input and returns what it found
- **Escape sequences**: Invalid escapes are preserved as-is
- **End of input**: `next()`, `peek()`, and `char()` return 0, and `prev()` returns 0 at the start of input; string-returning functions return whatever text was read, which may be empty

The tokenizer only tracks positions and never rejects input, so higher-level parsers remain free to decide how to recover from malformed CSS.