# Tokenization

Low-level parsing utilities for character-by-character CSS analysis, token extraction, and custom parsing workflows.

## Capabilities

### High-level Tokenization

#### Tokenize Function

Converts a CSS string into an array of tokens for analysis and custom processing.

```javascript { .api }
/**
 * Convert CSS string into array of tokens
 * @param value - CSS string to tokenize
 * @returns Array of string tokens
 */
function tokenize(value: string): string[];
```

**Usage Examples:**

```javascript
import { tokenize } from 'stylis';

// Basic tokenization: whitespace between tokens is emitted as its own ' ' token
const tokens = tokenize('h1 h2 h3 [h4 h5] fn(args) "a b c"');
console.log(tokens);
// ['h1', ' ', 'h2', ' ', 'h3', ' ', '[h4 h5]', ' ', 'fn', '(args)', ' ', '"a b c"']

// CSS property tokenization
const propTokens = tokenize('margin: 10px 20px;');
console.log(propTokens);
// ['margin', ':', ' ', '10px', ' ', '20px', ';']

// Complex selector tokenization
const selectorTokens = tokenize('.class:hover > .child[attr="value"]');
console.log(selectorTokens);
```
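
Token arrays may include whitespace-only entries, which many consumers want to ignore. The following is a minimal sketch that operates on a hand-written sample array rather than on a live `tokenize` call; `dropWhitespace` is a hypothetical helper, not part of stylis.

```javascript
// Hypothetical helper: remove whitespace-only tokens from a token array.
function dropWhitespace(tokens) {
  return tokens.filter((token) => token.trim() !== '');
}

// Sample data written by hand in the shape of a token array (an assumption).
const sample = ['margin', ':', ' ', '10px', ' ', '20px', ';'];

console.log(dropWhitespace(sample));
// ['margin', ':', '10px', '20px', ';']
```

Filtering after tokenization keeps the original array intact, so the whitespace is still available when it is significant (for example, descendant combinators in selectors).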

### Parser State Management

#### State Variables

Global variables that track the current parsing state during tokenization.

```javascript { .api }
let line: number;       // Current line number in parsing
let column: number;     // Current column number in parsing
let length: number;     // Length of current input string
let position: number;   // Current position in input string
let character: number;  // Current character code
let characters: string; // Current input string being parsed
```
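
To illustrate how `line` and `column` relate to `position`, here is a small self-contained sketch. It assumes 1-based line and column numbering and that a line feed (code 10) starts a new line; `locate` is a hypothetical helper written for this document, not the library's bookkeeping code.

```javascript
// Hypothetical helper: line and column reached after consuming `offset` characters.
// Assumes 1-based numbering and that only "\n" (code 10) starts a new line.
function locate(input, offset) {
  let line = 1;
  let column = 1;

  for (let i = 0; i < offset && i < input.length; i++) {
    if (input.charCodeAt(i) === 10) {
      line++;
      column = 1;
    } else {
      column++;
    }
  }

  return { line, column };
}

console.log(locate('a{\n  b:c;\n}', 5));
// { line: 2, column: 3 }
```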

#### Alloc Function

Initializes the tokenizer state with a new input string and resets the parsing position.

```javascript { .api }
/**
 * Initialize tokenizer state with input string
 * @param value - CSS string to prepare for parsing
 * @returns Empty array (parsing workspace)
 */
function alloc(value: string): any[];
```

#### Dealloc Function

Cleans up tokenizer state and returns the value passed to it.

```javascript { .api }
/**
 * Clean up tokenizer state and return value
 * @param value - Value to return after cleanup
 * @returns The passed value after state cleanup
 */
function dealloc(value: any): any;
```
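
Because the state is shared, every `alloc` call should be paired with a `dealloc`, including when the work in between throws. One possible way to enforce the pairing is sketched below. `withTokenizerState` is a hypothetical wrapper, and the `fakeAlloc` and `fakeDealloc` stand-ins only record calls so the example runs without stylis.

```javascript
// Hypothetical wrapper: run `work` between alloc() and dealloc(), even if it throws.
// `alloc` and `dealloc` are passed in, so the sketch does not depend on module state.
function withTokenizerState(alloc, dealloc, css, work) {
  const workspace = alloc(css);
  try {
    return work(workspace);
  } finally {
    dealloc(null); // always release the shared state
  }
}

// Stand-ins that only record calls (assumptions for illustration).
const calls = [];
const fakeAlloc = (value) => { calls.push(`alloc:${value}`); return []; };
const fakeDealloc = (value) => { calls.push('dealloc'); return value; };

const result = withTokenizerState(fakeAlloc, fakeDealloc, 'a{}', (workspace) => {
  workspace.push('a');
  return workspace;
});

console.log(result); // ['a']
console.log(calls);  // ['alloc:a{}', 'dealloc']
```

With stylis, the imported `alloc` and `dealloc` would be passed in place of the stand-ins.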

### Character Navigation

#### Character Reading Functions

Functions for moving through and examining characters in the input stream.

```javascript { .api }
/**
 * Get current character code without advancing position
 * @returns Current character code (0 if at end)
 */
function char(): number;

/**
 * Move to previous character and return its character code
 * @returns Previous character code
 */
function prev(): number;

/**
 * Move to next character and return its character code
 * @returns Next character code (0 if at end)
 */
function next(): number;

/**
 * Look ahead at the character that next() would read, without advancing position
 * @returns Character code of the upcoming character
 */
function peek(): number;

/**
 * Get current position in input string
 * @returns Current character position
 */
function caret(): number;
```
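
The relationship between these functions is easiest to see on a tiny model. The sketch below is a simplified stand-in written for this document, not the stylis source. It assumes that `next()` reads the character at the current position and then advances, and that `peek()` reads the character that has not been consumed yet. `createCursor` is a hypothetical name.

```javascript
// Simplified stand-in for the navigation functions (an assumption, not library code).
function createCursor(input) {
  let position = 0;
  let character = 0;
  const codeAt = (index) =>
    index >= 0 && index < input.length ? input.charCodeAt(index) : 0;

  return {
    // Code of the character most recently read; the call changes nothing.
    char: () => character,
    // Read the code at `position`, then advance past it (0 at end of input).
    next: () => {
      character = codeAt(position);
      if (position < input.length) position++;
      return character;
    },
    // Step back one position and read the code there.
    prev: () => {
      character = position > 0 ? codeAt(--position) : 0;
      return character;
    },
    // Code of the not-yet-consumed character; position is untouched.
    peek: () => codeAt(position),
    caret: () => position,
    slice: (begin, end) => input.slice(begin, end),
  };
}

const cursor = createCursor('a:b');
cursor.next();               // consumes 'a'
console.log(cursor.char());  // 97 ('a')
console.log(cursor.peek());  // 58 (':'), still unread
console.log(cursor.caret()); // 1, one past the consumed character
```

In this model `caret()` always points one past the character returned by `char()`, which is why index arithmetic in custom parsers often uses `caret() - 1`.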

#### String Extraction

```javascript { .api }
/**
 * Extract substring from current parsing context
 * @param begin - Start position
 * @param end - End position
 * @returns Extracted substring
 */
function slice(begin: number, end: number): string;
```

### Token Type Classification

#### Token Function

Classifies character codes into token types for parsing decisions.

```javascript { .api }
/**
 * Get token type for character code
 * @param type - Character code to classify
 * @returns Token type number (0-5)
 */
function token(type: number): number;
```

**Token Type Classifications:**
- **5**: Whitespace tokens (0, 9, 10, 13, 32) - `\0`, `\t`, `\n`, `\r`, space
- **4**: Isolate tokens (33, 43, 44, 47, 62, 64, 126, 59, 123, 125) - `!`, `+`, `,`, `/`, `>`, `@`, `~`, `;`, `{`, `}`
- **3**: Accompanied tokens (58) - `:`
- **2**: Opening delimit tokens (34, 39, 40, 91) - `"`, `'`, `(`, `[`
- **1**: Closing delimit tokens (41, 93) - `)`, `]`
- **0**: Default/identifier tokens
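
The classification list above can be expressed as a lookup table. This is a self-contained sketch built directly from that list; `classify` and `TOKEN_TYPES` are hypothetical names standing in for `token`, and the table-based approach is an illustration rather than the library's implementation.

```javascript
// Lookup table built from the classification list above.
const TOKEN_TYPES = new Map();
const register = (type, codes) => codes.forEach((code) => TOKEN_TYPES.set(code, type));

register(5, [0, 9, 10, 13, 32]);                          // whitespace
register(4, [33, 43, 44, 47, 62, 64, 126, 59, 123, 125]); // isolate
register(3, [58]);                                        // accompanied
register(2, [34, 39, 40, 91]);                            // opening delimiters
register(1, [41, 93]);                                    // closing delimiters

// Hypothetical stand-in for token(): any code not listed is type 0.
function classify(code) {
  return TOKEN_TYPES.get(code) ?? 0;
}

for (const ch of 'a {:') {
  console.log(JSON.stringify(ch), classify(ch.charCodeAt(0)));
}
// "a" 0
// " " 5
// "{" 4
// ":" 3
```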

### Specialized Parsing Functions

#### Delimiter Handling

```javascript { .api }
/**
 * Parse delimited content (quotes, brackets, parentheses)
 * @param type - Opening delimiter character code
 * @returns Delimited content as string
 */
function delimit(type: number): string;

/**
 * Find matching delimiter position
 * @param type - Character code of the closing delimiter to search for
 * @returns Position reached after scanning to the matching closing delimiter
 */
function delimiter(type: number): number;
```
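
The idea behind delimiter matching can be sketched without any library state. The example below is a simplified illustration, assuming that escaped characters are skipped, that nested delimiters are matched recursively, and that an unmatched opener runs to the end of input. `readDelimited` is a hypothetical helper, not the stylis algorithm.

```javascript
// Hypothetical helper: return the delimited run that starts at `start`,
// or everything up to the end of input if no closing delimiter is found.
function readDelimited(input, start) {
  const closers = { '(': ')', '[': ']', '"': '"', "'": "'" };
  const open = input[start];
  const close = closers[open];
  if (close === undefined) return '';

  const inString = open === '"' || open === "'";
  let index = start + 1;

  while (index < input.length) {
    const ch = input[index];
    if (ch === '\\') {
      index += 2; // skip the backslash and the escaped character
    } else if (ch === close) {
      return input.slice(start, index + 1);
    } else if (!inString && closers[ch] !== undefined) {
      // Nested delimiter: skip it as a unit so its contents cannot close ours.
      index += readDelimited(input, index).length;
    } else {
      index++;
    }
  }

  return input.slice(start); // unmatched: consume the rest of the input
}

console.log(readDelimited('[a "]" b] c', 0)); // [a "]" b]
console.log(readDelimited('(a (b) c) d', 0)); // (a (b) c)
console.log(readDelimited('"abc', 0));        // "abc
```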

#### Whitespace Processing

```javascript { .api }
/**
 * Handle whitespace characters during parsing
 * @param type - Previous character code, used for context
 * @returns Space character or empty string based on context
 */
function whitespace(type: number): string;
```

#### Escape Sequence Handling

```javascript { .api }
/**
 * Handle CSS escape sequences
 * @param index - Starting position of escape sequence
 * @param count - Maximum characters to process
 * @returns Processed escape sequence
 */
function escaping(index: number, count: number): string;
```
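
For context, a CSS escape is a backslash followed either by a single non-hexadecimal character or by one to six hexadecimal digits, optionally terminated by one whitespace character. The sketch below follows that rule from the CSS syntax specification; `readEscape` is a hypothetical helper and may differ in detail from what `escaping` does.

```javascript
// Hypothetical helper: return the escape sequence that starts at the backslash at `index`.
function readEscape(input, index) {
  const isHex = (ch) => /^[0-9a-fA-F]$/.test(ch);
  let end = index + 1;

  // A backslash followed by a non-hex character escapes exactly that character.
  if (end < input.length && !isHex(input[end])) {
    return input.slice(index, end + 1);
  }

  // Otherwise consume up to six hexadecimal digits.
  while (end < input.length && end - index <= 6 && isHex(input[end])) end++;

  // One whitespace character directly after the digits belongs to the escape.
  if (end < input.length && end - index > 1 && /\s/.test(input[end])) end++;

  return input.slice(index, end);
}

console.log(JSON.stringify(readEscape('\\26 B', 0)));    // "\\26 " (digits plus the trailing space)
console.log(JSON.stringify(readEscape('a\\:b', 1)));     // "\\:"
console.log(JSON.stringify(readEscape('\\000026B', 0))); // "\\000026" (stops after six digits)
```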

#### Comment Processing

```javascript { .api }
/**
 * Parse CSS comments (block comments and line comments)
 * @param type - Comment type indicator
 * @param index - Starting position
 * @returns Complete comment string with delimiters
 */
function commenter(type: number, index: number): string;
```

#### Identifier Extraction

```javascript { .api }
/**
 * Parse CSS identifiers (class names, property names, etc.)
 * @param index - Starting position of identifier
 * @returns Identifier string
 */
function identifier(index: number): string;
```

## AST Node Management

### Node Creation

```javascript { .api }
/**
 * Create AST node object with metadata
 * @param value - Node value/content
 * @param root - Root node reference
 * @param parent - Parent node reference
 * @param type - Node type string
 * @param props - Node properties
 * @param children - Child nodes
 * @param length - Character length
 * @param siblings - Sibling nodes array
 * @returns AST node object
 */
function node(
  value: string,
  root: object | null,
  parent: object | null,
  type: string,
  props: string[] | string,
  children: object[] | string,
  length: number,
  siblings: object[]
): object;
```
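
To make the parameter list concrete, the sketch below builds two linked nodes by hand. `makeNode` is a hypothetical factory that simply mirrors the documented parameters. The real `node` function may attach additional metadata, and the type strings used here are illustrative.

```javascript
// Hypothetical factory mirroring the documented parameter list (not the stylis source).
function makeNode(value, root, parent, type, props, children, length, siblings) {
  return { value, root, parent, type, props, children, length, siblings };
}

// A rule node with one declaration child; the type strings are illustrative.
const topLevel = [];
const rule = makeNode('.box', null, null, 'rule', ['.box'], [], 0, topLevel);
topLevel.push(rule);

const declaration = makeNode('color:red;', rule, rule, 'decl', 'color', 'red', 10, rule.children);
rule.children.push(declaration);

console.log(declaration.parent.value); // '.box'
console.log(rule.children.length);     // 1
console.log(declaration.length);       // 10, the length of 'color:red;'
```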

### Node Manipulation

```javascript { .api }
/**
 * Copy AST node with modifications
 * @param root - Source node to copy
 * @param props - Properties to override
 * @returns New AST node with modifications
 */
function copy(root: object, props: object): object;

/**
 * Lift node to root level in AST hierarchy
 * @param root - Node to lift
 * @returns void (modifies node structure)
 */
function lift(root: object): void;
```
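
Copying with overrides can be pictured as a shallow clone followed by property assignment. The sketch below assumes exactly that and nothing more; `copyNode` is a hypothetical helper, and the real `copy` may adjust further fields.

```javascript
// Hypothetical helper: shallow-clone a node and apply overrides on top.
function copyNode(source, overrides) {
  return Object.assign({}, source, overrides);
}

// Hand-written node object used only for illustration.
const original = { value: 'color:red;', type: 'decl', props: 'color', children: 'red' };
const changed = copyNode(original, { value: 'color:blue;', children: 'blue' });

console.log(changed.value);  // 'color:blue;'
console.log(changed.props);  // 'color' (carried over from the source)
console.log(original.value); // 'color:red;' (the source is not modified)
```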

## Custom Tokenization Examples

### Token Analysis

```javascript
import { alloc, next, char, token, caret, dealloc } from 'stylis';

// Analyze token types in CSS
function analyzeTokens(css) {
  alloc(css);
  const analysis = [];

  while (next()) {
    const charCode = char();
    const tokenType = token(charCode);
    const charStr = String.fromCharCode(charCode);

    analysis.push({
      char: charStr,
      code: charCode,
      type: tokenType,
      position: caret()
    });
  }

  return dealloc(analysis);
}
```

### Custom Parser

```javascript
import { alloc, next, peek, char, slice, caret, dealloc } from 'stylis';

// Simple custom property parser
function parseCustomProperties(css) {
  alloc(css);
  const properties = [];

  while (next()) {
    if (char() === 45 && peek() === 45) { // --
      const start = caret() - 1;

      // Find end of property name
      while (next() && char() !== 58) {} // Find :
      const nameEnd = caret() - 1;

      // Find end of property value
      while (next() && char() !== 59) {} // Find ;
      // Exclude the semicolon if one was found; otherwise keep everything up to the end of input
      const valueEnd = char() === 59 ? caret() - 1 : caret();

      properties.push({
        name: slice(start, nameEnd),
        value: slice(nameEnd + 1, valueEnd).trim()
      });
    }
  }

  return dealloc(properties);
}
```

### Character-by-Character Processing

```javascript
import { alloc, next, char, dealloc } from 'stylis';

// Count specific characters in CSS
function countCharacters(css, targetChar) {
  alloc(css);
  let count = 0;
  const targetCode = targetChar.charCodeAt(0);

  while (next()) {
    if (char() === targetCode) {
      count++;
    }
  }

  return dealloc(count);
}

// Usage
const braceCount = countCharacters('.class { color: red; }', '{'); // 1
const semicolonCount = countCharacters('a: 1; b: 2; c: 3;', ';'); // 3
```

## Error Handling

Tokenization functions are designed to handle malformed CSS gracefully:

- **Invalid Characters**: Skipped or treated as identifiers
- **Unmatched Delimiters**: Parsing continues to end of input
- **Escape Sequences**: Invalid escapes are preserved as-is
- **End of Input**: Functions return appropriate default values (0 for characters, empty strings for content)

The tokenizer maintains internal state consistency even when processing malformed input, allowing higher-level parsers to make recovery decisions.
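
Since unmatched delimiters are tolerated rather than reported, a higher-level parser that wants to reject such input has to check for it itself. One possible pre-check is sketched below. `hasBalancedDelimiters` is a hypothetical helper that ignores comments for brevity and is not part of stylis.

```javascript
// Hypothetical pre-check: report whether (), [], {} and quotes are balanced.
// Comments are not handled, so a delimiter inside a comment still counts.
function hasBalancedDelimiters(css) {
  const closers = { '(': ')', '[': ']', '{': '}' };
  const stack = [];
  let quote = null;

  for (let i = 0; i < css.length; i++) {
    const ch = css[i];
    if (ch === '\\') {
      i++; // skip the escaped character
    } else if (quote !== null) {
      if (ch === quote) quote = null; // closing quote ends the string
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (closers[ch] !== undefined) {
      stack.push(closers[ch]);
    } else if (ch === ')' || ch === ']' || ch === '}') {
      if (stack.pop() !== ch) return false;
    }
  }

  return quote === null && stack.length === 0;
}

console.log(hasBalancedDelimiters('a { b: url(c) }')); // true
console.log(hasBalancedDelimiters('a { b: url(c }'));  // false
console.log(hasBalancedDelimiters('a { b: "}" }'));    // true
```

Running a check like this before tokenizing lets the caller decide whether to reject the input, attempt a repair, or accept the tokenizer's run-to-end behavior.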