# Character Sets

Predefined character set utilities that generate common regex character class tokens. Each function returns a structured `Set` token for a standard class: word characters, digits, whitespace, or the dot metacharacter.

## Capabilities

### Word Characters

Creates character sets for word characters (ASCII letters, digits, and underscore).

```typescript { .api }
/**
 * Creates a character set token for word characters (\w equivalent)
 * Includes: a-z, A-Z, 0-9, and underscore (_)
 * @returns Set token representing [a-zA-Z0-9_]
 */
function words(): Set;

/**
 * Creates a negated character set token for non-word characters (\W equivalent)
 * Matches any character except: a-z, A-Z, 0-9, and underscore (_)
 * @returns Set token representing [^a-zA-Z0-9_]
 */
function notWords(): Set;
```

**Usage Examples:**

```typescript
import { words, notWords, reconstruct } from "ret";

// Generate the word character set
const wordSet = words();
// Result: { type: types.SET, set: [...], not: false }

// Generate the non-word character set
const nonWordSet = notWords();
// Result: { type: types.SET, set: [...], not: true }

// Reconstruct the tokens into regex source strings
reconstruct(wordSet); // "\\w"
reconstruct(nonWordSet); // "\\W"
```
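
The membership rule behind these two tokens can be checked directly. The sketch below does not import `ret`. It assumes the token shapes described under Character Set Structure and defines a local `types` enum (its values are not ret's), a hypothetical `localWords` mirror of the `[a-zA-Z0-9_]` set, and hypothetical `setHas` and `mismatches` helpers. It then compares the mirror with JavaScript's native `\w` and `\W` for every UTF-16 code unit.

```typescript
// Local stand-ins for the documented token shapes.
// Assumption: these mirror ret's tokens; the enum values here are not ret's.
enum types { SET, RANGE, CHAR }

interface CharToken { type: types.CHAR; value: number }
interface RangeToken { type: types.RANGE; from: number; to: number }
interface SetToken { type: types.SET; set: (CharToken | RangeToken)[]; not: boolean }

// Hypothetical mirror of words() / notWords(): [a-zA-Z0-9_] or its negation.
function localWords(not: boolean): SetToken {
  return {
    type: types.SET,
    set: [
      { type: types.CHAR, value: 95 },          // _
      { type: types.RANGE, from: 97, to: 122 }, // a-z
      { type: types.RANGE, from: 65, to: 90 },  // A-Z
      { type: types.RANGE, from: 48, to: 57 },  // 0-9
    ],
    not,
  };
}

// True if a single UTF-16 code unit is matched by a flat set token.
function setHas(token: SetToken, code: number): boolean {
  const inside = token.set.some((t) =>
    t.type === types.CHAR ? t.value === code : code >= t.from && code <= t.to
  );
  return token.not ? !inside : inside; // `not` inverts the final result
}

// Count code units (0..0xFFFF) where the token disagrees with a native regex.
function mismatches(token: SetToken, re: RegExp): number {
  let count = 0;
  for (let code = 0; code <= 0xffff; code++) {
    if (setHas(token, code) !== re.test(String.fromCharCode(code))) count++;
  }
  return count;
}

console.log(mismatches(localWords(false), /\w/)); // 0
console.log(mismatches(localWords(true), /\W/));  // 0
```

In the sketch, only the `not` flag separates the two tokens: the items are checked first and the inversion is applied afterwards.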

### Digit Characters

Creates character sets for the ASCII digits 0-9.

```typescript { .api }
/**
 * Creates a character set token for digit characters (\d equivalent)
 * Includes: 0-9
 * @returns Set token representing [0-9]
 */
function ints(): Set;

/**
 * Creates a negated character set token for non-digit characters (\D equivalent)
 * Matches any character except: 0-9
 * @returns Set token representing [^0-9]
 */
function notInts(): Set;
```

**Usage Examples:**

```typescript
import { ints, notInts, reconstruct } from "ret";

// Generate the digit character set
const digitSet = ints();
// Result: { type: types.SET, set: [{ type: types.RANGE, from: 48, to: 57 }], not: false }

// Generate the non-digit character set
const nonDigitSet = notInts();
// Result: { type: types.SET, set: [{ type: types.RANGE, from: 48, to: 57 }], not: true }

// Reconstruct the tokens into regex source strings
reconstruct(digitSet); // "\\d"
reconstruct(nonDigitSet); // "\\D"
```
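
Because `ints()` holds a single `Range` from code 48 (`0`) to 57 (`9`), it maps cleanly onto bracket notation. The hypothetical `toBracket` helper below is not part of `ret` (the results above show `reconstruct()` printing the `\d` shorthand instead); it renders a flat set token as a `[...]` class, using local stand-in types and a hypothetical `localInts` mirror.

```typescript
// Local stand-ins for the documented token shapes (assumption: not ret's enum values).
enum types { SET, RANGE, CHAR }

interface CharToken { type: types.CHAR; value: number }
interface RangeToken { type: types.RANGE; from: number; to: number }
interface SetToken { type: types.SET; set: (CharToken | RangeToken)[]; not: boolean }

// Hypothetical mirror of ints() / notInts(): one Range from "0" (48) to "9" (57).
function localInts(not: boolean): SetToken {
  return { type: types.SET, set: [{ type: types.RANGE, from: 48, to: 57 }], not };
}

// Emit ASCII letters and digits literally; escape everything else as \uXXXX.
function lit(code: number): string {
  const ch = String.fromCharCode(code);
  return /[0-9A-Za-z]/.test(ch) ? ch : "\\u" + ("000" + code.toString(16)).slice(-4);
}

// Render a flat set token in bracket notation (hypothetical helper, not reconstruct()).
function toBracket(token: SetToken): string {
  const body = token.set
    .map((t) => (t.type === types.CHAR ? lit(t.value) : lit(t.from) + "-" + lit(t.to)))
    .join("");
  return "[" + (token.not ? "^" : "") + body + "]";
}

console.log(toBracket(localInts(false))); // [0-9]
console.log(toBracket(localInts(true)));  // [^0-9]
```

Passing the bracket string to `new RegExp` yields a pattern equivalent to `\d`, or to `\D` for the negated token.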

### Whitespace Characters

Creates character sets for whitespace characters.

```typescript { .api }
/**
 * Creates a character set token for whitespace characters (\s equivalent)
 * Includes: space, tab, newline, carriage return, form feed, vertical tab, and Unicode whitespace
 * @returns Set token representing whitespace characters
 */
function whitespace(): Set;

/**
 * Creates a negated character set token for non-whitespace characters (\S equivalent)
 * Matches any character except whitespace characters
 * @returns Set token representing non-whitespace characters
 */
function notWhitespace(): Set;
```

**Usage Examples:**

```typescript
import { whitespace, notWhitespace, reconstruct } from "ret";

// Generate the whitespace character set
const spaceSet = whitespace();
// Result: { type: types.SET, set: [...extensive whitespace chars...], not: false }

// Generate the non-whitespace character set
const nonSpaceSet = notWhitespace();
// Result: { type: types.SET, set: [...extensive whitespace chars...], not: true }

// Reconstruct the tokens into regex source strings
reconstruct(spaceSet); // "\\s"
reconstruct(nonSpaceSet); // "\\S"
```
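
The API comment describes the whitespace members only loosely. As an illustration, the sketch below assumes the set follows the ECMAScript definition of `\s` (the `WhiteSpace` and `LineTerminator` productions) and builds that list with local stand-in types; `ret`'s actual list may be ordered or grouped differently. The hypothetical `mismatches` helper counts UTF-16 code units on which the token and the native `\s`/`\S` disagree.

```typescript
// Local stand-ins for the documented token shapes (assumption: not ret's enum values).
enum types { SET, RANGE, CHAR }

interface CharToken { type: types.CHAR; value: number }
interface RangeToken { type: types.RANGE; from: number; to: number }
interface SetToken { type: types.SET; set: (CharToken | RangeToken)[]; not: boolean }

const charTok = (value: number): CharToken => ({ type: types.CHAR, value });

// Assumption: members taken from the ECMAScript \s definition, not from ret's source.
function localWhitespace(not: boolean): SetToken {
  return {
    type: types.SET,
    set: [
      charTok(0x09), charTok(0x0a), charTok(0x0b), charTok(0x0c), charTok(0x0d), // \t \n \v \f \r
      charTok(0x20), charTok(0xa0), charTok(0x1680), // space, no-break space, Ogham space mark
      { type: types.RANGE, from: 0x2000, to: 0x200a }, // en quad .. hair space
      charTok(0x2028), charTok(0x2029), // line and paragraph separators
      charTok(0x202f), charTok(0x205f), charTok(0x3000), // narrow NBSP, math space, ideographic space
      charTok(0xfeff), // zero-width no-break space (BOM)
    ],
    not,
  };
}

// True if a single UTF-16 code unit is matched by a flat set token.
function setHas(token: SetToken, code: number): boolean {
  const inside = token.set.some((t) =>
    t.type === types.CHAR ? t.value === code : code >= t.from && code <= t.to
  );
  return token.not ? !inside : inside;
}

// Count code units (0..0xFFFF) where the token and a native regex disagree.
function mismatches(token: SetToken, re: RegExp): number {
  let count = 0;
  for (let code = 0; code <= 0xffff; code++) {
    if (setHas(token, code) !== re.test(String.fromCharCode(code))) count++;
  }
  return count;
}

console.log(mismatches(localWhitespace(false), /\s/)); // 0 on a current engine
console.log(mismatches(localWhitespace(true), /\S/));  // 0 on a current engine
```

Engines built on Unicode data older than version 6.3 still classified U+180E (Mongolian vowel separator) as whitespace and would report one mismatch.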

### Any Character

Creates a character set representing the dot (`.`) metacharacter as it behaves without the `s` (dotAll) flag.

```typescript { .api }
/**
 * Creates a character set token for any character except line terminators (. equivalent)
 * Matches any character except: \n, \r, \u2028 (line separator), \u2029 (paragraph separator)
 * @returns Set token representing any character except line terminators
 */
function anyChar(): Set;
```

**Usage Examples:**

```typescript
import { anyChar, reconstruct } from "ret";

// Generate the any-character set (a negated set of line terminators)
const anySet = anyChar();
// Result: { type: types.SET, set: [line terminator chars], not: true }

// Reconstruct the token into a regex source string
reconstruct(anySet); // "."
```
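
Since `anyChar()` is a negated set of the four line terminators, it reproduces `.` without the `s` flag. The sketch below uses a hypothetical local mirror of that token (local stand-in types, not imported from `ret`) and lists the code units where it disagrees with `/./` and with `/[\s\S]/`, which matches every code unit.

```typescript
// Local stand-ins for the documented token shapes (assumption: not ret's enum values).
enum types { SET, CHAR }

interface CharToken { type: types.CHAR; value: number }
interface SetToken { type: types.SET; set: CharToken[]; not: boolean }

// Hypothetical mirror of anyChar(): a negated set of the four line terminators.
function localAnyChar(): SetToken {
  const terminators = [0x0a, 0x0d, 0x2028, 0x2029]; // \n \r LS PS
  return {
    type: types.SET,
    set: terminators.map((value): CharToken => ({ type: types.CHAR, value })),
    not: true,
  };
}

function setHas(token: SetToken, code: number): boolean {
  const inside = token.set.some((t) => t.value === code);
  return token.not ? !inside : inside;
}

// List the code units (0..0xFFFF) on which the token and a native regex disagree.
function disagreements(token: SetToken, re: RegExp): number[] {
  const out: number[] = [];
  for (let code = 0; code <= 0xffff; code++) {
    if (setHas(token, code) !== re.test(String.fromCharCode(code))) out.push(code);
  }
  return out;
}

console.log(disagreements(localAnyChar(), /./));      // []
console.log(disagreements(localAnyChar(), /[\s\S]/)); // [ 10, 13, 8232, 8233 ]
```

The second comparison differs on exactly the four excluded terminators, which are the characters a dotAll `.` would additionally match.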

## Character Set Structure

All character set functions return `Set` tokens with the following structure:

```typescript { .api }
interface Set {
  type: types.SET;
  set: SetTokens; // Array of characters, ranges, and nested sets
  not: boolean; // Whether the set is negated
}

// SetTokens contain individual characters, character ranges, or nested sets
type SetTokens = (Range | Char | Set)[];

interface Range {
  type: types.RANGE;
  from: number; // Start character code (inclusive)
  to: number; // End character code (inclusive)
}

interface Char {
  type: types.CHAR;
  value: number; // Character code
}
```
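
Because `SetTokens` may contain a nested `Set`, a membership test has to recurse and apply each level's `not` flag before the parent applies its own. The sketch below shows that rule with hypothetical local stand-in types (not imported from `ret`), using `[\d.]` and the double negation `[^\D]` as examples.

```typescript
// Local stand-ins for the documented token shapes (assumption: not ret's enum values).
enum types { SET, RANGE, CHAR }

interface CharToken { type: types.CHAR; value: number }
interface RangeToken { type: types.RANGE; from: number; to: number }
interface SetToken { type: types.SET; set: (RangeToken | CharToken | SetToken)[]; not: boolean }

// Membership test for one UTF-16 code unit. A nested set applies its own
// `not` first; the parent then applies its own `not` to the combined result.
function setHas(token: SetToken, code: number): boolean {
  const inside = token.set.some((t) => {
    if (t.type === types.CHAR) return t.value === code;
    if (t.type === types.RANGE) return code >= t.from && code <= t.to;
    return setHas(t, code); // nested Set
  });
  return token.not ? !inside : inside;
}

// Hypothetical stand-in for ints() / notInts().
const digits = (not: boolean): SetToken => ({
  type: types.SET,
  set: [{ type: types.RANGE, from: 48, to: 57 }],
  not,
});

// [\d.]: a digit set nested alongside a literal "." (46)
const digitOrDot: SetToken = {
  type: types.SET,
  set: [digits(false), { type: types.CHAR, value: 46 }],
  not: false,
};

// [^\D]: a negated set wrapping a negated digit set
const notNonDigit: SetToken = { type: types.SET, set: [digits(true)], not: true };

const probe = (s: string) =>
  `${setHas(digitOrDot, s.charCodeAt(0))} / ${setHas(notNonDigit, s.charCodeAt(0))}`;

console.log(probe("7")); // true / true
console.log(probe(".")); // true / false
console.log(probe("a")); // false / false
```

`[^\D]` ends up matching only digits: the inner negation excludes digits, and the outer negation flips that back.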

## Common Use Cases

### Building Custom Regex with Predefined Sets

```typescript
import { words, ints, reconstruct, types } from "ret";

// Build a pattern that matches one or more word characters followed by one or more digits
const customPattern = {
  type: types.ROOT,
  stack: [
    { type: types.REPETITION, min: 1, max: Infinity, value: words() },
    { type: types.REPETITION, min: 1, max: Infinity, value: ints() },
  ],
};

reconstruct(customPattern); // "\\w+\\d+"
```
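
When a `REPETITION` token is turned back into source, its `min`/`max` pair has to become quantifier syntax. The hypothetical `quantifier` and `render` helpers below sketch that mapping with standard regex syntax; they are not `ret`'s implementation, and the atoms are pre-rendered strings rather than real tokens.

```typescript
// Hypothetical helper: map a repetition's min/max onto standard quantifier syntax.
function quantifier(min: number, max: number): string {
  if (min === 1 && max === 1) return "";
  if (min === 0 && max === 1) return "?";
  if (min === 0 && max === Infinity) return "*";
  if (min === 1 && max === Infinity) return "+";
  if (max === Infinity) return "{" + min + ",}";
  if (min === max) return "{" + min + "}";
  return "{" + min + "," + max + "}";
}

// A simplified repetition: `atom` is an already-rendered string such as "\\w".
interface Rep { atom: string; min: number; max: number }

// Concatenate each atom with its quantifier, like a ROOT stack rendered in order.
function render(stack: Rep[]): string {
  return stack.map((r) => r.atom + quantifier(r.min, r.max)).join("");
}

console.log(render([
  { atom: "\\w", min: 1, max: Infinity },
  { atom: "\\d", min: 1, max: Infinity },
])); // \w+\d+
console.log(render([{ atom: "\\d", min: 2, max: 4 }])); // \d{2,4}
```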

### Analyzing Existing Patterns

```typescript
import { tokenizer } from "ret";

// Parse a regex into a token tree; each \w in the source is represented
// by a Set token structured like the one words() returns
const tokens = tokenizer("\\w+@\\w+\\.\\w+");
```
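
To identify standard classes in a parsed pattern, one approach is to walk the token tree, collect every `SET` token, and compare each one against a reference set by membership rather than by element order. The sketch below assumes simplified local token shapes (real `ret` tokens may carry more fields, and its `types` values differ) and uses a hand-built tree in place of `tokenizer()` output; `collectSets`, `sameMembership`, and `wordSet` are hypothetical.

```typescript
// Simplified local stand-ins for ret's token shapes (assumption, not ret's types).
enum types { ROOT, GROUP, SET, RANGE, REPETITION, CHAR }

interface CharToken { type: types.CHAR; value: number }
interface RangeToken { type: types.RANGE; from: number; to: number }
interface SetToken { type: types.SET; set: (CharToken | RangeToken | SetToken)[]; not: boolean }
interface RepToken { type: types.REPETITION; min: number; max: number; value: Token }
interface GroupToken { type: types.ROOT | types.GROUP; stack?: Token[]; options?: Token[][] }
type Token = CharToken | SetToken | RepToken | GroupToken;

function setHas(s: SetToken, code: number): boolean {
  const inside = s.set.some((t) =>
    t.type === types.CHAR ? t.value === code
      : t.type === types.RANGE ? code >= t.from && code <= t.to
      : setHas(t, code));
  return s.not ? !inside : inside;
}

// Compare by membership over 0..0xFFFF, so item order inside a set does not matter.
function sameMembership(a: SetToken, b: SetToken): boolean {
  for (let code = 0; code <= 0xffff; code++) {
    if (setHas(a, code) !== setHas(b, code)) return false;
  }
  return true;
}

// Depth-first walk collecting every SET token in the tree.
function collectSets(token: Token, out: SetToken[] = []): SetToken[] {
  switch (token.type) {
    case types.SET:
      out.push(token);
      break;
    case types.REPETITION:
      collectSets(token.value, out);
      break;
    case types.ROOT:
    case types.GROUP:
      (token.stack || []).forEach((t) => collectSets(t, out));
      (token.options || []).forEach((branch) => branch.forEach((t) => collectSets(t, out)));
      break;
  }
  return out;
}

// Hypothetical word set in the shape words() is documented to return.
const wordSet = (): SetToken => ({
  type: types.SET,
  set: [
    { type: types.CHAR, value: 95 },
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.RANGE, from: 65, to: 90 },
    { type: types.RANGE, from: 48, to: 57 },
  ],
  not: false,
});

// Hand-built tree shaped like \w+@[0-9A-Z_a-z]+ (the second class written out by hand).
const tree: Token = {
  type: types.ROOT,
  stack: [
    { type: types.REPETITION, min: 1, max: Infinity, value: wordSet() },
    { type: types.CHAR, value: 64 }, // "@"
    {
      type: types.REPETITION, min: 1, max: Infinity,
      value: {
        type: types.SET, not: false, set: [
          { type: types.RANGE, from: 48, to: 57 },
          { type: types.RANGE, from: 65, to: 90 },
          { type: types.CHAR, value: 95 },
          { type: types.RANGE, from: 97, to: 122 },
        ],
      },
    },
  ],
};

const wordLike = collectSets(tree).filter((s) => sameMembership(s, wordSet()));
console.log(wordLike.length); // 2
```

Comparing by membership also catches the hand-written `[0-9A-Z_a-z]`, which a field-by-field comparison with the word set would miss because its items are in a different order.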

### Character Set Composition

```typescript
import { words, ints, whitespace } from "ret";

// The returned tokens can be composed into more complex patterns
// or used individually when constructing tokens for regex generation
const wordChars = words().set; // Underlying array of characters and ranges
const digitChars = ints().set; // Digit character range
const spaceChars = whitespace().set; // Whitespace characters and ranges
```
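
Spreading an existing `set` array into a new `Set` token is one way to build a custom class. The sketch below composes `[\w-]` (word characters plus a hyphen, as in URL slugs) from a hypothetical local `words()` mirror, using local stand-in types rather than importing `ret`.

```typescript
// Local stand-ins for the documented token shapes (assumption: not ret's enum values).
enum types { SET, RANGE, CHAR }

interface CharToken { type: types.CHAR; value: number }
interface RangeToken { type: types.RANGE; from: number; to: number }
interface SetToken { type: types.SET; set: (CharToken | RangeToken)[]; not: boolean }

// Hypothetical stand-in for words(); builds a fresh array on every call.
const localWords = (): SetToken => ({
  type: types.SET,
  set: [
    { type: types.CHAR, value: 95 },
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.RANGE, from: 65, to: 90 },
    { type: types.RANGE, from: 48, to: 57 },
  ],
  not: false,
});

// Compose [\w-] by spreading the word items and appending "-" (45).
const slugChar: SetToken = {
  type: types.SET,
  set: [...localWords().set, { type: types.CHAR, value: 45 }],
  not: false,
};
// [^\w-]: shallow copy with the flag flipped (shares the same `set` array).
const notSlugChar: SetToken = { ...slugChar, not: true };

function setHas(token: SetToken, code: number): boolean {
  const inside = token.set.some((t) =>
    t.type === types.CHAR ? t.value === code : code >= t.from && code <= t.to
  );
  return token.not ? !inside : inside;
}

// True if every UTF-16 code unit of `s` is in the set.
const matches = (token: SetToken, s: string) =>
  s.split("").every((ch) => setHas(token, ch.charCodeAt(0)));

console.log(matches(slugChar, "my-post_42")); // true
console.log(matches(slugChar, "my post"));    // false
```

Because the object spread copies the token shallowly, `slugChar` and `notSlugChar` share one `set` array; copy the array as well if either token will be mutated.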