# Character Sets

These predefined utilities generate tokens for common regex character classes. Each function returns a structured `Set` token for a standard class such as digits, word characters, or whitespace.

## Capabilities

### Word Characters

Creates character sets for word characters (letters, digits, and underscore).

```typescript { .api }
/**
 * Creates a character set token for word characters (\w equivalent)
 * Includes: a-z, A-Z, 0-9, and underscore (_)
 * @returns Set token representing [a-zA-Z0-9_]
 */
function words(): Set;

/**
 * Creates a negated character set token for non-word characters (\W equivalent)
 * Matches any character except: a-z, A-Z, 0-9, and underscore (_)
 * @returns Set token representing [^a-zA-Z0-9_]
 */
function notWords(): Set;
```

**Usage Examples:**

```typescript
import { words, notWords, reconstruct } from "ret";

// Generate word character set
const wordSet = words();
// Result: { type: types.SET, set: [...], not: false }

// Generate non-word character set
const nonWordSet = notWords();
// Result: { type: types.SET, set: [...], not: true }

// Reconstruct to regex strings
reconstruct(wordSet); // "\\w"
reconstruct(nonWordSet); // "\\W"
```

### Digit Characters

Creates character sets for numeric digits.

```typescript { .api }
/**
 * Creates a character set token for digit characters (\d equivalent)
 * Includes: 0-9
 * @returns Set token representing [0-9]
 */
function ints(): Set;

/**
 * Creates a negated character set token for non-digit characters (\D equivalent)
 * Matches any character except: 0-9
 * @returns Set token representing [^0-9]
 */
function notInts(): Set;
```

**Usage Examples:**

```typescript
import { ints, notInts, reconstruct } from "ret";

// Generate digit character set
const digitSet = ints();
// Result: { type: types.SET, set: [{ type: types.RANGE, from: 48, to: 57 }], not: false }

// Generate non-digit character set
const nonDigitSet = notInts();
// Result: { type: types.SET, set: [{ type: types.RANGE, from: 48, to: 57 }], not: true }

// Reconstruct to regex strings
reconstruct(digitSet); // "\\d"
reconstruct(nonDigitSet); // "\\D"
```
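
The `from: 48` and `to: 57` values in the result comments are character codes, not characters. The following is a minimal sketch of reading such a range back as text. It uses a hand-written stand-in object so it runs without importing `ret`, and the `CodeRange`, `describeRange`, and `inRange` names are hypothetical helpers, not part of the library.

```typescript
// Local stand-in shaped like the range entry shown in the result comments.
// Written by hand so the snippet does not depend on "ret".
interface CodeRange {
  from: number; // start character code
  to: number; // end character code
}

// Hypothetical helper: render a range of character codes as "x-y".
function describeRange(range: CodeRange): string {
  return String.fromCharCode(range.from) + "-" + String.fromCharCode(range.to);
}

// Hypothetical helper: check whether the first character of `ch` is in the range.
function inRange(range: CodeRange, ch: string): boolean {
  const code = ch.charCodeAt(0);
  return code >= range.from && code <= range.to;
}

const digitRange: CodeRange = { from: 48, to: 57 };

console.log(describeRange(digitRange)); // "0-9"
console.log(inRange(digitRange, "7")); // true
console.log(inRange(digitRange, "a")); // false
```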

### Whitespace Characters

Creates character sets for whitespace characters.

```typescript { .api }
/**
 * Creates a character set token for whitespace characters (\s equivalent)
 * Includes: space, tab, newline, carriage return, form feed, vertical tab, and Unicode whitespace
 * @returns Set token representing whitespace characters
 */
function whitespace(): Set;

/**
 * Creates a negated character set token for non-whitespace characters (\S equivalent)
 * Matches any character except whitespace characters
 * @returns Set token representing non-whitespace characters
 */
function notWhitespace(): Set;
```

**Usage Examples:**

```typescript
import { whitespace, notWhitespace, reconstruct } from "ret";

// Generate whitespace character set
const spaceSet = whitespace();
// Result: { type: types.SET, set: [...extensive whitespace chars...], not: false }

// Generate non-whitespace character set
const nonSpaceSet = notWhitespace();
// Result: { type: types.SET, set: [...extensive whitespace chars...], not: true }

// Reconstruct to regex strings
reconstruct(spaceSet); // "\\s"
reconstruct(nonSpaceSet); // "\\S"
```
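
The whitespace members are elided in the comments above. To see what "Unicode whitespace" means in practice, one option is to ask the host JavaScript engine which code points its own `\s` matches. The sketch below does only that. It reports the engine's behavior, which is an assumption about what the library's list resembles and not a dump of it, and `engineWhitespaceCodes` is a hypothetical helper name.

```typescript
// Collect every BMP code point that the host engine's /\s/ matches.
// This reflects the JavaScript engine, not necessarily the exact list in "ret".
function engineWhitespaceCodes(): number[] {
  const codes: number[] = [];
  for (let code = 0; code <= 0xffff; code++) {
    if (/\s/.test(String.fromCharCode(code))) {
      codes.push(code);
    }
  }
  return codes;
}

const wsCodes = engineWhitespaceCodes();

// Print as U+XXXX so entries are easy to compare against a Unicode chart.
const wsLabels = wsCodes.map(
  (c) => "U+" + ("0000" + c.toString(16).toUpperCase()).slice(-4)
);
console.log(wsLabels.join(" "));
```

Comparing this output with `whitespace().set` would be one way to check how closely the two lists agree in a given environment.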

### Any Character

Creates a character set representing the dot (.) metacharacter.

```typescript { .api }
/**
 * Creates a character set token for any character except line terminators (. equivalent)
 * Matches any character except: \n, \r, \u2028 (line separator), \u2029 (paragraph separator)
 * @returns Set token representing any character except line terminators
 */
function anyChar(): Set;
```

**Usage Examples:**

```typescript
import { anyChar, reconstruct } from "ret";

// Generate any-character set
const anySet = anyChar();
// Result: { type: types.SET, set: [line terminator chars], not: true }

// Reconstruct to regex string
reconstruct(anySet); // "."
```
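
Because the any-character set is a negated set of line terminators, membership is simply "not one of these four codes". The sketch below illustrates that logic with a hand-written list taken from the doc comment above and compares it with the engine's own `.` (without the `s` flag) on a few sample code points. `matchesNegated` is a hypothetical helper, and nothing here calls `ret`.

```typescript
// Line terminator codes listed in the doc comment: \n, \r, \u2028, \u2029.
const lineTerminators: number[] = [0x0a, 0x0d, 0x2028, 0x2029];

// A negated set matches exactly when the code is NOT one of its members.
function matchesNegated(members: number[], code: number): boolean {
  return members.indexOf(code) === -1;
}

// Sample code points: "A", space, tab, the four line terminators, and U+3042.
const sampleCodes: number[] = [0x41, 0x20, 0x09, 0x0a, 0x0d, 0x2028, 0x2029, 0x3042];

for (const code of sampleCodes) {
  const viaSet = matchesNegated(lineTerminators, code);
  const viaDot = /^.$/.test(String.fromCharCode(code));
  // Both columns are expected to agree for every sample.
  console.log("0x" + code.toString(16), viaSet, viaDot);
}
```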

## Character Set Structure

All character set functions return `Set` tokens with the following structure:

```typescript { .api }
interface Set {
  type: types.SET;
  set: SetTokens; // Array of characters, ranges, and nested sets
  not: boolean; // Whether the set is negated
}

// SetTokens contain individual characters, character ranges, or nested sets
type SetTokens = (Range | Char | Set)[];

interface Range {
  type: types.RANGE;
  from: number; // Start character code
  to: number; // End character code
}

interface Char {
  type: types.CHAR;
  value: number; // Character code
}
```
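
Given this structure, testing whether a character code belongs to a set is a short recursive walk: check each member, then flip the answer if `not` is true. The following is a minimal sketch under stated assumptions. The `SetToken`, `RangeToken`, and `CharToken` types are local stand-ins that mirror the interfaces above, the string discriminants are placeholders (the real `types` values may differ), and `matchesSet` is a hypothetical helper rather than a library function.

```typescript
// Local stand-ins mirroring the documented interfaces, so this runs without "ret".
// The discriminant values are placeholders, not the library's actual values.
const types = { SET: "set", RANGE: "range", CHAR: "char" } as const;

interface RangeToken { type: typeof types.RANGE; from: number; to: number; }
interface CharToken { type: typeof types.CHAR; value: number; }
interface SetToken { type: typeof types.SET; set: Member[]; not: boolean; }
type Member = RangeToken | CharToken | SetToken;

// Hypothetical helper: does `code` belong to the set described by `token`?
function matchesSet(token: SetToken, code: number): boolean {
  const hit = token.set.some((member) => {
    if (member.type === types.RANGE) return code >= member.from && code <= member.to;
    if (member.type === types.CHAR) return code === member.value;
    return matchesSet(member, code); // nested set, applying its own negation
  });
  return token.not ? !hit : hit;
}

const codeOf = (ch: string): number => ch.charCodeAt(0);

// Hand-built tokens: [a-z_] and [^0-9].
const lowerOrUnderscore: SetToken = {
  type: types.SET,
  not: false,
  set: [
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.CHAR, value: 95 },
  ],
};
const notDigit: SetToken = {
  type: types.SET,
  not: true,
  set: [{ type: types.RANGE, from: 48, to: 57 }],
};

console.log(matchesSet(lowerOrUnderscore, codeOf("q"))); // true
console.log(matchesSet(lowerOrUnderscore, codeOf("Q"))); // false
console.log(matchesSet(notDigit, codeOf("5"))); // false
console.log(matchesSet(notDigit, codeOf("x"))); // true
```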

## Common Use Cases

### Building Custom Regex with Predefined Sets

```typescript
import { words, ints, reconstruct, types } from "ret";

// Create a pattern that matches word characters followed by digits
const customPattern = {
  type: types.ROOT,
  stack: [
    { type: types.REPETITION, min: 1, max: Infinity, value: words() },
    { type: types.REPETITION, min: 1, max: Infinity, value: ints() }
  ]
};

reconstruct(customPattern); // "\\w+\\d+"
```

### Analyzing Existing Patterns

```typescript
import { tokenizer } from "ret";

// Parse a regex and identify if it uses standard character classes
const tokens = tokenizer("\\w+@\\w+\\.\\w+");
// This would parse to tokens using words() sets for \w patterns
```
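
Identifying character classes in a parsed pattern comes down to walking the token tree and collecting the set tokens. The sketch below shows one way to do that on a hand-written tree that stands in for the beginning of the pattern (`\w+@`). The tree shape follows the `ROOT`, `REPETITION`, and `Set` structures shown in this document, but the exact tokenizer output is an assumption, the string discriminants are placeholders, and `collectSets` is a hypothetical helper. Groups and alternation are ignored for brevity.

```typescript
// Local stand-ins so the sketch runs without "ret". Discriminants are placeholders.
const types = {
  ROOT: "root",
  REPETITION: "repetition",
  SET: "set",
  RANGE: "range",
  CHAR: "char",
} as const;

interface CharToken { type: typeof types.CHAR; value: number; }
interface RangeToken { type: typeof types.RANGE; from: number; to: number; }
interface SetToken {
  type: typeof types.SET;
  set: (RangeToken | CharToken | SetToken)[];
  not: boolean;
}
interface RepetitionToken {
  type: typeof types.REPETITION;
  min: number;
  max: number;
  value: AnyToken;
}
interface RootToken { type: typeof types.ROOT; stack: AnyToken[]; }
type AnyToken = CharToken | SetToken | RepetitionToken | RootToken;

// Hypothetical helper: gather every set token reachable from `token`.
function collectSets(token: AnyToken, found: SetToken[] = []): SetToken[] {
  if (token.type === types.SET) {
    found.push(token);
  } else if (token.type === types.REPETITION) {
    collectSets(token.value, found);
  } else if (token.type === types.ROOT) {
    token.stack.forEach((child) => {
      collectSets(child, found);
    });
  }
  return found;
}

// Hand-written set shaped like [a-zA-Z0-9_]; member order is an assumption.
const wordLike: SetToken = {
  type: types.SET,
  not: false,
  set: [
    { type: types.CHAR, value: 95 },
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.RANGE, from: 65, to: 90 },
    { type: types.RANGE, from: 48, to: 57 },
  ],
};

// Hand-written tree standing in for the parse of "\w+@".
const tree: RootToken = {
  type: types.ROOT,
  stack: [
    { type: types.REPETITION, min: 1, max: Infinity, value: wordLike },
    { type: types.CHAR, value: 64 },
  ],
};

const setsFound = collectSets(tree);
console.log(setsFound.length); // 1
console.log(setsFound[0].not); // false
```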

### Character Set Composition

```typescript
import { words, ints, whitespace } from "ret";

// These character sets can be composed into more complex patterns
// or used individually in token construction for regex generation
const wordChars = words().set; // Get the underlying character/range array
const digitChars = ints().set; // Get digit character ranges
const spaceChars = whitespace().set; // Get whitespace character definitions
```
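
One way to compose such arrays is to concatenate them into the `set` field of a new token. The following is a rough sketch of that idea, not a library feature. It uses hand-written stand-ins for the member arrays (a digit range and two blank characters), placeholder discriminant values, and a hypothetical `unionSet` helper, so it runs without importing `ret`.

```typescript
// Local stand-ins mirroring the documented structure. Discriminants are placeholders.
const types = { SET: "set", RANGE: "range", CHAR: "char" } as const;

interface RangeToken { type: typeof types.RANGE; from: number; to: number; }
interface CharToken { type: typeof types.CHAR; value: number; }
interface SetToken { type: typeof types.SET; set: Member[]; not: boolean; }
type Member = RangeToken | CharToken | SetToken;

// Hand-written stand-ins for member arrays like those read from `.set`.
const digitMembers: Member[] = [{ type: types.RANGE, from: 48, to: 57 }];
const blankMembers: Member[] = [
  { type: types.CHAR, value: 32 }, // space
  { type: types.CHAR, value: 9 }, // tab
];

// Hypothetical helper: build a non-negated set containing all given members.
// concat copies, so the input arrays are left untouched.
function unionSet(...parts: Member[][]): SetToken {
  let members: Member[] = [];
  for (const part of parts) {
    members = members.concat(part);
  }
  return { type: types.SET, set: members, not: false };
}

// Roughly the character class [0-9 \t].
const digitOrBlank = unionSet(digitMembers, blankMembers);

console.log(digitOrBlank.set.length); // 3
console.log(digitMembers.length); // 1
```

Copying the members keeps the composed token independent of the arrays it was built from, which matters if those arrays are reused elsewhere.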