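# Content Extraction

The content extraction system in UnoCSS Core scans source code of different file types and collects candidate utility-class tokens from it. Extractors only propose candidates: a token produces CSS only if it matches one of the configured rules. Extraction is pluggable, so you can register custom extraction strategies alongside the default one. The content sources configuration controls which files and strings are scanned.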

## Extractor Interface

```typescript { .api }
type Awaitable<T> = T | Promise<T>;

interface Extractor {
  name: string;
  order?: number;
  extract?: (ctx: ExtractorContext) => Awaitable<Set<string> | CountableSet<string> | string[] | undefined | void>;
}

interface ExtractorContext {
  readonly original: string;
  code: string;
  id?: string;
  extracted: Set<string> | CountableSet<string>;
  envMode?: 'dev' | 'build';
}
```

**Extractor properties:**

- **name**: Unique identifier for the extractor
- **order**: Sort key for the extractor list (extractors with lower values run first)
- **extract**: Function that receives the context and returns the candidate tokens it found, or nothing if it has nothing to contribute or wrote to `extracted` itself

**ExtractorContext properties:**

- **original**: The original, unmodified source code
- **code**: The current code, which earlier extractors may already have modified
- **id**: File identifier or path, when available
- **extracted**: The set that collects tokens; extractors may add to it directly
- **envMode**: Current environment mode (`dev` or `build`)
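
An extractor can contribute tokens in two ways. It can return them, or it can add them to `ctx.extracted` itself and return nothing. The sketch below shows both styles. The types are simplified local stand-ins for the interfaces above, so the snippet runs without `@unocss/core`. The extractor names and the `uno:` prefix convention are hypothetical.

```typescript
// Simplified local stand-ins for the interfaces above (assumption: no @unocss/core import).
interface ExtractorContext {
  readonly original: string;
  code: string;
  id?: string;
  extracted: Set<string>;
  envMode?: 'dev' | 'build';
}

interface Extractor {
  name: string;
  order?: number;
  extract?: (ctx: ExtractorContext) =>
    Set<string> | string[] | undefined | void | Promise<Set<string> | string[] | undefined | void>;
}

// Style 1: return the tokens and let the caller merge them.
const returningExtractor: Extractor = {
  name: 'example-returning',
  extract({ code }) {
    return code.split(/\s+/).filter(Boolean);
  },
};

// Style 2: write into ctx.extracted directly and return nothing.
// Hypothetical convention: only tokens prefixed with "uno:" are collected.
const mutatingExtractor: Extractor = {
  name: 'example-mutating',
  extract({ code, extracted }) {
    for (const token of code.split(/\s+/)) {
      if (token.startsWith('uno:'))
        extracted.add(token.slice('uno:'.length));
    }
  },
};
```

The returning style is simpler to test in isolation. The mutating style avoids building an intermediate collection when an extractor finds many tokens.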

## Default Extractor

```typescript { .api }
const extractorSplit: Extractor = {
  name: '@unocss/core/extractor-split',
  order: 0,
  extract({ code }) {
    return splitCode(code);
  },
};

const extractorDefault: Extractor = extractorSplit;

function splitCode(code: string): string[];
```

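The default extractor splits source code on runs of whitespace, quotes, backticks, semicolons, and curly braces. A `\` or `:` directly before such a run is consumed along with it. Every resulting fragment becomes a candidate token. The split can afford to be loose because fragments that match no rule are simply ignored later.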

### Split Patterns

```typescript { .api }
const defaultSplitRE: RegExp = /[\\:]?[\s'"`;{}]+/g;
const splitWithVariantGroupRE: RegExp = /([\\:]?[\s"'`;<>]|:\(|\)"|\)\s)/g;
```

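- **defaultSplitRE**: Standard splitting pattern used for most content
- **splitWithVariantGroupRE**: Extended pattern that also splits around variant groups such as `hover:(text-red bg-blue)`. The whole expression is wrapped in a capture group, so `String.prototype.split` keeps the matched delimiters in its output.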
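
The sketch below shows what the default pattern produces. It re-declares `defaultSplitRE` locally, copied from the block above. It also assumes `splitCode` is a plain `String.prototype.split` on that pattern, which is consistent with the description but not copied from the library source.

```typescript
// Local copy of the pattern shown above.
const defaultSplitRE = /[\\:]?[\s'"`;{}]+/g;

// Assumed equivalent of splitCode: split on runs of delimiter characters.
function splitCode(code: string): string[] {
  return code.split(defaultSplitRE);
}

const fromHtml = splitCode('<div class="flex p-4">');
// fromHtml is ['<div', 'class=', 'flex', 'p-4', '>']

const fromJs = splitCode("const cls = 'text-red-500 hover:bg-blue-100';");
// fromJs is ['const', 'cls', '=', 'text-red-500', 'hover:bg-blue-100', '']
```

The stray fragments (`<div`, `class=`, the trailing empty string) are expected. Tokens that match no rule generate no CSS. `hover:bg-blue-100` stays intact because `[\\:]?` only consumes a colon that directly precedes a delimiter.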

## Extractor Application

```typescript { .api }
// From UnoGenerator class
applyExtractors(
  code: string,
  id?: string,
  extracted?: Set<string>
): Promise<Set<string>>;
applyExtractors(
  code: string,
  id?: string,
  extracted?: CountableSet<string>
): Promise<CountableSet<string>>;
```

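Runs the configured extractors over `code` one after another, in ascending `order`, and accumulates the tokens they return into `extracted`. The returned promise resolves to that accumulated set. Passing a `CountableSet` as `extracted` selects the second overload, which resolves to a `CountableSet` with per-token counts.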
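
The loop below is a simplified sketch of that behavior. It uses local stand-in types rather than the real `UnoGenerator`, and it only handles plain `Set` results (CountableSet merging is omitted). Two details are assumptions rather than documented guarantees. First, sorting happens inside the function, whereas the real generator may sort once when the config is resolved. Second, a falsy return value means "nothing to merge".

```typescript
type Awaitable<T> = T | Promise<T>;

interface ExtractorContext {
  readonly original: string;
  code: string;
  id?: string;
  extracted: Set<string>;
}

interface Extractor {
  name: string;
  order?: number;
  // `void` omitted from the result type for brevity in this sketch.
  extract?: (ctx: ExtractorContext) => Awaitable<Set<string> | string[] | undefined>;
}

// Sketch: run extractors in ascending order and accumulate their tokens.
async function applyExtractorsSketch(
  extractors: Extractor[],
  code: string,
  id?: string,
  extracted: Set<string> = new Set(),
): Promise<Set<string>> {
  const ctx: ExtractorContext = { original: code, code, id, extracted };
  // Assumption: a missing order counts as 0.
  const sorted = [...extractors].sort((a, b) => (a.order ?? 0) - (b.order ?? 0));

  for (const extractor of sorted) {
    const result = await extractor.extract?.(ctx);
    if (!result)
      continue; // nothing returned; the extractor may have written to ctx.extracted itself
    for (const token of result)
      extracted.add(token);
  }
  return extracted;
}
```

Because every extractor shares the same `ctx`, a later extractor sees any change an earlier one made to `ctx.code`.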

## Custom Extractors

### Basic Custom Extractor

```typescript
import type { Extractor } from '@unocss/core';

const customExtractor: Extractor = {
  name: 'my-custom-extractor',
  order: 10,
  extract({ code }) {
    const classes = new Set<string>();

    // Extract from class attributes
    const classMatches = code.matchAll(/class="([^"]+)"/g);
    for (const match of classMatches) {
      // filter(Boolean) drops empty strings left by leading/trailing whitespace
      const classNames = match[1].split(/\s+/).filter(Boolean);
      classNames.forEach(name => classes.add(name));
    }

    return classes;
  },
};
```

### File-Type Specific Extractor

```typescript
import type { Extractor } from '@unocss/core';

const vueExtractor: Extractor = {
  name: 'vue-extractor',
  extract({ code, id }) {
    // Only process .vue files. Ignore any query string (e.g. "?vue&type=template")
    // that a bundler may append to the module id.
    if (!id?.split('?')[0].endsWith('.vue')) return;

    const classes = new Set<string>();

    // Extract from the template section. The lazy match stops at the first
    // </template>, so nested <template> tags are only partially covered.
    const templateMatch = code.match(/<template[^>]*>([\s\S]*?)<\/template>/);
    if (templateMatch) {
      const template = templateMatch[1];
      // Matches static class="..." and bound :class="..." attributes; bound values
      // are JS expressions, so some of their tokens will not be utilities
      const classMatches = template.matchAll(/(?:class|:class)="([^"]+)"/g);
      for (const match of classMatches) {
        match[1].split(/\s+/).filter(Boolean).forEach(cls => classes.add(cls));
      }
    }

    return classes;
  },
};
```

### Regex-Based Extractor

```typescript
import type { Extractor } from '@unocss/core';

// Compiled once at module level. matchAll() iterates over an internal copy of
// each regex, so reusing these global regexes across calls is safe.
const classPatterns = [
  /className\s*=\s*["']([^"']+)["']/g,
  /class\s*=\s*["']([^"']+)["']/g,
  /tw`([^`]+)`/g, // Tagged template literals
];

const regexExtractor: Extractor = {
  name: 'regex-extractor',
  extract({ code }) {
    const classes = new Set<string>();

    for (const pattern of classPatterns) {
      for (const match of code.matchAll(pattern)) {
        match[1].split(/\s+/).filter(Boolean).forEach(cls => classes.add(cls));
      }
    }

    return classes;
  },
};
```

## CountableSet for Frequency Tracking

```typescript { .api }
class CountableSet<K> extends Set<K> {
  getCount(key: K): number;
  setCount(key: K, count: number): this;
  add(key: K): this;
  delete(key: K): boolean;
  clear(): void;
}

function isCountableSet<T = string>(value: any): value is CountableSet<T>;
```

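`CountableSet` is a `Set` that also stores a count for each key. This lets extraction record how many times each utility class appears, not just whether it appears, which is useful for usage analysis. `isCountableSet` is a type guard: an extractor uses it to tell which kind of set it received in `extracted`.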
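
The class below, `MiniCountableSet`, is a hypothetical minimal stand-in that illustrates the data structure: a `Set` paired with a `Map` of counts. It assumes `add` increments the count and `getCount` returns 0 for unseen keys. Both assumptions are consistent with the frequency example in this section, but the class is not copied from the library source.

```typescript
// Hypothetical stand-in illustrating the CountableSet idea; not the library class.
class MiniCountableSet<K> extends Set<K> {
  private counts = new Map<K, number>();

  constructor(values?: Iterable<K>) {
    super(); // add values after super() so `counts` is already initialized
    if (values) {
      for (const value of values)
        this.add(value);
    }
  }

  // Assumption: add() increments the stored count.
  add(key: K): this {
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
    return super.add(key);
  }

  delete(key: K): boolean {
    this.counts.delete(key);
    return super.delete(key);
  }

  clear(): void {
    this.counts.clear();
    super.clear();
  }

  // Unseen keys report a count of 0.
  getCount(key: K): number {
    return this.counts.get(key) ?? 0;
  }

  // Overwrite the count and make sure the key is a member of the set.
  setCount(key: K, count: number): this {
    this.counts.set(key, count);
    return super.add(key);
  }
}
```

Subclassing `Set` keeps the stand-in usable anywhere a plain `Set<string>` is expected. Code that only calls `add` and `has` therefore works with either kind of set.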

### Using CountableSet

```typescript
import { isCountableSet, type Extractor } from '@unocss/core';

const frequencyExtractor: Extractor = {
  name: 'frequency-extractor',
  extract({ code, extracted }) {
    const tokens: string[] = [];
    for (const match of code.matchAll(/class="([^"]+)"/g))
      tokens.push(...match[1].split(/\s+/).filter(Boolean));

    // A plain Set cannot hold counts, so return the tokens for normal merging
    if (!isCountableSet(extracted))
      return tokens;

    // Record one occurrence per appearance; nothing is returned because
    // the counts are written into `extracted` directly
    for (const cls of tokens)
      extracted.setCount(cls, extracted.getCount(cls) + 1);
  },
};
```

## Content Sources Configuration

```typescript { .api }
interface ContentOptions {
  filesystem?: string[];
  inline?: (string | { code: string, id?: string } | (() => Awaitable<string | { code: string, id?: string }>))[];
  pipeline?: false | {
    include?: FilterPattern;
    exclude?: FilterPattern;
  };
}

type FilterPattern = ReadonlyArray<string | RegExp> | string | RegExp | null;
```

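**Content source types:**

- **filesystem**: Glob patterns for files to scan on disk
- **inline**: Code to scan directly, given as strings, `{ code, id }` objects, or (possibly async) functions that return either
- **pipeline**: Include/exclude filters for modules that pass through a build tool integration; `false` disables pipeline extraction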
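
Each `inline` entry can take one of three shapes, so code that consumes the list has to normalize it first. The helper below, `resolveInlineContent`, is a hypothetical sketch of that step and not a UnoCSS export. The fallback `inline-<index>` ids it generates are invented for the example.

```typescript
type Awaitable<T> = T | Promise<T>;
type InlineSource = string | { code: string; id?: string };
type InlineEntry = InlineSource | (() => Awaitable<InlineSource>);

// Hypothetical helper: turn every inline entry into a { code, id } object.
async function resolveInlineContent(
  entries: InlineEntry[],
): Promise<{ code: string; id?: string }[]> {
  return Promise.all(
    entries.map(async (entry, index) => {
      // Call functions (sync or async) to obtain their content.
      const value = typeof entry === 'function' ? await entry() : entry;
      // Plain strings get an invented placeholder id.
      return typeof value === 'string'
        ? { code: value, id: `inline-${index}` }
        : value;
    }),
  );
}
```

`Promise.all` resolves function entries concurrently while preserving the original order of the list.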

### Content Configuration Examples

```typescript
import { createGenerator } from '@unocss/core';

// Hypothetical async helper that returns extra markup to scan
declare function fetchDynamicContent(): Promise<string>;

const uno = await createGenerator({
  content: {
    // Scan specific file patterns
    filesystem: [
      'src/**/*.{js,ts,jsx,tsx}',
      'components/**/*.vue',
      'pages/**/*.html',
    ],

    // Include inline content
    inline: [
      'flex items-center justify-between',
      { code: '<div class="p-4 bg-white">Content</div>', id: 'inline-1' },
      () => fetchDynamicContent(),
    ],

    // Pipeline filters for build tools
    pipeline: {
      include: [/\.(vue|jsx|tsx)$/],
      exclude: [/node_modules/, /\.test\./],
    },
  },
});
```
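
The `pipeline` filters above decide whether a module id is processed. The sketch below assumes the common include/exclude semantics, in which an exclude match always wins and an empty include list lets everything through. It handles the regex form of `FilterPattern` faithfully, but treats string entries as plain substrings, whereas real integrations generally interpret strings as glob patterns. `shouldExtract` is a hypothetical helper, not a UnoCSS function.

```typescript
type FilterPattern = ReadonlyArray<string | RegExp> | string | RegExp | null;

// Normalize the single-value and null forms into an array.
function toArray(pattern: FilterPattern | undefined): (string | RegExp)[] {
  if (pattern == null) return [];
  if (typeof pattern === 'string' || pattern instanceof RegExp) return [pattern];
  return [...pattern];
}

// Assumption: strings are substrings, not globs. Regexes should not use the
// `g` flag, because test() on a global regex is stateful.
function matches(id: string, pattern: string | RegExp): boolean {
  return typeof pattern === 'string' ? id.includes(pattern) : pattern.test(id);
}

function shouldExtract(id: string, include?: FilterPattern, exclude?: FilterPattern): boolean {
  // Exclusions take precedence over inclusions.
  if (toArray(exclude).some(p => matches(id, p))) return false;
  const includes = toArray(include);
  // An empty include list lets every remaining id through.
  return includes.length === 0 || includes.some(p => matches(id, p));
}
```

With the example configuration, `src/App.vue` passes. `src/App.test.tsx` is rejected by the `.test.` exclusion even though it matches the include pattern.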

## Advanced Extraction Patterns

### Template Literal Extraction

```typescript
import type { Extractor } from '@unocss/core';

// Compiled once; matchAll() iterates over a copy, so reuse is safe
const templatePatterns = [
  /tw`([^`]+)`/g, // tw`class names`
  // CSS @apply statements; [^`]* is greedy, so only the last @apply in each css`...` is captured
  /css\s*`[^`]*@apply\s+([^;`]+)/g,
  /className=\{`([^`]+)`\}/g, // React template literals
];

const templateLiteralExtractor: Extractor = {
  name: 'template-literal',
  extract({ code }) {
    const classes = new Set<string>();

    for (const pattern of templatePatterns) {
      for (const match of code.matchAll(pattern)) {
        // Drop empty strings and tokens that start an interpolation. Fragments of a
        // multi-word ${...} expression can still slip through; they are only candidates
        // and are unlikely to match any rule.
        match[1]
          .split(/\s+/)
          .filter(cls => cls && !cls.includes('${'))
          .forEach(cls => classes.add(cls));
      }
    }

    return classes;
  },
};
```

### Comment-Based Extraction

```typescript
import type { Extractor } from '@unocss/core';

// Matches block comments such as /* @unocss: flex p-4 */
const commentPattern = /\/\*\s*@unocss:\s*([^*]+)\s*\*\//g;

const commentExtractor: Extractor = {
  name: 'comment-extractor',
  extract({ code }) {
    const classes = new Set<string>();

    for (const match of code.matchAll(commentPattern)) {
      // [^*]+ cannot cross a "*", so utilities containing "*" are not captured
      const utilities = match[1].split(/\s+/).filter(Boolean);
      utilities.forEach(cls => classes.add(cls));
    }

    return classes;
  },
};
```

## Extractor Best Practices

### Performance Optimization

1. **Early Returns**: Return early for files the extractor does not handle, for example by checking `id`
2. **Compiled Regex**: Define regex patterns once at module level instead of inside `extract`
3. **Set Operations**: Collect results in a `Set` so each duplicate token is stored only once
4. **Order Matters**: Choose `order` values deliberately; extractors run in ascending order, and later extractors see any changes earlier ones made to `code`
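
The sketch below applies the first three points together. The `.svelte` file check and the `class:` directive pattern are hypothetical choices for the example. `Extractor` is a simplified local stand-in for the interface above, so the snippet is self-contained.

```typescript
// Simplified local stand-ins for the Extractor types (assumption, to keep this self-contained).
interface ExtractorContext {
  readonly original: string;
  code: string;
  id?: string;
  extracted: Set<string>;
}
interface Extractor {
  name: string;
  order?: number;
  extract?: (ctx: ExtractorContext) => Set<string> | undefined;
}

// 2. Compiled once at module level. Hypothetical target: Svelte class directives
//    such as class:text-red-500={isError}; only word characters and "-" are captured.
const classDirectiveRE = /\bclass:([\w-]+)/g;

const svelteDirectiveExtractor: Extractor = {
  name: 'svelte-class-directive',
  order: 10,
  extract({ code, id }) {
    // 1. Early return for files this extractor does not handle.
    if (!id?.endsWith('.svelte')) return undefined;

    // 3. A Set stores each token once, however often it appears.
    const classes = new Set<string>();
    for (const match of code.matchAll(classDirectiveRE))
      classes.add(match[1]);
    return classes;
  },
};
```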

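### Error Handling

```typescript
import type { Extractor } from '@unocss/core';

// Hypothetical example: read utilities from JSON files shaped like { "classes": [...] }.
// JSON.parse throws on malformed input, which is the failure guarded against here.
const robustExtractor: Extractor = {
  name: 'robust-extractor',
  extract({ code, id }) {
    if (!id?.endsWith('.json')) return;

    try {
      const classes = new Set<string>();
      const parsed: unknown = JSON.parse(code);

      if (parsed && typeof parsed === 'object' && Array.isArray((parsed as { classes?: unknown }).classes)) {
        for (const cls of (parsed as { classes: unknown[] }).classes) {
          if (typeof cls === 'string') classes.add(cls);
        }
      }

      return classes;
    } catch (error) {
      // Log and skip this file instead of letting the exception propagate
      console.warn(`Extraction failed for ${id}:`, error);
      return new Set<string>();
    }
  },
};
```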

### Testing Extractors

```typescript
import type { Extractor, ExtractorContext } from '@unocss/core';

// Test utility for extractor development: runs one extractor and merges
// its returned tokens into the context set
async function testExtractor(extractor: Extractor, code: string, id?: string): Promise<Set<string>> {
  const extracted = new Set<string>();
  const context: ExtractorContext = {
    original: code,
    code,
    id,
    extracted,
    envMode: 'build',
  };

  const result = await extractor.extract?.(context);
  if (result) {
    for (const token of result)
      extracted.add(token);
  }
  return extracted;
}

// Usage (customExtractor is defined in the Basic Custom Extractor example)
const result = await testExtractor(customExtractor, '<div class="flex p-4">Test</div>');
console.log(result); // Set(2) { 'flex', 'p-4' }
```