# Content Extraction

The content extraction system in UnoCSS Core identifies utility classes from source code across different file types. It provides a pluggable architecture for custom extraction strategies and handles various content sources.

## Extractor Interface

```typescript { .api }
interface Extractor {
  name: string;
  order?: number;
  extract?: (ctx: ExtractorContext) => Awaitable<Set<string> | CountableSet<string> | string[] | undefined | void>;
}

interface ExtractorContext {
  readonly original: string;
  code: string;
  id?: string;
  extracted: Set<string> | CountableSet<string>;
  envMode?: 'dev' | 'build';
}
```

**Extractor properties:**
- **name**: Unique identifier for the extractor
- **order**: Processing order (extractors with lower numbers run first)
- **extract**: Function that identifies utility classes in the code and either returns them or adds them to `extracted`

**ExtractorContext properties:**
- **original**: Original, unmodified source code
- **code**: Current code (may have been modified by previous extractors)
- **id**: File identifier or path, if available
- **extracted**: Set that found utility classes are added to
- **envMode**: Current environment mode (`'dev'` or `'build'`)

## Default Extractor

```typescript { .api }
const extractorSplit: Extractor = {
  name: '@unocss/core/extractor-split',
  order: 0,
  extract({ code }) {
    return splitCode(code);
  }
};

const extractorDefault: Extractor = extractorSplit;

function splitCode(code: string): string[];
```

The default extractor splits source code by whitespace and common delimiters to identify potential utility classes.

### Split Patterns

```typescript { .api }
const defaultSplitRE: RegExp = /[\\:]?[\s'"`;{}]+/g;
const splitWithVariantGroupRE: RegExp = /([\\:]?[\s"'`;<>]|:\(|\)"|\)\s)/g;
```

- **defaultSplitRE**: Standard splitting pattern for most content
- **splitWithVariantGroupRE**: Enhanced pattern that handles variant groups like `hover:(text-red bg-blue)`
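
To make the splitting behaviour concrete, here is a minimal sketch that applies `defaultSplitRE` with a plain `String.prototype.split`. The helper name `splitCodeSketch` is hypothetical, and treating the split as a single `split` call is an assumption for illustration rather than a statement about the library's source.

```typescript
// Pattern copied from the listing above.
const defaultSplitRE = /[\\:]?[\s'"`;{}]+/g;

// Hypothetical helper: assumes splitting is a single split() on the pattern.
function splitCodeSketch(code: string): string[] {
  return code.split(defaultSplitRE);
}

// Markup is cut at whitespace and quotes, so attribute values fall apart
// into individual candidate tokens alongside non-utility fragments.
const tokens = splitCodeSketch('<div class="flex p-4">');
// tokens: ['<div', 'class=', 'flex', 'p-4', '>']

// A variant group is cut at the inner space, leaving two partial tokens.
const grouped = splitCodeSketch('hover:(text-red bg-blue)');
// grouped: ['hover:(text-red', 'bg-blue)']
```

Fragments such as `<div` or `class=` are still only candidates at this stage. The second call shows why a variant-group-aware pattern is listed separately: the default pattern cuts the group apart at the inner space.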

## Extractor Application

```typescript { .api }
// From UnoGenerator class
applyExtractors(
  code: string,
  id?: string,
  extracted?: Set<string>
): Promise<Set<string>>;
applyExtractors(
  code: string,
  id?: string,
  extracted?: CountableSet<string>
): Promise<CountableSet<string>>;
```

Applies all configured extractors to source code in order, accumulating results.
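
The "in order, accumulating results" behaviour can be sketched roughly as follows. This is a simplified illustration, not the generator's code: `applyExtractorsSketch` and the `Sketch*` types are hypothetical, the function is synchronous for brevity (the real method returns a promise), and treating a missing `order` as `0` is an assumption.

```typescript
// Simplified local stand-ins for the interfaces documented above.
interface SketchContext {
  readonly original: string;
  code: string;
  id?: string;
  extracted: Set<string>;
}

interface SketchExtractor {
  name: string;
  order?: number;
  extract?: (ctx: SketchContext) => Set<string> | string[] | undefined | void;
}

// Hypothetical helper: runs extractors by ascending `order` and merges results.
function applyExtractorsSketch(
  extractors: SketchExtractor[],
  code: string,
  id?: string,
  extracted: Set<string> = new Set<string>(),
): Set<string> {
  const context: SketchContext = { original: code, code, id, extracted };
  // Assumption: a missing `order` is treated as 0.
  const sorted = extractors.slice().sort((a, b) => (a.order ?? 0) - (b.order ?? 0));
  for (const extractor of sorted) {
    const result = extractor.extract?.(context);
    if (!result) continue;
    // Returned tokens are merged into the shared set.
    for (const token of result) extracted.add(token);
  }
  return extracted;
}

// Usage: the extractor with the lower `order` runs first, whatever the array order.
const runOrder: string[] = [];
const found = applyExtractorsSketch(
  [
    { name: 'late', order: 10, extract({ code }) { runOrder.push('late'); return code.match(/p-\d+/g) ?? []; } },
    { name: 'early', order: 0, extract({ code }) { runOrder.push('early'); return new Set(code.split(/\s+/)); } },
  ],
  'flex p-4',
);
```

Both extractors report `p-4`, but the shared set keeps a single entry for it.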

## Custom Extractors

### Basic Custom Extractor

```typescript
import type { Extractor } from '@unocss/core';

const customExtractor: Extractor = {
  name: 'my-custom-extractor',
  order: 10,
  extract({ code }) {
    const classes = new Set<string>();

    // Extract from double-quoted class attributes
    const classMatches = code.matchAll(/class="([^"]+)"/g);
    for (const match of classMatches) {
      // filter(Boolean) drops empty strings caused by leading or trailing whitespace
      const classNames = match[1].split(/\s+/).filter(Boolean);
      classNames.forEach(name => classes.add(name));
    }

    return classes;
  }
};
```

### File-Type Specific Extractor

```typescript
const vueExtractor: Extractor = {
  name: 'vue-extractor',
  extract({ code, id }) {
    // Only process .vue files
    if (!id?.endsWith('.vue')) return;

    const classes = new Set<string>();

    // Extract from the first <template> section
    const templateMatch = code.match(/<template[^>]*>([\s\S]*?)<\/template>/);
    if (templateMatch) {
      const template = templateMatch[1];
      // Matches both static `class` and bound `:class` attributes.
      // Bound expressions are split on whitespace as-is, so some tokens may be JS syntax.
      const classMatches = template.matchAll(/(?:class|:class)="([^"]+)"/g);
      for (const match of classMatches) {
        match[1].split(/\s+/).filter(Boolean).forEach(cls => classes.add(cls));
      }
    }

    return classes;
  }
};
```

### Regex-Based Extractor

```typescript
const regexExtractor: Extractor = {
  name: 'regex-extractor',
  extract({ code }) {
    const classes = new Set<string>();

    // Multiple regex patterns
    const patterns = [
      /className\s*=\s*["']([^"']+)["']/g,
      /class\s*=\s*["']([^"']+)["']/g,
      /tw`([^`]+)`/g, // Tagged template literals
    ];

    for (const pattern of patterns) {
      const matches = code.matchAll(pattern);
      for (const match of matches) {
        match[1].split(/\s+/).filter(Boolean).forEach(cls => classes.add(cls));
      }
    }

    return classes;
  }
};
```

## CountableSet for Frequency Tracking

```typescript { .api }
class CountableSet<K> extends Set<K> {
  getCount(key: K): number;
  setCount(key: K, count: number): this;
  add(key: K): this;
  delete(key: K): boolean;
  clear(): void;
}

function isCountableSet<T = string>(value: any): value is CountableSet<T>;
```

`CountableSet` tracks how many times each utility class appears, useful for usage analytics and optimization.
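
One way to picture the counting behaviour is a `Set` subclass that keeps a parallel `Map` of counts. The class below is a hypothetical stand-in written for illustration, not the library's source. In particular, it assumes that every `add` call increments the count, even for a key that is already present.

```typescript
// Hypothetical stand-in for CountableSet; behaviour is assumed, not copied.
class CountingSetSketch<K> extends Set<K> {
  private counts = new Map<K, number>();

  // Assumption: each add() bumps the count by one.
  add(key: K): this {
    this.counts.set(key, this.getCount(key) + 1);
    return super.add(key);
  }

  delete(key: K): boolean {
    this.counts.delete(key);
    return super.delete(key);
  }

  clear(): void {
    this.counts.clear();
    super.clear();
  }

  // Unknown keys report a count of 0.
  getCount(key: K): number {
    return this.counts.get(key) ?? 0;
  }

  // Overwrites the count and makes sure the key is a member of the set.
  setCount(key: K, count: number): this {
    this.counts.set(key, count);
    return super.add(key);
  }
}

// Usage: 'flex' is added twice, 'p-4' once.
const usage = new CountingSetSketch<string>();
for (const cls of 'flex p-4 flex'.split(' ')) usage.add(cls);
```

Membership stays deduplicated as in a plain `Set` (two entries here), while the count for `flex` reaches 2.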

### Using CountableSet

```typescript
const frequencyExtractor: Extractor = {
  name: 'frequency-extractor',
  extract({ code, extracted }) {
    // Counting only works when the generator passed in a CountableSet
    if (!isCountableSet(extracted)) return;

    const matches = code.matchAll(/class="([^"]+)"/g);
    for (const match of matches) {
      // filter(Boolean) avoids counting empty strings as classes
      const classes = match[1].split(/\s+/).filter(Boolean);
      classes.forEach(cls => {
        const current = extracted.getCount(cls);
        extracted.setCount(cls, current + 1);
      });
    }
  }
};
```

## Content Sources Configuration

```typescript { .api }
interface ContentOptions {
  filesystem?: string[];
  inline?: (string | { code: string, id?: string } | (() => Awaitable<string | { code: string, id?: string }>))[];
  pipeline?: false | {
    include?: FilterPattern;
    exclude?: FilterPattern;
  };
}

type FilterPattern = ReadonlyArray<string | RegExp> | string | RegExp | null;
```

**Content source types:**
- **filesystem**: Glob patterns for file system scanning
- **inline**: Inline code strings or functions returning code
- **pipeline**: Build tool integration filters

### Content Configuration Examples

```typescript
import { createGenerator } from '@unocss/core';

// Hypothetical helper that loads content at runtime; not part of UnoCSS.
declare function fetchDynamicContent(): Promise<string>;

const uno = await createGenerator({
  content: {
    // Scan specific file patterns
    filesystem: [
      'src/**/*.{js,ts,jsx,tsx}',
      'components/**/*.vue',
      'pages/**/*.html'
    ],

    // Include inline content
    inline: [
      'flex items-center justify-between',
      { code: '<div class="p-4 bg-white">Content</div>', id: 'inline-1' },
      () => fetchDynamicContent()
    ],

    // Pipeline filters for build tools
    pipeline: {
      include: [/\.(vue|jsx|tsx)$/],
      exclude: [/node_modules/, /\.test\./]
    }
  }
});
```
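
How `include` and `exclude` interact can be sketched with a small filter function. This is a hypothetical illustration that handles `RegExp` patterns only (string globs are out of scope). It assumes that `exclude` takes precedence over `include` and that an empty `include` list accepts every id. The names `createRegExpFilter` and `shouldScan` are not part of UnoCSS.

```typescript
// RegExp-only subset of FilterPattern, for this sketch.
type RegExpFilter = ReadonlyArray<RegExp> | RegExp | null | undefined;

function toArray(pattern: RegExpFilter): RegExp[] {
  if (pattern == null) return [];
  return pattern instanceof RegExp ? [pattern] : [...pattern];
}

// Hypothetical helper. Assumes exclude wins over include and that an
// empty include list accepts everything that is not excluded.
function createRegExpFilter(include?: RegExpFilter, exclude?: RegExpFilter) {
  const includes = toArray(include);
  const excludes = toArray(exclude);
  // Patterns should not carry the `g` flag: test() on a global regex is stateful.
  return (id: string): boolean => {
    if (excludes.some(re => re.test(id))) return false;
    if (includes.length === 0) return true;
    return includes.some(re => re.test(id));
  };
}

// Same patterns as the pipeline example above.
const shouldScan = createRegExpFilter(
  [/\.(vue|jsx|tsx)$/],
  [/node_modules/, /\.test\./],
);

shouldScan('src/App.vue');                // true: matches include
shouldScan('node_modules/pkg/index.tsx'); // false: excluded
shouldScan('src/App.test.tsx');           // false: excluded
shouldScan('src/main.ts');                // false: not in include
```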

## Advanced Extraction Patterns

### Template Literal Extraction

```typescript
const templateLiteralExtractor: Extractor = {
  name: 'template-literal',
  extract({ code }) {
    const classes = new Set<string>();

    // Extract from various template literal patterns
    const patterns = [
      /tw`([^`]+)`/g,                    // tw`class names`
      /css\s*`[^`]*@apply\s+([^;`]+)/g,  // CSS @apply statements
      /className=\{`([^`]+)`\}/g,        // React template literals
    ];

    for (const pattern of patterns) {
      const matches = code.matchAll(pattern);
      for (const match of matches) {
        // Skip empty tokens and tokens containing `${...}` interpolations
        const utilities = match[1]
          .split(/\s+/)
          .filter(cls => cls && !cls.includes('${'));
        utilities.forEach(cls => classes.add(cls));
      }
    }

    return classes;
  }
};
```

### Comment-Based Extraction

```typescript
const commentExtractor: Extractor = {
  name: 'comment-extractor',
  extract({ code }) {
    const classes = new Set<string>();

    // Extract from special comments such as /* @unocss: flex p-4 */
    const commentPattern = /\/\*\s*@unocss:\s*([^*]+)\s*\*\//g;
    const matches = code.matchAll(commentPattern);

    for (const match of matches) {
      const utilities = match[1].split(/\s+/).filter(Boolean);
      utilities.forEach(cls => classes.add(cls));
    }

    return classes;
  }
};
```

## Extractor Best Practices

### Performance Optimization

1. **Early Returns**: Return early for files the extractor does not handle, for example by checking `id`
2. **Compiled Regex**: Define regex patterns once, outside the `extract` function, instead of rebuilding them on every call
3. **Set Operations**: Collect results in a `Set` so duplicates are removed automatically
4. **Order Matters**: Set `order` deliberately; extractors with lower values run first
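
The practices above can be combined in one small extractor. The sketch below is illustrative only: the extractor name, the `.html` check, and the local `SketchCtx` type are assumptions made so the block is self-contained.

```typescript
// Minimal local context type so the sketch stands alone.
interface SketchCtx {
  code: string;
  id?: string;
}

// Compiled once at module load instead of on every extract() call.
const CLASS_ATTR_RE = /class="([^"]+)"/g;

const htmlOnlyExtractor = {
  name: 'html-only-sketch',
  order: 5, // deliberate position relative to other extractors
  extract({ code, id }: SketchCtx): Set<string> | undefined {
    // Early return: skip anything that is not an .html file.
    if (!id?.endsWith('.html')) return undefined;

    // A Set deduplicates repeated class names for free.
    const classes = new Set<string>();
    // matchAll iterates over a copy of the regex, so the shared
    // pattern's lastIndex is not mutated between calls.
    for (const match of code.matchAll(CLASS_ATTR_RE)) {
      for (const cls of match[1].split(/\s+/)) {
        if (cls) classes.add(cls);
      }
    }
    return classes;
  },
};

const fromHtml = htmlOnlyExtractor.extract({ code: '<p class="flex  p-4 flex">', id: 'a.html' });
const fromTs = htmlOnlyExtractor.extract({ code: '<p class="flex">', id: 'a.ts' });
```

Here `fromHtml` contains `flex` and `p-4` once each, and `fromTs` is `undefined` because the file type check exits before any regex work is done.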

### Error Handling

```typescript
// Hypothetical parser that may throw on malformed input; not part of UnoCSS.
declare function parseComplexSyntax(code: string): string[];

const robustExtractor: Extractor = {
  name: 'robust-extractor',
  extract({ code, id }) {
    try {
      // Extraction logic that may throw
      const parsed = parseComplexSyntax(code);
      return new Set<string>(parsed);
    } catch (error) {
      // Log and fall back to an empty result so one bad file does not abort extraction
      console.warn(`Extraction failed for ${id}:`, error);
      return new Set<string>();
    }
  }
};
```

### Testing Extractors

```typescript
// Test utility for extractor development
async function testExtractor(extractor: Extractor, code: string, id?: string) {
  const extracted = new Set<string>();
  const context: ExtractorContext = {
    original: code,
    code,
    id,
    extracted,
    envMode: 'build'
  };

  const result = await extractor.extract?.(context);
  return result || extracted;
}

// Usage
const result = await testExtractor(customExtractor, '<div class="flex p-4">Test</div>');
console.log(result); // Set { 'flex', 'p-4' }
```