# Content Extraction

PurgeCSS uses a flexible extractor system to analyze different file types and extract CSS selectors from content files.

## Capabilities

### Extractor Function Type

Function signature for content extractors, which analyze a file's content and return the selectors found in it.

```typescript { .api }
/**
 * Function type for extracting selectors from content
 * @param content - Content string to analyze
 * @returns Extracted selectors as detailed object or string array
 */
type ExtractorFunction<T = string> = (content: T) => ExtractorResult;
```

### Extractor Result Types

Union type representing the output of extractor functions.

```typescript { .api }
type ExtractorResult = ExtractorResultDetailed | string[];
```

### Detailed Extractor Result

Comprehensive result object with categorized selectors extracted from content.

```typescript { .api }
interface ExtractorResultDetailed {
  /** HTML/CSS attributes */
  attributes: {
    /** Attribute names (e.g., 'class', 'id', 'data-toggle') */
    names: string[];
    /** Attribute values */
    values: string[];
  };
  /** CSS class names */
  classes: string[];
  /** CSS ID selectors */
  ids: string[];
  /** HTML tag names */
  tags: string[];
  /** Selectors that couldn't be categorized */
  undetermined: string[];
}
```
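
As an illustration of the detailed result shape, the sketch below builds an `ExtractorResultDetailed` from an HTML string with regular expressions. The interface is redeclared locally so the snippet is self-contained, and `htmlDetailedExtractor` and `unique` are hypothetical names. Regex-based parsing is an assumption made for brevity (double-quoted attributes only); it is not how PurgeCSS itself parses HTML and it will miss many real-world cases.

```typescript
// Local copy of the documented shape so the sketch is self-contained.
interface ExtractorResultDetailed {
  attributes: { names: string[]; values: string[] };
  classes: string[];
  ids: string[];
  tags: string[];
  undetermined: string[];
}

// Hypothetical helper: drop duplicates while keeping first-seen order.
const unique = (items: string[]): string[] => Array.from(new Set(items));

// Hypothetical regex-based extractor (a sketch, not PurgeCSS's implementation).
function htmlDetailedExtractor(content: string): ExtractorResultDetailed {
  const tags: string[] = [];
  const names: string[] = [];
  const values: string[] = [];
  const classes: string[] = [];
  const ids: string[] = [];

  // Opening tags only: capture the tag name and the raw attribute text.
  const tagPattern = /<([A-Za-z][A-Za-z0-9-]*)([^>]*)>/g;
  let tagMatch: RegExpExecArray | null;
  while ((tagMatch = tagPattern.exec(content)) !== null) {
    tags.push(tagMatch[1].toLowerCase());

    // Fresh regex per tag so matching starts at the beginning of its attributes.
    const attrPattern = /([A-Za-z_:][A-Za-z0-9_:.-]*)\s*=\s*"([^"]*)"/g;
    let attrMatch: RegExpExecArray | null;
    while ((attrMatch = attrPattern.exec(tagMatch[2])) !== null) {
      const name = attrMatch[1].toLowerCase();
      const value = attrMatch[2];
      names.push(name);
      values.push(value);
      // class and id values are additionally split into individual tokens.
      const words = value.split(/\s+/).filter(Boolean);
      if (name === "class") classes.push(...words);
      if (name === "id") ids.push(...words);
    }
  }

  return {
    attributes: { names: unique(names), values: unique(values) },
    classes: unique(classes),
    ids: unique(ids),
    tags: unique(tags),
    undetermined: [],
  };
}

const detailed = htmlDetailedExtractor(
  '<div class="hero-banner wide" id="main" data-toggle="modal"></div>'
);
console.log(detailed.classes, detailed.ids, detailed.tags);
```

Returning the detailed form instead of a flat `string[]` keeps class names, IDs, and tags apart, so a token such as `main` extracted from an `id` attribute is not mistaken for a class or tag name.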

### Extractor Configuration

Configuration object linking file extensions to their corresponding extractor functions.

```typescript { .api }
interface Extractors {
  /** File extensions this extractor handles (e.g., ['.html', '.vue']) */
  extensions: string[];
  /** Extractor function for processing content */
  extractor: ExtractorFunction;
}
```

**Usage Example:**

```typescript
import { PurgeCSS, ExtractorFunction } from "purgecss";

// Custom extractor for Vue files: collect every class listed in class="..." attributes
const vueExtractor: ExtractorFunction = (content: string) => {
  const matches = content.match(/class="([^"]+)"/g) || [];
  // Split on any whitespace and drop empty tokens left by repeated spaces
  return matches.flatMap((match) =>
    match.replace(/class="([^"]+)"/, '$1').split(/\s+/).filter(Boolean)
  );
};

// Configure extractor
const extractors = [{
  extensions: ['.vue'],
  extractor: vueExtractor
}];

const results = await new PurgeCSS().purge({
  content: ['src/**/*.vue'],
  css: ['styles/*.css'],
  extractors
});
```
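
To illustrate how an `Extractors` entry could be matched to a file, the sketch below picks an extractor by file extension and falls back to a default when nothing matches. `pickExtractor`, `fallbackExtractor`, and the simplified local types are hypothetical and written for this example only; PurgeCSS's internal lookup may differ.

```typescript
// Simplified local types (assumption: only the string-array result form is used).
type ExtractorResult = string[];
type ExtractorFunction = (content: string) => ExtractorResult;
interface Extractors {
  extensions: string[];
  extractor: ExtractorFunction;
}

// Hypothetical fallback using the same pattern as the documented default extractor.
const fallbackExtractor: ExtractorFunction = (content) =>
  content.match(/[A-Za-z0-9_-]+/g) || [];

// Hypothetical lookup: the first entry whose extension list matches the filename wins.
function pickExtractor(
  filename: string,
  extractors: Extractors[],
  fallback: ExtractorFunction = fallbackExtractor
): ExtractorFunction {
  const entry = extractors.find((candidate) =>
    candidate.extensions.some((ext) => filename.endsWith(ext))
  );
  return entry ? entry.extractor : fallback;
}

const vueOnly: Extractors[] = [
  { extensions: [".vue"], extractor: () => ["from-vue"] },
];
const forVue = pickExtractor("src/App.vue", vueOnly);
const forHtml = pickExtractor("index.html", vueOnly);

console.log(forVue("<template></template>"));
console.log(forHtml('<p class="a-b">'));
```

Under this first-match rule, more specific entries would need to appear before more general ones in the array.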

## ExtractorResultSets Class

Management class for organizing and querying extracted selectors from content analysis.

### Constructor and Merging

```typescript { .api }
/**
 * ExtractorResultSets constructor
 * @param er - Initial extractor result to populate the sets
 */
constructor(er: ExtractorResult);

/**
 * Merge another extractor result or result set into this one
 * @param that - ExtractorResult or ExtractorResultSets to merge
 * @returns This instance for method chaining
 */
merge(that: ExtractorResult | ExtractorResultSets): this;
```

### Selector Query Methods

Methods for checking the presence of specific selector types.

```typescript { .api }
/**
 * Check if a CSS class name exists in the extracted selectors
 * @param name - Class name to check
 * @returns True if class is found
 */
hasClass(name: string): boolean;

/**
 * Check if a CSS ID selector exists in the extracted selectors
 * @param id - ID selector to check
 * @returns True if ID is found
 */
hasId(id: string): boolean;

/**
 * Check if an HTML tag name exists in the extracted selectors
 * @param tag - Tag name to check
 * @returns True if tag is found
 */
hasTag(tag: string): boolean;

/**
 * Check if an attribute name exists in the extracted selectors
 * @param name - Attribute name to check
 * @returns True if attribute name is found
 */
hasAttrName(name: string): boolean;

/**
 * Check if an attribute value exists in the extracted selectors
 * @param value - Attribute value to check
 * @returns True if attribute value is found
 */
hasAttrValue(value: string): boolean;
```

### Advanced Attribute Matching

Methods for matching attribute values by prefix, suffix, or substring.

```typescript { .api }
/**
 * Check if any attribute values start with the given prefix
 * @param prefix - Prefix to match against attribute values
 * @returns True if matching prefix is found
 */
hasAttrPrefix(prefix: string): boolean;

/**
 * Check if any attribute values end with the given suffix
 * @param suffix - Suffix to match against attribute values
 * @returns True if matching suffix is found
 */
hasAttrSuffix(suffix: string): boolean;

/**
 * Check if any attribute values contain the given substring
 * @param substr - Substring to match (supports space-separated words)
 * @returns True if matching substring is found
 */
hasAttrSubstr(substr: string): boolean;
```

**Usage Examples:**

```typescript
import { PurgeCSS, ExtractorResultSets } from "purgecss";

// The default extractor pattern, defined locally for this example
const defaultExtractor = (content: string) =>
  content.match(/[A-Za-z0-9_-]+/g) || [];
const extractors = [{ extensions: ['.html', '.js'], extractor: defaultExtractor }];

const purgeCSS = new PurgeCSS();

// Extract selectors from files
const selectors = await purgeCSS.extractSelectorsFromFiles(
  ['src/**/*.html', 'src/**/*.js'],
  extractors
);

// Query the extracted selectors
if (selectors.hasClass('btn-primary')) {
  console.log('btn-primary class found in content');
}

// hasAttrPrefix matches attribute values, not attribute names
if (selectors.hasAttrPrefix('btn-')) {
  console.log('An attribute value starting with "btn-" was found in content');
}

// Extract from raw content
const rawSelectors = await purgeCSS.extractSelectorsFromString([
  { raw: '<div class="hero-banner" id="main"></div>', extension: 'html' }
], extractors);

// Merge multiple result sets
const combinedSelectors = new ExtractorResultSets([])
  .merge(selectors)
  .merge(rawSelectors);
```
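
The prefix, suffix, and substring checks correspond naturally to the CSS attribute selector operators `^=`, `$=`, and `*=`. The sketch below shows one way such checks could be written over a plain array of attribute values. The function names and sample values are hypothetical, and this is not the library's code; in particular, reading "supports space-separated words" as "every word must appear in some value" is an assumption.

```typescript
// Hypothetical sample data standing in for extracted attribute values.
const attrValues: string[] = ["btn-primary", "modal-dialog", "col-md-6"];

// Counterpart of [attr^="prefix"]: some value starts with the prefix.
const hasPrefix = (values: string[], prefix: string): boolean =>
  values.some((value) => value.startsWith(prefix));

// Counterpart of [attr$="suffix"]: some value ends with the suffix.
const hasSuffix = (values: string[], suffix: string): boolean =>
  values.some((value) => value.endsWith(suffix));

// Counterpart of [attr*="substr"].
// Assumption: each whitespace-separated word must occur in at least one value.
const hasSubstr = (values: string[], substr: string): boolean =>
  substr
    .trim()
    .split(/\s+/)
    .every((word) => values.some((value) => value.includes(word)));

console.log(hasPrefix(attrValues, "btn-"));
console.log(hasSuffix(attrValues, "-dialog"));
console.log(hasSubstr(attrValues, "modal primary"));
console.log(hasSubstr(attrValues, "modal missing"));
```

With the sample values, the first three checks succeed and the last one fails because no value contains `missing`.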

## Utility Functions

### Merge Extractor Selectors

Utility function for merging multiple extractor results into a single set.

```typescript { .api }
/**
 * Merge multiple extractor selectors into a single result set
 * @param extractors - Variable number of ExtractorResultDetailed or ExtractorResultSets
 * @returns Merged ExtractorResultSets containing all selectors
 */
function mergeExtractorSelectors(
  ...extractors: (ExtractorResultDetailed | ExtractorResultSets)[]
): ExtractorResultSets;
```

**Usage Example:**

```typescript
import { PurgeCSS, mergeExtractorSelectors } from "purgecss";

const purgeCSS = new PurgeCSS();

// htmlExtractors and jsExtractors are Extractors[] arrays defined elsewhere
const htmlSelectors = await purgeCSS.extractSelectorsFromFiles(['*.html'], htmlExtractors);
const jsSelectors = await purgeCSS.extractSelectorsFromFiles(['*.js'], jsExtractors);

const combinedSelectors = mergeExtractorSelectors(htmlSelectors, jsSelectors);
```
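
Conceptually, merging amounts to taking the union of each selector category. The sketch below shows this for plain detailed results, using `Set` to drop duplicates. `mergeDetailed` and `union` are hypothetical helpers written for this example; they are not the library's `mergeExtractorSelectors`, which returns an `ExtractorResultSets` instance rather than a plain object.

```typescript
// Local copy of the documented shape so the sketch is self-contained.
interface ExtractorResultDetailed {
  attributes: { names: string[]; values: string[] };
  classes: string[];
  ids: string[];
  tags: string[];
  undetermined: string[];
}

// Hypothetical helper: concatenate the lists and drop duplicates.
const union = (...lists: string[][]): string[] =>
  Array.from(new Set(([] as string[]).concat(...lists)));

// Hypothetical merge: the union of every category across all results.
function mergeDetailed(
  ...results: ExtractorResultDetailed[]
): ExtractorResultDetailed {
  return {
    attributes: {
      names: union(...results.map((r) => r.attributes.names)),
      values: union(...results.map((r) => r.attributes.values)),
    },
    classes: union(...results.map((r) => r.classes)),
    ids: union(...results.map((r) => r.ids)),
    tags: union(...results.map((r) => r.tags)),
    undetermined: union(...results.map((r) => r.undetermined)),
  };
}

const fromHtml: ExtractorResultDetailed = {
  attributes: { names: ["class"], values: ["btn"] },
  classes: ["btn"],
  ids: ["main"],
  tags: ["div"],
  undetermined: [],
};
const fromJs: ExtractorResultDetailed = {
  attributes: { names: [], values: [] },
  classes: ["btn", "active"],
  ids: [],
  tags: [],
  undetermined: ["toggle"],
};

const merged = mergeDetailed(fromHtml, fromJs);
console.log(merged.classes, merged.undetermined);
```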

## Default Extractor

The default extractor is used when no specific extractor is configured for a file type.

**Default Implementation:**

```typescript
const defaultExtractor = (content: string): ExtractorResult =>
  content.match(/[A-Za-z0-9_-]+/g) || [];
```

The default extractor uses a simple regex that matches runs of ASCII letters, digits, underscores, and hyphens, which is suitable for basic CSS class names and IDs.
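
A short demonstration of what this pattern yields can help when deciding whether a custom extractor is needed. The sample markup below is made up for illustration, and the return type is narrowed to `string[]` so the snippet stands alone. Note how a class name containing characters outside the pattern, such as `:` or `/`, is split into fragments.

```typescript
// Same pattern as the default implementation, typed as string[] for this sketch.
const defaultExtractor = (content: string): string[] =>
  content.match(/[A-Za-z0-9_-]+/g) || [];

// Hypothetical sample markup with one plain class and one utility-style class.
const sample = '<div class="md:w-1/2 btn-primary"></div>';
const tokens = defaultExtractor(sample);

console.log(tokens);
// → ["div", "class", "md", "w-1", "2", "btn-primary", "div"]
```

Because `md:w-1/2` never appears as a single token, a CSS rule for that class would likely be treated as unused. Projects with such class names generally need a custom extractor with a broader pattern.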