# File Processing

Parse, highlight, query, and tag source files with Tree-sitter parsers. These commands apply Tree-sitter parsers to source files and support several output formats and processing options.

## Capabilities

### File Parsing

Parse source files and print the resulting syntax tree as an S-expression (the default), a pretty-printed CST, XML, or GraphViz dot. The `--json` flag reports the parsing results as JSON.

```bash { .api }
tree-sitter parse [options] <files...> # Alias: p
```

**Input Sources:**
- `<files...>`: One or more file paths to parse
- `--paths <file>`: File containing a list of paths to parse
- `--test-number <n>, -n`: Parse the contents of a specific test from the corpus
- Stdin: Read from standard input if no files are specified

**Output Formats:**
- `--xml, -x`: Output the parse tree in XML format
- `--cst, -c`: Output a pretty-printed CST (Concrete Syntax Tree)
- `--dot`: Output in GraphViz dot format for visualization
- `--json, -j`: Output parsing results in JSON format
- Default: S-expression syntax tree

**Processing Options:**
- `--scope <scope>`: Select the language by TextMate scope instead of file extension
- `--debug [type], -d`: Show the parsing debug log (normal, pretty, or quiet)
- `--debug-build, -0`: Use a debug-compiled parser
- `--debug-graph, -D`: Generate `log.html` with a debug visualization
- `--wasm`: Use WebAssembly parsers instead of native libraries
- `--stat, -s`: Show parsing statistics
- `--timeout <microseconds>`: Set a parsing timeout in microseconds
- `--time, -t`: Measure and display execution time
- `--quiet, -q`: Suppress main output
- `--encoding <encoding>`: Input file encoding (utf8, utf16le, utf16be)
- `--open-log`: Open `log.html` in a browser (requires `--debug-graph`)
- `--rebuild, -r`: Force a parser rebuild before parsing
- `--no-ranges`: Omit byte/point ranges in output

**Edit Simulation:**
- `--edits <edit>`: Apply edits in the format `"row,col|position delcount insert_text"`, where the first field is either a `row,col` pair or a byte offset (the `|` separates the two alternatives and is not typed)

**Example:**
```bash
# Parse single file
tree-sitter parse input.js

# Parse multiple files
tree-sitter parse src/*.js

# Parse with XML output
tree-sitter parse --xml input.js

# Parse from file list
tree-sitter parse --paths file-list.txt

# Parse with debug visualization
tree-sitter parse --debug-graph --open-log input.js

# Parse with statistics
tree-sitter parse --stat --time input.js

# Parse specific test case
tree-sitter parse --test-number 42

# Parse with edit simulation: delete 3 bytes at row 10, column 5, insert "new_text"
tree-sitter parse input.js --edits "10,5 3 new_text"

# Parse using specific language scope
tree-sitter parse --scope source.python input.py
```

### Syntax Highlighting

Apply syntax highlighting to source files using Tree-sitter highlight queries with customizable themes and output formats.

```bash { .api }
tree-sitter highlight [options] <files...> # Alias: hi
```

**Input Sources:**
- `<files...>`: One or more file paths to highlight
- `--paths <file>`: File containing list of paths to highlight
- `--test-number <n>, -n`: Highlight contents of specific test
- Stdin: Read from standard input if no files specified

**Output Formats:**
- `--html, -H`: Generate HTML output with embedded styles
- `--css-classes`: Use CSS classes instead of inline styles (with --html)
- Default: ANSI color codes for terminal display

**Highlighting Options:**
- `--scope <scope>`: Select language by TextMate scope
- `--captures-path <file>`: Path to custom captures file
- `--query-paths <files>`: Paths to custom highlight query files
- `--check`: Validate highlighting captures against standards
- `--time, -t`: Measure execution time
- `--quiet, -q`: Suppress main output
- `--config-path <path>`: Use alternative config.json file

**Example:**
```bash
# Highlight file with terminal colors
tree-sitter highlight input.js

# Generate HTML output
tree-sitter highlight --html input.js > output.html

# Use CSS classes for styling
tree-sitter highlight --html --css-classes input.js

# Highlight with custom queries
tree-sitter highlight --query-paths custom-highlights.scm input.js

# Validate highlight captures
tree-sitter highlight --check input.js

# Highlight specific test
tree-sitter highlight --test-number 5

# Use custom scope
tree-sitter highlight --scope source.tsx input.tsx
```

### Query Execution

Execute Tree-sitter queries against source files to find specific syntax patterns and extract structured information.

```bash { .api }
tree-sitter query [options] <query_file> <files...> # Alias: q
```

**Arguments:**
- `<query_file>`: Path to a file containing a Tree-sitter query
- `<files...>`: Source files to query against

**Input Sources:**
- `--paths <file>`: File containing a list of paths to query
- `--test-number <n>, -n`: Query the contents of a specific test
- Stdin: Read from standard input if no files are specified

**Query Options:**
- `--captures, -c`: Order results by captures instead of matches
- `--byte-range <start:end>`: Limit query execution to a byte range
- `--row-range <start:end>`: Limit query execution to a row range
- `--scope <scope>`: Select the language by TextMate scope
- `--test`: Run query validation tests
- `--time, -t`: Measure execution time
- `--quiet, -q`: Suppress main output
- `--config-path <path>`: Use an alternative config.json file

**Example:**
```bash
# Execute query on files
tree-sitter query functions.scm src/*.js

# Query with capture ordering
tree-sitter query --captures patterns.scm input.js

# Query specific byte range
tree-sitter query --byte-range 100:500 query.scm input.js

# Query specific row range
tree-sitter query --row-range 10:50 query.scm input.js

# Run query tests
tree-sitter query --test validation.scm

# Query from file list
tree-sitter query patterns.scm --paths file-list.txt

# Query specific test case
tree-sitter query patterns.scm --test-number 3
```
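
The query file itself holds S-expression patterns with `@capture` names. As a minimal sketch, the snippet below writes a hypothetical `functions.scm`. The node and field names (`function_declaration`, `name`, `identifier`) assume the tree-sitter-javascript grammar, and the capture name is only illustrative; other grammars use different node names.

```bash
# Write a small query file (file name and capture name are illustrative).
# Assumes tree-sitter-javascript node names; adjust for other grammars.
cat > functions.scm <<'EOF'
; Capture the name of every function declaration
(function_declaration
  name: (identifier) @function.name)
EOF

# Run the query only when the CLI and an input file are available.
if command -v tree-sitter >/dev/null 2>&1 && [ -f input.js ]; then
  tree-sitter query functions.scm input.js
fi
```

Adding `--captures` to the last command would list results per capture rather than per match.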

### Tag Generation

Generate ctags-style tags from source files for code navigation and symbol indexing.

```bash { .api }
tree-sitter tags [options] <files...>
```

**Input Sources:**
- `<files...>`: One or more file paths to process
- `--paths <file>`: File containing a list of paths to process
- `--test-number <n>, -n`: Generate tags from a specific test
- Stdin: Read from standard input if no files are specified

**Tag Options:**
- `--scope <scope>`: Select the language by TextMate scope
- `--time, -t`: Measure execution time
- `--quiet, -q`: Suppress main output
- `--config-path <path>`: Use an alternative config.json file

**Output Format:**
ctags-style output with symbol information:
- Symbol name
- File location
- Line number
- Symbol type (function, class, variable, etc.)
- Additional metadata

**Example:**
```bash
# Generate tags for files
tree-sitter tags src/*.js

# Generate from file list
tree-sitter tags --paths file-list.txt

# Generate with timing information
tree-sitter tags --time src/*.js

# Generate from specific test
tree-sitter tags --test-number 10

# Use specific language scope
tree-sitter tags --scope source.python *.py
```

## Advanced Processing Features

### Multi-language Support

Process files written in different languages within the same command:

```bash
# Parse files with automatic language detection
tree-sitter parse src/main.py lib/utils.js config.json

# Override language detection with scope
tree-sitter highlight --scope source.tsx component.js
```

### Batch Processing

Process large numbers of files in a single invocation:

```bash
# Use a file list for batch processing
find src/ -name "*.js" > js-files.txt
tree-sitter parse --paths js-files.txt

# Use a recursive glob (requires shell support for **, such as zsh or bash with globstar)
tree-sitter parse src/**/*.js
```
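
The file-list approach can be wrapped in a small helper. This is a sketch rather than part of the CLI: `build_path_list` is a hypothetical function name, and it assumes `--paths` expects one path per line.

```bash
# build_path_list DIR EXT OUT: write every file under DIR ending in .EXT
# to OUT, one path per line, sorted for reproducible runs.
# (Hypothetical helper, not part of tree-sitter.)
build_path_list() {
  find "$1" -type f -name "*.$2" | sort > "$3"
}

# Example: collect JavaScript files, then hand the list to the parser.
# build_path_list src js js-files.txt
# tree-sitter parse --paths js-files.txt --quiet --stat
```

Sorting the list keeps the processing order stable between runs, which makes timing and statistics output easier to compare.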

### Input Encoding Support

Handle files with different character encodings:

```bash
# Parse UTF-16 little-endian files
tree-sitter parse --encoding utf16le input.txt

# Parse UTF-16 big-endian files
tree-sitter parse --encoding utf16be input.txt
```

### Error Recovery and Debugging

Diagnose parsing issues and errors:

```bash
# Enable debug output for troubleshooting
tree-sitter parse --debug input.js

# Generate visual debugging information
tree-sitter parse --debug-graph --open-log problematic.js

# Stop parsing after 5 seconds (the timeout is given in microseconds)
tree-sitter parse --timeout 5000000 input.js

# Show detailed statistics
tree-sitter parse --stat --time input.js
```
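
Because `--timeout` takes microseconds, large literal values are easy to mistype. A minimal sketch of a conversion helper follows; `seconds_to_micros` is a hypothetical name and handles whole seconds only.

```bash
# seconds_to_micros N: convert whole seconds to the microsecond value
# that --timeout expects. (Hypothetical helper, not part of tree-sitter.)
seconds_to_micros() {
  echo $(( $1 * 1000000 ))
}

# Example: give each parse at most 5 seconds.
# tree-sitter parse --timeout "$(seconds_to_micros 5)" input.js
```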

### Edit Simulation

Test incremental parsing by simulating edits:

```bash
# Simulate a single edit: delete 3 bytes at byte offset 50, insert "new_text"
tree-sitter parse input.js --edits "50 3 new_text"

# Simulate multiple edits, addressing the second one by row and column
tree-sitter parse input.js --edits "0 0 prefix" --edits "5,10 2 fix"
```

**Edit Format:** `"row,col|position delcount insert_text"`, where the first field is either `row,col` or `position` (the `|` is not typed)
- `row,col`: Row and column of the edit
- `position`: Byte offset of the edit, as an alternative to `row,col`
- `delcount`: Number of bytes to delete
- `insert_text`: Text to insert

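
The two ways of addressing an edit, `row,col` and a byte offset, name the same location. The sketch below shows how one maps to the other. `byte_offset` is a hypothetical helper, and it assumes zero-based rows and columns, `\n` line endings, and columns counted in bytes.

```bash
# byte_offset FILE ROW COL: print the byte offset of zero-based ROW,COL in FILE.
# Assumes "\n" line endings and COL counted in bytes.
# (Hypothetical helper, not part of tree-sitter.)
byte_offset() {
  LC_ALL=C awk -v row="$2" -v col="$3" '
    NR <= row { offset += length($0) + 1 }  # full lines before ROW, plus newline
    END       { print offset + col }
  ' "$1"
}

# Example: find the byte offset that corresponds to row 5, column 10.
# byte_offset input.js 5 10
```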
## Output Processing

### JSON Output Structure

With the `--json` flag, the parse command outputs structured data:

```json
{
  "parse_summaries": [
    {
      "file": "input.js",
      "successful": true,
      "start": {"row": 0, "column": 0},
      "end": {"row": 10, "column": 5},
      "duration": 1500,
      "bytes": 1024
    }
  ],
  "cumulative_stats": {
    "successful_parses": 1,
    "total_parses": 1,
    "total_bytes": 1024,
    "total_duration": 1500
  }
}
```
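
A report in this shape can be checked from a script. The following is a crude, text-based sketch that relies on the `"successful"` key appearing as shown above; `count_failed` is a hypothetical helper, and a JSON-aware tool would be more robust.

```bash
# count_failed REPORT: count summaries whose "successful" field is false.
# Text-based sketch that depends on the key/value layout of the report.
# (Hypothetical helper, not part of tree-sitter.)
count_failed() {
  grep -o '"successful": *false' "$1" | wc -l | tr -d ' '
}

# Example: save a report, then count the files that failed to parse.
# tree-sitter parse --json src/*.js > report.json
# count_failed report.json
```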

### Visualization Formats

#### GraphViz Dot Output

Generate graph visualizations of parse trees:

```bash
tree-sitter parse --dot input.js | dot -Tpng -o tree.png
```

#### XML Output

Structured XML representation for integration with XML tools:

```bash
tree-sitter parse --xml input.js | xmllint --format -
```

### Performance Monitoring

Track parsing performance across files:

```bash
# Monitor parsing speed
tree-sitter parse --stat --time src/*.js

# Generate performance reports
tree-sitter parse --json --stat src/*.js > perf-report.json
```