# Core Scanning Engine

The core scanning engine provides the main functionality for running semgrep analysis on codebases. It handles target discovery, rule execution, baseline comparison, and result aggregation.

## Core Imports

```python
from pathlib import Path
from typing import List, Optional, Sequence, FrozenSet, Union, Any, Dict

from semgrep import semgrep_output_v1 as out  # referenced as `out.` in the baseline_run signature
from semgrep.run_scan import run_scan, run_scan_and_return_json, baseline_run
from semgrep.output import OutputHandler, OutputSettings
from semgrep.target_manager import TargetManager
from semgrep.rule_match import RuleMatchMap
from semgrep.baseline import BaselineHandler
from semgrep.semgrep_types import EngineType
```

## Capabilities

### Main Scanning Functions

Primary functions for executing semgrep scans with full configuration support.

```python { .api }
def run_scan(
    *,
    dump_command_for_core: bool = False,
    time_flag: bool = False,
    matching_explanations: bool = False,
    engine_type: EngineType = EngineType.OSS,
    run_secrets: bool = False,
    output_handler: OutputHandler,
    scanning_roots: Sequence[str],
    **kwargs
):
    """
    Execute a semgrep scan with comprehensive configuration options.

    Key Parameters (keyword-only):
    - output_handler (OutputHandler): Required handler for output formatting
    - scanning_roots (Sequence[str]): Required sequence of paths to scan
    - dump_command_for_core (bool): Debug flag for core command dumping
    - time_flag (bool): Enable timing information
    - matching_explanations (bool): Include pattern matching explanations
    - engine_type (EngineType): Scan engine type (OSS by default)
    - run_secrets (bool): Enable secrets scanning
    - **kwargs: Many additional configuration parameters

    Returns:
        Complex tuple with scan results, errors, statistics, and metadata

    Note: This function has 50+ parameters. See source code for complete signature.
    """


def run_scan_and_return_json(
    *,
    config: Path,
    scanning_roots: List[Path],
    output_settings: Optional[OutputSettings] = None,
    **kwargs: Any
) -> Union[Dict[str, Any], str]:
    """
    Execute a semgrep scan and return results as JSON.

    Parameters (keyword-only):
    - config (Path): Path to configuration file
    - scanning_roots (List[Path]): List of paths to scan
    - output_settings (OutputSettings, optional): Output formatting configuration
    - **kwargs: Additional scan parameters passed to run_scan

    Returns:
        Union[Dict[str, Any], str]: Scan results in JSON format or JSON string
    """
```
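
Because `run_scan_and_return_json` is documented as returning either a dictionary or a JSON string, callers may want to normalize the result before using it. The following is a minimal sketch, not part of semgrep: the helper names `normalize_scan_output` and `scan_to_dict` are hypothetical, and only the keyword arguments shown in the signature above are passed.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def normalize_scan_output(result: Union[Dict[str, Any], str]) -> Dict[str, Any]:
    """Return the scan output as a dict, parsing it first if it is a JSON string.

    Hypothetical helper: it only relies on the documented return type.
    """
    if isinstance(result, str):
        return json.loads(result)
    return result


def scan_to_dict(config: Path, roots: List[Path]) -> Dict[str, Any]:
    """Run a scan and always hand back a dict (hypothetical wrapper)."""
    # Imported lazily so this module can be loaded where semgrep is not installed.
    from semgrep.run_scan import run_scan_and_return_json

    raw = run_scan_and_return_json(config=config, scanning_roots=roots)
    return normalize_scan_output(raw)
```

Keeping the normalization in its own function makes it testable without running a scan.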

### Baseline Scanning

Functions for comparing current scan results against a baseline to identify new findings.

```python { .api }
def baseline_run(
    baseline_handler: BaselineHandler,
    baseline_commit: Optional[str],
    rule_matches_by_rule: RuleMatchMap,
    all_subprojects: List[Union[out.UnresolvedSubproject, out.ResolvedSubproject]],
    scanning_root_strings: FrozenSet[Path],
    **kwargs
):
    """
    Execute a baseline scan to compare against previous results.

    Parameters:
    - baseline_handler (BaselineHandler): Handler for baseline comparison logic
    - baseline_commit (Optional[str]): Git commit hash for baseline comparison
    - rule_matches_by_rule (RuleMatchMap): Current scan results by rule
    - all_subprojects (List): List of project and subproject configurations
    - scanning_root_strings (FrozenSet[Path]): Set of scanning root paths
    - **kwargs: Additional scan parameters

    Returns:
        Baseline comparison results and metadata
    """
```
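
The idea behind baseline comparison, keeping only findings that are absent from the baseline, can be illustrated with plain data structures. This is a conceptual sketch only and does not reflect how semgrep implements `baseline_run`: the `Finding` class, its `fingerprint` field, and `new_findings` are hypothetical stand-ins for the real match types.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for a rule match."""
    rule_id: str
    path: str
    fingerprint: str  # assumed stable identifier for the matched code


def new_findings(
    current: Dict[str, List[Finding]],
    baseline: Dict[str, List[Finding]],
) -> Dict[str, List[Finding]]:
    """Keep, per rule, only the current findings that the baseline lacks."""
    result: Dict[str, List[Finding]] = {}
    for rule_id, matches in current.items():
        seen = set(baseline.get(rule_id, []))
        fresh = [m for m in matches if m not in seen]
        if fresh:  # rules with no new findings are dropped entirely
            result[rule_id] = fresh
    return result
```

Grouping by rule mirrors the "results by rule" shape described for `rule_matches_by_rule`, though the real `RuleMatchMap` type may differ.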

### Dependency Analysis

Functions for dependency-aware rule filtering and dependency resolution.

```python { .api }
def filter_dependency_aware_rules(rules, dependencies):
    """
    Filter rules based on project dependencies.

    Parameters:
    - rules (list): List of Rule objects to filter
    - dependencies (dict): Project dependency information

    Returns:
        list: Filtered rules applicable to the dependencies
    """


def resolve_dependencies(target_manager, config):
    """
    Resolve project dependencies for dependency-aware analysis.

    Parameters:
    - target_manager (TargetManager): Target file manager
    - config (Config): Scan configuration

    Returns:
        dict: Resolved dependency information
    """
```
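
To make the filtering step concrete, here is a small sketch of dependency-aware filtering over simplified data. It is an illustration under stated assumptions, not semgrep's logic: `SketchRule`, its `required_package` field, the name-to-version shape of `dependencies`, and `filter_rules_by_dependencies` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SketchRule:
    """Hypothetical simplified rule, not semgrep's Rule class."""
    rule_id: str
    required_package: Optional[str]  # None means the rule always applies


def filter_rules_by_dependencies(
    rules: List[SketchRule],
    dependencies: Dict[str, str],  # assumed shape: package name -> version
) -> List[SketchRule]:
    """Keep rules with no package requirement or whose package is present."""
    kept: List[SketchRule] = []
    for rule in rules:
        package = rule.required_package
        if package is None or package in dependencies:
            kept.append(rule)
    return kept
```

A real implementation would likely also compare version ranges; this sketch only checks for the presence of a package name.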

### Utility Functions

Helper functions for scan optimization and environment setup.

```python { .api }
def adjust_python_recursion_limit(new_limit=None):
    """
    Adjust Python recursion limit for deep scanning operations.

    Parameters:
    - new_limit (int, optional): New recursion limit, defaults to calculated value

    Returns:
        int: Previous recursion limit
    """
```
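
Returning the previous limit lets a caller restore it once the deep operation is done. The same pattern can be sketched with the standard library alone; `raised_recursion_limit` below is a hypothetical context manager built on `sys.getrecursionlimit` and `sys.setrecursionlimit`, not a semgrep function.

```python
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def raised_recursion_limit(new_limit: int) -> Iterator[int]:
    """Temporarily raise the recursion limit and restore it on exit.

    Yields the previous limit. The limit is never lowered below its
    current value, since lowering it mid-call-stack can raise RecursionError.
    """
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(max(previous, new_limit))
    try:
        yield previous
    finally:
        sys.setrecursionlimit(previous)
```

Used as `with raised_recursion_limit(10_000): ...`, the original limit is restored even if the body raises.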

## Types

```python { .api }
# Import required types
from semgrep.output import OutputHandler, OutputSettings
from semgrep.rule_match import RuleMatchMap
from semgrep.baseline import BaselineHandler
from semgrep.semgrep_types import EngineType
from semgrep import semgrep_output_v1 as out
from pathlib import Path  # Path lives in pathlib, not typing
from typing import Tuple, List, Dict, Set, Any, Union, Optional

# Return types for run_scan are complex tuples containing:
# - FilteredMatches: Processed rule matches
# - List[SemgrepError]: Any errors encountered
# - Set[Path]: Files that were processed
# - FileTargetingLog: File targeting information
# - List[Rule]: Rules that were executed
# - ProfileManager: Performance profiling data
# - OutputExtra: Additional output metadata
# - Collection[out.MatchSeverity]: Severity information
# - Dict with dependency information
# - Various counts and subproject information
```