# Semgrep

Semgrep is a fast, open-source static analysis tool that searches code, finds bugs, and enforces secure guardrails and coding standards across 30+ programming languages. It provides semantic code search capabilities that go beyond simple string matching, allowing developers to write rules that look like the code they want to find.

## Package Information

- **Package Name**: semgrep
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install semgrep`

## Core Imports

```python
import semgrep
```

For programmatic scanning:

```python
from semgrep.run_scan import run_scan, run_scan_and_return_json
from semgrep.config_resolver import get_config, Config
```

For result processing:

```python
from semgrep.rule_match import RuleMatch, RuleMatches
from semgrep.rule import Rule
```

## Basic Usage

```python
import sys
from pathlib import Path

from semgrep.config_resolver import get_config
from semgrep.output import OutputSettings
from semgrep.run_scan import run_scan_and_return_json

# Configure a scan with rules
config, errors = get_config(
    pattern=None,  # Use specific rules instead of a pattern
    lang=None,  # Detect languages automatically
    config_strs=["p/security-audit"],  # Use a predefined ruleset
    project_url=None,
    no_rewrite_rule_ids=False,
    replacement=None,
)

if errors:
    print(f"Configuration errors: {errors}")
    sys.exit(1)

# The scan API takes a path to a YAML config file, so the file must exist
# before the scan runs.
# Note: In practice, you would need to serialize the config to YAML
config_path = Path("/tmp/semgrep_config.yml")

# Run scan and get JSON results
results = run_scan_and_return_json(
    config=config_path,
    scanning_roots=[Path(".")],  # Scan current directory
    output_settings=OutputSettings(),
)

# Process results
if isinstance(results, dict):
    for rule_match in results.get('results', []):
        print(f"Rule: {rule_match['check_id']}")
        print(f"File: {rule_match['path']}")
        # In the JSON output, message and severity are nested under 'extra'
        print(f"Message: {rule_match['extra']['message']}")
        print(f"Severity: {rule_match['extra']['severity']}")
```
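
Once the results are a plain dictionary, the standard library is enough to summarize them. The following is a minimal sketch, assuming only the result shape read above (`results`, `check_id`, `path`, `extra.severity`). The helper `summarize_by_severity` and the sample data are hypothetical and written for illustration; neither is part of semgrep.

```python
from collections import Counter
from typing import Any, Dict


def summarize_by_severity(results: Dict[str, Any]) -> Dict[str, int]:
    """Count findings per severity in a semgrep-style JSON results dict."""
    counts: Counter = Counter()
    for finding in results.get("results", []):
        # Fall back to "UNKNOWN" when a finding carries no severity.
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        counts[severity] += 1
    return dict(counts)


# Hand-written sample shaped like the fields read above (not real scan output).
sample = {
    "results": [
        {"check_id": "example.rule-a", "path": "app.py", "extra": {"severity": "ERROR"}},
        {"check_id": "example.rule-b", "path": "util.py", "extra": {"severity": "WARNING"}},
        {"check_id": "example.rule-a", "path": "cli.py", "extra": {"severity": "ERROR"}},
    ]
}

print(summarize_by_severity(sample))
```

A summary like this could be used, for example, to fail a build only when findings of a chosen severity are present.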

## Architecture

Semgrep's architecture consists of several key components that work together to provide comprehensive static analysis:

- **Core Engine**: The `semgrep-core` binary (written in OCaml) handles pattern matching and semantic analysis
- **Python CLI**: Provides user interface, configuration management, and result processing
- **Rule System**: YAML-based rules that define patterns to search for in code
- **Target Management**: File discovery, filtering, and processing pipeline
- **Output Formatters**: Multiple output formats for different tools and workflows
- **CI/CD Integration**: Built-in support for various continuous integration platforms

This design allows semgrep to efficiently analyze large codebases while providing flexible configuration and integration options for different development workflows.
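
Because the Python CLI fronts the other components, a scan can also be driven from a script by invoking the `semgrep` executable. The sketch below assumes `semgrep` is installed and on `PATH`, and uses only the `scan` subcommand with the `--config` and `--json` flags. `build_scan_command` and `run_cli_scan` are hypothetical helper names chosen for this example.

```python
import subprocess
from typing import List


def build_scan_command(config: str, target: str = ".") -> List[str]:
    """Build the argv for a JSON-output scan (hypothetical helper)."""
    return ["semgrep", "scan", "--config", config, "--json", target]


def run_cli_scan(config: str, target: str = ".") -> str:
    """Run the scan and return the raw text written to stdout."""
    completed = subprocess.run(
        build_scan_command(config, target),
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit status
    )
    return completed.stdout


# Building the command does not require semgrep to be installed.
print(build_scan_command("p/security-audit"))
```

Keeping command construction separate from execution makes the argv easy to inspect or log before anything is run.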

## Capabilities

### Core Scanning Engine

Main scanning functionality for running semgrep analysis on codebases, including baseline scanning, dependency-aware analysis, and result processing.

```python { .api }
def run_scan(target_manager, config, **kwargs): ...
def run_scan_and_return_json(*, config, scanning_roots, output_settings=None, **kwargs): ...
def baseline_run(baseline_handler, **kwargs): ...
```

[Core Scanning](./core-scanning.md)

### Configuration Management

Tools for loading, validating, and managing semgrep configurations from various sources including local files, registries, and cloud platforms.

```python { .api }
class Config: ...
class ConfigLoader: ...
def get_config(pattern, lang, config_strs, **kwargs): ...
def resolve_config(config_strings): ...
```

[Configuration](./configuration.md)

### Rule and Match Processing

Classes and functions for working with semgrep rules and processing scan results, including rule validation and match filtering.

```python { .api }
class Rule: ...
class RuleMatch: ...
class RuleMatches: ...
def validate_single_rule(rule_dict): ...
```

[Rules and Matches](./rules-matches.md)
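
For orientation, a rule is a YAML document with a top-level `rules` list whose entries carry an `id`, a pattern, a `message`, the target `languages`, and a `severity`. The sketch below writes one such rule to a file that could serve as a config path for a scan. The rule `no-eval` is a made-up example, and `write_rule_file` is a hypothetical helper, not a semgrep function.

```python
import tempfile
from pathlib import Path

# A minimal rule in semgrep's YAML rule format; the rule itself is a made-up example.
RULE_YAML = """\
rules:
  - id: no-eval
    pattern: eval(...)
    message: Avoid eval(); it executes arbitrary code.
    languages: [python]
    severity: WARNING
"""


def write_rule_file(directory: Path, text: str = RULE_YAML) -> Path:
    """Write the rule text to rules.yml inside `directory` and return its path."""
    rule_path = directory / "rules.yml"
    rule_path.write_text(text, encoding="utf-8")
    return rule_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_rule_file(Path(tmp))
    print(path.name)
    print(path.read_text(encoding="utf-8"))
```

The `...` inside `eval(...)` is semgrep's ellipsis operator, which matches any arguments, so the pattern reads like the code it is meant to find.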

### Output and Formatting

Comprehensive output formatting system supporting multiple formats including JSON, SARIF, text, and XML for integration with various tools and workflows.

```python { .api }
class OutputHandler: ...
class OutputSettings: ...
class JsonFormatter: ...
class SarifFormatter: ...
```

[Output Formatting](./output-formatting.md)
138
139
### Error Handling System
140
141
Exception hierarchy and error management functions for handling various types of errors that can occur during scanning and rule processing.
142
143
```python { .api }
144
class SemgrepError(Exception): ...
145
class SemgrepCoreError(SemgrepError): ...
146
class InvalidRuleSchemaError(SemgrepError): ...
147
def select_real_errors(errors): ...
148
```
149
150
[Error Handling](./error-handling.md)
151
152
### CI/CD Integration
153
154
Classes and utilities for integrating semgrep into various continuous integration and deployment platforms with automatic metadata detection.
155
156
```python { .api }
157
class GitMeta: ...
158
class GithubMeta(GitMeta): ...
159
class GitlabMeta(GitMeta): ...
160
class CircleCIMeta(GitMeta): ...
161
```
162
163
[CI/CD Integration](./cicd-integration.md)
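
To illustrate what automatic metadata detection can build on, the sketch below identifies the CI platform from the environment variables that GitHub Actions, GitLab CI, and CircleCI set in their jobs (`GITHUB_ACTIONS`, `GITLAB_CI`, `CIRCLECI`). This is an assumption-laden illustration of the idea only: `detect_ci_provider` and `CI_MARKERS` are hypothetical names, and the code does not reflect how the `GitMeta` classes are implemented.

```python
import os
from typing import Mapping, Optional

# Environment variables each platform sets to "true" inside its CI jobs.
CI_MARKERS = {
    "GITHUB_ACTIONS": "github",
    "GITLAB_CI": "gitlab",
    "CIRCLECI": "circleci",
}


def detect_ci_provider(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a provider name based on CI marker variables, or None."""
    env = os.environ if env is None else env
    for variable, provider in CI_MARKERS.items():
        if env.get(variable, "").lower() == "true":
            return provider
    return None


# Pass an explicit mapping so the example does not depend on the real environment.
print(detect_ci_provider({"GITLAB_CI": "true"}))
```

Accepting the environment as a parameter keeps the function easy to test outside a CI job.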
164
165
### Target Management
166
167
File discovery, filtering, and processing system for managing scan targets with support for language detection and exclusion patterns.
168
169
```python { .api }
170
class TargetManager: ...
171
class ScanningRoot: ...
172
class Target: ...
173
class FilteredFiles: ...
174
```
175
176
[Target Management](./target-management.md)
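
The general idea of discovery plus exclusion can be sketched with the standard library alone. The example below is a simplified stand-in, assuming suffix-based language selection and glob-style exclusion patterns; `discover_targets` is a hypothetical function and does not use or mirror the `TargetManager` API.

```python
import fnmatch
import tempfile
from pathlib import Path
from typing import Iterable, List


def discover_targets(root: Path, suffixes: Iterable[str], excludes: Iterable[str]) -> List[Path]:
    """Return files under `root` with a wanted suffix, skipping excluded paths."""
    wanted = set(suffixes)
    patterns = list(excludes)
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue
        # Match exclusion globs against the path relative to the scan root.
        relative = path.relative_to(root).as_posix()
        if any(fnmatch.fnmatch(relative, pattern) for pattern in patterns):
            continue
        found.append(path)
    return found


# Build a tiny throwaway tree to demonstrate the filtering.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("src/app.py", "tests/test_app.py", "README.md"):
        file_path = root / name
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text("", encoding="utf-8")
    targets = discover_targets(root, suffixes=[".py"], excludes=["tests/*"])
    print([t.relative_to(root).as_posix() for t in targets])
```

Note that `fnmatch` lets `*` match across `/`, so `tests/*` also excludes files in nested directories under `tests`.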