# Data Extraction

Functions for extracting package metadata from three kinds of sources: project directories, distribution files, and PyPI packages. These modules handle the complexities of parsing the various packaging formats, build systems, and configuration files.

## Capabilities

### Project Directory Analysis

**pyroma.projectdata Module**

Functions for extracting metadata from local project directories using modern Python build systems.

**METADATA_MAP**

```python { .api }
METADATA_MAP: dict
"""Mapping of metadata field names between different packaging systems.

Maps standard setuptools/wheel metadata field names to their pyroma equivalents.
Used internally to normalize metadata from different sources like setup.cfg,
pyproject.toml, and wheel metadata.

Example mappings:
- 'summary' -> 'description'
- 'classifier' -> 'classifiers'
- 'home_page' -> 'url'
- 'requires_python' -> 'python_requires'
"""
```

**get_data(path)**

```python { .api }
def get_data(path):
    """Extract package metadata from a project directory.

    Args:
        path: Absolute path to project directory

    Returns:
        dict: Package metadata including name, version, description,
            classifiers, dependencies, and build system information

    Raises:
        build.BuildException: If project cannot be built or analyzed
    """
```

Primary function for project analysis. It:

- Uses the project's Python build system to extract metadata
- Supports pyproject.toml, setup.cfg, and setup.py configurations
- Handles both isolated and non-isolated build environments
- Automatically falls back through different extraction methods
- Adds internal flags for missing configuration files

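The fallback behavior listed above can be pictured with a minimal sketch. It is not pyroma's implementation: the extractor callables, the `ExtractionError` exception, and the `_missing_pyproject_toml` flag name are hypothetical stand-ins chosen for illustration. Only the general shape (try one method, fall back to the next, flag a missing configuration file) comes from the description above.

```python
import tempfile
from pathlib import Path


class ExtractionError(Exception):
    """Hypothetical error raised when one extraction method cannot handle a project."""


def extract_with_fallback(path, extractors):
    """Try each (name, callable) extractor in order and return the first result."""
    errors = []
    for name, extractor in extractors:
        try:
            data = dict(extractor(path))
        except ExtractionError as exc:
            errors.append(f"{name}: {exc}")
            continue
        # Hypothetical flag name; records that a configuration file is absent.
        if not (Path(path) / "pyproject.toml").exists():
            data["_missing_pyproject_toml"] = True
        return data
    raise ExtractionError("all extraction methods failed: " + "; ".join(errors))


def failing_build_backend(path):
    # Stand-in for a build-system based extractor that cannot handle the project.
    raise ExtractionError("no build backend configured")


def legacy_extractor(path):
    # Stand-in for an older extraction method that succeeds.
    return {"name": "demo", "version": "1.0"}


with tempfile.TemporaryDirectory() as project_dir:
    result = extract_with_fallback(
        project_dir,
        [("build", failing_build_backend), ("legacy", legacy_extractor)],
    )
```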
**build_metadata(path, isolated=None)**

```python { .api }
def build_metadata(path, isolated=None):
    """Build wheel metadata using the project's build system.

    Args:
        path: Project directory path
        isolated: Whether to use build isolation (None for auto-detection)

    Returns:
        Metadata object containing package information

    Raises:
        build.BuildBackendException: If build system fails
    """
```

**map_metadata_keys(metadata)**

```python { .api }
def map_metadata_keys(metadata) -> dict:
    """Convert metadata object to standardized dictionary format.

    Args:
        metadata: Metadata object from build system

    Returns:
        dict: Normalized metadata with standardized key names
    """
```

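As a rough illustration of this normalization, the sketch below converts an `email.message.Message` (the standard-library type commonly used for `METADATA`-style headers) into a plain dictionary. It assumes the metadata object behaves like such a message. `EXAMPLE_MAP` and `normalize_metadata` are hypothetical names that reuse only the example mappings listed for `METADATA_MAP`; the real function may differ in detail.

```python
from email import message_from_string

# Hypothetical subset, copied from the example mappings documented above.
EXAMPLE_MAP = {
    "summary": "description",
    "classifier": "classifiers",
    "home_page": "url",
    "requires_python": "python_requires",
}


def normalize_metadata(metadata):
    """Return a dict with lowercase, underscore-separated, remapped keys."""
    data = {}
    for header in set(metadata.keys()):
        values = metadata.get_all(header)
        key = header.lower().replace("-", "_")
        key = EXAMPLE_MAP.get(key, key)
        # Keep repeated headers as a list, unwrap headers that occur once.
        data[key] = values if len(values) > 1 else values[0]
    return data


raw = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Summary: A demo package\n"
    "Home-page: https://example.com\n"
    "Requires-Python: >=3.8\n"
    "Classifier: Programming Language :: Python :: 3\n"
    "Classifier: License :: OSI Approved :: MIT License\n"
)
normalized = normalize_metadata(message_from_string(raw))
```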
**get_build_data(path, isolated=None)**

```python { .api }
def get_build_data(path, isolated=None):
    """Extract package data using modern build system.

    Args:
        path: Project directory path
        isolated: Whether to use build isolation

    Returns:
        dict: Package metadata with build system information
    """
```

**get_setupcfg_data(path)**

```python { .api }
def get_setupcfg_data(path):
    """Extract metadata from setup.cfg configuration file.

    Args:
        path: Project directory path

    Returns:
        dict: Metadata from setup.cfg file

    Raises:
        Exception: If setup.cfg cannot be parsed
    """
```

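For readers unfamiliar with the file format, the following sketch reads the `[metadata]` section of a `setup.cfg` with the standard-library `configparser`. It is a stand-in under stated assumptions, not the way `get_setupcfg_data` is implemented (which may well delegate to setuptools). `read_setupcfg_metadata` is a hypothetical helper, and directives such as `attr:` or `file:` are left unresolved.

```python
import configparser
import tempfile
from pathlib import Path


def read_setupcfg_metadata(project_dir):
    """Return the raw [metadata] options of setup.cfg as a dict (hypothetical helper)."""
    cfg_path = Path(project_dir) / "setup.cfg"
    # Interpolation is disabled so literal '%' characters are read as-is.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path, encoding="utf-8"):
        raise FileNotFoundError(f"no setup.cfg found in {project_dir}")
    if not parser.has_section("metadata"):
        return {}
    data = dict(parser.items("metadata"))
    # Multi-line options such as classifiers become lists of non-empty lines.
    if "classifiers" in data:
        data["classifiers"] = [
            line.strip() for line in data["classifiers"].splitlines() if line.strip()
        ]
    return data


with tempfile.TemporaryDirectory() as project_dir:
    Path(project_dir, "setup.cfg").write_text(
        "[metadata]\n"
        "name = demo\n"
        "version = 1.0\n"
        "classifiers =\n"
        "    Programming Language :: Python :: 3\n"
        "    License :: OSI Approved :: MIT License\n",
        encoding="utf-8",
    )
    cfg_data = read_setupcfg_metadata(project_dir)
```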
**get_setuppy_data(path)**

```python { .api }
def get_setuppy_data(path):
    """Extract metadata by executing setup.py (fallback method).

    Args:
        path: Absolute path to project directory

    Returns:
        dict: Package metadata from setup.py execution

    Note:
        This is a fallback method for legacy projects and
        adds '_stoneage_setuppy' flag to indicate usage
    """
```

### Distribution File Analysis

**pyroma.distributiondata Module**

Functions for analyzing packaged distribution files (tar.gz, zip, wheel, egg).

**get_data(path)**

```python { .api }
def get_data(path):
    """Extract metadata from a distribution file.

    Args:
        path: Path to distribution file (.tar.gz, .zip, .egg, etc.)

    Returns:
        dict: Package metadata extracted from distribution

    Raises:
        ValueError: If file type is not supported
        tarfile.TarError: If tar file is corrupted
        zipfile.BadZipFile: If zip file is corrupted
    """
```

Analyzes distribution files by:

- Safely extracting archives to temporary directories
- Running project data extraction on the extracted contents
- Supporting multiple archive formats (tar.gz, tar.bz2, zip, egg)
- Protecting against path traversal attacks (CVE-2007-4559)
- Cleaning up temporary files automatically

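The steps above can be sketched with the standard library alone. The helper below is a hypothetical illustration of the path traversal check, not pyroma's code: `safe_extract_tar` and its error message are assumptions. It refuses any member whose resolved destination would fall outside the target directory, and for brevity it only writes regular files and directories.

```python
import io
import os
import shutil
import tarfile
import tempfile


def safe_extract_tar(archive, destination):
    """Extract a tar archive, rejecting members that would escape destination."""
    root = os.path.realpath(destination)
    members = archive.getmembers()
    # First pass: validate every member before anything is written.
    for member in members:
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"unsafe path in archive: {member.name}")
    # Second pass: write directories and regular files; other types are skipped.
    for member in members:
        target = os.path.join(root, member.name)
        if member.isdir():
            os.makedirs(target, exist_ok=True)
        elif member.isfile():
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)


def build_archive(member_name, payload=b"hello"):
    """Create an in-memory tar.gz with a single file, for demonstration only."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


# The temporary directory is removed automatically when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    with tarfile.open(fileobj=build_archive("pkg-1.0/setup.cfg"), mode="r:gz") as good:
        safe_extract_tar(good, tmp)
    extracted = os.path.exists(os.path.join(tmp, "pkg-1.0", "setup.cfg"))

    try:
        with tarfile.open(fileobj=build_archive("../evil.txt"), mode="r:gz") as bad:
            safe_extract_tar(bad, tmp)
        rejected = False
    except ValueError:
        rejected = True
```

Recent Python releases also provide extraction filters in `tarfile`; the manual check is spelled out here only to make the idea explicit.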
### PyPI Package Analysis

**pyroma.pypidata Module**

Functions for analyzing packages published on PyPI using the PyPI JSON API and XML-RPC interface.

**get_data(project)**

```python { .api }
def get_data(project):
    """Extract metadata from a PyPI package.

    Args:
        project: PyPI package name

    Returns:
        dict: Complete package metadata including PyPI-specific information

    Raises:
        ValueError: If package not found on PyPI or HTTP error occurs
        requests.RequestException: If network request fails
    """
```

Comprehensive PyPI analysis that:

- Downloads package metadata from the PyPI JSON API
- Retrieves ownership information via XML-RPC
- Downloads and analyzes source distributions when available
- Combines PyPI metadata with distribution analysis
- Adds PyPI-specific flags (_owners, _has_sdist, _pypi_downloads)

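As a sketch of how one such flag could be derived, the code below inspects a response shaped like the PyPI JSON API (`https://pypi.org/pypi/<project>/json`), whose `urls` entries carry a `packagetype` field. The helper names are hypothetical, the sample payload is made up for illustration, and only `_has_sdist` is covered; the ownership lookup over XML-RPC and download counts are not shown.

```python
import json
import urllib.request


def fetch_project_json(project):
    """Download the JSON document for a project from PyPI (requires network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def summarize_pypi_payload(payload):
    """Build a metadata dict plus a _has_sdist flag from a JSON API payload."""
    data = dict(payload.get("info", {}))
    files = payload.get("urls", [])
    # A source distribution is present if any uploaded file is of type "sdist".
    data["_has_sdist"] = any(f.get("packagetype") == "sdist" for f in files)
    return data


# Made-up payload with the same overall shape as a real response.
sample = {
    "info": {"name": "demo", "version": "1.0", "summary": "A demo package"},
    "urls": [
        {"packagetype": "bdist_wheel", "filename": "demo-1.0-py3-none-any.whl"},
        {"packagetype": "sdist", "filename": "demo-1.0.tar.gz"},
    ],
}
summary = summarize_pypi_payload(sample)
```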
Usage examples:

```python
# Import the modules rather than the functions, so the three
# get_data functions do not shadow one another.
from pyroma import distributiondata, projectdata, pypidata

# Analyze local project
data = projectdata.get_data('/path/to/project')

# Analyze distribution file
data = distributiondata.get_data('package-1.0.tar.gz')

# Analyze PyPI package (requires network access)
data = pypidata.get_data('requests')
```

### Helper Classes

**FakeContext**

```python { .api }
class FakeContext:
    """Context manager for temporarily changing working directory and sys.path.

    Used internally for safe execution of setup.py files without
    affecting the global Python environment.
    """

    def __init__(self, path): ...
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
```

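A context manager of this kind can be sketched as follows. `TemporaryProjectContext` is a hypothetical name and the details are assumptions; the sketch only shows the general save-and-restore pattern for the working directory and `sys.path` that the docstring describes.

```python
import os
import sys
import tempfile


class TemporaryProjectContext:
    """Hypothetical stand-in: switch cwd and sys.path to a project, then restore."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        # Remember the current state so it can be restored on exit.
        self._old_cwd = os.getcwd()
        self._old_sys_path = list(sys.path)
        os.chdir(self.path)
        sys.path.insert(0, self.path)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Restore in place so other references to sys.path stay valid.
        sys.path[:] = self._old_sys_path
        os.chdir(self._old_cwd)
        return False  # do not swallow exceptions


before_cwd = os.getcwd()
before_path = list(sys.path)
with tempfile.TemporaryDirectory() as project_dir:
    with TemporaryProjectContext(project_dir):
        inside_matches = os.path.samefile(os.getcwd(), project_dir)
        first_entry = sys.path[0]
```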
**SetupMonkey**

```python { .api }
class SetupMonkey:
    """Context manager that monkey-patches setup() calls to capture metadata.

    Intercepts calls to distutils.core.setup() and setuptools.setup()
    to extract package metadata without executing setup commands.
    """

    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
    def get_data(self): ...
```
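The interception technique can be illustrated without setuptools installed. In the sketch below, `fake_setuptools`, `CaptureSetup`, and the sample script are all hypothetical. The point is only to show how replacing a module's `setup` attribute for the duration of a `with` block lets a script's keyword arguments be captured instead of acted on.

```python
import sys
import types


class CaptureSetup:
    """Hypothetical context manager that swaps module.setup for a recorder."""

    def __init__(self, module):
        self._module = module
        self._captured = {}

    def _record(self, **kwargs):
        # Store the metadata instead of running any setup command.
        self._captured = kwargs

    def __enter__(self):
        self._original = self._module.setup
        self._module.setup = self._record
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Always put the original function back, even after an error.
        self._module.setup = self._original
        return False

    def get_data(self):
        return dict(self._captured)


def real_setup(**kwargs):
    raise RuntimeError("the real setup() should not run during capture")


# Stand-in module so the example does not depend on setuptools being installed.
fake_setuptools = types.ModuleType("fake_setuptools")
fake_setuptools.setup = real_setup
sys.modules["fake_setuptools"] = fake_setuptools

SETUP_SCRIPT = """
from fake_setuptools import setup
setup(name="demo", version="1.0", description="A demo package")
"""

with CaptureSetup(fake_setuptools) as monkey:
    exec(compile(SETUP_SCRIPT, "setup.py", "exec"), {"__name__": "__main__"})
captured = monkey.get_data()
```

A script that had already bound `setup` before the patch was applied would bypass this kind of interception, which is why the patch is installed before the script runs.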