# Data Extraction


Functions for extracting package metadata from three kinds of source: local project directories, distribution files, and packages published on PyPI. These modules handle the details of parsing the various packaging formats, build systems, and configuration files.


## Capabilities


### Project Directory Analysis


**pyroma.projectdata Module**


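Functions for extracting metadata from local project directories, primarily through the project's build system (via the build package), with fallbacks for projects configured only through setup.cfg or setup.py.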


**METADATA_MAP**

```python { .api }
METADATA_MAP: dict
"""Mapping of metadata field names between different packaging systems.

Maps standard setuptools/wheel metadata field names to their pyroma equivalents.
Used internally to normalize metadata from different sources such as setup.cfg,
pyproject.toml, and wheel metadata.

Example mappings:
- 'summary' -> 'description'
- 'classifier' -> 'classifiers'
- 'home_page' -> 'url'
- 'requires_python' -> 'python_requires'
"""
```
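
To illustrate how a mapping like METADATA_MAP is typically applied, here is a minimal sketch. It contains only the four example mappings listed above, and pyroma's real dictionary may hold more. `rename_keys` is a hypothetical helper, not part of pyroma. Keys found in the map are renamed, and all other keys pass through unchanged.

```python
# Only the four example mappings documented above; pyroma's actual
# METADATA_MAP may contain additional entries.
EXAMPLE_METADATA_MAP = {
    "summary": "description",
    "classifier": "classifiers",
    "home_page": "url",
    "requires_python": "python_requires",
}


def rename_keys(raw: dict, key_map: dict = EXAMPLE_METADATA_MAP) -> dict:
    """Hypothetical helper: rename known keys and keep unknown keys as-is."""
    return {key_map.get(key, key): value for key, value in raw.items()}


normalized = rename_keys(
    {"name": "demo", "summary": "A demo package", "requires_python": ">=3.8"}
)
print(normalized)
# {'name': 'demo', 'description': 'A demo package', 'python_requires': '>=3.8'}
```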


**get_data(path)**

```python { .api }
def get_data(path):
    """Extract package metadata from a project directory.

    Args:
        path: Absolute path to project directory

    Returns:
        dict: Package metadata including name, version, description,
            classifiers, dependencies, and build system information

    Raises:
        build.BuildException: If project cannot be built or analyzed
    """
```


Primary function for project analysis that:

- Uses the project's Python build system to extract metadata
- Supports pyproject.toml, setup.cfg, and setup.py configurations
- Handles both isolated and non-isolated build environments
- Falls back to other extraction methods, such as get_setupcfg_data() and get_setuppy_data(), when the build-based approach fails
- Adds internal flags for missing configuration files
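
The fallback behaviour described above can be pictured as an ordered chain of extractors in which the first one that succeeds wins. This is a hedged sketch. `extract_with_fallbacks`, `from_build` and `from_setupcfg` are hypothetical names, and the order and exception types that pyroma actually uses are not specified here.

```python
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[str], dict]


def extract_with_fallbacks(path: str, extractors: Sequence[Tuple[str, Extractor]]) -> dict:
    """Hypothetical helper: return the result of the first extractor that succeeds."""
    errors: List[str] = []
    for name, extractor in extractors:
        try:
            return extractor(path)
        except Exception as exc:  # a real implementation would catch narrower types
            # Remember why this method failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all extraction methods failed: " + "; ".join(errors))


# Usage with stand-in extractors (both hypothetical):
def from_build(path: str) -> dict:
    raise RuntimeError("no build backend")


def from_setupcfg(path: str) -> dict:
    return {"name": "demo"}


data = extract_with_fallbacks(
    "/path/to/project", [("build", from_build), ("setup.cfg", from_setupcfg)]
)
print(data)  # {'name': 'demo'}
```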


**build_metadata(path, isolated=None)**

```python { .api }
def build_metadata(path, isolated=None):
    """Build wheel metadata using the project's build system.

    Args:
        path: Project directory path
        isolated: Whether to use build isolation (None for auto-detection)

    Returns:
        Metadata object containing package information

    Raises:
        build.BuildBackendException: If build system fails
    """
```
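
One plausible reading of `isolated=None` ("auto-detection") is to try a fast non-isolated build first and retry in an isolated environment only if that fails. The sketch below assumes that strategy; it is not confirmed pyroma behaviour. By default it calls `build.util.project_wheel_metadata()` from the build package. The `fetch` parameter is a hypothetical hook added so the logic can be exercised without a real project.

```python
from typing import Any, Callable, Optional


def build_metadata_sketch(
    path: str,
    isolated: Optional[bool] = None,
    fetch: Optional[Callable[..., Any]] = None,
) -> Any:
    """Sketch of isolation auto-detection (assumed strategy, not pyroma's code)."""
    if fetch is None:
        # Imported lazily so the sketch loads even without `build` installed.
        from build.util import project_wheel_metadata as fetch

    if isolated is not None:
        # The caller decided explicitly; honour that choice.
        return fetch(path, isolated=isolated)

    try:
        # Non-isolated builds reuse the current environment and are faster,
        # but fail if the backend's build requirements are not installed.
        return fetch(path, isolated=False)
    except Exception:
        # Fall back to a fresh isolated environment, which installs the
        # build requirements declared in pyproject.toml.
        return fetch(path, isolated=True)
```

With `fetch` omitted and build installed, `build_metadata_sketch("/path/to/project")` returns whatever `project_wheel_metadata()` produces for that directory.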


**map_metadata_keys(metadata)**

```python { .api }
def map_metadata_keys(metadata) -> dict:
    """Convert metadata object to standardized dictionary format.

    Args:
        metadata: Metadata object from build system

    Returns:
        dict: Normalized metadata with standardized key names
    """
```
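
Wheel metadata obtained through the build package is an `importlib.metadata.PackageMetadata` object, which behaves like an `email.message.Message`. Header names such as `Requires-Python` are case-insensitive, and fields like `Classifier` may repeat. Here is a minimal sketch of flattening such an object into a dictionary. It assumes lower-case, underscore-separated keys and lists for repeated fields. `metadata_to_dict` is a hypothetical name, and the exact normalization rules pyroma applies are not documented here.

```python
from email.message import Message
from typing import Dict, Optional


def metadata_to_dict(metadata: Message, key_map: Optional[Dict[str, str]] = None) -> dict:
    """Sketch: flatten Message-style core metadata into a plain dict."""
    key_map = key_map or {}
    data = {}
    # keys() lists a header once per occurrence; dict.fromkeys() deduplicates
    # while keeping the original order.
    for header in dict.fromkeys(metadata.keys()):
        values = metadata.get_all(header)
        key = header.lower().replace("-", "_")  # 'Requires-Python' -> 'requires_python'
        key = key_map.get(key, key)  # optional renaming step
        # A field seen once becomes a string; a repeated field stays a list.
        data[key] = values[0] if len(values) == 1 else values
    return data


msg = Message()
msg["Metadata-Version"] = "2.1"
msg["Name"] = "demo"
msg["Requires-Python"] = ">=3.8"
msg["Classifier"] = "Programming Language :: Python :: 3"
msg["Classifier"] = "License :: OSI Approved :: MIT License"  # appended, not replaced

result = metadata_to_dict(msg, {"classifier": "classifiers"})
print(result)
```

Under this rule, a project with a single classifier gets a string rather than a one-element list. A stricter version would always return lists for fields that the core metadata specification marks as multiple-use.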


**get_build_data(path, isolated=None)**

```python { .api }
def get_build_data(path, isolated=None):
    """Extract package data using the modern build system.

    Args:
        path: Project directory path
        isolated: Whether to use build isolation

    Returns:
        dict: Package metadata with build system information
    """
```


**get_setupcfg_data(path)**

```python { .api }
def get_setupcfg_data(path):
    """Extract metadata from setup.cfg configuration file.

    Args:
        path: Project directory path

    Returns:
        dict: Metadata from setup.cfg file

    Raises:
        Exception: If setup.cfg cannot be parsed
    """
```
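
For orientation, here is a minimal sketch of reading the `[metadata]` section of a setup.cfg with the standard-library `configparser`. It is an assumption-laden stand-in, not pyroma's implementation. setuptools' own reader also expands directives such as `file:` and `attr:` and converts list-valued options, which this sketch does only for `classifiers`. `read_setupcfg_metadata` is a hypothetical name.

```python
import configparser
import os


def read_setupcfg_metadata(path: str) -> dict:
    """Hypothetical helper: return the [metadata] section of path/setup.cfg."""
    cfg_path = os.path.join(path, "setup.cfg")
    if not os.path.isfile(cfg_path):
        raise FileNotFoundError(cfg_path)

    # Interpolation is disabled because values may contain literal '%' signs.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path, encoding="utf-8")
    if not parser.has_section("metadata"):
        raise ValueError(f"{cfg_path} has no [metadata] section")

    data = dict(parser["metadata"])
    # Indented continuation lines arrive as one multi-line string; split them.
    if "classifiers" in data:
        data["classifiers"] = [
            line.strip() for line in data["classifiers"].splitlines() if line.strip()
        ]
    return data
```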


**get_setuppy_data(path)**

```python { .api }
def get_setuppy_data(path):
    """Extract metadata by executing setup.py (fallback method).

    Args:
        path: Absolute path to project directory

    Returns:
        dict: Package metadata from setup.py execution

    Note:
        This is a fallback method for legacy projects; it adds the
        '_stoneage_setuppy' flag to indicate that it was used.
    """
```


### Distribution File Analysis


**pyroma.distributiondata Module**


Functions for analyzing packaged distribution files (tar.gz, zip, wheel, egg).


**get_data(path)**

```python { .api }
def get_data(path):
    """Extract metadata from a distribution file.

    Args:
        path: Path to distribution file (.tar.gz, .zip, .egg, etc.)

    Returns:
        dict: Package metadata extracted from distribution

    Raises:
        ValueError: If file type is not supported
        tarfile.TarError: If tar file is corrupted
        zipfile.BadZipFile: If zip file is corrupted
    """
```


Analyzes distribution files by:

- Safely extracting archives to temporary directories
- Running the project-directory extraction on the extracted contents
- Supporting multiple archive formats (tar.gz, tar.bz2, zip, egg)
- Protecting against path traversal attacks (CVE-2007-4559)
- Cleaning up temporary files automatically
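
CVE-2007-4559 concerns tar members whose names (for example `../../etc/passwd` or an absolute path) escape the destination directory. A minimal sketch of the usual guard is shown below. It resolves every member path and refuses any that falls outside the target. `safe_extract_tar` is a hypothetical name, and pyroma's own check may differ. On Python 3.12+ and on recent security releases of earlier versions, the `filter="data"` argument to `TarFile.extractall()` provides a built-in alternative, which the sketch also uses when it is available.

```python
import os
import tarfile


def safe_extract_tar(archive_path: str, dest_dir: str) -> None:
    """Extract archive_path into dest_dir, refusing members that would escape it."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path) as tar:  # default mode auto-detects gzip/bz2/xz
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Absolute names and '..' components resolve outside dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"blocked path traversal: {member.name!r}")
            if member.issym() or member.islnk():
                # Link targets could also point outside; this sketch refuses links outright.
                raise ValueError(f"refusing link member: {member.name!r}")
        if hasattr(tarfile, "data_filter"):
            # Interpreters with extraction filters get a second, built-in check.
            tar.extractall(dest_root, members=members, filter="data")
        else:
            tar.extractall(dest_root, members=members)
```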


### PyPI Package Analysis


**pyroma.pypidata Module**


Functions for analyzing packages published on PyPI using the PyPI JSON API and the XML-RPC interface.


**get_data(project)**

```python { .api }
def get_data(project):
    """Extract metadata from a PyPI package.

    Args:
        project: PyPI package name

    Returns:
        dict: Complete package metadata including PyPI-specific information

    Raises:
        ValueError: If package not found on PyPI or HTTP error occurs
        requests.RequestException: If network request fails
    """
```


Comprehensive PyPI analysis that:

- Downloads package metadata from the PyPI JSON API
- Retrieves ownership information via the XML-RPC interface
- Downloads and analyzes the source distribution when one is available
- Combines PyPI metadata with the distribution analysis
- Adds PyPI-specific flags (_owners, _has_sdist, _pypi_downloads)
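
The PyPI JSON API serves project metadata at `https://pypi.org/pypi/<project>/json`. Its `urls` list describes the files of the latest release, each with a `packagetype` such as `sdist` or `bdist_wheel`. The sketch below fetches that document with the standard library and derives an sdist check similar in spirit to the _has_sdist flag. `fetch_pypi_json` and `sdist_urls` are hypothetical helpers. The sketch does not cover the XML-RPC ownership lookup, and it does not reproduce how pyroma computes its flags.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def fetch_pypi_json(project: str, timeout: float = 10.0) -> dict:
    """Download https://pypi.org/pypi/<project>/json (hypothetical helper)."""
    url = f"https://pypi.org/pypi/{urllib.parse.quote(project)}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        # Mirror the documented behaviour of raising ValueError on HTTP errors.
        raise ValueError(f"PyPI returned HTTP {exc.code} for {project!r}") from exc


def sdist_urls(pypi_json: dict) -> list:
    """Return download URLs of source distributions in the latest release."""
    return [
        entry["url"]
        for entry in pypi_json.get("urls", [])
        if entry.get("packagetype") == "sdist"
    ]


# Usage (requires network access):
#     info = fetch_pypi_json("requests")
#     has_sdist = bool(sdist_urls(info))
```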

Usage examples:

```python
from pyroma import distributiondata, projectdata, pypidata

# Analyze a local project directory
data = projectdata.get_data("/path/to/project")

# Analyze a distribution file
data = distributiondata.get_data("package-1.0.tar.gz")

# Analyze a package published on PyPI (requires network access)
data = pypidata.get_data("requests")
```


### Helper Classes


**FakeContext**

```python { .api }
class FakeContext:
    """Context manager for temporarily changing working directory and sys.path.

    Used internally for safe execution of setup.py files without
    affecting the global Python environment.
    """

    def __init__(self, path): ...
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
```
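
FakeContext is documented as a class-based context manager. The same save-and-restore idea can be sketched more compactly with `contextlib.contextmanager`. `temporary_project_context` is a hypothetical name, and the sketch assumes that only the working directory and `sys.path` need restoring. Note that `contextlib.chdir` (Python 3.11+) covers only the directory part.

```python
import contextlib
import os
import sys
from typing import Iterator


@contextlib.contextmanager
def temporary_project_context(path: str) -> Iterator[str]:
    """Hypothetical stand-in for FakeContext: chdir into path and prepend it to sys.path."""
    old_cwd = os.getcwd()
    old_sys_path = list(sys.path)
    os.chdir(path)
    # setup.py scripts often import helper modules from their own directory.
    sys.path.insert(0, path)
    try:
        yield path
    finally:
        # Restore global state even if the setup script raised.
        os.chdir(old_cwd)
        sys.path[:] = old_sys_path
```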


**SetupMonkey**

```python { .api }
class SetupMonkey:
    """Context manager that monkey-patches setup() calls to capture metadata.

    Intercepts calls to distutils.core.setup() and setuptools.setup()
    to extract package metadata without executing setup commands.
    """

    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
    def get_data(self): ...
```
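
The interception technique behind SetupMonkey can be sketched as follows. While the context is active, each target module's `setup` attribute is replaced by a recorder that stores the keyword arguments instead of running any commands. `CaptureSetupCalls` is a hypothetical class. In real use the targets would be `setuptools` and, on Python 3.11 and earlier (which still ship it), `distutils.core`. A setup.py typically runs `from setuptools import setup` when it is executed, so patching the module attribute beforehand is enough for the script to pick up the recorder.

```python
import types
from typing import Any, Dict, List, Optional, Tuple


class CaptureSetupCalls:
    """Temporarily replace module.setup with a recorder (sketch, not pyroma's class)."""

    def __init__(self, *modules: types.ModuleType) -> None:
        self._modules = modules
        self._saved: List[Tuple[types.ModuleType, Any]] = []
        self._kwargs: Optional[Dict[str, Any]] = None

    def _record(self, **kwargs: Any) -> None:
        # Keep the metadata and skip whatever the real setup() would have done.
        self._kwargs = kwargs

    def __enter__(self) -> "CaptureSetupCalls":
        for module in self._modules:
            self._saved.append((module, module.setup))
            module.setup = self._record
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        # Always restore the originals so later code sees the real setup().
        for module, original in self._saved:
            module.setup = original
        self._saved.clear()
        return False  # do not swallow exceptions raised by setup.py

    def get_data(self) -> Dict[str, Any]:
        if self._kwargs is None:
            raise RuntimeError("setup() was never called")
        return dict(self._kwargs)


# Usage with a stand-in module instead of the real setuptools:
fake_setuptools = types.ModuleType("fake_setuptools")
fake_setuptools.setup = lambda **kwargs: print("real setup() ran")

setup_py = "setuptools.setup(name='demo', version='1.0')"
with CaptureSetupCalls(fake_setuptools) as monkey:
    exec(setup_py, {"setuptools": fake_setuptools})
print(monkey.get_data())  # {'name': 'demo', 'version': '1.0'}
```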