Tessl Tile for pypi/pystow@0.7.0

# PyStow

PyStow is a Python library that provides a standardized, configurable way to manage data directories for Python applications. It offers a simple API for creating and accessing application-specific data directories in the user's file system, with support for nested directory structures, automatic directory creation, and configuration through environment variables.

The library lets developers download, cache, and manage files from the internet, with built-in support for data formats including CSV, RDF, Excel, and compressed archives (ZIP, TAR, LZMA, GZ). Files are downloaded only once and then served from a local cache. Tabular data is handled through pandas integration and RDF data through rdflib integration, and storage locations are configurable, respecting both the traditional home-directory layout and the XDG Base Directory specification.

## Package Information

- **Package Name**: pystow
- **Language**: Python
- **Installation**: `pip install pystow`

## Core Imports

```python
import pystow

# Most common usage patterns
module = pystow.module("myapp")
path = pystow.join("myapp", "data")
data = pystow.ensure_csv("myapp", url="https://example.com/data.csv")
```

## Basic Usage

### Directory Management
```python
import pystow

# Get a module for your application
module = pystow.module("myapp")

# Create nested directories and get paths
data_dir = module.join("datasets", "version1")
config_path = module.join("config", name="settings.json")

# Using functional API
path = pystow.join("myapp", "data", name="file.txt")
```

### File Download and Caching
```python
import pystow

# Download and cache a file
path = pystow.ensure(
    "myapp", "data",
    url="https://example.com/dataset.csv",
    name="dataset.csv",
)

# File is automatically cached: subsequent calls return the cached version.
# Use force=True to re-download
path = pystow.ensure(
    "myapp", "data",
    url="https://example.com/dataset.csv",
    name="dataset.csv",
    force=True,
)
```

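The caching behaviour described above (download once, reuse the local copy, re-download only with `force=True`) can be illustrated with a minimal stdlib sketch. It is not pystow's implementation: `ensure_cached` and the injected `fetch` callable are hypothetical names, and a fake fetcher stands in for the real HTTP download so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(
    directory: Path,
    name: str,
    fetch: Callable[[], bytes],
    force: bool = False,
) -> Path:
    """Hypothetical sketch of download-once caching with a ``force`` override."""
    directory.mkdir(parents=True, exist_ok=True)  # comparable to ensure_exists=True
    path = directory / name
    # Reuse the cached file unless a re-download is forced.
    if path.is_file() and not force:
        return path
    path.write_bytes(fetch())
    return path


calls = []


def fake_fetch() -> bytes:
    """Stand-in for an HTTP download; records each call."""
    calls.append(1)
    return b"a,b\n1,2\n"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "myapp" / "data"
    ensure_cached(target, "dataset.csv", fake_fetch)              # downloads
    ensure_cached(target, "dataset.csv", fake_fetch)              # cache hit
    ensure_cached(target, "dataset.csv", fake_fetch, force=True)  # re-downloads
    print(len(calls))  # 2
```

Only the first and the forced call reach the fetcher, which is the property that makes repeated `ensure` calls cheap.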
### Data Format Integration
```python
import pystow

# Download a CSV file and load it as a pandas DataFrame (requires pandas)
df = pystow.ensure_csv(
    "myapp", "datasets",
    url="https://example.com/data.csv",
)

# Download and parse JSON
data = pystow.ensure_json(
    "myapp", "config",
    url="https://api.example.com/config.json",
)

# Download a gzip-compressed RDF/XML file and parse it into an rdflib Graph (requires rdflib)
graph = pystow.ensure_rdf(
    "myapp", "ontologies",
    url="https://example.com/ontology.rdf.gz",
    parse_kwargs={"format": "xml"},
)
```

## Architecture

PyStow is built around a modular architecture with two main usage patterns:

1. **Functional API**: direct function calls for quick operations (`pystow.ensure()`, `pystow.join()`)
2. **Module-based API**: `Module` instances for organized data management (`pystow.module()`)

The core `Module` class manages directory structures and provides methods for file operations, while the functional API provides convenient shortcuts for common tasks. Together, the two APIs offer:

- **Configurable base directories** via environment variables
- **Version-aware storage** for handling different data versions
- **Automatic directory creation** with the `ensure_exists` parameter
- **Forced re-downloads** for cache invalidation
- **Flexible data format support** through specialized ensure/load/dump methods

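The environment-variable override for base directories (documented below for the `key` argument as `<KEY>_HOME`, with the key upper-cased) can be sketched with the standard library. This sketch assumes a fallback of `~/.data/<key>` as pystow's default home and ignores any further settings pystow may consult (such as XDG-style locations); `resolve_base` and `sketch_join` are hypothetical helpers, not pystow functions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_base(key: str, environ: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical sketch: choose the base directory for ``key``.

    The ``<KEY>_HOME`` variable (key upper-cased) wins if it is set; otherwise
    fall back to ``~/.data/<key>``, which is assumed here to be the default.
    """
    env = os.environ if environ is None else environ
    override = env.get(f"{key.upper()}_HOME")
    if override is not None:
        return Path(override)
    return Path.home() / ".data" / key


def sketch_join(
    key: str,
    *subkeys: str,
    name: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical sketch: base / subkeys [/ name], without creating directories."""
    path = resolve_base(key, environ).joinpath(*subkeys)
    return path / name if name is not None else path


print(sketch_join("myapp", "data", name="file.txt", environ={"MYAPP_HOME": "/srv/myapp"}))
# /srv/myapp/data/file.txt (POSIX)
```

Passing `environ` explicitly keeps the sketch testable; in practice the real process environment is what matters, so exporting `MYAPP_HOME` before starting the application would redirect every path under that key.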
101
## Capabilities
102

103
### [Directory Management](./directory-management.md)
104
Core functionality for creating and managing application data directories with configurable storage locations and automatic directory creation.
105

106
```python { .api }
107
def module(key: str, *subkeys: str, ensure_exists: bool = True) -> Module:
108
    """Return a module for the application.
109
    
110
    Args:
111
        key: The name of the module. No funny characters. The envvar <key>_HOME where
112
            key is uppercased is checked first before using the default home directory.
113
        subkeys: A sequence of additional strings to join. If none are given, returns
114
            the directory for this module.  
115
        ensure_exists: Should all directories be created automatically? Defaults to true.
116
    
117
    Returns:
118
        The module object that manages getting and ensuring
119
    """
120

121
def join(key: str, *subkeys: str, name: str | None = None, ensure_exists: bool = True, version: VersionHint = None) -> Path:
122
    """Return the home data directory for the given module.
123
    
124
    Args:
125
        key: The name of the module. No funny characters. The envvar <key>_HOME where
126
            key is uppercased is checked first before using the default home directory.
127
        subkeys: A sequence of additional strings to join
128
        name: The name of the file (optional) inside the folder
129
        ensure_exists: Should all directories be created automatically? Defaults to true.
130
        version: The optional version, or no-argument callable that returns an
131
            optional version. This is prepended before the subkeys.
132
    
133
    Returns:
134
        The path of the directory or subdirectory for the given module.
135
    """
136
```
137

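The `version` argument of `join` accepts either a string or a no-argument callable that returns one, and the resolved value is prepended before the subkeys. A minimal stdlib sketch of that resolution, assuming a missing version is simply skipped (pystow may treat that case differently); `resolve_version`, `versioned_path`, and `latest_release` are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Optional, Union

# Mirrors the VersionHint alias from the Type Definitions section.
VersionHint = Union[None, str, Callable[[], Optional[str]]]


def resolve_version(version: VersionHint) -> Optional[str]:
    """Return the version string, calling ``version`` first if it is callable."""
    if callable(version):
        return version()
    return version


def versioned_path(base: Path, *subkeys: str, version: VersionHint = None) -> Path:
    """Hypothetical sketch: put the resolved version directly before the subkeys."""
    resolved = resolve_version(version)
    parts = ([resolved] if resolved else []) + list(subkeys)
    return base.joinpath(*parts)


def latest_release() -> str:
    """Hypothetical version getter, e.g. one that would query a release index."""
    return "2024-01"


base = Path("/data/myapp")
print(versioned_path(base, "raw", version="v1"))            # /data/myapp/v1/raw
print(versioned_path(base, "raw", version=latest_release))  # /data/myapp/2024-01/raw
print(versioned_path(base, "raw"))                          # /data/myapp/raw
```

Accepting a callable lets the version lookup (which might involve a network request) be deferred until a path is actually needed.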
### [File Download and Caching](./file-operations.md)
File download system with caching, compression support, and cloud storage integration.

```python { .api }
def ensure(key: str, *subkeys: str, url: str, name: str | None = None, version: VersionHint = None, force: bool = False, download_kwargs: Mapping[str, Any] | None = None) -> Path:
    """Ensure a file is downloaded.

    Args:
        key: The name of the module. Avoid special characters. The environment
            variable <KEY>_HOME (the key in upper case) is checked first, before
            falling back to the default home directory.
        subkeys: A sequence of additional strings to join. If none are given, the
            file is stored directly in the module's directory.
        url: The URL to download.
        name: Overrides the name of the file at the end of the URL, if given. Also
            useful for URLs that don't have proper filenames with extensions.
        version: The optional version, or no-argument callable that returns an
            optional version. This is prepended before the subkeys.
        force: Should the download be done again, even if the path already exists?
            Defaults to false.
        download_kwargs: Keyword arguments to pass through to pystow.utils.download.

    Returns:
        The path of the file that has been downloaded (or already exists)
    """
```

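When `name` is omitted, `ensure` stores the file under the name at the end of the URL. A stdlib sketch of that derivation, assuming the last segment of the URL's path is used and query strings are ignored (pystow's exact parsing may differ); `default_name` is a hypothetical helper. The second call shows why `name` is useful for URLs without a proper file name.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def default_name(url: str, name: Optional[str] = None) -> str:
    """Hypothetical sketch: use ``name`` if given, else the URL's last path segment."""
    if name is not None:
        return name
    # Parse out only the path, so query strings and fragments are ignored.
    derived = PurePosixPath(urlparse(url).path).name
    if not derived:
        raise ValueError(f"cannot derive a file name from {url!r}; pass name=...")
    return derived


print(default_name("https://example.com/files/dataset.csv"))             # dataset.csv
print(default_name("https://example.com/download?id=42"))                # download
print(default_name("https://example.com/download?id=42", name="x.csv"))  # x.csv
```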
### [Data Format Support](./data-formats.md)
Built-in support for common data formats, including CSV, JSON, XML, RDF, Excel, and Python objects, through pandas and other specialized libraries.

```python { .api }
def ensure_csv(key: str, *subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: Mapping[str, Any] | None = None, read_csv_kwargs: Mapping[str, Any] | None = None) -> pd.DataFrame:
    """Download a CSV and open as a dataframe with pandas.

    Args:
        key: The module name
        subkeys: A sequence of additional strings to join. If none are given, the
            file is stored directly in the module's directory.
        url: The URL to download.
        name: Overrides the name of the file at the end of the URL, if given. Also
            useful for URLs that don't have proper filenames with extensions.
        force: Should the download be done again, even if the path already exists?
            Defaults to false.
        download_kwargs: Keyword arguments to pass through to pystow.utils.download.
        read_csv_kwargs: Keyword arguments to pass through to pandas.read_csv.

    Returns:
        A pandas DataFrame
    """

def ensure_json(key: str, *subkeys: str, url: str, name: str | None = None, force: bool = False, download_kwargs: Mapping[str, Any] | None = None, open_kwargs: Mapping[str, Any] | None = None, json_load_kwargs: Mapping[str, Any] | None = None) -> JSON:
    """Download a JSON file and load it with json.load.

    Args:
        key: The module name
        subkeys: A sequence of additional strings to join. If none are given, the
            file is stored directly in the module's directory.
        url: The URL to download.
        name: Overrides the name of the file at the end of the URL, if given. Also
            useful for URLs that don't have proper filenames with extensions.
        force: Should the download be done again, even if the path already exists?
            Defaults to false.
        download_kwargs: Keyword arguments to pass through to pystow.utils.download.
        open_kwargs: Keyword arguments to pass through to open.
        json_load_kwargs: Keyword arguments to pass through to json.load.

    Returns:
        A JSON object (list, dict, etc.)
    """
```

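To show what the pass-through arguments of `ensure_json` do, here is a stdlib sketch of the load step only, with a local file standing in for the download. It assumes, as the docstring states, that `open_kwargs` go to `open` and `json_load_kwargs` go to `json.load`; `load_json` is a hypothetical helper. Passing `parse_float=Decimal` keeps decimal values exact instead of converting them to `float`.

```python
import json
import tempfile
from decimal import Decimal
from pathlib import Path
from typing import Any, Mapping, Optional


def load_json(
    path: Path,
    open_kwargs: Optional[Mapping[str, Any]] = None,
    json_load_kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Hypothetical sketch: forward each kwargs mapping to open() and json.load()."""
    with open(path, **(open_kwargs or {})) as file:
        return json.load(file, **(json_load_kwargs or {}))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text('{"threshold": 0.1}', encoding="utf-8")
    config = load_json(
        path,
        open_kwargs={"encoding": "utf-8"},
        json_load_kwargs={"parse_float": Decimal},
    )

print(config["threshold"], type(config["threshold"]).__name__)  # 0.1 Decimal
```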
### [Web Scraping](./web-scraping.md)
HTML parsing and web content extraction, with BeautifulSoup integration for downloading and parsing web pages.

```python { .api }
def ensure_soup(key: str, *subkeys: str, url: str, name: str | None = None, version: VersionHint = None, force: bool = False, download_kwargs: Mapping[str, Any] | None = None, beautiful_soup_kwargs: Mapping[str, Any] | None = None) -> bs4.BeautifulSoup:
    """Ensure a webpage is downloaded and parsed with BeautifulSoup.

    Args:
        key: The name of the module. Avoid special characters. The environment
            variable <KEY>_HOME (the key in upper case) is checked first, before
            falling back to the default home directory.
        subkeys: A sequence of additional strings to join. If none are given, the
            file is stored directly in the module's directory.
        url: The URL to download.
        name: Overrides the name of the file at the end of the URL, if given.
            Also useful for URLs that don't have proper filenames with extensions.
        version: The optional version, or no-argument callable that returns an
            optional version. This is prepended before the subkeys.
        force: Should the download be done again, even if the path already
            exists? Defaults to false.
        download_kwargs: Keyword arguments to pass through to pystow.utils.download.
        beautiful_soup_kwargs: Keyword arguments to pass through to BeautifulSoup.

    Returns:
        A BeautifulSoup object
    """
```

### [Archive and Compression](./archives.md)
Support for compressed archives including ZIP, TAR, GZIP, LZMA, and BZ2, with automatic extraction and content access.

```python { .api }
def ensure_untar(key: str, *subkeys: str, url: str, name: str | None = None, directory: str | None = None, force: bool = False, download_kwargs: Mapping[str, Any] | None = None, extract_kwargs: Mapping[str, Any] | None = None) -> Path:
    """Ensure a file is downloaded and untarred.

    Args:
        key: The name of the module. Avoid special characters. The environment
            variable <KEY>_HOME (the key in upper case) is checked first, before
            falling back to the default home directory.
        subkeys: A sequence of additional strings to join. If none are given, the
            file is stored directly in the module's directory.
        url: The URL to download.
        name: Overrides the name of the file at the end of the URL, if given. Also
            useful for URLs that don't have proper filenames with extensions.
        directory: Overrides the name of the directory into which the tar archive is
            extracted. If not given, the stem of the downloaded file's name is used.
        force: Should the download be done again, even if the path already exists?
            Defaults to false.
        download_kwargs: Keyword arguments to pass through to pystow.utils.download.
        extract_kwargs: Keyword arguments to pass through to tarfile.TarFile.extractall.

    Returns:
        The path of the directory into which the downloaded archive was extracted
    """
```

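The extraction step of `ensure_untar` can be sketched with the standard library `tarfile` module. Two assumptions here go beyond the docstring: the default directory name strips *all* suffixes (so `data.tar.gz` becomes `data`, whereas `Path.stem` alone would give `data.tar`), and the directory sits next to the downloaded archive. `strip_suffixes` and `untar_to_sibling` are hypothetical helpers; a locally built archive stands in for a download.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import Any, Mapping, Optional


def strip_suffixes(filename: str) -> str:
    """Drop every suffix: 'data.tar.gz' -> 'data' (Path.stem would give 'data.tar')."""
    path = Path(filename)
    return path.name[: len(path.name) - len("".join(path.suffixes))]


def untar_to_sibling(
    archive: Path,
    directory: Optional[str] = None,
    extract_kwargs: Optional[Mapping[str, Any]] = None,
) -> Path:
    """Hypothetical sketch: extract ``archive`` into a directory next to it."""
    target = archive.parent / (directory or strip_suffixes(archive.name))
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:  # default mode "r" auto-detects compression
        tar.extractall(target, **(extract_kwargs or {}))
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Build a small .tar.gz in place of a real download.
    archive = Path(tmp) / "data.tar.gz"
    payload = b"hello\n"
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo("readme.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    extracted = untar_to_sibling(archive)
    print(extracted.name, (extracted / "readme.txt").read_text().strip())  # data hello
```

For archives from untrusted sources, and assuming `extract_kwargs` is forwarded to `extractall` as documented, passing `extract_kwargs={"filter": "data"}` on Python versions that support tarfile extraction filters rejects absolute paths and other unsafe members.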
### [Cloud Storage Integration](./cloud-storage.md)
Download files from cloud storage services, including AWS S3 and Google Drive, with authentication support.

```python { .api }
def ensure_from_s3(key: str, *subkeys: str, s3_bucket: str, s3_key: str | Sequence[str], name: str | None = None, force: bool = False, **kwargs: Any) -> Path:
    """Ensure a file is downloaded from AWS S3.

    Args:
        key: The name of the module. Avoid special characters. The environment
            variable <KEY>_HOME (the key in upper case) is checked first, before
            falling back to the default home directory.
        subkeys: A sequence of additional strings to join. If none are given, the
            file is stored directly in the module's directory.
        s3_bucket: The S3 bucket name
        s3_key: The S3 key, given either as a string or as a sequence of strings
        name: Overrides the name of the file at the end of the S3 key, if given.
        force: Should the download be done again, even if the path already exists?
            Defaults to false.
        kwargs: Remaining kwargs to forward to Module.ensure_from_s3.

    Returns:
        The path of the file that has been downloaded (or already exists)
    """
```

### [Configuration Management](./configuration.md)
Configuration system based on environment variables and INI files, for storing API keys, URLs, and other settings.

```python { .api }
def get_config(module: str, key: str, *, passthrough: X | None = None, default: X | None = None, dtype: type[X] | None = None, raise_on_missing: bool = False) -> Any:
    """Get a configuration value.

    Args:
        module: Name of the module (e.g., pybel) to get configuration for
        key: Name of the key (e.g., connection)
        passthrough: If this is not None, it is returned directly.
        default: Returned if neither the environment nor the configuration files
            contain a value.
        dtype: The data type to parse the value as: int, float, bool, or str.
            If None, defaults to str.
        raise_on_missing: If true, raise a ConfigError if no value is found and
            no default is given.

    Returns:
        The config value or the default.

    Raises:
        ConfigError: If raise_on_missing is true and neither a value nor a default
            is found.
    """

def write_config(module: str, key: str, value: str) -> None:
    """Write a configuration value.

    Args:
        module: The name of the app (e.g., indra)
        key: The key of the configuration in the app
        value: The value of the configuration in the app
    """
```

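The lookup order implied by the `get_config` docstring (a `passthrough` value wins, then the environment and configuration files, then `default`, otherwise `ConfigError` if `raise_on_missing` is set) can be sketched with `os.environ` and `configparser`. Assumptions not confirmed on this page: the environment variable is named `<MODULE>_<KEY>` in upper case, the INI file has a `[<module>]` section, and booleans are parsed from a few common spellings. `get_config_sketch` is hypothetical, and `ConfigError` here is a local stand-in with the signature shown under Exception Classes.

```python
import configparser
import os
from typing import Any, Mapping, Optional


class ConfigError(ValueError):
    """Local stand-in mirroring the documented ConfigError(module, key)."""

    def __init__(self, module: str, key: str) -> None:
        super().__init__(f"could not look up {module}/{key}")
        self.module = module
        self.key = key


def _parse(value: str, dtype: Optional[type]) -> Any:
    if dtype is None or dtype is str:
        return value
    if dtype is bool:
        # Assumed spellings; the values pystow accepts may differ.
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return dtype(value)  # int or float


def get_config_sketch(
    module: str,
    key: str,
    *,
    passthrough: Any = None,
    default: Any = None,
    dtype: Optional[type] = None,
    raise_on_missing: bool = False,
    environ: Optional[Mapping[str, str]] = None,
    parser: Optional[configparser.ConfigParser] = None,
) -> Any:
    """Hypothetical sketch: passthrough -> env var -> INI file -> default."""
    if passthrough is not None:
        return passthrough
    env = os.environ if environ is None else environ
    raw = env.get(f"{module.upper()}_{key.upper()}")
    if raw is None and parser is not None and parser.has_option(module, key):
        raw = parser.get(module, key)
    if raw is not None:
        return _parse(raw, dtype)
    if default is not None:
        return default
    if raise_on_missing:
        raise ConfigError(module, key)
    return None


ini = configparser.ConfigParser()
ini.read_string("[myapp]\nretries = 3\n")

print(get_config_sketch("myapp", "retries", dtype=int, environ={}, parser=ini))  # 3
print(get_config_sketch("myapp", "retries", dtype=int,
                        environ={"MYAPP_RETRIES": "5"}, parser=ini))             # 5
print(get_config_sketch("myapp", "timeout", default=30.0, environ={}, parser=ini))  # 30.0
```

Checking the environment before the file lets a deployment override a stored setting (for example an API key) without editing the INI file.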
### [NLTK Integration](./nltk-integration.md)
Integration with NLTK (Natural Language Toolkit) for managing linguistic data resources.

```python { .api }
def ensure_nltk(resource: str = "stopwords") -> tuple[Path, bool]:
    """Ensure NLTK data is downloaded in a standard way.

    Args:
        resource: Name of the resource to download, e.g., stopwords

    Returns:
        A pair of the NLTK cache directory and a boolean indicating whether the
        download was successful
    """
```

### [Module Class API](./module-class.md)
The core Module class, which provides an object-oriented interface for data directory management, with all file operations available as methods.

```python { .api }
class Module:
    """The class wrapping the directory lookup implementation."""

    def __init__(self, base: str | Path, ensure_exists: bool = True) -> None:
        """Initialize the module.

        Args:
            base: The base directory for the module
            ensure_exists: Should the base directory be created automatically?
                Defaults to true.
        """

    @classmethod
    def from_key(cls, key: str, *subkeys: str, ensure_exists: bool = True) -> Module:
        """Get a module for the given directory or one of its subdirectories.

        Args:
            key: The name of the module. Avoid special characters. The environment
                variable <KEY>_HOME (the key in upper case) is checked first,
                before falling back to the default home directory.
            subkeys: A sequence of additional strings to join. If none are given,
                the module is rooted at the directory for the key itself.
            ensure_exists: Should all directories be created automatically? Defaults
                to true.

        Returns:
            A module
        """
```

## Type Definitions

```python { .api }
from typing import Union, Optional, Callable, Any
from pathlib import Path

# Version specification type
VersionHint = Union[None, str, Callable[[], Optional[str]]]

# JSON data type
JSON = Any

# File provider function type
Provider = Callable[..., None]

# HTTP timeout specification
TimeoutHint = Union[int, float, None, tuple[Union[float, int], Union[float, int]]]
```

## Exception Classes

```python { .api }
class ConfigError(ValueError):
    """Raised when configuration cannot be looked up."""

    def __init__(self, module: str, key: str):
        """Initialize the configuration error.

        Args:
            module: Name of the module, e.g., bioportal
            key: Name of the key inside the module, e.g., api_key
        """
```
