# Utilities and Settings

Scanpy provides various utility functions, configuration options, and helper tools for managing analysis workflows, extracting data, and configuring the analysis environment.

## Capabilities

### Global Settings and Configuration

Configure scanpy's behavior and matplotlib plotting parameters.

```python { .api }
# Global settings object
settings: ScanpyConfig

class ScanpyConfig:
    """Global scanpy configuration object."""

    # Core settings
    verbosity: int = 1              # Logging verbosity level (0-5)
    n_jobs: int = 1                 # Number of parallel jobs (-1 for all cores)

    # Data settings
    max_memory: str = '2G'          # Maximum memory for operations
    n_pcs: int = 50                 # Default number of PCs

    # Figure settings
    figdir: str = './figures/'      # Default figure output directory
    file_format_figs: str = 'pdf'   # Default figure format
    dpi: int = 80                   # Default DPI for figures
    dpi_save: int = 150             # DPI for saved figures
    transparent: bool = False       # Transparent backgrounds

    # Cache settings
    cache_compression: str = 'lzf'  # Compression for cached files

    def set_figure_params(self, dpi=80, dpi_save=150, transparent=False, fontsize=14, color_map='viridis', format='pdf', facecolor='white', **kwargs):
        """
        Set matplotlib figure parameters.

        Parameters:
        - dpi (int): Resolution for display
        - dpi_save (int): Resolution for saved figures
        - transparent (bool): Transparent background
        - fontsize (int): Base font size
        - color_map (str): Default colormap
        - format (str): Default save format
        - facecolor (str): Figure background color
        - **kwargs: Additional matplotlib rcParams
        """
```

### Data Extraction Utilities

Extract and manipulate data from AnnData objects.

```python { .api }
def obs_df(adata, keys=None, obsm_keys=None, layer=None, gene_symbols=None, use_raw=False):
    """
    Extract observation metadata as a pandas DataFrame.

    Parameters:
    - adata (AnnData): Annotated data object
    - keys (list, optional): Keys from obs to include
    - obsm_keys (list, optional): Keys from obsm to include
    - layer (str, optional): Layer to extract data from
    - gene_symbols (str, optional): Gene symbols key
    - use_raw (bool): Use raw data

    Returns:
    DataFrame: Observation data with requested keys
    """

def var_df(adata, keys=None, varm_keys=None, layer=None):
    """
    Extract variable metadata as a pandas DataFrame.

    Parameters:
    - adata (AnnData): Annotated data object
    - keys (list, optional): Keys from var to include
    - varm_keys (list, optional): Keys from varm to include
    - layer (str, optional): Layer to extract data from

    Returns:
    DataFrame: Variable data with requested keys
    """

def rank_genes_groups_df(adata, group=None, key='rank_genes_groups', pval_cutoff=None, log2fc_min=None, log2fc_max=None, gene_symbols=None):
    """
    Extract ranked genes results as a pandas DataFrame.

    Parameters:
    - adata (AnnData): Annotated data object
    - group (str, optional): Specific group to extract
    - key (str): Key for ranked genes results
    - pval_cutoff (float, optional): P-value cutoff
    - log2fc_min (float, optional): Minimum log2 fold change
    - log2fc_max (float, optional): Maximum log2 fold change
    - gene_symbols (str, optional): Gene symbols key

    Returns:
    DataFrame: Ranked genes with statistics
    """

def aggregate(adata, by, func='mean', layer=None, obsm=None, varm=None):
    """
    Aggregate observations by a grouping variable.

    Parameters:
    - adata (AnnData): Annotated data object
    - by (str): Key in obs for grouping
    - func (str or callable): Aggregation function
    - layer (str, optional): Layer to aggregate
    - obsm (str, optional): Obsm key to aggregate
    - varm (str, optional): Varm key to aggregate

    Returns:
    AnnData: Aggregated data object
    """
```
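
The grouping semantics of `aggregate` can be illustrated without scanpy. The sketch below (a hypothetical `aggregate_mean` helper, not part of the API) computes per-group means the way `func='mean'` does conceptually, with one output row per group:

```python
from collections import defaultdict

def aggregate_mean(values, groups):
    """Group-wise mean, mirroring the idea behind aggregate(func='mean').

    values: per-cell measurements (floats)
    groups: parallel list of group labels (e.g. cluster ids)
    Returns {group: mean of that group's values}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

# Four "cells" in two "clusters" collapse to two pseudo-bulk values
means = aggregate_mean([1.0, 3.0, 10.0, 20.0], ['a', 'a', 'b', 'b'])
```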

### Internal Data Access Utilities

Low-level utilities for accessing AnnData representations. These are private helpers (note the leading underscore) and may change between releases.

```python { .api }
def _get_obs_rep(adata, use_rep=None, n_pcs=None, use_raw=False, layer=None, obsm=None, obsp=None):
    """
    Get observation representation for analysis.

    Parameters:
    - adata (AnnData): Annotated data object
    - use_rep (str, optional): Representation key in obsm
    - n_pcs (int, optional): Number of PCs if using PCA
    - use_raw (bool): Use raw data
    - layer (str, optional): Layer to use
    - obsm (str, optional): Obsm key
    - obsp (str, optional): Obsp key

    Returns:
    array: Data representation
    """

def _set_obs_rep(adata, X_new, use_rep=None, n_pcs=None, layer=None, obsm=None):
    """
    Set observation representation in AnnData.

    Parameters:
    - adata (AnnData): Annotated data object
    - X_new (array): New data representation
    - use_rep (str, optional): Representation key
    - n_pcs (int, optional): Number of PCs
    - layer (str, optional): Layer key
    - obsm (str, optional): Obsm key
    """

def _check_mask(adata, mask_var, mask_obs=None):
    """
    Validate and process a mask for subsetting.

    Parameters:
    - adata (AnnData): Annotated data object
    - mask_var (array or str): Variable mask
    - mask_obs (array or str, optional): Observation mask

    Returns:
    tuple: Processed masks
    """
```
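
The mask handling that `_check_mask` performs can be sketched in plain Python. `resolve_mask` below is a simplified, hypothetical re-implementation of the idea (accept either a boolean array or the name of a boolean column, then validate length and dtype), not scanpy's actual code:

```python
def resolve_mask(mask, columns, n):
    """Resolve a mask that is either a boolean list of length n or the
    name of a boolean column in `columns` (a dict of name -> list)."""
    if isinstance(mask, str):
        if mask not in columns:
            raise KeyError(f"mask key {mask!r} not found")
        mask = columns[mask]
    if len(mask) != n:
        raise ValueError(f"mask length {len(mask)} does not match {n}")
    if not all(isinstance(v, bool) for v in mask):
        raise TypeError("mask entries must be boolean")
    return mask

# A string key is looked up in the metadata table, then validated
cols = {'highly_variable': [True, False, True]}
picked = resolve_mask('highly_variable', cols, 3)
```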

### Logging and Verbosity

Control logging output and verbosity levels.

```python { .api }
def print_versions():
    """
    Print version information for scanpy and dependencies.

    Returns:
    None: Prints version information to stdout
    """

# Logging levels
CRITICAL: int = 50
ERROR: int = 40
WARNING: int = 30
INFO: int = 20
HINT: int = 15  # Custom level between INFO and DEBUG
DEBUG: int = 10

# Verbosity levels
class Verbosity:
    """Verbosity level enumeration."""
    error: int = 0
    warn: int = 1
    info: int = 2
    hint: int = 3
    debug: int = 4
    trace: int = 5
```
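
To make the relationship between the two scales concrete, here is a sketch of how verbosity levels might map to logging thresholds. The exact mapping is illustrative (an assumption, not the library's table), but it matches the intent above: higher verbosity reveals lower-priority records:

```python
# Numeric logging levels as in the block above
CRITICAL, ERROR, WARNING, INFO, HINT, DEBUG = 50, 40, 30, 20, 15, 10

# Hypothetical verbosity -> visibility threshold mapping
VERBOSITY_TO_LEVEL = {
    0: ERROR,    # only errors
    1: WARNING,  # + warnings
    2: INFO,     # + info messages
    3: HINT,     # + hints
    4: DEBUG,    # + debug output
}

def visible(verbosity, record_level):
    """A record is shown if its level meets the threshold for this verbosity."""
    return record_level >= VERBOSITY_TO_LEVEL[min(verbosity, 4)]
```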

### Memory and Performance Utilities

Tools for managing memory usage and performance.

```python { .api }
def memory_usage():
    """
    Get current memory usage.

    Returns:
    str: Memory usage information
    """

def check_versions():
    """
    Check versions of key dependencies.

    Returns:
    None: Prints warnings for version issues
    """
```

### File and Path Utilities

Utilities for working with files and paths.

```python { .api }
def _check_datasetdir_exists():
    """Check if the dataset directory exists."""

def _get_filename_from_key(key):
    """Generate a filename from a dataset key."""

def _doc_params(**kwds):
    """Decorator that interpolates shared parameter documentation into docstrings."""
```

### Plotting Configuration

Configure matplotlib and plotting behavior.

```python { .api }
def set_figure_params(scanpy=True, dpi=80, dpi_save=150, transparent=False, fontsize=14, color_map='viridis', format='pdf', facecolor='white', **kwargs):
    """
    Set global figure parameters for matplotlib.

    Parameters:
    - scanpy (bool): Use scanpy-specific settings
    - dpi (int): Display resolution
    - dpi_save (int): Save resolution
    - transparent (bool): Transparent background
    - fontsize (int): Base font size
    - color_map (str): Default colormap
    - format (str): Default save format
    - facecolor (str): Figure background color
    - **kwargs: Additional rcParams
    """

def reset_rcParams():
    """Reset matplotlib rcParams to defaults."""
```

### Constants and Enumerations

Important constants used throughout scanpy.

```python { .api }
# Default number of PCs
N_PCS: int = 50

# Default number of diffusion components
N_DCS: int = 15

# Figure output defaults
FIGDIR_DEFAULT: str = './figures/'
FORMAT_DEFAULT: str = 'pdf'

# Cache settings
CACHE_DEFAULT: str = './cache/'
```

## Usage Examples

### Configuring Scanpy Settings

```python
import scanpy as sc

# Set verbosity level
sc.settings.verbosity = 3  # hint level

# Configure parallel processing
sc.settings.n_jobs = -1  # use all available cores

# Set figure parameters
sc.settings.set_figure_params(
    dpi=100,
    dpi_save=300,
    fontsize=12,
    color_map='plasma',
    format='png',
    transparent=True
)

# Set output directory
sc.settings.figdir = './my_figures/'

# Check current settings
print(f"Verbosity: {sc.settings.verbosity}")
print(f"N jobs: {sc.settings.n_jobs}")
print(f"Figure dir: {sc.settings.figdir}")
```

### Data Extraction and Analysis

```python
# Extract observation data with specific columns
obs_data = sc.get.obs_df(adata, keys=['total_counts', 'n_genes', 'leiden'])
print(obs_data.head())

# Get ranked genes as a DataFrame
marker_genes = sc.get.rank_genes_groups_df(adata, group='0')
top_genes = marker_genes.head(20)

# Extract variable information
var_data = sc.get.var_df(adata, keys=['highly_variable', 'dispersions'])

# Aggregate data by clusters
adata_agg = sc.get.aggregate(adata, by='leiden', func='mean')
print(f"Aggregated to {adata_agg.n_obs} pseudo-bulk samples")
```

### Working with Different Data Representations

```python
# Get PCA representation
X_pca = sc.get._get_obs_rep(adata, use_rep='X_pca', n_pcs=30)
print(f"PCA shape: {X_pca.shape}")

# Get UMAP representation
X_umap = sc.get._get_obs_rep(adata, use_rep='X_umap')
print(f"UMAP shape: {X_umap.shape}")

# Get raw data representation
X_raw = sc.get._get_obs_rep(adata, use_raw=True)
print(f"Raw data shape: {X_raw.shape}")
```

### Environment and Version Information

```python
# Print comprehensive version information
sc.logging.print_versions()

# Check for version compatibility issues
sc._utils.check_versions()

# Print memory usage
print(f"Current memory usage: {sc._utils.memory_usage()}")
```

### Advanced Configuration

```python
# Custom matplotlib configuration
sc.pl.set_rcParams_scanpy(fontsize=10, color_map='viridis')

# Reset to defaults
sc.pl.set_rcParams_defaults()

# Fine-grained matplotlib control
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (8, 6)
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

# Apply custom color palette
import seaborn as sns
custom_palette = sns.color_palette("husl", 8)
sc.pl.palettes.default_20 = custom_palette
```

### Performance Optimization

```python
# Configure for large datasets
sc.settings.max_memory = '16G'  # Set memory limit
sc.settings.n_jobs = 8          # Limit parallel jobs
sc.settings.verbosity = 1       # Reduce logging overhead

# Enable caching for repeated operations
sc.settings.cachedir = '/tmp/scanpy_cache/'

# Use chunked operations for large matrices
sc.pp.scale(adata, chunked=True, chunk_size=1000)
```
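
Chunked processing simply walks the observation axis in fixed-size row blocks so only one block is in memory at a time. A minimal sketch of the iteration pattern (the `iter_chunks` helper is illustrative, not a scanpy function):

```python
def iter_chunks(n_obs, chunk_size):
    """Yield (start, end) row ranges covering n_obs rows, the way chunked
    routines like scale(chunked=True, chunk_size=...) walk a matrix."""
    for start in range(0, n_obs, chunk_size):
        yield start, min(start + chunk_size, n_obs)

# 2500 cells in chunks of 1000: the last chunk is a partial block
chunks = list(iter_chunks(2500, 1000))
```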

### Custom Analysis Workflows

```python
def run_standard_analysis(adata, resolution=0.5, n_pcs=50):
    """Custom analysis function using scanpy utilities."""

    # Configure for this analysis
    original_verbosity = sc.settings.verbosity
    sc.settings.verbosity = 2

    try:
        # Preprocessing
        sc.pp.filter_cells(adata, min_genes=200)
        sc.pp.filter_genes(adata, min_cells=3)
        sc.pp.normalize_total(adata, target_sum=1e4)
        sc.pp.log1p(adata)

        # Analysis
        sc.pp.highly_variable_genes(adata)
        adata.raw = adata
        adata = adata[:, adata.var.highly_variable].copy()  # copy, so we don't scale a view
        sc.pp.scale(adata)
        sc.pp.pca(adata, n_comps=n_pcs)
        sc.pp.neighbors(adata)
        sc.tl.umap(adata)
        sc.tl.leiden(adata, resolution=resolution)

        # Extract results
        results = {
            'clusters': sc.get.obs_df(adata, keys=['leiden']),
            'embedding': sc.get._get_obs_rep(adata, use_rep='X_umap'),
            'n_clusters': len(adata.obs['leiden'].unique())
        }

        return adata, results

    finally:
        # Restore original settings
        sc.settings.verbosity = original_verbosity

# Run analysis
adata_processed, analysis_results = run_standard_analysis(adata)
print(f"Found {analysis_results['n_clusters']} clusters")
```

### Debugging and Troubleshooting

```python
import numpy as np
import scipy.sparse

# Enable debug logging
sc.settings.verbosity = 4  # debug level

# Check data integrity
def check_adata_integrity(adata):
    """Check AnnData object for common issues."""
    print(f"Shape: {adata.shape}")
    print(f"Data type: {adata.X.dtype}")
    print(f"Sparse: {scipy.sparse.issparse(adata.X)}")
    print(f"NaN values: {np.isnan(adata.X.data).sum() if scipy.sparse.issparse(adata.X) else np.isnan(adata.X).sum()}")
    print(f"Negative values: {(adata.X.data < 0).sum() if scipy.sparse.issparse(adata.X) else (adata.X < 0).sum()}")

    # Check for common issues
    if adata.obs.index.duplicated().any():
        print("WARNING: Duplicate observation names found")
    if adata.var.index.duplicated().any():
        print("WARNING: Duplicate variable names found")

check_adata_integrity(adata)

# Memory profiling for large operations
import time
start_time = time.time()
start_memory = sc._utils.memory_usage()

# Your analysis here
sc.pp.neighbors(adata, n_neighbors=15)

end_time = time.time()
end_memory = sc._utils.memory_usage()

print(f"Operation took {end_time - start_time:.2f} seconds")
print(f"Memory before: {start_memory}")
print(f"Memory after: {end_memory}")
```

## Configuration Files

### Setting up scanpy configuration

```python
# Create configuration file (~/.scanpy/config.yaml)
import os
import yaml

config_dir = os.path.expanduser('~/.scanpy')
os.makedirs(config_dir, exist_ok=True)

config = {
    'verbosity': 2,
    'n_jobs': -1,
    'figdir': './figures/',
    'file_format_figs': 'pdf',
    'dpi_save': 300,
    'transparent': True
}

with open(os.path.join(config_dir, 'config.yaml'), 'w') as f:
    yaml.dump(config, f)
```
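
Note that scanpy does not load such a file automatically; a startup snippet has to read it back and apply the values to `sc.settings`. The sketch below does this with the standard library only (so it runs without PyYAML); `load_flat_yaml` is a hypothetical helper that handles just the flat scalar file written above, not a general YAML parser:

```python
def load_flat_yaml(text):
    """Minimal parser for a flat 'key: value' config (scalars only)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition(':')
        key, value = key.strip(), value.strip()
        if value.lower() in ('true', 'false'):
            config[key] = value.lower() == 'true'
            continue
        try:
            config[key] = int(value)
        except ValueError:
            config[key] = value.strip('\'"')
    return config

cfg = load_flat_yaml("verbosity: 2\nn_jobs: -1\ntransparent: true\nfigdir: ./figures/")
# At startup, apply the values, e.g.:
# for k, v in cfg.items():
#     setattr(sc.settings, k, v)
```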

## Best Practices

### Settings Management

1. **Consistent Configuration**: Set global parameters at the start of analysis
2. **Resource Management**: Configure `n_jobs` and `max_memory` based on available system resources
3. **Reproducibility**: Set random seeds and document the settings used
4. **Output Management**: Organize figure output with descriptive directories
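
A common way to follow these points without leaking temporary overrides is a small context manager that restores settings on exit. This is a generic Python pattern rather than a scanpy API; `override_attrs` and the `_Settings` stand-in are illustrative names:

```python
from contextlib import contextmanager

@contextmanager
def override_attrs(obj, **overrides):
    """Temporarily set attributes (e.g. verbosity on a settings object),
    restoring the originals even if the block raises."""
    saved = {k: getattr(obj, k) for k in overrides}
    try:
        for k, v in overrides.items():
            setattr(obj, k, v)
        yield obj
    finally:
        for k, v in saved.items():
            setattr(obj, k, v)

class _Settings:  # stand-in for a global settings object
    verbosity = 1

s = _Settings()
with override_attrs(s, verbosity=3):
    inside = s.verbosity  # raised for this block only
after = s.verbosity       # restored automatically
```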

### Performance Tips

1. **Memory Efficiency**: Use appropriate data types and sparse matrices
2. **Parallel Processing**: Enable multiprocessing for CPU-intensive operations
3. **Chunked Operations**: Use chunked processing for very large datasets
4. **Caching**: Enable caching for repeated computations

### Debugging

1. **Logging Levels**: Use appropriate verbosity for development vs. production
2. **Data Validation**: Check data integrity before analysis
3. **Version Tracking**: Document software versions for reproducibility
4. **Error Handling**: Implement proper error handling in custom workflows