Tessl Tile for pypi/ydata-profiling@4.16.0

# Configuration

The configuration system controls analysis depth, statistical computations, visualizations, and report output. You can change settings in code, load them from a YAML file, or apply a built-in preset.

## Capabilities

### Settings Class

Main configuration class. A `Settings` instance holds every option that controls profiling behavior and report generation.

```python { .api }
class Settings:
    def __init__(self, **kwargs):
        """
        Initialize Settings with configuration parameters.

        Parameters:
        - **kwargs: configuration parameters for various analysis components
        """

    # Core configuration sections
    dataset: DatasetConfig
    variables: VariablesConfig    # per-column descriptions
    vars: VarsConfig              # per-type analysis settings (num, cat, text, ...)
    correlations: Dict[str, CorrelationConfig]
    interactions: InteractionsConfig
    plot: PlotConfig
    html: HtmlConfig              # report styling lives under html.style

    # Global settings
    title: str = "YData Profiling Report"
    pool_size: int = 0            # number of workers; 0 = use all available CPUs
    progress_bar: bool = True

    # Note: `lazy` is an argument of ProfileReport, not a Settings field
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

# `df` is an existing pandas DataFrame

# Create custom configuration
config = Settings()
config.title = "Custom Dataset Analysis"
config.pool_size = 4
config.progress_bar = True

# Apply configuration to report
report = ProfileReport(df, config=config)
report.to_file("custom_report.html")
```
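
The `pool_size` default of `0` follows a common worker-pool convention: zero means "one worker per available CPU". A minimal sketch of that rule, using a hypothetical helper `resolve_pool_size` that is not part of ydata-profiling:

```python
import os


def resolve_pool_size(pool_size: int) -> int:
    """Hypothetical helper: turn a pool_size setting into a worker count.

    Assumes the convention above: 0 means one worker per available CPU.
    """
    if pool_size < 0:
        raise ValueError("pool_size must be >= 0")
    if pool_size == 0:
        # os.cpu_count() may return None when the count is unknown; fall back to 1
        return os.cpu_count() or 1
    return pool_size


print(resolve_pool_size(4))  # 4
print(resolve_pool_size(0))  # number of CPUs on this machine
```

Setting an explicit positive value, as in the usage example, bypasses the CPU lookup entirely.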

### Configuration Loading

Load configuration from a YAML file, or start from a built-in preset. These options are passed to `ProfileReport`.

```python { .api }
class ProfileReport:
    def __init__(
        self,
        df,
        minimal: bool = False,
        explorative: bool = False,
        sensitive: bool = False,
        config_file: Optional[Union[str, Path]] = None,
        config: Optional[Settings] = None,
        **kwargs,
    ):
        """
        Configuration-related parameters (other parameters omitted).

        Parameters:
        - minimal, explorative, sensitive: apply a built-in preset
        - config_file: path to a YAML configuration file
        - config: a Settings instance (use either config_file or config, not both)
        - **kwargs: individual settings, e.g. title="My Report"
        """
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

# Load from configuration file
config = Settings.from_file("my_config.yaml")
report = ProfileReport(df, config=config)

# Equivalent: let ProfileReport read the file
report = ProfileReport(df, config_file="my_config.yaml")

# Use preset configurations
minimal_report = ProfileReport(df, minimal=True)
explorative_report = ProfileReport(df, explorative=True)
sensitive_report = ProfileReport(df, sensitive=True)
```

### Dataset Configuration

Dataset-level metadata included in the report.

```python { .api }
class DatasetConfig:
    """Dataset-level metadata (Settings.dataset)."""

    description: str = ""
    creator: str = ""
    author: str = ""
    copyright_holder: str = ""
    copyright_year: str = ""
    url: str = ""

    # Sampling and duplicate-row settings are not part of `dataset`:
    # they are the top-level Settings fields `samples` and `duplicates`.
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()
config.dataset.description = "Customer transaction data for Q4 2023"
config.dataset.creator = "Data Science Team"
config.dataset.author = "John Doe"

report = ProfileReport(df, config=config)
```

### Variables Configuration

Per-column metadata and per-type analysis settings. Column descriptions live under `Settings.variables`; the settings for each data type (numeric, categorical, text, and so on) live under `Settings.vars`.

```python { .api }
class VariablesConfig:
    """Per-column metadata (Settings.variables)."""

    # Column name -> description shown in the report
    descriptions: dict = {}

class VarsConfig:
    """Per-type analysis settings (Settings.vars)."""

    num: NumVarsConfig
    cat: CatVarsConfig
    bool: BoolVarsConfig
    text: TextVarsConfig
    file: FileVarsConfig
    path: PathVarsConfig
    image: ImageVarsConfig
    url: UrlVarsConfig
    timeseries: TimeseriesVarsConfig
```

```python { .api }
class NumVarsConfig:
    """Numeric variables configuration."""

    quantiles: List[float] = [0.05, 0.25, 0.5, 0.75, 0.95]
    low_categorical_threshold: int = 5
    chi_squared_threshold: float = 0.999
    skewness_threshold: int = 20

class CatVarsConfig:
    """Categorical variables configuration."""

    length: bool = True
    characters: bool = True
    words: bool = True
    cardinality_threshold: int = 50
    redact: bool = False

class TextVarsConfig:
    """Text variables configuration."""

    length: bool = True
    characters: bool = True
    words: bool = True
    redact: bool = False
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Describe individual columns ("amount" is an example column name)
config.variables.descriptions = {"amount": "Transaction amount in USD"}

# Configure numeric variables
config.vars.num.low_categorical_threshold = 10
config.vars.num.skewness_threshold = 15

# Configure categorical variables
config.vars.cat.cardinality_threshold = 100
config.vars.cat.length = True

# Configure text variables
config.vars.text.redact = True  # Hide text values in the report

report = ProfileReport(df, config=config)
```
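
To make the thresholds concrete, here is a minimal sketch of threshold-based column alerts. It assumes a numeric column is flagged when its absolute skewness exceeds `skewness_threshold`, and a categorical column when its number of distinct values exceeds `cardinality_threshold`. `skewness` and `column_alerts` are hypothetical helpers; the library's own estimator and alert logic may differ.

```python
from statistics import fmean


def skewness(values: list[float]) -> float:
    """Biased sample skewness m3 / m2**1.5 (0.0 for a constant column)."""
    mean = fmean(values)
    m2 = fmean((x - mean) ** 2 for x in values)
    m3 = fmean((x - mean) ** 3 for x in values)
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def column_alerts(values, kind: str, skewness_threshold: float = 20,
                  cardinality_threshold: int = 50) -> list[str]:
    """Hypothetical alert check driven by the two thresholds above."""
    alerts = []
    if kind == "num" and abs(skewness(values)) > skewness_threshold:
        alerts.append("SKEWED")
    if kind == "cat" and len(set(values)) > cardinality_threshold:
        alerts.append("HIGH_CARDINALITY")
    return alerts


# 999 zeros and a single 1.0: skewness is about 31.6, above the default of 20
print(column_alerts([0.0] * 999 + [1.0], "num"))             # ['SKEWED']
print(column_alerts([f"id{i}" for i in range(60)], "cat"))  # ['HIGH_CARDINALITY']
print(column_alerts(["a", "b"] * 30, "cat"))                # []
```

Raising `skewness_threshold` or `cardinality_threshold`, as in the usage example, simply makes these comparisons harder to trigger.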

### Correlation Configuration

Configuration for correlation analysis. `Settings.correlations` is a dictionary keyed by method name, so entries are accessed as `config.correlations["pearson"]`, not with attribute syntax. By default only `"auto"` is calculated.

```python { .api }
# Settings.correlations
correlations: Dict[str, CorrelationConfig] = {
    "auto": CorrelationConfig(calculate=True),
    "pearson": CorrelationConfig(calculate=False),
    "spearman": CorrelationConfig(calculate=False),
    "kendall": CorrelationConfig(calculate=False),
    "phi_k": CorrelationConfig(calculate=False),
    "cramers": CorrelationConfig(calculate=False),
}

class CorrelationConfig:
    """Individual correlation method configuration."""

    calculate: bool = True
    warn_high_correlations: int = 10
    threshold: float = 0.5    # coefficient above which columns are reported as highly correlated
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Enable/disable specific correlation methods
config.correlations["pearson"].calculate = True
config.correlations["spearman"].calculate = True
config.correlations["kendall"].calculate = False

# Set correlation thresholds
config.correlations["pearson"].threshold = 0.8
config.correlations["auto"].threshold = 0.8

report = ProfileReport(df, config=config)
```

### Plot Configuration

Configuration for visualizations and plotting options. `histogram`, `correlation`, and `missing` are nested objects: set their fields individually instead of replacing them with dictionaries.

```python { .api }
class PlotConfig:
    """Configuration for plot generation (Settings.plot)."""

    # Nested plot settings
    histogram: HistogramConfig            # bins, max_bins, ...
    correlation: CorrelationPlotConfig    # cmap, bad
    missing: MissingPlotConfig            # missing-value plot settings

    # Image settings
    dpi: int = 800                        # resolution of PNG images
    image_format: str = "svg"             # "svg" or "png"
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Configure plot resolution
config.plot.dpi = 300

# Configure histogram settings
config.plot.histogram.bins = 50
config.plot.histogram.max_bins = 250

# Configure correlation plots
config.plot.correlation.cmap = "RdYlBu_r"
config.plot.correlation.bad = "#000000"

# update() re-validates values, the safest way to switch image_format
config = config.update({"plot": {"image_format": "png"}})

report = ProfileReport(df, config=config)
```
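
`bins` and `max_bins` work together. Below is a minimal sketch of one plausible bin-selection rule. It assumes that `bins = 0` requests automatic selection and that `max_bins` caps the result. Sturges' rule stands in for whatever automatic estimator the library uses, and `choose_bin_count` is a hypothetical helper.

```python
import math


def choose_bin_count(n_values: int, n_unique: int, bins: int = 50, max_bins: int = 250) -> int:
    """Hypothetical helper: pick a histogram bin count from plot.histogram settings.

    Assumptions: bins > 0 is a fixed count (never more than the number of
    distinct values); bins == 0 means "choose automatically", approximated
    here with Sturges' rule; the result is always capped at max_bins.
    """
    if bins > 0:
        count = min(bins, n_unique)
    else:
        count = math.ceil(math.log2(n_values)) + 1 if n_values > 0 else 1
    return max(1, min(count, max_bins))


print(choose_bin_count(1000, 900))            # 50: the fixed count is used
print(choose_bin_count(1000, 10))             # 10: limited by distinct values
print(choose_bin_count(1000, 900, bins=0))    # 11: Sturges' rule
print(choose_bin_count(1000, 900, bins=400))  # 250: capped at max_bins
```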

### HTML Configuration

Configuration for HTML report generation and styling.

```python { .api }
class HtmlConfig:
    """Configuration for HTML report generation (Settings.html)."""

    # Report structure
    minify_html: bool = True
    use_local_assets: bool = True
    inline: bool = True

    # Navigation and layout
    navbar_show: bool = True
    full_width: bool = False

    # Styling (nested object)
    style: StyleConfig    # primary_colors, logo, ...
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Configure HTML output
config.html.minify_html = False  # Keep HTML readable
config.html.full_width = True    # Use full browser width
config.html.navbar_show = True   # Show navigation bar

# Custom styling: set fields on the nested style object
config.html.style.primary_colors = ["#337ab7", "#e41a1c", "#4daf4a"]
config.html.style.logo = "https://company.com/logo.png"

report = ProfileReport(df, config=config)
```
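
Assuming, as the list name suggests, that the report's primary color is the first entry of `html.style.primary_colors`, a top-level `primary_color` shortcut only needs to delegate to the nested style. A self-contained sketch of that pattern with hypothetical dataclasses (`Style`, `Html`, `Config`) that mirror the nesting rather than import the library's classes:

```python
from dataclasses import dataclass, field


@dataclass
class Style:
    # Hypothetical mirror of html.style; the first color is the primary one
    primary_colors: list[str] = field(default_factory=lambda: ["#337ab7", "#e41a1c"])

    @property
    def primary_color(self) -> str:
        return self.primary_colors[0]


@dataclass
class Html:
    style: Style = field(default_factory=Style)


@dataclass
class Config:
    html: Html = field(default_factory=Html)

    @property
    def primary_color(self) -> str:
        # Backward-compatible shortcut: delegate to html.style
        return self.html.style.primary_color


cfg = Config()
cfg.html.style.primary_colors = ["#ff7f0e", "#1f77b4"]
print(cfg.primary_color)  # #ff7f0e
```

Because the shortcut is read-only and derived, changing `primary_colors` is enough; there is no second field to keep in sync.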

### Spark Configuration

Spark DataFrames are profiled with `SparkSettings`, a `Settings` subclass with Spark-oriented defaults (detailed in the SparkSettings Class section). Cluster resources such as executor memory and cores are not profiling options. They are Spark properties and are set on the `SparkSession`.

```python { .api }
class SparkSettings(Settings):
    def __init__(self, **kwargs):
        """
        Initialize Spark-specific profiling configuration.

        Parameters:
        - **kwargs: configuration parameters, as accepted by Settings
        """
```

**Usage Example:**

```python
from pyspark.sql import SparkSession
from ydata_profiling import ProfileReport
from ydata_profiling.config import SparkSettings

# Executor resources are Spark properties, configured on the session builder
spark = (
    SparkSession.builder.appName("Profiling")
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "4")
    .getOrCreate()
)
spark_df = spark.read.csv("large_dataset.csv", header=True, inferSchema=True)

# Profiling options go in SparkSettings
spark_config = SparkSettings()
spark_config.title = "Spark Dataset Analysis"

report = ProfileReport(spark_df, config=spark_config)
```

### Configuration Files

Settings can be stored in a YAML file whose keys mirror the `Settings` attribute paths.

**Example Configuration File (`config.yaml`):**

```yaml
title: "Production Data Report"
pool_size: 8
progress_bar: true

dataset:
  description: "Customer transaction dataset"
  creator: "Data Engineering Team"

vars:
  num:
    low_categorical_threshold: 10
    skewness_threshold: 20
  cat:
    cardinality_threshold: 50
  text:
    redact: false

correlations:
  pearson:
    calculate: true
    threshold: 0.9
  spearman:
    calculate: true
  kendall:
    calculate: false

plot:
  dpi: 300
  image_format: "png"

html:
  minify_html: true
  full_width: false
```

**Usage with Configuration File:**

```python
from ydata_profiling import ProfileReport

# Load configuration from file
report = ProfileReport(df, config_file="config.yaml")
report.to_file("production_report.html")
```
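
The YAML keys mirror the attribute paths used in code: `config.vars.num.low_categorical_threshold` in Python corresponds to the nested keys `vars` → `num` → `low_categorical_threshold` in the file, and the same nested shape is what `Settings.update` accepts. A small sketch of that correspondence, with a hypothetical `nest` helper that converts dotted paths into the nested dictionary:

```python
from typing import Any


def nest(flat: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: {"a.b.c": 1} -> {"a": {"b": {"c": 1}}}.

    Assumes no path is also a prefix of another path that holds a leaf value.
    """
    nested: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for key in parents:
            # Reuse the existing section or create an empty one
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


overrides = nest({
    "title": "Production Data Report",
    "vars.num.low_categorical_threshold": 10,
    "vars.cat.cardinality_threshold": 50,
    "plot.dpi": 300,
})
print(overrides)
# {'title': 'Production Data Report', 'vars': {'num': {'low_categorical_threshold': 10}, 'cat': {'cardinality_threshold': 50}}, 'plot': {'dpi': 300}}
```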

### SparkSettings Class

Specialized configuration class for Spark DataFrames, with performance-focused defaults.

```python { .api }
class SparkSettings(Settings):
    """
    Specialized Settings class for Spark DataFrames.

    Inherits from Settings, with performance-focused defaults that disable
    computationally expensive operations for large-scale Spark datasets.
    """

    # Performance optimizations
    infer_dtypes: bool = False
    correlations: Dict[str, CorrelationConfig] = {
        "spearman": CorrelationConfig(calculate=True),
        "pearson": CorrelationConfig(calculate=True),
        "auto": CorrelationConfig(calculate=False),  # Disabled for performance
        "phi_k": CorrelationConfig(calculate=False),
        "cramers": CorrelationConfig(calculate=False),
        "kendall": CorrelationConfig(calculate=False),
    }

    # Disabled heavy computations
    interactions: InteractionsConfig   # interactions.continuous = False
    missing_diagrams: Dict[str, bool] = {
        "bar": False,
        "matrix": False,
        "dendrogram": False,
        "heatmap": False,
    }

    # Reduced sampling
    samples: SamplesConfig             # samples.tail = 0, samples.random = 0
```

**Usage Example:**

```python
from pyspark.sql import SparkSession
from ydata_profiling import ProfileReport
from ydata_profiling.config import SparkSettings

# Create Spark DataFrame
spark = SparkSession.builder.appName("Profiling").getOrCreate()
spark_df = spark.read.csv("large_dataset.csv", header=True, inferSchema=True)

# Use SparkSettings for better performance on large data
config = SparkSettings()
config.title = "Large Dataset Analysis"

report = ProfileReport(spark_df, config=config)
report.to_file("spark_report.html")
```

### Configuration Methods

Methods for loading, updating, and inspecting configuration settings.

```python { .api }
class Settings:
    def update(self, updates: dict) -> "Settings":
        """
        Merge updates into the existing configuration.

        Parameters:
        - updates: nested dictionary with configuration updates

        Returns:
        A new Settings instance with the updates applied
        """

    @staticmethod
    def from_file(config_file: Union[Path, str]) -> "Settings":
        """
        Create Settings from a YAML configuration file.

        Parameters:
        - config_file: path to YAML configuration file

        Returns:
        Settings instance with loaded configuration
        """

    @property
    def primary_color(self) -> str:
        """
        Get the primary report color (kept for backward compatibility).

        Returns:
        Primary color from the html.style configuration
        """
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

# Load from file
config = Settings.from_file("custom_config.yaml")

# Update specific settings
updates = {
    "title": "Updated Report Title",
    "plot": {
        "dpi": 600,
        "image_format": "png"
    },
    "vars": {
        "cat": {
            "redact": True
        }
    }
}

updated_config = config.update(updates)

# Use updated configuration
report = ProfileReport(df, config=updated_config)
```
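
`update` merges nested dictionaries rather than replacing whole sections, so in the example above the `plot` fields that are not mentioned keep their current values. A minimal sketch of that recursive merge on plain dictionaries; `merge` is a hypothetical stand-in, not the library's internal helper:

```python
import copy
from typing import Any


def merge(base: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `updates` merged into `base`, recursing into sections."""
    result = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are sections: merge them key by key
            result[key] = merge(result[key], value)
        else:
            # Leaf value (or a section that did not exist before): the update wins
            result[key] = copy.deepcopy(value)
    return result


defaults = {"title": "Report", "plot": {"dpi": 800, "image_format": "svg", "histogram": {"bins": 50}}}
merged = merge(defaults, {"plot": {"dpi": 600, "image_format": "png"}})
print(merged["plot"])           # {'dpi': 600, 'image_format': 'png', 'histogram': {'bins': 50}}
print(defaults["plot"]["dpi"])  # 800: the input dictionary is not modified
```

Returning a new dictionary instead of mutating `base` matches the `update` signature above, which returns a `Settings` instance rather than `None`.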

### Preset Configurations

Built-in configuration presets for common use cases.

**Built-in Presets:**

```python
from ydata_profiling import ProfileReport

# Minimal mode - fast profiling with reduced computation
ProfileReport(df, minimal=True)

# Explorative mode - extended analysis with additional features enabled
ProfileReport(df, explorative=True)

# Sensitive mode - privacy-aware profiling
ProfileReport(df, sensitive=True)

# Time-series mode - specialized for time-series data ("timestamp" is the time column)
ProfileReport(df, tsmode=True, sortby="timestamp")
```

**Preset Details:**

- **Minimal**: Disables correlations, missing diagrams, and type inference for speed
- **Explorative**: Enables character and word analysis for text, analysis of URL, path, file, and image columns, and deep memory-usage measurement
- **Sensitive**: Redacts categorical/text values and hides sample and duplicate rows
- **Time-series**: Enables autocorrelation analysis and sorts rows by the `sortby` column
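
In time-series mode, autocorrelation compares a column with a lagged copy of itself once the rows are ordered by the `sortby` column. A minimal standard-library sketch of the usual sample autocorrelation estimator at a single lag; `autocorrelation` is a hypothetical helper, not ydata-profiling's implementation:

```python
from statistics import fmean


def autocorrelation(values: list[float], lag: int) -> float:
    """Sample autocorrelation at `lag`: lagged autocovariance divided by variance."""
    if not 0 < lag < len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = fmean(values)
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant series: report no autocorrelation
    num = sum(dev[i] * dev[i + lag] for i in range(len(values) - lag))
    return num / denom


# Order rows by timestamp first, as sortby= does (ISO dates sort correctly as strings)
rows = [("2024-01-03", 3.0), ("2024-01-01", 1.0), ("2024-01-04", 4.0), ("2024-01-02", 2.0)]
series = [value for _, value in sorted(rows)]
print(series)                      # [1.0, 2.0, 3.0, 4.0]
print(autocorrelation(series, 1))  # 0.25
```

Sorting matters: the same values in arrival order would give a different lag-1 result, which is why time-series mode needs a `sortby` column.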