Tessl Tile for pypi/pandas-profiling@3.6.0

# pandas-profiling

A Python library that provides one-line Exploratory Data Analysis (EDA) for pandas DataFrames. It generates detailed profile reports, including statistical summaries, data quality warnings, correlations, and visualizations, that go well beyond basic `df.describe()` output.

## Package Information

- **Package Name**: pandas-profiling
- **Language**: Python
- **Installation**: `pip install pandas-profiling`
- **Optional extras**: `pip install "pandas-profiling[notebook,unicode]"` (quote the argument in shells such as zsh that expand brackets)

## Core Imports

```python
from pandas_profiling import ProfileReport
```

For dataset comparison:

```python
from pandas_profiling import compare
```

To enable the `DataFrame.profile_report()` method (importing the package in any form registers it):

```python
import pandas_profiling  # Adds a profile_report() method to pandas DataFrames
```

For configuration:

```python
from pandas_profiling.config import Settings
```

## Basic Usage

```python
import pandas as pd
from pandas_profiling import ProfileReport

# Load your data
df = pd.read_csv('your_data.csv')

# Generate the profile report (computed lazily, on first output)
profile = ProfileReport(df, title="Data Profile Report")

# Display as interactive widgets inside a Jupyter notebook
profile.to_widgets()

# Or export to an HTML file
profile.to_file("profile_report.html")

# Or get the report as a JSON string
json_data = profile.to_json()
```

## Architecture

pandas-profiling is built around a modular architecture:

- **ProfileReport**: Central class that orchestrates data analysis and report generation
- **Configuration System**: Settings management through the `Settings` class and its nested configuration models
- **Analysis Pipeline**: Automated type inference, statistical analysis, and visualization generation
- **Export System**: Multiple output formats (HTML, JSON, Jupyter widgets)
- **pandas Integration**: Adds a `profile_report()` method to `DataFrame` when the package is imported

## Types

```python { .api }
from typing import Any, Dict, List, Optional, Union, Tuple
from pathlib import Path
import pandas as pd
from visions import VisionsTypeset

# Key classes from pandas_profiling
class Settings: ...  # Configuration management class
class BaseSummarizer: ...  # Summary generation interface
```

## Capabilities

### Profile Report Generation

The core functionality: creating a comprehensive data analysis report from a pandas DataFrame.

```python { .api }
class ProfileReport:
    def __init__(
        self,
        df: Optional[pd.DataFrame] = None,
        minimal: bool = False,
        explorative: bool = False,
        sensitive: bool = False,
        dark_mode: bool = False,
        orange_mode: bool = False,
        tsmode: bool = False,
        sortby: Optional[str] = None,
        sample: Optional[dict] = None,
        config_file: Optional[Union[Path, str]] = None,
        lazy: bool = True,
        typeset: Optional[VisionsTypeset] = None,
        summarizer: Optional[BaseSummarizer] = None,
        config: Optional[Settings] = None,
        **kwargs,
    ):
        """
        Generate a ProfileReport based on a pandas DataFrame.

        Parameters:
        - df: pandas DataFrame to analyze
        - minimal: use a minimal configuration that skips expensive computations, for faster processing
        - explorative: enable additional, more expensive analyses
        - sensitive: privacy-aware mode that hides raw values (such as sample and duplicate rows) from the report
        - dark_mode: apply dark theme styling
        - orange_mode: apply orange theme styling
        - tsmode: enable time series analysis mode
        - sortby: name of the column used to order rows in time series mode
        - sample: optional custom sample to display, as a dict with keys "name", "caption", and "data" (a DataFrame)
        - config_file: path to a YAML configuration file
        - lazy: if True (default), defer computation until results are first needed
        - typeset: custom visions typeset used for type inference
        - summarizer: custom summary generation system
        - config: Settings object for configuration
        - **kwargs: additional configuration options, e.g. title="My Report"
        """
```

### Report Export and Display

Methods for outputting and displaying the generated profile report.

```python { .api }
class ProfileReport:
    def to_file(self, output_file: Union[str, Path], silent: bool = True) -> None:
        """
        Export the report to an HTML or JSON file.

        Parameters:
        - output_file: path of the output file; a .json extension produces JSON,
          .html produces HTML, and any other extension is replaced by .html (with a warning)
        - silent: if False, also open the generated file in the default web browser
          (or download it in Google Colab)
        """

    def to_html(self) -> str:
        """
        Get the HTML representation of the report.

        Returns:
        str: Complete HTML report as a string
        """

    def to_json(self) -> str:
        """
        Get the JSON representation of the report.

        Returns:
        str: Complete report data as a JSON string
        """

    def to_widgets(self) -> None:
        """
        Display the report as interactive Jupyter widgets (requires ipywidgets).
        """

    def to_notebook_iframe(self) -> None:
        """
        Display the report as an embedded HTML iframe in a Jupyter notebook.
        """
```
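
`to_file()` chooses the output format from the file extension. The rule can be sketched in plain Python as follows, assuming `.json` produces JSON, `.html` produces HTML, and any other (or missing) extension falls back to HTML with the suffix rewritten to `.html`. `resolve_output` is a hypothetical helper, not part of pandas-profiling, and the real method additionally warns about unsupported extensions.

```python
from pathlib import Path
from typing import Tuple, Union


def resolve_output(output_file: Union[str, Path]) -> Tuple[str, Path]:
    """Hypothetical sketch: pick the export format and final path from the extension."""
    path = Path(output_file)
    if path.suffix == ".json":
        return "json", path
    if path.suffix != ".html":
        # Unsupported or missing extension: assume HTML was intended.
        path = path.with_suffix(".html")
    return "html", path


print(resolve_output("report.json"))
print(resolve_output("report.txt"))
```

Keying the decision on the suffix alone keeps the single `to_file()` entry point while still allowing both output formats.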

### Data Access and Analysis

Methods for accessing specific analysis results.

```python { .api }
class ProfileReport:
    def get_description(self) -> dict:
        """
        Get the complete analysis description dictionary.

        Returns:
        dict: Complete analysis results and metadata
        """

    def get_duplicates(self) -> Optional[pd.DataFrame]:
        """
        Get a DataFrame of the duplicate rows.

        Returns:
        DataFrame or None: Duplicate rows, if any exist
        """

    def get_sample(self) -> dict:
        """
        Get the sample data information.

        Returns:
        dict: Sample data with metadata
        """

    def get_rejected_variables(self) -> set:
        """
        Get the set of variable names that were rejected from analysis.

        Returns:
        set: Variable names excluded from the report
        """
```
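
`get_duplicates()` reports rows that occur more than once. As a mental model only (not the library's implementation, which works on the DataFrame and is controlled by the duplicates settings), the same idea in plain Python: count identical rows and keep those seen more than once. `duplicate_rows` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

Row = Tuple[Hashable, ...]


def duplicate_rows(rows: Sequence[Row]) -> Dict[Row, int]:
    """Return each row that occurs more than once, with its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("a", 1), ("b", 2)]
print(duplicate_rows(rows))  # {('a', 1): 3, ('b', 2): 2}
```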

### Report Comparison

Functionality for comparing multiple datasets and generating comparison reports.

```python { .api }
def compare(
    reports: List[ProfileReport],
    config: Optional[Settings] = None,
    compute: bool = False,
) -> ProfileReport:
    """
    Compare multiple ProfileReport objects.

    Parameters:
    - reports: list of ProfileReport objects to compare
    - config: optional Settings object for the merged report
    - compute: if True, recompute the profiles using config (recommended when the reports were generated with different settings)

    Returns:
    ProfileReport: Comparison report highlighting differences and similarities
    """

class ProfileReport:
    def compare(
        self,
        other: "ProfileReport",
        config: Optional[Settings] = None,
    ) -> "ProfileReport":
        """
        Compare this report with another ProfileReport.

        Parameters:
        - other: ProfileReport object to compare against
        - config: optional Settings object for the merged report

        Returns:
        ProfileReport: Comparison report
        """
```
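
A comparison is only meaningful if every dataset is summarized under the same settings, so that statistics line up column by column; that is the motivation for `compute=True` when reports were built with different settings. A minimal plain-Python sketch of that idea follows. The `summarize` and `compare_summaries` helpers are hypothetical, compute only two statistics, and are not how pandas-profiling builds its comparison report.

```python
from statistics import mean
from typing import Dict, List, Optional

Dataset = Dict[str, List[Optional[float]]]  # column name -> values (None = missing)


def summarize(data: Dataset) -> Dict[str, Dict[str, float]]:
    """Tiny per-column summary: missing count and mean of the present values."""
    summary = {}
    for column, values in data.items():
        present = [v for v in values if v is not None]
        summary[column] = {
            "n_missing": len(values) - len(present),
            "mean": mean(present) if present else float("nan"),
        }
    return summary


def compare_summaries(before: Dataset, after: Dataset) -> Dict[str, Dict[str, tuple]]:
    """Place the same statistic for both datasets side by side, per column."""
    left, right = summarize(before), summarize(after)
    result = {}
    for column in sorted(set(left) | set(right)):
        stats = set(left.get(column, {})) | set(right.get(column, {}))
        result[column] = {
            stat: (left.get(column, {}).get(stat), right.get(column, {}).get(stat))
            for stat in sorted(stats)
        }
    return result


before = {"age": [30.0, None, 50.0]}
after = {"age": [30.0, 40.0, 50.0]}
print(compare_summaries(before, after))
# {'age': {'mean': (40.0, 40.0), 'n_missing': (1, 0)}}
```

Columns present in only one dataset show `None` on the other side rather than being dropped.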

### Configuration Management

A configuration system for customizing analysis and report generation.

```python { .api }
class Settings:
    def __init__(self):
        """
        Create a new Settings configuration object with default values.
        """

    def update(self, updates: dict) -> "Settings":
        """
        Update the configuration with new values.

        Parameters:
        - updates: nested dictionary of configuration updates

        Returns:
        Settings: New Settings object with the updated values
        """

    @classmethod
    def from_file(cls, config_file: Union[Path, str]) -> "Settings":
        """
        Load the configuration from a YAML file.

        Parameters:
        - config_file: path to a YAML configuration file

        Returns:
        Settings: Configuration loaded from the file
        """

class Config:
    @staticmethod
    def get_arg_groups(key: str) -> dict:
        """
        Get a predefined configuration group.

        Parameters:
        - key: configuration group name ('sensitive', 'explorative', 'dark_mode', 'orange_mode')

        Returns:
        dict: Configuration dictionary for the specified group
        """

    @staticmethod
    def shorthands(kwargs: dict, split: bool = True) -> Tuple[dict, dict]:
        """
        Expand configuration shorthands.

        Parameters:
        - kwargs: configuration dictionary that may contain shorthands
        - split: whether to separate shorthand entries from regular entries

        Returns:
        tuple: (shorthand_config, regular_config) dictionaries
        """
```
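
`Config.shorthands()` lets a bare `None` stand in for a whole configuration section. The following is a simplified sketch of that expansion, assuming that `samples=None` means "show no sample rows" and `duplicates=None` means "show no duplicate rows". The `SHORTHANDS` table and the `expand_shorthands` helper are hypothetical; the real table in pandas-profiling may contain different keys and values.

```python
import copy
from typing import Any, Dict, Tuple

# Hypothetical expansion table: what a ``None`` shorthand stands for.
SHORTHANDS: Dict[str, Dict[str, Any]] = {
    "samples": {"head": 0, "tail": 0, "random": 0},
    "duplicates": {"head": 0},
}


def expand_shorthands(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (expanded shorthands, regular settings)."""
    shorthand, regular = {}, {}
    for key, value in kwargs.items():
        if value is None and key in SHORTHANDS:
            # e.g. samples=None becomes an explicit "show nothing" section
            shorthand[key] = copy.deepcopy(SHORTHANDS[key])
        else:
            regular[key] = value
    return shorthand, regular


short, rest = expand_shorthands({"samples": None, "title": "Report"})
print(short)  # {'samples': {'head': 0, 'tail': 0, 'random': 0}}
print(rest)   # {'title': 'Report'}
```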

### DataFrame Integration

Automatic extension of pandas DataFrame with profiling functionality.

```python { .api }
# Added to pandas.DataFrame when pandas_profiling is imported
class DataFrame:
    def profile_report(self, **kwargs) -> ProfileReport:
        """
        Generate a ProfileReport for this DataFrame.

        Parameters:
        - **kwargs: arguments passed to the ProfileReport constructor

        Returns:
        ProfileReport: Analysis report for this DataFrame
        """
```
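
The extension works by attaching a plain function to the `DataFrame` class at import time, so every DataFrame gains the method. The following sketch reproduces that pattern with hypothetical stand-in classes (`Table` for `pandas.DataFrame`, `Report` for `ProfileReport`) so that it runs without pandas. It illustrates the mechanism and is not the library's source.

```python
class Table:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, rows):
        self.rows = rows


class Report:
    """Hypothetical stand-in for ProfileReport."""

    def __init__(self, table, **kwargs):
        self.table = table
        self.options = kwargs


def profile_report(table, **kwargs):
    """Module-level function that becomes a method once attached to the class."""
    return Report(table, **kwargs)


# Module-level assignment: runs once when this module is imported,
# after which every Table instance has a profile_report() method.
Table.profile_report = profile_report

report = Table([(1, "a")]).profile_report(title="Demo")
print(report.options)  # {'title': 'Demo'}
```

Because the function is assigned to the class rather than to instances, Python binds the instance as the first argument, just as for a method defined in the class body.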

### Cache Management

Methods for managing cached analysis results.

```python { .api }
class ProfileReport:
    def invalidate_cache(self, subset: Optional[str] = None) -> None:
        """
        Clear cached computations to force recomputation, e.g. after changing settings.

        Parameters:
        - subset: "rendering" clears the rendered HTML, JSON, and widget output;
          "report" additionally clears the report structure; None (default) clears all caches
        """
```
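
With `lazy=True`, nothing is computed until an output is requested, and each stage is cached so later outputs reuse it. `invalidate_cache()` drops some or all of those layers. The hypothetical `LazyReport` class below sketches the pattern with three layers (analysis, report structure, rendering). The subset names follow the description of `invalidate_cache()`; the internals are an assumption, not pandas-profiling code.

```python
from typing import Callable, List, Optional


class LazyReport:
    """Hypothetical stand-in showing lazy computation with layered caches."""

    def __init__(self, analyse: Callable[[], dict]):
        self._analyse = analyse
        self.calls: List[str] = []                # records which stages actually ran
        self._description: Optional[dict] = None  # analysis results
        self._structure: Optional[list] = None    # report layout built from them
        self._html: Optional[str] = None          # final rendering

    def get_description(self) -> dict:
        if self._description is None:  # computed on first use only
            self._description = self._analyse()
            self.calls.append("analyse")
        return self._description

    def _get_structure(self) -> list:
        if self._structure is None:
            self._structure = sorted(self.get_description().items())
            self.calls.append("structure")
        return self._structure

    def to_html(self) -> str:
        if self._html is None:
            items = "".join(f"<li>{k}: {v}</li>" for k, v in self._get_structure())
            self._html = f"<ul>{items}</ul>"
            self.calls.append("render")
        return self._html

    def invalidate_cache(self, subset: Optional[str] = None) -> None:
        if subset not in (None, "rendering", "report"):
            raise ValueError("subset must be None, 'rendering' or 'report'")
        self._html = None                  # every option drops the rendering
        if subset in (None, "report"):
            self._structure = None         # "report" also drops the layout
        if subset is None:
            self._description = None       # None drops the analysis as well


report = LazyReport(lambda: {"n_rows": 3})
report.to_html()
report.to_html()                      # served from cache: nothing reruns
report.invalidate_cache("rendering")
report.to_html()                      # only the rendering step reruns
print(report.calls)  # ['analyse', 'structure', 'render', 'render']
```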

## Configuration Options

The Settings class is organized into nested configuration models. Each model corresponds to a section of the nested dictionary accepted by `Settings.update()` and by YAML configuration files; for example, numerical variable settings live under `vars.num` and categorical ones under `vars.cat`.

### Variable Analysis Configuration
- **NumVars**: Numerical variable analysis settings (quantiles, thresholds)
- **CatVars**: Categorical variable analysis settings (length, character analysis)
- **BoolVars**: Boolean variable analysis settings
- **TimeseriesVars**: Time series analysis configuration
- **FileVars**: File path analysis settings
- **PathVars**: Path analysis settings
- **ImageVars**: Image analysis settings
- **UrlVars**: URL analysis settings

### Visualization Configuration
- **Plot**: General plotting configuration
- **Histogram**: Histogram visualization settings
- **CorrelationPlot**: Correlation plot settings
- **MissingPlot**: Missing data visualization
- **Html**: HTML output formatting
- **Style**: Visual styling and themes

### Analysis Configuration
- **Correlations**: Correlation analysis settings
- **Duplicates**: Duplicate detection configuration
- **Interactions**: Variable interaction analysis
- **Samples**: Data sampling configuration
- **Variables**: General variable analysis settings

### Output Configuration
- **Notebook**: Jupyter notebook integration settings
- **Iframe**: HTML iframe configuration

## Enums and Constants

```python { .api }
from enum import Enum

class Theme(Enum):
    """Available visual themes for reports."""
    flatly = "flatly"
    united = "united"
    # Additional theme values available

class ImageType(Enum):
    """Supported image output formats."""
    png = "png"
    svg = "svg"

class IframeAttribute(Enum):
    """HTML iframe attribute options."""
    srcdoc = "srcdoc"
    src = "src"
```
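
Configuration values such as the theme are usually written as plain strings, in YAML files or `update()` dictionaries. Looking up an `Enum` member by its value converts such a string and rejects typos. The sketch below mirrors two of the `Theme` members listed above. `parse_theme` is hypothetical; in pandas-profiling this conversion is assumed to be handled by the settings models themselves.

```python
from enum import Enum


class Theme(Enum):
    """Mirror of two members listed above; the real enum defines more."""
    flatly = "flatly"
    united = "united"


def parse_theme(value: str) -> Theme:
    """Convert a configuration string to a Theme member, with a helpful error."""
    try:
        return Theme(value)
    except ValueError:
        allowed = ", ".join(t.value for t in Theme)
        raise ValueError(f"unknown theme {value!r}; expected one of: {allowed}") from None


print(parse_theme("united"))  # Theme.united
```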

## Usage Examples

### Time Series Analysis

```python
import pandas as pd
from pandas_profiling import ProfileReport

# Load time series data and parse the date column
df = pd.read_csv('timeseries_data.csv')
df['date'] = pd.to_datetime(df['date'])

# Generate a time series report, ordering rows by date
profile = ProfileReport(
    df,
    title="Time Series Analysis",
    tsmode=True,
    sortby='date'
)
profile.to_file("timeseries_report.html")
```
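
Row order matters in time series mode, which is why `sortby` exists: statistics such as autocorrelation compare each observation with the one before it. The following plain-Python sketch computes the lag-1 autocorrelation of the same values before and after sorting by date, and the sign flips. `lag1_autocorrelation` is a hypothetical helper, not pandas-profiling code.

```python
from datetime import date
from statistics import mean
from typing import List, Tuple


def lag1_autocorrelation(values: List[float]) -> float:
    """Lag-1 sample autocorrelation: how strongly each value tracks the previous one."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((values[i] - m) * (values[i + 1] - m) for i in range(len(values) - 1))
    return num / denom


# Rows as they might arrive from a CSV: not in chronological order.
records: List[Tuple[date, float]] = [
    (date(2023, 1, 3), 3.0),
    (date(2023, 1, 1), 1.0),
    (date(2023, 1, 4), 4.0),
    (date(2023, 1, 2), 2.0),
]
unordered = [value for _, value in records]
ordered = [value for _, value in sorted(records)]  # the effect of sorting by date

print(lag1_autocorrelation(unordered))  # -0.75
print(lag1_autocorrelation(ordered))    # 0.25
```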

### Sensitive Data Handling

```python
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv('your_data.csv')

# Generate a privacy-aware report
profile = ProfileReport(
    df,
    title="Sensitive Data Report",
    sensitive=True  # Hides potentially sensitive raw values, such as sample rows
)
profile.to_widgets()
```

### Custom Configuration

```python
import pandas as pd
from pandas_profiling import ProfileReport
from pandas_profiling.config import Settings

df = pd.read_csv('your_data.csv')

# Create a custom configuration; update() returns a new Settings object
config = Settings()
config = config.update({
    'vars': {
        'num': {'quantiles': [0.1, 0.5, 0.9]},
        'cat': {'characters': True, 'words': True}
    },
    'correlations': {
        'pearson': {'threshold': 0.8}
    }
})

profile = ProfileReport(df, config=config)
profile.to_file("custom_report.html")
```
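
`Settings.update()` returns a new object rather than modifying `config` in place, which is why the example reassigns `config`. Nested sections are assumed to be merged key by key, so updating `vars.cat.words` leaves `vars.cat.characters` and `vars.num` untouched. The following is a minimal plain-dict sketch of that merge. `merge` is a hypothetical helper, and the default values shown are illustrative, not necessarily the library's actual defaults.

```python
import copy
from typing import Any, Dict


def merge(base: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested dicts are merged key by key, other values replaced."""
    result = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Illustrative values only, not necessarily the library defaults.
defaults = {"vars": {"num": {"quantiles": [0.05, 0.25, 0.5, 0.75, 0.95]},
                     "cat": {"characters": False, "words": False}}}
updated = merge(defaults, {"vars": {"cat": {"words": True}}})
print(updated["vars"]["cat"])            # {'characters': False, 'words': True}
print(defaults["vars"]["cat"]["words"])  # False: the original is untouched
```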

### Comparing Datasets

```python
import pandas as pd
from pandas_profiling import ProfileReport, compare

# Load the datasets to compare
df_before = pd.read_csv('data_before.csv')
df_after = pd.read_csv('data_after.csv')

# Create a report for each dataset
report1 = ProfileReport(df_before, title="Before Processing")
report2 = ProfileReport(df_after, title="After Processing")

# Generate the comparison report
comparison = compare([report1, report2])
comparison.to_file("comparison_report.html")
```

### Command Line Usage

```bash
# Generate report from CSV file
pandas_profiling --title "My Report" data.csv report.html

# Use custom configuration
pandas_profiling --config_file config.yaml data.csv report.html
```
