# Core Profiling

Primary functionality for generating comprehensive data profile reports from DataFrames, including statistical analysis, data quality assessment, and automated report generation with customizable analysis depth and output formats.

## Capabilities

### ProfileReport Class

Main class for creating comprehensive data profiling reports from pandas or Spark DataFrames with extensive customization options.

```python { .api }
class ProfileReport:
    def __init__(
        self,
        df: Optional[Union[pd.DataFrame, sDataFrame]] = None,
        minimal: bool = False,
        tsmode: bool = False,
        sortby: Optional[str] = None,
        sensitive: bool = False,
        explorative: bool = False,
        sample: Optional[dict] = None,
        config_file: Optional[Union[Path, str]] = None,
        lazy: bool = True,
        typeset: Optional[VisionsTypeset] = None,
        summarizer: Optional[BaseSummarizer] = None,
        config: Optional[Settings] = None,
        type_schema: Optional[dict] = None,
        **kwargs
    ):
        """
        Generate a ProfileReport based on a pandas or spark.sql DataFrame.

        Parameters:
        - df: pandas or spark.sql DataFrame to analyze
        - minimal: use minimal computation mode for faster processing
        - tsmode: activate time-series analysis for numerical variables
        - sortby: column name to sort the dataset by (for time-series mode)
        - sensitive: hide values for categorical/text variables for privacy
        - explorative: enable additional analysis features
        - sample: sampling configuration dictionary
        - config_file: path to a YAML configuration file
        - lazy: defer computation until report generation
        - typeset: custom visions typeset for type inference
        - summarizer: custom statistical summarizer
        - config: Settings object for configuration
        - type_schema: manual type specification dictionary
        - **kwargs: additional configuration parameters
        """
```

**Usage Example:**

```python
import pandas as pd
from ydata_profiling import ProfileReport

# Basic usage
df = pd.read_csv('data.csv')
report = ProfileReport(df, title="My Dataset Report")

# Minimal mode for large datasets
report = ProfileReport(df, minimal=True)

# Time-series analysis
report = ProfileReport(df, tsmode=True, sortby='timestamp')

# Custom configuration
report = ProfileReport(
    df,
    explorative=True,
    sensitive=False,
    title="Detailed Analysis",
    pool_size=4
)
```
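
The `type_schema` parameter overrides automatic type inference for selected columns by mapping column names to semantic type names. A minimal sketch (the column names below are hypothetical, chosen only for illustration):

```python
# Hypothetical columns mapped to the semantic type to enforce,
# bypassing automatic inference for just these columns.
type_schema = {
    "education": "categorical",
    "income": "numeric",
}

# Passed at construction time:
# report = ProfileReport(df, type_schema=type_schema)
```

Columns not listed in the schema are still typed by the normal inference pipeline.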

### Report Generation Methods

Methods for generating and exporting profiling reports in various formats.

```python { .api }
def to_file(self, output_file: Union[str, Path], silent: bool = True) -> None:
    """
    Save the report to an HTML file.

    Parameters:
    - output_file: path where to save the report
    - silent: suppress progress information
    """

def to_html(self) -> str:
    """
    Generate HTML report content as a string.

    Returns:
        Complete HTML report as a string
    """

def to_json(self) -> str:
    """
    Generate a JSON representation of the report.

    Returns:
        JSON string containing all analysis results
    """

def to_notebook_iframe(self) -> None:
    """
    Display the report in a Jupyter notebook iframe.
    """

def to_widgets(self) -> Any:
    """
    Generate interactive Jupyter widgets for the report.

    Returns:
        Widget object for interactive exploration
    """
```

**Usage Example:**

```python
# Generate report
report = ProfileReport(df)

# Export to HTML file
report.to_file("my_report.html")

# Get HTML content as string
html_content = report.to_html()

# Get JSON representation
json_data = report.to_json()

# Display in Jupyter notebook
report.to_notebook_iframe()

# Create interactive widgets
widgets = report.to_widgets()
```
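
When reports are regenerated on a schedule, writing each export to a timestamped file keeps earlier runs around for comparison. A small sketch of the pattern (the helper name is ours, not part of the library):

```python
from datetime import datetime
from pathlib import Path

def timestamped_report_path(stem: str, directory: str = ".") -> Path:
    """Build a unique HTML output path like 'stem_20240101T120000.html'."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
    return Path(directory) / f"{stem}_{stamp}.html"

# Each run then lands in its own file:
# report.to_file(timestamped_report_path("my_report"))
```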

### Data Access Methods

Methods for accessing underlying data and analysis results.

```python { .api }
def get_description(self) -> BaseDescription:
    """
    Get the complete dataset description with all analysis results.

    Returns:
        BaseDescription object containing statistical summaries,
        correlations, missing data patterns, and data quality alerts
    """

def get_duplicates(self) -> Optional[pd.DataFrame]:
    """
    Get duplicate rows from the dataset.

    Returns:
        DataFrame containing all duplicate rows, or None if no duplicates
    """

def get_sample(self) -> dict:
    """
    Get data samples from the dataset.

    Returns:
        Dictionary containing head, tail, and random samples
    """

def get_rejected_variables(self) -> set:
    """
    Get variables that were rejected during analysis.

    Returns:
        Set of column names that were rejected
    """
```

**Usage Example:**

```python
report = ProfileReport(df)

# Get complete analysis description
description = report.get_description()

# Access duplicate rows (may be None when there are no duplicates)
duplicates = report.get_duplicates()
if duplicates is not None:
    print(f"Found {len(duplicates)} duplicate rows")

# Get data samples
samples = report.get_sample()
print("Sample data:", samples['head'])

# Check rejected variables
rejected = report.get_rejected_variables()
if rejected:
    print(f"Rejected variables: {rejected}")
```
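
The duplicate count surfaced by `get_duplicates` can be cross-checked directly against the source DataFrame with pandas. A small sketch (the helper name is ours):

```python
import pandas as pd

def count_duplicate_rows(df: pd.DataFrame) -> int:
    """Number of rows that are exact duplicates of an earlier row."""
    return int(df.duplicated().sum())

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
# Rows 0 and 1 are identical, so one row counts as a duplicate.
print(count_duplicate_rows(df))  # → 1
```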

### Report Management Methods

Methods for managing report state and comparisons.

```python { .api }
def invalidate_cache(self, subset: Optional[str] = None) -> None:
    """
    Clear cached analysis results to force recomputation.

    Parameters:
    - subset: cache subset to invalidate ("rendering", "report", or None for all)
    """

def compare(self, other: 'ProfileReport', config: Optional[Settings] = None) -> 'ProfileReport':
    """
    Compare this report with another ProfileReport.

    Parameters:
    - other: another ProfileReport to compare against
    - config: configuration for comparison analysis

    Returns:
        New ProfileReport containing comparison results
    """
```

**Usage Example:**

```python
# Create reports for two datasets
report1 = ProfileReport(df1, title="Dataset 1")
report2 = ProfileReport(df2, title="Dataset 2")

# Compare reports
comparison = report1.compare(report2)
comparison.to_file("comparison_report.html")

# Force recomputation
report1.invalidate_cache()
updated_html = report1.to_html()
```

### Properties

Key properties for accessing report components and metadata.

```python { .api }
@property
def typeset(self) -> VisionsTypeset:
    """Get the typeset used for data type inference."""

@property
def summarizer(self) -> BaseSummarizer:
    """Get the statistical summarizer used for analysis."""

@property
def description_set(self) -> BaseDescription:
    """Get the complete dataset description."""

@property
def df_hash(self) -> str:
    """Get hash of the source DataFrame."""

@property
def report(self) -> Root:
    """Get the report structure object."""

@property
def html(self) -> str:
    """Get HTML report content."""

@property
def json(self) -> str:
    """Get JSON report content."""

@property
def widgets(self) -> Any:
    """Get report widgets."""
```

**Usage Example:**

```python
report = ProfileReport(df)

# Access report properties
print(f"Report title: {report.config.title}")
print(f"DataFrame hash: {report.df_hash}")

# Access analysis components
typeset = report.typeset
summarizer = report.summarizer
description = report.description_set

# Get report content
html_report = report.html
json_report = report.json
```

### Serialization Methods

Methods for serializing and deserializing ProfileReport objects for storage and transmission. Note that `loads` and `load` are instance methods: they are called on a (possibly empty) ProfileReport instance.

```python { .api }
def dumps(self) -> bytes:
    """
    Serialize the ProfileReport to bytes.

    Returns:
        Serialized ProfileReport as bytes
    """

def loads(self, data: bytes) -> Union['ProfileReport', 'SerializeReport']:
    """
    Deserialize a ProfileReport from bytes.

    Parameters:
    - data: serialized ProfileReport bytes

    Returns:
        Deserialized ProfileReport instance
    """

def dump(self, output_file: Union[Path, str]) -> None:
    """
    Save the serialized ProfileReport to a file.

    Parameters:
    - output_file: path where to save the serialized report
    """

def load(self, load_file: Union[Path, str]) -> Union['ProfileReport', 'SerializeReport']:
    """
    Load a ProfileReport from a serialized file.

    Parameters:
    - load_file: path to serialized report file

    Returns:
        Loaded ProfileReport instance
    """
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport

# Create and serialize report
report = ProfileReport(df, title="My Dataset")

# Serialize to bytes
serialized_bytes = report.dumps()

# Save to file
report.dump("my_report.pkl")

# Load from file (load and loads are instance methods)
loaded_report = ProfileReport().load("my_report.pkl")

# Deserialize from bytes
restored_report = ProfileReport().loads(serialized_bytes)

# Use loaded report
restored_report.to_file("restored_report.html")
```
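
The `dump`/`load` pair makes a simple cache possible: recompute the profile only when no serialized copy exists yet. A sketch of the pattern using stdlib pickle on an arbitrary object (with a real report you would call `report.dump(...)` and `ProfileReport().load(...)` instead):

```python
import pickle
import tempfile
from pathlib import Path

def cached(path: Path, compute):
    """Load a pickled result from path, or compute it and cache it."""
    if path.exists():
        return pickle.loads(path.read_bytes())
    result = compute()
    path.write_bytes(pickle.dumps(result))
    return result

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "stats.pkl"
    first = cached(cache_file, lambda: {"rows": 100})   # computed and cached
    second = cached(cache_file, lambda: {"rows": -1})   # served from cache
    print(second["rows"])  # → 100
```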

### Great Expectations Integration

Integration with Great Expectations for automated data validation and expectation suite generation.

```python { .api }
def to_expectation_suite(
    self,
    suite_name: Optional[str] = None,
    data_context: Optional[Any] = None,
    save_suite: bool = True,
    run_validation: bool = True,
    build_data_docs: bool = True,
    handler: Optional[Handler] = None
) -> Any:
    """
    Generate a Great Expectations expectation suite from profiling results.

    Parameters:
    - suite_name: name for the expectation suite
    - data_context: Great Expectations data context
    - save_suite: whether to save the suite to the data context
    - run_validation: whether to run validation after creating the suite
    - build_data_docs: whether to build data docs after suite creation
    - handler: custom handler for expectation generation

    Returns:
        Great Expectations expectation suite object
    """
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport

# Requires the great_expectations package to be installed

# Create ProfileReport
report = ProfileReport(df, title="Data Validation")

# Generate Great Expectations suite
suite = report.to_expectation_suite(
    suite_name="my_dataset_expectations",
    save_suite=True,
    run_validation=True
)

# The suite can now be used for ongoing data validation
print(f"Created expectation suite with {len(suite.expectations)} expectations")
```