
# pandas-profiling

A Python library that provides comprehensive one-line Exploratory Data Analysis (EDA) for pandas DataFrames. It generates detailed profile reports including statistical summaries, data quality warnings, visualizations, and insights that go far beyond basic `df.describe()` functionality.

## Package Information

- **Package Name**: pandas-profiling
- **Language**: Python
- **Installation**: `pip install pandas-profiling`
- **Optional extras**: `pip install pandas-profiling[notebook,unicode]`

## Core Imports

```python
from pandas_profiling import ProfileReport
```

For dataset comparison:

```python
from pandas_profiling import compare
```

To enable the pandas `DataFrame.profile_report()` method:

```python
import pandas_profiling  # Adds profile_report() method to DataFrames
```

For configuration:

```python
from pandas_profiling.config import Settings
```

## Basic Usage

```python
import pandas as pd
from pandas_profiling import ProfileReport

# Load your data
df = pd.read_csv('your_data.csv')

# Generate profile report
profile = ProfileReport(df, title="Data Profile Report")

# View in Jupyter notebook
profile.to_widgets()

# Or export to HTML file
profile.to_file("profile_report.html")

# Or get as JSON
json_data = profile.to_json()
```

## Architecture

pandas-profiling is built around a modular architecture:

- **ProfileReport**: Central class that orchestrates data analysis and report generation
- **Configuration System**: Flexible settings management through the Settings class and configuration models
- **Analysis Pipeline**: Automated type inference, statistical analysis, and visualization generation
- **Export System**: Multiple output formats (HTML, JSON, Jupyter widgets)
- **pandas Integration**: Automatic DataFrame method extension for seamless workflow integration

## Types

```python { .api }
from typing import Any, Dict, List, Optional, Union, Tuple
from pathlib import Path
import pandas as pd
from visions import VisionsTypeset

# Key classes from pandas_profiling
class Settings: ...  # Configuration management class
class BaseSummarizer: ...  # Summary generation interface
```

## Capabilities

### Profile Report Generation

The core functionality for creating comprehensive data analysis reports from pandas DataFrames.

```python { .api }
class ProfileReport:
    def __init__(
        self,
        df: Optional[pd.DataFrame] = None,
        minimal: bool = False,
        explorative: bool = False,
        sensitive: bool = False,
        dark_mode: bool = False,
        orange_mode: bool = False,
        tsmode: bool = False,
        sortby: Optional[str] = None,
        sample: Optional[dict] = None,
        config_file: Union[Path, str] = None,
        lazy: bool = True,
        typeset: Optional[VisionsTypeset] = None,
        summarizer: Optional[BaseSummarizer] = None,
        config: Optional[Settings] = None,
        **kwargs
    ):
        """
        Generate a ProfileReport based on a pandas DataFrame.

        Parameters:
        - df: pandas DataFrame to analyze
        - minimal: use minimal computation mode for faster processing
        - explorative: enable advanced analysis features
        - sensitive: enable privacy-aware mode for sensitive data
        - dark_mode: apply dark theme styling
        - orange_mode: apply orange theme styling
        - tsmode: enable time series analysis mode
        - sortby: column name for time series sorting
        - sample: optional sample data dict with name, caption, data
        - config_file: path to YAML configuration file
        - lazy: compute analysis when needed (default True)
        - typeset: custom type inference system
        - summarizer: custom summary generation system
        - config: Settings object for configuration
        - **kwargs: additional configuration options
        """
```
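
The constructor flags above can also be assembled programmatically. The following is a minimal sketch, assuming you want the cheaper `minimal` mode for large inputs and `explorative` otherwise. The helper name `choose_profile_options` and the row threshold are hypothetical and not part of pandas-profiling.

```python
from typing import Any, Dict


def choose_profile_options(
    n_rows: int, title: str, max_rows_full: int = 100_000
) -> Dict[str, Any]:
    """Build keyword arguments for ProfileReport (hypothetical helper).

    max_rows_full is an arbitrary, illustrative cut-off: above it the
    `minimal` flag is requested, otherwise `explorative` is used.
    """
    options: Dict[str, Any] = {"title": title, "lazy": True}
    if n_rows > max_rows_full:
        options["minimal"] = True
    else:
        options["explorative"] = True
    return options


small = choose_profile_options(5_000, "Small dataset")
large = choose_profile_options(2_000_000, "Large dataset")
print(small)
print(large)
```

The resulting dictionary can be unpacked into the constructor, for example `ProfileReport(df, **choose_profile_options(len(df), "My report"))`.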

### Report Export and Display

Methods for outputting and displaying the generated profile report.

```python { .api }
class ProfileReport:
    def to_file(self, output_file: Union[str, Path], silent: bool = True) -> None:
        """
        Export report to HTML or JSON file.

        Parameters:
        - output_file: path for output file (.html or .json extension)
        - silent: suppress progress output
        """

    def to_html(self) -> str:
        """
        Get HTML representation of the report.

        Returns:
        str: Complete HTML report as string
        """

    def to_json(self) -> str:
        """
        Get JSON representation of the report.

        Returns:
        str: Complete report data as JSON string
        """

    def to_widgets(self) -> Any:
        """
        Display report as interactive Jupyter widgets.

        Returns:
        Widget object for Jupyter notebook display
        """

    def to_notebook_iframe(self) -> None:
        """
        Display report as embedded HTML iframe in Jupyter notebook.
        """
```
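
Because `to_file` is documented as expecting a `.html` or `.json` extension, it can help to validate the output path before starting a long computation. This is a small sketch of such a pre-check; `report_format_for` is a hypothetical helper, and how the library itself treats other extensions is not covered here.

```python
from pathlib import Path
from typing import Union


def report_format_for(output_file: Union[str, Path]) -> str:
    """Return "html" or "json" for a report path (hypothetical pre-check)."""
    suffix = Path(output_file).suffix
    if suffix == ".html":
        return "html"
    if suffix == ".json":
        return "json"
    raise ValueError(
        f"Expected a .html or .json file name, got {str(output_file)!r}"
    )


print(report_format_for("profile_report.html"))
print(report_format_for(Path("out") / "profile_report.json"))
```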

### Data Access and Analysis

Methods for accessing specific analysis results and data insights.

```python { .api }
class ProfileReport:
    def get_description(self) -> dict:
        """
        Get the complete analysis description dictionary.

        Returns:
        dict: Complete analysis results and metadata
        """

    def get_duplicates(self) -> Optional[pd.DataFrame]:
        """
        Get DataFrame containing duplicate rows.

        Returns:
        DataFrame or None: Duplicate rows if any exist
        """

    def get_sample(self) -> dict:
        """
        Get sample data information.

        Returns:
        dict: Sample data with metadata
        """

    def get_rejected_variables(self) -> set:
        """
        Get set of variable names that were rejected from analysis.

        Returns:
        set: Variable names excluded from the report
        """
```
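
The accessors above can be combined into a short summary. The sketch below relies only on the return types documented in this section (`get_duplicates` returning a DataFrame or `None`, `get_rejected_variables` returning a set). The helper `summarize_findings` and the `_StubProfile` class are hypothetical; the stub exists only so the example runs without building a real report.

```python
from typing import Any, Dict


def summarize_findings(profile: Any) -> Dict[str, Any]:
    """Collect headline results from an object exposing the accessors above."""
    duplicates = profile.get_duplicates()        # DataFrame or None
    rejected = profile.get_rejected_variables()  # set of variable names
    return {
        "duplicate_rows_listed": 0 if duplicates is None else len(duplicates),
        "rejected_variables": sorted(rejected, key=str),
    }


class _StubProfile:
    """Stand-in used only to exercise the helper (not a real ProfileReport)."""

    def get_duplicates(self):
        return None

    def get_rejected_variables(self):
        return {"zip_code", "id"}


summary = summarize_findings(_StubProfile())
print(summary)
```

With a real report, the same call would be `summarize_findings(profile)`.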

### Report Comparison

Functionality for comparing multiple datasets and generating comparison reports.

```python { .api }
def compare(
    reports: List[ProfileReport],
    config: Optional[Settings] = None,
    compute: bool = False
) -> ProfileReport:
    """
    Compare multiple ProfileReport objects.

    Parameters:
    - reports: list of ProfileReport objects to compare
    - config: optional Settings object for the merged report
    - compute: recompute profiles using config (recommended for different settings)

    Returns:
    ProfileReport: Comparison report highlighting differences and similarities
    """

class ProfileReport:
    def compare(
        self,
        other: ProfileReport,
        config: Optional[Settings] = None
    ) -> ProfileReport:
        """
        Compare this report with another ProfileReport.

        Parameters:
        - other: ProfileReport object to compare against
        - config: optional Settings object for the merged report

        Returns:
        ProfileReport: Comparison report
        """
```

### Configuration Management

Comprehensive configuration system for customizing analysis and report generation.

```python { .api }
class Settings:
    def __init__(self):
        """
        Create new Settings configuration object with default values.
        """

    def update(self, updates: dict) -> Settings:
        """
        Update configuration with new values.

        Parameters:
        - updates: dictionary of configuration updates

        Returns:
        Settings: New Settings object with updated values
        """

    @classmethod
    def from_file(cls, config_file: Union[Path, str]) -> Settings:
        """
        Load configuration from YAML file.

        Parameters:
        - config_file: path to YAML configuration file

        Returns:
        Settings: Configuration loaded from file
        """

class Config:
    @staticmethod
    def get_arg_groups(key: str) -> dict:
        """
        Get predefined configuration group.

        Parameters:
        - key: configuration group name ('sensitive', 'explorative', 'dark_mode', 'orange_mode')

        Returns:
        dict: Configuration dictionary for the specified group
        """

    @staticmethod
    def shorthands(kwargs: dict, split: bool = True) -> Tuple[dict, dict]:
        """
        Process configuration shortcuts and expand them.

        Parameters:
        - kwargs: configuration dictionary with potential shortcuts
        - split: whether to split into shorthand and regular configs

        Returns:
        tuple: (shorthand_config, regular_config) dictionaries
        """
```
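
`Settings.update` is documented above as taking a nested dictionary and returning a new Settings object. The following sketch illustrates that idea with plain dictionaries, assuming updates are merged key by key so that untouched sibling settings keep their values. `merge_nested` is a hypothetical function and the default values shown are illustrative only; this is not the library's implementation.

```python
import copy
from typing import Any, Dict


def merge_nested(defaults: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which `updates` overrides `defaults` key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys inside nested sections survive.
            merged[key] = merge_nested(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative defaults only; these are not the library's real default values.
defaults = {"vars": {"num": {"quantiles": [0.25, 0.5, 0.75], "skewness_threshold": 20}}}
updated = merge_nested(defaults, {"vars": {"num": {"quantiles": [0.1, 0.5, 0.9]}}})
print(updated)
print(defaults)  # the original mapping is left untouched
```

Returning a new object instead of mutating in place is why the usage pattern is `config = config.update({...})`.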

### DataFrame Integration

Automatic extension of pandas DataFrame with profiling functionality.

```python { .api }
# Automatically available after importing pandas_profiling
class DataFrame:
    def profile_report(self, **kwargs) -> ProfileReport:
        """
        Generate a ProfileReport for this DataFrame.

        Parameters:
        - **kwargs: arguments passed to ProfileReport constructor

        Returns:
        ProfileReport: Analysis report for this DataFrame
        """
```
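
One common way to make a method "automatically available" on an existing class is to attach a function to the class at import time. The sketch below shows that general pattern with stand-in classes; `Frame` and `Report` are hypothetical placeholders for a DataFrame and a report, and whether pandas-profiling uses exactly this mechanism is an assumption.

```python
class Frame:
    """Stand-in for a DataFrame-like class (not pandas)."""

    def __init__(self, rows):
        self.rows = rows


class Report:
    """Stand-in for a report object built from a frame plus options."""

    def __init__(self, frame, **options):
        self.frame = frame
        self.options = options


def profile_report(frame, **kwargs):
    """Forward the frame and any keyword arguments to the report constructor."""
    return Report(frame, **kwargs)


# Assigning the function to the class makes it callable as a method,
# with the instance passed as the first argument.
Frame.profile_report = profile_report

report = Frame([{"a": 1}]).profile_report(title="Demo")
print(type(report).__name__, report.options)
```

With the real library, the equivalent call after `import pandas_profiling` is `df.profile_report(title="Demo")`.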

### Cache Management

Methods for managing analysis computation caching.

```python { .api }
class ProfileReport:
    def invalidate_cache(self, subset: Optional[str] = None) -> None:
        """
        Clear cached computations to force recomputation.

        Parameters:
        - subset: optional cache subset to clear (None clears all)
        """
```
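
The interaction between lazy computation (`lazy=True`) and `invalidate_cache` can be pictured with a toy cache. This is a sketch of the general compute-on-first-access pattern, not the library's code. The `LazyResults` class and the subset names `"description"` and `"html"` are hypothetical.

```python
from typing import Callable, Dict, Optional


class LazyResults:
    """Toy cache: values are computed on first access and kept until invalidated."""

    def __init__(self, producers: Dict[str, Callable[[], object]]):
        self._producers = producers
        self._cache: Dict[str, object] = {}
        self.compute_calls = 0

    def get(self, name: str) -> object:
        if name not in self._cache:
            self.compute_calls += 1
            self._cache[name] = self._producers[name]()
        return self._cache[name]

    def invalidate_cache(self, subset: Optional[str] = None) -> None:
        if subset is None:
            self._cache.clear()            # drop everything
        else:
            self._cache.pop(subset, None)  # drop one named entry


results = LazyResults({"description": lambda: {"n": 3}, "html": lambda: "<html></html>"})
results.get("html")
results.get("html")               # served from the cache, no recomputation
results.invalidate_cache("html")
results.get("html")               # recomputed after invalidation
print(results.compute_calls)
```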

## Configuration Options

The Settings class provides extensive configuration through nested models:

### Variable Analysis Configuration
- **NumVars**: Numerical variable analysis settings (quantiles, thresholds)
- **CatVars**: Categorical variable analysis settings (length, character analysis)
- **BoolVars**: Boolean variable analysis settings
- **TimeseriesVars**: Time series analysis configuration
- **FileVars**: File path analysis settings
- **PathVars**: Path analysis settings
- **ImageVars**: Image analysis settings
- **UrlVars**: URL analysis settings

### Visualization Configuration
- **Plot**: General plotting configuration
- **Histogram**: Histogram visualization settings
- **CorrelationPlot**: Correlation plot settings
- **MissingPlot**: Missing data visualization
- **Html**: HTML output formatting
- **Style**: Visual styling and themes

### Analysis Configuration
- **Correlations**: Correlation analysis settings
- **Duplicates**: Duplicate detection configuration
- **Interactions**: Variable interaction analysis
- **Samples**: Data sampling configuration
- **Variables**: General variable analysis settings

### Output Configuration
- **Notebook**: Jupyter notebook integration settings
- **Iframe**: HTML iframe configuration

380

## Enums and Constants

381

382

```python { .api }

383

from enum import Enum

384

385

class Theme(Enum):

386

"""Available visual themes for reports."""

387

flatly = "flatly"

388

united = "united"

389

# Additional theme values available

390

391

class ImageType(Enum):

392

"""Supported image output formats."""

393

png = "png"

394

svg = "svg"

395

396

class IframeAttribute(Enum):

397

"""HTML iframe attribute options."""

398

srcdoc = "srcdoc"

399

src = "src"

400

```
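
Since these enums carry string values, a setting read from a file or command line can be converted by value. A minimal sketch using a local copy of the two `Theme` members listed above; `parse_theme` is a hypothetical helper, and the local class stands in for the library's enum so the example is self-contained.

```python
from enum import Enum


class Theme(Enum):
    """Local copy of the two theme values listed above, for illustration."""
    flatly = "flatly"
    united = "united"


def parse_theme(name: str) -> Theme:
    """Convert a user-supplied string into a Theme, with a clearer error."""
    try:
        return Theme(name)  # lookup by value
    except ValueError:
        allowed = ", ".join(t.value for t in Theme)
        raise ValueError(f"Unknown theme {name!r}; expected one of: {allowed}") from None


print(parse_theme("united"))
```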

## Usage Examples

### Time Series Analysis

```python
import pandas as pd
from pandas_profiling import ProfileReport

# Load time series data
df = pd.read_csv('timeseries_data.csv')
df['date'] = pd.to_datetime(df['date'])

# Generate time series report
profile = ProfileReport(
    df,
    title="Time Series Analysis",
    tsmode=True,
    sortby='date'
)
profile.to_file("timeseries_report.html")
```

### Sensitive Data Handling

```python
from pandas_profiling import ProfileReport

# Assumes `df` is an existing pandas DataFrame
# Generate privacy-aware report
profile = ProfileReport(
    df,
    title="Sensitive Data Report",
    sensitive=True  # Redacts potentially sensitive information
)
profile.to_widgets()
```

### Custom Configuration

```python
from pandas_profiling import ProfileReport
from pandas_profiling.config import Settings

# Create custom configuration
config = Settings()
# update() returns a new Settings object, so reassign the result
config = config.update({
    'vars': {
        'num': {'quantiles': [0.1, 0.5, 0.9]},
        'cat': {'characters': True, 'words': True}
    },
    'correlations': {
        'pearson': {'threshold': 0.8}
    }
})

# Assumes `df` is an existing pandas DataFrame
profile = ProfileReport(df, config=config)
profile.to_file("custom_report.html")
```

### Comparing Datasets

```python
from pandas_profiling import ProfileReport, compare

# Assumes `df_before` and `df_after` are existing pandas DataFrames
# Create reports for different datasets
report1 = ProfileReport(df_before, title="Before Processing")
report2 = ProfileReport(df_after, title="After Processing")

# Generate comparison report
comparison = compare([report1, report2])
comparison.to_file("comparison_report.html")
```

### Command Line Usage

```bash
# Generate report from CSV file
pandas_profiling --title "My Report" data.csv report.html

# Use custom configuration
pandas_profiling --config_file config.yaml data.csv report.html
```