
# Configuration

Comprehensive configuration system for customizing analysis depth, statistical computations, visualizations, and report output formats. The configuration system provides fine-grained control over every aspect of the profiling process.

## Capabilities

### Settings Class

Main configuration class providing comprehensive control over profiling behavior and report generation.

```python { .api }
class Settings:
    def __init__(self, **kwargs):
        """
        Initialize Settings with configuration parameters.

        Parameters:
        - **kwargs: configuration parameters for various analysis components
        """

    # Core configuration sections
    dataset: DatasetConfig
    variables: VariablesConfig
    correlations: CorrelationsConfig
    interactions: InteractionsConfig
    plot: PlotConfig
    html: HtmlConfig
    style: StyleConfig

    # Global settings
    title: str = "Profiling Report"
    pool_size: int = 0
    progress_bar: bool = True
    lazy: bool = True
```

**Usage Example:**

```python
import pandas as pd

from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

# Placeholder data; substitute your own DataFrame
df = pd.DataFrame({"value": [1, 2, 3]})

# Create custom configuration
config = Settings()
config.title = "Custom Dataset Analysis"
config.pool_size = 4
config.progress_bar = True

# Apply configuration to report
report = ProfileReport(df, config=config)
report.to_file("custom_report.html")
```
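
The default `pool_size` of 0 is not explained above. The upstream ydata-profiling documentation describes 0 as "use the number of CPUs available", and the sketch below assumes that convention. `resolve_pool_size` is a hypothetical helper written for illustration, not part of the library.

```python
import os


def resolve_pool_size(pool_size: int) -> int:
    """Return the worker count implied by a pool_size setting.

    Assumption: 0 means "one worker per available CPU".
    """
    if pool_size < 0:
        raise ValueError("pool_size must be zero or positive")
    if pool_size == 0:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    return pool_size


print(resolve_pool_size(4))  # an explicit value is used as-is
print(resolve_pool_size(0))  # machine-dependent
```

An explicit value such as `config.pool_size = 4` is useful on shared machines where taking every core would be unwelcome.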

### Configuration Loading

Load configuration from files or preset configurations.

```python { .api }
class Config:
    @staticmethod
    def get_config(config_file: Optional[Union[str, Path]] = None) -> Settings:
        """
        Load configuration from file or return default configuration.

        Parameters:
        - config_file: path to YAML configuration file

        Returns:
        Settings object with loaded configuration
        """
```

**Usage Example:**

```python
from ydata_profiling.config import Config
from ydata_profiling import ProfileReport

# Load from configuration file
config = Config.get_config("my_config.yaml")
report = ProfileReport(df, config=config)

# Use preset configurations
minimal_report = ProfileReport(df, minimal=True)
explorative_report = ProfileReport(df, explorative=True)
sensitive_report = ProfileReport(df, sensitive=True)
```

### Dataset Configuration

Configuration for dataset-level metadata and processing options.

```python { .api }
class DatasetConfig:
    """Configuration for dataset-level settings."""

    # Dataset metadata
    description: str = ""
    creator: str = ""
    author: str = ""
    copyright_holder: str = ""
    copyright_year: str = ""
    url: str = ""

    # Processing options
    sample: Optional[dict] = None
    duplicates: Optional[dict] = None
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()
config.dataset.description = "Customer transaction data for Q4 2023"
config.dataset.creator = "Data Science Team"
config.dataset.author = "John Doe"

report = ProfileReport(df, config=config)
```

### Variables Configuration

Configuration for variable-specific analysis settings across different data types.

```python { .api }
class VariablesConfig:
    """Configuration for variable-specific analysis."""

    # Variable type configurations
    descriptions: dict = {}

    # Type-specific settings
    num: NumVarsConfig
    cat: CatVarsConfig
    bool: BoolVarsConfig
    text: TextVarsConfig
    file: FileVarsConfig
    path: PathVarsConfig
    image: ImageVarsConfig
    url: UrlVarsConfig
    timeseries: TimeseriesVarsConfig
```

```python { .api }
class NumVarsConfig:
    """Numeric variables configuration."""

    low_categorical_threshold: int = 5
    chi_squared_threshold: float = 0.999
    skewness_threshold: int = 20
    kurtosis_threshold: int = 20

class CatVarsConfig:
    """Categorical variables configuration."""

    length: bool = True
    characters: bool = True
    words: bool = True
    cardinality_threshold: int = 50

class TextVarsConfig:
    """Text variables configuration."""

    length: bool = True
    characters: bool = True
    words: bool = True
    redact: bool = False
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Configure numeric variables
config.variables.num.low_categorical_threshold = 10
config.variables.num.skewness_threshold = 15

# Configure categorical variables
config.variables.cat.cardinality_threshold = 100
config.variables.cat.length = True

# Configure text variables
config.variables.text.redact = True  # Hide sensitive text

report = ProfileReport(df, config=config)
```
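
As a rough illustration of what `low_categorical_threshold` controls, the sketch below treats a numeric column as categorical when it has at most that many distinct values. The inclusive comparison and the helper name `looks_categorical` are assumptions made for illustration; the library source is the authority on the exact rule.

```python
from typing import Iterable, Optional


def looks_categorical(
    values: Iterable[Optional[float]], low_categorical_threshold: int = 5
) -> bool:
    """Hypothetical check: few distinct numeric values suggest a categorical column."""
    # Missing values (None) are ignored when counting distinct values.
    distinct = {v for v in values if v is not None}
    return 1 <= len(distinct) <= low_categorical_threshold


ratings = [1, 2, 3, 3, 2, 1, 5, 4]             # 5 distinct values
prices = [9.99, 14.5, 3.25, 7.0, 12.0, 8.75]   # 6 distinct values

print(looks_categorical(ratings))                               # True
print(looks_categorical(prices))                                # False
print(looks_categorical(prices, low_categorical_threshold=10))  # True
```

Under this reading, raising the threshold from 5 to 10 (as in the usage example above) causes more low-cardinality numeric columns to be summarized as categories.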

### Correlation Configuration

Configuration for correlation analysis and visualization.

```python { .api }
class CorrelationsConfig:
    """Configuration for correlation analysis."""

    pearson: CorrelationConfig
    spearman: CorrelationConfig
    kendall: CorrelationConfig
    cramers: CorrelationConfig
    phik: CorrelationConfig
    auto: CorrelationConfig

class CorrelationConfig:
    """Individual correlation method configuration."""

    calculate: bool = True
    warn_high_cardinality: bool = True
    threshold: float = 0.9
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Enable/disable specific correlation methods
config.correlations.pearson.calculate = True
config.correlations.spearman.calculate = True
config.correlations.kendall.calculate = False

# Set correlation thresholds
config.correlations.pearson.threshold = 0.8
config.correlations.auto.warn_high_cardinality = True

report = ProfileReport(df, config=config)
```
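
The `threshold` field is described only by name. One plausible reading, assumed here, is that a coefficient whose absolute value reaches the threshold is flagged as a high correlation. The sketch computes Pearson's r with the standard library only; `pearson` and `is_highly_correlated` are illustrative names, not library functions.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def is_highly_correlated(
    x: Sequence[float], y: Sequence[float], threshold: float = 0.9
) -> bool:
    """Assumed rule: flag the pair when |r| is at or above the threshold."""
    return abs(pearson(x, y)) >= threshold


height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]  # same quantity, different unit
shoe_rank = [3, 1, 4, 1, 5]                 # unrelated values

print(is_highly_correlated(height_cm, height_in))  # True
print(is_highly_correlated(height_cm, shoe_rank))  # False
```

Lowering the threshold, as the usage example does for Pearson (0.8), would flag more column pairs under this interpretation.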

### Plot Configuration

Configuration for visualizations and plotting options.

```python { .api }
class PlotConfig:
    """Configuration for plot generation."""

    # Plot settings
    histogram: dict = {}
    correlation: dict = {}
    missing: dict = {}

    # Image settings
    dpi: int = 800
    image_format: str = "svg"
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Configure plot settings
config.plot.dpi = 300
config.plot.image_format = "png"

# Configure histogram settings
config.plot.histogram = {
    "bins": 50,
    "max_bins": 250
}

# Configure correlation plots
config.plot.correlation = {
    "cmap": "RdYlBu_r",
    "bad": "#000000"
}

report = ProfileReport(df, config=config)
```

### HTML Configuration

Configuration for HTML report generation and styling.

```python { .api }
class HtmlConfig:
    """Configuration for HTML report generation."""

    # Report structure
    minify_html: bool = True
    use_local_assets: bool = True
    inline: bool = True

    # Navigation and layout
    navbar_show: bool = True
    full_width: bool = False

    # Content sections
    style: dict = {}
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

config = Settings()

# Configure HTML output
config.html.minify_html = False  # Keep HTML readable
config.html.full_width = True    # Use full browser width
config.html.navbar_show = True   # Show navigation bar

# Custom styling
config.html.style = {
    "primary_color": "#337ab7",
    "logo": "https://company.com/logo.png"
}

report = ProfileReport(df, config=config)
```

### Spark Configuration

Configuration for Spark DataFrame processing.

```python { .api }
class SparkSettings:
    def __init__(self, **kwargs):
        """
        Initialize Spark-specific configuration.

        Parameters:
        - **kwargs: Spark configuration parameters
        """

    # Spark-specific settings
    executor_memory: str = "2g"
    executor_cores: int = 2
    max_result_size: str = "1g"
```

**Usage Example:**

```python
from ydata_profiling.config import SparkSettings
from ydata_profiling import ProfileReport

# Configure Spark settings
spark_config = SparkSettings()
spark_config.executor_memory = "4g"
spark_config.executor_cores = 4

# Use with Spark DataFrame
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Profiling").getOrCreate()
spark_df = spark.read.csv("large_dataset.csv", header=True, inferSchema=True)

report = ProfileReport(spark_df, config=spark_config)
```

### Configuration Files

YAML configuration file format for persistent settings.

**Example Configuration File (`config.yaml`):**

```yaml
title: "Production Data Report"
pool_size: 8
progress_bar: true

dataset:
  description: "Customer transaction dataset"
  creator: "Data Engineering Team"

variables:
  num:
    low_categorical_threshold: 10
    skewness_threshold: 20
  cat:
    cardinality_threshold: 50
  text:
    redact: false

correlations:
  pearson:
    calculate: true
    threshold: 0.9
  spearman:
    calculate: true
  kendall:
    calculate: false

plot:
  dpi: 300
  image_format: "png"

html:
  minify_html: true
  full_width: false
```

**Usage with Configuration File:**

```python
from ydata_profiling import ProfileReport

# Load configuration from file
report = ProfileReport(df, config_file="config.yaml")
report.to_file("production_report.html")
```
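
A typo in a section name is easy to miss in a YAML file. The sketch below checks the top-level keys of an already-parsed configuration (a plain dictionary, such as a YAML parser would return) against the section names used in this document and suggests the closest match. `KNOWN_KEYS` and `unknown_keys` are illustrative names, and the key list is an assumption drawn from this page; it may not cover every option the library accepts.

```python
from difflib import get_close_matches
from typing import Dict, Optional

# Top-level names taken from the Settings listing in this document (assumed complete).
KNOWN_KEYS = {
    "title", "pool_size", "progress_bar", "lazy",
    "dataset", "variables", "correlations", "interactions",
    "plot", "html", "style",
}


def unknown_keys(config: dict) -> Dict[str, Optional[str]]:
    """Map each unrecognized top-level key to its closest known key, if any."""
    report: Dict[str, Optional[str]] = {}
    for key in config:
        if key not in KNOWN_KEYS:
            matches = get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            report[key] = matches[0] if matches else None
    return report


parsed = {
    "title": "Production Data Report",
    "correlation": {"pearson": {"calculate": True}},  # missing trailing "s"
    "plots": {"dpi": 300},                            # extra trailing "s"
}

for bad_key, suggestion in unknown_keys(parsed).items():
    print(f"unknown key {bad_key!r}; did you mean {suggestion!r}?")
```

Running a check like this before building the report surfaces misspelled sections early, rather than leaving them to be silently ignored or rejected later.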

### SparkSettings Class

Specialized configuration class optimized for Spark DataFrames with performance-focused defaults.

```python { .api }
class SparkSettings(Settings):
    """
    Specialized Settings class for Spark DataFrames with optimized configurations.

    Inherits from Settings but with performance-focused defaults that disable
    computationally expensive operations for large-scale Spark datasets.
    """

    # Performance optimizations
    infer_dtypes: bool = False
    correlations: Dict[str, bool] = {
        "spearman": True,
        "pearson": True,
        "auto": False,  # Disabled for performance
        "phi_k": False,
        "cramers": False,
        "kendall": False
    }

    # Disabled heavy computations
    interactions_continuous: bool = False
    missing_diagrams: Dict[str, bool] = {
        "bar": False,
        "matrix": False,
        "dendrogram": False,
        "heatmap": False
    }

    # Reduced sampling
    samples_tail: int = 0
    samples_random: int = 0
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import SparkSettings
from pyspark.sql import SparkSession

# Create Spark DataFrame
spark = SparkSession.builder.appName("Profiling").getOrCreate()
spark_df = spark.read.csv("large_dataset.csv", header=True, inferSchema=True)

# Use SparkSettings for optimal performance
config = SparkSettings()
config.title = "Large Dataset Analysis"

report = ProfileReport(spark_df, config=config)
report.to_file("spark_report.html")
```

### Configuration Methods

Advanced methods for managing and updating configuration settings.

```python { .api }
def update(self, updates: dict) -> 'Settings':
    """
    Merge updates with existing configuration.

    Parameters:
    - updates: dictionary with configuration updates

    Returns:
    Updated Settings instance
    """

@staticmethod
def from_file(config_file: Union[Path, str]) -> 'Settings':
    """
    Create Settings from YAML configuration file.

    Parameters:
    - config_file: path to YAML configuration file

    Returns:
    Settings instance with loaded configuration
    """

@property
def primary_color(self) -> str:
    """
    Get primary color for backward compatibility.

    Returns:
    Primary color from style configuration
    """
```

**Usage Example:**

```python
from ydata_profiling import ProfileReport
from ydata_profiling.config import Settings

# Load from file
config = Settings.from_file("custom_config.yaml")

# Update specific settings
updates = {
    "title": "Updated Report Title",
    "plot": {
        "dpi": 600,
        "image_format": "png"
    },
    "vars": {
        "cat": {
            "redact": True
        }
    }
}

updated_config = config.update(updates)

# Use updated configuration
report = ProfileReport(df, config=updated_config)
```
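
The `update` description says updates are merged with the existing configuration but does not spell out how nested sections are handled. The sketch below shows one common interpretation: a recursive merge of plain dictionaries in which nested keys absent from the update keep their previous values. It uses a hypothetical `deep_merge` helper and is an illustration of the idea, not the library's implementation.

```python
from copy import deepcopy


def deep_merge(base: dict, updates: dict) -> dict:
    """Return a new dict with `updates` merged recursively into `base`."""
    merged = deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys replace whatever was there.
            merged[key] = deepcopy(value)
    return merged


current = {"title": "Profiling Report", "plot": {"dpi": 800, "image_format": "svg"}}
updates = {"title": "Updated Report Title", "plot": {"dpi": 600}}

result = deep_merge(current, updates)
print(result["plot"])   # dpi is replaced, image_format is kept
print(current["plot"])  # the original dictionary is left untouched
```

Returning a new object rather than mutating in place matches the usage example above, where the result of `config.update(updates)` is assigned to a separate name.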

### Preset Configurations

Built-in configuration presets for common use cases.

**Built-in Presets:**

```python
# Minimal mode - fast profiling with reduced computation
ProfileReport(df, minimal=True)

# Explorative mode - comprehensive analysis with all features
ProfileReport(df, explorative=True)

# Sensitive mode - privacy-aware profiling
ProfileReport(df, sensitive=True)

# Time-series mode - specialized for time-series data
ProfileReport(df, tsmode=True, sortby='timestamp')
```

**Preset Details:**

- **Minimal**: Disables correlations, missing diagrams, and type inference for speed
- **Explorative**: Enables advanced text analysis, file analysis, and memory profiling
- **Sensitive**: Redacts categorical/text values and disables sample display
- **Time-series**: Enables autocorrelation analysis and time-based sorting