# Console Interface

Command-line interface for generating profiling reports directly from CSV files without writing Python code. The console interface is well suited to automated workflows and CI/CD pipelines, and to users who prefer command-line tools.

## Capabilities

### Command-Line Executable

The `ydata_profiling` command-line tool allows direct profiling of CSV files with customizable options.

```bash { .api }
ydata_profiling [OPTIONS] INPUT_FILE OUTPUT_FILE
```

**Parameters:**

- `INPUT_FILE`: Path to the CSV file to profile
- `OUTPUT_FILE`: Path where the HTML report is saved
- `--title TEXT`: Report title (default: "YData Profiling Report")
- `--config_file PATH`: Path to a YAML configuration file
- `--minimal`: Use the minimal configuration for faster processing
- `--explorative`: Use the explorative configuration for comprehensive analysis
- `--sensitive`: Use the sensitive configuration for privacy-aware profiling
- `--pool_size INTEGER`: Number of worker processes (default: 0, auto-detect CPU count)
- `--progress_bar / --no_progress_bar`: Show or hide the progress bar (default: show)
- `--help`: Show help message and exit

**Usage Examples:**

```bash
# Basic usage
ydata_profiling data.csv report.html

# With custom title
ydata_profiling --title "Sales Data Analysis" sales.csv sales_report.html

# Using minimal mode for faster processing
ydata_profiling --minimal large_dataset.csv quick_report.html

# Using explorative mode for comprehensive analysis
ydata_profiling --explorative --title "Detailed Analysis" data.csv detailed_report.html

# With custom configuration file
ydata_profiling --config_file custom_config.yaml data.csv custom_report.html

# For sensitive data with privacy controls
ydata_profiling --sensitive customer_data.csv privacy_report.html

# With custom worker processes
ydata_profiling --pool_size 8 --title "Multi-threaded Analysis" data.csv report.html
```

### Legacy Command Support

For backward compatibility, the deprecated `pandas_profiling` command is also available:

```bash { .api }
pandas_profiling [OPTIONS] INPUT_FILE OUTPUT_FILE
```

**Note:** The `pandas_profiling` command has identical functionality to `ydata_profiling` but is deprecated. Use `ydata_profiling` for new projects.
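
Scripts that must run in environments where only one of the two entry points is installed can resolve the available command at runtime. The following is an illustrative sketch; the `pick_profiler` helper name is our own, not part of the package.

```shell
#!/bin/sh
# pick_profiler: prefer the current ydata_profiling entry point and fall
# back to the deprecated pandas_profiling alias; fails if neither is on PATH.
pick_profiler() {
    if command -v ydata_profiling >/dev/null 2>&1; then
        echo ydata_profiling
    elif command -v pandas_profiling >/dev/null 2>&1; then
        echo pandas_profiling
    else
        return 1
    fi
}

# Usage:
#   profiler=$(pick_profiler) || { echo "no profiler installed" >&2; exit 1; }
#   "$profiler" data.csv report.html
```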

### Configuration File Support

Use YAML configuration files with the console interface for complex customization:

**Example config.yaml:**
```yaml
title: "Production Data Report"
pool_size: 4
progress_bar: true

dataset:
  description: "Customer transaction dataset"
  creator: "Data Engineering Team"

vars:
  num:
    low_categorical_threshold: 10
  cat:
    cardinality_threshold: 50
    redact: false

correlations:
  pearson:
    calculate: true
  spearman:
    calculate: true

plot:
  dpi: 300
  image_format: "png"

html:
  minify_html: true
  full_width: false
```

**Usage:**
```bash
ydata_profiling --config_file config.yaml data.csv production_report.html
```

### Integration with Shell Scripts

Integrate the console interface into shell scripts and automation workflows:

**Batch Processing Script:**
```bash
#!/bin/bash

# Process multiple CSV files
mkdir -p reports
for file in data/*.csv; do
    base_name=$(basename "$file" .csv)
    echo "Processing $file..."

    ydata_profiling \
        --title "Analysis of $base_name" \
        --explorative \
        --pool_size 4 \
        "$file" \
        "reports/${base_name}_report.html"

    echo "Report saved: reports/${base_name}_report.html"
done

echo "All files processed!"
```

**CI/CD Pipeline Integration:**
```bash
# In your CI/CD pipeline
ydata_profiling \
    --title "Data Quality Check - Build $BUILD_NUMBER" \
    --config_file .ydata_profiling_config.yaml \
    data/input.csv \
    artifacts/data_quality_report.html

# Check that the report was generated successfully
if [ -f "artifacts/data_quality_report.html" ]; then
    echo "Data quality report generated successfully"
else
    echo "Failed to generate data quality report"
    exit 1
fi
```

### Error Handling and Exit Codes

The console interface provides meaningful exit codes for automation:

- **0**: Success - Report generated successfully
- **1**: General error - Invalid arguments or processing failure
- **2**: Input file error - File not found or not readable
- **3**: Output file error - Cannot write to output location
- **4**: Configuration error - Invalid configuration file or settings

**Example Error Handling:**
```bash
#!/bin/bash

ydata_profiling data.csv report.html
exit_code=$?

case $exit_code in
    0)
        echo "Success: Report generated"
        ;;
    1)
        echo "Error: General processing failure"
        exit 1
        ;;
    2)
        echo "Error: Cannot read input file"
        exit 1
        ;;
    3)
        echo "Error: Cannot write output file"
        exit 1
        ;;
    4)
        echo "Error: Invalid configuration"
        exit 1
        ;;
    *)
        echo "Error: Unknown error (code: $exit_code)"
        exit 1
        ;;
esac
```

### Performance Considerations

For optimal performance with the console interface:

**Large Files:**
```bash
# Use minimal mode for files > 1GB
ydata_profiling --minimal --pool_size 8 large_file.csv quick_report.html

# Custom configuration for memory optimization
cat > minimal_config.yaml <<'EOF'
pool_size: 8
infer_dtypes: false
correlations:
  auto:
    calculate: false
missing_diagrams:
  matrix: false
  dendrogram: false
EOF

ydata_profiling --config_file minimal_config.yaml large_file.csv report.html
```

**Multiple Files:**
```bash
# Process files in parallel using background processes
for file in data/*.csv; do
    ydata_profiling --minimal "$file" "reports/$(basename "$file" .csv).html" &
done
wait  # Wait for all background processes to complete
```
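
Unbounded background jobs can exhaust memory when there are many CSV files, since every profiling process starts at once. As an alternative sketch, `xargs -P` caps the number of concurrent processes; the limit of four and the `profile_csvs` function name are illustrative choices, not part of the tool.

```shell
#!/bin/sh
# profile_csvs: profile every CSV under data/ with at most four
# concurrent ydata_profiling processes (bounded, unlike raw '&' jobs).
profile_csvs() {
    mkdir -p reports
    find data -name '*.csv' -print0 |
        xargs -0 -P 4 -I{} sh -c '
            base=$(basename "$1" .csv)
            ydata_profiling --minimal "$1" "reports/${base}.html"
        ' _ {}
}

profile_csvs
```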

### Integration with Data Pipelines

Common integration patterns with data processing tools:

**Apache Airflow DAG:**
```python
from airflow import DAG
from airflow.operators.bash import BashOperator

profiling_task = BashOperator(
    task_id='generate_data_profile',
    bash_command='''
        ydata_profiling \
            --title "Daily Data Quality Report - {{ ds }}" \
            --config_file /opt/airflow/configs/profiling_config.yaml \
            /data/daily_data_{{ ds }}.csv \
            /reports/daily_profile_{{ ds }}.html
    ''',
    dag=dag,
)
```

**Make Integration:**
```makefile
# Makefile for data profiling
.PHONY: profile-data

profile-data: data/processed.csv
	ydata_profiling \
		--title "Data Processing Report" \
		--explorative \
		data/processed.csv \
		reports/data_profile.html

reports/data_profile.html: data/processed.csv
	@mkdir -p reports
	ydata_profiling \
		--config_file config/profiling.yaml \
		$< $@
```