# Console Interface

Command-line interface for generating profiling reports directly from CSV files without writing Python code. The console interface is well suited to automated workflows and CI/CD pipelines, and to users who prefer command-line tools.

## Capabilities

### Command-Line Executable

The `ydata_profiling` command-line tool allows direct profiling of CSV files with customizable options.

```bash { .api }
ydata_profiling [OPTIONS] INPUT_FILE OUTPUT_FILE
```

**Parameters:**

- `INPUT_FILE`: Path to the CSV file to profile
- `OUTPUT_FILE`: Path where the HTML report is saved
- `--title TEXT`: Report title (default: "YData Profiling Report")
- `--config_file PATH`: Path to a YAML configuration file
- `--minimal`: Use the minimal configuration for faster processing
- `--explorative`: Use the explorative configuration for comprehensive analysis
- `--sensitive`: Use the sensitive configuration for privacy-aware profiling
- `--pool_size INTEGER`: Number of worker processes (default: 0, auto-detect CPU count)
- `--progress_bar / --no_progress_bar`: Show or hide the progress bar (default: show)
- `--help`: Show help message and exit

**Usage Examples:**

```bash
# Basic usage
ydata_profiling data.csv report.html

# With custom title
ydata_profiling --title "Sales Data Analysis" sales.csv sales_report.html

# Using minimal mode for faster processing
ydata_profiling --minimal large_dataset.csv quick_report.html

# Using explorative mode for comprehensive analysis
ydata_profiling --explorative --title "Detailed Analysis" data.csv detailed_report.html

# With custom configuration file
ydata_profiling --config_file custom_config.yaml data.csv custom_report.html

# For sensitive data with privacy controls
ydata_profiling --sensitive customer_data.csv privacy_report.html

# With custom worker processes
ydata_profiling --pool_size 8 --title "Multi-threaded Analysis" data.csv report.html
```

### Legacy Command Support

For backward compatibility, the deprecated `pandas_profiling` command is also available:

```bash { .api }
pandas_profiling [OPTIONS] INPUT_FILE OUTPUT_FILE
```

**Note:** The `pandas_profiling` command has identical functionality to `ydata_profiling` but is deprecated. Use `ydata_profiling` for new projects.
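
Scripts that must run in environments where only one of the two entry points is installed can resolve the available command at runtime. The following is an illustrative sketch; the `pick_profiler` helper name is our own, not part of the package.

```shell
#!/bin/sh
# pick_profiler: prefer the current ydata_profiling entry point and fall
# back to the deprecated pandas_profiling alias; fails if neither is on PATH.
pick_profiler() {
    if command -v ydata_profiling >/dev/null 2>&1; then
        echo ydata_profiling
    elif command -v pandas_profiling >/dev/null 2>&1; then
        echo pandas_profiling
    else
        return 1
    fi
}

# Usage:
#   profiler=$(pick_profiler) || { echo "no profiler installed" >&2; exit 1; }
#   "$profiler" data.csv report.html
```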

### Configuration File Support

Use YAML configuration files with the console interface for complex customization:

**Example config.yaml:**
```yaml
title: "Production Data Report"
pool_size: 4
progress_bar: true

dataset:
  description: "Customer transaction dataset"
  creator: "Data Engineering Team"

vars:
  num:
    low_categorical_threshold: 10
  cat:
    cardinality_threshold: 50
    redact: false

correlations:
  pearson:
    calculate: true
  spearman:
    calculate: true

plot:
  dpi: 300
  image_format: "png"

html:
  minify_html: true
  full_width: false
```

**Usage:**
```bash
ydata_profiling --config_file config.yaml data.csv production_report.html
```

### Integration with Shell Scripts

Integrate the console interface into shell scripts and automation workflows:

**Batch Processing Script:**
```bash
#!/bin/bash

# Process multiple CSV files
mkdir -p reports
for file in data/*.csv; do
    base_name=$(basename "$file" .csv)
    echo "Processing $file..."

    ydata_profiling \
        --title "Analysis of $base_name" \
        --explorative \
        --pool_size 4 \
        "$file" \
        "reports/${base_name}_report.html"

    echo "Report saved: reports/${base_name}_report.html"
done

echo "All files processed!"
```

**CI/CD Pipeline Integration:**
```bash
# In your CI/CD pipeline
ydata_profiling \
    --title "Data Quality Check - Build $BUILD_NUMBER" \
    --config_file .ydata_profiling_config.yaml \
    data/input.csv \
    artifacts/data_quality_report.html

# Check that the report was generated successfully
if [ -f "artifacts/data_quality_report.html" ]; then
    echo "Data quality report generated successfully"
else
    echo "Failed to generate data quality report"
    exit 1
fi
```

### Error Handling and Exit Codes

The console interface provides meaningful exit codes for automation:

- **0**: Success - Report generated successfully
- **1**: General error - Invalid arguments or processing failure
- **2**: Input file error - File not found or not readable
- **3**: Output file error - Cannot write to output location
- **4**: Configuration error - Invalid configuration file or settings

**Example Error Handling:**
```bash
#!/bin/bash

ydata_profiling data.csv report.html
exit_code=$?

case $exit_code in
    0)
        echo "Success: Report generated"
        ;;
    1)
        echo "Error: General processing failure"
        exit 1
        ;;
    2)
        echo "Error: Cannot read input file"
        exit 1
        ;;
    3)
        echo "Error: Cannot write output file"
        exit 1
        ;;
    4)
        echo "Error: Invalid configuration"
        exit 1
        ;;
    *)
        echo "Error: Unknown error (code: $exit_code)"
        exit 1
        ;;
esac
```

### Performance Considerations

For optimal performance with the console interface:

**Large Files:**
```bash
# Use minimal mode for files > 1GB
ydata_profiling --minimal --pool_size 8 large_file.csv quick_report.html

# Custom configuration for memory optimization
cat > minimal_config.yaml <<'EOF'
pool_size: 8
infer_dtypes: false
correlations:
  auto:
    calculate: false
missing_diagrams:
  matrix: false
  dendrogram: false
EOF

ydata_profiling --config_file minimal_config.yaml large_file.csv report.html
```

**Multiple Files:**
```bash
# Process files in parallel using background processes
for file in data/*.csv; do
    ydata_profiling --minimal "$file" "reports/$(basename "$file" .csv).html" &
done
wait  # Wait for all background processes to complete
```
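
Unbounded background jobs can exhaust memory when there are many CSV files, since every profiling process starts at once. As an alternative sketch, `xargs -P` caps the number of concurrent processes; the limit of four and the `profile_csvs` function name are illustrative choices, not part of the tool.

```shell
#!/bin/sh
# profile_csvs: profile every CSV under data/ with at most four
# concurrent ydata_profiling processes (bounded, unlike raw '&' jobs).
profile_csvs() {
    mkdir -p reports
    find data -name '*.csv' -print0 |
        xargs -0 -P 4 -I{} sh -c '
            base=$(basename "$1" .csv)
            ydata_profiling --minimal "$1" "reports/${base}.html"
        ' _ {}
}

profile_csvs
```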

### Integration with Data Pipelines

Common integration patterns with data processing tools:

**Apache Airflow DAG:**
```python
from airflow import DAG
from airflow.operators.bash import BashOperator

profiling_task = BashOperator(
    task_id='generate_data_profile',
    bash_command='''
        ydata_profiling \
            --title "Daily Data Quality Report - {{ ds }}" \
            --config_file /opt/airflow/configs/profiling_config.yaml \
            /data/daily_data_{{ ds }}.csv \
            /reports/daily_profile_{{ ds }}.html
    ''',
    dag=dag,
)
```

**Make Integration:**
```makefile
# Makefile for data profiling
.PHONY: profile-data

profile-data: data/processed.csv
	ydata_profiling \
		--title "Data Processing Report" \
		--explorative \
		data/processed.csv \
		reports/data_profile.html

reports/data_profile.html: data/processed.csv
	@mkdir -p reports
	ydata_profiling \
		--config_file config/profiling.yaml \
		$< $@
```