
# Console Interface

Command-line interface for generating profiling reports directly from CSV files without writing Python code. The console interface provides a convenient way to generate reports in automated workflows, CI/CD pipelines, and for users who prefer command-line tools.

## Capabilities

### Command-Line Executable

The `ydata_profiling` command-line tool allows direct profiling of CSV files with customizable options.

```bash { .api }
ydata_profiling [OPTIONS] INPUT_FILE OUTPUT_FILE
```

**Parameters:**

- `INPUT_FILE`: Path to the CSV file to profile
- `OUTPUT_FILE`: Path where the HTML report will be saved
- `--title TEXT`: Report title (default: "YData Profiling Report")
- `--config_file PATH`: Path to a YAML configuration file
- `--minimal`: Use the minimal configuration for faster processing
- `--explorative`: Use the explorative configuration for comprehensive analysis
- `--sensitive`: Use the sensitive configuration for privacy-aware profiling
- `--pool_size INTEGER`: Number of worker processes (default: 0, auto-detects the CPU count)
- `--progress_bar / --no_progress_bar`: Show or hide the progress bar (default: show)
- `--help`: Show the help message and exit

**Usage Examples:**

```bash
# Basic usage
ydata_profiling data.csv report.html

# With custom title
ydata_profiling --title "Sales Data Analysis" sales.csv sales_report.html

# Using minimal mode for faster processing
ydata_profiling --minimal large_dataset.csv quick_report.html

# Using explorative mode for comprehensive analysis
ydata_profiling --explorative --title "Detailed Analysis" data.csv detailed_report.html

# With custom configuration file
ydata_profiling --config_file custom_config.yaml data.csv custom_report.html

# For sensitive data with privacy controls
ydata_profiling --sensitive customer_data.csv privacy_report.html

# With custom worker processes
ydata_profiling --pool_size 8 --title "Multi-threaded Analysis" data.csv report.html
```
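
The flags above correspond to keyword arguments of the Python API's `ProfileReport` (`title=...`, `minimal=True`, `explorative=True`, `sensitive=True`), so any CLI call can be reproduced in a script. The helper below is a hypothetical sketch of that equivalence, not part of the library:

```python
def generate_report(input_file: str, output_file: str, **kwargs) -> None:
    """Rough Python-API equivalent of `ydata_profiling [OPTIONS] IN OUT`.

    kwargs mirror the CLI flags, e.g. title="...", minimal=True.
    """
    # Imports are local so this helper can live in a module that is
    # importable even where ydata-profiling is not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(input_file)
    ProfileReport(df, **kwargs).to_file(output_file)
```

For example, `generate_report("data.csv", "report.html", title="Sales Data Analysis", minimal=True)` mirrors a combination of the second and third CLI examples above.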

### Legacy Command Support

For backward compatibility, the deprecated `pandas_profiling` command is also available:

```bash { .api }
pandas_profiling [OPTIONS] INPUT_FILE OUTPUT_FILE
```

**Note:** The `pandas_profiling` command has identical functionality to `ydata_profiling` but is deprecated. Use `ydata_profiling` for new projects.

### Configuration File Support

Use YAML configuration files with the console interface for complex customization:

**Example config.yaml:**

```yaml
title: "Production Data Report"
pool_size: 4
progress_bar: true

dataset:
  description: "Customer transaction dataset"
  creator: "Data Engineering Team"

vars:
  num:
    low_categorical_threshold: 10
  cat:
    cardinality_threshold: 50
    redact: false

correlations:
  pearson:
    calculate: true
  spearman:
    calculate: true

plot:
  dpi: 300
  image_format: "png"

html:
  minify_html: true
  full_width: false
```

**Usage:**

```bash
ydata_profiling --config_file config.yaml data.csv production_report.html
```

### Integration with Shell Scripts

Integrate the console interface into shell scripts and automation workflows:

**Batch Processing Script:**

```bash
#!/bin/bash

# Process multiple CSV files
mkdir -p reports
for file in data/*.csv; do
  base_name=$(basename "$file" .csv)
  echo "Processing $file..."

  ydata_profiling \
    --title "Analysis of $base_name" \
    --explorative \
    --pool_size 4 \
    "$file" \
    "reports/${base_name}_report.html"

  echo "Report saved: reports/${base_name}_report.html"
done

echo "All files processed!"
```

**CI/CD Pipeline Integration:**

```bash
# In your CI/CD pipeline
ydata_profiling \
  --title "Data Quality Check - Build $BUILD_NUMBER" \
  --config_file .ydata_profiling_config.yaml \
  data/input.csv \
  artifacts/data_quality_report.html

# Check if the report was generated successfully
if [ -f "artifacts/data_quality_report.html" ]; then
  echo "Data quality report generated successfully"
else
  echo "Failed to generate data quality report"
  exit 1
fi
```
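
When the pipeline runner is Python rather than shell, the same post-generation check can be expressed with `pathlib`. The `min_bytes` threshold below is an illustrative guard against empty or truncated reports, not something the CLI provides:

```python
from pathlib import Path

def report_ok(path: str, min_bytes: int = 1024) -> bool:
    """Return True if the report exists and is not suspiciously small."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes
```

A CI step can then fail the build with `sys.exit(0 if report_ok("artifacts/data_quality_report.html") else 1)`.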

### Error Handling and Exit Codes

The console interface provides meaningful exit codes for automation:

- **0**: Success - Report generated successfully
- **1**: General error - Invalid arguments or processing failure
- **2**: Input file error - File not found or not readable
- **3**: Output file error - Cannot write to output location
- **4**: Configuration error - Invalid configuration file or settings

**Example Error Handling:**

```bash
#!/bin/bash

ydata_profiling data.csv report.html
exit_code=$?

case $exit_code in
  0)
    echo "Success: Report generated"
    ;;
  1)
    echo "Error: General processing failure"
    exit 1
    ;;
  2)
    echo "Error: Cannot read input file"
    exit 1
    ;;
  3)
    echo "Error: Cannot write output file"
    exit 1
    ;;
  4)
    echo "Error: Invalid configuration"
    exit 1
    ;;
  *)
    echo "Error: Unknown error (code: $exit_code)"
    exit 1
    ;;
esac
```
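
The same dispatch can be done from Python with `subprocess`. `EXIT_MESSAGES` simply restates the table above, and the wrapper is an illustrative sketch that assumes `ydata_profiling` is on `PATH`:

```python
import subprocess

# Mirrors the exit codes documented above (illustrative mapping).
EXIT_MESSAGES = {
    0: "Success: report generated",
    1: "Error: general processing failure",
    2: "Error: cannot read input file",
    3: "Error: cannot write output file",
    4: "Error: invalid configuration",
}

def profile_csv(input_file: str, output_file: str) -> int:
    """Run the CLI, print a status message, and return the exit code."""
    code = subprocess.run(
        ["ydata_profiling", input_file, output_file]
    ).returncode
    print(EXIT_MESSAGES.get(code, f"Error: unknown error (code: {code})"))
    return code
```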

### Performance Considerations

For optimal performance with the console interface:

**Large Files:**

```bash
# Use minimal mode for files > 1GB
ydata_profiling --minimal --pool_size 8 large_file.csv quick_report.html

# Custom configuration for memory optimization
cat > minimal_config.yaml <<'EOF'
pool_size: 8
infer_dtypes: false
correlations:
  auto:
    calculate: false
missing_diagrams:
  matrix: false
  dendrogram: false
EOF

ydata_profiling --config_file minimal_config.yaml large_file.csv report.html
```

**Multiple Files:**

```bash
# Process files in parallel using background processes
for file in data/*.csv; do
  ydata_profiling --minimal "$file" "reports/$(basename "$file" .csv).html" &
done
wait  # Wait for all background processes to complete
```
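
Shell job control works, but offers no simple cap on how many profilers run at once. A Python fan-out with `concurrent.futures` bounds the concurrency; the helpers below are a hypothetical sketch that assumes the CLI is installed:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def report_path(csv_path: Path, out_dir: Path = Path("reports")) -> Path:
    """Map data/foo.csv -> reports/foo.html."""
    return out_dir / (csv_path.stem + ".html")

def profile_one(csv_path: Path) -> int:
    """Profile one CSV via the CLI and return its exit code."""
    out = report_path(csv_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["ydata_profiling", "--minimal", str(csv_path), str(out)]
    return subprocess.run(cmd).returncode

def profile_all(csv_dir: str, max_workers: int = 4) -> list:
    # Threads suffice here: each worker just waits on its subprocess.
    files = sorted(Path(csv_dir).glob("*.csv"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(profile_one, files))
```

`profile_all("data")` then processes at most four files at a time, instead of launching one background job per file.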

### Integration with Data Pipelines

Common integration patterns with data processing tools:

**Apache Airflow DAG:**

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

dag = DAG(
    dag_id='daily_data_profiling',
    start_date=datetime(2024, 1, 1),
)

profiling_task = BashOperator(
    task_id='generate_data_profile',
    bash_command='''
    ydata_profiling \
        --title "Daily Data Quality Report - {{ ds }}" \
        --config_file /opt/airflow/configs/profiling_config.yaml \
        /data/daily_data_{{ ds }}.csv \
        /reports/daily_profile_{{ ds }}.html
    ''',
    dag=dag,
)
```

**Make Integration:**

```makefile
# Makefile for data profiling
.PHONY: profile-data

# Convenience target that builds the report rule below
profile-data: reports/data_profile.html

reports/data_profile.html: data/processed.csv
	@mkdir -p reports
	ydata_profiling \
		--title "Data Processing Report" \
		--config_file config/profiling.yaml \
		$< $@
```