# Statistical Transformations

Statistical transformations (stats) process your data before it is drawn, through operations such as binning, density estimation, smoothing, and statistical summaries. Each stat computes new variables (computed aesthetics) that can be mapped to aesthetics, so a plot can show derived quantities such as counts or densities rather than only the raw data.

## Capabilities

### Identity and Counting Stats

Basic transformations including pass-through data and counting operations.

```python { .api }
def stat_identity(mapping=None, data=None, **kwargs):
    """
    Identity transformation (no change to data).

    Use this when you want to plot data as-is without any statistical transformation.
    """

def stat_count(mapping=None, data=None, **kwargs):
    """
    Count the number of observations at each x position.

    Required aesthetics: x
    Optional aesthetics: weight

    Computed aesthetics:
    - count: number of observations
    - prop: proportion of total observations
    """

def stat_sum(mapping=None, data=None, **kwargs):
    """
    Sum overlapping points and map sum to size.

    Required aesthetics: x, y
    Optional aesthetics: size, weight

    Computed aesthetics:
    - n: sum of weights (or count if no weights)
    - prop: proportion of total
    """

def stat_unique(mapping=None, data=None, **kwargs):
    """
    Remove duplicate rows in data.

    Useful for preventing overplotting when you have duplicate points.
    """
```
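To make the `count` and `prop` variables concrete, here is a minimal plain-Python sketch of the arithmetic that `stat_count` is described as performing. It is an illustration only, not the library's implementation: the helper name `count_by_x` is hypothetical, and it assumes `prop` is taken over all rows passed in.

```python
from collections import defaultdict

def count_by_x(xs, weights=None):
    """Return {x: {'count': ..., 'prop': ...}} for each distinct x value.

    Hypothetical helper: mirrors the documented computed aesthetics,
    assuming prop is the share of the overall (weighted) total.
    """
    if weights is None:
        weights = [1] * len(xs)  # unweighted: every observation counts once
    totals = defaultdict(float)
    for x, w in zip(xs, weights):
        totals[x] += w
    grand_total = sum(totals.values())
    return {x: {"count": c, "prop": c / grand_total} for x, c in totals.items()}

result = count_by_x(["a", "b", "a", "c", "a"])
print(result["a"])  # {'count': 3.0, 'prop': 0.6}
```

Passing `weights` plays the role of the optional `weight` aesthetic: each row then contributes its weight instead of 1.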

### Binning and Histograms

Transform continuous data into discrete bins for histograms and related visualizations.

```python { .api }
def stat_bin(mapping=None, data=None, bins=30, binwidth=None, center=None,
             boundary=None, closed='right', pad=False, **kwargs):
    """
    Bin data for histograms.

    Required aesthetics: x
    Optional aesthetics: weight

    Parameters:
    - bins: int, number of bins
    - binwidth: float, width of bins
    - center: float, center of one bin
    - boundary: float, boundary of one bin
    - closed: str, which side of interval is closed ('right', 'left')
    - pad: bool, whether to pad bins

    Computed aesthetics:
    - count: number of observations in bin
    - density: density of observations
    - ncount: normalized count
    - ndensity: normalized density
    - width: bin width
    """

def stat_bin_2d(mapping=None, data=None, bins=30, binwidth=None, drop=True,
                **kwargs):
    """
    2D binning for heatmaps.

    Required aesthetics: x, y
    Optional aesthetics: weight

    Parameters:
    - bins: int or tuple, number of bins in each direction
    - binwidth: float or tuple, width of bins
    - drop: bool, whether to drop empty bins

    Computed aesthetics:
    - count: number of observations in bin
    - density: density of observations
    """

def stat_bin2d(mapping=None, data=None, bins=30, binwidth=None, drop=True,
               **kwargs):
    """
    2D binning for heatmaps - alternative name for stat_bin_2d.

    Required aesthetics: x, y
    Optional aesthetics: weight

    Parameters:
    - bins: int or tuple, number of bins in each direction
    - binwidth: float or tuple, width of bins
    - drop: bool, whether to drop empty bins

    Computed aesthetics:
    - count: number of observations in bin
    - density: density of observations
    """

def stat_bindot(mapping=None, data=None, binaxis='x', method='dotdensity',
                binwidth=None, **kwargs):
    """
    Bin data for dot plots.

    Required aesthetics: x

    Parameters:
    - binaxis: str, axis to bin along ('x', 'y')
    - method: str, binning method ('dotdensity', 'histodot')
    - binwidth: float, width of bins

    Computed aesthetics:
    - count: number of observations in bin
    - binwidth: width of bin
    """
```
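The relationship between `count`, `density`, `ncount`, `ndensity` and `width` is easier to see in a small worked example. The sketch below is plain Python rather than the library's code. It assumes equal-width bins spanning the data range, left-closed intervals with the maximum placed in the last bin, and "normalized" meaning scaled to a maximum of 1. The function name `bin_stats` is hypothetical.

```python
def bin_stats(values, bins=4):
    """Equal-width binning over [min, max]; returns one dict per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    # Density is scaled so that sum(density * width) equals 1
    densities = [c / (n * width) for c in counts]
    max_count, max_density = max(counts), max(densities)
    return [
        {"count": c, "density": d, "ncount": c / max_count,
         "ndensity": d / max_density, "width": width}
        for c, d in zip(counts, densities)
    ]

rows = bin_stats([1, 2, 2, 3, 3, 3, 4, 5, 9], bins=4)
print([r["count"] for r in rows])  # [3, 4, 1, 1]
```

Mapping `y` to `density` instead of `count` therefore changes only the vertical scale of a histogram, not its shape.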

### Density Estimation

Compute smooth density estimates for continuous distributions.

```python { .api }
def stat_density(mapping=None, data=None, bw='nrd0', adjust=1, kernel='gaussian',
                 n=512, trim=False, **kwargs):
    """
    Compute smooth density estimates.

    Required aesthetics: x
    Optional aesthetics: weight

    Parameters:
    - bw: str or float, bandwidth selection method or value
    - adjust: float, bandwidth adjustment factor
    - kernel: str, kernel function ('gaussian', 'epanechnikov', etc.)
    - n: int, number of evaluation points
    - trim: bool, whether to trim density to data range

    Computed aesthetics:
    - density: density estimate
    - count: density * number of observations
    - scaled: density scaled to maximum of 1
    """

def stat_density_2d(mapping=None, data=None, **kwargs):
    """
    2D density estimation for contour plots.

    Required aesthetics: x, y
    Optional aesthetics: weight

    Computed aesthetics:
    - level: contour level
    - piece: contour piece identifier
    """

def stat_ydensity(mapping=None, data=None, **kwargs):
    """
    Density estimates for violin plots.

    Required aesthetics: x, y

    Computed aesthetics:
    - density: density estimate
    - scaled: density scaled within groups
    - count: density * number of observations
    - violinwidth: density scaled for violin width
    """
```
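As a rough illustration of how `bw`, `adjust` and `n` interact and how `density`, `count` and `scaled` relate to each other, the following plain-Python sketch evaluates a Gaussian kernel density estimate. It is not the library's implementation: the function name `gaussian_kde` is hypothetical, the bandwidth is passed as a number rather than selected by a rule such as `'nrd0'`, and the grid covers only the data range (comparable to `trim=True`).

```python
import math

def gaussian_kde(values, bw, adjust=1.0, n=5):
    """Evaluate a Gaussian kernel density estimate at n evenly spaced points."""
    h = bw * adjust  # effective bandwidth after the adjustment factor
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]
    peak = max(density)
    return [
        {"x": x, "density": d, "count": d * len(values), "scaled": d / peak}
        for x, d in zip(grid, density)
    ]

points = gaussian_kde([1.0, 2.0, 2.5, 3.0, 4.0], bw=0.5)
for p in points:
    print(round(p["x"], 2), round(p["scaled"], 3))
```

Raising `adjust` widens each kernel and produces a smoother, flatter curve. Lowering it follows the individual observations more closely.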

### Smoothing and Trend Lines

Fit smooth curves and trend lines to data.

```python { .api }
def stat_smooth(mapping=None, data=None, method='auto', formula=None, se=True,
                n=80, span=0.75, level=0.95, **kwargs):
    """
    Compute smoothed conditional means.

    Required aesthetics: x, y
    Optional aesthetics: weight

    Parameters:
    - method: str, smoothing method ('auto', 'lm', 'glm', 'gam', 'loess')
    - formula: str, model formula (for 'lm', 'glm', 'gam')
    - se: bool, whether to compute confidence interval
    - n: int, number of points to evaluate
    - span: float, smoothing span (for 'loess')
    - level: float, confidence level

    Computed aesthetics:
    - y: predicted values
    - ymin, ymax: confidence interval bounds (if se=True)
    - se: standard errors
    """

def stat_quantile(mapping=None, data=None, quantiles=None, formula=None,
                  **kwargs):
    """
    Compute quantile regression lines.

    Required aesthetics: x, y
    Optional aesthetics: weight

    Parameters:
    - quantiles: list, quantiles to compute (default: [0.25, 0.5, 0.75])
    - formula: str, model formula
    """
```
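For the `'lm'` case, the predicted `y` values come from a straight-line fit evaluated at `n` points across the x range. The sketch below shows that idea in plain Python using ordinary least squares. It is an assumption-laden illustration, not the library's code: the name `fit_line` is hypothetical, and the confidence band (`ymin`, `ymax`, `se`) is left out.

```python
def fit_line(xs, ys, n=5):
    """Least-squares fit of y = intercept + slope * x, evaluated at n points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Evaluate the fitted line on an evenly spaced grid over the x range
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    return slope, intercept, [(x, intercept + slope * x) for x in grid]

slope, intercept, curve = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```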

### Box Plot and Summary Statistics

Compute statistical summaries for box plots and related visualizations.

```python { .api }
def stat_boxplot(mapping=None, data=None, coef=1.5, **kwargs):
    """
    Compute box plot statistics.

    Required aesthetics: x or y (one discrete, one continuous)
    Optional aesthetics: weight

    Parameters:
    - coef: float, multiplier for outlier detection

    Computed aesthetics:
    - lower: lower hinge (25th percentile)
    - upper: upper hinge (75th percentile)
    - middle: median (50th percentile)
    - ymin: lower whisker
    - ymax: upper whisker
    - outliers: outlier values
    """

def stat_summary(mapping=None, data=None, fun_data=None, fun_y=None,
                 fun_ymax=None, fun_ymin=None, **kwargs):
    """
    Summarize y values at each x.

    Required aesthetics: x, y

    Parameters:
    - fun_data: function, returns dict with summary statistics
    - fun_y: function, compute y summary
    - fun_ymax, fun_ymin: functions, compute y range

    Computed aesthetics depend on functions used:
    - y: summary statistic
    - ymin, ymax: range statistics (if computed)
    """

def stat_summary_bin(mapping=None, data=None, bins=30, **kwargs):
    """
    Summarize y values in bins of x.

    Required aesthetics: x, y

    Parameters:
    - bins: int, number of bins
    - fun_data, fun_y, fun_ymax, fun_ymin: summary functions

    Computed aesthetics:
    - x: bin centers
    - y: summary statistic
    - ymin, ymax: range statistics (if computed)
    """
```
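The role of `coef` in separating whiskers from outliers can be sketched with the standard library. This is an illustration under stated assumptions rather than the library's algorithm: the helper `boxplot_stats` is hypothetical, quartiles use the "inclusive" interpolation of `statistics.quantiles`, and whiskers extend to the most extreme values within `coef` times the interquartile range of the hinges.

```python
import statistics

def boxplot_stats(values, coef=1.5):
    """Five-number box plot summary plus outliers for one group of values."""
    lower, middle, upper = statistics.quantiles(values, n=4, method="inclusive")
    iqr = upper - lower
    lo_fence = lower - coef * iqr
    hi_fence = upper + coef * iqr
    # Whiskers stop at the most extreme observations inside the fences
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "ymin": min(inside),
        "lower": lower,
        "middle": middle,
        "upper": upper,
        "ymax": max(inside),
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
    }

stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["lower"], stats["middle"], stats["upper"])  # 3.25 5.5 7.75
print(stats["outliers"])  # [100]
```

A larger `coef` moves the fences outward, so fewer points are reported as outliers.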

### Geometric and Spatial Stats

Compute geometric transformations and spatial statistics.

```python { .api }
def stat_hull(mapping=None, data=None, **kwargs):
    """
    Compute convex hull of points.

    Required aesthetics: x, y
    Optional aesthetics: group

    Returns hull vertices in order for drawing polygon.
    """

def stat_ellipse(mapping=None, data=None, type='t', level=0.95, segments=51,
                 **kwargs):
    """
    Compute confidence ellipses.

    Required aesthetics: x, y

    Parameters:
    - type: str, ellipse type ('t', 'norm', 'euclid')
    - level: float, confidence level
    - segments: int, number of points in ellipse

    Computed aesthetics:
    - x, y: ellipse boundary points
    """

def stat_sina(mapping=None, data=None, **kwargs):
    """
    Compute sina plot positions (jittered violin).

    Required aesthetics: x, y

    Positions points based on local density to create violin-like shape
    with individual points visible.
    """
```
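To show what "hull vertices in order for drawing polygon" means in practice, here is a small plain-Python convex hull using Andrew's monotone chain algorithm. The algorithm choice and the function name `convex_hull` are assumptions made for illustration. The source does not say which method `stat_hull` uses internally.

```python
def convex_hull(points):
    """Return convex hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Drop the last point of each chain; it repeats the other chain's start
    return lower[:-1] + upper[:-1]

hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)])
print(hull)  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Interior points such as `(1, 1)` and points on an edge such as `(1, 0)` are dropped, leaving only the corners a polygon geom would need.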

### Distribution and Probability Stats

Work with probability distributions and cumulative distributions.

```python { .api }
def stat_ecdf(mapping=None, data=None, n=None, pad=True, **kwargs):
    """
    Compute empirical cumulative distribution function.

    Required aesthetics: x

    Parameters:
    - n: int, number of points to evaluate (default: use all data points)
    - pad: bool, whether to pad with additional points

    Computed aesthetics:
    - y: cumulative probability
    """

def stat_qq(mapping=None, data=None, distribution='norm', dparams=None, **kwargs):
    """
    Compute quantile-quantile plot statistics.

    Required aesthetics: sample

    Parameters:
    - distribution: str or scipy distribution, theoretical distribution
    - dparams: tuple, distribution parameters

    Computed aesthetics:
    - theoretical: theoretical quantiles
    - sample: sample quantiles
    """

def stat_qq_line(mapping=None, data=None, distribution='norm', dparams=None,
                 **kwargs):
    """
    Compute reference line for Q-Q plots.

    Required aesthetics: sample

    Parameters:
    - distribution: str or scipy distribution, theoretical distribution
    - dparams: tuple, distribution parameters

    Computed aesthetics:
    - slope, intercept: line parameters
    """
```
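The quantities behind an ECDF and a Q-Q plot can be sketched with the standard library alone. The helpers `ecdf` and `qq_points` below are hypothetical and only illustrate the idea. In particular, the plotting positions `(i + 0.5) / n` and the fixed standard normal reference distribution are assumptions of this sketch, not documented behaviour of `stat_qq`.

```python
from bisect import bisect_right
from statistics import NormalDist

def ecdf(values):
    """Return (x, y) pairs where y is the fraction of observations <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]

def qq_points(sample):
    """Pair each sorted sample value with a standard normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    # Assumed plotting positions: midpoints of n equal probability slices
    probs = [(i + 0.5) / n for i in range(n)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(theoretical, ordered))

steps = ecdf([3, 1, 2, 4])
print(steps)  # [(1, 0.25), (2, 0.5), (3, 0.75), (4, 1.0)]

pairs = qq_points([2.1, 0.3, 1.2])
print([sample for _, sample in pairs])  # [0.3, 1.2, 2.1]
```

If the sample follows the reference distribution, the `(theoretical, sample)` pairs fall close to a straight line, which is the line that `stat_qq_line` is meant to draw.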

### Function and Point Density Stats

Evaluate functions and compute point densities.

```python { .api }
def stat_function(mapping=None, data=None, fun=None, xlim=None, n=101,
                  args=None, **kwargs):
    """
    Evaluate and plot functions.

    Parameters:
    - fun: function, function to evaluate
    - xlim: tuple, x range to evaluate over
    - n: int, number of points to evaluate
    - args: tuple, additional arguments to function

    Computed aesthetics:
    - x: evaluation points
    - y: function values
    """

def stat_pointdensity(mapping=None, data=None, **kwargs):
    """
    Compute local point density.

    Required aesthetics: x, y

    Computed aesthetics:
    - density: local point density
    - ndensity: normalized density
    """
```
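The parameters of `stat_function` describe a simple evaluation loop: take `n` evenly spaced points in `xlim` and call `fun` at each one. A plain-Python sketch of that loop follows. The helper names `evaluate_function` and `gaussian` are hypothetical, and passing `args` positionally after `x` is an assumption of this sketch.

```python
import math

def evaluate_function(fun, xlim, n=101, args=()):
    """Evaluate fun at n evenly spaced x values between xlim[0] and xlim[1]."""
    lo, hi = xlim
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    # Extra arguments are forwarded after x (assumed calling convention)
    return [(x, fun(x, *args)) for x in xs]

def gaussian(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

curve = evaluate_function(gaussian, xlim=(-3, 3), n=7, args=(0.0, 1.0))
print([x for x, _ in curve])  # [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
```

A larger `n` gives a smoother drawn curve at the cost of more function evaluations.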

## Usage Patterns

### Using Computed Aesthetics

```python
from plotnine import ggplot, aes, after_stat, geom_histogram, geom_point

# `data` is assumed to be a pandas DataFrame with columns 'value', 'x' and 'y'

# Map fill to computed count in histogram
ggplot(data, aes(x='value')) + \
    geom_histogram(aes(fill=after_stat('count')), stat='bin', bins=20)

# Use density instead of count for histogram
ggplot(data, aes(x='value')) + \
    geom_histogram(aes(y=after_stat('density')), stat='bin', bins=20)

# Color points by local density
ggplot(data, aes(x='x', y='y')) + \
    geom_point(aes(color=after_stat('density')), stat='pointdensity')
```

### Custom Statistical Summaries

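```python
import numpy as np
from plotnine import ggplot, aes, stat_summary

# Custom summary function: mean with a band of one standard error either side
def mean_se(x):
    mean = np.mean(x)
    # np.std defaults to the population standard deviation (ddof=0)
    se = np.std(x) / np.sqrt(len(x))
    return {'y': mean, 'ymin': mean - se, 'ymax': mean + se}

# `data` is assumed to be a pandas DataFrame with columns 'group' and 'value'
ggplot(data, aes(x='group', y='value')) + \
    stat_summary(fun_data=mean_se, geom='pointrange')
```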

### Combining Stats with Geoms

```python
from plotnine import ggplot, aes, geom_point, geom_rug, stat_density, stat_smooth

# Density curve with rug plot
ggplot(data, aes(x='value')) + \
    stat_density(geom='line') + \
    geom_rug(sides='b')

# Smooth with confidence band
ggplot(data, aes(x='x', y='y')) + \
    geom_point(alpha=0.5) + \
    stat_smooth(method='lm', se=True)
```