tessl/npm-vega

A declarative visualization grammar for creating interactive data visualizations through JSON specifications.

Statistical Functions

Vega's statistics utilities provide probability distribution functions, regression fitting, binning, bootstrapped confidence intervals, and random number generation for data analysis and visualization.

Capabilities

Distribution Functions

Bandwidth estimation, binning, and quantile calculations for numeric data.

/**
 * Calculate optimal bandwidth for kernel density estimation using Normal Reference Distribution rule
 * @param values - Array of numeric values
 * @returns Optimal bandwidth value
 */
function bandwidthNRD(values: number[]): number;

/**
 * Determine a quantitative binning scheme (for example, for a histogram)
 * @param options - Binning configuration options; extent is required
 * @returns Binning scheme with start, stop, and step values
 */
function bin(options: BinOptions): BinScheme;

/**
 * Calculate dot plot bin positions ("dot density" stacks)
 * @param values - Array of numeric values, sorted in ascending order
 * @param step - Bin step size (dot width) in data units
 * @param smooth - Whether to smooth the resulting bin positions (default: false)
 * @returns Array of bin positions, one per input value
 */
function dotbin(values: number[], step: number, smooth?: boolean): ArrayLike<number>;

/**
 * Calculate data quantiles
 * @param values - Array of numeric values
 * @param p - Array of quantile probabilities (0-1)
 * @returns Array of quantile values
 */
function quantiles(values: number[], p: number[]): number[];

/**
 * Calculate data quartiles (25th, 50th, 75th percentiles)
 * @param values - Array of numeric values
 * @returns Tuple of [Q1, Q2, Q3] values
 */
function quartiles(values: number[]): [number, number, number];

interface BinOptions {
  /** [min, max] range of the values to bin (required) */
  extent: [number, number];

  /** Maximum number of bins (default: 20) */
  maxbins?: number;

  /** Exact bin step size to use */
  step?: number;

  /** Allowable step sizes to choose from */
  steps?: number[];

  /** Minimum allowable step size */
  minstep?: number;

  /** Round start and stop to nice values (default: true) */
  nice?: boolean;
}

interface BinScheme {
  /** Start of the first bin */
  start: number;

  /** End of the last bin */
  stop: number;

  /** Width of each bin */
  step: number;
}
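
The normal reference rule behind a bandwidth estimate can be sketched in a few lines. The helpers below (`quantileR7`, `sampleDeviation`, `bandwidthSketch`) are hypothetical names used for illustration only; they assume the common form h = 1.06 * min(sd, IQR / 1.34) * n^(-1/5) and linear-interpolation quantiles, and are not taken from Vega's source.

```javascript
// Linear-interpolation quantile (the "R-7" definition) on a numeric array.
function quantileR7(values, p) {
  const sorted = Float64Array.from(values).sort(); // typed arrays sort numerically
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Sample standard deviation (n - 1 denominator).
function sampleDeviation(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const ss = values.reduce((a, b) => a + (b - mean) ** 2, 0);
  return Math.sqrt(ss / (n - 1));
}

// Normal reference rule: h = 1.06 * min(sd, IQR / 1.34) * n^(-1/5).
// Note: this sketch returns 0 when the data have no spread.
function bandwidthSketch(values) {
  const iqr = quantileR7(values, 0.75) - quantileR7(values, 0.25);
  const spread = Math.min(sampleDeviation(values), iqr / 1.34);
  return 1.06 * spread * Math.pow(values.length, -0.2);
}

const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const q1 = quantileR7(data, 0.25); // 3.25
const h = bandwidthSketch(data);   // roughly 2.02
```

Taking the smaller of the standard deviation and the scaled interquartile range keeps a few outliers from inflating the bandwidth.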

Random Number Generation

Comprehensive random number generation with multiple distribution support.

/**
 * Set global random number generator
 * @param rng - Random number generator function
 */
function setRandom(rng: () => number): void;

/**
 * Generate a uniform pseudo-random number in [0, 1) using the current
 * global generator (Math.random unless replaced via setRandom)
 * @returns Random number in [0, 1)
 */
function random(): number;

/**
 * Linear congruential generator for reproducible random numbers
 * @param seed - Optional seed value
 * @returns Random number generator function
 */
function randomLCG(seed?: number): () => number;

/**
 * Uniform integer distribution
 * @param min - Minimum value (inclusive)
 * @param max - Maximum value (exclusive)
 * @returns Distribution object
 */
function randomInteger(min: number, max: number): Distribution;

/**
 * Kernel density estimate over a set of values
 * @param data - Data points for KDE
 * @param bandwidth - KDE bandwidth (estimated from the data if omitted)
 * @returns Distribution object
 */
function randomKDE(data: number[], bandwidth?: number): Distribution;

/**
 * Weighted mixture of distributions
 * @param distributions - Array of distribution objects
 * @param weights - Optional weights for each distribution
 * @returns Distribution object
 */
function randomMixture(distributions: Distribution[], weights?: number[]): Distribution;

interface Distribution {
  /** Draw a random sample */
  sample(): number;

  /** Probability density (or mass) function */
  pdf(x: number): number;

  /** Cumulative distribution function */
  cdf(x: number): number;

  /** Inverse CDF (quantile function); may be unsupported for some distributions, such as KDE */
  icdf(p: number): number;
}
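
To illustrate how a seeded generator yields reproducible sequences, here is a minimal linear congruential generator. `lcgSketch` is a hypothetical name, and the multiplier, increment, and modulus are the widely used "Numerical Recipes" constants, chosen here as an assumption rather than read from Vega's source.

```javascript
// Seeded linear congruential generator: state <- (a * state + c) mod m,
// with each output scaled to the range [0, 1).
function lcgSketch(seed) {
  const a = 1664525, c = 1013904223, m = 4294967296; // m = 2^32
  let state = Math.abs(Math.floor(seed)) % m;
  return function next() {
    state = (a * state + c) % m; // product stays below 2^53, so the arithmetic is exact
    return state / m;
  };
}

const rngA = lcgSketch(42);
const rngB = lcgSketch(42);
const seqA = [rngA(), rngA(), rngA()];
const seqB = [rngB(), rngB(), rngB()];
// seqA and seqB are identical because both generators start from the same seed.
```

A generator like this is handy for tests and for visualizations that must render the same jitter or samples on every run.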

Normal Distribution

Complete normal distribution functions including sampling, density, and quantiles.

/**
 * Normal (Gaussian) distribution object
 * @param mu - Mean (default: 0)
 * @param sigma - Standard deviation (default: 1)
 * @returns Distribution object with sample, pdf, cdf, and icdf methods
 */
function randomNormal(mu?: number, sigma?: number): Distribution;

/**
 * Sample from normal distribution
 * @param mu - Mean
 * @param sigma - Standard deviation
 * @returns Random sample
 */
function sampleNormal(mu: number, sigma: number): number;

/**
 * Normal distribution probability density function
 * @param x - Value to evaluate
 * @param mu - Mean
 * @param sigma - Standard deviation
 * @returns Density value
 */
function densityNormal(x: number, mu: number, sigma: number): number;

/**
 * Normal distribution cumulative distribution function
 * @param x - Value to evaluate
 * @param mu - Mean
 * @param sigma - Standard deviation
 * @returns Cumulative probability
 */
function cumulativeNormal(x: number, mu: number, sigma: number): number;

/**
 * Normal distribution quantile function (inverse CDF)
 * @param p - Probability (0-1)
 * @param mu - Mean
 * @param sigma - Standard deviation
 * @returns Quantile value
 */
function quantileNormal(p: number, mu: number, sigma: number): number;
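
The relationship between the density and the cumulative distribution can be checked numerically. The sketch below uses hypothetical helper names (`normalPdf`, `normalCdfNumeric`) and approximates the CDF by trapezoidal integration of the density; it is an illustration of the definitions, not how a library would compute them.

```javascript
// Normal density: f(x) = exp(-z^2 / 2) / (sigma * sqrt(2 * pi)), with z = (x - mu) / sigma.
function normalPdf(x, mu = 0, sigma = 1) {
  const z = (x - mu) / sigma;
  return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2 * Math.PI));
}

// Approximate the CDF by integrating the density from mu - 8 * sigma up to x
// with the trapezoidal rule (the mass below mu - 8 * sigma is negligible).
function normalCdfNumeric(x, mu = 0, sigma = 1, steps = 4000) {
  const start = mu - 8 * sigma;
  if (x <= start) return 0;
  const dx = (x - start) / steps;
  let area = 0;
  for (let i = 0; i < steps; ++i) {
    const left = start + i * dx;
    area += 0.5 * (normalPdf(left, mu, sigma) + normalPdf(left + dx, mu, sigma)) * dx;
  }
  return area;
}

const peak = normalPdf(0);             // about 0.3989
const half = normalCdfNumeric(0);      // about 0.5
const upper = normalCdfNumeric(1.645); // about 0.95
```

The quantile function inverts this mapping: a probability of 0.95 corresponds to roughly 1.645 standard deviations above the mean.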

Log-Normal Distribution

Log-normal distribution functions for skewed data analysis.

/**
 * Log-normal distribution object
 * @param mu - Log-scale mean (default: 0)
 * @param sigma - Log-scale standard deviation (default: 1)
 * @returns Distribution object with sample, pdf, cdf, and icdf methods
 */
function randomLogNormal(mu?: number, sigma?: number): Distribution;

/**
 * Sample from log-normal distribution
 * @param mu - Log-scale mean
 * @param sigma - Log-scale standard deviation
 * @returns Random sample
 */
function sampleLogNormal(mu: number, sigma: number): number;

/**
 * Log-normal distribution probability density function
 * @param x - Value to evaluate
 * @param mu - Log-scale mean
 * @param sigma - Log-scale standard deviation
 * @returns Density value
 */
function densityLogNormal(x: number, mu: number, sigma: number): number;

/**
 * Log-normal distribution cumulative distribution function
 * @param x - Value to evaluate
 * @param mu - Log-scale mean
 * @param sigma - Log-scale standard deviation
 * @returns Cumulative probability
 */
function cumulativeLogNormal(x: number, mu: number, sigma: number): number;

/**
 * Log-normal distribution quantile function
 * @param p - Probability (0-1)
 * @param mu - Log-scale mean
 * @param sigma - Log-scale standard deviation
 * @returns Quantile value
 */
function quantileLogNormal(p: number, mu: number, sigma: number): number;
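
The log-scale parameters are easiest to understand through the transformation from a standard normal variable. The helpers below (`logNormalFromZ`, `logNormalPdf`) are hypothetical names for a sketch of the textbook definitions.

```javascript
// If Z is standard normal, exp(mu + sigma * Z) is log-normal with
// log-scale mean mu and log-scale standard deviation sigma.
function logNormalFromZ(z, mu = 0, sigma = 1) {
  return Math.exp(mu + sigma * z);
}

// Density: f(x) = exp(-(ln x - mu)^2 / (2 sigma^2)) / (x * sigma * sqrt(2 pi)) for x > 0.
function logNormalPdf(x, mu = 0, sigma = 1) {
  if (x <= 0) return 0;
  const z = (Math.log(x) - mu) / sigma;
  return Math.exp(-0.5 * z * z) / (x * sigma * Math.sqrt(2 * Math.PI));
}

const median = logNormalFromZ(0, 1, 2);     // z = 0 maps to the median, exp(mu)
const mean = Math.exp(1 + (2 * 2) / 2);     // analytic mean, exp(mu + sigma^2 / 2)
const densityAtOne = logNormalPdf(1, 0, 1); // about 0.3989
```

Because the mean exceeds the median whenever sigma is positive, the distribution is right-skewed, which is why it suits skewed data.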

Uniform Distribution

Uniform distribution functions for random sampling within ranges.

/**
 * Continuous uniform distribution object
 * @param min - Minimum value (default: 0)
 * @param max - Maximum value (default: 1)
 * @returns Distribution object with sample, pdf, cdf, and icdf methods
 */
function randomUniform(min?: number, max?: number): Distribution;

/**
 * Sample from uniform distribution
 * @param min - Minimum value
 * @param max - Maximum value
 * @returns Random sample
 */
function sampleUniform(min: number, max: number): number;

/**
 * Uniform distribution probability density function
 * @param x - Value to evaluate
 * @param min - Minimum value
 * @param max - Maximum value
 * @returns Density value
 */
function densityUniform(x: number, min: number, max: number): number;

/**
 * Uniform distribution cumulative distribution function
 * @param x - Value to evaluate
 * @param min - Minimum value
 * @param max - Maximum value
 * @returns Cumulative probability
 */
function cumulativeUniform(x: number, min: number, max: number): number;

/**
 * Uniform distribution quantile function
 * @param p - Probability (0-1)
 * @param min - Minimum value
 * @param max - Maximum value
 * @returns Quantile value
 */
function quantileUniform(p: number, min: number, max: number): number;
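
For the uniform distribution all three functions have closed forms, sketched below with hypothetical helper names. The handling of out-of-range inputs (density 0, NaN for invalid probabilities) is an assumption of this sketch.

```javascript
// Density is constant on [min, max] and zero elsewhere.
function uniformPdf(x, min = 0, max = 1) {
  return x >= min && x <= max ? 1 / (max - min) : 0;
}

// CDF rises linearly from 0 at min to 1 at max.
function uniformCdf(x, min = 0, max = 1) {
  if (x < min) return 0;
  if (x > max) return 1;
  return (x - min) / (max - min);
}

// Quantile function is the inverse of the CDF on [0, 1].
function uniformQuantile(p, min = 0, max = 1) {
  return p >= 0 && p <= 1 ? min + p * (max - min) : NaN;
}

const density = uniformPdf(5, 0, 10);         // 0.1
const cumulative = uniformCdf(2.5, 0, 10);    // 0.25
const quantile = uniformQuantile(0.25, 0, 10); // 2.5
```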

Regression Analysis

Comprehensive regression methods for trend analysis and curve fitting.

/** Accessor that extracts a numeric value from a data object */
type Accessor<T> = (d: T) => number;

/**
 * Constant regression (horizontal line at the mean of y)
 * @param data - Array of data objects
 * @param x - Accessor for the independent variable
 * @param y - Accessor for the dependent variable
 * @returns Regression result with a single coefficient
 */
function regressionConstant<T>(data: T[], x: Accessor<T>, y: Accessor<T>): RegressionResult;

/**
 * Linear regression (y = a + b * x)
 * @param data, x, y - As for regressionConstant
 * @returns Linear regression result with coef [a, b]
 */
function regressionLinear<T>(data: T[], x: Accessor<T>, y: Accessor<T>): RegressionResult;

/**
 * Logarithmic regression (y = a + b * log(x))
 * @param data, x, y - As for regressionConstant
 * @returns Logarithmic regression result with coef [a, b]
 */
function regressionLog<T>(data: T[], x: Accessor<T>, y: Accessor<T>): RegressionResult;

/**
 * Exponential regression (y = a * e^(b * x))
 * @param data, x, y - As for regressionConstant
 * @returns Exponential regression result with coef [a, b]
 */
function regressionExp<T>(data: T[], x: Accessor<T>, y: Accessor<T>): RegressionResult;

/**
 * Power regression (y = a * x^b)
 * @param data, x, y - As for regressionConstant
 * @returns Power regression result with coef [a, b]
 */
function regressionPow<T>(data: T[], x: Accessor<T>, y: Accessor<T>): RegressionResult;

/**
 * Quadratic regression (y = a + b * x + c * x^2)
 * @param data, x, y - As for regressionConstant
 * @returns Quadratic regression result with coef [a, b, c]
 */
function regressionQuad<T>(data: T[], x: Accessor<T>, y: Accessor<T>): RegressionResult;

/**
 * Polynomial regression of specified degree
 * @param data, x, y - As for regressionConstant
 * @param order - Polynomial degree
 * @returns Polynomial regression result
 */
function regressionPoly<T>(data: T[], x: Accessor<T>, y: Accessor<T>, order: number): RegressionResult;

/**
 * LOESS (locally weighted regression) smoothing
 * @param data, x, y - As for regressionConstant
 * @param bandwidth - Fraction of points (0-1) used for each local fit
 * @returns Array of smoothed [x, y] points
 */
function regressionLoess<T>(data: T[], x: Accessor<T>, y: Accessor<T>, bandwidth: number): [number, number][];

interface RegressionResult {
  /** Regression coefficients, lowest-order term first */
  coef: number[];

  /** R-squared value */
  rSquared: number;

  /** Prediction function */
  predict: (x: number) => number;
}
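
As a rough illustration of what a linear fit computes, the sketch below implements ordinary least squares with accessor functions and returns an object with `coef`, `predict`, and `rSquared`. `linearFitSketch` is a hypothetical helper, the `[intercept, slope]` coefficient order is an assumption of this sketch, and it does not skip missing or invalid values.

```javascript
// Ordinary least squares for y = a + b * x.
function linearFitSketch(data, x, y) {
  const n = data.length;
  let sx = 0, sy = 0;
  for (const d of data) { sx += x(d); sy += y(d); }
  const mx = sx / n, my = sy / n;

  // Centered sums of squares and cross-products.
  let sxy = 0, sxx = 0, sst = 0;
  for (const d of data) {
    const dx = x(d) - mx, dy = y(d) - my;
    sxy += dx * dy;
    sxx += dx * dx;
    sst += dy * dy;
  }

  const slope = sxy / sxx;
  const intercept = my - slope * mx;
  const predict = (v) => intercept + slope * v;

  // R-squared = 1 - (residual sum of squares / total sum of squares).
  let sse = 0;
  for (const d of data) sse += (y(d) - predict(x(d))) ** 2;

  return { coef: [intercept, slope], predict, rSquared: 1 - sse / sst };
}

const points = [[1, 2], [2, 4], [3, 7], [4, 9], [5, 12]];
const fit = linearFitSketch(points, (d) => d[0], (d) => d[1]);
// fit.coef is approximately [-0.7, 2.5]; fit.rSquared is approximately 0.995
```

Passing accessors rather than raw `[x, y]` pairs lets the same routine work on arrays of objects, for example `(d) => d.price`.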

Bootstrap Methods

Bootstrap resampling for confidence interval estimation.

/**
 * Bootstrapped confidence interval for the mean of an array of values
 * @param values - Original dataset
 * @param samples - Number of bootstrap iterations
 * @param alpha - Significance level (for example, 0.05 for a 95% interval)
 * @param accessor - Optional accessor that extracts a number from each element
 * @returns [lo, hi] confidence interval bounds
 */
function bootstrapCI<T>(
  values: T[],
  samples: number,
  alpha: number,
  accessor?: (d: T) => number
): [number, number];

/**
 * Adaptively sample points along a curve
 * @param f - Function of x to sample
 * @param extent - [min, max] domain over which to sample
 * @param minSteps - Minimum number of sample points (default: 25)
 * @param maxSteps - Maximum number of sample points (default: 200)
 * @returns Array of sampled [x, y] points
 */
function sampleCurve(
  f: (x: number) => number,
  extent: [number, number],
  minSteps?: number,
  maxSteps?: number
): [number, number][];

Usage Examples

Data Binning

import { bin, quantiles } from "vega";

const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Compute a binning scheme; bin() takes the data extent, not the data itself
const scheme = bin({ extent: [1, 10], maxbins: 5, nice: true });
console.log(scheme);
// An object of the form { start, stop, step }

// Calculate quantiles
const q = quantiles(data, [0.25, 0.5, 0.75]);
console.log(q); // [3.25, 5.5, 7.75]
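
A binning scheme described by start, stop, and step values can be turned into histogram counts with a short loop. `countBins` below is a hypothetical helper, and the scheme literal is an assumed example value rather than output captured from the library.

```javascript
// Count values into equal-width bins described by { start, stop, step }.
// Bins are half-open [x0, x1), except the last bin, which also includes stop.
function countBins(values, scheme) {
  const { start, stop, step } = scheme;
  const n = Math.round((stop - start) / step);
  const bins = Array.from({ length: n }, (_, i) => ({
    x0: start + i * step,
    x1: start + (i + 1) * step,
    count: 0
  }));
  for (const v of values) {
    if (v < start || v > stop) continue; // ignore values outside the scheme
    const i = Math.min(n - 1, Math.floor((v - start) / step));
    bins[i].count += 1;
  }
  return bins;
}

const histogram = countBins(
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
  { start: 0, stop: 10, step: 2 } // assumed scheme for illustration
);
// Counts per bin: 1, 2, 2, 2, 3
```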

Random Sampling

import { randomNormal, randomUniform, setRandom, randomLCG } from "vega";

// Use reproducible random generator
setRandom(randomLCG(42));

// Create a normal distribution object and draw samples with sample()
const normal = randomNormal(0, 1);
const samples = Array.from({length: 1000}, () => normal.sample());

// Create a uniform distribution object and draw samples
const uniform = randomUniform(0, 100);
const uniformSamples = Array.from({length: 100}, () => uniform.sample());

Regression Analysis

import { regressionLinear, regressionLoess, sampleCurve } from "vega";

const data = [
  [1, 2], [2, 4], [3, 7], [4, 9], [5, 12]
];

// Accessors that extract x and y from each data point
const x = (d) => d[0];
const y = (d) => d[1];

// Linear regression
const linear = regressionLinear(data, x, y);
console.log(linear.coef); // [intercept, slope]
console.log(linear.rSquared); // R-squared value

// Generate points along the fitted line over the x range [1, 5]
const predictions = sampleCurve(linear.predict, [1, 5]);

// LOESS smoothing with a bandwidth of 0.3; returns an array of [x, y] points
const smoothed = regressionLoess(data, x, y, 0.3);

Distribution Functions

import { 
  densityNormal, 
  cumulativeNormal, 
  quantileNormal,
  sampleNormal 
} from "vega";

// Calculate normal distribution values
const x = 1.5;
const mu = 0, sigma = 1;

const density = densityNormal(x, mu, sigma);
const cumulative = cumulativeNormal(x, mu, sigma);
const quantile = quantileNormal(0.95, mu, sigma);

console.log({ density, cumulative, quantile });

// Generate samples
const samples = Array.from({length: 1000}, () => sampleNormal(mu, sigma));

Bootstrap Confidence Intervals

import { bootstrapCI } from "vega";

const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Bootstrap 95% confidence interval for the mean:
// 1000 resampling iterations, alpha = 0.05
const [lo, hi] = bootstrapCI(data, 1000, 0.05);

console.log(`Mean 95% CI: [${lo}, ${hi}]`);

// bootstrapCI resamples the mean only; an optional accessor extracts
// the numeric value when the array holds objects
const records = data.map((v) => ({ value: v }));
const [rlo, rhi] = bootstrapCI(records, 1000, 0.05, (d) => d.value);

console.log(`Mean 95% CI (records): [${rlo}, ${rhi}]`);
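
For statistics other than the mean, a percentile bootstrap can be written by hand. The sketch below is a generic illustration with hypothetical names (`seededRandom`, `percentileBootstrap`); the seeded generator constants and the linear-interpolation quantile are assumptions of this sketch, not library behavior.

```javascript
// Tiny seeded generator so the example is reproducible (constants are an assumption).
function seededRandom(seed) {
  let s = seed % 4294967296;
  return () => (s = (1664525 * s + 1013904223) % 4294967296) / 4294967296;
}

// Percentile bootstrap: resample with replacement, compute the statistic on
// each resample, then read the alpha/2 and 1 - alpha/2 quantiles.
function percentileBootstrap(data, statistic, samples = 1000, alpha = 0.05, rng = Math.random) {
  const n = data.length;
  const stats = new Float64Array(samples);
  for (let j = 0; j < samples; ++j) {
    const resample = Array.from({ length: n }, () => data[Math.floor(rng() * n)]);
    stats[j] = statistic(resample);
  }
  stats.sort(); // typed arrays sort numerically
  const at = (p) => {
    const h = (samples - 1) * p;
    const lo = Math.floor(h);
    const hi = Math.min(lo + 1, samples - 1);
    return stats[lo] + (h - lo) * (stats[hi] - stats[lo]);
  };
  return [at(alpha / 2), at(1 - alpha / 2)];
}

const median = (arr) => {
  const s = arr.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

const values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const [medianLo, medianHi] = percentileBootstrap(values, median, 1000, 0.05, seededRandom(42));
```

Injecting the random number generator keeps the interval reproducible across runs, which is useful in tests.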

Kernel Density Estimation

import { randomKDE, bandwidthNRD } from "vega";

const observations = [1, 2, 2.5, 3, 3.5, 4, 4.5, 5];

// Calculate optimal bandwidth
const bandwidth = bandwidthNRD(observations);

// Create a KDE distribution object
const kde = randomKDE(observations, bandwidth);

// Generate samples from the estimated distribution
const samples = Array.from({length: 1000}, () => kde.sample());
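
Conceptually, a Gaussian kernel density estimate averages one normal bump per observation, and sampling from it amounts to picking an observation at random and adding normal noise scaled by the bandwidth. The sketch below uses hypothetical helper names and a Box-Muller transform; it illustrates the idea and is not the library's implementation.

```javascript
// Standard normal kernel.
function gaussianKernel(u) {
  return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
}

// KDE density at x: average of kernels centered on each observation.
function kdeDensity(data, bandwidth, x) {
  let sum = 0;
  for (const d of data) sum += gaussianKernel((x - d) / bandwidth);
  return sum / (data.length * bandwidth);
}

// Draw one sample: choose an observation uniformly at random, then add
// Gaussian noise with standard deviation equal to the bandwidth.
function kdeSample(data, bandwidth, rng = Math.random) {
  const center = data[Math.floor(rng() * data.length)];
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2); // Box-Muller
  return center + bandwidth * z;
}

const obs = [1, 2, 2.5, 3, 3.5, 4, 4.5, 5];
const densityAtThree = kdeDensity(obs, 0.5, 3);
const draw = kdeSample(obs, 0.5);
```

A larger bandwidth widens each bump and produces a smoother, flatter estimate; a smaller one follows the observations more closely.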

Advanced Binning

import { dotbin } from "vega";

// Input values must be sorted in ascending order
const values = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5];

// Compute dot plot bin positions with a step size of 1 and no smoothing
const positions = dotbin(values, 1, false);
console.log(positions);
// One bin position per input value
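
The greedy idea behind dot-density binning can be sketched as follows: walk the sorted values, start a new stack whenever a value lies at least one step beyond the first value of the current stack, and place each stack at the midpoint of its smallest and largest members. `dotBinSketch` is a hypothetical helper that omits smoothing and accessors.

```javascript
// Greedy dot-density binning over values sorted in ascending order.
// Returns one bin position per input value.
function dotBinSketch(sorted, step) {
  const n = sorted.length;
  const positions = new Array(n);
  let i = 0;             // index of the first value in the current stack
  let first = sorted[0]; // smallest value in the current stack
  let last = first;      // largest value seen so far in the current stack
  for (let j = 1; j < n; ++j) {
    const x = sorted[j];
    if (x >= first + step) {
      // Close the current stack and center it between its extremes.
      const center = (first + last) / 2;
      for (; i < j; ++i) positions[i] = center;
      first = x;
    }
    last = x;
  }
  // Close the final stack.
  const center = (first + last) / 2;
  for (; i < n; ++i) positions[i] = center;
  return positions;
}

const sortedValues = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5];
const stacks = dotBinSketch(sortedValues, 1.5);
// [1.5, 1.5, 1.5, 1.5, 1.5, 3.5, 3.5, 3.5, 5, 5, 5, 5]
```

Values that share a position form one stack of dots, so the step size controls how aggressively nearby values are merged.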

Install with Tessl CLI

npx tessl i tessl/npm-vega
