tessl/maven-com-github-haifengl--smile-core

Statistical Machine Intelligence and Learning Engine providing comprehensive machine learning algorithms for classification, regression, clustering, and feature engineering in Java

Describes
mavenpkg:maven/com.github.haifengl/smile-core@3.1.x

To install, run

npx @tessl/cli install tessl/maven-com-github-haifengl--smile-core@3.1.0

Smile Core

Smile Core is the foundational library of Smile (Statistical Machine Intelligence and Learning Engine). It provides machine learning algorithms for classification, regression, clustering, feature engineering, and advanced analytics in Java. The library pairs performance-oriented implementations and data structures with validation utilities, and it fits into Java-based data science workflows.

Package Information

  • Package Name: smile-core
  • Package Type: maven
  • Language: Java
  • Installation:
    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-core</artifactId>
      <version>3.1.1</version>
    </dependency>

Core Imports

import smile.classification.*;
import smile.regression.*;
import smile.clustering.*;
import smile.feature.*;
import smile.validation.*;

Basic Usage

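import smile.classification.RandomForest;
import smile.data.DataFrame;
import smile.data.Tuple;
import smile.data.formula.Formula;
import smile.validation.CrossValidation;

// Assumes a DataFrame `df` is already loaded and contains a categorical column named "target".
// Formula.lhs("target") treats "target" as the response and all other columns as predictors.
Formula formula = Formula.lhs("target");

// Train a random forest classifier
RandomForest model = RandomForest.fit(formula, df);

// Predict the class label of a single row (the first row of df, for illustration)
Tuple testTuple = df.get(0);
int prediction = model.predict(testTuple);

// 10-fold cross-validation; the trainer comes last, as a (formula, data) -> model function
var results = CrossValidation.classification(10, formula, df, (f, d) -> RandomForest.fit(f, d));
System.out.println("Accuracy: " + results.avg.accuracy);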

Architecture

Smile Core is built around several key design principles:

  • Unified Interfaces: Core interfaces like Classifier<T>, Regression<T>, and PartitionClustering provide consistent APIs across algorithms
  • Type Safety: Extensive use of Java generics for type-safe machine learning pipelines
  • Performance: Optimized implementations with efficient data structures and parallel processing support
  • Modularity: Organized into logical packages for different ML domains (classification, regression, clustering, etc.)
  • Validation: Comprehensive metrics and cross-validation utilities built into the framework
  • Feature Engineering: Complete preprocessing pipeline with transformations, scaling, and imputation

Capabilities

Classification

Supervised learning algorithms for predicting categorical outcomes, including ensemble methods, neural networks, and probabilistic models.

interface Classifier<T> extends ToIntFunction<T>, Serializable {
    int predict(T x);
    int predict(T x, double[] posteriori);
    default int numClasses();
    default int[] classes();
    default void update(T x, int y);
}
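
Because the listing above shows Classifier<T> extending ToIntFunction<T>, evaluation code can be written against the JDK interface alone. The following is a minimal sketch under that assumption. It uses no Smile classes; AccuracySketch and the threshold rule are hypothetical stand-ins for illustration.

```java
import java.util.function.ToIntFunction;

/** Illustration only: evaluation code typed against ToIntFunction, not a Smile class. */
final class AccuracySketch {
    /** Fraction of samples whose predicted label equals the true label. */
    static <T> double accuracy(ToIntFunction<T> model, T[] x, int[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y differ in length");
        }
        int correct = 0;
        for (int i = 0; i < x.length; i++) {
            if (model.applyAsInt(x[i]) == y[i]) correct++;
        }
        return (double) correct / x.length;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in model: label 1 when the first feature is positive.
        ToIntFunction<double[]> rule = v -> v[0] > 0 ? 1 : 0;
        double[][] x = {{1.5}, {-0.3}, {2.0}, {-1.0}};
        int[] y = {1, 0, 0, 0};
        System.out.println("Accuracy: " + accuracy(rule, x, y)); // prints "Accuracy: 0.75"
    }
}
```

Any object that satisfies ToIntFunction<T>, including a trained classifier if the interface is as listed, could be passed as the model argument.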

Regression

Supervised learning algorithms for predicting continuous values, from linear models to advanced ensemble methods and kernel machines.

interface Regression<T> extends ToDoubleFunction<T>, Serializable {
    double predict(T x);
    default void update(T x, double y);
}
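
The same idea applies to regression: if Regression<T> extends ToDoubleFunction<T> as listed, an error measure can be computed against the JDK interface. This sketch computes root mean squared error; RmseSketch and the linear rule are hypothetical and not part of Smile.

```java
import java.util.function.ToDoubleFunction;

/** Illustration only: RMSE computed against ToDoubleFunction, not a Smile class. */
final class RmseSketch {
    /** Root mean squared error of the model's predictions against y. */
    static <T> double rmse(ToDoubleFunction<T> model, T[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - model.applyAsDouble(x[i]);
            sum += residual * residual;
        }
        return Math.sqrt(sum / x.length);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in model: y = 2x + 1.
        ToDoubleFunction<double[]> line = v -> 2 * v[0] + 1;
        double[][] x = {{0}, {1}, {2}};
        double[] y = {1, 3, 7}; // the last target is off by 2
        System.out.println("RMSE: " + rmse(line, x, y)); // sqrt(4/3), about 1.1547
    }
}
```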

Clustering

Unsupervised learning algorithms for discovering patterns and groupings in data, including partitioning, hierarchical, and density-based methods.

abstract class PartitionClustering implements Serializable {
    public final int k;
    public final int[] y;
    public final int[] size;
    public static final int OUTLIER = Integer.MAX_VALUE;
}
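
To show how the listed fields might relate, the sketch below derives per-cluster counts from a label array and skips points marked OUTLIER. It assumes that y[i] holds the cluster label of sample i and that labels run from 0 to k-1; Smile's own size array may be laid out differently. ClusterSizeSketch is a hypothetical helper, not a Smile class.

```java
import java.util.Arrays;

/** Illustration only: counts cluster members from labels, ignoring outliers. */
final class ClusterSizeSketch {
    /** Mirrors the OUTLIER constant shown in the listing above. */
    static final int OUTLIER = Integer.MAX_VALUE;

    /** Returns the number of samples assigned to each of the k clusters. */
    static int[] sizes(int k, int[] y) {
        int[] size = new int[k];
        for (int label : y) {
            if (label != OUTLIER) size[label]++;
        }
        return size;
    }

    public static void main(String[] args) {
        int[] y = {0, 1, 1, OUTLIER, 2, 1};
        System.out.println(Arrays.toString(sizes(3, y))); // prints [1, 3, 1]
    }
}
```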

Feature Engineering

Complete preprocessing pipeline including dimensionality reduction, feature selection, transformation, and imputation utilities.

interface Transform extends Function<double[], double[]> {
    double[] apply(double[] x);
}

abstract class Projection implements Transform {
    public abstract double[] project(double[] x);
}
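
If a transform is a Function<double[], double[]> as in the listing, preprocessing steps can be chained with the JDK's Function.andThen. The sketch below standardizes a vector by centering and then scaling. TransformSketch, center, and scale are hypothetical helpers, and the means and standard deviations are made-up values rather than statistics fitted by Smile.

```java
import java.util.Arrays;
import java.util.function.Function;

/** Illustration only: composing array transforms with java.util.function.Function. */
final class TransformSketch {
    /** Subtracts a fixed per-feature mean. */
    static Function<double[], double[]> center(double[] mean) {
        return x -> {
            double[] out = new double[x.length];
            for (int i = 0; i < x.length; i++) out[i] = x[i] - mean[i];
            return out;
        };
    }

    /** Divides by a fixed per-feature standard deviation. */
    static Function<double[], double[]> scale(double[] sd) {
        return x -> {
            double[] out = new double[x.length];
            for (int i = 0; i < x.length; i++) out[i] = x[i] / sd[i];
            return out;
        };
    }

    public static void main(String[] args) {
        double[] mean = {1.0, 4.0};
        double[] sd = {2.0, 3.0};
        Function<double[], double[]> standardize = center(mean).andThen(scale(sd));
        System.out.println(Arrays.toString(standardize.apply(new double[]{3.0, 10.0}))); // [1.0, 2.0]
    }
}
```

Each step returns a new array, so the input vector is left unmodified.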

Validation and Metrics

Comprehensive model validation framework with cross-validation, bootstrap sampling, and extensive performance metrics.

interface CrossValidation {
    Bag[] split(int n);
    static CrossValidation of(int k);
    static CrossValidation stratify(int k, int[] y);
}

interface ClassificationMetric {
    double score(int[] truth, int[] prediction);
}
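
As a rough illustration of what a k-fold split produces, the sketch below partitions the indices 0..n-1 into k contiguous test folds whose sizes differ by at most one. It is not Smile's implementation: it does no shuffling or stratification and returns plain arrays rather than Bag objects. KFoldSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Illustration only: contiguous k-fold partition of sample indices. */
final class KFoldSketch {
    /** Returns k test folds that together cover 0..n-1 exactly once. */
    static int[][] testFolds(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        int[][] folds = new int[k][];
        int start = 0;
        for (int f = 0; f < k; f++) {
            // The first n % k folds receive one extra index.
            int len = n / k + (f < n % k ? 1 : 0);
            folds[f] = new int[len];
            for (int j = 0; j < len; j++) folds[f][j] = start + j;
            start += len;
        }
        return folds;
    }

    public static void main(String[] args) {
        for (int[] fold : testFolds(10, 3)) {
            System.out.println(Arrays.toString(fold));
        }
        // prints [0, 1, 2, 3], then [4, 5, 6], then [7, 8, 9]
    }
}
```

For each round, the training indices are the complement of the test fold.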

Deep Learning

Neural network components including multi-layer perceptrons, activation functions, and optimization algorithms.

abstract class MultilayerPerceptron implements Classifier<double[]> {
    public abstract int predict(double[] x);
    public abstract void update(double[] x, int y);
}

Advanced Analytics

Specialized algorithms for manifold learning, time series analysis, sequence modeling, and association rule mining.

interface SequenceLabeler<T> {
    int[] predict(T[] sequence);
}

class TimeSeries {
    public static double[] autocorrelation(double[] data);
    public static double[] crosscorrelation(double[] x, double[] y);
}
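
The listing does not say how autocorrelation is normalized or which lags it returns. As one common reading, the sketch below computes the sample autocorrelation for lags 0 through maxLag, dividing each lagged covariance sum by the lag-0 sum of squared deviations. AutocorrelationSketch and its maxLag parameter are hypothetical and not taken from Smile.

```java
import java.util.Arrays;

/** Illustration only: sample autocorrelation for lags 0..maxLag. */
final class AutocorrelationSketch {
    static double[] autocorrelation(double[] x, int maxLag) {
        int n = x.length;
        if (maxLag < 0 || maxLag >= n) {
            throw new IllegalArgumentException("need 0 <= maxLag < x.length");
        }
        double mean = Arrays.stream(x).average().orElse(0.0);
        double denom = 0.0;
        for (double v : x) denom += (v - mean) * (v - mean);

        double[] r = new double[maxLag + 1];
        for (int lag = 0; lag <= maxLag; lag++) {
            double num = 0.0;
            for (int t = lag; t < n; t++) {
                num += (x[t] - mean) * (x[t - lag] - mean);
            }
            r[lag] = num / denom; // NaN for a constant series, where denom is 0
        }
        return r;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(autocorrelation(x, 2))); // [1.0, 0.4, -0.1]
    }
}
```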

Types

Core Data Types

// Main data structures
class Bag {
    public final int[] samples;
    public final int[] oob;
}

class SupportVector {
    public final double[] x;
    public final double alpha;
}

// Validation results
class ClassificationValidation {
    public final double accuracy;
    public final double error;
    public final ConfusionMatrix confusion;
}

class RegressionValidation {
    public final double rmse;
    public final double mad;
    public final double r2;
}
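
The Bag type pairs sampled indices with out-of-bag indices. The sketch below shows one way such a pair could be produced by bootstrap sampling: draw n indices with replacement, then collect the indices that were never drawn. It assumes samples and oob carry exactly this meaning; BootstrapSketch is a hypothetical helper that returns plain arrays, not Smile's Bag.

```java
import java.util.Random;

/** Illustration only: bootstrap sample with its out-of-bag indices. */
final class BootstrapSketch {
    /** Returns {samples, oob}: n indices drawn with replacement, and the indices never drawn. */
    static int[][] bootstrap(int n, Random rng) {
        int[] samples = new int[n];
        boolean[] drawn = new boolean[n];
        int distinct = 0;
        for (int i = 0; i < n; i++) {
            int idx = rng.nextInt(n);
            samples[i] = idx;
            if (!drawn[idx]) {
                drawn[idx] = true;
                distinct++;
            }
        }
        int[] oob = new int[n - distinct];
        for (int i = 0, j = 0; i < n; i++) {
            if (!drawn[i]) oob[j++] = i;
        }
        return new int[][] {samples, oob};
    }

    public static void main(String[] args) {
        int[][] bag = bootstrap(10, new Random(42)); // fixed seed for repeatability
        System.out.println("in-bag draws: " + bag[0].length + ", out-of-bag: " + bag[1].length);
    }
}
```

A model trained on the sampled rows can then be scored on the out-of-bag rows, which it never saw.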

Common Enums

enum SplitRule {
    GINI, ENTROPY, CLASSIFICATION_ERROR
}

enum Cost {
    MEAN_SQUARED_ERROR, CROSS_ENTROPY, SPARSE_CROSS_ENTROPY
}

enum OutputFunction {
    LINEAR, SIGMOID, SOFTMAX
}
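
The SplitRule names correspond to standard node impurity measures for decision trees. The sketch below gives the textbook definitions computed from class counts: Gini impurity, entropy in bits, and classification error. It is an illustration of those definitions, not Smile's internal code, which may differ in details such as the logarithm base. ImpuritySketch is a hypothetical name.

```java
/** Illustration only: textbook impurity measures computed from class counts. */
final class ImpuritySketch {
    private static int total(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        return n;
    }

    /** Gini impurity: 1 - sum(p_i^2). */
    static double gini(int[] counts) {
        double n = total(counts), g = 1.0;
        for (int c : counts) g -= (c / n) * (c / n);
        return g;
    }

    /** Entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes. */
    static double entropy(int[] counts) {
        double n = total(counts), h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = c / n;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Classification error: 1 - max(p_i). */
    static double classificationError(int[] counts) {
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        return 1.0 - max / (double) total(counts);
    }

    public static void main(String[] args) {
        int[] evenSplit = {5, 5}; // gini 0.5, entropy 1 bit, error 0.5
        System.out.println(gini(evenSplit) + " " + entropy(evenSplit) + " " + classificationError(evenSplit));
    }
}
```

All three measures are zero for a pure node and largest when the classes are evenly mixed.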