
tessl/maven-com-github-haifengl--smile-core

Statistical Machine Intelligence and Learning Engine providing comprehensive machine learning algorithms for classification, regression, clustering, and feature engineering in Java


Smile Core

Smile Core is the foundational library of the Statistical Machine Intelligence and Learning Engine (SMILE). It provides machine learning algorithms for classification, regression, clustering, feature engineering, and related analytics in Java, together with the data structures and validation utilities needed to use them in Java-based data science workflows.

Package Information

  • Package Name: smile-core
  • Package Type: maven
  • Language: Java
  • Installation:
    <dependency>
      <groupId>com.github.haifengl</groupId>
      <artifactId>smile-core</artifactId>
      <version>3.1.1</version>
    </dependency>

Core Imports

import smile.classification.*;
import smile.regression.*;
import smile.clustering.*;
import smile.feature.*;
import smile.validation.*;

Basic Usage

import smile.classification.RandomForest;
import smile.data.DataFrame;
import smile.data.formula.Formula; 
import smile.validation.CrossValidation;

// Load data (assuming DataFrame df with features and target)
Formula formula = Formula.lhs("target");

// Train a random forest classifier
RandomForest model = RandomForest.fit(formula, df);

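// Predict the class label for a single row; testTuple is assumed to be a
// smile.data.Tuple with the same schema as df (for example, a row of a test DataFrame)
int prediction = model.predict(testTuple);

// 10-fold cross-validation; the trainer is passed last, after the formula and the data
var results = CrossValidation.classification(10, formula, df, RandomForest::fit);
System.out.println("Accuracy: " + results.avg.accuracy);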

Architecture

Smile Core is built around several key design principles:

  • Unified Interfaces: Core interfaces like Classifier<T>, Regression<T>, and PartitionClustering provide consistent APIs across algorithms
  • Type Safety: Extensive use of Java generics for type-safe machine learning pipelines
  • Performance: Optimized implementations with efficient data structures and parallel processing support
  • Modularity: Organized into logical packages for different ML domains (classification, regression, clustering, etc.)
  • Validation: Comprehensive metrics and cross-validation utilities built into the framework
  • Feature Engineering: Complete preprocessing pipeline with transformations, scaling, and imputation

Capabilities

Classification

Comprehensive supervised learning algorithms for predicting categorical outcomes, including ensemble methods, neural networks, and probabilistic models.

interface Classifier<T> extends ToIntFunction<T>, Serializable {
    int predict(T x);
    int predict(T x, double[] posteriori);
    default int numClasses();
    default int[] classes();
    default void update(T x, int y);
}
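
To make the contract above more concrete, the following plain-Java sketch shows a toy classifier with the same shape: one method returns a label, and the overload also fills a caller-supplied posteriori array with class probabilities. SoftClassifier and NearestMeanClassifier are hypothetical stand-ins defined here so the example compiles without Smile on the classpath; they are not Smile types, and the softmax-over-distance rule is only an illustration.

```java
import java.util.function.ToIntFunction;

// Hypothetical stand-in mirroring the shape of the Classifier<T> summary above.
interface SoftClassifier<T> extends ToIntFunction<T> {
    int predict(T x);
    int predict(T x, double[] posteriori);

    @Override
    default int applyAsInt(T x) {
        return predict(x);
    }
}

// Toy 1-D classifier: class scores are exp(-(x - mean)^2), normalized to sum to 1.
class NearestMeanClassifier implements SoftClassifier<Double> {
    private final double[] means;

    NearestMeanClassifier(double[] means) {
        this.means = means.clone();
    }

    @Override
    public int predict(Double x) {
        return predict(x, new double[means.length]);
    }

    @Override
    public int predict(Double x, double[] posteriori) {
        double sum = 0.0;
        for (int c = 0; c < means.length; c++) {
            double d = x - means[c];
            posteriori[c] = Math.exp(-d * d);
            sum += posteriori[c];
        }
        int best = 0;
        for (int c = 0; c < means.length; c++) {
            posteriori[c] /= sum; // normalize so the probabilities sum to 1
            if (posteriori[c] > posteriori[best]) best = c;
        }
        return best;
    }
}
```

With class means 0.0 and 10.0, an input of 9.0 is assigned label 1, and the filled posteriori array sums to 1.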

Regression

Supervised learning algorithms for predicting continuous values, from linear models to advanced ensemble methods and kernel machines.

interface Regression<T> extends ToDoubleFunction<T>, Serializable {
    double predict(T x);
    default void update(T x, double y);
}

Clustering

Unsupervised learning algorithms for discovering patterns and groupings in data, including partitioning, hierarchical, and density-based methods.

abstract class PartitionClustering implements Serializable {
    public final int k;
    public final int[] y;
    public final int[] size;
    public static final int OUTLIER = Integer.MAX_VALUE;
}
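
As a rough illustration of how the fields above relate, the sketch below derives per-cluster counts from a label array in which outliers carry the sentinel value. It is plain Java with no Smile dependency; the ClusterSizes class and the choice to keep an extra final slot for outliers are assumptions of this sketch, not a statement about Smile's internals.

```java
import java.util.Arrays;

class ClusterSizes {
    // Same sentinel value as the OUTLIER constant in the summary above.
    static final int OUTLIER = Integer.MAX_VALUE;

    // Returns k + 1 counts: one per cluster, plus a final slot for outliers.
    static int[] sizes(int[] y, int k) {
        int[] size = new int[k + 1];
        for (int label : y) {
            if (label == OUTLIER) {
                size[k]++;
            } else {
                size[label]++;
            }
        }
        return size;
    }

    public static void main(String[] args) {
        int[] y = {0, 1, 1, OUTLIER, 0, 1};
        System.out.println(Arrays.toString(sizes(y, 2))); // prints [2, 3, 1]
    }
}
```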

Feature Engineering

Complete preprocessing pipeline including dimensionality reduction, feature selection, transformation, and imputation utilities.

interface Transform extends Function<double[], double[]> {
    double[] apply(double[] x);
}

abstract class Projection implements Transform {
    public abstract double[] project(double[] x);
}

Validation and Metrics

Comprehensive model validation framework with cross-validation, bootstrap sampling, and extensive performance metrics.

interface CrossValidation {
    Bag[] split(int n);
    static CrossValidation of(int k);
    static CrossValidation stratify(int k, int[] y);
}

interface ClassificationMetric {
    double score(int[] truth, int[] prediction);
}
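
The two ideas summarized above, splitting sample indices into folds and scoring predictions against the truth, can be sketched in plain Java as follows. KFoldSketch and its Fold holder are hypothetical names used only for this illustration (Fold is not Smile's Bag), and the split is deterministic with no shuffling or stratification, which a real cross-validation routine would normally add.

```java
import java.util.ArrayList;
import java.util.List;

class KFoldSketch {
    // Hypothetical holder for one fold: training indices and held-out indices.
    static final class Fold {
        final int[] train;
        final int[] test;

        Fold(int[] train, int[] test) {
            this.train = train;
            this.test = test;
        }
    }

    // Deterministic k-fold split of indices 0..n-1: index i is held out in fold i % k.
    static Fold[] split(int n, int k) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        Fold[] folds = new Fold[k];
        for (int f = 0; f < k; f++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                (i % k == f ? test : train).add(i);
            }
            folds[f] = new Fold(toArray(train), toArray(test));
        }
        return folds;
    }

    private static int[] toArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    // Fraction of positions where the prediction matches the truth.
    static double accuracy(int[] truth, int[] prediction) {
        if (truth.length != prediction.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == prediction[i]) correct++;
        }
        return (double) correct / truth.length;
    }

    public static void main(String[] args) {
        for (Fold fold : split(10, 3)) {
            System.out.println(fold.train.length + " train / " + fold.test.length + " test");
        }
        System.out.println(accuracy(new int[]{1, 0, 1, 1}, new int[]{1, 1, 1, 0})); // prints 0.5
    }
}
```

Every index appears in exactly one test set, so averaging a metric over the folds uses each sample for evaluation once.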

Deep Learning

Neural network components including multi-layer perceptrons, activation functions, and optimization algorithms.

abstract class MultilayerPerceptron implements Classifier<double[]> {
    public abstract int predict(double[] x);
    public abstract void update(double[] x, int y);
}

Advanced Analytics

Specialized algorithms for manifold learning, time series analysis, sequence modeling, and association rule mining.

interface SequenceLabeler<T> {
    int[] predict(T[] sequence);
}

class TimeSeries {
    public static double[] autocorrelation(double[] data);
    public static double[] crosscorrelation(double[] x, double[] y);
}

Types

Core Data Types

// Main data structures
class Bag {
    public final int[] samples;
    public final int[] oob;
}

class SupportVector {
    public final double[] x;
    public final double alpha;
}

// Validation results
class ClassificationValidation {
    public final double accuracy;
    public final double error;
    public final ConfusionMatrix confusion;
}

class RegressionValidation {
    public final double rmse;
    public final double mad;
    public final double r2;
}
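
For readers unfamiliar with the three regression fields above, here is a minimal plain-Java sketch of how such values are commonly computed from truth and prediction arrays. It assumes rmse is the root mean squared error, mad is the mean absolute deviation of the errors, and r2 is one minus the ratio of residual to total sum of squares; RegressionMetricsSketch is a hypothetical helper, not part of Smile.

```java
class RegressionMetricsSketch {
    // Root mean squared error: sqrt(mean((truth - prediction)^2)).
    static double rmse(double[] truth, double[] prediction) {
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++) {
            double e = truth[i] - prediction[i];
            sum += e * e;
        }
        return Math.sqrt(sum / truth.length);
    }

    // Mean absolute deviation: mean(|truth - prediction|).
    static double mad(double[] truth, double[] prediction) {
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++) {
            sum += Math.abs(truth[i] - prediction[i]);
        }
        return sum / truth.length;
    }

    // Coefficient of determination: 1 - RSS / TSS.
    static double r2(double[] truth, double[] prediction) {
        double mean = 0.0;
        for (double t : truth) mean += t;
        mean /= truth.length;

        double rss = 0.0;
        double tss = 0.0;
        for (int i = 0; i < truth.length; i++) {
            double e = truth[i] - prediction[i];
            double d = truth[i] - mean;
            rss += e * e;
            tss += d * d;
        }
        return 1.0 - rss / tss;
    }
}
```

For truth {1, 2, 3, 4} and predictions {1, 2, 3, 6}, this gives an rmse of 1.0, a mad of 0.5, and an r2 of approximately 0.2.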

Common Enums

enum SplitRule {
    GINI, ENTROPY, CLASSIFICATION_ERROR
}

enum Cost {
    MEAN_SQUARED_ERROR, CROSS_ENTROPY, SPARSE_CROSS_ENTROPY
}

enum OutputFunction {
    LINEAR, SIGMOID, SOFTMAX
}
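
The three SplitRule constants above name standard node-impurity measures used when growing decision trees. The sketch below computes each from per-class counts using the textbook definitions; the Impurity enum is a hypothetical stand-in written for this example and is not Smile's SplitRule.

```java
// Hypothetical enum mirroring the SplitRule constant names, with textbook formulas.
enum Impurity {
    GINI, ENTROPY, CLASSIFICATION_ERROR;

    // counts[c] is the number of samples of class c in a tree node.
    double of(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        if (n == 0) return 0.0;

        double sumSquares = 0.0;
        double entropy = 0.0;
        double maxP = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log(0) is treated as 0
            double p = (double) c / n;
            sumSquares += p * p;
            entropy -= p * Math.log(p) / Math.log(2); // entropy in bits
            maxP = Math.max(maxP, p);
        }

        switch (this) {
            case GINI:
                return 1.0 - sumSquares;
            case ENTROPY:
                return entropy;
            default:
                return 1.0 - maxP; // CLASSIFICATION_ERROR
        }
    }
}
```

For a perfectly mixed two-class node such as counts {5, 5}, the Gini and classification-error measures are 0.5 and the entropy is 1 bit; for a pure node such as {10, 0}, all three are 0.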

Install with Tessl CLI

npx tessl i tessl/maven-com-github-haifengl--smile-core
Workspace: tessl
Visibility: Public
Describes: mavenpkg:maven/com.github.haifengl/smile-core@3.1.x