Statistical Machine Intelligence and Learning Engine providing comprehensive machine learning algorithms for classification, regression, clustering, and feature engineering in Java
Smile Core is the foundational library of the Statistical Machine Intelligence and Learning Engine (SMILE), providing a comprehensive suite of machine learning algorithms for classification, regression, clustering, feature engineering, and advanced analytics in Java. It offers high-performance implementations with optimized data structures, extensive validation utilities, and seamless integration with Java-based data science workflows.
<dependency>
<groupId>com.github.haifengl</groupId>
<artifactId>smile-core</artifactId>
<version>3.1.1</version>
</dependency>

import smile.classification.*;
import smile.regression.*;
import smile.clustering.*;
import smile.feature.*;
import smile.validation.*;

import smile.classification.RandomForest;
import smile.data.DataFrame;
import smile.data.formula.Formula;
import smile.validation.CrossValidation;
// Load data (assuming DataFrame df with features and target)
Formula formula = Formula.lhs("target");
// Train a random forest classifier
RandomForest model = RandomForest.fit(formula, df);
// Make predictions on test DataFrame tuples
int prediction = model.predict(testTuple);
// Cross-validation
// The trainer is passed last, after the formula and the data
var results = CrossValidation.classification(10, formula, df, RandomForest::fit);
System.out.println("Accuracy: " + results.avg.accuracy);

Smile Core is built around several key design principles:
Classifier<T>, Regression<T>, and PartitionClustering provide consistent APIs across algorithms.

Comprehensive supervised learning algorithms for predicting categorical outcomes, including ensemble methods, neural networks, and probabilistic models.
interface Classifier<T> extends ToIntFunction<T>, Serializable {
int predict(T x);
int predict(T x, double[] posteriori);
default int numClasses();
default int[] classes();
default void update(T x, int y);
}

Supervised learning algorithms for predicting continuous values, from linear models to advanced ensemble methods and kernel machines.
interface Regression<T> extends ToDoubleFunction<T>, Serializable {
double predict(T x);
default void update(T x, double y);
}

Unsupervised learning algorithms for discovering patterns and groupings in data, including partitioning, hierarchical, and density-based methods.
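To make the partitioning idea concrete, the following is a minimal sketch in plain Java, not Smile's implementation. It assigns each point to its nearest centroid and records the result in three fields named after the `k`, `y`, and `size` members of `PartitionClustering`. The class name `PartitionSketch` and the method `assign` are hypothetical, and outlier labels are not handled.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this class is not part of Smile. */
final class PartitionSketch {
    final int k;      // number of clusters
    final int[] y;    // cluster label of each point
    final int[] size; // number of points in each cluster

    PartitionSketch(int k, int[] y) {
        this.k = k;
        this.y = y;
        this.size = new int[k];
        for (int label : y) {
            size[label]++;
        }
    }

    /** Assigns every point to the centroid with the smallest squared Euclidean distance. */
    static PartitionSketch assign(double[][] data, double[][] centroids) {
        int[] y = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0.0;
                for (int j = 0; j < data[i].length; j++) {
                    double diff = data[i][j] - centroids[c][j];
                    d += diff * diff;
                }
                if (d < best) {
                    best = d;
                    y[i] = c;
                }
            }
        }
        return new PartitionSketch(centroids.length, y);
    }

    public static void main(String[] args) {
        double[][] data = {{0.0, 0.1}, {0.2, 0.0}, {5.0, 5.1}};
        double[][] centroids = {{0.0, 0.0}, {5.0, 5.0}};
        PartitionSketch p = assign(data, centroids);
        System.out.println(Arrays.toString(p.y) + " " + Arrays.toString(p.size));
    }
}
```

A full algorithm such as k-means would alternate this assignment step with recomputing the centroids until the labels stop changing.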
abstract class PartitionClustering implements Serializable {
public final int k;
public final int[] y;
public final int[] size;
public static final int OUTLIER = Integer.MAX_VALUE;
}

Complete preprocessing pipeline including dimensionality reduction, feature selection, transformation, and imputation utilities.
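As an illustration of what a fitted transformation step looks like, here is a small z-score standardizer written against `java.util.function.Function<double[], double[]>`. It is a sketch under stated assumptions rather than Smile's own scaler: the class name `StandardizeSketch` and its `fit` method are hypothetical, and it uses the population standard deviation.

```java
import java.util.Arrays;
import java.util.function.Function;

/** Hypothetical illustration only; this class is not part of Smile. */
final class StandardizeSketch implements Function<double[], double[]> {
    private final double[] mean;
    private final double[] std;

    private StandardizeSketch(double[] mean, double[] std) {
        this.mean = mean;
        this.std = std;
    }

    /** Learns per-column mean and population standard deviation from the training rows. */
    static StandardizeSketch fit(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[] mean = new double[p];
        double[] std = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                mean[j] += row[j] / n;
            }
        }
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                double d = row[j] - mean[j];
                std[j] += d * d / n;
            }
        }
        for (int j = 0; j < p; j++) {
            std[j] = Math.sqrt(std[j]);
            if (std[j] == 0.0) {
                std[j] = 1.0; // constant column: avoid division by zero
            }
        }
        return new StandardizeSketch(mean, std);
    }

    /** Returns a new vector with each feature centered and scaled. */
    @Override
    public double[] apply(double[] x) {
        double[] z = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            z[j] = (x[j] - mean[j]) / std[j];
        }
        return z;
    }

    public static void main(String[] args) {
        StandardizeSketch scaler = fit(new double[][]{{1.0, 10.0}, {3.0, 10.0}});
        System.out.println(Arrays.toString(scaler.apply(new double[]{3.0, 10.0})));
    }
}
```

Fitting on training data only and then applying the same object to test data keeps information from the test set out of the preprocessing step.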
interface Transform extends Function<double[], double[]> {
double[] apply(double[] x);
}
abstract class Projection implements Transform {
public abstract double[] project(double[] x);
}

Comprehensive model validation framework with cross-validation, bootstrap sampling, and extensive performance metrics.
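The idea behind k-fold cross-validation can be sketched without any library: partition the indices `0..n-1` into `k` contiguous folds, and for each round hold one fold out while training on the rest. The code below is an illustrative sketch, not Smile's implementation. `KFoldSketch` and `Fold` are hypothetical names, the `samples` and `oob` fields are named after the `Bag` type listed in this document, and no shuffling or stratification is done.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this class is not part of Smile. */
final class KFoldSketch {
    /** Index sets for one round: training indices and held-out indices. */
    static final class Fold {
        final int[] samples; // indices used for training
        final int[] oob;     // indices held out for evaluation

        Fold(int[] samples, int[] oob) {
            this.samples = samples;
            this.oob = oob;
        }
    }

    /** Splits indices 0..n-1 into k contiguous folds of near-equal size. */
    static Fold[] split(int n, int k) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("k must be in [2, n]");
        }
        Fold[] folds = new Fold[k];
        for (int f = 0; f < k; f++) {
            int start = (int) ((long) f * n / k);
            int end = (int) ((long) (f + 1) * n / k);
            int[] oob = new int[end - start];
            int[] samples = new int[n - oob.length];
            int s = 0;
            for (int i = 0; i < n; i++) {
                if (i >= start && i < end) {
                    oob[i - start] = i;
                } else {
                    samples[s++] = i;
                }
            }
            folds[f] = new Fold(samples, oob);
        }
        return folds;
    }

    public static void main(String[] args) {
        for (Fold fold : split(10, 5)) {
            System.out.println("train=" + Arrays.toString(fold.samples)
                    + " test=" + Arrays.toString(fold.oob));
        }
    }
}
```

In practice the indices are usually shuffled first, or split per class for stratified validation, so that each fold resembles the full data set.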
interface CrossValidation {
Bag[] split(int n);
static CrossValidation of(int k);
static CrossValidation stratify(int k, int[] y);
}
interface ClassificationMetric {
double score(int[] truth, int[] prediction);
}

Neural network components including multi-layer perceptrons, activation functions, and optimization algorithms.
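For readers unfamiliar with output activations, the sketch below shows the sigmoid and softmax functions in plain Java. It is illustrative only and does not reproduce Smile's classes. `OutputFunctionSketch` is a hypothetical name, and the softmax subtracts the maximum input before exponentiating, a common way to avoid overflow.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this class is not part of Smile. */
final class OutputFunctionSketch {
    /** Logistic sigmoid: maps any real input to the interval (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Softmax: maps a score vector to probabilities that sum to 1. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double[] p = new double[z.length];
        double sum = 0.0;
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max); // shifting by max keeps exp() from overflowing
            sum += p[i];
        }
        for (int i = 0; i < z.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0));
        System.out.println(Arrays.toString(softmax(new double[]{1.0, 2.0, 3.0})));
    }
}
```

Sigmoid is the usual choice for a single binary output, while softmax is used when the output layer has one unit per class.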
abstract class MultilayerPerceptron implements Classifier<double[]> {
public abstract int predict(double[] x);
public abstract void update(double[] x, int y);
}

Specialized algorithms for manifold learning, time series analysis, sequence modeling, and association rule mining.
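As one example of the time series analysis mentioned here, the sample autocorrelation of a series can be computed directly from its definition. The sketch below is plain Java and makes no claim about Smile's signatures: `AutocorrelationSketch` and the `maxLag` parameter are assumptions introduced for illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this class is not part of Smile. */
final class AutocorrelationSketch {
    /** Returns the sample autocorrelation of x at lags 0..maxLag. */
    static double[] autocorrelation(double[] x, int maxLag) {
        int n = x.length;
        if (maxLag < 0 || maxLag >= n) {
            throw new IllegalArgumentException("maxLag must be in [0, n - 1]");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;

        // Sum of squared deviations, used to normalize every lag
        double var = 0.0;
        for (double v : x) {
            var += (v - mean) * (v - mean);
        }
        if (var == 0.0) {
            throw new IllegalArgumentException("autocorrelation is undefined for a constant series");
        }

        double[] acf = new double[maxLag + 1];
        for (int lag = 0; lag <= maxLag; lag++) {
            double sum = 0.0;
            for (int i = lag; i < n; i++) {
                sum += (x[i] - mean) * (x[i - lag] - mean);
            }
            acf[lag] = sum / var;
        }
        return acf;
    }

    public static void main(String[] args) {
        double[] series = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(Arrays.toString(autocorrelation(series, 2)));
    }
}
```

The value at lag 0 is always 1, and values near zero at higher lags suggest little linear dependence between observations that far apart.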
interface SequenceLabeler<T> {
int[] predict(T[] sequence);
}
class TimeSeries {
public static double[] autocorrelation(double[] data);
public static double[] crosscorrelation(double[] x, double[] y);
}

// Main data structures
class Bag {
public final int[] samples;
public final int[] oob;
}
class SupportVector {
public final double[] x;
public final double alpha;
}
// Validation results
class ClassificationValidation {
public final double accuracy;
public final double error;
public final ConfusionMatrix confusion;
}
class RegressionValidation {
public final double rmse;
public final double mad;
public final double r2;
}

enum SplitRule {
GINI, ENTROPY, CLASSIFICATION_ERROR
}
enum Cost {
MEAN_SQUARED_ERROR, CROSS_ENTROPY, SPARSE_CROSS_ENTROPY
}
enum OutputFunction {
LINEAR, SIGMOID, SOFTMAX
}
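The three `SplitRule` constants correspond to standard node impurity measures used when growing decision trees. The sketch below computes each one from per-class counts in a node. It is a plain-Java illustration under stated assumptions, not Smile's implementation, and `ImpuritySketch` and its method names are hypothetical.

```java
/** Hypothetical illustration only; this class is not part of Smile. */
final class ImpuritySketch {
    private static int total(int[] counts) {
        int n = 0;
        for (int c : counts) {
            n += c;
        }
        return n;
    }

    /** Gini impurity: 1 minus the sum of squared class proportions. */
    static double gini(int[] counts) {
        int n = total(counts);
        if (n == 0) return 0.0;
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / n;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** Shannon entropy in bits; empty classes contribute nothing. */
    static double entropy(int[] counts) {
        int n = total(counts);
        if (n == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / n;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    /** Classification error: 1 minus the proportion of the majority class. */
    static double classificationError(int[] counts) {
        int n = total(counts);
        if (n == 0) return 0.0;
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return 1.0 - (double) max / n;
    }

    public static void main(String[] args) {
        int[] counts = {5, 5};
        System.out.println(gini(counts) + " " + entropy(counts) + " " + classificationError(counts));
    }
}
```

All three measures are zero for a pure node and largest when the classes are evenly mixed; a tree builder picks the split that reduces the chosen measure the most.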