# Smile Core

Smile Core is the foundational module of SMILE (Statistical Machine Intelligence and Learning Engine). It provides Java implementations of machine learning algorithms for classification, regression, clustering, feature engineering, and advanced analytics, along with performance-oriented data structures, validation utilities, and integration with SMILE's `DataFrame` and `Formula` APIs.

## Package Information

- **Package Name**: smile-core
- **Package Type**: maven
- **Language**: Java
- **Installation**:

```xml
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-core</artifactId>
  <version>3.1.1</version>
</dependency>
```

## Core Imports

```java
import smile.classification.*;
import smile.regression.*;
import smile.clustering.*;
import smile.feature.*;
import smile.validation.*;
```

## Basic Usage

```java
import smile.classification.RandomForest;
import smile.data.DataFrame;
import smile.data.formula.Formula;
import smile.validation.CrossValidation;

// Load data (assuming a DataFrame `df` with feature columns and a "target" column)
// Use "target" as the response and all other columns as predictors
Formula formula = Formula.lhs("target");

// Train a random forest classifier
RandomForest model = RandomForest.fit(formula, df);

// Predict one row; `testTuple` is assumed to be a smile.data.Tuple
// with the same schema as the training data
int prediction = model.predict(testTuple);

// 10-fold cross-validation: the trainer is passed after the formula and data
var results = CrossValidation.classification(10, formula, df, RandomForest::fit);
System.out.println("Accuracy: " + results.avg.accuracy);
```

## Architecture

Smile Core is built around several key design principles:

- **Unified Interfaces**: Core abstractions such as the `Classifier<T>` and `Regression<T>` interfaces and the `PartitionClustering` base class provide consistent APIs across algorithms
- **Type Safety**: Extensive use of Java generics for type-safe machine learning pipelines
- **Performance**: Optimized implementations with efficient data structures and parallel processing support
- **Modularity**: Organized into logical packages for different ML domains (classification, regression, clustering, etc.)
- **Validation**: Comprehensive metrics and cross-validation utilities built into the framework
- **Feature Engineering**: Preprocessing utilities for transformation, scaling, and imputation

## Capabilities

### Classification

Comprehensive supervised learning algorithms for predicting categorical outcomes, including ensemble methods, neural networks, and probabilistic models.

```java { .api }
interface Classifier<T> extends ToIntFunction<T>, Serializable {
    int predict(T x);
    int predict(T x, double[] posteriori);
    default int numClasses();
    default int[] classes();
    default void update(T x, int y);
}
```

[Classification](./classification.md)
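
As a rough illustration of this contract, the stdlib-only sketch below defines a hypothetical `SimpleClassifier<T>` interface that mirrors the listed `predict`, `numClasses`, and `update` signatures, plus a hypothetical `NearestCentroid` model that supports online updates. Neither type is part of Smile, and the softmax-over-distances posterior is an illustrative choice, not how Smile models fill `posteriori`.

```java
import java.util.function.ToIntFunction;

/** Hypothetical stand-in that mirrors the Classifier<T> signatures listed above. */
interface SimpleClassifier<T> extends ToIntFunction<T> {
    int predict(T x);

    /** Fills posteriori with class probabilities and returns the predicted class. */
    int predict(T x, double[] posteriori);

    int numClasses();

    /** Online learning is optional; models that cannot update reject the call. */
    default void update(T x, int y) {
        throw new UnsupportedOperationException("online update not supported");
    }

    @Override
    default int applyAsInt(T x) {
        return predict(x);
    }
}

/** Nearest-centroid classifier for labels 0..k-1; every class needs at least one sample. */
final class NearestCentroid implements SimpleClassifier<double[]> {
    private final double[][] sum; // per-class running sum of feature vectors
    private final int[] count;    // per-class sample count

    NearestCentroid(double[][] x, int[] y, int k) {
        sum = new double[k][x[0].length];
        count = new int[k];
        for (int i = 0; i < x.length; i++) update(x[i], y[i]);
    }

    @Override
    public int numClasses() {
        return count.length;
    }

    /** Adds one labeled sample, shifting that class's centroid. */
    @Override
    public void update(double[] x, int y) {
        for (int j = 0; j < x.length; j++) sum[y][j] += x[j];
        count[y]++;
    }

    @Override
    public int predict(double[] x) {
        return predict(x, new double[count.length]);
    }

    @Override
    public int predict(double[] x, double[] posteriori) {
        int k = count.length;
        double[] dist = new double[k];
        int best = 0;
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - sum[c][j] / count[c];
                dist[c] += diff * diff; // squared Euclidean distance to centroid c
            }
            if (dist[c] < dist[best]) best = c;
        }
        // Softmax over negative distances, shifted by the minimum to avoid underflow.
        double total = 0.0;
        for (int c = 0; c < k; c++) {
            posteriori[c] = Math.exp(dist[best] - dist[c]);
            total += posteriori[c];
        }
        for (int c = 0; c < k; c++) posteriori[c] /= total;
        return best;
    }
}
```

Because `update` only adjusts running sums, this model can absorb new labeled samples one at a time, which is the kind of behavior the optional `update(T x, int y)` method is meant to expose.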

### Regression

Supervised learning algorithms for predicting continuous values, from linear models to advanced ensemble methods and kernel machines.

```java { .api }
interface Regression<T> extends ToDoubleFunction<T>, Serializable {
    double predict(T x);
    default void update(T x, double y);
}
```

[Regression](./regression.md)
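
The `update(T x, double y)` method allows incremental training. A minimal sketch, assuming a single-feature least-squares model: the hypothetical `OnlineSimpleRegression` class below keeps running sums so that each update costs O(1) and the fitted line can be queried at any time. It implements `DoubleUnaryOperator` as a primitive stand-in for `ToDoubleFunction`, and it is not one of Smile's regression classes.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical one-feature least-squares model y = a + b*x that learns online. */
final class OnlineSimpleRegression implements DoubleUnaryOperator {
    private long n;
    private double sx, sy, sxx, sxy; // running sums of x, y, x^2, x*y

    /** Incorporates one training pair, in the spirit of Regression.update(x, y). */
    void update(double x, double y) {
        n++;
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }

    /** Slope b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2); needs at least two distinct x values. */
    double slope() {
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    /** Intercept a = (Sy - b*Sx) / n. */
    double intercept() {
        return (sy - slope() * sx) / n;
    }

    double predict(double x) {
        return intercept() + slope() * x;
    }

    @Override
    public double applyAsDouble(double x) {
        return predict(x);
    }
}
```

Raw running sums can lose precision when inputs are large or the x values barely vary; a more robust version would update centered statistics instead (Welford-style).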

### Clustering

Unsupervised learning algorithms for discovering patterns and groupings in data, including partitioning, hierarchical, and density-based methods.

```java { .api }
abstract class PartitionClustering implements Serializable {
    public final int k;
    public final int[] y;
    public final int[] size;
    public static final int OUTLIER = Integer.MAX_VALUE;
}
```

[Clustering](./clustering.md)
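
The `k`, `y`, and `size` fields describe the output of any partitioning algorithm: the number of clusters, each sample's cluster label, and each cluster's member count. The sketch below, a hypothetical `SimpleKMeans` class using only the JDK, fills fields with the same names by running Lloyd's k-means iterations. It is not Smile's `KMeans`; seeding with the first `k` points is an assumption made for determinism, whereas practical implementations commonly use randomized seeding such as k-means++.

```java
import java.util.Arrays;

/** Hypothetical Lloyd's k-means exposing k, y (labels) and size like PartitionClustering. */
final class SimpleKMeans {
    public final int k;
    public final int[] y;
    public final int[] size;
    public final double[][] centroids;

    /** Requires data.length >= k and maxIter >= 1. */
    SimpleKMeans(double[][] data, int k, int maxIter) {
        this.k = k;
        int n = data.length, d = data[0].length;
        y = new int[n];
        size = new int[k];
        centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = data[c].clone(); // naive seeding: first k points
        Arrays.fill(y, -1);

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: label each point with its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = nearest(data[i]);
                if (best != y[i]) {
                    y[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: labels (and sizes) are stable

            // Update step: move each centroid to the mean of its members.
            double[][] sum = new double[k][d];
            Arrays.fill(size, 0);
            for (int i = 0; i < n; i++) {
                size[y[i]]++;
                for (int j = 0; j < d; j++) sum[y[i]][j] += data[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (size[c] == 0) continue; // keep an empty cluster's previous centroid
                for (int j = 0; j < d; j++) centroids[c][j] = sum[c][j] / size[c];
            }
        }
    }

    /** Index of the centroid with the smallest squared Euclidean distance to x. */
    private int nearest(double[] x) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < k; c++) {
            double dist = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - centroids[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```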

### Feature Engineering

Preprocessing utilities for dimensionality reduction, feature selection, transformation, and imputation.

```java { .api }
interface Transform extends Function<double[], double[]> {
    double[] apply(double[] x);
}

abstract class Projection implements Transform {
    public abstract double[] project(double[] x);
}
```

[Feature Engineering](./feature-engineering.md)
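
A `Transform` is just a function from `double[]` to `double[]`, so a fitted preprocessing step can be written against `java.util.function.Function` directly. A minimal sketch, assuming a hypothetical `Standardizer` that learns each column's mean and sample standard deviation and then maps `x` to `(x - mean) / sd`; it is not Smile's scaler class.

```java
import java.util.function.Function;

/** Hypothetical z-score scaler: x' = (x - mean) / sd, column by column. */
final class Standardizer implements Function<double[], double[]> {
    private final double[] mean;
    private final double[] sd;

    /** Fits column statistics; requires at least two rows. */
    Standardizer(double[][] data) {
        int n = data.length, d = data[0].length;
        mean = new double[d];
        sd = new double[d];
        for (double[] row : data)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        for (double[] row : data)
            for (int j = 0; j < d; j++) sd[j] += (row[j] - mean[j]) * (row[j] - mean[j]) / (n - 1);
        for (int j = 0; j < d; j++) {
            sd[j] = Math.sqrt(sd[j]);
            if (sd[j] == 0.0) sd[j] = 1.0; // constant column: center it but do not scale
        }
    }

    @Override
    public double[] apply(double[] x) {
        double[] z = new double[x.length];
        for (int j = 0; j < x.length; j++) z[j] = (x[j] - mean[j]) / sd[j];
        return z;
    }
}
```

A `Projection` has the same shape but multiplies the input by a learned matrix (for example, principal components) and usually returns fewer dimensions than it receives.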

### Validation and Metrics

Comprehensive model validation framework with cross-validation, bootstrap sampling, and extensive performance metrics.

```java { .api }
interface CrossValidation {
    Bag[] split(int n);
    static CrossValidation of(int k);
    static CrossValidation stratify(int k, int[] y);
}

interface ClassificationMetric {
    double score(int[] truth, int[] prediction);
}
```

[Validation and Metrics](./validation-metrics.md)
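
The mechanics behind k-fold splits and a `ClassificationMetric` can be sketched without Smile: shuffle the sample indices with a fixed seed, cut them into `k` contiguous folds, hold each fold out once, and score the held-out predictions. `KFold`, `Fold`, and `accuracy` below are hypothetical names; the API listed above returns `Bag` objects rather than this class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical k-fold splitter and accuracy metric (not Smile's CrossValidation). */
final class KFold {
    /** One round: indices to train on and indices held out for testing. */
    static final class Fold {
        final int[] train;
        final int[] test;

        Fold(int[] train, int[] test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Splits 0..n-1 into k folds of near-equal size after a seeded shuffle. */
    static Fold[] split(int n, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // reproducible permutation

        Fold[] folds = new Fold[k];
        for (int f = 0; f < k; f++) {
            // Fold f holds out positions [start, end) of the shuffled order.
            int start = f * n / k;
            int end = (f + 1) * n / k;
            int[] test = new int[end - start];
            int[] train = new int[n - test.length];
            for (int p = 0, a = 0, b = 0; p < n; p++) {
                if (p >= start && p < end) test[a++] = idx.get(p);
                else train[b++] = idx.get(p);
            }
            folds[f] = new Fold(train, test);
        }
        return folds;
    }

    /** Accuracy: the fraction of positions where the prediction equals the truth. */
    static double accuracy(int[] truth, int[] prediction) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == prediction[i]) correct++;
        }
        return (double) correct / truth.length;
    }
}
```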

### Deep Learning

Neural network components including multi-layer perceptrons, activation functions, and optimization algorithms.

```java { .api }
abstract class MultilayerPerceptron implements Classifier<double[]> {
    public abstract int predict(double[] x);
    public abstract void update(double[] x, int y);
}
```

[Deep Learning](./deep-learning.md)
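
Implementing `update(double[] x, int y)` means the network learns one sample at a time. The smallest instance of that idea is a single sigmoid unit trained by stochastic gradient descent on cross-entropy loss, which is equivalent to online logistic regression. The hypothetical `SigmoidUnit` below shows that update rule with the JDK only; a multilayer perceptron stacks hidden layers and backpropagates the same kind of error signal through them.

```java
/** Hypothetical single sigmoid neuron: p(y=1|x) = sigmoid(w.x + b), trained online. */
final class SigmoidUnit {
    private final double[] w;
    private double b;
    private final double learningRate;

    SigmoidUnit(int p, double learningRate) {
        this.w = new double[p];
        this.learningRate = learningRate;
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    double probability(double[] x) {
        double z = b;
        for (int j = 0; j < w.length; j++) z += w[j] * x[j];
        return sigmoid(z);
    }

    int predict(double[] x) {
        return probability(x) >= 0.5 ? 1 : 0;
    }

    /** One SGD step; with a sigmoid output and cross-entropy loss, dLoss/dz = p - y. */
    void update(double[] x, int y) {
        double error = probability(x) - y;
        for (int j = 0; j < w.length; j++) w[j] -= learningRate * error * x[j];
        b -= learningRate * error;
    }
}
```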

### Advanced Analytics

Specialized algorithms for manifold learning, time series analysis, sequence modeling, and association rule mining.

```java { .api }
interface SequenceLabeler<T> {
    int[] predict(T[] sequence);
}

class TimeSeries {
    public static double[] autocorrelation(double[] data);
    public static double[] crosscorrelation(double[] x, double[] y);
}
```

[Advanced Analytics](./advanced-analytics.md)
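
Autocorrelation measures how strongly a series correlates with lagged copies of itself. A minimal sketch of the common sample estimator `r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2`, in a hypothetical `Acf` class; Smile's `TimeSeries` methods may differ in name, signature, and normalization.

```java
/** Hypothetical sample autocorrelation function for lags 0..maxLag. */
final class Acf {
    /** Requires a non-constant series and maxLag < x.length. */
    static double[] acf(double[] x, int maxLag) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        // Sum of squared deviations: the lag-0 term shared by every ratio.
        double denom = 0.0;
        for (double v : x) denom += (v - mean) * (v - mean);

        double[] r = new double[maxLag + 1];
        for (int k = 0; k <= maxLag; k++) {
            double num = 0.0;
            for (int t = 0; t + k < n; t++) num += (x[t] - mean) * (x[t + k] - mean);
            r[k] = num / denom; // r[0] is always 1
        }
        return r;
    }
}
```

Dividing every lag by the same full-series sum of squares, rather than by the number of overlapping pairs, keeps each `|r_k| <= 1`.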

## Types

### Core Data Types

```java { .api }
// Main data structures
class Bag {
    public final int[] samples;
    public final int[] oob;
}

class SupportVector {
    public final double[] x;
    public final double alpha;
}

// Validation results
class ClassificationValidation {
    public final double accuracy;
    public final double error;
    public final ConfusionMatrix confusion;
}

class RegressionValidation {
    public final double rmse;
    public final double mad;
    public final double r2;
}
```
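
A `Bag` pairs the indices drawn for training with the out-of-bag (OOB) indices that were never drawn. The sketch below shows how such a pair is typically produced by bootstrap sampling, that is, `n` draws with replacement; `Bootstrap` and its nested `Bag` are hypothetical stand-ins for illustration, not Smile's sampler. Since each index is missed with probability `(1 - 1/n)^n`, roughly `1/e` (about 36.8%) of the samples end up out of bag for large `n`.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical bootstrap sampler producing (samples, oob) like the Bag type above. */
final class Bootstrap {
    static final class Bag {
        final int[] samples; // n indices drawn with replacement (duplicates allowed)
        final int[] oob;     // indices never drawn, in ascending order

        Bag(int[] samples, int[] oob) {
            this.samples = samples;
            this.oob = oob;
        }
    }

    static Bag bag(int n, Random rng) {
        int[] samples = new int[n];
        boolean[] drawn = new boolean[n];
        for (int i = 0; i < n; i++) {
            samples[i] = rng.nextInt(n);
            drawn[samples[i]] = true;
        }
        int[] oob = new int[n];
        int m = 0;
        for (int i = 0; i < n; i++) {
            if (!drawn[i]) oob[m++] = i;
        }
        return new Bag(samples, Arrays.copyOf(oob, m));
    }
}
```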

### Common Enums

```java { .api }
enum SplitRule {
    GINI, ENTROPY, CLASSIFICATION_ERROR
}

enum Cost {
    MEAN_SQUARED_ERROR, CROSS_ENTROPY, SPARSE_CROSS_ENTROPY
}

enum OutputFunction {
    LINEAR, SIGMOID, SOFTMAX
}
```
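
The three `SplitRule` values name the standard node-impurity measures used to choose decision-tree splits. For class proportions `p_c` at a node they are Gini `= 1 - sum p_c^2`, entropy `= -sum p_c log p_c`, and classification error `= 1 - max p_c`. The hypothetical `Impurity` helper below computes each from class counts; the base-2 logarithm is an assumption, since the base only rescales entropy and does not change which split wins.

```java
/** Hypothetical impurity measures matching the three SplitRule names, from class counts. */
final class Impurity {
    static double gini(int[] counts) {
        int n = total(counts);
        double sumSq = 0.0;
        for (int c : counts) {
            double p = (double) c / n;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    static double entropy(int[] counts) {
        int n = total(counts);
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue; // 0 * log 0 is taken as 0
            double p = (double) c / n;
            h -= p * Math.log(p) / Math.log(2); // base-2 logarithm
        }
        return h;
    }

    static double classificationError(int[] counts) {
        int n = total(counts);
        int max = 0;
        for (int c : counts) max = Math.max(max, c);
        return 1.0 - (double) max / n;
    }

    private static int total(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        return n;
    }
}
```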