tessl/maven-com-github-haifengl--smile-core

Statistical Machine Intelligence and Learning Engine providing comprehensive machine learning algorithms for classification, regression, clustering, and feature engineering in Java

- **Workspace**: tessl
- **Visibility**: Public
- **Describes**: mavenpkg:maven/com.github.haifengl/smile-core@3.1.x

To install, run

npx @tessl/cli install tessl/maven-com-github-haifengl--smile-core@3.1.0

# Smile Core

Smile Core is the foundational library of the Statistical Machine Intelligence and Learning Engine (SMILE), providing a comprehensive suite of machine learning algorithms for classification, regression, clustering, feature engineering, and advanced analytics in Java. It offers high-performance implementations with optimized data structures, extensive validation utilities, and seamless integration with Java-based data science workflows.

## Package Information

- **Package Name**: smile-core
- **Package Type**: maven
- **Language**: Java
- **Installation**:

```xml
<dependency>
    <groupId>com.github.haifengl</groupId>
    <artifactId>smile-core</artifactId>
    <version>3.1.1</version>
</dependency>
```

## Core Imports

```java
import smile.classification.*;
import smile.regression.*;
import smile.clustering.*;
import smile.feature.*;
import smile.validation.*;
```

## Basic Usage

```java
import smile.classification.RandomForest;
import smile.data.DataFrame;
import smile.data.Tuple;
import smile.data.formula.Formula;
import smile.validation.CrossValidation;

// Assumes a DataFrame `df` (feature columns plus a "target" column) and a
// Tuple `testTuple` with the same schema have already been loaded.
Formula formula = Formula.lhs("target");

// Train a random forest classifier
RandomForest model = RandomForest.fit(formula, df);

// Predict the class label of a single row (Tuple)
int prediction = model.predict(testTuple);

// 10-fold cross-validation; the trainer function is the last argument
var results = CrossValidation.classification(10, formula, df, RandomForest::fit);
System.out.println("Accuracy: " + results.avg.accuracy);
```

## Architecture

Smile Core is built around several key design principles:

- **Unified Interfaces**: Core interfaces like `Classifier<T>`, `Regression<T>`, and `PartitionClustering` provide consistent APIs across algorithms
- **Type Safety**: Extensive use of Java generics for type-safe machine learning pipelines
- **Performance**: Optimized implementations with efficient data structures and parallel processing support
- **Modularity**: Organized into logical packages for different ML domains (classification, regression, clustering, etc.)
- **Validation**: Comprehensive metrics and cross-validation utilities built into the framework
- **Feature Engineering**: Complete preprocessing pipeline with transformations, scaling, and imputation

## Capabilities

### Classification

Comprehensive supervised learning algorithms for predicting categorical outcomes, including ensemble methods, neural networks, and probabilistic models.

```java { .api }
interface Classifier<T> extends ToIntFunction<T>, Serializable {
    int predict(T x);
    int predict(T x, double[] posteriori);
    default int numClasses();
    default int[] classes();
    default void update(T x, int y);
}
```

[Classification](./classification.md)
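
To make the two `predict` overloads concrete, here is a minimal, self-contained sketch that does not depend on Smile. `SketchClassifier` is a local stand-in that only mirrors the shape of the listing above, and `NearestCentroid` is a hypothetical toy model invented for illustration. The point is the calling convention: the caller allocates the `posteriori` array and the classifier fills it with one probability per class.

```java
import java.util.function.ToIntFunction;

final class ClassifierSketch {
    /** Local stand-in that mirrors the listing above; this is NOT the Smile interface. */
    interface SketchClassifier<T> extends ToIntFunction<T> {
        int predict(T x);

        int predict(T x, double[] posteriori);

        @Override
        default int applyAsInt(T x) {
            return predict(x);
        }
    }

    /** Hypothetical toy model: one centroid per class, labels are centroid indices. */
    static final class NearestCentroid implements SketchClassifier<double[]> {
        private final double[][] centroids;

        NearestCentroid(double[][] centroids) {
            this.centroids = centroids;
        }

        @Override
        public int predict(double[] x) {
            return predict(x, new double[centroids.length]);
        }

        @Override
        public int predict(double[] x, double[] posteriori) {
            double sum = 0.0;
            for (int c = 0; c < centroids.length; c++) {
                double d2 = 0.0;
                for (int j = 0; j < x.length; j++) {
                    double diff = x[j] - centroids[c][j];
                    d2 += diff * diff;
                }
                // Unnormalized score; may underflow to 0 for very distant points.
                posteriori[c] = Math.exp(-d2);
                sum += posteriori[c];
            }
            int best = 0;
            for (int c = 0; c < centroids.length; c++) {
                posteriori[c] /= sum; // normalize so the scores sum to 1
                if (posteriori[c] > posteriori[best]) {
                    best = c;
                }
            }
            return best;
        }
    }

    public static void main(String[] args) {
        NearestCentroid model = new NearestCentroid(new double[][] {{0, 0}, {4, 4}});
        double[] posteriori = new double[2];
        int label = model.predict(new double[] {3, 3}, posteriori);
        System.out.println("label = " + label + ", P(class 1) = " + posteriori[1]);
    }
}
```

Because the stand-in extends `ToIntFunction<T>`, a model can also be passed wherever a plain function from input to label is expected.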

### Regression

Supervised learning algorithms for predicting continuous values, from linear models to advanced ensemble methods and kernel machines.

```java { .api }
interface Regression<T> extends ToDoubleFunction<T>, Serializable {
    double predict(T x);
    default void update(T x, double y);
}
```

[Regression](./regression.md)
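
The `update` method in the listing suggests support for online learning. As a rough illustration only, the sketch below defines a local stand-in interface (`SketchRegression`, not the Smile type) and a hypothetical one-feature model, `OnlineLine`, that keeps running sums so each `update` call refines an ordinary least-squares line without storing past samples.

```java
import java.util.function.ToDoubleFunction;

final class RegressionSketch {
    /** Local stand-in that mirrors the listing above; this is NOT the Smile interface. */
    interface SketchRegression<T> extends ToDoubleFunction<T> {
        double predict(T x);

        default void update(T x, double y) {
            throw new UnsupportedOperationException("online learning not supported");
        }

        @Override
        default double applyAsDouble(T x) {
            return predict(x);
        }
    }

    /** Hypothetical one-feature least-squares line fitted from running sums. */
    static final class OnlineLine implements SketchRegression<Double> {
        private double n, sx, sy, sxx, sxy;

        @Override
        public void update(Double x, double y) {
            n += 1;
            sx += x;
            sy += y;
            sxx += x * x;
            sxy += x * y;
        }

        @Override
        public double predict(Double x) {
            double denom = n * sxx - sx * sx;
            if (denom == 0.0) {
                // Fewer than two distinct x values: fall back to the mean of y.
                return n == 0 ? 0.0 : sy / n;
            }
            double slope = (n * sxy - sx * sy) / denom;
            double intercept = (sy - slope * sx) / n;
            return intercept + slope * x;
        }
    }

    public static void main(String[] args) {
        OnlineLine model = new OnlineLine();
        model.update(0.0, 1.0);
        model.update(1.0, 3.0);
        model.update(2.0, 5.0);
        System.out.println("prediction at x = 3: " + model.predict(3.0));
    }
}
```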

### Clustering

Unsupervised learning algorithms for discovering patterns and groupings in data, including partitioning, hierarchical, and density-based methods.

```java { .api }
abstract class PartitionClustering implements Serializable {
    public final int k;
    public final int[] y;
    public final int[] size;
    public static final int OUTLIER = Integer.MAX_VALUE;
}
```

[Clustering](./clustering.md)
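
The fields in the listing relate to each other in a simple way: `y` holds one cluster label per sample, `k` is the number of clusters, and `OUTLIER` marks samples that belong to no cluster. The standalone sketch below derives per-cluster sizes from a label array. Storing the outlier count in an extra trailing slot is a choice made for this sketch, not something the listing states.

```java
import java.util.Arrays;

final class PartitionSketch {
    /** Same sentinel value as in the listing above. */
    static final int OUTLIER = Integer.MAX_VALUE;

    /**
     * Counts the members of each cluster. The returned array has k + 1 slots;
     * the last slot counts outliers (an assumption of this sketch).
     */
    static int[] sizes(int k, int[] y) {
        int[] size = new int[k + 1];
        for (int label : y) {
            if (label == OUTLIER) {
                size[k]++;
            } else {
                size[label]++;
            }
        }
        return size;
    }

    public static void main(String[] args) {
        int[] y = {0, 1, 1, OUTLIER, 0, 1};
        System.out.println(Arrays.toString(sizes(2, y)));
    }
}
```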

### Feature Engineering

Complete preprocessing pipeline including dimensionality reduction, feature selection, transformation, and imputation utilities.

```java { .api }
interface Transform extends Function<double[], double[]> {
    double[] apply(double[] x);
}

abstract class Projection implements Transform {
    public abstract double[] project(double[] x);
}
```

[Feature Engineering](./feature-engineering.md)
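
Since the listing models a transform as a `Function<double[], double[]>`, steps can in principle be chained with the standard `andThen` combinator. The sketch below uses only `java.util.function.Function`; the helpers `minMax` and `project` are hypothetical names introduced here and are not Smile APIs.

```java
import java.util.Arrays;
import java.util.function.Function;

final class TransformSketch {
    /** Hypothetical min-max scaler: maps each feature from [lo, hi] to [0, 1]. */
    static Function<double[], double[]> minMax(double[] lo, double[] hi) {
        return x -> {
            double[] z = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                z[i] = (x[i] - lo[i]) / (hi[i] - lo[i]);
            }
            return z;
        };
    }

    /** Hypothetical linear projection z = W x, with one row of W per output dimension. */
    static Function<double[], double[]> project(double[][] w) {
        return x -> {
            double[] z = new double[w.length];
            for (int r = 0; r < w.length; r++) {
                for (int c = 0; c < x.length; c++) {
                    z[r] += w[r][c] * x[c];
                }
            }
            return z;
        };
    }

    public static void main(String[] args) {
        // Scale first, then project the scaled vector onto a single axis.
        Function<double[], double[]> pipeline =
                minMax(new double[] {0, 10}, new double[] {10, 20})
                        .andThen(project(new double[][] {{1, 1}}));
        System.out.println(Arrays.toString(pipeline.apply(new double[] {5, 15})));
    }
}
```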

### Validation and Metrics

Comprehensive model validation framework with cross-validation, bootstrap sampling, and extensive performance metrics.

```java { .api }
interface CrossValidation {
    Bag[] split(int n);
    static CrossValidation of(int k);
    static CrossValidation stratify(int k, int[] y);
}

interface ClassificationMetric {
    double score(int[] truth, int[] prediction);
}
```

[Validation and Metrics](./validation-metrics.md)
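
To show what a k-fold split produces, the following standalone sketch partitions the indices `0..n-1` into `k` folds, each with a training index array and a held-out index array, and scores predictions with plain accuracy. `Fold`, `split`, and `accuracy` are hypothetical names for this illustration. Unlike a real cross-validation routine, the sketch does not shuffle or stratify the samples.

```java
final class KFoldSketch {
    /** Hypothetical fold holder: training indices plus held-out (out-of-bag) indices. */
    static final class Fold {
        final int[] samples;
        final int[] oob;

        Fold(int[] samples, int[] oob) {
            this.samples = samples;
            this.oob = oob;
        }
    }

    /** Splits indices 0..n-1 into k contiguous folds, without shuffling. */
    static Fold[] split(int n, int k) {
        Fold[] folds = new Fold[k];
        for (int f = 0; f < k; f++) {
            int start = f * n / k;
            int end = (f + 1) * n / k;
            int[] oob = new int[end - start];
            int[] samples = new int[n - oob.length];
            int s = 0;
            int o = 0;
            for (int i = 0; i < n; i++) {
                if (i >= start && i < end) {
                    oob[o++] = i;
                } else {
                    samples[s++] = i;
                }
            }
            folds[f] = new Fold(samples, oob);
        }
        return folds;
    }

    /** Fraction of positions where the prediction equals the true label. */
    static double accuracy(int[] truth, int[] prediction) {
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == prediction[i]) {
                correct++;
            }
        }
        return (double) correct / truth.length;
    }

    public static void main(String[] args) {
        for (Fold fold : split(10, 3)) {
            System.out.println("train = " + fold.samples.length + ", held out = " + fold.oob.length);
        }
    }
}
```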

### Deep Learning

Neural network components including multi-layer perceptrons, activation functions, and optimization algorithms.

```java { .api }
abstract class MultilayerPerceptron implements Classifier<double[]> {
    public abstract int predict(double[] x);
    public abstract void update(double[] x, int y);
}
```

[Deep Learning](./deep-learning.md)
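
As a small illustration of how a classification network's `predict` can turn raw output-layer scores into a label, the standalone sketch below applies a numerically stable softmax and then takes the arg max. It is not taken from Smile; `softmax` and `argmax` are hypothetical helper names.

```java
import java.util.Arrays;

final class SoftmaxSketch {
    /** Softmax with the maximum subtracted first to avoid overflow in exp. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double[] p = new double[z.length];
        double sum = 0.0;
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Index of the largest entry; ties resolve to the first one. */
    static int argmax(double[] p) {
        int best = 0;
        for (int i = 1; i < p.length; i++) {
            if (p[i] > p[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] p = softmax(new double[] {2.0, 1.0, 0.1});
        System.out.println(Arrays.toString(p) + " -> class " + argmax(p));
    }
}
```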

### Advanced Analytics

Specialized algorithms for manifold learning, time series analysis, sequence modeling, and association rule mining.

```java { .api }
interface SequenceLabeler<T> {
    int[] predict(T[] sequence);
}

class TimeSeries {
    public static double[] autocorrelation(double[] data);
    public static double[] crosscorrelation(double[] x, double[] y);
}
```

[Advanced Analytics](./advanced-analytics.md)
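
For readers unfamiliar with autocorrelation, the standalone sketch below computes the sample autocorrelation of a series at a single lag. It does not call Smile; the method name `acf` and the explicit `lag` parameter are assumptions of this sketch and differ from the signature in the listing above.

```java
final class AcfSketch {
    /**
     * Sample autocorrelation at the given lag: the lagged covariance divided
     * by the variance. Returns NaN for a constant series (zero variance).
     */
    static double acf(double[] x, int lag) {
        if (lag < 0 || lag >= x.length) {
            throw new IllegalArgumentException("lag out of range: " + lag);
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;

        double num = 0.0;
        double den = 0.0;
        for (int t = 0; t < x.length; t++) {
            double d = x[t] - mean;
            den += d * d;
            if (t + lag < x.length) {
                num += d * (x[t + lag] - mean);
            }
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5};
        System.out.println("acf(lag = 1) = " + acf(series, 1));
    }
}
```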

## Types

### Core Data Types

```java { .api }
// Main data structures
class Bag {
    public final int[] samples;
    public final int[] oob;
}

class SupportVector {
    public final double[] x;
    public final double alpha;
}

// Validation results
class ClassificationValidation {
    public final double accuracy;
    public final double error;
    public final ConfusionMatrix confusion;
}

class RegressionValidation {
    public final double rmse;
    public final double mad;
    public final double r2;
}
```
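
The three fields of `RegressionValidation` correspond to standard error measures. The standalone sketch below shows how they are commonly computed from true and predicted values. It assumes `mad` means the mean absolute deviation between truth and prediction, which the listing does not spell out, and the helper names are hypothetical.

```java
final class RegressionMetricsSketch {
    /** Root mean squared error. */
    static double rmse(double[] truth, double[] pred) {
        double sse = 0.0;
        for (int i = 0; i < truth.length; i++) {
            double e = truth[i] - pred[i];
            sse += e * e;
        }
        return Math.sqrt(sse / truth.length);
    }

    /** Mean absolute deviation between truth and prediction (assumed meaning of "mad"). */
    static double mad(double[] truth, double[] pred) {
        double sum = 0.0;
        for (int i = 0; i < truth.length; i++) {
            sum += Math.abs(truth[i] - pred[i]);
        }
        return sum / truth.length;
    }

    /** Coefficient of determination: 1 - SSE / SST. */
    static double r2(double[] truth, double[] pred) {
        double mean = 0.0;
        for (double t : truth) {
            mean += t;
        }
        mean /= truth.length;
        double sse = 0.0;
        double sst = 0.0;
        for (int i = 0; i < truth.length; i++) {
            sse += (truth[i] - pred[i]) * (truth[i] - pred[i]);
            sst += (truth[i] - mean) * (truth[i] - mean);
        }
        return 1.0 - sse / sst;
    }

    public static void main(String[] args) {
        double[] truth = {1, 2, 3};
        double[] pred = {1, 2, 4};
        System.out.println("rmse = " + rmse(truth, pred)
                + ", mad = " + mad(truth, pred)
                + ", r2 = " + r2(truth, pred));
    }
}
```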

### Common Enums

```java { .api }
enum SplitRule {
    GINI, ENTROPY, CLASSIFICATION_ERROR
}

enum Cost {
    MEAN_SQUARED_ERROR, CROSS_ENTROPY, SPARSE_CROSS_ENTROPY
}

enum OutputFunction {
    LINEAR, SIGMOID, SOFTMAX
}
```
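
The `SplitRule` constants name three standard node impurity measures used when growing decision trees. As a sketch of what each one computes, the standalone code below evaluates them from per-class sample counts. The local `Rule` enum and the `impurity` helper are hypothetical stand-ins, and the sketch assumes the counts sum to a positive number.

```java
final class ImpuritySketch {
    /** Local stand-in for the SplitRule constants in the listing above. */
    enum Rule { GINI, ENTROPY, CLASSIFICATION_ERROR }

    /** Impurity of a node given the number of samples per class (counts must not all be zero). */
    static double impurity(Rule rule, int[] counts) {
        double n = 0.0;
        for (int c : counts) {
            n += c;
        }
        switch (rule) {
            case GINI: {
                double sumSquares = 0.0;
                for (int c : counts) {
                    double p = c / n;
                    sumSquares += p * p;
                }
                return 1.0 - sumSquares;
            }
            case ENTROPY: {
                double h = 0.0;
                for (int c : counts) {
                    if (c > 0) { // 0 * log(0) is treated as 0
                        double p = c / n;
                        h -= p * Math.log(p) / Math.log(2.0);
                    }
                }
                return h;
            }
            case CLASSIFICATION_ERROR: {
                int max = 0;
                for (int c : counts) {
                    max = Math.max(max, c);
                }
                return 1.0 - max / n;
            }
            default:
                throw new IllegalArgumentException("unknown rule: " + rule);
        }
    }

    public static void main(String[] args) {
        int[] counts = {5, 5};
        for (Rule rule : Rule.values()) {
            System.out.println(rule + " = " + impurity(rule, counts));
        }
    }
}
```

All three measures are zero for a pure node and largest when the classes are evenly mixed, which is why any of them can rank candidate splits.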