# Flink ML (flink-ml_2.10)

Flink ML is the Apache Flink machine learning library; this artifact is the build for Scala 2.10. The library is designed to provide distributed machine learning on top of Apache Flink's stream and batch processing engine. Note: this version (1.3.3) contains only a minimal implementation with stub functionality.

## Package Information

- **Package Name**: flink-ml_2.10
- **Package Type**: Maven
- **Language**: Scala 2.10
- **Installation**:
```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-ml_2.10</artifactId>
  <version>1.3.3</version>
</dependency>
```
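
For sbt-based builds, the same Maven coordinates can be expressed as a single line. This is a sketch that assumes an sbt project already targeting Scala 2.10; it is not taken from the package's own documentation. A single `%` is used because the Scala suffix `_2.10` is already part of the artifact name.

```scala
// build.sbt (assumed sbt project; mirrors the Maven coordinates above)
libraryDependencies += "org.apache.flink" % "flink-ml_2.10" % "1.3.3"
```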

## Core Imports

```scala
import org.apache.flink.ml.MLPackage
import org.apache.flink.ml.regression.MultipleLinearRegression
```

For the Flink Scala API (`ExecutionEnvironment`, `DataSet`, and implicit type information):
```scala
import org.apache.flink.api.scala._
```

## Basic Usage

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml.MLPackage
import org.apache.flink.ml.regression.MultipleLinearRegression

// Access package information
val version = MLPackage.version
val scalaVersion = MLPackage.scalaVersion

// Create the regression model (stub implementation)
val regression = new MultipleLinearRegression()

// Note: fit is a stub and returns dummy data.
// trainingData would be a DataSet[LabeledVector], but LabeledVector
// is not defined in this artifact (see Types), so the call is left commented out.
// val model = regression.fit(trainingData)
```

## Capabilities

### Package Information

Access version and compatibility information for the ML library.

```scala { .api }
object MLPackage {
  val version: String
  val scalaVersion: String
}
```
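
One possible use of `scalaVersion` is to guard against a Scala binary mismatch at startup. The sketch below is plain Scala with no Flink dependency. The object `ScalaVersionCheck` and its methods are hypothetical helpers, not part of flink-ml, and the sketch assumes that `MLPackage.scalaVersion` is a dotted string such as `"2.10"` or `"2.10.6"`, which the source does not specify.

```scala
// Hypothetical helper, not part of flink-ml.
object ScalaVersionCheck {
  /** Extracts "major.minor" from a dotted version string such as "2.10" or "2.10.6". */
  def binaryVersion(version: String): String =
    version.split('.').take(2).mkString(".")

  /** True when both versions share the same major.minor prefix. */
  def isBinaryCompatible(libraryScalaVersion: String, runtimeScalaVersion: String): Boolean =
    binaryVersion(libraryScalaVersion) == binaryVersion(runtimeScalaVersion)
}

// With flink-ml on the classpath, the check might be called as:
// ScalaVersionCheck.isBinaryCompatible(
//   MLPackage.scalaVersion, scala.util.Properties.versionNumberString)
```

Comparing only the major and minor components reflects how Scala artifacts are suffixed (`_2.10`), so patch releases are treated as compatible.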

### Multiple Linear Regression

A placeholder for distributed multiple linear regression. In version 1.3.3 the class is a stub: `fit` returns the dummy value `Array(0.0)` rather than trained coefficients.

```scala { .api }
class MultipleLinearRegression extends Serializable {
  /**
   * Fit the linear regression model.
   *
   * @param trainingData training dataset of labeled vectors
   * @return model coefficients (the stub implementation returns Array(0.0))
   */
  def fit(trainingData: DataSet[LabeledVector]): DataSet[Array[Double]]
}
```
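
Because the stub returns a fixed dummy value, calling code may want to detect that result instead of treating it as a trained model. The following is a minimal plain-Scala sketch. `StubGuard` and `looksLikeStubResult` are hypothetical names, not part of flink-ml, and the check relies only on the `Array(0.0)` value documented above.

```scala
// Hypothetical helper, not part of flink-ml.
object StubGuard {
  /** The dummy coefficients that the API description attributes to the stub. */
  val StubCoefficients: Array[Double] = Array(0.0)

  /** True when the coefficients are exactly the stub's dummy value. */
  def looksLikeStubResult(coefficients: Array[Double]): Boolean =
    coefficients.sameElements(StubCoefficients)
}
```

If `fit` can be called in your build, the array collected from the resulting `DataSet` could be passed to `looksLikeStubResult`. A genuine model with a single zero coefficient would also match, so the check is a heuristic rather than a guarantee.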

## Types

The following types are used in the API:

```scala { .api }
// DataSet is from Flink's core API (imported via org.apache.flink.api.scala._)
// Represents a distributed dataset in Flink
type DataSet[T] // Flink distributed dataset

// Note: The following types are imported in MultipleLinearRegression.scala
// but are NOT defined in this stub implementation:
// - org.apache.flink.ml.common.LabeledVector
// - org.apache.flink.ml.common.LinearAlgebra
//
// These imports exist in the source code but reference non-existent classes,
// indicating this is an incomplete stub implementation.
```

## Implementation Status

**Important**: This version (1.3.3) appears to be a minimal stub implementation. The upstream Flink ML documentation describes a comprehensive set of machine learning capabilities, including:

- ALS (Alternating Least Squares)
- SVM using CoCoA
- k-Nearest Neighbors Join
- Cross Validation
- MinMax/Standard Scalers
- Polynomial Features
- Stochastic Outlier Selection
- Distance Metrics
- Pipelines

However, none of these features are present in the source code of this version. Only the `MLPackage` information object and the stub `MultipleLinearRegression` class are available.

## Dependencies

This library depends on:
- `flink-scala_2.10`: Core Flink Scala API
- `flink-streaming-scala_2.10`: Flink streaming Scala API
- `scala-library`: Scala 2.10.6 standard library