# Vector Operations

Core vector functionality supporting both dense and sparse representations, with conversion between the two and a `compressed` method that picks the more compact format. Vectors are immutable and provide efficient operations for machine learning computations.

## Capabilities

### Vector Trait

Base trait for all vector types, providing common operations and conversions.

```scala { .api }
/**
 * Sealed trait representing a numeric vector with Int indices and Double values
 */
sealed trait Vector extends Serializable {
  /** Size of the vector */
  def size: Int

  /** Converts the instance to a double array */
  def toArray: Array[Double]

  /** Gets the value of the ith element */
  def apply(i: Int): Double

  /** Makes a deep copy of this vector */
  def copy: Vector

  /** Applies a function to all active elements (every element for dense, explicitly stored entries for sparse) */
  def foreachActive(f: (Int, Double) => Unit): Unit

  /** Number of active entries (explicitly stored) */
  def numActives: Int

  /** Number of nonzero elements */
  def numNonzeros: Int

  /** Converts to sparse vector with explicit zeros removed */
  def toSparse: SparseVector

  /** Converts to dense vector */
  def toDense: DenseVector

  /** Returns vector in format using less storage (dense or sparse) */
  def compressed: Vector

  /** Finds the index of the maximal element, -1 if empty */
  def argmax: Int

  /** Calculates the dot product with another vector */
  def dot(v: Vector): Double
}
```

**Usage Examples:**

```scala
import org.apache.spark.ml.linalg.{Vectors, Vector}

// Vector operations
val vec1 = Vectors.dense(1.0, 2.0, 3.0)
val vec2 = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Basic properties
println(vec1.size)        // 3
println(vec2.numNonzeros) // 2
println(vec1.argmax)      // 2 (index of max element)

// Conversions
val denseFromSparse = vec2.toDense  // Convert to dense
val sparseFromDense = vec1.toSparse // Convert to sparse
val compressed = vec1.compressed    // Optimal format

// Element access
val element = vec1(1)    // 2.0
val array = vec1.toArray // Array(1.0, 2.0, 3.0)

// Operations
val dotProduct = vec1.dot(vec2.toDense) // 10.0
```
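
To make the storage trade-off behind `compressed` concrete, here is a rough byte-count sketch. It assumes 8 bytes per `Double` value and 4 bytes per `Int` index and ignores object overhead; the `StorageEstimate` object and its methods are hypothetical illustrations, not part of the Spark API, and the threshold Spark itself applies may differ.

```scala
// Hypothetical helper, not part of the Spark API: compares the payload size
// of the two representations under a simple, assumed cost model.
object StorageEstimate {
  // Dense: one 8-byte Double per element, zeros included.
  def denseBytes(size: Int): Long = 8L * size

  // Sparse: a 4-byte Int index plus an 8-byte Double per stored entry.
  def sparseBytes(numNonzeros: Int): Long = 12L * numNonzeros

  // True when the sparse payload is strictly smaller than the dense one.
  def prefersSparse(size: Int, numNonzeros: Int): Boolean =
    sparseBytes(numNonzeros) < denseBytes(size)
}
```

Under this model a length-100 vector with 5 non-zeros favors the sparse form (60 bytes against 800), while a fully populated length-3 vector favors the dense form (36 bytes against 24).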

### Dense Vector

Dense vector represented by a value array, storing all elements explicitly.

```scala { .api }
/**
 * Dense vector represented by a value array
 * @param values the underlying array of values
 */
class DenseVector(val values: Array[Double]) extends Vector {
  override def size: Int = values.length
  override def toArray: Array[Double] = values
  override def apply(i: Int): Double = values(i)
  override def copy: DenseVector = new DenseVector(values.clone())
  override def numActives: Int = size
  override def numNonzeros: Int // Count of non-zero values
  override def argmax: Int      // Index of maximum value
}

object DenseVector {
  /** Extracts the value array from a dense vector */
  def unapply(dv: DenseVector): Option[Array[Double]] = Some(dv.values)
}
```
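
As the signatures above show, `toArray` hands back the `values` array itself, whereas `copy` clones it. The sketch below uses a hypothetical stand-in class, `MiniDense`, that mirrors just those two definitions to illustrate the consequence: a write through `toArray` is visible in the original but not in a copy. It illustrates array aliasing in general and is not the Spark class.

```scala
// Hypothetical stand-in that reproduces only the two definitions of interest
// from the DenseVector signature above; it is not the Spark class.
final class MiniDense(val values: Array[Double]) {
  def toArray: Array[Double] = values                 // same array, no copy
  def copy: MiniDense = new MiniDense(values.clone()) // independent array
}

object AliasingDemo {
  /** Returns (first element of the original, first element of the copy). */
  def run(): (Double, Double) = {
    val original = new MiniDense(Array(1.0, 2.0, 3.0))
    val snapshot = original.copy
    // Writing through toArray changes the original's storage...
    original.toArray(0) = 99.0
    // ...while the earlier copy keeps its own value.
    (original.values(0), snapshot.values(0))
  }
}
```

If a caller needs an array it can modify freely, cloning the result (or working on a `copy`) avoids touching the vector's storage.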

### Sparse Vector

Sparse vector represented by parallel index and value arrays. Only the active (explicitly stored) entries are kept, which are normally the non-zero elements; every other position is an implicit zero.

```scala { .api }
/**
 * Sparse vector represented by index and value arrays
 * @param size size of the vector
 * @param indices index array (strictly increasing)
 * @param values value array (same length as indices)
 */
class SparseVector(
    override val size: Int,
    val indices: Array[Int],
    val values: Array[Double]) extends Vector {
  override def toArray: Array[Double] // Converts to full dense array
  override def apply(i: Int): Double  // Binary search for value
  override def copy: SparseVector = new SparseVector(size, indices.clone(), values.clone())
  override def numActives: Int = values.length
  override def numNonzeros: Int // Count of actual non-zero values
  override def argmax: Int      // Index of maximum value considering zeros
}

object SparseVector {
  /** Extracts the size, index array, and value array from a sparse vector */
  def unapply(sv: SparseVector): Option[(Int, Array[Int], Array[Double])] =
    Some((sv.size, sv.indices, sv.values))
}
```

**Usage Examples:**

```scala
import org.apache.spark.ml.linalg.SparseVector

// Create sparse vector: [1.0, 0.0, 3.0, 0.0, 5.0]
val sparse = new SparseVector(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))

// Access properties
println(sparse.indices.mkString(",")) // "0,2,4"
println(sparse.values.mkString(","))  // "1.0,3.0,5.0"
println(sparse.numActives)            // 3
println(sparse(2))                    // 3.0 (binary search)
println(sparse(1))                    // 0.0 (not stored)

// Convert to dense
val dense = sparse.toDense
println(dense.values.mkString(",")) // "1.0,0.0,3.0,0.0,5.0"
```
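
The element lookup described above (binary search over a strictly increasing index array, with unstored positions read as zero) can be sketched in plain Scala. `SparseLookup.valueAt` is a hypothetical stand-alone helper written for illustration; it is not Spark's implementation, and the bounds-check message is an assumption.

```scala
import java.util.Arrays

// Hypothetical helper, not part of the Spark API.
object SparseLookup {
  /** Value at position i of the sparse vector described by (size, indices, values). */
  def valueAt(size: Int, indices: Array[Int], values: Array[Double], i: Int): Double = {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
    }
    // Binary search is only valid because indices are strictly increasing.
    val pos = Arrays.binarySearch(indices, i)
    // A negative result means the index is not stored: an implicit zero.
    if (pos >= 0) values(pos) else 0.0
  }
}
```

Each lookup costs O(log k) for k stored entries, so code that reads every element is usually better served by `foreachActive` or a one-time `toArray`.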

### Vector Factory

Factory methods for creating vectors with various initialization patterns.

```scala { .api }
/**
 * Factory methods for Vector creation
 */
object Vectors {
  /** Creates dense vector from variable arguments */
  def dense(firstValue: Double, otherValues: Double*): Vector

  /** Creates dense vector from array */
  def dense(values: Array[Double]): Vector

  /** Creates sparse vector from index and value arrays */
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector

  /** Creates sparse vector from (index, value) pairs */
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vector

  /** Creates vector of all zeros */
  def zeros(size: Int): Vector

  /** Returns p-norm of vector */
  def norm(vector: Vector, p: Double): Double

  /** Returns squared distance between two vectors */
  def sqdist(v1: Vector, v2: Vector): Double
}
```

**Usage Examples:**

```scala
import org.apache.spark.ml.linalg.Vectors

// Various construction methods
val dense1 = Vectors.dense(1.0, 2.0, 3.0)
val dense2 = Vectors.dense(Array(1.0, 2.0, 3.0))
val sparse1 = Vectors.sparse(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))
val sparse2 = Vectors.sparse(5, Seq((0, 1.0), (2, 3.0), (4, 5.0)))
val zeros = Vectors.zeros(10)

// Vector norms
val l1Norm = Vectors.norm(dense1, 1.0)                      // 6.0 (sum of absolute values)
val l2Norm = Vectors.norm(dense1, 2.0)                      // ~3.74 (Euclidean norm)
val infNorm = Vectors.norm(dense1, Double.PositiveInfinity) // 3.0 (max absolute value)

// Distance calculations
val distance = Vectors.sqdist(dense1, dense2) // 0.0 (squared Euclidean distance; the vectors are equal)
```
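
For readers who want the arithmetic behind the norm and distance comments above, the following plain-Scala sketch computes the same quantities on raw arrays. `NormSketch` is a hypothetical helper, not the Spark implementation, and its `p >= 1.0` check is an assumption of this sketch.

```scala
// Hypothetical helper operating on plain arrays, not part of the Spark API.
object NormSketch {
  /** p-norm for p >= 1; p = Double.PositiveInfinity yields the max absolute value. */
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be at least 1.0, got $p")
    if (p == 1.0) {
      values.map(v => math.abs(v)).sum
    } else if (p == 2.0) {
      math.sqrt(values.map(v => v * v).sum)
    } else if (p == Double.PositiveInfinity) {
      values.foldLeft(0.0)((acc, v) => math.max(acc, math.abs(v)))
    } else {
      math.pow(values.map(v => math.pow(math.abs(v), p)).sum, 1.0 / p)
    }
  }

  /** Squared Euclidean distance between two arrays of equal length. */
  def sqdist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "array lengths differ")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }
}
```

For `Array(1.0, 2.0, 3.0)` this gives 6.0 for `p = 1.0` and the square root of 14 for `p = 2.0`, matching the comments in the example above.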

### Vector Operations

Common operations that can be performed on vectors.

```scala { .api }
// Iteration over vector elements
def foreachActive(f: (Int, Double) => Unit): Unit // Only stored elements
```

**Usage Examples:**

```scala
import org.apache.spark.ml.linalg.Vectors

val vec = Vectors.sparse(5, Array(1, 3), Array(2.0, 4.0))

// Iterate over active (stored) elements
vec.foreachActive { (index, value) =>
  println(s"Active: $index -> $value")
}
// Output (one line per stored entry):
// Active: 1 -> 2.0
// Active: 3 -> 4.0
```
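
Visiting only stored entries is what makes sparse computations cheap. As a sketch of the idea, the plain-Scala code below walks the index and value arrays directly and uses that walk to compute a dot product against a dense array. `ActiveIteration` and its methods are hypothetical illustrations, not the Spark implementation.

```scala
// Hypothetical helper operating on raw arrays, not part of the Spark API.
object ActiveIteration {
  /** Calls f(index, value) once per stored entry, in index order. */
  def foreachActive(indices: Array[Int], values: Array[Double])(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < values.length) {
      f(indices(k), values(k))
      k += 1
    }
  }

  /** Dot product of a sparse vector with a dense array, touching only stored entries. */
  def sparseDotDense(indices: Array[Int], values: Array[Double], dense: Array[Double]): Double = {
    var sum = 0.0
    foreachActive(indices, values) { (i, v) => sum += v * dense(i) }
    sum
  }
}
```

With two stored entries the loop body runs twice regardless of the vector's nominal size, which is the reason to prefer `foreachActive` over indexing every position.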

## Error Handling

Vector operations validate inputs and throw standard exceptions:

- `IndexOutOfBoundsException` for invalid indices
- `IllegalArgumentException` for dimension mismatches
- `UnsupportedOperationException` for unsupported operations

```scala
import org.apache.spark.ml.linalg.Vectors

// These will throw exceptions
val vec = Vectors.dense(1.0, 2.0, 3.0)
vec(5) // IndexOutOfBoundsException (for a dense vector, the ArrayIndexOutOfBoundsException subclass): index 5 is outside [0, 3)

val vec1 = Vectors.dense(1.0, 2.0)
val vec2 = Vectors.dense(1.0, 2.0, 3.0)
vec1.dot(vec2) // IllegalArgumentException: sizes 2 and 3 do not match
```
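
When a mismatch is an expected condition rather than a programming error, callers may prefer a value over a thrown exception. The sketch below shows one way to do that in plain Scala with `require` and `scala.util.Try`; `SafeDot` is a hypothetical helper on raw arrays, and the same wrapping pattern could be applied around a call such as `vec1.dot(vec2)`.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper operating on raw arrays, not part of the Spark API.
object SafeDot {
  /** Dot product that fails fast with IllegalArgumentException when lengths differ. */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, s"sizes differ: ${a.length} vs ${b.length}")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  /** Same computation, but a size mismatch is returned as Left(message). */
  def dotEither(a: Array[Double], b: Array[Double]): Either[String, Double] =
    Try(dot(a, b)) match {
      case Success(value)                       => Right(value)
      case Failure(e: IllegalArgumentException) => Left(e.getMessage)
      case Failure(other)                       => throw other // unexpected: rethrow
    }
}
```

Only `IllegalArgumentException` is converted here; any other failure is rethrown so that genuine bugs are not silently swallowed.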