# Vector Operations

Core vector data structures and operations for numerical computing in machine learning applications. Dense and sparse representations share a single `Vector` API, and `compressed` converts a vector to whichever representation uses less storage.

## Capabilities

### Vector Creation

Factory methods for creating the different vector types.

```scala { .api }
object Vectors {
  /**
   * Creates a dense vector from values
   * @param values Array of double values
   * @return Dense vector instance
   */
  def dense(values: Array[Double]): Vector

  /**
   * Creates a dense vector from varargs
   * @param firstValue First value
   * @param otherValues Additional values
   * @return Dense vector instance
   */
  def dense(firstValue: Double, otherValues: Double*): Vector

  /**
   * Creates a sparse vector from indices and values
   * @param size Vector size
   * @param indices Array of indices (must be strictly increasing)
   * @param values Array of values (same length as indices)
   * @return Sparse vector instance
   */
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector

  /**
   * Creates a sparse vector from (index, value) pairs
   * @param size Vector size
   * @param elements Sequence of (index, value) pairs
   * @return Sparse vector instance
   */
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vector

  /**
   * Creates a sparse vector from (index, value) pairs (Java-friendly)
   * @param size Vector size
   * @param elements Java iterable of (Integer, Double) pairs
   * @return Sparse vector instance
   */
  def sparse(size: Int, elements: java.lang.Iterable[(java.lang.Integer, java.lang.Double)]): Vector

  /**
   * Creates a vector of all zeros
   * @param size Vector size
   * @return Dense zero vector
   */
  def zeros(size: Int): Vector
}
```

**Usage Examples:**

```scala
import org.apache.spark.ml.linalg._

// Dense vectors
val dense1 = Vectors.dense(Array(1.0, 2.0, 3.0))
val dense2 = Vectors.dense(1.0, 2.0, 3.0)

// Sparse vectors: both represent [1.0, 0.0, 3.0, 0.0, 5.0]
val sparse1 = Vectors.sparse(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))
val sparse2 = Vectors.sparse(5, Seq((0, 1.0), (2, 3.0), (4, 5.0)))

// Zero vector
val zeros = Vectors.zeros(10)
```
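
To make the sparse layout concrete, the sketch below expands a `(size, indices, values)` triple into the equivalent dense array and checks the preconditions documented for `Vectors.sparse` along the way. It is a stand-alone illustration using only the Scala standard library; `SparseLayout` and `expand` are hypothetical names, not part of Spark.

```scala
object SparseLayout {
  /** Hypothetical helper: expands a sparse (size, indices, values) triple into a
    * dense array, enforcing the preconditions documented for Vectors.sparse. */
  def expand(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(size >= 0, s"size must be non-negative but was $size")
    require(indices.length == values.length, "indices and values must have the same length")
    // Indices must be strictly increasing; starting from -1 also rules out negative indices.
    var prev = -1
    indices.foreach { i =>
      require(i > prev, s"index $i does not strictly increase after $prev")
      prev = i
    }
    require(prev < size, s"index $prev is out of bounds for size $size")
    // Positions not listed in `indices` keep the default value 0.0.
    val dense = new Array[Double](size)
    for (k <- indices.indices) dense(indices(k)) = values(k)
    dense
  }
}
```

For example, `SparseLayout.expand(5, Array(0, 2, 4), Array(1.0, 3.0, 5.0))` yields `Array(1.0, 0.0, 3.0, 0.0, 5.0)`, the dense form of `sparse1` above.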

### Vector Operations

Core operations available on all vector types.

```scala { .api }
sealed trait Vector extends Serializable {
  /**
   * Size of the vector
   * @return Number of elements
   */
  def size: Int

  /**
   * Convert to dense array
   * @return Array containing all vector values
   */
  def toArray: Array[Double]

  /**
   * Get value at index
   * @param i Index (0-based)
   * @return Value at index i
   */
  def apply(i: Int): Double

  /**
   * Make deep copy
   * @return New vector instance with copied values
   */
  def copy: Vector

  /**
   * Dot product with another vector
   * @param v Other vector (must have same size)
   * @return Dot product result
   */
  def dot(v: Vector): Double

  /**
   * Number of non-zero elements
   * @return Count of non-zero values
   */
  def numNonzeros: Int

  /**
   * Number of active (stored) elements
   * @return Count of stored values
   */
  def numActives: Int

  /**
   * Convert to sparse representation
   * @return Sparse vector with explicit zeros removed
   */
  def toSparse: SparseVector

  /**
   * Convert to dense representation
   * @return Dense vector with all elements
   */
  def toDense: DenseVector

  /**
   * Get vector in most compact format
   * @return Vector in dense or sparse format using less storage
   */
  def compressed: Vector

  /**
   * Find index of maximum element
   * @return Index of first maximum element, -1 if empty
   */
  def argmax: Int

  /**
   * Apply function to all active elements
   * @param f Function taking (index, value) parameters
   */
  def foreachActive(f: (Int, Double) => Unit): Unit

  /**
   * Returns iterator over all elements (including zeros)
   * @return Iterator of (index, value) pairs for all positions
   */
  private[spark] def iterator: Iterator[(Int, Double)]

  /**
   * Returns iterator over all active (stored) elements
   * @return Iterator of (index, value) pairs for stored positions
   */
  private[spark] def activeIterator: Iterator[(Int, Double)]

  /**
   * Returns iterator over all non-zero elements
   * @return Iterator of (index, value) pairs for non-zero positions
   */
  private[spark] def nonZeroIterator: Iterator[(Int, Double)]
}
```
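
`dot` requires both operands to have the same size. When both are sparse, only positions stored in both vectors can contribute to the sum, so a single merge-style pass over the two strictly increasing index arrays is enough, costing O(nnz1 + nnz2) rather than O(size). The sketch below shows that technique on raw index/value arrays; `SparseDot` is a hypothetical name, and this is an illustration of the idea rather than Spark's own implementation.

```scala
object SparseDot {
  /** Dot product of two sparse vectors of equal size, each given as
    * strictly increasing indices plus parallel values. */
  def dot(idxA: Array[Int], valA: Array[Double], idxB: Array[Int], valB: Array[Double]): Double = {
    var a = 0
    var b = 0
    var sum = 0.0
    // Advance whichever side has the smaller index; multiply only on matching indices.
    while (a < idxA.length && b < idxB.length) {
      if (idxA(a) == idxB(b)) {
        sum += valA(a) * valB(b)
        a += 1
        b += 1
      } else if (idxA(a) < idxB(b)) {
        a += 1
      } else {
        b += 1
      }
    }
    sum
  }
}
```

For example, with stored entries {0: 1.0, 2: 3.0, 4: 5.0} and {2: 2.0, 3: 7.0, 4: 1.0}, only indices 2 and 4 match, giving 3.0 × 2.0 + 5.0 × 1.0 = 11.0.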

**Usage Examples:**

```scala
import org.apache.spark.ml.linalg._

val vec1 = Vectors.dense(1.0, 2.0, 3.0)
val vec2 = Vectors.dense(4.0, 5.0, 6.0)

// Basic operations
println(vec1.size) // 3
println(vec1(1))   // 2.0
val copy = vec1.copy

// Dot product: 1*4 + 2*5 + 3*6
val dot = vec1.dot(vec2) // 32.0

// Statistics
println(vec1.numNonzeros) // 3
println(vec1.argmax)      // 2

// Format conversion
val sparse = vec1.toSparse
val dense = sparse.toDense
val compressed = sparse.compressed // dense here, since every element is non-zero

// Iteration over active entries (every entry, for a dense vector)
vec1.foreachActive { (i, v) =>
  println(s"Index $i: value $v")
}

// Iterator access: iterator, activeIterator and nonZeroIterator are private[spark],
// so user code outside the org.apache.spark package cannot call them
// vec1.activeIterator.foreach { case (i, v) => println(s"Active: $i -> $v") }
// vec1.nonZeroIterator.foreach { case (i, v) => println(s"NonZero: $i -> $v") }
```
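
`numActives` and `numNonzeros` differ only when a vector stores explicit zeros. For example, `Vectors.sparse(4, Array(0, 2), Array(1.0, 0.0))` stores two entries, so its `numActives` is 2 while its `numNonzeros` is 1, and `toSparse` drops the stored zero. The hypothetical `SimpleSparse` class below is a minimal stand-alone model of that bookkeeping, not Spark's `SparseVector`.

```scala
/** Hypothetical model of a sparse vector's storage, for illustration only. */
final case class SimpleSparse(size: Int, indices: Array[Int], values: Array[Double]) {
  // Every stored entry is active, including an explicitly stored 0.0.
  def numActives: Int = values.length

  // Only stored entries whose value is actually non-zero.
  def numNonzeros: Int = values.count(_ != 0.0)

  // Drops explicitly stored zeros, as toSparse is documented to do.
  def withoutExplicitZeros: SimpleSparse = {
    val keep = values.indices.filter(k => values(k) != 0.0)
    SimpleSparse(size, keep.map(k => indices(k)).toArray, keep.map(k => values(k)).toArray)
  }
}
```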

### Dense Vectors

Dense vector implementation that stores every element in an array.

```scala { .api }
/**
 * @param values The underlying value array, holding all vector values
 */
class DenseVector(val values: Array[Double]) extends Vector

object DenseVector {
  /**
   * Extracts the values from a dense vector (for pattern matching)
   * @param dv Dense vector instance
   * @return Some(values)
   */
  def unapply(dv: DenseVector): Option[Array[Double]]
}
```

### Sparse Vectors

Sparse vector implementation that stores only the active entries, as an index array plus a parallel value array. Active entries are usually non-zero, but explicit zeros can also be stored (compare `numActives` with `numNonzeros`).

```scala { .api }
/**
 * @param size Total number of elements, including those not stored
 * @param indices Indices of the active (stored) elements, strictly increasing
 * @param values Values of the active elements (same length as indices)
 */
class SparseVector(
    override val size: Int,
    val indices: Array[Int],
    val values: Array[Double]
) extends Vector {
  /**
   * Creates a slice of this vector based on the given indices
   * @param selectedIndices Array of indices to extract
   * @param sorted Whether the input indices are already sorted (default: false)
   * @return New SparseVector with the values at the specified indices
   */
  private[spark] def slice(selectedIndices: Array[Int], sorted: Boolean = false): SparseVector
}

object SparseVector {
  /**
   * Extracts the components of a sparse vector (for pattern matching)
   * @param sv Sparse vector instance
   * @return Some((size, indices, values))
   */
  def unapply(sv: SparseVector): Option[(Int, Array[Int], Array[Double])]
}
```
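
Keeping `indices` strictly increasing is what makes random access on a sparse vector cheap: looking up position `i` can binary-search the index array in O(log nnz) and return 0.0 when `i` is not stored. The sketch below uses `java.util.Arrays.binarySearch` on raw arrays; `SparseLookup` is a hypothetical helper, not Spark's code.

```scala
import java.util.Arrays

object SparseLookup {
  /** Value at position i of a sparse vector stored as strictly increasing
    * indices plus parallel values; unstored positions read as 0.0. */
  def valueAt(size: Int, indices: Array[Int], values: Array[Double], i: Int): Double = {
    require(i >= 0 && i < size, s"index $i out of range [0, $size)")
    // binarySearch returns the match position, or a negative value if i is absent.
    val k = Arrays.binarySearch(indices, i)
    if (k >= 0) values(k) else 0.0
  }
}
```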

### Vector Utilities

Additional utility functions on the `Vectors` object.

```scala { .api }
object Vectors {
  /**
   * Computes the p-norm of a vector
   * @param vector Input vector
   * @param p Norm parameter (must be >= 1.0; Double.PositiveInfinity gives the max norm)
   * @return L^p norm of the vector
   */
  def norm(vector: Vector, p: Double): Double

  /**
   * Computes the squared Euclidean distance between two vectors
   * @param v1 First vector
   * @param v2 Second vector (must have the same size as v1)
   * @return Squared distance between the vectors
   */
  def sqdist(v1: Vector, v2: Vector): Double
}
```
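
For a vector $v$ with $n$ elements, `norm` and `sqdist` compute the standard quantities below, with $p \ge 1$ and the $p = \infty$ case selected by passing `Double.PositiveInfinity`:

```math
\|v\|_p = \Big( \sum_{i=0}^{n-1} |v_i|^p \Big)^{1/p}, \qquad
\|v\|_\infty = \max_{0 \le i < n} |v_i|, \qquad
\operatorname{sqdist}(a, b) = \sum_{i=0}^{n-1} (a_i - b_i)^2 = \|a - b\|_2^2
```

Zero entries add nothing to these sums and cannot raise the maximum of absolute values, so a sparse vector's norm depends only on its stored values.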

**Usage Examples:**

```scala
import org.apache.spark.ml.linalg._

val vec = Vectors.dense(3.0, 4.0, 0.0)

// Norms
val l1Norm = Vectors.norm(vec, 1.0)                      // 7.0 (Manhattan norm)
val l2Norm = Vectors.norm(vec, 2.0)                      // 5.0 (Euclidean norm)
val infNorm = Vectors.norm(vec, Double.PositiveInfinity) // 4.0 (max norm)

// Distance
val vec2 = Vectors.dense(0.0, 0.0, 0.0)
val distance = Vectors.sqdist(vec, vec2) // 25.0 = 3^2 + 4^2 + 0^2
```
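
A minimal stand-alone sketch of these formulas over plain arrays, assuming nothing beyond the Scala standard library (`NormSketch` is a hypothetical name, not Spark's code). It special-cases p = 1, 2 and ∞ so the common norms avoid `math.pow`; because zeros contribute nothing, a sparse vector's norm could be computed by passing only its stored values.

```scala
object NormSketch {
  /** p-norm of the given values (p >= 1). Zeros contribute nothing, so for a
    * sparse vector it is enough to pass only its stored values. */
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be >= 1.0 but was $p")
    if (p == 1.0) {
      values.map(x => math.abs(x)).sum // Manhattan norm
    } else if (p == 2.0) {
      math.sqrt(values.map(x => x * x).sum) // Euclidean norm
    } else if (p == Double.PositiveInfinity) {
      values.foldLeft(0.0)((m, x) => math.max(m, math.abs(x))) // max norm
    } else {
      math.pow(values.map(x => math.pow(math.abs(x), p)).sum, 1.0 / p)
    }
  }

  /** Squared Euclidean distance between two arrays of equal length. */
  def sqdist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same size")
    var sum = 0.0
    for (i <- a.indices) {
      val d = a(i) - b(i)
      sum += d * d
    }
    sum
  }
}
```

With the values (3.0, 4.0, 0.0) from the example above, `norm` returns 7.0, 5.0 and 4.0 for p = 1, 2 and ∞, and `sqdist` against three zeros returns 25.0.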

## Type Hierarchy

```
Vector (sealed trait)
├── DenseVector (class)
└── SparseVector (class)
```
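
Because `Vector` is sealed, all of its implementations live in Spark's own source, and user code typically dispatches on the two cases with a `match`. The sketch below reproduces that shape with a hypothetical miniature hierarchy (`MiniVector`, `MiniDense` and `MiniSparse` are illustrative names) built from case classes, for which the compiler can also check exhaustiveness. With Spark's own classes, the same kind of match can be written with the `DenseVector(values)` and `SparseVector(size, indices, values)` extractors documented above.

```scala
/** Hypothetical miniature of the Vector hierarchy, for illustration only. */
sealed trait MiniVector {
  def size: Int
}

final case class MiniDense(values: Array[Double]) extends MiniVector {
  def size: Int = values.length
}

final case class MiniSparse(size: Int, indices: Array[Int], values: Array[Double]) extends MiniVector

object MiniVector {
  // MiniVector is sealed, so the compiler warns if a case below is missing.
  def describe(v: MiniVector): String = v match {
    case MiniDense(values)            => s"dense with ${values.length} stored values"
    case MiniSparse(size, indices, _) => s"sparse with ${indices.length} of $size values stored"
  }
}
```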

## Performance Notes

- **Dense vectors**: Best for small to medium vectors, or whenever most elements are non-zero, because every element occupies a slot in the value array.
- **Sparse vectors**: Best for large vectors with many zero elements, because only the active entries are stored and operations such as `foreachActive` visit only those entries.
- **Choosing a representation**: `compressed` returns the vector in whichever format (dense or sparse) uses less storage.
- **Native acceleration**: Dense operations may use an optimized native BLAS implementation when one is available.
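
One way to make the storage comparison behind `compressed` concrete is to estimate the bytes each format needs. The sketch below assumes 8 bytes per stored `Double`, 4 bytes per `Int` index, and small fixed overheads of 8 bytes (dense) and 20 bytes (sparse); these constants are assumptions for illustration, and `CompressedSketch` is a hypothetical name. Under them, sparse storage is smaller exactly when 12·nnz + 20 < 8·size + 8, which simplifies to 1.5·(nnz + 1) < size.

```scala
object CompressedSketch {
  // Assumed storage estimates in bytes (illustrative constants, not measured).
  def denseBytes(size: Int): Long = 8L * size + 8L // one Double per element
  def sparseBytes(nnz: Int): Long = 12L * nnz + 20L // one Int index plus one Double per stored entry

  /** True when the sparse layout is estimated to be strictly smaller. */
  def preferSparse(size: Int, nnz: Int): Boolean = sparseBytes(nnz) < denseBytes(size)
}
```

For a vector of size 10, these estimates favour sparse storage for up to 5 non-zeros (80 < 88 bytes) and dense storage from 6 onward (92 ≥ 88 bytes).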