# Vector Operations

Vector data structures and operations for representing and manipulating one-dimensional numerical data. Both dense and sparse representations are supported, and `compressed` selects the more compact of the two based on the number of non-zero entries.

## Capabilities

### Vector Trait

Core abstraction for all vector types, providing common operations and conversions.

```scala { .api }
sealed trait Vector extends Serializable {
  /** Size of the vector */
  def size: Int

  /** Converts the instance to a double array */
  def toArray: Array[Double]

  /** Gets the value of the ith element */
  def apply(i: Int): Double

  /** Makes a deep copy of this vector */
  def copy: Vector

  /** Applies function to all active elements */
  def foreachActive(f: (Int, Double) => Unit): Unit

  /** Number of active entries */
  def numActives: Int

  /** Number of nonzero elements */
  def numNonzeros: Int

  /** Converts to sparse vector */
  def toSparse: SparseVector

  /** Converts to dense vector */
  def toDense: DenseVector

  /** Returns optimal storage format */
  def compressed: Vector

  /** Index of maximal element */
  def argmax: Int
}
```
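The difference between `numActives` and `numNonzeros` is easy to miss, so here is a minimal sketch of it in plain Scala. `SimpleSparse` is a hypothetical stand-in written for this illustration, not the library's class; it assumes that "active" means "explicitly stored", so an explicitly stored `0.0` counts as active but not as a nonzero.

```scala
// Hypothetical stand-in for a sparse vector; not Spark's SparseVector.
final case class SimpleSparse(size: Int, indices: Array[Int], values: Array[Double]) {
  require(indices.length == values.length, "indices and values must have the same length")

  /** Active entries are the explicitly stored ones, stored zeros included. */
  def numActives: Int = values.length

  /** Nonzeros are the stored entries whose value is not 0.0. */
  def numNonzeros: Int = values.count(_ != 0.0)

  /** Visits only the stored (index, value) pairs, in storage order. */
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < values.length) {
      f(indices(k), values(k))
      k += 1
    }
  }
}

object ActiveEntriesDemo {
  def main(args: Array[String]): Unit = {
    // Index 2 stores an explicit zero: it is active, but it is not a nonzero.
    val v = SimpleSparse(5, Array(0, 2, 4), Array(1.0, 0.0, 3.0))
    v.foreachActive((i, x) => println(s"$i -> $x"))
    println(s"actives=${v.numActives}, nonzeros=${v.numNonzeros}")
  }
}
```

For a dense representation every position is stored, so under the same assumption the active count equals the vector size.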

### Dense Vector

Dense vector implementation backed by a double array for efficient storage of vectors with mostly non-zero elements.

```scala { .api }
class DenseVector(val values: Array[Double]) extends Vector {
  override def size: Int
  override def toString: String
  override def toArray: Array[Double]
  override def apply(i: Int): Double
  override def copy: DenseVector
}
```

Usage example:

```scala
import org.apache.spark.ml.linalg._

// Create dense vector from array
val dense = new DenseVector(Array(1.0, 2.0, 3.0, 4.0))

// Access elements
val element = dense(2) // 3.0
val size = dense.size // 4

// Convert to array
val array = dense.toArray

// Make a copy
val copied = dense.copy
```

### Sparse Vector

Sparse vector implementation that stores only the active entries, in parallel `indices` and `values` arrays. This is efficient for vectors with many zero elements.

```scala { .api }
class SparseVector(
  override val size: Int,
  val indices: Array[Int],
  val values: Array[Double]
) extends Vector {
  override def toString: String
  override def toArray: Array[Double]
  override def apply(i: Int): Double
  override def copy: SparseVector
}
```

Usage example:

```scala
import org.apache.spark.ml.linalg._

// Create sparse vector: size=5, elements at indices [0,3] with values [1.0,4.0]
val sparse = new SparseVector(5, Array(0, 3), Array(1.0, 4.0))

// Access elements
val element = sparse(3) // 4.0
val zeroElement = sparse(1) // 0.0

// Get active indices and values
val indices = sparse.indices // Array(0, 3)
val values = sparse.values // Array(1.0, 4.0)
```
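Element access on the indices/values layout can be pictured as a search over the stored indices, with a fallback to zero. The following is a sketch in plain Scala, not the library's code: `lookup` is a hypothetical helper, and it assumes `indices` is sorted in strictly increasing order so that binary search applies.

```scala
import java.util.Arrays

object SparseLookupDemo {
  /** Looks up element i, assuming `indices` is sorted in strictly increasing order. */
  def lookup(size: Int, indices: Array[Int], values: Array[Double], i: Int): Double = {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException(s"Index $i out of bounds [0, $size)")
    }
    // binarySearch returns a negative number when i is not among the stored indices.
    val j = Arrays.binarySearch(indices, i)
    if (j >= 0) values(j) else 0.0
  }

  def main(args: Array[String]): Unit = {
    val indices = Array(0, 3)
    val values = Array(1.0, 4.0)
    println(lookup(5, indices, values, 3)) // stored entry
    println(lookup(5, indices, values, 1)) // position that is not stored
  }
}
```

Under this layout a lookup costs a search over the stored entries rather than a direct array read, which is the usual trade-off for the smaller memory footprint.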

### Vector Factory

Factory object providing convenient methods for creating vectors and performing vector operations.

```scala { .api }
object Vectors {
  /** Creates dense vector from values */
  def dense(firstValue: Double, otherValues: Double*): Vector

  /** Creates dense vector from array */
  def dense(values: Array[Double]): Vector

  /** Creates sparse vector from indices and values */
  def sparse(size: Int, indices: Array[Int], values: Array[Double]): Vector

  /** Creates sparse vector from (index, value) pairs */
  def sparse(size: Int, elements: Seq[(Int, Double)]): Vector

  /** Creates zero vector */
  def zeros(size: Int): Vector

  /** Computes p-norm of vector */
  def norm(vector: Vector, p: Double): Double

  /** Computes squared distance between vectors */
  def sqdist(v1: Vector, v2: Vector): Double
}
```

Usage examples:

```scala
import org.apache.spark.ml.linalg._

// Create dense vectors
val dense1 = Vectors.dense(1.0, 2.0, 3.0)
val dense2 = Vectors.dense(Array(4.0, 5.0, 6.0))

// Create sparse vectors
val sparse1 = Vectors.sparse(5, Array(0, 3), Array(1.0, 4.0))
val sparse2 = Vectors.sparse(4, Seq((0, 1.0), (2, 3.0)))

// Create zero vector
val zeros = Vectors.zeros(10)

// Vector operations
val l2Norm = Vectors.norm(dense1, 2.0) // L2 norm: sqrt(14.0)
val l1Norm = Vectors.norm(dense1, 1.0) // L1 norm: 6.0
// sqdist requires both vectors to have the same size
val distance = Vectors.sqdist(dense1, dense2) // Squared distance: 27.0
```
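For reference, the norm and squared-distance operations correspond to the standard formulas below, written here over plain arrays. This is a sketch of the math only; `NormSketch` and its methods are names made up for this illustration, and the treatment of `p` (requiring `p >= 1` and accepting positive infinity for the max norm) is an assumption of the sketch.

```scala
object NormSketch {
  /** p-norm of a plain array for p >= 1; Double.PositiveInfinity gives the max norm. */
  def norm(xs: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be >= 1.0, got $p")
    if (p == 1.0) xs.map(x => math.abs(x)).sum
    else if (p == 2.0) math.sqrt(xs.map(x => x * x).sum)
    else if (p == Double.PositiveInfinity) xs.foldLeft(0.0)((m, x) => math.max(m, math.abs(x)))
    else math.pow(xs.map(x => math.pow(math.abs(x), p)).sum, 1.0 / p)
  }

  /** Squared Euclidean distance; both arrays must have the same length. */
  def sqdist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, s"Dimensions do not match: ${a.length} vs ${b.length}")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  def main(args: Array[String]): Unit = {
    println(norm(Array(3.0, 4.0), 2.0))                       // 5.0
    println(norm(Array(1.0, 2.0, 3.0), 1.0))                  // 6.0
    println(sqdist(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))) // 27.0
  }
}
```

The squared distance skips the final square root, which is why it is often preferred when distances are only compared against each other.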

### Dense Vector Companion

Extractor object for pattern matching with DenseVector.

```scala { .api }
object DenseVector {
  /** Extracts the value array from a dense vector */
  def unapply(dv: DenseVector): Option[Array[Double]]
}
```

Usage example:

```scala
import org.apache.spark.ml.linalg._

val vector = Vectors.dense(1.0, 2.0, 3.0)

vector match {
  case DenseVector(values) => println(s"Dense vector with values: ${values.mkString(",")}")
  case _ => println("Not a dense vector")
}
```

### Sparse Vector Companion

Extractor object for pattern matching with SparseVector.

```scala { .api }
object SparseVector {
  /** Extracts the size, indices, and values from a sparse vector */
  def unapply(sv: SparseVector): Option[(Int, Array[Int], Array[Double])]
}
```

Usage example:

```scala
import org.apache.spark.ml.linalg._

val vector = Vectors.sparse(5, Array(0, 3), Array(1.0, 4.0))

vector match {
  case SparseVector(size, indices, values) =>
    println(s"Sparse vector size=$size, indices=${indices.mkString(",")}, values=${values.mkString(",")}")
  case _ => println("Not a sparse vector")
}
```

### Vector Conversion Operations

Additional operations for converting between vector formats and finding maximum elements.

#### Dense Vector Conversion

Converts any vector to dense format with all elements explicitly stored.

```scala { .api }
/** Converts this vector to a dense vector */
def toDense: DenseVector
```

Usage example:

```scala
import org.apache.spark.ml.linalg._

// Convert sparse to dense
val sparse = Vectors.sparse(4, Array(0, 3), Array(1.0, 4.0))
val dense = sparse.toDense // DenseVector(1.0, 0.0, 0.0, 4.0)

// Converting an already-dense vector leaves the values unchanged
val alreadyDense = Vectors.dense(1.0, 2.0, 3.0)
val stillDense = alreadyDense.toDense // DenseVector(1.0, 2.0, 3.0)
```
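Conceptually, the sparse-to-dense conversion allocates a zero-filled array of the full size and writes each stored value at its index. A minimal sketch in plain Scala follows; `toDenseArray` is a hypothetical helper for illustration and assumes every index lies in `[0, size)`.

```scala
object ToDenseSketch {
  /** Expands (size, indices, values) into a full array; unstored positions stay 0.0. */
  def toDenseArray(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must have the same length")
    val out = new Array[Double](size) // JVM arrays of Double start zero-filled
    var k = 0
    while (k < indices.length) {
      out(indices(k)) = values(k)
      k += 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    println(toDenseArray(4, Array(0, 3), Array(1.0, 4.0)).mkString("[", ", ", "]"))
  }
}
```

The result always occupies memory proportional to `size`, so the conversion can be expensive for very large, very sparse vectors.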

#### Optimal Storage Format

Returns the most memory-efficient storage format based on sparsity.

```scala { .api }
/** Returns optimal storage format based on sparsity */
def compressed: Vector
```

Usage example:

```scala
import org.apache.spark.ml.linalg._

// Sparse vector with few non-zeros stays sparse
val sparse = Vectors.sparse(100, Array(0, 50), Array(1.0, 2.0))
val compressed1 = sparse.compressed // Remains SparseVector

// Dense vector with many zeros converts to sparse
val mostlyZeros = Vectors.dense(1.0, 0.0, 0.0, 0.0, 0.0, 2.0)
val compressed2 = mostlyZeros.compressed // Becomes SparseVector

// Dense vector with many non-zeros stays dense
val mostlyNonZeros = Vectors.dense(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
val compressed3 = mostlyNonZeros.compressed // Remains DenseVector
```
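One way to reason about the choice between formats is to compare estimated storage sizes. The sketch below is an illustration under stated assumptions, not the library's documented rule: it assumes a dense vector costs about `8 * size + 8` bytes and a sparse vector about `12 * nnz + 20` bytes (a 4-byte index plus an 8-byte value per stored entry), and prefers whichever is smaller. `CompressedSketch` and its methods are hypothetical names.

```scala
object CompressedSketch {
  /** Assumed estimate: 8 bytes per Double plus a small fixed overhead. */
  def denseBytes(size: Int): Long = 8L * size + 8L

  /** Assumed estimate: 12 bytes per stored entry (Int index + Double value) plus overhead. */
  def sparseBytes(nnz: Int): Long = 12L * nnz + 20L

  /** True when the sparse layout is estimated to be strictly smaller. */
  def preferSparse(size: Int, nnz: Int): Boolean = sparseBytes(nnz) < denseBytes(size)

  def main(args: Array[String]): Unit = {
    println(preferSparse(size = 100, nnz = 2)) // true:  44 bytes vs 808 bytes
    println(preferSparse(size = 6, nnz = 2))   // true:  44 bytes vs 56 bytes
    println(preferSparse(size = 6, nnz = 6))   // false: 92 bytes vs 56 bytes
  }
}
```

Under these estimates the comparison simplifies to `1.5 * (nnz + 1) < size`, which agrees with the three cases in the usage example: the sparse format wins only when well under two thirds of the entries are non-zero.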

#### Maximum Element Index

Finds the index of the element with maximum value.

```scala { .api }
/** Index of maximal element */
def argmax: Int
```

Usage example:

```scala
import org.apache.spark.ml.linalg._

// Dense vector
val dense = Vectors.dense(1.0, 5.0, 3.0, 2.0)
val maxIndex1 = dense.argmax // 1 (index of value 5.0)

// Sparse vector
val sparse = Vectors.sparse(6, Array(1, 3, 5), Array(2.0, 8.0, 1.0))
val maxIndex2 = sparse.argmax // 3 (index of value 8.0)

// Vector with negative values
val withNegatives = Vectors.dense(-1.0, -5.0, -3.0, -2.0)
val maxIndex3 = withNegatives.argmax // 0 (index of value -1.0)
```
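Finding the maximum over a sparse layout has one subtlety: when every stored value is negative, an unstored position (an implicit `0.0`) is the true maximum. The sketch below works through that case in plain Scala. It is an illustration, not the library's code: `SparseArgmaxSketch.argmax` is a hypothetical helper, it assumes `indices` is strictly increasing, and its conventions for ties (first index wins) and for an empty vector (`-1`) are assumptions of the sketch.

```scala
object SparseArgmaxSketch {
  /** First index of the maximum over the dense view; -1 for an empty vector. */
  def argmax(size: Int, indices: Array[Int], values: Array[Double]): Int = {
    if (size == 0) -1
    else if (values.isEmpty) 0 // every entry is an implicit zero
    else {
      // First occurrence of the largest stored value.
      var best = 0
      var j = 1
      while (j < values.length) {
        if (values(j) > values(best)) best = j
        j += 1
      }
      val bestValue = values(best)
      val hasImplicitZeros = values.length < size

      if (bestValue > 0.0 || !hasImplicitZeros) indices(best)
      else {
        // Smallest position that is not stored, i.e. the first implicit 0.0.
        var k = 0
        while (k < indices.length && indices(k) == k) k += 1
        if (bestValue < 0.0) k else math.min(k, indices(best))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    println(argmax(6, Array(1, 3, 5), Array(2.0, 8.0, 1.0))) // 3: stored maximum 8.0
    println(argmax(4, Array(0, 2), Array(-1.0, -3.0)))       // 1: implicit zero beats -1.0
  }
}
```

The second call shows why scanning only the stored values is not enough: the dense view is `[-1.0, 0.0, -3.0, 0.0]`, so the maximum sits at an index that never appears in `indices`.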

## Type Definitions

```scala { .api }
sealed trait Vector extends Serializable

class DenseVector(val values: Array[Double]) extends Vector

class SparseVector(
  override val size: Int,
  val indices: Array[Int],
  val values: Array[Double]
) extends Vector
```