Tessl Tile for maven/com.clearspring.analytics/stream@2.9.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

cardinality.md frequency.md hash.md index.md membership.md quantile.md stream-summary.md

hash.mddocs/

0
# Hash Functions
1

2
Fast, well-distributed hash functions optimized for use in probabilistic data structures and general hash-based lookup. These implementations provide high-quality hash values with good distribution properties and performance characteristics.
3

4
## Capabilities
5

6
### MurmurHash
7

8
Fast, non-cryptographic hash function suitable for general hash-based lookup with excellent distribution properties and performance.
9

10
```java { .api }
11
/**
12
 * MurmurHash implementation - fast, well-distributed hash function
13
 */
14
public class MurmurHash {
15
    /**
16
     * Hash any object using its toString() method
17
     * @param o object to hash
18
     * @return 32-bit hash value
19
     */
20
    public static int hash(Object o);
21
    
22
    /**
23
     * Hash byte array
24
     * @param data byte array to hash
25
     * @return 32-bit hash value
26
     */
27
    public static int hash(byte[] data);
28
    
29
    /**
30
     * Hash byte array with seed
31
     * @param data byte array to hash
32
     * @param seed seed value for hash function
33
     * @return 32-bit hash value
34
     */
35
    public static int hash(byte[] data, int seed);
36
    
37
    /**
38
     * Hash byte array with length and seed
39
     * @param data byte array to hash
40
     * @param length number of bytes to hash
41
     * @param seed seed value for hash function
42
     * @return 32-bit hash value
43
     */
44
    public static int hash(byte[] data, int length, int seed);
45
    
46
    /**
47
     * Hash long value directly
48
     * @param data long value to hash
49
     * @return 32-bit hash value
50
     */
51
    public static int hashLong(long data);
52
    
53
    /**
54
     * 64-bit hash of object using its toString() method
55
     * @param o object to hash
56
     * @return 64-bit hash value
57
     */
58
    public static long hash64(Object o);
59
    
60
    /**
61
     * 64-bit hash of byte array
62
     * @param data byte array to hash
63
     * @param length number of bytes to hash
64
     * @return 64-bit hash value
65
     */
66
    public static long hash64(final byte[] data, int length);
67
    
68
    /**
69
     * 64-bit hash of byte array with seed
70
     * @param data byte array to hash
71
     * @param length number of bytes to hash
72
     * @param seed seed value for hash function
73
     * @return 64-bit hash value
74
     */
75
    public static long hash64(final byte[] data, int length, int seed);
76
}
77
```
78

79
**Usage Examples:**
80

81
```java
82
import com.clearspring.analytics.hash.MurmurHash;
83

84
// Hash strings and objects
85
String text = "hello world";
86
int hash1 = MurmurHash.hash(text);
87
int hash2 = MurmurHash.hash(text.getBytes());
88

89
// Hash with custom seed for different hash functions
90
int hash3 = MurmurHash.hash(text.getBytes(), 12345);
91

92
// Hash long values directly
93
long value = 1234567890L;
94
int longHash = MurmurHash.hashLong(value);
95

96
// Get 64-bit hashes for more bits and less collisions
97
long hash64 = MurmurHash.hash64(text);
98
long hash64WithSeed = MurmurHash.hash64(text.getBytes(), text.length(), 54321);
99

100
System.out.println("32-bit hash: " + hash1);
101
System.out.println("64-bit hash: " + hash64);
102
```
103

104
### Lookup3Hash
105

106
Fast, well-distributed, cross-platform hash functions based on Bob Jenkins' lookup3 algorithm with variations optimized for different use cases.
107

108
```java { .api }
109
/**
110
 * Lookup3Hash implementation based on Bob Jenkins' lookup3 algorithm
111
 */
112
public class Lookup3Hash {
113
    /**
114
     * Java implementation of hashword from lookup3.c
115
     * @param k array of integers to hash
116
     * @param offset starting offset in array
117
     * @param length number of integers to hash
118
     * @param initval initial value (seed)
119
     * @return 32-bit hash value
120
     */
121
    public static int lookup3(int[] k, int offset, int length, int initval);
122
    
123
    /**
124
     * Modified lookup3 with biased initial value for ycs (Yahoo Cloud Serving)
125
     * @param k array of integers to hash
126
     * @param offset starting offset in array
127
     * @param length number of integers to hash
128
     * @param initval initial value (seed)
129
     * @return 32-bit hash value
130
     */
131
    public static int lookup3ycs(int[] k, int offset, int length, int initval);
132
    
133
    /**
134
     * Hash character sequence using lookup3ycs algorithm
135
     * @param s character sequence to hash
136
     * @param start starting character index
137
     * @param end ending character index (exclusive)
138
     * @param initval initial value (seed)
139
     * @return 32-bit hash value
140
     */
141
    public static int lookup3ycs(CharSequence s, int start, int end, int initval);
142
    
143
    /**
144
     * 64-bit version of lookup3ycs for character sequences
145
     * @param s character sequence to hash
146
     * @param start starting character index
147
     * @param end ending character index (exclusive)
148
     * @param initval initial value (seed)
149
     * @return 64-bit hash value
150
     */
151
    public static long lookup3ycs64(CharSequence s, int start, int end, long initval);
152
    
153
    /**
154
     * Convenience method: 64-bit hash of entire character sequence
155
     * @param s character sequence to hash
156
     * @return 64-bit hash value using default parameters
157
     */
158
    public static long lookup3ycs64(CharSequence s);
159
}
160
```
161

162
**Usage Examples:**
163

164
```java
165
import com.clearspring.analytics.hash.Lookup3Hash;
166

167
// Hash integer arrays
168
int[] data = {1, 2, 3, 4, 5};
169
int hash = Lookup3Hash.lookup3(data, 0, data.length, 0);
170

171
// Hash strings with different variants
172
String text = "sample text";
173
int hashYcs = Lookup3Hash.lookup3ycs(text.toCharArray(), 0, text.length(), 0);
174

175
// Hash substrings
176
String longText = "this is a longer string for testing";
177
int substringHash = Lookup3Hash.lookup3ycs(longText, 5, 15, 12345);
178

179
// Get 64-bit hashes for better distribution
180
long hash64 = Lookup3Hash.lookup3ycs64(text, 0, text.length(), 0);
181
long simpleHash64 = Lookup3Hash.lookup3ycs64(text);
182

183
System.out.println("Lookup3 hash: " + hash);
184
System.out.println("64-bit hash: " + hash64);
185
```
186

187
## Usage Patterns
188

189
### Choosing Hash Functions for Probabilistic Data Structures
190

191
```java
192
// For Bloom filters - need multiple independent hash functions
193
public class MultiHashBloomFilter {
194
    private static final int[] SEEDS = {1, 3, 7, 11, 13, 17, 19, 23};
195
    
196
    private int[] getHashValues(String key, int numHashes) {
197
        int[] hashes = new int[numHashes];
198
        byte[] bytes = key.getBytes();
199
        
200
        for (int i = 0; i < numHashes; i++) {
201
            hashes[i] = Math.abs(MurmurHash.hash(bytes, SEEDS[i]));
202
        }
203
        
204
        return hashes;
205
    }
206
}
207
```
208

209
### High-Quality Hash Distribution
210

211
```java
212
// Use 64-bit hashes when avoiding collisions is critical
213
public class DistributedHashTable {
214
    private static final int BUCKET_COUNT = 1000000;
215
    
216
    public int getBucket(String key) {
217
        // Use 64-bit hash to minimize collisions
218
        long hash = MurmurHash.hash64(key);
219
        
220
        // Map to bucket using modulo (take absolute value first)
221
        return Math.abs((int)(hash % BUCKET_COUNT));
222
    }
223
}
224
```
225

226
### Consistent Hashing with Seeds
227

228
```java
229
// Create multiple hash functions with different seeds
230
public class ConsistentHashRing {
231
    private static final int VIRTUAL_NODES = 100;
232
    
233
    private void addNode(String nodeId) {
234
        for (int i = 0; i < VIRTUAL_NODES; i++) {
235
            // Use different seeds to create virtual nodes
236
            int hash = MurmurHash.hash(nodeId.getBytes(), i);
237
            ring.put(hash, nodeId);
238
        }
239
    }
240
}
241
```
242

243
### String vs Byte Array Hashing
244

245
```java
246
// Different ways to hash strings - choose based on performance needs
247
public class StringHashingComparison {
248
    
249
    public void demonstrateHashingMethods(String text) {
250
        // Method 1: Hash object directly (uses toString())
251
        int hash1 = MurmurHash.hash(text);
252
        
253
        // Method 2: Convert to bytes first (more control)
254
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
255
        int hash2 = MurmurHash.hash(bytes);
256
        
257
        // Method 3: Use Lookup3 for character sequences (avoids byte conversion)
258
        int hash3 = Lookup3Hash.lookup3ycs(text, 0, text.length(), 0);
259
        
260
        // Method 4: 64-bit for high quality
261
        long hash4 = MurmurHash.hash64(bytes, bytes.length);
262
        
263
        System.out.println("Object hash: " + hash1);
264
        System.out.println("Byte hash: " + hash2);
265
        System.out.println("Lookup3 hash: " + hash3);
266
        System.out.println("64-bit hash: " + hash4);
267
    }
268
}
269
```
270

271
### Performance-Critical Hashing
272

273
```java
274
// Optimize for performance in tight loops
275
public class PerformanceOptimizedHashing {
276
    
277
    // Pre-allocate byte arrays to avoid garbage collection
278
    private final byte[] reuseableBuffer = new byte[1024];
279
    
280
    public int fastHash(String input) {
281
        // For short strings, Lookup3 on character sequence is fastest
282
        if (input.length() < 50) {
283
            return Lookup3Hash.lookup3ycs(input, 0, input.length(), 0);
284
        }
285
        
286
        // For longer strings, convert to bytes once and reuse
287
        byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
288
        return MurmurHash.hash(bytes, bytes.length, 0);
289
    }
290
    
291
    public int hashLongValue(long value) {
292
        // Specialized method for long values
293
        return MurmurHash.hashLong(value);
294
    }
295
}
296
```
297

298
### Creating Hash-Based Partitioning
299

300
```java
301
// Partition data across multiple processors/storage locations
302
public class DataPartitioner {
303
    private final int partitionCount;
304
    
305
    public DataPartitioner(int partitionCount) {
306
        this.partitionCount = partitionCount;
307
    }
308
    
309
    public int getPartition(String key) {
310
        // Use high-quality hash to ensure even distribution
311
        int hash = MurmurHash.hash(key.getBytes());
312
        
313
        // Map to partition (ensure positive result)
314
        return Math.abs(hash) % partitionCount;
315
    }
316
    
317
    public int getPartition(long key) {
318
        // Specialized version for numeric keys
319
        int hash = MurmurHash.hashLong(key);
320
        return Math.abs(hash) % partitionCount;
321
    }
322
}
323
```
324

325
## Algorithm Selection Guidelines
326

327
### MurmurHash vs Lookup3Hash
328

329
**Use MurmurHash when**:
330
- General-purpose hashing is needed
331
- Working primarily with byte arrays
332
- Need both 32-bit and 64-bit variants
333
- Want proven performance and distribution quality
334
- Building hash tables or hash-based data structures
335

336
**Use Lookup3Hash when**:
337
- Working directly with character sequences
338
- Need compatibility with Bob Jenkins' reference implementation
339
- Want to avoid string-to-byte conversion overhead
340
- Working with integer arrays
341
- Need the specific distribution characteristics of lookup3
342

343
### 32-bit vs 64-bit Hashes
344

345
**Use 32-bit hashes when**:
346
- Memory usage is critical
347
- Hash table size is moderate (< 1M entries)
348
- Performance is more important than collision resistance
349
- Working with legacy systems expecting 32-bit hashes
350

351
**Use 64-bit hashes when**:
352
- Large datasets where collisions are expensive
353
- High-quality distribution is critical
354
- Working with distributed systems
355
- Cryptographic quality is needed (though these are non-cryptographic)
356

357
## Performance Characteristics
358

359
**MurmurHash**:
360
- Very fast on most architectures
361
- Good performance on both small and large inputs
362
- Excellent distribution properties
363
- Low collision rate
364

365
**Lookup3Hash**:
366
- Slightly slower than MurmurHash
367
- Better for character sequence processing
368
- Cross-platform consistent results
369
- Good for applications needing reproducible hashes
370

371
## Hash Quality Metrics
372

373
Both hash functions provide:
374
- **Avalanche effect**: Small input changes cause large output changes
375
- **Uniform distribution**: Hash values spread evenly across output space
376
- **Low collision rate**: Minimal hash collisions for typical input sets
377
- **Fast computation**: Optimized for speed on modern processors
378

379
These properties make them suitable for use in:
380
- Hash tables and maps
381
- Bloom filters and probabilistic data structures  
382
- Distributed systems requiring consistent hashing
383
- Load balancing and data partitioning
384
- Checksums and data integrity verification (non-cryptographic)

Version

Tile

Files

hash.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

hash.mddocs/