0
# Hash Functions
1
2
Fast, well-distributed hash functions optimized for use in probabilistic data structures and general hash-based lookup. These implementations provide high-quality hash values with good distribution properties and performance characteristics.
3
4
## Capabilities
5
6
### MurmurHash
7
8
Fast, non-cryptographic hash function suitable for general hash-based lookup with excellent distribution properties and performance.
9
10
```java { .api }
11
/**
12
* MurmurHash implementation - fast, well-distributed hash function
13
*/
14
public class MurmurHash {
15
/**
16
* Hash any object using its toString() method
17
* @param o object to hash
18
* @return 32-bit hash value
19
*/
20
public static int hash(Object o);
21
22
/**
23
* Hash byte array
24
* @param data byte array to hash
25
* @return 32-bit hash value
26
*/
27
public static int hash(byte[] data);
28
29
/**
30
* Hash byte array with seed
31
* @param data byte array to hash
32
* @param seed seed value for hash function
33
* @return 32-bit hash value
34
*/
35
public static int hash(byte[] data, int seed);
36
37
/**
38
* Hash byte array with length and seed
39
* @param data byte array to hash
40
* @param length number of bytes to hash
41
* @param seed seed value for hash function
42
* @return 32-bit hash value
43
*/
44
public static int hash(byte[] data, int length, int seed);
45
46
/**
47
* Hash long value directly
48
* @param data long value to hash
49
* @return 32-bit hash value
50
*/
51
public static int hashLong(long data);
52
53
/**
54
* 64-bit hash of object using its toString() method
55
* @param o object to hash
56
* @return 64-bit hash value
57
*/
58
public static long hash64(Object o);
59
60
/**
61
* 64-bit hash of byte array
62
* @param data byte array to hash
63
* @param length number of bytes to hash
64
* @return 64-bit hash value
65
*/
66
public static long hash64(final byte[] data, int length);
67
68
/**
69
* 64-bit hash of byte array with seed
70
* @param data byte array to hash
71
* @param length number of bytes to hash
72
* @param seed seed value for hash function
73
* @return 64-bit hash value
74
*/
75
public static long hash64(final byte[] data, int length, int seed);
76
}
77
```
78
79
**Usage Examples:**
80
81
```java
82
import com.clearspring.analytics.hash.MurmurHash;
83
84
// Hash strings and objects
85
String text = "hello world";
86
int hash1 = MurmurHash.hash(text);
87
int hash2 = MurmurHash.hash(text.getBytes());
88
89
// Hash with custom seed for different hash functions
90
int hash3 = MurmurHash.hash(text.getBytes(), 12345);
91
92
// Hash long values directly
93
long value = 1234567890L;
94
int longHash = MurmurHash.hashLong(value);
95
96
// Get 64-bit hashes for more bits and less collisions
97
long hash64 = MurmurHash.hash64(text);
98
long hash64WithSeed = MurmurHash.hash64(text.getBytes(), text.length(), 54321);
99
100
System.out.println("32-bit hash: " + hash1);
101
System.out.println("64-bit hash: " + hash64);
102
```
103
104
### Lookup3Hash
105
106
Fast, well-distributed, cross-platform hash functions based on Bob Jenkins' lookup3 algorithm with variations optimized for different use cases.
107
108
```java { .api }
109
/**
110
* Lookup3Hash implementation based on Bob Jenkins' lookup3 algorithm
111
*/
112
public class Lookup3Hash {
113
/**
114
* Java implementation of hashword from lookup3.c
115
* @param k array of integers to hash
116
* @param offset starting offset in array
117
* @param length number of integers to hash
118
* @param initval initial value (seed)
119
* @return 32-bit hash value
120
*/
121
public static int lookup3(int[] k, int offset, int length, int initval);
122
123
/**
124
* Modified lookup3 with biased initial value for ycs (Yahoo Cloud Serving)
125
* @param k array of integers to hash
126
* @param offset starting offset in array
127
* @param length number of integers to hash
128
* @param initval initial value (seed)
129
* @return 32-bit hash value
130
*/
131
public static int lookup3ycs(int[] k, int offset, int length, int initval);
132
133
/**
134
* Hash character sequence using lookup3ycs algorithm
135
* @param s character sequence to hash
136
* @param start starting character index
137
* @param end ending character index (exclusive)
138
* @param initval initial value (seed)
139
* @return 32-bit hash value
140
*/
141
public static int lookup3ycs(CharSequence s, int start, int end, int initval);
142
143
/**
144
* 64-bit version of lookup3ycs for character sequences
145
* @param s character sequence to hash
146
* @param start starting character index
147
* @param end ending character index (exclusive)
148
* @param initval initial value (seed)
149
* @return 64-bit hash value
150
*/
151
public static long lookup3ycs64(CharSequence s, int start, int end, long initval);
152
153
/**
154
* Convenience method: 64-bit hash of entire character sequence
155
* @param s character sequence to hash
156
* @return 64-bit hash value using default parameters
157
*/
158
public static long lookup3ycs64(CharSequence s);
159
}
160
```
161
162
**Usage Examples:**
163
164
```java
165
import com.clearspring.analytics.hash.Lookup3Hash;
166
167
// Hash integer arrays
168
int[] data = {1, 2, 3, 4, 5};
169
int hash = Lookup3Hash.lookup3(data, 0, data.length, 0);
170
171
// Hash strings with different variants
172
String text = "sample text";
173
int hashYcs = Lookup3Hash.lookup3ycs(text.toCharArray(), 0, text.length(), 0);
174
175
// Hash substrings
176
String longText = "this is a longer string for testing";
177
int substringHash = Lookup3Hash.lookup3ycs(longText, 5, 15, 12345);
178
179
// Get 64-bit hashes for better distribution
180
long hash64 = Lookup3Hash.lookup3ycs64(text, 0, text.length(), 0);
181
long simpleHash64 = Lookup3Hash.lookup3ycs64(text);
182
183
System.out.println("Lookup3 hash: " + hash);
184
System.out.println("64-bit hash: " + hash64);
185
```
186
187
## Usage Patterns
188
189
### Choosing Hash Functions for Probabilistic Data Structures
190
191
```java
192
// For Bloom filters - need multiple independent hash functions
193
public class MultiHashBloomFilter {
194
private static final int[] SEEDS = {1, 3, 7, 11, 13, 17, 19, 23};
195
196
private int[] getHashValues(String key, int numHashes) {
197
int[] hashes = new int[numHashes];
198
byte[] bytes = key.getBytes();
199
200
for (int i = 0; i < numHashes; i++) {
201
hashes[i] = Math.abs(MurmurHash.hash(bytes, SEEDS[i]));
202
}
203
204
return hashes;
205
}
206
}
207
```
208
209
### High-Quality Hash Distribution
210
211
```java
212
// Use 64-bit hashes when avoiding collisions is critical
213
public class DistributedHashTable {
214
private static final int BUCKET_COUNT = 1000000;
215
216
public int getBucket(String key) {
217
// Use 64-bit hash to minimize collisions
218
long hash = MurmurHash.hash64(key);
219
220
// Map to bucket using modulo (take absolute value first)
221
return Math.abs((int)(hash % BUCKET_COUNT));
222
}
223
}
224
```
225
226
### Consistent Hashing with Seeds
227
228
```java
229
// Create multiple hash functions with different seeds
230
public class ConsistentHashRing {
231
private static final int VIRTUAL_NODES = 100;
232
233
private void addNode(String nodeId) {
234
for (int i = 0; i < VIRTUAL_NODES; i++) {
235
// Use different seeds to create virtual nodes
236
int hash = MurmurHash.hash(nodeId.getBytes(), i);
237
ring.put(hash, nodeId);
238
}
239
}
240
}
241
```
242
243
### String vs Byte Array Hashing
244
245
```java
246
// Different ways to hash strings - choose based on performance needs
247
public class StringHashingComparison {
248
249
public void demonstrateHashingMethods(String text) {
250
// Method 1: Hash object directly (uses toString())
251
int hash1 = MurmurHash.hash(text);
252
253
// Method 2: Convert to bytes first (more control)
254
byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
255
int hash2 = MurmurHash.hash(bytes);
256
257
// Method 3: Use Lookup3 for character sequences (avoids byte conversion)
258
int hash3 = Lookup3Hash.lookup3ycs(text, 0, text.length(), 0);
259
260
// Method 4: 64-bit for high quality
261
long hash4 = MurmurHash.hash64(bytes, bytes.length);
262
263
System.out.println("Object hash: " + hash1);
264
System.out.println("Byte hash: " + hash2);
265
System.out.println("Lookup3 hash: " + hash3);
266
System.out.println("64-bit hash: " + hash4);
267
}
268
}
269
```
270
271
### Performance-Critical Hashing
272
273
```java
274
// Optimize for performance in tight loops
275
public class PerformanceOptimizedHashing {
276
277
// Pre-allocate byte arrays to avoid garbage collection
278
private final byte[] reuseableBuffer = new byte[1024];
279
280
public int fastHash(String input) {
281
// For short strings, Lookup3 on character sequence is fastest
282
if (input.length() < 50) {
283
return Lookup3Hash.lookup3ycs(input, 0, input.length(), 0);
284
}
285
286
// For longer strings, convert to bytes once and reuse
287
byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
288
return MurmurHash.hash(bytes, bytes.length, 0);
289
}
290
291
public int hashLongValue(long value) {
292
// Specialized method for long values
293
return MurmurHash.hashLong(value);
294
}
295
}
296
```
297
298
### Creating Hash-Based Partitioning
299
300
```java
301
// Partition data across multiple processors/storage locations
302
public class DataPartitioner {
303
private final int partitionCount;
304
305
public DataPartitioner(int partitionCount) {
306
this.partitionCount = partitionCount;
307
}
308
309
public int getPartition(String key) {
310
// Use high-quality hash to ensure even distribution
311
int hash = MurmurHash.hash(key.getBytes());
312
313
// Map to partition (ensure positive result)
314
return Math.abs(hash) % partitionCount;
315
}
316
317
public int getPartition(long key) {
318
// Specialized version for numeric keys
319
int hash = MurmurHash.hashLong(key);
320
return Math.abs(hash) % partitionCount;
321
}
322
}
323
```
324
325
## Algorithm Selection Guidelines
326
327
### MurmurHash vs Lookup3Hash
328
329
**Use MurmurHash when**:
330
- General-purpose hashing is needed
331
- Working primarily with byte arrays
332
- Need both 32-bit and 64-bit variants
333
- Want proven performance and distribution quality
334
- Building hash tables or hash-based data structures
335
336
**Use Lookup3Hash when**:
337
- Working directly with character sequences
338
- Need compatibility with Bob Jenkins' reference implementation
339
- Want to avoid string-to-byte conversion overhead
340
- Working with integer arrays
341
- Need the specific distribution characteristics of lookup3
342
343
### 32-bit vs 64-bit Hashes
344
345
**Use 32-bit hashes when**:
346
- Memory usage is critical
347
- Hash table size is moderate (< 1M entries)
348
- Performance is more important than collision resistance
349
- Working with legacy systems expecting 32-bit hashes
350
351
**Use 64-bit hashes when**:
352
- Large datasets where collisions are expensive
353
- High-quality distribution is critical
354
- Working with distributed systems
355
- Cryptographic quality is needed (though these are non-cryptographic)
356
357
## Performance Characteristics
358
359
**MurmurHash**:
360
- Very fast on most architectures
361
- Good performance on both small and large inputs
362
- Excellent distribution properties
363
- Low collision rate
364
365
**Lookup3Hash**:
366
- Slightly slower than MurmurHash
367
- Better for character sequence processing
368
- Cross-platform consistent results
369
- Good for applications needing reproducible hashes
370
371
## Hash Quality Metrics
372
373
Both hash functions provide:
374
- **Avalanche effect**: Small input changes cause large output changes
375
- **Uniform distribution**: Hash values spread evenly across output space
376
- **Low collision rate**: Minimal hash collisions for typical input sets
377
- **Fast computation**: Optimized for speed on modern processors
378
379
These properties make them suitable for use in:
380
- Hash tables and maps
381
- Bloom filters and probabilistic data structures
382
- Distributed systems requiring consistent hashing
383
- Load balancing and data partitioning
384
- Checksums and data integrity verification (non-cryptographic)