or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

cardinality.mdfrequency.mdhash.mdindex.mdmembership.mdquantile.mdstream-summary.md

hash.mddocs/

0

# Hash Functions

1

2

Fast, well-distributed hash functions optimized for use in probabilistic data structures and general hash-based lookup. These implementations provide high-quality hash values with good distribution properties and performance characteristics.

3

4

## Capabilities

5

6

### MurmurHash

7

8

Fast, non-cryptographic hash function suitable for general hash-based lookup with excellent distribution properties and performance.

9

10

```java { .api }

11

/**

12

* MurmurHash implementation - fast, well-distributed hash function

13

*/

14

public class MurmurHash {

15

/**

16

* Hash any object using its toString() method

17

* @param o object to hash

18

* @return 32-bit hash value

19

*/

20

public static int hash(Object o);

21

22

/**

23

* Hash byte array

24

* @param data byte array to hash

25

* @return 32-bit hash value

26

*/

27

public static int hash(byte[] data);

28

29

/**

30

* Hash byte array with seed

31

* @param data byte array to hash

32

* @param seed seed value for hash function

33

* @return 32-bit hash value

34

*/

35

public static int hash(byte[] data, int seed);

36

37

/**

38

* Hash byte array with length and seed

39

* @param data byte array to hash

40

* @param length number of bytes to hash

41

* @param seed seed value for hash function

42

* @return 32-bit hash value

43

*/

44

public static int hash(byte[] data, int length, int seed);

45

46

/**

47

* Hash long value directly

48

* @param data long value to hash

49

* @return 32-bit hash value

50

*/

51

public static int hashLong(long data);

52

53

/**

54

* 64-bit hash of object using its toString() method

55

* @param o object to hash

56

* @return 64-bit hash value

57

*/

58

public static long hash64(Object o);

59

60

/**

61

* 64-bit hash of byte array

62

* @param data byte array to hash

63

* @param length number of bytes to hash

64

* @return 64-bit hash value

65

*/

66

public static long hash64(final byte[] data, int length);

67

68

/**

69

* 64-bit hash of byte array with seed

70

* @param data byte array to hash

71

* @param length number of bytes to hash

72

* @param seed seed value for hash function

73

* @return 64-bit hash value

74

*/

75

public static long hash64(final byte[] data, int length, int seed);

76

}

77

```

78

79

**Usage Examples:**

80

81

```java

82

import com.clearspring.analytics.hash.MurmurHash;

83

84

// Hash strings and objects

85

String text = "hello world";

86

int hash1 = MurmurHash.hash(text);

87

int hash2 = MurmurHash.hash(text.getBytes());

88

89

// Hash with custom seed for different hash functions

90

int hash3 = MurmurHash.hash(text.getBytes(), 12345);

91

92

// Hash long values directly

93

long value = 1234567890L;

94

int longHash = MurmurHash.hashLong(value);

95

96

// Get 64-bit hashes for more bits and less collisions

97

long hash64 = MurmurHash.hash64(text);

98

long hash64WithSeed = MurmurHash.hash64(text.getBytes(), text.length(), 54321);

99

100

System.out.println("32-bit hash: " + hash1);

101

System.out.println("64-bit hash: " + hash64);

102

```

103

104

### Lookup3Hash

105

106

Fast, well-distributed, cross-platform hash functions based on Bob Jenkins' lookup3 algorithm with variations optimized for different use cases.

107

108

```java { .api }

109

/**

110

* Lookup3Hash implementation based on Bob Jenkins' lookup3 algorithm

111

*/

112

public class Lookup3Hash {

113

/**

114

* Java implementation of hashword from lookup3.c

115

* @param k array of integers to hash

116

* @param offset starting offset in array

117

* @param length number of integers to hash

118

* @param initval initial value (seed)

119

* @return 32-bit hash value

120

*/

121

public static int lookup3(int[] k, int offset, int length, int initval);

122

123

/**

124

* Modified lookup3 with biased initial value for ycs (Yahoo Cloud Serving)

125

* @param k array of integers to hash

126

* @param offset starting offset in array

127

* @param length number of integers to hash

128

* @param initval initial value (seed)

129

* @return 32-bit hash value

130

*/

131

public static int lookup3ycs(int[] k, int offset, int length, int initval);

132

133

/**

134

* Hash character sequence using lookup3ycs algorithm

135

* @param s character sequence to hash

136

* @param start starting character index

137

* @param end ending character index (exclusive)

138

* @param initval initial value (seed)

139

* @return 32-bit hash value

140

*/

141

public static int lookup3ycs(CharSequence s, int start, int end, int initval);

142

143

/**

144

* 64-bit version of lookup3ycs for character sequences

145

* @param s character sequence to hash

146

* @param start starting character index

147

* @param end ending character index (exclusive)

148

* @param initval initial value (seed)

149

* @return 64-bit hash value

150

*/

151

public static long lookup3ycs64(CharSequence s, int start, int end, long initval);

152

153

/**

154

* Convenience method: 64-bit hash of entire character sequence

155

* @param s character sequence to hash

156

* @return 64-bit hash value using default parameters

157

*/

158

public static long lookup3ycs64(CharSequence s);

159

}

160

```

161

162

**Usage Examples:**

163

164

```java

165

import com.clearspring.analytics.hash.Lookup3Hash;

166

167

// Hash integer arrays

168

int[] data = {1, 2, 3, 4, 5};

169

int hash = Lookup3Hash.lookup3(data, 0, data.length, 0);

170

171

// Hash strings with different variants

172

String text = "sample text";

173

int hashYcs = Lookup3Hash.lookup3ycs(text.toCharArray(), 0, text.length(), 0);

174

175

// Hash substrings

176

String longText = "this is a longer string for testing";

177

int substringHash = Lookup3Hash.lookup3ycs(longText, 5, 15, 12345);

178

179

// Get 64-bit hashes for better distribution

180

long hash64 = Lookup3Hash.lookup3ycs64(text, 0, text.length(), 0);

181

long simpleHash64 = Lookup3Hash.lookup3ycs64(text);

182

183

System.out.println("Lookup3 hash: " + hash);

184

System.out.println("64-bit hash: " + hash64);

185

```

186

187

## Usage Patterns

188

189

### Choosing Hash Functions for Probabilistic Data Structures

190

191

```java

192

// For Bloom filters - need multiple independent hash functions

193

public class MultiHashBloomFilter {

194

private static final int[] SEEDS = {1, 3, 7, 11, 13, 17, 19, 23};

195

196

private int[] getHashValues(String key, int numHashes) {

197

int[] hashes = new int[numHashes];

198

byte[] bytes = key.getBytes();

199

200

for (int i = 0; i < numHashes; i++) {

201

hashes[i] = Math.abs(MurmurHash.hash(bytes, SEEDS[i]));

202

}

203

204

return hashes;

205

}

206

}

207

```

208

209

### High-Quality Hash Distribution

210

211

```java

212

// Use 64-bit hashes when avoiding collisions is critical

213

public class DistributedHashTable {

214

private static final int BUCKET_COUNT = 1000000;

215

216

public int getBucket(String key) {

217

// Use 64-bit hash to minimize collisions

218

long hash = MurmurHash.hash64(key);

219

220

// Map to bucket using modulo (take absolute value first)

221

return Math.abs((int)(hash % BUCKET_COUNT));

222

}

223

}

224

```

225

226

### Consistent Hashing with Seeds

227

228

```java

229

// Create multiple hash functions with different seeds

230

public class ConsistentHashRing {

231

private static final int VIRTUAL_NODES = 100;

232

233

private void addNode(String nodeId) {

234

for (int i = 0; i < VIRTUAL_NODES; i++) {

235

// Use different seeds to create virtual nodes

236

int hash = MurmurHash.hash(nodeId.getBytes(), i);

237

ring.put(hash, nodeId);

238

}

239

}

240

}

241

```

242

243

### String vs Byte Array Hashing

244

245

```java

246

// Different ways to hash strings - choose based on performance needs

247

public class StringHashingComparison {

248

249

public void demonstrateHashingMethods(String text) {

250

// Method 1: Hash object directly (uses toString())

251

int hash1 = MurmurHash.hash(text);

252

253

// Method 2: Convert to bytes first (more control)

254

byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

255

int hash2 = MurmurHash.hash(bytes);

256

257

// Method 3: Use Lookup3 for character sequences (avoids byte conversion)

258

int hash3 = Lookup3Hash.lookup3ycs(text, 0, text.length(), 0);

259

260

// Method 4: 64-bit for high quality

261

long hash4 = MurmurHash.hash64(bytes, bytes.length);

262

263

System.out.println("Object hash: " + hash1);

264

System.out.println("Byte hash: " + hash2);

265

System.out.println("Lookup3 hash: " + hash3);

266

System.out.println("64-bit hash: " + hash4);

267

}

268

}

269

```

270

271

### Performance-Critical Hashing

272

273

```java

274

// Optimize for performance in tight loops

275

public class PerformanceOptimizedHashing {

276

277

// Pre-allocate byte arrays to avoid garbage collection

278

private final byte[] reuseableBuffer = new byte[1024];

279

280

public int fastHash(String input) {

281

// For short strings, Lookup3 on character sequence is fastest

282

if (input.length() < 50) {

283

return Lookup3Hash.lookup3ycs(input, 0, input.length(), 0);

284

}

285

286

// For longer strings, convert to bytes once and reuse

287

byte[] bytes = input.getBytes(StandardCharsets.UTF_8);

288

return MurmurHash.hash(bytes, bytes.length, 0);

289

}

290

291

public int hashLongValue(long value) {

292

// Specialized method for long values

293

return MurmurHash.hashLong(value);

294

}

295

}

296

```

297

298

### Creating Hash-Based Partitioning

299

300

```java

301

// Partition data across multiple processors/storage locations

302

public class DataPartitioner {

303

private final int partitionCount;

304

305

public DataPartitioner(int partitionCount) {

306

this.partitionCount = partitionCount;

307

}

308

309

public int getPartition(String key) {

310

// Use high-quality hash to ensure even distribution

311

int hash = MurmurHash.hash(key.getBytes());

312

313

// Map to partition (ensure positive result)

314

return Math.abs(hash) % partitionCount;

315

}

316

317

public int getPartition(long key) {

318

// Specialized version for numeric keys

319

int hash = MurmurHash.hashLong(key);

320

return Math.abs(hash) % partitionCount;

321

}

322

}

323

```

324

325

## Algorithm Selection Guidelines

326

327

### MurmurHash vs Lookup3Hash

328

329

**Use MurmurHash when**:

330

- General-purpose hashing is needed

331

- Working primarily with byte arrays

332

- Need both 32-bit and 64-bit variants

333

- Want proven performance and distribution quality

334

- Building hash tables or hash-based data structures

335

336

**Use Lookup3Hash when**:

337

- Working directly with character sequences

338

- Need compatibility with Bob Jenkins' reference implementation

339

- Want to avoid string-to-byte conversion overhead

340

- Working with integer arrays

341

- Need the specific distribution characteristics of lookup3

342

343

### 32-bit vs 64-bit Hashes

344

345

**Use 32-bit hashes when**:

346

- Memory usage is critical

347

- Hash table size is moderate (< 1M entries)

348

- Performance is more important than collision resistance

349

- Working with legacy systems expecting 32-bit hashes

350

351

**Use 64-bit hashes when**:

352

- Large datasets where collisions are expensive

353

- High-quality distribution is critical

354

- Working with distributed systems

355

- Cryptographic quality is needed (though these are non-cryptographic)

356

357

## Performance Characteristics

358

359

**MurmurHash**:

360

- Very fast on most architectures

361

- Good performance on both small and large inputs

362

- Excellent distribution properties

363

- Low collision rate

364

365

**Lookup3Hash**:

366

- Slightly slower than MurmurHash

367

- Better for character sequence processing

368

- Cross-platform consistent results

369

- Good for applications needing reproducible hashes

370

371

## Hash Quality Metrics

372

373

Both hash functions provide:

374

- **Avalanche effect**: Small input changes cause large output changes

375

- **Uniform distribution**: Hash values spread evenly across output space

376

- **Low collision rate**: Minimal hash collisions for typical input sets

377

- **Fast computation**: Optimized for speed on modern processors

378

379

These properties make them suitable for use in:

380

- Hash tables and maps

381

- Bloom filters and probabilistic data structures

382

- Distributed systems requiring consistent hashing

383

- Load balancing and data partitioning

384

- Checksums and data integrity verification (non-cryptographic)