tessl/pypi-mmh3

Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.

Simple Hash Functions

One-shot hash functions that take the whole input in a single call and return the result as an integer, a tuple of integers, or raw bytes. They are the right choice for one-time hashing where you don't need streaming capabilities.

Capabilities

32-bit Hash Function

Computes a 32-bit MurmurHash3 hash value from the input data.

def hash(key: StrHashable, seed: int = 0, signed: bool = True) -> int:
    """
    Compute 32-bit MurmurHash3 hash.
    
    Args:
        key: Input data to hash (str, bytes, bytearray, memoryview, or array-like)
        seed: Seed value for hash computation (default: 0)
        signed: Return signed integer if True, unsigned if False (default: True)
    
    Returns:
        32-bit hash value as signed or unsigned integer
    """

Example usage:

import mmh3

# Basic hashing
result = mmh3.hash("foo")  # -156908512

# With custom seed
result = mmh3.hash("foo", seed=42)  # -1322301282

# Unsigned output
result = mmh3.hash("foo", signed=False)  # 4138058784

# Hash bytes
result = mmh3.hash(b"hello")

# Hash objects that expose the buffer protocol, such as NumPy arrays
import numpy as np
arr = np.array([1, 2, 3, 4], dtype=np.int8)
result = mmh3.hash(arr)
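
The signed and unsigned results shown above are two readings of the same 32 bits. As a minimal sketch, the conversion can be done with plain integer arithmetic. The helper names `to_unsigned32` and `to_signed32` are hypothetical and not part of mmh3; the sample values are the ones from the `mmh3.hash("foo")` examples.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


# Values taken from the mmh3.hash("foo") examples in this section.
signed_result = -156908512
unsigned_result = 4138058784

print(to_unsigned32(signed_result) == unsigned_result)
print(to_signed32(unsigned_result) == signed_result)
```

This is handy when a hash stored as a signed value has to be compared with one produced with signed=False.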

Buffer-based 32-bit Hash

Computes a 32-bit hash without copying the input buffer, which keeps memory use low for large memory views and arrays.

def hash_from_buffer(key: StrHashable, seed: int = 0, signed: bool = True) -> int:
    """
    Compute 32-bit MurmurHash3 hash without memory copying.
    
    Args:
        key: Input data to hash (str, bytes, bytearray, memoryview, or array-like)
        seed: Seed value for hash computation (default: 0)
        signed: Return signed integer if True, unsigned if False (default: True)
    
    Returns:
        32-bit hash value as signed or unsigned integer
    """

Example usage:

import mmh3
import numpy as np

# Efficient hashing of large arrays
large_array = np.random.rand(1000000)
result = mmh3.hash_from_buffer(large_array)  # value varies: the array is random

# Memory-efficient hashing
memview = memoryview(b"large data chunk" * 10000)
result = mmh3.hash_from_buffer(memview, signed=False)  # unsigned 32-bit integer

64-bit Hash Function

Computes a hash with the 128-bit algorithm and returns it as a tuple of two 64-bit integers.

def hash64(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = True) -> tuple[int, int]:
    """
    Compute 64-bit MurmurHash3 hash using 128-bit algorithm.
    
    Args:
        key: Input data to hash (str, bytes, bytearray, memoryview, or array-like)
        seed: Seed value for hash computation (default: 0)
        x64arch: Use x64 optimization if True, x86 if False (default: True)
        signed: Return signed integers if True, unsigned if False (default: True)
    
    Returns:
        Tuple of two 64-bit hash values as signed or unsigned integers
    """

Example usage:

import mmh3

# Basic 64-bit hashing
result = mmh3.hash64("foo")  # (-2129773440516405919, 9128664383759220103)

# Unsigned 64-bit hash
result = mmh3.hash64("foo", signed=False)  # (16316970633193145697, 9128664383759220103)

# With the x86 variant of the 128-bit algorithm
result = mmh3.hash64("foo", x64arch=False)  # values differ from the x64 variant

# With custom seed and architecture
result = mmh3.hash64("foo", seed=42, x64arch=True)  # (-840311307571801102, -6739155424061121879)

128-bit Hash Function

Computes a 128-bit MurmurHash3 hash value returned as a single large integer.

def hash128(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = False) -> int:
    """
    Compute 128-bit MurmurHash3 hash.
    
    Args:
        key: Input data to hash (str, bytes, bytearray, memoryview, or array-like)
        seed: Seed value for hash computation (default: 0)
        x64arch: Use x64 optimization if True, x86 if False (default: True)
        signed: Return signed integer if True, unsigned if False (default: False)
    
    Returns:
        128-bit hash value as signed or unsigned integer
    """

Example usage:

import mmh3

# Basic 128-bit hashing (unsigned by default)
result = mmh3.hash128("foo")  # Large 128-bit unsigned integer

# With custom seed
result = mmh3.hash128("foo", seed=42)  # 215966891540331383248189432718888555506

# Signed 128-bit hash
result = mmh3.hash128("foo", seed=42, signed=True)  # -124315475380607080215185174712879655950

# x86 variant of the 128-bit algorithm
result = mmh3.hash128("foo", x64arch=False)  # value differs from the x64 variant
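
The signed and unsigned 128-bit results for seed=42 above are the same bit pattern read two ways. The sketch below shows the two's-complement conversion in plain Python. The helper names are hypothetical and not part of mmh3; the sample values are copied from the example above.

```python
BITS = 128
MASK = (1 << BITS) - 1


def to_unsigned128(value: int) -> int:
    """Reinterpret a signed 128-bit integer as unsigned."""
    return value & MASK


def to_signed128(value: int) -> int:
    """Reinterpret an unsigned 128-bit integer as signed (two's complement)."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value


# Values taken from the mmh3.hash128("foo", seed=42) examples in this section.
unsigned_result = 215966891540331383248189432718888555506
signed_result = -124315475380607080215185174712879655950

print(to_signed128(unsigned_result) == signed_result)
print(to_unsigned128(signed_result) == unsigned_result)
```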

Hash as Bytes

Computes a 128-bit hash and returns the result as 16 raw bytes.

def hash_bytes(key: StrHashable, seed: int = 0, x64arch: bool = True) -> bytes:
    """
    Compute 128-bit MurmurHash3 hash returned as bytes.
    
    Args:
        key: Input data to hash (str, bytes, bytearray, memoryview, or array-like)
        seed: Seed value for hash computation (default: 0)
        x64arch: Use x64 optimization if True, x86 if False (default: True)
    
    Returns:
        128-bit hash value as 16-byte bytes object
    """

Example usage:

import mmh3

# Hash as bytes
result = mmh3.hash_bytes("foo")  # b'aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~'

# With custom seed
result = mmh3.hash_bytes("foo", seed=42)  # 16 bytes

# Convert to hex string if needed
hex_result = mmh3.hash_bytes("foo").hex()  # '6145f501578671e2877dba2be487af7e'

# Hash large numpy arrays efficiently
import numpy as np
large_array = np.zeros(2**20, dtype=np.int8)  # 1MB array
result = mmh3.hash_bytes(large_array)  # b'V\x8f}\xad\x8eNM\xa84\x07FU\x9c\xc4\xcc\x8e'
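
Judging by the example values in this document, the 16 bytes returned by hash_bytes("foo") contain the same two 64-bit halves that hash64("foo") returns, stored in little-endian order. The sketch below checks this with the standard library only, using the digest copied from the example above. The final step assumes that hash128 reads the same bytes as one little-endian integer; this document gives no seed-0 hash128 value to confirm that, so treat it as an assumption.

```python
import struct

# Digest copied from the mmh3.hash_bytes("foo") example in this section.
digest = bytes.fromhex("6145f501578671e2877dba2be487af7e")

# Two unsigned 64-bit halves, little-endian (compare hash64 with signed=False).
first_u, second_u = struct.unpack("<QQ", digest)

# The same halves read as signed integers (compare hash64 with signed=True).
first_s, second_s = struct.unpack("<qq", digest)

# Assumption: a single 128-bit integer is the little-endian reading of all 16 bytes.
as_int = int.from_bytes(digest, "little")

print((first_u, second_u))
print((first_s, second_s))
print(as_int == (second_u << 64) | first_u)
```

Storing the raw bytes is usually the most compact option, and the integers can be recovered later with the unpacking shown here.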

Architecture Optimization

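The x64arch parameter in hash64, hash128, and hash_bytes selects which variant of the 128-bit algorithm is used:

  • x64arch=True (default): the variant optimized for 64-bit architectures
  • x64arch=False: the variant optimized for 32-bit architectures

The two variants return different hash values for the same input, so the setting affects results as well as speed. Choose the variant that suits your target platform, and use the same setting everywhere hashes are stored or compared.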

Input Type Support

All functions accept these input types:

  • str: Unicode strings (automatically encoded to UTF-8)
  • bytes: Raw byte data
  • bytearray: Mutable byte arrays
  • memoryview: Memory views for zero-copy operations
  • Array-like objects: objects that expose the buffer protocol, such as NumPy arrays
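
When an input is not already a str or a buffer-like object (for example, a plain sequence of small integers), one option is to convert it explicitly before hashing. The helper below is a hypothetical sketch, not part of mmh3. It makes the UTF-8 encoding of strings explicit and turns lists or tuples of integers in the range 0 to 255 into bytes.

```python
def as_bytes_like(key):
    """Return `key` in a form that exposes its raw bytes.

    Hypothetical helper for illustration; not part of mmh3.
    """
    if isinstance(key, str):
        # Strings are hashed as their UTF-8 encoding.
        return key.encode("utf-8")
    if isinstance(key, (list, tuple)):
        # Raises ValueError if an item is outside range(256),
        # TypeError if an item is not an integer.
        return bytes(key)
    # Raises TypeError if the object does not support the buffer protocol.
    return memoryview(key)


print(as_bytes_like("foo"))                     # b'foo'
print(as_bytes_like("é"))                       # b'\xc3\xa9' (two bytes in UTF-8)
print(as_bytes_like([1, 2, 3, 4]))              # b'\x01\x02\x03\x04'
print(bytes(as_bytes_like(bytearray(b"ab"))))   # b'ab'
```

Note that a non-ASCII character occupies more than one byte after encoding, so the number of bytes hashed can exceed the string length.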

Seed Values

  • Seeds are interpreted as unsigned 32-bit integers (0 to 2^32 - 1)
  • Negative seeds are converted to their unsigned 32-bit representation, so -1 is treated as 4294967295
  • Seeds above 2^32 - 1 may produce unexpected results; keep seeds within the 32-bit range
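
To avoid depending on how out-of-range seeds are treated, a seed can be validated or wrapped before it is passed to a hash function. The helpers below are a hypothetical sketch, not part of mmh3. The wrapping mirrors the unsigned 32-bit conversion described above; whether mmh3 itself wraps or rejects such seeds may depend on the installed version.

```python
SEED_MAX = 0xFFFFFFFF  # largest unsigned 32-bit value, 2**32 - 1


def wrap_seed(seed: int) -> int:
    """Wrap any integer into the unsigned 32-bit range."""
    return seed & SEED_MAX


def check_seed(seed: int) -> int:
    """Return the seed unchanged, or raise ValueError if it is out of range."""
    if not 0 <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in [0, {SEED_MAX}], got {seed}")
    return seed


print(wrap_seed(-1))          # 4294967295
print(wrap_seed(2**32 + 42))  # 42
print(check_seed(42))         # 42
```

Wrapping keeps old call sites working, while validation surfaces mistakes early; which one fits is a design choice for the calling code.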

Error Handling

Functions raise TypeError for invalid input types and handle memory allocation failures gracefully. All functions are thread-safe and can be used in concurrent environments.
