tessl/pypi-mmh3

Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.

mmh3

mmh3 is a Python extension that provides MurmurHash3, a family of fast and robust non-cryptographic hash functions. It offers 32-bit, 64-bit, and 128-bit hashes with signed or unsigned integer output. It also implements both the x64 and x86 variants of the 128-bit algorithm, each optimized for its platform.
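The following is a pure-Python sketch of MurmurHash3_x86_32, the 32-bit function that mmh3.hash computes. It makes the algorithm concrete by following the structure of Austin Appleby's public-domain reference code:

- mix each 4-byte little-endian block into the state;
- fold in the 0 to 3 leftover tail bytes;
- XOR in the length and apply a final "avalanche" mix.

A few assumptions apply. `murmur3_32` is a hypothetical helper, not part of mmh3, and it is far slower than the C extension. Treating `str` keys as UTF-8 bytes is an assumption about how mmh3 handles text.

```python
def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def _fmix32(h: int) -> int:
    """Final avalanche step: spreads every input bit across the output."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def murmur3_32(key, seed: int = 0, signed: bool = True) -> int:
    """Hypothetical pure-Python MurmurHash3_x86_32, for illustration only."""
    # Assumption: str keys are hashed via their UTF-8 encoding.
    data = key.encode("utf-8") if isinstance(key, str) else bytes(key)
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    nblocks = len(data) // 4

    # Body: mix each little-endian 4-byte block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF

    # Tail: the remaining 0-3 bytes, read little-endian.
    tail = data[4 * nblocks :]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & 0xFFFFFFFF
        k = _rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k

    # Finalization: mix in the total length, then avalanche.
    h = _fmix32(h ^ (len(data) & 0xFFFFFFFF))

    # Signed output is the two's-complement reading of the 32-bit value.
    if signed and h & 0x80000000:
        return h - (1 << 32)
    return h


print(murmur3_32("foo"))                # -156908512
print(murmur3_32("foo", signed=False))  # 4138058784
```

The printed values match the ones shown for mmh3.hash("foo") in Basic Usage.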

Package Information

  • Package Name: mmh3
  • Language: Python
  • Installation: pip install mmh3

Core Imports

import mmh3

All functions and classes are available directly from the mmh3 module.

Basic Usage

import mmh3

# Basic 32-bit hashing
hash_value = mmh3.hash("foo")  # -156908512
hash_with_seed = mmh3.hash("foo", seed=42)  # -1322301282
unsigned_hash = mmh3.hash("foo", signed=False)  # 4138058784

# 64-bit hashing (returns tuple of two 64-bit integers)
hash64_result = mmh3.hash64("foo")  # (-2129773440516405919, 9128664383759220103)

# 128-bit hashing
hash128_result = mmh3.hash128("foo", seed=42)  # 215966891540331383248189432718888555506

# Hash as bytes
hash_bytes = mmh3.hash_bytes("foo")  # b'aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~'

# Streaming hasher for large data
hasher = mmh3.mmh3_32(seed=42)
hasher.update(b"foo")
hasher.update(b"bar")
digest = hasher.digest()  # bytes
sint_digest = hasher.sintdigest()  # signed int
uint_digest = hasher.uintdigest()  # unsigned int

Architecture

mmh3 provides two complementary interfaces:

  • Simple Functions: Direct hash computation for immediate results
  • Hasher Classes: Streaming interface for incremental hashing of large datasets

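The 128-bit algorithm comes in two variants. It underlies hash64, hash128, hash_bytes, and the 128-bit hashers. For the functions, choose the variant with the x64arch parameter. For the hashers, choose it by using mmh3_x64_128 or mmh3_x86_128:

  • x64 variant: Optimized for 64-bit architectures (default)
  • x86 variant: Optimized for 32-bit architectures

The two variants produce different hash values for the same input, so choose one and use it consistently. The 32-bit functions (hash, hash_from_buffer, mmh3_32) have a single variant.

All hash functions accept a seed. The integer-returning functions also take a signed flag, while hash_bytes always returns raw bytes.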

Capabilities

Simple Hash Functions

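These are one-shot functions that hash a complete key. They return one of the following:

  • a 32-bit integer (hash, hash_from_buffer)
  • a pair of 64-bit integers (hash64)
  • a 128-bit integer (hash128)
  • 16 raw bytes (hash_bytes)

Note that hash128 defaults to unsigned output, whereas hash and hash64 default to signed.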

def hash(key: StrHashable, seed: int = 0, signed: bool = True) -> int: ...
def hash_from_buffer(key: StrHashable, seed: int = 0, signed: bool = True) -> int: ...
def hash64(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = True) -> tuple[int, int]: ...
def hash128(key: StrHashable, seed: int = 0, x64arch: bool = True, signed: bool = False) -> int: ...
def hash_bytes(key: StrHashable, seed: int = 0, x64arch: bool = True) -> bytes: ...

Streaming Hashers

These hashlib-compatible hasher classes (update, digest, copy) hash data incrementally. Use them when data arrives in chunks or is too large to hold in memory at once.

class Hasher:
    def __init__(self, seed: int = 0) -> None: ...
    def update(self, input: Hashable) -> None: ...
    def digest(self) -> bytes: ...
    def sintdigest(self) -> int: ...
    def uintdigest(self) -> int: ...
    def copy(self) -> Hasher: ...
    @property
    def digest_size(self) -> int: ...
    @property
    def block_size(self) -> int: ...
    @property
    def name(self) -> str: ...

class mmh3_32(Hasher): ...

class mmh3_x64_128(Hasher):
    def stupledigest(self) -> tuple[int, int]: ...
    def utupledigest(self) -> tuple[int, int]: ...

class mmh3_x86_128(Hasher):
    def stupledigest(self) -> tuple[int, int]: ...
    def utupledigest(self) -> tuple[int, int]: ...

Types

from typing import Protocol, Union

class IntArrayLike(Protocol):
    def __getitem__(self, index) -> int: ...

Hashable = Union[bytes, bytearray, memoryview, IntArrayLike]
StrHashable = Union[str, Hashable]

Common Use Cases

  • Data Mining & Machine Learning: Feature hashing and dimensionality reduction
  • Bloom Filters: Fast set membership testing with probabilistic data structures
  • MinHash Algorithms: Document similarity and near-duplicate detection
  • Natural Language Processing: Text fingerprinting and similarity matching
  • IoT Security Research: Shodan favicon hash calculations
  • Distributed Systems: Consistent hashing and data partitioning
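  • Caching: Fast cache key generation with few accidental collisions (MurmurHash3 is not cryptographic, so it does not resist deliberately crafted collisions)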
