tessl/pypi-gensim

tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora

Agent Success

Agent success rate when using this tile

78%

Improvement

Agent success rate improvement when using this tile compared to baseline

1.03x

Baseline

Agent success rate without this tile

76%

evals/scenario-9/task.md

Incremental Embedding Lifecycle

Build a small module that trains a lightweight word embedding model, persists it, reloads it in read-only form, and performs incremental updates without losing previously learned vectors.

Capabilities

Create and persist initial model

  • Training on a small list of tokenized sentences writes a checkpoint file to the requested path and returns the on-disk path actually used. @test
  • Reloading the saved checkpoint exposes cosine similarity for words that appeared in the initial corpus without retraining. @test

Memory-mapped inference

  • Loading the checkpoint in memory-mapped/read-only mode answers similarity queries while keeping the checkpoint file unmodified. @test

Incremental updates

  • Supplying new sentences that introduce unseen tokens updates the model and saves a new checkpoint; vectors for new tokens become available while previously learned tokens remain queryable. @test

Lifecycle logging

  • Each training, loading, and update operation appends a timestamped lifecycle entry retrievable via the API. @test

Implementation

@generates

API

from typing import Iterable, List, Any, Sequence

def train_checkpoint(sentences: Iterable[Sequence[str]], checkpoint_path: str, vector_size: int = 50, window: int = 5) -> str:
    """Train from scratch on tokenized sentences, persist a checkpoint, and return the path used."""

def load_for_inference(checkpoint_path: str, mmap: bool = True) -> Any:
    """Load a saved checkpoint for read-only inference; supports memory mapping when mmap is True."""

def update_with_sentences(model: Any, new_sentences: Iterable[Sequence[str]], checkpoint_path: str) -> str:
    """Incrementally update an existing model with additional sentences, persist a new checkpoint, and return its path."""

def similarity(model: Any, word_a: str, word_b: str) -> float:
    """Return cosine similarity for two tokens from the current model."""

def lifecycle_log(model: Any) -> List[str]:
    """Return lifecycle entries (most recent first) describing train/load/update steps with timestamps."""

Dependencies { .dependencies }

gensim { .dependency }

Provides persistence, incremental training, and lifecycle logging utilities for word embedding models.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/gensim@4.3.x