tessl install tessl/pypi-gensim@4.3.0
Python library for topic modelling, document indexing and similarity retrieval with large corpora
Agent Success
Agent success rate when using this tile
78%
Improvement
Agent success rate improvement when using this tile compared to baseline
1.03x
Baseline
Agent success rate without this tile
76%
Build a small module that trains a lightweight word embedding model, persists it, reloads it in read-only form, and performs incremental updates without losing previously learned vectors.
@generates
from typing import Iterable, List, Any, Sequence

def train_checkpoint(sentences: Iterable[Sequence[str]], checkpoint_path: str, vector_size: int = 50, window: int = 5) -> str:
    """Train from scratch on tokenized sentences, persist a checkpoint, and return the path used."""

def load_for_inference(checkpoint_path: str, mmap: bool = True) -> Any:
    """Load a saved checkpoint for read-only inference; supports memory mapping when mmap is True."""

def update_with_sentences(model: Any, new_sentences: Iterable[Sequence[str]], checkpoint_path: str) -> str:
    """Incrementally update an existing model with additional sentences, persist a new checkpoint, and return its path."""

def similarity(model: Any, word_a: str, word_b: str) -> float:
    """Return cosine similarity for two tokens from the current model."""

def lifecycle_log(model: Any) -> List[str]:
    """Return lifecycle entries (most recent first) describing train/load/update steps with timestamps."""

Provides persistence, incremental training, and lifecycle logging utilities for word embedding models.