tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora.
Agent Success: 78% (agent success rate when using this tile)
Improvement: 1.03x (agent success rate improvement when using this tile compared to baseline)
Baseline: 76% (agent success rate without this tile)
Build a small utility that uses the target package's dataset and pre-trained model hub to list available word-embedding models, load one by name, and serve similarity/analogy queries using cached weights.
list_embeddings() returns at least five models sorted by name and includes a tuple for glove-wiki-gigaword-50 with vector size 50. @test
list_embeddings(min_vector_size=200) excludes glove-wiki-gigaword-50 and still returns glove-wiki-gigaword-300. @test
most_similar on glove-wiki-gigaword-50 with seeds ["king"] and topn=3 returns three entries ordered by similarity, with queen first. @test
solve_analogy on glove-wiki-gigaword-50 with positive=["king", "woman"], negative=["man"], and topn=1 returns queen as the best match with a positive score. @test
Calling with the same model_name and a custom cache_dir across consecutive calls reuses the downloaded weights without fetching again. @test

@generates
from pathlib import Path
from typing import Iterable, List, Tuple, Dict, Optional

def list_embeddings(min_vector_size: Optional[int] = None) -> List[Tuple[str, int]]:
    """Return available embedding names and their vector sizes, sorted by model name, optionally filtering out smaller vectors."""

def most_similar(model_name: str, seeds: Iterable[str], topn: int = 5, cache_dir: Optional[Path | str] = None) -> Dict[str, List[Tuple[str, float]]]:
    """Load the specified pre-trained embeddings from the hub (caching downloads) and return the top matches for each seed."""

def solve_analogy(model_name: str, positive: Iterable[str], negative: Iterable[str], topn: int = 1, cache_dir: Optional[Path | str] = None) -> List[Tuple[str, float]]:
    """Use the same cached embeddings to answer analogy queries; highest-scoring result first."""

Provides dataset/model hub downloads and pre-trained word vectors.