tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora
- Agent Success (agent success rate when using this tile): 78%
- Improvement (agent success rate improvement when using this tile compared to baseline): 1.03x
- Baseline (agent success rate without this tile): 76%
Builds word embeddings from short sentences, exposes similarity queries, and derives sentence-level vectors for downstream retrieval.
- train builds embeddings where every token meeting min_count is in the vocabulary, each vector length equals vector_size, and repeated runs with the same seed and corpus keep similarity orderings stable. @test
- After training on word pairs (e.g., [['king', 'queen'], ['king', 'prince'], ['queen', 'princess'], ['river', 'flow']]), most_similar("king", topn=3) returns queen with a positive similarity score and ranks it ahead of unrelated tokens such as river. @test
- infer_sentence_vector on ["spicy", "taco"] yields a dense list of floating-point numbers whose length matches vector_size, and no entry is NaN or infinity. @test
- When comparing related sentences (e.g., ["king", "and", "queen", "rule"] vs ["prince", "and", "princess", "rule"]) against an unrelated nature sentence (e.g., ["river", "rocks", "flow"]), sentence_similarity reports the related pair at least 0.2 higher than the unrelated pair. @test

@generates
```python
from typing import Iterable, List, Sequence, Tuple

class EmbeddingService:
    def __init__(self, vector_size: int = 50, window: int = 2, min_count: int = 1, seed: int = 42): ...
    def train(self, sentences: Iterable[Sequence[str]], epochs: int = 15) -> None: ...
    def most_similar(self, word: str, topn: int = 5) -> List[Tuple[str, float]]: ...
    def infer_sentence_vector(self, sentence: Sequence[str]) -> List[float]: ...
    def sentence_similarity(self, sentence_a: Sequence[str], sentence_b: Sequence[str]) -> float: ...
```

Provides tools for training and querying word and document embeddings.