tessl install tessl/pypi-gensim@4.3.0
Python library for topic modelling, document indexing and similarity retrieval with large corpora
Agent Success: 78% (agent success rate when using this tile)
Improvement: 1.03x (agent success rate improvement when using this tile compared to baseline)
Baseline: 76% (agent success rate without this tile)
{
"context": "Evaluates how the solution trains and queries word/document embeddings with gensim to meet the embedding toolkit spec. Emphasizes correct hyperparameters, stable similarity orderings, and sentence-level vector usage.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Word2Vec training",
"description": "Uses gensim.models.Word2Vec (or an equivalent gensim model whose trained vectors are exposed as KeyedVectors via model.wv) with the provided vector_size, window, min_count, seed, and epochs to build the vocabulary and train on the tokenized corpus before any queries.",
"max_score": 35
},
{
"name": "Keyed similarity",
"description": "Retrieves similar words through model.wv.most_similar (KeyedVectors.most_similar) using the trained embeddings so that expected neighbors like 'queen' outrank distractors and similarities are positive.",
"max_score": 25
},
{
"name": "Sentence vectors",
"description": "Generates sentence-level embeddings via gensim (e.g., KeyedVectors.get_mean_vector or Doc2Vec.infer_vector) whose length equals the configured vector_size and that contain no NaN or inf values.",
"max_score": 20
},
{
"name": "Sentence cosine",
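"description": "Computes relatedness between sentences as the cosine similarity of their sentence vectors (e.g., KeyedVectors.cosine_similarities on dense vectors or KeyedVectors.n_similarity on token lists; KeyedVectors.similarity compares two single words, and gensim.matutils.cossim expects sparse bag-of-words vectors) and enforces the specified margin between related and unrelated pairs.",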
"max_score": 15
},
{
"name": "Deterministic seed",
"description": "Passes an explicit seed to the gensim model and trains with workers=1, so repeated training on the same corpus yields a stable similarity ordering and reproducible tests.",
"max_score": 5
}
]
}