tessl install tessl/pypi-gensim@4.3.0
Python library for topic modelling, document indexing and similarity retrieval with large corpora
Agent Success
Agent success rate when using this tile
78%
Improvement
Agent success rate improvement when using this tile compared to baseline
1.03x
Baseline
Agent success rate without this tile
76%
{
"context": "Evaluates how the solution uses gensim's persistence and incremental lifecycle features to train a small embedding model, checkpoint it, reload it efficiently, and update it with new data. Focuses specifically on SaveLoad usage, mmap reloads, incremental updates, similarity queries, and lifecycle logging.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Save/load",
"description": "Initial training persists a gensim model via the built-in SaveLoad API (e.g., Word2Vec.save or KeyedVectors.save) and restores it with Word2Vec.load/KeyedVectors.load instead of manual serialization.",
"max_score": 25
},
{
"name": "Memory map",
"description": "Checkpoint reload for inference uses gensim's mmap support (Word2Vec.load(..., mmap='r') or KeyedVectors.load(..., mmap='r')) to enable read-only querying without rewriting the file.",
"max_score": 20
},
{
"name": "Incremental update",
"description": "New sentences are incorporated with gensim's incremental tools (build_vocab(update=True) followed by train(...) on a Word2Vec model, or KeyedVectors.add_vectors) rather than rebuilding from scratch.",
"max_score": 25
},
{
"name": "Similarity API",
"description": "Similarity queries use gensim's vector interfaces (model.wv.similarity, model.wv.most_similar, or KeyedVectors.similarity) instead of custom cosine code.",
"max_score": 15
},
{
"name": "Lifecycle logging",
"description": "Lifecycle events are captured with gensim facilities such as model.add_lifecycle_event, callbacks (e.g., a CallbackAny2Vec subclass such as an EpochLogger), or the model.lifecycle_events entries so train/load/update steps are traceable.",
"max_score": 15
}
]
}