tessl install tessl/pypi-gensim@4.3.0
Python library for topic modelling, document indexing and similarity retrieval with large corpora
Agent Success: 78% (agent success rate when using this tile)
Improvement: 1.03x (agent success rate improvement over the baseline)
Baseline: 76% (agent success rate without this tile)
{
  "context": "Evaluates how well the solution builds and uses a gensim-based topic modeling pipeline: preparing corpora, training topics, inferring distributions, summarizing topics, and persisting artifacts for reuse. Focuses on correct application of Gensim primitives rather than custom reimplementation.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Dictionary prep",
      "description": "Raw texts are tokenized (e.g., via gensim.utils.simple_preprocess), optional stopwords are removed, and a corpora.Dictionary is built and then applied with doc2bow to create the training corpus without reimplementing these steps.",
      "max_score": 20
    },
    {
      "name": "Model training",
      "description": "Topic model is trained with gensim.models.LdaModel or LdaMulticore using the bag-of-words corpus and num_topics/passes parameters instead of a custom algorithm.",
      "max_score": 30
    },
    {
      "name": "Inference",
      "description": "Inference for new text converts tokens with the same Dictionary and uses LdaModel.get_document_topics (or equivalent __getitem__) to produce a normalized topic distribution ordered by probability.",
      "max_score": 20
    },
    {
      "name": "Topic terms",
      "description": "Top terms per topic are obtained through model helpers such as show_topic/print_topics with the requested topn, preserving descending weight order instead of manual sorting.",
      "max_score": 15
    },
    {
      "name": "Persistence",
      "description": "Model and dictionary persistence relies on built-in save/load methods (e.g., LdaModel.save/load and Dictionary.save/load) so a reloaded pipeline reproduces prior inference results.",
      "max_score": 15
    }
  ]
}