
tessl/pypi-gensim

tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora

Agent Success: 78% (agent success rate when using this tile)

Improvement: 1.03x (agent success rate improvement when using this tile compared to baseline)

Baseline: 76% (agent success rate without this tile)

evals/scenario-8/rubric.json

{
  "context": "Evaluates whether the solution relies on gensim's dictionary and corpus utilities to build, filter, persist, and reload the bag-of-words workflow defined in the spec. Emphasizes API-level usage (Dictionary, doc2bow, filter_extremes, MmCorpus) rather than general coding practices.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Dictionary build",
      "description": "Creates the vocabulary with gensim.corpora.Dictionary (or Dictionary.add_documents) so token IDs follow first-appearance ordering from the provided documents rather than custom indexing.",
      "max_score": 20
    },
    {
      "name": "BOW encoding",
      "description": "Uses Dictionary.doc2bow to produce the per-document [(token_id, count)] lists expected by encode_corpus() instead of manual counting.",
      "max_score": 20
    },
    {
      "name": "Frequency filter",
      "description": "Applies Dictionary.filter_extremes with the given min_docs and max_doc_proportion parameters (optionally followed by compactify) to drop low/high-frequency tokens before re-encoding.",
      "max_score": 20
    },
    {
      "name": "Dictionary persistence",
      "description": "Saves the token map in a human-readable form via Dictionary.save_as_text and reloads it with Dictionary.load_from_text (or save/load) before further corpus work.",
      "max_score": 15
    },
    {
      "name": "Matrix corpus I/O",
      "description": "Persists the bag-of-words corpus with gensim.corpora.MmCorpus.save_corpus (or serialize) and loads it through MmCorpus for streaming access instead of custom file formats.",
      "max_score": 15
    },
    {
      "name": "Consistent reload",
      "description": "When reconstructing after load(), ties the MmCorpus back to the saved Dictionary (via id2word) so iterated documents preserve the original token IDs and counts.",
      "max_score": 10
    }
  ]
}

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/gensim@4.3.x