tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora
Agent Success: 78% (agent success rate when using this tile)
Improvement: 1.03x (agent success rate improvement when using this tile compared to baseline)
Baseline: 76% (agent success rate without this tile)
{
  "context": "Evaluates whether the solution uses gensim's streaming text preprocessing to normalize documents, filter stopwords, support custom filters, and write cleaned corpora without loading everything into memory. Focus is on leveraging built-in tokenization, filtering, and stopword utilities rather than manual reimplementation.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Streaming tokenization",
      "description": "iter_clean_tokens relies on gensim.utils.simple_preprocess (or an equivalent gensim tokenizer) with deaccenting and min_len/max_len to yield per-document tokens lazily.",
      "max_score": 30
    },
    {
      "name": "Stopword merge",
      "description": "Default stopwords come from gensim.parsing.preprocessing.STOPWORDS (or remove_stopwords) and are combined with provided stopwords before filtering tokens.",
      "max_score": 20
    },
    {
      "name": "Built-in cleaners",
      "description": "Punctuation/numeric stripping and length filtering use gensim preprocessing filters (e.g., strip_punctuation, strip_numeric, strip_short) or preprocess_string rather than hand-rolled regexes.",
      "max_score": 20
    },
    {
      "name": "Custom filters hook",
      "description": "extra_filters are threaded through gensim's preprocessing pipeline (e.g., preprocess_string with appended callables) so caller-provided filters run in order before tokenization.",
      "max_score": 15
    },
    {
      "name": "Streaming write",
      "description": "write_clean_corpus consumes the generator incrementally and writes joined tokens to disk without materializing the full corpus in memory, matching the requested delimiter and returning the document count.",
      "max_score": 15
    }
  ]
}