
tessl/pypi-gensim

tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora

Agent Success

Agent success rate when using this tile

78%

Improvement

Agent success rate improvement when using this tile compared to baseline

1.03x

Baseline

Agent success rate without this tile

76%

evals/scenario-1/task.md

Streaming Topic Pipeline

Build a lightweight pipeline that turns raw text into a trained topic model, exposes human-readable topics, and supports inference and persistence.

Capabilities

Train topics from raw text

  • Given a handful of short documents and a desired topic count, training produces exactly that many topics, and each topic summary contains at least one token. @test

Infer topic mixture for unseen document

  • Inference on a new document returns a non-empty list of (topic id, probability) pairs whose probabilities sum to 1 within a small tolerance. @test

List top terms for a topic

  • Requesting top terms for a valid topic id returns tokens ordered by descending weight and honors the requested limit. @test

Save and reload pipeline

  • After saving and reloading the trained pipeline, inference on the same document yields the same highest-probability topic as before persistence. @test

Implementation

@generates

API

from typing import Iterable, List, Optional, Tuple

class TopicPipeline:
    def __init__(self, stopwords: Optional[Iterable[str]] = None):
        ...

    def train(self, texts: Iterable[str], num_topics: int, passes: int = 5) -> None:
        """Builds vocabulary and fits the topic model."""

    def infer(self, text: str) -> List[Tuple[int, float]]:
        """Returns topic-probability pairs ordered by highest probability."""

    def top_terms(self, topic_id: int, topn: int = 10) -> List[Tuple[str, float]]:
        """Returns the most probable terms for a topic."""

    def save(self, path: str) -> None:
        """Persists model artifacts to a directory."""

    @classmethod
    def load(cls, path: str) -> "TopicPipeline":
        """Restores a previously saved pipeline."""

Dependencies { .dependencies }

gensim { .dependency }

Topic modeling and corpus streaming utilities. @satisfied-by

Workspace: tessl
Visibility: Public
Describes: pkg:pypi/gensim@4.3.x