
tessl/pypi-gensim

tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora

Agent Success

Agent success rate when using this tile

78%

Improvement

Agent success rate improvement when using this tile compared to baseline

1.03x

Baseline

Agent success rate without this tile

76%

evals/scenario-10/task.md

Pretrained Embedding Explorer

Build a small utility that uses the target package's dataset and pre-trained model hub to list available word-embedding models, load one by name, and serve similarity/analogy queries using cached weights.

Capabilities

Lists available embeddings

  • Calling list_embeddings() returns at least five models sorted by name and includes a tuple for glove-wiki-gigaword-50 with vector size 50. @test
  • Calling list_embeddings(min_vector_size=200) excludes glove-wiki-gigaword-50 and still returns glove-wiki-gigaword-300. @test

Serves similar words

  • Using model glove-wiki-gigaword-50 with seeds ["king"] and topn=3 returns three entries ordered by similarity, with queen first. @test

Solves analogies

  • Using model glove-wiki-gigaword-50 with positive=["king", "woman"] and negative=["man"], topn=1 returns queen as the best match with a positive score. @test
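Both the similarity and analogy capabilities reduce to one vector operation: rank vocabulary words by cosine similarity to the mean of the unit-normalized positive vectors minus the mean of the negative ones, excluding the query words themselves. This mirrors what gensim's `KeyedVectors.most_similar` computes; the sketch below illustrates the semantics over toy 2-d vectors (`rank_by_analogy` and the toy data are illustrative, not part of gensim):

```python
import math
from typing import Dict, List, Sequence, Tuple

def rank_by_analogy(
    vectors: Dict[str, Sequence[float]],
    positive: Sequence[str],
    negative: Sequence[str] = (),
    topn: int = 1,
) -> List[Tuple[str, float]]:
    """Rank words by cosine similarity to mean(positive) - mean(negative),
    excluding the query words (as KeyedVectors.most_similar does)."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for word in positive:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += x / len(positive)
    for word in negative:
        for i, x in enumerate(unit(vectors[word])):
            query[i] -= x / len(negative)

    exclude = set(positive) | set(negative)
    scored = [
        (w, sum(a * b for a, b in zip(unit(v), unit(query))))
        for w, v in vectors.items()
        if w not in exclude
    ]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:topn]

# Toy vectors: "queen" sits roughly where king - man + woman points.
toy = {
    "king": [1.0, 1.0],
    "man": [1.0, 0.0],
    "woman": [0.0, 1.0],
    "queen": [0.1, 1.4],
    "apple": [-1.0, -1.0],
}
print(rank_by_analogy(toy, positive=["king", "woman"], negative=["man"], topn=1))
```

With real glove-wiki-gigaword-50 vectors the same ranking yields queen for this query, which is what the capability above asserts.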

Uses cached hub assets

  • Passing the same model_name and a custom cache_dir across consecutive calls reuses the downloaded weights without fetching again. @test
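gensim's downloader stores fetched assets under `~/gensim-data` by default and honors the `GENSIM_DATA_DIR` environment variable, which it reads when the module is imported; a custom `cache_dir` therefore has to be exported before the first `import gensim.downloader`. A minimal sketch of that ordering, where `resolve_cache_dir` and `load_cached` are illustrative helper names (not gensim API):

```python
import os
from pathlib import Path
from typing import Optional, Union

def resolve_cache_dir(cache_dir: Optional[Union[Path, str]] = None) -> Path:
    """Illustrative helper: pick the directory gensim.downloader would use.
    Precedence: explicit argument, then GENSIM_DATA_DIR, then ~/gensim-data."""
    if cache_dir is not None:
        return Path(cache_dir)
    env = os.environ.get("GENSIM_DATA_DIR")
    if env:
        return Path(env)
    return Path.home() / "gensim-data"

def load_cached(model_name: str, cache_dir: Optional[Union[Path, str]] = None):
    """Export the cache location *before* importing gensim.downloader,
    which reads GENSIM_DATA_DIR once at import time."""
    os.environ["GENSIM_DATA_DIR"] = str(resolve_cache_dir(cache_dir))
    import gensim.downloader as api
    # A repeat call with the same model_name finds the weights on disk
    # and skips the download, which is what the capability above checks.
    return api.load(model_name)
```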

Implementation

@generates

API

from pathlib import Path
from typing import Iterable, List, Tuple, Dict, Optional

def list_embeddings(min_vector_size: Optional[int] = None) -> List[Tuple[str, int]]:
    """Return available embedding names and their vector sizes, sorted by model name, optionally filtering out smaller vectors."""


def most_similar(model_name: str, seeds: Iterable[str], topn: int = 5, cache_dir: Optional[Path | str] = None) -> Dict[str, List[Tuple[str, float]]]:
    """Load the specified pre-trained embeddings from the hub (caching downloads) and return the top matches for each seed."""


def solve_analogy(model_name: str, positive: Iterable[str], negative: Iterable[str], topn: int = 1, cache_dir: Optional[Path | str] = None) -> List[Tuple[str, float]]:
    """Use the same cached embeddings to answer analogy queries; highest-scoring result first."""

Dependencies { .dependencies }

gensim { .dependency }

Provides dataset/model hub downloads and pre-trained word vectors.

Version

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pkg:pypi/gensim@4.3.x
tile.json