
tessl/pypi-gensim

tessl install tessl/pypi-gensim@4.3.0

Python library for topic modelling, document indexing and similarity retrieval with large corpora

  • Agent success rate when using this tile: 78%
  • Improvement over baseline: 1.03x
  • Baseline agent success rate without this tile: 76%

evals/scenario-7/task.md

Cross-Lingual Embedding Mapper

Build a module that aligns two pretrained embedding spaces using bilingual seed pairs, translates source-language tokens into a target language, and reports top-k accuracy on held-out pairs. Tests use small in-memory embeddings (vocabularies of fewer than 100 tokens) and require no external downloads.

Capabilities

Train mapping from seed dictionary

  • Training with at least ten bilingual pairs learns a linear transform. Seed pairs whose source or target word is missing from the corresponding vocabulary are skipped; training still succeeds as long as five or more usable pairs remain, after which translation and accuracy calls are enabled. Translating before fitting raises a descriptive error. @test
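A minimal sketch of the fitting step, assuming embeddings are plain dict[str, list[float]] mappings and that the "linear transform" is an ordinary least-squares matrix W with source_vec · W ≈ target_vec, solved here through the normal equations in pure Python. The class name SeedMapper, the solve helper, the min_pairs=5 threshold that raises ValueError below it, and logging skipped pairs when report_skipped is true are illustrative assumptions, not part of the spec.

```python
import logging
from typing import Dict, Iterable, List, Optional, Tuple

logger = logging.getLogger(__name__)

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A @ X = B for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | B]: every row operation is applied to both sides.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column up, for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("seed vectors are linearly dependent; cannot fit a unique map")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class SeedMapper:
    """Hypothetical mapper: learns W such that source_vec @ W approximates target_vec."""

    def __init__(self, source: Dict[str, Vector], target: Dict[str, Vector],
                 seed_pairs: Iterable[Tuple[str, str]], min_pairs: int = 5):
        self.source = source
        self.target = target
        self.seed_pairs = list(seed_pairs)
        self.min_pairs = min_pairs  # assumed threshold, based on the spec's "five or more"
        self.w: Optional[Matrix] = None
        self.skipped: List[Tuple[str, str]] = []

    def fit(self, report_skipped: bool = True) -> None:
        usable: List[Tuple[str, str]] = []
        self.skipped = []
        for src, tgt in self.seed_pairs:
            if src in self.source and tgt in self.target:
                usable.append((src, tgt))
            else:
                self.skipped.append((src, tgt))
        if report_skipped and self.skipped:
            logger.info("skipped %d seed pair(s) with missing words: %s",
                        len(self.skipped), self.skipped)
        if len(usable) < self.min_pairs:
            raise ValueError(f"need at least {self.min_pairs} usable seed pairs, found {len(usable)}")
        xs = [self.source[s] for s, _ in usable]
        ys = [self.target[t] for _, t in usable]
        d_src, d_tgt = len(xs[0]), len(ys[0])
        # Normal equations (X^T X) W = X^T Y yield the least-squares W.
        xtx = [[sum(x[i] * x[j] for x in xs) for j in range(d_src)] for i in range(d_src)]
        xty = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(d_tgt)]
               for i in range(d_src)]
        self.w = solve(xtx, xty)

    def map_vector(self, vec: Vector) -> Vector:
        """Project a source vector into the target space; refuses to run before fit()."""
        if self.w is None:
            raise RuntimeError("mapper is not fitted yet: call fit() before translating")
        return [sum(vec[i] * self.w[i][j] for i in range(len(vec)))
                for j in range(len(self.w[0]))]
```

When the seed vectors admit an exact linear map, this recovers it; with real embeddings the normal equations can be ill-conditioned, and a library solver such as numpy.linalg.lstsq is the more robust choice.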

Translate source tokens

  • Translating ["dog", "unknown"] returns, for each token, up to topn ordered target candidates; the result preserves input order, and out-of-vocabulary tokens map to an empty list. @test
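The translation step can be sketched as: project each known source vector through an already-fitted matrix w, score every target word against the projection, and keep the best topn. Ranking by cosine similarity with a brute-force scan over the target vocabulary, dict-based embeddings, and the standalone translate function are assumptions chosen for small vocabularies, not the required implementation.

```python
import math
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(source_words: Iterable[str], source: Dict[str, Vector],
              target: Dict[str, Vector], w: List[List[float]],
              topn: int = 3) -> Dict[str, List[Tuple[str, float]]]:
    result: Dict[str, List[Tuple[str, float]]] = {}
    for word in source_words:
        if word not in source:
            result[word] = []  # out-of-vocabulary token: no candidates
            continue
        vec = source[word]
        # Map the source vector into the target space: mapped = vec @ w.
        mapped = [sum(vec[i] * w[i][j] for i in range(len(vec))) for j in range(len(w[0]))]
        scored = [(t, cosine(mapped, tv)) for t, tv in target.items()]
        scored.sort(key=lambda item: item[1], reverse=True)  # best match first
        result[word] = scored[:topn]
    return result
```

Because Python dicts keep insertion order, the result lists tokens in input order (a repeated token appears once). With gensim KeyedVectors, similar_by_vector(vector, topn=...) performs a comparable cosine ranking over the target vocabulary.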

Evaluate held-out dictionary

  • Given held-out bilingual pairs and a value of k, accuracy returns the fraction of evaluable source words whose gold target appears among their top k translations. Pairs with a missing vocabulary entry are ignored, and if no pairs are evaluable the result is 0.0. @test
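A sketch of the top-k accuracy rule, written against a hypothetical translate_fn(words, topn) callable that returns ranked (word, score) candidates in the same shape as translate. Filtering pairs by membership in both vocabularies follows the rule that pairs with missing entries are ignored; the function name and signature are illustrative.

```python
from typing import Callable, Container, Dict, Iterable, List, Tuple

Candidates = List[Tuple[str, float]]


def top_k_accuracy(evaluation_pairs: Iterable[Tuple[str, str]],
                   translate_fn: Callable[[List[str], int], Dict[str, Candidates]],
                   source_vocab: Container[str], target_vocab: Container[str],
                   k: int = 1) -> float:
    # Keep only pairs whose source word and gold target both have vectors.
    evaluable = [(s, t) for s, t in evaluation_pairs if s in source_vocab and t in target_vocab]
    if not evaluable:
        return 0.0  # nothing to evaluate
    candidates = translate_fn([s for s, _ in evaluable], k)
    # A hit means the gold target is among the first k ranked candidates.
    hits = sum(1 for s, gold in evaluable
               if gold in [word for word, _ in candidates[s][:k]])
    return hits / len(evaluable)
```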

Implementation

@generates

API

```python
from typing import Iterable, List, Tuple, Dict, Any

class CrossLingualMapper:
    def __init__(self, source_vectors: Any, target_vectors: Any, seed_pairs: Iterable[Tuple[str, str]]): ...
    def fit(self, report_skipped: bool = True) -> None: ...
    def translate(self, source_words: Iterable[str], topn: int = 3) -> Dict[str, List[Tuple[str, float]]]: ...
    def accuracy(self, evaluation_pairs: Iterable[Tuple[str, str]], k: int = 1) -> float: ...
```

Dependencies { .dependencies }

gensim { .dependency }

Provides cross-lingual mapping utilities for aligning embedding spaces and translating vectors.

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/gensim@4.3.x