
tessl/pypi-gensim

Python library for topic modelling, document indexing and similarity retrieval with large corpora

78

1.02x

Evaluation results

100%

Streaming Topic Pipeline

Topic modeling pipelines

Criteria            Without context    With context
Dictionary prep     100%               100%
Model training      100%               100%
Inference           100%               100%
Topic terms         100%               100%
Persistence         100%               100%

100%

Streaming Text Cleaner

Streaming text preprocessing

Criteria                  Without context    With context
Streaming tokenization    100%               100%
Stopword merge            100%               100%
Built-in cleaners         100%               100%
Custom filters hook       100%               100%
Streaming write           100%               100%

71%

35%

Cross-Lingual Embedding Mapper

Cross-lingual mapping utilities

Criteria                   Without context    With context
TranslationMatrix fit      0%                 91%
Seed filtering             80%                100%
Translate topn             20%                32%
Accuracy from translate    66%                40%
KeyedVectors usage         90%                100%

95%

-4%

Embedding Retrieval Toolkit

Word and document embeddings

Criteria              Without context    With context
Word2Vec training     100%               100%
Keyed similarity      100%               100%
Sentence vectors      100%               100%
Sentence cosine       93%                66%
Deterministic seed    100%               100%

90%

Incremental Embedding Lifecycle

Persistence and incremental lifecycle

Criteria              Without context    With context
Save/load             100%               100%
Memory map            100%               100%
Incremental update    100%               100%
Similarity API        100%               100%
Lifecycle logging     33%                33%

60%

-25%

Pretrained Embedding Explorer

Datasets and pre-trained model hub

Criteria            Without context    With context
Hub metadata        100%               50%
Dimension filter    100%               53%
Hub loading         50%                40%
Similar words       100%               90%
Analogy query       66%                66%
Cache control       100%               60%

80%

-2%

Weighted Corpus Toolkit

Vector-space weighting and transforms

Criteria                    Without context    With context
Dictionary setup            100%               100%
TF-IDF transform            100%               100%
Log-entropy transform       86%                100%
BM25 ranking                80%                25%
Random projection           33%                80%
Normalization & ordering    80%                80%
Top-term mapping            100%               100%

20%

20%

Bag-of-Words Corpus Manager

Dictionary and corpus management

Criteria                  Without context    With context
Dictionary build          0%                 0%
BOW encoding              0%                 0%
Frequency filter          0%                 0%
Dictionary persistence    0%                 0%
Matrix corpus I/O         0%                 100%
Consistent reload         0%                 50%

84%

-6%

Topic Coherence Reporter

Topic coherence evaluation

Criteria            Without context    With context
Token prep          66%                80%
Dictionary & BoW    100%               100%
u_mass scoring      75%                50%
c_v scoring         100%               100%
Result assembly     100%               80%
Topn handling       100%               100%

Failed

Hierarchical graph embeddings

Install with Tessl CLI

npx tessl i tessl/pypi-gensim
Evaluated
Agent: Codex
