Name: tessl/pypi-gensim
Rating: 77 (1 review)
Author: tessl

Python library for topic modelling, document indexing and similarity retrieval with large corpora

Quality: Pending (Does it follow best practices?)
Impact: 77%, 1.02x (average score across 10 eval scenarios)
Security: Pending (the risk profile of this skill)

Evaluation results

100%

Streaming Topic Pipeline

Topic modeling pipelines

Criteria

Without context

With context

Dictionary prep

100%

Model training

100%

Inference

100%

Topic terms

100%

Persistence

100%

Streaming Text Cleaner

Streaming text preprocessing

Criteria

Without context

With context

Streaming tokenization

100%

Stopword merge

100%

Built-in cleaners

100%

Custom filters hook

100%

Streaming write

100%

71%

35%

Cross-Lingual Embedding Mapper

Cross-lingual mapping utilities

Criteria

Without context

With context

TranslationMatrix fit

91%

Seed filtering

80%

100%

Translate topn

20%

32%

Accuracy from translate

66%

40%

KeyedVectors usage

90%

100%

95%

-4%

Embedding Retrieval Toolkit

Word and document embeddings

Criteria

Without context

With context

Word2Vec training

100%

Keyed similarity

100%

Sentence vectors

100%

Sentence cosine

93%

66%

Deterministic seed

100%

90%

Incremental Embedding Lifecycle

Persistence and incremental lifecycle

Criteria

Without context

With context

Save/load

100%

Memory map

100%

Incremental update

100%

Similarity API

100%

Lifecycle logging

33%

60%

-25%

Pretrained Embedding Explorer

Datasets and pre-trained model hub

Criteria

Without context

With context

Hub metadata

100%

50%

Dimension filter

100%

53%

Hub loading

50%

40%

Similar words

100%

90%

Analogy query

66%

Cache control

100%

60%

80%

-2%

Weighted Corpus Toolkit

Vector-space weighting and transforms

Criteria

Without context

With context

Dictionary setup

100%

TF-IDF transform

100%

Log-entropy transform

86%

100%

BM25 ranking

80%

25%

Random projection

33%

80%

Normalization & ordering

80%

Top-term mapping

100%

20%

Bag-of-Words Corpus Manager

Dictionary and corpus management

Criteria

Without context

With context

Dictionary build

BOW encoding

Frequency filter

Dictionary persistence

Matrix corpus I/O

100%

Consistent reload

50%

84%

-6%

Topic Coherence Reporter

Topic coherence evaluation

Criteria

Without context

With context

Token prep

66%

80%

Dictionary & BoW

100%

u_mass scoring

75%

50%

c_v scoring

100%

Result assembly

100%

80%

Topn handling

100%

Failed

Hierarchical graph embeddings

Evaluated: 4 months ago
Agent: Codex
Model: Unknown
