Python library for topic modelling, document indexing and similarity retrieval with large corpora
78
Topic modeling pipelines
Dictionary prep             100%    100%
Model training              100%    100%
Inference                   100%    100%
Topic terms                 100%    100%
Persistence                 100%    100%
Streaming text preprocessing
Streaming tokenization      100%    100%
Stopword merge              100%    100%
Built-in cleaners           100%    100%
Custom filters hook         100%    100%
Streaming write             100%    100%
Cross-lingual mapping utilities
TranslationMatrix fit         0%     91%
Seed filtering               80%    100%
Translate topn               20%     32%
Accuracy from translate      66%     40%
KeyedVectors usage           90%    100%
Word and document embeddings
Word2Vec training           100%    100%
Keyed similarity            100%    100%
Sentence vectors            100%    100%
Sentence cosine              93%     66%
Deterministic seed          100%    100%
Persistence and incremental lifecycle
Save/load                   100%    100%
Memory map                  100%    100%
Incremental update          100%    100%
Similarity API              100%    100%
Lifecycle logging            33%     33%
Datasets and pre-trained model hub
Hub metadata                100%     50%
Dimension filter            100%     53%
Hub loading                  50%     40%
Similar words               100%     90%
Analogy query                66%     66%
Cache control               100%     60%
Vector-space weighting and transforms
Dictionary setup            100%    100%
TF-IDF transform            100%    100%
Log-entropy transform        86%    100%
BM25 ranking                 80%     25%
Random projection            33%     80%
Normalization & ordering     80%     80%
Top-term mapping            100%    100%
Dictionary and corpus management
Dictionary build              0%      0%
BOW encoding                  0%      0%
Frequency filter              0%      0%
Dictionary persistence        0%      0%
Matrix corpus I/O             0%    100%
Consistent reload             0%     50%
Topic coherence evaluation
Token prep                   66%     80%
Dictionary & BoW            100%    100%
u_mass scoring               75%     50%
c_v scoring                 100%    100%
Result assembly             100%     80%
Topn handling               100%    100%
Install with Tessl CLI
npx tessl i tessl/pypi-gensim