Tessl Tile for npm/natural@8.1.0

tessl/npm-natural

Comprehensive natural language processing library with tokenization, stemming, classification, sentiment analysis, phonetics, distance algorithms, and WordNet integration.

Workspace: tessl
Visibility: Public
Created: 3 months ago
Last updated: 3 months ago
Describes: pkg:npm/natural@8.1.x

To install, run

npx @tessl/cli install tessl/npm-natural@8.1.0

# Natural

Natural is a comprehensive natural language processing library for Node.js that provides tokenization, stemming, classification, sentiment analysis, phonetics, string distance metrics, n-grams, TF-IDF calculations, and WordNet integration. It supports multiple languages and offers both functional and object-oriented APIs for text processing tasks.

## Package Information

- **Package Name**: natural
- **Package Type**: npm
- **Language**: JavaScript (with TypeScript definitions)
- **Installation**: `npm install natural`

## Core Imports

```javascript
const natural = require('natural');
// All functionality is available on the natural object
const { BayesClassifier, PorterStemmer, WordTokenizer } = natural;
```

For ES modules (when supported by your environment):

```javascript
import natural from 'natural';
// Named imports may not work in all environments;
// access members through the default import instead, e.g. natural.BayesClassifier
```

**Note**: Natural.js primarily uses CommonJS exports. ES6 named imports may not work in all environments. For best compatibility, use the default import and access methods via the natural object.

## Basic Usage

```javascript
const natural = require('natural');

// Text classification
const classifier = new natural.BayesClassifier();
classifier.addDocument('I love this movie', 'positive');
classifier.addDocument('This movie is terrible', 'negative');
classifier.train();

const sentiment = classifier.classify('This is amazing'); // 'positive'

// Text processing (WordTokenizer must be instantiated before use)
const tokenizer = new natural.WordTokenizer();
const tokens = tokenizer.tokenize('Hello world, how are you?');
// ['Hello', 'world', 'how', 'are', 'you']

const stemmed = natural.PorterStemmer.stem('running');
// 'run'

// Distance calculation
const distance = natural.JaroWinklerDistance('sitting', 'kitten');
// ≈ 0.746

// N-grams
const bigrams = natural.NGrams.bigrams('Hello world how are you');
// [['Hello', 'world'], ['world', 'how'], ['how', 'are'], ['are', 'you']]
```

## Architecture

Natural.js is organized into specialized modules that work independently or together:

- **Text Processing**: Tokenization, stemming, and normalization for preparing text data
- **Classification**: Machine learning algorithms for text categorization and prediction
- **Analysis**: Sentiment analysis, n-grams, and TF-IDF for text analytics
- **Similarity**: String distance algorithms for text comparison and fuzzy matching
- **Linguistic**: Part-of-speech tagging, phonetics, and WordNet integration
- **Utilities**: Supporting data structures and helper functions

## Capabilities

### Text Classification

Machine learning classifiers for categorizing text into predefined classes. Includes Naive Bayes, Logistic Regression, and Maximum Entropy classifiers with training, persistence, and evaluation capabilities.

```javascript { .api }
class BayesClassifier {
  constructor(stemmer?: object, smoothing?: number);
  addDocument(text: string, classification: string): void;
  train(): void;
  classify(observation: string): string;
  getClassifications(observation: string): Array<{label: string, value: number}>;
}

class LogisticRegressionClassifier {
  constructor(stemmer?: object);
  addDocument(text: string, classification: string): void;
  train(): void;
  classify(observation: string): string;
}
```

[Text Classification](./classification.md)

### Text Processing

Comprehensive text preprocessing tools including tokenization, stemming, and normalization for multiple languages. Essential for preparing raw text for analysis.

```javascript { .api }
// Tokenizers (instantiate with `new` before calling tokenize)
class WordTokenizer {
  constructor(options?: object);
  tokenize(text: string): string[];
}

class AggressiveTokenizer {
  constructor(options?: object);
  tokenize(text: string): string[];
}

// Stemmers
class PorterStemmer {
  static stem(word: string): string;
}

class LancasterStemmer {
  static stem(word: string): string;
}

// Normalizers
function normalize(tokens: string[]): string[];
function removeDiacritics(text: string): string;
```

[Text Processing](./text-processing.md)

### String Distance Algorithms

Algorithms for calculating similarity between strings, useful for fuzzy matching, spell checking, and text comparison tasks.

```javascript { .api }
function JaroWinklerDistance(s1: string, s2: string): number;
function LevenshteinDistance(s1: string, s2: string): number;
function DamerauLevenshteinDistance(s1: string, s2: string): number;
function DiceCoefficient(s1: string, s2: string): number;
function HammingDistance(s1: string, s2: string): number;
```

[Distance Algorithms](./distance.md)
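
As an illustration of what an edit distance measures, here is a minimal, self-contained sketch of the classic Levenshtein dynamic-programming recurrence. It is not natural's implementation: the helper name `levenshtein` is hypothetical, and the sketch assumes a unit cost for insertion, deletion, and substitution.

```javascript
// Illustrative sketch only: classic Levenshtein distance with unit costs.
// `levenshtein` is a hypothetical helper, not part of the natural API.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 characters of a and the first j characters of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // transforming i characters of a into the empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('kitten', 'sitting')); // 3
```

With the library installed, the corresponding call would be `natural.LevenshteinDistance('kitten', 'sitting')`.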

### Sentiment Analysis

Multi-language sentiment analysis using various lexicons and methodologies for determining emotional tone in text.

```javascript { .api }
class SentimentAnalyzer {
  // A required parameter cannot follow an optional one, so pass null when no stemmer is used
  constructor(language: string, stemmer: object | null, type: string);
  getSentiment(words: string[]): number;
}
```

[Sentiment Analysis](./sentiment.md)
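
A lexicon-based analyzer generally looks each word up in a table of polarity scores and combines the results. The sketch below illustrates that idea under stated assumptions: the three-word `lexicon` and the `averageSentiment` helper are hypothetical, and the code does not reproduce natural's lexicons, stemming, or any other processing the library performs.

```javascript
// Illustrative sketch only: average per-word polarity from a tiny, made-up lexicon.
// Neither `lexicon` nor `averageSentiment` is part of the natural API.
const lexicon = new Map([
  ['love', 3],
  ['good', 2],
  ['terrible', -3],
]);

function averageSentiment(words) {
  if (words.length === 0) return 0;
  // Words missing from the lexicon contribute a neutral score of 0
  const total = words.reduce(
    (sum, word) => sum + (lexicon.get(word.toLowerCase()) ?? 0),
    0
  );
  return total / words.length;
}

console.log(averageSentiment(['I', 'love', 'this', 'movie']));      // 0.75
console.log(averageSentiment(['This', 'movie', 'is', 'terrible'])); // -0.75
```

Like `getSentiment` above, the helper takes an array of words, so text needs to be tokenized first.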

### N-grams and TF-IDF

Statistical text analysis tools for creating n-grams and calculating term frequency-inverse document frequency scores.

```javascript { .api }
// N-grams
function ngrams(sequence: string | string[], n: number, startSymbol?: string, endSymbol?: string): string[][];
function bigrams(sequence: string | string[]): string[][];
function trigrams(sequence: string | string[]): string[][];

// TF-IDF
class TfIdf {
  constructor();
  addDocument(document: string | string[], key?: string): void;
  tfidf(terms: string, documentIndex: number): number;
  listTerms(documentIndex: number): Array<{term: string, tfidf: number}>;
}
```

[N-grams and TF-IDF](./ngrams-tfidf.md)
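
An n-gram is a window of `n` consecutive tokens slid across a sequence. The following self-contained sketch shows that windowing on a pre-tokenized array; `slidingNgrams` is a hypothetical helper written for illustration, not natural's implementation, and it ignores the optional start and end symbols.

```javascript
// Illustrative sketch only: build n-grams from an array of tokens.
// `slidingNgrams` is a hypothetical helper, not part of the natural API.
function slidingNgrams(tokens, n) {
  if (n < 1) return [];
  const result = [];
  // Stop once fewer than n tokens remain after position i
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

const tokens = 'Hello world how are you'.split(' ');
console.log(slidingNgrams(tokens, 2));
// [['Hello', 'world'], ['world', 'how'], ['how', 'are'], ['are', 'you']]
console.log(slidingNgrams(tokens, 3).length); // 3
```

A sequence of `k` tokens yields `k - n + 1` n-grams, which is why five tokens produce four bigrams and three trigrams.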

### Part-of-Speech Tagging

Brill tagger implementation for assigning grammatical parts of speech to words in sentences.

```javascript { .api }
class BrillPOSTagger {
  constructor(lexicon: object, ruleSet: object);
  tag(sentence: string[]): object;
}

class Lexicon {
  constructor();
  addTaggedWord(word: string, tag: string): void;
}
```

[Part-of-Speech Tagging](./pos-tagging.md)

### WordNet Integration

Interface to WordNet lexical database for accessing word definitions, synonyms, and semantic relationships.

```javascript { .api }
class WordNet {
  constructor(dataDir?: string);
  lookup(word: string, callback: (results: object[]) => void): void;
  get(synsetOffset: number, pos: string, callback: (result: object) => void): void;
}
```

[WordNet](./wordnet.md)

### Phonetic Algorithms

Phonetic encoding algorithms for matching words by sound rather than spelling.

```javascript { .api }
class SoundEx {
  static process(word: string): string;
}

class Metaphone {
  static process(word: string): string;
}

class DoubleMetaphone {
  static process(word: string): string[];
}
```

[Phonetics](./phonetics.md)

### Text Transliteration

Japanese text transliteration functionality for converting Hiragana and Katakana to romanized text using the modified Hepburn system.

```javascript { .api }
class TransliterateJa {
  static transliterate(text: string): string;
}
```

[Transliteration](./transliterators.md)

### Utilities

Supporting data structures and utility functions including tries, graph algorithms, storage backends, and spell checking functionality.

```javascript { .api }
class Trie {
  constructor();
  addString(string: string): void;
  contains(string: string): boolean;
  findPrefix(prefix: string): string[];
}

class ShortestPathTree {
  constructor(graph: EdgeWeightedDigraph, source: number);
  distTo(vertex: number): number;
  hasPathTo(vertex: number): boolean;
}

class Spellcheck {
  // The word list supplies the vocabulary that words are checked against
  constructor(wordlist: string[]);
  isCorrect(word: string): boolean;
  getCorrections(word: string, maxDistance?: number): string[];
}
```

[Utilities](./utilities.md)
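
To show what a trie stores and why a prefix is not the same as a stored string, here is a minimal, self-contained sketch. `SimpleTrie` is a hypothetical class written for illustration; it borrows the `addString` and `contains` method names from the signature above but is not natural's `Trie` implementation.

```javascript
// Illustrative sketch only: a minimal trie supporting insertion and exact lookup.
// `SimpleTrie` is a hypothetical class, not natural's Trie implementation.
class SimpleTrie {
  constructor() {
    this.children = new Map(); // next character -> child node
    this.isWord = false;       // true if a stored string ends at this node
  }

  addString(str) {
    let node = this;
    for (const ch of str) {
      if (!node.children.has(ch)) node.children.set(ch, new SimpleTrie());
      node = node.children.get(ch);
    }
    node.isWord = true;
  }

  contains(str) {
    let node = this;
    for (const ch of str) {
      node = node.children.get(ch);
      if (!node) return false; // path breaks off: the string was never added
    }
    return node.isWord;
  }
}

const trie = new SimpleTrie();
trie.addString('test');
trie.addString('team');
console.log(trie.contains('test')); // true
console.log(trie.contains('te'));   // false (a shared prefix, not a stored string)
```

Strings that share a prefix share nodes, so `'test'` and `'team'` both reuse the path for `'te'`.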

## Types

```javascript { .api }
// Classification result
interface ClassificationResult {
  label: string;
  value: number;
}

// N-gram statistics
interface NgramStatistics {
  ngrams: string[][];
  frequencies: {[key: string]: number};
  Nr: {[key: string]: number};
  numberOfNgrams: number;
}

// TF-IDF term
interface TfIdfTerm {
  term: string;
  tfidf: number;
}

// WordNet result
interface WordNetResult {
  synsetOffset: number;
  pos: string;
  gloss: string;
  synonyms: string[];
}
```