# Natural

Natural is a comprehensive natural language processing library for Node.js that provides tokenization, stemming, classification, sentiment analysis, phonetics, string distance metrics, n-grams, TF-IDF calculations, and WordNet integration. It supports multiple languages and offers both functional and object-oriented APIs for text processing tasks.

## Package Information

- **Package Name**: natural
- **Package Type**: npm
- **Language**: JavaScript (with TypeScript definitions)
- **Installation**: `npm install natural`

## Core Imports

```javascript
const natural = require('natural');
// All functionality is available on the natural object
const { BayesClassifier, PorterStemmer, WordTokenizer } = natural;
```

For ES modules (when supported by your environment):

```javascript
import natural from 'natural';
// Note: Named imports may not work in all environments
// Use: import natural from 'natural'; then access natural.BayesClassifier
```

**Note**: Natural.js primarily uses CommonJS exports. ES6 named imports may not work in all environments. For best compatibility, use the default import and access methods via the natural object.

## Basic Usage

```javascript
const natural = require('natural');

// Text classification
const classifier = new natural.BayesClassifier();
classifier.addDocument('I love this movie', 'positive');
classifier.addDocument('This movie is terrible', 'negative');
classifier.train();

// Returns one of the trained labels, e.g. 'positive'
const sentiment = classifier.classify('This is amazing');

// Text processing
const tokenizer = new natural.WordTokenizer(); // tokenize() is an instance method
const tokens = tokenizer.tokenize('Hello world, how are you?');
// ['Hello', 'world', 'how', 'are', 'you']

const stemmed = natural.PorterStemmer.stem('running');
// 'run'

// Distance calculation
const distance = natural.JaroWinklerDistance('sitting', 'kitten');
// approximately 0.746

// N-grams
const bigrams = natural.NGrams.bigrams('Hello world how are you');
// [['Hello', 'world'], ['world', 'how'], ['how', 'are'], ['are', 'you']]
```

## Architecture

Natural.js is organized into specialized modules that work independently or together:

- **Text Processing**: Tokenization, stemming, and normalization for preparing text data
- **Classification**: Machine learning algorithms for text categorization and prediction
- **Analysis**: Sentiment analysis, n-grams, and TF-IDF for text analytics
- **Similarity**: String distance algorithms for text comparison and fuzzy matching
- **Linguistic**: Part-of-speech tagging, phonetics, and WordNet integration
- **Utilities**: Supporting data structures and helper functions

## Capabilities

### Text Classification

Machine learning classifiers for categorizing text into predefined classes. Includes Naive Bayes, Logistic Regression, and Maximum Entropy classifiers with training, persistence, and evaluation capabilities.

```javascript { .api }
class BayesClassifier {
  constructor(stemmer?: object, smoothing?: number);
  addDocument(text: string, classification: string): void;
  train(): void;
  classify(observation: string): string;
  getClassifications(observation: string): Array<{label: string, value: number}>;
}

class LogisticRegressionClassifier {
  constructor(stemmer?: object);
  addDocument(text: string, classification: string): void;
  train(): void;
  classify(observation: string): string;
}
```

[Text Classification](./classification.md)
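
To make the idea behind a Naive Bayes text classifier concrete, here is a minimal from-scratch sketch. It is not natural's implementation: the `TinyBayes` class, its lowercase regex tokenizer, and the add-one smoothing are assumptions chosen for illustration, and its scores are log-probabilities rather than whatever the library reports. It also has no separate `train()` step, because counts are updated as documents are added.

```javascript
// Illustrative multinomial Naive Bayes with add-one smoothing.
// TinyBayes is a hypothetical name; this is not natural's BayesClassifier.
class TinyBayes {
  constructor() {
    this.docCounts = {};   // label -> number of training documents
    this.wordCounts = {};  // label -> { word -> occurrence count }
    this.vocab = new Set();
    this.totalDocs = 0;
  }

  // Lowercase and split on anything that is not a letter or digit.
  tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  }

  addDocument(text, label) {
    this.docCounts[label] = (this.docCounts[label] || 0) + 1;
    this.wordCounts[label] = this.wordCounts[label] || {};
    this.totalDocs++;
    for (const w of this.tokenize(text)) {
      this.wordCounts[label][w] = (this.wordCounts[label][w] || 0) + 1;
      this.vocab.add(w);
    }
  }

  // Returns [{ label, value }] sorted by descending log-probability.
  getClassifications(text) {
    const words = this.tokenize(text);
    return Object.keys(this.docCounts)
      .map((label) => {
        const counts = this.wordCounts[label];
        const total = Object.values(counts).reduce((a, b) => a + b, 0);
        let value = Math.log(this.docCounts[label] / this.totalDocs); // class prior
        for (const w of words) {
          // Add-one smoothing keeps unseen words from zeroing the score.
          value += Math.log(((counts[w] || 0) + 1) / (total + this.vocab.size));
        }
        return { label, value };
      })
      .sort((a, b) => b.value - a.value);
  }

  classify(text) {
    return this.getClassifications(text)[0].label;
  }
}

const nb = new TinyBayes();
nb.addDocument('I love this movie', 'positive');
nb.addDocument('This movie is terrible', 'negative');
console.log(nb.classify('love it')); // 'positive'
```

With only two training documents the result hinges on a single shared word, which is why real classifiers need far more labeled data.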
93
94
### Text Processing
95
96
Comprehensive text preprocessing tools including tokenization, stemming, and normalization for multiple languages. Essential for preparing raw text for analysis.
97
98
```javascript { .api }
99
// Tokenizers
100
class WordTokenizer {
101
static tokenize(text: string): string[];
102
}
103
104
class AggressiveTokenizer {
105
constructor(options?: object);
106
tokenize(text: string): string[];
107
}
108
109
// Stemmers
110
class PorterStemmer {
111
static stem(word: string): string;
112
}
113
114
class LancasterStemmer {
115
static stem(word: string): string;
116
}
117
118
// Normalizers
119
function normalize(tokens: string[]): string[];
120
function removeDiacritics(text: string): string;
121
```
122
123
[Text Processing](./text-processing.md)
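
As a rough picture of what tokenization and diacritic removal do, the sketch below uses only built-in JavaScript string methods. The helper names `stripDiacritics` and `tokenizeWords` are hypothetical, and the library's tokenizers and normalizers may split or normalize text differently.

```javascript
// Hypothetical helpers for illustration; not part of natural.

// Decompose accented characters (NFD), then drop the combining marks.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Split on runs of non-word characters and discard empty strings.
function tokenizeWords(text) {
  return text.split(/\W+/).filter(Boolean);
}

const cleaned = stripDiacritics('Cr\u00e8me br\u00fbl\u00e9e, please!');
console.log(cleaned);                // 'Creme brulee, please!'
console.log(tokenizeWords(cleaned)); // ['Creme', 'brulee', 'please']
```

Stripping diacritics before splitting matters here because `\W` treats accented letters as non-word characters, so the order of the two steps changes the tokens.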

### String Distance Algorithms

Algorithms for calculating similarity between strings, useful for fuzzy matching, spell checking, and text comparison tasks.

```javascript { .api }
function JaroWinklerDistance(s1: string, s2: string): number;
function LevenshteinDistance(s1: string, s2: string): number;
function DamerauLevenshteinDistance(s1: string, s2: string): number;
function DiceCoefficient(s1: string, s2: string): number;
function HammingDistance(s1: string, s2: string): number;
```

[Distance Algorithms](./distance.md)
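
To show what an edit distance measures, here is a from-scratch sketch of the classic dynamic-programming Levenshtein algorithm with unit costs. The function name `levenshtein` is hypothetical and this is not the library's code; the library's `LevenshteinDistance` may support options this sketch ignores.

```javascript
// Illustrative sketch: minimum number of single-character insertions,
// deletions, and substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 characters of a and the first j of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters into the empty string costs i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('kitten', 'sitting')); // 3
```

Keeping only two rows of the table bounds memory by the length of the second string, which is a common choice when only the distance is needed and not the edit script.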

### Sentiment Analysis

Multi-language sentiment analysis using various lexicons and methodologies for determining emotional tone in text.

```javascript { .api }
class SentimentAnalyzer {
  // Pass null for stemmer to skip stemming; type selects the lexicon
  constructor(language: string, stemmer: object | null, type: string);
  getSentiment(words: string[]): number;
}
```

[Sentiment Analysis](./sentiment.md)
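
A common lexicon-based approach scores each word and averages over the input. The sketch below illustrates that idea with a tiny made-up lexicon. The `LEXICON` values and the `averageSentiment` helper are assumptions for illustration and are not taken from any real lexicon or from the library's code.

```javascript
// Hypothetical mini-lexicon; real lexicons contain thousands of scored words.
const LEXICON = new Map([
  ['love', 3],
  ['great', 3],
  ['good', 2],
  ['bad', -2],
  ['terrible', -3],
]);

// Sum the scores of known words and divide by the total number of words.
// Unknown words contribute 0, so they dilute the score.
function averageSentiment(words) {
  if (words.length === 0) return 0;
  const total = words.reduce(
    (sum, w) => sum + (LEXICON.get(w.toLowerCase()) ?? 0),
    0
  );
  return total / words.length;
}

console.log(averageSentiment(['I', 'love', 'good', 'food'])); // 1.25
```

A score above zero suggests positive tone and a score below zero negative tone; the input is an array of tokens, matching the `getSentiment(words: string[])` signature above.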
151
152
### N-grams and TF-IDF
153
154
Statistical text analysis tools for creating n-grams and calculating term frequency-inverse document frequency scores.
155
156
```javascript { .api }
157
// N-grams
158
function ngrams(sequence: string | string[], n: number, startSymbol?: string, endSymbol?: string): string[][];
159
function bigrams(sequence: string | string[]): string[][];
160
function trigrams(sequence: string | string[]): string[][];
161
162
// TF-IDF
163
class TfIdf {
164
constructor();
165
addDocument(document: string | string[], key?: string): void;
166
tfidf(terms: string, documentIndex: number): number;
167
listTerms(documentIndex: number): Array<{term: string, tfidf: number}>;
168
}
169
```
170
171
[N-grams and TF-IDF](./ngrams-tfidf.md)
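
The sketch below illustrates both ideas on pre-tokenized input: a sliding window for n-grams, and the textbook TF-IDF weight (raw term count times the log of total documents over documents containing the term). The helper names and the sample `docs` are hypothetical, and the library's exact TF-IDF weighting may differ from this textbook formula.

```javascript
// Hypothetical helpers for illustration; not natural's implementation.

// Slide a window of size n across the token array.
function ngramsOf(tokens, n) {
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n));
  }
  return out;
}

// Textbook TF-IDF: tf * ln(N / df), or 0 when no document contains the term.
function tfidfOf(term, docIndex, docs) {
  const tf = docs[docIndex].filter((t) => t === term).length;
  const df = docs.filter((d) => d.includes(term)).length;
  return df === 0 ? 0 : tf * Math.log(docs.length / df);
}

const docs = [
  ['node', 'is', 'fast'],
  ['ruby', 'is', 'fun'],
];

console.log(ngramsOf(docs[0], 2));       // [['node', 'is'], ['is', 'fast']]
console.log(tfidfOf('is', 0, docs));     // 0, because 'is' appears in every document
console.log(tfidfOf('node', 0, docs));   // Math.log(2), about 0.693
```

A term that occurs in every document gets weight zero under this formula, which is the property that makes TF-IDF useful for surfacing distinctive terms.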
172
173
### Part-of-Speech Tagging
174
175
Brill tagger implementation for assigning grammatical parts of speech to words in sentences.
176
177
```javascript { .api }
178
class BrillPOSTagger {
179
constructor(lexicon: object, ruleSet: object);
180
tag(sentence: string[]): object;
181
}
182
183
class Lexicon {
184
constructor();
185
addTaggedWord(word: string, tag: string): void;
186
}
187
```
188
189
[Part-of-Speech Tagging](./pos-tagging.md)
190
191
### WordNet Integration
192
193
Interface to WordNet lexical database for accessing word definitions, synonyms, and semantic relationships.
194
195
```javascript { .api }
196
class WordNet {
197
constructor(dataDir?: string);
198
lookup(word: string, callback: (results: object[]) => void): void;
199
get(synsetOffset: number, pos: string, callback: (result: object) => void): void;
200
}
201
```
202
203
[WordNet](./wordnet.md)
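
Because the `lookup` callback shown above receives only the results (there is no error-first argument), Node's `util.promisify` does not fit it directly. The sketch below wraps such a callback in a Promise by hand. `lookupAsync` and `fakeWordNet` are hypothetical names; the stand-in object only mimics the `lookup` shape so the example runs without WordNet data installed.

```javascript
// Wrap a results-only callback API in a Promise.
// Works with any object exposing lookup(word, callback).
function lookupAsync(wordnet, word) {
  return new Promise((resolve) => wordnet.lookup(word, resolve));
}

// Stand-in with the same lookup shape; NOT the real WordNet class.
const fakeWordNet = {
  lookup(word, callback) {
    setTimeout(
      () =>
        callback([
          { synsetOffset: 1, pos: 'n', gloss: `stub gloss for ${word}`, synonyms: [word] },
        ]),
      0
    );
  },
};

lookupAsync(fakeWordNet, 'node').then((results) => {
  console.log(results[0].gloss); // 'stub gloss for node'
});
```

With a real instance you would pass it in place of `fakeWordNet`; the wrapper itself makes no assumption beyond the callback signature.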

### Phonetic Algorithms

Phonetic encoding algorithms for matching words by sound rather than spelling.

```javascript { .api }
class SoundEx {
  static process(word: string): string;
}

class Metaphone {
  static process(word: string): string;
}

class DoubleMetaphone {
  static process(word: string): string[];
}
```

[Phonetics](./phonetics.md)
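
To illustrate how a phonetic code lets differently spelled words match, here is a from-scratch sketch of the standard American Soundex rules. The `soundex` function and `CODES` table are hypothetical names, this is not the library's implementation, and the library's `SoundEx` output may differ on edge cases.

```javascript
// Consonant groups of American Soundex; vowels, h, w, and y have no code.
const CODES = {
  b: 1, f: 1, p: 1, v: 1,
  c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
  d: 3, t: 3,
  l: 4,
  m: 5, n: 5,
  r: 6,
};

// Illustrative sketch: first letter plus up to three digits, zero-padded.
function soundex(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let result = letters[0].toUpperCase();
  let lastCode = CODES[letters[0]] || 0;
  for (const ch of letters.slice(1)) {
    if (result.length === 4) break;
    if (ch === 'h' || ch === 'w') continue; // h and w do not separate equal codes
    const code = CODES[ch] || 0;            // vowels map to 0 and do separate them
    if (code !== 0 && code !== lastCode) result += code;
    lastCode = code;
  }
  return result.padEnd(4, '0');
}

console.log(soundex('Robert')); // 'R163'
console.log(soundex('Rupert')); // 'R163'
```

Both names collapse to the same code, which is how a phonetic index can retrieve one spelling when the other is searched.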

### Text Transliteration

Japanese text transliteration functionality for converting Hiragana and Katakana to romanized text using the modified Hepburn system.

```javascript { .api }
class TransliterateJa {
  static transliterate(text: string): string;
}
```

[Transliteration](./transliterators.md)

### Utilities

Supporting data structures and utility functions including tries, graph algorithms, storage backends, and spell checking functionality.

```javascript { .api }
class Trie {
  constructor();
  addString(string: string): void;
  contains(string: string): boolean;
  findPrefix(prefix: string): string[];
}

class ShortestPathTree {
  constructor(graph: EdgeWeightedDigraph, source: number);
  distTo(vertex: number): number;
  hasPathTo(vertex: number): boolean;
}

class Spellcheck {
  // The checker is built from a list of known-correct words
  constructor(wordlist: string[]);
  isCorrect(word: string): boolean;
  getCorrections(word: string, maxDistance?: number): string[];
}
```

[Utilities](./utilities.md)
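
For readers unfamiliar with tries, the sketch below shows the data structure behind methods such as `addString` and `contains`: each node maps a character to a child node, and a flag marks where a stored string ends. `SimpleTrie` is a hypothetical class written for illustration and is not the library's `Trie`.

```javascript
// Minimal trie sketch; not natural's implementation.
class SimpleTrie {
  constructor() {
    this.root = { children: new Map(), isWord: false };
  }

  // Walk or create one node per character, then mark the end of the string.
  addString(str) {
    let node = this.root;
    for (const ch of str) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false });
      }
      node = node.children.get(ch);
    }
    node.isWord = true;
  }

  // True only if the exact string was added, not merely a prefix of one.
  contains(str) {
    let node = this.root;
    for (const ch of str) {
      node = node.children.get(ch);
      if (!node) return false;
    }
    return node.isWord;
  }
}

const trie = new SimpleTrie();
trie.addString('car');
trie.addString('cart');
console.log(trie.contains('car')); // true
console.log(trie.contains('ca'));  // false
```

Lookup cost depends on the length of the query string rather than on how many strings are stored, which is the main reason to prefer a trie for prefix-heavy workloads.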

## Types

```javascript { .api }
// Classification result
interface ClassificationResult {
  label: string;
  value: number;
}

// N-gram statistics
interface NgramStatistics {
  ngrams: string[][];
  frequencies: {[key: string]: number};
  Nr: {[key: string]: number};
  numberOfNgrams: number;
}

// TF-IDF term
interface TfIdfTerm {
  term: string;
  tfidf: number;
}

// WordNet result
interface WordNetResult {
  synsetOffset: number;
  pos: string;
  gloss: string;
  synonyms: string[];
}
```