tessl/npm-natural

Comprehensive natural language processing library with tokenization, stemming, classification, sentiment analysis, phonetics, distance algorithms, and WordNet integration.

Workspace: tessl
Visibility: Public
Describes: npmpkg:npm/natural@8.1.x

To install, run

npx @tessl/cli install tessl/npm-natural@8.1.0

# Natural

Natural is a comprehensive natural language processing library for Node.js that provides tokenization, stemming, classification, sentiment analysis, phonetics, string distance metrics, n-grams, TF-IDF calculations, and WordNet integration. It supports multiple languages and offers both functional and object-oriented APIs for text processing tasks.

## Package Information

- **Package Name**: natural
- **Package Type**: npm
- **Language**: JavaScript (with TypeScript definitions)
- **Installation**: `npm install natural`

## Core Imports

```javascript
const natural = require('natural');
// All functionality is available on the natural object
const { BayesClassifier, PorterStemmer, WordTokenizer } = natural;
```

For ES modules (when supported by your environment):

```javascript
import natural from 'natural';
// Named imports may not work in all environments;
// access members through the default import, e.g. natural.BayesClassifier
```

**Note**: Natural.js primarily uses CommonJS exports. ES6 named imports may not work in all environments. For best compatibility, use the default import and access methods via the natural object.

## Basic Usage

```javascript
const natural = require('natural');

// Text classification
const classifier = new natural.BayesClassifier();
classifier.addDocument('I love this movie', 'positive');
classifier.addDocument('This movie is terrible', 'negative');
classifier.train();

// The input shares the word "love" with the positive training example
const sentiment = classifier.classify('I love it'); // 'positive'

// Text processing (WordTokenizer must be instantiated before use)
const tokenizer = new natural.WordTokenizer();
const tokens = tokenizer.tokenize('Hello world, how are you?');
// ['Hello', 'world', 'how', 'are', 'you']

const stemmed = natural.PorterStemmer.stem('running');
// 'run'

// Distance calculation (Jaro-Winkler is a similarity score between 0 and 1)
const distance = natural.JaroWinklerDistance('sitting', 'kitten');
// ≈ 0.746

// N-grams
const bigrams = natural.NGrams.bigrams('Hello world how are you');
// [['Hello', 'world'], ['world', 'how'], ['how', 'are'], ['are', 'you']]
```

## Architecture

Natural.js is organized into specialized modules that work independently or together:

- **Text Processing**: Tokenization, stemming, and normalization for preparing text data
- **Classification**: Machine learning algorithms for text categorization and prediction
- **Analysis**: Sentiment analysis, n-grams, and TF-IDF for text analytics
- **Similarity**: String distance algorithms for text comparison and fuzzy matching
- **Linguistic**: Part-of-speech tagging, phonetics, and WordNet integration
- **Utilities**: Supporting data structures and helper functions

## Capabilities

### Text Classification

Machine learning classifiers for categorizing text into predefined classes. Includes Naive Bayes, Logistic Regression, and Maximum Entropy classifiers with training, persistence, and evaluation capabilities.

```javascript { .api }
class BayesClassifier {
  constructor(stemmer?: object, smoothing?: number);
  addDocument(text: string, classification: string): void;
  train(): void;
  classify(observation: string): string;
  getClassifications(observation: string): Array<{label: string, value: number}>;
}

class LogisticRegressionClassifier {
  constructor(stemmer?: object);
  addDocument(text: string, classification: string): void;
  train(): void;
  classify(observation: string): string;
}
```

[Text Classification](./classification.md)
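
To make the `addDocument` / `train` / `classify` flow above more concrete, here is a minimal, dependency-free sketch of a multinomial Naive Bayes classifier with add-one smoothing. The `TinyBayes` class, its tokenizer, and its log-probability scores are assumptions made for illustration; natural's `BayesClassifier` stems its input and may score classes differently.

```javascript
// Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing.
// TinyBayes is a hypothetical name; this is not natural's implementation.
class TinyBayes {
  constructor() {
    this.docCounts = new Map();  // label -> number of training documents
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.vocabulary = new Set(); // every word seen during training
    this.totalDocs = 0;
  }

  // Lowercase and split on non-letters; a stand-in for real tokenizing/stemming.
  static tokenize(text) {
    return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  }

  addDocument(text, label) {
    if (!this.wordCounts.has(label)) {
      this.wordCounts.set(label, new Map());
      this.docCounts.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs += 1;
    const counts = this.wordCounts.get(label);
    for (const word of TinyBayes.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.vocabulary.add(word);
    }
  }

  // Returns [{ label, value }] sorted by descending log-probability.
  getClassifications(text) {
    const words = TinyBayes.tokenize(text);
    const results = [];
    for (const [label, counts] of this.wordCounts) {
      let total = 0;
      for (const c of counts.values()) total += c;
      // Start from the log prior, then add one smoothed log likelihood per word.
      let logProb = Math.log(this.docCounts.get(label) / this.totalDocs);
      for (const word of words) {
        const count = counts.get(word) || 0;
        logProb += Math.log((count + 1) / (total + this.vocabulary.size));
      }
      results.push({ label, value: logProb });
    }
    return results.sort((a, b) => b.value - a.value);
  }

  classify(text) {
    return this.getClassifications(text)[0].label;
  }
}

const tiny = new TinyBayes();
tiny.addDocument('I love this movie', 'positive');
tiny.addDocument('This movie is terrible', 'negative');
console.log(tiny.classify('I love it')); // 'positive'
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together, and the smoothing term keeps unseen words from driving a class probability to zero.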

### Text Processing

Comprehensive text preprocessing tools including tokenization, stemming, and normalization for multiple languages. Essential for preparing raw text for analysis.

```javascript { .api }
// Tokenizers (instantiate before calling tokenize)
class WordTokenizer {
  constructor(options?: object);
  tokenize(text: string): string[];
}

class AggressiveTokenizer {
  constructor(options?: object);
  tokenize(text: string): string[];
}

// Stemmers
class PorterStemmer {
  static stem(word: string): string;
}

class LancasterStemmer {
  static stem(word: string): string;
}

// Normalizers
function normalize(tokens: string[]): string[];
function removeDiacritics(text: string): string;
```

[Text Processing](./text-processing.md)
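
As a rough illustration of what tokenization and diacritic removal do, the following sketch uses only built-in JavaScript. The helper names `tokenizeWords` and `stripDiacritics` are hypothetical, and natural's tokenizers and normalizers handle many more cases (languages, contractions, punctuation rules).

```javascript
// Split on runs of characters that are not ASCII letters, digits, or underscores.
function tokenizeWords(text) {
  return text.split(/[^A-Za-z0-9_]+/).filter(Boolean);
}

// Decompose accented characters (NFD), then drop the combining marks.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(tokenizeWords('Hello world, how are you?'));
// [ 'Hello', 'world', 'how', 'are', 'you' ]
console.log(stripDiacritics('café naïve'));
// 'cafe naive'
```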

### String Distance Algorithms

Algorithms for calculating similarity between strings, useful for fuzzy matching, spell checking, and text comparison tasks.

```javascript { .api }
function JaroWinklerDistance(s1: string, s2: string): number;
function LevenshteinDistance(s1: string, s2: string): number;
function DamerauLevenshteinDistance(s1: string, s2: string): number;
function DiceCoefficient(s1: string, s2: string): number;
function HammingDistance(s1: string, s2: string): number;
```

[Distance Algorithms](./distance.md)
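
For readers unfamiliar with edit distance, this is a minimal sketch of the classic Levenshtein dynamic program with unit costs for insertion, deletion, and substitution. The function name `levenshtein` is an assumption for illustration; it is not natural's `LevenshteinDistance`, which may accept additional options.

```javascript
// Classic Levenshtein distance, keeping only two rows of the DP table.
function levenshtein(a, b) {
  // prev[j] = distance between the first i-1 characters of a and the first j of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // deleting i characters turns a[0..i) into ''
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein('kitten', 'sitting')); // 3
```

Note the difference in scale between metrics: edit distances such as Levenshtein and Hamming count operations (lower means more similar), while Jaro-Winkler and the Dice coefficient are similarity scores between 0 and 1 (higher means more similar).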

### Sentiment Analysis

Multi-language sentiment analysis using various lexicons and methodologies for determining emotional tone in text.

```javascript { .api }
class SentimentAnalyzer {
  // stemmer may be undefined, but type is required, so stemmer cannot be an optional parameter
  constructor(language: string, stemmer: object | undefined, type: string);
  getSentiment(words: string[]): number;
}
```

[Sentiment Analysis](./sentiment.md)
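
Lexicon-based sentiment scoring can be sketched in a few lines: look up each word's polarity and average over the input. The lexicon below and the `averageSentiment` helper are hypothetical and only illustrate the idea; real lexicons contain thousands of scored words, and natural's analyzer may also handle stemming and negation.

```javascript
// Hypothetical mini-lexicon with made-up polarity scores.
const LEXICON = new Map([
  ['love', 3], ['great', 3], ['good', 2],
  ['bad', -2], ['terrible', -3], ['hate', -3],
]);

// Average polarity over all words; words missing from the lexicon count as 0.
function averageSentiment(words, lexicon = LEXICON) {
  if (words.length === 0) return 0;
  let sum = 0;
  for (const word of words) {
    sum += lexicon.get(word.toLowerCase()) ?? 0;
  }
  return sum / words.length;
}

console.log(averageSentiment(['I', 'love', 'this', 'great', 'movie'])); // 1.2
console.log(averageSentiment(['terrible']));                            // -3
```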

### N-grams and TF-IDF

Statistical text analysis tools for creating n-grams and calculating term frequency-inverse document frequency scores.

```javascript { .api }
// N-grams
function ngrams(sequence: string | string[], n: number, startSymbol?: string, endSymbol?: string): string[][];
function bigrams(sequence: string | string[]): string[][];
function trigrams(sequence: string | string[]): string[][];

// TF-IDF
class TfIdf {
  constructor();
  addDocument(document: string | string[], key?: string): void;
  tfidf(terms: string, documentIndex: number): number;
  listTerms(documentIndex: number): Array<{term: string, tfidf: number}>;
}
```

[N-grams and TF-IDF](./ngrams-tfidf.md)
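
The two ideas can be sketched without any library. The code below builds n-grams with a sliding window and computes a textbook TF-IDF score (raw term count times `ln(N / df)`). Both functions are illustrative assumptions; libraries, natural included, often use smoothed IDF variants, so absolute values can differ.

```javascript
// Sliding window of size n over an array of tokens.
function ngrams(tokens, n) {
  const result = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

// Textbook TF-IDF: tf = raw count in the document, idf = ln(N / df).
function tfidf(term, docIndex, docs) {
  const tf = docs[docIndex].filter((t) => t === term).length;
  const df = docs.filter((d) => d.includes(term)).length;
  return df === 0 ? 0 : tf * Math.log(docs.length / df);
}

const docs = [
  ['node', 'is', 'fast'],
  ['ruby', 'is', 'fun'],
];

console.log(ngrams(docs[0], 2));     // [ [ 'node', 'is' ], [ 'is', 'fast' ] ]
console.log(tfidf('is', 0, docs));   // 0, because "is" appears in every document
console.log(tfidf('node', 0, docs)); // Math.log(2)
```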

### Part-of-Speech Tagging

Brill tagger implementation for assigning grammatical parts of speech to words in sentences.

```javascript { .api }
class BrillPOSTagger {
  constructor(lexicon: object, ruleSet: object);
  tag(sentence: string[]): object;
}

class Lexicon {
  constructor();
  addTaggedWord(word: string, tag: string): void;
}
```

[Part-of-Speech Tagging](./pos-tagging.md)

### WordNet Integration

Interface to WordNet lexical database for accessing word definitions, synonyms, and semantic relationships.

```javascript { .api }
class WordNet {
  constructor(dataDir?: string);
  lookup(word: string, callback: (results: object[]) => void): void;
  get(synsetOffset: number, pos: string, callback: (result: object) => void): void;
}
```

[WordNet](./wordnet.md)

### Phonetic Algorithms

Phonetic encoding algorithms for matching words by sound rather than spelling.

```javascript { .api }
class SoundEx {
  static process(word: string): string;
}

class Metaphone {
  static process(word: string): string;
}

class DoubleMetaphone {
  static process(word: string): string[];
}
```

[Phonetics](./phonetics.md)
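
To show what "matching by sound" means in practice, here is a minimal sketch of the standard American Soundex encoding (first letter plus three digits). The `soundex` function is an illustrative assumption and may differ in edge cases from natural's `SoundEx`.

```javascript
// Consonant groups of American Soundex; vowels, y, h, and w carry no digit.
const SOUNDEX_CODES = new Map([
  ...[...'bfpv'].map((c) => [c, 1]),
  ...[...'cgjkqsxz'].map((c) => [c, 2]),
  ...[...'dt'].map((c) => [c, 3]),
  ['l', 4],
  ...[...'mn'].map((c) => [c, 5]),
  ['r', 6],
]);

function soundex(word) {
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (letters.length === 0) return '';
  let code = letters[0].toUpperCase();
  // Digit of the most recent letter (0 for vowels), used to collapse repeats.
  let previous = SOUNDEX_CODES.get(letters[0]) ?? 0;
  for (let i = 1; i < letters.length && code.length < 4; i++) {
    const ch = letters[i];
    if (ch === 'h' || ch === 'w') continue; // h and w do not separate equal codes
    const digit = SOUNDEX_CODES.get(ch) ?? 0;
    if (digit !== 0 && digit !== previous) code += digit;
    previous = digit; // a vowel resets this, so a repeated code after a vowel is kept
  }
  return code.padEnd(4, '0');
}

console.log(soundex('Robert'));  // 'R163'
console.log(soundex('Rupert'));  // 'R163'
console.log(soundex('Tymczak')); // 'T522'
```

Because "Robert" and "Rupert" share a code, they can be treated as candidate matches even though their spellings differ.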

### Text Transliteration

Japanese text transliteration functionality for converting Hiragana and Katakana to romanized text using the modified Hepburn system.

```javascript { .api }
class TransliterateJa {
  static transliterate(text: string): string;
}
```

[Transliteration](./transliterators.md)

### Utilities

Supporting data structures and utility functions including tries, graph algorithms, storage backends, and spell checking functionality.

```javascript { .api }
class Trie {
  constructor();
  addString(string: string): void;
  contains(string: string): boolean;
  findPrefix(prefix: string): string[];
}

class ShortestPathTree {
  constructor(graph: EdgeWeightedDigraph, source: number);
  distTo(vertex: number): number;
  hasPathTo(vertex: number): boolean;
}

class Spellcheck {
  // Built from a list of known-correct words
  constructor(wordlist: string[]);
  isCorrect(word: string): boolean;
  getCorrections(word: string, maxDistance?: number): string[];
}
```

[Utilities](./utilities.md)
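
A trie stores strings character by character so that membership and prefix queries take time proportional to the query length. The sketch below is a minimal stand-alone version; `SimpleTrie` and its `keysWithPrefix` helper are hypothetical names used for illustration, not natural's `Trie`.

```javascript
// Minimal trie: each node has a map of children and an end-of-word flag.
class SimpleTrie {
  constructor() {
    this.root = { children: new Map(), isEnd: false };
  }

  addString(str) {
    let node = this.root;
    for (const ch of str) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isEnd: false });
      }
      node = node.children.get(ch);
    }
    node.isEnd = true;
  }

  // True only if the exact string was added (not merely a prefix of one).
  contains(str) {
    let node = this.root;
    for (const ch of str) {
      node = node.children.get(ch);
      if (!node) return false;
    }
    return node.isEnd;
  }

  // All stored strings that start with the given prefix.
  keysWithPrefix(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return [];
    }
    const found = [];
    const walk = (current, path) => {
      if (current.isEnd) found.push(path);
      for (const [ch, child] of current.children) walk(child, path + ch);
    };
    walk(node, prefix);
    return found;
  }
}

const trie = new SimpleTrie();
['test', 'tester', 'team'].forEach((word) => trie.addString(word));
console.log(trie.contains('test'));      // true
console.log(trie.contains('tes'));       // false
console.log(trie.keysWithPrefix('tes')); // [ 'test', 'tester' ]
```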

## Types

```javascript { .api }
// Classification result
interface ClassificationResult {
  label: string;
  value: number;
}

// N-gram statistics
interface NgramStatistics {
  ngrams: string[][];
  frequencies: {[key: string]: number};
  Nr: {[key: string]: number};
  numberOfNgrams: number;
}

// TF-IDF term
interface TfIdfTerm {
  term: string;
  tfidf: number;
}

// WordNet result
interface WordNetResult {
  synsetOffset: number;
  pos: string;
  gloss: string;
  synonyms: string[];
}
```