# Part-of-Speech Tagging

A Brill tagger implementation that assigns grammatical parts of speech to the words of a sentence using transformation-based learning. Tagging happens in two stages. First, each word receives its most likely tag from a lexicon. Then contextual transformation rules correct the tags that the lexicon lookup got wrong.

## Capabilities

### Brill POS Tagger

The main part-of-speech tagger, built on Eric Brill's transformation-based approach.

```javascript { .api }
/**
 * Brill transformation-based POS tagger
 * @param lexicon - Lexicon instance for initial word tagging
 * @param ruleSet - RuleSet instance containing transformation rules
 */
class BrillPOSTagger {
  constructor(lexicon: Lexicon, ruleSet: RuleSet);

  /**
   * Tag a sentence with part-of-speech tags
   * @param sentence - Array of word strings
   * @returns Sentence object with tagged words
   */
  tag(sentence: string[]): Sentence;

  /**
   * Apply initial lexicon-based tagging
   * @param sentence - Array of words
   * @returns Initially tagged sentence
   */
  tagWithLexicon(sentence: string[]): Sentence;

  /**
   * Apply transformation rules to improve tagging
   * @param taggedSentence - Sentence with initial tags
   * @returns Sentence with improved tags
   */
  applyRules(taggedSentence: Sentence): Sentence;
}
```
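
You can illustrate the two stages that `tag()` combines without the library. The sketch below is a minimal, self-contained illustration, not natural's internal code. A plain object stands in for `Lexicon`, and unknown words fall back to an assumed default tag of `NN`. Each rule is a hypothetical `{from, to, prevTag}` object that rewrites `from` to `to` when the preceding word carries `prevTag`. The functions are hypothetical stand-ins named after the methods above. The example reproduces the classic case in which *race*, usually a noun, is retagged as a verb after *to*.

```javascript
// Minimal sketch of Brill-style tagging: lexicon lookup, then contextual rules.
// Plain objects stand in for the Lexicon and RuleSet classes (hypothetical shapes).

// Stage 1: give each word its most likely tag; unknown words get a default tag.
function tagWithLexicon(words, lexicon, defaultTag = 'NN') {
  return words.map(word => ({
    word,
    tag: lexicon[word.toLowerCase()] ?? defaultTag,
  }));
}

// Stage 2: apply each rule, in order, left to right across the sentence.
// A rule changes `from` to `to` when the previous word is tagged `prevTag`.
// Tags are updated in place, so within one rule a change at position i
// is visible when position i + 1 is checked.
function applyRules(tagged, rules) {
  const result = tagged.map(tw => ({ ...tw }));
  for (const rule of rules) {
    for (let i = 1; i < result.length; i++) {
      if (result[i].tag === rule.from && result[i - 1].tag === rule.prevTag) {
        result[i].tag = rule.to;
      }
    }
  }
  return result;
}

function tag(words, lexicon, rules) {
  return applyRules(tagWithLexicon(words, lexicon), rules);
}

// "race" is most often a noun, but after "to" it is usually a base-form verb.
const lexicon = { i: 'PRP', want: 'VBP', to: 'TO', race: 'NN', the: 'DT' };
const rules = [{ from: 'NN', to: 'VB', prevTag: 'TO' }];

const format = tagged => tagged.map(t => `${t.word}/${t.tag}`).join(' ');
console.log(format(tag(['I', 'want', 'to', 'race'], lexicon, rules)));
console.log(format(tag(['the', 'race'], lexicon, rules)));
```

Running this prints `I/PRP want/VBP to/TO race/VB` and `the/DT race/NN`. The lexicon alone tags both occurrences of *race* as `NN`. The single contextual rule corrects only the one that follows `TO`.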

### Lexicon

Dictionary mapping words to their most likely part-of-speech tags.

```javascript { .api }
/**
 * POS tagging lexicon
 */
class Lexicon {
  constructor();

  /**
   * Add a word-tag pair to the lexicon
   * @param word - Word to add
   * @param tag - POS tag for the word
   */
  addTaggedWord(word: string, tag: string): void;

  /**
   * Get the most likely tag for a word
   * @param word - Word to look up
   * @returns Most likely POS tag
   */
  tagWord(word: string): string;

  /**
   * Check if word exists in lexicon
   * @param word - Word to check
   * @returns true if word is in lexicon
   */
  hasWord(word: string): boolean;
}
```
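
To return a word's *most likely* tag, the lexicon must remember how often each word was observed with each tag. A minimal sketch of that idea follows. `CountingLexicon` is a hypothetical class that mirrors the `addTaggedWord`/`tagWord`/`hasWord` methods above. Its case-insensitive lookup and its `NN` default for unknown words are assumptions, not documented behavior of natural's `Lexicon`.

```javascript
// Hypothetical lexicon that keeps per-word tag counts and returns the most frequent tag.
// Case-insensitive lookup and the 'NN' default are illustrative assumptions.
class CountingLexicon {
  constructor(defaultTag = 'NN') {
    this.counts = new Map(); // lowercased word -> Map(tag -> count)
    this.defaultTag = defaultTag;
  }

  // Record one observation of `word` carrying `tag`.
  addTaggedWord(word, tag) {
    const key = word.toLowerCase();
    if (!this.counts.has(key)) this.counts.set(key, new Map());
    const tagCounts = this.counts.get(key);
    tagCounts.set(tag, (tagCounts.get(tag) ?? 0) + 1);
  }

  hasWord(word) {
    return this.counts.has(word.toLowerCase());
  }

  // Return the tag seen most often with `word`, or the default tag for unknown words.
  tagWord(word) {
    const tagCounts = this.counts.get(word.toLowerCase());
    if (!tagCounts) return this.defaultTag;
    let bestTag = this.defaultTag;
    let bestCount = 0;
    for (const [tag, count] of tagCounts) {
      if (count > bestCount) {
        bestTag = tag;
        bestCount = count;
      }
    }
    return bestTag;
  }
}

const lexicon = new CountingLexicon();
lexicon.addTaggedWord('run', 'VB');
lexicon.addTaggedWord('run', 'VB');
lexicon.addTaggedWord('run', 'NN');
console.log(lexicon.tagWord('run'));   // VB
console.log(lexicon.hasWord('Run'));   // true (lookup is case-insensitive)
console.log(lexicon.tagWord('zebra')); // NN (default for unknown words)
```

Ties go to the tag recorded first. `Map` iterates in insertion order, and only a strictly higher count replaces the current best.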
75
76
### Rule Set
77
78
Collection of transformation rules for improving POS tag accuracy.
79
80
```javascript { .api }
81
/**
82
* Set of transformation rules for POS tagging
83
*/
84
class RuleSet {
85
constructor();
86
87
/**
88
* Add a transformation rule
89
* @param rule - RuleTemplate instance
90
*/
91
addRule(rule: RuleTemplate): void;
92
93
/**
94
* Apply all rules to a sentence
95
* @param sentence - Tagged sentence to transform
96
* @returns Sentence with applied transformations
97
*/
98
applyRules(sentence: Sentence): Sentence;
99
}
100
101
/**
102
* Individual transformation rule template
103
*/
104
class RuleTemplate {
105
constructor();
106
107
/**
108
* Apply this rule to a sentence
109
* @param sentence - Sentence to apply rule to
110
* @returns Modified sentence
111
*/
112
apply(sentence: Sentence): Sentence;
113
}
114
115
/**
116
* Pre-defined rule templates
117
*/
118
declare const ruleTemplates: {
119
[templateName: string]: RuleTemplate;
120
};
121
```
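
Rule templates describe the *kinds* of context a transformation may test. A concrete rule instantiates a template with specific tags or words. The sketch below defines three illustrative templates (previous tag, next tag, previous word) as factory functions. The template names and the `{from, to, matches}` rule shape are hypothetical and do not describe the contents of `ruleTemplates`. As a further assumption, each rule reads its context from the tags as they stood before that rule ran. One rule's edits therefore cannot trigger the same rule again at a neighboring position.

```javascript
// Hypothetical rule templates. Each factory returns a rule that rewrites tag `from`
// to `to` at position i when its context test `matches(tags, i)` succeeds.
const templates = {
  prevTag: (from, to, tag) => ({
    from, to,
    matches: (s, i) => i > 0 && s[i - 1].tag === tag,
  }),
  nextTag: (from, to, tag) => ({
    from, to,
    matches: (s, i) => i + 1 < s.length && s[i + 1].tag === tag,
  }),
  // `word` is expected in lowercase; the sentence word is lowercased before comparing.
  prevWord: (from, to, word) => ({
    from, to,
    matches: (s, i) => i > 0 && s[i - 1].word.toLowerCase() === word,
  }),
};

// Apply rules in order. `map` builds a new array, so each rule's context
// checks see the tags from before that rule ran; the input is not mutated.
function applyRules(sentence, rules) {
  let current = sentence;
  for (const rule of rules) {
    const before = current;
    current = before.map((tw, i) =>
      tw.tag === rule.from && rule.matches(before, i) ? { ...tw, tag: rule.to } : tw
    );
  }
  return current;
}

// Initial tags as a lexicon might assign them: "kick" as a noun, "can" as a modal.
const sentence = [
  { word: 'I', tag: 'PRP' }, { word: 'want', tag: 'VBP' }, { word: 'to', tag: 'TO' },
  { word: 'kick', tag: 'NN' }, { word: 'the', tag: 'DT' }, { word: 'can', tag: 'MD' },
];
const rules = [
  templates.prevTag('MD', 'NN', 'DT'),  // a modal right after a determiner is a noun
  templates.prevWord('NN', 'VB', 'to'), // a noun right after "to" is a base-form verb
];
const tagged = applyRules(sentence, rules);
console.log(tagged.map(tw => `${tw.word}/${tw.tag}`).join(' '));
// I/PRP want/VBP to/TO kick/VB the/DT can/NN
```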

### Sentence and Corpus

Data structures for representing tagged sentences and training corpora.

```javascript { .api }
/**
 * Represents a sentence with POS tags
 */
class Sentence {
  constructor();

  /**
   * Add a tagged word to the sentence
   * @param word - Word text
   * @param tag - POS tag
   */
  addTaggedWord(word: string, tag: string): void;

  /**
   * Get all tagged words
   * @returns Array of tagged word objects
   */
  getTaggedWords(): TaggedWord[];

  /**
   * Get word at specific position
   * @param index - Position in sentence
   * @returns Tagged word at position
   */
  getWordAt(index: number): TaggedWord;

  /**
   * Get sentence length
   * @returns Number of words in sentence
   */
  length(): number;
}

/**
 * Tagged word representation
 */
interface TaggedWord {
  word: string;
  tag: string;
}

/**
 * Training corpus for POS tagger
 */
class Corpus {
  constructor();

  /**
   * Add a sentence to the corpus
   * @param sentence - Sentence instance
   */
  addSentence(sentence: Sentence): void;

  /**
   * Get all sentences in corpus
   * @returns Array of sentences
   */
  getSentences(): Sentence[];
}
```
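
Tagged training text commonly arrives as plain text, with one sentence per line and each token written as `word/TAG`. The hypothetical helper below turns such text into arrays of `{word, tag}` objects matching `TaggedWord`. The input format is an assumption, not something the library requires. The helper splits each token on its *last* slash, so a token such as `1/2/CD` keeps its internal slash.

```javascript
// Hypothetical parser for "word/TAG" corpus text, one sentence per line.
function parseTaggedLine(line) {
  return line
    .trim()
    .split(/\s+/)
    .filter(token => token.length > 0)
    .map(token => {
      // Split on the last slash so words containing '/' survive intact.
      const slash = token.lastIndexOf('/');
      if (slash <= 0 || slash === token.length - 1) {
        throw new Error(`Malformed token: "${token}"`);
      }
      return { word: token.slice(0, slash), tag: token.slice(slash + 1) };
    });
}

// Parse every non-empty line into one sentence (an array of {word, tag}).
function parseCorpus(text) {
  return text
    .split('\n')
    .filter(line => line.trim().length > 0)
    .map(parseTaggedLine);
}

const corpusText = `
the/DT cat/NN runs/VBZ ./.
he/PRP ate/VBD 1/2/CD of/IN it/PRP
`;
const sentences = parseCorpus(corpusText);
console.log(sentences.length); // 2
console.log(sentences[1][2]);  // { word: '1/2', tag: 'CD' }
```

You can then pass each resulting array word by word to `Sentence.addTaggedWord(word, tag)` before adding the sentence to a `Corpus`.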

**Usage Examples:**

```javascript
const natural = require('natural');

// Create lexicon and add word-tag pairs
const lexicon = new natural.Lexicon();
lexicon.addTaggedWord('the', 'DT');
lexicon.addTaggedWord('cat', 'NN');
lexicon.addTaggedWord('dog', 'NN');
lexicon.addTaggedWord('runs', 'VBZ');
lexicon.addTaggedWord('quickly', 'RB');

// Create rule set with transformation rules
const ruleSet = new natural.RuleSet();
// Add rules to improve tagging accuracy
// (In practice, you would load pre-trained rules)

// Create POS tagger
const tagger = new natural.BrillPOSTagger(lexicon, ruleSet);

// Tag a sentence
const sentence = ['the', 'cat', 'runs', 'quickly'];
const taggedSentence = tagger.tag(sentence);

// Display results as word/TAG pairs
console.log('Tagged sentence:');
taggedSentence.getTaggedWords().forEach(taggedWord => {
  console.log(`${taggedWord.word}/${taggedWord.tag}`);
});
```

### Training Components

**Trainer and tester for custom models:**

```javascript { .api }
/**
 * Trainer for Brill POS tagger
 */
class BrillPOSTrainer {
  constructor();

  /**
   * Train a tagger on a corpus
   * @param corpus - Training corpus
   * @returns Trained tagger components
   */
  train(corpus: Corpus): {lexicon: Lexicon, ruleSet: RuleSet};
}

/**
 * Tester for evaluating tagger performance
 */
class BrillPOSTester {
  constructor();

  /**
   * Test tagger accuracy on test corpus
   * @param tagger - Trained tagger
   * @param testCorpus - Test corpus
   * @returns Accuracy metrics
   */
  test(tagger: BrillPOSTagger, testCorpus: Corpus): TestResults;
}

interface TestResults {
  accuracy: number;
  precision: {[tag: string]: number};
  recall: {[tag: string]: number};
}
```
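
`TestResults` reports overall accuracy plus per-tag precision and recall. The sketch below assumes the standard definitions, since the tester's exact computation is not documented here. Precision for a tag T is the fraction of tokens *predicted* as T whose gold tag is T. Recall is the fraction of tokens whose *gold* tag is T that were predicted as T. The function is a self-contained sketch over parallel arrays of gold and predicted tags.

```javascript
// Accuracy plus per-tag precision and recall from parallel gold/predicted tag arrays.
function evaluate(goldTags, predictedTags) {
  if (goldTags.length !== predictedTags.length) {
    throw new Error('gold and predicted sequences must have the same length');
  }
  const truePos = {};   // tag -> tokens correctly predicted as that tag
  const predCount = {}; // tag -> tokens predicted as that tag
  const goldCount = {}; // tag -> tokens whose gold tag is that tag
  let correct = 0;

  goldTags.forEach((gold, i) => {
    const pred = predictedTags[i];
    goldCount[gold] = (goldCount[gold] ?? 0) + 1;
    predCount[pred] = (predCount[pred] ?? 0) + 1;
    if (gold === pred) {
      correct++;
      truePos[gold] = (truePos[gold] ?? 0) + 1;
    }
  });

  const precision = {};
  const recall = {};
  for (const tag of Object.keys(predCount)) precision[tag] = (truePos[tag] ?? 0) / predCount[tag];
  for (const tag of Object.keys(goldCount)) recall[tag] = (truePos[tag] ?? 0) / goldCount[tag];

  return { accuracy: goldTags.length ? correct / goldTags.length : 0, precision, recall };
}

const goldSeq      = ['DT', 'NN', 'VBZ', 'DT', 'NN'];
const predictedSeq = ['DT', 'NN', 'NN',  'DT', 'NN'];
console.log(evaluate(goldSeq, predictedSeq));
```

With these inputs, accuracy is 0.8. The single error, a `VBZ` token tagged `NN`, lowers `NN` precision to 2/3 and `VBZ` recall to 0. `NN` recall stays at 1. A tag that is never predicted gets no precision entry, since its precision would be 0/0.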

### Advanced Usage

**Complete training pipeline:**

```javascript
const natural = require('natural');

/**
 * Train a custom POS tagger
 */
function trainCustomTagger(trainingData) {
  // Create training corpus
  const corpus = new natural.Corpus();

  // Add training sentences
  trainingData.forEach(sentenceData => {
    const sentence = new natural.Sentence();
    sentenceData.forEach(({word, tag}) => {
      sentence.addTaggedWord(word, tag);
    });
    corpus.addSentence(sentence);
  });

  // Train the model
  const trainer = new natural.BrillPOSTrainer();
  const {lexicon, ruleSet} = trainer.train(corpus);

  // Create tagger
  const tagger = new natural.BrillPOSTagger(lexicon, ruleSet);

  return tagger;
}

// Example training data
const trainingData = [
  [
    {word: 'the', tag: 'DT'},
    {word: 'quick', tag: 'JJ'},
    {word: 'brown', tag: 'JJ'},
    {word: 'fox', tag: 'NN'},
    {word: 'jumps', tag: 'VBZ'}
  ],
  [
    {word: 'a', tag: 'DT'},
    {word: 'lazy', tag: 'JJ'},
    {word: 'dog', tag: 'NN'},
    {word: 'sleeps', tag: 'VBZ'}
  ]
  // ... more training sentences
];

// Train custom tagger
const customTagger = trainCustomTagger(trainingData);

// Use trained tagger
const testSentence = ['the', 'big', 'cat', 'runs'];
const result = customTagger.tag(testSentence);
console.log('Custom tagger results:', result.getTaggedWords());
```
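
The `train` call hides the core of transformation-based learning. The corpus is first tagged with the lexicon. The trainer then repeatedly picks the candidate rule with the best net gain (errors fixed minus errors introduced) and applies it. It stops when no rule's net gain reaches a threshold. The sketch below implements that greedy loop for a single rule template: change tag `from` to `to` when the previous tag is `prev`. It illustrates the technique under simplifying assumptions: one template, a plain-object lexicon, and an `NN` default tag. It is not natural's `BrillPOSTrainer`, and `trainRules` and its options are hypothetical names.

```javascript
// Greedy transformation-based learning (a simplified sketch, not BrillPOSTrainer).
// Single hypothetical template: "change tag FROM to TO when the previous tag is PREV".
function trainRules(corpus, lexicon, { minGain = 1, maxRules = 10, defaultTag = 'NN' } = {}) {
  // Start from each word's lexicon tag; unknown words get defaultTag.
  let current = corpus.map(sentence => sentence.map(({ word }) => lexicon[word] ?? defaultTag));
  const learned = [];

  // Apply one rule to a tag sequence; context is read from the unmodified input array.
  const apply = (tags, rule) =>
    tags.map((t, i) => (i > 0 && t === rule.from && tags[i - 1] === rule.prev ? rule.to : t));

  for (let n = 0; n < maxRules; n++) {
    // Propose candidate rules only at positions that are currently wrong.
    const candidates = new Map();
    corpus.forEach((sentence, si) => {
      sentence.forEach(({ tag: gold }, i) => {
        const tag = current[si][i];
        if (i > 0 && tag !== gold) {
          const rule = { from: tag, to: gold, prev: current[si][i - 1] };
          candidates.set(`${rule.from}>${rule.to}|${rule.prev}`, rule);
        }
      });
    });

    // Score each candidate: errors it fixes minus correct tags it breaks.
    let best = null;
    let bestGain = -Infinity;
    for (const rule of candidates.values()) {
      let gain = 0;
      corpus.forEach((sentence, si) => {
        const after = apply(current[si], rule);
        sentence.forEach(({ tag: gold }, i) => {
          if (current[si][i] !== gold && after[i] === gold) gain++;
          if (current[si][i] === gold && after[i] !== gold) gain--;
        });
      });
      if (gain > bestGain) {
        best = rule;
        bestGain = gain;
      }
    }

    // Stop when no candidate improves the corpus enough; otherwise keep and apply the winner.
    if (best === null || bestGain < minGain) break;
    learned.push(best);
    current = current.map(tags => apply(tags, best));
  }
  return learned;
}

const corpus = [
  [{ word: 'to', tag: 'TO' }, { word: 'race', tag: 'VB' }],
  [{ word: 'the', tag: 'DT' }, { word: 'race', tag: 'NN' }],
  [{ word: 'to', tag: 'TO' }, { word: 'win', tag: 'VB' }],
];
const lexicon = { to: 'TO', the: 'DT', race: 'NN', win: 'NN' };
console.log(trainRules(corpus, lexicon)); // [ { from: 'NN', to: 'VB', prev: 'TO' } ]
```

The scoring step re-tags the whole corpus for every candidate. That is acceptable for a toy corpus. A trainer meant for realistic corpus sizes would need a cheaper way to find where each candidate can fire.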

### Working with Pre-trained Models

```javascript
const natural = require('natural');

/**
 * Load and use a pre-trained POS tagger
 */
async function usePreTrainedTagger() {
  // In practice, you would load a pre-trained lexicon and rules
  // This example shows the structure for loading saved models

  try {
    // Load pre-trained lexicon (would be from file/database)
    const lexicon = new natural.Lexicon();

    // Load common English words with tags
    const commonWords = {
      'the': 'DT', 'a': 'DT', 'an': 'DT',
      'cat': 'NN', 'dog': 'NN', 'house': 'NN',
      'run': 'VB', 'runs': 'VBZ', 'running': 'VBG',
      'quick': 'JJ', 'slow': 'JJ', 'big': 'JJ',
      'quickly': 'RB', 'slowly': 'RB'
    };

    Object.entries(commonWords).forEach(([word, tag]) => {
      lexicon.addTaggedWord(word, tag);
    });

    // Load rule set (would be from trained model)
    const ruleSet = new natural.RuleSet();

    // Create tagger
    const tagger = new natural.BrillPOSTagger(lexicon, ruleSet);

    return tagger;

  } catch (error) {
    console.error('Error loading pre-trained model:', error);
    throw error;
  }
}

// Usage
usePreTrainedTagger().then(tagger => {
  const sentences = [
    ['the', 'quick', 'brown', 'fox', 'runs'],
    ['a', 'big', 'dog', 'sleeps'],
    ['the', 'house', 'is', 'big']
  ];

  sentences.forEach(sentence => {
    const tagged = tagger.tag(sentence);
    console.log('Sentence:', sentence.join(' '));
    console.log('Tagged:', tagged.getTaggedWords().map(w => `${w.word}/${w.tag}`).join(' '));
    console.log('---');
  });
});
```
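
The comments above note that a real setup would load the lexicon and rules from a file or database. One hypothetical way to persist them is a single JSON document. The `{version, lexicon, rules}` layout and the `{from, to, prev}` rule shape below are illustrative only. They are not a format defined by natural.

```javascript
// Hypothetical JSON model format: { version, lexicon: {word: tag}, rules: [{from, to, prev}] }
function serializeModel(lexicon, rules) {
  return JSON.stringify({ version: 1, lexicon, rules }, null, 2);
}

// Parse and minimally validate a saved model before use.
function loadModel(json) {
  const model = JSON.parse(json);
  if (model.version !== 1) {
    throw new Error(`Unsupported model version: ${model.version}`);
  }
  if (typeof model.lexicon !== 'object' || model.lexicon === null || !Array.isArray(model.rules)) {
    throw new Error('Model must contain a lexicon object and a rules array');
  }
  return { lexicon: model.lexicon, rules: model.rules };
}

const saved = serializeModel(
  { the: 'DT', cat: 'NN', runs: 'VBZ' },
  [{ from: 'NN', to: 'VB', prev: 'TO' }]
);
const { lexicon, rules } = loadModel(saved);
console.log(lexicon.cat, rules.length); // NN 1
```

In Node.js, you could write the string with `fs.writeFileSync(path, saved)` and read it back with `fs.readFileSync(path, 'utf8')` before calling `loadModel`.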

### Common POS Tags

Natural.js typically uses the Penn Treebank POS tag set:

```javascript
// Common POS tags used in Natural.js
const commonTags = {
  // Nouns
  'NN': 'Noun, singular or mass',
  'NNS': 'Noun, plural',
  'NNP': 'Proper noun, singular',
  'NNPS': 'Proper noun, plural',

  // Verbs
  'VB': 'Verb, base form',
  'VBD': 'Verb, past tense',
  'VBG': 'Verb, gerund/present participle',
  'VBN': 'Verb, past participle',
  'VBP': 'Verb, non-3rd person singular present',
  'VBZ': 'Verb, 3rd person singular present',

  // Adjectives
  'JJ': 'Adjective',
  'JJR': 'Adjective, comparative',
  'JJS': 'Adjective, superlative',

  // Adverbs
  'RB': 'Adverb',
  'RBR': 'Adverb, comparative',
  'RBS': 'Adverb, superlative',

  // Determiners
  'DT': 'Determiner',

  // Prepositions
  'IN': 'Preposition or subordinating conjunction',

  // Pronouns
  'PRP': 'Personal pronoun',
  'PRP$': 'Possessive pronoun'
};
```
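
In the tags listed above, each word class shares a prefix: `NN*` for nouns, `VB*` for verbs, and so on. A coarse category can therefore be recovered by prefix matching. `coarseTag` is a hypothetical helper, not part of natural. It covers only the classes listed above and returns `'other'` for anything else, such as `CC` or `WDT`.

```javascript
// Map fine-grained Penn Treebank tags to coarse word classes by prefix.
// Only the classes listed above are covered; anything else maps to 'other'.
const COARSE_PREFIXES = [
  ['PRP', 'pronoun'],
  ['NN', 'noun'],
  ['VB', 'verb'],
  ['JJ', 'adjective'],
  ['RB', 'adverb'],
  ['DT', 'determiner'],
  ['IN', 'preposition'],
];

function coarseTag(tag) {
  const match = COARSE_PREFIXES.find(([prefix]) => tag.startsWith(prefix));
  return match ? match[1] : 'other';
}

console.log(['NNS', 'VBZ', 'PRP$', 'RBR', 'CC'].map(tag => coarseTag(tag)));
// [ 'noun', 'verb', 'pronoun', 'adverb', 'other' ]
```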