# Part-of-Speech Tagging

Brill tagger implementation for assigning grammatical parts of speech to words in sentences using transformation-based learning. The system combines lexical lookup with contextual transformation rules.
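
To make the two phases concrete, here is a minimal, library-independent sketch. The function names `initialTag` and `applyContextRules`, the fallback tag `NN`, and the rule shape `{from, to, prevTag}` are assumptions made for illustration only; they are not part of the `natural` API.

```javascript
// Phase 1: lexical lookup. Unknown words fall back to a default tag
// (assumed here to be 'NN').
function initialTag(words, lexicon, defaultTag = 'NN') {
  return words.map(word => ({
    word,
    tag: Object.prototype.hasOwnProperty.call(lexicon, word)
      ? lexicon[word]
      : defaultTag
  }));
}

// Phase 2: contextual transformation. Each hypothetical rule rewrites
// `from` to `to` when the preceding word carries `prevTag`.
// Rules are applied in the order given.
function applyContextRules(tagged, rules) {
  const result = tagged.map(t => ({ ...t }));
  for (const rule of rules) {
    for (let i = 1; i < result.length; i++) {
      if (result[i].tag === rule.from && result[i - 1].tag === rule.prevTag) {
        result[i].tag = rule.to;
      }
    }
  }
  return result;
}

const toyLexicon = { the: 'DT', can: 'MD', rusted: 'VBD' };
const toyRules = [{ from: 'MD', to: 'NN', prevTag: 'DT' }];

const tagged = applyContextRules(
  initialTag(['the', 'can', 'rusted'], toyLexicon),
  toyRules
);
console.log(tagged.map(t => `${t.word}/${t.tag}`).join(' '));
// the/DT can/NN rusted/VBD
```

The lexicon alone tags "can" as a modal; the single context rule corrects it to a noun because it follows a determiner.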

## Capabilities

### Brill POS Tagger

Main part-of-speech tagger using Eric Brill's transformation-based approach.

```javascript { .api }
/**
 * Brill transformation-based POS tagger
 * @param lexicon - Lexicon instance for initial word tagging
 * @param ruleSet - RuleSet instance containing transformation rules
 */
class BrillPOSTagger {
  constructor(lexicon: Lexicon, ruleSet: RuleSet);

  /**
   * Tag a sentence with part-of-speech tags
   * @param sentence - Array of word strings
   * @returns Sentence object with tagged words
   */
  tag(sentence: string[]): Sentence;

  /**
   * Apply initial lexicon-based tagging
   * @param sentence - Array of words
   * @returns Initially tagged sentence
   */
  tagWithLexicon(sentence: string[]): Sentence;

  /**
   * Apply transformation rules to improve tagging
   * @param taggedSentence - Sentence with initial tags
   * @returns Sentence with improved tags
   */
  applyRules(taggedSentence: Sentence): Sentence;
}
```

### Lexicon

Dictionary mapping words to their most likely part-of-speech tags.

```javascript { .api }
/**
 * POS tagging lexicon
 */
class Lexicon {
  constructor();

  /**
   * Add a word-tag pair to the lexicon
   * @param word - Word to add
   * @param tag - POS tag for the word
   */
  addTaggedWord(word: string, tag: string): void;

  /**
   * Get the most likely tag for a word
   * @param word - Word to look up
   * @returns Most likely POS tag
   */
  tagWord(word: string): string;

  /**
   * Check if word exists in lexicon
   * @param word - Word to check
   * @returns true if word is in lexicon
   */
  hasWord(word: string): boolean;
}
```
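
One plausible way to arrive at a "most likely" tag per word is to count tag frequencies in a tagged corpus and keep the most frequent tag. The sketch below is library-independent; `buildMostLikelyTags` and its input shape (arrays of `{word, tag}` objects) are hypothetical, and whether `natural` derives its lexicon this way is not stated here.

```javascript
// Hypothetical helper, not part of natural.
// Count how often each word appears with each tag, then keep the top tag.
function buildMostLikelyTags(sentences) {
  const counts = new Map(); // word -> Map(tag -> count)
  for (const sentence of sentences) {
    for (const { word, tag } of sentence) {
      if (!counts.has(word)) counts.set(word, new Map());
      const tagCounts = counts.get(word);
      tagCounts.set(tag, (tagCounts.get(tag) || 0) + 1);
    }
  }

  const mostLikely = {};
  for (const [word, tagCounts] of counts) {
    let bestTag = null;
    let bestCount = 0;
    for (const [tag, count] of tagCounts) {
      if (count > bestCount) { // on ties, the tag seen first wins
        bestTag = tag;
        bestCount = count;
      }
    }
    mostLikely[word] = bestTag;
  }
  return mostLikely;
}

const mostLikelyTags = buildMostLikelyTags([
  [{ word: 'the', tag: 'DT' }, { word: 'run', tag: 'NN' }],
  [{ word: 'they', tag: 'PRP' }, { word: 'run', tag: 'VBP' }],
  [{ word: 'we', tag: 'PRP' }, { word: 'run', tag: 'VBP' }]
]);
console.log(mostLikelyTags.run); // VBP
```

Each resulting entry could then be passed to `addTaggedWord(word, tag)` as declared above.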

### Rule Set

Collection of transformation rules for improving POS tag accuracy.

```javascript { .api }
/**
 * Set of transformation rules for POS tagging
 */
class RuleSet {
  constructor();

  /**
   * Add a transformation rule
   * @param rule - RuleTemplate instance
   */
  addRule(rule: RuleTemplate): void;

  /**
   * Apply all rules to a sentence
   * @param sentence - Tagged sentence to transform
   * @returns Sentence with applied transformations
   */
  applyRules(sentence: Sentence): Sentence;
}

/**
 * Individual transformation rule template
 */
class RuleTemplate {
  constructor();

  /**
   * Apply this rule to a sentence
   * @param sentence - Sentence to apply rule to
   * @returns Modified sentence
   */
  apply(sentence: Sentence): Sentence;
}

/**
 * Pre-defined rule templates
 */
declare const ruleTemplates: {
  [templateName: string]: RuleTemplate;
};
```

### Sentence and Corpus

Data structures for representing tagged sentences and training corpora.

```javascript { .api }
/**
 * Represents a sentence with POS tags
 */
class Sentence {
  constructor();

  /**
   * Add a tagged word to the sentence
   * @param word - Word text
   * @param tag - POS tag
   */
  addTaggedWord(word: string, tag: string): void;

  /**
   * Get all tagged words
   * @returns Array of tagged word objects
   */
  getTaggedWords(): TaggedWord[];

  /**
   * Get word at specific position
   * @param index - Position in sentence
   * @returns Tagged word at position
   */
  getWordAt(index: number): TaggedWord;

  /**
   * Get sentence length
   * @returns Number of words in sentence
   */
  length(): number;
}

/**
 * Tagged word representation
 */
interface TaggedWord {
  word: string;
  tag: string;
}

/**
 * Training corpus for POS tagger
 */
class Corpus {
  constructor();

  /**
   * Add a sentence to the corpus
   * @param sentence - Sentence instance
   */
  addSentence(sentence: Sentence): void;

  /**
   * Get all sentences in corpus
   * @returns Array of sentences
   */
  getSentences(): Sentence[];
}
```
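
Tagged text is often exchanged as plain `word/TAG` tokens separated by whitespace. As a rough sketch of how such a line could be turned into objects matching the `TaggedWord` shape above, the hypothetical helper `parseTaggedLine` (not part of `natural`) splits each token at its last slash.

```javascript
// Hypothetical helper, not part of natural.
// Parse one line of "word/TAG" tokens into an array of {word, tag} objects.
// The last slash separates word from tag, so a token like "1/2/CD" still works.
function parseTaggedLine(line) {
  return line
    .trim()
    .split(/\s+/)
    .filter(token => token.length > 0)
    .map(token => {
      const slash = token.lastIndexOf('/');
      if (slash <= 0 || slash === token.length - 1) {
        throw new Error(`Malformed token: "${token}"`);
      }
      return { word: token.slice(0, slash), tag: token.slice(slash + 1) };
    });
}

const parsed = parseTaggedLine('the/DT cat/NN runs/VBZ quickly/RB');
console.log(parsed.length); // 4
console.log(parsed[2]);     // { word: 'runs', tag: 'VBZ' }
```

Each parsed pair could then be fed to `Sentence.addTaggedWord(word, tag)` when building a corpus.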

**Usage Examples:**

```javascript
const natural = require('natural');

// Create lexicon and add word-tag pairs
const lexicon = new natural.Lexicon();
lexicon.addTaggedWord('the', 'DT');
lexicon.addTaggedWord('cat', 'NN');
lexicon.addTaggedWord('dog', 'NN');
lexicon.addTaggedWord('runs', 'VBZ');
lexicon.addTaggedWord('quickly', 'RB');

// Create rule set with transformation rules
const ruleSet = new natural.RuleSet();
// Add rules to improve tagging accuracy
// (In practice, you would load pre-trained rules)

// Create POS tagger
const tagger = new natural.BrillPOSTagger(lexicon, ruleSet);

// Tag a sentence
const sentence = ['the', 'cat', 'runs', 'quickly'];
const taggedSentence = tagger.tag(sentence);

// Display results
console.log('Tagged sentence:');
taggedSentence.getTaggedWords().forEach(word => {
  console.log(`${word.word}/${word.tag}`);
});
```

### Training Components

**Trainer for creating custom models:**

```javascript { .api }
/**
 * Trainer for Brill POS tagger
 */
class BrillPOSTrainer {
  constructor();

  /**
   * Train a tagger on a corpus
   * @param corpus - Training corpus
   * @returns Trained tagger components
   */
  train(corpus: Corpus): {lexicon: Lexicon, ruleSet: RuleSet};
}

/**
 * Tester for evaluating tagger performance
 */
class BrillPOSTester {
  constructor();

  /**
   * Test tagger accuracy on test corpus
   * @param tagger - Trained tagger
   * @param testCorpus - Test corpus
   * @returns Accuracy metrics
   */
  test(tagger: BrillPOSTagger, testCorpus: Corpus): TestResults;
}

interface TestResults {
  accuracy: number;
  precision: {[tag: string]: number};
  recall: {[tag: string]: number};
}
```
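
To clarify what the three `TestResults` fields usually mean, the following library-independent sketch computes them from two tag sequences. `evaluateTags` is a hypothetical helper, and it is an assumption that the tester defines accuracy, precision, and recall in this standard way.

```javascript
// Hypothetical helper, not part of natural.
// Compute overall accuracy plus per-tag precision and recall from two
// equally long arrays of tags: the tagger's predictions and the gold labels.
function evaluateTags(predicted, gold) {
  if (predicted.length !== gold.length) {
    throw new Error('predicted and gold must have the same length');
  }
  const truePositives = {};
  const predictedCounts = {};
  const goldCounts = {};
  let correct = 0;

  gold.forEach((goldTag, i) => {
    const predictedTag = predicted[i];
    predictedCounts[predictedTag] = (predictedCounts[predictedTag] || 0) + 1;
    goldCounts[goldTag] = (goldCounts[goldTag] || 0) + 1;
    if (predictedTag === goldTag) {
      correct += 1;
      truePositives[goldTag] = (truePositives[goldTag] || 0) + 1;
    }
  });

  // precision: of the words given this tag, how many were right
  const precision = {};
  for (const tag of Object.keys(predictedCounts)) {
    precision[tag] = (truePositives[tag] || 0) / predictedCounts[tag];
  }
  // recall: of the words that truly have this tag, how many were found
  const recall = {};
  for (const tag of Object.keys(goldCounts)) {
    recall[tag] = (truePositives[tag] || 0) / goldCounts[tag];
  }
  const accuracy = gold.length === 0 ? 0 : correct / gold.length;
  return { accuracy, precision, recall };
}

const metrics = evaluateTags(
  ['DT', 'NN', 'NN', 'RB'],  // predicted
  ['DT', 'NN', 'VBZ', 'RB']  // gold
);
console.log(metrics.accuracy);     // 0.75
console.log(metrics.precision.NN); // 0.5
console.log(metrics.recall.VBZ);   // 0
```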

### Advanced Usage

**Complete training and testing pipeline:**

```javascript
const natural = require('natural');

/**
 * Train a custom POS tagger
 */
function trainCustomTagger(trainingData) {
  // Create training corpus
  const corpus = new natural.Corpus();

  // Add training sentences
  trainingData.forEach(sentenceData => {
    const sentence = new natural.Sentence();
    sentenceData.forEach(({word, tag}) => {
      sentence.addTaggedWord(word, tag);
    });
    corpus.addSentence(sentence);
  });

  // Train the model
  const trainer = new natural.BrillPOSTrainer();
  const {lexicon, ruleSet} = trainer.train(corpus);

  // Create tagger
  const tagger = new natural.BrillPOSTagger(lexicon, ruleSet);

  return tagger;
}

// Example training data
const trainingData = [
  [
    {word: 'the', tag: 'DT'},
    {word: 'quick', tag: 'JJ'},
    {word: 'brown', tag: 'JJ'},
    {word: 'fox', tag: 'NN'},
    {word: 'jumps', tag: 'VBZ'}
  ],
  [
    {word: 'a', tag: 'DT'},
    {word: 'lazy', tag: 'JJ'},
    {word: 'dog', tag: 'NN'},
    {word: 'sleeps', tag: 'VBZ'}
  ]
  // ... more training sentences
];

// Train custom tagger
const customTagger = trainCustomTagger(trainingData);

// Use trained tagger
const testSentence = ['the', 'big', 'cat', 'runs'];
const result = customTagger.tag(testSentence);
console.log('Custom tagger results:', result.getTaggedWords());
```
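
The example above trains on every sentence it is given. For the testing half of a pipeline, some sentences are normally held out before training. The sketch below shows one simple way to do that; `splitCorpus` and `testFraction` are hypothetical names and not part of `natural`.

```javascript
// Hypothetical helper, not part of natural.
// Hold out the last portion of the sentences for testing. No shuffling is
// done here, so shuffle beforehand if the sentence order is meaningful.
function splitCorpus(sentences, testFraction = 0.2) {
  if (testFraction < 0 || testFraction > 1) {
    throw new RangeError('testFraction must be between 0 and 1');
  }
  const testSize = Math.round(sentences.length * testFraction);
  const cut = sentences.length - testSize;
  return { train: sentences.slice(0, cut), test: sentences.slice(cut) };
}

// Ten placeholder sentences of one tagged word each
const allSentences = Array.from({ length: 10 }, (_, i) => [
  { word: `w${i}`, tag: 'NN' }
]);
const { train: trainSet, test: testSet } = splitCorpus(allSentences);
console.log(trainSet.length, testSet.length); // 8 2
```

The training portion would go into the corpus passed to the trainer, and the held-out portion into the corpus passed to the tester.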

### Working with Pre-trained Models

```javascript
const natural = require('natural');

/**
 * Load and use pre-trained POS tagger
 */
async function usePretrainedTagger() {
  // In practice, you would load pre-trained lexicon and rules
  // This example shows the structure for loading saved models

  try {
    // Load pre-trained lexicon (would be from file/database)
    const lexicon = new natural.Lexicon();

    // Load common English words with tags
    const commonWords = {
      'the': 'DT', 'a': 'DT', 'an': 'DT',
      'cat': 'NN', 'dog': 'NN', 'house': 'NN',
      'run': 'VB', 'runs': 'VBZ', 'running': 'VBG',
      'quick': 'JJ', 'slow': 'JJ', 'big': 'JJ',
      'quickly': 'RB', 'slowly': 'RB'
    };

    Object.entries(commonWords).forEach(([word, tag]) => {
      lexicon.addTaggedWord(word, tag);
    });

    // Load rule set (would be from trained model)
    const ruleSet = new natural.RuleSet();

    // Create tagger
    const tagger = new natural.BrillPOSTagger(lexicon, ruleSet);

    return tagger;

  } catch (error) {
    console.error('Error loading pre-trained model:', error);
    throw error;
  }
}

// Usage
usePretrainedTagger().then(tagger => {
  const sentences = [
    ['the', 'quick', 'brown', 'fox', 'runs'],
    ['a', 'big', 'dog', 'sleeps'],
    ['the', 'house', 'is', 'big']
  ];

  sentences.forEach(sentence => {
    const tagged = tagger.tag(sentence);
    console.log('Sentence:', sentence.join(' '));
    console.log('Tagged:', tagged.getTaggedWords().map(w => `${w.word}/${w.tag}`).join(' '));
    console.log('---');
  });
});
```

### Common POS Tags

Natural.js typically uses the Penn Treebank POS tag set:

```javascript
// Common POS tags used in Natural.js
const commonTags = {
  // Nouns
  'NN': 'Noun, singular',
  'NNS': 'Noun, plural',
  'NNP': 'Proper noun, singular',
  'NNPS': 'Proper noun, plural',

  // Verbs
  'VB': 'Verb, base form',
  'VBD': 'Verb, past tense',
  'VBG': 'Verb, gerund/present participle',
  'VBN': 'Verb, past participle',
  'VBP': 'Verb, non-3rd person singular present',
  'VBZ': 'Verb, 3rd person singular present',

  // Adjectives
  'JJ': 'Adjective',
  'JJR': 'Adjective, comparative',
  'JJS': 'Adjective, superlative',

  // Adverbs
  'RB': 'Adverb',
  'RBR': 'Adverb, comparative',
  'RBS': 'Adverb, superlative',

  // Determiners
  'DT': 'Determiner',

  // Prepositions
  'IN': 'Preposition or subordinating conjunction',

  // Pronouns
  'PRP': 'Personal pronoun',
  'PRP$': 'Possessive pronoun'
};
```
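
Because related tags in the list above share a prefix (`NN`, `VB`, `JJ`, `RB`, `PRP`), fine-grained tags can be collapsed into coarse word classes with a prefix check. The helper below is a hypothetical sketch, not part of `natural`, and covers only the tag families listed in this section.

```javascript
// Hypothetical helper, not part of natural.
// Map a Penn Treebank tag to a coarse word class by its prefix.
const coarsePrefixes = [
  ['NN', 'noun'],
  ['VB', 'verb'],
  ['JJ', 'adjective'],
  ['RB', 'adverb'],
  ['DT', 'determiner'],
  ['IN', 'preposition'],
  ['PRP', 'pronoun']
];

function coarseCategory(tag) {
  const match = coarsePrefixes.find(([prefix]) => tag.startsWith(prefix));
  return match ? match[1] : 'other'; // tags outside the list above
}

console.log(coarseCategory('VBZ'));  // verb
console.log(coarseCategory('NNPS')); // noun
console.log(coarseCategory('PRP$')); // pronoun
console.log(coarseCategory('MD'));   // other
```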