Get title property from HTML markup
npx @tessl/cli install tessl/npm-metascraper-title@5.49.00
# Metascraper Title

Metascraper Title is a metascraper rules module that extracts a page's title from HTML markup. It plugs into the metascraper ecosystem and tries 9 extraction rules in priority order. The rules cover Open Graph meta tags, Twitter Card tags, the HTML `<title>` element, JSON-LD structured data, and common CSS class patterns.

## Package Information

- **Package Name**: metascraper-title
- **Package Type**: npm
- **Language**: JavaScript
- **Installation**: `npm install metascraper-title`

## Core Imports

```javascript
const metascraperTitle = require('metascraper-title');
```

For ES modules:

```javascript
import metascraperTitle from 'metascraper-title';
```

**Note**: metascraper-title is published as CommonJS only and has no native ES module exports. The default import above works through Node.js CommonJS interop, which exposes `module.exports` as the default export.

## Basic Usage

```javascript
const metascraper = require('metascraper')([
  require('metascraper-title')()
]);

const html = `
<html>
  <head>
    <title>Example Page Title</title>
    <meta property="og:title" content="Better OpenGraph Title">
  </head>
</html>
`;

(async () => {
  // CommonJS has no top-level await, so run the extraction inside an async function
  const metadata = await metascraper({
    html,
    url: 'https://example.com'
  });

  // og:title (rule 1) takes priority over the <title> element (rule 4)
  console.log(metadata.title); // "Better OpenGraph Title"
})();
```

## Architecture

Metascraper Title follows the metascraper plugin pattern and uses a rules-based extraction system:

- **Factory Function**: Calling the exported function returns a rules object that contains the title extraction logic
- **Rule Priority**: The 9 extraction rules are tried in priority order until one yields a valid title
- **Helper Integration**: Uses `@metascraper/helpers` for DOM filtering, text normalization, and JSON-LD parsing
- **Metascraper Integration**: Implements the standard metascraper rules interface, so it can be combined with other rule bundles in a single `metascraper([...])` call

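The factory-and-ordered-rules pattern can be modelled without any dependencies. This sketch rests on a few assumptions:

- Each rule is a function of `{ htmlDom, url }` that returns a string or nothing.
- The host stops at the first non-empty result.
- `createStubDom`, `createTitleRules`, and `runRules` are hypothetical helpers made up for illustration. The stub stands in for Cheerio's `$` and supports only `attr()` and `text()`.

It is not metascraper's own runner.

```javascript
// Hypothetical stand-in for Cheerio's `$`: maps a selector to an object that
// exposes only `attr()` and `text()`, backed by plain lookup tables.
function createStubDom ({ attrs = {}, texts = {} } = {}) {
  return selector => ({
    attr: name => attrs[`${selector}|${name}`],
    text: () => texts[selector] ?? ''
  })
}

// Hypothetical factory in the same shape as a metascraper rules bundle:
// an ordered `title` array plus a `pkgName` used for debugging.
function createTitleRules () {
  const rules = {
    title: [
      ({ htmlDom: $ }) => $('meta[property="og:title"]').attr('content'),
      ({ htmlDom: $ }) => $('title').text()
    ]
  }
  rules.pkgName = 'title-rules-sketch'
  return rules
}

// Hypothetical runner: honors an optional `test` gate, then returns the
// first rule result that is a non-empty string.
function runRules (rules, options) {
  if (typeof rules.test === 'function' && !rules.test(options)) return undefined
  for (const rule of rules.title) {
    const value = rule(options)
    if (typeof value === 'string' && value.trim() !== '') return value
  }
  return undefined
}

const url = 'https://example.com'
const withOg = createStubDom({
  attrs: { 'meta[property="og:title"]|content': 'Better OpenGraph Title' },
  texts: { title: 'Example Page Title' }
})
const withoutOg = createStubDom({ texts: { title: 'Example Page Title' } })

console.log(runRules(createTitleRules(), { htmlDom: withOg, url }))    // Better OpenGraph Title
console.log(runRules(createTitleRules(), { htmlDom: withoutOg, url })) // Example Page Title
```

The rule functions use the same `$(selector).attr()` / `.text()` calls that a Cheerio instance provides. The stub only keeps the sketch runnable on its own.
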
## Capabilities

### Title Extraction Rules

Provides an ordered set of title extraction rules. Later rules act as fallbacks when earlier ones find no usable title.

```javascript { .api }
/**
 * Creates metascraper rules for title extraction
 * @returns {Rules} Rules object containing title extraction logic
 */
function metascraperTitle(): Rules;

interface Rules {
  /** Array of title extraction rules in priority order */
  title: Array<RulesOptions>;
  /** Package identifier for debugging */
  pkgName?: string;
  /** Optional test function to skip rules */
  test?: (options: RulesTestOptions) => boolean;
}

type RulesOptions = (options: RulesTestOptions) => string | null | undefined;

interface RulesTestOptions {
  /** Cheerio DOM instance of the HTML */
  htmlDom: CheerioAPI;
  /** URL of the page being processed */
  url: string;
}
```

**Rule Priority Order:**

1. **Open Graph Title** - `meta[property="og:title"]` content attribute
2. **Twitter Card Title (name)** - `meta[name="twitter:title"]` content attribute
3. **Twitter Card Title (property)** - `meta[property="twitter:title"]` content attribute
4. **HTML Title Element** - `<title>` element text content (filtered)
5. **JSON-LD Headline** - `headline` property from JSON-LD structured data
6. **Post Title Class** - `.post-title` element text content (filtered)
7. **Entry Title Class** - `.entry-title` element text content (filtered)
8. **H1 Title Class Link** - `h1[class*="title" i] a` element text content (filtered)
9. **H1 Title Class** - `h1[class*="title" i]` element text content (filtered)

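The rules are tried top to bottom. As a result, the `<title>` element (rule 4) is preferred over a JSON-LD `headline` (rule 5) whenever both produce a non-empty value. The sketch below models only this ordering. The source labels in `TITLE_PRIORITY` are hypothetical keys, and each candidate value stands for whatever the corresponding selector would yield after normalization.

```javascript
// Source labels in the documented priority order (hypothetical keys).
const TITLE_PRIORITY = [
  'og:title',
  'twitter:title (name)',
  'twitter:title (property)',
  '<title>',
  'json-ld headline',
  '.post-title',
  '.entry-title',
  'h1[class*="title" i] a',
  'h1[class*="title" i]'
]

// Returns the winning { source, value } pair: the first source, in priority
// order, whose extracted value is a non-empty string.
function pickTitle (candidates) {
  for (const source of TITLE_PRIORITY) {
    const value = candidates[source]
    if (typeof value === 'string' && value.trim() !== '') {
      return { source, value }
    }
  }
  return undefined
}

// Both a <title> and a JSON-LD headline are present: rule 4 wins.
console.log(pickTitle({
  '<title>': 'Default Title',
  'json-ld headline': 'Breaking News: Major Discovery'
}))
// { source: '<title>', value: 'Default Title' }

// Without a <title>, the JSON-LD headline is used.
console.log(pickTitle({ 'json-ld headline': 'Breaking News: Major Discovery' }))
```
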
**Usage Examples:**

```javascript
// Using with multiple metascraper rules
const metascraper = require('metascraper')([
  require('metascraper-title')(),
  require('metascraper-description')(),
  require('metascraper-image')()
]);

(async () => {
  // Extract from HTML with Open Graph tags
  const ogHtml = `
    <meta property="og:title" content="The Ultimate Guide to Web Development">
    <title>Generic Page Title</title>
  `;

  const ogResult = await metascraper({
    html: ogHtml,
    url: 'https://blog.example.com/guide'
  });
  console.log(ogResult.title); // "The Ultimate Guide to Web Development"

  // Extract from HTML with only a title element
  const titleHtml = `
    <title>Simple Page Title | My Website</title>
  `;

  const titleResult = await metascraper({
    html: titleHtml,
    url: 'https://example.com/page'
  });
  console.log(titleResult.title); // "Simple Page Title | My Website" (after normalization)

  // Extract from JSON-LD structured data. There is no <title> element here:
  // if one were present, it (rule 4) would win over the JSON-LD headline (rule 5).
  const jsonLdHtml = `
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Breaking News: Major Discovery"
    }
    </script>
  `;

  const jsonLdResult = await metascraper({
    html: jsonLdHtml,
    url: 'https://news.example.com/article'
  });
  console.log(jsonLdResult.title); // "Breaking News: Major Discovery"
})();
```

### Text Processing

Every extracted title is passed through the helpers' title processing before it is returned:

- **Whitespace Normalization**: Collapses runs of whitespace characters into a single space
- **Smart Quotes**: Converts straight quotes to curly quotes where appropriate
- **HTML Entity Decoding**: Decodes HTML entities in extracted text
- **Filtering**: Discards empty or invalid title values

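The following is a simplified, dependency-free sketch of the whitespace, quote, and filtering steps. `normalizeTitle` is hypothetical and uses a naive quote heuristic; it is not the `@metascraper/helpers` implementation. Entity decoding is omitted. The sketch assumes the text has already been read through Cheerio, which returns decoded text from `.text()` and `.attr()`.

```javascript
// Simplified title normalizer (hypothetical; not the @metascraper/helpers code).
function normalizeTitle (value) {
  if (typeof value !== 'string') return undefined

  // Whitespace normalization: collapse runs of whitespace and trim the ends.
  let title = value.replace(/\s+/g, ' ').trim()

  // Naive smart quotes: a quote at the start of the string or after
  // whitespace or an opening bracket opens; any other quote closes.
  title = title
    .replace(/(^|[\s([{])"/g, '$1\u201C') // opening double quote
    .replace(/"/g, '\u201D')             // remaining double quotes close
    .replace(/(^|[\s([{])'/g, '$1\u2018') // opening single quote
    .replace(/'/g, '\u2019')             // apostrophes and closing single quotes

  // Filtering: treat an empty result as "no title".
  return title === '' ? undefined : title
}

console.log(normalizeTitle('  The   "Ultimate"\n Guide  ')) // The “Ultimate” Guide
console.log(normalizeTitle("Don't Panic"))                 // Don’t Panic
console.log(normalizeTitle('   '))                          // undefined
```
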
### Dependencies

**Internal Helper Functions** (from `@metascraper/helpers`):

- `toRule(title)` - Wraps extraction functions so their output goes through title processing
- `$filter($, element)` - Filters DOM elements and extracts clean text
- `$jsonld(property)` - Extracts properties from JSON-LD structured data
- `title(value, options)` - Processes and normalizes title text

These are internal implementation details not exposed in the public API.
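The dependency-free model below shows how a `toRule(title)` style wrapper composes with an extractor such as `$jsonld('headline')`. `titleSketch`, `toRuleSketch`, and `jsonldSketch` are hypothetical stand-ins; the real helpers have their own signatures and behaviour. The JSON-LD stand-in reads raw script strings instead of a Cheerio DOM.

```javascript
// Hypothetical stand-in for a value mapper such as `title`:
// collapses whitespace and returns undefined for empty input.
const titleSketch = value =>
  typeof value === 'string' && value.trim() !== ''
    ? value.replace(/\s+/g, ' ').trim()
    : undefined

// Hypothetical stand-in for `toRule(mapper)`: wraps an extractor so that
// whatever it returns is passed through the mapper before being used.
const toRuleSketch = mapper => extractor => options => mapper(extractor(options))

// Hypothetical stand-in for `$jsonld(property)`: returns the property from the
// first JSON-LD block that defines it. The page is modelled as an array of raw
// <script type="application/ld+json"> strings; arrays and @graph are not handled.
const jsonldSketch = property => ({ jsonLdScripts = [] }) => {
  for (const raw of jsonLdScripts) {
    try {
      const data = JSON.parse(raw)
      if (data && data[property] !== undefined) return data[property]
    } catch {
      // Ignore malformed JSON-LD blocks.
    }
  }
  return undefined
}

const toTitle = toRuleSketch(titleSketch)
const headlineRule = toTitle(jsonldSketch('headline'))

console.log(headlineRule({
  jsonLdScripts: ['{ "@type": "Article", "headline": "  Breaking   News " }']
})) // Breaking News
```

Keeping extraction (where the value comes from) separate from mapping (how it is cleaned) lets every title rule share one normalization step.
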
## Types

```javascript { .api }
// Cheerio DOM API (from metascraper integration)
interface CheerioAPI {
  /** Select elements using CSS selector */
  (selector: string): CheerioElement;
  /** Get the root element */
  root(): CheerioElement;
}

interface CheerioElement {
  /** Get attribute value */
  attr(name: string): string | undefined;
  /** Get text content */
  text(): string;
  /** Get HTML content */
  html(): string | null;
  /** Find child elements */
  find(selector: string): CheerioElement;
  /** Filter elements */
  filter(selector: string): CheerioElement;
  /** Get first element */
  first(): CheerioElement;
  /** Iterate over elements */
  each(callback: (index: number, element: Element) => void | false): CheerioElement;
  /** Get number of elements */
  length: number;
}
```