# Metascraper Title

Metascraper Title is a metadata extraction rule module that provides intelligent title extraction from HTML markup. It operates as part of the metascraper ecosystem, offering 9 prioritized extraction strategies that handle Open Graph meta tags, Twitter Cards, JSON-LD structured data, HTML title elements, and common CSS class patterns.

## Package Information

- **Package Name**: metascraper-title
- **Package Type**: npm
- **Language**: JavaScript
- **Installation**: `npm install metascraper-title`

## Core Imports

```javascript
const metascraperTitle = require('metascraper-title');
```

From an ES module, the default import works through Node.js CommonJS interop:

```javascript
import metascraperTitle from 'metascraper-title';
```

**Note**: metascraper-title is CommonJS only and does not provide native ES module exports.

## Basic Usage

```javascript
const metascraper = require('metascraper')([
  require('metascraper-title')()
]);

const html = `
<html>
  <head>
    <title>Example Page Title</title>
    <meta property="og:title" content="Better OpenGraph Title">
  </head>
</html>
`;

// `await` is not available at the top level of a CommonJS module,
// so the call is wrapped in an async function.
(async () => {
  const metadata = await metascraper({
    html,
    url: 'https://example.com'
  });

  console.log(metadata.title); // "Better OpenGraph Title"
})();
```

## Architecture

Metascraper Title implements the metascraper plugin pattern with a rules-based extraction system:

- **Factory Function**: Returns a rules object containing title extraction logic
- **Rule Priority**: 9 extraction rules processed in priority order until a valid title is found
- **Helper Integration**: Uses `@metascraper/helpers` for DOM processing, text normalization, and JSON-LD parsing
- **Metascraper Integration**: Follows the standard metascraper plugin interface, so it composes with other rule packages
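
The factory-plus-priority pattern described above can be sketched without any dependencies. This is an illustration only, not the package's source: `titleRulesSketch`, `resolveFirst`, and the plain-object `page` model are hypothetical stand-ins, and the real rules receive a Cheerio DOM and may be asynchronous.

```javascript
// Hypothetical stand-in for a metascraper-style rules factory.
// The page is modelled as a plain object rather than a Cheerio DOM.
function titleRulesSketch() {
  const rules = {
    title: [
      (page) => page.ogTitle,      // highest priority
      (page) => page.twitterTitle,
      (page) => page.titleElement  // lowest priority in this sketch
    ]
  };
  rules.pkgName = 'title-rules-sketch';
  return rules;
}

// Hypothetical runner: returns the first non-empty string, in rule order.
function resolveFirst(ruleList, page) {
  for (const rule of ruleList) {
    const value = rule(page);
    if (typeof value === 'string' && value.trim() !== '') return value;
  }
  return null;
}

const { title } = titleRulesSketch();

// Both values present: the earlier rule wins.
const picked = resolveFirst(title, {
  titleElement: 'Example Page Title',
  ogTitle: 'Better OpenGraph Title'
});

// Only the <title> value present: the runner falls through to it.
const fallback = resolveFirst(title, { titleElement: 'Example Page Title' });

console.log(picked);   // "Better OpenGraph Title"
console.log(fallback); // "Example Page Title"
```

Keeping the rules as an ordered array means priority is expressed purely by position, which is why adding or reordering strategies does not require changes to the runner.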

## Capabilities

### Title Extraction Rules

Provides a set of title extraction rules that are tried in priority order, each acting as a fallback for the ones before it.

```javascript { .api }
/**
 * Creates metascraper rules for title extraction
 * @returns {Rules} Rules object containing title extraction logic
 */
function metascraperTitle(): Rules;

interface Rules {
  /** Array of title extraction rules in priority order */
  title: Array<RulesOptions>;
  /** Package identifier for debugging */
  pkgName?: string;
  /** Optional test function to skip rules */
  test?: (options: RulesTestOptions) => boolean;
}

type RulesOptions = (options: RulesTestOptions) => string | null | undefined;

interface RulesTestOptions {
  /** Cheerio DOM instance of the HTML */
  htmlDom: CheerioAPI;
  /** URL of the page being processed */
  url: string;
}
```

**Rule Priority Order:**

1. **Open Graph Title** - `meta[property="og:title"]` content attribute
2. **Twitter Card Title (name)** - `meta[name="twitter:title"]` content attribute
3. **Twitter Card Title (property)** - `meta[property="twitter:title"]` content attribute
4. **HTML Title Element** - `<title>` element text content (filtered)
5. **JSON-LD Headline** - `headline` property from JSON-LD structured data
6. **Post Title Class** - `.post-title` element text content (filtered)
7. **Entry Title Class** - `.entry-title` element text content (filtered)
8. **H1 Title Class Link** - `h1[class*="title" i] a` element text content (filtered)
9. **H1 Title Class** - `h1[class*="title" i]` element text content (filtered)
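
The last two rules rely on the attribute selector `[class*="title" i]`, which matches when the `class` attribute contains the substring `title`, compared case-insensitively. A minimal sketch of that matching logic in plain JavaScript follows; `matchesTitleClass` is a hypothetical helper written for illustration, not part of the package or of Cheerio.

```javascript
// Hypothetical helper mirroring the semantics of [class*="title" i]:
// substring match on the raw class attribute, ignoring ASCII case.
function matchesTitleClass(classAttr) {
  return typeof classAttr === 'string' &&
    classAttr.toLowerCase().includes('title');
}

console.log(matchesTitleClass('PostTitle'));        // true
console.log(matchesTitleClass('entry-TITLE large')); // true
console.log(matchesTitleClass('subtitle'));          // true (substring match)
console.log(matchesTitleClass('headline'));          // false
console.log(matchesTitleClass(undefined));           // false (no class attribute)
```

Because the match is on a substring, classes such as `subtitle` also qualify, which is one reason these selectors sit at the bottom of the priority order.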

**Usage Examples:**

```javascript
// Using with multiple metascraper rules
const metascraper = require('metascraper')([
  require('metascraper-title')(),
  require('metascraper-description')(),
  require('metascraper-image')()
]);

// Wrapped in an async function because CommonJS has no top-level await
async function main() {
  // Extract from HTML with Open Graph tags
  const ogHtml = `
    <meta property="og:title" content="The Ultimate Guide to Web Development">
    <title>Generic Page Title</title>
  `;

  const ogResult = await metascraper({
    html: ogHtml,
    url: 'https://blog.example.com/guide'
  });
  console.log(ogResult.title); // "The Ultimate Guide to Web Development"

  // Extract from HTML with only title element
  const titleHtml = `
    <title>Simple Page Title | My Website</title>
  `;

  const titleResult = await metascraper({
    html: titleHtml,
    url: 'https://example.com/page'
  });
  console.log(titleResult.title); // "Simple Page Title | My Website" (processed)

  // Extract from JSON-LD structured data.
  // No <title> element here: in the priority order above, <title> (rule 4)
  // outranks the JSON-LD headline (rule 5) and would be returned instead.
  const jsonLdHtml = `
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Breaking News: Major Discovery"
    }
    </script>
  `;

  const jsonLdResult = await metascraper({
    html: jsonLdHtml,
    url: 'https://news.example.com/article'
  });
  console.log(jsonLdResult.title); // "Breaking News: Major Discovery"
}

main().catch(console.error);
```

### Text Processing

All extracted titles are automatically processed using helpers for consistency:

- **Whitespace Normalization**: Condenses multiple whitespace characters
- **Smart Quotes**: Converts straight quotes to curly quotes where appropriate
- **HTML Entity Decoding**: Decodes HTML entities in extracted text
- **Filtering**: Removes empty or invalid title values
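
Two of the steps above, whitespace normalization and filtering of empty values, are easy to illustrate in isolation. The function below is a rough sketch under assumptions: `normalizeTitleSketch` is a hypothetical name, it does not reproduce the smart-quote or entity-decoding steps, and the helper library's actual behavior may differ in details.

```javascript
// Hypothetical sketch: condense whitespace and drop empty or non-string values.
function normalizeTitleSketch(value) {
  if (typeof value !== 'string') return undefined;

  // Collapse runs of spaces, tabs, and newlines into single spaces.
  const condensed = value.replace(/\s+/g, ' ').trim();

  // An empty result is treated as "no title" so the next rule can run.
  return condensed === '' ? undefined : condensed;
}

console.log(normalizeTitleSketch('  Simple   Page\n Title ')); // "Simple Page Title"
console.log(normalizeTitleSketch('   '));                      // undefined
console.log(normalizeTitleSketch(null));                       // undefined
```

Returning `undefined` for unusable values is what lets a prioritized rule list fall through to the next strategy instead of yielding a blank title.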

### Dependencies

**Internal Helper Functions** (from `@metascraper/helpers`):

- `toRule(title)` - Wraps extraction functions with title processing
- `$filter($, element)` - Filters DOM elements and extracts clean text
- `$jsonld(property)` - Extracts properties from JSON-LD structured data
- `title(value, options)` - Processes and normalizes title text

These are internal implementation details not exposed in the public API.
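
To make the role of a `toRule`-style wrapper more concrete, here is a small sketch of the general idea: a higher-order function that pairs an extraction function with a post-processing step. Everything in it is hypothetical (`toRuleSketch`, `clean`, and the `fakeDom` stub standing in for Cheerio), it is synchronous for brevity, and it should not be read as the helper library's actual implementation.

```javascript
// Hypothetical wrapper: combine an extractor with a post-processing step
// and expose the result as a rule taking { htmlDom, url }.
const toRuleSketch = (postProcess) => (extract) => ({ htmlDom, url }) =>
  postProcess(extract(htmlDom, url), { url });

// Hypothetical post-processor: trim, and treat empty values as missing.
const clean = (value) =>
  typeof value === 'string' && value.trim() !== '' ? value.trim() : undefined;

const toTitle = toRuleSketch(clean);

// Stub standing in for a Cheerio instance: only knows one selector.
const fakeDom = (selector) => ({
  attr: () =>
    selector === 'meta[property="og:title"]' ? '  Hello  ' : undefined
});

const ogRule = toTitle(($) => $('meta[property="og:title"]').attr('content'));
const twitterRule = toTitle(($) => $('meta[name="twitter:title"]').attr('content'));

const ogValue = ogRule({ htmlDom: fakeDom, url: 'https://example.com' });
const twitterValue = twitterRule({ htmlDom: fakeDom, url: 'https://example.com' });

console.log(ogValue);      // "Hello"
console.log(twitterValue); // undefined
```

Separating extraction from processing keeps each rule a one-line selector while guaranteeing that every candidate title passes through the same cleanup.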

## Types

```javascript { .api }
// Cheerio DOM API (from metascraper integration)
interface CheerioAPI {
  /** Select elements using CSS selector */
  (selector: string): CheerioElement;
  /** Get the root element */
  root(): CheerioElement;
}

interface CheerioElement {
  /** Get attribute value */
  attr(name: string): string | undefined;
  /** Get text content */
  text(): string;
  /** Get HTML content */
  html(): string | null;
  /** Find child elements */
  find(selector: string): CheerioElement;
  /** Filter elements */
  filter(selector: string): CheerioElement;
  /** Get first element */
  first(): CheerioElement;
  /** Iterate over elements */
  each(callback: (index: number, element: Element) => void | false): CheerioElement;
  /** Get number of elements */
  length: number;
}
```