Get title property from HTML markup
npx @tessl/cli install tessl/npm-metascraper-title@5.49.0

Metascraper Title is a metadata extraction rule module that extracts the page title from HTML markup. It operates as part of the metascraper ecosystem, offering 9 prioritized extraction strategies that cover Open Graph meta tags, Twitter Cards, JSON-LD structured data, HTML title elements, and common CSS class patterns.
npm install metascraper-title

const metascraperTitle = require('metascraper-title');

For ES modules:

import metascraperTitle from 'metascraper-title';

Note: metascraper-title is CommonJS only and does not provide native ES module exports; the import above relies on CommonJS interop in Node.js or a bundler.
const metascraper = require('metascraper')([
  require('metascraper-title')()
]);

const html = `
  <html>
    <head>
      <title>Example Page Title</title>
      <meta property="og:title" content="Better OpenGraph Title">
    </head>
  </html>
`;

// Run this inside an async function: CommonJS has no top-level await.
const metadata = await metascraper({
  html,
  url: 'https://example.com'
});
console.log(metadata.title); // "Better OpenGraph Title"

Metascraper Title implements the metascraper plugin pattern with a rules-based extraction system:
Relies on @metascraper/helpers for DOM processing, text normalization, and JSON-LD parsing.

Provides a comprehensive set of title extraction rules with fallback prioritization.
/**
 * Creates metascraper rules for title extraction
 * @returns {Rules} Rules object containing title extraction logic
 */
function metascraperTitle(): Rules;

interface Rules {
  /** Array of title extraction rules in priority order */
  title: Array<RulesOptions>;
  /** Package identifier for debugging */
  pkgName?: string;
  /** Optional test function to skip rules */
  test?: (options: RulesTestOptions) => boolean;
}

type RulesOptions = (options: RulesTestOptions) => string | null | undefined;

interface RulesTestOptions {
  /** Cheerio DOM instance of the HTML */
  htmlDom: CheerioAPI;
  /** URL of the page being processed */
  url: string;
}

Rule Priority Order:
1. meta[property="og:title"] content attribute
2. meta[name="twitter:title"] content attribute
3. meta[property="twitter:title"] content attribute
4. <title> element text content (filtered)
5. headline property from JSON-LD structured data
6. .post-title element text content (filtered)
7. .entry-title element text content (filtered)
8. h1[class*="title" i] a element text content (filtered)
9. h1[class*="title" i] element text content (filtered)

Usage Examples:
// Using with multiple metascraper rules
const metascraper = require('metascraper')([
  require('metascraper-title')(),
  require('metascraper-description')(),
  require('metascraper-image')()
]);

// Extract from HTML with Open Graph tags
const ogHtml = `
  <meta property="og:title" content="The Ultimate Guide to Web Development">
  <title>Generic Page Title</title>
`;
const ogResult = await metascraper({
  html: ogHtml,
  url: 'https://blog.example.com/guide'
});
console.log(ogResult.title); // "The Ultimate Guide to Web Development"

// Extract from HTML with only title element
const titleHtml = `
  <title>Simple Page Title | My Website</title>
`;
const titleResult = await metascraper({
  html: titleHtml,
  url: 'https://example.com/page'
});
console.log(titleResult.title); // "Simple Page Title | My Website" (processed)

// Extract from JSON-LD structured data
// (no <title> element here: in the priority order, <title> outranks the JSON-LD headline)
const jsonLdHtml = `
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Breaking News: Major Discovery"
  }
  </script>
`;
const jsonLdResult = await metascraper({
  html: jsonLdHtml,
  url: 'https://news.example.com/article'
});
console.log(jsonLdResult.title); // "Breaking News: Major Discovery"

All extracted titles are automatically processed using helpers for consistency:
Internal Helper Functions (from @metascraper/helpers):

toRule(title) - Wraps extraction functions with title processing
$filter($, element) - Filters DOM elements and extracts clean text
$jsonld(property) - Extracts properties from JSON-LD structured data
title(value, options) - Processes and normalizes title text

These are internal implementation details not exposed in the public API.
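
The priority order and the helper wrapping described above can be pictured as a first-match loop. What follows is a minimal sketch under stated assumptions, not the library's code: `firstTitle`, `normalize`, and the plain `page` object are hypothetical stand-ins for metascraper's rule runner, the `title` helper, and the Cheerio DOM, and the real helper's normalization may differ.

```javascript
// Hypothetical stand-in for the `title` helper: collapse whitespace and trim.
// This is an assumption; the real helper may do more or something different.
function normalize(value) {
  if (typeof value !== 'string') return undefined;
  const cleaned = value.replace(/\s+/g, ' ').trim();
  return cleaned.length > 0 ? cleaned : undefined;
}

// Hypothetical rules over a plain object (not a Cheerio DOM), in priority order.
const rules = [
  (page) => page.ogTitle,
  (page) => page.twitterTitle,
  (page) => page.titleElement,
  (page) => page.jsonLdHeadline
];

// Return the first rule result that survives normalization.
function firstTitle(page) {
  for (const rule of rules) {
    const value = normalize(rule(page));
    if (value !== undefined) return value;
  }
  return undefined;
}

const withOg = firstTitle({
  ogTitle: '  Better   OpenGraph Title ',
  titleElement: 'Example Page Title'
});
const withoutOg = firstTitle({
  ogTitle: '   ',
  titleElement: 'Example Page Title'
});
console.log(withOg);    // "Better OpenGraph Title"
console.log(withoutOg); // "Example Page Title"
```

In this sketch a blank or missing value simply falls through to the next rule, which is why an empty `og:title` does not block the `<title>` fallback.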
// Cheerio DOM API (from metascraper integration)
interface CheerioAPI {
  /** Select elements using CSS selector */
  (selector: string): CheerioElement;
  /** Get the root element */
  root(): CheerioElement;
}

interface CheerioElement {
  /** Get attribute value */
  attr(name: string): string | undefined;
  /** Get text content */
  text(): string;
  /** Get HTML content */
  html(): string | null;
  /** Find child elements */
  find(selector: string): CheerioElement;
  /** Filter elements */
  filter(selector: string): CheerioElement;
  /** Get first element */
  first(): CheerioElement;
  /** Iterate over elements */
  each(callback: (index: number, element: Element) => void | false): CheerioElement;
  /** Get number of elements */
  length: number;
}
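
To show how a rule function consumes the `htmlDom` and `url` options typed above, here is a sketch of a custom rule bundle in the same shape as `Rules`. The factory name `customTitleRules`, the `meta[name="headline"]` selector, and the `fakeDom` stub are all hypothetical; in real use metascraper would pass a Cheerio instance instead of the stub.

```javascript
// Hypothetical rule bundle following the Rules shape: { title: [...], pkgName }.
function customTitleRules() {
  return {
    title: [
      // Each rule receives { htmlDom, url } and returns a string or undefined.
      ({ htmlDom }) => htmlDom('meta[name="headline"]').attr('content'),
      ({ htmlDom }) => htmlDom('title').text() || undefined
    ],
    pkgName: 'custom-title-rules'
  };
}

// Minimal stub exposing only attr() and text(), standing in for Cheerio.
function fakeDom(nodes) {
  return (selector) => {
    const node = nodes[selector] || {};
    return {
      attr: (name) => (node.attrs || {})[name],
      text: () => node.text || ''
    };
  };
}

const htmlDom = fakeDom({ title: { text: 'Example Page Title' } });
const bundle = customTitleRules();
const results = bundle.title.map((rule) =>
  rule({ htmlDom, url: 'https://example.com' })
);
console.log(results); // [ undefined, 'Example Page Title' ]
```

The first rule finds no matching element and returns `undefined`, so a runner that walks the array in order would fall back to the second rule's result.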