tessl/npm-metascraper-title

Get title property from HTML markup

Workspace: tessl
Visibility: Public
Describes: npmpkg:npm/metascraper-title@5.49.x

To install, run

npx @tessl/cli install tessl/npm-metascraper-title@5.49.0

Metascraper Title

Metascraper Title is a rule module for the metascraper ecosystem that extracts a page title from HTML markup. It provides nine extraction rules, tried in priority order, that cover Open Graph meta tags, Twitter Cards, the HTML title element, JSON-LD structured data, and common CSS class patterns.

Package Information

  • Package Name: metascraper-title
  • Package Type: npm
  • Language: JavaScript
  • Installation: npm install metascraper-title

Core Imports

const metascraperTitle = require('metascraper-title');

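From an ES module, use a default import:

import metascraperTitle from 'metascraper-title';

Note: metascraper-title is CommonJS only and does not provide native ES module exports, so the import above relies on the CommonJS interoperability of Node.js or of a bundler.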

Basic Usage

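const metascraper = require('metascraper')([
  require('metascraper-title')()
]);

const html = `
<html>
  <head>
    <title>Example Page Title</title>
    <meta property="og:title" content="Better OpenGraph Title">
  </head>
</html>
`;

// metascraper returns a promise, so await it inside an async function
// (top-level await is not available in CommonJS modules).
async function main() {
  const metadata = await metascraper({
    html,
    url: 'https://example.com'
  });

  console.log(metadata.title); // "Better OpenGraph Title"
}

main().catch(console.error);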

Architecture

Metascraper Title implements the metascraper plugin pattern with a rules-based extraction system:

  • Factory Function: Returns a rules object containing title extraction logic
  • Rule Priority: 9 extraction rules processed in priority order until a valid title is found
  • Helper Integration: Uses @metascraper/helpers for DOM processing, text normalization, and JSON-LD parsing
  • Metascraper Integration: Follows standard metascraper plugin interface for seamless composition

Capabilities

Title Extraction Rules

The factory returns nine title extraction rules. They are evaluated in order, so each later rule acts as a fallback for the ones before it.

/**
 * Creates metascraper rules for title extraction
 * @returns {Rules} Rules object containing title extraction logic
 */
function metascraperTitle(): Rules;

interface Rules {
  /** Array of title extraction rules in priority order */
  title: Array<RulesOptions>;
  /** Package identifier for debugging */
  pkgName?: string;
  /** Optional test function to skip rules */
  test?: (options: RulesTestOptions) => boolean;
}

type RulesOptions = (options: RulesTestOptions) => string | null | undefined;

interface RulesTestOptions {
  /** Cheerio DOM instance of the HTML */
  htmlDom: CheerioAPI;
  /** URL of the page being processed */
  url: string;
}
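
To make the shape above concrete, here is a minimal sketch of a rules object with the same structure. The names `makeStubDom` and `exampleTitleRules` are hypothetical and exist only for this illustration. The stub stands in for the Cheerio instance that metascraper would normally pass as `htmlDom`, and the two rules shown are not the package's own code.

```javascript
// Hypothetical stub standing in for a Cheerio instance: it only knows
// how to answer attr('content') for a fixed set of selectors.
function makeStubDom(contentBySelector) {
  return (selector) => ({
    attr: (name) =>
      name === 'content' ? contentBySelector[selector] : undefined
  });
}

// Hypothetical factory mirroring the Rules shape: an ordered array of
// rule functions under `title`, plus a `pkgName` label.
function exampleTitleRules() {
  return {
    title: [
      ({ htmlDom }) => htmlDom('meta[property="og:title"]').attr('content'),
      ({ htmlDom }) => htmlDom('meta[name="twitter:title"]').attr('content')
    ],
    pkgName: 'example-title-rules'
  };
}

const rules = exampleTitleRules();
const htmlDom = makeStubDom({
  'meta[name="twitter:title"]': 'Title From Twitter Card'
});
const options = { htmlDom, url: 'https://example.com' };

// Each rule receives the same options object and returns a string or undefined.
const ruleResults = rules.title.map((rule) => rule(options));
console.log(ruleResults);
```

In this sketch the first rule finds nothing and returns `undefined`, while the second returns the Twitter Card value. A caller that walks the array in order would therefore settle on the second result.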

Rule Priority Order:

  1. Open Graph Title - meta[property="og:title"] content attribute
  2. Twitter Card Title (name) - meta[name="twitter:title"] content attribute
  3. Twitter Card Title (property) - meta[property="twitter:title"] content attribute
  4. HTML Title Element - <title> element text content (filtered)
  5. JSON-LD Headline - headline property from JSON-LD structured data
  6. Post Title Class - .post-title element text content (filtered)
  7. Entry Title Class - .entry-title element text content (filtered)
  8. H1 Title Class Link - h1[class*="title" i] a element text content (filtered)
  9. H1 Title Class - h1[class*="title" i] element text content (filtered)
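
The effect of this ordering can be sketched without any DOM at all. The snippet below is an illustration only: the labels in `PRIORITY` and the `pickTitle` function are hypothetical names, and the candidate values are supplied by hand instead of being read from HTML.

```javascript
// The nine sources in the order listed above. These labels are
// hypothetical names for this sketch, not identifiers from the package.
const PRIORITY = [
  'og:title',
  'twitter:title (name)',
  'twitter:title (property)',
  '<title>',
  'json-ld headline',
  '.post-title',
  '.entry-title',
  'h1[class*="title" i] a',
  'h1[class*="title" i]'
];

// Return the first source, in priority order, that holds a non-empty string.
function pickTitle(candidates) {
  for (const source of PRIORITY) {
    const value = candidates[source];
    if (typeof value === 'string' && value.trim() !== '') {
      return { source, value };
    }
  }
  return undefined;
}

const picked = pickTitle({
  'json-ld headline': 'Breaking News: Major Discovery',
  '<title>': 'Default Title',
  '.entry-title': 'Entry Heading'
});
console.log(picked); // { source: '<title>', value: 'Default Title' }
```

Note that the `<title>` element outranks the JSON-LD headline, so a page that provides both resolves to the `<title>` text under this ordering.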

Usage Examples:

// Using with multiple metascraper rules
const metascraper = require('metascraper')([
  require('metascraper-title')(),
  require('metascraper-description')(),
  require('metascraper-image')()
]);

// metascraper returns a promise, so the calls are awaited inside an async function
async function main() {
  // Extract from HTML with Open Graph tags
  const ogHtml = `
  <meta property="og:title" content="The Ultimate Guide to Web Development">
  <title>Generic Page Title</title>
  `;

  const ogResult = await metascraper({
    html: ogHtml,
    url: 'https://blog.example.com/guide'
  });
  console.log(ogResult.title); // "The Ultimate Guide to Web Development"

  // Extract from HTML with only title element
  const titleHtml = `
  <title>Simple Page Title | My Website</title>
  `;

  const titleResult = await metascraper({
    html: titleHtml,
    url: 'https://example.com/page'
  });
  console.log(titleResult.title); // "Simple Page Title | My Website" (after normalization)

  // Extract from JSON-LD structured data. There is no <title> element here:
  // the <title> rule ranks above the JSON-LD headline rule and would win.
  const jsonLdHtml = `
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Breaking News: Major Discovery"
  }
  </script>
  `;

  const jsonLdResult = await metascraper({
    html: jsonLdHtml,
    url: 'https://news.example.com/article'
  });
  console.log(jsonLdResult.title); // "Breaking News: Major Discovery"
}

main().catch(console.error);

Text Processing

All extracted titles are automatically processed using helpers for consistency:

  • Whitespace Normalization: Condenses multiple whitespace characters
  • Smart Quotes: Converts straight quotes to curly quotes where appropriate
  • HTML Entity Decoding: Decodes HTML entities in extracted text
  • Filtering: Removes empty or invalid title values
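
As a rough illustration of the whitespace and filtering steps, the sketch below uses a hypothetical `normalizeTitle` function. It is not the helper used by the package, and it deliberately leaves out smart quotes and entity decoding.

```javascript
// Hypothetical normalizer: collapses runs of whitespace and rejects
// values that are not strings or are empty after trimming.
function normalizeTitle(value) {
  if (typeof value !== 'string') return undefined;
  const condensed = value.replace(/\s+/g, ' ').trim();
  return condensed === '' ? undefined : condensed;
}

console.log(normalizeTitle('  Simple   Page\n Title ')); // "Simple Page Title"
console.log(normalizeTitle('   '));                      // undefined
console.log(normalizeTitle(null));                       // undefined
```

Returning `undefined` for unusable values is what lets a fallback chain move on to the next rule.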

Dependencies

Internal Helper Functions (from @metascraper/helpers):

  • toRule(title) - Wraps extraction functions with title processing
  • $filter($, element) - Filters DOM elements and extracts clean text
  • $jsonld(property) - Extracts properties from JSON-LD structured data
  • title(value, options) - Processes and normalizes title text

These helpers are implementation details of the package and are not part of the metascraper-title public API.
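
The general pattern of wrapping an extractor with a processing step can be sketched as follows. This is an assumption-laden illustration and not the actual `toRule` implementation: `wrapRule`, `cleanText`, and `headingRule` are hypothetical names, and a plain object stands in for the Cheerio instance.

```javascript
// Hypothetical wrapper: takes a processing function and returns a function
// that turns a raw extractor into a rule with post-processing applied.
const wrapRule = (process) => (extract) => ({ htmlDom, url }) =>
  process(extract(htmlDom, url));

// Hypothetical processing step: trim strings, drop everything else.
const cleanText = (value) =>
  typeof value === 'string' && value.trim() !== '' ? value.trim() : undefined;

const toCleanRule = wrapRule(cleanText);

// The extractor sees only the DOM-like argument; a plain object stands in
// for the Cheerio instance here.
const headingRule = toCleanRule((dom) => dom.heading);

console.log(headingRule({ htmlDom: { heading: '  Hello  ' }, url: 'https://example.com' })); // "Hello"
console.log(headingRule({ htmlDom: {}, url: 'https://example.com' }));                       // undefined
```

Keeping extraction and processing separate means every rule in a list gets the same cleanup without repeating it in each extractor.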

Types

// Cheerio DOM API (from metascraper integration)
interface CheerioAPI {
  /** Select elements using CSS selector */
  (selector: string): CheerioElement;
  /** Get the root element */
  root(): CheerioElement;
}

interface CheerioElement {
  /** Get attribute value */
  attr(name: string): string | undefined;
  /** Get text content */
  text(): string;
  /** Get HTML content */
  html(): string | null;
  /** Find child elements */
  find(selector: string): CheerioElement;
  /** Filter elements */
  filter(selector: string): CheerioElement;
  /** Get first element */
  first(): CheerioElement;
  /** Iterate over elements */
  each(callback: (index: number, element: Element) => void | false): CheerioElement;
  /** Get number of elements */
  length: number;
}