tessl/npm-metascraper-title

Get title property from HTML markup

Metascraper Title

Metascraper Title is a metascraper rules module that extracts a page's title from HTML markup. It plugs into the metascraper ecosystem and tries 9 extraction strategies in priority order, covering Open Graph meta tags, Twitter Cards, JSON-LD structured data, the HTML title element, and common CSS class patterns.

Package Information

  • Package Name: metascraper-title
  • Package Type: npm
  • Language: JavaScript
  • Installation: npm install metascraper-title

Core Imports

const metascraperTitle = require('metascraper-title');

For ES modules:

import metascraperTitle from 'metascraper-title';

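Note: metascraper-title is CommonJS only and does not provide native ES module exports. The default import above works because Node.js exposes a CommonJS module's module.exports as the default export when it is imported from an ES module.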

Basic Usage

const metascraper = require('metascraper')([
  require('metascraper-title')()
]);

const html = `
<html>
  <head>
    <title>Example Page Title</title>
    <meta property="og:title" content="Better OpenGraph Title">
  </head>
</html>
`;

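// CommonJS has no top-level await, so run the extraction inside an async function
(async () => {
  const metadata = await metascraper({
    html,
    url: 'https://example.com'
  });

  // og:title outranks <title>, so the Open Graph value wins
  console.log(metadata.title); // "Better OpenGraph Title"
})();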

Architecture

Metascraper Title implements the metascraper plugin pattern with a rules-based extraction system:

  • Factory Function: Returns a rules object containing title extraction logic
  • Rule Priority: 9 extraction rules processed in priority order until a valid title is found
  • Helper Integration: Uses @metascraper/helpers for DOM processing, text normalization, and JSON-LD parsing
  • Metascraper Integration: Follows standard metascraper plugin interface for seamless composition
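The "first valid rule wins" behaviour described above can be modelled without any dependencies. This is a simplified sketch, assuming the rules array is tried in order and the first non-empty string is kept; resolveFirst and the toy rules are hypothetical names, page is a plain object standing in for parsed HTML, and the real metascraper core does additional validation that is omitted here.

```javascript
// Simplified model of how a rules array is consumed: try each rule in order
// and return the first non-empty string. `resolveFirst` is a hypothetical
// helper, not part of metascraper's API.
async function resolveFirst(rules, options) {
  for (const rule of rules) {
    // Rules may be sync or async; `await` handles both cases.
    const value = await rule(options);
    if (typeof value === 'string' && value.trim() !== '') {
      return value;
    }
  }
  return undefined; // no rule produced a usable title
}

// Toy rules standing in for the og:title and <title> extractors.
// `page` is a plain object used instead of a real DOM.
const rules = [
  ({ page }) => page.ogTitle,
  ({ page }) => page.titleElement
];

resolveFirst(rules, { page: { titleElement: 'Example Page Title' } })
  .then(title => console.log(title)); // "Example Page Title"
```

Because the loop stops at the first hit, a lower-priority rule never runs when a higher one already produced a value.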

Capabilities

Title Extraction Rules

Provides a comprehensive set of title extraction rules with fallback prioritization.

/**
 * Creates metascraper rules for title extraction
 * @returns {Rules} Rules object containing title extraction logic
 */
function metascraperTitle(): Rules;

interface Rules {
  /** Array of title extraction rules in priority order */
  title: Array<RulesOptions>;
  /** Package identifier for debugging */
  pkgName?: string;
  /** Optional test function to skip rules */
  test?: (options: RulesTestOptions) => boolean;
}

type RulesOptions = (options: RulesTestOptions) => string | null | undefined;

interface RulesTestOptions {
  /** Cheerio DOM instance of the HTML */
  htmlDom: CheerioAPI;
  /** URL of the page being processed */
  url: string;
}
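Because a rule is just a function of { htmlDom, url }, it can be exercised with a small stub in place of Cheerio. In this sketch ogTitleRule and makeStubDom are hypothetical names; the stub implements only the $(selector).attr(name) slice of the CheerioAPI interface that this one rule needs.

```javascript
// Hypothetical rule with the RulesOptions shape: it receives { htmlDom, url }
// and returns a string, or undefined when the tag is missing.
const ogTitleRule = ({ htmlDom: $ }) =>
  $('meta[property="og:title"]').attr('content');

// Hypothetical stand-in for a Cheerio instance: it maps selectors to
// attribute objects, which is enough to run the rule without cheerio.
function makeStubDom(attrsBySelector) {
  return selector => ({
    attr: name => (attrsBySelector[selector] || {})[name]
  });
}

const $ = makeStubDom({
  'meta[property="og:title"]': { content: 'Better OpenGraph Title' }
});

console.log(ogTitleRule({ htmlDom: $, url: 'https://example.com' }));
// "Better OpenGraph Title"
```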

Rule Priority Order:

  1. Open Graph Title - meta[property="og:title"] content attribute
  2. Twitter Card Title (name) - meta[name="twitter:title"] content attribute
  3. Twitter Card Title (property) - meta[property="twitter:title"] content attribute
  4. HTML Title Element - <title> element text content (filtered)
  5. JSON-LD Headline - headline property from JSON-LD structured data
  6. Post Title Class - .post-title element text content (filtered)
  7. Entry Title Class - .entry-title element text content (filtered)
  8. H1 Title Class Link - h1[class*="title" i] a element text content (filtered)
  9. H1 Title Class - h1[class*="title" i] element text content (filtered)
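To see how this order settles conflicts, the sketch below encodes the nine sources as a plain array and reports which one wins for a given set of sources present on a page. The labels and the winningSource helper are hypothetical; the point is that a non-empty title element (rule 4) always outranks a JSON-LD headline (rule 5).

```javascript
// The nine title sources, in the priority order listed above.
// The labels are illustrative strings, not identifiers used by metascraper-title.
const TITLE_SOURCES = [
  'og:title',
  'twitter:title (name)',
  'twitter:title (property)',
  '<title>',
  'JSON-LD headline',
  '.post-title',
  '.entry-title',
  'h1[class*="title" i] a',
  'h1[class*="title" i]'
];

// Hypothetical helper: given the sources a page provides (each assumed to
// hold non-empty text), return the highest-priority one.
function winningSource(present) {
  const available = new Set(present);
  return TITLE_SOURCES.find(source => available.has(source));
}

console.log(winningSource(['JSON-LD headline', '<title>'])); // "<title>"
console.log(winningSource(['.entry-title', 'h1[class*="title" i]'])); // ".entry-title"
```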

Usage Examples:

// Using with multiple metascraper rules.
// The snippets below use await, so run them inside an async function.
const metascraper = require('metascraper')([
  require('metascraper-title')(),
  require('metascraper-description')(),
  require('metascraper-image')()
]);

// Extract from HTML with Open Graph tags
const ogHtml = `
<meta property="og:title" content="The Ultimate Guide to Web Development">
<title>Generic Page Title</title>
`;

const ogResult = await metascraper({ 
  html: ogHtml, 
  url: 'https://blog.example.com/guide' 
});
console.log(ogResult.title); // "The Ultimate Guide to Web Development"

// Extract from HTML with only title element
const titleHtml = `
<title>Simple Page Title | My Website</title>
`;

const titleResult = await metascraper({ 
  html: titleHtml, 
  url: 'https://example.com/page' 
});
console.log(titleResult.title); // "Simple Page Title | My Website" (text normalized, separator kept)

// Extract from JSON-LD structured data.
// The headline is only used when no og:title, twitter:title or <title> is present,
// because those rules rank higher (rule 4 beats rule 5).
const jsonLdHtml = `
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Breaking News: Major Discovery"
}
</script>
`;

const jsonLdResult = await metascraper({ 
  html: jsonLdHtml, 
  url: 'https://news.example.com/article' 
});
console.log(jsonLdResult.title); // "Breaking News: Major Discovery"

Text Processing

All extracted titles are automatically processed using helpers for consistency:

  • Whitespace Normalization: Condenses multiple whitespace characters
  • Smart Quotes: Converts straight quotes to curly quotes where appropriate
  • HTML Entity Decoding: Decodes HTML entities in extracted text
  • Filtering: Removes empty or invalid title values
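A rough approximation of the whitespace, quote and filtering steps is sketched below, assuming a simple regex-based approach. normalizeTitle is a hypothetical name, entity decoding is left out, and the actual helpers handle many more edge cases than these two regular expressions.

```javascript
// Hypothetical approximation of the title normalization step.
// It condenses whitespace, curls apostrophes and paired double quotes,
// and returns undefined for empty or non-string input (the filtering step).
// HTML entity decoding is not modelled here.
function normalizeTitle(value) {
  if (typeof value !== 'string') return undefined;

  // Collapse runs of spaces, tabs and newlines into a single space.
  const condensed = value.replace(/\s+/g, ' ').trim();
  if (condensed === '') return undefined;

  return condensed
    // Apostrophe between word characters: Don't -> Don’t
    .replace(/(\w)'(\w)/g, '$1\u2019$2')
    // Paired straight double quotes: "Panic" -> “Panic”
    .replace(/"([^"]*)"/g, '\u201C$1\u201D');
}

console.log(normalizeTitle('  Don\'t\n   "Panic"  ')); // "Don’t “Panic”"
console.log(normalizeTitle('   ')); // undefined
```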

Dependencies

Internal Helper Functions (from @metascraper/helpers):

  • toRule(title) - Wraps extraction functions with title processing
  • $filter($, element) - Filters DOM elements and extracts clean text
  • $jsonld(property) - Extracts properties from JSON-LD structured data
  • title(value, options) - Processes and normalizes title text

These helpers are used internally; metascraper-title does not re-export them, so they are not part of its public API.
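The JSON-LD lookup performed by $jsonld can be approximated as follows. This sketch works on the raw text of application/ld+json script blocks rather than a DOM, and jsonldProperty is a hypothetical stand-in; it assumes the first parseable block containing the property wins, and it looks inside top-level arrays and @graph but searches no deeper.

```javascript
// Hypothetical stand-in for $jsonld(property): returns the first value of
// `property` found in a list of JSON-LD script bodies.
function jsonldProperty(scriptBodies, property) {
  for (const body of scriptBodies) {
    let data;
    try {
      data = JSON.parse(body);
    } catch {
      continue; // skip malformed blocks instead of failing the whole lookup
    }
    // A block may hold one object, an array of objects, or an @graph array.
    const graph = Array.isArray(data?.['@graph']) ? data['@graph'] : [];
    const nodes = Array.isArray(data) ? data : [data, ...graph];
    for (const node of nodes) {
      if (node && node[property] !== undefined) return node[property];
    }
  }
  return undefined;
}

const scripts = [
  '{ not valid json',
  JSON.stringify({
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: 'Breaking News: Major Discovery'
  })
];

console.log(jsonldProperty(scripts, 'headline')); // "Breaking News: Major Discovery"
```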

Types

// Cheerio DOM API (from metascraper integration)
interface CheerioAPI {
  /** Select elements using CSS selector */
  (selector: string): CheerioElement;
  /** Get the root element */
  root(): CheerioElement;
}

interface CheerioElement {
  /** Get attribute value */
  attr(name: string): string | undefined;
  /** Get text content */
  text(): string;
  /** Get HTML content */
  html(): string | null;
  /** Find child elements */
  find(selector: string): CheerioElement;
  /** Filter elements */
  filter(selector: string): CheerioElement;
  /** Get first element */
  first(): CheerioElement;
  /** Iterate over elements */
  each(callback: (index: number, element: Element) => void | false): CheerioElement;
  /** Get number of elements */
  length: number;
}
