tessl/npm-notion-utils

Useful utilities for working with Notion data structures and operations in both Node.js and browser environments.

Content Extraction

Functions for extracting images and embedded tweets from Notion pages, and for enumerating the blocks that a page contains.

Capabilities

Image Extraction

Functions for extracting and processing image URLs from Notion pages.

/**
 * Gets URLs of all images contained on the given page
 * @param recordMap - Extended record map containing page data
 * @param options - Configuration including image URL mapping function
 * @returns Array of processed image URLs
 */
function getPageImageUrls(
  recordMap: ExtendedRecordMap,
  options: { mapImageUrl: (url: string, block: Block) => string | undefined }
): string[];

Usage Example:

import { getPageImageUrls, defaultMapImageUrl } from "notion-utils";

// Extract all image URLs from a page
const imageUrls = getPageImageUrls(recordMap, {
  mapImageUrl: (url, block) => {
    // Use the default image URL mapping (handles Notion's image proxy)
    return defaultMapImageUrl(url, block);
  }
});

console.log(`Found ${imageUrls.length} images on the page`);
imageUrls.forEach((url, index) => {
  console.log(`Image ${index + 1}: ${url}`);
});

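// Custom image URL processing
const customImageUrls = getPageImageUrls(recordMap, {
  mapImageUrl: (url, block) => {
    if (!url || url.startsWith('data:')) {
      // Skip missing values and data URLs
      return undefined;
    }
    // Add custom parameters, keeping any query string already on the URL
    const separator = url.includes('?') ? '&' : '?';
    return `${url}${separator}width=800&quality=80`;
  }
});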

Tweet Extraction

Functions for extracting Twitter/X content embedded in Notion pages.

/**
 * Gets the URLs of all tweets embedded on a page
 * @param recordMap - Extended record map containing page data
 * @returns Array of tweet URLs
 */
function getPageTweetUrls(recordMap: ExtendedRecordMap): string[];

/**
 * Gets the IDs of all tweets embedded on a page
 * @param recordMap - Extended record map containing page data
 * @returns Array of tweet ID strings
 */
function getPageTweetIds(recordMap: ExtendedRecordMap): string[];

Usage Examples:

import { getPageTweetUrls, getPageTweetIds } from "notion-utils";

// Get all tweet URLs
const tweetUrls = getPageTweetUrls(recordMap);
console.log(`Found ${tweetUrls.length} embedded tweets`);
tweetUrls.forEach(url => {
  console.log(`Tweet: ${url}`);
});

// Get just the tweet IDs
const tweetIds = getPageTweetIds(recordMap);
tweetIds.forEach(id => {
  console.log(`Tweet ID: ${id}`);
  // Use with Twitter API: https://api.twitter.com/2/tweets/${id}
});

Block Content Analysis

Functions for enumerating the blocks on a page, which can serve as the starting point for further content analysis.

/**
 * Gets the IDs of all blocks contained on a page
 * @param recordMap - Extended record map containing all blocks
 * @param blockId - Starting block ID (uses first block if not provided)
 * @returns Array of all block IDs found recursively
 */
function getPageContentBlockIds(recordMap: ExtendedRecordMap, blockId?: string): string[];

Usage Example:

import { getPageContentBlockIds } from "notion-utils";

// Get all blocks on a page
const allBlockIds = getPageContentBlockIds(recordMap);
console.log(`Page contains ${allBlockIds.length} total blocks`);

// Analyze block types
const blockTypes: Record<string, number> = {};
allBlockIds.forEach(blockId => {
  const block = recordMap.block[blockId]?.value;
  if (block) {
    blockTypes[block.type] = (blockTypes[block.type] || 0) + 1;
  }
});

console.log("Block type distribution:");
Object.entries(blockTypes).forEach(([type, count]) => {
  console.log(`  ${type}: ${count}`);
});

// Get blocks starting from a specific section
const sectionBlockIds = getPageContentBlockIds(recordMap, "specific-section-id");
console.log(`Section contains ${sectionBlockIds.length} blocks`);

Content Type Detection

The extraction functions automatically detect and handle various content types:

Image Sources

  • Image blocks: Direct image uploads and external image URLs
  • Page covers: Cover images set on pages
  • Page icons: When they are image URLs (not emojis)
  • Bookmark previews: Cover images from bookmarked websites
  • Callout icons: Image icons used in callout blocks
  • Database covers: Cover images from database/collection items
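
To make the list of image sources more concrete, here is a minimal sketch of how several of them could be gathered from block data. It is not the notion-utils implementation: `ImageBlockLike`, `isUrlLike`, and `collectImageSources` are hypothetical names, and the field names (`properties.source`, `format.page_cover`, `format.page_icon`, `format.bookmark_cover`) are assumptions about the underlying block shape.

```typescript
// Minimal stand-in for a block; only the fields this sketch reads (assumed shape).
interface ImageBlockLike {
  id: string;
  type: string;
  properties?: { source?: string[][] };
  format?: { page_cover?: string; page_icon?: string; bookmark_cover?: string };
}

type ImageBlockMap = Record<string, { value: ImageBlockLike }>;

// Hypothetical helper: an icon counts as an image only if it looks like a URL or path.
function isUrlLike(value: string): boolean {
  return /^(https?:\/\/|\/)/.test(value);
}

// Hypothetical helper, not part of notion-utils.
// Collects unique image URLs from image blocks, covers, icons, and bookmark previews.
function collectImageSources(blocks: ImageBlockMap): string[] {
  const urls: string[] = [];
  for (const id of Object.keys(blocks)) {
    const entry = blocks[id];
    if (!entry) continue;
    const block = entry.value;
    const candidates: Array<string | undefined> = [];

    if (block.type === "image") {
      candidates.push(block.properties?.source?.[0]?.[0]);
    }
    if (block.type === "bookmark") {
      candidates.push(block.format?.bookmark_cover);
    }
    candidates.push(block.format?.page_cover);

    const icon = block.format?.page_icon;
    if (icon && isUrlLike(icon)) {
      candidates.push(icon); // emoji icons are skipped
    }

    for (const url of candidates) {
      if (url && urls.indexOf(url) === -1) urls.push(url);
    }
  }
  return urls;
}

// Sample data, invented for illustration.
const sampleImageBlocks: ImageBlockMap = {
  page: {
    value: {
      id: "page",
      type: "page",
      format: { page_cover: "/images/page-cover/woodcuts_1.jpg", page_icon: "🚀" }
    }
  },
  img: {
    value: {
      id: "img",
      type: "image",
      properties: { source: [["https://example.com/a.png"]] }
    }
  },
  bm: {
    value: {
      id: "bm",
      type: "bookmark",
      format: { bookmark_cover: "https://example.com/cover.jpg" }
    }
  }
};

const sampleImageUrls = collectImageSources(sampleImageBlocks);
console.log(sampleImageUrls);
```

In this sketch the emoji icon is ignored and each URL appears once. Relative paths such as the page cover above would still need to be resolved by a `mapImageUrl` function before use.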

Tweet Sources

  • Tweet blocks: Dedicated Twitter embed blocks
  • Tweet URLs: Raw Twitter URLs in text content
  • Supported formats:
    • https://twitter.com/user/status/123456789
    • https://x.com/user/status/123456789
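
As an illustration of the two supported URL formats, the following sketch pulls the numeric ID out of a status URL with a regular expression. `TWEET_URL_PATTERN` and `extractTweetId` are hypothetical names used only for this example, and the library's own matching rules may differ.

```typescript
// Hypothetical pattern, not taken from notion-utils.
// Accepts twitter.com and x.com status URLs and captures the numeric tweet ID.
const TWEET_URL_PATTERN =
  /^https?:\/\/(?:www\.)?(?:twitter|x)\.com\/[^/]+\/status\/(\d+)/;

// Hypothetical helper: returns the tweet ID as a string, or undefined if the URL does not match.
function extractTweetId(url: string): string | undefined {
  const match = TWEET_URL_PATTERN.exec(url);
  return match ? match[1] : undefined;
}

console.log(extractTweetId("https://twitter.com/user/status/123456789"));
console.log(extractTweetId("https://x.com/user/status/123456789"));
console.log(extractTweetId("https://example.com/user/status/123456789"));
```

The ID is kept as a string on purpose: real tweet IDs are 64-bit integers and can exceed `Number.MAX_SAFE_INTEGER`, so converting them to a JavaScript number risks losing precision.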

Block Traversal

  • Recursive traversal: Follows all content arrays and nested blocks
  • Property traversal: Includes blocks referenced in properties
  • Transclusion support: Handles synced blocks and references
  • Collection traversal: Optionally includes database items
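
The recursive traversal described above can be sketched as a depth-first walk over each block's `content` array. This is a simplified illustration rather than the library's code: `TreeBlock`, `TreeBlockMap`, and `collectBlockIds` are hypothetical names, and property, transclusion, and collection traversal are left out.

```typescript
// Minimal stand-in for a block; only the fields the walk needs (assumed shape).
interface TreeBlock {
  id: string;
  type: string;
  content?: string[];
}

type TreeBlockMap = Record<string, { value: TreeBlock }>;

// Hypothetical helper, not part of notion-utils.
// Depth-first walk that records each reachable block ID exactly once.
function collectBlockIds(blocks: TreeBlockMap, rootId: string): string[] {
  const order: string[] = [];

  function visit(id: string): void {
    // Skip IDs already seen, which also guards against cycles.
    // A linear scan keeps the sketch short; a Set would scale better.
    if (order.indexOf(id) !== -1) return;
    order.push(id);

    const entry = blocks[id];
    if (!entry) return; // referenced but not present in the map

    for (const childId of entry.value.content ?? []) {
      visit(childId);
    }
  }

  visit(rootId);
  return order;
}

// Sample data, invented for illustration: "c" is shared, "missing" is not loaded.
const sampleTree: TreeBlockMap = {
  root: { value: { id: "root", type: "page", content: ["a", "b"] } },
  a: { value: { id: "a", type: "toggle", content: ["c"] } },
  b: { value: { id: "b", type: "toggle", content: ["c", "missing"] } },
  c: { value: { id: "c", type: "text" } }
};

const sampleOrder = collectBlockIds(sampleTree, "root");
console.log(sampleOrder.join(","));
// prints "root,a,c,b,missing"
```

This sketch chooses to keep IDs that are referenced but absent from the map, so callers can see which blocks still need to be fetched. Whether the library does the same is not stated here.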

Advanced Usage Patterns

import type { ExtendedRecordMap } from "notion-types";
import { 
  getPageImageUrls, 
  getPageTweetIds, 
  getPageContentBlockIds,
  defaultMapImageUrl 
} from "notion-utils";

// Comprehensive content analysis
function analyzePageContent(recordMap: ExtendedRecordMap) {
  const images = getPageImageUrls(recordMap, {
    mapImageUrl: defaultMapImageUrl
  });
  
  const tweets = getPageTweetIds(recordMap);
  const blocks = getPageContentBlockIds(recordMap);
  
  return {
    totalBlocks: blocks.length,
    imageCount: images.length,
    tweetCount: tweets.length,
    images: images,
    tweets: tweets.map(id => `https://twitter.com/i/status/${id}`)
  };
}

// Usage
const contentAnalysis = analyzePageContent(recordMap);
console.log("Content Analysis:", contentAnalysis);

Types

// Re-exported from notion-types
interface ExtendedRecordMap {
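  // Each entry wraps its block; the examples above read it through `.value`
  block: Record<string, { role: string; value: Block }>;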
  collection?: Record<string, Collection>;
  collection_view?: Record<string, CollectionView>;
  notion_user?: Record<string, NotionUser>;
  collection_query?: Record<string, any>;
  signed_urls?: Record<string, string>;
  preview_images?: Record<string, string>;
}

interface Block {
  id: string;
  type: BlockType;
  properties?: Record<string, any>;
  format?: Record<string, any>;
  content?: string[];
  parent_id: string;
  parent_table: string;
  alive: boolean;
  created_time: number;
  last_edited_time: number;
}

// Content-specific block types
type ContentBlockType = 
  | 'image' 
  | 'tweet' 
  | 'bookmark' 
  | 'callout'
  | 'video'
  | 'audio'
  | 'file'
  | 'pdf'
  | 'embed';
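
One way to use a union like `ContentBlockType` at runtime is a type guard that narrows an arbitrary block type string. The sketch below repeats the union so that it is self-contained. `CONTENT_BLOCK_TYPES` and `isContentBlockType` are hypothetical names and are not exports of notion-utils.

```typescript
// Repeated from the Types section so this example stands alone.
type ContentBlockType =
  | 'image'
  | 'tweet'
  | 'bookmark'
  | 'callout'
  | 'video'
  | 'audio'
  | 'file'
  | 'pdf'
  | 'embed';

// Hypothetical runtime list mirroring the union above.
const CONTENT_BLOCK_TYPES: readonly ContentBlockType[] = [
  'image', 'tweet', 'bookmark', 'callout', 'video', 'audio', 'file', 'pdf', 'embed'
];

// Hypothetical type guard: narrows a plain string to ContentBlockType.
function isContentBlockType(type: string): type is ContentBlockType {
  return (CONTENT_BLOCK_TYPES as readonly string[]).indexOf(type) !== -1;
}

// Example: keep only the content-bearing types from a mixed list.
const mixedTypes = ['text', 'image', 'header', 'tweet'];
const contentTypes = mixedTypes.filter(isContentBlockType);
console.log(contentTypes);
```

Because the guard is used as the `filter` predicate, `contentTypes` is typed as `ContentBlockType[]` rather than `string[]`.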

Install with Tessl CLI

npx tessl i tessl/npm-notion-utils
