tessl/pypi-markdown2

A fast and complete Python implementation of Markdown with extensive extras support

Processor Classes

Name: tessl/pypi-markdown2
Author: tessl

Reusable Markdown processor classes for batch processing, advanced configuration, and converting multiple documents with the same settings.

Capabilities

Markdown Processor Class

Main processor class. It is configured once and reused for multiple conversions, which avoids repeating setup work when processing documents in batches.

class Markdown:
    def __init__(
        self,
        html4tags: bool = False,
        tab_width: int = 4,
        safe_mode: Optional[Literal['replace', 'escape']] = None,
        extras: Optional[Union[list[str], dict[str, Any]]] = None,
        link_patterns: Optional[Iterable[tuple[re.Pattern, Union[str, Callable[[re.Match], str]]]]] = None,
        footnote_title: Optional[str] = None,
        footnote_return_symbol: Optional[str] = None,
        use_file_vars: bool = False,
        cli: bool = False
    ):
        """
        Initialize a reusable markdown processor.

        Parameters: (same as markdown() function)
        - html4tags: Use HTML 4 style for empty element tags
        - tab_width: Number of spaces per tab for code block indentation
        - safe_mode: Sanitize literal HTML ('escape' or 'replace')
        - extras: List of extra names or dict of extra_name -> extra_arg
        - link_patterns: Auto-link regex patterns as (pattern, replacement) tuples
        - footnote_title: Title attribute for footnote links
        - footnote_return_symbol: Symbol for footnote return links
        - use_file_vars: Look for Emacs-style file variables to enable extras
        - cli: Enable CLI-specific behavior for command-line usage
        """

    def convert(self, text: str) -> UnicodeWithAttrs:
        """
        Convert markdown text to HTML using configured settings.

        Parameters:
        - text: Markdown text to convert

        Returns:
        UnicodeWithAttrs: HTML string with optional metadata and toc_html attributes
        """
    
    def reset(self) -> None:
        """
        Reset internal state for clean processing of next document.
        
        Called automatically by convert(), but can be called manually
        to clear cached state between conversions.
        """

Usage Examples:

from markdown2 import Markdown

# Create processor with specific configuration
markdowner = Markdown(
    extras=["tables", "footnotes", "header-ids", "toc"],
    safe_mode="escape",
    tab_width=2
)

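# Convert multiple documents with same settings
# (document1..document3 are placeholder Markdown strings)
document1, document2, document3 = "# One", "# Two", "# Three"
html1 = markdowner.convert(document1)
html2 = markdowner.convert(document2)
html3 = markdowner.convert(document3)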

# Advanced configuration with extras options
processor = Markdown(
    extras={
        "header-ids": {"prefix": "section-"},
        "toc": {"depth": 3},
        "breaks": {"on_newline": True},
        "html-classes": {"table": "table table-striped"}
    }
)

markdown_text = "# Title\n\nSome *text*."  # placeholder input
html = processor.convert(markdown_text)

Pre-configured Processor

Convenience class with commonly used extras pre-enabled for typical use cases.

class MarkdownWithExtras(Markdown):
    """
    Markdown processor with common extras pre-configured.
    
    Pre-enabled extras:
    - footnotes: Support footnotes as used on daringfireball.net
    - fenced-code-blocks: GitHub-style fenced code blocks with optional syntax highlighting
    """

Usage Examples:

from markdown2 import MarkdownWithExtras

markdown_text = "# Title\n\nSome *text*."  # placeholder input

# Use pre-configured processor
markdowner = MarkdownWithExtras()
html = markdowner.convert(markdown_text)

# Add additional extras to the pre-configured set
processor = MarkdownWithExtras(
    extras=["tables", "header-ids"]  # Adds to the existing extras
)
html = processor.convert(markdown_text)

Enhanced Return Type

Special string subclass that can carry additional attributes from processing.

class UnicodeWithAttrs(str):
    """
    String subclass for markdown HTML output with optional attributes.
    
    Attributes:
    - metadata: Dict of document metadata (from 'metadata' extra)
    - toc_html: HTML string for table of contents (from 'toc' extra)
    """
    metadata: Optional[dict[str, str]]
    toc_html: Optional[str]

Usage Examples:

import markdown2

# Convert with metadata and TOC extras
html = markdown2.markdown(
    """---
title: My Document
author: John Doe
---

# Chapter 1
Content here...

## Section 1.1
More content...
""",
    extras=["metadata", "toc", "header-ids"]
)

# Access metadata
if html.metadata:
    print(f"Title: {html.metadata['title']}")
    print(f"Author: {html.metadata['author']}")

# Access table of contents
if html.toc_html:
    print("Table of Contents HTML:", html.toc_html)

# Still works as regular string
print("HTML length:", len(html))
print("HTML content:", str(html))

Performance Considerations

When to Use Classes vs Functions

Use the Markdown class when:

- Converting multiple documents with the same settings
- Reusing a processor configuration
- Processing documents in batches
- Avoiding repeated configuration of extras

Use the markdown() function when:

- Converting a single document
- Each document needs different settings
- Doing simple one-off conversions
- Prototyping or writing quick scripts

Memory and State Management

import markdown2
from markdown2 import Markdown

documents = ["# First", "# Second"]  # placeholder Markdown strings

# Good: Reuse processor for batch processing
processor = Markdown(extras=["tables", "footnotes"])
results = [processor.convert(doc) for doc in documents]

# Less efficient: markdown2.markdown() builds a new processor on every call
results = [markdown2.markdown(doc, extras=["tables", "footnotes"]) for doc in documents]

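A Markdown instance keeps its configuration across conversions, while convert() clears per-document state through reset(). Creating one instance and reusing it therefore avoids repeating the constructor's setup work for every document in a batch.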

Error Handling

class MarkdownError(Exception):
    """Base exception class for markdown processing errors."""
    pass

Usage Examples:

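from markdown2 import Markdown, MarkdownError

text = "# Title\n\nSome *text*."  # placeholder input

try:
    # Note: an unrecognized extra name may be silently ignored rather than
    # raise, depending on the markdown2 version.
    processor = Markdown(extras=["invalid-extra"])
    html = processor.convert(text)
except MarkdownError as e:
    print(f"Markdown processing error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")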
