Fetch any URL and convert to markdown using baoyu-fetch CLI (Chrome CDP with site-specific adapters). Built-in adapters for X/Twitter, YouTube transcripts, Hacker News threads, and generic pages via Defuddle. Handles login/CAPTCHA via interaction wait modes. Use when user wants to save a webpage as markdown.
76
73%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/baoyu-url-to-markdown/SKILL.mdFetches any URL via baoyu-fetch CLI (Chrome CDP + site-specific adapters) and converts it to clean markdown.
Important: The CLI source is vendored in the scripts/vendor/baoyu-fetch/ subdirectory of this skill.
Agent Execution Instructions:
{baseDir}{baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun${READER} = ${BUN_X} {baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts${READER} in this document with the resolved valueCheck EXTEND.md existence (priority order):
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }| Path | Location |
|---|---|
.baoyu-skills/baoyu-url-to-markdown/EXTEND.md | Project directory |
$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md | User home |
| Result | Action |
|---|---|
| Found | Read, parse, apply settings |
| Not found | MUST run first-time setup (see below) — do NOT silently create defaults |
EXTEND.md Supports: Download media by default | Default output directory
CRITICAL: When EXTEND.md is not found, you MUST use AskUserQuestion to ask the user for their preferences before creating EXTEND.md. NEVER create EXTEND.md with defaults without asking. This is a BLOCKING operation — do NOT proceed with any conversion until setup is complete.
Use AskUserQuestion with ALL questions in ONE call:
Question 1 — header: "Media", question: "How to handle images and videos in pages?"
Question 2 — header: "Output", question: "Default output directory?"
Question 3 — header: "Save", question: "Where to save preferences?"
After user answers, create EXTEND.md at the chosen location, confirm "Preferences saved to [path]", then continue.
Full reference: references/config/first-time-setup.md
| Key | Default | Values | Description |
|---|---|---|---|
download_media | ask | ask / 1 / 0 | ask = prompt each time, 1 = always download, 0 = never |
default_output_dir | empty | path or empty | Default output directory (empty = ./url-to-markdown/) |
EXTEND.md → CLI mapping:
| EXTEND.md key | CLI argument | Notes |
|---|---|---|
download_media: 1 | --download-media | Requires --output to be set |
default_output_dir: ./posts/ | Agent constructs --output ./posts/{domain}/{slug}.md | Agent generates path, not a direct CLI flag |
Value priority:
--download-media, --output)baoyu-fetch CLI--adapter--format json# Default: headless capture, markdown to stdout
${READER} <url>
# Save to file
${READER} <url> --output article.md
# Save with media download
${READER} <url> --output article.md --download-media
# Headless mode (explicit)
${READER} <url> --headless --output article.md
# Wait for interaction (login/CAPTCHA) — auto-detect and continue
${READER} <url> --wait-for interaction --output article.md
# Wait for interaction — manual control (Enter to continue)
${READER} <url> --wait-for force --output article.md
# JSON output
${READER} <url> --format json --output article.json
# Force specific adapter
${READER} <url> --adapter youtube --output transcript.md
# Connect to existing Chrome
${READER} <url> --cdp-url http://localhost:9222 --output article.md
# Debug artifacts
${READER} <url> --output article.md --debug-dir ./debug/| Option | Description |
|---|---|
<url> | URL to fetch |
--output <path> | Output file path (default: stdout) |
--format <type> | Output format: markdown (default) or json |
--json | Shorthand for --format json |
--adapter <name> | Force adapter: x, youtube, hn, or generic (default: auto-detect) |
--headless | Force headless Chrome (no visible window) |
--wait-for <mode> | Interaction wait mode: none (default), interaction, or force |
--wait-for-interaction | Alias for --wait-for interaction |
--wait-for-login | Alias for --wait-for interaction |
--timeout <ms> | Page load timeout (default: 30000) |
--interaction-timeout <ms> | Login/CAPTCHA wait timeout (default: 600000 = 10 min) |
--interaction-poll-interval <ms> | Poll interval for interaction checks (default: 1500) |
--download-media | Download images/videos to local imgs/ and videos/, rewrite markdown links. Requires --output |
--media-dir <dir> | Base directory for downloaded media (default: same as --output directory) |
--cdp-url <url> | Reuse existing Chrome DevTools Protocol endpoint |
--browser-path <path> | Custom Chrome/Chromium binary path |
--chrome-profile-dir <path> | Chrome user data directory (default: BAOYU_CHROME_PROFILE_DIR env or ./baoyu-skills/chrome-profile) |
--debug-dir <dir> | Write debug artifacts (document.json, markdown.md, page.html, network.json) |
| Mode | Behavior | Use When |
|---|---|---|
| Default | Headless Chrome, auto-extract on network idle | Public pages, static content |
--headless | Explicit headless (same as default) | Clarify intent |
--wait-for interaction | Opens visible Chrome, auto-detects login/CAPTCHA gates, waits for them to clear, then continues | Login-required, CAPTCHA-protected |
--wait-for force | Opens visible Chrome, auto-detects OR accepts Enter keypress to continue | Complex flows, lazy loading, paywalls |
Interaction gate auto-detection:
Wait-for-interaction workflow:
--wait-for interaction → Chrome opens visibly--wait-for force is used, user can also press Enter to trigger capture manuallyCRITICAL: The agent must treat default headless capture as provisional. Some sites render differently in headless mode and can silently return low-quality content without causing the CLI to fail.
After every headless run, the agent MUST inspect the saved markdown output.
Application errorThis page could not be foundTip: Use --format json to get structured output including status, login.state, and interaction fields for programmatic quality assessment. A "status": "needs_interaction" response means the page requires manual interaction.
--wait-for interaction for auto-detected gates (login, CAPTCHA, Cloudflare)--wait-for force when the page needs manual browsing, scroll loading, or complex interaction--wait-for is used, tell the user exactly what to do:
--wait-for force: tell them to press Enter when ready"status": "needs_interaction", switch to --wait-for interaction automaticallyThe agent must construct the output file path since baoyu-fetch does not auto-generate paths.
Algorithm:
default_output_dir or default ./url-to-markdown/example.com){base_dir}/{domain}/{slug}/{slug}.md — each URL gets its own directory so media files stay isolated{slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.mdPass the constructed path to --output. Media files (--download-media) are saved into subdirectories next to the markdown file, keeping each URL's assets self-contained.
Markdown output to stdout (or file with --output) as clean markdown text.
JSON output (--format json) returns structured data including:
adapter — which adapter handled the URLstatus — "ok" or "needs_interaction"login — login state detection (logged_in, logged_out, unknown)interaction — interaction gate details (kind, provider, prompt)document — structured content (url, title, author, publishedAt, content blocks, metadata)media — collected media assets with url, kind, rolemarkdown — converted markdown textdownloads — media download results (when --download-media used)When --download-media is enabled:
imgs/ next to the output file (or in --media-dir)videos/ next to the output file (or in --media-dir)| Adapter | URLs | Key Features |
|---|---|---|
x | x.com, twitter.com | Tweets, threads, X Articles, media, login detection |
youtube | youtube.com, youtu.be | Transcript/captions, chapters, cover image, metadata |
hn | news.ycombinator.com | Threaded comments, story metadata, nested replies |
generic | Any URL (fallback) | Defuddle extraction, Readability fallback, auto-scroll, network idle detection |
Adapter is auto-selected based on URL. Use --adapter <name> to override.
Based on download_media setting in EXTEND.md:
| Setting | Behavior |
|---|---|
1 (always) | Run CLI with --download-media --output <path> |
0 (never) | Run CLI with --output <path> (no media download) |
ask (default) | Follow the ask-each-time flow below |
--download-media with --output <path> → markdown savedhttps:// in image/video links)AskUserQuestion:
--download-media --output <same-path> (overwrites markdown with localized links)| Variable | Description |
|---|---|
BAOYU_CHROME_PROFILE_DIR | Chrome user data directory (can also use --chrome-profile-dir) |
Troubleshooting: Chrome not found → use --browser-path. Timeout → increase --timeout. Login/CAPTCHA pages → use --wait-for interaction. Debug → use --debug-dir to inspect captured HTML and network logs.
[MM:SS] Text segment with chapter headings--wait-for force if the page needs time to finish loading player metadata"status": "needs_interaction"--wait-for interaction for login-protected contentCustom configurations via EXTEND.md. See Preferences section for paths and supported options.
9eb032a
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.