Skills for working with Obsidian vaults and related formats: Obsidian Flavored Markdown, JSON Canvas files, the Obsidian CLI, and Defuddle for clean web content extraction.
Use the Defuddle CLI to extract clean, readable content from web pages. Prefer it over WebFetch for standard web pages: it removes navigation, ads, and clutter, reducing token usage.
If not installed: npm install -g defuddle
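A minimal sketch of the install check, assuming a POSIX shell with npm available on the PATH. The function name ensure_defuddle is hypothetical and not part of defuddle itself.

```shell
# Hypothetical helper: install defuddle only when it is not already available.
ensure_defuddle() {
  if command -v defuddle >/dev/null 2>&1; then
    return 0
  fi
  # Assumes npm is installed and global installs are permitted.
  npm install -g defuddle
}
```

Calling ensure_defuddle once before the first extraction avoids reinstalling on every run.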
Defuddle is a server-side HTML stripper, not a browser. Think of it as getting the printed article without the magazine wrapper: it fetches raw HTML and removes boilerplate, which is typically faster and cheaper than WebFetch for standard pages.
When to apply: Any standard HTTP page — documentation, articles, blog posts, wikis. When NOT to apply: SPAs requiring JavaScript, login-gated pages, JSON API endpoints, or pages behind auth redirects.
When in doubt, try defuddle first; if the output is empty or navigation-only, fall back to WebFetch.
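The fallback rule can be sketched as a small shell check. This is only an illustration: the helper names and the 50-word threshold are assumptions, and a word count is a crude proxy for "empty or navigation-only" output.

```shell
# Hypothetical heuristic: treat output with fewer than MIN_WORDS words as
# "empty or navigation-only". The threshold is an assumption, not a
# defuddle default.
MIN_WORDS=50

is_thin_output() {
  # $1: extracted markdown text
  word_count=$(printf '%s' "$1" | wc -w | tr -d '[:space:]')
  [ "$word_count" -lt "$MIN_WORDS" ]
}

# Try defuddle first; return non-zero so the caller can fall back to WebFetch.
extract_or_signal_fallback() {
  # $1: URL
  content=$(defuddle parse "$1" --md 2>/dev/null) || return 1
  if is_thin_output "$content"; then
    return 1
  fi
  printf '%s\n' "$content"
}
```

A non-zero exit status from extract_or_signal_fallback is the cue to switch to WebFetch or to inform the user.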
Run defuddle parse <url> --md to extract clean markdown; add -o file.md when persistence is needed.
Use this decision framework before fetching any URL:
| Scenario | Tool | Reason |
|---|---|---|
| Article, blog post, documentation page | defuddle (ALWAYS) | Removes clutter, saves tokens |
| Standard web page with readable text | defuddle (ALWAYS) | Cleaner output than WebFetch |
| JavaScript-heavy SPA (React/Vue app shell) | WebFetch or inform user | defuddle cannot execute JS |
| Login-gated or auth-required page | WebFetch or inform user | defuddle fetches without credentials |
| API endpoint returning JSON | Direct fetch or inform user | defuddle is for HTML pages |
| User asks for metadata only (title, description) | defuddle parse <url> -p <field> | Lightweight extraction |
Mental model: defuddle is a server-side HTML parser. It downloads the HTML, strips boilerplate, and returns clean prose. It cannot authenticate, execute JavaScript, or follow redirects that require cookies.
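The decision table can be expressed as a lookup function. This is a sketch only: the scenario labels (article, spa, and so on) are invented for the example, and the returned strings simply restate the table's Tool column.

```shell
# Hypothetical mapping of the decision table to a shell function.
# The scenario labels are made up for this sketch.
choose_tool() {
  case "$1" in
    article|docs|blog|standard-page) echo "defuddle parse <url> --md" ;;
    metadata)                        echo "defuddle parse <url> -p <field>" ;;
    spa|login-gated)                 echo "WebFetch or inform user" ;;
    json-api)                        echo "Direct fetch or inform user" ;;
    # Unknown scenario: follow the "when in doubt" rule.
    *)                               echo "defuddle first, fall back to WebFetch" ;;
  esac
}
```

For example, choose_tool spa prints "WebFetch or inform user", matching the JavaScript-heavy SPA row.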
ALWAYS use --md for markdown output:
defuddle parse <url> --md
Save to file:
defuddle parse <url> --md -o content.md
Extract specific metadata:
defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p domain
| Flag | Format |
|---|---|
| --md | Markdown (default choice) |
| --json | JSON with both HTML and markdown |
| (none) | HTML |
| -p <name> | Specific metadata property |
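When several metadata properties are needed, the -p calls can be wrapped in a loop. The sketch below assumes one defuddle invocation per property; print_metadata is a hypothetical helper name.

```shell
# Hypothetical helper: print selected metadata properties as "name: value" lines.
# Uses only the -p flag; makes one defuddle call per property.
print_metadata() {
  url=$1
  shift
  for prop in "$@"; do
    value=$(defuddle parse "$url" -p "$prop") || return 1
    printf '%s: %s\n' "$prop" "$value"
  done
}

# Usage: print_metadata "https://example.com/article" title description domain
```

Each call fetches the page again, so for many properties the --json output may be a better fit.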
See ANTI-PATTERNS.md for full explanations of each mistake.
NEVER use WebFetch for standard web pages when defuddle is available. Raw HTML commonly contains 5-10x more tokens than the extracted content.
BAD:
WebFetch("https://docs.example.com/guide")
GOOD:
defuddle parse https://docs.example.com/guide --md
--md flag
NEVER call defuddle parse without --md when readable content is the goal. Without it, defuddle returns raw HTML, which is tag-heavy and token-wasteful.
BAD:
defuddle parse https://example.com/article
GOOD:
defuddle parse https://example.com/article --md
-p when full content is needed
NEVER use -p when the user wants the page body. It returns one metadata property and silently discards the entire body text.
BAD:
defuddle parse https://example.com/article -p title
GOOD:
defuddle parse https://example.com/article --md
-o
NEVER use > to save output. It can corrupt encoding in shells that re-encode redirected output (Windows PowerShell 5.1, for example, writes UTF-16 by default). Always use -o.
BAD:
defuddle parse https://example.com/article --md > output.md
GOOD:
defuddle parse https://example.com/article --md -o output.md
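One way to avoid the mistakes above is to route every extraction through a single wrapper. The following is a sketch under that assumption; fetch_markdown is a hypothetical name, and it relies only on the --md and -o flags.

```shell
# Hypothetical wrapper: always request markdown, and save with -o rather than
# shell redirection when an output path is given.
fetch_markdown() {
  url=$1
  out=${2:-}
  if [ -n "$out" ]; then
    defuddle parse "$url" --md -o "$out"
  else
    defuddle parse "$url" --md
  fi
}

# Usage:
#   fetch_markdown https://example.com/article            # markdown to stdout
#   fetch_markdown https://example.com/article output.md  # markdown saved via -o
```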