Skills for working with Obsidian vaults and related formats: Obsidian Flavored Markdown, JSON Canvas files, the Obsidian CLI, and Defuddle for clean web content extraction.
Use the Defuddle CLI to extract clean, readable content from web pages. Prefer it over WebFetch for standard web pages: it removes navigation, ads, and clutter, which reduces token usage.
If not installed: npm install -g defuddle
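If this check needs to be scripted, here is a minimal sketch: a shell function that installs the CLI only when it is missing. It assumes npm is on PATH. The name ensure_defuddle is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install the defuddle CLI only if it is not already available.
ensure_defuddle() {
  # command -v succeeds if defuddle resolves to a binary, function, or alias
  if ! command -v defuddle >/dev/null 2>&1; then
    npm install -g defuddle
  fi
}
```

Calling ensure_defuddle once, before the first defuddle parse call, avoids repeating the install on every request.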
Defuddle is a server-side HTML stripper, not a browser. Think of it as getting the printed article without the magazine wrapper: it fetches the raw HTML and strips the boilerplate, which is typically faster and cheaper than WebFetch for standard pages.
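To see the saving on a particular page, a rough sketch compares the byte size of the raw HTML with the size of defuddle's markdown. It fetches the HTML with curl, which is an assumption; any HTTP client would do. Bytes are only a proxy for tokens, the ratio varies by site, and compare_sizes is a hypothetical name.

```bash
#!/usr/bin/env bash
# Rough sketch: compare raw HTML size with defuddle's markdown output for one URL.
# Byte counts stand in for token counts; compare_sizes is a hypothetical helper.
compare_sizes() {
  local url=$1 raw clean
  raw=$(curl -sL "$url" | wc -c | tr -d ' ')
  clean=$(defuddle parse "$url" --md | wc -c | tr -d ' ')
  if [ "$clean" -eq 0 ]; then
    echo "defuddle returned nothing for $url" >&2
    return 1
  fi
  # Integer ratio, rounded down
  echo "raw=$raw clean=$clean ratio=$((raw / clean))x"
}
```

For example, compare_sizes https://docs.example.com/guide prints both byte counts and their ratio on one line.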
When to apply: Any standard HTTP page — documentation, articles, blog posts, wikis. When NOT to apply: SPAs requiring JavaScript, login-gated pages, JSON API endpoints, or pages behind auth redirects.
When in doubt, try defuddle first; if the output is empty or navigation-only, fall back to WebFetch.
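The try-first, fall-back-second rule can be sketched as a wrapper that flags suspiciously short output. The word-count threshold, the exit code 2 convention, and the name fetch_clean are all assumptions. WebFetch is an agent tool rather than a shell command, so the sketch only signals that a fallback is needed.

```bash
#!/usr/bin/env bash
# Sketch of "try defuddle first, then fall back": prints markdown on success,
# returns 2 when the caller should switch to WebFetch instead.
# MIN_WORDS is a hypothetical heuristic threshold, not a defuddle option.
MIN_WORDS=${MIN_WORDS:-50}

fetch_clean() {
  local url=$1 out words
  # Extraction failed outright (network error, non-HTML response, etc.)
  if ! out=$(defuddle parse "$url" --md 2>/dev/null); then
    echo "defuddle failed for $url; fall back to WebFetch" >&2
    return 2
  fi
  # Very little text often means an SPA shell or a navigation-only page
  words=$(printf '%s' "$out" | wc -w | tr -d ' ')
  if [ "$words" -lt "$MIN_WORDS" ]; then
    echo "only $words words from $url; fall back to WebFetch" >&2
    return 2
  fi
  printf '%s\n' "$out"
}
```

A word count cannot tell a short but genuine page from a navigation stub, so the threshold is a tunable guess rather than a rule.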
Run defuddle parse <url> --md to extract clean markdown, and add -o file.md when the output should be saved to disk.
Use this decision framework before fetching any URL:
| Scenario | Tool | Reason |
|---|---|---|
| Article, blog post, documentation page | defuddle (ALWAYS) | Removes clutter, saves tokens |
| Standard web page with readable text | defuddle (ALWAYS) | Cleaner output than WebFetch |
| JavaScript-heavy SPA (React/Vue app shell) | WebFetch or inform user | defuddle cannot execute JS |
| Login-gated or auth-required page | WebFetch or inform user | defuddle fetches without credentials |
| API endpoint returning JSON | Direct fetch or inform user | defuddle is for HTML pages |
| User asks for metadata only (title, description) | defuddle parse <url> -p <field> | Lightweight extraction |
Mental model: defuddle is a server-side HTML parser. It downloads the HTML, strips boilerplate, and returns clean prose. It cannot authenticate, execute JavaScript, or follow redirects that require cookies.
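Because defuddle only makes sense for HTML, one way to route an unfamiliar URL is to look at its Content-Type header first. This sketch makes several assumptions: curl is installed, and the server answers HEAD requests truthfully (not all do). The function suggest_tool and its output labels are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical router: suggest a tool based on the URL's Content-Type header.
suggest_tool() {
  local url=$1 ctype
  # -s silent, -I HEAD request, -L follow redirects; keep the last Content-Type seen
  ctype=$(curl -sIL "$url" | tr -d '\r' |
    awk 'tolower($1) == "content-type:" { v = tolower($2) } END { print v }')
  case $ctype in
    text/html*)        echo "defuddle" ;;
    application/json*) echo "direct-fetch" ;;
    *)                 echo "unknown: try defuddle, fall back to WebFetch" ;;
  esac
}
```

A JavaScript SPA shell also reports text/html, so a defuddle suggestion here still needs the empty-output check before the result is trusted.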
ALWAYS use --md for markdown output:
defuddle parse <url> --md
Save to file:
defuddle parse <url> --md -o content.md
Extract specific metadata:
defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p domain

| Flag | Format |
|---|---|
| --md | Markdown (default choice) |
| --json | JSON with both HTML and markdown |
| (none) | HTML |
| -p <name> | Specific metadata property |
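When several metadata fields are needed, a small loop over -p keeps the request metadata-only. This is a sketch: page_metadata and its "field: value" output format are illustrative, and each field costs a separate defuddle run.

```bash
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): print metadata fields for one URL.
# Each field is a separate defuddle run using the -p flag.
page_metadata() {
  local url=$1 field
  shift
  # With no field names given, default to title, description and domain
  if [ "$#" -eq 0 ]; then
    set -- title description domain
  fi
  for field in "$@"; do
    printf '%s: %s\n' "$field" "$(defuddle parse "$url" -p "$field")"
  done
}
```

For example, page_metadata https://example.com/article title domain prints one "title: ..." line and one "domain: ..." line.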
See ANTI-PATTERNS.md for full explanations of each mistake.
NEVER use WebFetch for standard web pages when defuddle is available. Raw HTML commonly contains 5-10x more tokens than the extracted content.
BAD:
WebFetch("https://docs.example.com/guide")
GOOD:
defuddle parse https://docs.example.com/guide --md
Omitting the --md flag
NEVER call defuddle parse without --md when readable content is the goal. Without it, defuddle returns HTML markup, which is tag-heavy and wastes tokens.
BAD:
defuddle parse https://example.com/article
GOOD:
defuddle parse https://example.com/article --md
Using -p when full content is needed
NEVER use -p when the user wants the page body. It returns a single metadata property and silently discards the body text.
BAD:
defuddle parse https://example.com/article -p title
GOOD:
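defuddle parse https://example.com/article --md
Redirecting with > instead of -o
NEVER use > to save output: shells that re-encode redirected output can corrupt non-ASCII text (Windows PowerShell 5.1, for example, writes UTF-16 by default). Always use -o.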
BAD:
defuddle parse https://example.com/article --md > output.md
GOOD:
defuddle parse https://example.com/article --md -o output.md
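Following the -o rule, this sketch saves several pages in one go. It assumes a target directory and a URL-to-filename scheme, both hypothetical. It also checks that each written file is non-empty, so empty extractions can be retried with WebFetch.

```bash
#!/usr/bin/env bash
# Sketch: save several pages as markdown, letting defuddle write each file via -o.
# The URL-to-filename convention is hypothetical, not part of defuddle.
save_pages() {
  local dir=$1 url name
  shift
  mkdir -p "$dir"
  for url in "$@"; do
    # Strip the scheme and replace unsafe characters, e.g. example.com_article
    name=$(printf '%s' "$url" | sed -e 's#^[a-zA-Z]*://##' -e 's#[^A-Za-z0-9._-]#_#g')
    if defuddle parse "$url" --md -o "$dir/$name.md" && [ -s "$dir/$name.md" ]; then
      echo "saved $dir/$name.md"
    else
      echo "empty or failed: $url (consider WebFetch)" >&2
    fi
  done
}
```

For example, save_pages notes https://example.com/article writes notes/example.com_article.md and reports it on stdout.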