Skills for building AEM Edge Delivery Services sites — block development, content modeling, code review, testing, and page import.
Automated webpage analysis and preparation for content migration using Playwright (installed via npm).
The web page analysis workflow prepares a source webpage for content migration by capturing screenshots, extracting cleaned HTML and metadata, generating document paths, and downloading the page's images.
Install dependencies and run the script:

```bash
npm install playwright
npx playwright install chromium
node .claude/skills/scrape-webpage/scripts/analyze-webpage.js "https://example.com/page" --output ./analysis
```

If --output is omitted, results are written to ./page-analysis by default.

The script produces both structured data (JSON) and file artifacts:
JSON Output (stdout + saved to metadata.json):
```json
{
  "url": "https://example.com/page",
  "timestamp": "2025-01-12T10:30:00.000Z",
  "paths": {
    "documentPath": "/us/en/about",
    "htmlFilePath": "us/en/about.html",
    "mdFilePath": "us/en/about.md",
    "dirPath": "us/en",
    "filename": "about"
  },
  "screenshots": {
    "original": "./analysis/original.png",
    "enhancedContrast": "./analysis/enhanced-contrast.png"
  },
  "html": {
    "filePath": "./analysis/cleaned.html",
    "size": 45230
  },
  "metadata": {
    "title": "Page Title",
    "description": "Page description",
    "og:image": "https://example.com/image.jpg",
    "canonical": "https://example.com/page",
    ...
  }
}
```
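As a sketch of how another Node script might consume this output - assuming the JSON document is the only thing the script writes to stdout, which is an assumption:

```js
// Hypothetical consumer (ES module): run analyze-webpage.js and parse its
// JSON stdout. Assumes stdout carries only the JSON document.
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

const { stdout } = await run('node', [
  '.claude/skills/scrape-webpage/scripts/analyze-webpage.js',
  'https://example.com/page',
  '--output', './analysis',
]);

const analysis = JSON.parse(stdout);
console.log(analysis.paths.documentPath); // e.g. "/us/en/about"
```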
File Artifacts:

- metadata.json - Complete analysis results including paths (saved automatically)
- original.png - Screenshot of the page as rendered
- enhanced-contrast.png - Screenshot with exaggerated section background colors
- cleaned.html - Extracted main content HTML with preserved src/href attributes

The analysis runs these steps in order (a sketch of the capture and scroll steps follows this list):

1. Network interception: image responses are captured and saved to the ./images/ directory.
2. Navigate to URL: the page is loaded in Chromium.
3. Scroll to trigger lazy loading: the page is scrolled so lazy-loaded images render.
4. Original screenshot: the page is captured as rendered (original.png).
5. Enhanced contrast screenshot: the page is captured again with exaggerated section background colors (enhanced-contrast.png).
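A minimal sketch of steps 1-4, assuming Playwright's response events and incremental scrolling; the timings and thresholds here are assumptions, not the script's actual values:

```js
// Illustrative sketch only: mirrors the workflow steps above.
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();

// 1. Network interception: watch image responses as they arrive.
page.on('response', async (response) => {
  if (response.request().resourceType() === 'image' && response.ok()) {
    await response.body(); // bytes to hash and save under ./images/
  }
});

// 2. Navigate to URL.
await page.goto('https://example.com/page', { waitUntil: 'networkidle' });

// 3. Scroll in steps so lazy-loaded images render, then return to top.
const height = await page.evaluate(() => document.body.scrollHeight);
for (let y = 0; y < height; y += 500) {
  await page.evaluate((top) => window.scrollTo(0, top), y);
  await page.waitForTimeout(200); // brief pause for images to load
}
await page.evaluate(() => window.scrollTo(0, 0));

// 4. Original screenshot.
await page.screenshot({ path: './analysis/original.png', fullPage: true });
```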
How contrast enhancement works: section background colors are exaggerated before the second screenshot so that section boundaries stand out, which is what makes enhanced-contrast.png useful for identifying section boundaries during mapping.
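A minimal sketch of such an enhanceContrast helper (the name matches the script structure shown later), assuming sections are direct children of <main> and that alternating loud backgrounds are enough - the real algorithm may differ:

```js
// Illustrative sketch of contrast enhancement: alternate high-contrast
// background colors across top-level sections so boundaries are obvious
// in a screenshot. The selector and palette are assumptions.
async function enhanceContrast(page) {
  await page.evaluate(() => {
    const palette = ['#ffe0e0', '#e0f0ff', '#e0ffe0', '#fff3d0'];
    const sections = document.querySelectorAll('main > *');
    sections.forEach((el, i) => {
      el.style.backgroundColor = palette[i % palette.length];
    });
  });
}
```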
Before extracting HTML, fix common image issues to ensure nothing is missed (a sketch of the background-image fix follows this list):
- Background images → Real images: converts CSS background-image: url(...) styles into real <img> elements and prepends them to the parent element.
- Picture elements: ensures each <picture> has an <img> with a src, taken from the <source srcset> for the largest viewport.
- Srcset fallback: images with only a srcset (no src) get the first srcset URL copied to src.
- Relative URL conversion: handles the ./path, /path, and ../path formats.
- Inline SVG conversion: converts inline <svg> elements to <img> elements with data URLs.
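A sketch of the first fix, assuming it runs inside page.evaluate and handles only the simple background-image: url(...) case; the helper name and regex are illustrative, not the script's actual code:

```js
// Illustrative sketch: promote CSS background images to real <img> elements
// so HTML extraction cannot miss them. Runs in the page context.
async function fixBackgroundImages(page) {
  await page.evaluate(() => {
    for (const el of document.querySelectorAll('*')) {
      const bg = getComputedStyle(el).backgroundImage;
      const match = bg && bg.match(/url\(["']?(.+?)["']?\)/);
      if (match) {
        const img = document.createElement('img');
        img.src = match[1];
        el.prepend(img); // the new <img> is prepended to its parent element
      }
    }
  });
}
```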
Extracts cleaned HTML, preserving these attributes: src, href, alt, title, class, id.

Why not full HTML: the cleaned extract keeps only the main content; scripts, styles, and surrounding page chrome add noise without helping content mapping.
Extracts SEO and social metadata:
- title - from the <title> tag
- description - from the meta description tag
- og:* - Open Graph properties (for example og:image)
- canonical - the canonical link URL
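A sketch of what the extractMetadata helper could look like (the name matches the script structure shown later), producing keys like those in the JSON output above; the exact tag handling is an assumption:

```js
// Illustrative sketch: collect <title>, meta tags, and the canonical link
// into a flat object keyed like the JSON output above.
async function extractMetadata(page) {
  return page.evaluate(() => {
    const meta = { title: document.title };
    for (const tag of document.querySelectorAll('meta[name], meta[property]')) {
      const key = tag.getAttribute('name') || tag.getAttribute('property');
      const value = tag.getAttribute('content');
      if (key && value) meta[key] = value; // e.g. "description", "og:image"
    }
    const canonical = document.querySelector('link[rel="canonical"]');
    if (canonical) meta.canonical = canonical.href;
    return meta;
  });
}
```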
Generates document paths from the source URL.

Algorithm:

- Lowercase the path
- Strip the .html extension
- Map a trailing slash to /index
- Transliterate accented characters to plain ASCII

Example transformations (a sketch of this normalization follows the examples):
- https://example.com/us/en/About.html → /us/en/about
- https://example.com/Products/ → /products/index
- https://example.com/café/été → /cafe/ete
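A sketch of that normalization in plain Node, assuming Unicode NFD decomposition is how accents are stripped (the real algorithm may differ):

```js
// Illustrative sketch: normalize a source URL into a document path.
// Matches the example transformations above; details are assumptions.
function toDocumentPath(url) {
  let path = decodeURIComponent(new URL(url).pathname).toLowerCase();
  path = path.replace(/\.html$/, '');       // strip .html extension
  if (path.endsWith('/')) path += 'index';  // trailing slash -> /index
  path = path.normalize('NFD')              // café -> cafe, été -> ete
             .replace(/[\u0300-\u036f]/g, '');
  return path;
}

console.log(toDocumentPath('https://example.com/us/en/About.html')); // /us/en/about
console.log(toDocumentPath('https://example.com/Products/'));        // /products/index
console.log(toDocumentPath('https://example.com/café/été'));         // /cafe/ete
```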
Generates:

- documentPath - For the preview URL (/us/en/about)
- htmlFilePath - Where to save the HTML file (us/en/about.html)
- mdFilePath - Alternative markdown path (us/en/about.md)
- dirPath - Parent directory (us/en)
- filename - Just the filename (about)

All analysis results are automatically saved to metadata.json in the output directory, providing a complete record of the source URL, timestamp, generated paths, screenshot locations, HTML file details, and extracted metadata.
Keep original (no conversion): formats such as JPEG and PNG are saved as-is.

Convert to PNG: other formats, such as WebP, are converted to PNG (as the hero-banner.webp example below shows).
Images are saved using the MD5 hash of their original URL as the filename:
Example:
- https://example.com/images/hero-banner.webp → ./images/a1b2c3d4e5f6.png
- https://cdn.example.com/logo.jpg → ./images/f6e5d4c3b2a1.jpg

After image capture, all src attributes in the HTML are automatically replaced:
Before:

```html
<img src="https://example.com/hero.jpg" alt="Hero">
```

After:

```html
<img src="./images/a1b2c3d4e5f6.jpg" alt="Hero">
```

This means the migrated HTML file already references local images - no manual URL replacement needed!
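A minimal sketch of the naming and rewriting steps, assuming Node's built-in crypto for MD5. Note that real MD5 hashes are 32 hex characters; the example hashes above are abbreviated, and the string-replace here is simplified:

```js
// Illustrative sketch: derive the local filename from the MD5 of the
// original URL, then rewrite the matching src attribute in the HTML.
import { createHash } from 'node:crypto';

function localImagePath(originalUrl, ext) {
  const hash = createHash('md5').update(originalUrl).digest('hex');
  return `./images/${hash}${ext}`;
}

function rewriteSrc(html, originalUrl, ext) {
  // Simplified string replace; the real script may rewrite via the DOM.
  return html.split(originalUrl).join(localImagePath(originalUrl, ext));
}

const html = '<img src="https://example.com/hero.jpg" alt="Hero">';
console.log(rewriteSrc(html, 'https://example.com/hero.jpg', '.jpg'));
// -> <img src="./images/<md5-hash>.jpg" alt="Hero">
```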
The analyze-webpage.js script is invoked automatically as Step 1 of page-import. You don't need to run it manually.
Run the script manually when you need to debug or inspect the analysis output outside the page-import workflow.
```
Error: browserType.launch: Executable doesn't exist
```

Solution:

```bash
npx playwright install chromium
```

Some pages use advanced lazy loading that doesn't trigger on scroll. The script includes reasonable delays, but complex cases may need manual intervention.
The contrast enhancement algorithm works on most pages but may produce unusual results on some layouts.
Use original.png for visual reference in these cases.
Some pages hide content behind dismissable overlays such as popups.
The script attempts basic popup dismissal, but complex cases may need customization.
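As an illustration only, basic dismissal often amounts to clicking common close controls; the selector list here is an assumption, not the script's actual logic:

```js
// Illustrative sketch: try a few common close-button selectors and click
// whichever is visible. The selectors are assumptions.
async function dismissPopups(page) {
  const selectors = [
    '[aria-label="Close"]',
    'button.close',
    '.modal [class*="close"]',
  ];
  for (const selector of selectors) {
    const el = page.locator(selector).first();
    if (await el.isVisible().catch(() => false)) {
      await el.click().catch(() => {});
    }
  }
}
```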
```
page-import Step 1: Scrape Webpage
└─ Runs: node analyze-webpage.js [URL] --output ./work
   ├─ Returns: JSON with all extracted data
   ├─ Saves: Screenshots for visual analysis
   ├─ Saves: Cleaned HTML for content mapping
   └─ Saves: Metadata for SEO preservation

page-import Steps 2-5: Use the extracted data
├─ Enhanced screenshot → Identify section boundaries
├─ Original screenshot → Visual reference
├─ Cleaned HTML → Map to blocks
└─ Metadata → Generate metadata block
```

Before proceeding with migration:
- src attributes are absolute URLs (or correct relative paths)
- href attributes are preserved

The analyze-webpage.js script is organized as:
```js
// Main orchestration
async function analyzeWebpage(url, outputDir) { }

// Helper functions (inline, no external dependencies)
async function scrollToTriggerLazyLoad(page) { }
async function enhanceContrast(page) { }
async function extractCleanedHTML(page) { }
async function extractMetadata(page) { }

// CLI entry point
if (import.meta.url === `file://${process.argv[1]}`) {
  main();
}
```

All browser manipulation logic is self-contained - no external script files needed.
After running web page analysis, continue with the remaining page-import steps, which consume the screenshots, cleaned HTML, and metadata described above.
Repository layout:

```
evals/
├── scenario-1/
├── scenario-2/
├── scenario-3/
├── scenario-4/
├── scenario-5/
└── scenario-6/
skills/
├── analyze-and-plan/
├── block-collection-and-party/
├── block-inventory/
├── building-blocks/
├── code-review/
├── content-driven-development/
├── content-modeling/
├── docs-search/
├── find-test-content/
├── generate-import-html/
├── identify-page-structure/
├── page-decomposition/
├── page-import/
├── preview-import/
├── scrape-webpage/
└── testing-blocks/
```