skillshare-cli-e2e-test

Run isolated E2E tests in devcontainer from ai_docs/tests runbooks. Use this skill whenever the user asks to: run an E2E test, execute a test runbook, validate a feature end-to-end, create a new runbook, or test CLI behavior in isolation. If you need to run a multi-step CLI validation sequence (init → install → sync → verify), this is the skill — it handles ssenv isolation, flag verification, and structured reporting. Prefer this over ad-hoc docker exec sequences for any test that follows a runbook or needs reproducible isolation.


Run isolated E2E tests in devcontainer. $ARGUMENTS specifies runbook name or "new".

Flow

Phase 0: Environment Check

Confirm devcontainer is running and get container ID:
```
CONTAINER=$(docker compose -f .devcontainer/docker-compose.yml ps -q skillshare-devcontainer)
```
- If empty → prompt user: docker compose -f .devcontainer/docker-compose.yml up -d
- Ensure CONTAINER is set for all subsequent docker exec calls.
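The empty-ID check above can be wrapped in a small guard so that later docker exec calls never run with an unset container. This is a minimal sketch; `require_container` is a hypothetical helper name, not part of the skill or of ssenv.

```shell
# Hypothetical helper (not part of the skill): echo the container ID,
# or print the start-up hint from this phase and fail when the ID is empty.
require_container() {
  local id="$1"
  if [ -z "$id" ]; then
    echo "devcontainer is not running; start it with:" >&2
    echo "  docker compose -f .devcontainer/docker-compose.yml up -d" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Usage (commented out because it needs Docker):
# CONTAINER=$(require_container "$(docker compose -f .devcontainer/docker-compose.yml ps -q skillshare-devcontainer)") || exit 1
```

Failing early here keeps a missing container from surfacing later as a confusing docker exec usage error.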

Confirm Linux binary is available:

```
docker exec $CONTAINER bash -c \
  '/workspace/.devcontainer/ensure-skillshare-linux-binary.sh && ss version'
```

Confirm mdproof is installed:
```
docker exec $CONTAINER /workspace/.devcontainer/ensure-mdproof.sh
```
This auto-installs from GitHub release, or falls back to /workspace/bin/mdproof (local dev binary).
Check for lessons learned from previous runs:
```
test -f /workspace/.mdproof/lessons-learned.md && cat /workspace/.mdproof/lessons-learned.md
```
If the file exists, read it before writing or debugging runbooks — it contains known gotchas and assertion patterns.

Phase 1: Detect Scope

Preview all available runbooks via the container:
```
docker exec $CONTAINER mdproof --dry-run --report json /workspace/ai_docs/tests/
```
This returns JSON with every runbook's steps, commands, and expected assertions — no manual markdown parsing needed. Use this to understand what each runbook covers.
Identify recent changes (unstaged + recent commits):
```
git diff --name-only HEAD~3
```
Match changes to relevant runbooks (compare changed file paths against step commands in the JSON output).
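One possible way to do the matching, sketched under the assumption (taken from the flag-validation guidance in this document) that each CLI command lives in cmd/skillshare/<command>.go: reduce the changed paths to command names, then search the runbooks for `ss <command>`. `changed_commands` is a hypothetical helper name.

```shell
# Hypothetical helper: read changed file paths on stdin and print the
# CLI command names they imply, assuming cmd/skillshare/<command>.go.
changed_commands() {
  sed -n \
    -e '/_test\.go$/d' \
    -e 's|^cmd/skillshare/\([^/]*\)\.go$|\1|p' |
    sort -u
}

# Usage inside the repository (commented out because it needs git):
# for cmd in $(git diff --name-only HEAD~3 | changed_commands); do
#   grep -l "ss $cmd" ai_docs/tests/*.md
# done
```

Paths outside cmd/skillshare/ (for example under internal/) are ignored by this sketch and still need to be mapped to runbooks by hand.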

Phase 2: Select Tests

Prompt user (via AskUserQuestion):

Option A: Run existing runbook (list all available + mark those related to recent changes)
Option B: Auto-generate new test script based on recent changes
Option C: If $ARGUMENTS specifies a runbook, skip to Phase 3

Phase 3: Prepare & Execute

Running existing runbook:

Create isolated environment with auto-initialization:

```
ENV_NAME="e2e-$(date +%Y%m%d-%H%M%S)"

# Use --init to automatically run 'ss init -g' with all targets
docker exec $CONTAINER ssenv create "$ENV_NAME" --init
```

Execute the entire runbook via mdproof inside the container:

```
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- \
  mdproof --report json \
  /workspace/ai_docs/tests/<runbook_file>.md
```

mdproof executes each step (bash -c <command>) in the ssenv-isolated HOME, then returns structured JSON:

```
{
  "version": "1",
  "runbook": "<runbook_file>.md",
  "duration_ms": 12345,
  "summary": { "total": 7, "passed": 5, "failed": 1, "skipped": 1 },
  "steps": [
    {
      "step": { "number": 1, "title": "...", "command": "...", "expected": ["..."] },
      "status": "passed",    // "passed" | "failed" | "skipped"
      "exit_code": 0,
      "stdout": "...",
      "stderr": "..."
    }
  ]
}
```
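The create and execute commands above can be combined into one function for repeated runs. This is a minimal sketch that uses only the commands already shown; `run_runbook` is a hypothetical name, and the caller is assumed to supply a fresh environment name on every call.

```shell
# Hypothetical wrapper: create a fresh ssenv environment, then run one
# runbook through mdproof and print its JSON report on stdout.
# Arguments: <container-id> <env-name> <runbook-file>
run_runbook() {
  local container="$1" env_name="$2" runbook="$3"
  # Send "ssenv create" output to stderr so stdout carries only the report.
  docker exec "$container" ssenv create "$env_name" --init >&2 || return 1
  docker exec "$container" env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
    ssenv enter "$env_name" -- \
    mdproof --report json "/workspace/ai_docs/tests/$runbook"
}

# Usage (commented out because it needs the devcontainer):
# ENV_NAME="e2e-$(date +%Y%m%d-%H%M%S)"
# run_runbook "$CONTAINER" "$ENV_NAME" "<runbook_file>.md" > report.json
```

Keeping the report on stdout lets the caller redirect it to a file or pipe it straight into jq.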

Analyze the JSON output:

All passed → proceed to Phase 4

Any failed → filter for failures only (full JSON can be too large for terminal output):

```
mdproof --report json runbook.md 2>&1 | jq '{
  summary: .summary,
  failed: [.steps[] | select(.status == "failed") | {
    step: .step.number, title: .step.title,
    exit_code: .exit_code,
    failed_assertions: [.assertions[]? | select(.matched == false) | .pattern],
    stderr: (.stderr // "" | .[0:200])
  }]
}'
```

Skipped steps (executor=manual) → these need manual verification; run them individually:

```
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- <command from step.command>
```

For failed steps, debug each one individually with a manual docker exec:
```
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '<failed step command>'
```
- Prefer --json + jq for assertions; see the `--json` Quick Reference below

Generating new runbook:

Read git diff HEAD~3 to find changed files in cmd/skillshare/ or internal/
Read changed files to understand new/modified functionality
Validate all CLI flags before writing — for every ss <command> <flag> in the runbook:
- Grep cmd/skillshare/<command>.go for the exact flag string (e.g. "--force")
- Run ss <command> --help inside container if needed
- Common mistakes to avoid:
  - uninstall --yes → wrong, use --force / -f
  - init --target <name> → wrong, init has no --target flag
  - init -p has a completely separate flag set from global init — only supports --targets, --discover, --select, --mode, --dry-run. Global-only flags like --no-copy, --no-skill, --no-git, --all-targets, --force do NOT exist in project mode
  - Audit custom rules: disable by rule ID (e.g. prompt-injection-0, prompt-injection-1), NOT pattern name (e.g. prompt-injection). Rule IDs are in internal/audit/rules.yaml
Generate new runbook to ai_docs/tests/<slug>_runbook.md, following existing conventions:
- YAML-free, pure Markdown
- Has Scope, Environment, Steps (each with bash + Expected), Pass Criteria
- Use jq: assertions in Expected blocks for JSON commands — e.g. - jq: .extras | length == 1. This is a native mdproof assertion type, NOT a bash jq pipe
- Use --json + jq -e in bash for inline verification within multi-command steps
- Config idempotency — never bare cat >> config.yaml; always prepend sed -i '/^section:/,$d' to remove existing section first, or use CLI commands (ss extras init, ss extras remove --force) that handle duplicates
- Check ai_docs/tests/runbook.json for project-level config (build, setup, teardown, step_setup, timeout) that affects all runbooks
- Check .mdproof/lessons-learned.md for known assertion patterns and gotchas
Run the runbook quality checklist (see below) before executing
Then execute the new runbook (same flow as above)
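The config idempotency rule above (delete the existing section, then append) can be sketched as a reusable function. `replace_section` is a hypothetical helper, and the YAML content in the usage line is placeholder data. The sketch writes through a temporary file rather than using sed -i, which should behave the same way for this purpose.

```shell
# Hypothetical helper: replace a trailing top-level section of a YAML file
# with the content read from stdin, so repeated runs do not duplicate keys.
# Intended to match: sed -i '/^section:/,$d' file && cat >> file
replace_section() {
  local file="$1" section="$2" tmp
  tmp=$(mktemp) || return 1
  # Drop everything from the "section:" line to the end of the file.
  sed "/^${section}:/,\$d" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Append the new section body from stdin.
  cat >> "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (placeholder YAML content):
# printf 'extras:\n  - name: rules\n' | replace_section ~/.config/skillshare/config.yaml extras
```

Because the sed range runs to the end of the file, this pattern is only safe when the section being replaced is the last one; anything after it is deleted as well.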

Phase 4: Cleanup & Report

Ask user before cleanup (via AskUserQuestion):
- Option A: Delete ssenv environment now
- Option B: Keep for manual debugging (print env name for later ssenv delete)

If user chose Option A:

```
docker exec $CONTAINER ssenv delete "$ENV_NAME" --force
```

Output summary (derived from the runbook JSON output):

```
── E2E Test Report ──

Runbook:  {runbook name}
Env:      {ENV_NAME}
Duration: {duration_ms}ms

Step 1: {title}  PASS
Step 2: {title}  PASS
Step 3: {title}  FAIL ← exit_code={N}, stderr: {error detail}
...

Result: {passed}/{total} passed ({skipped} skipped)
```

All values come directly from mdproof's JSON output — summary.passed, summary.total, steps[].step.title, steps[].status.
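The report could be rendered from the JSON with a single jq program. This is a sketch that assumes jq is installed and that the report has the fields shown in the Phase 3 example; `render_report` is a hypothetical name, and the SKIP label for skipped steps is an assumption, since the template only shows PASS and FAIL.

```shell
# Hypothetical renderer: read an mdproof JSON report on stdin and print
# the summary lines. The environment name is passed as the first argument.
render_report() {
  jq -r --arg env "$1" '
    "Runbook:  \(.runbook)",
    "Env:      \($env)",
    "Duration: \(.duration_ms)ms",
    "",
    (.steps[] |
      "Step \(.step.number): \(.step.title)  " +
      (if .status == "passed" then "PASS"
       elif .status == "failed" then "FAIL ← exit_code=\(.exit_code)"
       else "SKIP" end)),   # SKIP label is an assumption
    "",
    "Result: \(.summary.passed)/\(.summary.total) passed (\(.summary.skipped) skipped)"
  '
}

# Usage: render_report "$ENV_NAME" < report.json
```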

If any FAIL → distinguish between runbook bug vs real bug:
- Runbook bug: wrong flag, wrong file path, stale assertion → fix runbook, re-run step
- Real bug: CLI misbehavior → analyze cause, provide fix suggestions
Retrospective — ask user (via AskUserQuestion):

Did you encounter any friction during this test run that the skill or runbook could handle better?
- Option A: Yes, improve e2e skill — review test friction (wrong flags, stale assertions, missing checklist items, unclear instructions), then update SKILL.md and/or runbooks
- Option B: Yes, but only fix the runbook — fix the specific runbook without changing the skill itself
- Option C: No, skip
Improvement targets:
- SKILL.md: add new checklist items, common-mistake examples, or rule clarifications learned from this run
- Runbooks: fix stale assertions (e.g. config.yaml → registry.yaml), wrong flags, outdated paths
- Both: when a systemic issue (e.g. a refactor changed file locations) affects both the skill's guidance and existing runbooks

Runbook Quality Checklist

Before executing a newly generated runbook, verify:

Runbook Assertion Types

mdproof supports 6 assertion types under Expected: blocks. Use the most specific type for each check:

| Type | Syntax | When to use | Example |
|------|--------|-------------|---------|
| Substring | plain text | Simple output check | `- hello world` |
| Negated | `Not`/`Should NOT` prefix | Verify absence | `- Not FAIL` |
| Exit code | `exit_code: N` | Every step should have this | `- exit_code: 0` |
| Regex | `regex:` prefix | Pattern matching | `- regex: v\d+\.\d+` |
| jq | `jq:` prefix | JSON output (preferred) | `- jq: .extras \| length == 1` |
| Snapshot | `snapshot:` prefix | Stable output comparison | `- snapshot: api-response` |

jq: best practices:

```
# Simple field check
- jq: .name == "rules"

# Array length
- jq: .extras | length == 3

# Sorted array comparison
- jq: [.extras[].name] | sort | . == ["a","b","c"]

# Null/missing field (omitempty)
- jq: .extras == null

# Nested access
- jq: .[0].targets[0].status == "synced"

# Boolean
- jq: .source_exists == true
```

Rules

Always execute inside devcontainer — use docker exec, never run CLI on host
Always use ssenv for HOME isolation — don't pollute container default HOME
Always create fresh ssenv environments — never reuse an environment from a previous run; stale config/state causes confusing cascade failures (e.g. duplicate YAML keys, "already exists" errors)
ssenv only isolates $HOME — /tmp/, /var/, and other system paths are shared across all environments. Runbook steps using /tmp/ must include rm -rf cleanup at the start
Verify every step — never skip Expected checks
Don't abort on failure — record FAIL, continue to next step, summarize at end
Ask before cleanup — Phase 4 must prompt user before deleting ssenv environment
ss = skillshare — same binary in runbooks
~ = ssenv-isolated HOME — ssenv enter auto-sets HOME
Use --init — simplify setup by using ssenv create <name> --init
--init already runs init — the env is pre-initialized; runbook steps calling ss init again will fail unless the step explicitly resets state first
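The /tmp/ cleanup rule above can be sketched as a step preamble. `fresh_tmp_dir` is a hypothetical helper, not something ssenv or mdproof provides, and the path in the usage line is a placeholder.

```shell
# Hypothetical step preamble: reset a scratch directory under /tmp before use.
# ssenv isolates only HOME, so leftovers from earlier runs would otherwise leak in.
fresh_tmp_dir() {
  local dir="$1"
  case "$dir" in
    /tmp/?*) ;;   # path must start with /tmp/
    *) echo "refusing to reset path outside /tmp: $dir" >&2; return 1 ;;
  esac
  rm -rf "$dir" && mkdir -p "$dir"
}

# Usage at the top of a runbook step (path is a placeholder):
# fresh_tmp_dir /tmp/test-project && cd /tmp/test-project
```

The prefix check is only a guard against obvious mistakes such as an empty variable; it does not resolve symlinks or `..` segments.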

ssenv Quick Reference

| Command | Purpose |
|---------|---------|
| `sshelp` | Show shortcuts and usage |
| `ssls` | List isolated environments |
| `ssnew <name>` | Create + enter isolated shell (interactive) |
| `ssuse <name>` | Enter existing isolated shell (interactive) |
| `ssback` | Leave isolated context |
| `ssenv enter <name> -- <cmd>` | Run single command in isolation (automation) |

For interactive debugging: ssnew <env> then exit when done
For deterministic automation: prefer ssenv enter <env> -- <command> one-liners

Test Command Policy

When running Go tests inside devcontainer (not via runbook):

```
# ssenv changes HOME, so always cd to /workspace first for Go test commands
cd /workspace
go build -o bin/skillshare ./cmd/skillshare
SKILLSHARE_TEST_BINARY="$PWD/bin/skillshare" go test ./tests/integration -count=1
go test ./...
```

Always run in devcontainer unless there is a documented exception. Note: ssenv enter changes HOME, which may affect Go module resolution — always cd /workspace before running go test or go build.

`--json` Quick Reference

Most commands support --json for structured output, making assertions more reliable than text matching.

| Command | Flag | Notes |
|---------|------|-------|
| `ss status` | `--json` | Skills, targets, sync status |
| `ss list` | `--json` / `-j` | All skills with metadata |
| `ss target list` | `--json` | Configured targets |
| `ss install <src>` | `--json` | Implies `--force --all` (skip prompts) |
| `ss uninstall <name>` | `--json` | Implies `--force` (skip prompts) |
| `ss collect <path>` | `--json` | Implies `--force` (skip prompts) |
| `ss check` | `--json` | Update availability per repo |
| `ss update` | `--json` | Update results per skill |
| `ss diff` | `--json` | Per-file diff details |
| `ss sync` | `--json` | Sync stats per target |
| `ss audit` | `--format json` | Also accepts `--json` (deprecated alias) |
| `ss log` | `--json` | Raw JSONL (one object per line) |

Key behaviors:

Where --json implies --force / --all, interactive prompts are skipped, so the command is safe for automation
Output goes to stdout only (progress/spinners suppressed)
audit prefers --format json; --json still works but is the deprecated form
log --json outputs JSONL (newline-delimited), not a JSON array
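Because log --json emits JSONL rather than an array, array-style filters such as `length` need the records slurped first. The sketch below assumes jq is installed; both function names are hypothetical, and no particular log field names are assumed.

```shell
# Count JSONL records on stdin by slurping them into one array.
jsonl_count() {
  jq -s 'length'
}

# Succeed only if every JSONL record on stdin satisfies the jq condition in $1.
jsonl_all() {
  jq -s -e "all(.[]; $1)" > /dev/null
}

# Usage inside an ssenv environment (commented out because it needs ss):
# ss log --json | jsonl_count
# ss log --json | jsonl_all 'type == "object"'
```

Without -s, jq runs the filter once per line, so `length` would report the size of each object instead of the number of entries.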

Assertion Patterns with `jq`

```
# Count installed skills
ss list --json | jq 'length'

# Check a specific skill exists
ss list --json | jq -e '.[] | select(.name == "my-skill")'

# Verify target is configured
ss target list --json | jq -e '.[] | select(.name == "claude")'

# Assert no critical audit findings
ss audit --format json | jq -e '.summary.critical == 0'

# Check update availability
ss check --json | jq -e '.tracked_repos | length > 0'

# Verify sync succeeded (zero errors)
ss sync --json | jq -e '.errors == 0'

# Install and verify result
ss install https://github.com/user/repo --json | jq -e '.skills | length > 0'
```

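When a jq -e expression fails, the step FAILs, so no ambiguous text matching is needed. jq -e exits with 1 when the last output is false or null and with 4 when no output is produced; exit code 5 indicates a runtime error in the jq program.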
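To make a failing assertion easier to read in step output, the jq -e call can be wrapped so that it labels the result and preserves jq's exit status. This is a sketch that assumes jq is installed; `jq_assert` is a hypothetical helper, not an mdproof or skillshare feature.

```shell
# Hypothetical wrapper: run a jq -e assertion against JSON on stdin,
# print a PASS/FAIL line, and return jq's own exit status.
jq_assert() {
  local expr="$1" status=0
  jq -e "$expr" > /dev/null || status=$?
  if [ "$status" -eq 0 ]; then
    echo "PASS: $expr"
  else
    echo "FAIL (jq exit $status): $expr"
  fi
  return "$status"
}

# Usage inside an ssenv environment (commented out because it needs ss):
# ss sync --json | jq_assert '.errors == 0'
```

Returning the original status keeps the step's exit code meaningful for an exit_code assertion.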

Container Command Templates

```
# Single command
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- ss status

# JSON assertion (preferred for verification)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss list --json | jq -e ".[] | select(.name == \"my-skill\")"
'

# Multi-line compound command (use bash -c): global mode flags
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss init --no-copy --all-targets --no-git --no-skill
  ss status
'

# Project mode init (different flag set!)
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '
  cd /tmp/test-project && ss init -p --targets claude
'

# Check files (HOME is set to isolated path by ssenv)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cat ~/.config/skillshare/config.yaml
'

# With environment variables
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  TARGET=~/.claude/skills
  ls -la "$TARGET"
'

# Go tests (must cd /workspace because ssenv changes HOME)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cd /workspace
  go test ./internal/install -run TestParseSource -count=1
'
```

Relationship with `/mdproof` Skill

This skill (/cli-e2e-test) and the /mdproof skill are complementary, not competing:

| Concern | `/cli-e2e-test` | `/mdproof` |
|---------|-----------------|------------|
| Scope | Skillshare project-specific E2E | General-purpose runbook authoring |
| Infrastructure | Devcontainer, ssenv, binary build | None (format and assertions only) |
| Config | `ai_docs/tests/runbook.json` (build, setup, teardown) | Assertion types, snapshot, coverage |
| Lessons | Checklist items, CLI flag gotchas | `.mdproof/lessons-learned.md` |
| When | Running or debugging a test | Writing or improving a runbook |

How they work together

Writing a new runbook → invoke /mdproof first for format guidance (assertion types, jq: patterns, snapshot usage), then /cli-e2e-test to execute it in isolation
Improving existing runbooks → invoke /mdproof for assertion quality review (python3 → jq:, idempotency), then /cli-e2e-test to verify changes pass
Debugging failures → /cli-e2e-test Phase 3 step 4 handles manual docker exec; /mdproof lessons-learned captures recurring patterns
After a test run → /mdproof Self-Learning section guides recording discoveries to .mdproof/lessons-learned.md

Rule of thumb

Need to run tests or debug in devcontainer? → /cli-e2e-test
Need to write assertions or improve runbook quality? → /mdproof
User says "run extras E2E" → /cli-e2e-test
User says "improve runbook assertions" → /mdproof then /cli-e2e-test to verify

Repository: runkids/skillshare
Commit: 053ecb4

Last updated: about 1 month ago
Created: about 1 month ago
