agent-ops-testing

Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks.


Testing Workflow

Works with or without the aoc CLI installed; when the CLI is absent, issue tracking falls back to direct file editing.

Purpose

Provide structured guidance for test design, execution, and analysis that goes beyond baseline capture. This skill covers test strategy during planning, incremental testing during implementation, and coverage analysis.

Test Commands (from constitution)

# Python (uv/pytest)
uv run pytest                           # Run all tests
uv run pytest tests/ -v                 # Verbose output
uv run pytest tests/ -m "not slow"      # Skip slow tests
uv run pytest tests/ --tb=short -q      # Quick summary
uv run pytest --cov=src --cov-report=html  # Coverage report

# TypeScript/Node (vitest/jest)
npm run test                            # Run all tests
npm run test -- --coverage              # With coverage

# .NET (dotnet test)
dotnet test                             # Run all tests
dotnet test --collect:"XPlat Code Coverage"  # With coverage

Issue Tracking (File-Based — Default)

| Operation | How to Do It |
|---|---|
| Create test issue | Append to .agent/issues/medium.md with type TEST |
| Create bug from failure | Append to .agent/issues/high.md with type BUG |
| Log test results | Edit issue's ### Log section in priority file |

Example: Post-Test Issue Creation (File-Based)

  1. Increment .agent/issues/.counter
  2. Append issue to appropriate priority file
  3. Add log entry with test run results
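
A minimal Python sketch of that flow, assuming the counter file holds a bare integer and a simple markdown entry layout (match whatever format your existing issue files actually use):

from pathlib import Path

def create_test_issue(title: str, log_entry: str, priority: str = "medium") -> str:
    # 1. Increment the shared counter (assumption: it stores a bare integer)
    counter = Path(".agent/issues/.counter")
    issue_num = int(counter.read_text().strip()) + 1
    counter.write_text(str(issue_num))

    # 2. Append the issue to the chosen priority file (hypothetical entry layout)
    issue_id = f"TEST-{issue_num:04d}"
    entry = f"\n## {issue_id}: {title}\n\nType: TEST\n\n### Log\n- {log_entry}\n"
    with Path(f".agent/issues/{priority}.md").open("a") as f:
        f.write(entry)

    # 3. The log line above records the test run results
    return issue_id

create_test_issue("Add tests for payment retries", "Tests: 45 pass, 2 fail")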

CLI Integration (when aoc is available)

When aoc CLI is detected in .agent/tools.json, these commands provide convenience shortcuts:

| Operation | Command |
|---|---|
| Create test issue | aoc issues create --type TEST --title "Add tests for..." |
| Create bug from failure | aoc issues create --type BUG --priority high --title "Test failure: ..." |
| Log test results | aoc issues update <ID> --log "Tests: 45 pass, 2 fail" |

Test Isolation (MANDATORY)

Tests must NEVER create, modify, or delete files in the project folder.

Unit Tests

  • Use mocks/patches for ALL file system operations
  • Use in-memory data structures where possible
  • NEVER call real file I/O against project paths
  • Use unittest.mock.patch for Path, open(), file operations
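
A minimal sketch of that last point, assuming a hypothetical load_config function that reads a JSON file with open():

import json
from unittest.mock import mock_open, patch

def load_config(path):  # hypothetical function under test
    with open(path) as f:
        return json.load(f)

def test_load_config_without_touching_disk():
    # Patch builtins.open so no real file I/O hits the project tree
    fake = mock_open(read_data='{"retries": 3}')
    with patch("builtins.open", fake):
        config = load_config("config.json")
    assert config["retries"] == 3
    fake.assert_called_once_with("config.json")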

Integration Tests

  • ALWAYS use pytest tmp_path fixture (auto-cleaned)
  • Use Docker containers for service dependencies (API, DB, etc.)
  • Fixtures MUST handle cleanup on both success AND failure
  • Test data lives ONLY in temp directories
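
For example, a yield fixture that keeps all data under tmp_path and still releases resources when an assertion fails (sqlite stands in for a real service dependency here):

import sqlite3
import pytest

@pytest.fixture
def scratch_db(tmp_path):
    # Test data lives only in the temp directory; pytest deletes it afterwards
    conn = sqlite3.connect(tmp_path / "test.db")
    try:
        yield conn
    finally:
        conn.close()  # runs on success AND failure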

Forbidden Patterns

# ❌ NEVER do this - pollutes project
Path(".agent/test.md").write_text("test")
Path("src/data/fixture.json").write_text("{}")
open("tests/output.log", "w").write("log")

# ✅ Always use tmp_path
def test_example(tmp_path):
    test_file = tmp_path / "test.md"
    test_file.write_text("test")  # Auto-cleaned

Review Checklist (before approving tests)

  • No hardcoded paths to project directories
  • All file operations use tmp_path or mocks
  • Integration tests use fixtures with cleanup
  • Docker fixtures auto-remove containers
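
For the last point, a sketch using docker-py (the image name and port are assumptions; any service container follows the same stop-and-remove pattern):

import docker
import pytest

@pytest.fixture(scope="session")
def redis_url():
    client = docker.from_env()
    container = client.containers.run(
        "redis:7-alpine", detach=True, ports={"6379/tcp": None}
    )
    try:
        container.reload()  # refresh attrs to learn the assigned host port
        port = container.attrs["NetworkSettings"]["Ports"]["6379/tcp"][0]["HostPort"]
        yield f"redis://localhost:{port}"
    finally:
        container.stop()
        container.remove(force=True)  # never leave containers behind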

When to Use

  • During planning: designing test strategy for new features
  • During implementation: running incremental tests
  • During review: analyzing coverage and gaps
  • On demand: investigating test failures, improving test suite

Preconditions

  • .agent/constitution.md exists with a confirmed test command
  • .agent/baseline.md exists (for comparison)

Test Strategy Design

For New Features

  1. Identify test levels needed:

    • Unit tests: isolated function/method behavior
    • Integration tests: component interaction
    • E2E tests: user-facing workflows (if applicable)
  2. Define test cases from requirements (see the pytest sketch after this list):

    • Happy path: expected inputs → expected outputs
    • Edge cases: boundary values, empty inputs, max values
    • Error cases: invalid inputs, failure scenarios
    • Regression cases: ensure existing behavior unchanged
  3. Document in task/plan:

    ## Test Strategy
    - Unit: [list of unit test cases]
    - Integration: [list of integration scenarios]
    - Edge cases: [specific edge cases to cover]
    - Not testing: [explicitly excluded with rationale]
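
A pytest sketch of how those case categories translate into tests, built around a hypothetical parse_quantity helper:

import pytest

def parse_quantity(raw: str) -> int:  # hypothetical function under test
    value = int(raw)
    if value < 0:
        raise ValueError("quantity must be non-negative")
    return value

@pytest.mark.parametrize("raw,expected", [
    ("5", 5),                    # happy path
    ("0", 0),                    # edge case: boundary value
    ("2147483647", 2147483647),  # edge case: large value
])
def test_parse_quantity_accepts_valid_input(raw, expected):
    assert parse_quantity(raw) == expected

def test_parse_quantity_rejects_negative():  # error case
    with pytest.raises(ValueError):
        parse_quantity("-1")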

For Bug Fixes

  1. Write failing test FIRST (reproduces the bug)
  2. Fix the bug
  3. Verify test passes
  4. Check for related regression tests needed
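
A sketch of step 1, assuming the bug is an off-by-one in a hypothetical paginate helper that drops the final item:

# Written BEFORE the fix: this fails against the buggy implementation
# and passes once the off-by-one in paginate() is corrected.
def test_paginate_includes_last_item():
    items = list(range(10))
    pages = paginate(items, page_size=3)  # hypothetical helper under repair
    assert pages[-1] == [9]  # the bug silently dropped the final page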

Test Execution

Incremental Testing (during implementation)

After each implementation step:

  1. Run the smallest reliable test subset covering changed code
  2. If tests fail: stop, diagnose, fix before proceeding
  3. Log test results in focus.md

Full Test Suite (end of implementation)

  1. Run complete test command from constitution
  2. Compare results to baseline
  3. Investigate ANY new failures (even in unrelated areas)

Test Command Patterns

# Run specific test file
<test-runner> path/to/test_file.py

# Run tests matching pattern
<test-runner> -k "test_feature_name"

# Run with coverage
<test-runner> --coverage

# Run failed tests only (re-run)
<test-runner> --failed

Actual commands must come from the constitution.

Coverage Analysis

Confidence-Based Coverage Thresholds (MANDATORY)

Coverage requirements scale with confidence level:

| Confidence | Line Coverage | Branch Coverage | Enforcement |
|---|---|---|---|
| LOW | ≥90% on changed code | ≥85% on changed code | HARD — blocks completion |
| NORMAL | ≥80% on changed code | ≥70% on changed code | SOFT — warning if missed |
| HIGH | Tests pass | N/A | MINIMAL — existing tests only |

Rationale:

  • LOW confidence = more unknowns = more code paths to verify
  • HIGH confidence = well-understood = existing tests sufficient

Enforcement:

🎯 COVERAGE CHECK — {CONFIDENCE} Confidence

Required: ≥{line_threshold}% line, ≥{branch_threshold}% branch
Actual:   {actual_line}% line, {actual_branch}% branch

[PASS] Coverage meets threshold
— OR —
[FAIL] Coverage below threshold — must add tests before completion

For LOW confidence failures:

  • Coverage failure is a HARD BLOCK
  • Cannot proceed until threshold is met
  • Document why if threshold is truly unachievable (rare)
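
pytest-cov can enforce a threshold mechanically; for example, for LOW confidence (the source path and number here are assumptions, and note that --cov-fail-under checks a single combined figure, so separate line and branch gates need a coverage.py config or report parsing):

# LOW confidence: fail the run if combined coverage on src/ drops below 90%
uv run pytest --cov=src --cov-branch --cov-fail-under=90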

When to Analyze Coverage

  • After completing a feature (before critical review)
  • When investigating untested code paths
  • During improvement discovery

Coverage Metrics to Track

| Metric | Target | Notes |
|---|---|---|
| Line coverage | ≥80% for new code | Not a hard rule; quality over quantity |
| Branch coverage | Critical paths covered | Focus on decision points |
| Uncovered lines | Document rationale | Some code legitimately untestable |

Coverage Gaps to Flag

  • New code with 0% coverage → must address
  • Error handling paths untested → should address
  • Complex logic untested → investigate
  • Generated/boilerplate untested → acceptable

Test Quality Checklist

Good Tests

  • Test behavior, not implementation
  • Independent (no test order dependencies)
  • Deterministic (same result every run)
  • Fast (< 1 second per unit test)
  • Readable (test name describes scenario)
  • Minimal mocking (only external dependencies)
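
A small test exhibiting those properties, built around a hypothetical apply_discount function:

def test_discount_applies_above_threshold():
    # Behavior-focused, deterministic, independent: fixed input, no I/O, no mocks
    order = {"total": 150.00}               # hypothetical order shape
    assert apply_discount(order) == 135.00  # 10% off above the 100.00 threshold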

Anti-Patterns to Avoid

  • ❌ Testing implementation details (breaks on refactor)
  • ❌ Excessive mocking (tests mock, not real code)
  • ❌ Flaky tests (intermittent failures)
  • ❌ Slow tests without justification
  • ❌ Tests that require manual setup
  • ❌ Commented-out tests

Failure Investigation

When tests fail unexpectedly, invoke agent-ops-debugging:

  1. Apply systematic debugging process:

    • Isolate: Run the failing test alone (example commands follow this list)
    • Reproduce: Confirm failure is consistent
    • Form hypothesis: What might cause this?
    • Test hypothesis: Add logging, inspect state
  2. Categorize the failure:

    | Category | Evidence | Action |
    |---|---|---|
    | Agent's change | Test passed in baseline | Fix the change |
    | Pre-existing | Test failed in baseline | Document, create issue |
    | Flaky | Intermittent, no code change | Fix test or document |
    | Environment | Works elsewhere | Check constitution assumptions |
  3. Handoff decision:

    🔍 Test failure analysis:
    
    - Test: {test_name}
    - Category: {agent_change | pre_existing | flaky | environment}
    - Root cause: {diagnosis}
    
    Next steps:
    1. Fix and re-run (if agent's change)
    2. Create issue and continue (if pre-existing)
    3. Deep dive with /agent-debug (if unclear)
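
For the isolate and reproduce steps, pytest invocations like these are typical (the test path is a placeholder, and --count needs the pytest-repeat plugin):

# Isolate: run the failing test alone, verbosely, stopping at first failure
uv run pytest tests/test_payments.py::test_process_async -x -v

# Reproduce: re-run several times to check for flakiness
uv run pytest tests/test_payments.py::test_process_async --count=5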

Output

After test activities, update:

  • .agent/focus.md: test results summary
  • .agent/baseline.md: if establishing new baseline

Issue Discovery After Testing

After test analysis, invoke agent-ops-tasks discovery procedure:

  1. Collect test-related findings:

    • Failing tests → BUG (high)
    • Missing test coverage → TEST (medium)
    • Flaky tests identified → CHORE (medium)
    • Test anti-patterns found → REFAC (low)
    • Missing edge case tests → TEST (medium)
  2. Present to user:

    📋 Test analysis found {N} items:
    
    High:
    - [BUG] Flaky test: PaymentService.processAsync (failed 2/10 runs)
    
    Medium:
    - [TEST] Missing coverage for error handling in UserController
    - [TEST] No edge case tests for empty input scenarios
    
    Low:
    - [REFAC] Tests have excessive mocking in OrderService.test.ts
    
    Create issues for these? [A]ll / [S]elect / [N]one
  3. After creating issues:

    Created {N} test-related issues. What's next?
    
    1. Start fixing highest priority (BUG-0024@abc123 - flaky test)
    2. Continue with current work
    3. Review test coverage report