Optimization Patch Review

A rigorous 5-persona peer review system for optimization patches in lading. Requires unanimous approval backed by concrete benchmark data. Duplicate Hunter persona prevents redundant work.

Role: Judge

Review is the decision-maker. It does NOT record results.

Review judges the patch using benchmarks and the five-persona review, then returns a structured report.

Outcomes

| Outcome | Votes | Action |
| --- | --- | --- |
| APPROVED | 5/5 APPROVE | Return APPROVED report |
| REJECTED | Any REJECT | Return REJECTED report |

Arguments

This skill requires 5 positional arguments passed by the caller:

| Arg | Field | Example | Used for |
| --- | --- | --- | --- |
| $ARGUMENTS[0] | bench | trace_agent | cargo criterion --bench flag |
| $ARGUMENTS[1] | fingerprint | ci/fingerprints/trace_agent_v04/lading.yaml | payloadtool config path |
| $ARGUMENTS[2] | file | lading_payload/src/trace_agent/v04.rs | report + duplicate check |
| $ARGUMENTS[3] | target | V04::to_bytes | report |
| $ARGUMENTS[4] | technique | buffer-reuse | report + duplicate check |

If any argument is missing -> REJECT. All 5 are required.
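
As a minimal guard (a sketch, assuming the five values arrive as plain positional shell parameters; the exact calling convention is up to the caller):

```sh
# Hedged sketch: reject early when any of the five arguments is missing.
if [ "$#" -ne 5 ]; then
  echo "REJECTED: expected 5 arguments (bench fingerprint file target technique), got $#"
  exit 1
fi
```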

Generate Report ID

Derive the id from the file and technique arguments:

  1. Take the filename stem from $ARGUMENTS[2], together with its parent directory and with underscores mapped to hyphens (e.g. lading_payload/src/trace_agent/v04.rs -> trace-agent-v04)
  2. Append the technique $ARGUMENTS[4] (e.g. buffer-reuse)
  3. Join with "-" -> trace-agent-v04-buffer-reuse

Use this id in the report.
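
As a sketch, the derivation can be scripted; the parent-directory-plus-stem rule and the underscore-to-hyphen mapping are assumptions read off the worked example above:

```sh
# Hedged sketch of the id derivation, using the worked example values.
file="lading_payload/src/trace_agent/v04.rs"
technique="buffer-reuse"
stem="$(basename "$(dirname "$file")")-$(basename "$file" .rs)"
id="$(echo "$stem" | tr '_' '-')-$technique"
echo "$id"   # trace-agent-v04-buffer-reuse
```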


Phase 1: Benchmark Execution

Step 1: Read Baseline Data

Read the baseline benchmark files captured before the change:

  • /tmp/criterion-baseline.log — micro-benchmark baseline
  • /tmp/baseline.json — macro-benchmark timing baseline
  • /tmp/baseline-mem.txt — macro-benchmark memory baseline

If baseline data is missing -> REJECT. Baselines must be captured before any code change and before this skill is invoked.
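
A quick presence check, as a sketch:

```sh
# Hedged sketch: confirm all three baseline artifacts exist and are non-empty.
for f in /tmp/criterion-baseline.log /tmp/baseline.json /tmp/baseline-mem.txt; do
  [ -s "$f" ] || { echo "REJECTED: missing baseline $f"; exit 1; }
done
```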

Step 2: Run Post-Change Micro-benchmarks

```sh
cargo criterion --bench $ARGUMENTS[0] 2>&1 | tee /tmp/criterion-optimized.log
```

Note: Criterion automatically compares against the last run and reports percentage changes.

Compare results: look for "change:" lines showing improvement or regression.

Example output:

```
time:   [1.2345 ms 1.2456 ms 1.2567 ms]
change: [-5.1234% -4.5678% -4.0123%]
```
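
To pull those comparison lines out of the captured log for the report:

```sh
# Collect the time/change lines from the criterion log.
grep -E 'time:|change:' /tmp/criterion-optimized.log
```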

Step 3: Run Post-Change Macro-benchmarks

```sh
cargo build --release --bin payloadtool
hyperfine --warmup 3 --runs 30 --export-json /tmp/optimized.json \
  "./target/release/payloadtool $ARGUMENTS[1]"
./target/release/payloadtool "$ARGUMENTS[1]" --memory-stats 2>&1 | tee /tmp/optimized-mem.txt
```
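
One way to turn the two JSON exports into a percentage, assuming hyperfine's standard --export-json layout (results[0].mean):

```sh
# Hedged sketch: percent change in mean wall time, baseline vs optimized.
base=$(jq '.results[0].mean' /tmp/baseline.json)
opt=$(jq '.results[0].mean' /tmp/optimized.json)
awk -v b="$base" -v o="$opt" 'BEGIN { printf "macro time change: %+.2f%%\n", (o - b) / b * 100 }'
```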

Statistical Requirements

  • Minimum 30 runs for hyperfine (--runs 30)
  • Criterion handles statistical significance internally
  • Time improvement >= 5% for significance
  • Memory improvement >= 10% for significance
  • Allocation reduction >= 20% for significance
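
Applied mechanically, the macro gates look like this; a sketch where the three deltas are placeholders for percent changes computed as above (negative means improvement):

```sh
# Hedged sketch: apply the significance gates to three percent deltas.
time_d=-5.2; mem_d=-12.0; alloc_d=-25.0   # placeholder measured values
awk -v t="$time_d" -v m="$mem_d" -v a="$alloc_d" 'BEGIN {
  if (t <= -5 || m <= -10 || a <= -20) print "meets a significance gate"
  else                                 print "below every gate: REJECT"
}'
```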

NO EXCEPTIONS

  • "Test dependencies don't work" -> REJECT. Fix dependencies first.
  • "Theoretically better" -> REJECT. Prove it with numbers.
  • "Obviously an improvement" -> REJECT. Obvious is not measured.
  • "Will benchmark later" -> REJECT. Benchmark now.

Phase 2: Five-Persona Review

1. Duplicate Hunter (Checks for Redundant Work)

  • Check .claude/skills/lading-optimize-hunt/assets/db.yaml for $ARGUMENTS[2] + $ARGUMENTS[4] combo
  • File + technique combo not already approved
  • No substantially similar optimization exists
  • If duplicate found -> REJECT with "DUPLICATE: see <existing entry>"
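
A rough probe of the hunt database, assuming entries mention the file path and technique as plain strings (the real db.yaml schema may differ, and matching strings may not share an entry):

```sh
# Hedged sketch: grep the hunt database for the file + technique combo.
db=.claude/skills/lading-optimize-hunt/assets/db.yaml
if grep -q 'lading_payload/src/trace_agent/v04.rs' "$db" \
   && grep -q 'buffer-reuse' "$db"; then
  echo 'DUPLICATE: see matching db.yaml entry'
fi
```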

2. Skeptic (Demands Proof)

  • Baseline data verified present
  • Post-change benchmarks executed with identical methodology
  • Hot path verified via profiling (not just guessed)
  • Micro threshold met (>=5% time)
  • Macro threshold met (>=5% time OR >=10% mem OR >=20% allocs)
  • Statistical significance confirmed (p<0.05 or criterion "faster")
  • Improvement is real, not measurement noise
  • Benchmark methodology sound (same config, same machine)

3. Conservative (Guards Correctness)

  • Run ci/validate and confirm it passes completely
  • No semantic changes to output
  • Determinism preserved (same seed -> same output)
  • No .unwrap() or .expect() added (lading MUST NOT panic)
  • No bugs introduced (if bug found -> REJECT with bug details)
  • Property tests exist for changed code
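
For the no-panic bullet above, newly added call sites can be scanned mechanically; a sketch assuming the patch is visible to git diff:

```sh
# Hedged sketch: flag .unwrap()/.expect() introduced by the patch.
git diff -U0 -- lading_payload/src/trace_agent/v04.rs \
  | grep -En '^\+.*\.(unwrap|expect)\(' \
  && echo 'REJECT: new panic point added' \
  || echo 'no new .unwrap()/.expect() found'
```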

4. Rust Expert (Lading-Specific Patterns)

  • No mod.rs files (per CLAUDE.md)
  • All use statements at file top (not inside functions)
  • Format strings use named variables ("{index}" not "{}")
  • Pre-computation in initialization, not hot paths
  • Worst-case behavior considered, not just average-case
  • No unnecessary cloning or allocation in hot paths
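
Several of these checks reduce to quick greps over the touched file; a sketch of heuristics that supplement, not replace, reading the diff:

```sh
# Hedged lint sketch for the checklist above; all heuristic.
file=lading_payload/src/trace_agent/v04.rs
case "$file" in */mod.rs) echo 'REJECT: mod.rs file' ;; esac
grep -nE '^[[:space:]]+use ' "$file" && echo 'use statement below file top?'
grep -nE '"[^"]*\{\}' "$file" && echo 'positional {} format string?'
```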

5. Greybeard (Simplicity Judge)

  • Code still readable without extensive comments
  • Complexity justified by measured improvement
  • "Obviously fast" pattern, not clever trick
  • Follows 3-repeat abstraction rule (no premature abstraction)
  • Change is minimal - no scope creep

Phase 3: Kani/Property Test Check

If the optimization touches critical code:

For lading_throttle:

```sh
ci/kani lading_throttle
```

For lading_payload:

```sh
ci/kani lading_payload
```

Kani constraints:

  • Kani proofs are more complete but labor-intensive
  • Kani may not compile complex code
  • Kani runs are EXTREMELY slow for complex code

If Kani fails to run:

  1. Document why (compilation error? timeout?)
  2. Verify comprehensive property tests exist instead
  3. This is acceptable - Kani feasibility varies
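
As a sketch, the fallback can be scripted; the 30-minute timeout is an illustrative choice, not a repo rule:

```sh
# Hedged sketch: try Kani, fall back to the crate's property tests.
if ! timeout 1800 ci/kani lading_payload; then
  echo 'Kani infeasible here (document why in the report)'
  cargo test -p lading_payload   # property tests must cover the change
fi
```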

Phase 4: Decision

| Outcome | Votes | Action |
| --- | --- | --- |
| APPROVED | 5/5 APPROVE | Return APPROVED report |
| REJECTED | Any REJECT | Return REJECTED report |

Duplicates, bugs, correctness issues, and missing benchmarks are all rejections. Describe the specific reason in the report's reason field.


Phase 5: Return Report

Review does NOT record results and does NOT create files. Return a structured YAML report to the caller.

Fill in the appropriate template and return the completed YAML:

| Verdict | Template |
| --- | --- |
| approved | .claude/skills/lading-optimize-review/assets/approved.template.yaml |
| rejected | .claude/skills/lading-optimize-review/assets/rejected.template.yaml |
  1. Read the appropriate template from the .claude/skills/lading-optimize-review/assets/ directory
  2. Fill in placeholders using argument values:
    • id -> generated ID (see "Generate Report ID" above)
    • target -> $ARGUMENTS[2]:$ARGUMENTS[3] (e.g. lading_payload/src/trace_agent/v04.rs:V04::to_bytes)
    • technique -> $ARGUMENTS[4]
  3. Fill remaining fields with actual benchmark data from the review
  4. Return the filled-in report
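
For orientation only, a filled-in report might take a shape like the following; the authoritative field set lives in the templates, and every value here is a placeholder:

```yaml
# Illustrative shape only; real fields come from approved.template.yaml.
id: trace-agent-v04-buffer-reuse
target: lading_payload/src/trace_agent/v04.rs:V04::to_bytes
technique: buffer-reuse
verdict: APPROVED
votes: 5/5
micro_time_change: "-4.57%"   # placeholder benchmark numbers
macro_time_change: "-6.10%"
```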
