tidb-test-diff-triage

Triage unexpected TiDB test diffs that seem unrelated to the current PR. Use when plan/result/testdata changes appear after merge/rebase or only in specific local runs, especially to quickly rule in/out failpoint enablement issues.

Quality: 83% (Does it follow best practices?)

Impact: 95%, 1.39x (average score across 3 eval scenarios)

Security: Passed (no findings from the security scan)

Quality

Content: 77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A well-structured triage skill that efficiently guides Claude through a specific debugging workflow with clear sequencing and gate conditions. Its main weakness is that some steps (particularly the failpoint-enabled run) defer to external docs without providing the concrete commands inline, reducing immediate actionability. The output format template is a strong addition that serves as both documentation and a verification checklist.

Suggestions

Include the explicit failpoint-enabled `go test` command inline rather than only referencing `docs/agents/testing-flow.md`, so the skill is self-contained for the most common case.

Add a concrete example of the full bisect loop with a test command, e.g., `git bisect run go test -run TestName -count=1 -tags=intest ./pkg/...` to make the bisect step copy-paste ready.

Dimension	Reasoning	Score
Conciseness	Every section is lean and purposeful. No unnecessary explanations of what TiDB, failpoints, or git bisect are—assumes Claude already knows these concepts. Each rule is stated directly with only the essential context.	3 / 3
Actionability	Provides concrete commands for git bisect and go test flags, and references a specific doc path for failpoint decisions. However, the failpoint-enabled run command is not shown explicitly (it defers to another doc), and the bisect workflow lacks the full loop (e.g., `git bisect run` with a test command). Some steps remain directional rather than copy-paste ready.	2 / 3
Workflow Clarity	The three rules form a clear, sequenced triage workflow: first rule out failpoint setup, then isolate merge impact, then (and only then) update testdata. Each rule has an explicit gate condition before proceeding, and Rule 3 explicitly prevents premature action. The output format serves as a validation checklist.	3 / 3
Progressive Disclosure	References `docs/agents/testing-flow.md` for failpoint details, which is appropriate one-level-deep disclosure. However, no bundle files are provided to verify the reference exists, and the skill could benefit from clearer signaling of what additional resources are available (e.g., linking to bisect guides or testdata update procedures).	2 / 3
	Total	10 / 12 Passed

Description: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted, niche skill description that clearly communicates both what it does and when to use it. Its strength lies in its highly specific domain targeting (TiDB failpoint-related test diffs) and natural trigger terms. The only minor weakness is that the 'what' portion could enumerate more concrete actions beyond 'triage' to give a fuller picture of the skill's capabilities.

Suggestions

Consider listing 2-3 more specific actions the skill performs during triage, e.g., 'checks failpoint environment variables, compares baseline test output, identifies non-deterministic test results' to improve specificity.

Dimension	Reasoning	Score
Specificity	Names the domain (TiDB test diffs) and a key action (triage unexpected diffs, rule in/out failpoint enablement issues), but doesn't list multiple concrete actions—it's more of a single workflow description than a list of specific capabilities.	2 / 3
Completeness	Clearly answers both 'what' (triage unexpected TiDB test diffs unrelated to the current PR) and 'when' (when plan/result/testdata changes appear after merge/rebase or only in specific local runs, especially for failpoint enablement issues) with an explicit 'Use when' clause.	3 / 3
Trigger Term Quality	Includes highly natural and specific trigger terms a user would actually say: 'test diffs', 'merge/rebase', 'plan/result/testdata changes', 'failpoint', 'local runs', 'unrelated to the current PR'. These are terms a TiDB developer would naturally use when encountering this problem.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive—targets a very specific niche (TiDB test diff triage related to failpoint enablement). The combination of TiDB, test diffs, merge/rebase context, and failpoint issues makes it extremely unlikely to conflict with other skills.	3 / 3
	Total	11 / 12 Passed

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: pingcap/tidb
Path: .agents/skills/tidb-test-diff-triage/SKILL.md
Commit: e70762e

Reviewed: 2 days ago
