backend-tests

Run backend tests and code quality checks for OPRE OPS. Covers ops_api pytest, data_tools pytest, and nox linting/formatting sessions. Use this skill when the user wants to run backend tests, check code quality, lint Python code, run pytest, or verify their backend changes pass CI checks — even if they just say "run the tests" or "does this pass".

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

65%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is highly actionable with concrete executable commands and clear sequencing, but loses points for redundant boilerplate that could be tightened, missing validation gates on the batch CI workflow, and a monolithic single-file structure with no progressive disclosure into reference files.

Suggestions

Consolidate the repeated 'cd backend/<pkg>' boilerplate and the verbose 'pytest -v --tb=short' variants into a shared note to remove redundant tokens across sections.

Add explicit validation gates to the 'all'/CI workflow (e.g., 'if a step fails, report and decide whether to continue before the next step') and a clearer validate→fix→retry loop for test failures, rather than running all four steps unconditionally.

Consider offloading the per-argument command recipes, Key File Locations, or Common Issues into a reference file to shrink the ~200-line body and give the skill genuine progressive disclosure.
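The validation gate proposed in the second suggestion can be pictured with a minimal sketch. The names `Step`, `run_gated`, and `shell_runner` are hypothetical. The step list is an assumption built from commands quoted in this review, and running the lint session from `backend/ops_api` is assumed as well. The skill's actual four "all" steps are not reproduced here and may differ.

```python
import subprocess
from typing import NamedTuple, Sequence


class Step(NamedTuple):
    name: str
    cwd: str
    cmd: Sequence[str]


# Assumed step list: the review quotes these commands, but the skill's
# real "all" workflow has four steps that are not shown in this review.
STEPS = [
    Step("ops_api tests", "backend/ops_api", ["pipenv", "run", "pytest"]),
    Step("lint", "backend/ops_api", ["pipenv", "run", "nox", "-s", "lint"]),
]


def shell_runner(step):
    """Run one step in its directory and return the process exit code."""
    return subprocess.run(list(step.cmd), cwd=step.cwd).returncode


def run_gated(steps, runner=shell_runner):
    """Run steps in order and stop at the first failure.

    Returns (names of completed steps, name of failed step or None).
    """
    completed = []
    for index, step in enumerate(steps, start=1):
        print(f"Step {index}/{len(steps)}: {step.name}")
        if runner(step) != 0:
            # Gate: report the failure instead of running later steps.
            print(f"FAILED: {step.name}. Fix and rerun before continuing.")
            return completed, step.name
        completed.append(step.name)
    return completed, None
```

Calling `run_gated(STEPS)` from the repository root would stop at the first non-zero exit code, which gives the agent a natural point to report the failure and decide whether to continue.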

Dimension	Reasoning	Score
Conciseness	The body is mostly efficient concrete commands with little concept-explanation, but redundancy could be tightened: the verbose 'pytest -v --tb=short' variant, repeated 'cd backend/<pkg>' boilerplate, and a default-help section that restates the seven numbered cases above. This fits 'mostly efficient but could be tightened' rather than the 'every token earns its place' level 3.	2 / 3
Actionability	Fully executable, copy-paste-ready commands throughout — 'cd backend/ops_api', 'pipenv run pytest', 'pipenv run nox -s lint', 'docker info > /dev/null 2>&1 && echo ...', and specific test paths — with no pseudocode, matching the level 3 anchor and clearly above the incomplete-guidance level 2.	3 / 3
Workflow Clarity	Sequencing is clear (numbered cases, the 'all' Step 1/4–4/4 run, a Docker pre-flight check), but the batch CI workflow lacks explicit per-step validation gates or a validate→fix→retry feedback loop, and only data_tools has a pre-flight checkpoint. Per the rubric's batch-operation guideline, missing validation caps this at 2 rather than 3.	2 / 3
Progressive Disclosure	No bundle/reference files exist and the ~200-line body is a single monolithic file; sections are well-organized and navigable, but everything is inline with nothing offloaded to reference files, fitting 'content that should be separate is inline' (level 2) rather than the reference-split level 3.	2 / 3
	Total	9 / 12 Passed
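As an illustration of the redundancy noted in the Conciseness row, the repeated `cd backend/<pkg>` prefix and the verbose pytest flags could be expressed once and reused. This is only a sketch: `backend_command` and `VERBOSE_PYTEST_FLAGS` are hypothetical names, and the `backend/<pkg>` layout is assumed from the paths quoted in the table.

```python
import shlex

# The verbose variant quoted in the review, defined once.
VERBOSE_PYTEST_FLAGS = ["-v", "--tb=short"]


def backend_command(package, *args, verbose=False):
    """Build a `cd backend/<pkg> && pipenv run ...` command string."""
    parts = ["pipenv", "run", *args]
    # Only pytest invocations get the verbose flags appended.
    if verbose and args and args[0] == "pytest":
        parts += VERBOSE_PYTEST_FLAGS
    directory = shlex.quote("backend/" + package)
    return f"cd {directory} && {shlex.join(parts)}"


print(backend_command("ops_api", "pytest"))
print(backend_command("data_tools", "pytest", verbose=True))
```

In the skill itself the same effect would more likely come from a single shared note than from code. The helper just shows how little varies between the per-package recipes.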

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is strong across all dimensions: it states concrete actions, includes natural trigger phrasing with casual variants, explicitly covers both what and when, and is tightly scoped to the OPRE OPS backend tooling. No changes needed.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — 'ops_api pytest, data_tools pytest, and nox linting/formatting sessions' and 'lint Python code, run pytest' — rather than vague language, matching the 'lists multiple specific concrete actions' anchor and exceeding the 'some actions' level 2.	3 / 3
Completeness	Explicitly answers both what ('Run backend tests and code quality checks... Covers ops_api pytest, data_tools pytest, and nox linting/formatting sessions') and when ('Use this skill when the user wants to run backend tests... even if they just say "run the tests"'), satisfying the explicit-trigger level 3 anchor.	3 / 3
Trigger Term Quality	Covers natural terms users would say — 'run backend tests', 'check code quality', 'lint Python code', 'run pytest', plus casual variants 'run the tests' and 'does this pass' — giving good coverage including common variations, above level 2's 'missing common variations'.	3 / 3
Distinctiveness / Conflict Risk	Tied to a specific repo and toolset (OPRE OPS, ops_api/data_tools, pytest, nox) with distinct triggers, occupying a clear niche unlikely to fire for unrelated skills, matching the level 3 'clear niche with distinct triggers' anchor.	3 / 3
	Total	12 / 12 Passed

Validation

87%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 14 / 16 Passed

Validation for skill structure

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
	Total	14 / 16 Passed

Repository: HHS/OPRE-OPS
Commit: 908aa71

Reviewed: 5 days ago
