analyzing-malware-sandbox-evasion-techniques

Detect sandbox evasion techniques in malware samples by analyzing timing checks, VM artifact queries, user interaction detection, and sleep inflation patterns from Cuckoo/AnyRun behavioral reports

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

50%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The body is well-structured and procedural but stays at a descriptive level: it explains known concepts, omits executable guidance, and fails to point at the available reference and script bundle files.

Suggestions

Reference the bundle files explicitly, e.g. "See references/api-reference.md for indicator tables and scripts/agent.py for a runnable analyzer."

Trim the Overview's definitional sentence and lead with the actionable detection procedure.

Add a validation checkpoint after step 7 (e.g. verify each detected technique has API-call evidence before emitting the report).
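
The checkpoint in the last suggestion could look like the following minimal sketch. The report shape (a `techniques` list whose entries carry `name` and `evidence`) and the function name `validate_detections` are assumptions made for illustration; neither is taken from the skill body or from scripts/agent.py.

```python
from typing import Any


def validate_detections(report: dict[str, Any]) -> list[str]:
    """Return the names of detected techniques that lack API-call evidence.

    Assumed (hypothetical) report shape:
    {"techniques": [{"name": str, "evidence": [str, ...]}, ...]}
    """
    unsupported = []
    for technique in report.get("techniques", []):
        # A missing or empty evidence list both count as unsupported.
        if not technique.get("evidence"):
            unsupported.append(technique.get("name", "<unnamed>"))
    return unsupported


# Hypothetical draft report with one supported and one unsupported detection.
draft = {
    "techniques": [
        {"name": "timing_check", "evidence": ["GetTickCount"]},
        {"name": "user_interaction", "evidence": []},
    ]
}

missing = validate_detections(draft)
if missing:
    print("Hold the report; no API-call evidence for:", ", ".join(missing))
```

Returning the unsupported names, rather than raising, lets the caller decide whether to drop those detections or stop before emitting the report.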

Dimension	Reasoning	Score
Conciseness	Mostly efficient sections, but the Overview explains sandbox-evasion concepts ("allows malware to detect analysis environments and alter behavior to avoid detection") that Claude already knows, so it could be tightened.	2 / 3
Actionability	Steps name concrete indicators (GetTickCount, vmtoolsd.exe, GetCursorPos) but provide no executable code or commands, and the body never points to the bundled agent.py that would make it copy-paste ready.	2 / 3
Workflow Clarity	A clear 7-step sequence with a defined expected output, but no validation/verification checkpoint (e.g. confirming detections against the ATT&CK mapping) is specified, leaving checkpoints implicit.	2 / 3
Progressive Disclosure	Sections are well organized, but the bundled references/api-reference.md and scripts/agent.py are never signaled from the body, and indicator lists are duplicated inline rather than split out with clear one-level references.	2 / 3
	Total	8 / 12 Passed
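
To illustrate the kind of executable guidance the Actionability row finds missing, here is a minimal sketch that scans a behavioral report for the three indicators named above. It assumes a Cuckoo-style JSON layout (`behavior` -> `processes` -> `calls`, each call carrying an `api` name and an `arguments` mapping); the technique labels and the function name `find_evasion_evidence` are hypothetical, and this is not the bundled agent.py.

```python
from typing import Any

# Indicators named in the review; a real indicator table would be longer.
API_INDICATORS = {
    "GetTickCount": "timing_check",
    "GetCursorPos": "user_interaction",
}
VM_ARTIFACT_STRINGS = ("vmtoolsd.exe",)


def find_evasion_evidence(report: dict[str, Any]) -> dict[str, list[str]]:
    """Group matching API calls by (hypothetical) technique label.

    Assumes each call's "arguments" is a mapping; reports that store
    arguments as a list would need to be adapted first.
    """
    evidence: dict[str, list[str]] = {}
    processes = report.get("behavior", {}).get("processes", [])
    for process in processes:
        for call in process.get("calls", []):
            api = call.get("api", "")
            technique = API_INDICATORS.get(api)
            if technique:
                evidence.setdefault(technique, []).append(api)
            # VM artifact queries appear in call arguments, not API names.
            for value in call.get("arguments", {}).values():
                text = str(value).lower()
                for artifact in VM_ARTIFACT_STRINGS:
                    if artifact in text:
                        evidence.setdefault("vm_artifact_query", []).append(
                            f"{api}: {artifact}"
                        )
    return evidence


# Hypothetical report fragment, used only to exercise the function.
sample = {
    "behavior": {
        "processes": [
            {
                "calls": [
                    {"api": "GetTickCount", "arguments": {}},
                    {"api": "Process32NextW",
                     "arguments": {"process_name": "vmtoolsd.exe"}},
                    {"api": "GetCursorPos", "arguments": {}},
                ]
            }
        ]
    }
}

print(find_evasion_evidence(sample))
```

Keeping the matched API name next to each technique means every detection carries its own evidence, which is what a later verification step would check.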

Description

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific and distinct, with strong natural trigger terms and third-person voice. Its main gap is the missing explicit "Use when…" clause, which caps completeness.

Suggestions

Add an explicit trigger clause, e.g. "Use when analyzing Cuckoo/AnyRun behavioral reports for sandbox evasion or when a malware sample's T1497 behavior needs scoring."

Consider naming the output artifact (e.g. "produces a JSON evasion report") to sharpen the "what" further.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — "analyzing timing checks, VM artifact queries, user interaction detection, and sleep inflation patterns from Cuckoo/AnyRun behavioral reports" — rather than vague language.	3 / 3
Completeness	Clearly answers "what" but offers no explicit "when" or "Use when…" trigger guidance, so, per the guideline, completeness is capped at 2.	2 / 3
Trigger Term Quality	Natural analyst vocabulary (sandbox evasion, malware samples, Cuckoo/AnyRun, behavioral reports, sleep inflation) in third person ("Detect"), with good coverage of terms a user would say.	3 / 3
Distinctiveness / Conflict Risk	A clear niche (sandbox-evasion detection from Cuckoo/AnyRun reports) with distinct triggers unlikely to overlap with unrelated skills.	3 / 3
	Total	11 / 12 Passed

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
	Total	15 / 16 Passed

Repository: mukul975/Anthropic-Cybersecurity-Skills
Commit: 673da1f

Reviewed: 13 days ago
