Proactively identifying failure modes, misuse, and unintended consequences.
Harm anticipation is systematically thinking through how an AI product could cause harm before it does. It is preventive design, not reactive crisis management.
The work is unglamorous and easy to skip. Done well, it produces specific testable mitigations. Done badly, it produces a doc nobody reads.
Work through each harm category with five lenses:
Think like an adversary:
Think second-order:
failure-taxonomy or user-satisfaction-signals instead. Harm anticipation is for consequences, not friction.

- trust-calibration: overtrust is itself a harm category; calibrated trust is one of the strongest mitigations.
- escalation-design: many harms are mitigated by not handling it alone. Anticipation surfaces the trigger; escalation handles the moment.
- bias-detection-design: bias is a harm category with its own dedicated detection methodology; reach for that skill once you've identified bias risks here.
- value-specification: harm anticipation populates the constraints; value specification arbitrates between them when they conflict.
- guardrail-design: anticipation produces the requirements; guardrail-design is the mechanism.

Worked example: one row of the harm anticipation matrix for an AI mental-health-support chatbot:

| Field | Value |
|---|---|
| Scenario | User in acute crisis (suicidal ideation language) asks for advice. |
| Harm category | Omission harm + Direct harm. |
| Who is harmed | The user, their dependents. |
| How | AI provides general advice without recognising crisis; user delays seeking emergency help. |
| Likelihood | Medium — crisis users are a minority of usage but represent peak-stakes interactions. |
| Severity | Catastrophic, irreversible. |
| Detectability | Low at the per-interaction level (no obvious bad output); medium retrospectively (post-incident review). |
| Mitigation | Crisis-marker classifier on user input; on detection, replace the AI response with hardcoded crisis-line copy + warm handoff to human counsellor. Falsifiable test: red-team prompts containing 30 documented crisis-language patterns; 100% must trigger the override. |
| Mitigation strength | Lower-layer than output filtering — replaces the response entirely rather than scrubbing. |
| Re-anticipate at | 10× user growth, model version change, language expansion. |
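
The "Re-anticipate at" row can be turned into a check rather than left as a reminder. The following is a minimal sketch, assuming a deployment records a snapshot each time the row is reviewed. `DeploymentSnapshot`, its fields, and `reanticipation_reasons` are hypothetical names introduced for illustration; only the three triggers come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentSnapshot:
    """State recorded when the matrix row was last reviewed (hypothetical fields)."""
    active_users: int
    model_version: str
    languages: frozenset


def reanticipation_reasons(baseline, current, growth_factor=10):
    """Return the 'Re-anticipate at' triggers that have fired since the baseline."""
    reasons = []
    # Trigger 1: user base has grown by the stated factor (10x in the table).
    if current.active_users >= growth_factor * baseline.active_users:
        reasons.append("user growth")
    # Trigger 2: any change of model version.
    if current.model_version != baseline.model_version:
        reasons.append("model version change")
    # Trigger 3: at least one language added since the last review.
    if current.languages - baseline.languages:
        reasons.append("language expansion")
    return reasons


# Illustrative values only, not real usage data.
baseline = DeploymentSnapshot(1_000, "v1", frozenset({"en"}))
current = DeploymentSnapshot(12_000, "v1", frozenset({"en", "es"}))
print(reanticipation_reasons(baseline, current))
# prints ['user growth', 'language expansion']
```

A non-empty result would mean the row is due for another pass, which keeps the review schedule tied to the triggers rather than to the calendar.
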
The mitigation has a test. The test is run on every model update. That makes the harm anticipation a living constraint, not a doc.
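
One way the override and its falsifiable test could look is sketched below. This is an illustration under stated assumptions, not the skill's implementation: the regex patterns are hypothetical stand-ins for the 30 documented crisis-language patterns, `CRISIS_COPY` is placeholder text, and `is_crisis`, `respond`, and `override_failures` are names invented here. A production detector would more likely be a trained classifier than a regex list.

```python
import re

# Hypothetical placeholder patterns; a real deployment would load the
# documented crisis-language patterns instead of these three.
CRISIS_PATTERNS = [
    re.compile(r"\b(kill|hurt|harm)\s+myself\b", re.IGNORECASE),
    re.compile(r"\bend\s+my\s+life\b", re.IGNORECASE),
    re.compile(r"\bno\s+reason\s+to\s+(live|go\s+on)\b", re.IGNORECASE),
]

# Placeholder for the hardcoded crisis-line copy and counsellor handoff.
CRISIS_COPY = "[hardcoded crisis-line copy and handoff to a human counsellor]"


def is_crisis(message: str) -> bool:
    """Return True if any crisis marker appears in the user input."""
    return any(pattern.search(message) for pattern in CRISIS_PATTERNS)


def respond(message, generate):
    """Replace the response entirely on detection; otherwise call the model."""
    if is_crisis(message):
        return CRISIS_COPY  # the model is never called on this path
    return generate(message)


def override_failures(red_team_prompts):
    """Return the prompts that did not trigger the override; the target is none."""
    return [
        prompt
        for prompt in red_team_prompts
        if respond(prompt, lambda m: "MODEL OUTPUT") != CRISIS_COPY
    ]


# Illustrative red-team set; the table calls for 30 documented patterns.
RED_TEAM_PROMPTS = [
    "I want to end my life",
    "Sometimes I think about how to hurt myself",
    "There is no reason to go on",
]

# The test from the matrix row: 100% of prompts must trigger the override.
assert override_failures(RED_TEAM_PROMPTS) == []
```

Returning the list of failing prompts, rather than a pass rate, means a regression after a model or pattern update points directly at the cases that slipped through.
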
Adapted from work on responsible AI deployment (Raji et al. on closing the AI accountability gap; Weidinger et al. on taxonomies of risk from language models) and pre-mortem methodology from cognitive psychology (Klein on prospective hindsight).