harm-anticipation

Proactively identifying failure modes, misuse, and unintended consequences.

Quality

38%

Does it follow best practices?

Run evals on this skill

Adds up to 20 points to the overall score

View guide

Securityby

Passed

No findings from the security scan

Fix and improve this skill with Tessl

tessl review fix ./gemini-extension/ai-alignment-reasoning/skills/harm-anticipation/SKILL.md

The canonical home for this skill is harm-anticipation in Owl-Listener/ai-design-skills

Harm Anticipation

Name: harm-anticipation
Rating: 40.8 (1 reviews)
Author: Owl-Listener

Harm anticipation is systematically thinking through how an AI product could cause harm — before it does. Preventive design, not reactive crisis management.

The work is unglamorous and easy to skip. Done well, it produces specific testable mitigations. Done badly, it produces a doc nobody reads.

Categories of AI harm

Direct harm: the AI outputs something harmful — dangerous advice, discriminatory content, privacy violations
Facilitated harm: the AI helps a user do something harmful, even if the AI's output itself is benign
Emergent harm: harmful patterns from scale or interaction effects, not from any single output
Omission harm: the AI fails to act when it should — not flagging a crisis, not escalating
Erosion harm: gradual negative effects — dependency, deskilling, manipulation, trust erosion

Structured anticipation

Work through each harm category with five lenses:

Who could be harmed? The user, people the user interacts with, vulnerable populations, society at large.
How could they be harmed? Physical, emotional, financial, reputational, privacy, autonomy.
What's the likelihood? Common use case vs. edge case vs. adversarial attack.
What's the severity? Inconvenience vs. distress vs. irreversible damage.
What's the detectability? Obvious and immediate vs. subtle and delayed.

Misuse scenarios

Think like an adversary:

How would someone deliberately misuse this feature?
What's the easiest way to extract harmful output?
Could this be used to manipulate, deceive, or coerce?
What if the user lies about their intent?
What happens at scale, automated?

Unintended consequences

Think second-order:

What happens at millions of users?
What skills do users lose by relying on this?
Could this create unfair advantages or disadvantages?
What power dynamics shift?
What becomes possible that wasn't before — for better and worse?

Decision rules

Score by frequency × severity, not severity alone. A common moderate harm beats a rare catastrophic one in cumulative impact, almost every time. Teams chronically over-weight catastrophic-rare and under-weight chronic-moderate.
Each harm needs a falsifiable test. "If the AI does X to user type Y, the system should Z." If you can't write the test, you haven't specified the harm precisely enough — refine before mitigating.
Mitigate at the lowest layer that works. Filtering output is weaker than constraining the prompt; constraining the prompt is weaker than scoping the feature; scoping the feature is weaker than not building it.
Re-anticipate after each scale jump. Harms at 1k users differ from 100k differ from 10M. The same checklist gives different answers at each tier.
Vulnerable users define the floor, not the ceiling. If the worst-off user is well-served, others will be too. Designing for the median user with edge-case patches is the wrong shape.
Prefer reversible mitigations during the discovery phase. Lock-in early and you'll be scared to revisit assumptions.

Anti-patterns

Pre-mortem theatre: writing the harm anticipation doc as a deliverable, then never opening it again. The doc is the artefact; the practice is the value.
Worst-case-only thinking: only catastrophic harms get attention. Moderate-frequency moderate-severity harms accumulate unaddressed and become the actual product story.
Adversary-only framing: anticipating bad actors but missing harms from well-intentioned users in unexpected contexts (the helpful suicide-line responder, the well-meaning legal advice giver).
Mitigation-by-checklist: treating harms as boxes to tick rather than ongoing design constraints. Once boxed, the harm is "handled" until it isn't.
Externalised liability: "the user agreed to ToS, we're covered." Liability ≠ harm reduction. Users are still hurt; the company is just protected.
Single-axis thinking: anticipating along one harm category and missing intersections. "We checked for bias and we checked for privacy" — but bias × privacy at intersection is its own failure mode.
Optimist's mitigation: writing mitigations that depend on the user noticing, the user reporting, the AI detecting. If the mitigation requires a working sensor, audit whether the sensor actually works.

When not to use this

For routine UX paper-cuts (slow loads, awkward copy) — reach for failure-taxonomy or user-satisfaction-signals instead. Harm anticipation is for consequences, not friction.
For purely technical reliability (latency, uptime) — those are SRE concerns; harm anticipation is the product-level layer above them.
In place of red-teaming — anticipation is a planning activity; red-teaming is a probing activity. Both, not either.
As a one-off audit at launch — the anticipation that catches harms is the one done continuously, especially after each significant change.

Design Artefacts

Harm anticipation matrix
Misuse scenario catalogue
Pre-mortem analysis documents
Risk-severity heatmaps for product features
Mitigation tracking log with pass/fail tests per harm

Worked example — one row of the harm anticipation matrix for an AI mental-health-support chatbot:

Field	Value
Scenario	User in acute crisis (suicidal ideation language) asks for advice.
Harm category	Omission harm + Direct harm.
Who is harmed	The user, their dependents.
How	AI provides general advice without recognising crisis; user delays seeking emergency help.
Likelihood	Medium — crisis users are a minority of usage but represent peak-stakes interactions.
Severity	Catastrophic, irreversible.
Detectability	Low at the per-interaction level (no obvious bad output); medium retrospectively (post-incident review).
Mitigation	Crisis-marker classifier on user input; on detection, replace the AI response with hardcoded crisis-line copy + warm handoff to human counsellor. Falsifiable test: red-team prompts containing 30 documented crisis-language patterns; 100% must trigger the override.
Mitigation strength	Lower-layer than output filtering — replaces the response entirely rather than scrubbing.
Re-anticipate at	10× user growth, model version change, language expansion.

The mitigation has a test. The test is run on every model update. That makes the harm anticipation a living constraint, not a doc.

Adapted from work on responsible AI deployment (Raji et al. on closing the AI accountability gap; Weidinger et al. on taxonomies of risk from language models) and pre-mortem methodology from cognitive psychology (Klein on prospective hindsight).

Repository: Owl-Listener/ai-design-skills
Path: gemini-extension/ai-alignment-reasoning/skills/harm-anticipation/SKILL.md
Commit: f41b650

Last updated: 2 days ago
First committed: 3 months ago

Canonical home

Owl-Listener/ai-design-skills

In sync

since May 8, 2026

Also appears in

Owl-Listener/ai-design-skills

In sync

since May 8, 2026

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.