rollback-strategy-advisor

Advises on rollback strategies by analyzing what a deploy changes — recommending revert, roll-forward, feature-flag kill, or data repair depending on reversibility. Use during an incident when a deploy went bad, when designing a deploy pipeline and the user asks how to make it reversible, or when a migration needs an undo plan.

Install with Tessl CLI

npx tessl i github:santosomar/general-secure-coding-agent-skills --skill rollback-strategy-advisor

Rollback Strategy Advisor

"Just deploy the old version" only works if the new version didn't change anything the old version depends on. It usually did.

The reversibility question

Before choosing a strategy, classify what the bad deploy changed:

| Change type | Reversible by redeploying old code? | Why / why not |
| --- | --- | --- |
| Stateless code only | ✅ Yes | Old code runs on old state; no state changed |
| Additive schema (new column, new table) | ✅ Yes | Old code ignores the new column |
| Destructive schema (drop column, rename) | ❌ No | Old code expects the column that's gone |
| Additive data (new rows) | ⚠️ Usually | Unless the new rows confuse old code's queries |
| Mutated data (UPDATE existing rows) | ❌ No | Old code expects old data shape; you need data repair |
| New external side effects (emails sent, payments made) | ❌ Never | Can't unsend. Compensating actions only. |
| Config only | ✅ Yes | Revert the config |
| Feature flag flip | ✅ Instantly | Flip it back; this is why flags exist |
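
The table can be encoded as a lookup so a deploy touching several change types gets one verdict. This is a minimal sketch, not part of the skill itself: the key names, the `Reversibility` enum, and `classify_deploy` are hypothetical, and the rule that the least reversible change decides the verdict is an assumption layered on top of the table.

```python
from enum import Enum


class Reversibility(Enum):
    YES = "yes"
    USUALLY = "usually"
    NO = "no"
    NEVER = "never"


# Verdicts copied from the table; the key names are hypothetical labels.
REVERSIBLE_BY_REDEPLOY = {
    "stateless_code": Reversibility.YES,
    "additive_schema": Reversibility.YES,
    "destructive_schema": Reversibility.NO,
    "additive_data": Reversibility.USUALLY,
    "mutated_data": Reversibility.NO,
    "external_side_effects": Reversibility.NEVER,
    "config_only": Reversibility.YES,
    "feature_flag_flip": Reversibility.YES,
}

# Ordered from most to least reversible.
_SEVERITY = [
    Reversibility.YES,
    Reversibility.USUALLY,
    Reversibility.NO,
    Reversibility.NEVER,
]


def classify_deploy(change_types):
    """Return the least reversible verdict among the deploy's change types.

    Assumption: one irreversible change makes the whole deploy irreversible.
    """
    verdicts = [REVERSIBLE_BY_REDEPLOY[c] for c in change_types]
    return max(verdicts, key=_SEVERITY.index, default=Reversibility.YES)
```

For instance, `classify_deploy(["additive_schema", "mutated_data"])` yields `Reversibility.NO`, because the mutated rows outweigh the harmless new column.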

Decision tree

Is there a feature flag gating the bad behavior?
  ├─ YES → Kill the flag. Done in seconds. Investigate at leisure.
  └─ NO →
     Did the deploy change data/schema?
       ├─ NO → Redeploy previous artifact. Done.
       └─ YES →
          Was the change additive-only?
            ├─ YES → Redeploy previous artifact. Clean up schema later.
            └─ NO →
               Can you roll FORWARD (fix is small, well-understood)?
                 ├─ YES → Roll forward. Faster than untangling.
                 └─ NO → You're in data repair. See below.
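
The same tree, written as a function, can be handy for a runbook or a chat-ops bot. This is an illustrative sketch: the `Deploy` dataclass and its field names are hypothetical, and the return values reuse the action names from the incident-mode output format.

```python
from dataclasses import dataclass


@dataclass
class Deploy:
    # Hypothetical fields; each one answers a question in the tree.
    flag_gates_bad_behavior: bool
    changed_data_or_schema: bool
    additive_only: bool
    small_understood_fix: bool


def recommend(d: Deploy) -> str:
    """Walk the decision tree top to bottom and return the recommended action."""
    if d.flag_gates_bad_behavior:
        return "flag kill"
    if not d.changed_data_or_schema:
        return "redeploy"
    if d.additive_only:
        return "redeploy"  # clean up the schema later
    if d.small_understood_fix:
        return "roll forward"
    return "data repair"
```

Note that the flag check comes first on purpose: even a deploy that mutated data is handled by killing the flag when one gates the bad behavior, which stops further damage before any repair starts.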

Data repair (the hard case)

When old code can't run on new data:

  1. Stop the bleeding. Feature flag, maintenance page, or traffic drain — whatever stops more data from being mutated.
  2. Snapshot. Before you touch anything. pg_dump / volume snapshot. You will want this when your repair script has a bug.
  3. Assess scope. How many rows? SELECT count(*) FROM <table> WHERE <mutated-condition>. 10 rows is a manual fix. 10 million is a migration.
  4. Repair or compensate:
    • Reversible mutation → write the inverse UPDATE
    • Irreversible mutation → restore affected rows from snapshot/backup
    • External side effects → compensating action (refund, apology email, manual ticket)
  5. Then redeploy old code.
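
Steps 2 through 4 can be sketched end to end for the reversible-mutation case. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for the real one, the `orders` table and its `migrated` marker column are hypothetical, and the bad deploy is imagined to have multiplied `amount` by 100 on the rows it touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, migrated INTEGER)"
)
# Hypothetical state after the bad deploy: rows 2 and 3 were multiplied by 100.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 0), (2, 2000, 1), (3, 3500, 1)],
)
conn.commit()

# Step 2: snapshot the current (broken) state before touching anything.
conn.execute("CREATE TABLE orders_snapshot AS SELECT * FROM orders")
conn.commit()

# Step 3: assess scope.
(scope,) = conn.execute(
    "SELECT count(*) FROM orders WHERE migrated = 1"
).fetchone()


def repair_orders(conn, expected_rows):
    """Step 4: apply the inverse UPDATE; undo it if the row count is unexpected."""
    cur = conn.execute(
        "UPDATE orders SET amount = amount / 100, migrated = 0 WHERE migrated = 1"
    )
    if cur.rowcount != expected_rows:
        conn.rollback()
        raise RuntimeError(
            f"expected {expected_rows} rows, touched {cur.rowcount}"
        )
    conn.commit()
    return cur.rowcount


repaired = repair_orders(conn, scope)
```

Comparing the touched row count against the scope measured in step 3 is a cheap guard against a repair script whose WHERE clause is wrong. The snapshot table is left in place so a buggy repair can still be undone.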

Design-time advice — make rollbacks boring

If you're not mid-incident, the best advice is to make the next deploy reversible:

| Technique | Makes rollback trivial when |
| --- | --- |
| Feature flags | You're shipping a behavior change: gate it |
| Expand-contract migrations | You're changing schema: add new alongside old, migrate, remove old in a later deploy |
| Dual-write period | You're changing data format: write both formats until new code is stable |
| Immutable artifacts by SHA | You're deploying: image:abc123 can always be re-deployed; image:latest can't |
| Backward-compatible APIs | You're changing an interface: new version reads old format |
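
As one illustration of the dual-write row, consider a change that splits a single `name` field into `first_name` and `last_name`. The record shape and function names below are hypothetical; the point of the sketch is only that the new writer keeps emitting the old field, so old code can be redeployed without any data repair.

```python
def write_user_dual(first_name: str, last_name: str) -> dict:
    """New code's writer during the dual-write period: emit both formats."""
    return {
        "name": f"{first_name} {last_name}",  # old format, still read by old code
        "first_name": first_name,             # new format
        "last_name": last_name,
    }


def read_user_old(record: dict) -> str:
    """Old code's reader: knows only the old field and ignores the rest."""
    return record["name"]


def read_user_new(record: dict) -> str:
    """New code's reader: uses the new fields."""
    return f'{record["first_name"]} {record["last_name"]}'
```

A record written by the new code is readable by either version, so rolling back is just a redeploy. The old field would be dropped only in a later deploy, once the new code is stable.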

Worked example

Situation: Deployed v2.4.0 at 14:00. At 14:20, error rate spikes. v2.4.0 added a NOT NULL column users.tenant_id with a default, and the new code reads it.

Reversibility check: Schema change was additive (new column with default) → old code should ignore it. ✅ Reversible.

Wait — check the migration. It ran ALTER TABLE users ADD COLUMN tenant_id ... NOT NULL DEFAULT 1. But old code does INSERT INTO users (...) without tenant_id. Does NOT NULL DEFAULT allow that? Yes — the default fires. ✅ Still reversible.

Action: Redeploy v2.3.9 with kubectl rollout undo deployment/app (this reverts to the previous revision, which here is v2.3.9). The column stays; old code ignores it. No cleanup is needed: the column is fine, and the bug is elsewhere in v2.4.0.

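Post-mortem note: If the column had ended up NOT NULL without a default, old code's INSERTs would fail. On a populated PostgreSQL table, a bare ADD COLUMN ... NOT NULL with no default fails at migration time, so this state usually arises from adding the column as nullable, backfilling it, and then setting NOT NULL. That would have been a non-reversible schema change masquerading as additive.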
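
The check from the worked example can be rehearsed before an incident instead of reasoned about during one. The sketch below uses in-memory SQLite as a stand-in for the production database, defines the column at CREATE TABLE time rather than via ALTER TABLE, and invents an `email` column and an `old_insert_succeeds` helper for illustration. The INSERT mimics old code that does not know about tenant_id.

```python
import sqlite3


def old_insert_succeeds(tenant_column_ddl: str) -> bool:
    """Create users with the given tenant_id definition, then run old code's INSERT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, {tenant_column_ddl})"
    )
    try:
        # Old code's INSERT: it does not mention tenant_id at all.
        conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
        return True
    except sqlite3.IntegrityError:
        # The NOT NULL constraint rejected the row because no default applied.
        return False
    finally:
        conn.close()
```

Calling it with `"tenant_id INTEGER NOT NULL DEFAULT 1"` returns True, while `"tenant_id INTEGER NOT NULL"` returns False, which matches the two cases discussed in the worked example. Running the same check against a staging copy of the real database would be the more trustworthy version of this test.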

Do not

  • Do not roll back before you understand what changed. Rolling back into a broken state is worse than being broken in a known state.
  • Do not roll forward under pressure with an untested fix. Rolling forward is for when you know the fix; it's not a license to deploy a guess.
  • Do not skip the snapshot before data repair. The repair will have a bug. It always does.
  • Do not DROP the new column/table during rollback. Leave it. It's harmless, and the next deploy attempt will need it.
  • Do not design a rollback strategy during an incident. Design it when you design the deploy. → cd-pipeline-generator should include the undo path.

Output format

Incident mode:

## Reversibility
<change type> → <reversible: yes/no/partial>

## Recommended action
<flag kill | redeploy | roll forward | data repair>

## Steps
1. ...

## If this doesn't work
<next fallback>

Design mode:

## This deploy's reversibility class
<from the table>

## To make it cheaply reversible
<specific technique: flag / expand-contract / dual-write>

## Rollback command (pre-written — paste during incident)
<exact command>
Repository
santosomar/general-secure-coding-agent-skills