
tessl-labs/pr-review-guardrails

Evidence-first pull request review with independent critique, selective challenger review, and human handoff.

87

1.31x
Quality

92%

Does it follow best practices?

Impact

87%

1.31x

Average score across 43 eval scenarios

Security by Snyk

Risky

Do not use without reviewing


Evaluation results

100%

25%

Scenario 1

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified green | 80% | 100% |
| No false positive findings | 70% | 100% |

100%

89%

Scenario 2

| Criteria | Without context | With context |
| --- | --- | --- |
| Detects oversized PR | 0% | 100% |
| Recommends splitting | 0% | 100% |
| Notes WIP status | 100% | 100% |

100%

15%

Scenario 3

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified green | 70% | 100% |
| No false positive findings | 100% | 100% |

80%

64%

Scenario 4

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified green | 0% | 100% |
| No false positive findings | 40% | 70% |
| Minimal review overhead | 0% | 60% |

50%

50%

Scenario 5

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified green | 0% | 100% |
| No false positive findings | 0% | 0% |

100%

Scenario 6

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified green | 100% | 100% |
| No false positive findings | 100% | 100% |

100%

5%

Scenario 7

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified green | 90% | 100% |
| No false positive findings | 100% | 100% |

69%

-31%

Scenario 8

| Criteria | Without context | With context |
| --- | --- | --- |
| Detects compilation failure | 100% | 100% |
| Detects test failure | 100% | 12% |

100%

79%

Scenario 9

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches IDOR vulnerability | 26% | 100% |
| Distinguishes UI hiding from real authorization | 12% | 100% |

100%

80%

Scenario 10

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches same-AZ replica | 0% | 100% |
| Catches missing replica backups | 50% | 100% |

100%

14%

Scenario 11

| Criteria | Without context | With context |
| --- | --- | --- |
| Detects oversized PR | 75% | 100% |
| Recommends splitting | 100% | 100% |

100%

Scenario 12

| Criteria | Without context | With context |
| --- | --- | --- |
| Flags missing description | 100% | 100% |
| Escalates due to auth changes | 100% | 100% |
| Catches silent error swallowing | 100% | 100% |

8%

4%

Scenario 13

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified green or yellow | 0% | 0% |
| No false positive on every-to-some change | 0% | 0% |
| Minimal review overhead | 20% | 40% |

100%

Scenario 14

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches vulnerable dependencies | 100% | 100% |
| Names specific packages | 100% | 100% |

72%

6%

Scenario 15

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches data race on shared counters | 100% | 100% |
| Catches cross-batch result leaking | 25% | 25% |
| Risk classified red | 50% | 70% |

86%

Scenario 16

| Criteria | Without context | With context |
| --- | --- | --- |
| Detects AI authorship | 0% | 0% |
| Catches removed error handling | 100% | 100% |
| Catches removed context propagation | 100% | 100% |
| Risk classified red | 100% | 100% |

72%

-28%

Scenario 17

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches shutdown ordering bug | 100% | 53% |
| Risk classified yellow or higher | 100% | 100% |

0%

Scenario 18

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches stale authorization cache | 0% | 0% |
| Risk classified yellow or higher | 0% | 0% |

76%

-17%

Scenario 19

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches unsanitized header propagation | 100% | 100% |
| Catches response header echo risk | 80% | 30% |
| Risk classified yellow or higher | 100% | 100% |

88%

40%

Scenario 20

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches health check pool contention | 33% | 80% |
| Risk classified yellow or higher | 70% | 100% |

86%

6%

Scenario 21

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches dangerous resource reduction | 100% | 100% |
| Identifies cascading restart risk | 25% | 50% |
| Risk classified yellow or higher | 100% | 100% |

80%

-12%

Scenario 22

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches destroy-and-recreate risk | 100% | 100% |
| Catches removed safety guards | 100% | 20% |
| Catches apply_immediately risk | 100% | 100% |
| Risk classified red | 70% | 100% |

100%

Scenario 23

| Criteria | Without context | With context |
| --- | --- | --- |
| Risk classified red | 100% | 100% |
| Catches open-to-world security groups | 100% | 100% |
| Catches database exposed to internet | 100% | 100% |

100%

Scenario 24

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches unencrypted notification endpoint | 100% | 100% |
| Catches overly permissive SNS policy | 100% | 100% |
| Risk classified red or yellow | 100% | 100% |

100%

12%

Scenario 25

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches Glacier retrieval impact | 80% | 100% |
| Risk classified yellow or higher | 100% | 100% |

86%

14%

Scenario 26

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches TOCTOU race on discount usage | 100% | 100% |
| Catches negative charge amount | 100% | 100% |
| Catches decrement-before-charge ordering | 100% | 0% |
| Risk classified red | 0% | 100% |

100%

Scenario 27

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches session never-expire risk | 100% | 100% |
| Catches unbounded Redis memory growth | 100% | 100% |
| Risk classified yellow or higher | 100% | 100% |

69%

69%

Scenario 28

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches default provider mismatch | 0% | 100% |
| Identifies total payment outage impact | 0% | 100% |
| Risk classified red | 0% | 0% |

100%

Scenario 29

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches non-atomic rate limit check | 100% | 100% |
| Identifies security impact on brute force protection | 100% | 100% |
| Risk classified red | 100% | 100% |

100%

17%

Scenario 30

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches hardcoded secrets | 100% | 100% |
| Detects AI authorship | 0% | 100% |
| Risk classified red | 100% | 100% |

100%

7%

Scenario 31

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches timing attack vulnerability | 100% | 100% |
| Risk classified yellow or higher | 100% | 100% |
| Does not raise irrelevant findings | 60% | 100% |

100%

Scenario 32

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches TOCTOU race condition | 100% | 100% |
| Risk classified yellow or higher | 100% | 100% |

100%

16%

Scenario 33

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches information disclosure | 100% | 100% |
| Risk classified yellow or higher | 60% | 100% |

100%

56%

Scenario 34

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches non-transactional refund risk | 100% | 100% |
| Risk classified yellow or higher | 0% | 100% |

100%

60%

Scenario 35

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches in-memory dedup limitation | 66% | 100% |
| Risk classified yellow or higher | 0% | 100% |

100%

Scenario 36

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches sort direction injection | 100% | 100% |
| Risk classified yellow or higher | 100% | 100% |

100%

Scenario 37

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches stale rate data | 100% | 100% |
| Risk classified yellow or higher | 100% | 100% |

100%

32%

Scenario 38

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches 401 silently resolved as success | 100% | 100% |
| Catches removed auth redirect behavior | 100% | 100% |
| Risk classified red | 0% | 100% |

100%

13%

Scenario 39

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches token storage security downgrade | 100% | 100% |
| Catches refresh token exposure | 100% | 100% |
| Risk classified red | 60% | 100% |

100%

100%

Scenario 40

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches CSV injection | 0% | 100% |
| Risk classified yellow or higher | 0% | 100% |

60%

Scenario 41

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches uncapped backoff | 100% | 33% |
| Risk classified yellow or higher | 0% | 100% |

100%

100%

Scenario 42

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches stale role in cache | 0% | 100% |
| Risk classified yellow or higher | 0% | 100% |

72%

52%

Scenario 43

| Criteria | Without context | With context |
| --- | --- | --- |
| Catches unsafe localStorage parsing | 33% | 53% |
| Risk classified yellow or higher | 0% | 100% |

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
