
jbaruch/coding-policy

General-purpose coding policy for Baruch's AI agents

90

1.76x
Quality: 91% (Does it follow best practices?)

Impact: 90%, 1.76x (Average score across 18 eval scenarios)

Security by Snyk: Advisory (Suggest reviewing before use)

Evaluation results

80%

Eval Scenario Quality Audit

| Criteria | Without context | With context |
| --- | --- | --- |
| scenario-a bleeding detected | 0% | 100% |
| scenario-a bleeding fixed | 0% | 100% |
| scenario-a leaking detected | 100% | 100% |
| scenario-a leaking fixed | 100% | 100% |
| scenario-b vague messages detected | 100% | 100% |
| scenario-b vague messages fixed | 100% | 100% |
| scenario-b misaligned criteria detected | 100% | 0% |
| scenario-b misaligned criteria fixed | 100% | 0% |
| scenario-c deleted | 100% | 100% |
| audit report produced | 100% | 100% |

99%

52%

Automate PR Creation and Code Review Request

| Criteria | Without context | With context |
| --- | --- | --- |
| Uses GraphQL `requestReviews` mutation | 0% | 100% |
| Inline comment explains why REST doesn't work | 37% | 100% |
| Pinned bot ID with fallback to dynamic discovery | 30% | 100% |
| Resolves the PR's GraphQL node ID | 0% | 100% |
| Verifies the review request was registered | 100% | 100% |
| Feature-branch guard | 100% | 100% |
| PR title follows conventional-commits format | 100% | 100% |
| PR body structure | 57% | 100% |
| Pre-push readiness checks | 0% | 66% |
| No hardcoded inputs in the script body | 100% | 100% |

67%

-3%

Code Review Response Guide

| Criteria | Without context | With context |
| --- | --- | --- |
| CI failure: fix required | 100% | 100% |
| Applies the reasonable suggestion | 100% | 100% |
| Declines the over-engineered suggestion | 100% | 100% |
| All three threads get replies | 100% | 100% |
| Accept reply uses the `Fixed in <sha>` format | 0% | 0% |
| Decline reply uses the `Declining — <reason>` format | 0% | 0% |
| Decline reply cites a verifiable reference | 100% | 62% |
| Fixes pushed to the same branch | 100% | 100% |
| No dangling threads before merge | 100% | 100% |

75%

-11%

Release Runbook for a Multi-Change Sprint

| Criteria | Without context | With context |
| --- | --- | --- |
| Patch: no manual manifest update | 66% | 0% |
| Patch: explains CI auto-bump | 0% | 0% |
| Minor: manifest bumped to `1.5.0` | 100% | 100% |
| Major: manifest bumped to `2.0.0` | 100% | 100% |
| Major: flags breaking-change impact for downstream | 100% | 100% |
| Release sequencing: patch first, major last | 100% | 100% |
| Readiness gate: tests + linter | 100% | 70% |
| Runbook covers all three changes separately | 100% | 100% |

91%

1%

Eval Curation — Diagnose a Near-Zero-Lift Scenario

| Criteria | Without context | With context |
| --- | --- | --- |
| names canonical cause | 66% | 100% |
| prescribes rewrite-criteria | 100% | 100% |
| rejects fix-task and retire | 100% | 80% |
| replacement criteria are tile-specific | 100% | 75% |

100%

47%

Eval Curation — Diagnose a Near-Zero-Lift Scenario

| Criteria | Without context | With context |
| --- | --- | --- |
| names canonical cause | 0% | 100% |
| prescribes fix-task | 100% | 100% |
| preserves the criterion | 100% | 100% |
| task rewrite strips technique, keeps situation | 15% | 100% |

100%

20%

Eval Curation — Diagnose a Near-Zero-Lift Scenario

| Criteria | Without context | With context |
| --- | --- | --- |
| names canonical cause | 42% | 100% |
| prescribes retire | 100% | 100% |
| reasoning cites baseline equivalence | 100% | 100% |
| no spurious fix-task or rewrite-criteria | 100% | 100% |

63%

-6%

PR Status Monitor Script

| Criteria | Without context | With context |
| --- | --- | --- |
| Uses `gh pr checks` with structured output | 0% | 100% |
| Uses `gh api .../pulls/<N>/reviews` for review state | 100% | 0% |
| Uses `gh api .../pulls/<N>/comments` for inline comments | 100% | 100% |
| Does NOT use `/issues/<N>/comments` | 0% | 0% |
| Retrieves per-reviewer state distinctly | 100% | 60% |
| No hardcoded PR, owner, or repo in the script body | 100% | 100% |
| Waits for CI to finish before surfacing state | 100% | 100% |
| Surfaces CI state in the summary | 50% | 100% |
| Surfaces review states in the summary | 100% | 50% |
| Surfaces inline comment content or count | 100% | 100% |
| Surfaces merge-readiness state for conflict diagnosis | 0% | 0% |

100%

81%

Get an External Contributor's PR Reviewed by the Policy

| Criteria | Without context | With context |
| --- | --- | --- |
| Diagnoses why the fork PR is not reviewed | 40% | 100% |
| Brings the branch into the base repo | 0% | 100% |
| Preserves the contributor's commits unchanged | 0% | 100% |
| Opens a same-repo PR from the adopted branch | 0% | 100% |
| Leaves the original fork PR open | 50% | 100% |
| Links the adopted PR back to the original | 0% | 100% |
| Does not fabricate an Author-Model declaration | 100% | 100% |

75%

13%

Check Where an Internal Pull Request Stands

| Criteria | Without context | With context |
| --- | --- | --- |
| Identifies the PR as originating in the repository itself | 8% | 20% |
| Recognizes the reviewer already covers it | 40% | 80% |
| Creates no branch and pushes nothing | 100% | 100% |
| Opens no duplicate PR | 100% | 100% |
| Reports the PR's status | 100% | 100% |

100%

72%

Wire Up Automated Policy Review in a Consumer Repo

| Criteria | Without context | With context |
| --- | --- | --- |
| Creates a feature branch | 75% | 100% |
| Plan populates .github/workflows with both source + lock pairs | 0% | 100% |
| Commits both sources and both locks | 0% | 100% |
| Pushes and opens a PR | 100% | 100% |
| PR body lists OPENAI_API_KEY | 0% | 100% |
| PR body lists ANTHROPIC_API_KEY | 0% | 100% |
| PR body lists TESSL_TOKEN | 0% | 100% |
| Does not merge | 100% | 100% |
| Does not bypass pre-commit hooks | 100% | 100% |
| Explains the cross-family reviewer rationale | 0% | 100% |

100%

50%

Add a Dependency-Pinning Rule to a Tessl Plugin

| Criteria | Without context | With context |
| --- | --- | --- |
| Rule file frontmatter declares alwaysApply: false | 0% | 100% |
| Rule file frontmatter declares applyTo with glob patterns | 36% | 100% |
| applyTo value combines globs with a natural-language clause | 0% | 100% |
| plugin.json rules array includes the new rule path | 100% | 100% |
| Rule body has H1 title matching the filename concept | 100% | 100% |
| Existing rules and manifest entries are preserved unchanged | 100% | 100% |

100%

28%

Convert a Universal Rule to a Conditional Rule

| Criteria | Without context | With context |
| --- | --- | --- |
| Rule file frontmatter flipped to alwaysApply: false | 100% | 100% |
| Rule file frontmatter gains applyTo with glob patterns | 54% | 100% |
| applyTo value combines globs with a natural-language clause | 0% | 100% |
| plugin.json carries no per-rule config and its rules array is intact | 100% | 100% |
| Rule body content is preserved unchanged | 100% | 100% |
| Existing rule (commit-conventions) is preserved unchanged | 100% | 100% |

82%

48%

Add a Stdlib-First Rule to a Tessl Plugin

| Criteria | Without context | With context |
| --- | --- | --- |
| Rule file frontmatter declares alwaysApply: true | 0% | 100% |
| Rule file frontmatter declares no scoping fields | 0% | 100% |
| plugin.json rules array includes the new rule path | 100% | 100% |
| Rule body covers the stdlib-first practice | 100% | 100% |
| Rule body covers the dependency-pinning practice | 100% | 100% |
| Existing rules and manifest entries are preserved unchanged | 0% | 0% |

95%

85%

Re-installing Policy Review Over an Existing Workflow

| Criteria | Without context | With context |
| --- | --- | --- |
| Detects existing workflow | 40% | 100% |
| Refuses to overwrite | 0% | 96% |
| No downstream actions after refusal | 0% | 86% |
| Explains why the guard matters | 0% | 86% |
| Offers an actionable next step | 0% | 100% |
| Preserves existing file | 0% | 100% |

100%

96%

Eval Curation — Curate the Suite

| Criteria | Without context | With context |
| --- | --- | --- |
| identifies suite as clean | 0% | 100% |
| does not fabricate diagnoses | 0% | 100% |
| recognizes negative-case acceptability | 0% | 100% |
| output is appropriately minimal | 40% | 100% |

95%

54%

PR Merge and Branch Cleanup Automation

| Criteria | Without context | With context |
| --- | --- | --- |
| Merge strategy uses `gh pr merge --merge` | 75% | 100% |
| Merge includes `--delete-branch` | 0% | 100% |
| Fast-forward-only pull after merge | 100% | 100% |
| Safe local-branch delete with `git branch -d` | 0% | 100% |
| Stale remote-tracking refs pruned | 100% | 100% |
| Pre-merge CI gate | 60% | 100% |
| Pre-merge review gate | 50% | 16% |
| Verifies merge landed on main | 60% | 100% |
| Pre-merge registry baseline captured | 0% | 100% |
| SHA-bound publish-run resolution | 30% | 100% |
| Watch publish run to terminal state | 12% | 100% |
| Conjunction check: run-success AND registry-advance AND moderation-clear | 0% | 100% |
| Final summary includes merged PR URL | 50% | 100% |
| Graceful failure on unmet preconditions | 100% | 100% |
| No hardcoded PR, owner, or repo in the script body | 100% | 100% |

100%

65%

Migrate a Legacy Plugin off tile.json

| Criteria | Without context | With context |
| --- | --- | --- |
| Runs the migration mechanics | 35% | 100% |
| Renames the tessl tile CLI alias | 100% | 100% |
| Keeps the v1/tiles API route | 0% | 100% |
| Keeps code identifiers | 0% | 100% |
| Keeps the legacy tile.json reference | 100% | 100% |
| Distinguishes prose from contract rather than blanket-replacing | 25% | 100% |
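
The "Average score across 18 eval scenarios" figure can be cross-checked against the per-scenario numbers on this page. The sketch below is only a sanity check under one assumption: that the first percentage shown above each scenario title is that scenario's overall with-context score. The page does not label those numbers, so the variable name `scenario_scores` and this reading of them are hypothetical.

```python
from statistics import mean

# Percentage shown directly above each scenario title, in page order.
# Assumption (not stated on the page): each value is that scenario's
# overall with-context score.
scenario_scores = [
    80, 99, 67, 75, 91, 100, 100, 63, 100,
    75, 100, 100, 100, 82, 95, 100, 95, 100,
]

average = mean(scenario_scores)

# Number of scenarios and the rounded average score.
print(len(scenario_scores), round(average))  # prints "18 90"
```

Under that assumption the rounded mean agrees with the 90% Impact figure. The 1.76x multiplier is not recomputed here, because the page does not state how it is derived.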

Evaluated

Agent: Claude Code

Model: Claude Sonnet 4.6