General-purpose coding policy for Baruch's AI agents
90
91%
Does it follow best practices?
Impact
90%
1.76x
Average score across 18 eval scenarios
Advisory
Suggest reviewing before use
Lift, Not Attainment). Proving plugin value is the goal, not coverage
Lift, Not Attainment curation, not forced truncation
tessl scenario generate skews toward happy-path)
fix/*" is a task with the answer smuggled in
.tessl/plugins/... paths, plugin-only identifiers
gh pr create, REST endpoints, conventional-commits format, semver
Fixed in <sha>, chosen flags like --ff-only, invented format literals. Checking for them measures application, not leaking
gh pr merge is public; createJwtToken internal action is plugin-internal
with-context score minus baseline score, where with-context is with the plugin loaded and baseline is without. Near-zero lift on a positive case has three causes:
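The lift definition above (with-context score minus baseline score) can be sketched as follows. This is a minimal illustration, not the plugin's scorer: the 0-100 score scale, the sample scores, and the `threshold` parameter used to flag near-zero lift are all assumptions made for the example.

```python
def lift(with_context: int, baseline: int) -> int:
    """Lift = score with the plugin loaded minus score without it."""
    return with_context - baseline


def near_zero_lift(scores: dict[str, tuple[int, int]], threshold: int = 5) -> list[str]:
    """Return scenario names whose absolute lift is at or below `threshold`.

    `scores` maps scenario name -> (with_context, baseline).
    `threshold` is a hypothetical cutoff, not a value taken from the policy.
    """
    return [
        name
        for name, (with_context, baseline) in scores.items()
        if abs(lift(with_context, baseline)) <= threshold
    ]


# Illustrative scores only; the names reuse examples from this document.
sample_scores = {
    "install-reviewer-refuses-overwrite": (90, 50),
    "eval-curation-task-leak-fix": (88, 86),
}

flagged = near_zero_lift(sample_scores)
```

With these sample values, only the second scenario is flagged, since its two scores differ by 2 points.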
skills/eval-curation/SKILL.md) every few publishes
tessl plugin publish (or tesslio/patch-version-publish) executes; that is the persistence point
tessl eval run step to plugin-repo CI; do not add a scheduled or recurring workflow that re-runs the eval suite
jbaruch/coding-policy-evals)
Fixture Hygiene
my-scenario, not MyScenario, my_scenario, or my scenario)
<skill>-<descriptor> (e.g., install-reviewer-refuses-overwrite, eval-curation-task-leak-fix)
pr-merge-and-post-merge-cleanup)
refuses-overwrite ✓, checks-existing-file-via-stat ✗
tessl eval view, tessl scenario generate) silently truncating longer names
tessl eval run that exceed it are grandfathered by the rename-stability clause (which wins over the cap)
tessl eval run, the name is stable; do not rename. tessl eval view identifies scenarios by directory name; renaming resets the lift history the Persistence section relies on
<rule>-<fixture_type>-<cell>-run-<n>) may diverge from kebab-case-with-descriptor when documented in the plugin's evals/instructions.json
rule/fixture_type/cell/run needed by the rig's scorer for lift bucketing)
evals/instructions.json declares the rig's actual safe length AND names every tessl-eval tool the rig touches end-to-end
tessl eval run <path>, scoring via custom scorer
### Rules (or equivalent) entry citing this clause
tessl eval view identifies scenarios by directory name irrespective of how the name was generated
fixture-2025-04-17.json)
.tessl-plugin
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
rules
skills
adopt-fork-pr
eval-curation
install-reviewer
migrate-to-plugin
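The scenario naming rules above (kebab-case, `<skill>-<descriptor>`) could be checked with something like the sketch below. It is an assumption-laden illustration, not part of the plugin: the function names are hypothetical, the skill list is taken from the directory listing above, and the length cap is left as an optional parameter because its actual value is not stated here.

```python
import re

# Lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# Skill names as they appear in the skills directory listing.
SKILLS = ("adopt-fork-pr", "eval-curation", "install-reviewer", "migrate-to-plugin")


def check_scenario_name(name, skills=SKILLS, max_length=None):
    """Return a list of problems with `name`; an empty list means it passes.

    `max_length` is a hypothetical knob: pass the cap your tooling enforces.
    """
    problems = []
    if not KEBAB_CASE.fullmatch(name):
        problems.append("not kebab-case")
    if not any(name.startswith(skill + "-") for skill in skills):
        problems.append("missing <skill>- prefix")
    if max_length is not None and len(name) > max_length:
        problems.append("longer than max_length")
    return problems
```

For example, `check_scenario_name("install-reviewer-refuses-overwrite")` returns an empty list, while `check_scenario_name("MyScenario")` reports both a casing and a prefix problem. Names that have already been through `tessl eval run` are meant to stay stable, so a check like this fits best before a scenario's first run.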