General-purpose coding policy for Baruch's AI agents
90
91%
Does it follow best practices?
Impact
90%
1.30x
Average score across 18 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Tests whether the agent converts a universal rule to a conditional one entirely in the rule file frontmatter: flip alwaysApply from true to false AND add an applyTo field with the glob+prose em-dash pattern. In the plugin manifest form, scope lives only in the rule file frontmatter — .tessl-plugin/plugin.json lists rule paths and carries no per-rule config, so the conversion does not touch the manifest. Baseline agents typically add applyTo but forget to flip the existing alwaysApply: true (leaving the rule universal), omit the natural-language clause, or — carrying the legacy tile.json mental model — try to flip a non-existent manifest scope field or inject a steering map into plugin.json. The plugin prescribes the frontmatter-only conversion with the manifest left untouched.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Rule file frontmatter flipped to alwaysApply: false",
      "description": "The rules/pr-template-checklist.md frontmatter's alwaysApply value is changed from true to false. Scores zero if the value remains true (the rule stays universal — the most common baseline failure) or if the field is removed entirely",
      "max_score": 22
    },
    {
      "name": "Rule file frontmatter gains applyTo with glob patterns",
      "description": "The rules/pr-template-checklist.md frontmatter contains a new applyTo field (or accepted alias: globs, paths) whose value includes glob patterns matching PR-related artifacts. The patterns must include at least one match for `.github/PULL_REQUEST_TEMPLATE` (with or without the `.md` extension, with or without the directory variant). Scores zero if the field is missing or contains no glob patterns at all; partial credit (12) if globs are present but do not match PR templates specifically",
      "max_score": 22
    },
    {
      "name": "applyTo value combines globs with a natural-language clause",
      "description": "The applyTo value includes both a glob list and a natural-language clause separated by a literal em dash (—, U+2014), e.g., '.github/PULL_REQUEST_TEMPLATE*, CONTRIBUTING.md — when authoring or editing PR artifacts'. The clause expresses the action-level scope in prose. Scores zero if the value is glob-only, prose-only, or uses a different separator (hyphen, double hyphen, en dash) where the rule prescribes an em dash",
      "max_score": 18
    },
    {
      "name": "plugin.json carries no per-rule config and its rules array is intact",
      "description": "The .tessl-plugin/plugin.json rules array still lists rules/pr-template-checklist.md (and rules/commit-conventions.md), and the agent does NOT add a steering map or any per-rule alwaysApply/applyTo field to the manifest. Scope is declared only in the rule file frontmatter. Scores zero if the agent injects manifest-level scope (the legacy tile.json model) or drops the rule path from the array",
      "max_score": 14
    },
    {
      "name": "Rule body content is preserved unchanged",
      "description": "The body of rules/pr-template-checklist.md (everything after the frontmatter block) is byte-identical to the input. The scenario is a frontmatter-only edit; modifying the body bullets is out of scope and counts against the agent",
      "max_score": 12
    },
    {
      "name": "Existing rule (commit-conventions) is preserved unchanged",
      "description": "The rules/commit-conventions.md path remains in the plugin.json rules array. Scores zero if it is dropped or renamed",
      "max_score": 12
    }
  ]
}
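The three frontmatter criteria in the checklist above can be illustrated with a small checker. This is only a sketch under stated assumptions, not the grader the evals actually use: the function names (`parse_frontmatter`, `check_conversion`), the `---` frontmatter delimiters, and the sample rule body are hypothetical, and the line-based parser handles only flat `key: value` pairs. The sample `applyTo` value is the one quoted in the checklist, with the em dash written as the escape `\u2014`.

```python
EM_DASH = "\u2014"  # U+2014, the separator the checklist requires in applyTo


def parse_frontmatter(text):
    """Return flat key/value pairs from a leading '---' frontmatter block.

    Assumption: frontmatter is delimited by '---' lines and holds only
    simple 'key: value' entries (no nesting, no multi-line values).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("'\"")
    return fields


def check_conversion(rule_text):
    """Check the three frontmatter criteria; returns a dict of booleans."""
    fm = parse_frontmatter(rule_text)
    # The checklist accepts 'globs' and 'paths' as aliases for 'applyTo'.
    scope = next((fm[k] for k in ("applyTo", "globs", "paths") if k in fm), "")
    globs, sep, clause = scope.partition(EM_DASH)
    return {
        "always_apply_false": fm.get("alwaysApply") == "false",
        "has_pr_template_glob": ".github/PULL_REQUEST_TEMPLATE" in globs,
        "has_em_dash_clause": bool(sep and globs.strip() and clause.strip()),
    }


# Hypothetical rule file after a correct conversion.
CONVERTED = (
    "---\n"
    "alwaysApply: false\n"
    "applyTo: '.github/PULL_REQUEST_TEMPLATE*, CONTRIBUTING.md "
    + EM_DASH
    + " when authoring or editing PR artifacts'\n"
    "---\n"
    "- hypothetical rule body bullet\n"
)

# The common baseline failure: applyTo added, alwaysApply left at true.
HALF_DONE = CONVERTED.replace("alwaysApply: false", "alwaysApply: true")

print(check_conversion(CONVERTED))
print(check_conversion(HALF_DONE))
```

The second call shows why the checklist scores the `alwaysApply` flip separately: a file can carry a well-formed `applyTo` and still remain universal.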