Our AI is the bright kid with no manners, part 1

26 Mar 2026 · 13 minute read


Baruch Sadogursky

Baruch Sadogursky is a Developer Advocate who helps developers move from vibecoding to spec-driven development, with deep experience from JFrog and now at Tessl.

I taught it how to contribute to open source

TL;DR

  • My AI agent scored 15% on OSS contributions while writing flawless code. Turns out "works correctly" and "welcome in open source" are completely different things.
  • Maintainers are banning AI because it has zero social skills. The data proves it.
  • I built a Tessl tile with skills, rules, and scripts that taught the agent manners. 15% became 99%. Same code, whole different relationship with the maintainer.

I didn't want to be that guy

I'd been sitting on a feature for developers-conferences-agenda, Aurelie Vache's fantastic open source project that tracks conference CFPs. If you speak at conferences, you probably live in this tool — it's the most comprehensive crowd-sourced index of developer conferences and their submission deadlines. I'd been contributing conference entries for a while, but I wanted to do more good for the project. The filtering was basic, and I had ideas for multi-select filters with exclude toggles and any/all logic. I built it using Claude Code with intent-integrity-kit steering the development process, so I was reasonably confident the code was solid.

Then I sat on it for weeks.

I knew what happens when you drop an AI-generated PR on a maintainer in 2026. You become the person they vent about on Mastodon, the one who made them consider disabling pull requests entirely. I finally worked up the nerve to ask Aurelie first. She said, "As long as it's not AI slop."

Fair enough. I submitted PR #2439. Sixty-eight unit tests, twenty Playwright E2E specs, URL parameter persistence, cascading region-to-country filters. I was proud of it.

She tested it and replied, "Clicking on any and all buttons doesn't do anything."

My stomach dropped.

This is it. I'm that guy. I shipped broken AI garbage to a maintainer who specifically asked me not to. This is why people hate AI PRs, and I just became a data point in the argument.

Except the code wasn't broken. The any/all toggle worked correctly — "any" meant OR, "all" meant AND. Select English and French with "any" and you get events in either language. Select them with "all" and you get events tagged with BOTH languages simultaneously, which is almost none. The result was an empty page that looked like the filter was broken. It wasn't. The data just didn't have what you asked for.

I fixed the UX, Aurelie said "Perfect :)", and the PR got merged. But that stomach-drop moment stuck with me. The fear I felt had nothing to do with code quality, and I wanted to understand what it actually WAS about.

Maintainers are right to be angry


The headlines tell the story. The Register alone ran five articles tracking Daniel Stenberg's escalating war with AI-generated cURL bug reports: from "AI-assisted bug reports make developers bear cost of cleanup" to "Curl shutters bug bounty program to stop AI slop". Godot maintainers called the influx of AI PRs "demoralizing". tldraw's Steve Ruiz titled his blog post "Stay away from my trash!" and started auto-closing external PRs. RedMonk called it "AI Slopageddon". Zig, Gentoo, Asahi Linux, QEMU, Servo — the ban list keeps growing.

GitHub responded by "pondering a kill switch for pull requests" — repository settings that let maintainers disable PRs entirely or restrict them to collaborators. Think about that for a second. The platform that built its entire identity on "fork and submit a PR" now ships features to make PRs harder.

The rage is real and justified. But the diagnosis is wrong. The problem was never the code.

The wrong metric

A 2025 study tracked 567 Claude Code PRs across 157 open source projects. 83.8% were eventually merged.

The code clearly works. But the merge rate is the wrong metric. What it doesn't measure is the cost to maintainers: 45.1% of those merged PRs still needed human revisions. And the ones that DON'T merge are worse — a PR that takes thirty seconds to generate can take hours to review and reject. Every time a maintainer clicks on a PR in their review queue, reads through it, realizes it ignored the project's CONTRIBUTING.md, and types out a polite rejection, that's time stolen from actual project work. Multiply that across hundreds of PRs a week and you understand why maintainers are reaching for the ban hammer.

What actually goes wrong

I studied 16 specific failure modes across dozens of projects, reading their documentation, analyzing maintainer feedback, and tracking policy variations. The pattern was consistent.

CONTRIBUTING.md goes unread. Nobody told the agent that AI policies even exist, so it never checks whether AI is banned, requires disclosure, or has no policy at all. Closed PRs full of rejection feedback sit there unexamined — the agent cheerfully repeats mistakes that other contributors already made and got told off for.

And the sins of commission are worse than the sins of omission. DCO sign-off requirements are legal attestations only a human can make, and the agent either ignores them or forges them (which projects like Jellyfin treat as a permanent ban offense). Changelogs don't get updated. AI usage doesn't get disclosed even when the project requires it. An issue with a comment saying "I'd like to work on this" gets a competing PR from an agent that never read the comments (because nobody told it comments exist). And unsolicited refactoring PRs land on projects that have explicitly rejected the same approach three times already.

The agent writes correct code in the wrong social context. Like showing up to a potluck with a beautiful dish, then parking in the host's flower bed, ignoring the "shoes off" sign, and eating before grace.

The food you brought is great. You're still not getting invited back.

Nobody taught the agent how to behave.

Same agent, same task, 15% vs 99%

I built a Tessl tile (skills, rules, and scripts bundled together) that teaches the agent behavioral context. I tested it across seven scenarios on four GitHub repos modeled after real projects with different AI policy stances: disclosure required, AI banned, good-first-issue restrictions, no AI policy at all.

Without the tile, the agent averaged 15% on process behavior. But that average hides something interesting: one repo had AI_POLICY.md and AGENTS.md in standard locations with standard names, and the baseline scored 85% there.


But when the AI ban was buried in CODE_OF_CONDUCT.md, the baseline scored 0%. The agent isn't dumb — it finds obvious things. It just doesn't go hunting for signals nobody pointed to. That's the gap the tile fills.


The agent fixed every bug and wrote working code every time, yet it averaged 15% because it IGNORED the non-obvious stuff: the hidden policies, the prior rejections, the claimed issues, the disclosure requirements.

With the tile, 99%. Same agent, same code. All I added was context about how open source contribution actually works (turns out there's more to it than passing tests).

Every failure mode from the previous section — the ignored AI ban, the repeated unsolicited refactoring, the sniped claimed issue — scored 0% baseline and 100% with the tile. The only variable was whether anyone told the agent to look.

The missing layer

Agentic AI context artifacts (the skills, rules, and scripts you bundle into a Tessl tile) steer the model to behave, and it turns out that's the part that was missing. They're the difference between a brilliant intern who doesn't know your team's customs and that same intern after someone has walked them through how things work around here.

I ended up thinking about the tile's architecture in religious terms (stay with me). It has three layers, and each maps to a different kind of guidance.

The scripture is the skills — detailed workflow guidance the agent consults when it needs it. Read CONTRIBUTING.md. Check the AI policy. Look at closed PRs. These are verbose, on-demand, and lazy-loaded based on what the agent is doing. The agent might read them, or it might not. They're the "how to think about this" layer.

The commandments are the always-on rules: never forge a DCO sign-off, never submit without human review, respect AI bans, never write code for a claimed issue. Short, visible, non-skippable. If you've been to a synagogue, you've seen the tablets above the ark. You don't consult them when you need guidance. They're just there, always, reminding you what's non-negotiable. Fourteen of them, about 2.8k tokens, loaded every time. In Mel Brooks' History of the World Part I, Moses comes down with three tablets and fifteen commandments, drops one, and pivots: "Oi! Ten! Ten commandments!" I dropped less than Moses! Yay!


The rituals are 19 script commands — the exact steps the agent performs to actually follow the commandments and the scripture. Scan the repo for policy files. Fetch all issue comments. Extract commit conventions. Find prior rejected PRs. Same every time, no interpretation, no improvisation. The script fetches structured data, the model interprets it and decides what to do. You don't want an LLM deciding whether to run gh issue view --comments based on vibes.
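The split — the script fetches, the model interprets — can be sketched like this. This is a hypothetical illustration, not the tile's actual code: the policy-file list and both function names are my assumptions, and the only external call is the `gh issue view --comments` command mentioned above.

```python
import subprocess
from pathlib import Path

# Illustrative sketch of one "ritual": deterministic data gathering with
# no room for improvisation. The model decides what the results MEAN;
# the script decides nothing. File names below are assumptions.
POLICY_FILES = [
    "CONTRIBUTING.md",
    "AI_POLICY.md",
    "AGENTS.md",
    "CODE_OF_CONDUCT.md",
    ".github/CONTRIBUTING.md",
]

def find_policy_files(repo_root: str) -> list[str]:
    """Return every known policy file present in the checkout."""
    root = Path(repo_root)
    return [name for name in POLICY_FILES if (root / name).is_file()]

def fetch_issue_comments(issue_number: int) -> str:
    """Fetch all comments on an issue via the GitHub CLI, so a prior
    'I'd like to work on this' claim can't be missed."""
    result = subprocess.run(
        ["gh", "issue", "view", str(issue_number), "--comments"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The point of the sketch is the boundary: everything above runs the same way every time, and only the returned text ever reaches the model.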

Scripture, commandments, and rituals — that's the whole tile, and that's what was missing.

Nobody taught AI how to behave in open source.

Until today.

Try it

tessl install tessl-labs/good-oss-citizen

The tile is on the Tessl registry, and all its source code is on GitHub. Install it on your fork of whatever OSS project you're contributing to. The tile loads only in that project's context — switch to your own codebase, and it isn't there. About 2.8k tokens of always-on rules (0.28% of a 1M context window) and 5.3k of on-demand skills that activate when you start working on a contribution. The maintainer reviewing your PR will notice the difference even if your token budget won't.

If you're a maintainer: you don't have to ban AI. You can set up your project so every AI contributor automatically gets the right behavior — before they write a single line of code. Add a tessl.json to your repo root:

{
  "name": "your-project",
  "dependencies": {
    "tessl-labs/good-oss-citizen": {
      "version": "1.0.1"
    }
  }
}

That's it. When a contributor forks your project and their agent starts working, it reads tessl.json and installs the tile automatically. The agent already knows how to behave before it touches your codebase. No instructions to follow, no hoping they read your CONTRIBUTING.md, no trust required. The social context is baked into the repo itself.

You should also mention it in your README, CONTRIBUTING.md, or AI_POLICY.md — "This project uses Tessl to ensure considerate AI contributions. Run tessl install to set up the required tiles before starting work." Whether that's read by a human or an AI agent, the config is already there waiting, so there's no guesswork about which tiles to install or where to find them. You get high-quality contributions from AI that can clearly write code that benefits your project, and it stops wasting your time on process failures.

In Part 2, I'll show how I actually taught it, including the part where my own evals lied to me.