GitHub to use Copilot interaction data for training by default

30 Mar 2026 · 6 minute read

Paul Sawers

Freelance tech writer at Tessl, former TechCrunch senior writer covering startups and open source

GitHub has revealed that it’s changing how it handles data from its Copilot coding assistant, saying it will begin using user interaction data to train its AI models unless individuals opt out.

The update applies to Copilot Free, Pro, and Pro+ users, while Copilot Business and Enterprise customers are excluded from the policy change.

“We believe the future of AI-assisted development depends on real-world interaction data from developers like you,” Mario Rodriguez, chief product officer at GitHub, said in a blog post.

The change covers what GitHub describes as “interaction data” — specifically inputs, outputs, code snippets, and associated context. GitHub says this material will be used to improve model performance and better reflect how developers actually use the tool.

“This approach aligns with established industry practices and will improve model performance for all users,” Rodriguez continued. “By participating, you’ll help our models better understand development workflows, deliver more accurate and secure code pattern suggestions, and improve their ability to help you catch potential bugs before they reach production.”

GitHub Copilot data policy: What’s changing

The updated policy, due to take effect on April 24, draws a distinction between different types of data. GitHub says it will not use code from private repositories “at rest” for training. But when developers actively use Copilot, the tool processes code from those repositories in real time — and that interaction data can be used unless the user opts out.

Rodriguez addresses that distinction directly, pointing to how Copilot handles code during active use.

“We use the phrase ‘at rest’ deliberately because Copilot does process code from private repositories when you are actively using Copilot,” Rodriguez notes. “This interaction data is required to run the service and could be used for model training unless you opt out.”

That clarification makes the boundary more explicit, though it leaves most day-to-day usage within scope. Stored private code is excluded, but anything surfaced through Copilot during active use can be used unless users change their settings.

Opt-in vs opt-out

The decision revisits concerns that have followed Copilot since its launch, particularly around how the tool is trained and what developer data is used. Critics argued that Copilot was trained on publicly available code without always respecting licensing terms or attribution. In 2022, a group of developers filed a class-action lawsuit against GitHub, Microsoft, and OpenAI, alleging the tool reproduced code in ways that breached open-source licences and copyright law.

That recent history is also shaping how developers are reacting to this latest update, with discussion centred on two key themes: defaults and boundaries.

On Hacker News, several commenters took issue with the opt-out design, arguing that settings involving code and training data should require an explicit decision rather than relying on users to find and change a preference. Some said the default-on approach reflects an assumption that few users would choose to opt in voluntarily. “Who in their right mind will opt into sharing their code for training? Absolutely nobody. This is just a dark pattern,” one user wrote.

Legal concerns also surfaced, particularly in Europe. “What is the legal basis of this in the EU?” one commenter asked, adding that any consent would need to be “freely given, specific, informed and unambiguous” under the EU’s General Data Protection Regulation (GDPR).

The update also fed into longer-running unease around open source, with some questioning whether publishing code publicly still makes sense in an era of AI training.

“Thanks to Github and the AI apocalypse, all my software is now stored on a private git repository on my server,” one user wrote. “Why would I even spend time choosing a copyleft license if any bot will use my code as training data to be used in commercial applications? I'm not planning on creating any more open-source code, and what projects of mine still have users will be left on GitHub for posterity.”

The same user pointed to alternatives outside GitHub’s ecosystem. “If you're still serious about opensource, time to move to Codeberg.”

For GitHub and, by extension, Microsoft, the policy change sets out a clearer position on how Copilot is trained. For developers, it’s likely to fuel an ongoing debate about how much control they retain over code once it’s fed into AI systems.