
Agents Explained: Beginner To Pro

Maksim Shaposhnikov
AI Research Engineer, Tessl

AI Agents Beyond Context Limits

with Maksim Shaposhnikov

Transcript

Chapters

Trailer
[00:00:00]
Introduction
[00:00:53]
Deep Dive into Agents 101
[00:01:18]
Understanding AI Entities: Bots, Copilots, Assistants, and Agents
[00:03:56]
Understanding Context Limitations in LLMs
[00:28:17]
Capabilities and Limitations of AI Agents
[00:32:22]
Security, Memory Management, and Future of AI Agents
[00:36:02]

In this episode

In this episode of AI Native Dev, host Simon Maple and Tessl research engineer Maksim Shaposhnikov explore the evolving landscape of software "agents" and how they differ from bots, copilots, and assistants. They delve into practical strategies for building reliable agentic code generation, offering insights on using the right tools for specific tasks, balancing speed and capability, and designing environments that foster safe, autonomous execution.

In this Agents 101 episode of AI Native Dev, host Simon Maple sits down with Maksim Shaposhnikov, a research engineer at Tessl, to demystify what “agents” really are, how they differ from assistants, copilots, and bots, and how developers can use them effectively. Max brings a background in large-scale LLM pretraining and now focuses on making agentic code generation reliable—so developers can trust the output without sinking hours into manual verification. The discussion delivers a practical taxonomy, clear usage patterns (IDE vs terminal), and hard-won guidance on reliability, latency, and long-horizon task execution.

A Practical Taxonomy: Bots, Copilots, Assistants, and Agents

Max frames the landscape with four clear categories. Bots are the oldest and simplest: they automate narrow, scripted tasks using predetermined dialog flows or trees. If machine learning is involved, it’s usually lightweight—think basic classification or named entity recognition. Critically, bots don’t have a meaningful “environment” to act in; they don’t explore or make decisions beyond the scripted path, which is why they’re often underwhelming for complex workflows.

Copilots are best understood as ultra-fast autocomplete. They operate on local context (e.g., the current file, nearby functions, and sometimes future tokens in the buffer), producing small, targeted completions. Historically, speed was everything—developers tolerated only split-second latency. As capabilities have grown, developers are now more willing to wait a bit longer if the copilot can synthesize something more complex, like building out a method or filling in a rich docstring-guided class skeleton.

Assistants bring multi-turn chat into the mix and can span a wide range of tasks—coding, research, planning—but the user is firmly in the loop. They respond to instructions, await clarifications, and focus on the “now” task rather than long-horizon execution. Agents, by contrast, are designed to act with autonomy. They can plan and execute multi-step, long-horizon tasks, sometimes without a human in the loop. Practically, many “agents” are still used like assistants today (e.g., Claude Code inside an IDE) because developers want to validate outputs and intervene rapidly. The distinction is less about raw capability and more about how you let the system operate.

Building Agents You Can Trust: Reliability and Robust Codegen

Max’s current work at Tessl centers on reliability: making sure agent-generated code is robust so developers don’t spend hours validating output. Reliability is especially hard in agentic settings because the system must coordinate multiple steps, tools, and files. If you’re building or integrating an agent, treat verification as a first-class concern rather than an afterthought.

In practice, reliability comes from structured execution and feedback loops. Keep tasks bounded but multi-step: give the agent specific milestones (e.g., “create module + unit tests + run linter + run tests”) and ensure it can observe results. For codegen, always pair generation with automatic checks—formatters, linters, type-checkers, and test runs. Treat the file system and shell as your agent’s environment for verification, not just for editing. Require diff previews or commit gates where the agent proposes a change, runs checks, and you approve before commit. The objective is to minimize blind trust while preserving flow.
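
To make that loop concrete, here is a minimal Python sketch of a check-then-gate flow: run the project's checks, show the diff, and only commit on explicit approval. The specific tools (ruff, mypy, pytest) and helper names are illustrative placeholders, not anything prescribed in the episode.

```python
import subprocess

# Hypothetical checks an agent runs after proposing a change; the exact
# tools (ruff, mypy, pytest) stand in for whatever your project uses.
CHECKS = [
    ["ruff", "check", "."],  # linting
    ["mypy", "."],           # type-checking
    ["pytest", "-q"],        # tests
]

def run_checks() -> bool:
    """Run each check; only propose a commit if all of them pass."""
    for cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True

def propose_commit(message: str) -> None:
    """Show the diff and ask for explicit approval before committing (the commit gate)."""
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    print(diff)
    if input(f"Commit '{message}'? [y/N] ").lower() == "y":
        subprocess.run(["git", "add", "-A"])
        subprocess.run(["git", "commit", "-m", message])

if __name__ == "__main__":
    if run_checks():
        propose_commit("agent: proposed change")
```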

Observability also matters. Even when using an agent “as an assistant,” capture logs, diffs, and command outputs so you can diagnose where things went wrong. Agents can overwhelm you with output; favor structured summaries and artifacts (e.g., a test report and a change summary) so human validation stays fast. Over time, this discipline lets you expand autonomy from “assistive” to “agentic” with confidence.
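
One lightweight way to keep those artifacts is sketched below, assuming a local agent-artifacts directory: every command the agent runs is written to a timestamped log plus a compact JSON summary that a human can scan quickly. The directory name and file layout are assumptions for illustration.

```python
import subprocess, json, datetime, pathlib

ARTIFACT_DIR = pathlib.Path("agent-artifacts")  # hypothetical location for run artifacts

def run_and_record(cmd: list[str]) -> dict:
    """Run a command, store its full output to a log file, and return a compact summary."""
    ARTIFACT_DIR.mkdir(exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    result = subprocess.run(cmd, capture_output=True, text=True)
    log_path = ARTIFACT_DIR / f"{stamp}-{cmd[0]}.log"
    log_path.write_text(result.stdout + result.stderr)
    summary = {"cmd": " ".join(cmd), "exit_code": result.returncode, "log": str(log_path)}
    (ARTIFACT_DIR / f"{stamp}-summary.json").write_text(json.dumps(summary, indent=2))
    return summary

if __name__ == "__main__":
    print(run_and_record(["pytest", "-q"]))
```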

IDE or Terminal? Picking the Right Control Surface for Agentic Dev

The choice between IDE-based agents (e.g., Cursor, Windsurf, Claude Code inside your editor) and terminal-driven agents (e.g., Codex, Gemini CLI, Claude Code via CLI) comes down to interactivity and control. IDEs excel at onboarding and visibility: you see file diffs, inline annotations, and contextual suggestions. Buttons and panels translate into lower cognitive overhead. If you’re still keeping a human in the loop, an IDE makes it trivial to validate, tweak, and keep track of multi-file changes.

Terminals favor power users. You don’t need UI chrome when you can combine commands, pipe outputs, and script everything. The big advantage is that you can run agents in non-interactive mode and spawn background processes that tackle long-horizon tasks while you do other work. The trade-off is that it’s easier to lose the plot among logs and output streams. If you choose the terminal route, treat readability as part of the system design: write outputs to files, use consistent prefixes/timestamps, and provide concise summaries so you can grep or tail your way into the signal quickly.
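
As a rough sketch of that discipline, the wrapper below runs a long-lived command non-interactively and prefixes every output line with a timestamp, so the resulting log stays grep- and tail-friendly. The log file name and the example invocation are assumptions, not a specific agent CLI.

```python
import subprocess, datetime, sys

def run_logged(cmd: list[str], log_file: str) -> int:
    """Run a command to completion, appending timestamped output lines to log_file."""
    with open(log_file, "a") as log:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        for line in proc.stdout:
            log.write(f"[{datetime.datetime.now().isoformat(timespec='seconds')}] {line}")
        return proc.wait()

if __name__ == "__main__":
    # Illustrative usage: python run_logged.py <some-agent-cli> --task "add tests"
    sys.exit(run_logged(sys.argv[1:], "agent-run.log"))
```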

A practical pattern is to start interactively in an IDE to validate an agent’s approach on a small task, then graduate that same workflow to a CLI command for background execution once it proves itself. This preserves tight feedback during design and gives you scale and speed once you trust the flow.

Speed vs Capability: Designing for Latency Across Copilots and Assistants

Early copilots lived or died by latency because they were limited to single-line or small-block completions. Today’s tools can handle more complexity, and developers will tolerate a few extra seconds if the output quality jumps—especially when you provide detailed intent via docstrings or comments. The key is to align latency with task scope: use an instant “fast path” for inline completions and a deliberate “slow path” for heavier requests like scaffolding a class, authoring tests, or refactoring across files.
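
A simple way to picture the fast/slow split is a small router that inspects the request scope; the markers, flags, and path names below are illustrative assumptions rather than how any particular copilot decides.

```python
from dataclasses import dataclass

@dataclass
class CompletionRequest:
    prompt: str
    multi_file: bool = False  # set when the request spans more than the current file

def route(request: CompletionRequest) -> str:
    """Return which path should serve the request (hypothetical heuristic)."""
    heavy_markers = ("scaffold", "refactor", "write tests", "docstring")
    if request.multi_file or any(m in request.prompt.lower() for m in heavy_markers):
        return "slow-path"  # larger model or agent mode; seconds of latency acceptable
    return "fast-path"      # small, low-latency model for inline completions

if __name__ == "__main__":
    print(route(CompletionRequest("complete this for loop")))                 # fast-path
    print(route(CompletionRequest("scaffold a class from this docstring")))   # slow-path
```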

Assistants and agents both benefit from explicit task scoping. Keep assistant requests focused and immediately verifiable to maintain conversational flow. When you need long-horizon behavior—such as installing dependencies, creating files, and running tests—hand that to an agent mode that can plan, execute, and self-check. Make the transition explicit in your UI or CLI so users know when to expect near-instant completions versus multi-step execution with longer runtimes.

Where possible, set expectations in the interface. Show “thinking” or “executing” phases with summaries of planned steps. For terminal users, emit a plan header (steps, tools to use), stream concise progress, and conclude with a result summary and next actions. These small touches keep trust high even when tasks take longer.
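
The sketch below shows one possible shape for that terminal output: a plan header, timestamped progress lines, and a closing summary with next actions. The step names and formatting are purely illustrative.

```python
import datetime

def emit_plan(steps: list[str]) -> None:
    """Print the plan header before execution starts."""
    print("PLAN")
    for i, step in enumerate(steps, 1):
        print(f"  {i}. {step}")

def emit_progress(step: str, status: str) -> None:
    """Stream one concise, timestamped progress line per step."""
    print(f"[{datetime.datetime.now():%H:%M:%S}] {status.upper():8} {step}")

def emit_summary(result: str, next_actions: list[str]) -> None:
    """Conclude with a result summary and suggested next actions."""
    print(f"\nRESULT: {result}")
    print("NEXT:")
    for action in next_actions:
        print(f"  - {action}")

if __name__ == "__main__":
    steps = ["create module", "write unit tests", "run linter", "run tests"]
    emit_plan(steps)
    for s in steps:
        emit_progress(s, "done")
    emit_summary("4/4 steps completed", ["review diff", "approve commit"])
```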

Environments and Autonomy: What Elevates an Agent

“Environment” is the pivotal concept that separates bots from agents. Bots follow scripts; agents operate inside an environment where they can observe, act, and evaluate. For developer workflows, the environment often includes the file system, shell, package manager, test runner, VCS, and sometimes internet-accessible APIs. Giving the agent these tools is what enables long-horizon, multi-step work.
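
A minimal sketch of what "environment" can mean in code: a small registry of tools the agent is allowed to call, each a thin wrapper over the file system, test runner, or VCS. The tool names and wrappers here are assumptions for illustration, not any specific framework's API.

```python
from typing import Callable
import subprocess

def read_file(path: str) -> str:
    """File-system access: let the agent observe code before acting."""
    return open(path).read()

def run_tests() -> str:
    """Test runner: the agent's feedback loop for verifying its own changes."""
    return subprocess.run(["pytest", "-q"], capture_output=True, text=True).stdout

def git_status() -> str:
    """VCS visibility: what has changed so far in the working tree."""
    return subprocess.run(["git", "status", "--short"], capture_output=True, text=True).stdout

# The "environment" as the agent sees it: a named set of callable tools.
TOOLS: dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
    "git_status": git_status,
}
```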

But autonomy without boundaries invites risk. Start with guardrails: read-only exploration before writes; diff previews for changes; approval checkpoints before running destructive commands; and controlled network access if applicable. Implement graduated autonomy—begin as an assistant, then allow the agent to execute a small subset of actions automatically, and expand from there. Maintain the option to pause, inspect, and roll back (e.g., via git). This balances velocity with safety.
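
Here is an illustrative sketch of graduated autonomy: each proposed action is checked against the current autonomy level, and anything outside it falls back to a human approval prompt. The action names and levels are hypothetical.

```python
# Hypothetical action sets per autonomy level; expand as trust grows.
READ_ONLY = {"read_file", "git_status", "run_tests"}
WRITE = READ_ONLY | {"edit_file"}
AUTONOMY_LEVELS = {"assistant": READ_ONLY, "junior-agent": WRITE}

def approve(action: str, level: str) -> bool:
    """Auto-approve actions within the current autonomy level; ask the human otherwise."""
    if action in AUTONOMY_LEVELS.get(level, set()):
        return True
    return input(f"Agent wants to run '{action}'. Allow? [y/N] ").lower() == "y"

if __name__ == "__main__":
    for action in ["read_file", "edit_file", "delete_branch"]:
        print(action, "->", "allowed" if approve(action, "assistant") else "blocked")
```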

Finally, match the interaction model to the job. Use assistants for quick, high-precision tasks where the human decides next steps. Switch to agents for proactive, background execution that can plan, act, and verify without constant supervision. With thoughtful environment design, clear checkpoints, and robust verification, you can let agents do more while staying confident in the results.

Key Takeaways

  • Use the right tool class: bots for scripted flows, copilots for instant local completions, assistants for chat-based, user-driven tasks, and agents for autonomous, long-horizon execution.
  • Design for reliability: pair code generation with automatic checks (formatters, linters, type-checkers, tests), require diff previews, and keep observability artifacts (logs, reports) for easy validation.
  • Choose your interface deliberately: IDEs maximize visibility and onboarding; terminals maximize power and background execution. Start interactively, then graduate stable flows to CLI automation.
  • Align latency with task scope: instant completions for inline edits; accept seconds for richer outputs like docstring-driven classes; use explicit agent modes for multi-step execution.
  • Treat environment as a first-class concept: give agents the tools they need (FS, shell, tests, VCS), but enforce guardrails and graduated autonomy so you can scale trust safely.
