Mastering LLM Prompting in the Real World by Macey Baker

Unlock the secrets of effective prompting with Macey Baker, a Community Engineer at Tessl. Discover why precise prompting can revolutionize your interaction with Large Language Models (LLMs) and how you can implement these strategies in your AI projects.

Episode Description

In this enlightening episode of the AI Native Dev podcast, host Simon Maple engages with Macey Baker, a Community Engineer at Tessl, to explore the intricacies of prompting in AI interactions. Macey shares her expertise on effective prompting techniques and discusses her unique role at Tessl that combines AI engineering with community engagement. With her vast experience, Macey provides listeners with practical tips and insights on how to maximize the potential of LLMs through strategic prompting. From understanding the significance of community engineering to the dynamic nature of prompts, this episode is packed with valuable information for developers looking to enhance their AI models' performance.

Chapters

  1. [00:00:00] - Introduction to Macey Baker and the Role of a Community Engineer
  2. [00:02:00] - The Importance of Prompting as the Interface to LLMs
  3. [00:04:00] - Variability in LLM Models and Prompting Techniques
  4. [00:09:00] - Practical Tips for Effective Prompting
  5. [00:15:00] - Giving Examples and Managing Levels of Detail
  6. [00:23:00] - Handling Large Context Windows and Structuring Inputs
  7. [00:29:00] - Invoking Thought Processes with the Say Function
  8. [00:34:00] - Future of Prompting and Model Evolution

The Importance of Prompting in LLM Interaction

Prompting is described as "the interface to an LLM," and its significance cannot be overstated. Macey points out that while fine-tuning is an alternative, prompting remains the most cost-effective way to get the most out of an LLM. "Prompts are disposable, like they're living artifacts," she explains, emphasizing that prompts evolve alongside user and product needs. Fine-tuning, by contrast, can pin you to a particular version of a model. This adaptability makes prompting a crucial tool for developers, allowing them to guide LLMs dynamically rather than relying on more static methods.

Variability in LLM Models and Prompting Techniques

The discussion highlights the variability between different LLM models such as ChatGPT and Claude. Macey notes, "I don't know if they're terribly different capability-wise," but system prompts significantly impact model behavior. Claude, for example, is perceived as "more opinionated than GPT," often preferred for its ability to make better assumptions. This variability underscores the importance of understanding the nuances of each model's system prompt to optimize user experience. Macey's insights reveal that while models may not differ substantially in capability, their behavior and user interaction can vary widely, making it essential to tailor prompting techniques accordingly.
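To make the contrast concrete, here is a minimal sketch of steering a less opinionated model toward Claude-style behavior via the system message. It assumes the official `openai` Python client and the `gpt-4o` model name; the system prompt wording is purely illustrative and is not quoted from any vendor's actual system prompt.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical system message nudging the model toward the "stronger
# assumptions" behavior the episode attributes to Claude.
OPINIONATED_SYSTEM = (
    "You are a senior engineer with strong, well-reasoned opinions. "
    "When a request is ambiguous, make a sensible assumption, state it "
    "briefly, and proceed rather than asking for clarification."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": OPINIONATED_SYSTEM},
        {"role": "user", "content": "Set up logging for a small web service."},
    ],
)
print(response.choices[0].message.content)
```

The same lever works in reverse: a system message that forbids assumptions will pull an opinionated model toward more cautious, clarifying behavior.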

Practical Tips for Effective Prompting

Macey introduces the concept of task framing, which involves integrating constraints into the task description rather than appending them as afterthoughts. By framing a task such as "write a self-contained library," constraints become part of the task itself, guiding the LLM more effectively. She also stresses the importance of formatting and structuring inputs, using tags to delineate sections and expectations. This approach mirrors human communication, ensuring that the LLM processes information sequentially and logically. By structuring inputs clearly, developers can enhance LLM understanding and responsiveness.
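As a rough illustration of task framing, the sketch below contrasts a constraint bolted onto the end of a prompt with the same requirement folded into the task description from the first word. The prompts are invented examples in the spirit of the episode, not Tessl's actual prompts.

```python
# Constraint appended as an afterthought: by the time the model reads the
# last line, it may already be "committed" to a dependency-based solution.
afterthought_prompt = """
Write a Python library for CSV parsing.
It should handle quoted fields, custom delimiters, and header rows.
Please do not use any external dependencies.
"""

# Task framing: the constraint is simply part of the task, stated up front.
framed_prompt = """
Write a self-contained, dependency-free Python library for CSV parsing.
Using only standard language features, handle quoted fields,
custom delimiters, and header rows.
"""
```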

Giving Examples and Managing Levels of Detail

Providing examples in prompts is critical for clarity. Macey advises using tags like "good" and "bad" to illustrate desired outcomes versus what to avoid. This technique helps bridge the gap between user expectations and LLM outputs. The balance between short and long prompting is another key consideration. While short prompts yield quick responses, detailed prompts guide LLMs toward more specific results. Macey highlights the importance of managing verbosity and specificity, ensuring that prompts are neither too vague nor overly complex.
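A minimal sketch of the good/bad pattern, using the pseudo-XML tagging style described in the episode; the tag names, the placeholder, and the verbosity example are illustrative assumptions.

```python
# Show the model both the target style and the style to avoid.
prompt_template = """
Summarize the release notes below in one direct sentence.

<good>
Version 2.3 adds streaming CSV parsing and fixes two memory leaks.
</good>

<bad>
We are absolutely thrilled to announce the long-awaited arrival of
version 2.3, a truly transformative release that...
</bad>

<release_notes>
{release_notes}
</release_notes>
"""
```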

Handling Large Context Windows and Structuring Inputs

Large context windows pose challenges, as "the bigger the context window, the more chance there is for an LLM to get confused." Macey suggests breaking down inputs into manageable sections, using bullet points and tags to clarify expectations. This method ensures that LLMs can process large amounts of information without losing coherence. By maintaining a clear scope, developers can reduce variance and improve the determinism of LLM responses, leading to more consistent and reliable outputs.
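A hedged sketch of the structure Macey describes: announce the materials first, then delimit each one with pseudo-XML tags, so the model knows what it is reading before it reads it. The tag names and placeholders are illustrative, not a prescribed format.

```python
# Tell the model what is coming, then deliver each material in its own
# clearly tagged section.
structured_prompt = """
I am going to provide you with two materials:
- user requirements
- system requirements

Read both, then produce an implementation plan.

<user_requirements>
{user_requirements}
</user_requirements>

<system_requirements>
{system_requirements}
</system_requirements>
"""
```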

Invoking Thought Processes with the Say Function

An innovative technique discussed is the use of the "say" function to prime LLMs. This involves initiating a conversational context that helps LLMs focus on the task at hand. Macey describes how priming LLMs with thought processes can enhance their performance, even if the intermediary responses are discarded. By setting the stage for subsequent prompts, developers can guide LLMs toward desired outcomes more effectively, leveraging conversational context as a powerful tool.
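The episode describes the pattern rather than a specific implementation, so the following is a rough reconstruction under stated assumptions: the helper name `say`, the `openai` Python client, and the priming text are all illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
history: list[dict] = []  # the running conversation, i.e. the primed context

def say(text: str) -> str:
    """Send a message, keep the exchange in the conversation, return the reply.

    For priming turns the caller simply ignores the returned text: the point
    is that the exchange now lives in the context for later turns.
    """
    history.append({"role": "user", "content": text})
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

# Priming turn: the reply is thrown away on purpose.
say(
    "I'm thinking about the best way to write efficient, performant, "
    "well-structured code. I'm going to describe the program I want. "
    "Are you ready to think about this with me?"
)

# The turn we actually care about, answered within the primed context.
print(say("Write a self-contained, dependency-free CSV parsing library."))
```

The discarded reply still lives in `history`, which is the whole point: the model answers the final request inside a context it helped build.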

Future of Prompting and Model Evolution

Looking ahead, Macey speculates on the future relevance of prompting as LLM models continue to evolve. While advancements may reduce the need for some prompting techniques, she maintains that, for now, "prompts really are the key to the LLM's heart." The concept of mechanical sympathy, or understanding how a system works in order to interact with it optimally, remains pertinent. As LLM capabilities rapidly advance, developers must adapt their prompting strategies to harness the full potential of these models, ensuring that interaction remains efficient and effective.

Full Script

**Macey Baker:** [00:00:00] The bigger the context window, the more chance there is for an LLM to get confused. And I found that we have been inoculated against that by breaking our prompts up. Instead of populating the context this way, you've said, I'm going to provide you this material. It knows it's expecting that material.

And by the time it reads it, it knows what to do. So when it comes to specs, like again, canonical specs: to us that means that pieces of a spec directly relate to a deterministic behavior that we can count on.

**Simon Maple:** Talk about the steps that lead up to task framing.

**Macey Baker:** It's born out of me banging my head against the wall on a specific problem.

And I was trying to get LLMs to make me certain utility libraries, like just certain functions without using any external dependencies. [00:01:00]

**Simon Maple:** You're listening to the AI Native Dev brought to you by Tessl.

On today's episode of the AI Native Dev, we're going to be talking all things prompting and why it's one of the most important things that we need to think about when dealing with AIs and LLMs in our applications. Joining me today, Macey Baker, Community Engineer at Tessl. Welcome, Macey.

**Macey Baker:** Thank you so much, Simon.

Happy to be here.

**Simon Maple:** Why don't we kick off with what that means. What do you do at Tessl? What's your role?

**Macey Baker:** Yeah, I think we're still figuring that out. Community Engineer is a bit of a unique job title. Especially since we don't have a product that's out there yet. And we have the very beginnings of a community.

Effectively, I'm working on the AI engineering team. I'm really in the weeds with these LLMs and trying to get them to behave. And hopefully once we launch, I'll be talking to other people [00:02:00] about how they are able to make our product work.

**Simon Maple:** Absolutely. And you're always in the community, the Discord community.

I've seen lots of chats and sharing of knowledge about that in the Discord community. So for those listening, do jump into the Discord community and have a chat.

**Macey Baker:** Join and argue with me, I would love that. Yeah.

**Simon Maple:** So, prompting. Everyone always says prompting is super important. And I think I hear two things as the cheapest ways of improving your interactions with an LLM.

Prompting is one, context is the other. Why should people go for prompting first?

**Macey Baker:** Prompting, effectively, is the interface to an LLM. And I think people really discount how important it is. I know there's a lot of talk about prompting and how to get the most out of your prompts, but really this is the only way that we have to truly interact with the LLM.

And there are conversations about fine tuning versus prompting. In my mind, like this is the cheapest, as you said, it's the cheapest way to get the most out of your LLM. Also prompts are disposable, like they're [00:03:00] living artifacts. They can evolve with you, with your expectations, with your product, with whatever it is that you're trying to do.

Whereas an approach like fine tuning might, I don't know, pin you to a certain version of your own.

**Simon Maple:** That's really interesting, actually, because we were talking with Simon Last, Co-Founder of Notion, and he was talking about how his team is actually moving away from the fine tuning aspect, or rather, probably, I would say, investing far less in fine tuning, just because as models change and LLMs get better, the value of that fine tuning drops.

I guess that's one thing that prompting probably stays quite consistent on, is it consistent across models, versions of models? Will you tend to get similar results with some of the tricks that we'll talk about today?

**Macey Baker:** That's a really good question. I think it does vary. First of all, like across models, I think we can bet on LLMs getting better.

I think we can bet on them improving, the rate, the timeline, who knows, but they are getting better all the time. And I've noticed this kind of subjectively and objectively in our evals; even 4o [00:04:00] has improved recently. In terms of cross-model prompting tips, I think this is so interesting, because, take ChatGPT and Claude, for example: this is proof positive that prompting is so important.

I really don't know if they're terribly different capability-wise, these models, but, and this may change in a matter of weeks at this point, a lot of people prefer Claude for a lot of tasks. They find it easier to talk to. They find it understands them better.

And if you look at Claude's system prompt, it tells it to have a little bit more of a personality. I think this comes out. Claude is more opinionated than GPT. Claude is going to have stronger feelings about what it is that you're trying to do. It's going to make better assumptions in general.

And I think if you read through its system prompt, you can understand why that is, because it's really being guided towards that sort of affect. ChatGPT is much [00:05:00] less happy to make assumptions about what you want, much less opinionated in general. Yeah, I think that's really interesting. Now, of course, the APIs from Anthropic and OpenAI don't necessarily conform to the same system prompts. But I think it's proof positive of how impactful a prompt can be.

**Simon Maple:** And that's interesting, actually, because on top of that, obviously the system prompt, then on top of that we have our user prompts, and noticing those subtle differences in how they all react. So in your experience, Claude being more opinionated and almost more personable, right?

**Macey Baker:** Yeah, that's my mate.

**Simon Maple:** But yeah, are you sending a Christmas card to Claude? It might actually change the way you prompt: you might prompt differently on OpenAI to be more opinionated, if that's what you want, or even on Claude to be less opinionated, if you want that similar experience. So potentially, the subtleties of the difference in system prompt could slightly alter your overall prompt, if that was important to you, I guess.

**Macey Baker:** For sure. Which is something I do find myself doing. [00:06:00] Another kind of observation about the two is that Claude is much better at picking up your intention over a large context window, whereas GPT might get more easily confused. Claude is more able to follow.

**Simon Maple:** Interesting. And one of the things I love about Claude projects is that ability just to upload and refer to things.

And when we think about, I remember you just saying prompts are disposable. And if we take the opposite of that, I guess maybe a specification, or something that you could upload to a Claude project and then refer to, would you write the two in a separate style? What's, maybe the prompts are going to be more chat and interactive based.

A spec is going to probably be more, I don't know, we use the word canonical here, a little bit more full, more descriptive, end to end. What's the style that you've seen in both short prompting and specs? What are the core differences in language and style?

**Macey Baker:** There are a few differences for sure.

Like, conceptually, there are also a few [00:07:00] things that are similar, which I might touch on first. And this is where the first tip comes in. I think it's something that we've seen really help us. Maybe it's common knowledge at this point, but it's worth pointing out: formatting is important.

A theme through this, by the way, is that I'm coming to feel that the way that you communicate with humans is often directly translatable to the way that you should communicate with LLMs. So, formatting your prompt. And when I say formatting, I mean structuring: these are not XML documents, but we will literally tag things.

"I'm doing this" tag. We will tag certain sections. And I think this is exactly how you would communicate with a person, in the sense that you're not going to give them a giant wall of text and expect them to digest it at once. You might have a shorter wall of text that links to a different wall of text, and that links to something else, which [00:08:00] links to a more foundational document.

Then they can process these pieces of information in a way that works for them, instead of just getting drowned in it. So that's something that's similar between prompts and specs. I would say for us, as you say, specs are canonical. This is a word that we're throwing around a lot lately, meaning that a section in a spec directly maps to an output; I think that's kind of how we're coming to think of it. So a section in my spec that says this function adds two numbers together, that directly maps to the process that creates this function that adds two numbers together. That makes it easier to edit. It makes it more pointed. It makes it easier for an LLM to understand and for a person to parse.

And that's also true with a prompt, really. I think the major difference here, or one of the major differences, is that you'll see slightly more flowery language in [00:09:00] prompts. Specs are technical. They're meant to be quickly and easily understood.

**Simon Maple:** Yeah. And interesting, when you talk about the flowery language, my mind goes to the autonomy side, whereby a short prompt is something which is attended and interactive.

So the output that you will get, you will read, you will look at it, and you'll say, yes, this is correct, or this is incorrect. So you can almost afford that more flowery language and verbosity, or almost that ambiguity, a little bit, because you're seeing exactly what it does back.

With the spec, you want it to be more concrete, so that there's less ambiguity. Even if, perhaps, you use a prompt to help you create that spec, you want it to be more clear, so that generations of that specification as an implementation, or something like that, are going to be somewhat consistent.

I guess they can be used together as well. Yeah, I think the decomposition there is super interesting in terms of breaking that spec down into pieces.

**Macey Baker:** Yeah,

**Simon Maple:** [00:10:00] And focusing on those pieces one at a time.

**Macey Baker:** Yeah, absolutely.

**Simon Maple:** During the session, then we're going to jump into a number of tips.

You mentioned one briefly there. And these are tips that you've found the hard way, through playing with these systems, trying to get the most that you can out of an LLM. The first one's an interesting one. It's one, actually, that you wrote about in a blog post, I'm gonna say last year.

You talked about task framing. Yeah. And this is interesting. I remember learning a lot reading this blog. Tell us a little bit about, I guess, before you go into task framing, actually talk about the steps that lead up to task framing, from having those constraints at the back end of a prompt, etc.

**Macey Baker:** Yeah, it's born out of me banging my head against the wall on a specific problem, which for me, again, like, we were very much in the experimentation phase for a while. And I was trying to get LLMs to make me certain utility libraries, like just certain functions, without using any external dependencies.

Now, LLMs are trained on the internet, right? And the number of people who [00:11:00] actually write code that way, without using any external dependencies? That's a very low number. So I was finding that my usual prompts, God, and, we used the term in the blog post, I was begging: please do not put any external dependencies in this.

I swear, if you do, I'm going to quit. Just losing my mind, because I was not able to get the LLM to comply. So I had a rethink of it. All of my prompts were placing this external dependency constraint at the end of the prompt, which is interesting if you think about the way LLMs process their prompts: they're reading the instructions sequentially.

They're rolling the dice on possible answers before they get to the end.

**Simon Maple:** Which I don't think a lot of people fully get, right? I think a lot of people think you take the entire prompt, you pass it to the LLM, the LLM reads it fully, understands it fully, and then starts actually executing, like you say, rolling the dice on possible answers.

But you're saying, as it's [00:12:00] reading through, it's already trying to form an answer based on what it's read before it's got to the end.

**Macey Baker:** Exactly. It's weighting possible answers from word one, which is why I think task framing is important. And what it essentially is: it's different from just ordering constraints differently.

It's about making your constraints not really constraints at all, just making them part of the task. I use the CSV parser example in the blog post, which is a real one that I struggled with for a while. If I'm describing the type of program that I want, a CSV parser, and then I say at the very end, please, no external dependencies.

90 percent of that prompt is describing a task where the best possible answer, like the correct answer, uses an external dependency. It can be accomplished without my constraint. So by the time it gets to the constraint, it's trying to fight this battle between the path it was going down and the path it now realizes it needs [00:13:00] to go down. But the right answer might already have been selected; it might already be the case that that's a hundred percent the answer

I'm gonna pick. So the task framing approach to this is literally reframing from the very start: write a self-contained library for CSV parsing. That's a little bit of a simplistic example of how this was solved. But yeah.

**Simon Maple:** And that's not about reordering the constraints. You're not essentially saying at the start, don't use external libraries.

You're building the, I guess it's not a constraint here anymore, but guidance, really.

**Macey Baker:** Yeah.

**Simon Maple:** Self-contained. And perhaps, it's interesting, because when we're talking about this, it could actually also be that the words that we use matter, because it's more human, like an interaction, to say self-contained and things like that.

It may actually be that by talking almost more human-like in our requests, it will naturally find more human-like examples of that request as well. So it might actually, even, I'm guessing now, who knows? Yeah, it's interesting.

**Macey Baker:** But I think it leads the LLM on kind of the right path to start. Again, like the external dependency thing, it's not actually a constraint.

If you frame it this way, it's just the task. It's just the task. The task is to write this in a self-contained way. Yeah, it's not, oh, by the way, no external dependencies. So that's one example. And I found that really helpful to bake into all of our prompts: what am I trying to accomplish?

And how can I make that clear as early as possible?

**Simon Maple:** Yeah, very interesting. So, task framing, tip number one. Tip number two now, actually, and this is an interesting one. When we're talking about, we talked about fine tuning. And fine tuning is really about that kind of post-training, almost, of a model, whereby you say, here are some example inputs that you might get.

Here are some example outputs that I would like you to use if you get these styles of input. And it's more about fine tuning for very specific cases. Some people do that. Some people choose not to these days. And like I said, [00:15:00] with the Simon Last episode, it was very interesting: a team that was a power user of fine tuning, really, leaning away from that.

Now your second tip is about how you give examples to an LLM, which is, so first of all, when we frame that, talking of framing, how do we position that next to something like fine tuning?

**Macey Baker:** Again, like, the prompt can grow with you. The prompt is a living document that represents exactly what you want to get out of the LLM. Fine tuning is really going to, and there's debate about how well it works in a lot of cases, whether it does or doesn't get you closer to what you need. But even so, that kind of assumes that the thing you need from the LLM is static, that it's not changing, and that you don't want to evolve it.

And then if you do evolve it, and you have fine tuned on these other sets of examples, where does that leave you? Basically, it leaves you with a lot of time and compute potentially wasted. But this is why I think prompting is so magical, and it's, [00:16:00] I know it's going to sound so silly when I say it, but: give examples.

Give examples of what you mean. Something I do in prompts a lot is literally describe the thing I'm looking for, and then say "good", again, like, in my XML tags, and give it an example of something good; then "bad", and give it an example of something similar that's bad. Oftentimes when people prompt, they have an idea of what it is they want to get out of the LLM.

Like they have a clear picture of what they want, but they're not passing that information to it. It's not an oracle. Like, it doesn't know.

**Simon Maple:** Is that kind of a good thing to have almost as a side thing that you point the LLM to? Almost as a, oh, in this part of the spec over here, there are some good and bad examples of this.

Cause I guess we almost come back to that first one: if you're adding constraints about what good should look like, when do we put that into a prompt, and things like that?

**Macey Baker:** Yeah, I think so. But it doesn't even have to be so structured in that way. An example might [00:17:00] be verbosity.

If you want, like, LLMs tend towards more complex, more verbose solutions to things. If you want it to be less verbose, you need to give it an example of what you mean: literally give it an example of a flowery sentence, and give it an example of the direct kind of sentence that you're looking for. But yeah, in terms of specs, these examples are always going to be helpful.

It depends if they're technically relevant. Obviously, if you're describing, if we zoom in on code gen for a second, if you're describing a function where you think there is room for ambiguity, or you think there's maybe a parameter that might confuse the LLM or something: yes, these examples are always going to be helpful.

Literally, do and don't, the same way that you would specify something for a person. If you think there's room for ambiguity, again, people can't read your minds [00:18:00] and LLMs can't either.

**Simon Maple:** Yeah, no, it's very true. It's very true. When we had an episode in and around the Intercom space, which of course was an area you used to work in previously, we chatted with, well, Guypo chatted with Des Traynor, and part of it was a really interesting discussion on short prompting versus long prompting.

In other words, you can write a short prompt and you'll get something back, because you always get something back. And because the LLM can fill in the gaps, we can actually write very little in our prompts and get something fairly full back. Whereas actually, sometimes what we want to do is write more in that prompt, to get to much more specifics about what we want.

And I think, I can't remember the phrase now, but it was something about the ability to short prompt robbing us of that ability to add that detail, and put our details in. And I think your third kind of tip is around levels of detail.

And I think that's really hard to understand. Am I giving too much information? Am I not giving enough information? [00:19:00] Because you still get something, but you can almost get frustrated with the results. What's the right balance, to get the LLM to focus on the right things?

**Macey Baker:** Totally. I think this is so interesting. LLMs are so magical in so many ways, but then again, you take a step back and they're prediction machines, and that can be really helpful, and that can also come and bite you sometimes. Something that we saw that was really interesting, I'll describe a kind of experiment that we did.

So we ran a series of these programs that we'd described through our workflow, 10 times each. And we had a series of them that failed, like, catastrophically, nine out of 10 times. The difference with the one time where it succeeded, often, was that the specs had less detail in them.

I think that part of this is, you're describing a problem that the LLM has seen a lot of. [00:20:00] If you gave it one sentence, and again, let's take CSV parser, for example, if you gave it one sentence, write me a library to make a CSV parser, it has the world open to it in terms of the solutions it can pick.

And if it's free to pick a solution it thinks is good, then it can follow that solution all the way through. In specs where we put much, much, much more detail for these, like, common problems, common engineering problems, I think sometimes the LLM is getting confused. Like, we're giving it most of the information, but not quite. And if you think about the choices, like the possible choices, getting narrowed down like a funnel, it's like we're leading it to here, and then it's, wait a second, where do I go from here? This only leaves me with so many paths that I can proceed down.

Does that make sense? It's a tricky thing that I've been thinking about recently. And I think the takeaway is: think about the problem that you're describing [00:21:00] and whether it's likely that the LLM has lots of ideas about how to solve it. If you think that it's likely that it does, it might be the best thing just to let it go for it and solve it in its own way.

But this is very nascent. But I thought it was interesting.

**Simon Maple:** Yeah. It's interesting. And I think it's one of those things where, I guess, one question is, how much do you care about some of the decisions it makes? Where are the important choices that you absolutely have an interest in it going a certain path on?

And where are the other cases where, okay, maybe you wouldn't have done it quite that way, but does it matter? Is it okay for it to do that? And it's, I guess, focusing on the detail in the right places.

**Macey Baker:** Yeah, exactly. Exactly. So again, CSV parser, for example: maybe you don't care exactly how it's implemented, but you care that it's fast.

And this comes back to maybe framing the task. I want a super performant CSV parser [00:22:00] that can handle, I don't know, 50,000 rows or whatever. That helps. And if you don't then follow that with implementation detail, like rigorous implementation detail that the LLM has to follow, I think it's much more likely to.

**Simon Maple:** Or perhaps the CSV file is a million lines long, and as you're creating it, you actually don't want it all to be in memory, because it would just be too big.

So it's like the requirements of understanding how it's going to be used, which is the bit that the LLM doesn't understand, right? Giving it those bits of detail. Exactly. Allowing the rest of it to be created.

**Macey Baker:** Really good point.

**Simon Maple:** Interesting. So let's jump on to the next tip, which is about structuring inputs.

And I think we're talking about huge context. And this is interesting now, where we're trying to break this down. Yeah, structuring inputs being, maybe, is it coming from the spec point of view, that you're looking at large specs and you're trying to understand how to break them down?

Or was there another angle to how you were learning this tip?

**Macey Baker:** It comes back to huge context windows. A lot of people get a [00:23:00] lot out of maxing out their context windows, and I think that's really cool, and I want to learn more about how exactly to do that and find it useful. But in my experience, the bigger the context window, the more chance there is for an LLM to get confused. And I found that we have been inoculated against that by breaking our prompts up. This is what I was talking about before, about formatting: instead of, in line, saying you're going to be provided with user requirements, the user requirements are this, and the user requirements are this.

And then, I would like you to create, I would like you to write a, whatever, here are my requirements. Instead of populating the context this way, and again, this is just going to come back to task framing, I sound like I'm on a,

**Simon Maple:** I thought you were gonna say CSV parser.

**Macey Baker:** And yeah, I'm on a press tour for task framing and CSV parsing.

So instead of doing this, it's saying [00:24:00] something like: I'm going to provide you with these materials, bullet-pointing them out, and I want you to do this with them. And then doing our fake XML tagging, like, user requirements, slash user requirements; system requirements, slash system requirements, right?

This is helpful for the LLM. Again, it's reading these things in a sequence. It knows you've said, I'm going to provide you this material. It knows it's expecting that material. And by the time it reads it, it knows what to do.

**Simon Maple:** You're not breaking down the requests. You're not saying effectively, look, I'm going to make five, six, seven requests.

It's almost like a conversation with an LLM. This is all in one prompt, but you're effectively separating out parts of that one single prompt. You're saying, this is what I'm going to give you. This is what I want you to do. This is the output style that I want you to respond in, and so forth. But you're breaking it down.

**Macey Baker:** Yeah,

**Simon Maple:** into clear segments, whether you use XML or whatever it is. It's like very demarcated, if that's a word, still outside of [00:25:00] transactions. So, sectioned off, really.

**Macey Baker:** Yeah, exactly. Exactly. And again, it comes back to the human element of this. Like, how would you most efficiently, let's say in this code gen example, let's say that this document was your only way to communicate with a developer.

You can't talk to them after that.

**Simon Maple:** Yeah.

**Macey Baker:** In theory, you get this one chance, and they need to understand everything there is to understand about the task from this document. Breaking it up in a way that's digestible, I think, gives a human the best chance of understanding what you mean.

Yeah, same with the LLM.

**Simon Maple:** Yeah. And it's also interesting, when we were talking about the fact that context could be huge. I'm going to show my age here, but back in the day of early programming, memory was a real constraint, right?

And so you had to be very careful about what you put in memory, and when, so that you didn't blow the memory stack that you had. Here as well, the rate at which LLMs are able to take much greater amounts of [00:26:00] context: it was only six months, a year ago, where we were seeing, oh, I can't give it that additional context, because I've only got a certain number of tokens or characters, whatever, as context that I need to provide.

The ability now for us to just throw huge amounts of context at an LLM isn't actually a great thing sometimes, because it gives us almost too much of an excuse to just throw as much context as we can at it and expect the LLM to deal with it.

So just because you can give context doesn't always mean you should.

**Macey Baker:** Totally. 100%. More context is also more work for you. You need to make sure that it all makes sense. Again, like I think LLMs are pretty good at making reasonable assumptions a lot of the time, but they can't read your mind.

And at some point, these things are going to get a little confused.

**Simon Maple:** And when we do talk about context, and specs, if we look at the longer-form specification style: if we are referring to pieces of a spec, or pieces of whatever it is, as context, is it better to break those down into smaller pieces so [00:27:00] that you're only referring to a specific thing?

Or do you feel like the LLM is capable of almost picking a piece out of a spec and saying, oh yeah, this is the piece I care about? Is it better to really get more fine-grained with the amount that we provide it?

**Macey Baker:** That's a great question.

**Simon Maple:** Always. Always.

**Macey Baker:** I would love to know the answer to that.

What I can say about that is, we know that there's a lot of variance when you work with LLMs. There's a lot of variance. And I think the bigger your context window, the more you're going to run into variance, meaning if you run the same prompt many times, you're going to get slightly different results.

And if you're trying to cut down on variance, then smaller contexts are probably good. And maybe we're not even talking about context here, but smaller scopes. I feel like it's fair to say that a big context means a big scope for the task that you're trying to accomplish. Yeah, maybe that's not true.

Maybe it's just about the scope of the task. So when it comes to specs, again, [00:28:00] canonical specs, to us that means that pieces of a spec directly relate to a deterministic behavior that we can count on. So, a function adds two numbers together: that results in a function being written that can add two numbers together.

A larger math library, with a huge context describing 30, 40, 50 utility functions that you want to get: of course you're going to get more variance, and probably not every function that you asked for is going to be in the final code gen result. So smaller scope equals slightly more deterministic behavior from the LLM.

**Simon Maple:** Because I guess you're pointing more to that kind of specificity. In terms of, this is the small piece that I want, and I'm providing it as context, or the small thing that I'm asking for: there are far fewer ways of getting that or doing that in an alternate way. So you get that determinism built in.

**Macey Baker:** Exactly.
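As a toy illustration of "canonical" in this sense, assuming one spec section maps to exactly one verifiable behavior; the section format and the function are invented for illustration.

```python
# A hypothetical one-section spec: one section, one behavior to count on.
SPEC_SECTION = """
## add
Returns the sum of two numbers.
"""

# The deterministic code-gen result that section canonically maps to.
def add(a: float, b: float) -> float:
    """Return the sum of two numbers."""
    return a + b
```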

**Simon Maple:** Okay, we're down to [00:29:00] our very last tip, and this is an interesting one. This is one, actually, when we were prepping for this, you said something quite interesting, and I was like, oh wow, that should be its own tip. And this is around invoking thought processes and the say function.

**Macey Baker:** Yeah.

**Simon Maple:** So tell us a little bit about the background on this.

**Macey Baker:** Yes. Again, this is something fun that kind of emerged from our experimentation phase. I think a lot of people have copped onto this as well, and I think it's really interesting. So when we talk about context, there's context as in a giant prompt that describes the context of the task that you're trying to accomplish, but there's also context within a conversation.

So LLMs have a very distinct way of communicating. It's, I say something to you, and you say something to me, and I capture that response and I do something with it. And maybe I say more things, or maybe you say more things, but it's very back and forth. And this whole conversation becomes a context.

The conversation can also be [00:30:00] the context for the task you're trying to accomplish. So we came up with this way of invoking, and this is pre-reasoning, and I still think it works pretty well as a very primitive reasoning. We came up with a utility function, which is just called say. It's, just say this to the LLM, and we throw the response entirely away.

We don't care. We don't care what the LLM has to say. We want to say something that helps prime it for the next thing that we're going to say, where we might capture the response. So an example of this is like, I am thinking about the best way to implement CSV parsing, my favorite library...

**Simon Maple:** Yeah. What is your, I'm going to have to pause there.

First of all, before we carry on, what is your favorite CSV parser today?

**Macey Baker:** It's the one called CSV parser.

**Simon Maple:** Okay, we can continue now. That was my burning question.

**Macey Baker:** I'm thinking about the best way to implement this program. Maybe you don't even call out that it's a CSV parser, right? I'm thinking about how [00:31:00] to write efficient, performant code that is written properly, whatever.

I'm looking for this particular thing, say. Or, I'm looking for this particular thing. Are you ready to, I'm going to describe the program that I want, are you ready to think about this with me? Yeah. And usually what the LLM says is like, that sounds like a fun exercise, yes, I'm ready. We don't care about that.

What we care about is that it's now primed, in the scope of this conversation, for what I send next, which is maybe a description of the CSV parser library. And it's thinking about, okay, I need to make this performant, I need to make this efficient, I need to write this well. So we found that to be really, surprisingly, effective when we were experimenting, and it's something that I've played with a bit on my own as well.

**Simon Maple:** It's fun. It reminds me a little bit of the system prompt, where you prime it to say, hey, you're a senior JavaScript developer who thrives on detail, and blah, blah, blah. But it's [00:32:00] much, much more than that, isn't it? You're not describing necessarily how the LLM should react.

You're describing, it's almost like onboarding for the task. You're giving it way more background detail about the specific task that you're going to ask about, versus the way it should respond. Which is, I've not heard of that before, actually. It's quite an interesting route.

**Macey Baker:** It is interesting. And I think if you go back to the idea of LLMs as prediction machines, that really helps, obviously. There's weighting of possible solutions to your problem as the prompt is being processed. But at this point, a prompt has already been processed, right? We're populating a context and ensuring that the probability of certain responses later is more in line with what we asked for initially. In my experience, it's very unscientific, but in my experience, this works better than long prompting.

So, having a conversation, it feels a little bit [00:33:00] mean to say, because we're just throwing away what the LLM wants to say, but having a conversation about the output that you're looking for, and then describing the task: I've found this to be a lot more effective than just, here's 10,000 words on the thing I want you to do.

**Simon Maple:** We're obviously in 2025. Today is 2025. Yep, that's cool. What do you think, let's push out a year, so we're just starting January 2026. Is prompting still going to be an important thing, or do you feel like models are going to improve so much that we're going to rely on it less, or rather that prompts are going to provide us with less of a value add on top of the models' reactions as they come out today?

**Macey Baker:** Yes, I think so, to some extent. Yeah. Again, it depends on these companies who own these models; it depends where they want to take them. There's a version of o1, for example, which I'm sure is not called o1 in 2026. [00:34:00] There's a version of it that's much more capable, but still not opinionated.

That's a choice. That's a choice to make about the LLM, which I think is a valid choice, but it means that you will need to interact with it differently. And there's a version of Sonnet that is just as capable, but even more opinionated or even more biased towards certain types of answers, which makes it really super effective at other things.

So I do think that prompting, and these specific tips, I expect to become less relevant and less important. I really do. But I still think you need to be mindful of how you're communicating with this machine, in the way that really works for it. You said something the other day that I loved, which is this idea of mechanical sympathy, which I think is more of an F1 thing, like working with the car.

**Simon Maple:** It was Martin Thompson that originally came out with mechanical [00:35:00] sympathy. And the example he gave, if I give the mechanical example and you can give the LLM example, was about an F1 driver, a Formula One driver. He's effectively just using a UI.

He's using the car he's driving, he uses the gearbox, but it's the knowledge of how that gearbox is built, and how he then knows to get the most out of that gearbox: knowing when to change at high revs, or if he's going slower, do I change in a different way, do I skip gears? Having that understanding of the mechanics underneath allows him to get the most out of it when he's actually driving.

So that's the mechanical engineering side. But the LLM's quite similar, right?

**Macey Baker:** Totally. I think any system is similar, in the sense that you'll get the most out of it if you interact with it, yeah, in the way that you know it works best. Yeah. And for now, prompts really are the key to the LLM's heart.

Yeah. That is the way that you interact with an LLM. And it's important to have this level of [00:36:00] mechanical sympathy for the LLM. And so I think prompting will always be important in that way. But some of these tips are definitely a little bit of cope for the fact that they don't work perfectly yet.

**Simon Maple:** Then again, the gearbox has probably changed very iteratively and very slowly over time, whereas an LLM under the covers is right now rapidly evolving and changing. So it's almost like our knowledge, our mechanical knowledge, needs to change faster. And then we change our prompts based on that.

So it'll probably continually need tweaking, understanding how the models are changing under the covers.

**Macey Baker:** Yeah, exactly. Which is interesting.

**Simon Maple:** Amazing. Macey, there's been a wealth of knowledge here from your experience over the last six months or so at Tessl, certainly, and beyond before that.

Yeah, a massive thank you for joining us and sharing your amazing knowledge and all those examples.

**Macey Baker:** It's very much a work in progress. Hopefully we'll be back here next year with some reflections.

**Simon Maple:** This year.

**Macey Baker:** Yeah, this year. It's...

**Simon Maple:** 2025. Yeah. Yeah. Amazing.

Thanks all for tuning in. And let [00:37:00] us know if there are other tips and best practices that you have, at podcast@tessl.io. Tell us about what you found most useful, and whether some of these are working, or will work, for you. If you want to catch up with Macey, Discord is probably one of the places where you can have a good old chat.

So yeah, join the AI Native Dev Discord and we'll see you again soon on a future episode. Thanks for tuning in.

Thanks for tuning in. Join us next time on the AI Native Dev, brought to you by Tessl.

Podcast theme music by Transistor.fm.