From DevOps to AI: Patrick Debois Shares Strategies for Successful AI Integration and Cultural Change
In this episode of the AI Native Dev podcast, we explore the dynamic world of AI Native Development with Patrick Debois, a leading figure in the DevOps movement. Discover how AI is transforming software development and the cultural shifts needed to embrace these innovations.
Episode Description
Join host Guy Podjarny as he sits down with Patrick Debois, known as the "father of DevOps", to discuss the rapidly evolving landscape of AI Native Development. With over two decades of experience in IT, Patrick offers a unique perspective on the parallels between AI and DevOps, emphasizing the need for integration and cultural change. In this insightful conversation, they delve into the speed of AI adoption, the importance of breaking down silos, and the role of governance in AI systems. Patrick also shares strategies for successful AI integration, highlighting the significance of engineering practices and continuous improvement. Whether you're a seasoned developer or new to AI, this episode provides valuable insights into harnessing the power of AI in modern software development.
Chapters
- [00:00:00] Introduction to AI Native Development
- [00:01:00] Patrick Debois: The Father of DevOps
- [00:02:00] The Analogy with Cloud Native and DevOps
- [00:04:00] Speed and Scale of AI Adoption
- [00:05:00] Breaking Silos: Cultural Shifts in AI
- [00:09:00] Organizational Structures for AI Integration
- [00:12:00] Governance and Enablement in AI Systems
- [00:15:00] Engineering Practices for AI Success
- [00:21:00] Evaluation and Continuous Improvement
- [00:28:00] Conclusion and Key Takeaways
The Analogy of AI Native Development
Patrick Debois opens the discussion by exploring the analogy between AI Native development and the established concepts of cloud native and DevOps. He notes that throughout his career, there have been instrumental technology moments, such as the emergence of mobile, serverless, and cloud-native technologies. These moments often reshape the way we work, and Patrick believes that GenAI is another such moment. As he states, "The speed of impact is drastically different," highlighting how quickly AI technologies are being adopted compared to earlier innovations.
Guy Podjarny adds to this by questioning whether the rapid adoption is due to the ease of integration or simply the hype surrounding AI. Patrick responds by emphasizing that while hype travels fast, the democratization of tools like LLMs (Large Language Models) has made AI more accessible. He explains, "Now if I'm a lawyer, I can use ChatGPT. I can use OpenAI and it's there," underscoring the significant difference in accessibility compared to early cloud technologies.
The Speed of AI Adoption
The conversation shifts to the speed and scale of AI adoption, with Patrick pointing out the challenges faced by companies like OpenAI in serving a global user base. He compares the current state of AI to the early days of cloud-native development, where numerous iterations occurred before reaching standardization. "Today this company's hot, tomorrow there's going to be another," he remarks, reflecting on the rapid evolution of AI technologies.
Patrick highlights the importance of integration in the AI era, much like DevOps was about integrating development and operations. He states, "I believe the new game is about integration," suggesting that understanding and integrating AI tools is crucial for businesses looking to leverage AI effectively.
Cultural Shifts and Breaking Silos
A significant part of the discussion revolves around the cultural shifts required to embrace AI technologies. Patrick shares his experiences of early attempts to integrate AI into companies, noting the friction between AI engineers and production teams. "One group not working together with another group," he describes, drawing parallels to the initial challenges faced in the DevOps movement.
Patrick emphasizes the need for organizations to break down silos, similar to the DevOps approach, and integrate AI initiatives with existing production processes. He introduces the concept of "shift right," a cultural change that brings AI engineers closer to production, ensuring that AI projects deliver real value to end customers.
To facilitate this, organizations must foster a culture of collaboration and continuous learning, enabling teams to adapt to the evolving technological landscape. By encouraging open communication and cross-functional teamwork, companies can create an environment where AI initiatives thrive and deliver tangible benefits.
Organizational Structures for AI Integration
The podcast touches on the ideal organizational structure for integrating AI technologies. Patrick suggests that companies should start with dedicated AI teams for incubation, but as AI becomes more integral, traditional engineers should receive training to incorporate AI into their workflows. He explains, "I would use that AI data science team and say, you're now the mentors of what goes on in the other teams."
Patrick also highlights the importance of shared infrastructure and standardized AI services, akin to a cloud services catalog. By providing a standardized "paved road," companies can streamline AI integration and ensure consistency across teams.
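To make the "paved road" idea concrete, here is a minimal sketch, in Python, of what a shared AI services client might look like: every team calls one internal gateway that handles authentication, guardrails, logging, and cost attribution, instead of talking to model providers directly. The gateway URL, header names, and approved model list below are illustrative assumptions rather than anything prescribed in the episode.

```python
# Hypothetical "paved road" client: teams call the shared internal AI gateway
# instead of calling model providers directly. Names and endpoints are assumptions.
import os
import requests

AI_GATEWAY_URL = os.environ.get("AI_GATEWAY_URL", "https://ai-gateway.internal/v1/chat")
APPROVED_MODELS = {"gpt-4o-mini", "claude-3-haiku"}  # governed centrally, not per team


def paved_road_chat(prompt: str, model: str = "gpt-4o-mini", team: str = "checkout") -> str:
    """Send a prompt through the shared gateway, which applies auth, guardrails,
    logging, and cost attribution on behalf of every team."""
    if model not in APPROVED_MODELS:
        raise ValueError(f"{model} is not in the approved catalog; request an exception")
    response = requests.post(
        AI_GATEWAY_URL,
        json={"model": model, "prompt": prompt},
        headers={"X-Team": team},  # lets the platform team attribute usage and cost
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["output"]
```

Teams that need to deviate from this standard path still can, but, as Patrick notes later in the conversation, at their own cost.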
Moreover, organizations should consider establishing AI centers of excellence to drive best practices and innovation. These centers can serve as hubs for knowledge sharing, enabling teams to learn from each other's experiences and build a collective understanding of AI technologies.
The Role of Governance and Enablement
Governance and enablement are crucial components in the AI integration process. Patrick discusses the need for centralized governance to manage AI infrastructure and provide guardrails for teams. He points out, "I don't want to have guardrails in every different flavor," emphasizing the importance of standardization.
Additionally, Patrick stresses the significance of feedback loops and observability in AI systems. By gathering feedback from end users and monitoring AI outputs, companies can continuously improve their AI models and ensure they deliver reliable results.
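As a rough illustration of that feedback loop, the sketch below records every model interaction as a trace and lets end users attach a thumbs-up or thumbs-down that flows back to the team. The field names and the JSONL sink are assumptions made for illustration, not a reference to any specific observability product.

```python
# Minimal sketch of LLM observability plus user feedback: each call becomes a
# trace, and feedback is linked to it by trace_id. Schema is illustrative.
import json
import time
import uuid
from pathlib import Path

TRACE_LOG = Path("llm_traces.jsonl")


def record_llm_call(prompt: str, output: str, model: str) -> str:
    """Store the interaction so it can be inspected, evaluated, and replayed later."""
    trace_id = str(uuid.uuid4())
    entry = {"trace_id": trace_id, "ts": time.time(), "model": model,
             "prompt": prompt, "output": output}
    with TRACE_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return trace_id


def record_feedback(trace_id: str, helpful: bool, comment: str = "") -> None:
    """Attach end-user feedback to a trace; unhelpful examples can feed the eval set."""
    entry = {"trace_id": trace_id, "ts": time.time(),
             "helpful": helpful, "comment": comment}
    with TRACE_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")
```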
Effective governance also involves defining clear policies and guidelines for AI usage, ensuring that ethical considerations are taken into account. By establishing a robust governance framework, organizations can mitigate risks and build trust with their stakeholders.
Engineering Practices and Dealing with Uncertainty
The conversation delves into the engineering practices necessary for successful AI integration. Patrick notes that dealing with the non-deterministic nature of LLMs requires a shift in mindset. He asserts, "It's dealing with uncertainty," encouraging engineers to adopt practices that account for variability in AI outputs.
Patrick highlights the importance of using AI tools to enhance engineering workflows and foster excitement among developers. By experiencing the benefits of AI tools firsthand, engineers are more likely to embrace AI integration in their projects.
To effectively manage uncertainty, teams should adopt practices such as continuous testing, monitoring, and iterative development. By embracing a culture of experimentation and learning, organizations can navigate the complexities of AI systems and drive continuous improvement.
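One way to put that into practice is to test for properties and invariants rather than exact strings, running the same prompt several times and asserting only on what must always hold. The sketch below assumes a hypothetical summarize_ticket function (stubbed here in place of a real LLM call) and checks that its output is valid JSON with an allowed sentiment and a bounded summary, whatever the exact wording turns out to be.

```python
# Property-style checks for a non-deterministic output: assert invariants,
# not exact strings. summarize_ticket is a hypothetical, stubbed stand-in.
import json


def summarize_ticket(ticket_text: str) -> str:
    """Hypothetical LLM-backed function returning a JSON summary; stubbed here."""
    return json.dumps({"summary": ticket_text[:50], "sentiment": "negative"})


def test_summary_is_well_formed():
    ticket = "Checkout fails with a 500 error whenever I apply a discount code."
    for _ in range(5):  # run several times: wording may vary, invariants must not
        parsed = json.loads(summarize_ticket(ticket))   # output must be valid JSON
        assert set(parsed) == {"summary", "sentiment"}
        assert parsed["sentiment"] in {"positive", "neutral", "negative"}
        assert 0 < len(parsed["summary"]) <= 200        # bounded, non-empty summary
```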
Evaluation and Continuous Improvement
The discussion on evaluation and continuous improvement underscores the challenges of assessing AI systems. Patrick outlines the strides made in creating effective evaluation processes, including the use of synthetic questions and feedback loops. He explains, "The evaluations in the beginning were more when you were building your own LLMs," highlighting the need for domain-specific evaluation methods.
Patrick also discusses the role of product owners in defining evaluation criteria and ensuring that AI outputs align with business goals. By involving product owners in the evaluation process, companies can bridge the gap between technical and business perspectives.
Continuous improvement requires a commitment to learning from both successes and failures. By analyzing performance data and user feedback, organizations can refine their AI models and enhance their overall effectiveness.
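A rough sketch of how those evaluation sources can come together is shown below: domain-expert questions, synthetically generated questions, and failures pulled back from production traces (reusing the hypothetical trace log from the earlier sketch). The file names, schema, and judge function are assumptions for illustration; the point is that the eval set grows from several feeds and is re-run whenever the prompt or model changes.

```python
# Illustrative eval harness combining expert questions, synthetic questions,
# and production failures flagged by users. All names and files are assumptions.
import json
from pathlib import Path


def load_eval_cases() -> list[dict]:
    cases = json.loads(Path("expert_questions.json").read_text())      # written with the product owner
    cases += json.loads(Path("synthetic_questions.json").read_text())  # LLM-assisted generation
    # Join feedback entries with their original traces so every prompt a user
    # marked unhelpful becomes a regression case.
    entries = [json.loads(line) for line in Path("llm_traces.jsonl").read_text().splitlines()]
    bad_ids = {e["trace_id"] for e in entries if e.get("helpful") is False}
    cases += [{"question": e["prompt"], "source": "production"}
              for e in entries if "prompt" in e and e["trace_id"] in bad_ids]
    return cases


def run_evals(answer_fn, judge_fn) -> float:
    """Score the current prompt/model combination; re-run on every prompt or model change."""
    cases = load_eval_cases()
    passed = sum(judge_fn(case, answer_fn(case["question"])) for case in cases)
    return passed / max(len(cases), 1)
```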
Conclusion
In conclusion, the podcast episode offers valuable insights into the world of AI Native Development, highlighting the parallels with DevOps and Cloud Native practices. Patrick Debois provides a comprehensive overview of the challenges and opportunities presented by AI technologies, emphasizing the need for cultural shifts, integration, and governance.
The key takeaways from the discussion include:
- The rapid adoption of AI technologies and the democratization of AI tools.
- The importance of breaking down silos and fostering collaboration between AI engineers and production teams.
- The need for standardized AI services and centralized governance.
- The significance of feedback loops and observability in AI systems.
- The role of engineering practices in dealing with uncertainty and enhancing workflows.
- The importance of evaluation and continuous improvement to ensure AI systems meet business objectives.
As AI continues to reshape the software development landscape, organizations must adapt to these changes and embrace the potential of AI Native Development. By learning from the experiences of industry leaders like Patrick Debois, developers can navigate the complexities of AI integration and drive innovation in their projects.
Full Script
Patrick Debois: [00:00:00] I do a test when I come into a company and I ask them, okay, what are you doing about your Gen AI? If they're all telling me about model training, I know it's going to be a hard sell.
You're listening to the AI Native Dev brought to you by Tessl.
Guy Podjarny: Hello everyone. Thanks for tuning back into the AI Native Dev. Today, I am excited to dig into the analogy of AI Native development, how it compares to DevOps and to cloud native development, and to really offer a bright perspective on that. We have Patrick Debois, who has done many, many things across his career.
Here, we'll just emphasize that he's one of the core drivers and leaders in forming the DevOps movement, created DevOps days, and has really shaped a lot of what we think of as [00:01:00] modern DevOps today and the practices and has spent the last year or two immersing into GenAI and how that really impacts software development and such.
So Patrick, thanks for coming onto the show.
Patrick Debois: Yes. And Guy, it's exciting, right? These new technologies. And some people might not expect this from me, saying, okay, it's culture first, DevOps. There are two things obviously: there's tools and there's culture in this case. But yes, I'm excited.
Guy Podjarny: Yeah, I fully agree. I think the analogy is quite strong. So I guess maybe we start indeed with that question: do you think the analogy between AI native development and cloud native or DevOps is an apt one when you think about the magnitude of change here, when you think about what we need to consider in it? Does it feel like a useful lens, an apt lens?
Patrick Debois: Yup, I think over my career there have been these kind of instrumental technology moments, right? And it's always something new, whether that was mobile, or all of a sudden serverless, or cloud native, or Kubernetes.
There's always this emerging kind of [00:02:00] feeling, and with those kinds of new technologies, we often in the beginning don't know exactly where it leads to, we're trying and playing with this, but ultimately they have an influence on the way we work. I would say that with Gen AI, if I look back, the speed of impact, that is drastically different, right?
All of a sudden, you can share something tomorrow, somebody can use it, it goes into production. So that speed is definitely different than when the early cloud native days were emerging and people were trying to figure this out.
Guy Podjarny: It's much easier to interweave it into the activities than it was before.
I guess maybe aspects of cloud were a bit earlier, right? Like S3, for instance, I think had pretty quick adoption. Is it about the ease of adopting it into your workflows, or is it just about the sort of sex appeal?
Patrick Debois: Hype travels definitely faster, but, looking beyond the hype, I think the tools, like using an LLM, have been so democratized. In the early days of cloud you first had to spin up some [00:03:00] infrastructure and so on, but now if I'm a lawyer, I can use ChatGPT.
I can use OpenAI and it's there. That gets it much faster into usage and also gets everybody much faster on the train, because they see the possibilities of what these can do. And that's different, but I think that is due to the SaaSification of things, that it's so commoditized and so easy to get at.
Guy Podjarny: Yeah, that's really interesting. So it's easier to provide a solution that is Gen AI powered. And as a result of that, there are just many more. Yeah.
Patrick Debois: Although I wouldn't want to be in OpenAI's shoes, serving the world. So that part is not different.
Guy Podjarny: Like from a scale perspective, you have to support everything.
Patrick Debois: So that's definitely the speed, the impact, the companies playing into the hype, the new technology, the startups, the funding that's there, right? That kind of rocket ship, and that's been instrumental in pushing this forward. Similarities again: nobody in the [00:04:00] beginning of cloud native knew exactly how.
We've seen many iterations before we got to more standardized things like Kubernetes that everybody settled on, but before that there were numerous iterations. And this is where I feel we're at. We're in the fast iteration pace. Today this company's hot, tomorrow there's going to be another.
We're just churning through until we settle on a pace that, not saying the winner takes it all, but it could be in very different forms. Every time there's been a new release, there's also the problem of this being like a thin veneer layer on top of an LLM that people are, and I'm not saying it's useful, but everybody can start building as such.
And so that fast building is really different than in the past. If you're looking for more similarities, maybe on the cultural side: in the group where I experimented early with bringing AI into a company, into the product, the teams that were doing the AI, or kind of the AI [00:05:00] engineers there, they were not used to production.
They were somewhere in there, and I always joke, somewhere in their data lake, nobody knew them, building reports, and all of a sudden they became the star of the show,
Guy Podjarny: right?
Patrick Debois: So that kind of friction I've seen firsthand: one group not working together with another group.
And obviously it was the same with Dev and Ops and bringing them closer together, because we need to get into production. There is value there and we need to bring that to the end customer. And that's definitely a friction point. I think it was called, this time, not shift left, Guy, but shift right: getting them closer to production.
And that's definitely also a cultural change in kind of an organization.
Guy Podjarny: I love that point around breaking the silos just like we did between Dev and Ops, or it was a key driver there, but also with the data. I remember a conversation with one of the lead security people in one of the big banks who was talking about how, he's been spending years cultivating his relationship with the development team and working on it.
And you can debate if he is or isn't happy about it, but it's deep. He [00:06:00] knows who to talk to. They've had a long-running conversation, and suddenly there's this whole horde of new people that he needs to talk to; he doesn't even know who he's supposed to talk to, and they don't know who he is and what his concerns are, and he had security concerns.
Like, where is this data even coming from? And do you know you can trust it, et cetera, et cetera. But it's also true for operations, it's true for anything production. So that's a really good point. And I guess, have you seen organizations take it on with that type of outlook, right?
Or thinking about, hey, what have we done around Dev and Ops?
Patrick Debois: Most of the initial projects are often done with the data science folks or something, because they're closer to the kind of the data
Guy Podjarny: They can achieve the functionality, right?
Patrick Debois: And so a lot of what I hear from companies is that this is where the incubation happens in the company.
They understand the lingo a little bit, but then they often lack, like we mentioned, the going to production and bringing that in. That's where you bring in the engineering; a lot of people in data science think, hey, I have a Jupyter notebook and I'm good, right? No.
We can't run this in production. And I've [00:07:00] definitely seen companies deliberately infusing either the data science team with more engineering or traditional engineering folks, to deliver the proof of concept and not just stay in kind of eternal mode. I do a test when I come into a company and ask them, okay, what are you doing about your Gen AI?
If they're all telling me about model training, I know it's going to be a hard sell, because the company is overrun by the data science folks. And they will tell you the data needs to be perfect and we need to get everything. I believe the new game is about integration, much like DevOps was about integration of the two worlds.
I do not need to know, have a deep understanding of all the algorithms, but I can definitely put it to use. And that's why I'm also so attracted to this field because I always had an eye on the data science folks, but the math went beyond my brain, but this game I've been doing for a long time. And there's a lot of similarities on how to do that integration and bring that to production.
Guy Podjarny: Yeah, I think I [00:08:00] really like this analogy. And I guess if we think about DevOps, you had some companies that created these DevOps teams, which is ironic, but fine, they're called DevOps teams. Oftentimes these DevOps teams were actually not trying to break silos.
What they were doing is: this is the team that uses infrastructure as code, they're the Puppet and Chef users. And therefore, unlike their predecessors in ops, they are coders a little bit. They know how to build things, maybe at some point even using Terraform, but they weren't tackling the culture shift.
So it feels like maybe it's a little bit like that, right? You have your AI engineers today being formed. They might know how to build applications out of these LLMs, or wrangle LLMs to do something effective. And they might traverse further back into the actual data scientists, but there might still be a gap to the production side of it.
Indeed. Yeah. And who would you say needs to fill that gap? If you were to optimally set up an organization, how do you imagine it would be set up in terms of responsibilities?
Patrick Debois: When there's a new technology, and again, you can have the mobile team being a new one, it [00:09:00] doesn't matter, right?
So they're the incubators. And then you have to make a choice: are they just going to be the team that always takes care of this? And in some companies that makes sense. Oh, it's the chat team, the chatbot team, and it's the only thing they do. But if you set out to, for example, say: all my engineers, when they're building, I want them to be able to use these new AI technologies to do certain things faster, to get better results.
Then I would have to scale up my traditional engineers with some training, right? So I would use that AI data science team and say, you're now the mentors of what goes on in the other teams. It could be the same as the developer experience team. It's an enablement team that goes across your organization, who goes into each of the teams and helps them get bootstrapped.
The point is not about making them all the experts; they will stay the experts, much like the DevOps folks, we're not supposed to get everybody to do all the DevOps stuff, [00:10:00] but just enough that they know when to ask and what things they can do themselves. So that's scaling out the knowledge as a pattern in the different organizations.
There is then, if you continue that journey, also a part where there's shared infrastructure. I do not want all the teams to run their LLMs in a different way. I do not want to have guardrails in every different flavor. So you see more of a standardization happening, and that's similar. Like when you have a few teams, you can build guilds around it, exchange things, but then the infrastructure, in my belief, is very similar to your cloud infrastructure; it's just the AI infrastructure: that could be a proxy that filters things, that's how you monitor things, that's a centralized way of getting feedback. So you see, those are similar to a cloud catalog of services. We're going to provide all the teams with an AI services catalog and say, if you need this, then we are working the same [00:11:00] way: this is the paved road, right? This is the standard way.
Guy Podjarny: Sounds like familiar terms now.
Patrick Debois: Yes. And then if you want to deviate from it, that's fine, but at your cost. So in that way, I see the scale out. And this is what I see companies like Cisco or Adobe doing: they're taking the same pattern of one team enabling the other teams, scaling this out, and then absorbing the central pieces into your platform. And it's not just about the infrastructure. It is the enablement on the infrastructure. And then it's the governance that you put on that centralized piece. Because that's another problem that you'd otherwise have to take care of in each of the teams if you're using this.
I don't want you to use these kinds of models. Yeah. How do I not force, but at least make sure that you know about this, and put that in guidelines.
Guy Podjarny: Yeah, makes perfect sense to me. So if I go back a little bit here, you start with the digital transformation team, or whatever, some specific group that is specifically enabling something that now has heavy [00:12:00] use of LLMs, right?
Maybe it didn't even exist before, right? And now you have it. The rest of the company continues to operate as they had before. Maybe they have some interaction with this group, but generally it's an addition, it's a specific product. And so you don't need shared infrastructure because it's just this group. When a second, a third group forms and things like that, now you start needing, okay, I've got some shared infrastructure.
Okay, I've got some shared infrastructure. And so that might require that sort of platform team starting to have governance capabilities, starting to have some uniformity as you expand into some future. I guess that was true for cloud and it's debatable for AI or LLMs. Do we get ourselves to a place in which everything is LLM powered?
You can make that case or not. I think in cloud, it felt obvious at some point: everybody needs a server to run on, and so we'll run on the cloud and operate in that fashion. With AI it may be a bit more debatable, but the more you expand it, the more you need to have that enablement. And I guess you get into shared responsibility mode as well.
Now it's okay, someone else is running these LLMs, maybe even constraining you with that paved road, or really making your life quite difficult if you deviate from it. [00:13:00] So maybe I chose a specific partner; enterprises do this today, right? I've chosen one of the specific foundation models to partner with, and so you're constrained, you have to use that. But then within that sandbox, or within those boundaries, the application teams are the ones that
Patrick Debois: And it's not just about running your own LLMs; even if they're all SaaS versions of the LLM,
Guy Podjarny: Yeah,
Patrick Debois: you still have to say we're going to put the right check marks for the policies in place. We're going to make sure that we have volume discounts, much like we had to deal with CDN costs. So those are all things. And again, one of the important services people keep forgetting is when you put these things out.
Yes, governance, guardrails, but gathering feedback from the end users to make it better, that's also a component you do not want each of the teams to do on their own. And tracing, observability, all those kinds of things fit in that group. Again, it's not about having your own infrastructure, but it's like managing that kind of suite of [00:14:00] services and offering them as a kind of standard package.
Guy Podjarny: So maybe let's talk a little bit about, within the organization, the culture or different aptitudes that you need amongst the people doing this work. Cause in DevOps, that was significant, right? Part of it was a technical enablement, or maybe the skills of knowing how to work with a cloud.
But then there were all these other things, for instance, probably the most well known is the pace of delivery, right? The sort of continuous deployment and the notion of a much more dynamic software development process, which required different people. In DevOps indeed, there were people that needed to become coders, or be a little bit more comfortable with code, coming from being sysadmins.
I guess when you think about AI, you think about those teams, maybe put aside the actual concrete data scientists, think about the software engineers that are working with these systems. Do you see a similar culture shift required? What are the different skills or aptitudes needed?
Patrick Debois: I think what I've hit most of the time in discussions is that these LLMs are non-deterministic, so therefore we cannot trust them.
Therefore we cannot use them. But mostly it is about [00:15:00] people not understanding how you test this more reliably. Again, I'm not saying it's perfect, but it's dealing with uncertainty. That's a shift that people have to go through. And once you explain to them that there are certain ways to do this, that limit this, then you've got them over, and it becomes more about engineering practices again.
Otherwise it's more oh yeah, it looks good to me, but hey, I don't know how I should do this. The second thing that usually I see companies do is while getting their engineering teams excited on AI and the new technology, they also get them excited about the new set of tools that help them in the new job, because that kind of, if they see the power in their day to day job, they're also likely to get more excited on using this because they want to understand how this works and go from there.
Guy Podjarny: If they're excited by Copilot and whatever, seeing how that makes them more productive, they're more excited about adding a Copilot-like capability, for example. Yeah.
Patrick Debois: But if they don't see the [00:16:00] value there, they're going to be quite negative about using the tooling and bringing it into their kind of product as well.
Guy Podjarny: So I found that I very much relate to and experience the joy, if I were to air-quote it, of the lack of predictability of the LLMs and not knowing; it's very hard to evaluate them. Maybe we'll dig into that a bit more. Des Traynor, the founder of Intercom, said in an early episode here on the podcast that you basically need to make peace with the fact that you don't know if your product works.
And one of the thoughts that I had was that it's actually quite analogous; it's another step change in the same direction that we had when we started shipping software more continuously, right? When you went from these sort of quarterly or annual releases that were thoroughly tested, where you felt like your confidence level, that what we said we were doing was exactly what we were doing, was very high.
So places that really celebrate many releases in a given day, and observability, and knowing whether your product works or not by observing it in production. Yes, you add tests, you don't just blindly ship the software, but you embrace this notion that the tests [00:17:00] may fail, or rather that the tests may pass and the product may fail in production, and you're very well instrumented to identify that it happened, to roll back, and to learn from it, or to do it gradually. And it feels a little bit like maybe LLMs take it to another level.
Like I really have no idea, but all these concepts of canary releases, gradual rollouts, observability, the ability to roll back, all those are the same tools, right? Is it fair to think of it as another order of magnitude, or another step in the same direction, in terms of what your ops and software delivery practices should be?
Patrick Debois: I think in ops, we've always dealt with chaos, because we all understand that in theory everything should be great for the customer, but in reality, we look at our log files and it's not, right? So in that way, yes, the observability is one step in the right direction. First there's monitoring, and then there's observability: it did what in production? Let me check that for you.
So you [00:18:00] see a lot of people put that in production to see what the customers are experiencing. Then, I guess we learned that in the cloud architectures, we were no longer architecting for that one machine that could not fail. We architected and assumed failure, right?
And I'm seeing this kind of thing happening also here. If the LLM might not give the right answer, what are we doing to control this? We're not going to put this in to decide whether you get a raise or not. You could, it's probably not the best use, but it's understanding, for each use case, whether failure is acceptable,
yes or no. We had one interesting example where we actually used the UX to our advantage. So LLMs can generate text, and let's say a year ago, they were not always that fast, right? So we used that to our advantage, so that the customers and the users knew it was being generated by AI. So they had to bring more attention to verifying this.
We had icons with [00:19:00] a sparkle on there that meant it's AI. So again, this is not technical, but these are cues, like warnings, telling people this is not perfect. It doesn't mean it's not helpful, and they can just reload and retry and regenerate, and that's fine. But sometimes you see it's not always the technical side that needs to be perfect.
You architect around this and kind of deal with the imperfection.
Guy Podjarny: Yeah, that's a good point. And it's actually maybe an advantage that AI gives us that we didn't have in the cloud, because in the cloud the customers generally didn't know and didn't care whether you're hosted on the cloud or your own data center; their expectations of uptime and performance are the same either way. And with AI, there's, in certain use cases, a certain level of tolerance maybe that we're getting, because users are also willing to adapt their ways, which is a card or an opportunity that helps those building LLM-based products. Within this world though, it's convenient that there's a lot to learn from this path, from saying okay, you have to really be good [00:20:00] at this. And this analogy keeps coming up. I had a conversation with Armon from HashiCorp on a recent episode, and we were talking about how, if you want to do self-healing and all sorts of crazy things in your ops stack, then actually the way to be best equipped to do that is to already be very good at operations and be accumulating.
If you've already accumulated and instrumented all your system, you're committed to this data, observability data, iteration data, all of that about your system, you can probably better inform the LLM-based tool, the LLM-powered tool, that is trying to do self-healing, and therefore you can do self-healing better.
So there's like a compounding effect. Basically, the haves and the have-nots are growing further apart in that operations sense. And I guess here it's another muscle in the same vein, because you get better. But I do think some aspects of it, like the evaluation, get extreme, right?
When you ship a new product, for instance. Even when you're very opsy, very dynamic, for iterations, for improvements, you can work [00:21:00] in this type of way: incremental, roll it out, do it in small bits, evolve it. When you ship a new capability, you need to have a certain level of confidence that it is above a bar to be able to ship it.
And I guess that's proven very hard for some capabilities in the LLM world. What is the view that you've formed around evaluation, around this quality and the ability to assess it, in the world of LLMs?
Patrick Debois: We made some great strides there. The evaluations in the beginning were more when you were building your own LLMs, and that was built for that.
And obviously they have a lot of money and they can spend a lot of time building the perfect data set. But for your use case, because that's usually what it boils down to: I build something for my use case, and it's not like there's a standard test set available for it. So yes, you have a domain expert creating questions, but now we actually have LLMs assisting us to write better questions and to ease the generation of those evaluations. So that's one way, that's the [00:22:00] synthetic way of creating those evaluations together with the domain experts. And then the third source is actually when it went wrong in production: feed those examples back into your test evaluation set. So that kind of improves it, and that's why that return aspect is so important, so that you notice the differences. Is it perfect? By no means, but changing one comma in a prompt will have different results. So that's why we've learned that we need to have that test set.
It doesn't need to be a million things you test, but at least the most common cases that give you that impact feel. If you change the model, likely your test results are going to be different, and then you need to tune your prompt a little bit. There is, on the other hand, also something that's going to help us: tools like DSPy that actually rewrite or refactor prompts based on the model we're using.
So they're getting more clever: much [00:23:00] like we do refactoring of code, they're refactoring the prompt, because they know, for this model, you probably want to phrase it like this. So we get a little bit of help, almost like a transpiler of prompts in there. Again, not perfect, but we've come a long way from doing the manual thing to having it more assisted. And then you get into the feedback loop, very similar to your code, where you say, okay, this evaluation, no, this is not good, okay, I'll prompt you with a better one. So that's the test battery that you're doing. I think what we learned, and it's very important, is that testing is often seen as something the developer does, but we learned that actually writing those tests and the domain questions is something that needs to be moved to the product owner. And they don't do this in code, they don't do this in prompts, but they need a system that actually allows them to do this quite easily, see the impacts of the changes, and do the validation.
So there's a little bit of a shift away from "testing is for the engineers". Although we already had that discussion with business tests and so on. But [00:24:00] this moved up to actually the product owners again.
Guy Podjarny: And to an extent, that's a shift further left element because you're going further upstream to the requirements.
But it's interesting. It sounds like you're describing, on one hand, very much an engineering action of basically better instrumentation of your testing process. We need to learn from failures. I think in the past, the tests were binary. If they passed, great. If they didn't pass, you only used the failures momentarily to get them to pass, but you didn't actually try to store that data or do anything iterative. And now it feels like maybe we should think about testing processes and workflows somewhat in a similar way to how we think about operations and systems. We need to capture and learn from the failures as well, so we can optimize and tune our systems.
And that's very much a technical piece. So that piece maybe moves from ops to dev: hey dev, you need to learn a little bit from your ops counterparts to do those. And then further left, on the product side, you have to define what good looks like, what it means that the product works, what it does and what it doesn't, [00:25:00] because that's going to have to be codified.
Patrick Debois: But much like we reuse the testing from BDD to actually test the monitoring in production.
Guy Podjarny: Yeah,
Patrick Debois: This is the same thing here, right? You write your evals, but then you have a specific set of evals that you're basically running all the time as a health check, because maybe the model changed something on either side, or something you don't know about, but you will know there's a difference.
And it's like, you don't run your full test suite, but at least the more strategic tests there.
Guy Podjarny: I agree. But I think in general, the key takeaway here is the notion of evaluation, the notion of knowing what is correct and incorrect in your product has really jumped up in importance because you have to assess these unpredictable systems and see what they are, what they're doing.
It's an interesting, it's like an unpleasant thought, to think about whether laggards might actually have a little bit of an advantage.
Patrick Debois: I think, putting some nuance to that again: every time there's a new technology, people say that. I think that's only valid because [00:26:00] the first part of the problem is making it work, and then you progress into more maturity.
Okay, we need to do things more repeatably, and then we have to prepare for failure and so on. I think that's only normal. I ask that question at meetups: you've been live now for five months, and how do you test? And then you hear crickets, right? Because they're still doing, oh, it looks good to me.
And if you can do it, fine, but it's definitely not something you can do at scale, TDD or not. It's very much still...
Guy Podjarny: Yeah, that's a good point. And I guess what we said before, how users give you some slack. That's probably temporary as well. Eventually users are, oh, you want me to use this tool?
It should work. It's your problem. It's unpredictable. Just make it work.
Patrick Debois: It all looks very easy at the beginning, but getting it robust...
Guy Podjarny: It is something that you work with, and you have to really change your strategy. It feels like a craft in the making. It's not entirely unique, because [00:27:00] we had these untrusted or unreliable pieces in our software development before, and so it's hard to define your software methodology, right?
Patrick Debois: So if I look at how DevOps as a profession has matured, the first 20 percent was infrastructure as code. Eventually we rearchitected the stuff towards resilience engineering, because failure, we already touched on that.
Guy Podjarny: Yeah, but we were assuming these systems would break and we're not sure why that is.
Patrick Debois: The dream is that the agents will take care of all their jobs, but the reality is they're not going to be perfect. That's inherent in the technology. It's not perfect.
So you can actually replay the agent to a point, and then you can go from there. Sometimes it generates multiple options, so you get to choose. People are starting to figure out it's not just the initial process dealing with the extra costs.
Guy Podjarny: So maybe sometimes it's hard, even a term like AI engineer: are you talking about a software developer working with AI, or someone who's building applications, or are we talking about an AI [00:28:00] agent that is writing code for you? But as you point out, it's just that things are moving so quickly; it's not something that's inherent in this domain.
The terms will be defined; even DevOps never fully settled.
And I guess what we need to do, and we're doing it right here, is sharing the learning, sharing the perspective. We may not be on many different stages at the moment, but at least many people are hearing it.
I think as long as we continue doing that, we'll try to consolidate where it's relevant and just keep the sharing going. Thanks, Patrick, for sharing all these great views. I think there are a lot of learnings, and I recommend everyone follow Patrick on LinkedIn and his social networks.
He has all these super interesting experiments.
It's worth doing, because he's been doing all sorts of things, interesting hands-on work on what it actually feels like to code with voice, right? And to automate Cursor. We don't have time to cover all of those, unfortunately, over here, but I recommend you check them out.
So thanks again, Patrick, for coming on. And thanks for [00:29:00] everyone for tuning in.
Thanks for tuning in. Join us next time on the AI Native Dev brought to you by Tessl
Podcast theme music by Transistor.fm.