

Armon Dadgar, HashiCorp co-founder, on AI Native DevOps: Can AI shape the future of autonomous DevOps workloads?

In this episode, Guy Podjarny and Armon Dadgar, Co-founder of HashiCorp, explore the transformative potential of AI in DevOps and modern infrastructure management. Gain insights from an industry leader on how AI is set to revolutionize infrastructure processes and what this means for the future of DevOps.


Episode Description

Join Guy Podjarny as he sits down with Armon Dadgar, Co-founder of HashiCorp, in this insightful episode of "AI Native Dev." Armon shares his expertise on the evolving role of AI in modern infrastructure management, discussing the life cycle of infrastructure, the tools involved, and the potential for AI to automate and streamline these processes. Through this conversation, Armon provides a comprehensive look at the future of AI integration in DevOps, detailing challenges, opportunities, and the skills necessary to thrive in this rapidly changing landscape.

Chapters

00:00 Introduction

01:35 Modern Cloud Infrastructure

03:33 Observability and Monitoring Tools

04:47 Context-Free AI

06:45 Understanding LLM-Generated Code

08:30 Security of AI-Generated Code

13:26 Limits of Context-Free AI-Generated Code

16:36 How to Rank Which Code Is Better?

18:57 Future of AI-Generated Code

24:02 Use Cases of Fine-Tuning Models

31:11 Autonomous DevOps in an AI Native Future

39:15 What Does It Mean for People Working in This Domain?

Full Script

Guy Podjarny: [00:00:00] It's not a hallucination, it's a fantasy. It's a fantasy of the user that the LLM can read your mind and know what it is that you wanted.

Armon Dadgar: And to me, I think context is the key between where we are today and where we want to go in the future. You might hire the world's best SRE, but when they join your organization, they don't know anything about your organization, so they don't know, do you run Windows in production, RHEL in production, Ubuntu in production?

I think a big missing piece here is the ability to, I'll call it, normalize or make legible what your infrastructure actually looks like.

Guy Podjarny: If you had enough data, then the LLMs could make sense out of that. But because the data is not public, there isn't like a massive body of data that they can come along and identify all the loose strands that connect different bits of data, and the data of one organization might not be enough to satisfy that.

You're listening to the AI Native Dev, brought to you by [00:01:00] Tessl.

Hello everyone. Welcome back to the AI Native Dev. Today, we're going to explore a brand new horizon, maybe, of AI dev tools and how they apply to modern software development, by going into the world of DevOps and modern infrastructure and such. Basically, few others can be as well qualified about that as Armon Dadgar, who is the co-founder of HashiCorp. Thanks, Armon, for coming here onto the show to talk to us about this.

Armon Dadgar: Yeah. Thanks so much, Guy, for having me. Looking forward to it.

Guy Podjarny: To start with, we're going to dig into this world of DevOps that you've elaborated and built up, having pioneered a lot of things with Terraform, especially when it comes to the cloud and how we think about setting up modern infra.

Before we dive into AI itself, maybe give us a bit of a lay of the land of how to think about managing or wrangling a modern cloud infrastructure today.

Armon Dadgar: The way I like to think about it is, if [00:02:00] we take a step back from the specifics of the tooling or the process, and really just think about what are the fundamental steps in the life cycle of managing infrastructure, I think that's important to understand because, ultimately that's the outcomes we care about, as opposed to the specifics of the tooling or process.

So when I look at that, I think about day zero, right? You sign up for an account with your favorite hyperscaler. You have to go set up a foundation, right? So you need a landing zone of your VPC or, VNets, your security groups, your whatever. So what's that landing zone actually look like? That's day zero.

Then you get to day one where I have this blank cloud environment. Now I need to deploy my applications into that. It's a day one concern. Now you get into day two where you say, Okay, now I have a bunch of existing apps running. Now I have, a patching action I have to do. There's a new Linux kernel version that comes out.

Okay, so there's a day two patch. Then I get to day three, and now I'm under, scaling constraints. So I need to scale up or scale down or whatever. Then you get to day four, you have a security incident, whatever it is, JDK vulnerability. I have to go do a bunch of, remediation on that.

And then ultimately you're going to get to the sort of day N of an application where [00:03:00] now you're going to decommission it, right? You have to take it out of service. So to me, that's how I like to think about it. There's this sort of life cycle. Now there's many techniques. You can say, I'm going to use tools like infrastructure as code, right?

Tools like Terraform, particularly designed for things like that day zero, day one provisioning and the day N decommissioning. You have config management tools, Chef, Puppet, Ansible. They tend to be more of day two, day three tools in terms of patching, remediation, config management. And then of course you have other techniques: you could go manually point and click, you could use the cloud provider APIs or SDK directly, whatever.

So there's a bunch of techniques, but those life cycle points are fundamental.
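To make the day-zero idea concrete, here is a minimal landing-zone sketch in Terraform, assuming AWS; the CIDR ranges and names are illustrative only, not anything discussed in the episode:

```hcl
# Day zero: a bare-bones landing zone (illustrative values throughout).
resource "aws_vpc" "landing_zone" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.landing_zone.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_security_group" "baseline" {
  vpc_id = aws_vpc.landing_zone.id

  # Allow outbound traffic only; inbound rules get added per application on day one.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```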

Guy Podjarny: I think so. That makes a lot of sense in terms of the, sort of, configuration tools that get these changes applied. Where would you classify, in this world, the observability tools or the monitoring tools?

I guess they, they just layer on top of that. Do you feel like they fit that model?

Armon Dadgar: Yeah. I think they span the whole thing, right? So I think you take most observability and monitoring tools, you're probably going to configure them alongside the app that got deployed day one, right? So I'm deploying my app and I'm [00:04:00] putting APM and observability aligned with it, and then I'm going to operate that from day one through to day N, right?

Once I decommission the app, I'm probably not monitoring it. So I think observability tools will span that whole life cycle, and you're probably using tools like Terraform to actually deploy them at the same time as your application gets deployed.

Guy Podjarny: Yeah, I think that makes sense.

And in general, that's probably true for all sorts of other facets of it. Whatever cost controls you might have, whatever security posture management or other tools you might have beyond those.

Armon Dadgar: Yeah. It feels like Wiz or Palo Alto you'd probably deploy, again, same thing, like day zero, day one.

It's going to live until day N. And cost management, I think, tends to be more of a day two to day three concern, because until I'm running something, I don't really know how much it costs. So once I get into sort of that day two, day three, it's, hey, my cloud bill is 30 percent higher than I expected. Okay.

Now I'm going to bring in some FinOps tooling to figure out like, why is my spend higher than I thought?

Guy Podjarny: So I think that makes sense, and that's a useful starting point for us to talk about what we can then do with AI. So we've built up these different phases. I guess, maybe for starters, where do you think AI is kicking in today to help?

Where have you seen it?

Armon Dadgar: So [00:05:00] today, the way I describe it, I think is I'll introduce a new term because we'll come back to it a few times. There's this notion of almost context free AI. And what I mean by that is, I can go to Claude or GPT or Gemini And say, Hey, write me a Terraform script that will, I don't know, provision, let's say an S3 bucket and an EC2 VM.

And so what I mean by that being context free is it really doesn't need to know anything about my existing infrastructure. I can go to a generic foundation model basically and just tell it, hey, here's what I want, and it can spit out some reasonably good Terraform code, right? I'd say at this point, with pick-your-favorite modern model, it's going to be 90 percent pretty good in terms of being able to write that Terraform code for you. That's, I'll call it, a one-way translation: from, I have this high-level intent, to translating it into the implementation for me.

The flip side of it is they're also pretty good at going in the reverse. So I can give it an implementation and say, hey, what's the intent, right? Here's 20,000 lines of Terraform code. I don't want to read it. Summarize this for me.

Guy Podjarny: What do you mean you don't want to read it? That's everybody's favorite activity.

Armon Dadgar: Yeah. [00:06:00] And I think there's value there. Because, say, you're a new engineer. You just started at a company. You don't really know how it works. And so you're like, hey, I'm supposed to manage this app. As a starting point, I could sit there and read the 20,000 lines, or I could say, hey, give me a high-level summary of this thing, and I'll say, okay, this is deploying a Java app on an EC2 VM with an S3 bucket.

And so I think the reason the tools are very good at both those directions today is you really don't need any context beyond either what's captured in the user's intent to do the translation or what's captured in the Terraform code to do the summarization.

You don't really need to know anything else. And so I think for those relatively constrained use cases, there's a lot of value today, right? In that day zero, day one, it can give you a huge head start in, writing the initial code, cut down the time it actually takes, and similarly to summarize what's there.

It gives you an understanding a lot faster.
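As an illustration of what that context-free request tends to yield, here is a hedged sketch, assuming AWS and entirely hypothetical names; note how every detail that wasn't stated gets filled in with some default:

```hcl
# Roughly the shape of output for a prompt like
# "write me Terraform for an S3 bucket and an EC2 VM".
# Every unstated choice below is an assumption the model makes.
provider "aws" {
  region = "us-east-1" # region was never asked for
}

resource "aws_s3_bucket" "example" {
  bucket = "my-example-bucket" # name invented; public/private not addressed
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0" # some default image; OS unspecified
  instance_type = "t3.micro"              # size unspecified, so a small default
}
```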

Guy Podjarny: And in this context, infrastructure as code is just code. You don't really care if it was Java or Python; it learns it, understands it the same way. How do you feel, though? I think it's interesting to think about the distinctions between infrastructure as code and regular code.

And so a few thoughts about that. One is infrastructure as [00:07:00] code tends to be an extreme example in which you can write things in very few lines in Terraform. The complexity isn't in how many lines, it's just in understanding the implications of what you've just written. It's, how did you configure the VPC and all that.

So it's not about complexity, but there's this whole world of data or understanding. It's a very flammable environment, right? Or a combustible environment. You can pretty easily blow yourself up, or your environment. How have you seen the level of knowledge of LLMs in this domain, and in general, have there been tools that you've seen around that maybe help you grasp what this seemingly naive code that the LLM has generated for you actually does?

Armon Dadgar: But I think what's interesting and what we've seen is I think one of the things that's nice about, in some sense, the conciseness of something like Terraform code and the very high declarative nature of it means that there's really not a lot of imperative logic that the LLMs have to understand.

And what I mean by that is if you give an LLM a very complicated function, with lots of branching, lots of different sort of [00:08:00] conditionals unless you really know what the execution flow is and what the data path is, it's hard to actually know what the result is, right? Because it's so conditional and branching.

Terraform code tends not to look like that, because it's very flat, it's very declarative, it's very limited in terms of how much branching you end up having. And so in that sense, what's nice about it is that flatness actually makes it a lot easier, I think, for the LLMs to understand, because they don't really need to reason about data flow through the modules, or through the execution path. And so in that sense, I think they do a better job potentially.

Guy Podjarny: Yeah. And we see that, I think, in terms of generating the code that you intended. There are many examples also in tools that are dedicated to trying to make the LLMs work better, in which they represent the requirement in a sometimes even pseudo-code or somehow simplified, more meta language. And Terraform is not far from pseudo-code, but it is a meta language. It's a language that sits above APIs, and so being simpler makes it more likely to work. There are basically slightly fewer options, and maybe that's enough.

But what about the implications of that [00:09:00] and highlighting those to the user? If I'm writing Java or I'm writing Python or writing JavaScript, then first of all, again, I think it's slightly less combustible, but also I'm going to write some tests, I'm going to have some of these defense mechanisms.

If I'm setting up a VPC, if I'm creating infrastructure as code because I don't have expertise in that space, and I'm using the LLMs to generate that, and it creates infrastructure for me, how would I know that it's now open for anybody to access?

Armon Dadgar: To me, I think this highlights one of the biggest risks we see, which is the models really don't have a good sense of things like, what are the security implications of these things?

What is secure by default? What you can't expect from them is that they're going to generate things that are secure by default, necessarily. Like, you asked it for an S3 bucket. It's probably going to generate a public S3 bucket, which might be what you wanted, but might not be what you want.

And so I think that kind of nuance, to your point, that combustibility here, is a one-line difference in configuration: whether you said private is equal to true, or private is equal to false, or you omitted it and it defaulted to false. [00:10:00] So the difference between a bad thing happened and a good thing happened is not the number of characters being a lot, or the logic really being wrong. It's just that you missed a declaration of private equals true and it had unintended consequences. And I don't think the models today really have an appropriate semantic understanding of the sort of security boundary and security implication of some of those things. And so one config field is the same as another config field in terms of importance to them.

And I think what's happening is you just get a pattern match based on the input data: if it saw more S3 buckets that were public than were private, then that's what you're going to get. It's just gravitating towards what the training data was, which was, I saw more public than I saw private, so therefore I'm defaulting to public.

That's the nature of it. And so I think part of the answer to getting some of these things right, to actually trust some of the output, is that you actually need to invest in the right security guardrails on the other side of it, to say, okay, how am I going to do automated reasoning around that?

We talk a lot about things like policy as code, which is: okay, if I'm going to let my users write Terraform, or in this case generate Terraform with AI, I need to actually have a set of policy guardrails that I trust. That's the belt and suspenders. So I might put a check that says, okay, you're provisioning an S3 bucket.

It has to be marked private. And so you might've generated it, but then I have a belt-and-suspenders approach: hey, no, you just asked for a public S3 bucket. Try again. And that becomes a user in the loop now, who has to go and say, okay, either I have to change my statement to the LLM and say, make it a private S3 bucket, I need to be explicit, or I'm just going to modify the generated Terraform code and say private equals true.
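In HashiCorp's own stack this kind of guardrail would typically be a Sentinel or OPA policy evaluated in the platform; as a rough in-repo sketch of the same idea, assuming AWS and Terraform 1.5 or later, where the transcript's shorthand "private equals true" maps to a public access block rather than a literal attribute:

```hcl
# Guardrail sketch: force the bucket to block public access, then assert it.
resource "aws_s3_bucket" "billing" {
  bucket = "example-billing-data" # hypothetical name
}

resource "aws_s3_bucket_public_access_block" "billing" {
  bucket                  = aws_s3_bucket.billing.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# A native Terraform check block: flags generated code that drops the guardrail.
check "billing_bucket_not_public" {
  assert {
    condition     = aws_s3_bucket_public_access_block.billing.restrict_public_buckets
    error_message = "The billing S3 bucket must block public access."
  }
}
```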

Guy Podjarny: I really like how this is a bit of an extreme example of something that is happening in code that is not infrastructure as code as well. I just had Caleb Sima on the podcast, and we had a bit of a conversation about is generated code secure and it's hard to make any sort of strong statements about that, in part, because it's not one thing.

And you should go check out the episode if you want to hear everything we said about it. But what's interesting is, when you take these questions and you take them to an extreme, it's easier to get a sense for it. And so in infrastructure as code, I love that example around the S3 bucket, which is: [00:12:00] it's not right or wrong to create that S3 bucket public or private, it's just about being contextual. We'll get to that, but it's also a decision, and do you understand the implications of it? And it's also easy to not notice the omitted line, or to not notice whether it's true or false. And Terraform is like an extreme version of it.

The same types of omission exist in any other programming language that you create, but they're just less evident, less blatant maybe. And in all of those cases, the answer is: code generation is a part of your stack and all of your software elements. But if you just assume that code that gets generated is magically fit for purpose, especially when you didn't even tell it, create a secure bucket or a private bucket, you just said, create me an S3 bucket, then you have to have other means of validating, like policy as code, what is right and what is wrong.

Armon Dadgar: I think one of the interesting things is sometimes people will ask me and say isn't that just because, whatever we didn't write, if it had generated unit tests, for example, for the Terraform, that would have been caught.

And my view is no, it's [00:13:00] fundamentally unknowable, because what I don't know is: maybe you're making a marketing site and you're hosting public assets like images, and it should be a public bucket. Or maybe this is hosting customer data and it should be private. But the problem is the input was fundamentally underspecified.

So there's no way to know. This is not a hallucination. It's just you fundamentally underspecified your input. And so no amount of testing can tell me whether it was right or wrong, because I don't know. I don't know what you want.

Guy Podjarny: I had a conversation with Simon where we were saying, it's not a hallucination.

It's a fantasy. It's a fantasy of the user that the LLM can read your mind and know what it is that you wanted. Clearly at Tessl, we talk about spec-centric development, we talk about creating a specification and not just a prompt. But there's this good and bad thing about specs in the LLM era, which is: whatever you don't say, the LLM will fill in the gaps, and you need proper tooling to be able to highlight to you the decisions the LLM has made.

You still need to pay attention. You still need, potentially, policy as code to decide whether it [00:14:00] conforms to your company's requirements or your own requirements, whatever, whoever sets the rules. You need some form of visibility, because like it or not, the LLM is going to fill in the gaps, and it's not going to read your mind while doing it.

Armon Dadgar: And I think there's this interesting tension, too, between sort of underspecifying and letting the LLM fill things in, and, when taken to the limit, if you specify everything, you're effectively writing the Terraform code. So another example I gave here is, I said I want an EC2 VM. What OS is that EC2 VM running?

Is it Windows? Is it Red Hat Linux? Is it Debian? Is it Ubuntu? I didn't specify, but something's going to get filled in. The EC2 VM is going to be running something. So you will end up with an operating system. Now you might realize it wasn't the operating system you wanted. And so now you're going to come back and be like, no, give me an EC2 VM running Red Hat Enterprise Linux with this particular version.

And so you can start to see, then there was a whole bunch of other things you didn't specify, like the size of the VM, the region of the VM, the availability of the VM. And the more and more you start specifying, at some point your [00:15:00] declarative input starts to converge on effectively you're writing the Terraform code. So there's this interesting balance here of how much you can leave unspecified, particularly in the world of infrastructure where details matter, right? If I'm asking you to write me Java code to sort a list, as long as the list is sorted, do I really care? Not really. You might when your N gets large enough and it turns out it was bubble sort. But I think in infrastructure in particular, the details actually do tend to matter. And so I think this is where you start to run into the limits of context-free generation.
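To illustrate that convergence, a hedged sketch of where the back-and-forth tends to end up, with hypothetical values standing in for each detail the user was eventually forced to pin down:

```hcl
# Once OS, version, size, placement, and storage are all specified,
# the "high-level intent" is essentially the Terraform resource itself.
resource "aws_instance" "app" {
  ami               = "ami-0123456789abcdef0" # a specific RHEL image ID (placeholder)
  instance_type     = "m5.large"              # size, now explicit
  availability_zone = "us-east-1a"            # placement, now explicit

  root_block_device {
    volume_size = 50 # disk size, now explicit
  }

  tags = {
    Name = "billing-app" # hypothetical workload name
  }
}
```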

Guy Podjarny: The other challenge, maybe, in the world of infrastructure as code is around the body of knowledge to learn from. Infrastructure as code holds vast amounts of insight and guidance, and you can see these things, but you really don't see implications when you just look at the code; it's what it did.

It's not necessarily what was right, or what happens to it. And in code, arguably, because you can run that code without operating it and run the tests, there's been a lot of reliance by the LLM companies on using that data to train their systems, to separate good code from bad, or [00:16:00] whatever, somehow prioritize it so that it's properly encoded: say, prefer to generate code in this fashion. In infrastructure as code, or really in anything sort of DevOps and modern infra, I don't think there is such a body of knowledge that exists as public information: around operability, around what happened, around, it failed to deploy a thing and how was that remediated? What were the cost characteristics? What were the performance characteristics, security characteristics of something? How do you think about that? Do you think there is a body of knowledge like that?

Do you think, is that a maximum limit that we basically have to hit as an industry? Or is there a way to solve this?

Armon Dadgar: So it's interesting. I think there's a bunch of different competing factors here, right? On one hand, I think what's valuable is that there's a pretty large community, when we think around Terraform, of people writing high-quality modules and sharing them publicly, right?

So in particular, the hyperscalers themselves will publish modules on how to do landing zones, how to do common patterns, things like that. So these are high-quality modules being written and [00:17:00] curated by the cloud providers and shared with their customers through things like our public registry and GitHub,

so you have a bunch of these high quality curated examples that are in the public domain. At the same time, I think you have things like the Terraform registry, where you have tens of thousands of community maintained modules. And you have reasonable signals of like how much they're useful and work based on things like downloads, version history, how often they're updated.

So you can look at that and say, okay, I'm going to look at the subset of things that are frequently updated, relatively recent, over a certain threshold of downloads that signal they're a reasonable quality bar. And so you can subset the training to a set of things that are known better, which is still a large body, right?

Tens of thousands of examples.
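For a sense of what those curated, widely used registry modules look like to consume, here is a brief sketch; the module shown is one popular community VPC module, with inputs abbreviated and values chosen purely for illustration:

```hcl
# Consuming a curated public module from the Terraform registry.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # pin a recent major version

  name = "example"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}
```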

Guy Podjarny: Yeah. It's sorting of the code. Like it's still code. You don't have further insight, but you're relying on community insight to help you rank which code should carry more weight.

Armon Dadgar: Correct. And then I think even internally, because of things like Terraform cloud, we actually have more sort of operational insight into, Hey, we know which Terraform runs fail and succeed, for example, and we know what code bases and what module versions [00:18:00] that correlates with.

So you can say, okay, I know this one, failed 10 times in a row, probably not a great example to learn from versus, this other code base, has a very high success rate. So those are higher quality in some sense, training data. So you can take some of that signal and use that to look and say, okay, I have some inputs that I believe to be higher quality in terms of what I want to train on. Yeah. So I think you have a few of these different things that I think helps pull that together, but you're right. I think when you think about the universe of, maybe Java code that's out there, it's gonna be a lot bigger than infrastructure as code.

The flip side is because infrastructure as code is I think a lot simpler, more constrained, doesn't have complex branching and data paths. I think you actually don't need probably as much training data, right? And I think, similarly, because the documentation is so rich, it also is a pretty valuable form of training data, versus documentation for a programming language has to be very low level kind of function oriented versus infrastructure as code tends to be resource oriented.

So the documentation is actually a much richer input source than it would be for a program language.

Guy Podjarny: Yeah. And I tend to agree with that. I feel [00:19:00] infrastructure as code is an easier thing for LLMs to generate, the catch being: it will successfully generate working Terraform code, but is it creating infrastructure that is correct? That's a harder question, because in Java it's easier to run the code and see if it passed the tests. In infrastructure as code, it's doable. You could set up cloud environments and have those runs, and, for all I know, maybe a bunch of these LLM providers are doing some of that, but it's a lot more elaborate and expensive.

And I'm curious, because the cloud providers themselves are so deep into the LLM foundation model providers as well, how much are they able to use, or are using, the data that they observe on their own platforms to guide it? On one hand, I'm sure organizations will be plenty annoyed if they do that. On the other hand, they're in a position to see that. I guess that's true, I'm not going to challenge you on that front, but I guess that's also true with your platform and Terraform Cloud.

So interesting to see where it goes. Let's talk about that sort of step two. [00:20:00] Context-free IaC generation, summarization or sort of reading from it, is very useful, with a bunch of caveats: you still need to maintain your sort of threshold or policy as code for responsible use of it.

Probably summarization can be very good, because it has all the advantages and none of the disadvantages we discussed. Where do you see this headed? What do you think is the next step?

Armon Dadgar: Yes, I think we alluded to it right here a few times with this discussion on context. And to me, I think context is the key between where we are today and where we want to go in the future, which is, the example I would use and make this concept a little bit maybe more crisp for people is like, You might hire the world's best SRE who knows Terraform and cloud inside and out.

But when they join your organization, they don't know anything about your organization. So they don't know, do you run Windows in production? RHEL in production? Ubuntu in production? Now, they might be power users at all of those things, but the context of knowing, Hey, as an organization, we've standardized on RHEL.

That's a critical piece of context you need to be a [00:21:00] useful SRE at the org, right? You don't want to hire someone where the rest of my stack is Windows and all of a sudden this person's writing Linux for me; it doesn't help me, right? They need to understand, how does the organization work?

What have we standardized on? What does my existing infrastructure look like, for them to generate things that are then contextually relevant to my environment? And that context, if you think about it as that SRE who knows about my environment or doesn't know about my environment, extends to every point in the life cycle.

So even if I go all the way to that day zero example I gave, if I say, just give me the S3 bucket and the EC2 VM: if the SRE had the context to say, okay, what type of app is this? Oh, it's your production billing app. Okay. There are a few implications to that: production billing app probably implies it should not be public.

So I shouldn't have to specify whether that S3 bucket is public. And if I've standardized on RHEL, then you're not going to ask me what the operating system is, because I've already standardized on RHEL. In some sense, the word assumption is an important one here. It's, what are the set of assumptions the LLM is going to be making that you want to make explicit because you have context? And that's going to be everything from production environments, [00:22:00] regulatory regimes, like the billing app might need to be PCI compliant, to operating systems that I'm running, to what regions: I'm always in us-east-1, or I'm always in whatever region, right? So what are those things that I need to have contextual awareness of that are just going to let me be much more effective in my day zero and day one, when I'm doing that sort of generation?

Guy Podjarny: It makes very natural sense to want to do those. How much are you envisioning this being more sit down and have an interview with the chief architect to understand this, versus auto-discovery? And I guess this is a gut-feel type question, but what level of granularity matters here? Is it really just, hey, what are your primary stacks and all that, and the best practices, really, amidst 99 percent of those that use it and know what they're doing, is to do these things? Versus, you're really into the devil's in the details: you need to understand this organization's micro choices, these interactions between these applications and how they work, to achieve something that is comparable to one of your good DevOps engineers.

Armon Dadgar: I think for most of these things, it's an [00:23:00] 80/20 rule.

I think 80 percent of the value is in codifying 20 percent of the assumptions. What I mean by that is, for most people, it's cloud provider, it's region, it's operating system, it's, of course, when you say, give me a database, okay, are we a MySQL shop or a Postgres shop, right? So I think it's probably maybe a dozen key variables, when you think about the core stack, that if you say, okay, these dozen things I want to be well aware of, that probably is 80 percent of the value.

Sort of the surface area of what you care about, right? That last 20 percent probably starts getting super nuanced and specific to each application. So you might say, okay, this is a high frequency trading app. Okay. That has a whole bunch of different, very specific implications because of the nature of that workload.

But that workload is unique. It probably doesn't represent most of your workload. Most of my workload is a three-tier Java application, so it's fine if you fill in some of the blanks and it doesn't matter. But maybe for high-frequency trading I really care about performance, latency, sizing, OS; a bunch of those variables will matter, [00:24:00] but it will be very workload specific.
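One hedged sketch of what codifying that dozen or so core assumptions could look like in Terraform; the variable names and values are hypothetical, purely to make the 80/20 idea concrete:

```hcl
# Hypothetical "org context" that generation tooling or module defaults could consume.
locals {
  org_defaults = {
    cloud            = "aws"
    region           = "us-east-1"
    operating_system = "rhel9"     # the org has standardized on RHEL
    database_engine  = "postgres"  # a Postgres shop, not MySQL
    environments     = ["dev", "staging", "prod"]
  }
}
```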

Guy Podjarny: And have you seen anyone try to take this all the way? To go and say, I will inspect all the observability data and the Terraform state and whatever it is that I can glean out of the deployed environment that you have and how your application operates, and maybe historical data, and I will fine-tune a model. I'm making this up.

I'll fine-tune a local model, and therefore I will generate something perfect, very tuned, but also attuned to your goals: you're looking to reduce costs, so I will make the choices around that. I think that sounds directionally like something that a startup might do. Have you seen companies or big incumbents take a crack at doing things of that nature?

Armon Dadgar: Not quite yet. And I think maybe this gets into the second challenge when we start talking about context, which is, I think a big missing piece here is the ability to, I'll call it, normalize or make legible what your infrastructure actually [00:25:00] looks like. And what I mean by that is, if you think about code, great, it's a flat text file sitting in version control.

That's all I need to know about it. It's all there. If you think about infrastructure, it lives, if you will, in some sense, in the real world, the cloud world, however you want to think about it. And so there is this sort of running thing that's out there that has all sorts of ephemeral attributes, has emergent properties, because it has accumulated over decades.

And how do you actually make that legible to the LLM, right? And I think a big piece of what's missing is the ability to inventory, if you will, all of that in a consistent way, and then, appropriately vectorize it, categorize it, whatever, so that then an LLM can reason about it in a structured way, right?

I think as humans, we're really good at dealing with the ambiguity of these environments and the complexity of them, but LLMs fundamentally need to work on something that's much more normalized. You can't tell them to go log into the AWS console and click around and figure out what production looks like.

Guy Podjarny: Yeah. In theory, you [00:26:00] could. I think people can do that, and we should talk a little bit about trust. I think the blast radius is there; the combustibility comes back to get in the way of that. I like the normalization question, but I guess, maybe to challenge that a little bit, it is still small in comparison to, like, parameters, sort of the billions of languages, I guess, in theory, and maybe that's the dissonance here.

If you had enough data, then the LLMs could make sense out of that. But because the data is not public, there isn't like a massive body of data that they can come along and identify all the loose strands that connect different bits of data, and the data of one organization might not be enough to satisfy that.

Armon Dadgar: I think it's actually worse than that. It's not even that the data is not public. It's that most organizations don't have it. And what I mean by that is, I'll take a very real example, right? OpenSSL vulnerabilities. We work with thousands of enterprises, and for the vast majority of them, I would say, when they have an issue like an OpenSSL vulnerability, you create a spreadsheet [00:27:00] and you go manually inventory all of your applications, and then you go manually figure out what's in production, and then you manually figure out what version of OpenSSL is in there, and then you go manually remediate that.

So if you think about it, that's the universe where most of these organizations are starting. What is the input to the LLM? There's just a bunch of stuff running in production. That's the starting point. Someone manually created that spreadsheet, but until you create that, there was no data for the LLM.

It's just, there's stuff. But it's there.

Guy Podjarny: It's an interesting, almost frustrating equation, a little bit. The data technically exists. It is in API-accessible systems. We have a technology competency that knows how to take fairly unstructured data and connect the dots. Ownership of that data, and ownership of the access to that data, because maybe they would do the legwork, is complicated. And without it, without getting vast quantities of it, you wouldn't be able to connect the dots. So you have to resort to this normalization, which in turn gets you back to this no-code land, though, where you have to [00:28:00] anticipate all the variables in something that really has too many variables to enumerate.

Armon Dadgar: So I don't think it's intractable. To me, the way you have to solve this is, in some sense, first of all, we have to pull that data and make it more queryable in a sane way, right? Meaning, okay, I can create sort of a representation that says, okay, here are all my resources that are running.

Here are all the packages that are running on those things. Here are the versions of the applications that are running. Here's the... I'm going to tag something as production, something as staging, something as development. Now I have metadata that says, okay, I understand applications. I understand environments.

I understand what resources are associated with these different things. In some sense, it's a knowledge graph. If I can walk that tree, okay, from production to application to these 50 VMs to the packages on those 50 VMs, I can start to normalize it, where now I have a representation of this infrastructure as opposed to it's just stuff running.

And with that representation, I can start to answer an open-ended set of questions, right? I've normalized, if you will, the "what does my infrastructure look like" from "it's just stuff that's running" into a knowledge graph that is [00:29:00] queryable and interactable by an LLM in a bit more of a structured way.

And so to me, I think that then starts to enable that foundation of context. And then, to your earlier point, of how much am I going to sit there and interview the user and ask them all? Maybe the first 10 variables you'll ask them, but the other variables you'll infer by mining the knowledge graph.

So even operating system, if I look at the knowledge graph and say, Hey, 90 percent of it is running RHEL. Okay. I can probably make an inference that we've standardized on RHEL. I don't even need to ask the user, but if I don't have that graph and it's just here's 50 VMs, I have no idea what's on those 50 VMs.

How do you make that inference, right? You have to know what's on the VM. And so, to me, once you have that, you can start to simplify the user experience. They don't have to enumerate all their assumptions; you can infer them. But also then the LLM can do much more advanced reasoning.

Oh yeah, application foo is impacted by OpenSSL. Because we can understand that the relationship between these things exists.
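A small, hedged sketch of one building block of that legibility, assuming AWS: tagging every resource at the provider level so that environment and application relationships exist as queryable metadata rather than tribal knowledge. The tag values are hypothetical:

```hcl
# Every resource created through this provider inherits these tags,
# which is one way the "knowledge graph" edges (environment, application,
# owner) become discoverable by tooling or an LLM later.
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      environment = "production"
      application = "billing"
      owner       = "payments-team"
    }
  }
}
```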

Guy Podjarny: Ironically, I think if you have anywhere near that type of information at your fingertips, you're already pretty good at managing your modern infra. Like you're probably [00:30:00] quite up there.

So this might imply that it's something that further extends the gap. It fits more into the making the ones that are good, great versus dragging along the ones that weren't as good, at least until such time that someone can create an auto magic thing that plugs into all these APIs and classifies them, which frankly, in security has been happening with the world of CSPM, DSPM, all of those, a lot of them rely at least from a security poster perspective, which is only one lens on it, but they do rely on automated discovery of your estate.

And classification and all of that, that are like roughly correct on humans, of course, edit them and tune them. So maybe that can have someone jump ahead or a sort of leap ahead from having all that inventory to having AI classification of that inventory on top of which they can have AI workflows.

Armon Dadgar: Yeah, I think there's a lot of different techniques there, but I think it's about pulling it together and then helping these organizations, frankly, cross a skills gap that a lot of them have [00:31:00] in getting there. Because you're right: if you're an organization that had all of this, you're easily in the top percentile, I would say, of organizations in terms of maturity.

So how do you bring the rest of that bell curve with you, I think, is the problem.

Guy Podjarny: And I guess maybe leaning a little bit into that further future, which is the AI native one, because that gave us a picture of today or tomorrow. But maybe five years from now, when we think about something that is AI native, I'm curious what your thoughts are around autonomous DevOps: cases in which you can actually have that patching process, or something else, be done automatically.

How far do you think we are from something like that? What's your crystal ball: do you think this is five years, or is this 20? But also, I guess, what would be deemed initial success around workflows where we can say, hey, we are autonomously doing this piece of modern infra handling?

Armon Dadgar: So to me, I think there are actually a few dimensions on which, if I'm looking at the five-plus-year horizon, I think we'll see evolution. One is [00:32:00] in terms of the specification level, and what I mean by that is, if I think about raw cloud APIs, SDKs, whatever.

I think about that as assembly language, right? Like, you can make a direct API call: it's create EC2 VM. That's a very low level to operate at. Then you had tools like Terraform, which are like, you moved up to Python, right? You just said, I want a VM. It's Terraform's job to figure out how to create it, manage it, manage the 20 sub-APIs under the hood.

The next level up is, I don't even want to really write the Python. I just want to write the specification instead: here's my Java app, it needs a MySQL database, and it needs an S3 bucket. Great, it should generate me my Python, which I can review and potentially modify, but I don't need to directly author it.

And then that's intermediating me from the assembly language. So I think on one dimension, we're going to move up the stack in that order. What that enables, though, is: if I say, hey, the way I specified my app day one, going back to the sort of life cycle, was Java app with MySQL with whatever.

Then now, when I start talking about the day two, day three, day four operational activities. To your point, how many of those can I now [00:33:00] automate away without breaking the contract you've given me? Your contract was this Java app with MySQL or whatever. Day two, I know, okay, there was a, I can watch the CVE list that comes from JDK, I know there's a JDK vulnerability that needs to be patched.

So you can look at it and say there's only so many operational workflows. A very key one is things like, patching and vulnerability management. So can I take something like that, which is a huge chore for most organizations, I can automatically subscribe to the feed of vulnerabilities or new version releases of Linux kernels or whatever.

And really the core should be: okay, I should be able to take the app, take the new version of the OS and JDK, deploy you into a lower environment like staging, and look at your observability. If nothing broke, your unit tests pass, and metrics look consistent plus or minus a margin, I might then tee it up and give it to the user and say, hey, I did the patch.

Here's the change in the code. I ran it in staging. Here's the delta in your testing and your metrics. Everything looks nominal, plus minus 1%. Do you want to push this button and merge it to production? And I think a system like that then gives you the user a lot of control and confidence [00:34:00] because you're not saying I'm going to go auto patch prod for you and maybe blow up your universe.

Guy Podjarny: Yeah. Eventually if you said yes, a hundred times and never had to say, no, you might say, you know what, just go ahead automatically, maybe under these conditions.

Armon Dadgar: Or maybe you segregate your tier one applications. Maybe my core banking app I don't, but, whatever, my mobile API, okay, whatever.

And so you might tier it, tier one, tier two, or whatever. And over time you build the confidence. But I think it's like, what's the amount of work, when you think about day two, of patching every app that exists in the world? How much of that can I automate? Probably 99 percent of it, if not a hundred percent of it.

And that is a tremendous amount of value add when you think about just one very specific thing, which is just patching and vulnerability management. Then you think about right-sizing. I don't think I've ever met a developer who right-sized any app, ever. And it's tedious. You're going to deploy the app.

Then you're going to go look at telemetry metrics. Then you're going to go try a different size out. It's just Who's going to do this, right? So there's a whole bunch of automated tools, but the whole process of saying, okay, I'm going to try it in an automated way. Maybe I think I should be at five VMs instead of 10.

So I'm going to incrementally delete VMs, monitor it, make sure [00:35:00] nothing breaks. It's just tedious and time consuming. But if the AI could just do it for me and say, Hey, I saw that you were underutilized. You're at 2 percent CPU utilization. I right size you from 10 VMs to five and I stayed within some guardrail of CPU utilization stays under 50%, do you care?

Probably not. That type of day three right-sizing and cost optimization you can probably have fully automated. So when I start looking at those day three, day four type actions, it's a lot of things like that, right? It's runbook automation: hey, you saw my logs are getting, my disk is getting full, based on my Datadog observability or whatever; just go run Ansible automatically and clear the log files, right?

You don't need to tell me, just go do it.

Guy Podjarny: It feels like a promising future, and it's interesting that it feels like a place in which a modern startup can tee themselves up to automate a bunch of these processes, more so than an enterprise that has a messy environment.

It's a little bit like the advent of infrastructure as code. If you do it from the beginning, you find yourself in a very different place. But once you've [00:36:00] allowed drift to happen, and not even drift, you've just done a bunch of things manually, it's actually very hard to wrangle it back. Which, by the way, is another piece that a bunch of AI DevOps startups are trying to do: I will read your state in production and I will generate infrastructure as code to match it.

Once again, needing to be manually verified to be correct. But I like the destination. It's clearly very aligned with what we do here at Tessl. And it boils down to this small set of things, which is: I need a specification that says what it is that I want to do, with a certain level of degrees of freedom around how much is within your remit to decide versus how much I want to dictate.

There is some generation ability for the actual code, which, as we discussed, in Terraform especially is even more doable than in other types of code. And then eventually, when deployed, key here is the ability to verify and to see that nothing broke, that it is okay. And I think that's true in many things operational.

But I think what's interesting in operations is that the best practices in [00:37:00] this world actually have handled this, not just by understanding all the variables and anticipating everything, but also through these canary releases of, hey, just apply this to some small fraction of your traffic and then monitor to see that nothing significantly deviated, and then go on and on.

And so a lot of this is around existing best practices at scale. And it's just hard to fit it in if you don't do them all, which sort of smells of a digital transformation type motion where you really have to go in. It's a little bit harder to dabble.

Armon Dadgar: I think that's 100 percent right.

The way you put it before is, the divide between the haves and have-nots gets wider. And I think for the enterprises that grew up on ITIL, sort of ticket-driven manual process, it makes it that much harder. Because, to your point, the ones that started DevOps native or infrastructure-as-code native, however you want to think about it, for them to leap into the sort of world that I'm describing isn't that hard, because the fundamental building blocks will be the same, right?

At the end of the day, the higher level specification will generate [00:38:00] infrastructure as code. The infrastructure as code can also help synthesize what the higher level intentions are. But if you're starting from a universe of no intentionality, basically, it was just a bunch of tickets that created a bunch of infrastructure, then, yeah, there's tools out there that will reverse engineer.

But what's always missing is intent, right? I can't discern what's an accidental property of your environment versus something that was an intentional property of your environment. Was the S3 bucket public because it was a mistake or you intended it to be public, right? There's almost no way, right?

You're trying to put Humpty Dumpty back together from a thousand pieces, and you don't know what Humpty ever looked like. And the problem in a lot of these big enterprises is nobody does, because this infrastructure has been running for 30 years and the people who provisioned it have retired.

Guy Podjarny: I think that's true. And again, the good news is AI can slightly help you with that as well. It can go there and auto-classify and learn a bunch of these things, but it's a process that has to happen. So we've painted a timeline over here, starting from the AI-generated infrastructure as code, or summarization of it, on to [00:39:00] hopefully a fairly imminent contextualization of that generation, which doesn't feel that far out, or at least some base level of it, and then evolving into maybe these sort of native autonomous workflows, which maybe do require having all the pieces in place to enact.

Let's talk a little bit about people, just to end: what does this all mean to someone working in this domain? If you were an SRE today versus an SRE in five years' time, and I don't think it's more than that, I think five years would already be substantial, what's your view around the difference in skills you might need?

And, if you're in that position right now, what would you go learn and invest in?

Armon Dadgar: I would say this is like a three-to-five-year timeline for a lot of the stuff we're talking about. It's not a 20-year timeline. It's interesting, because if you think about what infrastructure as code did for a lot of SREs, it sort of moved you out from, I'll call it, sitting at the receiving end of a ticketing queue and saying, hey, you're going to manually go orchestrate things.

To really, your job was to design and assess what the outcome was, and then to some extent translate that into things like infrastructure as [00:40:00] code, and build automation and pipelines and things like that around it. So what this really does is it's just a longer lever for those people, right?

I think those people are already thinking that's what their job is. If my job is design, assessment, optimization, I'm going to give you a much longer lever. If you think your job is the writing of Terraform code or the execution of infrastructure activities, then in many ways, some of this stuff will obviate that, right?

Because you'll move away from really, even today, we shouldn't be in a place where we're manually filing tickets and someone's updating a firewall rule by hand, right? A lot of that happens. We still see a ton of that in industry. But really, we already have the tools when you think about things like infrastructure as code that you shouldn't be doing that.

The next level up is should you even write the infrastructure as code, or should you just manage those set of intents and specifications? So I think from a skill set perspective, what's still relevant is you have to know the domain. You have to know what a firewall is. You have to know what a VPC is.

Guy Podjarny: And these are the implications that the LLM might not know as well. You have to know that a public bucket is less secure than a private bucket. Okay, maybe that's basic, but, yeah.

Armon Dadgar: But I think what's key is that it's the domain knowledge of the SRE that's valuable, [00:41:00] and then helping focus on an automation-driven approach of how I'm enabling the business to achieve its outcomes.

Because, at the end of the day, nobody cares about infrastructure as an end. Infrastructure is fundamentally a means. So if you put yourself in that role and say, okay, what I'm delivering is a means for my business to deliver its applications or to operate its business or whatever,

how do I do that most efficiently, through automation, through an upleveling of the specification, and apply a certain domain knowledge to make sure we're doing it in a way that's secure, compliant, cost-effective? Then this world becomes a much longer lever for you.

Guy Podjarny: We had a couple of conversations on the podcast around the evolving role of a software developer, of an app dev, in this, and talked about these sort of two routes: one in which they go more towards the product management side and understanding what the business needs, and the other going more towards the architect route and seeing things at a higher level. And it's possible that in the world of SREs they need to do both, because really a lot of the work converges. So they need to have the domain expertise, to look at things [00:42:00] architecturally to be able to make those trade-off decisions through their subject matter knowledge, but also to understand the motivations of the business and what is indeed the current initiative and trajectory. Which, I guess, I hope for many people is a positive transition, but it is yet another jump for those who really are sysadmins at their core.

It moves them further and further away from that.

Armon Dadgar: Correct. Yeah, I think that's fair. And I think that notion of when you look at the very best SREs, they really are some combination of a PM and an architect because that's what they're doing. And they're translating the business's needs and requirements into an architecture and then into a roadmap from the infrastructure perspective.

And you're right, that's what it'll become. I think in app dev there's a stronger distinction between a PM, which is more user- and customer-facing, versus an architect. In infrastructure, your users and customers are the internal apps, so the architect and PM were one and the same to a large extent.

And for the sysadmin folks, I think the nice thing in some sense is that some of these tools will actually be a lower learning curve, I would expect, than maybe the jump from sysadmin to infrastructure as code, which in some sense was a higher bar, because you have to actually learn a bunch of syntax and learn a new set of tools and workflows. Whereas, in some sense, going to the AI-driven world of it, it'll be a little bit more NLP driven.

But what there's no escape from is that your value is you still have to understand a much broader domain. If all I was doing was sysadmining and updating firewall rules, the domain that I understood was a very small domain. You're probably not going to want an architect that only knows about firewall rules; it's a bit too narrow.

So your scope necessarily grows. I'm giving you a much longer lever; in some sense, I expect you to be lifting a bigger rock.

Guy Podjarny: It also feels like the ability to be data driven, or to understand some of those broader implications, is important. Even a sense of some form of gut feeling around the types of mistakes that the LLM might make. Because those are really the two ways in which you don't get yourself in trouble when the LLM, or the AI, performs more of the actions. One is being able to observe that with data,

and so you probably need to sharpen that [00:44:00] skill even more than you do today. And two is maybe a little bit anticipatory: just like any experienced SRE will have this sort of tingling feeling of, oh, I think this is going to cause trouble, I think the system will not like this choice of allocation.

They would benefit from having some form of a gut feel for how the LLM might respond or act in those systems, like with any technology.

Armon Dadgar: And I think a lot of it will come down to process design. And what I mean by that is: if you think about, hey, I test everything by deploying to production,

okay, you're vulnerable to a whole class of mistakes if that's your process. Versus if you say, hey, I have this more regimented process: I send it to test, then I promote to stage, then I promote to prod, and we have data-driven validation in each of those environments, or maybe we're doing traffic shadowing into staging or whatever. There's a process design aspect of that which actually would shield you from a lot of those types of mistakes an LLM might make, because you're going to catch them at an earlier phase, versus: we YOLO deployed to prod, we caught it in prod. It's the same thing with a human developer writing code; there's some false positive rate of them writing errors or whatever, you're not going to [00:45:00] catch everything.

So that's why we have staging environments. So I think there's even more pressure on doing that in a bit more of a rigorous, disciplined way, where, I think, some organizations are really good at it and lots are fairly sloppy, right?

Guy Podjarny: Yeah, and I guess the positive sort of spin on that is that it's more about automated processes, and I think more of these steps can pass automatically, because the hope is that LLMs take away the grind, take away the grunt work that we have to do. But they do require us to understand what we want better and to define the methodology to achieve it.

Exactly. Armon, this was a great conversation. Thanks for the insights on where the world of infrastructure as code is going with respect to AI, and for sharing a bunch of these learnings.

Armon Dadgar: Yeah, my pleasure. Thanks so much. And looking forward to all the stuff you guys are doing to make some of this stuff a reality.

Guy Podjarny: We'll try to help a little bit with this domain as well. And thanks everybody for tuning in.

And I hope you join us for the next one. Thanks for tuning in. Join us next time on the AI Native Dev, brought to you [00:46:00] by Tessl.

Podcast theme music by Transistor.fm.