Transforming Software Testing with AI: A Chat with Itamar Friedman from Codium AI

In this episode of the AI Native Dev Podcast, Guy Podjarny chats with Itamar Friedman, the co-founder and CEO of Codium AI. Discover the transformative potential of AI in test generation and software development as Itamar shares his expertise and vision for the future.

Episode Description

In this episode of the AI Native Dev Podcast, host Guy Podjarny sits down with Itamar Friedman, the co-founder and CEO of Codium AI, a leading company in the AI test generation space. Itamar brings a wealth of experience from his diverse background, including his work in chip verification at Mellanox. Before founding Codium AI, Itamar held significant roles in various tech companies, showcasing his expertise in AI and software development.

The discussion delves into the intricacies of AI test generation, exploring how AI can enhance different types of testing, from unit and component testing to system and end-to-end testing. Itamar explains how AI can assist in test planning, generating and maintaining tests, and the distinct roles of functional and regression testing. He also addresses the challenges developers face in trusting AI-generated tests and outlines the path towards autonomous AI test generation. This episode is a must-listen for anyone interested in the future of AI in software testing and development.

Resources

Chapters

  1. [00:00:17] Introduction
    • Guy Podjarny introduces Itamar Friedman and Codium AI.
  2. [00:01:14] Understanding AI Test Generation
    • Itamar explains the landscape of AI test generation and its importance.
  3. [00:03:59] The Role of AI in Testing Strategy and Planning
    • Discussion on how AI assists in test planning and strategizing.
  4. [00:09:19] AI in Generating and Maintaining Tests
    • Itamar breaks down the process of AI in test generation and maintenance.
  5. [00:18:04] The Distinction Between Functional and Regression Testing
    • Differentiating between functional and regression testing and AI's role in both.
  6. [00:26:30] Challenges in Trusting AI-Generated Tests
    • Addressing the trust issues with AI-generated tests and how to overcome them.
  7. [00:38:56] The Path to Autonomous AI Test Generation
    • Exploring the steps needed to achieve autonomous AI test generation.
  8. [00:47:15] Reimagining Software Development with AI
    • Discussing the broader implications of AI in software development.

Full Script

[00:00:17] Guy Podjarny: Hello everyone. Welcome back to the AI Native Dev Podcast. I am Guy Podjarny and today I'm here with Itamar Friedman, who is the co-founder and CEO of Codium AI, who are probably one of the top leaders in the AI test generation world. We'll talk a lot about that, and they do a lot more than that.

[00:00:33] Guy Podjarny: And he has a very varied and interesting background. Notably, he has worked on verification at Mellanox, and in sales before that, working on chip verification and domains in which testing, and making sure you got it right, is a lot more serious, or taken a lot more seriously, than in software.

[00:00:51] Guy Podjarny: Itamar, welcome onto the podcast. Thanks for joining us.

[00:00:54] Itamar Friedman: Thank you for having me, and congrats on the new exciting podcast.

[00:00:59] Guy Podjarny: Definitely an opportunity to have conversations with interesting people like yourself. So I guess, let's jump right into it. Tell us a little bit about what AI test generation is. And then later on, we'll dig into what Codium does within that world. But what is AI test generation and why is it a topic now?

[00:01:14] Guy Podjarny: Like why, how does it work?

[00:01:16] Itamar Friedman: Yeah. So first of all, testing or verifying that code works as expected is already a huge problem. Specifically, I can tell you that I was in charge, at least 49 percent in charge, of a bug at one of the corporates I worked in that cost the company $8 million. It's just an example,

[00:01:33] Itamar Friedman: and by the way, I think it's one of the biggest bottlenecks in productivity. Many of us experience that 50 percent of the work in the sprints, sorry, of the sprints' user tickets, is related to bugs, etc. So it's already a problem, and testing is one way to try to overcome this problem, trying to reduce the amount of bugs that are introduced, the low quality code, etc.

[00:01:57] Itamar Friedman: Having said that, I think testing is just one word that describes a full field. There are many types and classes of testing. I'll keep it short: just to start with, there is unit testing, component testing, integration testing, system testing, end to end testing, etc. And I think in many of these,

[00:02:14] Itamar Friedman: there are some inefficiencies, in many, if not 100 percent, of these testing types there are inefficiencies that AI could help with. Some of them are actually in the planning of the tests, even the strategy, strategizing how to test. Some of them are in creating data for testing.

[00:02:33] Itamar Friedman: Some are generating code or scripts for testing. Some could actually be around AI selecting the right tests to run for a certain user story or a certain change in code. So AI could really boost testing, and make it feasible to exploit testing in ways we couldn't do before.

[00:02:55] Itamar Friedman: I think everybody would want to do testing. I think everybody tries to use testing, and still we didn't solve the problem of low, bad quality code or bugs in production, which all of us have experienced, like I mentioned. So I think that AI and testing is about first identifying how to test your code.

[00:03:15] Itamar Friedman: What is the right strategy? And then again, applying AI to generating tests and maintaining those tests over time.

[00:03:22] Guy Podjarny: Yeah. Those are really important distinctions. So we talk about AI test generation, and maybe I just leaned into the, hey, it writes a bunch of unit test code, but you make very good points, which is, it's actually broader than that. And, choosing what needs to be tested, where does it test?

[00:03:39] Guy Podjarny: And I guess in all of these cases, the reason we say AI versus planning is because of the fluidity. Why do we need AI for this? In theory, software is fairly deterministic, right? It runs well. Like, what is the unlock that AI brings us that we didn't have in more deterministic systems?

[00:03:59] Itamar Friedman: You mentioned my past experience working on chip verification, etc, and hardware verification. So I think like there, when you think about a certain module in the hardware, it's mostly described with a very clear specification with bits in, bits out, literally bytes in, bytes out. And you can even use formal verification, like mathematical proofs, and in order to say, okay, this works as expected, but I think that with software, although it's like you mentioned, it's deterministic,

[00:04:26] Itamar Friedman: I think the input output is human in, human out. The definition could be given as a PRD, a product requirements definition or document from a product manager, that might not formally cover all the happy paths, edge cases, etc., and might actually be even vague about what it means.

[00:04:46] Itamar Friedman: And the same goes for describing the output. So basically, the fact that software in most cases, the interfaces and the outcome, is supposed to serve people, human in, human out, brings the opportunity, and actually, I think, the must, to have AI cover those cases where you need to communicate with people.

[00:05:05] Itamar Friedman: So I think that if you really try to cover all the options, like you're trying in hardware, covering all the options, or at least sufficient options to make sure that you don't have any bugs, then you're having two problems doing that for software. One is that it's really inefficient.

[00:05:22] Itamar Friedman: It could be that a nightly job for testing everything might not cover all the options that you want to cover, maybe differently than a certain module in hardware. And the second thing, it's not necessarily clear what exactly all the options are that you need to cover.

[00:05:39] Itamar Friedman: And again, AI could help you in both: deciding which of those happy paths and edge cases are really worth testing, and then actually helping to generate those tests, which is what most people, like you mentioned, think about. That AI will help you generate tests, but what are the good tests?

[00:05:56] Itamar Friedman: What is a good test? And then, eventually, also choosing efficiently. By the way, a small story related to what we just talked about. I heard a friend, actually a client eventually, saying that he pushes his team to have 80 percent code coverage, and then one team reached 85 percent code coverage, and he was really surprised. He went checking and he saw that there were no asserts. Basically, tests were covering a lot of lines of code, but there weren't

[00:06:29] Guy Podjarny: We weren't asking any questions in the process.

[00:06:31] Itamar Friedman: They don't really check anything, except whether it fails, like whether it crashes or not.

[00:06:35] Itamar Friedman: Crashing it will check, but not whether the outcome, functionality wise, is working as expected. I think that's something, for example, that will almost never happen, won't happen, in hardware verification, because of the definition itself, input output, but could happen to teams in software. So even though software is deterministic, there's still a lot of test smells and know-how on how to do testing, and AI could actually help with that even.
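
To make this concrete, here is a minimal illustrative sketch in Python (the function and numbers are made up for the example): the first test executes the code, so it counts toward line coverage, but asserts nothing beyond "it didn't crash", while the second actually checks the functional outcome.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    return round(price * (1 - percent / 100), 2)


def test_apply_discount_no_assert():
    # Executes the code (counts toward line coverage) but verifies nothing
    # except that the call does not raise.
    apply_discount(100.0, 15)


def test_apply_discount_with_assert():
    # Actually checks the functional outcome, not just that it runs.
    assert apply_discount(100.0, 15) == 85.0
```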

[00:07:01] Guy Podjarny: Yeah. So interesting. So I hear two big categories over here. One is this gap in the definition of what's right and what's not right. And so AI can help us deduce from the code what it is that you intended to do, maybe with some help, and we're going to talk a little bit about attended versus autonomous test generation, but it helps you with that gap, which hasn't been stated precisely, unlike maybe in chips and their definition of what the correct behavior is. And then the second is the kind of infinite realm, or overly large realm, of possibilities that software allows because of its flexibility.

[00:07:45] Guy Podjarny: Help us choose the paths that matter more, that we should focus on, or that maybe represent broader sequences. Is that right? Like we haven't gotten at all into generation. I guess all of this is not yet necessarily Gen AI. This is just AI. It might manifest as Gen AI, but it is still around insight, correct, versus getting into action?

[00:08:08] Itamar Friedman: Correct. Because, let's start with the second part, because basically most software is not just about a certain algorithm, where, by the way, hardware almost is; you can almost set out the algorithm in code, even for, let's take, a quality of service module in a NIC, like in a network card, etc.

[00:08:28] Itamar Friedman: It's almost like writing an algorithm. You can try to prove it. While in many other types of software, like business logic of different kinds, you can't literally, explicitly write some mathematics, in many cases, to prove it. By the way, maybe in crypto you can, and there, I think, there's an opportunity for AI to introduce something more formal.

[00:08:45] Guy Podjarny: Because it's more formal.

[00:08:46] Guy Podjarny: Yeah. And it deals with things that are more...

[00:08:48] Itamar Friedman: So it's not like you can cover; it's not that trivial to analyze what the minimal flows are that you need to cover in order to promise, with some bound for a certain error, that you're 99 percent sure you covered everything. So definitely, that's a big part. But also, about the generation part, I claim that actually generating the eventual 10 lines of a test is not the hard part about writing tests.

[00:09:19] Itamar Friedman: I'm not saying that AI can't help there. It definitely can help there. But actually, more important is: what is it you want to test? What's the business logic? Let's describe it first in a natural language description. What is it you want to describe? And then you can use another step in the flow of using AI to actually generate the code.

[00:09:40] Itamar Friedman: Let me say it in different words. When we're talking about AI generating tests, roughly speaking, it divides into two: what do we want to test, and then generating that test. I'm claiming that generating that test is a challenge, but actually the harder part is describing, in natural language, what it is we're trying to test here.

[00:10:01] Itamar Friedman: What you're trying to test comes from the intent: what is this piece of software trying to do? You need to fetch this information from the specification, from your documentation, from other pieces of code. And then, how do you want to test it, what's the flow? You can still describe that in a natural language description.

[00:10:18] Itamar Friedman: What is the data? Do I need to pull it from the database? What is the right structure to put in, what is the right output? It's not yet the code. It's the logic that you need to write. And then eventually you can convert it into code, which could still be challenging because you need the right imports and the right mocking and things like that, but both steps are very challenging.

[00:10:39] Itamar Friedman: And I would claim that even the first is more challenging.
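
As a rough illustration of the two steps described here, the sketch below separates "what to test", captured as a natural-language plan, from the test code that implements it. The shopping-cart rule, the mocked data source, and the values are all hypothetical, invented for the example.

```python
from unittest.mock import Mock

# Step 1 - what to test (the natural-language plan, arguably the harder part):
#   Intent: checkout applies free shipping for orders of $50 or more.
#   Data:   an order total fetched from the orders store.
#   Flow:   load the order total, compute shipping, verify the charge.
#   Edge:   an order of $49.99 must still be charged shipping.


def compute_shipping(order_total: float) -> float:
    """Hypothetical production function under test."""
    return 0.0 if order_total >= 50 else 5.99


# Step 2 - the generated test code that implements the plan.
def test_free_shipping_at_threshold():
    orders_store = Mock()
    orders_store.get_total.return_value = 60.0  # mocked data source
    assert compute_shipping(orders_store.get_total("order-1")) == 0.0


def test_shipping_charged_below_threshold():
    orders_store = Mock()
    orders_store.get_total.return_value = 49.99
    assert compute_shipping(orders_store.get_total("order-2")) == 5.99
```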

[00:10:41] Guy Podjarny: Yeah, so it sounds like it, because it's a broader space, but there's maybe a bit more margin of error here, or flexibility; you can make several decisions, while when you generate, there's an expectation of accuracy, right? Maybe let's dig into that. So we said when we talk about AI use in testing, it starts from understanding what it is that you want to do.

[00:11:03] Guy Podjarny: So that's one. And then there's another layer that is still before the gen part, which is understanding what tests need to be generated to be able to verify that; there's an infinite set of possibilities. So what type of data, how does the system behave, what are the states, all that information.

[00:11:21] Guy Podjarny: And then subsequently, after all of that, you need to go off and generate a test. And that's the even newer field. Cause I guess there have been a few kind of startups in the space trying to do the AI insight piece before. And I think Gen AI was the unlock around making it usable, right?

[00:11:38] Guy Podjarny: Because if a system comes back to you and says, here, I figured out the thousand tests that you should write, you're going to say, thanks but so long, I'm not going to do that. And so I think that last piece of making it usable and actually being able to generate those tests,

[00:11:53] Guy Podjarny: is a big deal, no? That's at least the Gen AI unlock that allows companies like Codium AI to come in and be useful to a customer.

[00:12:00] Itamar Friedman: You know what? I'll make it interesting. I think I disagree. But I'm not doing it just to make it interesting. I'll give you an example. Let's be concrete. We talked about different types of testing; let's talk, for example, about end to end testing, and let's say that we want to do the end to end testing on the UI.

[00:12:20] Itamar Friedman: Okay, again, I want us to be able to imagine. Think about an e-commerce site, I don't want to say a name, but choose one of the e-commerce sites that you use. And now let's say that they're developing a new feature where, with a click of a button, it automatically, let's think about 2024, automatically collects items similar to one you want to buy, and they want to deploy it.

[00:12:38] Itamar Friedman: This is a new feature that, with a click of a button, collects products similar to ones you bought and brings you to buy something that is like this item, but already choosing the right color for you when displaying the item to purchase on the buying page. You could say, okay, maybe the hard part, if I take what you say, is actually generating the script, the Playwright, the Cypress, whatever script, that actually runs this kind of test. But I claim that actually the harder part is to think about all the relevant flows that I want to test this feature with, and to think about the work of a QA tester.

[00:13:21] Itamar Friedman: Usually, the lower you go toward the code, the unit testing, component testing, integration testing, you see the developers doing that work. And the more you go to the end to end, mobile and interface, you see more of a QA team or a QA automation team. By the way, I think that's going to shift left thanks to AI, but maybe that's a point for another time.

[00:13:40] Itamar Friedman: The first thing they do, and I think it's actually one of the hardest, is think about all the flows that need to be checked. Eventually, I claim that when a bug is eventually introduced to production, it's not because they didn't know how to write the code or they didn't have the time to write the code; they just forgot or didn't think about different flows they could come up with.

[00:14:01] Itamar Friedman: Oh, there was already one of these items in the cart and I didn't think about checking that option, etc. So I know there are solutions, and I know there are products, but I actually claim that Gen AI, not just AI, has the opportunity here to reinvent this field as well. For example, if you give a Gen AI the opportunity to crawl, this already existed before Gen AI, to crawl over different parts of your website and understand user behavior, and then generate flows, invent flows, take the description from the product manager, and take all the experience of crawling the website prior to that, and generate flows of testing.

[00:14:42] Itamar Friedman: And then eventually also generate the test itself. So I'm just claiming here that Gen AI is also going to be introduced in the first part of analyzing what you want to test and how you want to test, and I think it's very important.
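
For the e-commerce example above, one flow a QA tester (or an AI that has crawled the site) might write down could look roughly like the following Playwright sketch in Python. The URL, selectors, and expected color are invented for illustration and are not from the episode; a real site would need its own locators.

```python
from playwright.sync_api import sync_playwright


def test_buy_similar_item_preselects_color():
    """Flow: open a past purchase, click 'Buy similar', and verify the
    suggested item is shown with the previously bought color preselected."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://shop.example.com/orders/12345")  # hypothetical URL
        page.click("text=Buy similar")                       # hypothetical selector
        # The suggestion page should preselect the color of the original item.
        selected_color = page.locator("[data-testid=selected-color]").text_content()
        assert selected_color == "Navy Blue"
        browser.close()
```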

[00:14:53] Guy Podjarny: I think that's, I don't know if we're entirely in disagreement. I guess my sense is that this last part is not the hardest, but it is the grunt work that, if you created a thousand of those items, will not get done. What we've seen at Snyk, for instance, was, if you came along and you said, here are

[00:15:09] Guy Podjarny: a thousand dependency updates that you need to do, and they are very easy to do, or relatively easy to do, maybe there's a check around the system, you really need to hand it on a silver platter of, here's the precise change that you need to do, to mobilize people, and even go to the level of having that be automated and done for them.

[00:15:31] Guy Podjarny: And it's not out of any ill will. And it's not because it's hard, but it's because it's tedious. It's because it's not a task that people want to do, and they definitely don't want to do it repeatedly. So I agree, and it's interesting maybe to have us define a bit the Gen AI piece. I don't know if that's the most useful thing for us to do, but there is the Gen AI piece of the LLMs understanding

[00:15:53] Guy Podjarny: the system which we're using. So that is a new breed of AI, which is bucketed into Gen AI because it's part of this sort of new breed of technology, but it's not generative in nature. It is more analytical still. And then you have the generative piece of, okay, now I'm actually creating a thing that would flow through them.

[00:16:14] Guy Podjarny: But I think point taken, which is you have to understand the application. You have to understand what was meant to be correct. You have to understand these sort of possible states and the flows within the system. You have to decide that, and then at the very last step, you need to actually maybe write some code, or Selenium scripts or whatever it is, some flow that would actually execute and interact with the application.

[00:16:39] Guy Podjarny: And then,

[00:16:39] Itamar Friedman: By the way, I agree. I agree with you. I'll give you a concrete example, I think it's an interesting one, so very shortly. So I agree with you, in some cases you know what you want to do, but actually executing it, Gen AI actually enables that. And I'll give you a concrete example.

[00:16:55] Itamar Friedman: We talked a lot about testing, but actually our second strength, or maybe, it's hard to say which one is our strongest point at Codium AI, is code review, and actually working around increasing the quality of a pull request. And here is an example that fits really well with what you described.

[00:17:21] Itamar Friedman: In many cases, companies already know that they need to revamp a certain old piece of their code; they want to modernize it, to use a bit of a Gartner word. And they already, by the way, have symbolic or other types of tools that are not necessarily AI that give them a recipe on how to modernize a certain piece, bring it from two to three or whatever it is. But then they need Gen AI to close some of the final changes. They need AI to make a final review. They need the last mile of what you mentioned. I know what we want to do. We want to modernize. We want to revamp. We want to refactor this part. I know that's what I want to do.

[00:18:04] Itamar Friedman: Because, like I claimed, AI could help you understand what to refactor, just an example of what we talked about. I know what I want to do. But actually taking the last mile, closing some additional piece of code, a few lines that are not part of the recipe, and also giving a last review, you need AI for that, because in a big corporate with 10,000 developers I probably need to do it so many times that I have to make it efficient for the reviewers, to already have AI close and review the first few things. So that's one of our use cases, and it fits what you say. So I would say that's probably how the world divides: there are use cases and challenges where it's actually hard to understand what you actually want to test, what you actually want to change.

[00:18:46] Itamar Friedman: And then there is the part where, if you don't have the generated part to execute it, you can't actually

[00:18:54] Guy Podjarny: Provide the value on it. So maybe let's shift gears a little bit and talk indeed about the types of tests that you mentioned in passing over there, because I think the importance of different aspects of this probably differs based on the types of tests. You rattled them off, but what would you say are the primary categories of tests?

[00:19:13] Itamar Friedman: I'm going to share a bit about the different types of tests, and I'll also shake the boat a bit about their importance. I think there are times that people describe the levels, the layers of testing, with a triangle and sometimes as a diamond. I think originally the triangle was the most commonly accepted shape, which means that on the bottom you have the unit tests, and then on top of that you're building component tests.

[00:19:40] Itamar Friedman: It's a bit more than a unit. And then on top of that you have, sorry, just for the sake of some of our fellow listeners, although I guess 99 percent are aware of unit tests: it means taking four lines of code, taking the smallest component, the smallest element of a class, it could even be a sub-function within a function or so, and trying to write the test that will verify that this piece works as expected, isolated.

[00:20:07] Itamar Friedman: The most isolated way would probably work best for unit tests. They're fast. And then a bit on top of that you have component tests, usually something more, if you want, like a class, where you can even describe the shortest flow in your software. And while

[00:20:24] Guy Podjarny: Unit tests tend to be very technical in nature. Component tests start to introduce some intent. Yeah.

[00:20:29] Itamar Friedman: So there's some intent, but the intent is still relatively very technical, very much about a certain, very small piece of software. I very much agree, I'm in line with you. And then on top of that, you're starting to deal with microservices. It could be, traditionally, four components, but that's just a nice number; it could be two components connected together.

[00:20:48] Itamar Friedman: It could be 100 components connected together, but it's a microservice. It's now actually serving some full business logic. A full application could be one microservice or many as well, but it's one of them, actually performing some service. And there, in these cases, you usually have integration tests, etc.

[00:21:06] Itamar Friedman: There are more types, but that's just an example. That's the middle of the triangle. On top of that, you have system tests and end to end tests, UI testing, etc., where what you do is try to test the entire application, the entire system, etc. There are more dimensions and additional types of coloring we can do here.

[00:21:29] Itamar Friedman: I didn't talk about functional tests, non-functional, and things like that. But that's roughly one way to look at the triangle. And I think some people would say, and I think it was more and more common between 2010 and 2020, that actually the right way to look at it is the diamond, and most of the value is in the integration tests.

[00:21:48] Itamar Friedman: And I think one of the reasons is because it's, sorry, it's f**king hard to do unit testing, because it's so rigorous, you need to go function by function. It's also hard to maintain. But I claim that AI might actually change that a bit, and make it easier to do unit testing and make it easier to maintain those tests.

[00:22:07] Itamar Friedman: So maybe eventually we will go back to the triangle as the way to look at it.

[00:22:13] Guy Podjarny: I think that division makes sense, but it always refers just to the unit size, or the scope that you're testing. What about the notion of regression tests versus functional tests? I think we both put aside the non-functional tests, so things like performance testing, security testing.

[00:22:36] Guy Podjarny: Yeah. A variety of those types of tests, I guess we're putting them out of scope for this conversation for a moment, but within this, I always find it important to talk about the source of truth, right? Because we talked about how AI helps you understand the intent, or what it is that you're even supposed to ask.

[00:22:53] Guy Podjarny: And what is the correct answer? What is the correct behavior? Those are very functional; those are hard things, and they all revolve around the source of truth being the user's intent of some sort. But regression tests are different, right? For regression tests, the source of truth is really the previous version, and you almost even want to preserve the bugs

[00:23:13] Guy Podjarny: in it. I guess, do you see them as the same? How do you delineate the two in concrete terms?

[00:23:20] Itamar Friedman: Yeah. I love this topic. Really, roughly speaking, I love your definition of what's the source of truth. Roughly speaking, when we're looking at that dimension, we can divide into two. One is that we want to test new functionality, and there could be different subtypes of tests there.

[00:23:40] Itamar Friedman: And, or, we want to test that we're not breaking anything. And there could be a few types there. For example, a regression test is checking specific behaviors, the functionalities, at the integration level, at the unit test level, that what we knew should already work, we didn't break; that there's no regression.

[00:24:02] Itamar Friedman: Someone can also put smoke tests and others in this category. Basically, I'm making some changes in my code, and I just want to check that some simple inputs provide me some simple output. By the way, these tests, for example, smoke tests, etc.,

[00:24:18] Itamar Friedman: are not necessarily checking behaviors intensively. But they check that you didn't break something big, that the basic functionality is working. Yeah, so I think it's definitely worth considering that in general, and also when you're talking about AI. Why? Because when you're using AI and you're talking more about regression testing, then the code is a very useful source of truth.

[00:24:44] Itamar Friedman: You're saying, this should work, how could I test that? But when you're talking about testing new code that you want to introduce, actually the source of truth that you need to relate to is the natural language description. It could be a structured one, like BDD, a behavioral, business description of certain types.

[00:25:03] Itamar Friedman: But I think there's a lot AI can really benefit from in the fact that code is a formal language that you can actually compile. When you're talking about new code, the source of truth is the natural language description, which might be formulated to some extent, but it's still not a formal language.

[00:25:24] Itamar Friedman: And then it brings opportunity for mistakes. So definitely, whether you're building a product to generate tests with AI, or you're using AI to generate tests, you will see different behaviors, and it might even be a different UX and UI, right?

[00:25:42] Itamar Friedman: Where does it reside? In the IDE or in your JIRA? But also the interface itself: one might look much more rigorous and one might look more like a chat, just as an example.

[00:25:53] Guy Podjarny: Yeah. I think, so maybe let's switch a little bit to talk about trusting AI-generated tests, and it's a tricky topic. So we classified these types, or you classified these types of tests, that grow in scope from the unit to the component to the microservice to the integration. That clearly requires a more holistic understanding of the system and a knowledge of it. And then there are regression tests, and maybe there are smoke tests; maybe for those it's a little bit easier to know what failure is. In all of those cases, you need to know what is correct. And the problem is that typically, to know whether it is correct or not, we use tests.

[00:26:30] Guy Podjarny: So if you wrote the tests, then you can do all sorts of wild stuff behind the scenes, and eventually, you know, you would generate something and you would test and you would see if it's correct. But if you're using AI, if you're rolling the dice to generate the tests in the first place, how do you know what to trust?

[00:26:46] Guy Podjarny: What is the means of verification? In trust, but verify, the verify is supposed to be with the tests. How would you define the way that people tackle that concern today? First of all, do you agree with this being a challenge? And then, how should someone deal with trusting

[00:27:07] Guy Podjarny: AI-generated tests today?

[00:27:09] Itamar Friedman: Yeah, so first of all, I very much agree with that, and immediately I can't stop thinking about different memes and proverbs, like, hey, who QAs the QA, and sayings like that. In essence, someone needs to be the authority, eventually, of saying this is correct versus not.

[00:27:28] Itamar Friedman: That's a very well known, sorry, like a problem: who QAs the QA? How do you trust that a QA actually QA'd everything, etc.?

[00:27:38] Itamar Friedman: So in essence, I agree with you: eventually the buck needs to stop somewhere. And the question is, does it stop at the AI or stop at the developer? That's one of the questions this raises. But before I actually try to answer, or more seriously relate to that, I want to raise another aspect as food for thought.

[00:28:00] Itamar Friedman: I think basically what we're talking about right now, I can ask differently: how do we solve software? I think it's almost the same question as what you just asked me. How do we solve software? How do we generate software that just works? I think it's almost the same question, because that's what testing is meant to do, and code review.

[00:28:19] Itamar Friedman: And by the way, solving software, again, it's not just working code; it's performant, it's secure, it's maintainable, it's extendable. It's according to our company's best practices. It's according to our company's values. I'm not kidding. I think software development actually relates to all those company values, especially in the era of AI.

[00:28:39] Guy Podjarny: But tests are, I agree tests are a necessary part of how we solve software, but they're a part of it. Solving software also requires, you know, a better definition of what it is that is correct; it also implies thinking about the evolution of software and how it changes over time.

[00:28:57] Guy Podjarny: Of course, it implies the actual writing or generation of the code, and a variety of other aspects of software. I guess we still have a low resolution. A critical part of it is that, whatever it is that you do, you need to be able to say, is it correct or not?

[00:29:13] Guy Podjarny: But you can solve this and not yet solve software. You have other

[00:29:18] Itamar Friedman: The point you said, that was my point. I'm saying, eventually, testing is a part of it. You want tests because you want to know that your software works, but I claim that testing is just our common way to do that, what we're biased towards, because that's what we did for 20 years.

[00:29:40] Itamar Friedman: But probably, A, it's not the only way, and B, it's probably still a sub-problem. You want to check that your code works, but eventually, why do you want to check if your code works? Because you want to make sure that you're not harming your users, and that you can move fast with confidence, etc.

[00:29:56] Itamar Friedman: And there are other aspects that you need to consider if you want to do that. But anyway, now back, to being

[00:30:03] Guy Podjarny: How do you trust it? And I guess, yeah, we should also talk a little bit about maybe the sources of information for it. But for starters, if I'm a user, I'm generating a test, like

[00:30:14] Itamar Friedman: Yeah. now

[00:30:15] Guy Podjarny: how do I know it got it right? How does the system, how does the company or the organization know those AI-generated tests are trustworthy?

[00:30:25] Itamar Friedman: Great. So now to be concrete, because I think the philosophy here is important, because we want to understand why we're doing what we're doing. I'm going to answer you, I promise, in two points, but before that I do want to give you an example of what I'm talking about. What about doing testless testing?

[00:30:42] Itamar Friedman: What about taking the specification that you want to have? For example, let's say that a certain technical product manager described really well what you want to achieve in the software, or a developer, but just for the sake of the example, a technical product manager described it concisely, but actually precisely, with a highly accurate, elaborated specification.

[00:31:06] Itamar Friedman: What about taking the code, and what you actually generate are not tests: you take the code and generate a spec out of it. And then you do spec matching. So the process is generating a spec that is in the same format as the spec that was generated by the technical product manager, and then running a matching algorithm.

[00:31:27] Itamar Friedman: Could be AI, could be Gen AI, we can talk about it. And it tells you where the mismatch is. Hey, here's testless testing. Not the old way, but a new way that is enabled thanks to generative AI. So that's why it was important for me to go to the essence of why we're doing testing at all. We want to check that the software works according to our requirements, according to our best practices, and not just, okay, this piece of code passed the test.

[00:31:53] Itamar Friedman: Okay. Now, let me be concrete. I think there are two things that we need to double-click on in order to test your testing. One is that usually, in most cases, for an AI system or AI assistant to generate meaningful tests, it's not done by gathering a bit of context and doing one inference call.

[00:32:19] Itamar Friedman: Usually there is a full flow, a real product, that actually works, or works sufficiently well to be useful for you. You're using a full flow. And by the way, I would love to double-click later on where Codium AI released a couple of these flows as open source, and we even reached state-of-the-art code generation, doing better than professional competitors on coding competitions, doing better than DeepMind, OpenAI, etc.

[00:32:47] Itamar Friedman: And the concept that was different from other competitors, like DeepMind and OpenAI, was that we designed a flow. Okay, back to my point: to actually generate good tests, you design a flow. And then what you can do, as a person developing these tools, is surface the decisions that were made as part of this flow, and as a user, as a developer, you can review the process that was done on the path to generating this final test.

[00:33:17] Itamar Friedman: And this gives you more confidence. You can gain confidence by reviewing the process that was done. Let me give you an example, a simple one to make it short and clear. For example, as part of the process, the system might

[00:33:32] Itamar Friedman: look for existing mocking that is already implemented and trusted, because it runs as part of the CI/CD or whatever, and choose to use it as part of generating those tests. As a developer, if I see that this mock was not generated right now, that it was actually taken from another piece of testing, another test suite that I know I trust, then I'll probably trust this new test that's based on that.
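
A small sketch of that idea: the generated test reuses a mock that already lives in the project's test suite and runs in CI, so the only new thing the developer has to trust is the new assertion. All module, fixture, and function names here are hypothetical, invented for illustration.

```python
# Hypothetical: this mock already exists in the repo's test suite and is
# exercised by CI, so the team already trusts it.
from tests.mocks.payments import trusted_payment_gateway_mock

from shop.checkout import place_order  # hypothetical code under test


def test_place_order_charges_card_once():
    gateway = trusted_payment_gateway_mock()  # reused, not freshly generated
    place_order(cart_total=42.0, gateway=gateway)
    # The only newly generated thing to review is this assertion on behavior.
    assert gateway.charge.call_count == 1
```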

[00:34:05] Itamar Friedman: And whether we like it or not, like I mentioned, if you do perfect testing, you know exactly what to cover, and you generate tests perfectly, that is solving software. So it's not easy. Okay, so there isn't a magical answer of, yes, I trust AI because it gave me a confidence number or because it did it five times correctly.

[00:34:27] Itamar Friedman: No, it's the same process that you would do with a developer. You would see the thinking process, you would see the PR description, and then you learn to trust this output. And that's number one. And the second thing relates to the intent. If you see some connection, like you see the test, and the AI system shows you, this test was generated because here's the intent that I'm relating to.

[00:34:50] Itamar Friedman: And here's how the implementation actually covers this intent. I think this will also give you confidence. So these are, I think, the main two points that you need to look at when you're choosing a solution, or as a developer developing these systems.

[00:35:06] Guy Podjarny: I think those are useful steps towards it. They include, well, first of all, they rely on it being attended.

[00:35:12] Itamar Friedman: I'll use the opportunity to say, of course, one more thing: that the tests run, that it compiles, that it doesn't break anything. And, by the way, this is what Meta released in their work, Test LLM Gen, sorry, I'm having a blackout, I think that's what they call it. And we reproduced their work.

[00:35:32] Itamar Friedman: They didn't release open source for it. We released a work called Cover Agent. TestGen-LLM, sorry, that's the name. Their way to provide you, as a developer, confidence that the tests are good has, they have more, but two parts: that the tests build and run, and that they increase code coverage.

[00:35:55] Itamar Friedman: Okay, but notice that this works only for regression tests. The fact that a test builds and runs, doesn't break anything in your system, and actually increases coverage is only for a regression test, okay? So there are more tactical things that you can do per test type. For regression unit tests, you can do, for example, what Meta did, but at a high level it's what I said: showing you the process, and showing you that you actually did something meaningful, like increasing code coverage, increasing the coverage of the intents, etc.
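
The acceptance criteria described here, that a generated test must build, run green, and raise measured coverage, can be sketched as a simple filter loop. This is an illustrative sketch, not Cover Agent's or TestGen-LLM's actual implementation; it assumes pytest and pytest-cov are installed, and the test paths and package name are placeholders.

```python
import json
import subprocess


def run_with_coverage(test_targets: list[str]) -> tuple[int, float]:
    """Run pytest with coverage on the given targets; return (exit code, coverage %).
    Assumes pytest-cov; 'mypackage' and 'tests/' are placeholder names."""
    result = subprocess.run(
        ["pytest", *test_targets, "--cov=mypackage", "--cov-report=json", "-q"]
    )
    with open("coverage.json") as fh:
        percent = json.load(fh)["totals"]["percent_covered"]
    return result.returncode, percent


def accept_candidate(candidate_test_path: str) -> bool:
    # Baseline: the existing suite must be green; record its coverage.
    code, baseline = run_with_coverage(["tests/"])
    if code != 0:
        raise RuntimeError("existing suite is broken; fix it before generating tests")
    # Candidate: the suite plus the generated test must still pass...
    code, with_candidate = run_with_coverage(["tests/", candidate_test_path])
    if code != 0:
        return False  # the generated test does not build/run, or it fails
    # ...and must actually increase measured coverage, otherwise discard it.
    return with_candidate > baseline
```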

[00:36:28] Guy Podjarny: Yeah. And I think there are, once again, the different layers of it, right? The first part of it is agreeing on what needs to be tested. And you gave me the example of the spec, but really that spec itself needs to be verified, and we're back to an attended process. Someone needs to look at that and say it's correct.

[00:36:44] Guy Podjarny: And then subsequently you need to generate the test and see that it passes. And so code is maybe a little bit better than other Gen AI domains, because you're able to run the code itself and see what has occurred, so at the very least you know that it's functional. I guess there's still a leap, though, about, okay, what do you need to test?

[00:37:03] Guy Podjarny: And you've generated a test that runs, but does this test actually confirm that the requirement I've defined is correct? And then subsequently you talked about the implementation piece of it, and maybe delegating some of that trust by saying, hey, there's an already trusted mock, for instance,

[00:37:24] Guy Podjarny: and so that's more like, I've probably manually attended to and verified that it is correct, and therefore I am building on top of that, and I'm able to trust that it's correct. And all of that is even ahead of the broader lens of, are these tests, coverage is one measure of it, but are these the right tests to be running?

[00:37:48] Guy Podjarny: Because you can define sort of an intent. Those are all compelling, and there's still a lot that is saved. And at the end of the day, if you have all of those, then you're able to develop with confidence, whether you're developing with AI or manually or whatever, however you're doing it.

[00:38:04] Guy Podjarny: Yeah. But there's a lot of attended process over here, right? There was a lot of verification. Yes, there are human errors, but the goal here was to take the tedious parts away, to be able to generate these things, and we haven't even opened up the maintenance piece of it.

[00:38:21] Guy Podjarny: What's the path to autonomy? If we're thinking a little bit further out and we say, this is the attended test generation process that happens today that saves you a lot of time,

[00:38:33] Itamar Friedman: but still requires a lot of human attention, maybe with the exception of regression tests, which are making the leap of faith that the current way of running is correct.

[00:38:43] Guy Podjarny: And so you can have this external source of truth that is scalable: it's the existing system. What's the path to more autonomy in the world of test generation?

[00:38:56] Itamar Friedman: Okay. Okay. So let me first answer the question about autonomy, but actually it relates also to what we just talked about in the previous question. So I think, despite the amazing videos that we saw on X, Twitter, the amazing demos that we see, I don't know, on stages, etc., practically most of these agents that are autonomously generating some software, given a certain intent,

[00:39:24] Itamar Friedman: the piece of software they're generating is very simple. I would even claim that for most of these demos, you probably have a certain repo that you can find on GitHub that does exactly that, or something very similar. So you can probably do the same thing in five minutes if you

[00:39:37] Guy Podjarny: It's more recall and then the,

[00:39:39] Itamar Friedman: And by the way, maybe it does transform it a bit. It's almost like recall, like the right piece of code it was trained on, but also transforming it into your specific request.

[00:39:49] Itamar Friedman: So there is some magic, there is some good thing about it, but it's not even slightly close to generating anything useful for an enterprise, okay? At the same time, I claim that in five years we will see an AI developer in the enterprise. I'm calling that thing, that demo that we're seeing, an AI developer, okay?

[00:40:10] Itamar Friedman: Sometimes some people call it an AI engineer, but that might be confusing; an AI engineer is someone tweaking AI. So, an autonomous AI developer. We will see something like that, but only in three to five years at the earliest, and here's how I think it's going to happen. I think we're going to see first AI tools being integrated into different parts of the existing software development tool chain, processes, workflows. For example, our PR Agent

[00:40:40] Itamar Friedman: connects to your GitHub, GitLab, Bitbucket, etc. Sorry for the shameless plug here, but that's not really my point, and it already helps you review your code, verify best practices, etc. And then, for example, you have another tool that connects to your IDE and helps you do the unit testing.

[00:40:57] Itamar Friedman: And then another tool connecting to your CI/CD and helping choose the tests. And then another AI tool connecting to your builds and helping you analyze the logs of failures. And once you have

[00:41:10] Guy Podjarny: These are all AI assistants, right? These are all, they're not autonomous.

[00:41:15] Itamar Friedman: I'll keep it right now in the AI assistant realm.

[00:41:17] Itamar Friedman: Although I think we can say that they could also be AI agents. But what I'm saying here is that even if they're agents, they're going to perform very small, yet critical, sub-processes in the overall workflow of software development. They might actually be agents. They might actually choose for you automatically which tests to run.

[00:41:39] Itamar Friedman: They might even analyze the build error and try to suggest an option to fix it. Fix it, build again, see that it passes, etc. So they might be agents, but we're going to have many of these flows. Again, I'm referring to what we call flow engineering as the way to develop useful agents. And then suddenly, maybe in a sudden moment, maybe not,

[00:42:08] Itamar Friedman: when we cover a certain portion, 80 percent, 90 percent, I don't know, maybe it actually needs to be 95 percent, of these sub-processes, sub-workflows, with AI integrated into them, suddenly you can build an additional agent, an automatic AI developer, that will work with the, I don't know, technical product manager on the intent, formalizing, perfecting to some extent, the intent. And then that agent would be able to generate that software, because the guardrail agents, the guarding angels, the other AIs that we worked really hard to integrate with the right databases, the right frameworks, the right best practices, etc.,

[00:43:01] Itamar Friedman: can have the feedback loop and perfect it like that. That's one option. And another option is to rethink

[00:43:07] Guy Podjarny: So this is, before we go to the other option, is this

[00:43:09] Itamar Friedman: leaving that to you,

[00:43:10] Guy Podjarny: this is much more, so I guess what you're describing here is, I don't want to say incremental, but it is about getting better, helping us as an industry, or let's, as an industry, get better at doing every one of these bits,

[00:43:26] Guy Podjarny: as they work today. Like the code completion stuff and all that, that will generate code that better interprets intent. The code review will get better at spotting mistakes. The Cover Agent and similar tools will help us test and identify whether a change has broken something. Codium AI itself would help us build sort of functional tests around that, but in an attended fashion. And then,

[00:43:51] Guy Podjarny: and analyze logs. And so we'll build all of that journey of these different steps getting better and better, and then eventually, when all of those work together, then we have enough confidence to build something. And that's tricky, because when you build a lot of these different pieces, if each of them works at 95 percent accuracy, if you chain 10 of them, it's going to be broken half the time, right?

[00:44:12] Guy Podjarny: Like whatever the multiples work out to over there. But I guess it's the easiest one to imagine, right? Like, let's just get better at each one of these steps. And then at some point it's, hey, when I find myself every time just saying, yep, you're right.

[00:44:27] Guy Podjarny: Yep. You're right. You're right. At some point it's, okay, just assume you're right and proceed, and I'll fall back if it seems like you were wrong this time. Am I describing the process right? This path of evolution you talked about, I don't know if it's a linear path, but it is the gradual path,

[00:44:42] Itamar Friedman: Yeah, I agree with you. And by the way, I also agree about the disadvantages of this approach. You know, we can claim that it's broken, or not fully exploiting AI, so you're actually just trying to do the best you can with what exists, and the other option is to rethink it. But it actually can be a path that will not converge eventually.

[00:45:04] Itamar Friedman: If there is some process, some sub-process in the workflow that AI won't be able to overcome, and if that sub-process cannot be overcome or cannot be automated with an agent, then you won't be able to pass this. You will still have some resistance to actually moving to the autonomous AI developer.

[00:45:25] Itamar Friedman: The advantage of this approach is that, look, it's probably really freaking hard for an enterprise to say, hey, we're throwing away all the dozens of tools, if not more, that we're already using, and the processes and the workflows and everything, to change to something completely new.

[00:45:44] Itamar Friedman: I'm not saying it hasn't happened, and I think we can find cases in the past, especially if that thing is 10x, but it's hard. So that's the advantage of taking this approach. And by the way, one more reason, and it's tricky what I'm going to say, but I think we might be exaggerating a little bit what AI can do right now.

[00:46:07] Itamar Friedman: Okay. And then maybe it's actually more feasible to take small parts in the current workflows and try to improve those.

[00:46:16] Guy Podjarny: Yeah, and get real value. The fact that it doesn't jump to autonomous doesn't make it not valuable, not by a long shot. These tools, Codium AI today, the code completion tools, they are massively valuable today. They make people more productive. So they're very valuable in their own right, which is another advantage of the gradual path: you're already going to be doing it because there are immediate benefits to be had.

[00:46:38] Itamar Friedman: Yeah. At the same time, again, A, maybe the path to autonomous actually needs to be real, like we need to rethink, and I can think about a few points that maybe

[00:46:50] Guy Podjarny: So let's talk about that. Let's maybe close off the podcast the other way around and say, okay, let's say we didn't increment our way to it, but we managed to throw the thing away and reimagine the software development process, and maybe let's still put a certain lens on the testing or verification aspect of it. But how would we do this better, if we were to reimagine the process?

[00:47:15] Itamar Friedman: Yeah. So I think you see my repeating notion here, that for testing, especially if it's not regression, if we want to test new code, I think it's all about the intent. Yes, I agree with you, if we close the loop with what we talked about half an hour ago, that eventually Gen AI gives you the opportunity to do the last mile of, I know what I want to test,

[00:47:45] Itamar Friedman: and then it's actually, I have 1,000 of these and I couldn't do that without Gen AI. But I still think that in most cases, actually one of the hardest problems is to start with having the right, having a good intent, having a testable intent, having an intent, a spec, that is already anchored with the current code.

[00:48:12] Itamar Friedman: I'm developing new code and my new

[00:48:14] Itamar Friedman: specification is already anchored while I'm writing it, while I'm inventing it, while I'm creating a new user story, etc. It's already anchored, in the language, in the way I describe it as, for example, the technical product manager, to my code.

[00:48:31] Itamar Friedman: So to rethink everything, the software development, through this spectrum, focusing on testing and verification, it's a lot about the intent. Working with AI and reinventing how we, as R&D and product as a whole, think about how we write our specification, how we define our intent, how we exploit AI to already create a PRD or a spec, a technical spec or a product spec, that could, more easily, to start with, exploit AI to then generate the code and generate the tests and generate a review, etc.

[00:49:17] Itamar Friedman: If we rethink that part, then we would make it easier for AI to also complete the code and complete the testing and complete the verification.

[00:49:26] Guy Podjarny: Yeah, and I think, clearly I'm aligned, right? When we talk about AI Native software development, we think about sort of spec-centric software development and how that centers it. Today, like you mentioned at various points, there's this notion of looking at the code and first understanding; maybe it comes from that PRD from someone who's written it.

[00:49:44] Guy Podjarny: Maybe it comes from the understanding of the code, and you create an understanding, create a spec of sorts, in the brain of Codium AI, somewhere in the process, that says, this is how it works. But in today's software development process, the tests are that specification, right? There is no interim.

[00:50:03] Guy Podjarny: There's no, at least commonly used, interim representation of those intents. They come hand in hand with the tests and the implementation as well.

[00:50:14] Itamar Friedman: Good for me, because that's related to something I wanted to talk about previously, and it connects. Okay, there's no one way to test things. Like we said, ideally tests are meant to verify that the intent is actually being implemented, but because there's no formal way to do that in software, in most cases, unless it's an algorithm or so that you can prove mathematically, there are a few ways to do that, and different companies and different teams have opinions about it.

[00:50:44] Itamar Friedman: So a lot of generating tests, and giving confidence that these tests are good ones, is actually generating those tests according to the best practices that were given by the developers. So one of the things that, for example, is enabled in Codium AI's PR Agent, as an example, is that a team can give a markdown,

[00:51:06] Itamar Friedman: hey, here's how we would like the testing style to be in this repo, and this is how the testing style should be in that repo. Once you enable an AI dev tool to generate tests that work according to your best practices, according to your test style,

[00:51:27] Itamar Friedman: and if that tool works well enough that it's quite easy, I'm not saying with the click of a button, but 10x, 100x easier than it was before, to generate tests, then yes, tests can be your source of truth for the spec. Some CTOs that I talked to told me that their preferred documentation is actually working tests.

[00:51:53] Itamar Friedman: So that's aligned with what you said, like documentation as

[00:51:56] Guy Podjarny: there's a whole TDD world that really talks about tests being at the center of the universe.

[00:52:01] Itamar Friedman: I agree with you. But like I mentioned, again, it's not magic. It's not just, okay, AI will generate those tests and here's my new spec. No, we need to build a tool that generates those tests according to a specific style that your dev team, and even your product managers, the technical ones, would consider as a spec.

[00:52:21] Itamar Friedman: And then if we can make those tests executable specs, which are tests, in your style, and easy for you, then we can actually rethink specs: executable specs.
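
As a small illustration of what a test written to a team's agreed style can look like when it doubles as an executable spec; the naming convention and the business rule below are made up for the example.

```python
# Hypothetical team style (from a best-practices markdown): one behavior per
# test, given/when/then structure, names that read as requirements.


def eligible_for_free_shipping(order_total: float, is_member: bool) -> bool:
    """Hypothetical business rule under test."""
    return is_member or order_total >= 50


def test_members_always_get_free_shipping():
    # Given a signed-in member with a small order
    order_total, is_member = 10.0, True
    # When eligibility is evaluated
    result = eligible_for_free_shipping(order_total, is_member)
    # Then shipping is free - this test reads as, and acts as, the spec
    assert result is True
```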

[00:52:34] Guy Podjarny: Yeah. Yeah. No, clearly we're aligned in our view over there, and it'll be interesting. There's a lot to probably unravel that we might do in future conversations, on the evolution of those specs and how they change and how they relate to data. Itamar, this has been fascinating.

[00:52:49] Guy Podjarny: Thanks a lot for coming on, for sharing. I do encourage everybody to check it out. We didn't talk too much about Codium AI's products, but there are a bunch of good, free tools out there in Cover Agent, PR Agent, and I guess actually most of the solutions that you have today have a free tier, a way for people to experiment, to try it out,

[00:53:08] Guy Podjarny: and then expand into a premium version, right?

[00:53:11] Itamar Friedman: Correct. Our IDE agent, which we call Codiumate, our Git agent, which we call PR Agent, our Git plugin, our IDE plugin, our CLI plugin, they have the Codiumate, PR Agent, Cover Agent names, and they're all free, and some of them are even open source. We have just one component that we didn't yet find a way to move towards the free offering, which is our full code base analysis: you can give the Codium AI platform, you can provide it, like, 10,000 repos and we run different analyses. And we do want to shift some part of it into the free offering, but it's still in the works.

[00:53:52] Itamar Friedman: Yeah, it's okay. You have to pay the bills somehow. Yeah.

[00:53:55] Guy Podjarny: I encourage everybody to try it out and tiptoe their way into this brave new world of AI-generated tests, with its great promise, and some complexities and some adaptations we probably need to make as humans.

[00:54:06] Guy Podjarny: Itamar, thanks for coming on to share all this insight and knowledge with us.

[00:54:11] Itamar Friedman: Thank you again. It was a pleasure. I'm happy we talked mostly about the interesting stuff. Codium AI is one tool, one set of tools, one platform, of the exciting intelligent coding systems that I think we're going to see in the future. And then thinking about how we as developers imagine that, I think that's what's exciting.

[00:54:28] Itamar Friedman: And then the tools are here to just enable that.

[00:54:31] Guy Podjarny: Awesome. And thanks everybody for tuning in and I hope you join us for the next one.

[00:54:36] Itamar Friedman: Thank you.

Podcast theme music by Transistor.fm.