

Enterprise AI Solutions Need to be Different - Glean and ex-Slack CPO, Tamar Yehoshua, on RAG, Changing Behavior and Bring-Your-Own-Model.

In this episode, Tamar Yehoshua, President of Products and Technology at Glean, shares her journey through top tech companies and dives deep into the complexities and innovations of AI in the enterprise. Discover the dual nature of Glean's AI platform, the importance of security, and the future of AI-assisted work.


Episode Description:

In this episode of AI Native Dev, host Guy Podjarny sits down with Tamar Yehoshua, a seasoned tech leader with an impressive career in engineering and product leadership roles at Amazon, Google, Slack, and currently, Glean. Tamar shares her journey and the innovative work being carried out at Glean, an enterprise AI platform. The discussion delves into Glean's dual interface for search and chat, the critical importance of security and privacy in AI solutions, and the architectural insights behind Glean's RAG-based solution. Tamar also sheds light on the evolution from AI to generative AI at Glean, the challenges of building AI solutions for enterprises, and strategies for user adoption and behavior change. Looking ahead, Tamar envisions a future where AI assistants handle repetitive tasks, allowing employees to focus on more creative and high-leverage activities. This episode is a must-listen for anyone interested in the future of AI in the enterprise.

Resources:

Chapters:

  1. [00:00:22] Introduction - Guy Podjarny introduces Tamar Yehoshua and highlights her illustrious career.
  2. [00:01:08] Tamar Yehoshua's Career Journey - Tamar discusses her roles at Amazon, Google, Slack, and Glean.
  3. [00:02:48] The Dual Nature of Glean: Search and Chat - Discussion on Glean's dual interface for search and chat.
  4. [00:05:56] Security and Privacy in AI Solutions - Tamar emphasizes the importance of security and privacy in Glean's AI solutions.
  5. [00:09:32] Architectural Insights: RAG-based Solution - Insights into Glean's retrieval-augmented generation architecture.
  6. [00:14:48] Evolution from AI to Gen AI at Glean - The journey from AI to generative AI at Glean.
  7. [00:21:32] User Adoption and Behavior Change - Strategies for user adoption and changing user behavior.
  8. [00:27:52] Challenges in Building AI Solutions for Enterprises - The complexities and challenges of building AI solutions for enterprise environments.
  9. [00:43:41] The Future of AI in the Enterprise - Tamar's vision for the future of AI in automating tasks and enhancing productivity.

Full Script

[00:00:22] Guy Podjarny: Hello everyone, welcome back to the AI Native Dev. Today we have a really exciting guest who I've known for the past five, six years now and have had amazing conversations with, and that's Tamar Yehoshua. Tamar, thanks for coming onto the show.

[00:00:35] Tamar Yehoshua: Thanks so much for having me.

[00:00:36] Guy Podjarny: Tamar is, is amazing in various ways, and had an illustrious career, right?

[00:00:41] Guy Podjarny: Maybe I'll pull some highlights from that engineering and product leadership career, which includes being a VP at Amazon's A9, being a VP at Google, notably on Search, as part of the journey, which is relevant to what you're doing today. You were Chief Product Officer at Slack, where we met because you also became a board member at Snyk at the time.

[00:00:59] Guy Podjarny: And now you are the President of Products and Technology at Glean. Is that right? Am I missing anything important in the journey?

[00:01:08] Tamar Yehoshua: That's awesome. And the most important part was being on the board at Snyk where I got to meet you.

[00:01:13] Guy Podjarny: Yeah, those were definitely fun highlight times, and we've had conversations that span well past the Snyk context. Today, as we get going, we'll talk about AI in the enterprise. We'll talk about building products with AI context. We'll talk about figuring out how to wrangle them in the first place.

[00:01:30] Guy Podjarny: Maybe start us off by just explaining a little bit, what does Glean do?

[00:01:34] Tamar Yehoshua: So Glean is an enterprise AI platform. What we do is we read the content of all of your SaaS applications, we ingest information from Microsoft Office 365, from Google, from Slack, from Salesforce, from JIRA, and make it easy to find information across all of your SaaS tools.

[00:01:53] Tamar Yehoshua: And also to ask questions in natural language. So you can think of it as ChatGPT for your enterprise: in ChatGPT, you ask questions about the world, and in Glean, you can ask questions about the proprietary information within your enterprise. And we also have a platform where you can access all of this information.

[00:02:09] Tamar Yehoshua: We call it the knowledge graph of your enterprise, accessible through our APIs, and we also have a platform for building no-code applications. So anything that you want to do that's building AI on top of all of the information stored in all of these different SaaS apps, you can do through Glean.

[00:02:27] Guy Podjarny: Yeah. So super interesting. And the breadth here is also interesting. So you're, first of all, just clarification. Is it more about discovering, a piece of information that's scattered somewhere across the, the, the enterprise systems and linking to them, more kind of search orientation, or is it more around you actually processing and understanding those things and bringing back some summarized answer for it?

[00:02:48] Tamar Yehoshua: So both, so you can use it for both. So let's say I'm going on a customer call and I want to know who's the account executive for it. I can ask Glean really simply. Who's the AE for this account? Or who's the CSM? I can also ask what were the latest outages? I can say, write me a brief, a customer brief for this, for this meeting.

[00:03:09] Tamar Yehoshua: And then it'll go through everything. It'll go through what were the latest meetings? What's the status of the account in Salesforce? What's the, what were the outages that I should know? And it'll put that together, I have to write the prompt that will put it together in the way that I want to consume it.

[00:03:24] Tamar Yehoshua: And it will summarize that and give me the information. I do this often: for any prospect, tell me about the prospect, what stage is it in, what should I know? And now I don't have to have a CSM write a briefing for me, I can do it myself. So that's an example. Any kind of capability that ChatGPT would have, we can do through Glean, and we keep adding functionality as well to be able to do better at summarization and insights.

[00:03:52] Guy Podjarny: Got it. So it sounds like it is primarily the ChatGPT, like you've used that analogy. So it's more the ChatGPT than the Google for your

[00:03:59] Tamar Yehoshua: Yeah, so we have both interfaces. So we have an interface that looks like Google search, but over your enterprise. So we have two different interfaces, the enterprise search and the chat interface. And then we have what we call AI answers. Think of how Google introduced AI overviews recently, where you can get an AI overview in search.

[00:04:18] Tamar Yehoshua: We have the equivalent of AI answers in the search interface. So that kind of brings them together as well. And then we also have the equivalent of custom GPTs that you can build on ChatGPT for a specific use case. Those are called Glean Apps.

[00:04:34] Guy Podjarny: Got it. And they extend and personalize, I guess, kind of the system prompts, or the ability to represent some subset of the enterprise data in some form of interaction, like ChatGPT.

[00:04:47] Tamar Yehoshua: Exactly. So a good example of a Glean App is that most of our customers want to build an HR app to say, these are the canonical documents for HR. So that if, let's say, somebody wrote an answer in a Slack channel to a question and that answer was wrong, it wouldn't pick that up; instead, here are the canonical HR docs, here's the knowledge base you should be looking at.

[00:05:05] Tamar Yehoshua: Here's when it should trigger. Here's the kind of prompts that it should use, and it's specialized for that. And Glean will actually auto-route. So if you ask a question in Glean and say, what's our vacation policy, it will auto-route to the Glean app, the HR app. And the HR team who built the Glean app can also add actions, so you can actually file your vacation through that Glean app.

[00:05:28] Tamar Yehoshua: And what's nice about it is that it is no code. So an HR person who does not know how to code can build a Glean app.
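
To make the shape of such an app concrete, here is a minimal, purely illustrative sketch in Python of what a no-code app definition plus auto-routing could look like. The `AppDefinition` class, field names, and trigger matching are assumptions for illustration, not Glean's actual schema or API.

```python
from dataclasses import dataclass, field

@dataclass
class AppDefinition:
    """Hypothetical stand-in for a no-code app: canonical sources, trigger, prompt, actions."""
    name: str
    canonical_sources: list[str]                        # the vetted knowledge base to answer from
    trigger_phrases: list[str]                          # when the assistant should route here
    system_prompt: str                                  # how answers should be framed
    actions: list[str] = field(default_factory=list)    # e.g. "file_vacation_request"

hr_app = AppDefinition(
    name="HR Helpdesk",
    canonical_sources=["confluence://hr-policies", "drive://hr/handbook"],
    trigger_phrases=["vacation policy", "parental leave", "benefits"],
    system_prompt="Answer only from the canonical HR documents and cite them.",
    actions=["file_vacation_request"],
)

def route(query: str, apps: list[AppDefinition]) -> AppDefinition | None:
    """Auto-route a query to the first app whose trigger phrases appear in it."""
    q = query.lower()
    for app in apps:
        if any(phrase in q for phrase in app.trigger_phrases):
            return app
    return None  # no match: fall back to the general assistant

print(route("What's our vacation policy?", [hr_app]).name)  # -> HR Helpdesk
```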

[00:05:34] Guy Podjarny: Interesting. So I guess we'll come back to the building piece, 'cause I think that's really interesting. On the search angle of it, maybe this is still the Snyk DNA piece of it, but immediately I get a little bit worried on the security front. It implies a system that is very knowledgeable, and you're telling me stuff about a client, but how do you know if I'm allowed to see that?

[00:05:56] Guy Podjarny: And I think that's a thorny domain at the moment in the world of AI. How do you think about, or handle, compartmentalizing who should have access to what data?

[00:06:09] Tamar Yehoshua: So that was probably the most important thing that Glean did in the early days: it took security and privacy policies really seriously. It didn't start with search and then figure out the privacy later. It started privacy-first from the get-go. So we have over a hundred connectors to all your different SaaS apps, and we understand the privacy and governance of each app.

[00:06:33] Tamar Yehoshua: And we understand the metadata that tells us, like for Slack, we know which channels are private, public, what are private DMs. For Google Docs, think of it: you've got a Google Doc, and let's say you share it with me, with the permissions set to anyone with the link can see it. But now my neighbor over here, you didn't share it with them.

[00:06:55] Tamar Yehoshua: So they are not supposed to see it. Glean needs to know that I can see it, but they can't. And so how do we know that? We know because we can see your email and your Teams conversations and your Slack conversations. We can know if it was shared with you. If we don't see that it was shared with you, we don't let you have access to it.

[00:07:13] Tamar Yehoshua: So we really understand the different policies for each of the SaaS apps, and we make sure that we abide by them. And it's very rigorous in what we do. But now you, with your CISO mind, will say: wait a minute, maybe I don't want everyone in the organization to have access to all information that is not locked down, because there might be some people who created a document and didn't do the permissions right.

[00:07:41] Tamar Yehoshua: It's security through obscurity. You never would have found that document, but now with Glean, you can find it. So I can say, when's the next RIF? And if there happens to be a document about a RIF where somebody in HR didn't get the permissions right, I would find it. So another thing that we've done is we have AI governance modules that enable a company, before they launch, to assess whether they have documents that shouldn't be open.

[00:08:08] Tamar Yehoshua: And we will flag certain types of documents for people to review. And we will enable a company to redlist or greenlist. Like you can redlist whole directories or whole types of documents to say no matter what the permissions are, we never want these to show up in Glean.
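
As a rough sketch of how that kind of governance layer could sit on top of the native permissions, here is some illustrative Python. The document shape, the ACL field, and the exact redlist/greenlist semantics are assumptions, not Glean's implementation.

```python
import fnmatch

REDLIST = ["hr/restructuring/*", "finance/board-decks/*"]   # never surface, whatever the ACL says
GREENLIST: list[str] = []                                    # if non-empty, only these paths qualify

def is_servable(doc: dict, user: str) -> bool:
    """Apply redlist/greenlist policy on top of the source system's own permissions."""
    path, acl = doc["path"], doc["acl"]
    if any(fnmatch.fnmatch(path, pattern) for pattern in REDLIST):
        return False
    if GREENLIST and not any(fnmatch.fnmatch(path, pattern) for pattern in GREENLIST):
        return False
    return user in acl or "everyone" in acl   # finally, respect the app's native sharing

def flag_for_review(docs: list[dict]) -> list[dict]:
    """Pre-launch check: org-wide-readable docs whose titles look sensitive, for a human to review."""
    sensitive = ("rif", "layoff", "compensation", "acquisition")
    return [d for d in docs
            if "everyone" in d["acl"] and any(word in d["title"].lower() for word in sensitive)]
```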

[00:08:24] Guy Podjarny: Yeah. So, let me be doubtful a little bit here for a second. In the LLM world, when you feed the LLM knowledge, if the LLM understands or is aware of a RIF, generally the perception in the security world is that you can convince it to tell you. It's one thing to define the policy,

[00:08:48] Guy Podjarny: which is a challenge in its own right, though it's easier for me to grok how you would do that, and maybe a human will verify and confirm it. But for in-the-moment requests, who enforces these policies? Is it an LLM? Is it the same Gen AI systems that know this?

[00:09:07] Guy Podjarny: Are there. What's

[00:09:08] Tamar Yehoshua: I know,

[00:09:09] Guy Podjarny: the secret sauce here? How is this, how is this compartmentalized in a more assured way?

[00:09:13] Tamar Yehoshua: So I have to explain the architecture more. So we do not fine tune models with your enterprise data because you're absolutely right. If you fine tune a model with your enterprise data or you do a large context window with all of the information, the LLM is just going to know that information. And that's why it's so hard to do this in the enterprise.

[00:09:32] Tamar Yehoshua: So it is a RAG-based solution, and all of the enforcement is based on the search, the retrieval part of RAG. So the way that the architecture works for the assistant, for the natural language query: you put in a query in natural language. We call the LLM for the planning phase, where we take that query and we translate it into search queries.

[00:09:55] Tamar Yehoshua: Then we call our retrieval engine and we get back the relevant snippets. So this is just finding what are the right documents based on your permissions. And then that is the context window that's fed into the act phase, and now we call the LLMs in order to generate an answer.

[00:10:15] Tamar Yehoshua: So that information, if you didn't have the right permissions to see it, will never get fed into the LLM. And so that's how we secure it and ensure it.
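
A minimal sketch of that plan-retrieve-generate flow, with permissions enforced at retrieval time so nothing the user can't already read ever reaches the model's context window. The `call_llm` and index stubs below are toy placeholders, not Glean's or any vendor's real API.

```python
# Toy permission-aware index: each entry carries the users allowed to see it.
INDEX = [
    {"snippet": "Acme account: the AE is Dana, the CSM is Lee.", "acl": {"guy", "tamar"}},
    {"snippet": "Acme outage postmortem from May 12.",           "acl": {"tamar"}},
]

def search_index(query: str, user: str) -> list[dict]:
    """Retrieval step: only documents this user is permitted to see are returned."""
    words = set(query.lower().split())
    return [d for d in INDEX
            if user in d["acl"] and words & set(d["snippet"].lower().split())]

def call_llm(prompt: str) -> str:
    """Placeholder for whichever model the customer has chosen (OpenAI, Anthropic, Gemini)."""
    return prompt.splitlines()[-1]   # echo the last line, just so the sketch runs end to end

def answer(question: str, user: str) -> str:
    # 1. Plan: have the LLM turn the natural-language question into search queries.
    plan = call_llm(f"Rewrite as search queries, one per line:\n{question}")
    queries = [q for q in plan.splitlines() if q.strip()]

    # 2. Retrieve: permission filtering happens here, before any generation.
    snippets = [d["snippet"] for q in queries for d in search_index(q, user)]

    # 3. Generate: the context window contains only what the user could already read.
    context = "\n\n".join(snippets)
    return call_llm(f"Answer from the context and cite sources.\nContext:\n{context}\nQuestion: {question}")

print(answer("Who is the AE for the Acme account?", user="guy"))
```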

[00:10:26] Guy Podjarny: Yeah. And I think that makes sense. And I think aligns with the best practice that, that I'm aware of in the world of security, which is you want to pull in the data from a bit more, bread and butter systems or traditional systems, if you will, with authorization systems in which it says, Tamar is allowed to see this and Guy is allowed to see that.

[00:10:44] Guy Podjarny: And then, yeah, you can massage and present and process that data in the right fashion.

[00:10:51] Tamar Yehoshua: And the other thing, because people are very worried about the security in organizations, and it really is sensitive because we are indexing all of your organization's data, is that we actually allow people to host it in their own environment. So in their own GCP project or AWS project, so it doesn't leave their cloud premises.

[00:11:13] Guy Podjarny: So what type of limitations do you need to take on because you don't train on the customer data, and what could you do? I'm thinking about this a little bit with the mindset of code: indexing on your code base is a topic we've already discussed a few times on the podcast,

[00:11:35] Guy Podjarny: and there's elements, all sorts of pros and cons to it. There was a certain amount of, if you brought in bad code, you might replicate mistakes and things like that, but generally there is, it's a scary proposition for a lot of organizations because they will be giving one of these giants access to their code and they're afraid legitimately or otherwise that their data will be there.

[00:11:56] Guy Podjarny: But if you don't do that, there's a lot less, or at least my understanding is there's a lot less, you could do in terms of feeding that insight of your code into the code generation. Is that true over here? What is sacrificed?

[00:12:09] Tamar Yehoshua: Yeah, I think the use cases are not as important in this case. So you could say, I want to make sure that the tone of any documentation or any marketing collateral, anything that comes out of Glean, is consistent across everything that everyone is doing. So you can still have guidelines as input, and that can be part of your input.

[00:12:32] Tamar Yehoshua: And so you can create a Glean app that says, here are guidelines of how we want it to be, but then you're, the user is going to have to more explicitly do that and say, follow these guidelines as opposed to Glean understanding without the user telling them, here's what I would like to do. But what I find is in those kind of cases, like marketing will say, these are the marketing guidelines or engineering will say, here's the template for the design doc and product will say, here's how we follow the template for PRD.

[00:13:01] Tamar Yehoshua: So it isn't as much of a problem in practice, because you don't ingest all the code and learn the style of the organization. It hasn't come up as an issue.

[00:13:12] Guy Podjarny: Yeah, it's interesting. It feels a little bit like you're not trying to regenerate things in the style of the data that you encounter. You're trying to find it and interpret it, but you're not trying to create a new customer record based on this conversation, and therefore needing to have processed and digested all the previous customer records to do it. If you want to get a customer record in a standard fashion, then someone can build a Glean app that defines what that fashion is, how you want to see it, and then you can find the right data to populate into it.

[00:13:44] Tamar Yehoshua: Exactly. And that's going to become really interesting with the Glean apps: where are the ways that we can automate? Once you get actions, then you get much more automation, much more productivity savings, because you can ingest all these things. So, for example, you can ingest all of your Gong calls.

[00:14:02] Tamar Yehoshua: So one thing that we built, the PM team at Glean built an app to look through Gong calls, tease out the information in the call, put it in a spreadsheet, and then be able to assess what are the most frequently asked requests from customers. So this is a multi-step, super interesting example, but you don't need it all fed into an LLM. It's more specific use cases, and that's where things get really interesting.

[00:14:32] Guy Podjarny: Yeah, no, I can see the power in it. And so, still a little bit on the Glean journey, and we'll talk about the sausage-making, I guess we got a little bit into the sausage-making here with the architecture. Glean predates, I think, Gen AI, right? I think it was founded, and it was an AI company at the time as well.

[00:14:48] Guy Podjarny: Can you tell us a little bit about what was before? What you're describing right now sounds like a very Gen-AI-ish company. Tell us a little bit about what was there before. Was this a pivot? Was this a natural expansion? How did that go about?

[00:15:00] Tamar Yehoshua: Yeah, so Glean started in 2019. It was founded by Arvind Jain. Arvind was an early Google search infrastructure engineer, so he was one of the people who built the original search infrastructure for Google search. I actually worked with him at Google back in the day, in like 2011, so we know each other from there, and then he went on to Rubrik, where he was one of the co-founders.

[00:15:24] Tamar Yehoshua: And then at Rubrik, as they were growing, he found that productivity decreased instead of increased. And a lot of it was around finding information. And by the way, we see that with our customers too. They come to us and they're like, ah, we've grown, we can't find the information we need to get our job done. So he's like, why is there not a good enterprise search solution? So that's how it started: building enterprise search. But because he came from Google, he knew that AI was being used in search at the time. Like BERT, the original models, the precursors to LLMs, were built by Google. Why? To improve search.

[00:15:59] Tamar Yehoshua: So they had been used since, I think, 2016 or '17 in search; the BERT models and vector embeddings were used to improve the quality of search. So from the beginning, Glean was an AI search company. It was using BERT for its models. It was building vector embeddings before people were talking about vector databases.

[00:16:18] Tamar Yehoshua: They fine tuned their own models per enterprise to build the vector embeddings. So this is why Glean's enterprise search was so much better than anything else on the market.

[00:16:29] Guy Podjarny: But this is, this was search. It was Google search for your organisational data

[00:16:34] Tamar Yehoshua: Exactly.

[00:16:35] Guy Podjarny: It was around AI, the same way Google uses AI, which is, I can connect the dots between whatever words that are actually similar. I can understand maybe a bit of context, so like all of these vector databases, the embeddings

[00:16:50] Tamar Yehoshua: was for semantic matching.

[00:16:53] Guy Podjarny: Got it. And I guess to an extent he built what would have been the Google Appliance that at some point came and went, for the enterprise that connects all these dots, but it wasn't generating answers. It

[00:17:04] Tamar Yehoshua: Correct. It was not,

[00:17:06] Guy Podjarny: to a source of information.

[00:17:07] Tamar Yehoshua: it was using AI as in ML. So it was using what was AI at the time, but it wasn't using generative AI. So it was just enterprise search, you put in a query, you find the documents. But it was a better enterprise search than anything else out there. The answers were just more relevant.

[00:17:27] Tamar Yehoshua: And then they started selling it as enterprise search.

[00:17:30] Guy Podjarny: And then how was the transition? Was it a sort of ring-the-alarm, all-hands-on-deck switch to this sort of Gen AI thing? Was it a more gradual, let's try this out?

[00:17:42] Tamar Yehoshua: It was, GPT-3 came out, and because we already had an AI team, people were keeping up with things, so it wasn't like a surprise, right? And GPT-3 came out, and we said, we should have a natural language assistant interface into Glean. And there was a war room: get everyone together and build the assistant.

[00:18:04] Tamar Yehoshua: And then it was, with the architecture that I described, put the generative aspect into the product. They had always sold enterprise search, and then they added on the assistant. And you could buy either search or the assistant; now we're one product together.

[00:18:23] Tamar Yehoshua: But this was, it was a very fast pivot to, let's add this assistant, because now ChatGPT came out and every CIO was saying, we need an AI solution. So then they can turn to Glean, and Glean has an assistant, it's an AI solution. So, exactly, I wouldn't say it was a pivot, because all of the technology that was built for search was the R in RAG.

[00:18:51] Tamar Yehoshua: It's the retrieval in RAG, and Glean had a head start on everyone else, because for four years it had been building connectors, had understood privacy and security, and had built the retrieval engine. The exact same retrieval engine that was built for enterprise search is being used for the assistant.

[00:19:12] Guy Podjarny: It feels like a totally different user experience though, and almost a different positioning, right? What you've described right now wasn't, hey, we started with search, that's history. Now that's a piece of the puzzle, and the interface, the front door, the way you engage with the product as a Gen AI product, is that assistant. It's the composition, not just finding the data, but most of the time it's less about sending me to the authoritative source and more about,

[00:19:35] Guy Podjarny: digesting what you found for me.

[00:19:37] Tamar Yehoshua: So we were going through this transition as everyone was starting to get used to ChatGPT and doing some searches on Google, some searches on ChatGPT or Perplexity, Bard / Gemini. And it's the same thing. You can toggle between the interfaces in Glean. You can go to the search interface, you can go to the assistant interface.

[00:19:55] Tamar Yehoshua: So yeah, it was another product built on top of the same foundation. There's a foundation, how we deploy, how we secure, all of that, stayed the same. So it was the user interface on top. And of course, marketing had to change and the messaging had to change and all that. And there's a new team that's building the assistant.

[00:20:14] Tamar Yehoshua: So yeah, it was an expansion, absolute expansion. And then more recently, about six months ago, we added the platform on it as well. So each of these are gradual expansions to the amount of functionality and capabilities.

[00:20:26] Guy Podjarny: Understood. And I think, this is probably a good time to raise this. I have this hypothesis that I asked Des from Intercom as well in the previous episode, which is that when you think about AI, I oftentimes think about, tools across these sort of two axes. I think about a trust axis and a change axis.

[00:20:43] Guy Podjarny: So on one hand is how much does it ask me to change how I work. And on the other hand, how much do I need to trust it to get it right? So the more autonomous it is, the more I need to trust it to get it right. Say, for example, with Fin, the Intercom support agent, I need to put it in front of customers.

[00:21:00] Guy Podjarny: I need to trust that it gets it right. I can spot-check things, but I can't have a human verify the results; that defeats the whole purpose. And I guess my thesis is that if your product is already text-based, so it's already a search, it's already a chat (and chat has the Slack version of chat, the Intercom version of chat), then at least on the change vector it's easier to introduce AI, because those are more naturally the way that you would interface, and it's easier to imagine a user on the other side.

[00:21:32] Guy Podjarny: Does that sound right to you? I mean, you've seen Slack, you've been at Google, you're now at Glean. Is it easier? Or I guess in general, how hard is it for people to go from accepting you're going to point me to a link to somewhere else, versus you'll summarize it for me, or even just,

[00:21:51] Guy Podjarny: the interaction, the fact that they are in chat mode versus in search mode.

[00:21:57] Tamar Yehoshua: I believe that behavior change is hard. People just, they get used to doing things in a certain way, even look at the retention numbers for ChatGPT. People try it, they play with it, and then they forget that they can use it. Because you're not used to using it, it's not in your daily routine. Change is hard, and it needs a lot of repetition.

[00:22:20] Tamar Yehoshua: And while I agree that if you have a text interface, getting to a chat interface is easier, because you're already talking in natural language, like with Slack you're putting in messages and with Glean you're putting in queries, the biggest stumbling block we have is people understanding what they can and can't do with it.

[00:22:38] Tamar Yehoshua: So it's not that, okay, now I have a chat interface, it's what can I use this chat interface for. Search, if you think about it, when Google Search first came out with queries, people didn't understand it either. What's a query like? Now everybody knows what a query is. How to formulate a query, how to reformulate a query.

[00:22:55] Tamar Yehoshua: You do a query, you don't get the results, you refactor it and you do it again, and this is just like part of life and people got used to it, but it wasn't that way in the beginning. You didn't know how to put a query in to get the right answer. So we're going through that phase with the assistant now, is that people come to the assistant and some people understand AI.

[00:23:14] Tamar Yehoshua: They understand like the whole flow and like engineers understand how it's built, so they have a mental model of what it can and can't do. But then you put it out to people who have not interacted with chatbots before, with AI chat. Not, I'm not talking about customer support because customer support is such a different use case.

[00:23:32] Tamar Yehoshua: It's a very small set of questions that people ask, as opposed to the universe being open and I can ask anything. So we have people coming to Glean and being like, what should my priority be for next week? Where there's no way Glean could know that, because it doesn't have the information. And people don't yet realize which are the questions that can be answered.

[00:23:52] Tamar Yehoshua: So while I agree with you that the transition is easier, it's still hard to know, when you go to the assistant, what you can do, what will work, and what won't work. Like the example I gave you of the PMs creating this way to analyze Gong calls, somebody outside of the product team at Glean,

[00:24:09] Guy Podjarny: about that as a possibility.

[00:24:10] Tamar Yehoshua: Yeah, it wouldn't even occur to them that they could do this.

[00:24:13] Tamar Yehoshua: But as we get more used to this, and I contend as the next generation grows up with ChatGPT and grows up with these products, it'll be like the generation that grew up with Google search. It'll become clearer and become more obvious. And the models will get better and they'll get stronger.

[00:24:30] Guy Podjarny: Yeah. Interesting. I find the decision that Google has made to introduce the AI answers into the same interface, putting aside potential antitrust aspects to it, from a UX perspective, a great way to engage people, because they put in the search, they get a thing, they get some answer, and yeah, they can just scroll down and get what they are used to getting.

[00:24:56] Guy Podjarny: But in the meantime, they've been exposed to that answer and they build those out. And I'm curious, Glean in theory could do the same thing, right? You could toggle, as you mentioned, between the interfaces, but you could also build something that's the same, that answer is at the top,

[00:25:09] Tamar Yehoshua: We actually do have that. We have what we call AI answers, and we actually had it before the AI overviews at Google. But we do exactly that. We have an AI answer, and you can click more, and then it expands down, and you get the full answer. And that is exactly that. We're careful on how often we trigger, because you want to make sure that it's going to be the

[00:25:30] Guy Podjarny: if it's a good answer, basically.

[00:25:32] Tamar Yehoshua: Yeah, because one of the challenges also is that when you go to search, you voice a query in a certain way, and when you go to the assistant, you voice it in a different way. So there's also the expectation of the user and how they're communicating. So you get better quality assistant answers when you understand you're going to the assistant to chat.

[00:25:50] Tamar Yehoshua: But we absolutely do that. A lot of the engagement with the assistant is through the search interface and the AI answers that they get in the search interface. And we're going to be doing more of that, more of automatically triggering one or the other. I feel like in the future, there won't be these two interfaces.

[00:26:05] Tamar Yehoshua: And the other thing that we do is that when you're in the assistant, we tell you which query we used to query the RAG, and we give you the documents that were accessed, so that you can go back and click on those documents to go in. It's not only the citations; there's also the list of documents that we used for the answer.

[00:26:27] Tamar Yehoshua: So in a way, you're merging both directions. You're making it easy to go from the assistant back to search, and from search to the assistant. So in a year, I think these will all get merged into one and you won't have one or the other.

[00:26:39] Guy Podjarny: Yeah, it's interesting. I find a lot of compelling aspects to integration like this, because the user gets the choice and you get to interact with it. And I keep coming back a little bit to that limitations question, which is, I think, a bit hard to tell and depends on how well people build those policies.

[00:26:55] Guy Podjarny: And I think it's also easier to accept that in the enterprise, even if the breadth of knowledge is bigger than just customer support, it is still vastly smaller than the whole web and that information. And so maybe it's okay to say that over here, the security and the element of controlling who can see what takes precedence, and you lose some neural connections that might have been made, that would have been non-trivial if you had fed all this information into the LLM, but that's okay, it's a fair trade-off to make.

[00:27:28] Tamar Yehoshua: Yes.

[00:27:30] Guy Podjarny: So I guess, let's talk, indeed, I kept promising, talking about the sausage making. Let's do that now. So I guess what is it like building a product that works this way? The results of the product vary by the data that comes in, the success rates are very different, and that's on top of the fact that the LLMs move at such a kind of lightning pace and so you get new models all the time.

[00:27:52] Guy Podjarny: And if I understand correctly, you actually don't even pick the model. You have to work with the models that the customer picks. So maybe tell us a little bit about how you interact with the models just to set that, but what I'm really interested in is, you talked about LLM as a judge, you had a post about this, just how do you know that your product works?

[00:28:09] Tamar Yehoshua: It's very difficult. The non-deterministic aspect is the most interesting and challenging. My first week at Glean, I was talking to the head of the assistant quality team just to learn what they do and how it works, and realized that a lot of their time was spent talking to customers who did not have the right expectations of what it could do, or were complaining that it was non-deterministic.

[00:28:36] Tamar Yehoshua: I did this query once, I did it again, and it didn't get the same thing. And we're starting to get used to ChatGPT and the concept of non-determinism. But in an enterprise, you're a CIO. You buy software, you pay a lot of money for it. You expect it to have the same answer every time. So getting our customers comfortable with what LLMs can do and what the boundary conditions are is part of what we have to do in our product.

[00:29:03] Tamar Yehoshua: And it can't just be in the marketing and in the enablement. You have to have some way in the product to take care of this as well, to understand what the challenges are going to be for people. So I explained the RAG architecture. We have our way of evaluating the search; most of the team came from Google search ranking and built eval tools just like Google search had.

[00:29:26] Tamar Yehoshua: So that kind of, that's like bread and butter, like how do you eval search? We have a whole process for

[00:29:31] Guy Podjarny: this is not the user side. This is internal. This is to

[00:29:35] Tamar Yehoshua: This is the.

[00:29:35] Guy Podjarny: whether your search is working correctly with information the engineers see. Your engineers.

[00:29:40] Tamar Yehoshua: This is our ranking. The ranking algorithms with a lot of information on exactly what was triggered and what wasn't and what scores it had. Now it is a little bit more difficult because at Google you could use third party raters to say, here's a change we're making. Is this change good or bad? Evaluate it.

[00:29:55] Tamar Yehoshua: But because this is enterprise data, we can't use third party raters. We can only use our tools and engineers to look at the data. So that's another wrinkle on top for enterprise. But, but going back to your question of what's different is understanding the user mindset when they're using this product, how do you help them through that to give them guardrails so they can better understand what the product can do and then how do you evaluate it and how do you make sure that it's working as intended with this new technology that nobody fully understands why it's giving the answer that it does.

[00:30:32] Guy Podjarny: Yeah. What are some examples of things that you would do in the product to help people understand this?

[00:30:38] Tamar Yehoshua: A big one is suggestions. One of the things about Glean is that we understand your organization. So among the connectors that we have, we also have a Workday connector and Active Directory. So we understand who you are, who your peers are, who's on your team. So we can suggest prompts that people on your team have been using.

[00:30:58] Tamar Yehoshua: Oh, you're a PM, here's a prompt from another PM, the kinds of things that they've been doing that you might want to try. So we can make those suggestions, or generic suggestions; we've been experimenting with all different ones, but that can help guide people into the ways that they can get value.

[00:31:18] Tamar Yehoshua: And that's going to be really important, and we want to do a lot more of that. And the Glean apps were a big way of doing that as well: here are some more structured prompts and triggers. This is where, if you have an IT question, you'll go here. If you want to build a customer brief, here's a Glean app for building a customer brief.

[00:31:34] Tamar Yehoshua: So that helps people go to things that somebody has curated. Somebody like the 5 percent of people in the company who really understand how to work with prompts and LLMs, they're going to do that and they're going to help. And we're going to be doing more and more of that; it's a whole other angle of how we are doing this.
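
A small sketch of what that kind of peer-based prompt suggestion could look like, using the org graph (teammates and role) to rank prompts others have used. The data shapes and ranking are invented for illustration.

```python
from collections import Counter

def suggest_prompts(user: dict, prompt_log: list[dict], k: int = 3) -> list[str]:
    """Return the k most-used prompts among the user's teammates and role peers."""
    counts = Counter(
        entry["prompt"]
        for entry in prompt_log
        if entry["author"] in user["teammates"] or entry["role"] == user["role"]
    )
    return [prompt for prompt, _ in counts.most_common(k)]

me = {"role": "PM", "teammates": {"alice", "bo"}}
log = [{"author": "alice", "role": "PM", "prompt": "Summarize last week's Gong calls"},
       {"author": "chen",  "role": "PM", "prompt": "Draft a PRD from this design doc"}]
print(suggest_prompts(me, log))
```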

[00:31:55] Tamar Yehoshua: And then we also have teams dedicated to eval, and to understanding what changes need to be made, what don't, and how to evaluate new models. As you mentioned, customers can decide what model they want to use. We validate the model, we certify it, I should say. If Gemini 1.5 Pro comes out, we will certify it for our customers before we enable them to use it.

[00:32:19] Tamar Yehoshua: But we have let our customers pick OpenAI, Anthropic, or Gemini for the LLM aspect of the work. And so another thing that's tricky is working with the different models.

[00:32:33] Guy Podjarny: But how, so I understand the suggest notion, or the idea of disseminating, which is actually probably a useful practice for any product, not an LLM-specific one: take the forerunners and provide an easy way to disseminate their sample uses to the rest of the organization. But on the other side, in terms of what happens if it fails, hallucinations are a thing in this world, so it ranges from, it didn't.

[00:32:58] Guy Podjarny: Okay. Maybe you're good on the search side. So it finds the relevant data, you feel pretty good, but it didn't understand it. Did it process it correctly? Did it present it correctly? How do you evaluate it when you certify? What types of tools are at your disposal to know, when you're using a new model or even just evolving your software, that it got better?

[00:33:17] Tamar Yehoshua: So first of all, in the product, customers can do a thumbs up, thumbs down. Obviously we get more thumbs down than thumbs up, because that's just the nature of people. But that's helpful, in that all those queries come back to us so that we know, here's the set of bad queries. We also evaluate things like in search.

[00:33:35] Tamar Yehoshua: It's easy: did somebody click on it, did they find what they were looking for? In the assistant it's trickier, because they might have gotten the answer or not gotten the answer. But for example, if they try a couple of queries in the assistant and go to search afterwards and then find the document that they needed, we know that the assistant didn't give them the answer.

[00:33:50] Tamar Yehoshua: So we have a metric, the satisfaction index, from search and the assistant. So we look at that and we measure that very heavily: how many bad queries did we get, how many thumbs down did we get? So that's one

[00:34:02] Guy Podjarny: Yeah. And that's posted. Those are all things that

[00:34:05] Tamar Yehoshua: Yeah, that's the

[00:34:06] Guy Podjarny: people using the product.

[00:34:07] Tamar Yehoshua: Right. That's the proxy for how well are we doing?
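
As a back-of-the-envelope illustration of such a satisfaction proxy, the sketch below combines explicit thumbs feedback with the implicit failure signal described above (the user re-ran the question in search and found the document there). The event shape and weights are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class AssistantSession:
    thumbs: int | None         # +1, -1, or None if the user gave no feedback
    fell_back_to_search: bool  # found the doc via search right after asking the assistant

def satisfaction_index(sessions: list[AssistantSession]) -> float:
    """Aggregate explicit and implicit signals into a single 0-to-1 proxy score."""
    score = 0.0
    for s in sessions:
        if s.thumbs == 1:
            score += 1.0
        elif s.thumbs == -1 or s.fell_back_to_search:
            score += 0.0           # explicit or implicit signal that the answer missed
        else:
            score += 0.7           # no signal: assume mostly fine, discounted
    return score / len(sessions) if sessions else 0.0

sessions = [AssistantSession(1, False), AssistantSession(None, True), AssistantSession(None, False)]
print(round(satisfaction_index(sessions), 2))   # -> 0.57
```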

[00:34:11] Tamar Yehoshua: And if we're doing hill climbing to improve, is it going up or going down? And then evaluating is super tricky. So we have, as you mentioned, we've started using LLM as a judge. And there are many ways that the LLM could go wrong, or the whole assistant could go wrong. It could pick the wrong query to send to the retrieval engine.

[00:34:33] Tamar Yehoshua: The retrieval could not find the document, it could find the wrong one, or it could miss something. And then in the generative step, it could not be a complete answer, it could not be grounded in the facts, it might pull in public data instead of the data that you had. And then, so you've got the completeness, the groundedness, and the factualness.

[00:34:54] Tamar Yehoshua: So we've been using LLM's to judge our answers for the assistant in these different areas. So completeness is one that we get the most thumbs down. If it's not a complete answer, we'll get thumbs down for it. And then we've correlated the thumbs down with LLM as a judge and the completeness.

[00:35:14] Tamar Yehoshua: So that's the easiest for an LLM to then evaluate the results of the LLM. And so we have completeness; we have groundedness, did it tell you which context it came from; and then the factualness is the hardest. And for factualness, what you need is a golden set,

[00:35:30] Guy Podjarny: Factualness being the, you did not hallucinate,

[00:35:33] Tamar Yehoshua: Groundedness is more about the hallucinations. We don't have a big problem with hallucinations at Glean because of the RAG-based architecture and because we do the citations. So the groundedness is the most aligned to hallucinations, but sometimes an answer is not grounded in the enterprise data because it might be, like, the stock price that you ask for a company, and it might just be public knowledge.

[00:35:54] Tamar Yehoshua: The factualness is: LLMs are very confident, so they'll say with great confidence that something is correct. And then a user will not thumbs-down those, because they'll just assume it's correct. And those are the most dangerous.

[00:36:07] Guy Podjarny: Right.

[00:36:08] Tamar Yehoshua: And so what we're doing is trying to actually have a golden set,

[00:36:13] Tamar Yehoshua: where we have an LLM extract queries from documents and then measure the effectiveness of whether we are finding them. So we get the golden set, and this one, actually, we're still working out how to best measure it. But the best part is that we now have a repeatable process for LLM as a judge. We have an eval framework, and we just turn the crank when a new model comes out, and we can get an evaluation across these metrics for new models, and for changes, as we're making changes in the code, we can evaluate them more easily.
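
A simplified sketch of what such an LLM-as-a-judge harness could look like: score each answer for completeness, groundedness, and factualness against a golden set, then re-run the whole thing whenever a model or code change lands. The `judge_llm` stub, the prompts, and the 0-to-1 scale are assumptions for illustration, not Glean's framework.

```python
def judge_llm(prompt: str) -> float:
    """Stand-in for a judge model that returns a score between 0 and 1."""
    return 0.9

def evaluate(question: str, answer: str, retrieved_context: str, golden_answer: str) -> dict:
    """Score one answer along the three dimensions discussed above."""
    return {
        "completeness": judge_llm(
            f"Does this answer fully address the question?\nQ: {question}\nA: {answer}"),
        "groundedness": judge_llm(
            f"Is every claim in the answer supported by this context?\n"
            f"Context: {retrieved_context}\nA: {answer}"),
        "factualness": judge_llm(
            f"Does the answer agree with the golden reference?\n"
            f"Reference: {golden_answer}\nA: {answer}"),
    }

def run_eval(golden_set: list[dict], assistant) -> dict:
    """Average each metric over a golden set; `assistant` returns (answer, retrieved context)."""
    totals = {"completeness": 0.0, "groundedness": 0.0, "factualness": 0.0}
    for item in golden_set:
        answer, context = assistant(item["question"])
        for metric, score in evaluate(item["question"], answer, context, item["reference"]).items():
            totals[metric] += score
    return {metric: total / len(golden_set) for metric, total in totals.items()}
```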

[00:36:52] Tamar Yehoshua: Is it perfect? No, but it's a lot better than engineers manually going and looking at every query, which

[00:36:57] Guy Podjarny: at the results. And that golden set needs to be created per customer.

[00:37:01] Tamar Yehoshua: A lot of times we use our own data for some of these golden sets to make sure, but we do have, in our eval, we actually run queries in our customer environments, in their deployments, because we can't look at their content, but we can run things and evaluate them and get the results of the evaluation.

[00:37:19] Guy Podjarny: Interesting. And that's a good learning. So you can't look at their data. You have to make sure that your check did not cause problems with their data, or at least try to assess that that's the case. So what you're agreeing with them is that you'd be able to run

[00:37:35] Tamar Yehoshua: some processes that are not actually functional, or run a bunch of these tests on their platform with their data, but you won't access the data, you'll just get the results of, like, thumbs up, thumbs down.

[00:37:45] Guy Podjarny: different version of the thumbs up, thumbs down to say, yeah, it feels good for you to, to deploy this, this new version or to upgrade to this new model.

[00:37:52] Tamar Yehoshua: We're very cautious, making sure we have very strict agreements on how we're handling customer data, but we absolutely run regressions, which, it's an interesting process that we've gone through.

[00:38:05] Guy Podjarny: Yeah. And so, we'll talk about LLM as a judge, which is: the LLM looks at the answer and says, the result I see in the golden set seems sufficiently similar to the result the live product gave right now. And then, I found something interesting in the blog post that you wrote about this.

[00:38:24] Guy Podjarny: There's this notion of an LLM jury, which, I don't know, has the risk of taking the analogy a bit too far, but do you want to say a couple of words about that?

[00:38:32] Tamar Yehoshua: It's exactly what it sounds like, just multiple, you want to assess not just one voice, but multiple voices to make sure that you're aligning.

[00:38:42] Guy Podjarny: And I guess it's partly a means to deal with the fact that LLMs themselves, the evaluator itself, are non-deterministic. So oddly, it's actually quite aligned with the reason there is a judge and jury in the actual human judicial system.

[00:38:58] Tamar Yehoshua: That's a, yeah, that's a good point. Yeah. Because people are non deterministic too.
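
A tiny sketch of the jury idea, assuming each "juror" is a different judge model or a separate sampled run; the aggregation by median, the 0-to-1 scores, and the threshold are illustrative choices rather than a description of Glean's setup.

```python
import statistics

def jury_verdict(prompt: str, judges: list, threshold: float = 0.5) -> tuple[float, bool]:
    """Ask several judges (models or repeated samples), aggregate, and return a pass/fail verdict."""
    scores = [judge(prompt) for judge in judges]   # each judge returns a 0-to-1 score
    median = statistics.median(scores)
    return median, median >= threshold

# Example with trivial stand-in judges:
judges = [lambda p: 0.8, lambda p: 0.4, lambda p: 0.7]
print(jury_verdict("Is this answer grounded in the provided context?", judges))  # (0.7, True)
```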

[00:39:02] Guy Podjarny: Yeah. It's like it's built in our image. But versus maybe a non-enterprise or consumer version of it, are there other highlights? I think the things I've captured so far have been the fact that you have to use their model, you can't necessarily see their data.

[00:39:18] Guy Podjarny: Maybe even the fundamental, architecture of being RAG based because you're going to be dealing with sensitive data. I guess what other things jumped to mind in terms of the difference between building a ChatGPT for everybody and the ChatGPT for the enterprise?

[00:39:36] Tamar Yehoshua: The biggest one is the one you cited, for ChatGPT for consumers, it's run off the internet where everything is public knowledge, so you'd never have to worry about permissions. But the other thing is when you're building search for consumer versus enterprise, the signals that you use for ranking are different.

[00:39:52] Tamar Yehoshua: So in consumer search, you have a lot of click signals, because you have millions, maybe billions of people using it. So your click signals are very dense, and your semantic similarities, your synonyms, you just have a lot more data to go on. With the kind of signals you get in the enterprise, you're not going to get the anchors that you have in search,

[00:40:14] Tamar Yehoshua: and the density of clicks, they're very sparse. Your click data in the enterprise is too sparse to use, but then you have different data. You have activity data. You have, oh, a lot of people are looking at this document in the last week, so it must be important, so let's bump it up.

[00:40:31] Tamar Yehoshua: You have what I talked about of knowing your team. If I'm looking for an onboarding document for my team, I'm not going to use one for marketing. I'm going to use one from somebody else in engineering or somebody on my team as a template to start. So that's a signal that has no meaning in consumer, but is very important in the enterprise.

[00:40:50] Tamar Yehoshua: Things like activity and proximity: the person looking at the document, what is their proximity to me? So personalization is a very strong signal in Glean for doing ranking, whereas it's a much weaker signal in consumer. So that is what the team spent the first couple of years figuring out: what are the enterprise signals that you need for ranking.
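
To make those signals concrete, here is a hand-wavy scoring sketch that blends semantic relevance with recent activity, freshness, and the searcher's proximity to the people touching the document. The weights and field names are guesses for illustration, not Glean's ranking function.

```python
import math, time

def rank_score(doc: dict, query_relevance: float, user_team: set[str]) -> float:
    """Combine enterprise-specific signals into a single ranking score for one document."""
    # Recent activity: views in the last week, log-damped so it doesn't dominate.
    activity = math.log1p(doc.get("views_last_week", 0))

    # Proximity/personalization: how many recent viewers are on the searcher's team.
    proximity = len(user_team & set(doc.get("recent_viewers", []))) / max(len(user_team), 1)

    # Freshness: gently decay documents nobody has touched in months.
    days_old = (time.time() - doc.get("last_modified", time.time())) / 86400
    freshness = math.exp(-days_old / 180)

    return 2.0 * query_relevance + 0.5 * activity + 1.0 * proximity + 0.5 * freshness
```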

[00:41:17] Guy Podjarny: Yeah. And the whole architecture is predicated on the fact that the first problem is to find the relevant pieces of information in the enterprise. And then subsequently, as we've discussed before, the Gen AI UX on top of them is how you interact with that data versus pointing it elsewhere, which opens up its own set of opportunities and, at the same time, maybe challenges around adoption, because it's a new practice.

[00:41:41] Guy Podjarny: And I guess, what is your sense around enterprises' readiness to embrace all this new stuff? Are they chomping at the bit to take it? Are they trying to stay away and just do the minimal, whatever the board requires, on AI?

Tamar Yehoshua: So they're more ready than I thought. CIOs are like, we need an AI solution, and they're all going out there and evaluating different AI solutions. Now, part of it is a checkbox because the board is telling them they need an AI solution, but there are a lot who are genuinely interested and genuinely curious.

[00:42:14] Tamar Yehoshua: I mean, you see a lot of IT departments in large companies using OpenAI to try and build their own solutions. And a lot of them come to us a year later: we tried to build this, it's hard, we didn't realize how hard RAG is, we should use your APIs. So I do think that there is a willingness.

[00:42:30] Tamar Yehoshua: There's also some fear. So some companies are like, I, even if you say you're secure, I just don't want any AI coming anywhere near me, and those we don't even talk to, but I have been encouraged by how many want to bring it in, and then after they bring it in, you have the champions who really are like vocal and figure out how to use it.

[00:42:54] Tamar Yehoshua: That's what's going to take the biggest change management: how do you really educate everyone on how much they can make their jobs better, how much they can leverage products like Glean to be more productive? Because as I said with the ChatGPT example, people just forget sometimes and don't even think of it.

[00:43:15] Guy Podjarny: And humans change the slowest. So that's a good tee-up to the final question here. We talked about the zero to one, how people even wrap their heads around what it is that they could do. What do you imagine as the far-down-the-road reality? I talk about AI native software development, I talk about AI native in a bunch of things.

[00:43:34] Guy Podjarny: What is the AI native reality? I don't know if that's the right term here for enterprise search, for Glean's vision.

[00:43:41] Tamar Yehoshua: I think that people are going to be able to automate away a lot of what they do today, a lot of the toil that they do today. And if you think of how some people, executives, have had assistants, you'll have essentially an assistant for every aspect of what you do, and you're not going to have to do a lot of this, like writing the first versions of your documents, prepping for your customers.

[00:44:07] Tamar Yehoshua: We had a finance person build a Glean prompt to help her figure out comp for salespeople: read the spreadsheet and understand which salesperson is getting what comp based on what they've done. All these things people are doing manually, people will just get used to; it'll be a new thing. They'll even forget when they used to have to do it manually, and that'll be okay, because they don't need that skill anymore and they can spend time on other, hopefully more creative, higher-leverage things.

[00:44:36] Tamar Yehoshua: I'm looking forward to that day.

[00:44:38] Guy Podjarny: Yeah. And is it safe? It sounds compelling for sure. And I guess you and I have both been in leadership positions, and so are sometimes spoiled with having some of that work happen with others on the team. Do you find that to be a good mental model? Is it like everybody would be a bit more senior in their org, with a whole bunch of helpers around them, and the type of work

[00:45:00] Guy Podjarny: the average person, or whatever, even someone lower down the hierarchy of an organization, would be doing would end up being more senior compared to what we're doing today? More managerial?

[00:45:12] Tamar Yehoshua: I've heard somebody describe it as like having teams of interns. And I wouldn't put it as everyone will be more managerial; I don't believe that. But I believe that if you had interns who could do a lot of the rote work that can't easily be automated, then you can do the creative work.

[00:45:33] Tamar Yehoshua: And so I think you will get more productivity if everybody had these five or six interns to go build this code here, build that document there, that presentation. I think we'd all be happier in how we're spending our time.

[00:45:49] Guy Podjarny: Yeah. Yep. I absolutely agree with that. Tamar, this has been excellent. I'm sure we could go on for a full extra hour to dig into it, but thanks a lot for coming on the show and sharing a whole bunch of these learnings.

[00:46:00] Tamar Yehoshua: Thank you for having me. It's always great to talk to you.

[00:46:03] Guy Podjarny: And thanks everybody for tuning in and I hope you join us for the next one.

Podcast theme music by Transistor.fm.