Techzine Talks on Tour
Techzine Talks on Tour is a podcast series recorded on location at the events Coen and Sander attend all over the world. A spin-off of the successful Dutch series Techzine Talks, this new English series aims to reach new audiences.
Each episode is an approximately 30-minute discussion that Coen or Sander has with a high-level executive of a technology company. The episodes are single-take affairs, and we hardly edit them afterwards, apart from polishing up the audio a bit, of course. This way, you get an honest, open discussion where everyone speaks their mind on the topic at hand.
These topics vary greatly, as Coen and Sander attend a total of 50 to 60 events each year, ranging from open-source events like KubeCon to events hosted by Cisco, IBM, Salesforce and ServiceNow, to name only a few. With a lot of experience in many walks of IT life, Coen and Sander always manage to produce an engaging, in-depth discussion on general trends, but also on technology itself.
So follow Techzine Talks on Tour and stay in the know. We might just tell you a thing or two you didn't know yet, but which might be very important for your next project or for your organization in general. Stay tuned and follow Techzine Talks on Tour.
Adam Evans explains how Salesforce keeps its AI on a leash
Salesforce is one of the fastest innovators in the SaaS AI space. It started with a prompt builder early in 2024 and launched Agentforce during Dreamforce, letting customers build their own automated AI agents. Major expansions are planned for February 2025: adding voice capabilities to agents, coaching sales teams with an agent, and rolling out to many more countries. How can Salesforce move so fast while making sure its AI doesn't hallucinate?
For this episode of Techzine Talks on Tour, we talk to Adam Evans, SVP of Product for the Salesforce AI Platform. Salesforce combines many data sources in the Salesforce Data Cloud. He explains how that data is used and how the reasoning engine loops through data and AI queries to serve the most consistent outcome to the end user.
Salesforce has built a reasoning engine that handles data and AI queries. When a query is sent to an AI model, the model doesn't always return the right answer; sometimes it needs more information. The reasoning engine acts like a human reviewer to catch that: it searches for more relevant information and feeds it to the AI model. When you ask a Salesforce AI agent a question, it may go back and forth with an AI model several times before you get an answer, just to make sure that answer is the best possible one.
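That back-and-forth can be pictured as a bounded retrieval loop. The sketch below is purely illustrative, not Salesforce code; every name in it (`call_model`, `search_knowledge`, `MAX_TURNS`) is a made-up stand-in for the behavior described in the episode.

```python
MAX_TURNS = 6  # safeguard: bound the back-and-forth with the model

def call_model(prompt: str) -> dict:
    """Stand-in for an LLM call: asks for more data once, then answers."""
    if "order status: shipped" in prompt:
        return {"type": "answer", "text": "Your order is on the way."}
    return {"type": "need_info", "query": "order status"}

def search_knowledge(query: str) -> str:
    """Stand-in for a retrieval step (a data platform, RAG, an API action...)."""
    return "order status: shipped"

def run_agent(question: str) -> str:
    prompt = question
    for _ in range(MAX_TURNS):
        response = call_model(prompt)
        if response["type"] == "answer":
            return response["text"]
        # The model asked for more information: retrieve it and re-prompt.
        prompt = f"{question}\ncontext: {search_knowledge(response['query'])}"
    return "Sorry, I could not find a reliable answer."

print(run_agent("Where is my order?"))  # the loop calls the model twice before answering
```

The iteration cap is the point: the engine will loop to improve the answer, but only a bounded number of times.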
Salesforce creates prompts on the fly and uses RAG to feed the most relevant information to the AI models. The reasoning engine works together with the AI Trust Layer to stay compliant and mask customer data. The reasoning engine keeps the AI on a leash.
The reasoning engine and the AI Trust Layer are both model-agnostic. With new AI models being released daily and LLMs making giant steps forward every few months, it's hard to tell which AI model will dominate the industry in 12 or 24 months. Salesforce has taken that into account in how it built its AI framework: it can switch to a different LLM very easily.
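Staying model-agnostic usually comes down to programming against a small interface rather than a specific vendor SDK. A minimal sketch of that idea, with hypothetical `ModelA`/`ModelB` backends standing in for interchangeable LLMs (none of this is Salesforce's actual API):

```python
from typing import Protocol

class LLM(Protocol):
    """The only contract the engine depends on."""
    def complete(self, prompt: str) -> str: ...

class ModelA:
    def complete(self, prompt: str) -> str:
        return f"[model-a] {prompt}"

class ModelB:
    def complete(self, prompt: str) -> str:
        return f"[model-b] {prompt}"

class ReasoningEngine:
    """Talks only to the LLM interface, never to a concrete vendor SDK."""
    def __init__(self, llm: LLM):
        self.llm = llm

    def answer(self, question: str) -> str:
        return self.llm.complete(question)

engine = ReasoningEngine(ModelA())
engine.llm = ModelB()  # swapping the underlying model is a one-line change
```

Because the engine never imports a vendor library directly, a stronger frontier model can be dropped in without rewriting the orchestration around it.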
Evans also addresses critical issues in AI data privacy, emphasizing the importance of data residency and localized computing to comply with international regulations. This episode is a must-listen for anyone looking to understand the future of AI technology in SaaS and how it can improve your business.
Speaker 1: We are here at Dreamforce, the conference of Salesforce, with Adam Evans. He's SVP of Product for the Salesforce AI Platform. It's a big title, Adam.
Speaker 2: It's a big company, yeah, exactly. So what that means is that I'm in charge of product on our platform side. That covers things like Agentforce, Copilots, Prompt Builder, basically anything that's GenAI inside of Salesforce; that falls into my realm from a product perspective.
Speaker 1: Okay, so you kind of developed Agentforce with your team then?
Speaker 2: Yeah, it's a large team; there are a lot of people behind Agentforce. My background is that I've come to Salesforce twice through acquisition, focusing on AI for sales automation the first time. The second acquisition, last year, was more about applying AI to service, meaning self-service. So you see aspects of both of those in the product and the team.
Speaker 1: Throwing all your business data at AI and prompting it doesn't work. That's basically what the hyperscalers are doing, and adoption rates are pretty slow. You think you have solved this with Agentforce. Can you explain a bit what it is?
Speaker 2: Yeah, so I think, I mean, there's always more you can do. The nature of large language models, like most stochastic systems, is probabilistic, so it's a question of whether you can get it more accurate each time. It's never a fully solved problem; it's always the pursuit of it, right? What we're saying is that AI is only as good as the data you have. If you think of it that way, a lot of this boils down, with large language models, to getting the right information into the prompt. And I say information as opposed to data because, at least for large language models, it's natural text. It's about knowing how to pull the right data out of Data Cloud, in the right context at the right moment, have that go into the LLM, and do that in concert with other information.
Speaker 2: So, for example, if an agent is in a conversational setting, how much of the conversation history with the user do you want in there? If it's a really long-running conversation, do you want the whole thing, or just a little bit? What are the important aspects of it? What if the conversation moves into questions about policies or product information, things that might be inside of your knowledge base, or a deeper kind of unstructured data set? That means being able to search through it, with semantic search or vector search; your audience is quite technical, so when you're doing RAG and vectorizing these things, how can you bring some of that back? And then maybe also personalization: have you had many conversations with this user before? If it's an employee at your company, do we know their role, their job, why they might be talking to this agent and how it's going to help them with their work? Or if it's an external customer, what activity have we had with them? What's their order history, their relationship?
Speaker 2: Bringing all of that information in can be overwhelming, and part of it is about, as I say, the right information. Think about the techniques there: it's about not having too large a prompt; think about context windows and those kinds of things. It's not even about cost, or really even performance, although those obviously have implications here. It's about consistency and accuracy of the behavior you'd like to get from the LLM. So understanding which information to pull back in, and how to order it, changes the outcome.
Speaker 2: Now, the difference between a one-shot prompt and an agent is that the agent is effectively doing this in a dynamic looping pattern, right? The agent looks at information and then makes a choice: say, respond in the conversation, or go deeper and look for more information, do some research for a second, before it responds. Then, depending on what it comes up with next, it might decide: I got the information the user asked for, I'm going to respond with it. Or maybe what it found on behalf of that user led to more questions it didn't have the answer to yet. I'll give you an example just to make this concrete, a very canonical one, since we've all been there with e-commerce.
Speaker 2: Hey, I ordered a product and it hasn't shown up yet, and I might need to say: hey, what's going on, where is my order? It's the classic thing. It's two days late, I'm order number one-two-three-four, and it's just that kind of an utterance.
Speaker 2: Maybe the agent goes and looks up order one-two-three-four and pulls it back. Maybe: have you checked your porch, is it there, was it stolen? Maybe I should look up my policies on lost and stolen packages before I respond. Or maybe, when it pulled back that same order, it would learn that it's in transit, and it would decide it needs to check the carriers to see what's actually going on, whether it's held up and what the actual arrival date is.
Speaker 2: Or maybe it checked one-two-three-four and found there was actually no such order, and maybe the user was confused, so it responded right away and said: 'Hey, are you sure this is the right order number?' That level of agency is effectively being able to loop on information. So everything is about selecting the right data into the prompt, thinking about all those different contexts coming together, and then the orchestration on top of that. Because if it's looping multiple times and seeking information, we will let it do that maybe a half dozen times or more; there are safeguards for this kind of thing. And while it's doing research, you're asking that same question of what data, what information, how you're interfacing with this large language model, dynamically, sub-second, each time it changes.
Speaker 2: Does that make sense?
Speaker 1: Yeah, I'm still following, but a lot of questions came up during your story. You're using small steps with the AI to process a request, because it has to check out a lot of things, and all these small steps are controlled by your overlaying engine, I think it's called the Atlas reasoning engine. But that's not an LLM; that's a different kind of AI that you're using for those steps, right?
Speaker 2: Yep. Well, an LLM is a pre-trained model, right? You know, GPT, generative pre-trained, with lots of information, and you get these emergent breakthrough behaviors. It's awesome, it's amazing, we've all seen it. But there isn't a single model. We look at this as: models are changing rapidly, so fast. There are new models every day, and the big models every few months.
Speaker 1: Every three months or so.
Speaker 2: Yeah, that's right. And we've seen this incredible catch-up from open-weight, open-source models. I would not have predicted that we'd be at the stage we're at right now, or how relatively low-cost it is.
Speaker 1: I think nobody did.
Speaker 2: Right, and so there's all this uncertainty about these models. And for our customers, if you're listening to this, you have large customers, you're moving large organizations. Making a purchasing and implementation decision and rolling out new products and features to your customers is a lot of work. Moving down a path with the idea that the foundation you're building on is going to get pulled out from underneath you, or change that much, is a really hard thing. Given that we're at this moment where it's moving so quickly, we've made choices to stay effectively agnostic, or above that, so you can actually swap out the models you want to use for that reasoning and planning engine.
Speaker 1: Yeah, because that's what I found interesting. At one point Marc said: just using an LLM with trial and error, you're not going to get the results you want, because it won't be as good and it can go its own way. You need an AI with an expected outcome, where you know it's not going to hallucinate or do strange things. So you need to do it in a controlled environment, but you're still using those same LLMs, in a controlled way, to get the output.
Speaker 2: That's right. So some of the techniques I mentioned about choosing what data goes into a prompt happen inside this inner agentic loop, inside this reasoning engine, right? This happens multiple times. Inside of that, there's also deterministic logic: you can have policies that apply as the thinking or state of that loop evolves, as the agent moves through its research or through the conversation, as new information becomes available. I'll give you a really simple and canonical example: authentication. If I'm having a conversation with an authenticated user versus a non-authenticated user, what's the difference in terms of what the LLM can do?
Speaker 2: It turns out it's important to make a distinct difference there, right? So there are what we would call topics and instructions; a topic, to us, is a construct for describing the focus of the agent at a given moment in time. If you're doing things like the order-management example I was giving, that would be a topic where the agent focuses on understanding how to help customers with orders. In an unauthenticated session, it may only have the ability to do things like look up an order if it has both an order ID and an email address, and maybe something else. Or it could authenticate the session and suddenly have more actions available to it. So this puts a boundary on hallucination in terms of the agent's ability to actually have effect, to take action.
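The topic-plus-authentication gating described here can be made concrete with a small, purely hypothetical policy table; the topic and action names below are invented for illustration and are not Salesforce identifiers.

```python
# Invented topic/action names; the pattern is the point, not the identifiers.
TOPIC_ACTIONS = {
    "order_management": {
        "unauthenticated": {"look_up_order_by_id_and_email"},
        "authenticated": {
            "look_up_order_by_id_and_email",
            "list_my_orders",
            "cancel_order",
        },
    },
}

def allowed_actions(topic: str, authenticated: bool) -> set:
    tier = "authenticated" if authenticated else "unauthenticated"
    return TOPIC_ACTIONS.get(topic, {}).get(tier, set())

def can_run(topic: str, action: str, authenticated: bool) -> bool:
    # Deterministic policy check: whatever plan the LLM comes up with,
    # it can only invoke actions on this allow-list.
    return action in allowed_actions(topic, authenticated)
```

The key design choice is that the check is deterministic code, not model output, so a hallucinated plan cannot expand the agent's real capabilities.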
Speaker 1: But did you build different models within the Atlas reasoning engine to process order management or to do pricing, or are there different AIs that you train to do that? How does it work?
Speaker 2: There are high-level LLMs that are brought in for this reasoning and planning, and there are very few models we use for that. We assess all the models for this; we put out a benchmark, by the way, we call it our CRM benchmark, where we look at all these foundational models. Really it's been GPT-4, but there are now other models, and that's part of the excitement of things changing so fast, that are getting to the point where, from our assessment, we would tell our customers they're good enough for this highest level of reasoning and planning. There are also lots of use cases that don't require that level of, I'll say, intelligence or cognitive ability; summarization is pretty classic. Additionally, there are built-from-scratch models inside of the reasoning engine.
Speaker 2: By the way, there are models running all the time for safeguards and guardrails. You can imagine you almost have a supervisor running in the background, looking at what the agent is doing, able to stop things or keep it on rails, within boundaries. We have models that look at toxicity and models that look at prompt defense. So imagine there's a whole bunch of smaller, specific models.
Speaker 2: Yeah, domain models, that's right. It's not domain from the perspective of industry yet, although that is definitely of interest and we're looking at those things. It's really about having models that apply to things pretty much every business would want: is somebody trying to be malicious, a prompt attack, something to that effect? Then we're using the highest-level, more cognitive models and drawing a box around them, a boundary of guardrails of what they can and cannot do, and using them to reason through information. And those models you can swap out as more of these larger frontier models arrive and that landscape continues to evolve. Our customers aren't going to be locked in.
Speaker 1: Yeah, it's a very advanced piece of technology, I guess, because you're using so many different models within the reasoning engine.
Speaker 2: And the way this is done is that you don't need a computer science degree, you don't need to completely understand it. What we're shipping is a product, and this is basically our commitment: to continue to improve things as the technology changes, and to do it in a way that's backwards compatible, so that you don't have to know. It just keeps getting better in the cloud.
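The supervisor pattern described above, small dedicated checks vetoing the agent's draft output, can be sketched roughly like this. The toy string checks are stand-ins for the real toxicity and prompt-defense models, which are not public; everything here is an assumption for illustration.

```python
def toxicity_check(text: str) -> bool:
    return "insult" not in text.lower()  # toy heuristic, not a real model

def prompt_defense_check(text: str) -> bool:
    return "ignore previous instructions" not in text.lower()

GUARDRAILS = [toxicity_check, prompt_defense_check]

def supervise(candidate_response: str) -> str:
    """Every guardrail must pass before the agent's draft reaches the user."""
    if all(check(candidate_response) for check in GUARDRAILS):
        return candidate_response
    return "I can't help with that."
```

The structure matters more than the checks themselves: guardrails run outside the main model, so adding or upgrading one never requires retraining the reasoning LLM.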
Speaker 1: Okay, and Data Cloud is basically the foundation of what makes your AI so good, right? Because Data Cloud is the centerpiece where you get all the data from.
Speaker 2: Data Cloud is the center of all the information; it's in the name, Data Cloud. The AI needs to be able to look up information and reason across things, and it needs data to do that. That data is coming from Data Cloud.
Speaker 1: It could also, by the way, come from external systems. How do you make sure that external data sources are also summarized in a way the AI can handle? Because you can also throw too much data at it, like you said at the beginning of this episode. You need to manage that as well, right?
Speaker 2: That's actually a great point you're hitting on. You can have the agents take action, and an action can be pretty much anything you want; it can be looking something up in another system, and that is a way to connect data effectively. It's an API aspect of a RAG kind of mentality, right? But you may want to filter that information down if it's too much. For example, we have a customer, ezCater, that has information on hundreds of thousands of catering restaurants; they all have menus and what they offer, and it's a lot of unstructured data. One way to do it would be to hit an API action to just search for caterers in an area, some kind of parametric search. Another way would be to load that unstructured data into Data Cloud. The benefit of that, with the techniques involved, is that, as our customers have reported back to us internally, the answers we give are over twice as accurate versus the other techniques on the same data and the same questions.
Speaker 2: When you put data in Data Cloud, these techniques kick in automatically. So if you want to filter that unstructured data more accurately, you move it into Data Cloud and connect an agent to it. The agent then knows how to articulate questions better and how to reason and reflect on whether the responses are accurate, and that ultimately gets more accurate data back into the prompt. Ultimately, like we say: if you give the AI bad data, you're going to get a bad outcome.
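The retrieval idea behind this, matching a question to unstructured documents by similarity rather than exact keywords, can be sketched with a deliberately simple word-overlap "embedding". Real systems use learned vector embeddings; the catering menus below are invented for the example, not ezCater data.

```python
def embed(text: str) -> set:
    """Toy 'embedding': the set of lowercase words in the text."""
    return set(text.lower().split())

def similarity(a: set, b: set) -> float:
    """Jaccard overlap, standing in for vector cosine similarity."""
    return len(a & b) / max(len(a | b), 1)

def retrieve(query: str, documents: list, top_k: int = 1) -> list:
    """Rank documents by similarity to the query and keep the best matches."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:top_k]

menus = [
    "taco truck catering for office parties",
    "italian pasta catering with vegetarian options",
    "sushi platters for corporate events",
]
```

Here `retrieve("vegetarian pasta catering", menus)` surfaces the pasta menu even though no exact phrase matches, which is the property that makes this approach work on large unstructured corpora.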
Speaker 1: Yeah, okay. You're expecting thousands of agents, so that will stress your database as well. Are you confident the infrastructure is up to the task? We see other vendors in your area replacing their databases, or adding NoSQL databases or Elasticsearch; I've seen many different developments in that space.
Speaker 2: I mean, you're asking a great question. A thousand is not the problem; it's certainly going to be billions. But the point is: is there a limit for scale?
Speaker 1: Yeah, a limit for scale.
Speaker 2: You know, one of the most amazing things, and I'm a tech geek kind of guy, about vector search and embeddings and how this works is that it's all about meaning, search with semantics, but it basically all boils down to math. And it turns out that computers, GPUs and inverted-index databases are really scalable when you're searching for numbers. So with vector, it's actually more scalable in many ways than other database indexes that we've had for many, many years.
Speaker 2: So the answer is: we're not really worried about that scale. And by the way, everything we do in our infrastructure, this is part of the whole cloud, auto-scales for us. We're monitoring this; that's our job for our customers: maintain uptime and scale for them. They shouldn't have to think about any of these things, right?
Speaker 1: That's true, but I see other vendors replacing their whole database infrastructure because it's not good enough. That's why I asked; when you created Data Cloud, you weren't thinking about this use case back in the day.
Speaker 2: Yeah, Data Cloud is designed for immense scale. And honestly, I think the agents are going to query data more as it's needed, more surgically. I would imagine that other systems will be stressed more by agents than Data Cloud.
Speaker 1: Final question: you're doing a world tour, going to many countries to present this and help customers build their agents. I saw you're coming to many European countries with different languages. Is that a problem? Because you need to support German, Dutch, French, Spanish, and I don't know what else there is.
Speaker 2: Yeah, so right now in October, when we reach general availability of this, we will support French, Italian, German, Spanish, Japanese and Portuguese.
Speaker 1: Okay, that's a lot.
Speaker 2: Yeah, it represents a large set. Now, we know there are a lot of languages I didn't mention right there. There are 40 total languages that Salesforce supports across the product lines; that's the full set, and that's our target. Our plan is to offer support for all 40 of those in our spring release, which is February.
Speaker 2: Also, large language models are able to speak many languages, some better than others, and they're getting better all the time. This is another reason you want agility in which language model you have. When we say that a language is supported or not, it's more from a formal perspective: documentation and a lot more behind that. Even right now, while we don't technically support Japanese yet, I've seen it work for our Japanese friends. So it's there, and it'll get better every day.
Speaker 2: And by the way, another question related to languages is data residency and privacy, where it all runs. That's another exciting thing: we're moving to frontier models that can do this reasoning and planning but are more on the open-source side, where you can run them in a private...
Speaker 1: Your private cloud.
Speaker 2: Exactly. And that allows us to move the compute, move the GPUs with the models, closer to where the data lives at inference time, abiding by local regulations and so on. I can't give exact dates on that, but I will say that in the last handful of months we have seen things that are looking really positive, that we're getting ready to roll out and expand, which will unlock this for many of our international customers.
Speaker 1: Yeah, in Europe that's very important, and in some Middle Eastern countries as well; they want the data residency to be there.
Speaker 2: That's right. And also regulated industries; then it doesn't really matter which geography you're in. So all things are trending that way: language, data residency, privacy, things are looking good. We have more models competing with each other at that level, and we're trying to productize all of this so our customers don't have to hire an army of computer science PhDs to build it out. They don't have to relearn all the lessons that we have, and we can build software for them to ultimately make everybody go faster and drive customer success.
Speaker 1: And the smaller models in the reasoning engine are not language-dependent, then?
Speaker 2: No, all in English.
Speaker 1: Okay, so that's good. Thank you, Adam. I see you have to go, so you're a busy man. Thank you for the conversation.
Speaker 2: Thank you very much.
Speaker 1: And hopefully we talk again in the future.
Speaker 2: Absolutely. Okay, thanks.