Techzine Talks on Tour

Cisco wants a (big) piece of the AI pie and doubles down on compute

Coen or Sander Season 2 Episode 5

Cisco is gradually rolling out a complete ecosystem of solutions, both software and hardware, for the AI age. We discuss Cisco's objectives in the compute space with Jeremy Foster, SVP and GM of Cisco's compute organization.

Cisco's compute business appeared to be in the doldrums a couple of years ago. That is, it appeared to coast along without a very clear objective. Cisco didn't make a lot of big announcements around UCS either, as far as we can remember.

This has changed fundamentally over the past year or so. Cisco now wants to make it very clear that UCS still plays a very important role in its strategy. The reason for this? AI, of course. What else? With the rise of AI comes more, and different, demand for the underlying infrastructure.

That infrastructure is about more than GPUs alone; compute and networking also play very important roles. That's why Cisco has been focusing heavily on delivering AI stacks to customers. HyperFabric is one of them, and the recent announcement of the AI POD and the news with NVIDIA coming out of GTC are all clear indications of this.

During our conversation with Foster at Cisco Live EMEA, we asked him all about the changed dynamic within Cisco around UCS and compute in general. We talked about what makes a good AI stack and what customers should take into account when setting one up.

Foster promises one thing: Cisco is only getting started, so it's a good idea to be clued in from the start. Another good reason to listen to this new episode of Techzine Talks on Tour now.




Speaker 1:

Welcome to this new episode of Techzine Talks on Tour. I'm at Cisco Live EMEA, and I'm here with Jeremy Foster, the SVP and General Manager for the compute business at Cisco.

Speaker 2:

Right, that's correct. It's great to be here. Thanks for having me.

Speaker 1:

I want to talk about AI. I don't know why.

Speaker 2:

It's been a big deal these last few years, hasn't it?

Speaker 1:

Obviously. But before we do that, let's talk about UCS and the compute things that Cisco has done and does, because I got the impression a couple of years ago that you weren't doing that much around UCS anymore, or around compute in general. Was that a correct observation, or was I just blind and didn't see all the fantastic stuff that you were doing?

Speaker 2:

Yeah, what I'll say is this: a couple of years ago I moved into this role, and it has been a big focus for me to get the energy and focus back into delivering a tremendous amount of innovation across the UCS portfolio. As a general statement, what you're seeing is that AI is forcing a lot of things in the data center to change. And as we've evolved UCS, if you look at what we just brought out even here today at Cisco Live, there's a whole new lineup of servers. We've more than doubled the number of servers in the UCS portfolio over the last two years, and that doesn't even include the new servers that we just launched for AI and the new ones that we'll be launching over the next several months. So I'm really proud of the team and the work that we've been doing to get us where we are right now, and the response from customers has been fantastic.

Speaker 1:

Because there is a significant acceleration in what you have been doing, right, in the past couple of years?

Speaker 2:

Yes, very significant.

Speaker 1:

And we're not done yet. So what was the fundamental thing that you actually had to get right before you could do this acceleration? Because that's a Cisco trend, right? You usually do some very hard stuff first, and then a lot of acceleration of innovation after that. What was the hard thing that you had to solve?

Speaker 2:

From a modular systems perspective, obviously, during the pandemic we released X-Series. If you look back at where we started, we started off very focused on what at the time people called blade servers, and we were very successful there. We still own half of that market. And we told customers we were building a chassis, much like Cisco does in the networking space, that was going to last 10 years. And we lied, because it's been 15 and people are still buying it. So we definitely delivered on the promise of a very long life in that chassis.

Speaker 2:

But delivering the next generation that would be capable of certain use cases, even for things like AI in certain use cases, was what we were focused on with X-Series: giving customers another platform they could rely on for 10 years. And then the big acceleration you're also seeing from us is rounding out the rack-mount portfolio. We just delivered a bunch of new servers in time with the Turin launch from AMD, and you can see all of our new Granite Rapids systems downstairs on the Intel side, with all kinds of new designs that can help customers lower their cost to deploy those individual servers. And, most importantly, we've really been innovating with Intersight. There's been a massive change in what's happening there: we now have over a million devices being managed by Intersight and our SaaS management, which I think leads the industry.

Speaker 1:

Which one of these was the most fundamental thing you had to get right to actually get to where you are now?

Speaker 2:

At the end of the day.

Speaker 2:

All of it. I mean, look, the way I think about it is: you have fundamental things you have to do on the systems architecture side, taking the good things we did with the old UCS system and delivering reductions in power, cooling, and cabling, and simplification for customers. And then you have the things you need to do on the management side, which is delivering a cloud-based API and really cleaning up and modernizing that process. Intersight didn't start off as the tool that managed the server; Intersight started off as a tool that helped add processes to on-premises management. Now we've created the system that we talked to customers about before as SaaS management. Like I said, we have a million devices being managed by Intersight, so it's going well, but about 25% to 30% of our customers in any given quarter use Intersight on-premises. We've built a platform that's flexible enough for people to run on-premises or from the cloud, depending on their use cases, or even in a dark site.

Speaker 1:

And what about the things that customers come across very often, like licensing? Did you do any major innovations on that side as well?

Speaker 2:

We changed up our licensing, we simplified it a couple of years ago. That was one of the first things we actually did. We continue to look at that and how we're going to bring that forward from a licensing perspective.

Speaker 1:

So you already mentioned the forbidden word, AI, obviously. So let's go there. Oh, is that true?

Speaker 2:

No, no, it's not.

Speaker 1:

You hear so much about it nowadays.

Speaker 2:

That's the big joke, right? I'm three minutes into the presentation. I haven't said AI yet. Yeah, yeah, yeah.

Speaker 1:

Well, you managed it. I said it in the first 10 seconds, so I was the first culprit, I would have to say. Let's talk about AI stacks, because what does it mean to have an AI stack? There's so much talk about how you should architect your data center for AI. So what is your definition of an AI stack?

Speaker 2:

Yeah, our definition of an AI stack: there are all different types of reference architectures from different folks. Obviously, we're tightly aligned with NVIDIA in what we've been doing with AI PODs, and we leverage our networking equipment. There's a huge change in the requirements, and not only for compute; people forget how incredibly important the networking pieces are to bringing together a stack. And then what I would say makes stacks different is how you manage them at the software level. Because if you take a design from the market leader, NVIDIA, and they say, hey, here's an HGX reference design, I can look at my design and I can look at my competitor's design, and yeah, they may have different color lights on the front of them and some pretty good stickers, but they're not going to be dramatically different.

Speaker 2:

A customer isn't going to say, you know what, gosh, that's the reason I'll pick one, when it's the same eight GPUs in the same box. So you have to be able to build a better experience. For me, the AI stack is about how I manage it and how I deploy it, but, more importantly, how I live with that infrastructure and maintain it over the next five years. Because for infrastructure in the enterprise there's a certain quality that customers expect, an experience like running a virtualization host, and a certain level of reliability, and those things are different in AI. So you've got to try and balance them out.

Speaker 1:

And also there are many types of AI, right? There's no single AI. That's right. So you have inferencing, you have training, and that means different considerations.

Speaker 2:

Right, that's right. To kind of break it down: you have the training-type stacks, where we just released and started shipping our HGX server in December, and so we'll be delivering an AI POD for training here over the next 90 days. Our initial set of AI PODs, with Cisco networking and compute all put together with our management tools, was focused primarily on enterprise inferencing-type use cases. And if you look across our portfolio of AI PODs, it uses all different kinds of boxes. For some of those inferencing use cases where you only need a couple of GPUs, we put them in X-Series, because it's better from a power-efficiency perspective than just using some rack-mount servers.

Speaker 2:

But when you look at training, you're going to need an HGX-type reference design, and we use our 885, and all the scalability and modularity and all those things.

Speaker 1:

They play a very important role as well, I would imagine, especially if you don't know how big your AI stack is going to get.

Speaker 2:

Yeah, and ultimately the front-end piece of all these solutions is about making it easy for an enterprise to consume. So the point of the buying process, if you will, from a customer's perspective, should be: hey, here's my enterprise use case, and these are the things I want to do to deliver against that use case. How big, what T-shirt size of an AI POD, how scalable, what do I need? I think those are the pieces we're trying to figure out and put in front of a customer, so that they can say: okay, I know this is what I need, and I'm not overspending. Because look at these boxes versus where you were on a traditional server: a traditional server ASP, just to use round numbers, call it thirty thousand dollars, and one of these boxes could be three hundred and twenty-five thousand dollars. So you don't want to miss by too much.

Speaker 1:

For thirty thousand you don't even have the GPU yet, right? But obviously inferencing, that's my assumption at least, is going to be much bigger than training will be.

Speaker 2:

Absolutely right. From a customer's side of things: where do they make their money? Where does their customer have the experience?

Speaker 1:

And what does that mean for you when you build your AI stack? How does that impact or influence the innovation in your products and services?

Speaker 2:

Yeah, inferencing is where we started off in terms of delivering those solutions, and I totally agree with you: that's where these things are going to head in terms of where the business value is created. If I look at my customers, they're not necessarily making money building a model, unless their business happens to be selling a model, and a lot of these models are open source. So there's a reason people need to build models, but in the enterprise they're going to leverage off-the-shelf models and do inferencing against those. And when they're doing inferencing, that means their customer is doing something that's probably delivering business value. That's why you're going to see more dollars spent on inferencing long term than on training, and why you're going to see a lot of different solutions and providers of silicon trying to optimize that process further.

Speaker 2:

Take DeepSeek, which we're all talking about right now: that was a change in how the model itself was written from the beginning, and we can debate how it was actually done and what the validity of all these things is. And we are certainly seeing, as we published in some blogs this week, that there are different levels of security you can expect from different models, and we're happy to help you secure them, let's put it that way, because some aren't as secure as others. But the point is, we can increase the efficiency of these inferencing systems, both from a software and from a hardware perspective.

Speaker 1:

And I do think the market will dramatically change. You mentioned security, right? Do you still see that as sort of a separate thing? Because I can imagine, especially with the AI Defense announcement from last month, that combining that with AI PODs, or with the M8 that you announced this week, would make it an even better story.

Speaker 2:

Yeah, you're spot on. That's exactly what I was talking about when I mentioned how we want to improve that experience for customers. It's about the manageability, and it's about bringing things like security into that AI stack.

Speaker 1:

But also something else, because, if I heard you correctly, you're actually improving on the reference designs as well, right? At least that's what I think; I read your blog and it's in there somewhere. At least it's attributed to you. I don't know if you wrote it, but it says Jeremy Foster.

Speaker 2:

Yeah, that's good.

Speaker 2:

Yeah, from a reference design perspective, what we're doing is building out reference designs for an AI POD that leverage Cisco Ethernet.

Speaker 2:

So that's the difference with, say, today's NVIDIA reference architecture, which would leverage NVIDIA's networking. And the important thing is: let's give customers an operational process they're already used to, which is Intersight and what they do on the Cisco networking side with things like our Nexus portfolio, and say: look, you need to pick up a new use case like AI? Great, we can do that without changing your operational process. That's of high value to our customers.

Speaker 1:

Now let's talk a little bit about the new M8 server that you launched this week. I don't remember the exact number; it was something with an 8 and a 4.

Speaker 2:

An 845. Oh look, I'm almost correct. I'm not in marketing, by the way.

Speaker 1:

So what does that add to the fold?

Speaker 2:

So if you look at NVIDIA's reference architectures, you have HGX and you have MGX. HGX is the NVLink big box that we already had before, and so we're rounding out the AI portfolio with, now, a PCIe box. So you say: well, what's the difference with a PCIe-based two-, four-, or eight-GPU box? The first thing is that it's scalable. The second is that it's going to be targeted more towards those fine-tuning and inferencing use cases, maybe some light training, but it's a box that will be used in the enterprise for its primary use cases. And the reason it's important is cost: it's less expensive to deploy a box like that, and it's a little bit more flexible.

Speaker 1:

You can start small as well, right? You can start with just two GPUs and then grow your way up, and that obviously just makes things easier. And I remember that when we talked during the Partner Summit a couple of months ago, we also touched on storage. I don't know if it was you who said it or somebody else, but they said, and the listeners won't see this, you have compute here and networking here, and then you have storage all the way somewhere over there. Is that still the case, you think? Will storage not be that important?

Speaker 2:

It wasn't me that said that, because my opinion is that storage is pretty darn important. If you look at our AI POD reference design, it's again an area where we're best of breed: we're using our networking and we're using storage from NetApp and from Pure, building out AI PODs that are FlexPods and FlashStacks, if you will, extending that work we've done.

Speaker 1:

Okay, then it must have been somebody else. It was a Cisco person who said that, I'm pretty sure.

Speaker 2:

That may be true, but it wasn't me, and I also think storage is critically important, obviously. Just look at training: you have a whole set of storage players that are purpose-built to deliver the type of performance required for training, and oftentimes they're not even the traditional storage players for the enterprise. When you get to inferencing, those traditional enterprise players are going to be able to deliver against those use cases. And, I mean, I don't know what a hypothetical enterprise customer's application does, but what I do know is that machines typically do things faster than humans, and so that application is going to generate data that you want to store. This is why, if you look at the market right now, every storage company on the planet is up to probably close to 2x what their market cap was two years ago.

Speaker 1:

Well, maybe I'll just have to track down who it was who said that.

Speaker 2:

If it was me, I was having a bad day.

Speaker 1:

Tell them they were wrong. But moving on a little bit to the converged systems, like a FlexPod AI: that also has a place in this story about AI stacks, right? So what's its place compared to all the other AI-stack-related things that you do?

Speaker 2:

That's a great question, actually. So what is a FlexPod or a FlashStack? What they are, at their core, is a bunch of great work that my team builds around doing a Cisco Validated Design, or, because we love to use acronyms around here, a CVD. A Cisco Validated Design is what we set up in the lab and do a tremendous amount of testing on, and then we deliver a document.

Speaker 2:

It's not thrilling to read on your plane ride home, but it is very helpful if you want to deploy one of these systems: a 350-page type of document that tells you every best-practice setting you would want. Typically, when we started, those were the sort of enterprise-type use cases, say virtualization, or a validated design with OpenShift, for example, and we incorporate compute, network, and storage, but then also the software layer.

Speaker 2:

And since we build these as a system and we test these as a system, we can use that to feed into things like Intersight, to make sure that our customers can easily adopt the best-practice configuration states we've written into these books. We then supply support for customers across that whole stack, from the hardware up into, say, the OpenShift-type software, through Cisco, so they have one place to call. Now take that same thing we've been doing for enterprise apps and apply it to an AI use case, whether that's inferencing or, like I was saying, training here pretty soon, and that's the difference between a validated design for an enterprise use case and one for AI.

Speaker 1:

A lot of customers might think: well, I'm going for that one then. But that may not be the correct one for them. What's the difference, from a customer's perspective, between this one and going for a more generic AI POD, or whatever you're doing?

Speaker 2:

Well, the difference between what we call a FlashStack or a FlexPod and an AI POD: it's the same process, it's the same validated design, it's just that the use case is not virtualization or OpenShift. It is, in fact, a slightly different name for a validated design based on an enterprise application. It is a FlexPod, it just happens to be for AI, so we call it an AI POD, because that's what somebody thought was a great idea; we could have called it AppPod, but that's what we did.

Speaker 2:

There's not a massive difference in terms of what we're doing or testing; it's literally the use case that we're delivering against. We wanted customers to understand that there's a whole set of validated designs, I don't know how many we have right now, but it's a lot we've built over the years and that we can support, and so we wanted to differentiate these new AI use cases, so people weren't trying to figure out where to start.

Speaker 1:

Plus, I think I had a chat with somebody from NetApp yesterday about FlexPod, and he mentioned that there have been over 200 validated designs since the start; it's, I think, 15 years old now.

Speaker 2:

That's right, yeah, 15 years old, and there have been a lot of them that have been very useful for customers. So there's a differentiator in the name to go to.

Speaker 1:

That's what I thought already, so thanks for confirming that. That also helps, I think, listeners who think: well, what's all this pod nonsense about?

Speaker 2:

Well, plus, since we include software, it can become a little confusing: what am I using in an AI environment versus a traditional environment, or for things like containers? If you look it up and you see Red Hat and OpenShift, is that used for this or for that? It's important for customers to be able to say: okay, this is an AI infrastructure.

Speaker 1:

And I think you already mentioned some of it, but I'm also curious what all the other advances we saw this week in Cisco networking, but also in security, which we briefly talked about already, mean for your side of the business.

Speaker 2:

First of all, we are doing a great job, after going through a lot of the reorganization we've been through over the last several months, of bringing everything closer and closer together. So I don't really think about things in terms of just compute; I think about solving problems across the data center. And when you look at things we've released this week, like the smart switch, these are great opportunities for us to not only make it easier to deliver security end-to-end for a customer and integrate networking and security together, but also to plug a whole lot of services into the switches, because switches can do a lot more when the servers that run applications are plugged into them.

Speaker 1:

Well, then you don't have to have a DPU on the server itself, because you can have it in the switch.

Speaker 2:

That's absolutely right, and that's the interesting part. If you're at small scale, and we didn't talk a whole lot about this this week, of course we will support the DPU and be able to offload things like Hypershield on an individual-server basis. So if that's your atomic unit in a certain use case, you can do that. But then, to your point, we wanted to allow customers to leverage that software, aggregate it, and save money. Because if you're doing it at scale, you don't want to put a DPU in every server to run those firewall services.

Speaker 2:

You're over-provisioning, you're not using all the resources on the DPU, so you may as well centralize it. And, I think, putting these services on there is just the first step in what we can deliver on top of that DPU and the network. It's a really exciting time. It's a game-changing, differentiating product, and you'll see us building out validated designs that bring compute, security, and networking together, so people get a block of infrastructure they can easily deploy for the use cases we've been talking about.

Speaker 1:

And the last point I want to talk about is the different platforms for AI. Obviously NVIDIA is quite big, as an understatement, so I get why you use those reference designs. But are you also looking at other platforms than NVIDIA? I think you already mentioned AMD designs. During the AI POD launch I got the impression that you were also preparing it to be suitable for platforms other than NVIDIA. Is that correct?

Speaker 2:

Yeah, absolutely. First of all, I would just say NVIDIA is a fantastic partner of ours. We've done a lot of great engineering work with them.

Speaker 2:

And they will like hearing that, but it's true, in terms of differentiation around things like HyperFabric for AI. I know that was another big one we talked about. HyperFabric is a product that makes it simple to deploy networks; think of it like Meraki for the data center. Just about anybody can click through and deploy it, and we've really changed the buying process, if you will, of deploying a network with it. We're seeing some great customer excitement around HyperFabric, and now we're extending HyperFabric to AI. To do that, we had to work with NVIDIA to change things, to make that experience what we wanted it to be, to deploy these pods simply in the future.

Speaker 2:

So that's been huge. As for all the other folks: we have our eight-way GPU server that supports AMD, we're working with them, and we have customers today. So we have great relationships across the board. And if you look at Cisco, we have a history with this Ethernet thing, and Ethernet brings everything together: you need to be able to plug everything into it. I think our approach has to be the same here, because this market is changing so fast and because there are other people that are going to provide silicon. We're not going to restrict our customers' choice.

Speaker 1:

Much more is coming, I would imagine, right? You can't talk about it yet, but maybe next time.

Speaker 2:

Well, we're looking forward to being out at GTC with NVIDIA here in a few weeks, and I think there'll be a lot of things to talk about.

Speaker 1:

And obviously you then have your own event, in San Diego again, in June I think.

Speaker 2:

Oh, yes. I've been working on some things for the last couple of years that are near and dear to my heart and that will be different areas for Cisco to enter the compute market. So we'll be really, really happy to do this again in San Diego.

Speaker 1:

Looking forward to hearing about that when we get the chance to catch up there. Thanks for joining us. I thought it was quite interesting. I hope the listeners found it interesting as well.