Techzine Talks on Tour

What is the role of storage in the modern tech landscape?

Coen or Sander Season 1 Episode 22

For this episode of Techzine Talks on Tour we sat down with Patrick Smith, Field CTO EMEA at Pure Storage. Storage continues to evolve in a world where AI is the talk of the town. What exactly is the role of storage in the modern tech landscape? And how does Pure want to contribute to this? Find out now and listen to this brand new episode.

While storage may not be as important for AI workloads as compute or networking, calling it a commodity doesn't do it justice either. Patrick sheds some light on this misconception that people may still have. He draws on insights from his discussions with customers and explains why storage plays a crucial role for enterprises, not only when it comes to AI, but also for building cyber resilience against threats like ransomware.

The biggest part of the discussion is dedicated to the role storage plays in AI. From this perspective, Patrick dives into how solutions like Evergreen One for AI offer scalable and flexible storage-as-a-service options that streamline operations and break down infrastructure silos. We further explore how organizations can strategically manage hot, cold, and archived data to ensure optimal performance in AI projects. Additionally, we delve into the importance of vector databases and advanced data reduction algorithms, discussing the implications of data gravity challenges when balancing cloud and on-prem solutions. 

Speaker 1:

Welcome to this new episode of Techzine Talks on Tour. I'm Sander, and I'm actually in Amsterdam for Pure Accelerate. Patrick Smith is here; he's the Field CTO for EMEA. Welcome to the show. Absolutely, great to be here. In general, what do you see happening at the moment in the market when you visit customers?

Speaker 2:

One of the most interesting parts of my job is meeting with customers across the whole region. There are some big trends that we're seeing, and many of them won't be a surprise to people. It's impossible to talk to a customer without talking about AI. Yeah, we're going to talk about that some more later.

Speaker 2:

I think it's a big area of focus. Number two on the list, especially within EU countries, is cyber resilience, operational resilience. That's a huge topic for organizations, keeping people awake at night: how can they rest easy and be confident in the controls and measures they've put in place? A third big area for us is the whole world of cloud, and I see a bit of a transition point there, really around the pandemic, pre-pandemic versus post-pandemic, in terms of cloud.

Speaker 1:

Do you mean more realism around cloud, maybe?

Speaker 2:

Yes, that's absolutely right. Really an understanding and a rationalization of strategies. Pre-pandemic it was very much cloud first, we're going to move everything to the cloud, and now, post-pandemic, there's a more rational, more balanced view of: I'm going to put the right workloads in the right place. So stepping back from that, and that includes public cloud providers, SaaS providers, potentially colo environments if I'm really scaling down my own data center, and then my own owned data center. And really, on an application-by-application basis, understanding the requirements, you know, the controls, the reliability requirements, the performance, and selecting the right place for each.

Speaker 1:

Do you think storage, and I mean it may not necessarily be sexy again, but is it higher on the list of priorities than it used to be? Do you see a sort of trend in that? Because it may not usually be the sexiest topic to talk about, but it is very important now again, right? So how do you see that shift going on?

Speaker 2:

I think it's an interesting observation because, whilst storage may not be the hottest of topics, data is the hot topic. It underpins everything we've talked about: you can't deliver an AI project without having a good, clean, accurate set of data to power it. With cyber resilience, storage, or rather data, is the soft underbelly that cyber criminals are trying to attack. If you look at the cloud, the cloud is often driven by the data and the requirements and controls around that data. So people are focused on data, which means they then need to take the platform that hosts that data really seriously, and that does raise the profile of storage within an organization.

Speaker 1:

It's all driven by the data. Do you still see misconceptions around that at companies? Maybe that they're still stuck in the old way of thinking, like: oh, storage is storage, it's a commodity, who cares? Do you still see that a lot?

Speaker 2:

Yes, there is that perception. You used the word commodity. There is that perception that storage is a commodity product, that there are no distinguishing features between storage vendors and their products. And I think it's only when organizations take a deep look, when they're prepared to break away from just hitting repeat and doing what they've always done, that they can see there are different ways of hosting their data, managing their data, and operating the infrastructure that looks after their data.

Speaker 1:

Is that something that every organization should do now? Maybe for people listening and thinking: do I have this challenge, or this opportunity, or whatever you want to call it, to actually reconsider and think about my storage platforms again now?

Speaker 2:

I think for many organizations, and many people who are comfortable with their existing storage provider, it's a case of understanding how storage products have evolved to deliver new capabilities. Capabilities that people maybe thought were expensive, or thought they didn't need, and seeing how they can deliver value, and also challenging some of the operational aspects that people just assume come with running a storage environment. Before I joined Pure, I was involved in running infrastructure platforms in financial services, and one of the biggest challenges is lifecycle management. The challenge of built-in obsolescence in technology and keeping technology current is a real headache; it can take up a lot of your time. So the idea that you can have a storage platform that can be modernized without downtime, without an entire forklift upgrade and data migration, for me was a complete game changer. Now, if you're somebody who has just kept hitting the easy button and doing what you've always done, you aren't aware of how transformational that kind of capability can be.

Speaker 1:

So the big tip would be don't hit the easy button just because you've done that already five times. And then, just more concretely, how should you start? So maybe not blindly going for the re-up, basically, but how should you start on this sort of journey towards a different view of enterprise storage?

Speaker 2:

Yes, I think you know clearly I have a biased view.

Speaker 1:

I would imagine yes.

Speaker 2:

But I think it's understanding what your requirements are, what matters to you. So we've talked about lifecycle management. It's interesting, because anybody who's not using Pure has to go through that; it's a simple fact of life. But then, if you look at one of those other big ticket items we were talking about, cyber resilience: how do you need to deliver cyber resilience in your environment? Are the storage infrastructure teams just assuming it's...

Speaker 1:

Somebody else's problem, I can imagine. Especially when it comes to security and cyber resilience, your storage provider isn't the first provider you think of, probably. So that's a big mentality change as well. It really is.

Speaker 2:

It's somebody else's problem. Whereas actually, if you look at the modern-day threats that organizations are facing, with ransomware the number one threat: how do you recover from a ransomware attack? The fastest way to recover from a ransomware attack is from a snapshot, because you don't have to move any data around across networks. So it's simple, easy recovery, as long as you're quick enough in detecting the attack, obviously.

Speaker 1:

Otherwise it may also be in your snapshot.

Speaker 2:

Yes, and it's interesting: how many snapshots do you take? Because if you're in a recovery process and you have to go too far back, the data may not be of any use anyway. You've still lost a lot of money, right?

Speaker 1:

Yeah, yeah.

Speaker 2:

So fast recovery is really important. And very often, as part of that recovery process, people need to go back only to the point the keys were pulled, when the encryption actually takes place. Because as long as you can go back to that point, you can do your forensics, identify where the compromise was and fix that, so you can avoid having to go too far back in history. I think many organizations didn't bother implementing snapshots because there was a mindset of: if my recovery point is on the same physical device as my primary, then if I lose that physical device, I've lost my recovery point as well. But there's probably an argument today to say you're more likely to be hit by a ransomware attack than to have a problem with your physical device, because storage is now becoming so reliable.
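To make that recovery logic concrete, picking the newest snapshot from before the encryption keys were pulled, here is a minimal sketch. The snapshot objects and the hourly schedule are assumptions for illustration, not a Pure API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Snapshot:
    name: str
    taken_at: datetime

def pick_recovery_point(snapshots, encryption_started_at):
    """Return the most recent snapshot taken before the attack began encrypting data."""
    clean = [s for s in snapshots if s.taken_at < encryption_started_at]
    if not clean:
        raise RuntimeError("No clean snapshot left; the recovery point predates the retention window")
    return max(clean, key=lambda s: s.taken_at)

# Example: hourly snapshots, and forensics shows encryption started at 03:20
now = datetime(2024, 6, 12, 9, 0)
snaps = [Snapshot(f"vol1.snap{i}", now - timedelta(hours=i)) for i in range(24)]
rp = pick_recovery_point(snaps, encryption_started_at=datetime(2024, 6, 12, 3, 20))
print(rp.name, rp.taken_at)  # the newest snapshot from before 03:20
```

The point of the sketch is that the recovery point only needs to sit just before the encryption event, so a dense snapshot schedule keeps the amount of lost work small.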

Speaker 1:

But that's the old versus the new way of thinking about what a disaster is, right? Yes. We used to think about a disaster as a flood or an earthquake or a big fire, and then you were right, because then having it on the same device doesn't make a lot of sense. Completely. But now the number of attacks compared to the old-school disasters is completely skewed, right?

Speaker 1:

I think I understand what you're saying, but this also triggers, at least in my mind, the thought that you need to know quite a bit about your storage environment to actually be able to do this. And that runs a little bit counter to a lot of the messaging that companies like Pure also have, like: we abstract away all the nasty business and we make sure you can do whatever you want within guardrails, and all that stuff. So maybe the question is, how much do you still need to know as an organization about your platforms, and in this case the storage platform?

Speaker 2:

I think that's a really interesting question, because our belief is that we can deliver advanced capabilities without exposing complexity. That requires a huge amount of engineering effort behind the scenes, but it means that for day-to-day operations, and in fact it's so easy to throw out the term day-to-day operations, we like to think that our platforms are such that you don't need to do day-to-day operations. You do operations almost by exception, because it's not a platform that requires daily care and feeding. Capacity management? You'll be alerted on capacity. Performance management? You'll be alerted on performance. So it's almost management by exception, not management by habit. And so that changes things. Now, obviously, there's troubleshooting, which is where generally you do have to get into the technical depths, and you need to know your environment in general as well, right?
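As a rough illustration of management by exception, the sketch below checks capacity figures and only surfaces the arrays that cross a threshold; the thresholds and the fleet data are assumptions for the example, not output from Pure's tooling.

```python
def check_capacity(arrays, warn_at=0.75, act_at=0.90):
    """Return only the arrays that need attention; everything else stays silent."""
    exceptions = []
    for name, used_fraction in arrays.items():
        if used_fraction >= act_at:
            exceptions.append((name, used_fraction, "expand capacity or rebalance now"))
        elif used_fraction >= warn_at:
            exceptions.append((name, used_fraction, "plan an expansion"))
    return exceptions

# Example fleet: three arrays, only one crosses a threshold, so only one alert is raised
fleet = {"array-ams-01": 0.52, "array-ams-02": 0.81, "array-lon-01": 0.43}
for name, used, action in check_capacity(fleet):
    print(f"ALERT {name}: {used:.0%} full -> {action}")
```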

Speaker 2:

Troubleshooting isn't just the storage environment, it's the broader infrastructure, middleware and application environment, today also stretching out into the cloud. And so I think it's a fascinating time for IT professionals, because their roles are changing: going back to being broad generalists, having to have knowledge across all of those different areas, rather than "I'm the storage administrator, I only look at storage."

Speaker 1:

Is there still a role for the storage administrator, or should managing storage, looking after storage, be part of a bigger role?

Speaker 2:

I think there is still a role for the storage administrator, but what they do is changing. They are the SMEs for their environment, but what they are generally doing is looking at automation and optimization, so that you may do a task manually once, but you never do it manually twice. You automate, and that then accelerates the cycle of driving operational efficiency. And that automation can start small, in your core, and expand out as you integrate it into self-service capabilities for your virtualization environment, for your databases, for whatever applications you may be hosting.
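A minimal sketch of the "do it manually once, never twice" idea: wrap the one-off task in a function that a self-service portal or a pipeline can call. The endpoint, payload fields and policy name below are hypothetical placeholders, not Pure's actual API.

```python
import requests

def provision_volume(api_base, token, name, size_gb, protection_policy="hourly-snapshots"):
    """Create a volume with a standard snapshot policy attached.

    api_base, the /volumes path and the payload fields are illustrative assumptions."""
    resp = requests.post(
        f"{api_base}/volumes",
        headers={"Authorization": f"Bearer {token}"},
        json={"name": name, "size_gb": size_gb, "protection_policy": protection_policy},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# The same function can then be called from a virtualization plugin, a database team's
# portal or a CI pipeline, so the task is automated once and reused everywhere.
```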

Speaker 1:

And coming back to one of your earlier points, AI. Obviously we can't have a discussion without talking about this. Does that spoil the soup a little bit in terms of what you described just now, how organizations should look at storage in this day and age, how people should think about this and how storage admins should handle it? Does AI sort of throw a spanner in the works? It all looks very nicely thought out, but does this actually change the entire scene again, so to say?

Speaker 2:

Certainly one of the things, as we talked about at the beginning, is the dependency of AI on data. But then also, I think it's all too easy to label what is a very broad topic under AI, because it covers so many different areas: whether you're talking about the few organizations that are doing massive training on foundational frontier models, whether you're fine-tuning a model for your own environment, or whether you're taking an off-the-shelf model and using it with RAG, retrieval-augmented generation, so that you can have timely data, data that's specific to your own organization, and be more accurate in terms of gen AI. And across that spectrum of those three main areas, you have different storage requirements in terms of capacity, in terms of throughput, in terms of reliability and availability. There's a lot more focus on enterprise-like features now, as organizations put especially generative AI inference models into production.

Speaker 2:

They're no longer those scientific, HPC-like workloads where it's "oh, it's failed, I can cope with that", but rather "this is my business that's going down". And one of the big things we're seeing is that, as organizations go down that path, many of them actually don't know where that project might go in the next 3, 6, 9, 12 and more months.

Speaker 1:

That must create some limitations on the decisions you can make today, right, if you don't really know where you're going? Yes. So, especially when it comes to your storage needs, because I assume all three general areas of AI have an impact on how you should rig up your storage environment.

Speaker 2:

They absolutely do, and it's one of the reasons why one of the things we've done, and where we've seen considerable interest, is this: as you know, we've had a lot of success with a storage-as-a-service model. We call it Evergreen One. It provides a cloud-like experience for customers, even when running storage infrastructure in their own data centers. And earlier in the year we announced Evergreen One for AI, which takes the guesswork out of procuring and running storage in your own environment for your AI project, because the unit of consumption in an Evergreen One for AI model is throughput to your GPUs, and the storage capacity on the back end is a secondary unit of consumption.
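To give a feel for why throughput to the GPUs works as a unit of consumption, here is a back-of-the-envelope sketch; the per-GPU bandwidth and headroom figures are assumptions for illustration, not Pure sizing guidance.

```python
def required_storage_throughput(num_gpus, gb_per_sec_per_gpu=2.0, headroom=1.3):
    """Aggregate read throughput the storage layer must sustain to keep the GPUs fed.

    gb_per_sec_per_gpu and headroom are illustrative assumptions; real numbers depend
    on the model, the data loader and the checkpointing pattern."""
    return num_gpus * gb_per_sec_per_gpu * headroom

# Doubling the GPU count simply doubles the throughput you consume:
for gpus in (8, 16, 32):
    print(f"{gpus} GPUs -> ~{required_storage_throughput(gpus):.0f} GB/s from storage")
```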

Speaker 1:

But in that scenario, I mean, we all need to accept, or maybe hope, that your projections are accurate, right? Because you don't really know what's happening in the next couple of months, and especially not the next couple of years. So how can you have storage as a service in such an uncertain kind of ecosystem?

Speaker 2:

One of the key capabilities to support a storage-as-a-service model is a physical platform that can be scaled non-disruptively, with instant impact, in terms of delivering performance and throughput or capacity. That takes the guesswork out and allows organizations to react very quickly to a changing workload: we've just decided to double the number of GPUs in our AI environment, and therefore we need more throughput to the back end. How can you do that without the traditional lead times associated with working out what you need, raising a purchase order, getting it signed off, ordering it, waiting for it to get delivered? In a storage-as-a-service model it's there instantly, without any of that overhead, operational or commercial.

Speaker 1:

So correct me if I'm wrong, but for your offering it doesn't really matter how things evolve, to a certain extent, right? Irrespective of whether something big happens in two months, three months, four months, it doesn't really negate any of what you just said about storage as a service.

Speaker 2:

No, absolutely. It supports that scaling, and it supports the ability to temporarily scale up and then scale back down, which can be really appealing to organizations who may want to do some fine-tuning.

Speaker 1:

I think I've read somewhere that up to around 10 billion parameters there's a lot of model training being done inside enterprises as well, and 10 billion parameters isn't huge, but it's still a sizable number of parameters for training you can do yourself. But you probably won't have to do that for months on end. You just do it when you need it, and afterwards you can say: look, I don't need this huge overhead anymore, so you can scale back down. That's what you're saying?

Speaker 2:

Yes. So that ability to use the resources in a flexible manner, and really only pay for the high throughput requirements for the duration of that training, is really appealing to organizations who don't want to have to acquire, deploy and pay for the highest levels of utilization.
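For a sense of scale on that 10-billion-parameter figure, a rough back-of-the-envelope on what fine-tuning such a model can mean for storage. The bytes-per-parameter figure is a common rule of thumb; the state multiplier and the number of checkpoints kept are assumptions for illustration.

```python
def checkpoint_size_gb(params_billion, bytes_per_param=2):
    """Rough size of one checkpoint holding bf16/fp16 weights only."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def run_footprint_gb(params_billion, checkpoints_kept=10, state_multiplier=6):
    """Add optimizer state and fp32 master weights (assumed roughly 6x the bf16 weights)
    and keep a history of checkpoints for the duration of the run."""
    return checkpoint_size_gb(params_billion) * state_multiplier * checkpoints_kept

print(f"One 10B-parameter checkpoint: ~{checkpoint_size_gb(10):.0f} GB of weights")
print(f"Footprint of a training run:  ~{run_footprint_gb(10)/1000:.1f} TB, then scale back down")
```

The footprint is substantial while the run lasts, but temporary, which is exactly the scale-up-then-scale-down pattern described above.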

Speaker 1:

The biggest issue I see with this... well, there are two big issues, and we will cover them one by one. The first one is that AI tends to create new silos, right? And we've been, I think, as an industry, everybody's been talking about breaking down silos. I think I've been reading and writing about it for about 15 years already, and it still hasn't happened. So my slightly skeptical or cynical journalist brain says: well, why will it happen this time? But it also seems inevitable. When you start up an AI project, you're going to do that in its own silo, in general. Maybe not in the future, but I think that's still the reality today, right?

Speaker 2:

What I think we're seeing is, we talk a lot about silos. There's the concept of a data silo, but there are also silos of infrastructure and silos of management, and one of our big areas of focus is to provide a platform that is consistent across all of our products. That means that, in terms of management overhead, you don't have silos of: okay, I've deployed this one very specific technology for workload A that requires high throughput and high capacity; I've got another silo here for a low-latency, small-IO relational database; another silo for my object storage; another silo for my user and group shares. You end up with all those different point technology solutions that are, in themselves, silos that require specific operational knowledge, training and staffing. You may end up with key-person dependency, where you've got one person who knows about this solution but not that solution.

Speaker 1:

And he doesn't document anything, so when he leaves, you know nothing about his environment.

Speaker 2:

And that really is the root of the value of the platform: from a Pure portfolio perspective, it's the same operational paradigm, the same Purity operating system, across all of our platforms.

Speaker 1:

Yeah, but obviously there is the reality of hot and cold data, or even archive data, that you may use for different purposes at different times. So some cold data may actually be very important for your training purposes, and if that's in a different silo, that impacts your performance, I would imagine. So how do you see that, this sort of trying to smartly switch between different types of data, if that makes any sense?

Speaker 2:

That is a key aspect of a new development that we've announced. Previously, people have always considered flash as high performance and spinning hard drives as low performance, and we were, as you know, the first in the industry to introduce QLC flash in an enterprise-class storage system. That effectively provided two performance profiles, both making use of flash: one with TLC flash for high performance, one with QLC flash for lower performance and bigger capacity, and we've always kept those two separate. Now, what we have announced helps customers who say: I have hot data that then cools and becomes cold, and I want a cost-effective way to manage that data lifecycle. With the FlashBlade platform, our unstructured data platform, we've introduced a capability we call zero-move tiering, which does what it says on the tin: it allows you to tier data based on its performance requirements without actually moving the data, and it stays in the same piece of infrastructure. That is completely different to everybody else in the market who, when they talk about tiering, are moving data from a fast storage medium to a slower storage medium, and that data movement does several things.

Speaker 2:

Firstly, it takes time to move data fast to slow, slow to fast. Secondly, it puts unnecessary stress on the storage system, because as well as serving out data to front-end consuming applications, you're also having to shuffle data around within the system, or, to your point earlier about silos, across dissimilar systems, if you're tiering from one high-performance system to a completely different lower-performance system, maybe running different operating systems and different hardware infrastructure. And it provides much more efficiency from a capacity management perspective. One of the things we often see in traditional tiering systems, where you move data around, is that you have to over-provision in the hot tier and over-provision in the cold tier, so you've over-provisioned twice against your capacity requirements. Zero-move tiering leaves the data where it is, and what we actually control is the network and compute resource that's assigned to that data. So you get fast or slow access to the data not by moving it between different tiers, but by controlling the access to it.
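A quick illustration of the double over-provisioning in move-based tiering versus leaving data in place: because the hot/cold split shifts over time, each moving tier has to be sized for its own peak, and the peaks overlap. The dataset size and the peak fractions below are assumptions for the example.

```python
def two_tier_provisioning(dataset_tb, hot_peak_fraction, cold_peak_fraction):
    """Move-based tiering: each tier is sized for its own peak, because data shifts
    between them over time, so in total you buy more than 100% of the dataset."""
    return dataset_tb * hot_peak_fraction + dataset_tb * cold_peak_fraction

dataset = 500  # TB, illustrative
# Assumption: the hot working set swings between 15% and 35% of the data over time,
# so the hot tier is sized for 35% and the cold tier for 85%.
moving_tiers = two_tier_provisioning(dataset, hot_peak_fraction=0.35, cold_peak_fraction=0.85)
single_pool = dataset  # zero-move approach: one pool holds every byte exactly once

print(f"Move-based tiering: {moving_tiers:.0f} TB provisioned ({moving_tiers/dataset:.0%} of the data)")
print(f"Single pool:        {single_pool:.0f} TB provisioned (100% of the data)")
```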

Speaker 1:

And just to be clear, this is not only about putting predefined hot and predefined cold data together in the same system. It's also about changing it when necessary, from cold to hot, right? So it's a metadata kind of play, I would imagine?

Speaker 2:

It's absolutely based on metadata. Now, in the first incarnation it's user-controllable, in terms of being able to tag data as hot or cold. As we roll forward, we'll look at how we change that performance profile based on characteristics.
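As a sketch of what user-controlled tagging could look like from the data owner's side, under the assumption of a simple hot/cold label per dataset. The interface below is entirely hypothetical, not the FlashBlade API.

```python
from enum import Enum

class Tier(str, Enum):
    HOT = "hot"    # full network/compute resources assigned to this data
    COLD = "cold"  # reduced resources; the data itself never moves

# Hypothetical catalogue mapping datasets to a performance tag set by the data owner
dataset_tiers = {
    "training/images-2023": Tier.COLD,
    "training/images-2024": Tier.HOT,
    "archive/logs": Tier.COLD,
}

def promote(dataset: str):
    """Cold data needed for a new training run is retagged hot: a metadata change,
    not a copy between separate systems."""
    dataset_tiers[dataset] = Tier.HOT

promote("training/images-2023")
print(dataset_tiers["training/images-2023"])  # Tier.HOT
```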

Speaker 1:

I think that would be interesting, especially when you're looking at large data volumes. The first version is absolutely based on the data owner understanding where that data is. But then we get back to the previous point, because they need to understand how this works.

Speaker 2:

Right. Put it in the hands of the data owner and make it deterministic.

Speaker 1:

I think it's quite an interesting development.

Speaker 1:

So then it's hot and cold, not necessarily archive, right? So archive is maybe somewhere else still, or would you call that cold? I would call that cold. Okay, yes. So, one of the final points, the other issue I have with AI, is about vectorization. I've been reading up on this and I've been talking to many people about it, about the bloat that you get from vectorizing data in your environment, which can run up to 10x, which is quite substantial, I would say. That does something to the predictability of your storage to a certain extent, because you don't really know; the size of the bloat depends on how you vectorize, so you can get either 2x or 4x or 6x. So it's very difficult to predict that, I think, or to predict what it does to your environment. And also, it would be nice if storage vendors or data management vendors, or whatever you want to call them nowadays, could help reduce that bloat a little bit. So, from a Pure Storage perspective, how do you see this entire...?

Speaker 2:

Firstly, we're seeing the world of vector databases becoming very important as organizations, back to our earlier discussion on AI, look at retrieval-augmented generation and the part that vector databases play in that. And it's very interesting when we look at that bloat: in our testing and certification work we've seen it, and 10x is a number that's very familiar to us. One of the things it means is that organizations need to have a platform that's flexible enough to be able to react to that bloat. You can imagine going into a project thinking: well, I'll assume I'm going to get 3x growth in my data set.

Speaker 2:

And then suddenly it's 10x, and it's like: well, where do I go from here? Now, an as-a-service model is perfect for that. You pay for the data you're consuming, you don't care about the infrastructure. But maybe you don't like an as-a-service model and you want to own your own infrastructure.
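For a rough intuition about where that bloat comes from, here is a back-of-the-envelope estimate of embedding a document corpus for RAG. The chunk size, embedding dimension and metadata overhead are assumptions for illustration; real ratios depend heavily on how you chunk and which embedding model you use.

```python
def vector_store_size_gb(corpus_gb, chunk_chars=1000, dim=1536, bytes_per_float=4,
                         metadata_overhead=1.5):
    """Estimate the size of the vector index relative to the raw text it represents.

    Assumes roughly 1 byte per character of source text, one embedding per chunk, and a
    multiplier for per-vector metadata and index structures. All figures are illustrative."""
    chunks = corpus_gb * 1e9 / chunk_chars
    vectors_gb = chunks * dim * bytes_per_float / 1e9
    return vectors_gb * metadata_overhead

raw = 100  # GB of source documents
index = vector_store_size_gb(raw)
print(f"{raw} GB of text -> ~{index:.0f} GB of vectors and index ({index/raw:.1f}x bloat)")
```

With these particular assumptions the index lands at roughly 9x the raw text, which is why the 10x figure mentioned in the conversation is not surprising.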

Speaker 1:

But even with an as-a-service model, the cost will mount up, I would imagine, right? Because you're storing more data. So even if it's as a service, you can handle it because you have the as-a-service approach, but you're still paying, maybe more than you would like to pay. Yes, right.

Speaker 2:

And then it becomes interesting: okay, is that a temporary phase, or is that permanent?

Speaker 2:

So reacting to that change is an interesting one. If you're deploying your own systems, what is interesting is how data reduction techniques may help shrink the size of that. But the other thing that I think is interesting about the increased dependency on vector databases is where they sit in the data lifecycle for those architectures. Because as well as there being a focus on capacity, there's also a focus on performance, regardless of capacity, because your end consumer is directly exposed to every link in that chain, including the fork out to the vector database and the data set that then comes back from it to feed into the large language model. So where organizations may have sized their databases for a certain workload, now they're putting an additional workload into that database, and they need to be able to support that without slowing everything down. And so it again is really a reinforcement of not just how important databases are, but how important the storage layer is, and how important it is to be able to react to changing demand and circumstances in terms of capacity and performance.

Speaker 1:

What can Pure do in terms of making sense of this, of the bloat and the extra, maybe 10x? Can you, from your centralized approach, from your control plane, add some handy, nifty features that, well, maybe reduce that bloat a little bit?

Speaker 2:

One of the core tenets of our platforms is driving efficiency as much as possible, making use of advanced data reduction algorithms to shrink data, and not just on the array itself. The systems making use of vector databases are becoming mission-critical systems, and so it's not just the array in one data center that matters. You also want a disaster recovery capability for that as well, and shipping large amounts of data between two data centers comes with, you know, cost on network links, time, and recovery point and recovery time objectives.

Speaker 2:

So there's a whole load of benefit to be gained from driving that efficiency at the storage layer. That then delivers value all the way up the stack.
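A simple illustration of how data reduction at the storage layer flows through to replication time, and therefore to recovery point and recovery time objectives, between data centers. The data set size, link speed and reduction ratio are assumptions for the example.

```python
def replication_hours(data_tb, reduction_ratio, link_gbps):
    """Time to ship a data set over a WAN link, before and after data reduction.

    reduction_ratio and link_gbps are illustrative assumptions."""
    effective_tb = data_tb / reduction_ratio
    seconds = effective_tb * 8e12 / (link_gbps * 1e9)  # TB -> bits, divided by link rate
    return seconds / 3600

data, link = 200, 10  # a 200 TB data set over a 10 Gbit/s inter-DC link
print(f"No reduction:  {replication_hours(data, 1.0, link):.1f} hours")
print(f"3:1 reduction: {replication_hours(data, 3.0, link):.1f} hours")
```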

Speaker 1:

So yeah, I think it's going to be interesting to watch this space and see what will happen, because there are some obvious disadvantages to this, right? I mean, you just need way too much storage, basically. And especially if you hadn't really predicted that, or if you don't really expect it, then it can be a bit of a cold shower, as we would say in Dutch.

Speaker 2:

Yes, and there are even wider implications. So organizations that have gone down a cloud path for a lot of their technology, but maybe they have a critical data set that they keep in their own data centers. Suddenly they're looking at a holistic system that spans cloud and on-prem.

Speaker 1:

Yeah.

Speaker 2:

And there's a data gravity challenge there. That means actually maybe they'll relocate one piece or another piece of that infrastructure, so they're co-located.

Speaker 1:

Especially with RAG, data gravity has become a little bit of a challenge.

Speaker 2:

If you're constrained with some confidential data sets or data sets that your organization won't let you put in the cloud, then there's only one way you can go, and that's to bring some of your infrastructure back from the cloud into your on-prem data center to deliver that comprehensive solution.

Speaker 1:

All right, I think we're almost out of time. It's been very interesting; we didn't even get to the sustainability part, but we're already 37 minutes in, so time has flown. We wouldn't want to bore our listeners any more, right? So I think it was a very interesting and insightful conversation. I hope the listeners feel the same way.

Speaker 1:

I hope so too. Otherwise we're going to do it again in a year's time, or whatever. You never know. Okay, so thank you for joining. Thank you for inviting me along, it's been great to talk to you. Looking forward to the next one. Absolutely.