In this episode we talk about cloud file systems, the challenges of building a file service solution for the cloud, the most common use cases, and Elastifile’s product design and implementation.
We also take this opportunity to talk about a new automated tiering feature introduced by Elastifile with version 3 of its product.
Enjoy the listen!
For this episode I also added a complete transcript to this blog.
Enrico: Ciao everybody, and welcome to Data Driven Talk, the podcast about data and data storage. I’m your host, Enrico Signoretti, and today we will talk about file services in the cloud, shared file systems to be more specific. My guest for this episode is Jerome McFarland, VP of Marketing at Elastifile. Hi Jerome, thank you for joining us today.
Jerome: Hi Enrico, thanks a lot for having me.
Enrico: So, we are recording this episode on the 15th of October, and I know that you have some nice news for tomorrow, right?
Jerome: Yeah, that’s correct.
Enrico: Maybe we will talk about it later, at the end of the show, also because this episode will go online later this week. Let’s start with the topic of the day. I mentioned cloud file services, meaning file systems, NFS shared services and so on. It looks like these kinds of services are a bit of a challenge for cloud providers. If I look at the history of service providers like Amazon, for example, they started with S3, then they added ephemeral storage with EC2, and block storage later. EFS, the NFS shared volumes, are now available, but they came very late in the game and they don’t look as mature as the other services. So that raises the question: are file services in the cloud really important today?
Jerome: Definitely. We see file services as critically important to help enterprises leverage all the benefits of the cloud. I think what happened historically is that, as the cloud began to emerge and the benefits and value it could offer became clear, people initially thought that object storage was going to be the default standard for the cloud. As it turns out, most of the applications that enterprises use are written for file systems, so what you’ve seen is a growing focus on delivering the right type of file storage in the cloud to enable the enterprise applications that really require that type of data interface.
Enrico: So you mean that, with more organizations moving applications to the cloud, there is a need for traditional ways to access data, because this is the way we do it in on-premises installations, right? So they are migrating workloads and they want to do the same things they did in their data centers.
Jerome: Exactly, exactly. Not all applications and workloads can or will be rewritten for the cloud. The cloud needs to support those applications in an application-native way, and file storage with protocols like NFS is already the standard for most of the applications that enterprises use. So as they move to the cloud, they need those exact same interfaces, whether they are bursting a workload to the cloud to temporarily use additional resources or doing a wholesale migration of a workload to the cloud, in some cases shutting down a data center.
Enrico: What kinds of workloads do you see most, at least with your solution?
Jerome: We see a wide range of workloads. I mentioned the two types of use cases, burst and wholesale lift-and-shift. On the burst side, we see applications in the media space, with people bursting workloads to get additional compute resources for jobs like rendering and transcoding. We also see it in manufacturing, with semiconductor design as an example: people burst workloads to the cloud to get additional resources and complete their workloads more quickly than they could on-premises with the resources available.
Jerome: On the wholesale lift-and-shift side, we see things like SAP workloads moving from on-premises data centers into the cloud. All of these use cases require file services, but they have different requirements for performance versus cost, which relates to the news from Elastifile we will be talking about later today.
Enrico: You mentioned a lot of workloads, but they are all very active workloads. Do you mean that nobody uses file services in the cloud for secondary workloads?
Jerome: We definitely do see people leveraging file storage for workloads like backup and archive, et cetera. From Elastifile’s perspective, that’s not our company’s focus; we’re more focused on enabling enterprises to run application-processing workloads in the cloud. Our technology can definitely support workloads where the cloud is used as a backup location, it’s just not our focus. But for organizations that want to take that first step into the cloud by using it as a backup location, and maybe later decide to process the data in the cloud with cloud resources, we can definitely support that as well.
Enrico: I understand. Why is it so difficult to build a reliable, scalable, and performant file system in the cloud? I can imagine the scale-out challenges that everybody has, but a scalable cloud file system is still not that common in the industry right now.
Jerome: Yeah, I think it’s a challenge for several reasons. One of them, as you alluded to, is scalability. In a lot of use cases, the motivation for going to the cloud is to get scalable, elastic resources, so at the data layer, where we operate with the file system, we have to deliver a solution that is similarly scalable and similarly elastic, and that’s not easy to do. It’s also been challenging for many vendors to move away from their legacy, hardware-centric architectures that were designed for on-premises data centers and align those solutions with the needs and requirements of the cloud. Elastifile’s solution was designed from scratch for the cloud, so everything in our solution is fully distributed: the way we manage metadata and all the details under the covers of how we manage and move data are built for the cloud environment. That’s something we feel you really need to do from the beginning of your file system design to make it work properly in the cloud.
Enrico: So the main difference is that you designed a file system that fits the cloud, with the kind of resources you get from a cloud provider, right? Traditional systems are designed with x86 servers in mind, physical servers, sometimes with plenty of disks, and so on, which is a different kind of resources from what you have available in the cloud.
Jerome: Yeah. The way I would look at it is that we design for the scalable, flexible environment that the cloud offers, but we combine that scalability and flexibility with an enterprise experience, still giving you all the capabilities like data reduction (deduplication and compression), snapshots, and high-availability features, all the things that enterprise applications and workflows rely on. Blending those two things is what becomes very challenging for pre-existing solutions that are being modified for the cloud.
Jerome: You have to really build for those two things from the beginning.
Enrico: Let’s talk about Elastifile then: the company, the mission. Could you give us a short introduction to your company?
Jerome: No problem. Elastifile was founded in 2013 with the mission and vision of enabling enterprise adoption of the cloud. Our founders saw a need in the cloud for scalable, enterprise-grade file services, for all the reasons we’ve been discussing. They recognized that enterprise applications rely on file services today and that, as those applications and workloads increasingly adopt the cloud and seek to leverage cloud resources, a similar type of data layer will be required, with all the data mobility features that enable not only the use of a file system in the cloud but also the management of data in the cloud, between clouds, and between on-premises environments and the cloud.
Jerome: So that’s why Elastifile was founded, and since then the company has been developing this data management platform, based on our scalable cloud file system, to help enable enterprise adoption of the cloud.
Enrico: Okay, how does Elastifile work in practice?
Jerome: So the Elastifile software basically takes the storage resources available in public clouds (we support Google today, we support AWS, and we also support Azure), aggregates them seamlessly, and presents a single-namespace file system to the applications. Think of EBS volumes in AWS, for example: we pool those resources, create a file system, and then present an NFS mount point to the applications.
Jerome: So the applications interface with our file system exactly the same way they would with an on-premises file system, and with Elastifile customers get the flexibility to choose not only the capacity of their file system but also the type of underlying storage resources. For example, taking a different cloud, in Google Cloud they can choose locally attached SSDs, or they could choose hard-drive-based storage devices. So under the covers there are different storage devices, and Elastifile aggregates them to create this single-namespace file system.
Enrico: So it’s a scale-out solution, but how easy is it to scale the solution, or sometimes just shrink it because you need fewer resources, because you don’t need the data anymore, or whatever?
Jerome: Yes, Elastifile allows customers to scale out or scale in their file system as necessary, based on the requirements of their use case and workload. Through our management UI, you can add nodes to add capacity to an Elastifile cluster on demand, and if the capacity you have is no longer needed, you can also remove nodes, and we’ll resize the cluster according to what the user needs at a given point in time.
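To make the scale-out/scale-in decision a little more concrete, here is a back-of-the-envelope sketch in Python. The per-node capacity, the 80% fill target, and the three-node minimum are hypothetical parameters chosen for illustration, not Elastifile’s actual sizing rules, and `nodes_needed` and `resize_plan` are hypothetical helper names.

```python
import math


def nodes_needed(data_tb: float, node_capacity_tb: float,
                 max_fill: float = 0.8, min_nodes: int = 3) -> int:
    """Smallest node count that holds data_tb while keeping each node at or
    below max_fill of its capacity, never going below min_nodes.

    All parameters are hypothetical sizing inputs, not Elastifile defaults.
    """
    if node_capacity_tb <= 0 or not 0 < max_fill <= 1:
        raise ValueError("node capacity must be positive and 0 < max_fill <= 1")
    usable_per_node = node_capacity_tb * max_fill
    return max(min_nodes, math.ceil(data_tb / usable_per_node))


def resize_plan(current_nodes: int, data_tb: float, node_capacity_tb: float,
                **kwargs) -> int:
    """Positive result: nodes to add (scale out).
    Negative result: nodes that could be removed (scale in). Zero: no change."""
    return nodes_needed(data_tb, node_capacity_tb, **kwargs) - current_nodes


# Example: hypothetical 10 TB nodes, filled to at most 80%.
print(resize_plan(3, 40, 10))   # 2 -> scale out by two nodes
print(resize_plan(8, 10, 10))   # -5 -> five nodes could be removed
```

The minimum node count stands in for whatever redundancy floor a real cluster enforces; with it, shrinking stops at three nodes even when the data would fit on fewer.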
Enrico: Any other features like remote replication? If I want to operate in a multi-cloud environment, for example.
Jerome: Yes, we definitely see asynchronous replication as an important feature, and it is something that Elastifile offers. For us, asynchronous replication is actually used predominantly within a single cloud environment. In use cases like SAP migrations, for example, organizations want to replicate clusters across availability zones within a cloud to ensure high availability, so they leverage asynchronous replication to make a copy of their test, pre-production, or production environment, so that if an availability zone fails, they won’t lose access to their data.
Enrico: I see but maybe I want to be sure that if Amazon goes down, I have all my data somewhere else.
Jerome: Yes. In addition to our asynchronous replication feature, we also have a capability that we call CloudConnect. What CloudConnect does is take data from any on-premises file system; it doesn’t have to be Elastifile, it could be NetApp, it could be Isilon. We take that data and encode it into object format, but we retain all of the file system structures: the directory hierarchy, all the attributes, all the links, et cetera. Everything that makes it a file system, we encode into objects. Then we place that into object storage on the cloud of the customer’s choice, so it could be S3 on Amazon, or Google Cloud Storage in the Google environment. Leveraging CloudConnect, customers have a very cost-effective way to put data from their on-premises file system into the cloud, because we compress and deduplicate the data before we send it to object storage in the cloud, where it is then stored as objects.
Jerome: So in this way, many customers implement multi-cloud architectures where they use the on-premises environment as the hub and multiple clouds as the spokes, so they can be syncing data to S3 and also to GCS, for example. These setups can be used for simple backup or disaster recovery, but what we see is that customers leverage this as a way to stage data in the cloud for active processing. This way they can choose: “I’d like to process on Google today,” and maybe next week, “I’d like to process on Amazon,” and a variety of reasons, from cost to services, could drive that decision.
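To illustrate the general technique behind the CloudConnect description (deduplicate and compress file data, store it as objects, keep the file system structure), here is a minimal Python sketch. It is not Elastifile’s actual object format: the fixed 4-byte chunk size, the SHA-256 chunk keys, the `manifest.json` object, and the function names are assumptions for illustration, and file attributes and links are left out for brevity.

```python
import hashlib
import json
import zlib

CHUNK_SIZE = 4  # deliberately tiny so the deduplication is easy to see


def encode_tree(files: dict[str, bytes], chunk_size: int = CHUNK_SIZE):
    """Encode a set of files as objects.

    Each file is split into fixed-size chunks; each distinct chunk is stored
    once, compressed, under its SHA-256 digest. A manifest maps every path to
    its ordered chunk keys, so the directory hierarchy survives the encoding.
    """
    objects: dict[str, bytes] = {}
    manifest: dict[str, list[str]] = {}
    for path, data in sorted(files.items()):
        keys = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in objects:  # dedup: identical chunks are stored once
                objects[key] = zlib.compress(chunk)
            keys.append(key)
        manifest[path] = keys
    # The manifest is stored as one more (compressed) object.
    objects["manifest.json"] = zlib.compress(json.dumps(manifest).encode())
    return objects, manifest


def decode_file(objects: dict[str, bytes], manifest: dict[str, list[str]],
                path: str) -> bytes:
    """Rebuild one file by decompressing and concatenating its chunks."""
    return b"".join(zlib.decompress(objects[k]) for k in manifest[path])


files = {"/proj/a.txt": b"AAAABBBB", "/proj/sub/b.txt": b"AAAACCCC"}
objects, manifest = encode_tree(files)
print(len(objects))  # 4: chunks AAAA, BBBB, CCCC plus the manifest
print(decode_file(objects, manifest, "/proj/sub/b.txt"))  # b'AAAACCCC'
```

The shared `AAAA` chunk is uploaded only once even though both files contain it. In practice chunks are far larger than 4 bytes; at this toy size compression actually makes each chunk bigger.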
Enrico: I see, and what about the licensing? How do you charge for your software?
Jerome: Our software is licensed, and there are two models customers can use. They can either purchase a license from Elastifile and manage the infrastructure themselves within their own cloud projects, or, in environments like Google Cloud, they can go through the Google Cloud Marketplace: they select Elastifile, choose the amount of storage they want and the type of storage they’d like (for example, hard-disk-drive-based or SSD-based), and then we create the cluster and give them the file system. They don’t have to do anything else, and they get one bill from Google through the marketplace experience. So there are two different models for customers to choose from.
Enrico: Okay, really nice. We mentioned at the beginning that you have some news for this week, right?
Jerome: Correct. We are releasing version 3.0 of the Elastifile solution. We are very excited about it, and the key capability being introduced in 3.0 is something we call ClearTier. ClearTier is all about giving customers the flexibility to manage the cost and performance of their cloud-based storage in a way that makes sense for their workloads. ClearTier blends the best aspects of file storage and object storage: we now leverage object storage to extend the namespace of the file system.
Jerome: So with ClearTier, applications can see all of the files in a file system, even if some of the file data is located on object storage, and we tier the data between the two seamlessly.
Enrico: Very nice. So it works on a time basis or by policy, some kind of policy that you can decide, right?
Jerome: Exactly. A user specifies their target ratio between file storage and object storage; for example, they may say, “I would like 30% of my data to be in the primary storage file system and 70% of my data to be in object storage.” Regardless of the ratio they select, all the files, all the directories, and all the metadata related to the file system are available in the primary storage file system, the Elastifile cloud file system, but some of the data will be located on object storage based on the policy they specify. When the application reads or writes files, it doesn’t know whether the data is already in the primary storage file system or in object storage. Our software with ClearTier routes the data to the application through the file system.
Jerome: So the applications and workflows don’t change at all; they have no idea that object storage is in the background and that tiering is taking place between the file system and the object store. But now the user gets the benefit of very disruptive cost models, where cheaper object storage is leveraged to store some portion of the data. So this is all about giving cost-versus-performance flexibility that aligns with the needs of different workloads.
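As a rough illustration of a ratio-based tiering policy like the one Jerome describes, here is a minimal Python sketch that decides which file data stays on the primary file system and which goes to object storage. The interview only says the user sets a target ratio; the least-recently-accessed heuristic, per-file granularity, and the names `FileInfo` and `plan_tiering` are assumptions for illustration, not how ClearTier actually selects data.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int           # bytes of file data
    last_access: float  # e.g. a Unix timestamp


def plan_tiering(files: list[FileInfo], primary_ratio: float):
    """Split file data between the primary file system and object storage.

    Files are visited from most to least recently accessed; each one stays on
    primary storage if it still fits within primary_ratio of the total data
    size, otherwise its data is placed on object storage. Every path remains
    in the namespace either way: only the location of the data changes.
    """
    if not 0.0 <= primary_ratio <= 1.0:
        raise ValueError("primary_ratio must be between 0 and 1")
    budget = primary_ratio * sum(f.size for f in files)
    primary: set[str] = set()
    object_tier: set[str] = set()
    used = 0
    for f in sorted(files, key=lambda item: item.last_access, reverse=True):
        if used + f.size <= budget:
            primary.add(f.path)
            used += f.size
        else:
            object_tier.add(f.path)
    return primary, object_tier


files = [FileInfo("/a", 30, 100.0), FileInfo("/b", 50, 50.0), FileInfo("/c", 20, 10.0)]
primary, object_tier = plan_tiering(files, 0.5)  # budget = 50 bytes
print(sorted(primary))      # ['/a', '/c']
print(sorted(object_tier))  # ['/b']
```

Because the pass is greedy, the colder `/c` still stays on primary storage once `/b` has been skipped; a stricter policy could stop at the first file that does not fit.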
Enrico: It looks really, really interesting. How can we learn more about Elastifile? Where can we find more info?
Jerome: Well, the best place to find info about Elastifile and stay up to date is to follow us on LinkedIn or Twitter. You can also visit our website, www.elastifile.com; Elastifile is spelled “E-L-A-S-T-I-F-I-L-E.” And of course, anyone who wants to try the Elastifile file system can easily go to the Google Cloud Marketplace, select Elastifile, spin up a cluster in just a few minutes, play with it, read and write some data, and scale it as necessary. So we encourage everyone to give it a spin.
Enrico: Very good. Thank you, Jerome, for your time today, and thank you everybody for listening to Data Driven Talk. If you liked this episode, you can find more on anchor.fm/datadriventalk or by searching for Data Driven Talk on iTunes or your favorite podcatcher, and please consider giving Data Driven Talk five stars on iTunes or leaving a positive comment. You can also find me on Twitter as “esignoretti” or read my blog, juku.it, for comments and updates on the data storage industry.
Enrico: Ciao ciao.