In the last post we talked about the more traditional models of architecting a disaster recovery plan. In it we covered icky things like tape, dark sites and split datacenters. If you'd like to catch up you can read it here. All of these are absolutely worthwhile ways to protect your data, but they are slow and limit your organization's agility in the event of a disaster.
By now we have all heard about the cloud so much that we've either gone completely cloud native, dabbled a little, or come to completely loathe the word. Another great use for "somebody else's computer" is powering your disaster recovery plans. By leveraging cloud resources we can effectively get out of the hardware management business as far as DR is concerned, and have borderline limitless resources if needed. Let's look at a few ways this can happen.
DRaaS (Disaster Recovery as a Service)
For now this is my personal favorite, but my needs may be, and probably are, different from yours. In a DRaaS model you still take local backups as you normally would, but those backups or replicas are then shipped off to a Managed Service Provider (MSP) aligned with your particular backup software vendor.
I can't speak to most of the others from experience, but Cloud Connect providers in the Veeam Backup & Replication ecosystem are simple to consume and use. Essentially, once you buy the amount of space you need from a partner, you take the link and credentials you are provided and add them to your backup infrastructure. Once that's done you create a backup copy job with that repository as the target and let it run. If you are bandwidth-constrained, many providers will even let you seed the job by shipping them an external hard drive full of backups, so all you have to transfer over the wire is your daily changes. Meanwhile, all of these backups are encrypted with a key that only you and your organization know, so the data is nice and safe sitting elsewhere.
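To see why seeding matters, here is a rough back-of-the-envelope sketch in Python. The sizes and bandwidth below are made-up numbers; plug in your own to see whether shipping a drive beats pushing the initial full over the wire.

```python
# Back-of-the-envelope math for seeding vs. WAN-only transfer.
# All figures below are hypothetical; substitute your own backup
# size, daily change rate, and available uplink bandwidth.

def transfer_days(size_gb: float, mbps: float) -> float:
    """Days needed to push size_gb over a link of mbps megabits/sec."""
    seconds = (size_gb * 8 * 1000) / mbps  # GB -> megabits -> seconds
    return seconds / 86400

full_backup_gb = 10_000  # 10 TB initial full backup set
daily_change_gb = 200    # typical daily incremental
uplink_mbps = 100        # bandwidth dedicated to backup copies

print(f"Initial full over the wire: {transfer_days(full_backup_gb, uplink_mbps):.1f} days")
print(f"Daily incrementals: {transfer_days(daily_change_gb, uplink_mbps) * 24:.1f} hours")
```

At those hypothetical numbers the initial full would take over a week on the wire, while the daily changes fit comfortably in an overnight window, which is exactly the gap a seed drive closes.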
This is really great in that it is effectively infinitely scalable (you only pay for what you use) and you don't have to own any of the hardware or software licenses to support it. If you do have an event, you have options: you can either scramble and try to put something together on your own, or, as is usually the case, leverage the compute capabilities of the provider to power your organization until your on-site resources are available again. Since these providers have their own IT staff, you and your team are freed up to get employees and customers working again while the provider handles the restores.
In my mind the drawbacks to this model are minimal. In the case of a disaster you are definitely going to pay more than you would running restored systems on your own hardware, but then you would have had to buy and maintain that hardware, which is expensive too. Your workers and your datacenter systems will also not be in the same geographical area, which may drive up bandwidth costs while you get back up and running, but that is still nothing compared to maintaining a standby site full-time. Probably the only real drawback is that almost all of these providers require long-term agreements, one year or more, for the backup or replication portion of the service. You also need to be sure, if you choose this route, that the provider has enough compute resources available to absorb you if needed. This can be mitigated by working with your provider to do regular recovery testing at the far end. It will cost you a bit more, but to me it is truly worth it.
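To make that trade-off concrete, here is a sketch of the break-even math. Every price in it is hypothetical; substitute quotes from your own hardware vendor and DRaaS provider.

```python
# Hypothetical break-even: owning standby DR hardware vs. paying the
# provider's compute rates only while a disaster is actually running.
# All prices are made up for illustration.

standby_capex = 60_000         # standby hosts and storage, refreshed every 5 years
refresh_years = 5
standby_opex_per_year = 8_000  # power, colo space, support contracts

owned_cost_per_year = standby_capex / refresh_years + standby_opex_per_year

draas_compute_per_day = 400    # provider compute billed while failed over

break_even_days = owned_cost_per_year / draas_compute_per_day
print(f"Owning standby gear: ${owned_cost_per_year:,.0f} per year")
print(f"Break-even: {break_even_days:.0f} disaster days per year")
```

Unless you expect to be failed over for weeks out of every year, paying for compute only during an actual event tends to come out ahead, at least with numbers like these.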
Backup to Public Cloud
Finally we come to what all the backup vendors seem to be moving towards these days: public cloud backups. In this model your backups either land on premises first (highly recommended) and are then shipped off to the public cloud provider of your choice, or go straight to the cloud. Did AWS, Azure or GCP start messing with their storage pricing models and suddenly become cheaper? Just add the new provider and shift the job over, easy peasy. As with all things cloud you are, in theory, infinitely scalable, so you don't have to worry about onboarding new workloads except for cost, and who cares about cost anyway?
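Part of what makes that provider shift "easy peasy" is that most object storage speaks the S3 API, so changing providers is largely a matter of changing an endpoint and credentials. Here is a minimal sketch with boto3; the endpoint, bucket and file names are made up, and in practice your backup software does this for you.

```python
# Minimal sketch: the same S3 API call aimed at two different
# providers. Endpoint, bucket, and file names are hypothetical;
# credentials are assumed to come from the environment or config.
import boto3

def make_client(endpoint_url=None):
    # endpoint_url=None targets AWS S3 itself; set it to point at
    # any other S3-compatible object storage provider instead.
    return boto3.client("s3", endpoint_url=endpoint_url)

aws = make_client()
other = make_client("https://storage.example-provider.com")  # hypothetical

for client in (aws, other):
    client.upload_file("nightly.vbk", "offsite-backups", "nightly.vbk")
```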
The upside here is agility. Start to finish, you can probably be set up to consume this model within minutes, and then the only limit on how fast you are covered is how much bandwidth you make available for shipping backups. If you are doing this to cover for an external event, like the failure of your passive site, you can tear it back down afterwards just as fast as you stood it up. You are also only ever paying for your actual consumption, so you know exactly what each additional protected workload will cost; you never pay for "spare space."
As far as drawbacks go, I feel like we are still in the early days of this, so there are a few. While you don't have to maintain far-end equipment for either backup storage or compute, I'm not convinced that this isn't the most expensive option for traditional virtualized workloads.
Hybrid Archive Approach
One of the biggest challenges of maintaining an on-prem, off-prem backup system is that we all run out of space sometimes. The public cloud gives us the ability to consume only what we need, without paying for any fluff, while letting someone else manage the performance and availability of that storage. One trend I'm seeing more and more is the ability to supplement your on-premises backup storage with public cloud resources, allowing archives to scale out for as long as necessary. There is a tradeoff between locality and performance, but if your most recent backups are on premises or well-connected to your production environment, you may never need to touch the backups that were archived off to object storage, so you don't really care how fast they are to restore; you've checked your policy checkbox and have that "oh no" backup out there.
Once upon a time my employer had a situation where we needed to retain every backup for about 5 years. Each year we had to buy more and more media to hold backups we would never restore from because they were so old, but we had them and we were in compliance. If things like Veeam's Archive Tier (or the equivalent from other vendors) had existed, I could have said "I want to retain X backups on-prem, but after that shift them to an S3 IA bucket." In the long term this would have saved quite a bit of money and administrative overhead, and when the requirement went away all I would have had to do was delete the bucket and reset the policy back to normal.
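For the do-it-yourself version of that idea, object storage lifecycle rules express it directly. Here is a sketch with boto3 against a hypothetical bucket: move objects to Infrequent Access after 30 days and expire them once the roughly 5-year requirement is satisfied.

```python
# Sketch of a lifecycle policy on a hypothetical archive bucket:
# transition backups to Infrequent Access after 30 days, then delete
# them once the ~5-year retention requirement has been met.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="dr-archive-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": ""},  # apply to every object
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"}
                ],
                "Expiration": {"Days": 1825},  # about 5 years
            }
        ]
    },
)
```

When the requirement goes away, deleting the rule (or the whole bucket) is the entire cleanup.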
While this is an excellent use of cloud technology, I don't consider it a replacement for things like DRaaS or the Active/* models. The hoops you have to jump through to restore these archived backups to a functional VM are still numerous and resource-intensive. Rather, I see this as an extension of your on-prem backups that covers short-term scale issues.
If you've followed along for both posts, I've covered about 5.5 different methods of backing up, replicating and protecting your datacenter. Which one is right for you? It might be one of these, none of these, or honestly a mash-up of two or more. The main thing is to know your business's needs and its regulatory requirements, and to build the plan that fits them.