I’ve just completed watching the Veeam v12 Data Platform Launch Event keynote, and one theme getting a lot of chatter is the new “Direct to object storage” capability of the release. What this means is that as I create backups of my production data, the first copy that is made is written to object storage. I’ve written quite a bit about v12 and object storage before, both on how to set it up and consume it in the performance tier of Scale-Out Backup Repositories and on comparing copy mode object storage with Veeam Cloud Connect Backup, but this is not that. Direct to object storage is the idea that you write your first backup copy of production data to object storage, rather than writing it there as a secondary copy.
Anton Gostev himself highlighted this as a favorite feature and outlined that there are limited use cases where it should be used. As a long time Veeam administrator, and of late an architect at their largest Service Provider partner, I have some thoughts on what those use cases should and should not be that I’d like to share.
The Do’s
✅ On Premises Backup Repository
The first and in my opinion most exciting use case for direct to object storage capabilities is as a replacement for your old block-based on-premises backup repositories. Object storage has numerous benefits over traditional storage servers using NTFS, ReFS or even XFS. These include true object-lock backed immutability, real scalability in that you simply continue to add nodes to the cluster as needed, and API-first, automation-driven provisioning. All these things combine into a scalable, secure, and durable on-premises backup storage location.
✅ Distributed Workforce Agent Backups
As we find ourselves in the world of remote work in 2023, systems administrators have more concerns than ever about protecting the laptops our remote workers are using, as critical company data further escapes the datacenter. By combining Veeam Backup & Replication v12, Veeam Service Provider Console v7, and public object storage services such as 11:11 Systems’ offering, we can create a robust solution for managing these backups securely and at scale, with self-service capabilities included.
✅ NAS Backups
As much as cloud-based storage services such as OneDrive, Google Drive and Dropbox have become prevalent in modern IT, there are still a lot of traditional Network Attached Storage file servers around. The problem is these are usually measured in hundreds of terabytes, and backups of them don’t scale well in any case. Here I believe it’s appropriate to back these up via the NAS backup feature of VBR directly to immutable cloud-based object storage. Recovery is in terms of files, as opposed to needing to write virtual machines back to a hypervisor, so the approach is far more appropriate.
The Don’ts
Now that we’ve covered the good use cases, let’s talk about some things that are technically possible but should not be considered from a best practices point of view.
❌ Cloud-Based Object Storage for First Backup
There are going to be people who want to get rid of on-premises backups altogether and replace them with low-cost cloud object storage. While the immutability and ease of scale are tempting, don’t. Just don’t. Recovery from this for any backup platform would be painful at best, unsuccessful at worst. Further, you lose so much of the power of Veeam Backup & Replication by doing this, with capabilities such as Instant Recovery and SureBackup instantly going away.
❌ Remote Office/Branch Office (ROBO) Server Backups
One of the use cases specifically called out by Gostev for direct to object storage was the ROBO server. While I respect him very much, I do have to disagree with him here. As someone who’s had to restore this scenario in anger, the pressure in most if not all of these situations will be to get that server back up and running as fast as possible, probably in your production datacenter with some creative networking, so that the remote location is back in business as soon as possible. By and large, direct to cloud-based backups for virtual machines or server-grade agents should not, in my opinion, be considered, as you quickly become limited in recoverability choices. If you cannot back these up first to a small NAS unit at the ROBO or to storage available in your production location, then VCC-B may be interesting as a first-write backup target. While definitely not preferred, at least things like Instant Recovery to Cloud Director can then be an option to get these workloads available sooner.
Conclusion
While writing backups from Veeam Backup & Replication v12 directly to object storage is an exciting addition to the platform, it is a capability that should be given a great deal of consideration before putting it into production use. Features such as immutability and scalability will drive adoption, but what you are using it for is far more important than the hype.
With the upcoming version 12 release of Veeam Backup & Replication the platform will start to support object storage as a primary location for backups. This is not a new idea, it feels like we’ve been talking about it forever, but it is a radical shift for the company and frankly one that, for many reasons, much of the Veeam Certified Service Provider community has been dreading. While I believe object storage has its place, even within a VCSP, it should not in general practice be a replacement for Veeam Cloud Connect, as VCC-B and VCC-R offer capabilities and enhancements that object storage services all by themselves will never be able to replicate.
What is Veeam Cloud Connect?
It’s appropriate to level set on the technologies at hand first. Veeam Cloud Connect was first announced at VeeamON 2014 and became available through partners shortly thereafter. There are two sides to today’s Cloud Connect Service: Veeam Cloud Connect Backup (VCC-B) and Veeam Cloud Connect Replication (VCC-R).
Veeam Cloud Connect Backup allows a service provider to provision a slice of cloud storage and securely present it to a customer as a tenant. That storage can have further enhancements such as immutability (on Linux repositories or, with v12, object storage) and Insider Protection, an out-of-band temporary holding location for deleted cloud backups as part of a ransomware mitigation strategy. Once the tenant is provisioned on the SP side, a customer only needs to enter three pieces of information to add that storage as a repository or repositories in their backup infrastructure. Afterwards they can simply target that repo for backup copy jobs, or in some situations even direct backup jobs.
Veeam Cloud Connect Replication follows the same general concept as VCC-B, but with VMs being stored directly in IaaS. Your service provider provisions quotas in a VMware Cloud Director or vCenter environment into which powered-off copies of the source VMs are replicated. These replicas maintain a shorter maximum number of restore points, stored as snapshots on the replica. In conjunction with either a VPN connection or the Veeam Network Extension Appliance you can extend your on-prem environment into your replicated IaaS one and run those systems remotely for any number of reasons.
Object Storage Review
Veeam has been slowly building support for object storage over a few releases now. In short order the progression has been:
9.5u4 (2019): Support for archiving or dehydrating older restore points to object storage, known as move mode. This functionality is powered by the Scale-Out Backup Repository (SOBR) construct in VBR, defining a performance tier (on-prem block storage) and a cloud or archive tier (object storage). With this release the cloud tier could only support a single bucket per SOBR, which can cause scaling issues.
10.0 (2020): Copy mode is added, which allows that same SOBR to immediately copy any restore point that hits the performance tier to the cloud tier. This is commonly seen as the capability that best competes with the VCC-B functionality. Support for immutability and for mounting backups from the cloud tier arrives in this release as well.
11.0 (2021): Addition of support for Google Cloud Platform (GCP) Storage.
12.0 (2023): A host of improvements to object storage support, including:
Object storage in the performance tier, with multiple buckets supported as extents.
Support for multiple buckets of the same type in the cloud tier.
Reconfiguration of the bucket folder structure to allow for optimized performance especially around API calls.
In all there has been a steady line of improvement and innovation towards object storage making it the first-class citizen it will be with v12’s release.
Verdict
All that begs the question, should I as a Veeam Backup & Replication administrator be migrating my offsite backups from Cloud Connect Backup to object storage? In my opinion, probably not. The compelling use case today regarding offsite backups is the same that’s been around since v10; if you are only sending a copy of backups offsite for the purpose of compliance, checking the box if you will, then object storage may be a good fit for you. It is exceptional in how it supports Object Lock (Immutability). Further it is more efficient than anything else to date in how it writes data which makes it well suited for on-premises storage if available, but for cloud copy usage the capabilities quickly fall off.
In the end my biggest reason why I would still choose VCC-B today is the restore scenarios. With object storage, unless you are utilizing a VCSP like 11:11 Systems, you aren’t going to be able to have the object storage close enough to VMware-native compute to facilitate a timely restore. Your other options sit at opposite ends of the spectrum: low-cost providers that have no compute capabilities and may have extra charges for deletion, data egress, or API calls, and hyperscalers that do have adjacent compute but a completely different architecture from vSphere. These scenarios require conversion of backups to instances and rely on your IT staff having the skill set to securely create an environment for these workloads to run from.
When you consider the tiered approach to Disaster Recovery, pairing a broad-spectrum copy of backups to VCC-B with Tier 0 or 1 targeted VCC-R, you gain two things by sticking with VCC-B:
Support for seeding replicas from VCC Backups
A defined DRaaS environment pre-created for your organization so that, in the event your on-prem environment is not available, you can boot replicas quickly and, somewhat less quickly, begin restoring backups into it.
While not necessarily as important as restorability, one other major difference to be aware of is the ability to control how much of your on-prem backup data you keep offsite. Keep in mind that copy mode works on the principle that you want to keep everything you have on-site in an offsite location, so if you want to have 30 days on prem and 7, 14, 60, or 90 days offsite there really isn’t a path besides Grandfather-Father-Son (GFS) to get there with copy mode alone. Further, it applies to everything that targets the given repository; if you have VMs you don’t necessarily want to keep off-prem (think of something like a door control system that does you no good anywhere but in the physical location), your only option is to create separate repositories and jobs. With VCC-B and backup copy jobs all these things can be done in a very granular manner.
Finally, you have the issue of expertise. While many of us might be conversant in object storage and how it works, you may not necessarily know the ins and outs of bucket policy, ACLs, versioning, etc. It’s a technology you do need to know, but when it comes to backups, and especially the need to restore them in anger, using Veeam Cloud Connect gets you an easy path to restore. A VCSP employs experts in Veeam software, and access to that expertise, along with best practices for Veeam backups and replicas, can be critical, ensuring a more seamless restoration of services when the pressure is already on.
Conclusion
While the promise and the capabilities of object storage are exciting, as of the v12 release Veeam Cloud Connect Backup offers many more options when it comes to the restore and management of your backup data, and for that reason should still be the go-to for offsite backup copies. If you are wanting to begin leveraging the technology, an on-premises object storage platform for your local backups is a far better starting point and can give you an out-of-the-box upgrade over what you are using today.
As we come upon the end of the year, we are also getting closer and closer to Veeam’s latest release of their flagship product, Veeam Backup & Replication v12. Here at 11:11 Systems we have been testing quite a bit with the beta code as it has been released because there are quite a few features to really be excited about here. For me personally the biggest is being able to use object storage as a first hit repository and to be able to scale them out.
If you are not yet familiar with object storage now is the time. Object storage provides a good deal of native capabilities we’ve never had when using block-based storage (like volumes and partitions). These include:
Object-lock backed immutability
File versioning
Security ACLs and least-privilege access design
True scale-out growth by simply adding nodes
API-first, automation-driven provisioning
What this means to your backup plans is that you will have more capabilities, while being more agile in pricing.
Flavors of Object Storage
There are several ways to purchase and consume object storage. The hyperscale cloud names are the ones you are probably most familiar with: Amazon’s AWS (Amazon Web Services) S3 and Microsoft’s Azure Blob. But these providers can have several potentially unexpected costs around data egress, API call limits, and in some cases how long you store the data. As we all know with IT and disaster recovery, a good deal of our decisions come down to cost and budgeting, so unexpected billing bursts can be a challenge.
With 11:11 Object Storage, you get access to a robust, performant object storage service that while competitively priced is flat rated with nothing hidden. Our service and others that use the AWS S3 API structure are commonly called S3 compatible. This means that they support the same calls and automation capabilities as S3 but may have differing architecture.
Finally, there are also a few physical device manufacturers such as Pure Storage, Cloudian and Object First that will sell you arrays for on-premises object storage. With the 3-2-1 backup model it may be worth considering an on-prem option for your first hit of backups before copying to the cloud.
Optimizing for Performance
One consideration as we begin to look at object storage for backup data is that depending on the platform the service is built on, there may be performance considerations as buckets scale in number of objects. In previous versions of VBR only a single bucket was supported for the capacity or archive tier. In this latest version, multiple buckets will be supported per tier, a move that is anticipated to enhance performance. At 11:11 Systems, we have put a great deal of work into overcoming limitations both in object storage itself but also how Veeam software integrates with it to the point that we consider it as performant if not more so than most other offerings in this space. This includes taking an active role in the open source Ceph project that powers our service.
As I began working on testing this capability, I realized that I was spending a lot of time creating buckets, creating repositories, and then ultimately creating SOBRs full of repositories. In speaking with my friend Joe Houghes we realized we were both doing the same things and so we collaborated on scripting this capability, the final product of which can be found in this GitHub repository. If you do choose to use this, please provide feedback through GitHub.
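To give a feel for the repetitive S3 side of that work (the Veeam repository and SOBR pieces live in the linked script, not here), a minimal shell sketch for stamping out a set of extent buckets with the AWS CLI might look like the following; the bucket names are purely illustrative and the profile and endpoint are the ones from my lab examples elsewhere on this site.

# create four buckets to back four SOBR extents; adjust names, profile, and endpoint to your environment
for i in 1 2 3 4; do
  aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb "s3://premlab-sobr-extent-$i"
done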
While the code is great, the takeaway you should consider here is that Object Storage as the repository portion of your VBR Infrastructure is coming. With that great capability comes great consideration and you should look to work with experts to assist you with these changes. You should also begin to educate yourself on the key capabilities and differences.
In the past few years Immutability, the concept of making data unable to be deleted or modified, has become a core tenet of disaster recovery design and has driven the rise of object storage for backup purposes. To this point in the Veeam ecosphere immutability has been confined to traditional infrastructure backups, virtual machines, and cloud-native instances, but there has been a great appetite for this capability to apply to SaaS backups as well.
With the upcoming v7 release of Veeam Backup for Microsoft 365 (VB365) the product now has a method of providing true, verifiable immutability, or object-lock, on backup sets. In their interpretation of how to make this capability happen Veeam has incorporated a long-known best practice for all things backup, adhering to the 3-2-1 methodology where there are two copies of the backup. This is because immutability can only occur on the secondary copy of the backup set, leaving the primary set on unlocked, performant object storage to ensure that recovery operations stay optimized.
So What’s Changed?
Starting with VB365 v6, support for backup copies has been available, but only in very limited situations. First, the backups had to land in Azure Blob or Amazon S3 for the first copy; second, the copies could only go to related secondary locations. For example, backups residing in AWS S3 could only be copied to S3, S3 Infrequent Access, or AWS Glacier storage. Further, while these backups could be encrypted, they could not have the object-lock attribute, the object storage attribute that powers immutability, set.
With VB365 v7 both statements have changed. First, backup copies can now land on Azure, S3, or any S3 compatible storage that supports the necessary functions of the S3 API. Further, those backups can now optionally have the object-lock attribute set at the object repository level, with the immutability expiration automatically set to the duration of the retention period. In other words, unlike Veeam Backup & Replication, where retention is set at the job level and immutability at the repository level, with VB365 both are set together at the repository level.
VB365 Immutability in Action
1. Add your primary object storage and backup repository. This repository is where you set your long-term retention as needed. Repeat the process until you have enough repositories to support your organization’s backup needs in accordance with best practices for maximum supported objects per repository.
2. Add your secondary object storage and VB365 repositories. These should arguably be set up in a different object storage environment than your primary copy, but one that is still well connected. Because you will be writing data that is object-locked for the retention period of the repository, you should consider the shortest retention period here that still meets your organization’s data policy and standards.
3. Create backup jobs targeting your primary backup repositories. These will automatically copy to the secondary repository and become immutable as they are written.
Conclusion
This capability is going to be a game changer for everyone who uses VB365 to protect their M365 workloads. While this capability is only supported for VB365 data residing on object storage there’s an argument to be made that if you haven’t already moved or reseeded that data to object by now you should be working on it. Customers will be able to maintain a secondary copy of their backup sets on disparate systems to ensure the durability of their backups. If you can’t tell I’m actually very excited by this capability and look forward to using it in production!
In my last post, Configuring Veeam Backup & Replication SOBR for Non-Immutable Object Storage, I covered the basics of how to consume object storage in Veeam Backup & Replication (VBR). In general this is done through the concept of Scale-Out Backup Repositories (SOBR). In this post we are going to build upon that and layer in object storage’s object-lock feature, which is commonly referred to in Veeam speak as Immutability.
First, let’s define immutability. What the backup/disaster recovery world thinks of as immutability is much like the old Write Once, Read Many (WORM) technology of the early 00’s, you can write to it but until it ages out it cannot be deleted or modified in any way. Veeam and other backup vendors definitely treat it this way but object-lock under the covers actually leverages the idea of versioning to make this happen. I can still delete an object through another client but the net effect is that a new version of that object is created with a delete marker attached to it. This means that if that were to occur you could simply restore to the previous version and it’s like it never happened.
With VBR, once Immutability is enabled objects are written with Compliance mode retention for the duration of the protected period. VBR recognizes that the bucket has object-lock enabled and then, as it writes, applies the retention policy to each block directly rather than relying on a bucket-level default policy. If you attempt to delete a restore point whose retention has not yet expired, it won’t let you delete it and instead gives you an error.
Setting up immutability with object storage in Veeam is much the same as without it, but with a few differences. This starts with how we create the bucket. In the last post we simply used the s3 mb command to create a bucket, but when you need to work with object-lock you need to use the s3api create-bucket command.
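As a quick sketch of the difference (the profile and endpoint are the ones from my lab posts, and the bucket names here are illustrative):

# a plain bucket, created the old way, with no object-lock support
aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb s3://premlab-sobr-unlocked
# an object-lock capable bucket we can write immutable backups against
aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api create-bucket --bucket premlab-sobr-locked --object-lock-enabled-for-bucket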
Once your bucket is created you will go about adding your backup repository as we’ve done previously but with one difference, when you get to the Bucket portion of the New Object Store Repository wizard you are going to check the box for “Make recent backups immutable” and set the number of days desired.
You now have an immutable object bucket that can be linked to a traditional repository in a SOBR. Once data is written (still in the same modes) anything that lands there is undeletable via the VBR server until the retention period expires. Finally, if I examine any of the objects in the bucket with the s3api get-object-retention command I can see that the object’s retention is set.
Veeam Backup & Replication (VBR) currently makes use of object storage through the concept of Scale-Out Backup Repositories, SOBR. A SOBR in VBR version 11 can contain any number of extents as the performance tier (made up of traditional repositories) and a single bucket for the capacity tier (object storage). The purpose of a SOBR from Veeam’s point of view is to allow for multiple on-premises repositories to be combined into a single logical repository to allow for large jobs to be supported and then be extended with cloud based object storage for further scalability and retention.
There are two general modes for object storage to be configured in a SOBR:
Copy Mode – any and all data that is written by Veeam to the performance tier extents is also copied to the object storage bucket.
Move Mode – only restore points that age out of a defined window are evacuated to object storage, or, as a failure safeguard, when the performance tier extents reach a used capacity threshold. In this mode the restore points all still appear as local within the Veeam UI, but the local files contain only metadata that points to where the data chunks reside in the bucket. Veeam refers to this process as dehydration.
In this post let’s demonstrate how to create the necessary buckets and how to create SOBRs for both Copy and Move modes without object-lock (Immutability) enabled. If you haven’t read my previous post about how to configure the AWS CLI to be used with object storage you may want to check that out first.
1. Create buckets that will back our Copy and Move mode SOBRs. In this example I am using the AWS CLI with the s3 endpoint to make the buckets (a sketch of the commands follows the step list below).
2. Now access your VBR server and start with adding the access key pair provided for the customer. You do this in Menu > Manage Cloud Credentials.
3. Click on Backup Infrastructure then right click on Backup Repositories, selecting Add Backup Repository.
4. Select Object Storage as type.
5. Select S3 Compatible as object storage type
6. Provide a name for your object repository and hit next.
7. For the Account Settings screen enter the endpoint, region and select your created credentials.
8. In the Bucket settings window click Browse and select your created bucket then click the Browse button beside the Folder blank and create a subfolder within your bucket with the “New Folder…” button. I’ll note here do NOT check the box for “Make recent backups immutable for…” here as the bucket we have created above does not support object-lock. Doing so will cause an error.
9. Click Apply.
10. Create or select from existing traditional, Direct Storage repository or repositories to be used in your SOBR. Note: You cannot choose the repository that your Configuration Backups are targeting.
11. Right click on Scale-out Backup Repositories and select “Add Scale-out Backup Repository…”
12. Name your new SOBR.
13. Click Add in your Performance Tier screen and select your repository or repositories desired. Hit Ok and then Next.
14. Leave Data Locality selected as the Placement Policy for most scenarios.
15. In the Capacity Tier section check to Extend capacity with object storage and then select the desired bucket.
16. (Optional but highly recommended): Check the Encrypt data uploaded to object storage and create an encryption password. Hit Apply.
17. This will have the effect of creating an exact copy of any backup job that targets the SOBR, both on premises and in the object store. To leverage Move mode rather than Copy you simply check the other box instead of (or in addition to) the copy option and set the number of days you would like to keep on premises.
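For reference, the bucket creation from step 1 looks roughly like this with the AWS CLI; the copy-mode bucket name matches the one listed at the end of this post, while the move-mode bucket name is illustrative, and the profile and endpoint are the ones used in my lab.

aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb s3://premlab-sobr-unlocked-copy
aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb s3://premlab-sobr-unlocked-move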
Now you simply need to target a job at your new SOBR to have it start working.
In conclusion, let’s cover a bit about how we will see our data get written to object storage. For the Copy mode example it should start writing data to the object store immediately upon completion of each run. In the case of Move mode you will see objects written after the first run past the day threshold you specified; for example, if you set it to move anything older than 7 days on-prem, dehydration will occur after the run on day 8. These operations can be seen in the Storage Management section of the History tab in the VBR console.
Further if I recursively list my bucket via command line I can see lots of data now, data is good. 😉
% aws --endpoint=https://us-central-1a.object.ilandcloud.com --profile=premlab s3 ls s3://premlab-sobr-unlocked-copy/ --recursive
In my last post I worked through quite a few things I’ve learned recently about interacting with S3 Compatible storage via the CLI. Now that we know how to do all that fun stuff it’s time to put it into action with a significant Service Provider/Disaster Recovery slant. Starting with this post I’m going to highlight how to get started with some common use cases of object storage in Backup/DR scenarios. In this we’re going to look at a fairly mature use case, with it backing Veeam Backup for Office (now Microsoft) 365.
Veeam Backup for Microsoft 365 v6, which was recently showcased at Cloud Field Day 12, has been leveraging object storage as a way to make its storage consumption more manageable since version 4. Object also provides a couple more advantages in relation to VBM, namely an increase in data compression as well as a method to enable encryption of the data. With the upcoming v6 release they will also support offloading backups to AWS Glacier for a secondary copy of this data.
VBM exposes its use of object storage under the Object Storage Repositories section of Backup Infrastructure, but it consumes it as a step of the Backup Repository configuration itself, which is nested within a given Backup Proxy. I personally like to at a minimum start with scaling out repositories by workload (Exchange, OneDrive, SharePoint, and Teams) as each data type has a different footprint. When you really need to scale out VBM, say anything north of 5000 users in a single organization, you will want to use that as a starting point for how you break down and customize the proxy servers.
Let’s start by going to the backup proxy server, in this case the VBM server itself, and create folder structure for our desired Backup Repositories.
Now that we have folders, let’s go create some corresponding buckets to back them. We’ll do this via the AWS S3 CLI as I showed in my last post. At this point VBM does not support advanced object features such as immutability, so there’s no need to get fancy with the s3api; I just prefer the simpler command structure.
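As a sketch, creating the per-workload buckets could look like this; the Exchange bucket name is the one you’ll see later in this post, while the other names, the profile, and the endpoint are illustrative placeholders from my lab.

aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb s3://premlab-ilandproduct-vbm365-exch
aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb s3://premlab-ilandproduct-vbm365-odb
aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb s3://premlab-ilandproduct-vbm365-spo
aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 mb s3://premlab-ilandproduct-vbm365-teams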
Ok, so now we have folders and buckets, time to hop into Veeam. First we need to add our object credentials to the server. This is a simple setup and most likely you will only need one set of credentials for all your buckets. Because in this example I will be consuming iland Secure Cloud Object Storage I need to choose “S3 Compatible access key” under the “Add…” button in Cloud Credential Manager (Menu > Cloud Credentials). These should be the access key and secret provided to you by your service provider.
Now we need to go to Backup Infrastructure > Object Storage Repositories to add our various buckets. Start by right clicking and choose “Add Object Storage.”
1. Name your Object Repository
2. Select the S3 Compatible option
3. Enter your endpoint URL, region, and select credentials
4. Select your bucket from the dropdown menu
5. Create a folder inside of your bucket for this repository’s data and hit Finish
Now simply repeat the process above for any and all buckets you need for this task.
Now that we have all our object buckets added we need to pair them up with our on-premises repository folders. It’s worth noting that the on-prem repo is a bit misleading; as long as you use the defaults, no backup data will ever live locally in that repository. Rather it holds metadata in the form of a single JetDB file that serves as pointers to the objects that are the actual data. For this reason the storage consumption here is really, really low and shouldn’t be part of your design constraints.
Under Backup Infrastructure > Backup Repositories we’re going to click “Add Repository..” and let the wizard guide us.
1. Name our repository
2. Specify the hosting proxy server and the path to the folder you wish to use
3. If you don’t already have one created, you can add an encryption secret to encrypt the data when specifying your object repository
4. Specify the object storage repository and the encryption key to use
5. Specify the retention period and retention level and hit Finish
One note on that final step above. Often organizations will take the “Keep Forever” option that is allowed here, and I highly advise against this. You should specify a retention policy that is agreed upon with your business/organization stakeholders, as keeping any backup data longer than needed may have unintended consequences should a legal situation arise; data the organization believes to be long since gone is now discoverable through these backups.
Also worth noting, item-level retention is great if you are using a service provider that does not charge egress fees, because it gives you more granular control in terms of retention. If you use a hyperscaler such as Amazon S3 you may find this option drives your AWS bill up because of a much higher egress load each time the job runs.
Once you’ve got one added, again, rinse and repeat for any other repositories you need to add.
Finally, the only step left is to create jobs targeting our newly created repositories. This is going to have way more variables based on your organization size, retention needs, and other factors than I can truly do justice to in the space of this blog post, but I will show how to create a simple, entire-organization, single-workload job.
You can start the process under Organizations > Your Organization > Add to backup job…
1. Name your backup job
2. Select Organization as your source
3. Check the box for your desired organization
4. Select your organization and click the edit button, allowing you to deselect all the workloads not in this job
5. Once edited you’ll see just the workload you want for your organization before hitting Next
6. I don’t have any exclusions for this job but you may have this need
7. Select your desired proxy server and backup repository
8. Finally, do any customization needed to the run schedule
Once again you’d want to repeat the above steps for all your different workload types, but that’s it! If we do an s3 ls on the full s3://premlab-ilandproduct-vbm365-exch/Veeam/Backup365/ilandproduct-vbm365-exch/ path we’ll see a full folder structure where it’s working with the backup data, proving that we’re doing what we set out to do!
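That check, reusing the profile and endpoint from my other lab examples (adjust to your own), is simply:

aws --profile premlab --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 ls s3://premlab-ilandproduct-vbm365-exch/Veeam/Backup365/ilandproduct-vbm365-exch/ --recursive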
In conclusion, I went way in depth on what is needed here, but in practice it isn’t that difficult considering the benefits you gain by using object storage for Veeam Backup for Microsoft 365. These benefits include large-scale storage, encryption, and better data compression. Hope you find this helpful and check back soon for more!
Recently a good portion of my day job has been focused on learning and providing support for S3 compatible object storage. What is S3 compatible, you say? While Amazon’s AWS may have created the S3 platform, at its root today it is an open framework of API calls and commands known as S3. While AWS S3 and its many iterations are the 5000 pound gorilla in the room, many other organizations have created either competing cloud services or storage systems that let you leverage the technology in your own environments.
So why am I focusing on this, you may ask? Today we are seeing more and more enterprise/cloud technologies rely on object storage. Any time you even think of mentioning Kubernetes you are going to be consuming object. In the Disaster Recovery landscape we’ve had the capability for a few years now to send our archive or secondary copies of data to object “buckets,” as it is both traditionally cheaper than other cloud-based file systems and provides a much larger feature set. With their upcoming v12 release Veeam is going to be providing the first iteration of their Backup & Replication product that can write directly to object storage, with no need for the first repository to be a Windows or Linux file system.
To focus specifically on the VBR v12 use case, many customers are going to choose to start dipping their toes into the idea of on-prem S3 compatible object storage. This can be as full featured as a Cloudian physical appliance or as open and flexible as a MinIO or Ceph based architecture. The point being that as Veeam’s and other enterprise technologies’ needs for object storage mature, your systems will grow out of the decisions you make today, so it’s a good time to start learning about the technology and how to do the basics of management from an agnostic point of view.
So please excuse the long-windedness of this post as I dive into the whys and the hows of S3 compatible object storage.
Why Object Then?
Before we go further it’s worth taking a minute to talk about the reasons why these technologies are looking to object storage over the traditional block (NTFS, ReFS, XFS, etc.) options. Probably first and foremost, it is designed to be a scale-out architecture. With block storage, while you can do things like create RAID arrays to join multiple disks, you aren’t really going to make a RAID across multiple servers. So for the use case of backup, rather than be limited by the idea of a single server or have to use external constructs such as Veeam’s SOBR to stitch servers together, you can target an object storage gateway that then writes to a much more scalable, much more tunable infrastructure of storage servers underneath.
Beyond the scale-out you have a vast feature set. Things that we use every day such as file versioning, security ACLs, least privilege design and the concept of immutability are extremely important in designing a secure storage system in today’s world and most object storage systems are going to be able to provide these capabilities. Beyond this we can look at capabilities such as multi-region synchronization as a way to ensure that our data is secure and highly available.
Connecting to S3 or S3 Compatible Storage
So regardless of whatever client you are using you are going to need 4 basic pieces of information to connect to the service at all.
Endpoint: This will be an internet-style https or http URL that defines the address of the gateway that your client will connect to.
Region: This defines the datacenter location within the service provider’s system that you will be storing data in. For example the default for AWS s3 is us-east-1 but can be any number of other locations based on your geography needs and the provider.
Access Key: This is one half of the credential set you will need to consume your service and is mostly akin to a username, or, if you are used to consuming Office 365, like the AppID.
Secret Key: This is the other half and is essentially the generated password for the access key.
Regardless of the service, you will be consuming all of those parts. With things that have native AWS integration you may not necessarily be prompted for the endpoint, but be assured it’s being used.
To get started at connecting to a service and creating basic, no frills buckets you can look at some basic GUI clients such as CyberDuck for Windows or MacOS or WinSCP for Windows. Decent primers for using these can be found here and here.
Installing and Configuring AWS CLI Client
If you’ve ever used AWS S3 to create a bucket before you are probably used to going to the console website and pointy-clicky creating a bucket, setting attributes, uploading files, etc. As we talk more and more about S3 compatible storage that UI may or may not be there, and if it is there it may be wildly different from what you use at AWS because it’s a different interpretation of the protocol’s uses. What is consistent, and in some cases or situations may be your only option, is consuming S3 via the CLI or the API.
Probably the easiest and most common client for consuming S3 via the CLI is the AWS CLI. This can easily be installed via the package manager of your choice, but for quick and easy access:
Windows via Chocolatey
choco install -y awscli
MacOS via Brew
brew install awscli
Once you have it installed you are going to need to interact with two local files in the .aws directory of your user profile: config and credentials. You can get these created by using the aws configure command. Further, the AWS CLI supports the concept of profiles so you can create multiple connections and accounts. To get started you would simply use aws configure --profile obj-test, where obj-test is whatever name you want to use. This will then walk you through prompts for three of those four pieces of information: access key, secret key, and default region. Just as an FYI, this command impacts two files within your user profile regardless of OS, ~/.aws/config and ~/.aws/credentials. These are worth reviewing after you configure to become familiar with the format and security implications.
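For reference, after running aws configure --profile obj-test the two files end up looking something like this; every value below is a placeholder, and note that the endpoint itself is not stored here, which is why you’ll see it passed on each command later.

~/.aws/config
[profile obj-test]
region = <default region from your provider>

~/.aws/credentials
[obj-test]
aws_access_key_id = <your access key>
aws_secret_access_key = <your secret key>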
Getting Started with CLI
Now that we’ve got our CLI installed and authentication configured let’s take a look at a few basic commands that will help you get started. As a reference, the living command references you will be using are the AWS CLI Command Reference pages for the s3 and s3api command sets.
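For example, creating a first, no-frills bucket against an S3 compatible endpoint looks like this; the profile and endpoint are the ones used throughout the rest of this post, and the bucket name is illustrative.

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api create-bucket --bucket test-bucket-1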
Awesome! We’ve got our first bucket in our repository. That’s cool, but I want my bucket to be able to leverage this object lock capability Jim keeps going on about. To do that you use the same command but add the --object-lock-enabled-for-bucket parameter.
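Same command and same assumptions as above, just with the extra flag; this is the bucket we’ll keep using for the rest of the post.

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api create-bucket --bucket test-bucket-2-locked --object-lock-enabled-for-bucket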
So yeah, good to go there. Next let’s dive into the s3api list-buckets command. Listing buckets is a good example for understanding that when you access S3 or S3 compatible storage you are really talking about two things: the s3 protocol commands and the s3api. For listing buckets you can use either:
aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 ls
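or its s3api counterpart, using the same profile and endpoint:

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api list-buckets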
While these are similar it’s worth noting the return will not be the same. The ls command will return data much like what it would in a standard Linux shell while s3api list-buckets will return JSON formatted data by default.
Enough About Buckets, Give Me Data
So buckets are great, but they are nothing without data inside them. Let’s get to work writing objects.
Again, writing data can feel very familiar, especially if you are used to the *nix methods. I can use s3 cp or s3 mv to copy or move data to my s3://test-bucket-2-locked/ bucket, to any other I’ve created, or between them.
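Here’s a quick sketch of writing a couple of objects into a prefix; the local file names and the testdata/ prefix are simply examples I’m using for this walkthrough.

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 cp ./backup-test.txt s3://test-bucket-2-locked/testdata/
aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 cp ./backup-test2.txt s3://test-bucket-2-locked/testdata/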
Now that we’ve written a couple of files let’s look at what we have. Once again you can do the same actions via both methods; it’s just that the s3api way will consistently give you more information and more capability. Here’s what the listing looks like both ways, along with roughly what the api output returns.
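Both commands reuse the profile and endpoint from above; the JSON shown after them is an abridged, illustrative list-objects return, so your keys, dates, and sizes will differ.

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3 ls s3://test-bucket-2-locked/testdata/
aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api list-objects --bucket test-bucket-2-locked
{
    "Contents": [
        {
            "Key": "testdata/backup-test.txt",
            "LastModified": "2022-02-20T15:04:05+00:00",
            "Size": 1024,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "testdata/backup-test2.txt",
            "LastModified": "2022-02-20T15:04:07+00:00",
            "Size": 2048,
            "StorageClass": "STANDARD"
        }
    ]
}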
Take note of a few things here. While the s3 ls command gives more traditional file system output s3api refers to the objects with their entire “path” as the key. Essentially in object storage for our benefit it still has the concept of file and folder structure but it views each unique object as a single flat thing on the file system without a true tree. Key is also important because as we start to consider more advanced object storage capabilities such as object lock, encryption, etc. the key is often what you need to supply to complete the commands.
A Few Notes About Object Lock/Immutability
To round out this post let’s take a look at where we started: the why, immutability. Sure, we’ve enabled object lock on a bucket, but what that really does is enable versioning; it’s not enforcing anything. Before we get crazy with creating immutable objects it’s important to understand there are two modes of object lock:
Governance Mode – In Governance Mode users can write data and not be able to truly delete it as expected, but there are roles and permissions that can be set (and are inherited by root) that allow the protection to be overridden and data to be removed.
Compliance Mode – This is the more firm option where even the root account cannot remove data/versions and the retention period is hard set. Further once a retention date is set on a given object you cannot shorten it in any way, only extend it further.
Object Lock is actually applied in one of two ways (or a mix of both): creating a policy and applying it to a bucket so that anything written to that bucket assumes that retention, or actually applying a retention period to an object itself, either while writing the object or after the fact.
Let’s start with applying a basic policy to a bucket. In this situation, for my test-bucket-2-locked bucket I’m going to enable compliance mode and set retention to 21 days. A full breakdown of the formatting of the object-lock-configuration parameter and the options it provides can be found in the AWS documentation.
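Applying that as the bucket’s default retention, with the same profile and endpoint as before, looks like this:

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api put-object-lock-configuration --bucket test-bucket-2-locked --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":21}}}'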
Cool, now to check that configuration we can simply use s3api get-object-lock-configuration against the bucket to see what we’ve done. I’ll note that for either the “put” above or the “get” below there is no equivalent in the s3 command set; these are some of the more advanced features I’ve been going on about.
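The check itself is just:

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api get-object-lock-configuration --bucket test-bucket-2-locked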
Ok, so we’ve applied a baseline policy of compliance and a retention of 21 days to our bucket and confirmed that it’s set. Now let’s look at the objects within. You can view a particular object’s retention with the s3api get-object-retention command. As we are dealing with advanced features at the object level you will need to capture the key for the object you want to test. If you’ll remember, we found those using the s3api list-objects command.
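Using the illustrative object key from earlier in the post, that looks like:

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api get-object-retention --bucket test-bucket-2-locked --key testdata/backup-test.txt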
So as you can see, we have both a mode and a retention date set on the individual object. What if we wanted this particular object to have a different retention period than the bucket default? Let’s put the s3api put-object-retention option to work and try to set that down to 14 days instead. While we use a general-purpose number of days when creating the bucket policy, when we set object-level retention it’s done by specifying an actual date stamp, so we’ll simply pick a day 14 days from today.
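The attempt, with an illustrative date roughly 14 days out (substitute your own), looks like:

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api put-object-retention --bucket test-bucket-2-locked --key testdata/backup-test.txt --retention 'Mode=COMPLIANCE,RetainUntilDate=2022-03-08T00:00:00Z'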
Doh! Remember what we said about compliance mode? That you cannot make the retention shorter than what was previously set? We are running into that here and can see that the protection in fact works! Instead let’s try this again and set it to 22 days.
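Same command, just with an illustrative date roughly 22 days out this time:

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api put-object-retention --bucket test-bucket-2-locked --key testdata/backup-test.txt --retention 'Mode=COMPLIANCE,RetainUntilDate=2022-03-16T00:00:00Z'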
As you can see, this time not only did we not get an error, but when you check the retention it now shows the newly defined timestamp, so it definitely worked.
This feels like a good time to note that object locking is not the same as deletion protection. If I create an object lock enabled bucket and upload some objects to it, setting the object retention flag with the right info along the way, I am still going to be able to use a basic delete command on that file. In fact, if I use CyberDuck or WinSCP to connect to my test bucket I can right-click on any object there and successfully choose delete. What is happening under the covers is that a new version of that object is spawned, one with the delete marker applied to it. To standard clients it will appear that the data is gone, but in reality it’s still there; it just needs to be restored to the previous version. In practice most of the UIs you are going to use to consume S3 compatible storage, such as Veeam or provider-developed consoles, will recognize what is going on under the covers and essentially “block” you from executing the delete, but feel secure that as long as you have object lock enabled and the data is written with a retention date the data has not actually gone away and can be recovered.
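If you want to see that behavior for yourself, s3api list-object-versions will show both the surviving versions and the delete marker (again using my illustrative prefix):

aws --profile jimtest --endpoint-url=https://us-central-1a.object.ilandcloud.com s3api list-object-versions --bucket test-bucket-2-locked --prefix testdata/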
All of this is a somewhat long winded answer to the question “How S3 Object Lock works” which Amazon has thoughtfully well answered in this post. I recommend you give it a read.
Conclusion
In the end you are most likely NOT going to need to know how to do all the above steps via the command line. More likely you will be using some form of UI, be it Veeam Backup & Replication, the AWS console, or that of your service provider, but it is very good to know how to do these things, especially if you are considering on-premises object storage as we move into this next evolution of IT and BCDR. Learning and testing the above is a relatively low-cost exercise as most object services are literally pennies per GB, possibly plus data egress charges depending on your provider (hey AWS…), but it’s money well spent to get a better understanding.
Hi there and welcome to koolaid.info! My name is Jim Jones, a Geek of Many Hats living in West Virginia.
This site was created for the purpose of being a locker full of all the handy things I’ve learned over the years, know I’m going to need again, and know I’ll forget. It’s morphed a bit over the years as all things do, but that’s still the main purpose. If you’d like to know more about me check out any of the social links at the top left of the site; I’m pretty much an open book.
If you’ve found this page I hope you find its contents helpful. Finally, anything written here is solely my own view and does not reflect those of my employer.