DR Scenarios For You and Your Business: Getting Cloudy With It

In the last post we talked about the more traditional models of architecting a disaster recovery plan. In those we covered icky things like tape, dark sites and split datacenters. If you’d like to catch up you can read it here. All are absolutely worthwhile ways to protect your data, but all of them are slow and limit your organization’s agility in the case of a disaster.

By now we have all heard about the cloud so much we’ve either gone completely cloud native, dabbled a little or just completely loathe the word. Another great use for “somebody else’s computer” is to power your disaster recovery plans. By leveraging cloud resources we can effectively get out of the managing hardware business in regards to DR and have borderline limitless resources if needed. Let’s look at a few ways this can happen.

DRaaS (Disaster Recovery as a Service)

For now this is my personal favorite, but my needs may be and probably are different from yours. In a DRaaS model you still take local backups as you normally would, and those backups or replicas are then shipped off to a Managed Service Provider (MSP) aligned with your particular backup software vendor.

I can’t particularly speak to any of the others from experience, but CloudConnect providers in the Veeam Backup and Replication ecosystem are simple to consume and use. Essentially, once you buy the amount of space you need from a partner you use the link and credentials you are provided and add them to your backup infrastructure. Once done you create a backup copy job with that repository as the target and let it run. If you are bandwidth constrained many will even let you seed the job with an external hard drive full of backups that you ship to them, so all you have to transfer over the wire is your daily changes. Meanwhile all of these backups are encrypted with a key that only you and your organization know, so the data is nice and safe sitting elsewhere.

This is really great in that it is infinitely scalable (you only pay for what you use) and you don’t have to own any of the hardware or software licenses to support it. In the case that you have an event you have options: you can either scramble and try to put something together on your own, or most times you can leverage the compute capabilities of the provider to power your organization until such time as you can get your on-site resources available again. As these providers will have their own IT resources available, you and your team will be freed up to get staff and customers working again while they handle the restores and get your systems back online.

In my mind the drawbacks to this model are minimal. In the case of a disaster you are definitely going to be paying more than you would if you were running restored systems on your own hardware, but you would have had to buy that hardware and maintain it as well, which is expensive. You will also be in a situation where workers and datacenter systems are not in the same geographical area, which may mean increased bandwidth costs as you get back up and running, but that’s still nothing compared to maintaining this capability yourself year-round. Probably the only real drawback here is that almost all of these types of providers require long-term agreements, 1 year or more, for the backup or replication portion of what is needed. You also need to be sure, if you choose this route, that the provider has enough compute resources available to absorb you if needed. This can be mitigated by working with your provider to do regular backup testing at the far end. This will cost you a bit more but it is truly worth it to me.

Backup to Public Cloud

Finally we come to what all the backup vendors seem to be going towards these days, public cloud backups. In this situation your backups land on premises first (highly recommended) and are then shipped off to the public cloud provider of your choice. Do AWS, Azure or GCP start messing with their storage pricing models and one suddenly becomes cheaper? Simply add the new provider and shift the job over, easy peasy. As with all things cloud you are in theory also infinitely scalable, so you don’t have to worry about onboarding new workloads except for cost, and who cares about cost anyway?

The upside here is the ability to be agile. Start to finish you can probably be set up to consume this model within minutes, and then your only limit to how fast you can be covered is how much bandwidth you make available for shipping backups. If you are doing this to cover for an external event like the failure of your passive site, you can simply tear it back down afterwards just as fast as you made it. Also you are only ever paying for your actual consumption, so you know how much your cost is going to be for any additional workload to be protected; you don’t ever pay for “spare space.”

As far as drawbacks go, I feel like we are still in the early days of this so there are a few. While you don’t have to maintain far-end equipment for either backup storage or compute, I’m not convinced that this isn’t the most expensive option for traditional virtualized workloads.

Hybrid Archive Approach

One of the biggest challenges to maintaining an on-prem, off-prem backup system is that we all run out of space sometimes. The public cloud gives us the ability to only consume what we need, not paying for any fluff, as well as letting others manage the performance and availability of that storage. One trend I’m seeing more and more is the ability to supplement your on-premises backup storage with public cloud resources to allow archives to scale out for as long as necessary. There is a tradeoff between locality and performance, but if your most recent backups are on premises or well-connected to your production environment you may not ever need to access the backups that were archived off to object storage, so you don’t really care how fast they are to restore; you’ve just checked your policy checkbox and have that “oh no” copy out there.

Once upon a time my employer had a situation where we needed to retain every backup for about 5 years. Each year we had to buy more and more media to save backups we would never restore from because they were so old, but we had them and were in compliance. If things like Veeam’s Archive Tier (or the equivalent from other vendors) had existed I could have said “I want to retain X backups on-prem, but after that shift them to an S3 IA bucket.” In the long term this would have saved quite a bit of money and administrative overhead, and when the requirement went away all I had to do was delete the bucket and reset back to the normal policy.

While this is an excellent use of cloud technology I don’t consider it a replacement for things like DRaaS or Active/* models. The hoops you need to go through to restore these backups to a functional VM are still complex and require resources. Rather I see this as an extension of your on-prem backups to allow for short-term scale issues.

Conclusion

If you’ve followed along for both posts I’ve covered about 5.5 different methods of backing up, replicating and protecting your datacenter. Which one is right for you? It might be one of these, none of these or a mash-up of two or more to be honest. The main thing is to know your business’ needs and its regulatory requirements, and then pick the approach (or combination of approaches) that meets them.

DR Scenarios For You and Your Business Part 1: The Old Guard

It is Disaster Recovery review season again here at This Old Datacenter and reviewing our plans sparked the idea to outline some of the modern strategies for those who are new to the game or looking to modernize. I’m continually amazed by the number of people who I talk to that are using modern compute methodologies (virtualization on premises, partner IaaS, public cloud) but are still using the same backup systems they were using in the 2000s.

In this post I’m going to talk about some basic strategies using Veeam Backup and Replication because that is primarily what I use, but all of these approaches are achievable with any of the current data backup vendors, with varying levels of advantages and disadvantages per vendor. The important part is to understand the different ways of protecting your data to start with and then pick a vendor that fits your needs.

One constant that you will see here is the idea of each strategy consisting of 2 parts: first, a local backup to handle basic things like a failed VM, a file restore, and other situations that fall short of all systems being down; second, archiving that backup somewhere outside of your primary location and datacenter to deal with a full systems-down event or a virus/ransomware scenario. You will often hear this referred to as the 3-2-1 rule:

  • 3 copies of your data
  • 2 copies on different types of physical media or systems
  • 1 copy (at least) in a different geographical location (offsite)

On-Premises Backup/Archive to Removable Media

This is essentially an evolution on your traditional backup system. Each night you take a backup of your critical systems to a local resource and then copy that to something removable so that it can be taken to somewhere offsite each evening. In the past this was probably only one step, you ran backups to tape and then you took that tape somewhere the next morning. Today I would hope the backups would land on disk somewhere local and then be copied to tape or a USB hard disk but everybody has their ways.

This method has the ability to get the job done but has a lot of drawbacks. First, you must have human intervention to get your backups somewhere. Second, restores may be quick if you are restoring from your primary backup copy, but if you have to go to your offsite copy you first have to physically locate the correct data set, and then, especially in the case of tape, it can take some time to get things back to functional. Finally, you own and have to maintain all the hardware involved in the backup system, hardware that effectively isn’t used for anything else.

Active/Passive Disaster Recovery

Historically the step up for many organizations from removable media is to maintain a set of hardware or at least a backup location somewhere else. This could be just a tape library, a NAS or an old server loaded with disks either in a remote branch or at a co-location facility. Usually you would have some dark hardware there that could allow systems to be restored if needed. In any case you still would perform backups locally and maintain a set on premises for the primary restore, then leverage the remote location for a systems down event.

This method definitely has advantages over the first in that you don’t have to dedicate a person’s time to ensuring that the backups go offsite and you might have some resources available to take over in case of a massive issue at your datacenter, but this method can get very expensive, very fast. All the hardware is owned by you and is used exclusively for you, if ever used at all. In many cases datacenter hardware is “retired” to this location and it may or may not have enough horsepower to cover your needs. Others may buy for the dark site at the same time as buying for the primary datacenter, effectively doubling the price of updating. Layer on top of this the cost of connectivity, power consumption and possibly rack space and you are talking about real money. Further you are on your own in terms of getting things going if you do have a DR event.

All that being said this is a true Disaster Recovery model, which differentiates it from the first option. You have everything you need (possibly) if you experience a disaster at your primary site.

Active/Active Disaster Recovery

Does your organization have multiple sites, with datacenter capabilities in each place? If so then this model might be for you. With Active/Active you design your multisite datacenters with redundant capacity in mind so that in the case of an event at either location you can run both sets of workloads in a single location. The ability to have “hot” resources available at your DR site is attractive in that you can easily make use of not only backup operations but replication as well, significantly shortening your Recovery Time Objective (RTO), usually with the ability to fail back to production when the event is over.

Think about a case where you have critical customer-facing applications that cannot handle much downtime at all, but you lose connectivity at your primary site. This workload could fairly easily be failed over to the replica in the far-side DC, all the while your replication product (think Veeam Backup & Replication or Zerto) is tracking the changes. When connectivity is restored you tell the replica to fail back and you are running with changes intact back in your primary datacenter.

So what’s the downside? Well, first off it requires you to have multiple locations to be able to support this in the first place. Beyond that you are still in a world of needing to support the full load in case of an event, so your hardware and software licensing costs will most likely go up to support an event that may never happen. Also, supporting replication is a good bit more complex than backup when you include things like the need for re-IP, external DNS, etc., so you should definitely be testing this early and often, maintaining a living document that outlines the steps needed to fail over and fail back.

Conclusion

This post covers what I consider the “old school” models of Disaster Recovery, where your organization owns all the hardware and such to power the system. But who wants to own physical things anymore, aren’t we living in the virtual age? In the next post we’ll look at some more “modern” approaches to the same ol’ concepts.

The Basics of Veeam Backup & Replication 9.5 Update 4 Licensing

Veeam has recently released the long-awaited Update 4 to their Backup and Replication 9.5 product and with it has come some changes to how they deal with licensing. As workloads that need to be protected/backed up/made available have moved from being 100% on-premises and inside our vSphere or Hyper-V environments to mixes of on-prem, off-prem, on physical, public cloud, etc., my guess is their customers have asked for a way to make that protection and licensing portable. Veeam has decided this can be solved by creating per-instance licensing, which is similar to how you consume many other cloud-based services. This rides along with the established perpetual licensing we still have for VBR and the Veeam Availability Suite.

I will be honest and say that the upgrade was not as smooth as I would have hoped. Now that I’ve gotten to the bottom of my own licensing issues I’ll post here what I’ve learned to hopefully keep you from experiencing the same headaches. It’s worth noting that there is a FAQ on this, but its content is changing quite a bit as the rollout continues.

How We Got Here

In the past if you were using nothing but Veeam Backup and Replication (VBR) you did all your licensing by the socket count of protected hypervisors. Then along came the Veeam Agents for Windows and Linux and we had the addition of subscription levels for VAW Server, VAW Workstation, and VAL. As these can be managed and deployed via the Veeam console this license was required to be installed on your VBR server as well, so you now had 2 separate license files commingled on the server to create the entire solution for protecting VBR and Agent workloads.

Now as we look at the present and future, Veeam has lots of different products that are subscription based. Protecting Office 365, AWS instances, and Veeam’s orchestration product are all per-consumable-unit subscriptions. Further, thanks to Veeam’s Service Provider program you as an end customer have the option of either buying and subscribing directly from a VAR or “renting” those licenses from a service provider. As you keep counting up you can see where this model needed (and still needs) to be streamlined.

Update 4 License Types

So that brings us to the here and now. For now, and for as far as I can get anyone to tell me, perpetual (a.k.a. per-socket) licensing for Veeam Backup and Replication and the Veeam Availability Suite (which includes VBR and Veeam ONE) is here to stay. Any new products, though, will be licensed through a per-instance model going forward. In the middle there is some murkiness, so let’s take a look at the options.

  1. Perpetual (per socket) only. This is your traditional Backup and Replication license, licensed per protected socket of hypervisor. You still have to obtain a new Update 4 license from my.veeam.com, but it works exactly the same. If you have a Veeam server without any paid VAW/VAL subscriptions attached you can simply run the installer and continue on your current license. An interesting note: once you install your Update 4 perpetual license, if you have no instances it will automatically provide you with 1 instance per socket, up to a maximum of 6. That’s actually a nice little freebie for those of us with a one-off physical box here or there or just a couple of cloud instances.
  2. Instance based. These are the “portable licenses” that can be used for VBR-protected VMs, VAW, VAL, Veeam for AWS, etc. If you are an existing customer you can contact licensing support and migrate your per-socket licenses to this if you want, but unless you are looking at a ROBO site, need more cloud protection or have a very distributed use case for Veeam (small on-prem, workstations, physical servers, cloud instances) I don’t see this being a winner price-wise. For those of us with traditional workloads perpetual makes the most sense because it doesn’t matter how many VMs we have running on our hypervisors, they are still all covered. If you’d like to do the math for yourself they’ve provided an instance cost calculator.

    I will mention that I think they are missing the idea in the calculator that, unless they are doing something magical, it is based on buying new. Renewals of perpetual licenses should be far cheaper than the given number, and I’ve never heard of a subscription license service having a renewal rate. It is also worth noting that even if you aren’t managing your licensed (as opposed to free) Veeam Agents for Windows and Linux with VBR, you will need to go to the Update 4 license management screen in my.veeam.com and convert your subscription licenses to Update 4 instance ones to be able to use the 3.0 versions of the software. It doesn’t cost anything or make a difference at this point, but while you could buy subscription licenses in any quantity you chose, per-instance licenses have a minimum of 10 and are only sold in packs of 10. So while for now it might be nice that your licenses are rounded up, understand you’ll have to renew at the rounded-up price as well.

    Further, it’s worth noting that back when VAW was subscription based there were separate lines for workstations and servers, with 1 server license costing the same as 3 workstations. In the new per-instance model this is reflected by consumption: a server of any kind will consume 1 instance, but a workstation will only consume 0.33 of one. Same idea, different way of viewing it.

  3. The Hybrid License. This is what you need if you need/want to manage both perpetual and instance licensing from the same VBR server. If you previously had per-socket licensing for your VMs and subscription licenses for VAW/VAL you will need to hit the merge button under your Update 4 license management screen. This only works if you are the primary license administrator for all support IDs you wish to merge.

Just to make sure it’s clear: in previous versions you could have both a per-socket and a subscription license installed at the same time; this is no longer the case, hence option 3. You cannot have a type 1 and a type 2 license installed on the same server; the type 2 will override the type 1. So if you are consuming both perpetual and per-instance licensing under the same VBR server you must be sure to merge those licenses on my.veeam.com. In order to do so you will need any and all licenses/Support IDs being merged to be under the same Primary License Administrator. If you did not do this previously you will need to open a case with support to get a common Primary License Administrator set for your Support IDs.

Conclusion

As we begin, or continue, to move our production workloads out of our own datacenters to other datacenters and the public cloud, those workloads will still need to be protected. For those of us that use Veeam to do so, handling the licensing has, for now, been made simpler and is still cost effective once you get it lined out for yourself.

Dude, Where’s My Managed Service Accounts?

So I am probably way late to the game but today’s opportunities to learn have included ADFS and with that the concept of Managed Service Accounts.

What’s a Managed Service Account, you ask? We’ve all installed applications and either set the service to run as the local system account or as a standard Active Directory account. Managed Service Accounts, available since Windows Server 2008 R2 (and greatly enhanced in Windows Server 2012 as group Managed Service Accounts, or gMSAs), are a special type of account to be used for services where Active Directory itself manages the account’s password, keeping you secure without having to update passwords regularly.

While there are quite a few great step by step guides for setting things up and then creating your first Managed Service account, I almost immediately ran into an issue where my Active Directory didn’t seem to include the Managed Service Accounts container (CN=Managed Service Accounts,DC=mydomain,DC=local). My domain was at the correct level, Advanced Features were turned on in AD Users & Computers, everything seemed like it should be just fine, the container just wasn’t there. In this post I’ll outline the steps I ultimately took that resulted in getting the problem fixed.

Step 0: Take A Backup

While you probably are already mashing on the “take a snapshot” button or starting a backup job, it’s worth saying anyway: you are messing with your Active Directory, so be sure to take a backup or snapshot of the Domain Controller(s) which hold the various FSMO roles. Once you’ve got that backup, depending on how complex your Active Directory is, it might be worth leveraging something like Veeam’s SureBackup (er, I mean DataLab) like I did and creating a test bed where you can try this out on last night’s backups before doing it in production.
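
If nothing fancier is handy, even a quick system state backup with the built-in Windows Server Backup tooling will do. A minimal sketch, assuming the Windows Server Backup feature is installed and E: is a volume with enough free space (both are assumptions, adjust for your environment):

    # System state backup of the DC to a local/attached volume (E: is a placeholder)
    wbadmin start systemstatebackup -backupTarget:E: -quiet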

Step 1: ADSI Stuff

Now we are going to have to start actually manually editing Active Directory. This is because you might have references to Managed Service Accounts in your Schema but are just missing the container. You also have to tell AD it isn’t up to date so that the adprep utility can be rerun. Be sure you are logged into your Schema Master Domain Controller as an Enterprise Admin and launch the ADSIEdit MMC.

  1. Right click ADSI Edit at the top of the structure on the left, Click Connect… and hit OK as long as the Path is the default naming context.
  2. Drill down the menu structure to CN=DomainUpdates,CN=System,DC=<mydomain>,DC=<mytld>
  3. Within the Operations Container you will need to delete the following containers entirely.
    1. CN=5e1574f6-55df-493e-a671-aaeffca6a100
    2. CN=d262aae8-41f7-48ed-9f35-56bbb677573d
  4. Now go back up a level and right click on the CN=ActiveDirectoryUpdate container and choose Properties
    1. Scroll down until you find the “revision” attribute, click on it and click Edit
    2. Hit the Clear button and then OK

Step 2: Run ADPrep /domainPrep

So now we’ve cleaned out the bad stuff and we just need to run adprep. If you have upgraded to your current level of Active Directory you have probably done this at least once before, but typically it won’t let you run it on your domain once it’s been done; that’s what clearing the revision attribute above did for us. Now we just need to pop in the (probably virtual) installation CD and run the command as outlined below.

  1. Mount the ISO file for your given operating system to your domain controller. You can either do this by putting the ISO on the system, right click, mount or do so through your virtualization platform.
  2. Open up a command line or powershell prompt and navigate to <CDROOT>:\support\adprep
  3. Issue the .\adprep.exe /domainPrep command. If all goes well it should report back “Adprep successfully updated the domain-wide information.”

Now that the process is complete you should be able to refresh or relaunch your Active Directory Users & Computers window and, as long as Advanced Features is enabled under View, see that the Managed Service Accounts container is available right below the root of your domain. You are now good to go!
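
Once the container is back, creating and testing a gMSA from the ActiveDirectory PowerShell module looks something like the sketch below; the account name, DNS name, and group are all hypothetical placeholders:

    Import-Module ActiveDirectory

    # One time per forest: create a KDS root key. Backdating it 10 hours makes it
    # usable immediately in a lab; in production just create it and wait.
    Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))

    # Create the group Managed Service Account (names are placeholders)
    New-ADServiceAccount -Name "svcADFS" -DNSHostName "svcADFS.mydomain.local" `
        -PrincipalsAllowedToRetrieveManagedPassword "ADFS-Servers"

    # On the server that will run the service
    Install-ADServiceAccount -Identity "svcADFS"
    Test-ADServiceAccount -Identity "svcADFS"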

Fixing the SSL Certificate with Project Honolulu

So if you haven’t heard of it yet Microsoft is doing some pretty cool stuff in terms of Local Server management in what they are calling Project Honolulu. The latest version, 1802, was released March 1, 2018, so it is as good a time as any to get off the ground with it if you haven’t yet. If you’ve worked with Server Manager in versions newer than Windows Server 2008 R2 then the web interface should be comfortable enough that you can feel your way around so this post won’t be yet another “cool look at Project Honolulu!” but rather it will help you with a hiccup in getting it up and running well.

I was frankly a bit amazed that this is evidently a web service from Microsoft not built upon IIS. As such your only GUI-based opportunity to get the certificate right is during installation, and even that is done by thumbprint, so it’s still not exactly user-friendly. In this post I’m going to talk about how to find that thumbprint in a manner that copies well (as opposed to opening the certificate) and then how to replace the certificate on an already up-and-running Honolulu installation. Giving props where they are due, this post was heavily inspired by How to Change the Thumbprint of a Certificate in Microsoft Project Honolulu by Charbel Nemnom.

Step 0: Obtain a certificate: A good place to start would be to obtain or import a certificate to the server where you’ve installed Project Honolulu. If you want to do a public one, fine, but more likely you’ll have a certificate authority available to you internally. I’m not going to walk you through this again, my friend Luca Dell’Oca has a good write up on it here. Just do steps 1-3.


Step 1: Shut it down and gather info: Next we need to shut down the Honolulu service. As most of what we’ll be doing here today is going to be in PowerShell, let’s just do this from the CLI as well.
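
On my install the service was named ServerManagementGateway; that name is an assumption worth verifying with Get-Service if yours differs. Something like:

    # Find and stop the Honolulu gateway service
    Get-Service -Name ServerManagementGateway
    Stop-Service -Name ServerManagementGateway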

Now let’s take a look at what’s currently in place. You can do this with the following command; the output should look like the figure to the right. The relevant info we want to take note of here is 1) the port that we’ve got Honolulu listening on and 2) the Application ID attached to the certificate. I’m just going to reuse the one that’s there, but as Charbel points out this is generic and you can just generate a new one with a GUID generator.
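
The netsh SSL binding query pulls all of that in one go; something along these lines:

    # Show all current HTTP.sys SSL certificate bindings (note the port and Application ID)
    netsh http show sslcert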


Finally, in our quest to gather info, let’s find the thumbprint of our newly loaded certificate. You can do this by using the Get-ChildItem command like this:
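
Something along these lines works; the Format-Table portion is just there to make the output easier to read:

    # List certificates in the local machine Personal store with their thumbprints
    Get-ChildItem -Path Cert:\LocalMachine\My | Format-Table Thumbprint, Subject, NotAfter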

As you can see in the second screenshot, that will give you a list of the certificates installed on your server along with their thumbprints. You’ll need the thumbprint of the certificate you imported earlier.

Step 2: Make it happen: OK, now that we’ve got all our information let’s get this thing swapped. All of this seems to need to be done from the legacy command prompt. First, we want to delete the certificate binding currently in place along with its URL ACL. For the example shown above where I’m using port 443 it would look like this:
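
For a binding on 0.0.0.0:443 like mine, the deletes look something like this (the reservation URL is an assumption; check netsh http show urlacl if yours differs):

    # Remove the existing SSL binding and its URL reservation for port 443
    netsh http delete sslcert ipport=0.0.0.0:443
    netsh http delete urlacl url=https://+:443/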

Now we need to put it back into place and start things back up. Using the port number, certificate thumbprint, and appid from our example, the command to re-add the SSL certificate would look like this; you, of course, would need to sub in your own information. Next, we need to put the URL ACL back in place. Finally, we just need to start the service back up from PowerShell.
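
Subbing in placeholders for the thumbprint and appid, a sketch of those three steps looks roughly like this (the account on the urlacl line is an assumption based on a default gateway install; verify it against your original reservation before re-adding):

    # Re-bind the new certificate to port 443 using the thumbprint and existing appid
    netsh http add sslcert ipport=0.0.0.0:443 certhash=<YourNewThumbprint> appid={<ExistingAppId>}

    # Recreate the URL reservation (confirm the account with: netsh http show urlacl)
    netsh http add urlacl url=https://+:443/ user="NT AUTHORITY\NETWORK SERVICE"

    # Start the gateway service back up
    Start-Service -Name ServerManagementGateway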

Conclusion

At this point, you should be getting a shiny green padlock when you go to the site and no more nags about a bad certificate. I hope as this thing progresses out of Tech Preview and into production quality this component gets easier, but at least there’s a way.

Thoughts on Leadership As Told to a 5 Year Old

I am lucky enough to be a father to a wonderful 5-year-old daughter, fresh into her Kindergarten year of school. Recently she came home with the dramatic cry of a 5-year-old, upset that her class has a Leader of the Month award and she didn’t win it. Once the sobbing subsided she got around to asking me how to be a leader, one of those basics of life type questions that all parents know and yet always get thrown by. How do I boil down the essence of leadership to something she not only can understand but can apply herself?

Thanks to the recurring themes of Special Agent Oso I got the idea to try to condense leadership into 3 simple steps. Simplistic I know, but the more I thought about it the more I realized that not only would it get her on the right track but that, to be honest, there are a great number of adults in leadership positions that experience differing levels of success with them. So thanks to my daughter and our good friend Oso I present Jim’s 3 simple steps to being a good leader.

Step 1! Have A Good Attitude

Seriously, there are so many studies/articles on the effect that a leader’s public attitude has on the productivity and efficiency of their team. If those linked articles aren’t enough for you Google it, there are a lot more. I know we all have our days when it all falls apart and have experienced these myself, but when those days stretch into weeks or months that team you are leading or even being a part of is going to start to fall apart as well. Some days it is going to require you to put a good face on what you are internally feeling, but coming in with a bounce in your step and not being negative is a great first step.

One of the realizations that I’ve had recently is that with any position there comes a time when you have to decide to either have some internal reconciliation with the job you have and make peace with the issues you have with it, or decide to make the jump to something new. I’m not saying you shouldn’t try to make the things you dislike better, you absolutely should, but staying in something you actively dislike and have lost the willpower to try to fix not only hurts you, it hurts those around you.

Step 2! Listen

So for a 5-year-old trying to get them to listen to anything that isn’t on Netflix or YouTube is a challenge. Guess what? The same is true for me and you and those around you in the workplace from time to time. As framed to her, listening means making sure you pay attention not only to your teachers when they talk (and you absolutely must do this) but also to those around you that may be having issues. The teacher is talking about how to make the perfect “R” but your neighbor is having a problem? If she’s got it down this is a great opportunity to help!

As this applies to us, how many of us go into meetings or conversations with our co-workers with a preset agenda or outcome already in our mind? Worse still is when we try to find the implied meaning of what is unsaid in a conversation. We as workers and leaders do better when we open our minds and actually listen to the words that are being spoken, then ask for clarification as needed. I know I have to work hard to quit relying on my own biases or guessing what the speaker means simply because I don’t want to ask, “Ok, I didn’t get that, can you expand on this?”

Step 3! Apply What You’ve Learned

As we complete our journey through leadership we have to apply what we’ve learned in the first two steps. We’ve spent the last 20 minutes learning how to make that perfect “R” and now we have to practice in class. My 2.0 may have this down, but she sees her neighbor is trying to get help from the teacher. With a great attitude and an understanding of what’s expected this is her time to shine and help out her classmate!

At work this is important too. We have a great attitude now and we really want to be a helper, getting things done, and we’ve listened to those around us and learned not only about the needs and processes of our organizations but also the needs of our co-workers and staff. With all this ammo we are now in a position to truly make a difference because regardless of who you are helping when the knowledge gets distributed the potential for greater outcomes goes up. For those of us in traditional IT or other “support departments” this is where the true value lies. If we can’t learn, interpret and apply technology to the organization and staff’s needs then we aren’t supporting anyone.

Conclusion

So there you have it. Yes I know it’s simple, yes I know there are other factors (ego, time management, etc) that go into it as well but as in all things, you have to master the basics first. How many of you that have made it to this point were shaking your head and considering interactions with others as you read them? How many read these and had a “doh” thought about something you yourself have done? I know I did in thinking about it and there’s nothing wrong with that as long as you learn from it.

Notes on Migrating from an “All in One” Veeam Backup & Replication Server to a Distributed System

One of the biggest headaches I have, and one I hear about from other Veeam Backup & Replication administrators as well, is backup server migrations. In the past I have always gone the “all-in-one” approach: one beefy physical server with Veeam installed directly on it, housing all the roles. This is great! It runs fast and it’s a fairly simple system to manage, but the problem is that every time you need more space or you’re upgrading an old server you have to migrate all the parts and all the data. With my latest backup repository upgrade I’ve decided to go to a bit more of a distributed architecture, moving the command-and-control part out to a VM with an integrated SQL Server and letting the physical box handle the repository and proxy functions. That produces a best-of-both-worlds setup: the speed and simplicity of all the data moving and VM access happening from the single physical server, while the setup and brains of the operation reside in a movable, upgradable VM.

This post is mostly composed of my notes from the migration of all parts of VBR. The best way to think of this is to split the migration into 3 major parts: repository migration, proxy migration, and VBR server migration. These notes are fairly high level, not going too deep into the individual steps. As migrations are complex, if any of these parts don’t make sense to you or do not provide enough detail I would recommend that you give the fine folks at Veeam support a call to ride along as you perform your migration.

I. Migrating the Repository

  1. Setup 1 or more new repository servers
  2. Add a new repository to your existing VBR server pointing to a separate folder (e.g. D:\ConfigBackups) on the new repository server, to be used exclusively for Configuration Backups. These cannot be included in a SOBR. Change the Config Backup settings (File > Config Backup) to point to the new repository. This is also probably a good time to go ahead and run a manual Config Backup while you are there to snapshot your existing setup.
  3. Create one or more new backup repositories on your new repository server(s) in your existing VBR server configuration.
  4. Create Scale Out Backup Repository (SOBR), adding your existing repository and new repository or repositories as extents.
  5. All of your backup jobs should automatically be changed to point to the SOBR during the setup but check each of your jobs to ensure they are pointing at the SOBR.
  6. If possible go ahead and do a regular run of all jobs or wait until your regularly scheduled run.
  7. After successful run of jobs put the existing extent repository into Maintenance Mode and evacuate backups.
  8. Remove existing repository from the SOBR configuration and then from the Backup Repositories section. At this point no storage of any jobs should actually be flowing through your old server. It is perfectly fine for a SOBR to only contain a single extent from a data locality standpoint.
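
For reference, the repository and SOBR pieces of the steps above can also be scripted with the Veeam PowerShell snap-in. A rough sketch only, with the repository names as placeholders and assuming the snap-in is installed along with the console:

    Add-PSSnapin VeeamPSSnapin

    # Existing and newly added repositories, by name (placeholders)
    $oldRepo = Get-VBRBackupRepository -Name "Old-Local-Repo"
    $newRepo = Get-VBRBackupRepository -Name "New-Repo-01"

    # Build the Scale-Out Backup Repository with both as extents
    Add-VBRScaleOutBackupRepository -Name "SOBR-01" -PolicyType DataLocality -Extent $oldRepo, $newRepo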

II. Migrate the Backup and Guest Interaction Proxies

  1. Go to each of your remaining repositories and set proxy affinity to the new repository server you have created. If you have previously scaled out your backup proxies then you can ignore this step.
  2. Under Backup Proxy in Backup Infrastructure remove the Backup Proxy installation on your existing VBR server.  Again, if possible you may want to run a job at this point to ensure you haven’t broken anything in the process.
  3. Go to each of your backup jobs that are utilizing the Guest Processing features. Ensure the guest interaction proxy at the bottom of the screen is set to either your new repository server, auto or if scaled out another server in your infrastructure.

III. Migrate the Veeam Backup & Replication Server

  1. Disable all backup, Backup Copy and Agent jobs on your old server that have a schedule.
  2. Run a Config Backup on the old server. If you have chosen to encrypt your configuration backup, the process below is going to be a great test to see if you remembered or documented the password. If you don’t know what it is, go ahead and change it under File > Manage Passwords before running this final configuration backup.
  3. Shutdown all the Veeam services on your existing backup server or go ahead and power it down. This ensures you won’t have 2 servers accessing the same components.
  4. If not already done, create your new Veeam Backup and Replication server/VM. Be sure to follow the guidelines on sizing available in the Best Practices Guide.
  5. Install Veeam Backup & Replication, ensuring that you use the same version and update level as your production server. The safest bet is to just have both patched to the latest level of the latest version.
  6. Add a backup repository on your new server pointing to the Config Backup repository folder you created in step 2 of the Migrating the Repository step.
  7. Go to Config Backup and hit the “Restore” button.
  8. As the wizard begins choose the Migrate option.
  9. Change the backup repository to the repository created in step 6, then choose your latest backup file, which should be the same as the one created in step 2 above.
  10. If encrypted, specify your backup password and then choose to overwrite the existing VeeamBackup database you created when you installed Veeam in step 5. The defaults should do this.
  11. Choose any Restore Options you may want. I personally chose to check all 4 of the boxes but each job will have its own requirements.
  12. Click the Finish button to begin the migration. From this point on, if any screens or messages pop up about errors or issues in processing it is a good idea to go ahead and contact support. All this process does is move the database from the old server to the new one, changing any references to the old server along the way. If something goes wrong it is most likely going to have a cascade effect and you are going to want them involved sooner rather than later.

IV. Verification and Cleanup

  1. Now that your server has been migrated it’s a good idea to go through all the tabs in your Backup Infrastructure section, ensuring that all your information looks correct.
  2. Go ahead and run a Config Backup at this point. That’s a nice low-key way to ensure that all of the basic Veeam components are working correctly.
  3. Re-enable your disabled backup, backup copy and Agent jobs. If possible go ahead and run one and ensure that everything is hunky dory there.

Gotchas

This process, when working correctly, is extremely smooth. I’ll be honest and admit that I ran into what I believe is a new bug in the VBR Migration wizard. We had a few SureBackup jobs that had been set up and, while they had been run, had never been modified since install. When this happens VBR records the job_modified field of the job’s configuration database record as NULL. During the migration the wizard left those fields blank in the restored database, which is evidently something that is checked when you start the Veeam Backup Service. While the service appears to be running in the basic services.msc screen, under the hood you are only getting partial functionality. In my case support was able to go in and modify the database and put the NULL data back in the field, but if you think you might have this issue it might be worth changing something minor on all of your jobs before the final configuration backup.

Conclusion

If you’ve made it this far, congrats! You should be good to go. While the process seems daunting it really wasn’t all that bad. If I hadn’t run into an issue it wouldn’t have been bad at all. The good news is that at this point you should be able to scale your backup system much easier without the grip and rip that used to be required.

Cisco Voice Servers Version 11.5 Could Not Load modules.dep

About 6 months ago we updated 3/4 of our Cisco Telephony environment from 8.5 to 11.5. The only reason we didn’t do it all is because UCCX 11.5 wasn’t out yet, so that one went to 11. While there were a few bumps in the road (resizing VMs, some COP files, etc.) the update went well. Unfortunately once it was done we started having a glorious issue where after a reboot the servers sometimes failed to boot, presenting “FATAL: Could not load /lib/modules/2.6.32-573.18.1.el6.x86_64/modules.dep: No such file or directory”. Any way you put it, this sucked.

The first time this happened I called TAC and while they had seen it before, they had no good answer except to rebuild the VM and restore from backup. Finally, after the 3rd time (approximately 3 months after install), the bug had been officially documented and (yay) it included a workaround. The good news is that the underlying issue at this point has been fixed in 11.5(1.11900.5) and forward, so if you are already there, no problems.

The issue lies with the fact that the locked-down build of RHEL 6 that the Cisco Voice server platforms are built on doesn’t handle VMware Tools updates well. It’s all good when you perform a manual update from their CLI and use their “utils vmtools refresh” utility, but many organizations, mine included, choose to make life easier and enable vCenter Update Manager to automatically upgrade the VMware Tools each time a new version is available and the VM is rebooted.

So how do you fix it? While the bug ID has the fix in it, if you aren’t a VMware regular they’ve left out a few steps and it may not be the easiest thing to follow. So here I’m going to run down the entire process and get you (and chances are, myself when this happens in the future) back up and running.

0. Go out to the cisco.com site and download the recovery CD for 11.5. You should be able to find that here, but if not, or if you need a different version, browse through the downloads to Downloads Home > Products > Unified Communications > Call Control > Unified Communications Manager (CallManager) > Unified Communications Manager Version 11.5 > Recovery Software. Once done, upload this to any of the datastores available to the host your failing VM resides on.
1. If you’ve still got the VM running, shut it down by right clicking the VM>Power>Power Off in the vCenter Web UI or the ESXi embedded host client.
2. Now we need to make a couple of modifications to the VM’s settings: 1) attach the downloaded ISO file and check the “Connected at boot” box, and 2) under VM Options > Boot Options choose to “Force BIOS setup” at next boot. By default VMs do not treat attached ISOs as the first boot device. Once both of these are done it’s time to boot the VM (if you’d rather script these settings changes, see the PowerCLI sketch after step 8).
3. I personally like to launch the VMware Remote Console first and then boot from there, that way I’ve already got the screen up. After you power on the BIOS in a VM is the same old Phoenix BIOS we all know and love. Simply tell the VM to boot to CD before hard drive, move to Exit and “Save and Exit” and your VM will reboot directly into the recovery ISO.
4.  Once you get up to the Recovery Disk menu screen as shown to the left we need to get out to a command prompt. To do this hit Alt-F2 and you’ll be presented with a standard bash prompt.
5. So the root cause of all this is that the initramfs file is improperly sized after an automatic upgrade of VMware Tools has been processed. Now that we have our prompt we first need to verify that we are actually seeing the issue we expect. To do this run the command “find / -name initramfs*”. This should produce the full path and filename of the file. To get the size of this file you then need to run an ls -lh against it. In my example the full command would be “ls -lh /mnt/part1/boot/initramfs-2.6.32-573.18.1.el6.x86_64.img”. If you aren’t particularly used to the Linux CLI, once you get past …initr you should be able to hit tab to autocomplete. This should respond by showing you that the file is incorrectly sized at somewhere between 11 and 15 MB.
6. Now we need to perform a chroot on the directory that contains the boot objects. In most cases this should simply be “chroot /mnt/part1”.

7. Finally we need to manually re-run the VMware Tools installer to get the file properly sized. The installer is included locally on the Recovery Disk, so just run the command “/usr/bin/vmware-config-tools.pl -d”. There are various steps throughout the process where it is going to ask for input. Unless you know you have a reason to differ, just hit enter at each one until it completes.

Once the VMware Tools installation is done, arrow up to where you checked the size of the initramfs…img file above and rerun the command. You should now see the file size changed to 24 MB or so.

8. Now we just need to do a little cleanup before we reboot. Make sure you go into Settings for your VM and tell it not to connect the ISO at boot. Once you make that change you should be able to flip back over to your console and simply type “reboot” or “shutdown -r 0” to reboot back to full functionality.
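
If you’d rather script the ISO attach and force-BIOS changes from steps 1-3, a PowerCLI sketch along these lines should do it (the vCenter, VM, and datastore/ISO names are hypothetical placeholders):

    # Connect and grab the failing VM (names are placeholders)
    Connect-VIServer vcenter.mydomain.local
    $vm = Get-VM -Name "CUCM-PUB"

    # Attach the recovery ISO and mark it connected at power on
    New-CDDrive -VM $vm -IsoPath "[datastore1] iso/UCSInstall-Recovery-11.5.iso" -StartConnected

    # Force the VM into the BIOS setup screen on its next boot
    $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $spec.BootOptions = New-Object VMware.Vim.VirtualMachineBootOptions
    $spec.BootOptions.EnterBIOSSetup = $true
    $vm.ExtensionData.ReconfigVM($spec)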

 

Windows Server Deduplication, Veeam Repositories, and You!

Backup, among other things, is very good at creating multiple copies of giant buckets of data that don’t change much and tend to sit for long periods of time. Since we are in modern times, we have a number of technologies to deal with this problem, one of which is deduplication, and there are quite a few implementations of it. Microsoft has had a server-based storage version since Windows Server 2012 that has gotten better with each release, but like any technology it still has its pitfalls to be mindful of. In this post I’m going to look at a very specific use case of Windows Server deduplication, using it as the storage beneath your Veeam Backup and Replication repositories, and cover some basic tips to keep your data healthy and your performance optimized.

What is Deduplication Anyway?

For those that don’t work with it much, imagine you had a copy of War and Peace stored as a Word document with an approximate file size of 1 MB. Each day for 30 days you go into the document, change 100 KB worth of the text, and save it as a new file on the same volume. With a basic file system like NTFS this would result in you having 31 MB tied up in the storage of these files: the original and then the full file size of each additional copy.

Now let’s look at the same scenario on a volume with deduplication enabled. The basic idea of deduplication is to replace identical blocks of data with very small pointers back to a common copy of the data. In this case, after 30 days, instead of having 31 MB of data sitting on disk you would have approximately 4 MB: the original 1 MB plus just the 100 KB of daily changes (about 3 MB over the 30 days). As far as the user experience goes, the user just sees the 31 files they expect to see and they open like they normally would.

So that’s great when you are talking about a 1 MB file, but what about file storage in the virtualization world, where we are talking about terabytes of data with multi-gigabyte changes daily? If you think about the basic layout of a computer’s disk it is very similar to our working copy of War and Peace: a base system that rarely changes, things we add that then sit forever, and a comparatively few things we change throughout the course of our day. This is why deduplication works great for virtual machine disk files and backup files, as long as you set it up correctly and maintain it.

Jim’s Basic Rules of Windows Server Deduplication for Backup Repositories

I have repeated these a few times as I’ve honed them over the years. If you feel like you’ve read or heard this before, it’s been part of my VeeamON presentations in both 2014 and 2015 as well as part of blog posts both here and on 4sysops.com. In any case here are the basics on the care and feeding of your deduplicated repositories.

  1. Format the Volume Correctly. Doing large-scale deduplication is not something that should be done without getting it right from the start. Because when we talk about backup files, or virtual disks in general for that matter, we are talking about large files, we always want to format the volume through the command line so we can put some modifiers in there. The two attributes we really want to look at are /L and /A:64K. The /L is an NTFS-only attribute which overrides the default (small) size of the file record. The /A controls the allocation unit size, setting the block size. So for a given partition R: your format string may look like the example shown after this list.
  2. Control File Size As Best You Can. Windows Server 2012 R2 deduplication came with a pretty stringent recommendation for maximum file size when using deduplication: 1 TB. With traditional backup files, blowing past that is extremely easy to do when you have all of your VMDKs rolled into a single backup file, even after compression. While I have violated that recommendation in the past without issue, I’ve also heard many horror stories of people who found themselves with corrupted data because of it. Your best bet is to be sure to enable per-VM backup chains on your backup repository (Backup Infrastructure > Backup Repositories > [REPONAME] > Repository > Advanced).
  3. Schedule and Verify Weekly Defragmentation. While Windows schedules weekly defragmentation jobs on all volumes by default these days, the one and only time I came close to getting burnt by using dedupe was when said job was silently failing every week and the fragmentation became too much. I found out because my backup job began failing due to a corrupted backup chain, but after a few passes of defragmenting the drive it was able to continue without error and test restores all worked correctly. For this reason I do recommend having the weekly job, but make sure that it is actually happening.
  4. Enable Storage-Level Corruption Guard. Now that all of these things are done we should be good, but a system left untested can never be relied upon. With Veeam Backup & Replication v9 we now have the added ability on our backup jobs to do periodic backup corruption checks. When you are doing anything even remotely risky like this it doesn’t hurt to make sure this is turned on and working. To enable this go to the Maintenance tab of the Advanced Storage settings of your job and check the top box. If you have a shorter retention time frame you may want to consider setting this to weekly.
  5. Modify Deduplication Schedule To Allow for Synthetic Operations. Finally, the last recommendation has to do more with performance than with integrity of data. If you are going to be doing weekly synthetic fulls, I’ve found performance is greatly decreased if you leave the default minimum file age before deduplication setting (3 or 5 days depending on the version of Windows) in place. This is because in order to do the synthetic operation it has to rehydrate each of the deduplicated files first. Instead, set the minimum file age for deduplication to 8 days so the files are already done with synthetic processing before they are deduplicated. For more information on how to enable deduplication as well as how to modify this setting see my blog over on 4sysops.com.
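
As a companion to items 1 and 5 above, here is a rough sketch for a hypothetical R: volume; the drive letter and label are placeholders, and the -UsageType parameter assumes Server 2016 or later (omit it on 2012 R2):

    # Item 1: format the repository volume with large file records and 64K clusters
    format R: /FS:NTFS /L /A:64K /Q /V:VeeamRepo01

    # Item 5: enable deduplication and push the minimum file age past the weekly synthetic fulls
    Enable-DedupVolume -Volume "R:" -UsageType Default
    Set-DedupVolume -Volume "R:" -MinimumFileAgeDays 8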

Well with that you now know all I know about deduplicating VBR repositories with Windows Server. Although there is currently a bug in the wild with Server 2016 deduplication, with a fix available, the latest version of Windows Server shows a lot of promise in its storage deduplication abilities. Among other things it pushes the file size limit up and does quite a bit to increase performance and stability.

Fixing Domain Controller Boot in Veeam SureBackup Labs

We’ve been dealing with an issue for the past few runs of our monthly SureBackup jobs where the Domain Controller boots into Safe Mode and stays there. This is no good because without the DC booting normally you have no DNS, no Global Catalog or any of the other Domain Controller goodness for the rest of your servers launching behind it in the lab. All of this seems to have come from a change in how domain controller recovery is done in Veeam Backup and Replication 9.0 Update 2, as discussed in a post on the Veeam Forums. Further, I can verify that if you call Veeam Support you get the same answer as outlined here, but there is no public KB about the issue. There are a couple of ways to deal with this, either each time or permanently, and I’ll outline both in this post.

The booting into Safe Mode is totally expected, as a recovered Domain Controller object should boot into Directory Services Restore mode the first time. What is missing though is that as long as you have the Domain Controller box checked for the VM in your application group setup then once booted Veeam should modify the boot setup and reboot the system before presenting it to you as a successful launch. This in part explains why when you check the Domain Controller box it lengthens the boot time allowed from 600 seconds to 1800 seconds by default.

On the Fly Fix

If you are like me and already have the lab up and need to get it fixed without tearing it back down, you simply need to clear the Safe Boot bit and reboot from the Remote Console. I prefer to do it through the GUI:

  1. Make a Remote Console connection to the  lab booted VM and login
  2. Go to Start, Run and type “msconfig”
  3. Click on the Boot tab and uncheck the “Safe boot” box. You may notice that the Active Directory repair option is selected
  4. Hit Ok and select to Restart

Alternatively, if you are command-line inclined, a method is available via Veeam KB article 1277 where you just run these commands:
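
From memory those commands boil down to clearing the safeboot value from the boot configuration and restarting, something like:

    # Clear the Safe Mode boot flag and restart
    bcdedit /deletevalue safeboot
    shutdown -r -t 00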

After running those, the VM will reboot itself into normal operation. Just to be clear, either of these fixes is temporary. If you tear down the lab and start it back up from the same point in time you will experience the same issue.

The Permanent Fix

The problem with either of the above methods is that while they will get you going on a lab that is already running, about 50% of the time I find that once I have my DC up and running well I have to reboot all the other VMs in the lab to fix dependency issues. By the time I’m done with that I could have just relaunched the whole thing. To permanently fix the root issue you can revert the way DCs are handled by creating a single registry entry, as shown below, on the production copy of each Domain Controller you run in the lab.

Once you have this key in place on your production VM you won’t have any issues with it going forward as long as the labs you launch are from backups made after that change is put in use. My understanding is this is a known issue and will eventually be fixed but at least as of 9.5 RTM it is not.