6 Common Hyper-V Configuration Mistakes to Avoid

HYPER-V A BRIEF HISTORY LESSON

The purpose of this short eBook is to showcase what you should not do when setting up your Hyper-V farms. We have travelled around the world and consulted with customers small and large on Hyper-V since before it was even called Hyper-V. The early iterations of Hyper-V came from Microsoft's acquisition of a company called Connectix. Many people don't remember when this happened, as it was all the way back in February of 2003. The most important part of this purchase by Microsoft, in my opinion, was the addition of some key resources for their Virtualization team, most notably Jeff Woolsey (@WSV_GUY) and Ben Armstrong, the one and only @VirtualPCGuy.

One of Jeff and Ben's first projects at Microsoft was to convert the Connectix branded Virtualization product into what we came to know as Microsoft Virtual PC and Microsoft Virtual Server. After the purchase, it took a few years for the first Microsoft branded version of this to come out. In fact, it was on July 12, 2006, that Microsoft released Virtual PC 2004 SP1 for Windows free of charge.

The next iteration, Virtual PC 2007, was released only for the Windows platform, with public beta testing beginning October 11, 2006, and production release on February 19, 2007. This version added support for hardware virtualization, viewing virtual machines on multiple monitors, and support for Windows Vista as both host and guest.

On May 15, 2008, Microsoft released Virtual PC 2007 Service Pack 1, which added support for both Windows XP SP3 and Windows Vista SP1 as guest and host OSes, as well as Windows Server 2008 Standard as a guest OS.

We are not sure if you remember these times, but as Microsoft Certified Trainers back in the day both Cristal and Dave used Virtual PC on nearly a daily basis while delivering courses to students around the world. It is also worth noting that both Jeff and Ben still work at Microsoft today and are good friends.

Though we have been longtime fans of the product, it was not painless in the beginning. The problem with Virtual PC was that it was a standalone workstation product. At this time, VMware was coming out with Version 1.0 of the first enterprise Hypervisor and was instantly dominating this space. Microsoft's answer to this was the introduction of Microsoft Virtual Server 2005. Microsoft Virtual Server was a virtualization solution that facilitated the creation of virtual machines on the Windows XP, Windows Vista and Windows Server 2003 operating systems.

Virtual machines were created and managed through a Web-based interface that relied on Internet Information Services (IIS) or through a Windows client application tool called VMRCplus.

The last version using this was named Microsoft Virtual Server 2005 R2 SP1. New features in R2 SP1 included Linux guest operating system support, Virtual Disk Precompactor, SMP (but not for the guest OS), x64 host operating system support, the ability to mount virtual hard drives on the host machine and additional operating systems support, including Windows Vista. It also provided a Volume Shadow Copy writer that enabled live backups of the Guest OS on a Windows Server 2003 or Windows Server 2008 host. A utility to mount VHD images had also been included in SP1. Virtual Machine Additions for Linux were available as a free download. Officially supported Linux guest operating systems included Red Hat Enterprise Linux versions 2.1-5.0, Red Hat Linux 9.0, SUSE Linux and SUSE Linux Enterprise Server versions 9 and 10.

Now let's fast forward to 2008 when Microsoft first released a brand-new virtualization platform called Hyper-V. Microsoft Hyper-V, codenamed Viridian and formerly known as Windows Server Virtualization, is a native hypervisor. The best part of Hyper-V is that Microsoft built it into the Kernel, so it could simply be enabled by turning on a Role. Starting with Windows 8, Hyper-V was also included in the desktop class operating systems and has now completely replaced Windows Virtual PC.

Note: Did you know that the Hyper-V core binaries and code are the same on Windows 10 and Windows Server 2016? The same is true of Windows 8.1 and Windows Server 2012 R2. This is what allows features like Live Migration to work from a desktop to a server.

Microsoft has gone from Virtual Server 2005 to what you now know as Azure, a cloud-based hyper-scale platform with billions of dollars in revenue and billions reinvested back into the virtualization stack. With all this, Microsoft has given us what we now have to work with inside of Windows Server 2016's Hyper-V. It is pretty amazing to think just how far the product has come.

Can you agree that Microsoft has come a long way with their virtualization platform? We think so, and now that is enough with the history lesson and on with the show.

Why are you reading this eBook about common Hyper-V configuration mistakes? The answer is so pure and simple and can be summed up like this: There are a lot of books and eBooks out there showing you how to configure, implement, and maintain Hyper-V. We found that there was not a lot of documentation showcasing what not to do. Now, understand we love Hyper-V and these tips and tricks are merely being written up to help you avoid some painful and avoidable mistakes. So, ready to learn? Let's go.

SAMPLE FILES

All sample files for this book can be downloaded from http://www.checkyourlogs.net and at http://www.github.com/dkawula

ADDITIONAL RESOURCES

In addition to all tips and tricks provided in this book, you can find extra resources like articles and video recordings on our blog at http://www.checkyourlogs.net and at the Altaro Hyper-V Hub.

COMMON HYPER-V HOST DEPLOYMENT MISTAKES

Enabling Hyper-V as part of Windows Server 2016 is easy. You just click on the Hyper-V Role, reboot twice, and you have a machine that is capable of running Hyper-V. You can spin up VMs and you are off to the races.

Yes, the above statement is 100 percent true yet there is a HUGE difference between running Hyper-V on a Windows 10 Laptop and a Production Server. By huge, I mean that on a production Hyper-V Host Server you should never install any third-party software, applications, additional drivers, agents, or anything that is not 100 percent supported by Microsoft. On a Laptop / Development environment you are ok to do what you want.

WHAT NOT TO INSTALL ON YOUR HYPER-V HOST SERVER?

I think the most important place to start this conversation about what not to install on your Hyper-V Host server is to look at what is actually allowed by Microsoft. There are licensing implications if you do not follow their rules. Microsoft gives you licensing grants to run instances of Virtual Machines on your Host Servers, and in Windows Server 2016 Standard Edition you can only run two additional Virtual Machines. On those Virtual Machines you can do whatever you like, but on the Host Server there are limitations per the licensing agreements with Microsoft.

For example, if you install anything on the parent partition that is not used in the management and operation of Hyper-V, such as Active Directory, DNS, or DHCP, you will consume one of the licenses that you have been granted for your server.

Note: Although you can do this, we do not recommend installing any unnecessary software on your Hyper-V Host Server. Not only can it cause licensing ramifications for you, it can also cause issues with Hyper-V. The Active Directory Role, when installed, disables write-back caching, which can greatly reduce the overall performance of your Hyper-V Host. There are a lot of other reasons, such as Domain Security, no support from Microsoft, system hardening, backup and recovery issues, and more. These can be found in detail here: http://www.altaro.com/hyper-v/reasons-not-to-make-hyper-v-a-domain-controller/

We have always found that if you leave the Hyper-V Host alone and just run the core and critical agents and software, they work great and have no issues. If you are going to install anything on your Hyper-V Host Servers make sure it is supported and that you thoroughly test it prior to going into production.

Note: Here is my obligatory statement when talking about Microsoft and Licensing.

Software licensing is a legal matter, not a technical concern. The author and presenter of this work is not a lawyer and no lawyers were consulted when writing this work. Its contents are intended to be a guideline to aid in comprehension of the concepts of the specific licensing detail. It does not constitute legal advice or interpretation. Neither the author and presenter nor Altaro Software, Ltd. are offering legal advice and this work cannot be construed as such. We cannot be held responsible for any negative outcomes of the usage of any of the contents, whether it is through an error on our part or a misunderstanding on yours. For official answers, contact Microsoft Licensing or check with your reseller. Authorized resellers should have someone on staff that can authoritatively answer licensing questions. There are very steep fines and bounties associated with licensing violations. It is worth your time to get official answers.

The most detailed publicly-available material that Microsoft publishes on the subject is the Product Use Rights document. The clearest publicly-available material that Microsoft publishes is the Windows Server 2016 Licensing Datasheet. The license agreement that is included with your software is the only legally binding document. If your compliance is ever tested in court, the licensing agreement is the only applicable document.

DON'T LEAVE YOUR HYPER-V HOST IN A GENERAL SERVER OR WORKSTATION OU

Leaving your Hyper-V Host Servers in a general Server OU is a very bad move. We have seen customer environments where Hyper-V Host Servers randomly reboot because they are being tagged by a workstation patching policy. Upon investigation, it was noted that the Active Directory Computer objects for the Hyper-V Host Servers were actually in the workstations OU.

You should design and configure dedicated Group Policy Settings for your Hyper-V Hosts that meet the SOE (Standard Operating Environment) standards for your datacenter. This can be different from customer to customer. If you don't have an SOE for your organization, that is fine, just make sure to exclude your Hyper-V Hosts from general policies not designed for Servers.
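As a rough illustration of auditing OU placement, the sketch below checks a set of computer distinguished names and flags any host whose nearest OU is not the dedicated Hyper-V OU. The host names, the OU name "Hyper-V Hosts", and the domain components are all hypothetical; in practice the distinguished names would come from an Active Directory export.

```python
# Hypothetical distinguished names; real values would come from an AD export.
HOSTS = {
    "HV01": "CN=HV01,OU=Hyper-V Hosts,OU=Servers,DC=corp,DC=example",
    "HV02": "CN=HV02,OU=Workstations,DC=corp,DC=example",
}


def ou_path(distinguished_name: str) -> list:
    """Return the OU names in a DN, nearest first.

    Naive split on commas: assumes no escaped commas inside the DN.
    """
    parts = [p.strip() for p in distinguished_name.split(",")]
    return [p[3:] for p in parts if p.upper().startswith("OU=")]


def misplaced_hosts(hosts: dict, expected_ou: str) -> list:
    """Names of hosts whose nearest OU is not the dedicated Hyper-V OU."""
    flagged = []
    for name, dn in hosts.items():
        ous = ou_path(dn)
        if not ous or ous[0].lower() != expected_ou.lower():
            flagged.append(name)
    return sorted(flagged)


print(misplaced_hosts(HOSTS, "Hyper-V Hosts"))  # ['HV02']
```

A check like this only reports placement; it does not tell you which Group Policy Objects actually apply to the host.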

DON'T OPERATE HYPER-V INSECURELY

With the proliferation of Malware and Ransomware around the world we thought it would be prudent to discuss some of the best security recommendations. These are seen in the list below:

Note: For more information on securing Hyper-V, Microsoft has a great article on TechNet titled Planning for Hyper-V Security: https://technet.microsoft.com/en-us/library/cc974516.aspx

SECURE THE HOST OS

  • Minimize the attack surface by using the minimum Windows Server installation option that you need for the management operating system. We don't recommend that you run production workloads on Hyper-V on Windows 10.
  • Keep the Hyper-V host operating system, firmware, and device drivers up to date with the latest security updates. Check your vendor's recommendations to update firmware and drivers.
  • Don't use the Hyper-V host as a workstation or install any unnecessary software.
  • Remotely manage the Hyper-V host. If you must manage the Hyper-V host locally, use Credential Guard.
  • Enable code integrity policies. Use virtualization-based security protected Code Integrity services.

SECURE YOUR NETWORKS

  • Use a separate network with a dedicated network adapter for the physical Hyper-V computer.
  • Use a private or secure network to access VM configurations and virtual hard disk files.
  • Use a private/dedicated network for your live migration traffic. Consider enabling IPSec on this network to encrypt and secure your VM data going over the network during migration.

USE SECURE STORAGE MIGRATION NETWORKS

  • Use SMB 3.0 for end-to-end encryption of SMB data and protection against tampering or eavesdropping on untrusted networks. Use a private network to access the SMB share contents to prevent man-in-the-middle attacks.
  • Configure hosts to be part of a guarded fabric.
  • Use Shielded Virtual Machines and configure a guarded fabric for critical infrastructure and VMs. Use hardware with a Trusted Platform Module (TPM) 2.0 chip to set up a guarded fabric.

THINK ABOUT DEVICE SECURITY

  • Secure the storage devices where you keep virtual machine resource files.

USE DRIVE ENCRYPTION

  • Use BitLocker Drive Encryption to protect resources.

LOCK DOWN AND HARDEN THE HOST OS

  • Use the baseline security setting recommendations described in the Windows Server Security Baseline.

USE A MOST RESTRICTIVE PERMISSION MODEL

  • Add users that need to manage the Hyper-V host to the Hyper-V administrators group.
  • Don't grant virtual machine administrators permissions on the Hyper-V host operating system.
  • Use the Just Enough Administration (JEA) helper tool

DON'T MOUNT UNKNOWN VHDS

  • This can expose the host to file system level attacks.

SECURITY INSIDE THE GUEST VIRTUAL MACHINES

Our guest virtual machines are really the most critical asset of any virtual infrastructure. We need to protect them just as much as the Host. Here is a quick list of some things you should consider locking down on your guests.

SECURE THE GUEST VMS

  • Create generation 2 VMs for supported guest operating systems. You should consider enabling Secure Boot as a core feature set of a generation 2 VM.

PATCH PATCH PATCH

  • Install the latest security updates before you turn on a virtual machine in a production environment.
  • Install integration services for the supported guest operating systems that need it and keep it up to date. Integration service updates for guests that run supported Windows versions are available through Windows Update.
  • Harden the operating system that runs in each virtual machine based on the role it performs. Use the baseline security setting recommendations that are described in the Windows Security Baseline.

SECURE THE GUEST VIRTUAL NETWORKS

  • Make sure virtual network adapters connect to the correct virtual switch and have the appropriate security settings and limits applied. Software Defined Networking (SDN) and VLAN isolation must also be taken into consideration.

SECURE THE VIRTUAL DEVICES ATTACHED

  • Configure only required devices for a virtual machine. Don't enable discrete device assignment in your production environment unless you need it for a specific scenario. If you do enable it, make sure to only expose devices from trusted vendors.

CONFIGURE ANTIVIRUS, FIREWALL, AND INTRUSION DETECTION SOFTWARE

  • Having a good understanding of when, how, and why an attack happens is critical. Consider monitoring your critical Hyper-V Guests using Microsoft Security Center and Operations Management Suite. Only install agents within virtual machines as appropriate, based on the virtual machine role.

CHAPTER SUMMARY

In conclusion, it is imperative that you gain a solid understanding of your Microsoft Licensing in your datacenter. With this in mind, you should only run management software (for example, antivirus software, backup software, or virtual machine management software) on the Parent Partition of your Hyper-V Host. You should not deploy other server-based applications (for example, Exchange, SQL Server, Active Directory, or SAP). The Parent Partition should be reserved for running guest virtual machines only. You should also pay close attention to the security configuration of your environment and follow best practice guidelines to lock down and standardize your Hyper-V environment.

DON'T SPAN NUMA NODES

Are you aware of something called non-uniform memory access (NUMA)? Many people in the industry have a very difficult time understanding NUMA and its use cases. What is NUMA and why should we care as virtualization admins? More importantly, what happens if we misconfigure NUMA settings in Hyper-V?

In this 2nd chapter, we will dig into some of the basics with NUMA and how to properly configure Hyper-V for a NUMA Aware workload such as SQL Server.

WHAT EXACTLY IS NUMA ANYWAYS?

I like the definition that is given from our good friends at Wikipedia:

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors). The benefits of NUMA are limited to particular workloads, notably on servers where the data is often associated strongly with certain tasks or users.

Modern CPUs operate considerably faster than the main memory they use. In the early days of computing and data processing, the CPU generally ran slower than its own memory. The performance lines of processors and memory crossed in the 1960s with the advent of the first supercomputers. Since then, CPUs increasingly have found themselves "starved for data" and having to stall while waiting for data to arrive from memory. Many supercomputer designs of the 1980s and 1990s focused on providing high-speed memory access as opposed to faster processors, allowing the computers to work on large data sets at speeds other systems could not approach.

Limiting the number of memory accesses provided the key to extracting high performance from a modern computer. For commodity processors, this meant installing an ever-increasing amount of high-speed cache memory and using increasingly sophisticated algorithms to avoid cache misses. But the dramatic increase in size of the operating systems and of the applications run on them has generally overwhelmed these cache-processing improvements. Multi-processor systems without NUMA make the problem considerably worse. Now a system can starve several processors at the same time, notably because only one processor can access the computer's memory at a time.

NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems involving spread data (common for servers and similar applications), NUMA can improve the performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks). Another approach to addressing this problem, used mainly in non-NUMA systems, is the multi-channel memory architecture, in which a linear increase in the number of memory channels increases the memory access concurrency linearly.

The traditional model for multiprocessor support is symmetric multiprocessor (SMP). In this model, each processor has equal access to memory and I/O. As more processors are added, the processor bus becomes a limitation for system performance.

Developers use NUMA to increase processor speed without increasing the load on the processor bus. The architecture is non-uniform because each processor is close to some parts of memory and farther from other parts of memory. The processor quickly gains access to the memory it is close to, while it can take longer to gain access to memory that is farther away.

In a NUMA system, CPUs are arranged in smaller systems called nodes. Each node has its own processors and memory, and is connected to the larger system through a cache-coherent interconnect bus.

The system attempts to improve performance by scheduling threads on processors that are in the same node as the memory being used. It attempts to satisfy memory-allocation requests from within the node, but will allocate memory from other nodes if necessary. It also provides an API to make the topology of the system available to applications.

In layman's terms, NUMA bundles together Processors and Memory into groups called Nodes. Memory access from a Node can be Local or Remote. Local means that a processor uses its own memory within its Node. Remote means that it spans to another group of Processor(s) with their own memory.

These NUMA Nodes can then be allocated to applications designed to take advantage of them (NUMA Aware). One such application that we are all familiar with is Microsoft SQL Server. Incorrectly configuring NUMA settings in Hyper-V can have a dire impact on performance.

HYPER-V AND NUMA

Starting with Windows Server 2012, Microsoft introduced support for Virtual NUMA architecture into Hyper-V Virtual Machines. Being able to leverage NUMA inside of virtual guests greatly improved the performance of large workloads such as Microsoft SQL Server, which is NUMA aware.

Note: To avoid remote access penalties, a NUMA-aware application attempts to allocate storage for data and schedule processor threads to access that data in the same NUMA node. These optimizations minimize memory access latencies and reduce memory interconnect traffic.

VIRTUAL NUMA

A Hyper-V Host server can match the physical topology of NUMA nodes with the virtual topology that is exposed to the guest virtual machines. This is incredibly important because the applications using NUMA see massive performance gains when these topologies match.

NUMA SPANNING

The host settings for NUMA in Hyper-V are global and affect all Virtual Machines running on that Hyper-V Host Server. Spanning NUMA Nodes allows Hyper-V administrators to gain greater density on host servers. This can adversely impact performance and is not recommended for NUMA Aware applications such as Microsoft SQL Server.

Figure 1 - NUMA Spanning enabled by default in Hyper-V Settings

NUMA spanning is enabled by default on all Hyper-V Host Servers so that when memory is constrained Virtual Machines will still be allowed to start, restore, and/or be migrated. If NUMA spanning is disabled, you can run into issues where a Virtual Machine is not allowed to start because it is out of resources inside of its NUMA node. To maximize usability of the hypervisor both Microsoft and VMware leave NUMA Spanning enabled by default.
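The trade-off can be pictured with a deliberately simplified model, which is an illustration rather than how Hyper-V actually schedules memory. It assumes a VM small enough to be presented as a single virtual NUMA node, and the free-memory figures are hypothetical. With spanning enabled the VM may draw on free memory from every node; with spanning disabled one node must hold the whole VM.

```python
def can_start(vm_memory_mb: int, free_mb_per_node: list, spanning_enabled: bool) -> bool:
    """Simplified start check for a VM that fits in one virtual NUMA node.

    Spanning enabled: free memory from all nodes may be combined.
    Spanning disabled: a single node must satisfy the whole request.
    """
    if spanning_enabled:
        return sum(free_mb_per_node) >= vm_memory_mb
    return max(free_mb_per_node, default=0) >= vm_memory_mb


free = [20000, 30000]  # hypothetical free memory per NUMA node, in MB

print(can_start(40000, free, spanning_enabled=True))   # True
print(can_start(40000, free, spanning_enabled=False))  # False
```

The second call is the situation described above: the host has enough memory in total, but no single node can satisfy the VM.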

If you have a NUMA-aware workload such as SQL Server, you need to carefully consider where you place it in your Hyper-V environment. We typically like to have dedicated Hyper-V Hosts for our database tier, so that we can control settings such as NUMA Spanning without impacting other workloads that are not NUMA aware and don't really care if memory is shared between NUMA Nodes.

CALCULATING NUMA FOR VM PLACEMENT AND SIZING

How is NUMA calculated in Hyper-V? This is a common question that we often get asked when presenting on virtualization at conferences around the world. There are two calculations for determining this, one for memory and one for virtual processors. The memory calculation is:

(Total Memory / NUMA Nodes) - Host Reserve = Maximum Memory per NUMA Node

Figure 2 - NUMA Configuration on a Hyper-V Virtual Machine

To determine the optimal configuration based on your Hyper-V host's hardware, you can click on the Use Hardware Topology button. After clicking on Use Hardware Topology, you will get a calculation of the maximum amount of Memory that you should configure on a Virtual Machine before it spans over to another NUMA Node. If you have disabled NUMA Spanning, this also becomes the maximum amount of Memory that you can assign to a Microsoft SQL Server VM.

Here is what that calculation looks like to determine the optimal amount of Memory on this host.

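For example, 192 GB of RAM (196608 MB) / 2 NUMA Nodes = 98304 MB per node before the host reserve. On this host, Hyper-V reported a maximum of 95304 MB per node, which is 3000 MB below that raw figure, so the reported value can differ slightly from a plain 2 GB subtraction.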
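The memory formula can be checked with a few lines of arithmetic. This is a minimal sketch: the function name is made up for illustration, and the reserve values are inputs rather than anything Hyper-V exposes. Note that a reserve of exactly 2 GB (2048 MB) yields 96256 MB, while a reserve of 3000 MB reproduces the 95304 MB figure quoted for this host, so treat the value Hyper-V reports as the authoritative one.

```python
def max_memory_per_node_mb(total_gb: int, numa_nodes: int, host_reserve_mb: int) -> int:
    """(Total Memory / NUMA Nodes) - Host Reserve, all in MB."""
    total_mb = total_gb * 1024
    return total_mb // numa_nodes - host_reserve_mb


print(max_memory_per_node_mb(192, 2, 2048))  # 96256
print(max_memory_per_node_mb(192, 2, 3000))  # 95304
```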

Now let's look at the same calculation to figure out the Maximum number of Virtual Processors we can assign before NUMA becomes misaligned and we start spanning across NUMA Nodes.

The first thing we need to check is the number of Cores available via Task Manager.

Figure 3 - Showing NUMA Configuration in Task Manager

If you right click on the CPU panel you can change the graph to NUMA Nodes. This view is important for us because it shows how the Logical Processors will be carved out with NUMA.

Figure 4 - Viewing NUMA Nodes in Task Manager

In our case, we have Intel E5-2620 v3 Processors and these will give us a total of 24 Logical Processors.

The Calculation to determine how many Virtual Processors we can have in a single NUMA Node looks like this:

Number of Logical Processors / Number of NUMA Nodes = Maximum Number of Virtual Processors per NUMA Node
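The per-node virtual processor limit is the host's logical processor count divided by its NUMA node count. The sketch below works through the numbers for this host. The breakdown into 2 sockets, 6 cores per socket, and 2 threads per core is an assumption chosen to match the 24 Logical Processors mentioned above, and the function names are illustrative only.

```python
def logical_processors(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Total logical processors the host exposes."""
    return sockets * cores_per_socket * threads_per_core


def max_vcpus_per_numa_node(logical_procs: int, numa_nodes: int) -> int:
    """Logical processors divided evenly across NUMA nodes."""
    return logical_procs // numa_nodes


# Assumed topology: 2 sockets x 6 cores x 2 threads = 24 logical processors.
lps = logical_processors(sockets=2, cores_per_socket=6, threads_per_core=2)
print(lps)                              # 24
print(max_vcpus_per_numa_node(lps, 2))  # 12
```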

This Hyper-V Host Server that we are using for the above examples can be configured based on this calculation. If you look at the screen shot below you can see that we have allocated 12 Virtual Processors to a Virtual Machine.

Figure 5 - Configuring Hyper-V Virtual Processors

Look at the same configuration with 13 Virtual Processors. Immediately Hyper-V Manager is warning us that we are going to be spanning NUMA. Or is it?

Figure 6 - Crossing NUMA Nodes with an incorrect Virtual CPU Configuration

Well, not quite. It is only telling us that we are using 2 Sockets, which means we are using 2 x NUMA Nodes for this configuration. As you have learned from the earlier descriptions of NUMA, this configuration could immediately impact our performance and is likely not a good idea.

This Hyper-V Host should have a Maximum VM Configuration of no more than 12 Virtual CPUs and 95304 MB of RAM. Remember that you can determine this by clicking the Use Hardware Topology button in the NUMA Settings.

Figure 7 - Use Hardware Topology button in NUMA Settings
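Once you know the per-node limits, a planned VM size can be checked against them before it is deployed. The sketch below uses the limits quoted for this host (12 Virtual CPUs and 95304 MB); the class and function names are hypothetical, and the check is a planning aid rather than a substitute for what Hyper-V Manager reports.

```python
from dataclasses import dataclass


@dataclass
class NodeLimits:
    max_vcpus: int
    max_memory_mb: int


def spanning_reasons(vcpus: int, memory_mb: int, limits: NodeLimits) -> list:
    """Return the reasons a VM of this size would not fit in one NUMA node."""
    reasons = []
    if vcpus > limits.max_vcpus:
        reasons.append(f"{vcpus} vCPUs exceeds {limits.max_vcpus} per node")
    if memory_mb > limits.max_memory_mb:
        reasons.append(f"{memory_mb} MB exceeds {limits.max_memory_mb} MB per node")
    return reasons


host = NodeLimits(max_vcpus=12, max_memory_mb=95304)

print(spanning_reasons(12, 65536, host))  # []
print(spanning_reasons(13, 65536, host))  # ['13 vCPUs exceeds 12 per node']
```

An empty list means the VM fits inside a single node on both dimensions.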

CHAPTER SUMMARY

In conclusion, it is imperative that you properly plan NUMA configurations for your workloads. You first need to determine if you will be disabling NUMA Spanning. This setting can prevent a potential misconfiguration of the Virtual Machine running Microsoft SQL Server. However, it can also result in an inability to Start, Restore, or Migrate a Virtual Machine if you don't plan correctly. Second, you should watch how you are configuring your Virtual Machines to ensure that you are not Spanning NUMA Nodes. This gives your workloads the best chance of performing well, regardless of whether they are NUMA Aware or not.

DON'T MISCONFIGURE YOUR ANTI-VIRUS

It was as recently as last week that we had an emergency phone call from a client. Here is a quote from the client: "Exchange is down and the Hyper-V Management Console is locked up. When I check the CPU, it is pinned at 100% and I can't do anything". This can be quite a concerning call for the customer to make and for the consultant dealing with the situation. In troubleshooting issues like this we normally lean to one thing at the very top of our list. Do you know what that is?

It is improperly configured Anti-Virus software that is installed on the Host Servers and in the Virtual Machines running on those hosts. This is one of the major things not to do with Hyper-V that we look at in this book.

TYPES OF I/O STORMS

If you are not familiar with an I/O Storm, it is probably enemy number 1 for Virtualization Admins. This can crush a farm of servers and instantly produce an unplanned outage. If you are new to this term, let's dig a bit into what exactly an I/O Storm is and what can cause it.

BOOT I/O STORMS

A Boot Storm occurs when you have an influx of I/O on your storage from Virtual Machines starting at the same time. A typical Virtual Machine will generate anywhere between 200-400 IOPS during its initial boot-up and then level out when it is idle. If you do some simple math and you have 20 virtual machines on a host server, you could generate between 4000 and 8000 IOPS just by turning on your VMs. Now what happens if your storage can't keep up? Things don't turn off, they just get really slow because the Disk Subsystem starts to queue those requests. Until the queue has a chance to clear, the VMs will feel extremely slow, have a lot of lag, and even start dropping packets on the network.
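The boot storm arithmetic is simple enough to sketch. The function below multiplies the VM count by the 200-400 IOPS per-VM boot range given above; the function name is illustrative, and real boot I/O varies with the guest operating system and workload.

```python
def boot_storm_iops(vm_count: int, iops_low: int = 200, iops_high: int = 400) -> tuple:
    """Estimated (low, high) aggregate IOPS if every VM boots at once."""
    return vm_count * iops_low, vm_count * iops_high


low, high = boot_storm_iops(20)
print(low, high)  # 4000 8000
```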

DISASTER RECOVERY SITE I/O STORMS

One of the most common triggers today for the above-mentioned Boot Storm is a disaster recovery event. Most organizations will configure replicas of their virtual infrastructure at some type of Disaster Recovery Site and, to be quite honest, typically use inferior hardware at this site: slower processors, less RAM, and slower disks. During a critical failover, they will try to turn on the VMs too fast and will actually bring the infrastructure at the Disaster Recovery site to a grinding halt, because it simply cannot keep up with the I/O demand.
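One way to reason about a failover is to start VMs in batches sized to what the recovery site's storage can absorb. The sketch below is an assumption-laden illustration: the 2000 IOPS storage budget and the VM names are hypothetical, and the 400 IOPS per VM is the upper end of the boot range quoted earlier.

```python
def startup_batches(vm_names: list, storage_iops_budget: int, boot_iops_per_vm: int = 400) -> list:
    """Split VMs into start-up batches that stay within the storage IOPS budget."""
    # Always start at least one VM per batch, even on very slow storage.
    per_batch = max(1, storage_iops_budget // boot_iops_per_vm)
    return [vm_names[i:i + per_batch] for i in range(0, len(vm_names), per_batch)]


vms = [f"VM{n:02d}" for n in range(1, 21)]  # hypothetical VM names
batches = startup_batches(vms, storage_iops_budget=2000)

print(len(batches))               # 4
print([len(b) for b in batches])  # [5, 5, 5, 5]
```

Each batch would be allowed to finish booting and settle before the next one is started.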

INVENTORY I/O STORMS

This one is particularly hard to trace on a network and can be incredibly bad for virtual infrastructure. We have personally seen a misconfigured System Center Configuration Manager (SCCM) network scan and inventory scan bring down an entire network and the associated virtual infrastructure. This is because the scans are typically set to run at a particular date and time on a regular basis. One example was a support case that we were called into at a regional bank, where every third Thursday of the month the teller workstations would lose access to the servers and the Citrix Farm would become so unresponsive that it was timing out all connections. In digging into this issue, we found that all of the virtual Citrix XenApp Servers and associated infrastructure on the Host Servers had been added to SCCM. The inventory and network scans in SCCM had been misconfigured, and every single machine on the network was initiating the scan on the same day at the same time. The associated I/O generated in the virtual infrastructure actually brought the entire farm down.

ANTI-VIRUS I/O STORMS

Anti-Virus in virtual infrastructure can be a real Catch-22: you can improve performance by just not running it, but you risk infection by Malware and Ransomware if you don't. Today, with the influx of cyber security threats, we cannot run our virtual machine hosts and virtual machines without some type of protection. Anti-Virus I/O Storms behave in a very similar fashion to Inventory I/O Storms. Both scan every file on every virtual machine at the same time, causing a catastrophic increase in IOPS that will seriously impact performance.

Note: It is worth noting that you can greatly reduce the attack surface by running Hyper-V on Windows Server Core or even Nano Server.
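Since the storm comes from every machine scanning at the same moment, spreading scan start times across a maintenance window is one way to picture the fix. The sketch below only computes evenly spaced start times; the VM names, the window, and the function name are hypothetical, and how you apply such a schedule depends entirely on your anti-virus or inventory product.

```python
from datetime import datetime, timedelta


def staggered_scan_times(vm_names: list, window_start: datetime, window_minutes: int) -> dict:
    """Assign each VM an evenly spaced scan start time inside the window."""
    if not vm_names:
        return {}
    step = timedelta(minutes=window_minutes / len(vm_names))
    return {name: window_start + i * step for i, name in enumerate(vm_names)}


start = datetime(2017, 1, 19, 1, 0)  # hypothetical 01:00 maintenance window
schedule = staggered_scan_times(["VM01", "VM02", "VM03", "VM04"], start, 120)

for name, when in schedule.items():
    print(name, when.strftime("%H:%M"))
```

With four VMs and a 120 minute window, the scans start 30 minutes apart instead of all at once.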

VM CORRUPTION

We have been involved with customer cases on Hyper-V in the past where certain VMs became corrupt and the customer ended up having to restore from backup. Then about 2 months later the same thing happened again and they ended up having to restore a second time. This wasn't a very good situation for us as the consultants or for the customer. It actually had them starting to lose faith in Hyper-V as their hypervisor and starting to look at VMware. Honestly, if this had been happening in a VMware environment, the customer would likely have started looking at Hyper-V and other hypervisors. So, what was the mysterious cause of this VM corruption? It was incorrectly configured anti-virus exclusions. There is a very good reason why we have these things called anti-virus exclusion requirements.

ANTI-VIRUS EXCLUSION LISTS

I think we have learned our lessons that not having anti-virus software running on the Hosts and guest virtual machines is a BAD idea. In light of this you need to plan and configure your anti-virus scans and software properly. Most vendors publish these types of lists, but finding a comprehensive list from a Vendor such as Microsoft can be very daunting. In fact, to this point Microsoft includes and enables Windows Defender by default on Windows Server 2016.

WINDOWS DEFENDER AUTOMATIC EXCLUSIONS

Starting with Windows Server 2016, Anti-Virus protection is enabled by default. To prevent VM corruption issues, Microsoft has introduced a new concept called Automatic Exclusions. These exclusions are defined and updated by Microsoft and, best of all, can be used by administrators to build their own exclusion lists in the event they don't want to use Windows Defender as their core Anti-Virus. This is also important because many of you will have mixed farms with Windows Server 2016 alongside Windows Server 2012 R2 or earlier hosts and VMs. These Automatic Exclusions only pertain to Windows Server 2016 and later.

WHAT TO EXCLUDE ON MICROSOFT WINDOWS SERVER 2016?

This section lists the default exclusions for all Windows Server 2016 roles.

GENERAL FILES

Windows "temp.edb" files:

%windir%\SoftwareDistribution\Datastore\*\tmp.edb

%ProgramData%\Microsoft\Search\Data\Applications\Windows\*\*.log

Windows Update files or Automatic Update files:

%windir%\SoftwareDistribution\Datastore\*\Datastore.edb

%windir%\SoftwareDistribution\Datastore\*\edb.chk

%windir%\SoftwareDistribution\Datastore\*\edb*.log

%windir%\SoftwareDistribution\Datastore\*\Edb*.jrs

%windir%\SoftwareDistribution\Datastore\*\Res*.log

Windows Security files:

%windir%\Security\database\*.chk

%windir%\Security\database\*.edb

%windir%\Security\database\*.jrs

%windir%\Security\database\*.log

%windir%\Security\database\*.sdb
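
If you are building your own exclusion list for a third-party product, you may need to turn entries like the ones above into concrete paths for each host. The following is a minimal Python sketch of that expansion, not part of any Microsoft or vendor tooling. The function name `expand_exclusion` and the values in `host_variables` are our own assumptions for illustration; substitute the real values from your hosts.

```python
import re

def expand_exclusion(path, variables):
    """Replace %name% tokens with values from a lookup, leaving wildcards intact."""
    def lookup(match):
        # Unknown variables are left as-is so they are easy to spot later.
        return variables.get(match.group(1).lower(), match.group(0))
    return re.sub(r"%([^%\\]+)%", lookup, path)

# Assumed values for a typical host; check your own environment.
host_variables = {"windir": r"C:\Windows", "programdata": r"C:\ProgramData"}

print(expand_exclusion(r"%windir%\Security\database\*.edb", host_variables))
# prints C:\Windows\Security\database\*.edb
```

Leaving unknown variables untouched, rather than raising an error, makes it easier to review a whole list in one pass and see which entries still need attention.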

GROUP POLICY FILES

%allusersprofile%\NTUser.pol

%SystemRoot%\System32\GroupPolicy\Machine\registry.pol

%SystemRoot%\System32\GroupPolicy\User\registry.pol

WINS files:

%systemroot%\System32\Wins\*\*.chk

%systemroot%\System32\Wins\*\*.log

%systemroot%\System32\Wins\*\*.mdb

%systemroot%\System32\LogFiles\

%systemroot%\SysWow64\LogFiles\

FILE REPLICATION SERVICE (FRS) EXCLUSIONS

Files in the File Replication Service (FRS) working folder. The FRS working folder is specified in the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Working Directory.

%windir%\Ntfrs\jet\sys\*\edb.chk

%windir%\Ntfrs\jet\*\Ntfrs.jdb

%windir%\Ntfrs\jet\log\*\*.log

FRS Database log files. The FRS Database log file folder is specified in the registry key HKEY_LOCAL_MACHINE\System\Currentcontrolset\Services\Ntfrs\Parameters\DB Log File Directory.

%windir%\Ntfrs\*\Edb*.log

The FRS staging folder. The staging folder is specified in the registry key HKEY_LOCAL_MACHINE\System\Currentcontrolset\Services\NtFrs\Parameters\Replica Sets\GUID\Replica Set Stage.

%systemroot%\Sysvol\*\Nntfrs_cmp*\

The FRS preinstall folder. This folder is specified by the folder Replica_root\DO_NOT_REMOVE_NtFrs_PreInstall_Directory.

%systemroot%\SYSVOL\domain\DO_NOT_REMOVE_NtFrs_PreInstall_Directory\*\Ntfrs*\

THE DISTRIBUTED FILE SYSTEM REPLICATION (DFSR)

DFSR Database and working folders. These folders are specified by the registry key HKEY_LOCAL_MACHINE\System\Currentcontrolset\Services\DFSR\Parameters\Replication Groups\GUID\Replica Set Configuration File.

%systemdrive%\System Volume Information\DFSR\$db_normal$

%systemdrive%\System Volume Information\DFSR\FileIDTable_*

%systemdrive%\System Volume Information\DFSR\SimilarityTable_*

%systemdrive%\System Volume Information\DFSR\*.XML

%systemdrive%\System Volume Information\DFSR\$db_dirty$

%systemdrive%\System Volume Information\DFSR\$db_clean$

%systemdrive%\System Volume Information\DFSR\$db_lost$

%systemdrive%\System Volume Information\DFSR\Dfsr.db

%systemdrive%\System Volume Information\DFSR\*.frx

%systemdrive%\System Volume Information\DFSR\*.log

%systemdrive%\System Volume Information\DFSR\Fsr*.jrs

%systemdrive%\System Volume Information\DFSR\Tmp.edb

PROCESS EXCLUSIONS

%systemroot%\System32\dfsr.exe

%systemroot%\System32\dfsrs.exe

HYPER-V EXCLUSIONS

This section lists the file type exclusions, folder exclusions, and process exclusions that are delivered automatically when you install the Hyper-V role.

File type exclusions:

*.vhd

*.vhdx

*.avhd

*.avhdx

*.vsv

*.iso

*.rct

*.vmcx

*.vmrs

Folder exclusions:

%ProgramData%\Microsoft\Windows\Hyper-V

%ProgramFiles%\Hyper-V

%SystemDrive%\ProgramData\Microsoft\Windows\Hyper-V\Snapshots

%Public%\Documents\Hyper-V\Virtual Hard Disks

Process exclusions:

%systemroot%\System32\Vmms.exe

%systemroot%\System32\Vmwp.exe
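
When you carry these exclusions over to another anti-virus product, it can help to sanity check that the files in your VM storage folders are actually covered by the file type list. The sketch below is one way to do that in Python. The function name `is_excluded_file_type` and the sample paths are hypothetical; only the extension patterns come from the list above.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath

# File type exclusions copied from the Hyper-V list above.
HYPERV_FILE_TYPES = ["*.vhd", "*.vhdx", "*.avhd", "*.avhdx", "*.vsv",
                     "*.iso", "*.rct", "*.vmcx", "*.vmrs"]

def is_excluded_file_type(path):
    """Return True if the file name matches one of the Hyper-V file type patterns."""
    # Windows file names are case-insensitive, so compare in lower case.
    name = PureWindowsPath(path).name.lower()
    return any(fnmatchcase(name, pattern) for pattern in HYPERV_FILE_TYPES)

print(is_excluded_file_type(r"D:\VMs\web01.VHDX"))   # prints True
print(is_excluded_file_type(r"D:\VMs\notes.txt"))    # prints False
```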

SYSVOL EXCLUSIONS

%systemroot%\Sysvol\Domain\*.adm

%systemroot%\Sysvol\Domain\*.admx

%systemroot%\Sysvol\Domain\*.adml

%systemroot%\Sysvol\Domain\Registry.pol

%systemroot%\Sysvol\Domain\*.aas

%systemroot%\Sysvol\Domain\*.inf

%systemroot%\Sysvol\Domain\*Scripts.ini

%systemroot%\Sysvol\Domain\*.ins

%systemroot%\Sysvol\Domain\Oscfilter.ini

ACTIVE DIRECTORY EXCLUSIONS

This section lists the exclusions that are delivered automatically when you install Active Directory Domain Services.

NTDS database files. The database files are specified in the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters\DSA Database File.

%windir%\Ntds\ntds.dit

%windir%\Ntds\ntds.pat

The AD DS transaction log files. The transaction log files are specified in the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters\Database Log Files.

%windir%\Ntds\EDB*.log

%windir%\Ntds\Res*.log

%windir%\Ntds\Edb*.jrs

%windir%\Ntds\Ntds*.pat

%windir%\Ntds\TEMP.edb

The NTDS working folder. This folder is specified in the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters\DSA Working Directory.

%windir%\Ntds\Temp.edb

%windir%\Ntds\Edb.chk

Process exclusions for AD DS and AD DS-related support files:

%systemroot%\System32\ntfrs.exe

%systemroot%\System32\lsass.exe

DHCP SERVER EXCLUSIONS

This section lists the exclusions that are delivered automatically when you install the DHCP Server role.

The DHCP Server file locations are specified by the DatabasePath, DhcpLogFilePath, and BackupDatabasePath parameters in the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\DHCPServer\Parameters.

%systemroot%\System32\DHCP\*\*.mdb

%systemroot%\System32\DHCP\*\*.pat

%systemroot%\System32\DHCP\*\*.log

%systemroot%\System32\DHCP\*\*.chk

%systemroot%\System32\DHCP\*\*.edb

DNS SERVER EXCLUSIONS

This section lists the file and folder exclusions and the process exclusions that are delivered automatically when you install the DNS Server role.

File and folder exclusions for the DNS Server role:

%systemroot%\System32\Dns\*\*.log

%systemroot%\System32\Dns\*\*.dns

%systemroot%\System32\Dns\*\*.scc

%systemroot%\System32\Dns\*\BOOT

Process exclusions for the DNS Server role:

%systemroot%\System32\dns.exe

FILE AND STORAGE SERVICES EXCLUSIONS

This section lists the file and folder exclusions that are delivered automatically when you install the File and Storage Services role. *The exclusions listed below do not include exclusions for the Clustering role.

%SystemDrive%\ClusterStorage

%clusterserviceaccount%\Local Settings\Temp

%SystemDrive%\mscs

PRINT SERVER EXCLUSIONS

This section lists the file type exclusions, folder exclusions, and the process exclusions that are delivered automatically when you install the Print Server role.

File type exclusions:

*.shd

*.spl

Folder exclusions. This folder is specified in the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Print\Printers\DefaultSpoolDirectory:

%system32%\spool\printers\*

Process exclusions:

spoolsv.exe

WEB SERVER EXCLUSIONS

This section lists the folder exclusions and the process exclusions that are delivered automatically when you install the Web Server role.

Folder exclusions:

%SystemRoot%\IIS Temporary Compressed Files

%SystemDrive%\inetpub\temp\IIS Temporary Compressed Files

%SystemDrive%\inetpub\temp\ASP Compiled Templates

%systemDrive%\inetpub\logs

%systemDrive%\inetpub\wwwroot

Process exclusions:

%SystemRoot%\system32\inetsrv\w3wp.exe

%SystemRoot%\SysWOW64\inetsrv\w3wp.exe

%SystemDrive%\PHP5433\php-cgi.exe

WINDOWS SERVER UPDATE SERVICES EXCLUSIONS

This section lists the folder exclusions that are delivered automatically when you install the Windows Server Update Services (WSUS) role.

The WSUS folder is specified in the registry key HKEY_LOCAL_MACHINE\Software\Microsoft\Update Services\Server\Setup.

%systemroot%\WSUS\WSUSContent

%systemroot%\WSUS\UpdateServicesDBFiles

%systemroot%\SoftwareDistribution\Datastore

%systemroot%\SoftwareDistribution\Download

CHAPTER SUMMARY

In conclusion, it is imperative that you properly plan two things when deploying Anti-Virus solutions to your virtual infrastructure. First, you must offset your scanning schedules to avoid unnecessary Anti-Virus I/O Storms. Second, you must configure and plan your Anti-Virus exclusions properly to avoid VM Corruption issues.
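
To make the first point concrete, here is a small Python sketch of offsetting scan start times so that no two VMs on a host begin scanning at the same moment. It is only an illustration of the idea: the function name `stagger_scans`, the VM names, and the 30-minute gap are our own assumptions, and your anti-virus product will have its own way of applying the resulting schedule.

```python
from datetime import datetime, timedelta

def stagger_scans(vm_names, first_start, gap_minutes):
    """Assign each VM its own scan start time, gap_minutes apart."""
    return {
        name: first_start + timedelta(minutes=gap_minutes * index)
        for index, name in enumerate(vm_names)
    }

# Hypothetical VMs sharing one host, scans starting at 01:00.
schedule = stagger_scans(["VM01", "VM02", "VM03"], datetime(2017, 1, 1, 1, 0), 30)
for name, start in schedule.items():
    print(name, start.strftime("%H:%M"))
# prints:
# VM01 01:00
# VM02 01:30
# VM03 02:00
```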

WATCH YOUR CHECKPOINTS

It is really surprising that, with so many great backup vendors out there for Microsoft Hyper-V and VMware today, we still find situations where customers have no active backup strategy in place other than keeping a couple of Checkpoints around. This is not only concerning, but in most cases is not supported by the vendor of the hypervisor. In Microsoft's case, Hyper-V Checkpoints were not supported in production until the release of Windows Server 2016 last year. At that time, they released a new feature called Production Checkpoints.

Maybe we should take a step back before we get started here. Who reading this book knows what a Hyper-V Checkpoint is? Now I know some of you just put your hands up, and that is amazing. For those of you who didn't, read on for the next few paragraphs so we can level set some knowledge.

WHAT IS A HYPER-V CHECKPOINT (SNAPSHOT)

Eric Siron from the Altaro Software Hyper-V Blog gave a great description of what exactly a Hyper-V Checkpoint is. Seeing as we are writing this book for Altaro, why not show some additional love back to their amazing team?

"A Hyper-V checkpoint is an unchanging point in the lifespan of a virtual machine. The virtual machine can still be used as normal, but the checkpoint is protected from any changes that are made to the virtual machine. Such "changes" are generally understood to refer to data within attached virtual hard disks. That's only part of the story. No change to the virtual machine affects the checkpoint; it is also isolated from dis/ connection of virtual hard disks, network topology changes, memory re-assignments, and almost anything else about a virtual machine that can be modified." Eric Siron (Altaro Software).

Note: We are not here to learn about all the fundamentals of what a Hyper-V Checkpoint is in this eBook as this could take up the entire book itself. For more information read Eric's blog post Standard and Production Checkpoints in Hyper-V: https://www.altaro.com/hyper-v/standard-production-checkpoints-hyper-v/

CHECKPOINTS ARE NOT BACKUPS

ACTIVE DIRECTORY

Depending on the version of your guest operating systems, you may or may not be in a supported state using Checkpoints. These are not supported if you run:

  • Windows Server 2003
  • Windows Server 2008
  • Windows Server 2008 R2

They are supported starting with:

  • Windows Server 2012
  • Windows Server 2012 R2
  • Windows Server 2016 and later

These newer operating systems are made aware of the Checkpoint through the VM-Generation ID. Starting with Windows 8 and Windows Server 2012, a new feature exposes this identifier from the hypervisor so that the guest can detect a time shift event. Time shift events typically occur as a result of the application of a Checkpoint or the restore of a VM.

Virtual environments present unique challenges to distributed workloads that depend upon a logical clock-based replication scheme. Active Directory Domain Services replication, for example, uses a monotonically increasing value (known as a USN or Update Sequence Number) assigned to transactions on each domain controller. Each domain controller's database instance is also given an identity, known as an InvocationID. The InvocationID of a domain controller and its USN together serve as a unique identifier associated with every write-transaction performed on each domain controller and must be unique within the forest.

Active Directory Domain Services replication uses InvocationID and USNs on each domain controller to determine what changes need to be replicated to other domain controllers. If a domain controller is rolled back in time outside of the domain controller's awareness and a USN is reused for an entirely different transaction, replication will not converge because other domain controllers will be out of sync with those changes. A virtual machine makes it easy for hypervisor administrators to roll back a domain controller's USNs (its logical clock) by, for example, applying a snapshot outside of the domain controller's awareness.

Beginning with Windows Server 2012, Active Directory Domain Services virtual domain controllers hosted on hypervisor platforms that support virtual machine generation ID functionality can detect and employ necessary safety measures to protect the Active Directory Domain Services environment if the virtual machine is rolled back in time by the application of a virtual machine snapshot, or similar operation.
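
The USN rollback problem described above is easier to see with a toy model. The Python sketch below is a deliberately simplified simulation and not how Active Directory is implemented: the `DomainController` class, its fields, and the sample changes are all hypothetical. It shows a replication partner silently missing a change after a rollback reuses a USN, and then picking it up once the rolled-back domain controller is given a new InvocationID, which is one of the safety measures a generation ID aware domain controller applies.

```python
import copy

class DomainController:
    def __init__(self, invocation_id):
        self.invocation_id = invocation_id
        self.usn = 0
        self.changes = {}   # USN -> change written at that USN
        self.seen = {}      # partner InvocationID -> highest USN received

    def write(self, change):
        self.usn += 1
        self.changes[self.usn] = change

    def pull_from(self, partner):
        """Return the partner's changes with a USN above what we already hold."""
        high = self.seen.get(partner.invocation_id, 0)
        new = [c for usn, c in sorted(partner.changes.items()) if usn > high]
        self.seen[partner.invocation_id] = max(high, partner.usn)
        return new

dc1, dc2 = DomainController("A"), DomainController("B")
dc1.write("create user1")
dc1.write("create user2")
snapshot = copy.deepcopy(dc1)        # checkpoint taken at USN 2
dc1.write("create user3")            # USN 3
dc2.pull_from(dc1)                   # dc2 has now seen USN 3 from "A"

dc1 = copy.deepcopy(snapshot)        # checkpoint applied: USN is back to 2
dc1.write("create user4")            # USN 3 is reused for a different change
lost = dc2.pull_from(dc1)
print(lost)                          # prints []

dc1.invocation_id = "A-reset"        # safeguard: new identity after rollback
recovered = dc2.pull_from(dc1)
print(recovered)
# prints ['create user1', 'create user2', 'create user4']
```

In the first pull after the rollback, `dc2` believes it already holds everything up to USN 3 from that database instance, so "create user4" never replicates. With a fresh InvocationID the partner treats the database as a new source and the change converges.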

SQL SERVER

Microsoft SQL Server supports virtualization-aware backup solutions that use VSS (volume checkpoints / snapshots). For example, Microsoft SQL Server supports Altaro Backup. Virtual machine snapshots that do not use VSS volume checkpoints are not supported by SQL Server. Any checkpoint / snapshot technology that does a behind-the-scenes save of a VM's point-in-time memory, disk, and device state without interacting with applications on the guest using VSS may leave Microsoft SQL Server in an inconsistent state.

For this reason, it is imperative that you not use Hyper-V Checkpoints on your SQL Servers, as it is NOT supported.

Note: For more information on using Checkpoints / Snapshots with Microsoft SQL Server see the support policy: https://support.microsoft.com/en-us/help/956893/support-policy-for-microsoft-sql-server-products-that-are-running-in-a-hardware-virtualization-environment

EXCHANGE SERVER

Some hypervisors include features for taking snapshots of virtual machines. Virtual machine snapshots capture the state of a virtual machine while it's running. This feature enables you to take multiple snapshots of a virtual machine and then revert the virtual machine to any of the previous states by applying a snapshot to the virtual machine. However, virtual machine snapshots aren't application aware, and using them can have unintended and unexpected consequences for a server application that maintains state data, such as Exchange. As a result, making virtual machine snapshots of an Exchange guest virtual machine is NOT supported.

Note: For more information on using Checkpoints / Snapshots with Microsoft Exchange Server see the support policy: https://technet.microsoft.com/en-us/library/jj126252(v=exchg.141).aspx

BACKING UP ACTIVE DIRECTORY, SQL, EXCHANGE

Backing up Microsoft Active Directory, SQL, and Exchange is significantly different from taking Checkpoints. When a backup is run, as long as it is application aware for the above applications, it can use the Hyper-V Guest VSS Backup (Volume Checkpoint) in Windows Server 2012 R2. This overcomes the support issues stated above. What you need to watch out for is manually creating Checkpoints of these workloads in 2012 R2.

Figure 8 - Hyper-V Backup (Volume checkpoint)

BACKUPS AND CHECKPOINTS IN WINDOWS 2012 R2

Hyper-V has always provided the ability to back up all your virtual machines from the host operating system. To provide a consistent backup of the virtual machine, Hyper-V has traditionally employed two approaches:

  • If the guest operating system has the Hyper-V backup integration service installed and running: use VSS (for Windows) or file system freeze (for Linux) to create a data consistent backup of the running virtual machine.
  • If the guest operating system does not have the Hyper-V backup integration service installed or running: put the virtual machine into a saved state, and perform a backup of the saved virtual machine.

This second approach has always been problematic because it takes a running virtual machine offline for the backup process. The good news is that this second approach has been drastically improved in Windows Server 2012 R2. Now, rather than putting the virtual machine into a saved state, we take a checkpoint of the virtual machine. This checkpoint is backed up and deleted after the operation is complete.

The net result of this is that no matter what the guest operating system, and no matter what the state of the integration services inside the guest operating system, Hyper-V will never interrupt a running virtual machine as part of backing it up (anymore).
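
The decision described in this section can be summarized in a few lines of Python. This is only our own restatement of the logic above, not Hyper-V code; the function name `backup_method` and its parameters are hypothetical.

```python
def backup_method(guest_family, backup_service_running, host_2012r2_or_later):
    """Return the approach Hyper-V would take, following the two cases above."""
    if backup_service_running:
        # Case 1: the backup integration service is installed and running.
        return "VSS" if guest_family == "windows" else "file system freeze"
    # Case 2: no integration service. Older hosts save the VM state,
    # Windows Server 2012 R2 takes a temporary checkpoint instead.
    return "checkpoint" if host_2012r2_or_later else "saved state"

print(backup_method("windows", True, True))    # prints VSS
print(backup_method("linux", True, True))      # prints file system freeze
print(backup_method("windows", False, False))  # prints saved state
print(backup_method("windows", False, True))   # prints checkpoint
```

Only the "saved state" result takes the virtual machine offline, which is the case Windows Server 2012 R2 removes.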

HYPER-V BACKUPS AND CHECKPOINTS IN WINDOWS 2012 R2

In many cases, the challenge means that Hyper-V has to take a brute force approach to ensure that a virtual machine is in a state that is consistent enough for a VSS backup. Hyper-V places the VM in a saved state, triggers the VSS writer, and wakes the VM once the snapshot has been taken. This process is typically very fast, but it does not constitute a "live" or a "hot" backup. Hyper-V does have the ability to perform a backup of its guests without interrupting them, but there are several conditions that must be met.

  • The latest Integration Services must be installed and the backup integration service must be offered. In the properties dialog of the VM, from either Hyper-V Manager or SCVMM, look on the Integration Services tab and ensure that "Backup (volume snapshot)" is checked.
  • The guest operating system must have its own VSS support (Vista desktops and later, 2003 Server and later).
  • All of the virtual machine's volumes must be formatted with NTFS. The volume that contains the .VHD(s) for the VM must also be formatted with NTFS. The guest operating system's disks must be "Basic", not "Dynamic" (this is not the same as dynamic vs. fixed VHDs, see screenshot below). These disks must each have 2 GB free space.
  • The "COM+ Event System", "Distributed Transaction Coordinator", "Remote Procedure Call (RPC)", "System Event Notification" and "Volume Shadow" services must be running within the VM. By default, these are set to "Automatic" and/or "Automatic (Delayed Start)". The "COM+ System Application" and "Microsoft Software Shadow Copy Provider" services must at least be set to Manual, which is the default for these. It is acceptable, but not required, to set them to "Automatic" or "Automatic (Delayed Start)".
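
As a rough aid, the conditions above can be turned into a checklist. The Python sketch below assumes you have already collected the facts about a VM by whatever means you prefer; it does not query Hyper-V or the guest. The names `GuestVolume` and `live_backup_blockers` are hypothetical, and the service names are written exactly as the list above gives them.

```python
from dataclasses import dataclass

# Services that must be running inside the VM, as named in the list above.
MUST_BE_RUNNING = [
    "COM+ Event System",
    "Distributed Transaction Coordinator",
    "Remote Procedure Call (RPC)",
    "System Event Notification",
    "Volume Shadow",
]

@dataclass
class GuestVolume:
    name: str
    file_system: str   # for example "NTFS"
    disk_type: str     # "Basic" or "Dynamic"
    free_gb: float

def live_backup_blockers(backup_service_enabled, running_services, volumes):
    """Return the reasons, if any, why an uninterrupted backup is not possible."""
    problems = []
    if not backup_service_enabled:
        problems.append("Backup (volume snapshot) integration service is not enabled")
    for service in MUST_BE_RUNNING:
        if service not in running_services:
            problems.append(f"Service not running: {service}")
    for volume in volumes:
        if volume.file_system.upper() != "NTFS":
            problems.append(f"Volume {volume.name} is not formatted with NTFS")
        if volume.disk_type.lower() != "basic":
            problems.append(f"Volume {volume.name} is not on a Basic disk")
        if volume.free_gb < 2:
            problems.append(f"Volume {volume.name} has less than 2 GB free")
    return problems

# Hypothetical VM with one FAT32 data volume.
volumes = [GuestVolume("C:", "NTFS", "Basic", 12.5),
           GuestVolume("D:", "FAT32", "Basic", 40.0)]
for problem in live_backup_blockers(True, set(MUST_BE_RUNNING), volumes):
    print(problem)
# prints Volume D: is not formatted with NTFS
```

An empty result means none of the listed conditions is violated. It does not check the host-side volume that holds the .VHD files or the services that only need to be set to Manual.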

The simplistic explanation of these backup types is that the VSS writer is called within the Hyper-V host. It notifies the Integration Services within the VM that a backup is about to occur. Integration Services negotiates with the VM's VSS writer to take the VSS snapshot. With the exception of the Integration Services, all the above requirements are generally the same for a VSS-based traditional backup; the difference is that all components of the VM must comply. Because the entire VM must be synchronized with the parent partition's VSS writer, the entire VM must be able to interact with VSS or the entire VM must be paused while the parent's VSS writer operates on it.

HYPER-V PRODUCTION CHECKPOINTS

As discussed earlier, Hyper-V Checkpoints are not supported in most production scenarios. With the release of Windows Server 2016, Microsoft introduced a new feature called Production Checkpoints. Production Checkpoints take advantage of the guest's VSS Writers and can effectively serve as a native backup to those applications. Hyper-V Production Checkpoints are actually the technology Windows Server Backup uses to back up Virtual Machines in Windows Server 2016.

PRODUCTION CHECKPOINTS VS. STANDARD CHECKPOINTS

Production checkpoints are "point in time" images of a virtual machine, which can be restored later on in a way that is completely supported for all production workloads. This is achieved by using backup technology inside the guest to create the checkpoint, instead of using saved state technology.

Standard checkpoints capture the state, data, and hardware configuration of a running virtual machine and are intended for use in development and test scenarios. Standard checkpoints can be useful if you need to recreate a specific state or condition of a running virtual machine so that you can troubleshoot a problem.

USING PRODUCTION CHECKPOINTS

Production Checkpoints are application consistent, which means they leverage VSS to back up the virtual machine. This is similar to the way a VSS-based backup product such as Altaro would back up a virtual machine. This means that you can now use a Production Checkpoint to take a "Snapshot" of Microsoft Exchange, Microsoft SQL, or even Active Directory workloads.

Production Checkpoints are enabled by default in Windows Server 2016 as illustrated in the screen shot below.

Figure 9 - Hyper-V 2016 Production Checkpoint

CHAPTER SUMMARY

In conclusion, you should always watch how you use the Checkpoint options in Hyper-V. You can potentially run into support issues and have unwanted consequences for your environment. Using a backup technology such as Altaro VM Backup for Hyper-V is fully supported by Microsoft and is extremely easy to set up and configure.

For more information visit Altaro.com

We really hope you have enjoyed this short eBook and will come back for more from Altaro Software!

ABOUT THE AUTHORS

DAVE KAWULA - MVP

Dave is a Microsoft Most Valuable Professional (MVP) with over 20 years of experience in the IT industry. His background includes data communications networks within multi-server environments, and he has led architecture teams for virtualization, System Center, Exchange, Active Directory, and Internet gateways. Very active within the Microsoft technical and consulting teams, Dave has provided deep-dive technical knowledge and subject matter expertise on various System Center and operating system topics.

Dave is well-known in the community as an evangelist for Microsoft, 1E, and Veeam technologies. Locating Dave is easy as he speaks at several conferences and sessions each year, including TechEd, Ignite, MVPDays Community Roadshow, and VeeamOn.

As the founder and Managing Principal Consultant at TriCon Elite Consulting, Dave is a leading technology expert for both local customers and large international enterprises, providing optimal guidance and methodologies to achieve and maintain an efficient infrastructure.

BLOG: www.checkyourlogs.net

Twitter: @DaveKawula

CRISTAL KAWULA - MVP

Cristal Kawula is a Microsoft Most Valuable Professional with over 20 years of experience in the IT Industry. She is the co-founder of MVPDays Community Roadshow and #MVPHour live Twitter Chat. She has worked on technical teams that have authored content for Microsoft Corporation internally for their SMSGR and GTR teams. The technologies focused on Exchange Server, Windows Server Performance, Deployment, and Security.

Cristal has set the stage behind the scenes for many technical professionals to obtain their MVP designations through her MVPDays Community Roadshow. This Roadshow has now helped over 5 IT professionals achieve their Microsoft MVP designations by giving them opportunities in hard-to-find public speaking spots and community experience. She has an unwavering willingness to help others and a great dedication to the IT community she holds dear.

Cristal is the co-founder and President of TriCon Elite Consulting, and has sat on the Technical Advisory Board of Silicon Valley startups. Cristal is also only the 2nd woman in the world to receive the prestigious Veeam Vanguard award.

BLOG: www.checkyourlogs.net

Twitter: @supercristal1

ABOUT ALTARO

Altaro Software (www.altaro.com) is a fast-growing developer of easy-to-use backup solutions used by over 30,000 customers to back up and restore both Hyper-V and VMware-based virtual machines, built specifically for SMBs with up to 50 host servers. Altaro take pride in their software and their high level of personal customer service and support, and it shows. Founded in 2009, Altaro already service over 30,000 satisfied customers worldwide and are a Gold Microsoft Partner for Application Development and a Technology Alliance VMware Partner.

ABOUT ALTARO VM BACKUP

Altaro VM Backup (formerly known as Altaro Hyper-V Backup) is an easy to use backup software solution used by over 30,000 SMB customers to back up and restore both Hyper-V and VMware-based virtual machines. Eliminate hassle and headaches with an easy-to-use interface, straightforward setup and a backup solution that gets the job done every time.

Altaro VM Backup is intuitive and feature-rich, and you get outstanding support as part of the package. Demonstrating Altaro's dedication to Hyper-V, they were the first backup provider for Hyper-V to support Windows Server 2012 and 2012 R2, and they also continue to support Windows Server 2008 R2.

For more information on features and pricing, please visit: http://www.altaro.com/vm-backup/