VSAN: Reimagining Storage in vSphere

Introduction

At VMworld in August of 2013, VMware announced VMware Virtual SAN (VSAN). It was in public beta until early March and went GA (General Availability) on March 10th. VSAN is VMware's native version of Software Defined Storage (SDS). It is simple, easy to set up, and managed by user-defined policies that are then applied to VMs as needed. It is this policy-based control that makes VSAN so powerful.

This white paper will look at VSAN, including what it is, basic requirements, how it works, and how various types of failures are handled. Some common uses will also be discussed.

What is VSAN?

Vendor lock-in on the storage side is a big problem in many environments today. A storage area network (SAN) or Network Attached Storage (NAS) array is expensive to acquire (often hundreds of thousands to millions of dollars per array), and that does not count the expertise required to operate and tune it and to handle the necessary provisioning, monitoring, and optimization tasks. In addition, with the cost per gigabyte dropping rapidly, purchasing storage well in advance of when it is actually needed is expensive (it would be cheaper if purchased just before it was needed). Buying just in time, however, can lead to a "death by a thousand cuts" syndrome of constantly having to go back to management and ask for another few disks (which are relatively inexpensive), a new shelf (somewhat more expensive), or even an entire new array (a very expensive proposition). Up until now, the expense was deemed unavoidable and worthwhile to ensure high availability, shared access across hosts, low latencies, etc. These features will probably still be required for large, complex companies (and for core data center functions, etc.) for years to come, but in many other cases they may not be (see the Use Cases section for some ideas on where this technology may make sense).

VSAN is implemented at the kernel level, and thus doesn't suffer from the performance disadvantages of the Virtual Storage Appliance (VSA), which was (and is) implemented as a virtual appliance (VA). While the VSA was designed as a small-to-medium business (SMB) or remote office / branch office (ROBO) solution where a SAN or NAS array was too expensive, VSAN is designed for use in the enterprise in addition to the VSA use cases. Both the VSA and VSAN have the same basic purpose: take local storage located in individual servers and turn it into shared storage that can be used by HA, vMotion, DRS, etc.

VSAN is implemented at the cluster level, similarly to HA and DRS today; in fact, it is just another property of a cluster. It can be enabled in just two clicks, though there are many advanced options that can be set, along with storage policies to provide the needed availability to VMs at the best cost and performance (see the Storage Policies section for more detail on this). The nice thing about this product is that you can scale up by adding additional storage within an ESXi host (up to 40 disks, based on the limit of five disk groups of one SSD and seven magnetic disks each described under Requirements); you scale out by simply adding another ESXi host into the cluster (up to the vSphere maximum of 32 nodes).

Requirements

VSAN has the following requirements:

  • 3–32 hosts per vSphere cluster
  • HA must be enabled for the cluster (DRS often will be as well)
  • 1 SSD and 1–7 magnetic (spinning) disks, which create a disk group
  • 1 GbE minimum, with 10 GbE recommended
  • vSphere and vCenter (5.5 or higher)
  • VSAN license key

A few quick notes on disks and disk groups before we move on... First, SSD space is used for caching (both read and write) only, and thus any discussion of usable space ignores all of the SSD space in every host. Second, VMware's best practice is that 10% of the space in each disk group be SSD to ensure there is enough space for caching (it will work if there is less, but performance may be impacted). Third, each host can have zero to five disk groups located on it. Any host with zero disk groups on it can run VMs like any other host, but storage requests will go to the other nodes. Note: Just because a VM runs on a given host, there is no guarantee the storage needed by that VM is local to the same host.
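As a rough illustration of the sizing notes above, the following Python sketch computes the usable (magnetic) capacity of a disk group and the SSD size suggested by the 10% guideline. The function names and sample disk sizes are hypothetical, and the sketch assumes the 10% is measured against the magnetic capacity of the group; it is not a VMware sizing tool.

```python
MAX_MAGNETIC_DISKS_PER_GROUP = 7  # from the requirements list above


def usable_capacity_gb(magnetic_disk_sizes_gb):
    """Capacity a disk group contributes: magnetic disks only, since SSD is cache."""
    count = len(magnetic_disk_sizes_gb)
    if not 1 <= count <= MAX_MAGNETIC_DISKS_PER_GROUP:
        raise ValueError("a disk group holds 1 to 7 magnetic disks")
    return sum(magnetic_disk_sizes_gb)


def suggested_ssd_gb(magnetic_disk_sizes_gb, ssd_percent=10):
    """SSD size implied by the 10% guideline (assumed relative to magnetic capacity)."""
    return usable_capacity_gb(magnetic_disk_sizes_gb) * ssd_percent / 100


if __name__ == "__main__":
    group = [1000] * 7  # seven hypothetical 1,000 GB magnetic disks
    print(usable_capacity_gb(group))  # capacity counted toward the datastore
    print(suggested_ssd_gb(group))    # SSD cache size suggested by the guideline
```

Summing this per disk group and per host gives the raw capacity of the cluster before any storage policy is applied.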

Some may question the performance and scalability of VSAN, as well as its CPU cost. VMware has tested and shown nearly two million input/output operations per second (IOPS) in a single cluster (read only; roughly half that in a mixed read/write scenario), with only a 10% hit to CPU performance at that level. While the 10% may sound like a lot, most ESXi servers today are running closer to 50% CPU utilization, so the extra 10% hit will not likely affect VM performance. Each cluster supports up to 4.4 petabytes of space as well, allowing for large amounts of data per cluster. Note that this space is given directly to VSAN to use; if any RAID is used (usually it is not, and the raw disks are simply given to VSAN to use as it sees fit), only RAID 0 is supported. In fact, in many ways, it acts like Storage Spaces in Windows Server 2012 in this regard.

How VSAN Works

VSAN is designed to perform well and to maximize available space. As mentioned previously, it is implemented at the vmkernel level to maximize performance from the compute side. It requires SSD drives for caching to maximize performance from the disk perspective, and requires at least 1 GbE (with 10 GbE recommended) on the network side. To maximize space, all vmdk files are thin provisioned and no parity or mirroring RAID is employed on the hosts. Additional capacity can be added by simply adding disks to a host and giving VSAN access to the new disks.

Enabling

Enabling VSAN is a very simple process. Once VSAN has been properly licensed, simply go to the desired cluster and check the box for VSAN, like you would for DRS or HA. Note that if HA is already enabled, you will need to temporarily disable it so that VSAN can be configured, then you can re-enable it.

Once you check the box, the only major question is how VSAN should get the disks it needs to work: automatically or manually? If you choose automatically, VSAN will automatically use all the SSD and magnetic disks that it can find that are not used elsewhere in the system (for example to boot the host) and will create disk groups automatically. If you want more control over which disks are to be used and/or which disks belong in which disk groups, choose manual and configure the disks as desired.

Storage Policies

The power of VSAN is not so much that it turns local storage into shared storage, though that is very impressive, but rather that policies can be set up and applied to VMs and that the system will automatically enforce those policies. There are several policies that can be set in VSAN. They include:

  • Failures to Tolerate: Defines the number of copies of the disk(s) that should be created by specifying the number of concurrent failures that need to be withstood (from zero to three). If zero is specified, any failure of a disk or host will cause the VM to be inaccessible until the failure is corrected (and possibly data restored from backup). The number of copies will be one more than the number specified (i.e., one means two copies). This is sometimes known as RAIN (redundant array of independent nodes) and is always a mirroring-style replication (i.e., RAIN 1, even if more than one failure is tolerated). The purpose of this parameter is availability.
  • Stripe Width: Specifies how many physical disks the data for a single vmdk is spread across. This is always done in RAIN 0 fashion (although it can be combined with the Failures to Tolerate to create a RAIN 0+1 design if desired). This setting is most important when greater performance is necessary and the read caching provided by the SSD drive is insufficient, requiring data to be read from the spinning disk(s), instead of the SSD read cache. If a value greater than one is specified, the data is split into 1 MB chunks across the disks. Note that enabling this feature will consume additional system resources.
  • Object Space Reservation: Reserves the specified percentage of the disk space for the VM (assuming it is thin provisioned); it does not thick-provision the space, but rather uses the calculated space as if it were already provisioned in much the same way that a CPU or memory reservation is used in compute calculations used by vSphere, HA, and DRS. If a vmdk has been configured as either eager or lazy zero thick, this parameter is ignored (i.e., it is already at 100%).
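To make the Stripe Width setting more concrete, here is a minimal Python sketch of how 1 MB chunks could be spread across the disks of a stripe. It assumes a classic RAID 0 style round-robin layout; the function name is hypothetical and VSAN's actual on-disk layout may differ.

```python
CHUNK_SIZE_BYTES = 1024 * 1024  # 1 MB chunks, as described above


def stripe_for_offset(byte_offset, stripe_width):
    """Return the index of the stripe (disk) holding the given byte offset.

    Assumes chunks are assigned to stripes in round-robin order.
    """
    if stripe_width < 1:
        raise ValueError("stripe width must be at least 1")
    if byte_offset < 0:
        raise ValueError("byte offset must not be negative")
    chunk_index = byte_offset // CHUNK_SIZE_BYTES
    return chunk_index % stripe_width


if __name__ == "__main__":
    # With a stripe width of 2, consecutive 1 MB chunks alternate between disks.
    for chunk in range(4):
        print(chunk, stripe_for_offset(chunk * CHUNK_SIZE_BYTES, 2))
```

With a stripe width of one, every offset maps to the same disk, which matches the unstriped case.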

By default, a single policy is created and used by everything that uses VSAN, and that policy is not visible in the Web Client. It is simply configured to tolerate the loss of a host, disk, or disk group by setting Failures to Tolerate to one.
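The three settings can be modeled as a small data structure, which also shows how Failures to Tolerate translates into a number of copies. This is an illustrative sketch only: the class and function names are hypothetical, and the defaults for stripe width and space reservation are assumptions (the text above only states that the default policy tolerates one failure).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    failures_to_tolerate: int = 1          # default described above
    stripe_width: int = 1                  # assumed default
    object_space_reservation_pct: int = 0  # assumed default

    def __post_init__(self):
        if not 0 <= self.failures_to_tolerate <= 3:
            raise ValueError("Failures to Tolerate must be between 0 and 3")
        if self.stripe_width < 1:
            raise ValueError("Stripe Width must be at least 1")
        if not 0 <= self.object_space_reservation_pct <= 100:
            raise ValueError("Object Space Reservation is a percentage")

    @property
    def copies(self):
        """Number of mirrored copies: one more than the failures tolerated."""
        return self.failures_to_tolerate + 1


def max_footprint_gb(policy, vmdk_size_gb):
    """Space consumed if every copy of a thin vmdk grows to its full size."""
    return vmdk_size_gb * policy.copies


def reserved_per_copy_gb(policy, vmdk_size_gb):
    """Space treated as already provisioned for each copy."""
    return vmdk_size_gb * policy.object_space_reservation_pct / 100
```

For example, a 100 GB vmdk under the default policy has two copies and can consume up to 200 GB of raw capacity.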

Note: Don't confuse Storage Policies (used by VSAN only) with Storage Profiles (usable with any datastore type). Storage policies determine the performance and availability of a VM located on a VSAN datastore and are fully automatic once assigned, while Storage Profiles define a preferred storage type (typically based on the speed of the underlying disks) for a VM. The profiles are manually created and manually assigned and do not automatically move VMs to other disks if the profile-configured type is not the actual location of the VM (Storage vMotion would typically be manually invoked to fix the issue).

What VSAN Does to VMDKs

VSAN will look at the storage policy assigned to each VM and then automatically apply it, placing each .vmdk file (and the other VM files, collectively known as "VM Home") on disks it chooses. In this white paper, the term vmdk has been used throughout for brevity, but it applies equally to the VM Home folders.

For example, if you asked to tolerate one failure, VSAN would create two copies of the disk, each on a separate host. While you can determine where individual pieces of a VM are stored via the Web Client, the beauty of VSAN (and SDS in general) is that it really doesn't matter: the policy is what matters, and the system will automatically enforce it (assuming enough hosts and capacity are available).
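The placement rule in this example (each copy on a separate host, subject to available capacity) can be sketched as follows. The function name, the host names, and the preference for hosts with the most free space are all assumptions made for illustration; the sketch ignores witnesses and striping and does not describe VSAN's real placement algorithm.

```python
def place_copies(hosts_free_gb, vmdk_size_gb, failures_to_tolerate):
    """Pick one distinct host per copy, or fail if the policy cannot be met.

    hosts_free_gb maps a host name to its free capacity in GB.
    """
    copies = failures_to_tolerate + 1
    candidates = [
        host for host, free_gb in hosts_free_gb.items() if free_gb >= vmdk_size_gb
    ]
    if len(candidates) < copies:
        raise RuntimeError("not enough hosts with capacity to satisfy the policy")
    # Assumed tie-breaking: most free space first, then host name.
    candidates.sort(key=lambda host: (-hosts_free_gb[host], host))
    return candidates[:copies]


if __name__ == "__main__":
    cluster = {"esx1": 500, "esx2": 100, "esx3": 800}  # hypothetical hosts
    print(place_copies(cluster, 200, failures_to_tolerate=1))
```

The error path corresponds to the caveat above: the policy can only be enforced when enough hosts and capacity are available.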

How Failures are Handled

You may be wondering how failures are handled. VSAN is very fault tolerant and can continue to operate in the event of a disk, network, or server failure. The ability to handle failures depends on the storage policies previously described. No data loss will occur in any case as writes are not acknowledged to the VM until all copies on all hosts have acknowledged the write as complete.
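The write acknowledgment rule described above can be illustrated with a toy model. The Replica class and replicated_write function are hypothetical names, and the sketch does not model how VSAN removes a failed copy from the set or rebuilds it; it only shows that the VM sees an acknowledgment when every copy has confirmed the write.

```python
class Replica:
    """One copy of a vmdk on one host (hypothetical model)."""

    def __init__(self, host):
        self.host = host
        self.online = True
        self.blocks = {}

    def write(self, block_number, data):
        """Store the block and return True as the acknowledgment, or False if offline."""
        if not self.online:
            return False
        self.blocks[block_number] = data
        return True


def replicated_write(replicas, block_number, data):
    """Acknowledge the write to the VM only if every copy acknowledged it."""
    if not replicas:
        raise ValueError("at least one replica is required")
    acks = [replica.write(block_number, data) for replica in replicas]
    return all(acks)
```

Because the acknowledgment waits for all copies, any copy that remains after a failure already holds every write the VM believes was completed.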

Let's look at each of the scenarios and review how VSAN responds.

Disk Failure

In traditional environments, RAID solves the problem and thus the loss of a disk is transparent to vSphere; simply replace the disk and the RAID controller rebuilds the LUN. The VSA works this way as well, but due to this need for redundancy at both the LUN and server levels, if RAID 10 is used (and often it is for performance reasons), only 25% of the space purchased is actually usable (half lost to RAID 10 and half to RAIN 1). VSAN cuts that overhead in half through the use of individual disks (or at worst RAID 0, which is the same from both a redundancy and a disk overhead perspective), leaving 50% of the purchased space usable when one failure is tolerated.
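The capacity arithmetic in the preceding paragraph can be checked with a few lines of Python. The function name is hypothetical; the inputs are the RAID efficiency at the host level (one half for RAID 10, one for raw disks or RAID 0) and the number of mirrored copies across nodes.

```python
from fractions import Fraction


def usable_fraction(raid_efficiency, copies):
    """Fraction of purchased space that remains usable.

    raid_efficiency: share of raw space left after host-level RAID.
    copies: number of mirrored copies kept across nodes (RAIN 1 keeps two).
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return Fraction(raid_efficiency) / copies


if __name__ == "__main__":
    print(usable_fraction(Fraction(1, 2), 2))  # VSA: RAID 10 plus RAIN 1
    print(usable_fraction(1, 2))               # VSAN: raw disks, two copies
```

The VSA case works out to one quarter of the purchased space and the VSAN case to one half, matching the figures above.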

With VSAN, your data is replicated by policy to multiple locations (unless you choose to have no redundancy for a non-critical VM), and thus when a disk is lost, VSAN will see that it is not in compliance with the defined storage policy and immediately begin copying the data to another disk to automatically come into compliance again. When speaking of disk loss, it is important to note that the loss of an SSD will cause the entire disk group to go offline and copying of the data to a new location to automatically begin. Note that no administrator intervention is required in this process at all (except for physically replacing the failed disk, of course). Disk loss is probably the most common failure scenario (the planned, temporary loss of a server for maintenance, patching, etc. will probably be the most common scenario overall in most environments).

Network Failure

In the event of complete network failure, only VMs whose storage is local to the host on which they are running will continue to run (and not even those if a storage policy striped a single VMDK across multiple nodes). HA can attempt to restart the other VMs on the nodes where their disk files are located if capacity exists there, assuming that HA still has a valid network path. This is why redundant network paths (ideally to redundant switches) are always recommended.

A few notes are important when using VSAN in an HA-enabled cluster. First, to handle network partition scenarios, a witness is assigned on an additional host in the cluster to make an odd number of nodes so that a majority (a quorum) could be online if a partition were to occur. In the event of a network partition, VSAN will always restart any affected VMs in the partition that has quorum. Second, the normal heartbeat communication that takes place across the management network is moved to the VSAN network instead; the exception is the check for host isolation, which by default still uses the default gateway of the management network. Third, datastore heartbeats are disabled if the cluster only has VSAN datastores, as there is no additional availability gained.
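The quorum idea in the first note can be sketched as a simple majority test. This is a simplified model under stated assumptions: each host holds at most one component (a copy or the witness) of the object, and the function and host names are hypothetical.

```python
def partition_with_quorum(partitions, component_hosts):
    """Return the partition that can see a majority of an object's components.

    partitions: iterable of sets of host names, one set per network partition.
    component_hosts: hosts holding the object's copies and its witness.
    Returns None when no partition sees more than half of the components.
    """
    components = set(component_hosts)
    for partition in partitions:
        visible = len(set(partition) & components)
        if 2 * visible > len(components):
            return set(partition)
    return None


if __name__ == "__main__":
    # Two copies plus a witness give three components, an odd number.
    components = {"esx1", "esx2", "esx3"}
    split = [{"esx1"}, {"esx2", "esx3", "esx4"}]
    print(partition_with_quorum(split, components))
```

Because the witness makes the component count odd, a two-way split can never leave both sides with exactly half of the components.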

The failure of the switch is a rare event, however, and thus the foregoing is unlikely to occur, while the loss of a single network port, cable, or NIC is far more probable. In those cases, HA will simply restart any affected VMs on nodes that still have access to storage (locally or across the network) and the VM will be back online quickly. This is also a fairly rare event in most environments.

Server Failure

The final failure scenario is the failure of a server. When the server fails, HA will restart the VMs elsewhere in the cluster like normal. Unlike in the disk failure scenario, however, VSAN does not start rebuilding the lost data on another server right away. The reason for this is that the server may, for example, be rebooting after a patch and will be back online soon. To prevent lots of extra space being used unnecessarily in situations like these, when the server goes down, a 60-minute timer begins; if the server is back online within that time frame, VSAN will simply synchronize the copies of the data. On the other hand, after the timer runs out, VSAN will automatically create a new copy of the missing data, much like in the disk failure scenario previously described.
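The timer behavior described above amounts to a small decision rule, sketched here in Python. The function name and the returned labels are hypothetical, and the sketch treats the timer as expiring at exactly 60 minutes.

```python
REBUILD_DELAY_MINUTES = 60  # timer described above


def action_after_host_failure(minutes_down, host_is_back):
    """Decide what happens to the copies that lived on a failed host."""
    if minutes_down < 0:
        raise ValueError("minutes_down must not be negative")
    if minutes_down >= REBUILD_DELAY_MINUTES:
        return "rebuild"        # timer expired: create a new copy elsewhere
    if host_is_back:
        return "resynchronize"  # host returned in time: catch up the stale copy
    return "wait"               # timer still running: do nothing yet


if __name__ == "__main__":
    print(action_after_host_failure(10, host_is_back=True))
    print(action_after_host_failure(10, host_is_back=False))
    print(action_after_host_failure(75, host_is_back=False))
```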

This brings up an interesting question: What happens if the host is placed in maintenance mode? As with most things in IT, the answer is: "It depends." When a host is placed in maintenance mode, there are three things VSAN can do with the data on that host:

  • Full Data Migration: Migrate all data off the host to other nodes in the cluster. This could take the longest to complete and probably would not be used for short term outages, such as patching, but rather for longer term changes, most likely when the node is being removed from the cluster.
  • Ensure Accessibility: Migrate data to other hosts in the cluster if needed to ensure that at least one copy of the data remains accessible. In this case, most data will probably not be migrated, making it faster to get the node in maintenance mode, but accessibility policies may be violated. This is the default and most often used option in most environments.
  • No Data Migration: No data is migrated, even if it is the only copy available; in that case the VM would be unavailable until the node came back online. This method offers the fastest entry into maintenance mode, but again, VM availability may be impacted.
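The three maintenance mode options differ only in which data must move before the host can enter maintenance mode. The following sketch models that choice; the enum, the function, and the file names are hypothetical, and the input is simply a count of how many copies of each object exist on other hosts.

```python
from enum import Enum


class MaintenanceOption(Enum):
    FULL_DATA_MIGRATION = "full"
    ENSURE_ACCESSIBILITY = "ensure"
    NO_DATA_MIGRATION = "none"


def objects_to_migrate(option, copies_elsewhere):
    """Return the objects that must be moved off the host.

    copies_elsewhere maps each object on the host to the number of
    copies of that object held by other hosts in the cluster.
    """
    if option is MaintenanceOption.FULL_DATA_MIGRATION:
        return sorted(copies_elsewhere)
    if option is MaintenanceOption.ENSURE_ACCESSIBILITY:
        # Only objects with no other copy need to move to stay accessible.
        return sorted(
            name for name, count in copies_elsewhere.items() if count == 0
        )
    return []


if __name__ == "__main__":
    on_host = {"vm1.vmdk": 1, "vm2.vmdk": 0}  # hypothetical objects
    for option in MaintenanceOption:
        print(option.name, objects_to_migrate(option, on_host))
```

In this model, vm2.vmdk is the only object that would become unavailable under No Data Migration, which is the risk noted in the last bullet.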

Use Cases

Common use cases include:

  • VDI, where storage can easily be 40–60% of the cost of implementation
  • ROBO sites, where a SAN or NAS device is not feasible and/or cost justified
  • Tier 2/3 storage needs
  • Disaster recovery (DR), usually at the backup site, at least initially
  • DMZ servers, especially where an air gap is required between internal and DMZ storage
  • Management clusters, which need the usual SAN capabilities, but often don't justify the costs of those capabilities

These use cases will likely serve as the proving ground for the capability and resilience of VSAN; as it becomes well tested and proven, it will probably move into the mainstream for storage needs in vSphere.

Conclusion

VSAN will fundamentally reshape the role of storage in many organizations over the coming years. In much the same way that virtualization was looked at somewhat skeptically in the early years and is now considered standard practice for most workloads in all environments (including mission critical production)—so too storage and network virtualization are in the early phases of adoption, but will soon become mainstream for many use cases. While VMware's stated goal is to coexist with existing SAN and NAS environments, it will likely replace many of them in the coming years, being much less costly, much simpler, and far easier to manage via policies.

About the Author

John Hales (A+, Network+, CTT+, MCSE, MCDBA, MOUS, MCT, VCA-DCV, VCA-Cloud, VCA-Workforce Mobility, VCP, VCP-DT, VCAP-DCA, VCI, EMCSA) is a VMware instructor at Global Knowledge, teaching all of the vSphere and View classes that Global Knowledge offers. John has written a book called Administering vSphere 5: Planning, Implementing, and Troubleshooting published by Cengage, as well as other technical books—from exam-preparation books to quick-reference guides, as well as custom courseware for individual customers. John lives with his wife and children in Sunrise, Florida.