Poor Practices that Hinder VM Disk IOPS

Spend time in enough IT shops, and you'll eventually discover that the same mistakes are made everywhere. At least that's the feeling I get when pondering all the virtual environments I've seen in my consulting travels. From large to small, simplistic to highly advanced, you'd be surprised how often the same poor practices are incorporated into people's designs.

Most interesting about those mistakes, particularly in the case of virtual machine (VM) performance, is how unnoticed they often go. IT shops with heavy‐duty hardware experience the classic signs of poor performance and often don't even realize it. Others might realize performance isn't up to par but focus their troubleshooting on entirely the wrong things, such as processing and memory resources, while incorrect configurations or omitted key technologies go on to create big problems down the road.

Your storage represents one of those oft‐forgotten areas where poor VM performance can come from. Too often, storage itself is thought of only in terms of capacity: "I have fifteen terabytes of storage I can provision to virtual machines." Yet today's storage and the demands we put on it require a second metric that's just as important: performance.

Input/Output Operations per Second (IOPS) is a common measurement for quantifying storage performance. In general terms, a unit of IOPS represents how many "things" a storage device can accomplish in a given unit of time. Those things might be reading from a disk or writing to it, deleting data from it, or performing storage maintenance tasks.

The amount of IOPS you have to work with—your supply—is greatly driven by your design. Incorporate faster disks, more storage processors, or a wider connection bandwidth, and you'll see IOPS go up. It is also driven by the collection of decisions you've made in configuring hosts and VMs. Overload your connections, ask too much of your disk spindles, or configure VMs in ways that require more‐than‐necessary storage attention, and you'll quickly find that IOPS suffers. And when IOPS suffers, so do your VMs.

In my travels, I've seen plenty of poor storage practices. They're laid into place by well‐meaning administrators who simply forget that storage performance is as important as storage capacity. Let me share a few of my favorite stories from those travels. In the telling, hopefully you'll learn to avoid common poor practices that hinder VM disk IOPS.

Poor Practice #1: Overextending SAN‐to‐Server Connections

One of my favorite poor performance stories begins during the days of the Great Hypervisor War. Back then, a common conversation among virtual administrators was the debate between Microsoft Hyper‐V and VMware ESX as hypervisor of choice. During that time, each side found itself seeking reasons for its own superiority over the other. It was a raucous time in our industry's past.

Back then, I visited a client's data center to help them track down a performance difference between VMs in their VMware ESX environment and those atop comparable Hyper‐V hardware. Spending a day tracing the similarities between the two configurations, I was baffled about why their Hyper‐V VMs were an order of magnitude slower than those atop ESX.

It wasn't until late in the day that I realized the difference, one so slight in the client's eyes that they neglected to bring it up until day's end. During their comparison, this client was also introducing themselves to the network implications of iSCSI SAN storage. Their previous experience in TCP/IP networking had them concerned primarily about connectivity. That focus had them completely forgetting the impact of throughput.

Turns out their Hyper‐V servers in Building A were in fact connected to storage in Building B, traversing a single fibre pair and sharing the bandwidth with that entire building's regular network traffic. Their Hyper‐V VMs' demand for IOPS far exceeded their storage connection's available supply.

An easy fix, but the moral of that day is to always remember that storage networking requires more from a network than traditional networking does. Segregating traffic where appropriate and monitoring utilization are critical to preventing an IOPS bottleneck.
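To make the throughput point concrete, here is a rough sketch of how link bandwidth caps IOPS. The link speed, I/O size, and the fraction of the link left over for storage are all assumed figures for illustration, not numbers from that client's environment, and the calculation ignores protocol overhead entirely.

```python
def link_iops_ceiling(link_bits_per_sec, io_size_bytes, storage_share=1.0):
    """Upper bound on the IOPS a network link can carry.

    storage_share is the (assumed) fraction of the link that storage
    traffic actually gets once other traffic takes its cut.
    Protocol overhead is ignored, so real numbers will be lower.
    """
    if not 0.0 < storage_share <= 1.0:
        raise ValueError("storage_share must be in (0, 1]")
    usable_bytes_per_sec = link_bits_per_sec / 8 * storage_share
    return usable_bytes_per_sec / io_size_bytes


GIGABIT = 1_000_000_000   # assumed link speed: 1 Gb/s
IO_SIZE = 8 * 1024        # assumed I/O size: 8 KiB

dedicated = link_iops_ceiling(GIGABIT, IO_SIZE)
shared = link_iops_ceiling(GIGABIT, IO_SIZE, storage_share=0.25)

print(f"dedicated link: {dedicated:,.0f} IOPS ceiling")
print(f"shared link:    {shared:,.0f} IOPS ceiling")
```

However fast the disks behind it, a link that storage must share with general network traffic can only deliver a fraction of that ceiling.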

Poor Practice #2: Using Poorly‐Performing Disks in High‐Load Situations

Another client, this one a hospital, found themselves developing an interest in virtualization. As at all hospitals, storage of patient records had early on mandated powerful (at the time) SAN equipment. Because the business embraced technology's leading edge, this hospital's previous‐generation SAN was given a second life hosting VMs the day its replacement arrived.

When I arrived to troubleshoot the ensuing performance issues, I reminded them that not all SANs are built alike—nor will all SANs perform alike. No matter how many processors or disks you provision to virtual hosts, VMs won't perform well atop previous‐generation SATA drives that lack the IOPS virtualization requires. The resolution here: Dump the old SAN and acquire one with an IOPS supply that exceeds VM demands.
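A common rule of thumb shows why drive class matters: a spinning disk completes roughly one random I/O per average seek plus half a rotation. The sketch below applies that rule. The seek times and rotational speeds are assumed, typical‐looking figures, not measurements from the hospital's SAN, and the estimate ignores caching and queuing.

```python
def disk_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for one spinning disk.

    Each random I/O costs an average seek plus, on average,
    half a platter rotation. Caches and queuing are ignored.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


# Assumed, illustrative drive characteristics.
sata_7200 = disk_iops(rpm=7_200, avg_seek_ms=8.5)
sas_15k = disk_iops(rpm=15_000, avg_seek_ms=3.5)

print(f"7,200 rpm drive:  ~{sata_7200:.0f} IOPS")
print(f"15,000 rpm drive: ~{sas_15k:.0f} IOPS")
```

Under these assumptions the faster drive supplies more than twice the IOPS per disk, before any difference in controller or cache is considered.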

Poor Practice #3: Creating VMs with the Wrong Disk Format

Early versions of virtual platforms like VMware vSphere didn't support thin provisioned disks. This was for a reason: Although requiring the use of "thick" disks added costs in wasted disk space, those disks were guaranteed to operate with best performance. It wasn't until much later that waste‐conserving thin provisioning was made available.

Yet saving on space with thin provisioned disks doesn't come without a cost. That cost is paid with a slight performance loss, particularly when disks are expanded to add space. The performance difference between thick and thin grows smaller with each new virtual platform version, but some difference still remains today.

Even more insidious are linked disk clones, which begin one disk's life based on the configuration of another. Though linked clones may garner even greater space savings, they do so by paying a tax on performance. Forcing disk activity to exist across what are now two disks instead of one means adding to a VM's IOPS demand.

Poor Practice #4: Disk Misalignment

A physical disk is broken into blocks of data, as is a virtual disk. A block represents the smallest unit of data that can be read from or written to a virtual or physical disk. Blocks can be linearly read from a disk, not unlike a needle following grooves on a record. Sometimes, though, a virtual disk's blocks aren't laid down in alignment with those of its physical host. Instead, they're offset by just a bit, so that a single VM block now sits atop two physical blocks. When this happens, reading from or writing to that virtual disk requires extra effort across those two physical blocks.

With the right software, misaligned disks are becoming less of a problem in today's virtual platforms. Not paying attention to them, however, means their extra effort becomes a source of reduced IOPS. Worse yet, they're difficult to track down and even more difficult to fix with native tools alone. Pay particular attention to the approaches used by your software and storage device or suffer the pain of double effort at every read and write.
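The double effort is easy to see with a little arithmetic. The sketch below counts how many physical blocks a single guest block touches for two partition start offsets. The 4 KiB block size and the two offsets (a 63‐sector start, as older partitioning tools commonly used, and a 1 MiB start) are assumptions chosen for illustration; your storage device may use different sizes.

```python
def is_aligned(partition_offset_bytes, block_size):
    """True when the partition starts on a physical block boundary."""
    return partition_offset_bytes % block_size == 0


def physical_blocks_touched(offset_bytes, length_bytes, block_size):
    """Number of physical blocks covered by an I/O at offset_bytes."""
    first = offset_bytes // block_size
    last = (offset_bytes + length_bytes - 1) // block_size
    return last - first + 1


BLOCK = 4096                   # assumed physical block size
SECTOR = 512
legacy_start = 63 * SECTOR     # assumed legacy partition offset
modern_start = 2048 * SECTOR   # assumed 1 MiB partition offset

for name, start in (("63-sector start", legacy_start),
                    ("1 MiB start", modern_start)):
    touched = physical_blocks_touched(start, BLOCK, BLOCK)
    print(f"{name}: aligned={is_aligned(start, BLOCK)}, "
          f"one guest block touches {touched} physical block(s)")
```

With the misaligned start, every guest block straddles a boundary, so each read or write lands on two physical blocks instead of one.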

Poor Practice #5: Neglecting Spindle Count

Another of my favorite stories highlights the peril in focusing on capacity to the exclusion of performance. This tale deals with another client delving into desktop virtualization. The skills required for success here are very much a superset of those for simple server virtualization. There are just so many extra activities required to assure a good experience when users are provisioned virtual desktops.

During the design phase, this client got too excited about recent improvements in storage capacity. Their excitement was understandable, if misguided. With virtual desktops often incurring huge storage costs over the traditional model, bigger disks usually mean fewer dollars per gigabyte. Yet compressing more data into the same form factor also compresses more data onto the same number of disk spindles. Insufficient IOPS supply is the natural result, as virtual desktop users vie for data access and disk thrashing ensues.

Disk thrashing will be a problem with desktop virtualization (or, really, any workload) when enough spindles aren't brought to bear. This client learned the hard way that dense storage can also be slow storage when placed under too heavy a load.
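A back‐of‐the‐envelope comparison shows how the same capacity can deliver very different IOPS. Every number below (total capacity, disk sizes, per‐spindle IOPS, desktop count, and per‐desktop demand) is an assumption made up for illustration, and RAID overhead and caching are deliberately left out.

```python
import math


def spindles_for_capacity(capacity_gb, disk_size_gb):
    """Disks needed to reach a raw capacity target (no RAID overhead)."""
    return math.ceil(capacity_gb / disk_size_gb)


def iops_supply(spindles, iops_per_spindle):
    """Aggregate IOPS supply, assuming load spreads evenly across disks."""
    return spindles * iops_per_spindle


CAPACITY_GB = 12_000     # assumed capacity requirement
IOPS_PER_SPINDLE = 80    # assumed, same for both disk sizes
DESKTOPS = 100           # assumed number of virtual desktops
IOPS_PER_DESKTOP = 10    # assumed steady-state demand per desktop

demand = DESKTOPS * IOPS_PER_DESKTOP
results = {}
for disk_size_gb in (600, 3_000):
    spindles = spindles_for_capacity(CAPACITY_GB, disk_size_gb)
    supply = iops_supply(spindles, IOPS_PER_SPINDLE)
    results[disk_size_gb] = (spindles, supply)
    verdict = "enough" if supply >= demand else "NOT enough"
    print(f"{disk_size_gb} GB disks: {spindles} spindles, "
          f"{supply} IOPS supply vs {demand} demand -> {verdict}")
```

Both designs satisfy the capacity requirement, but under these assumptions only the one with more spindles satisfies the IOPS demand.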

Poor Practice #6: Excessive Snapshotting

One of virtualization's early promises was the career‐protection device that VM snapshots could provide. You remember the storyline: "Are you about to install a patch, or change a configuration that could create a problem? Just snapshot the VM first and you've got an instant time machine!"

Snapshots still provide this functionality today; however, snapshots were never intended as a long‐term storage mechanism. Repeat this statement to yourself.

One reason for their short‐term nature centers around the same problems discussed earlier with linked clones. Creating a snapshot automatically creates another location across which data must be managed. That doubling of data locations adds to storage effort, eventually reducing performance. Layering multiple snapshots atop each other reduces it even further. Reduce unnecessary storage effort by eliminating snapshots. Use them sparingly, and only for short‐term needs.
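A toy model helps show why layered snapshots add storage effort. In the sketch below, each snapshot is a layer holding only the blocks written since it was taken, and a read must check layers from newest to oldest until it finds the block. This is a deliberately simplified illustration, not how any particular hypervisor stores or caches snapshot data; the layer names and contents are hypothetical.

```python
def read_block(chain, block_id):
    """Return (data, layers_checked) for a block.

    chain is a list of dict layers ordered newest first; the last
    entry is the base disk. Raises KeyError if no layer has the block.
    """
    for depth, layer in enumerate(chain, start=1):
        if block_id in layer:
            return layer[block_id], depth
    raise KeyError(block_id)


# Hypothetical base disk and two snapshot layers layered on top of it.
base = {0: "base-0", 1: "base-1", 2: "base-2"}
snap1 = {1: "snap1-1"}   # block 1 changed after the first snapshot
snap2 = {2: "snap2-2"}   # block 2 changed after the second snapshot

no_snapshots = [base]
two_snapshots = [snap2, snap1, base]

for block_id in (0, 1, 2):
    _, plain = read_block(no_snapshots, block_id)
    data, layered = read_block(two_snapshots, block_id)
    print(f"block {block_id}: {plain} lookup without snapshots, "
          f"{layered} with two snapshots ({data})")
```

In this model, a block that was never rewritten costs one lookup per layer in the chain, which is why deep snapshot stacks are worth collapsing once the short‐term need has passed.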

Fragmentation: The Hidden Drag on IOPS

No review of poor practices can conclude without one final IOPS impact. This hidden drag relates to the extra effort placed on storage when volumes become fragmented. Fragmentation, as any Windows administrator should know, shatters individual files into hundreds (or even thousands) of tiny pieces, each of which requires attention and reintegration during every disk access.

That added attention impacts IOPS, and can significantly reduce VM performance. In fact, the extra attention fragmentation requires leads directly into this series' next article, which discusses specifically the impact fragmentation can have on VM IOPS.
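As a rough illustration of that extra attention, the sketch below counts the contiguous runs (extents) in a file's block list. Each separate run is at least one more place the storage must visit to read the whole file. The block numbers are made up, and the model ignores caching and any read‐ahead the storage might perform.

```python
def count_extents(blocks):
    """Count contiguous runs in a file's ordered list of block numbers."""
    if not blocks:
        return 0
    extents = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:   # a gap or jump starts a new extent
            extents += 1
    return extents


# Hypothetical block layouts for the same eight-block file.
contiguous = list(range(100, 108))
fragmented = [100, 101, 340, 341, 17, 18, 902, 903]

print(f"contiguous file: {count_extents(contiguous)} extent(s)")
print(f"fragmented file: {count_extents(fragmented)} extent(s)")
```

The data is identical in both layouts; only the number of separate places it lives has changed, and with it the number of operations needed to read it back.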