The Secondary Storage Squeeze: How Can I See It Coming?

Introduction

Backup and storage administrators have a two-pronged mission: Don't lose any data, and don't run out of room. Accomplishing that mission is simple, but it is not easy. As admins work out the best way to retain an ever-increasing volume of data, they face the secondary storage squeeze.

In the first part of the squeeze, described in the e-book The Secondary Storage Squeeze: How Can I See It Coming?, business users generate the data and want to keep it close at hand, but primary storage is expensive. The way out of that part of the squeeze is for business owners and IT admins to agree on which data and apps are most important, then apply policies and service level agreements (SLAs) on data retention and secondary storage.

In the second part of the squeeze, administrators must reliably back up a fast-growing balloon of primary data within available time limits. Adding hardware and capacity to the primary storage environment seems to relieve the pressure, but in fact it leaves even less time to complete incremental and full backups for increasing volumes of data.

The first e-book examined ways to resolve the business side of the squeeze: establishing retention policies, putting together a storage strategy and conducting a business impact analysis. This e-book covers the technology side of the squeeze: implementing the deduplication technology and protocol accelerators in the Quest DR Series Disk Backup and Deduplication Appliances. Storage and backup administrators will discover a valuable way to overcome outmoded backup software, limited network bandwidth and the secondary storage squeeze.


Primary and secondary storage tiers

The term "secondary storage" applies in this context to the external devices not connected directly to production servers. Secondary storage is generally used for backup data sets for three reasons:

  • It is less expensive — as much as two orders of magnitude less expensive — per protected megabyte than primary storage.
  • It can be connected via LAN, storage area network (SAN) or network-attached storage (NAS) connections to media servers and client servers to accept backup workloads quickly.
  • Connections to the network make it accessible for a full range of recovery tasks, from individual files to entire applications with operating systems stored as virtual machines.

Figure 1 shows a representative layout of primary site storage, with tiered disk, backup disks and virtual standby. Corresponding to the tiered and backup disks at the primary site are less expensive, slower media at the secondary site.

Figure 1: Storage resources

The secondary site is not intended for production, applications or web-facing assets like Microsoft Exchange, SQL databases, Salesforce.com or pricing information; however, its data access suffices for business functions like recovery.

Why do I need secondary storage?

Secondary storage is excellent business continuity insurance against data loss. Consider some alternatives:

  • Replicating to primary storage is perfect insurance against data loss, but it's expensive.
  • Primary storage systems can take snapshots, which are good insurance for immediate recovery of data (up to the point at which the snapshot was taken, anyway), but they are not made for a full recovery in the event of an outage or disaster. Furthermore, snapshots reside on the same array as the primary data, making that array a single point of failure.
  • Another option is to cache data locally or replicate it internally within primary storage, then replicate it to other primary storage. But the data still occupies expensive primary storage.
  • Local backups can be useful as recovery sites, but only in a pinch.
  • The cloud has become a popular backup destination. However, backup to the cloud is slow, and recovery is often slower.

Backing data up from primary to secondary storage is ideal. Getting the data there efficiently is the tricky part.


Backup appliances and deduplication: The silver bullet for finishing backups in less time?

Backup across any of the connections mentioned above takes precious time, no matter how fast the network is. Beyond the amount of data being generated, there is the amount retained for compliance or business continuity; together they drive average data growth of up to 40 percent per year into the next decade.1
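To see how quickly that rate compounds, consider a back-of-the-envelope calculation. The starting volume of 100 TB below is hypothetical, held at the 40 percent upper bound cited above:

```python
# Illustrative only: compound growth at the 40 percent annual upper bound cited above.
volume_tb = 100.0  # hypothetical starting volume
for year in range(1, 6):
    volume_tb *= 1.40
    print(f"Year {year}: {volume_tb:,.0f} TB")
# After five years the volume has more than quintupled (100 TB -> ~538 TB),
# while the nightly backup window has not grown at all.
```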

As the volume of data in an organization continues to grow, the risk increases that administrators will be unable to complete their backup jobs in the allotted time because of network congestion, system/process interruptions and an insufficiently long backup window. Admins have turned to different forms of secondary storage — local, remote or cloud — but connecting secondary storage to primary storage adds the complexity of media/backup servers, backup software and policies for implementing backups.

Getting data to secondary storage this way is not simple, but many companies implement backup software, set up policies, construct backup storage repositories and live with the complexity until something breaks or the business requires a more responsive process that can meet service-level expectations. Even then, they have not removed all the risk and storage headaches of backup: in 2014, 73 percent of users were less than very confident they could restore critical data when needed. Incomplete backup jeopardizes business continuity and intensifies the secondary storage squeeze.2

Companies can greatly reduce that risk and those headaches by implementing purpose-built backup appliances — disk backup devices designed as storage repositories. With backup appliances, companies can maintain backup data on disk longer, for faster, more accurate restores when needed. To deal with the ever-shrinking backup window, backup appliances include deduplication technology.

Deduplication reduces the amount of space required for backups by identifying repeated blocks of data and storing references to them instead of backing up the same data multiple times. For example, if a company performs a full backup of its customer relationship management (CRM) data once a week and incremental backups each night, deduplication algorithms may find that only 10 percent of the data has changed from one day to the next.

The backup software sends only the changes, reducing the amount of data to be stored, the amount of time and space required to store it and the network bandwidth to send it.3
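The core idea is simple to sketch. The Python fragment below is a minimal illustration of block-level deduplication — fixed-size blocks indexed by a SHA-256 hash — not a description of any vendor's implementation, which would typically use variable-size chunking and an on-disk index:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; production systems often use variable-size chunking

def deduplicate(data: bytes, store: dict) -> list:
    """Store each unique block once; return a 'recipe' of hashes that rebuilds the data."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # new blocks are stored; repeats cost only a reference
        recipe.append(digest)
    return recipe

def rehydrate(recipe: list, store: dict) -> bytes:
    """Reassemble ('rehydrate') the original data from its recipe."""
    return b"".join(store[digest] for digest in recipe)

store = {}
monday = deduplicate(b"A" * 8192 + b"B" * 8192, store)
tuesday = deduplicate(b"A" * 8192 + b"C" * 8192, store)  # only the changed half adds new blocks
assert rehydrate(tuesday, store) == b"A" * 8192 + b"C" * 8192
print(f"{len(monday) + len(tuesday)} blocks referenced, {len(store)} stored")  # 8 referenced, 3 stored
```

The second night's backup references the unchanged blocks by hash alone, which is why the retained backups occupy far less space than the data they protect.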

Backup appliances simplify secondary storage. Deduplication techniques can save on the order of 70 percent of storage space. Despite those advantages, however, the combination is not necessarily a panacea for the secondary storage squeeze.

The technical problem: Resource limitations

Deduplication is subject to some technical limitations:

  • Deduplication algorithms parse the data, determine which blocks are duplicates and replace them with pointers. That work drains CPU and memory from other tasks running on the server.
  • To send data to a disaster recovery site, it is necessary to replicate all the full copies and incremental copies. To restore the data, it must be "rehydrated": all of its blocks must be stitched back together to reconstitute the latest copy with all changes (the rehydrate step in the sketch above).
  • Even if deduplication stores the data in vastly less space, the data must still traverse the network. Sending data over any network from the original servers through backup servers to the repository consumes a lot of a brief backup window. For example, it takes a backup application protecting 2 TB of data on a Gigabit Ethernet (1GbE) network almost five hours to transfer the data to the target device, as the calculation below shows.
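The five-hour figure follows directly from the line rate, assuming the best-case 1GbE throughput of 0.4093 TB per hour used in the scenarios later in this e-book:

```python
THROUGHPUT_TB_PER_HOUR = 0.4093  # best-case effective 1GbE throughput

hours = 2.0 / THROUGHPUT_TB_PER_HOUR  # 2 TB of protected data
print(f"{hours:.1f} hours")  # -> 4.9 hours
```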

To address those limitations, backup software providers have developed source-based deduplication, in which a software plugin or agent installed at the data source runs the algorithm. Because redundant data is removed before it is sent over the network, source-based deduplication reduces network congestion and improves backup speeds by a factor of three to four.

But source-based deduplication requires adequate hardware resources (CPU, memory, disk) to function, so admins face a trade-off between increasing resources on the source machine and increasing network bandwidth.
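In outline, a source-side plugin trades some CPU and memory at the client for a large reduction in network traffic: it hashes blocks locally, asks the target which hashes it already holds, and ships only the missing ones. The sketch below illustrates that exchange; the class and method names are invented for the example and do not correspond to any vendor's API:

```python
import hashlib

BLOCK_SIZE = 4096

class BackupTarget:
    """Stands in for the appliance: answers 'which of these blocks do you lack?'"""
    def __init__(self):
        self.store = {}

    def missing(self, digests: list) -> set:
        return {d for d in digests if d not in self.store}

    def receive(self, blocks: dict):
        self.store.update(blocks)

def source_side_backup(data: bytes, target: BackupTarget) -> int:
    """Hash locally, send only blocks the target does not already hold.
    Returns the number of bytes actually sent over the 'network'."""
    blocks = {hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest(): data[i:i + BLOCK_SIZE]
              for i in range(0, len(data), BLOCK_SIZE)}
    needed = target.missing(list(blocks))
    payload = {d: blocks[d] for d in needed}
    target.receive(payload)
    return sum(len(b) for b in payload.values())

target = BackupTarget()
sent_full = source_side_backup(b"A" * 8192 + b"B" * 8192, target)                 # first backup
sent_incr = source_side_backup(b"A" * 8192 + b"B" * 4096 + b"C" * 4096, target)   # small change
print(sent_full, sent_incr)  # the second run ships only the changed block
```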

The business problem: Limitations of network bandwidth, time and budget

Backup appliances and deduplication are also subject to business limitations.

Most medium and large organizations connect their enterprise applications to secondary storage devices using protocols such as Network File System (NFS) for NAS storage, OpenStorage Technology (OST) for Veritas backup software products and Common Internet File System (CIFS) for other backup applications.

As their backup window shrinks and they seek relief from the secondary storage squeeze, these organizations turn to backup appliances featuring deduplication technology that supports their existing investments in backup software applications and processes. Some backup hardware manufacturers offer protocol accelerators to speed the ingestion of data during backup, but those accelerators work only with the manufacturers' own storage appliances, and they cost extra.

The business problem arises when it becomes necessary to manage backup within the available budget and replication within the available network bandwidth. If the organization introduces additional backup applications, the appliance capacity must scale to accommodate the increased workload. Furthermore, the costs for protocol accelerators, replication, encryption and maintenance can add up.

Quest DR Series Disk Backup and Deduplication Appliances

The DR Series of Disk Backup and Deduplication Appliances addresses both the technical and business problems, so storage administrators can meet their backup windows and protect their data reliably.

Appliances in the DR Series use DR Rapid to support the protocols of all leading backup applications — NFS, CIFS, Rapid NFS, Rapid CIFS, OST and Rapid Data Access (RDA) — to reduce the time and storage space needed for deduplication and backup. Additionally, Quest makes the protocol accelerators available at no additional charge, so administrators can perform source-based deduplication on data from a wide variety of backup applications.


How the DR Series eliminates bottlenecks

In the following scenarios, assume a variety of servers hosting Microsoft Exchange and other business applications on a 1GbE network.

Figure 2 depicts backup to an ordinary secondary storage device. At the best-case throughput of 1GbE (0.4093 TB per hour), the device takes 4.9 hours to back up 2 TB of data from the Exchange server alone. If four other servers are backed up in parallel — a normal expectation in most companies — the network limits the aggregate ingest rate of the backup target device to 0.8186 TB per hour. That throughput is divided among all five machines, so it can take up to 12.2 hours to back up 2 TB of data from each one.

Figure 2: Typical backup
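The arithmetic behind Figure 2, under its stated assumptions (2 TB per server and an aggregate device ingest limit of 0.8186 TB per hour once five servers stream in parallel):

```python
AGGREGATE_INGEST_TB_PER_HOUR = 0.8186  # device ingest limit with five parallel streams
TOTAL_DATA_TB = 5 * 2.0                # five servers, 2 TB each

hours = TOTAL_DATA_TB / AGGREGATE_INGEST_TB_PER_HOUR
print(f"{hours:.1f} hours")  # -> 12.2 hours to complete all five backups
```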

Replacing the backup target device with a DR Series backup appliance does not change the ingest rate or network throughput, but it does change the amount of data moving across the network.

Figure 3 represents a backup path as a road and data as vehicles moving along the road. Data is backed up from the Exchange server to the DR Series appliance (or to any appliance) for the first time, and deduplication takes place downstream. Redundant data (dark blue car) is discarded after traversing the network. Needlessly sending that 20 percent of data is an inefficient use of network bandwidth, but on a first backup it may be acceptable.

Figure 3: First backup to DR Series appliance

Figure 4 shows a subsequent backup, with new and changed data represented by the grey car. The resulting backup data is complete, but 80 percent of the data sent from the Exchange server is redundant and is discarded after traversing the network. Sending so much useless data aggravates the secondary storage squeeze.

Figure 4: Subsequent backup to DR Series appliance

How DR Rapid keeps redundant blocks off the network

A protocol accelerator residing as a plugin on the Exchange server performs deduplication before the data traverses the network. DR Rapid identifies redundant data at the source, places references to the repeated blocks in the backup stream, eliminates the duplicate blocks and sends only the changed data (grey car) over the network to the DR Series appliance.
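Returning to the Figure 4 numbers makes the payoff concrete: if 80 percent of a subsequent backup is redundant and it is filtered out before transmission, only a fifth of the data crosses the wire. The calculation below is illustrative; real-world gains depend on change rates and source-side overhead, which is why the improvement cited earlier is three to four times rather than five:

```python
THROUGHPUT_TB_PER_HOUR = 0.4093  # best-case 1GbE, single server
DATA_TB = 2.0
REDUNDANT_FRACTION = 0.80        # per the Figure 4 scenario

without_dedup = DATA_TB / THROUGHPUT_TB_PER_HOUR
with_dedup = DATA_TB * (1 - REDUNDANT_FRACTION) / THROUGHPUT_TB_PER_HOUR
print(f"Without source-side dedup: {without_dedup:.1f} hours")  # -> 4.9 hours
print(f"With source-side dedup:    {with_dedup:.1f} hours")     # -> 1.0 hours
```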

The combination of DR Series Disk Backup Appliances and DR Rapid deduplication has relieved the secondary storage squeeze for organizations worldwide:

  • Wholesale Electric Supply implemented the Quest DR4100 Disk Backup Appliance and reduced system backup time by 75 percent.
  • IT security company ICS8 chose the DR4100 and shrank its backup footprint by a 31:1 ratio.
  • With the DR4100, Yeovil District Hospital achieved deduplication and compression figures of nearly 90 percent, protecting around 164 terabytes of data in 16 terabytes of disk space.
  • The City of Atlanta, Georgia, used the Quest DR4000 Disk Backup Appliance to back up its Exchange Server comfortably within its backup window, at 70 percent higher data rates.
  • Allergy Partners deployed DR4000 appliances in its primary and secondary data facilities to back up data, retain it for 10 days, replicate it nightly and deduplicate 12 terabytes of data at a significantly lower cost per year.

Figure 5: Backup to DR Series appliance with source-based deduplication

Conclusion

With the growth in the volume of data that companies want to retain, most organizations eventually include secondary storage in their storage strategy as a means of reducing the risk of data loss. With a combination of purpose-built backup appliances, deduplication technology and protocol accelerators, they can address the secondary storage squeeze and shortened backup window to protect their data reliably.

Quest DR Series Disk Backup and Deduplication Appliances and DR Rapid deduplication help them overcome the technical and business problems of secondary storage. Applications can back up freely over all major connection protocols while consuming far less network bandwidth and occupying much less storage space.

About Quest

Quest helps our customers reduce tedious administration tasks so they can focus on the innovation necessary for their businesses to grow. Quest® solutions are scalable, affordable and simple-to-use, and they deliver unmatched efficiency and productivity. Combined with Quest's invitation to the global community to be a part of its innovation, as well as our firm commitment to ensuring customer satisfaction, Quest will continue to accelerate the delivery of the most comprehensive solutions for Azure cloud management, SaaS, security, workforce mobility and data-driven insight.