Strategies for Upgrading a Production Data Center

With an existing data center, there is ongoing pressure to maintain an environment that is cost effective and productive. As technologies evolve and external factors affect the costs of doing business, IT executives have to stay on top of the technologies that can make their existing facilities as efficient as possible. Understanding these technologies and how they can impact the efficiencies of your data center provides a firm foundation for making choices about the future of your data center infrastructure.

Upgrading an existing data center means making the right choices to make your data center more energy efficient. As we have covered in earlier chapters, both technological and organizational efforts can be applied to achieve this end. The tasks involved in upgrading your data center fall into two general categories: those concerned with the physical environment, such as layout, power, and cooling, and those that focus on the overall process, such as management and adherence to best practice standards.

Best Practices for Data Center Cooling and Power Migration

Like any major business project, it is important to develop or adopt a set of best practices for the process of upgrading or migrating your data center power and cooling infrastructure. This guide has attempted to give you the basics you need to start your research and development process for your data center upgrade.

Making Your Data Center More Energy Efficient

This is the bottom line. When making your plans, evaluating technologies, and looking for hardware and software solutions, the goal for each item is to increase the efficiency of the data center. Keep the cumulative impact of planned changes and additions in mind, and watch for different people or groups working at cross purposes as they evaluate solutions and make purchasing decisions. To keep everyone on the same path, it is critical to have a plan; standardizing the process is what makes a successful project far more likely.

Defining the Project

There is a wide variety of tasks and sub-projects that can make up an upgrade project. It is important to make sure that all involved are on the same page and that the individual project milestones are well defined and lead toward the same goals.

Defining the Goals

Although the end goal is clear, the interim goals are incredibly important. The core issue of upgrading an existing data center is keeping existing services up and running. Thus, interim goals that allow effective staging of the upgrade process are critical. It is possible that energy consumption will increase at some points in the process as redundant systems are brought online before legacy systems are retired. But the end result of the process should be a more effective and efficient data center power and cooling architecture.

The Costs of Not Upgrading

As power costs continue to increase, there will be a hidden tax on inefficient data centers.

This wasted money is difficult, in many cases, to quantify, but it is a problem nonetheless. With tight IT budgets the norm, money lost to an inefficient data center is a drain that is difficult to justify. Beyond the simple cost issues, there are potential problems that can arise from failing to update your data center.

The most significant is the inability to deliver the services and technologies that the business will demand in the near future to remain competitive or to create a competitive advantage. With the changes in IT technology resulting in much more dense data center IT loads, it is critical that IT be able to support those loads and have an agile and adaptable data center infrastructure. The inability to deliver the necessary power and cooling services in the data center will prevent the business from taking advantage of the technological edge that efficient IT can deliver.

Layout & Cooling

With your existing data center, it is most likely you currently have the traditional data center cooling model: a large room, possibly with a raised floor, where the temperature and humidity are maintained at a specific level throughout the space. The problem we face is that when this model was originally implemented, the data center was a fairly static place. Changes were made slowly, giving the data center staff plenty of time and opportunity to adjust the power and cooling for the space as necessary.

Older data centers face two problems. The first is that they have changed over time: what was once considered state of the art is today a static environment unable to respond to the quickly changing technology landscape the modern data center has become. With rapid changes in data center equipment, along with virtualization, on-demand provisioning, and server consolidation, the demands on the power and cooling infrastructure have changed dramatically, and today's infrastructure needs to be flexible enough to respond to these ongoing changes.

Second, older data centers often suffer from cooling and power problems that have been introduced by the slow change in IT loads over time. Delivery systems that were once running with excess capacity are now pressed to their limits and regularly operate out of their highest efficiency zones. The physical layout of the server racks that was initially appropriate now impedes the efficient cooling of the space, and the years have wrought changes in the infrastructure that simply don't allow for quick and easy fixes.

Figure 4.1: Cooling patterns change as more demanding equipment is added to the data center.

Changing the layout of an existing data center can address cooling problems and allow more efficient utilization of existing cooling and power. At the same time, it can make room for upgraded equipment, whether for cooling, power, or IT loads.

Efficient Layout of the Data Center

Legacy data centers, in most cases, primarily use the room cooling architecture, as Figure 4.2 shows. More recently designed facilities are more likely to use the row- and rack-based cooling models, which offer greater flexibility, allow increased densities, and improve the efficiency of the data center.

Figure 4.2: The three primary cooling architectures in use. Blue arrows indicate cooling supply paths.

Let's look at the benefits of each model, from the perspective of a potential upgrade to an existing data center:

  • The room cooling architecture—The primary benefit of the room architecture is that, in most circumstances, it is already in place. If poorly planned supply and return cooling paths have been a significant issue in the past, simply reorganizing the equipment layout within the room can yield greater efficiency.
  • The row cooling architecture—Cooling capacity can be targeted at the rows that need it, in the amount each row requires. There is no need for a raised-floor installation, and different rows can have different amounts of cooling capacity applied to them based on actual demand. CRAC losses from cooling large amounts of unrelated space are reduced, lowering overall power consumption and improving energy efficiency.
  • The rack cooling architecture—As cooling and rack are completely integrated, the rack cooling architecture offers the greatest flexibility in deployment and load. The result is minimal loss in terms of CRAC utilization; power goes into cooling the specific rack, and the capacity of the rack‐mounted CRAC is completely dedicated to that role. Much higher density packaging is possible with this model than with either of the others.

It is, of course, also possible to combine all three of these models in a single data center (see Figure 4.3). Changes in the data center may already have added row architecture cooling (in a dedicated area of the data center), in addition to the room architecture cooling already present.

Figure 4.3: All three common cooling architectures in the same data center.

This model is very likely to be an appropriate one for a large existing data center that wants to slowly migrate to the support of higher density IT loads. It allows for the existing infrastructure to be utilized while a more modern and efficient infrastructure is put into place.

Implementing the High‐Density Zone

With the upgrade and migration of the data center, support for new high‐density IT loads can be a major issue. This is why the combined cooling architecture model can be so effective. The key, however, to adding high‐density IT loads to an existing data center, without the expense of completely replacing the data center or upgrading the entire infrastructure, is the high‐density zone.

Figure 4.4: The basic concept of the high-density zone.

High‐density zones have the advantage of concentrating the high‐density IT loads into a smaller area that can be controlled and monitored more closely than if the loads are spread out throughout the data center. To a large extent, the high‐density zone is treated as its own mini‐data center; it has its own cooling and power delivery and is kept separated from the rest of the data center environment. Ideally, it is completely thermally neutral, neither adding to nor easing the existing environmental load of the data center room.
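
Whether a proposed zone is actually thermally neutral can be checked with straightforward arithmetic: essentially all of the IT power drawn inside the zone is rejected as heat, so the zone's dedicated cooling capacity must cover its IT load. The following minimal sketch, using purely hypothetical figures, illustrates the check:

    # Rough thermal-neutrality check for a high-density zone.
    # Assumption: virtually all IT power becomes heat (1 kW of load ~ 1 kW of heat).
    zone_it_load_kw = 60.0           # hypothetical total IT load inside the zone
    zone_cooling_capacity_kw = 70.0  # capacity of the zone's dedicated CRACs

    heat_rejected_kw = zone_it_load_kw  # electrical power in is roughly heat out

    if zone_cooling_capacity_kw >= heat_rejected_kw:
        headroom = zone_cooling_capacity_kw - heat_rejected_kw
        print(f"Zone is thermally neutral with {headroom:.1f} kW of cooling headroom.")
    else:
        deficit = heat_rejected_kw - zone_cooling_capacity_kw
        print(f"Zone adds {deficit:.1f} kW of heat to the room and is not neutral.")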

By using a row-based cooling architecture, it is possible to deploy CRACs and modify the current room architecture to a zone containment model that delivers cooling to the high-density zone and removes the waste heat it produces.

The Benefits of Containment

Containment allows for hot and cold air streams to remain separate, increasing data center efficiency. When implementing a high‐density zone in an existing data center with existing perimeter air conditioning, containment is advised in order to maintain the environmental neutrality of the zone.

Hot‐Aisle vs. Cold‐Aisle Containment

Hot-aisle and cold-aisle containment models are the primary choices for building a high-density zone in an existing data center. As we discussed in an earlier chapter, the hot-aisle technique requires that the aisle be completely contained, with a separate exhaust path for hot air so that it does not contaminate the rest of the data center.

Obviously, this solution (see Figure 4.5) requires a minimum of two rows of racks for the high‐density zone to allow the proper containment to be created. There are no special space requirements otherwise, as the space requirement for an even number of racks is the same as it would be for any pair of low‐density racks in the existing data center.

Figure 4.5: Hot-aisle containment for a high-density zone.

Rack containment has the benefit of not requiring complete pairs of rows for the containment model to work. It can be used for a single rack or single row of racks and is effective in that configuration.

Figure 4.6: Rack containment for the high-density zone.

In the standard rack containment model, the hot air exhaust is contained to the rear of the rack with a front containment panel, and a rear panel or series of panels is used to channel the contained hot air. An optional configuration uses both front and rear containment to allow for a completely standalone high‐density rack.

Utilizing the Pod Model

If part of the data center migration plan includes expanding the data center, it is worth taking a look at the pod concept. In this model, the pod becomes the building block of the data center. Each pod is a complete entity containing:

  • Power
  • Cooling
  • Servers
  • Storage
  • Networking

This setup allows the capabilities of each component of the pod to be maximized, which makes it a very efficient model in terms of resources. Power and cooling are tailored specifically to the IT load of the pod, resulting in minimal waste. Conceivably, different flavors of pods could be designed: servers could be one type of pod and storage another, each with power, cooling, and networking components tailored to its specific needs. Growing the environment would simply mean adding pods of the appropriate type.
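
To make the building-block idea concrete, the sketch below models pods as self-contained units whose power and cooling are sized to their own IT load. The pod names and figures are hypothetical; growth is simply a matter of appending pods without touching those already in place.

    from dataclasses import dataclass

    @dataclass
    class Pod:
        """A self-contained building block: power, cooling, and IT sized together."""
        name: str
        power_kw: float        # power delivery provisioned for the pod
        cooling_kw: float      # cooling capacity dedicated to the pod
        it_load_kw: float      # expected IT load of the pod's servers/storage/network

        def is_right_sized(self) -> bool:
            # Power and cooling are tailored to the pod's own load, not the whole room.
            return self.it_load_kw <= self.power_kw and self.it_load_kw <= self.cooling_kw

    # Hypothetical pod flavors; real figures come from your own designs.
    data_center = [
        Pod("compute-pod-1", power_kw=80, cooling_kw=80, it_load_kw=72),
        Pod("storage-pod-1", power_kw=40, cooling_kw=40, it_load_kw=35),
    ]

    # Growing the environment means appending pods, not re-engineering the room.
    data_center.append(Pod("compute-pod-2", power_kw=80, cooling_kw=80, it_load_kw=75))

    total_it_load = sum(p.it_load_kw for p in data_center)
    print(f"Total IT load across {len(data_center)} pods: {total_it_load} kW")
    for p in data_center:
        print(f"{p.name}: right-sized = {p.is_right_sized()}")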

Pod technology has also proven successful for organizations that provide services, such as application and managed service providers. Some are looking to roll out containerized modular data centers for the ultimate in flexibility and efficiency as IT demand grows.

Cold air containment systems (CACS) allow the data center to make use of an existing room-cooling architecture. They do so by using techniques such as hanging plastic curtains to contain the cool air from a traditional perimeter cooling system in which the cooled air is delivered via a raised-floor plenum (vendors now offer ceiling panels and aisle doors for a much less "home grown" approach to implementing CACS). In this model, the delivery of the cooled air is controlled, and the remainder of the room functions as the hot-air return plenum.

Although this solution is effective at delivering cool air where needed in an existing data center, it is inefficient compared with more modern and aggressive cooling techniques. There is a practical limit to the power density each rack can reach with this model, generally figured at about 6 kW per rack, and it can be very difficult to deliver adequate air pressure to the CACS because of distance and airflow congestion in the raised floor caused by cabling and power runs.
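
A quick way to judge whether CACS is even a candidate is to compare planned per-rack loads against that density ceiling. A minimal sketch follows, assuming the commonly cited figure of roughly 6 kW per rack; the rack names and loads are hypothetical, and the real limit should be verified for your own raised-floor system.

    # Flag racks whose planned load exceeds what a raised-floor CACS can typically cool.
    CACS_DENSITY_CEILING_KW = 6.0  # commonly cited practical limit; verify for your site

    planned_rack_loads_kw = {      # hypothetical planned loads per rack
        "rack-a01": 4.5,
        "rack-a02": 5.8,
        "rack-b01": 9.2,           # e.g., a blade enclosure
        "rack-b02": 12.0,
    }

    for rack, load in planned_rack_loads_kw.items():
        if load > CACS_DENSITY_CEILING_KW:
            print(f"{rack}: {load} kW exceeds the ~{CACS_DENSITY_CEILING_KW} kW CACS "
                  f"ceiling; consider row- or rack-based cooling for this load.")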

Power

The power question is one of being able to deliver sufficient power to the IT loads at the specified level of redundancy so that availability is assured. How this can be done in your data center is a matter of what your existing power infrastructure looks like.

Upgrading Existing Systems

In the initial design of the existing data center, there may have been provisions to scale the UPS systems. By scaling the existing system to meet the current needs, the power capabilities of the data center can be upgraded to a certain point. However, once the legacy system has reached its maximum power delivery capability, the problem of delivering power will once again need to be dealt with. If the choice is made at that point to completely replace the existing system, the expenditure to scale the system will likely have not yet been amortized. Remember that scaling the legacy UPS system is also a matter of scaling the redundant as well as the primary system.
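
Because the redundant capacity must scale along with the primary capacity, it helps to express the check as usable power after the redundant module is excluded. The sketch below shows a simple N+1 calculation; the module rating, frame limit, and load are hypothetical.

    # N+1 capacity check: usable power is what remains after the redundant
    # module is excluded from the count. All figures are hypothetical.
    def usable_ups_capacity_kw(module_kw, modules_installed, redundant_modules=1):
        """Capacity available to the IT load while redundancy is preserved."""
        return module_kw * max(modules_installed - redundant_modules, 0)

    MODULE_KW = 40.0         # rating of each UPS module
    FRAME_MAX_MODULES = 6    # the legacy frame cannot hold more than this

    it_load_kw = 150.0
    installed = 5

    capacity = usable_ups_capacity_kw(MODULE_KW, installed)
    print(f"Usable N+1 capacity: {capacity:.0f} kW for a {it_load_kw:.0f} kW load")

    # Scaling the legacy system means adding modules until the frame is full;
    # beyond that point, the power delivery problem has to be solved again.
    ceiling = usable_ups_capacity_kw(MODULE_KW, FRAME_MAX_MODULES)
    print(f"Ceiling of the legacy system at N+1: {ceiling:.0f} kW")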

Introducing Scalable, Rack‐Based Solutions

Scalable, rack-based UPS systems allow not just the power delivery within a rack to grow but also additional racks to be installed without affecting the power delivery of existing systems. It is entirely possible that upgrading legacy systems will be the less expensive alternative in terms of capital outlay.

The expense, however, isn't just budgetary; upgrading parallel‐bus UPS systems requires shutting down the systems, performing the upgrades, then testing and recommissioning the system before bringing the power to the data center back online. Thus, the data center will go offline for at least 24 hours in a normal upgrade process.

Figure 4.7: Rack-mounted UPS.

Scalable rack‐based systems get around this problem completely by being self‐contained racks within the data center; the worst‐case scenario brings down a single rack while the UPS system within the rack has additional redundant modules added. In many cases, however, the IT load on the rack need never go down as part of this process. Simple UPS upgrades can be done during scheduled downtime or when the workloads are the lightest.

Evaluating the Costs

Hardware costs are a capital expense and can be budgeted for, but downtime is incredibly expensive due to the cascade effect that being offline has on business processes. Upgrading the legacy systems entails significant downtime for the data center. A simple 48 hours of downtime for a complete upgrade of the data center power supply could easily entail costs approaching half a million dollars, and even more in a large enterprise.

With a scalable, rack-based alternative, the cost of upgrading system capacity is likely to be less than 5% of the cost of upgrading a legacy, whole-data-center UPS system. This is a significant cost savings and represents a much more efficient use of IT resources. The same holds true for row-based pods, as discussed earlier: by adding a new zone to handle new IT loads, disruption of the existing IT loads can be avoided.
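
The arithmetic behind that comparison is simple, even though the actual figures vary widely from one organization to the next. The following back-of-the-envelope sketch uses purely hypothetical numbers for hourly downtime cost and capital outlay:

    # Back-of-the-envelope comparison of legacy vs. scalable rack-based UPS upgrades.
    # All figures are hypothetical placeholders; substitute your own estimates.
    downtime_cost_per_hour = 10_000      # cost of the data center being offline, $/hour
    legacy_upgrade_capex = 200_000       # capital cost of upgrading the legacy UPS
    legacy_downtime_hours = 48           # shut down, upgrade, test, recommission

    rack_based_capex = legacy_upgrade_capex * 0.05   # roughly 5% of the legacy cost
    rack_based_downtime_hours = 0                    # IT load typically stays up

    legacy_total = legacy_upgrade_capex + downtime_cost_per_hour * legacy_downtime_hours
    rack_total = rack_based_capex + downtime_cost_per_hour * rack_based_downtime_hours

    print(f"Legacy upgrade total:     ${legacy_total:,.0f}")
    print(f"Rack-based upgrade total: ${rack_total:,.0f}")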

Management

Management of the data center power and cooling infrastructure has traditionally been a somewhat haphazard affair. Although tools existed to manage specific devices, they rarely integrated with each other and almost never provided information about the ongoing behavior of the power and cooling systems or the impact of specific devices on environmental changes. Understanding what caused a specific problem was often a series of "best guess" attempts, and trying to evaluate the potential impact of adding high-density racks at a particular location in the data center was basically impossible.

Changes in the data center infrastructure often led to unexpected problems with power and cooling because data center staff lacked the ability to evaluate even the potential for trouble after IT loads had been moved, changed, or added. Adding the ability to reliably predict the behavior of the data center power and cooling infrastructure is a crucial component of building an energy-efficient data center that is responsive to business needs and requirements.

Capacity Management

Capacity management for power and cooling has traditionally been done at a fairly high level. Data center managers knew what their total capacity was and took pains not to get too close to it, based on the requirements of each piece of equipment within the data center. However, tools for managing the capacity of the data center at a more detailed level were lacking. Individual pieces of equipment often had vendor-specific tools that allowed them to be monitored, but the real need is to monitor the overall capacity of the data center, track environmental changes, and plan for and evaluate the potential effect of different IT loads at different locations.

Figure 4.8: Capacity management is a major component of service delivery.

Capacity management and planning tools need to be able to provide data about ongoing operations of the data center as well as answer fairly specific questions about what is going on in the data center and what the effect of changes would be:

  • What are the best locations to install new equipment?
  • Can new technology be deployed using the existing power and cooling infrastructure?
  • Will relocating blade servers within the data center improve operational conditions?
  • What impact will new equipment have on safety margins for power, cooling, and back‐up runtime?
  • How close to the available limits of power and cooling is the existing configuration?
  • At what point will additional capacity be required?
  • Is the configuration able to maintain power and redundancy even in fault conditions?

To answer these questions, it is necessary to monitor the data center at the appropriate level, and for accurate capacity planning and management, the rack level is the best choice. Monitoring individual racks provides detailed information about a very small area of the data center, and that information is easily aggregated into an overview of the entire data center infrastructure and environment.
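
As an illustration of how rack-level data answers the placement questions listed earlier, the sketch below aggregates hypothetical per-rack readings and filters for racks that can absorb a new load while staying inside a safety margin. The rack names, capacities, and margin are all assumptions.

    from dataclasses import dataclass

    @dataclass
    class RackReading:
        name: str
        power_capacity_kw: float    # breaker/PDU limit for the rack
        power_draw_kw: float        # measured draw
        cooling_capacity_kw: float  # cooling available at the rack's location

    def placement_candidates(racks, new_load_kw, margin=0.9):
        """Racks that can absorb new_load_kw while staying under a safety margin."""
        fits = []
        for r in racks:
            power_ok = r.power_draw_kw + new_load_kw <= margin * r.power_capacity_kw
            cooling_ok = r.power_draw_kw + new_load_kw <= margin * r.cooling_capacity_kw
            if power_ok and cooling_ok:
                fits.append(r.name)
        return fits

    racks = [  # hypothetical readings from rack-level meters
        RackReading("a01", 8.0, 5.5, 8.0),
        RackReading("a02", 8.0, 3.0, 8.0),
        RackReading("hd01", 20.0, 12.0, 20.0),
    ]

    print("Candidates for a 4 kW addition:", placement_candidates(racks, 4.0))
    total_draw = sum(r.power_draw_kw for r in racks)
    total_capacity = sum(r.power_capacity_kw for r in racks)
    print(f"Aggregate utilization: {total_draw / total_capacity:.0%}")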

Figure 4.9: Rack-level management works best for capacity management and planning.

In the past, data centers were often built with hugely over‐designed power and cooling delivery systems. This meant that there was very little need to have a detailed understanding of the data center operational environment. The existing IT technologies would be incapable of outgrowing the over‐sized systems in the data center.

But as time went on, data center IT technology and loads changed, and what was once an over-sized system gradually reached the point where it was taxed to meet the needs of the IT loads. Conversely, many data centers remained under-utilized, operating far below their maximum capacity, yet because of their over-sized supply systems they cost far more to operate than they should. Both of these situations are unacceptable in today's business world. The rapid changes occurring in IT technology, which lead to fairly quick changes in IT equipment within the data center, mean that the inability to monitor and plan for changes within the data center quickly becomes a business liability, negatively affecting IT's ability to respond to business demands.

By implementing rack‐level capacity management, it is possible to accurately monitor, manage, and evaluate the ongoing capacity issues within the data center. For building the most energy‐efficient data center possible, a comprehensive capacity planning solution is critical.

Hardware Management and Monitoring Tools

In addition to capacity management and planning software, physical infrastructure monitoring and management software should be considered a valuable addition to your management suite. These are tools that let you perform centralized management of the physical infrastructure of your data center.

Ideally, these tools will be vendor neutral and allow simple plug and play with devices and equipment within the data center. Capabilities will range from real‐time device monitoring to video surveillance systems and physical security monitoring.

Like any enterprise management application, these tools provide detailed reporting capabilities and allow the user to create alerts and automated processes for managed devices. Integration with a larger enterprise management system is usually accomplished through SNMP, and the infrastructure management tools likewise use SNMP so that any appropriate SNMP-capable device can be integrated into the system.
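
As a concrete illustration of that SNMP path, most rack PDUs and UPSs can be polled directly. The sketch below uses the pysnmp library (API details vary between pysnmp releases) to read the standard sysDescr object from a hypothetical device address with a hypothetical community string; real load or temperature readings would use object identifiers from the device vendor's MIB.

    # Poll an SNMP-capable device (e.g., a rack PDU or UPS) for its system description.
    # The address, community string, and any load/temperature OIDs are assumptions;
    # consult the vendor's MIB for the actual objects to monitor.
    from pysnmp.hlapi import (
        getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
        ContextData, ObjectType, ObjectIdentity,
    )

    error_indication, error_status, error_index, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData("public", mpModel=1),          # SNMP v2c, hypothetical community
            UdpTransportTarget(("192.0.2.10", 161)),     # hypothetical PDU address
            ContextData(),
            ObjectType(ObjectIdentity("1.3.6.1.2.1.1.1.0")),  # sysDescr.0 (standard MIB-II)
        )
    )

    if error_indication:
        print(f"SNMP error: {error_indication}")
    else:
        for oid, value in var_binds:
            print(f"{oid} = {value}")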

Figure 4.10: Infrastructure management tools give IT additional information about all aspects of the data center.

Familiar consoles, like the one shown in Figure 4.10, allow IT staff to quickly get up to speed on the use of the infrastructure management tools. Direct integration with power and capacity planning tools is possible if software and hardware from the same vendor are selected.

Power and Cooling Management as a Component of IT Management Systems

No enterprise IT department is without an overall IT management tool or collection of tools. Depending upon how your IT management is configured and utilized, integration of the data center infrastructure tools may or may not make sense.

At most, it is likely that you would want only a top-level view of the data center delivered into a centralized IT management infrastructure. General status information and problem reports tend to be what gets floated to the top level, as detailed power and cooling status is of little use beyond the needs of the data center management team.

Most IT management tools and consoles are able to accept information from other information-gathering devices and applications. If your data center management tools are from a different vendor than your overall IT management system, you will need to work with the vendor on the specifics of reporting information to the upstream tool.
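
One common pattern for that upstream reporting is to roll detailed device states up into a single top-level status for the enterprise console while keeping the detail local. A minimal sketch, with hypothetical device names and states:

    # Roll per-device states up into one top-level status for an upstream console.
    SEVERITY = {"ok": 0, "warning": 1, "critical": 2}

    device_states = {        # hypothetical states reported by infrastructure tools
        "ups-row-a": "ok",
        "crac-03": "warning",
        "pdu-b07": "ok",
    }

    worst = max(device_states.values(), key=SEVERITY.get)
    print(f"Data center status forwarded upstream: {worst}")
    print("Detail kept locally:", {d: s for d, s in device_states.items() if s != "ok"})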

Server Upgrades, Consolidation, and Virtualization

Throughout this guide, we have made regular mention of the rapidly changing technology that directly affects the data center. As we discussed in the first chapter, the technologies that have the most significant impact are consolidation, virtualization, and the server hardware technology that provides the greatest support for the software technologies.

How Implementing Newer Server Technologies Can Improve Your Overall Energy Efficiency

With the technology emphasis on "greener" computing, current-generation server hardware, while not using any less power than its predecessors, uses that power more efficiently. A storage rack that previously drew a given amount of power still draws the same amount but now supports two to three times the storage. Multiprocessor servers still draw the same power, but now each CPU socket contains a multicore processor that significantly increases the processing power of the server.

Server consolidation and virtualization take better advantage of the hardware. The previous tendency of server hardware to spend most of its time sitting idle has been greatly reduced, as software technologies that allow a server to do more work have increased the IT load each server carries.

Hardware

As IT departments have moved to higher-density servers, the blade server has become commonplace in the data center. And as technology has advanced, rack-mounted and blade servers now sport multi-core processors, allowing the same space in the data center to do significantly more work.

The downside of this increase in work is an increase in heat: hotspots become commonplace wherever new technologies such as multi-processor, multi-core servers are racked. Although it is possible to build very efficient systems in terms of IT workload relative to power and cooling consumption, implementing these servers means that a detailed understanding of the cooling and power requirements of the data center must be maintained.

Software

Virtualization and consolidation have become the hottest topics in IT. With virtualization commonplace, higher-performance servers are far better utilized, and IT is able to deliver improved services and be more flexible in the deployment and provisioning of new services. Because this can now be done without adding servers to the data center, the basic task returns to supporting the IT load on the servers already in place. As the data center is provisioned to meet those demands and techniques such as right-sizing are applied, the combination of server virtualization and the ability to efficiently match the power and cooling needs of these data center hotspots reduces the overall cost of data center operations.
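
The power side of that payoff is easy to approximate: lightly loaded physical servers draw a large fraction of their peak power even when idle, so folding many of them onto a few well-utilized virtualization hosts cuts the total draw. A rough sketch with hypothetical figures:

    # Rough estimate of power saved by consolidating underutilized servers onto
    # virtualization hosts. All figures are hypothetical placeholders.
    legacy_servers = 40
    legacy_avg_draw_w = 300          # mostly idle, but still drawing significant power

    hosts_after_consolidation = 5    # e.g., an 8:1 consolidation ratio
    host_avg_draw_w = 600            # larger hosts running at higher utilization

    before_kw = legacy_servers * legacy_avg_draw_w / 1000
    after_kw = hosts_after_consolidation * host_avg_draw_w / 1000

    print(f"Estimated IT load before: {before_kw:.1f} kW, after: {after_kw:.1f} kW")
    print(f"Direct IT savings: {before_kw - after_kw:.1f} kW "
          f"(plus the cooling power no longer needed to remove that heat)")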

Conclusion

The decision to upgrade your data center or migrate to new power and cooling technologies is a critical one that affects the future of the business. Thus, the decision, process, and planning stages all require detailed input from both the technology and business sides of the organization. Accurate planning for future growth and a dedication to doing it right will be important aspects of this process.

Stop-gap actions and a band-aid approach to the issues of power and cooling in your existing data center are a recipe for disaster. Although there may be no specific point at which systems stop working or become unavailable, the inability to respond quickly to a rapidly changing business environment has an ongoing negative effect on the business. Upgrading the data center to allow this type of flexibility and efficiency—without sacrificing availability—provides the strongest possible base for future business development.