In the previous six chapters, I've spent a lot of time discussing what works and what doesn't when it comes to network configuration management. I've shown you various processes that you might adopt or modify and begin using in your environment. I've explained some of the underlying technologies that support automated configuration management, and I've discussed different categories of tools that you might be interested in to help automate configuration management in your environment. In this chapter, I'll introduce you to the IT industry's best practices for change and configuration management, and help you understand how they apply more specifically to change and configuration management in network devices.
Most major professional industries have sets of best practices. Accountants, for example, follow Generally Accepted Accounting Principles (GAAP), which are a set of best practices that have evolved over time. Attorneys also have best practices, as do doctors, nurses, and many other professions. IT, however, has evolved at such a fast pace that formal best practices haven't been forthcoming.
ITIL (the IT Infrastructure Library) is essentially the IT industry's GAAP: a set of documented best practices that come from the industry's long experience with IT management. Like GAAP, they're not laws or hard-and-fast rules, but rather a set of common guidelines that have worked for a number of IT organizations over a long period of time. Using ITIL, like using GAAP, isn't guaranteed to keep you out of trouble, but you're a lot less likely to encounter problems in your IT infrastructure by implementing the practices set forth in ITIL.
You can draw some interesting parallels between ITIL and GAAP. GAAP, the set of best practices used by accountants, is simply a set of practices that everybody more or less agrees on as being accurate. They're not laws or formal rules, and they don't cover every given situation. But they're a good idea, and accountants who stay well within the guidelines established by GAAP have a better chance of winning an audit or accounting review, simply because everyone agrees that GAAP is the right way to do things.
We in the IT industry haven't been subject to the kind of intense scrutiny that accountants are. However, the situation might be changing. Security, service availability, management, and other IT aspects can all have a major impact on IT operations and on businesses that rely on IT. It is becoming more common for IT managers to be formally "called on the carpet" to explain costly downtime, security breaches, and so forth. In the past, such incidents might have earned the wrath of a director or even the company CEO; today, they might gain the attention of shareholders and even government regulatory agencies. In fact, many aspects of IT management are already gaining regulatory recognition.
ITIL is the only comprehensive set of documentation for best practices in the IT industry.
Originally published as a set of best practices books by the British Office of Government Commerce (OGC), ITIL has been adopted by a number of companies and organizations across the world. In the United Kingdom, ITIL books can be purchased directly from the government's Stationery Office; outside the UK, a number of independent publishers have licensed the rights to reproduce the ITIL books.
ITIL is organized into several major publications:
The concepts and processes in ITIL are non-proprietary, so you are free to use them however you choose within your organization. Borrow from them, adopt them wholesale, or simply use them as a guide to improving your existing processes.
There is no formal certification process for proving that you're ITIL-compliant, although a number of consulting companies offer services that they claim will ITIL-certify your organization. The OGC has several individual-level certifications designed to demonstrate a person's knowledge and experience with ITIL; there are no organization-level certifications along the lines of ISO 9001.
On the OGC Web site (http://www.ogc.gov.uk/index.asp?id=1000368), OGC notes that it is waiting for the forthcoming BS15000 standard, which will incorporate aspects of ITIL. This standard is being produced by BSI (http://www.bsi-global.com), which serves as the National Standards Body of the UK.
Contrary to popular belief, the ITIL materials are not public domain in the usual sense of the term. Although they are widely available, they are copyrighted by the UK government, and that copyright is recognized under international copyright laws and agreements.
There are several places where you can learn more about ITIL in general. Two useful resources are:
Be cautious, however, about simply searching Google for "itil." Several consulting companies, unaffiliated with OGC, have domain names that include ITIL and go out of their way to look like "official" ITIL Web sites. ITIL publications should run about $150 to $200, so if you're being asked to pay more than that, or being asked to buy them in a package with consulting services, read the fine print carefully.
For this chapter, I'm going to focus entirely on two aspects of ITIL: change management and configuration management. ITIL has very specific definitions for those terms, and defines different processes and activities for them. Although separate, change and configuration management are complementary aspects of IT management.
My goal in this chapter isn't to parrot ITIL or provide you with a complete education in its principles and processes. Instead, I'm distilling the ITIL processes relevant to network device configuration management, and showing you how these processes work in an actual, production-level environment. Where possible, I'll simplify more complex or abstract ITIL concepts to keep them on a relevant, working level.
Although many sources tend to use the terms interchangeably, ITIL makes a distinction between change management and configuration management. In this section, I'll focus on change management; I'll cover configuration management later in this chapter.
ITIL defines change management as a means of controlling all changes that occur within the IT environment. These changes might include updates to software applications, configuration changes to network devices, redesign of the network infrastructure, or even something as simple as changing a backup and restore schedule. The ultimate goal of change management is to accomplish all changes without errors or wrong decisions, so changes never create a negative situation (such as downtime) and there is never a need or reason to roll back or undo a change that has been made.
Figure 7.1 illustrates a simplified version of the ITIL change management process.
Figure 7.1: A simplified illustration of the ITIL change management process.
This process includes several discrete functional areas, including:
In the next several sections, I'll discuss each of these activities in more detail.
Reading through the ITIL documents and related information on the Web can be an exercise in jargon. This is partly because of the many specialized terms and acronyms used in ITIL and partly because the original documents were developed in the UK, where usage differs slightly from American English. Some of the general terms you'll need to keep in mind include:
The OGC maintains an ITIL glossary online at http://www.ogc.gov.uk/index.asp?id=1000369; this online glossary is a useful reference for any unfamiliar terms you come across.
The purpose of change logging and filtering is to apply some sensible precautions to incoming RFCs. Primary goals include categorizing RFCs and allocating resources to handle them. ITIL recognizes that most companies have sufficiently complex bureaucracies already, and suggests several steps, including review time limits, designed to prevent change management from becoming an all-encompassing process that never actually gets anything done. "Shipping is a feature" is a common internal quote heard at many software companies. This thinking makes it clear that all the features in the world are useless if they're not in users' hands. ITIL adopts a similar attitude regarding change—change that is never implemented isn't change at all.
Changes—or, more specifically, RFCs—occur for several reasons:
Obviously, some of these reasons—such as RFCs intended to address an immediate failure—are more urgent than others, and some—such as a connection to a partner network—require significantly more planning than others. The first phase of the change-management process, then, is assessing and allocating changes so that they are handled with appropriate urgency and in the appropriate order. Some RFCs might be rejected as being unsuitable, undesirable, or for other reasons; your process should incorporate an RFC appeals process to upper management for rejected RFCs.
Individuals submitting RFCs should be asked to classify their urgency. Although this classification might not be the final determination of the RFC's treatment, it records the requestor's own assessment of the RFC as input to the review. Suggested urgency levels include:
The CAB or EC needs to either actively agree with the RFC's original priority or modify it appropriately. RFCs that are downgraded in priority are not reviewed further, but are postponed until the appropriate, scheduled occasion for reviewing updates of the new priority level.
When the time comes for a CAB or EC assessment of the RFC, the following factors should be considered:
I strongly recommend that these criteria be made available to individuals who might submit an RFC, and that they be encouraged to try to answer these questions within the RFC itself. Obviously, an emergency call in the middle of the day reporting a failed firewall isn't going to be written up in an RFC; the change request (to change the firewall's condition from "broken" to "operating") might take the form of a Help desk trouble ticket and might not go through any kind of formal review process. That's normal, and will nearly always be the case for these reactive situations.
However, for proactive situations in which the change is not mitigating or correcting some immediate negative condition, a review is a good idea. It lets you place the change into the context of your overall network. Sure, changing a routing table is an easy task, quickly accomplished by any administrator; but the impact on the environment can be significant and the actual benefit of the change might be negligible. The review process might, therefore, prioritize the change as Next Release, lumping it in with another set of changes in a more planned, orderly fashion.
Of course, this entire phase of the process depends heavily on RFCs that contain adequate information for the CAB and EC to make decisions. In some organizations, you might want to assign the task of creating RFCs to your Help desk staff or another organization, allowing them to assist end users and other non-technical personnel with the process of understanding and completing the RFC. To summarize, the information you'll need to collect in each RFC includes:
The reviewing CAB or EC will need to ensure that this information is relatively complete in order to make an assessment. That assessment should also include an assignment of the change's impact: Minor, Moderate, or Major. This impact assessment will affect the next step in the process.
The ITIL model provides an expedited path for urgent changes, allowing them to be immediately and quickly reviewed, built, scheduled, and implemented. ITIL offers the following recommendations for keeping urgent changes a smooth part of the overall change management process:
Ensure that urgent changes are reviewed by the CAB at its next regular meeting to assess the success (or lack thereof) of the change. Whenever possible, changes that negatively impact SLAs should be avoided.
Once changes have been assessed, they can be allocated and worked on. Changes that are considered very minor can generally be built and deployed directly by an administrator who has been delegated that authority by the change manager. For example, a regular change to a device's SNMP community string might be considered a minor change because it has a very low probability of causing negative end-user impact. Even if done incorrectly, it will impact only management operations and can be easily corrected or rolled back, if necessary. Regardless, the results of these changes should be documented and reported back to the CAB for future analysis.
More serious changes—that is, changes that have a broader potential impact, a longer build time, or affect a larger number of devices—should be reviewed by the CAB prior to implementation. Doing so will help ensure that the change is prioritized correctly and that disaster recovery plans are in place to recover from an error.
Changes with a major impact should be first reviewed by upper management, then passed to the CAB for scheduling and implementation. Such changes include sweeping infrastructure changes, long-term implementation projects, and so forth. Figure 7.2 illustrates the categorization portion of the simplified ITIL process.
Figure 7.2: Categorizing change impact.
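The categorization described above can be sketched as a small routing function. This is only an illustration: the `Impact` enum, the `review_path` function, and the step names are hypothetical, and treating the "more serious" changes as the Moderate level is my reading of the three impact levels rather than something ITIL spells out.

```python
from enum import Enum


class Impact(Enum):
    """The three impact levels assigned during assessment."""
    MINOR = "minor"
    MODERATE = "moderate"
    MAJOR = "major"


def review_path(impact: Impact) -> list[str]:
    """Return the ordered handling steps for a change of the given impact."""
    if impact is Impact.MINOR:
        # A delegated administrator builds and deploys the change;
        # the results are still reported back to the CAB afterward.
        return ["delegated administrator", "report results to CAB"]
    if impact is Impact.MODERATE:
        # Broader impact: the CAB reviews before implementation.
        return ["CAB review", "schedule and implement"]
    # Major impact: upper management first, then the CAB.
    return ["upper management review", "CAB scheduling", "implement"]
```

For example, `review_path(Impact.MAJOR)` starts with the upper management review, whereas a minor change never waits on a CAB meeting.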
ITIL recognizes that the formal creation of a CAB and EC, implementation of a formal change-management process, and introduction of the other management-related layers of the process can quickly create an undesirable bureaucratic layer in your organization. ITIL therefore provides recommendations for keeping the process a common-sense effort:
As I pointed out earlier, acknowledging that the purpose of change management is to facilitate change, rather than make it an exercise in paperwork and meetings, will help keep the process trim and usable. If you find that your CAB is spending more than a couple of hours a month reviewing changes, you need to seriously examine the nature of those changes; perhaps it's possible to roll them into a larger, more comprehensive project that can be treated as a single major change to the infrastructure.
Once reviewed by the CAB, the change is considered accepted (unless, of course, the CAB rejects it) and enters the more straightforward implementation portion of the change-management process. Figure 7.3 illustrates this portion of the process, which includes scheduling, building, and implementing the change.
Figure 7.3: Scheduling, building, and implementing the change.
Whenever possible, changes should be packaged together into releases. A release is simply a group of changes (or introductions of new CIs) into the environment. The IT industry has a sort of general distaste for releases, often feeling that making too many changes at once is asking for problems. This feeling comes from times when IT professionals have introduced a series of changes, only to have all of them go wrong—meaning they're now swamped in mitigation activities. After that horrible experience, they introduce changes slowly—one at a time—and fix problems as they come up.
Had the changes been properly planned and reviewed in advance, they would have been much less likely to cause problems. Packaging them together into a bundled release would have given simple human error only one opportunity to cause a problem. Another way to look at it: Every time you open something for changes, you might click a wrong button or take some other unintended action and cause a problem. Bundling changes into a release provides fewer opportunities for ancillary complications. A configuration management tool that helps deploy changes can also reduce error by consistently applying changes to multiple devices with much less possibility of wrong buttons being clicked or incorrect commands being entered.
However, release size should be kept reasonable. Introducing 300 new changes in a single morning might, for example, be beyond the capability of your support organization. Help desks are often swamped with calls after changes are made to any visible portion of the IT infrastructure: Users distrust change and often become confused when confronted with new processes and procedures. Thus, intelligently packaging your releases into a manageable size will help keep any subsequent support issues that are not error-related at a manageable volume.
The CAB and EC play an important role in scheduling. Lower-priority RFCs might get bumped to later and later releases in order to make room for higher-priority changes, while keeping the overall release at a manageable size. The CAB must maintain the balance between the business's needs and the technical issues resulting from changes. All such decisions should be documented, providing subsequent CAB reviews of each RFC with some context of why the RFC is where it is in the process. Figure 7.4 shows a sample release schedule, visually depicting RFCs that have been bumped to later releases.
Figure 7.4: Graphical release schedule showing delayed RFCs.
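To make the scheduling idea concrete, here is a minimal sketch of how RFCs might be packed into size-limited releases, with lower-priority RFCs falling into later releases. The `plan_releases` function, the tuple layout, and the convention that a lower number means a higher priority are all assumptions made for illustration; a real CAB would weigh business needs that a simple sort cannot capture.

```python
def plan_releases(rfcs, max_per_release):
    """Assign RFCs to successive releases, highest priority first.

    rfcs: list of (rfc_id, priority) tuples; a lower number means a
          higher priority (a hypothetical convention).
    max_per_release: the largest number of changes the support
          organization can absorb in one release.
    Returns a list of releases, each a list of RFC ids.
    """
    if max_per_release < 1:
        raise ValueError("max_per_release must be at least 1")
    # sorted() is stable, so RFCs of equal priority keep submission order.
    ordered = sorted(rfcs, key=lambda item: item[1])
    ids = [rfc_id for rfc_id, _ in ordered]
    # Slice the ordered list into chunks of at most max_per_release.
    return [ids[i:i + max_per_release]
            for i in range(0, len(ids), max_per_release)]
```

Calling `plan_releases([("RFC-1", 3), ("RFC-2", 1), ("RFC-3", 2), ("RFC-4", 1)], 2)` places RFC-2 and RFC-4 in the first release and bumps RFC-3 and RFC-1 to the second.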
Change production is one area in which ITIL falls short in terms of best practices. ITIL suggests that authorized changes be passed to technical personnel, who actually develop the change. In the case of network devices, that usually entails building a new configuration file for a device, specifying new hardware, or something similar. Rollback procedures (called back-out procedures in most ITIL documents) must be documented at the same time; an automated network device configuration management solution can often provide built-in configuration rollback capabilities in the case of device configuration changes. However, for more hardware-level changes, such as implementing a new network segment, be sure some physical rollback plan is in place.
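For device configuration changes, the back-out idea amounts to keeping every approved configuration so that the most recent change can be undone. The sketch below is a toy model, assuming configurations are held as plain text in memory; the `ConfigHistory` class and its method names are hypothetical and do not describe any particular product's rollback feature.

```python
class ConfigHistory:
    """Keep each approved configuration so the last change can be backed out."""

    def __init__(self, initial: str):
        # The list always holds at least the baseline configuration.
        self._versions = [initial]

    @property
    def current(self) -> str:
        """The most recently applied configuration."""
        return self._versions[-1]

    def apply(self, new_config: str) -> None:
        """Record a newly deployed configuration on top of the history."""
        self._versions.append(new_config)

    def back_out(self) -> str:
        """Discard the latest change and return the restored configuration."""
        if len(self._versions) == 1:
            raise RuntimeError("no earlier configuration to restore")
        self._versions.pop()
        return self.current
```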
ITIL also suggests that the change be tested, which is of course a good idea. ITIL's idea of testing, however, is to more or less maintain a mirror of the production environment for testing. This suggestion is completely reasonable in the case of software applications—which is, in fact, where most of ITIL's change management process comes from—but wholly impractical in the case of physical network devices.
ITIL also neglects a formal peer review process. In the absence of practical testing in a mirrored test environment, peer review is absolutely essential to help ensure that changes don't contain errors or create unexpected conditions. Figure 7.5 shows an illustration of a modified simplified ITIL process that incorporates an iterative peer review process—something I've discussed at length in previous chapters—in place of the ITIL testing step.
Figure 7.5: Incorporating a peer review into the process.
Keep in mind that an effective network device configuration management solution can help facilitate peer review by making proposed configuration file changes available to the reviewer and incorporating a workflow process that helps ensure that all changes are reviewed and approved before being deployed by the solution.
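One simple way to put a proposed change in front of a reviewer is a line-by-line diff of the running and proposed configurations. The sketch below uses Python's standard `difflib` module; the `config_diff` function name, the device name, and the sample configuration lines are made up for illustration.

```python
import difflib


def config_diff(current: str, proposed: str, device: str) -> str:
    """Return a unified diff of two configuration texts for a reviewer.

    An empty string means the two configurations are identical.
    """
    diff = difflib.unified_diff(
        current.splitlines(),
        proposed.splitlines(),
        fromfile=f"{device} (running)",
        tofile=f"{device} (proposed)",
        lineterm="",  # lines were split without newlines, so add none back
    )
    return "\n".join(diff)


if __name__ == "__main__":
    running = "hostname edge1\nsnmp-server community old RO\n"
    proposed = "hostname edge1\nsnmp-server community new RO\n"
    print(config_diff(running, proposed, "edge1"))
```

The reviewer sees only the lines that differ, with a little surrounding context, which keeps the review focused on what is actually changing.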
Finally, with the change built and approved, it is ready to be deployed. The change manager's job is to ensure that changes are deployed on schedule and that the change is deployed successfully. The change manager should also ensure that communication channels exist to inform support personnel of exactly when a change will occur and to confirm that it has been implemented in the production environment.
In the case of network devices, because full-scale testing isn't often practical, the CAB or EC should make a recommendation for the change's deployment timeframe. Changes considered moderate or major in impact might only be deployed during off-peak hours, ensuring minimal production impact in the event of a problem. Minor changes or those of an urgent nature might be cleared for deployment during regular working hours or during peak network usage.
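A deployment-window rule like this one is easy to express as a check. The following is a sketch under stated assumptions: the off-peak window of 22:00 to 05:00 is invented for the example, the function names are hypothetical, and the rule that minor or urgent changes are always cleared is a simplification of what a CAB or EC would decide case by case.

```python
from datetime import time

# Hypothetical off-peak window; adjust to local traffic patterns.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(5, 0)


def is_off_peak(t: time) -> bool:
    """True if t falls inside the off-peak window, which spans midnight."""
    return t >= OFF_PEAK_START or t < OFF_PEAK_END


def may_deploy(impact: str, urgent: bool, t: time) -> bool:
    """Decide whether a change may be deployed at time of day t.

    impact is one of "minor", "moderate", or "major".
    """
    if urgent or impact == "minor":
        # Cleared for regular working hours in this simplified rule.
        return True
    # Moderate and major changes wait for the off-peak window.
    return is_off_peak(t)
```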
I can't stress enough the important role that constant communication plays in the overall process. I've been in a number of environments in which lower-tier support personnel took corrective actions with the belief that an announced change had already been implemented, when in fact it had been delayed. Communications need to occur whenever:
If you use email to make these announcements, send them all from a single email address or alias. Doing so will allow technical support personnel to implement local rules on their email client to color-code announcements or highlight them in some other way so that the messages stand apart from the usual flow of email in the business. Encourage personnel to implement these rules so that important change-related announcements aren't "lost in the shuffle."
The makeup of your CAB (and its subset EC) is extremely important. First, understand that the CAB is intended to be a somewhat dynamic organization: If you're reviewing changes related to the network infrastructure, the CAB should be composed of people who understand the business and technical issues of that infrastructure.
Because the ITIL change-management process is intended to address all aspects of change, including software, servers, client computers, infrastructure, and more, ITIL assumes that a separate CAB will exist for each (although each CAB might have overlapping members, of course).
The CAB should be small. The change manager, one or two business-savvy members, and a couple of senior technical professionals should be sufficient. The goal of the CAB is to meet, quickly review pending RFCs—remember, just a couple of hours every 2 to 3 weeks should be sufficient—and move on. A large CAB will almost invariably result in too much "management by committee," which will simply defeat the change-management process and result in a motionless bureaucracy.
In the ITIL framework, coordinating change is the responsibility of the change manager in your organization. Coordinating change is one part change management and one part project management, as Figure 7.6 shows.
Figure 7.6: Change and project management work together.
The basic change management process consists of four tasks that involve actual change management:
These are the primary responsibilities of the change manager. Other management steps occur, but are more technical in nature and classified as project management:
Although these tasks will often be undertaken by someone other than the change manager, Figure 7.6 shows that the interaction between change and project management is frequent. The individuals managing these two activities need to communicate constantly to ensure the proper flow of the overall change-management process.
The last step in the change-management process is a post-mortem, or final review of how the change implementation went and how well the process worked. You should consider a number of criteria in your post-mortem:
A basic report should be made available to the CAB or EC, and in the case of more major changes, to upper IT management. This report should summarize the original goals of the change, detail how the implementation occurred, and list any problems along with, if possible, their reasons.
Listing the reasons for an unsuccessful change shouldn't come down to finger-pointing. Reporting that "John messed up the BIOS flash" isn't helpful unless you're trying to fire John; reporting that, "Administrator error caused the wrong BIOS image to be downloaded" is useful information. You might use that information in future changes to, for example, have a peer verify that the correct BIOS image has been selected prior to downloading.
Auditing and reporting is an important part of the change-management process in ITIL. Management reporting is essential to measure the effectiveness of the change-management process. Some overall goals of this process include:
Auditing plays an important role in ensuring that the process works effectively. Auditing is a chance to objectively review everything about the process, including changes that might have occurred outside the process. Auditing can provide a frank view of the process's failures and successes, and an opportunity to revise and refine the process for continual improvement.
Auditing should focus on randomly selected RFCs, formal records of changes, minutes from CAB and EC meetings, change implementation schedules, and the records and reports associated with closed and completed RFCs. The goal of auditing should simply be to highlight areas in which the formal change-management process wasn't followed, allowing the change manager to put more focus on maintaining the integrity of the process.
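Selecting RFCs at random, rather than letting the auditor choose, helps keep the audit objective. A minimal sketch follows, assuming RFCs are identified by simple strings; the `pick_audit_sample` function and its optional `seed` parameter (useful only for reproducing a past selection) are hypothetical.

```python
import random


def pick_audit_sample(rfc_ids, sample_size, seed=None):
    """Return a random selection of RFC ids to audit, without repeats.

    If fewer RFCs exist than requested, all of them are returned.
    """
    rng = random.Random(seed)  # a private generator; seed=None means unseeded
    ids = list(rfc_ids)
    return rng.sample(ids, min(sample_size, len(ids)))
```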
ITIL recognizes that change management won't be universally accepted or successful, at least not at first. Auditing plays a role in keeping things on track, but understanding why change management can fail as a process can help you prevent problems. Common reasons include:
Be sure you plan for these and any other challenges in your environment. Solicit feedback from users, managers, and administrators to find out where your obstacles will be, and start preparing to work through them.
In the ITIL world, configuration management is somewhat more analogous to asset management. Its goal is to control CIs (such as network devices), continuously confirm their status, and audit them to ensure that they remain configured properly. There are actually four primary steps to the ITIL configuration management process:
Because ITIL configuration management is really a form of asset management, I'm going to cover it only briefly. The ITIL change-management process really focuses on the internal software configuration of network devices and is the most appropriate model to use when developing a process to manage those devices. Configuration management (ITIL-style) certainly plays a role, but it's a much less complex process.
This step requires you to positively identify each asset, or CI, under your control. On small networks, this task is easy; on larger networks, you might want to use an automated network configuration management tool to help identify and inventory resources for you.
Your inventory should be comprehensive and include device type, name, model, revision or build level, installed options (such as memory, expansion cards, and so forth), serial number, and so on. Ideally, your inventory should be detailed enough that a non-technical individual could acquire an exact replacement unit should the need arise.
The goal in this step is to ensure that CIs are not altered or replaced without authorization. That authorization should come through your change-management process, which I described earlier in this chapter. Ideally, you should implement an automated solution, such as a network configuration management solution, that can inform you of unauthorized changes to CIs.
You need a means to continually confirm the status of your network devices. Automated tools can help with this step, verifying that CIs haven't changed since the inventory or last approved change.
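One way an automated tool might confirm status is to compare a fingerprint of each device's current configuration with the fingerprint of its last approved configuration. The sketch below assumes configurations are available as text keyed by device name; the function names are hypothetical, and ignoring trailing whitespace is a design choice of this example, not a requirement.

```python
import hashlib


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 hex digest of a configuration.

    Trailing whitespace on each line is ignored so that cosmetic
    differences do not register as changes.
    """
    normalized = "\n".join(line.rstrip() for line in config_text.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_devices(approved: dict[str, str], current: dict[str, str]) -> list[str]:
    """Names of devices whose current config differs from the approved one.

    A device that has an approved config but no current config is
    also reported.
    """
    drifted = []
    for name, approved_text in approved.items():
        current_text = current.get(name)
        if current_text is None or fingerprint(current_text) != fingerprint(approved_text):
            drifted.append(name)
    return sorted(drifted)
```

Any device the function reports would then be checked against the change schedule to see whether an authorized change explains the difference.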
Auditing should be a manual process that matches physical devices to your inventory and change schedule. This step ensures that the process is being observed and that your on-hand assets match your configuration management database.
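Although the audit itself is manual, reconciling its results against the inventory can be a simple set comparison. This sketch assumes each device is identified by its serial number; the `reconcile` function is hypothetical.

```python
def reconcile(inventory_serials, observed_serials):
    """Compare recorded serial numbers with those found during a physical audit.

    Returns a (missing, unrecorded) pair: serials in the inventory that
    were not found on site, and serials found on site that the
    inventory does not list.
    """
    recorded = set(inventory_serials)
    observed = set(observed_serials)
    return sorted(recorded - observed), sorted(observed - recorded)
```

For example, `reconcile(["A1", "B2", "C3"], ["B2", "C3", "D4"])` reports A1 as missing and D4 as unrecorded; both results call for follow-up through the change-management process.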
As I mentioned earlier, there are several consulting companies that will be happy to assess your current configuration and change-management practices to help you direct them more toward the ITIL standard. None of these represent any kind of formal process certification along the lines of ISO 9001, but they can still be useful. You can also purchase self-assessment packages, which usually cost about $200 and include several spreadsheets that walk you through a sort of interview in which you rate various aspects of your current processes. The product then scores you for ITIL compliance and offers pointers for improving your processes by using the ITIL standards.
To get you started on a manual assessment, I'll walk you through a brief assessment that I've used with consulting clients in the past. This overview isn't intended to be a complete ITIL assessment (it focuses mainly on change management, not the entire IT services discipline), but it will hopefully help you highlight areas of your change-management process that need focus for improvement or formalization.
Ask yourself the following questions. For each "yes" answer, give yourself one point; for each "no" answer, give yourself zero points:
Ask yourself the following questions. For each "yes" answer, give yourself two points; for each "no" answer, give yourself zero points.
Ask yourself the following questions. For each "yes" answer, give yourself one point; for each "no" answer, give yourself zero points.
How did you do? If you scored less than four points for your change filtering process, you probably need a more solid, ITIL-style process for accepting, filtering, and categorizing change. Implementing such a process is the first step to a change-management process, because the change filtering step controls all input to that process. As the old saying goes, "garbage in, garbage out"; without filtering the input to your change-management process, you can't expect it to succeed.
If you scored less than 10 points in the change implementation category, you probably don't have a formal change-management process in place, or your process is incomplete, not enforced, or poorly defined. Using the examples in this guide to define a formal change-management process and ensuring that all changes are passed through that process will help you achieve the benefits of greater uptime, greater security, greater stability, and reduced cost and effort.
If you scored less than three points in the review category, you're not devoting sufficient time toward ensuring that your process works, ensuring that your process is followed, and applying lessons to future effort. Without an adequate review and audit process, your entire change-management process might as well not exist.
The maximum overall score was 25. If you scored less than 20, you probably don't have a robust change-management process. If you scored less than 15, the process you do have, if any, is probably incomplete enough as to be a waste of your time. A score of less than 10 indicates that absolutely no formal process is in place or that the process in place isn't followed consistently or applied for best effect.
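The thresholds above can be collected into a small scoring helper. This is a sketch only: the `interpret_scores` function and the wording of its findings are my own shorthand, and the only validation it performs is that scores are non-negative and total no more than the stated maximum of 25.

```python
def interpret_scores(filtering: int, implementation: int, review: int) -> list[str]:
    """Apply the assessment thresholds to the three category scores."""
    total = filtering + implementation + review
    if min(filtering, implementation, review) < 0 or total > 25:
        raise ValueError("scores must be non-negative and total at most 25")
    findings = []
    # Per-category thresholds.
    if filtering < 4:
        findings.append("change filtering needs a more solid process")
    if implementation < 10:
        findings.append("change implementation process is missing or incomplete")
    if review < 3:
        findings.append("review and audit effort is insufficient")
    # Overall bands, checked from the lowest score upward.
    if total < 10:
        findings.append("overall: no formal process, or not followed consistently")
    elif total < 15:
        findings.append("overall: process too incomplete to be useful")
    elif total < 20:
        findings.append("overall: process is not robust")
    return findings
```

An empty list means no threshold was tripped; a score of 4, 10, and 3 passes every category check yet still totals 17, which falls short of a robust process overall.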
The IT industry is a challenging, always-changing one, which makes it difficult to develop best practices. Authors can preach good ideas based on long experience and lessons learned, but it's easy to disregard such advice as coming from one source without experience in your particular environment. ITIL, however, is the result of years of experience by hundreds of practitioners in a variety of industries and organizations. It's a formal set of best practices, documented and detailed, and applicable to almost any IT effort. This chapter has provided you with an overview of ITIL's change and configuration management best practices as they apply to network device configuration management; hopefully you'll be able to use this information as a formal starting point for your own change and configuration management processes.
In the next chapter, I'll finish this guide by going over several sample processes that you might adopt. These will be based in large part on the previous chapters of this book and the ITIL best practices, with each process tweaked to meet slightly different working conditions.