The information technology (IT) in an organization is a dynamic resource that is constantly adapted to meet changing needs. The rational practice of controlling that change is known as systems management. As with many other organizational practices, systems management has evolved from informal ad hoc responses to immediate needs to a well-understood, formalized practice. This guide examines best practices for systems management with an emphasis on a modularized approach known as service-oriented management (SOM).
The responsibilities of systems administrators and IT managers are growing in complexity. The need to support a growing number of systems with increasing dependencies between those systems, meet growing quality of service (QoS) expectations, and be prepared for constant security threats are just a few of the challenges faced by systems managers. Fortunately, as the demands have expanded so too have the tools and practices for meeting those demands. The purpose of this guide is to help managers and administrators apply these practices and tools to their specific systems management challenges.
This chapter presents a high-level overview of the nature of systems management with a discussion of three aspects of the discipline:
Let's begin with a fundamental issue for all IT operations—aligning with business objectives.
Systems management serves many purposes. To a systems administrator, systems management is about keeping servers up and running, keeping databases responsive to user queries, and ensuring the network continues to function. To IT managers, systems management is the means to another set of ends, including meeting service level agreements (SLAs), controlling operations costs, and meeting production schedules. To executives, systems management is an area for controlling the integrity of information and maintaining compliance with any number of regulations that may apply to the business.
There is quite a bit of overlap between the different perspectives on systems management, and the key purposes include four fundamental objectives:
These objectives provide the measures and criteria to which the systems are maintained and controlled. These are the goals of systems management.
The objective of business alignment is to ensure that the information processing needs of lines of business (LOB) are met by IT applications and infrastructures. This sounds so logical that it seems like common sense—and it is. Unfortunately, the constraints of real-world organizations present significant challenges to realizing business alignment.
The challenge with business alignment is not in understanding the need for it or even convincing business or IT personnel about its importance; the challenge is with the execution. Three common problems encountered with business strategy execution are:
In many cases, IT managers have to address one or more of these in the course of their systems management work.
The first challenge is articulating a coherent business strategy. Again, what seems simple at first glance is actually quite involved. Businesses have the general goal of maximizing revenue. They do so by implementing strategies related to product development, acquiring market share, forming partnerships, and so on. Large and even midsized organizations constantly run into the problem of multiple perspectives on a single strategy.
Consider an example of an insurance company selling auto insurance. The company's strategy may include building market share in the northeast and mid-Atlantic regions of the United States. Marketers in corporate headquarters might want to initiate a set of campaigns to build market share that may conflict with the objectives of executives responsible for third-party brokers in that region. Similarly, the Underwriting department may want to exclude a certain demographic group, such as male drivers under 32, even though that is one of the demographic groups that Marketing would like to target. Add to these issues the fact that the business case for a particular strategy is based on revenue projections that include measures such as profit margins that are calculated differently by different LOB. Before IT operations can align with business strategy, the strategy must be well defined.
A second difficulty with business alignment is meeting complementary needs with a minimum number of applications. Continuing with the insurance example, both internal sales staff and third-party agents may need to use the same policy origination system to sell insurance. Internal sales staff may use direct mail and telemarketing as the primary means to generate sales. They work in company offices and prefer a client/server model for the policy origination system. Third-party agents, however, sell products from multiple insurers and have no interest in having a client application installed on their computers from each of their partners. Web-based applications are preferable to them because applications can be downloaded on demand, do not require permanent installations, and entail fewer maintenance issues for users. Application developers are now faced with either maintaining two interfaces, one client/server and one Web, or trying to implement full client/server functionality in a Web application. The application interface is just one example of juggling multiple requirements.
In addition to the transaction-oriented operations that involve large numbers of relatively simple read-and-write operations, such as creating policies and submitting claims, users need management reporting to understand the overall state of business. Claims adjusters may want to understand trends in claims by policy type, by characteristics of the policy holder, and by geographic region. This type of aggregate reporting is common but functions best under a different set of database design models than transaction-oriented systems. Management reporting, or business intelligence reporting, works best when done with operational data stores for short-term integrated reporting and data marts and data warehouses for historical reporting and trend analysis. Here again, IT is left to create, maintain, and manage another set of systems.
Business must constantly respond to changes in the marketplace. Some of these changes are relatively slow, such as the move from strictly internal combustion powered cars to hybrid cars, some are more moderately paced, like the shift to buying downloadable music from buying it on discs, and still others are rapid changes, such as price spikes in the cost of petroleum products. How effectively an organization can respond to these changes is dictated, in part, by the organization's ability to change its IT systems. Consider some of the roles IT applications play in adapting to market changes:
In addition to these common business dynamics, there are the expected but unpredictable events, such as natural disasters, that can disrupt operations and shift priorities.
There will always be a time delay between changing a business objective and making the necessary changes to implement that in IT infrastructure and procedures. The length of the delay, though, can have a qualitative impact on the ability to execute the new business operations. As Figure 1.1 shows, IT responses may take so long that not long after they are implemented, new changes are required. In the worst case, the modifications are not finished before the next round of changes is defined.
Aligning IT operations with long-term business strategies and shorter-term objectives is a process; it is not a static state that is ever reached. The goal of IT should be to minimize the time it takes to align with business objectives, and well-developed systems management practices are fundamental to reaching that goal.
Figure 1.1: Time delays between the change of business strategy and the ability of IT to implement slow the ability of organizations to adapt to changing market conditions.
IT and business alignment is based on a number of assumptions, including the technical integrity of the IT infrastructure.
Technical integrity is the quality of information systems that ensures data is accurate, reliable, and not subject to malicious or accidental changes. Like business alignment, technical integrity is one of the characteristics that are so logical that we take it for granted. Systems administrators do not take it for granted, though. Consider some of the challenges to maintaining technical integrity:
Each of these categories of threats can create substantial disruption to IT operations.
Let's face it, software has bugs. Complex software is difficult to build and programmers know this all too well; an old quip among programmers is that "if builders built buildings the way programmers built programs, one woodpecker would destroy civilization." Although this statement is an obvious exaggeration, the sentiment reflects the frustration even software developers have with the practice of programming.
Using software development best practices can increase the quality of software, but even with these methodologies, the impact of tracking and eliminating all bugs would make any reasonably complex program either too expensive or available too late to be of practical use. As a result, users, systems managers, and developers have all learned to manage less-than-perfect applications.
One practice used to manage software deficiencies is patch management. Software developers release corrections to applications, known as patches, which are applied by systems administrators or through an automated process to correct known errors in software. Patching is not a trivial process and several factors should be considered when patching:
In addition to these basic considerations, applications may have issues. For example, a relational database may be used to support two applications. One application does not function because of a bug that can be patched but the patch breaks another function required by the second application. Should the database administrator patch the system anyway? Keep the current version and work around the bug? Install another instance of the database, patch one instance, leave the other instance un-patched, then run the two applications on their respective instances of the database? To find the right answer, systems administrators and database administrators have to weigh the costs and benefits. Just as business objectives can create competing demands when addressing IT and business alignment, patch management can leave systems administrators to choose between equally undesirable options.
Malicious software, commonly known as malware, includes software that ranges from annoying to destructive. Some of the best known forms are:
The list of malicious software categories is intimidating. To make matters worse, malware writers seem to be in a constant cycle of responding to anti-malware developer's countermeasures who then respond to the malware writer's new tricks, and on and on. On the positive side, systems administrators are able to keep malware at bay with effective anti-malware devices.
The use of anti-malware software, such as desktop antivirus software, and network appliances, such as content filters, can provide adequate protection for most needs. The need for these countermeasures introduces additional responsibilities for systems administrators. In addition to the desktop productivity applications, application servers, databases, routers, Web servers, and all the other business-specific and supporting tools that must be managed in an IT environment, systems administrators must now manage a large class of mission-critical security applications and devices. Malicious software often takes advantage of poorly secured configurations.
Another threat to system integrity is configuration vulnerabilities. It has been a common practice to ship complex software with standard configuration that works "out of the box" on most systems. The goal is to make software as easy to install and use as possible. This goal is understandable; we do not expect to have to master the intricacies of a new application just to install it. The problem with this approach is that to make software functional with minimal user intervention on a wide variety of machines, applications tend to install services that are not always needed, highly privileged accounts, or other potential security vulnerabilities:
Software vendors are getting better about installing more secure default configurations, but as systems grow in complexity, the potential for insecure configurations grows as well.
Another aspect of configuration vulnerabilities is that installed systems may require patching. Such is especially the case when an OS is first installed; security patches created since the software was released should be installed immediately. This task can be relatively easily and automated or it may require significant effort on the part of systems administrator.
Microsoft, for example, provides the Microsoft Update service for Windows OSs, which can be configured to automatically download and install critical patches (see Figure 1.2).
Figure 1.2: The Microsoft Update service can determine necessary critical patches and install them on Windows OSs.
For more thorough vulnerability scanning, the Microsoft Baseline Security Analyzer (MBSA) can detect configuration vulnerabilities as well as missing patches, and MBSA allows systems administrators to scan multiple systems in a single session.
Figure 1.3: The Microsoft Baseline Security Analyzer scans systems for vulnerabilities as well as missing patches.
In other cases, systems administrators must keep abreast of critical patches by subscribing to mailing lists or checking vendors' support Web sites to finds patches. Even when patches are released, they should not be installed in production without first testing on a quality control platform. Although installing a critical security patch for Windows can create unanticipated problems, in most cases, the risks are far outweighed by the benefits. Nonetheless, systems administrators should have a contingency plan in place for restoring the original configuration if unanticipated problems occur.
It should be noted that even with security analyzing tools, one type of malware—rootkits—are particularly difficult to control once they have compromised a system. Rootkits hide their presence and activities by changing registry settings or other system parameters, hiding files, erasing log entries, and other techniques. Once installed, it is difficult to guarantee that rootkits have been removed without performing a full reinstall of system software.
Even with updated patches, securely configured servers and desktops, and malware countermeasures, systems managers must contend with yet another threat: inappropriate access controls.
Access controls grant privileges to employ information resources to users typically based on users' role in the organization. A CFO is likely to have full read access to any data in the financial system while an accounts payable clerk would have no access or highly restricted access to accounts receivable information. This variance in access based on roles reflects one of the goals of security administrators, which is to ensure that users and processes have the least privilege required to accomplish their functions. Although the CFO and accounts payable clerk examples are relatively simple, determining least privilege can become more complex. For example
Managing access controls is a complicated and, in some cases, time-consuming process. One of the objectives of systems management is to ensure that access controls are synchronized with organizational policies at all times. These policies should include descriptions of who is allowed administrator access to systems, password strength and term of use, and audit procedures.
Maintaining the technical integrity of multiple servers, desktops, mobile devices, and network components requires a thorough understanding of application and OS configurations, countermeasures to threats from malicious software, and effective access controls and other security measures. Closely related to the issue of technical integrity is systems availability.
Disasters happen. Some are natural, such as hurricanes, and some are technical, such as the SQL Slammer worm that effectively shut down large segments of the Internet in 2003. In both cases, businesses and organizations lose some degree of access to their systems. The practice of business continuity planning has evolved to address these kinds of disasters as well as other less dramatic events that can nonetheless have an impact on systems availability.
Systems administrators play a key role in continuity planning because of their knowledge of systems organizations, dependencies between systems, and the process that operate on those systems. As IT infrastructure becomes more complex, business continuity planning become more difficult. Information about the state of IT systems is needed to adequately prioritize services (for example, in the event of a service disruption, restore payroll systems and then customer service systems, leaving other production system for later) and ensure that all necessary systems and processes are accounted for. A centralized and up-to-date database with information on the IT infrastructure is required for cost-effective business continuity planning and execution. Figure 1.4 shows an example of how IT assets can be centrally managed in relation to other assets and organizational structure.
Figure 1.4: Centralized information about IT assets is a key enabler of effective business continuity planning.
The term compliance is getting a lot of press these days, perhaps to the point where we've stopped paying attention to it, but doing so would be a mistake. The importance of maintaining the privacy and accuracy of information is becoming more broadly recognized. Some of the best known regulations make that clear:
A host of other IT-related regulations have been enacted by governments around the world. In addition, non-governmental or quasi-governmental bodies have adopted standards and frameworks related to financial reporting and security best practices. Table 1.1 lists some relevant but less well-known regulations and frameworks that apply to particular countries or industries.
Regulates credit and risk reporting for banks
21 CFR Part 11
Regulates pharmaceutical information management
Computer Fraud and Abuse Act
Makes unauthorized access to information in federal and financial institutions illegal
Electronic Signatures in Global and National Commerce Act
Recognizes the use of electronic signatures in commerce
Regulates business continuity planning in financial institutions
Regulates consumer privacy in banking
Regulates information security planning in federal agencies
EU Directive 95/46/ EC - Data Protection and EU Directive 2002/58 EC – Directive on Privacy
Controls the use and distribution of personal information of EU citizens
Protects personal information of Canadian citizens
Australian Federal Privacy Act
Enacts Australian privacy principals
Table 1.1: Additional regulations related to IT.
These and similar regulations are placing new demands on IT mangers and systems administrators to not only comply with these regulations but also demonstrate that they are in compliance. As with business continuity planning, compliance requires a centralized management view of all information assets to meet these demands efficiently.
The goals of systems management range from maintaining technical integrity and system availability to achieving compliance with regulations and aligning with the strategic plans of the business. These are demanding goals and reaching them is not guaranteed, especially if systems management practices are not sufficient for the task.
Systems management, like other areas of IT management, has a range of management philosophies. Some are loosely coupled procedures that are created—and evolve—as needed in response to immediate needs, while others are highly structured and formalized around written policies. For the purposes of this guide, it helps to examine three high-level categories of systems management practices:
This list is not meant to be exhaustive and there are nuances within each category that will not be examined. The goal is to understand that not all systems management practices are the same— some are more viable than others—and that the level of effort required of and effectiveness realized from different practices can vary widely.
At one end of the systems management spectrum is the ad hoc style, which is characterized by a lack of formal policies and procedures. This approach to systems management responds to requirements, incidents and problems as they arise without the benefit of planned responses. The results are suboptimal, at best.
Consider scenarios that can arise when ad hoc systems management is used:
Although these examples are fictitious, the consequences described will probably sound familiar to many IT professionals. A lack of central planning, uncoordinated decision making, and the willingness to make changes to IT infrastructure to meet an immediate need without concern for the ripple effects on the rest of the organization are the hallmarks of ad hoc systems management practices. The consequences are predictable.
The lure of ad hoc systems management is that it appears responsive and unencumbered by unnecessary bureaucracy. If a user needs a program, he gets one. When an analyst needs better reporting, she develops her own Microsoft Access database. If the server runs low on space, the administrator buys another disk. Decisions are made quickly and executed just as rapidly. The consequences from these decisions, like the effects of a degenerative disease, accumulate over time.
One of the first noticeable impacts of ad hoc management is a lack of overall understanding of the state of the IT infrastructure. There is no place a manager can go to find a list of assets, their patch levels, the licenses associated with them, the state of their backups, or the applications running on particular devices.
Auditors would quickly point out that a lack of well-defined policies and procedures leave the company potentially in violation of regulations governing information integrity (for example, the Sarbanes-Oxley Act) and privacy (such as HIPAA).
Poor management increases costs as well. Rather than manage storage, some would rather "throw another disk" at the problem, adding the immediate cost of new hardware and the ongoing cost of supporting the additional hardware to IT spending. When infrastructure grows, known as "server sprawl," without an overall plan, there is the potential for a new hardware configuration that requires a new software configuration, which just adds to the management headaches of administrators. And because an organization is dealing with an ad hoc management attitude, tools such as a configuration management database are not available to shoulder the burden of additional configurations. Other costs creep in because of poor use of resources. For example, instead of sharing a disk array among multiple departments, each department may buy their own direct connect storage with each server, along with backup devices for each department. Without centralized planning, there is little opportunity to leverage the economies of scale.
Security suffers because of poor management practices. Malicious software, information theft, and other threats are a constant problem for systems administrators and IT managers. At the very least, basic information security management requires:
Clearly, the lack of centralized management information and ineffectual or poorly implemented procedures that characterize ad hoc management undermine even the most basic security requirements.
Lack of management controls, poor use of resources, lack of compliance, and the potential of security threats should be motivation enough to move beyond ad hoc management to a welldefined and centrally controlled management model.
As IT organizations grow and mature, the need for planning and control becomes more pressing. The disadvantages of ad hoc management are apparent; the advantages of well-defined policies and procedures are equally obvious. Industry best practices for systems management have been defined in two frameworks: the IT Information Library (ITIL) and Control Objectives for Information and Related Technologies (COBIT).
In a controlled systems management environment, the core IT operations are performed according to a series of policies that attempt to realize broad business goals by providing reliable and responsive computing and network resources at the most efficient cost. The core processes within IT that should be governed by policies are:
The policies addressing each of these areas define the purpose of the operation, identify the scope of the process, and offer guidelines for implementing the policy. For example, the purpose of implementing security controls is to ensure the integrity, confidentiality, and availability of information services. The scope includes areas such as physical security, access controls, telecommunications, and network security, as well as auditing and system monitoring. The guidelines surrounding security might include a discussion of the need to balance security with ease of use and the risk level acceptable to the organization.
Procedures are a series of specific tasks that implement a policy. For example, the procedure for acquiring software might include a review of the business requirements and deficiencies in existing software and an assessment of buying instead of building a solution. Next, the procedure might call for a review of the proposed solution. Which OSs does the application run on? What kind of server is required? What changes to the network configuration are required? Will the application be accessible from beyond the firewalls? Does it depend on other services, such as a Web server, on the network? After these steps are performed, the final stage may be a review by a change control committee made up of application owners, network administrators, and systems managers who make one last review for potential problems with the solution.
To some, these steps may sound like bureaucratic overkill, but it is better than the alternative presented by ad hoc management. Controlled systems management builds on policies and procedures that recognize that IT operations consist of a series of services, such as client management, network management, application management, and so on—not just a collection of independent, unrelated devices and programs as understood within the context of ad hoc management (see Figure 1.5).
Figure 1.5: Controlled systems management depends on well-defined policies and procedures that address each of the key services provided in an IT environment.
It is not the goal of controlled management to slow IT operations or impose arbitrary bureaucratic overhead. In fact, controlled management improves responsiveness and effectiveness in the long run even though it may seem to slow changes or acquisitions in the short term.
There are two ways to streamline controlled management, and they dovetail well. First, centralize management information and automate routine tasks. A centralized configuration management system is the foundation for this method and will be discussed in detail in Chapter 4. The second is to leverage the information in the centralized configuration management system (along with data about how procedures are implemented) to improve the way you implement the tasks of systems management.
Much has been written in the popular business press about quality and improvement. These topics do not garner the press they once did, but the principals of quality improvement and innovation are still relevant, even in systems management. Perhaps the best-known and most well-established approach to realizing continuous improvement is Six Sigma, a data-centric quality approach; another practice, Management by Fact (MBF), also emphasizes the importance of managing by using measurements of performance.
Once an IT group has implemented controlled systems management practices, the group will have information about assets, business use of those assets, and the changes those assets undergo. In essence, the organization will have procedures for effectively managing assets as well as data about the performance of those assets and related operation. With this, IT managers and systems administrators can find ways to improve on operations.
For example, by measuring the time from identifying the need for a new application to deploying the solution, as well as key milestones in between, the organization can better understand the average time to deploy, common bottlenecks in the process, and shared characteristics of failed efforts. These and other key performance indicators (KPIs) form the foundation for measuring improvement.
Another example relates to security. If a security breach occurs, a well-managed environment will have audit trails and logs to help diagnose the breach as well as recovery procedures for getting operations back online and data restored to its correct state. Configuration management information, along with audit trails and logs, can help identify both specific and general vulnerabilities in the current environment, which can be addressed to prevent future breaches of the same sort.
Exceptionally well-run IT operations are not the result of one or two geniuses formulating a perfect solution; they are instead the product of disciplined policies and procedures tightly linked to business objectives. Like the business objectives themselves, the policies and procedures are not static but are subject to innovation. As the well-respected management researcher and writer Peter Drucker noted,
The purposeful innovation resulting from analysis, system and hard work is all that can be discussed and presented as the practice of innovation. But this is all that need be presented since it surely covers at least 90 percent of all effective innovations. And the extraordinary performer in innovation, as in every other area, will be effective only if grounded I the discipline and the master of it (Source: Peter Drucker, "Principals of Innovation" in The Essential Drucker (Harper Business, 2001).
The discipline of systems management can be mastered and the practice of systems management can be adapted and improved to meet the specific needs of different organizations.
The spectrum of systems management ranges from the reactive, uncontrolled ad hoc approach, through a controlled, procedure-guided method to an adaptive model built on well-defined controls that use performance measures to improve operations. Although there are many ways to organize systems management operations, one of the most promising for the complex heterogeneous IT environments of today is the SOM model.
Systems management is a discipline that can be mastered and perfected. To get to highperformance levels in systems management, organizations must implement policies and procedures that systematically define and control how changes are made, how new infrastructure is introduced, how systems are maintained, and a number of other areas. The diverse nature of systems management requires a classification scheme that allows you to organize the discipline into domains that can then be analyzed and optimized. For example, systems management includes both client management and security management. There are clearly overlap between the areas, but treating them as distinct operations will prevent you from being overwhelmed with the complexity of dealing with all the issues in both domains at once. One way of controlling the complexity is to think of systems management as a series of services that need to be delivered.
The domains of systems management can be viewed as services provided to users, applications, and the organization as a whole. These services are managed within an umbrella framework that is both modular and open. Some of the most important are:
At first glance, these domains seem unrelated—such as financial management and change management—but they are all required for effective systems management and therefore must be included in any framework that purports to support the full breadth of demands in systems management. The details of these domains are beyond the scope of this chapter; instead, this chapter will examine the defining characteristics of a service-oriented architecture:
A unified management framework is a common set of tools and services that support a variety of systems management operations. The centralized configuration management database is the best example of a constituent component of the unified management framework. Others include communication protocols, reporting systems, and client interfaces. As Figure 1.4 showed earlier aspects of different services are available from a single Web interface when a unified platform is used. Within the unified management framework, a series of modular services are available.
It is important to treat domains within systems management as distinct areas with their own set of requirements. For example, change management requires information about the state of software and hardware configurations throughout the enterprise. Before a port is closed on a firewall, a network administrator needs to know whether an application is using that port. This type of information is not required to manage the financial aspects of systems management and should be isolated from financial functions. At the same time, however, some of the configuration information has a definite impact on financial matters. For example, knowing the number and versions of OSs running within the organization is essential to managing licensing costs.
In addition to isolating information complexity, a modularized approach to systems management enables the framework to incorporate new services and management models as needed. A midsized company might not need capacity planning services initially, but as the company grows and the complexity of the IT infrastructure increases, manual methods for capacity planning may no longer be efficient or sufficient. For this reason, it is critical that a SOM framework be open.
An open architecture is one that uses common, well-known protocols that are not proprietary to any one vendor or organization. In the world of systems management, an open architecture lends itself to incorporating multiple modules from a single vendor as well as leveraging services available from third parties. For example, a router vendor may provide data on router performance through the Simple Network Management Protocol (SNMP), which is collected in the centralized configuration management database, then integrated with other data collected from other network devices used in management reports generated by the systems management reporting module. By combining the benefits of a unified management framework, modular services, and an open architecture, organizations can realize the benefits of service-oriented systems management.
There are several benefits of service-oriented systems management beginning with improved understanding of the state of IT infrastructure and better control of that infrastructure. From there, organizations can realize improved cost effectiveness and improved QoS by standardizing infrastructure and procedures. With a centralized configuration management system in place, there are opportunities for automating manual processes—another area of potential cost savings. The benefits are not limited to just operation efficiencies.
Better systems management and standardized infrastructure and procedures lend themselves to improved security. When systems administrators can automate routine tasks—such as checking audit logs, distributing security patches, detecting which laptops are not running personal firewalls, and other mundane and time-consuming but essential, tasks—they have more time to spend on high-level security issues, such as risk management and compliance. As Figure 1.6 shows, the basic benefits enable organizations to meet higher-level needs as well as day-to-day operational requirements.
Figure 1.6: Service oriented systems management enables organizations to meet both operational and strategic objectives.
Any organization with IT systems practices some form of systems management. How well they do so varies. The goal of systems management ultimately is to meet the strategic objectives of the organization, which include aligning with business operations, preserving the integrity of systems and information, and adapting to the changing needs of users. Systems management is a broad discipline with many domains; some of the domains are similar, some are less so. Underlying the entire practice, though, is a common set of information, processes, and procedures that are best managed as a unified whole. At the same time, the complexity of systems management requires a modularized approach to enable cost-effective and manageable solutions.
SOM builds on the best practices of systems management, information security, governance, and related areas. The remaining chapters of this guide will describe in detail the elements of these best practices, the tools needed to implement the best practices, and the organizational direction and policies needed to realize the benefits of SOM.