This guide's previous chapter brought closure to the ongoing story of TicketsRus.com, presenting a comprehensive example of how an APM solution can be leveraged by multiple stakeholders in an organization. Through that story, service desk employees, administrators, developers, and even IT and executive management were able to work together through a common set of visualizations towards the resolution of a major problem.
That chapter also showed how effectively resolving problems requires a data‐driven approach, one with a substantial amount of granular detail across multiple devices and applications. Using this approach, it is possible to trace a system‐wide performance problem directly to its root cause. By integrating into databases, servers, network components, and the end users' experience itself, a fully‐realized APM solution is uniquely suited to gather and calculate metrics for entire business services as a whole.
Yet the topics in the previous chapter's story were fundamentally focused on the technologies themselves, along with the performance and availability metrics associated with those technologies. Its resulting visualizations were heavily focused on the needs of the technologist:
Missing, however, from the previous chapter's story is another set of business‐related metrics that convert technology behavior into usable data for business leaders. This class of data tells the tale of how a business service ultimately benefits, or takes away from, the business's bottom line. It also creates a standard by which the quality of that service's delivery can be measured. The gathering, calculation, and reporting of these business‐related metrics comprise the methodology known as Business Service Management (BSM).
The IT Infrastructure Library (ITIL) v3 defines BSM as an approach to the management of IT services that considers the business processes supported and the business value provided. It also means the management of business services delivered to business customers. Businesses that leverage BSM look at IT services as enablers for business processes. They also look at the success of IT as driving the ultimate success of the business.
This is a critical approach to how IT brings value to the business; however, it isn't one that is used by all organizations. Those without high levels of IT maturity are intrinsically unable to attain alignment between IT and the business.
Chapter 2 talked at length about this problem of IT and business alignment. It discussed how different IT organizations display different levels of organizational maturity, with greater maturity bringing greater business value. Like APM, BSM is a methodology that both requires IT maturity and develops it.
To better understand the concepts in this chapter, you may consider turning back to review those first introduced in Chapter 2. As successfully implementing BSM requires a high level of IT maturity, understanding exactly how that maturity is developed and measured is important.
BSM and APM are two methodologies that are naturally linked by their requirements for data. The information gathered through an APM solution's monitoring integrations feeds directly into the requirements of a BSM calculations engine. Performance, availability, and behavioral data of the overall business service and its components are all metrics that aid in calculating that service's overall return. These metrics also provide the kind of raw data that helps identify how well a business system is meeting the needs of its customers.
Figure 9.1 shows a logical representation of where BSM links into APM. Here, APM begins with the creation of monitoring integrations across the different elements that make up a business service. Those monitoring integrations gather behavioral information about the end users' experience. They collect application and infrastructure metrics as well as other customized metrics from technology components. APM's data by itself is used primarily by the IT organization for the problem resolution and service improvement processes discussed to this point in this guide.
Figure 9.1: BSM converts technology‐focused monitoring data into business‐centric metrics.
The addition of BSM creates a new layer atop this APM infrastructure. Here, the business itself becomes a critical component of the monitoring solution. Business processes and service level expectations are encoded into a BSM solution, with the goal of creating business service views that validate and report on how well the technology is meeting the needs of the business.
This linkage is realized through an extension of the Service Model that was first introduced in Chapter 6. If you turn back to Chapter 6's explanation of APM's Service Model, you can obviously see the direct linkages between service technologies and their representation within the model. However, that discussion didn't focus largely on how representations for business services should be implemented.
Figure 9.2 shows a graphical example of how Chapter 6's physical model can be augmented with additional elements that represent the Ticketing System's business service. Here, the technology infrastructure components that comprise that service are abstracted under the element titled Ticketing System Infrastructure. Above this element, three more are added to represent the geographic locations where that service is available to customers. Finally, atop the entire model is the ultimate representation of the Ticketing System itself.
Figure 9.2: The technology underpinnings of a business service feed into BSM's Service Model.
This positioning of the Ticketing System at the model's top is significant. Not shown in Figure 9.2 but important for BSM's Service Model is how multiple business services can be represented in parallel through this augmented model. For example, the same organization may support business services for Ticket Brokering and/or Vendor Management. These services, which might or might not be available to customers in different geographic locations, are added to the topmost level of the model and linked into its geographic elements. The resulting multiple‐service representation allows business leaders a single‐glimpse view across all the services and locations that make up their business.
As with the technology‐oriented Service Model represented in Chapter 6, monitoring data and associated threshold values exist underneath the BSM model's individual elements. As with the initial technologist‐oriented model, that data provides the logic that determines whether a business service, geographic location, or any other element is represented in green versus red. Further, it populates the more granular explanations of system behaviors when a user drills down into individual elements.
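As a rough illustration of that threshold logic, the following Python sketch rolls element states up a small model. The `Element` class, the metric name, the "North America" location, and the threshold values are all hypothetical. The rule that any red child turns its parent red is a simplifying assumption; a real BSM solution would likely apply richer rules that account for redundancy.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One node in a (hypothetical) BSM Service Model."""
    name: str
    # metric name -> (observed value, minimum acceptable value); illustrative only
    metrics: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def state(self) -> str:
        """Return "green" only if every own metric and every child is healthy."""
        own_ok = all(observed >= minimum for observed, minimum in self.metrics.values())
        children_ok = all(child.state() == "green" for child in self.children)
        return "green" if own_ok and children_ok else "red"


# Assumed values: observed availability of 99.2% against a 99.5% minimum.
infrastructure = Element("Ticketing System Infrastructure",
                         metrics={"availability_pct": (99.2, 99.5)})
region = Element("North America", children=[infrastructure])
ticketing = Element("Ticketing System", children=[region])

print(ticketing.state())  # the breach propagates upward, so this prints "red"
```

Drilling down from a red element then amounts to walking its children until the element holding the breached metric is found.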
Different here is the type of data that is represented within this level of the model. Here, business leaders are primarily interested in information that measures "how well the technology is meeting the needs of the business" and is commonly manifested at a high level using a metric referred to as Service Quality.
Quality is a term that has been discussed previously in this guide, yet thus far with a strong focus on the technology underpinnings to business services. In the BSM perspective, the idea of quality has been formalized. It corresponds to a quantitative and numerical representation of the success or failure of a business service.
Now, at first blush, assigning some numerical value to an abstract concept such as "quality" seems inappropriate for a data‐driven solution. But think for a minute about what makes a quality service—one that meets the needs of its end consumers. To borrow a line of thinking from Chapter 6:
It is obvious to see how the first question represents a business service that is operating with a large amount of quality: The service is operational. Users are interacting with it and accomplishing their needs. It is also obvious to see how the scenario in the second bullet operates with zero quality: The service is completely down and no one is accomplishing anything with it.
Yet complex systems operate in more than just a binary "on" versus "off" state. A system can be functional or non‐functional, but it can also operate in many different states in between. For example, built‐in redundancy can mean that individual component failures may not affect that service's overall availability. Those same failures may only slightly affect its performance. Reductions in performance can also reduce a functioning service to a state of slowness that no user would want to interact with. Here, the service is operational, but not with any level of performance that can be considered "successful." Reductions in quality can also occur when the loss or reduction in performance of downlevel components ultimately means a partially‐functional service.
As you can see, the situation gets murkier when the state of a system exists in this area between "on" and "off." In each of the final three bullets, how well is this system operating? How well is it fulfilling the needs of its users? As an example, in the third bullet's scenario the service is mostly functional with some non‐functional actions. If some users can accomplish their tasks, does this make its quality higher than in the fifth bullet's scenario where the service appears functional but really isn't working at all?
The answers to these questions are obviously non‐trivial. As such, this extended conversation is intended to prove that a spectrum measurement of "quality" is necessary in addition to the simplistic green versus red representation of an element's state. A visualization of just that spectrum is shown in Figure 9.3. There, you can see how the quality of service for multiple business services is shown in a single image. Visualizations such as this one enable business leaders to identify which services are meeting the needs of their customers, yet without being bogged down in the minutiae of their technology underpinnings.
Figure 9.3: A measurement of Service Quality across multiple business services.
Further, BSM's numerical representation of quality needn't necessarily be an instantaneous value. Knowing the quality of a service over the medium‐ or long‐term enhances a business leader's situational awareness even further.
Consider the situation where the quality of a business service is constantly changing between acceptable and unacceptable states (see Figure 9.4). When today's quality measurement is reported at 77, while yesterday's was reported at 95, it is easy to recognize that the overall system's effectiveness has diminished between these two days.
Figure 9.4: A historical view of Service Quality.
When quality measurements change rapidly between acceptable and unacceptable thresholds, this information gives the business leader the data he or she needs to direct improvement activities. Planning, budgeting, and expansion activities can all be directed to the services that need them the most. Ultimately, measurements like these give IT as well as business leaders the information they need to recognize when their services are (or, more importantly, are not) meeting the needs of their customers.
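One simple way to flag a service whose quality keeps moving between acceptable and unacceptable states is to count threshold crossings in its history. The sketch below is only illustrative: the function name and the acceptable threshold of 80 are assumptions, while the readings of 95 and 77 are the ones mentioned above.

```python
def threshold_crossings(history, acceptable):
    """Count how often consecutive readings move across the acceptable threshold."""
    states = [score >= acceptable for score in history]
    return sum(1 for previous, current in zip(states, states[1:]) if previous != current)


# Yesterday's 95 and today's 77 come from the text; the threshold of 80 is assumed.
print(threshold_crossings([95, 77], acceptable=80))  # prints 1
```

A high count over a longer history would suggest a service that deserves attention during planning and budgeting.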
You'll notice some very specific terminology used in this section associated with quality measurements. The statement "of high quality" is used rather than "of full quality" or "of complete quality." This choice of wording is important because business systems always have areas for improvement. This means that today's measurement of high quality may be below tomorrow's measurement. Quality improvement activities, an important ITIL component, enable the measurement of "today's highest quality" to be continually moved upwards as improvements are incorporated into the business service or its technology components.
This concept of a single number that represents a service's quality is great, but how can it be calculated? How does a system like BSM condense the thousands of raw metrics gathered by monitoring integrations into a single‐number definition of a service's behavior? You'll find that this process is also non‐trivial but can be made easier through the right software.
The first step in developing this sense of quality is obviously in creating the abstraction of the business service that is its Service Model. This process has been explained in detail throughout this guide. By developing the Service Model, IT and the business define the elements that make up the business service as well as their interconnections. Both the elements as well as the interconnections are important because each fills out the picture of the service in its own way.
Once that Service Model is fully realized, the next step requires the mapping of service levels, Key Performance Indicators (KPIs), user impacts, and revenue impacts atop its structure. In this process, the metrics that define success or failure within business processes are used as thresholds. For example, if the business defines a particular rate of completed transactions to mean that the service is acceptably fulfilling customer needs, that metric should be added to the appropriate element. Or, if the user drop rate from a Web front end remains within a particular parameter, adding this metric in the appropriate place is another useful threshold. In the following sections, consider a few of the metrics that you likely already have in your business today.
Service level metrics are often first defined by the IT organization or through Service Level Agreements (SLAs) between the business and IT. In these agreements, the business identifies that services and their components must be available during certain hours with an identified minimum of downtime. Mean‐time between failures, failure rate, allowable downtime, and expected performance are all common metrics that can be applied to network, server, application, or other elements by the business.
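Two of these service level metrics are simple to compute once uptime and failure counts are being collected. The following Python sketch uses the conventional definitions of availability (uptime divided by total time) and mean time between failures (uptime divided by the number of failures). The function names and sample figures are illustrative assumptions, not values from any particular SLA.

```python
def availability_pct(total_hours, downtime_hours):
    """Percentage of the measurement window during which the element was up."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * (total_hours - downtime_hours) / total_hours


def mtbf_hours(total_hours, downtime_hours, failures):
    """Mean time between failures: operating time divided by failure count."""
    if failures == 0:
        return float("inf")  # no failures observed in the window
    return (total_hours - downtime_hours) / failures


# Hypothetical 30-day window (720 hours) with 3.6 hours of downtime over 3 failures.
print(round(availability_pct(720, 3.6), 2))  # prints 99.5
print(round(mtbf_hours(720, 3.6, 3), 1))     # prints 238.8
```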
Highly mature organizations are often capable of adding performance‐based metrics into their SLAs as well. With these types of service levels, specific thresholds for element performance are known and documented. Lacking an APM solution, these types of performance metrics are often exceptionally difficult to gather and report on. Their monitoring may be accomplished through multiple, non‐integrated solutions that are separately controlled by individuals within each technology domain. With such point solutions in place, the sharing of information between domains can be difficult or even impossible. However, when APM monitoring is extended across the IT infrastructure, the business gains the ability to gather these kinds of performance‐based metrics across many different types of technology elements at once.
Service levels with external service providers are another area in which APM provides a great assist. By gathering availability and performance metrics against contracted service providers, your organization gains its own set of data to be used during outages. This information is also useful when contract disputes require chargebacks for SLA breach events with service providers. Any business service that relies on external services for a portion of its activities requires this kind of monitoring to truly gain a representation of overall service success.
KPIs are often business‐oriented metrics used to define the success level of an organization or business process. These metrics are often used to quantify activities that are otherwise subjective in nature. These can be activities such as leadership development, the level of customer‐business engagement, and overall customer satisfaction.
As a rule, KPIs should be designed as actionable metrics, with the value of the metric driving some necessary action by the organization. KPIs should also be defined to provide information on status, trends, or variance to business artifacts such as plans, forecasts, or budgets. These two elements are critically important to KPIs that will eventually be encoded into BSM, as they provide a basis for quantifying its data. In essence, you need metrics whose value eventually drives some change to the environment if they are to be useful in the context of APM. By leveraging metrics that have a known reaction line, your BSM solution can further be used to provide necessary alerting when that action needs to be taken.
Mapping KPIs to business artifacts in addition to technology components also enables the later assignment of dollar values to incidents. As you'll learn in a minute, a fully‐realized BSM solution can highlight where expensive problems need immediate resolution or when system or user behaviors are impacting the business bottom line.
Accomplishing all of this requires some sort of design tool. An effective BSM solution will provide the necessary logic to match incoming KPI data to defined thresholds and management reaction lines. That logic can be incorporated through user‐defined rules, through relationships between elements, or through complex expressions. Complex expressions may use if‐then‐else expressions or regular expressions with which to construct the necessary thresholds.
Figure 9.5 shows an example design tool where a complex expression has been constructed that validates availability metrics. In this example, both minimum and expected thresholds are described. The combination of these two values quantifies the behaviors that are considered appropriate and inappropriate for the assigned element.
Figure 9.5: A BSM solution's design tool provides a location where expressions can be constructed based on incoming data.
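An expression of the kind described above, one that checks availability against both a minimum and an expected threshold, can be approximated with a plain if‐then‐else. This is a minimal sketch rather than the syntax of any actual design tool; the function name, the three state labels, and the sample thresholds are assumptions made for illustration.

```python
def classify_availability(observed_pct, minimum_pct, expected_pct):
    """Map an availability reading onto one of three states using two thresholds."""
    if minimum_pct > expected_pct:
        raise ValueError("minimum threshold cannot exceed expected threshold")
    if observed_pct >= expected_pct:
        return "meets expectations"
    elif observed_pct >= minimum_pct:
        return "degraded"   # acceptable, but below what the business expects
    else:
        return "breach"     # below the minimum; a management reaction is due


# Hypothetical thresholds: 99.0% minimum, 99.9% expected.
for reading in (99.95, 99.4, 98.2):
    print(reading, classify_availability(reading, 99.0, 99.9))
```

Using two thresholds instead of one gives the element a middle state, which lines up with the idea of quality as a spectrum instead of a simple green versus red indicator.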
Chapter 8's story discussed a few examples of how the level of user impact drives the targeting of troubleshooting resources. It argued that those problems with a greater user impact should in most cases be prioritized over those with a smaller impact. Defining those user impacts is another activity that occurs within a BSM solution.
Using the BSM software's designer tool, it is possible to identify the impact associated with each of the elements in the model. For example, when a particular network connection to a geographic site goes down, the users at that site are known to be affected by the problem. Or, when one of a pair of clustered transaction processing servers goes down, it can be assumed that the total level of processing will be reduced by half.
Once the user impacts for individual elements are known and entered into the system, the Service Model with its interconnections is then used to identify the flow‐up and flow‐down impacts for each element. This is represented in the simplistic example shown in Figure 9.6. Based on the dependencies encoded through the model's interconnections, individual element impacts can be combined to understand how many users are affected by a problem with any element. In this example, the Inventory Processing System is known to have 1700 users, while the External Web Cluster is known to have 8300 users. Here, the loss of the Inventory Processing System can impact its 1700 users as well as a portion of those who use the External Web Cluster.
Figure 9.6: A simplistic example of how user impacts can be assigned to Service Model elements.
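The flow‐up calculation described above can be sketched as a walk over the model's dependencies. The user counts below are the ones quoted for Figure 9.6. Everything else is assumed for illustration: the function name, the data layout, and especially the 25% share of External Web Cluster users affected by an Inventory Processing System outage, which the text leaves unspecified as "a portion." The sketch also assumes the two user populations do not overlap.

```python
# Direct user counts quoted in the text for Figure 9.6.
direct_users = {
    "Inventory Processing System": 1700,
    "External Web Cluster": 8300,
}

# dependents[x] lists (element that depends on x, fraction of its users affected).
# The 0.25 fraction is a made-up placeholder.
dependents = {
    "Inventory Processing System": [("External Web Cluster", 0.25)],
}


def impacted_users(failed, direct_users, dependents, share=1.0, _seen=None):
    """Estimate users affected when `failed` goes down, following dependencies upward."""
    seen = _seen if _seen is not None else set()
    if failed in seen:          # guard against cycles in the model
        return 0.0
    seen.add(failed)
    total = share * direct_users.get(failed, 0)
    for dependent, fraction in dependents.get(failed, []):
        total += impacted_users(dependent, direct_users, dependents,
                                share * fraction, seen)
    return total


print(impacted_users("Inventory Processing System", direct_users, dependents))
# 1700 + 0.25 * 8300, so this prints 3775.0
```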
Knowing how many users are impacted tells one story of a problem. But knowing exactly how service behaviors impact revenues is yet another. Information about revenue impacts can be entered manually into a BSM solution's threshold logic. Or, more dynamically, it can be gathered from business artifacts such as budgets, sales metrics, or other revenue data.
BSM is uniquely suited above all other monitoring solutions in that it includes the capacity to aggregate traditional technology monitoring with financial data from these kinds of sources. When sales or budgetary data is available in a format that the BSM solution can work with, it becomes possible to relate technology and user impacts to hard dollar gains or losses to the organization.
One example of this can be seen in Figure 9.7. In this example, technical information from a site's external Web metrics has been related to financial information as gathered from a sales or revenue database. In this example, a historical trendline can be developed that shows the relationship between unique visitors to a Web site and the level of daily revenue that occurs as a function of those visitors.
Figure 9.7: Using a BSM visualization to relate user count to revenue statistics.
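A trendline like the one described above can be produced with an ordinary least‐squares fit. The sketch below assumes a simple linear relationship between unique visitors and daily revenue, which may or may not hold for a given business. The function name and every data point are hypothetical; in practice the inputs would come from Web metrics and a sales or revenue database.

```python
def fit_trendline(visitors, revenue):
    """Ordinary least-squares fit of revenue = slope * visitors + intercept."""
    if len(visitors) != len(revenue) or len(visitors) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(visitors) / len(visitors)
    mean_y = sum(revenue) / len(revenue)
    sxx = sum((x - mean_x) ** 2 for x in visitors)
    if sxx == 0:
        raise ValueError("visitor counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(visitors, revenue))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical daily observations, invented for this example.
daily_visitors = [1200, 1850, 2400, 3100, 3900]
daily_revenue = [6400, 9800, 12100, 16000, 19700]
slope, intercept = fit_trendline(daily_visitors, daily_revenue)
print(f"each additional visitor is associated with about ${slope:.2f} in daily revenue")
```

The slope gives a rough dollar value per visitor, which is the kind of figure that lets a drop in visitor count be restated as an estimated revenue impact.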
This information becomes fantastically useful to the business leader because it provides a real‐time and historical look at the systems under their management. Yet it does so without exposing the complex technology underpinnings that aren't part of their job role. Such visualizations relate the technology behaviors to business successes as a function of revenue and/or sales. Business leaders with access to this kind of information have a much greater capability to quickly shift focus, activities, and even entire lines of business as needed based on quantitative information.
In some cases, a BSM solution needn't be used at all with its technology monitoring elements. A BSM solution provides a single‐glimpse visualization of business activities, so it becomes a location where purely financial information can be gathered for regular consumption. Figure 9.8 shows an example of this, displaying monthly profit versus payout information that is sourced directly from a financial database.
Figure 9.8: Information in BSM visualizations can be gathered from purely financial sources as well.
Nowhere in this visualization is information that arrives through APM monitoring integrations. The data here is calculated purely from financial databases or other business artifacts. Visualizations like this are often placed alongside others that contain APM‐related information in a management dashboard. The result is a more holistic situational awareness of the business service and its revenue impacts.
There has long been a problem intrinsic to the collection and reporting of business‐related metrics. That problem relates to the amount of time required to collect, compute, and report metrics through traditional manual processes. For much of the history of business itself, the only way to collect these types of metrics was through those very manual and time‐consuming processes.
Aside from the labor cost associated with collecting the data and creating the necessary reports, the manual collection process also masks another problem: data granularity. Think about the situation where it takes a few days of labor to compile and report on business metrics. The result is that metrics are always at least a few days old, and there is no capacity to see in real‐time how incremental system changes directly impact business revenues. Lacking data granularity, you're always making business decisions on old data. Today's business climate mandates that businesses constantly adapt themselves to changing situations in the economy. Purchasing or customer trends must be analyzed as they happen, with decisions made quickly, if the business is to remain agile in its products or services. This need for a constant flow of real‐time information runs completely counter to the traditional manual collection efforts. Needed are solutions that collect data on a constant basis and present that information to decision makers in what could effectively be called real‐time.
BSM represents one solution that can accomplish just that goal. With a fully‐realized BSM implementation in place, business leaders are given access to real‐time information about technology as well as financial impacts. Because the data that drives their visualizations is gathered constantly through its APM underpinnings, the business leader gains greater visibility into customer and system behaviors. Leaders with this information can much better reposition their business when conditions mandate changes to the business model.
Also important here is a recognition of what to do when systems or services simply aren't meeting the needs of their customers. It's been said before that there are an unlimited number of ways in which a complex business system can be constructed but only a few that will ultimately provide value. Without the deep instrumentation gained through a BSM‐on‐top‐of‐APM (BSM/APM) solution, you likely don't have the raw data that quantitatively validates that your services are fully optimized. Further, you'll never be able to make improvements to service delivery if you don't know where improvements can be made.
One important metric that is gained through the creation of quality metrics is actually the functional inverse of quality: the Cost of Poor Quality (CoPQ). When a business service is not meeting the needs of its customers, it is not bringing in a maximum level of revenue. With a BSM solution's metrics in place, it becomes possible to measure the revenue that is lost through poor quality service delivery. That lost revenue is inversely related to the level of quality in your system.
CoPQ is a term that is defined by Six Sigma as those costs which are generated as a result of producing defective material. This cost includes the cost involved in fulfilling the gap between the desired and actual product/service quality. It also includes the cost of lost opportunity due to the loss of resources used in rectifying the defect.
Although this traditional definition relates primarily to manufacturing environments, CoPQ can be measured in IT‐related environments as well. Think for a minute about the high‐level variables that can go into a CoPQ calculation: You need to know the actual quality of a system as well as its level of potential quality. Subtracting the actual value from the potential value and applying a multiplier for cost gives you this information.
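Read literally, that calculation is a quality gap multiplied by a cost factor. The sketch below is one possible interpretation, not a formal Six Sigma formula. The function name and the cost of $1,200 per quality point are assumptions; the quality values of 95 and 77 reuse the measurements mentioned earlier in this chapter.

```python
def cost_of_poor_quality(potential_quality, actual_quality, cost_per_point):
    """Quality gap times a cost multiplier; a service at or above potential costs nothing."""
    gap = max(potential_quality - actual_quality, 0)
    return gap * cost_per_point


# Assumed multiplier: each lost quality point costs $1,200 in revenue per day.
print(cost_of_poor_quality(potential_quality=95, actual_quality=77, cost_per_point=1200))
# (95 - 77) * 1200, so this prints 21600
```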
More on Six Sigma and other frameworks in a minute, but for now, recognize that metrics such as CoPQ become easy to measure once metrics like quality and revenue impacts are well defined. BSM solutions have the potential to provide all of these numbers.
Today's businesses are also more global in nature. Whereas locally‐oriented businesses can easily determine their hours of operation based on those that are industry‐ and locally‐acceptable, global businesses have a much harder time determining when the sign at their front door flips from "Closed" to "Open."
This situation grows even more problematic when global businesses sell their wares on the Internet. Business on the Internet is commonly considered a 7×24×365 operation, with services and businesses never really closing for operations. The expectation of never being down presents a set of problems to the Internet‐connected business. If Web services are always up,
With users connecting in from areas around the globe, one solution for the always‐on business is the creation of a business calendar. In the context of BSM, the business calendar represents the periods of time each day when inbound users are at their peak versus nonpeak hours. Depending on the type of business, this period of time can be at very different times of the day. For example, a Web service whose customers are primarily other businesses will see greater attention during the workday. In contrast, others who service families might see greater attention when users have gone home for the evening.
Creating such a calendar is another non‐trivial task. First and foremost, actual metrics of user counts must be collated and averaged based on time and day. Those metrics must be aggregated across the multiple geographic locations and time zones where the business service is primarily located. Internet‐based services with replicated infrastructures in other parts of the world will see greater levels of inbound users at different parts of the day.
Figure 9.9 shows an example of how a simplistic business calendar can be constructed across United States, EMEA, and Asia‐Pac localities. Here, the areas shaded in red indicate peak hours for that locality. Those in green represent non‐peak hours. A BSM solution that calculates a service's business calendar must then combine those metrics into an aggregated business schedule. That schedule can be used to answer the previously‐posed questions as well as define the hours where servicing affects the fewest users.
Figure 9.9: An example business calendar.
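The aggregation step can be sketched by shifting each locality's hourly user counts onto a common clock and summing them. This is a simplified illustration: the function names, the fixed whole‐hour UTC offsets (which ignore daylight saving time), and the sample usage profile are all assumptions, not data from Figure 9.9.

```python
def aggregate_calendar(local_profiles):
    """Sum per-locality hourly user counts into one 24-slot UTC schedule.

    local_profiles maps a locality name to (utc_offset_hours, counts), where
    counts holds 24 average user counts indexed by local hour of day.
    """
    totals = [0] * 24
    for utc_offset, counts in local_profiles.values():
        if len(counts) != 24:
            raise ValueError("each locality needs exactly 24 hourly counts")
        for local_hour, users in enumerate(counts):
            totals[(local_hour - utc_offset) % 24] += users
    return totals


def quietest_hour(totals):
    """Return the UTC hour with the fewest aggregated users (earliest on ties)."""
    return min(range(24), key=lambda hour: totals[hour])


# Hypothetical profile: busy from 09:00 to 17:00 local time, quiet otherwise.
workday = [100 if 9 <= hour < 17 else 10 for hour in range(24)]
profiles = {
    "United States": (-5, workday),
    "EMEA": (1, workday),
    "Asia-Pac": (8, workday),
}
schedule = aggregate_calendar(profiles)
print("quietest UTC hour:", quietest_hour(schedule))
```

For a device used only within one locality, the same `quietest_hour` function could be applied to that locality's counts alone.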
It is worth mentioning here that replicated service infrastructures across multiple localities can impact how the business calendar is used. For example, if a maintenance activity needs to occur on a local device, the local business calendar should be consulted. If the device under maintenance is used by the entire infrastructure, the aggregated calendar will determine its maintenance time. Another important point is that the BSM business calendar is intended to be a real‐time metric. As your business evolves over time, so will your business calendar.
The metrics gained through a BSM implementation are also useful when fed into management frameworks such as ITIL or Six Sigma. Like BSM's roots in APM data, these frameworks are often highly data‐driven in how they accomplish and improve upon the tasks of IT. One of the common limitations, however, in successfully implementing ITIL and Six Sigma framework processes is in gathering enough data of the right kind to be useful. The data gathering and calculation potential of a BSM/APM solution enables greater success with both frameworks. Without delving too deep into their details, let's take a quick look at the areas of each where BSM and APM can both provide added value.
The ITIL is comprised of a set of industry best practices that identify the necessary activities that are common to an IT environment. ITIL is specifically comprised of 5 stages and 24 activities within those stages. Activities within ITIL span the life cycle of service strategizing, design, transitioning, operations, and continual improvement.
A full discussion on ITIL can take an entire book (or in fact six books, which is what comprises the entire library to date). For the purposes of ITIL's linkages with BSM, consider the activities in Figure 9.10. The 12 activities highlighted in red are those that stand to gain a direct benefit from the quantitative data gathered and calculated by a BSM/APM solution.
Figure 9.10: ITIL's 5 stages and 24 activities. Those that are directly impacted by BSM are highlighted in red.
You can view more detailed information about the ways in which BSM improves each of these activities by turning to Chapter 9 of The Definitive Guide to Business Service Management.
Many of these activities have been discussed in this guide so far, although without specifically calling them out by their ITIL nomenclature. One in particular of note is the entire fifth stage of the ITIL service life cycle, Continual Service Improvement. In this stage, services in operations are analyzed with an eye towards their capacity to meet their original stated goals as well as the needs of their consumer.
BSM provides a substantial added value to this process through its identification and quantification of service quality. This quantification enables improvement teams to very discretely identify areas of gap in service delivery, develop appropriate solutions, and see how well those solutions impact the overall quality of service delivery. In essence, using BSM's metrics, service improvement teams can measure the difference in as‐is and to‐be levels of service quality, proving that their improvement activities have indeed brought about improvement.
Whereas ITIL has a widespread focus on the required activities of an IT organization and its services, Six Sigma's focus is entirely on the improvement process alone. Again, without delving too deep into the purpose and history of Six Sigma, it is important for the discussion here to recognize that Six Sigma's improvement activities also gain quantitative measurements through the data within a BSM/APM solution.
The Six Sigma Define, Measure, Analyze, Improve, and Control (DMAIC) process is comprised of five phases: Defining the services or components that are critical to quality, measuring their behaviors, analyzing those measurements with an eye towards finding areas of gap, implementing and validating improvements, and finally, building the controlling structures that ensure the improvement remains in place over time.
Figure 9.11 highlights each of these five phases as well as some of the common activities that are accomplished during each phase. Important to recognize here is that each of the activities noted in Figure 9.11 can actually be augmented through the data provided by a BSM/APM solution. For example, sampling data can be gathered through APM monitoring integrations. That data can be used to create a baseline of configuration and behaviors, which is then continually measured through the same APM monitors.
Figure 9.11: The activities and phases of Six Sigma.
Those behaviors can then be analyzed for poor quality and associated costs, creating failure mode effect and Pareto charts to identify areas of highest impact. Ultimately, gaps can be identified and improved upon, with the same BSM/APM data validating the positive impact of the improvement. Similar to the process improvement example with ITIL, BSM/APM quality data establishes the metric against which all improvement activities are ultimately measured.
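The ranking behind a Pareto chart can be sketched in a few lines: sort problem categories by cost and keep the largest ones until a chosen share of the total is covered. The function name, the 80% cutoff, and the cost figures are assumptions made for illustration only.

```python
def vital_few(costs, cutoff=0.8):
    """Return the highest-cost categories that together reach `cutoff` of total cost."""
    total = sum(costs.values())
    if total <= 0:
        return []
    selected, running = [], 0.0
    for name, cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
        selected.append(name)
        running += cost
        if running / total >= cutoff:
            break
    return selected


# Hypothetical cost of poor quality per problem category, in dollars.
costs = {"database": 60000, "network": 25000, "web tier": 10000, "dns": 5000}
print(vital_few(costs))  # prints ['database', 'network']
```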
BSM's bottom line arrives with its unique capability to convert technology metrics into dollars and cents impacts. By seeing how the impact of technology behaviors changes the business bottom line, technologists and business leaders alike are more empowered to make smart decisions about service delivery.
Most importantly, business processes are the backbone of a well‐managed business. Their efficient completion ensures that business activities are executed properly and ultimately drive value—rather than cost—to the business. The historical problem, however, with business processes has been their integration into IT technologies. Too often in immature IT organizations, business processes are forced to function within the capabilities of the IT infrastructure rather than the opposite. In the most egregious of examples, business processes are simply not fulfilled by the technologies that are deployed by the IT organization.
Alignment between your IT processes and your business processes is critical to transforming IT from its traditional role as business cost center into a new role of business partner. As should be obvious in this chapter, the right data and the right solutions enable you to do just that.
This chapter effectively concludes this guide's discussion on APM. Its focus on the business side of technology is fundamentally critical, as every IT organization is a function of its business, and most businesses today can't function without their IT organizations. In the end, the data‐driven approach that such an organization gains through the implementation of an APM solution gives a far greater situational awareness of the technology environment than domain‐focused point solutions.
Although this chapter concludes the discussion, it does not conclude the guide. The final chapter, Chapter 10, arrives as a sort of primer on the topics discussed throughout this entire guide. It summarizes the important points discussed in each chapter and is intended to be the handout you can use to educate others about what you've learned in the other nine chapters. Most importantly, Chapter 10 serves as a more concise explanation that you can deliver to the decision makers in your organization should you determine that an APM solution is a necessary addition to your IT environment.