A Guide to Implementing Analytics in IT

Introduction

The need for analytics in all facets of IT management is widely acknowledged owing to the many benefits it offers, including minimized business downtime, improved service quality, optimized resource utilization, and maximized ROI on IT assets.

However, implementing an analytics system can be challenging. It requires clear vision and strategy, along with a good understanding of what business intelligence and data analytics entail. This guide illustrates the detailed steps needed to implement an analytics solution, and provides guidelines for scaling analytics to all sub-departments of IT.

Challenges in implementing analytics for IT

IT is an umbrella term that encompasses many sub-departments, such as IT service management (ITSM), infrastructure and IT operations management (ITOM), change and project management, endpoint management, and security, depending on the extent of an organization's IT ecosystem. And therein lies the problem. IT infrastructures can be so complex, yet so compartmentalized, that a one-size-fits-all approach to implementing analytics simply won't work. The best approach is to implement analytics gradually, in phases, advancing from one department to the next.

In this e-book, we explore the steps to implement analytics in the two main areas of IT: ITSM and ITOM. For simplicity, we've picked improving SLA performance as the objective for both.

Implementing analytics for ITSM

ITSM encompasses all systems, processes, and practices employed to deliver IT services to end users. Here are the steps to implement analytics in ITSM.

Step 1: Clearly define your goals

Before you embark on your analytics journey, decide on a set of well-defined goals that are practical, measurable, and time-bound. For the purpose of this e-book, as mentioned earlier, we will consider improving SLA compliance as our goal.

Step 2: Determine key performance indicators (KPIs)

Once your goals are set, define how success will be measured, using metrics and KPIs that are relevant to those goals. The best way to see how well your organization is honoring SLAs is to track the KPI "percentage of SLA compliance."

Step 3: Check current status

The next step is a reality check on your current SLA compliance, which shows whether you are ahead of or behind your projected path. The SLA compliance trend report paints a clear picture of the percentage of requests that complied with SLAs over the past few months.

SLA compliance trend

In August, the SLA compliance percentage was much lower than in previous months. The next step is to figure out why.
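The SLA compliance KPI from Steps 2 and 3 is a simple ratio: requests resolved within their SLA deadline, divided by all resolved requests in the period, times 100. The sketch below computes it per month from exported request data. The `Request` record, its fields, and the choice to bucket requests by resolution month are assumptions for illustration, not the ServiceDesk Plus or Analytics Plus data model, and the sample data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Request:
    # Hypothetical request record; a real export will use its own field names.
    request_id: str
    resolved_at: datetime
    due_by: datetime  # resolution SLA deadline


def sla_compliance_by_month(requests):
    """Return {"YYYY-MM": % of requests resolved on or before their deadline}."""
    total = defaultdict(int)
    met = defaultdict(int)
    for r in requests:
        month = r.resolved_at.strftime("%Y-%m")  # assumption: bucket by resolution month
        total[month] += 1
        if r.resolved_at <= r.due_by:
            met[month] += 1
    return {m: round(100.0 * met[m] / total[m], 1) for m in sorted(total)}


# Invented sample data: one of the two August requests misses its deadline.
sample = [
    Request("1", datetime(2023, 7, 3, 10), datetime(2023, 7, 3, 12)),
    Request("2", datetime(2023, 7, 5, 9), datetime(2023, 7, 5, 17)),
    Request("3", datetime(2023, 8, 2, 18), datetime(2023, 8, 2, 12)),
    Request("4", datetime(2023, 8, 9, 11), datetime(2023, 8, 9, 13)),
]
print(sla_compliance_by_month(sample))  # {'2023-07': 100.0, '2023-08': 50.0}
```

Plotting these monthly values over time gives the same shape of information as a compliance trend report.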

Step 4: Leverage analytics to gain insights

Establishing KPIs won't help if you aren't willing to dig deeper to identify and weed out problems. Deep dives into the data can also help streamline processes, accelerate growth, and uncover new opportunities. In our example, experience tells us that requests miss their resolution SLAs for many reasons, such as an increase in request volume, unrealistic SLA configurations, heavy technician workloads, frequent technician reassignments, or a lack of knowledge base articles. In this step, analytics can help you narrow down the cause.

Using analytics, let's assess a few possible reasons for the drop in SLA compliance.

  • Spikes in request volume can throw technicians off balance and cause SLA compliance to take a hit. The Request volume trend report shows that request volume has been fairly consistent over the past 12 months, so volume is unlikely to be the cause.

Request volume trend

  • An uneven distribution of workload could put too much weight on a few technicians, making it difficult for the service desk to keep up with SLAs. Let's compare the workload shared among technicians.

The Technician load factor report shows that there's little difference in the workload shared among technicians.

  • Up-to-date knowledge base solutions encourage usage among users and technicians and help reduce the time taken to resolve requests. The Solution effectiveness report measures knowledge base usage by displaying how frequently solutions are used and how useful they are in resolving requests.

Solution effectiveness

The above report shows that a steady number of requests are resolved using KB solutions, suggesting that the solutions in the repository are quite effective.

  • Tickets often change hands due to technicians' other priorities, lack of know-how, or scheduling conflicts, which increases ticket resolution time. Let's check how often tickets are reassigned among technicians.

Avg. number of technician changes per request

In our case, the average number of technician changes per request increased in August. Frequent technician changes are known to increase the overall resolution time of requests, making it harder for technicians to meet SLAs; this phenomenon could explain the lower overall SLA compliance rate.
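Each hypothesis checked in Step 4 boils down to a simple aggregation over ticket history. The sketch below computes three of them from a hypothetical `Ticket` record: monthly request volume, average technician changes per request, and per-technician load. The record layout, the `month` field, and the rule that a "change" is any handoff to a different technician are assumptions for illustration; the reports named above may define these metrics differently, and the sample data is invented.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Ticket:
    # Hypothetical ticket record, for illustration only.
    ticket_id: str
    month: str  # month the request was created, e.g. "2023-08"
    technicians: list = field(default_factory=list)  # assignment history, in order


def monthly_volume(tickets):
    """Request volume per month."""
    return dict(sorted(Counter(t.month for t in tickets).items()))


def avg_technician_changes(tickets):
    """Average number of handoffs to a different technician, per request, per month."""
    changes = defaultdict(int)
    count = defaultdict(int)
    for t in tickets:
        h = t.technicians
        # Count only handoffs where the assignee actually changes.
        changes[t.month] += sum(1 for a, b in zip(h, h[1:]) if a != b)
        count[t.month] += 1
    return {m: round(changes[m] / count[m], 2) for m in sorted(count)}


def technician_load(tickets):
    """Tickets per technician, attributed to the current (last) assignee."""
    return Counter(t.technicians[-1] for t in tickets if t.technicians)


tickets = [
    Ticket("T1", "2023-07", ["alice"]),
    Ticket("T2", "2023-07", ["bob"]),
    Ticket("T3", "2023-08", ["alice", "bob", "carol"]),
    Ticket("T4", "2023-08", ["carol", "alice"]),
]
print(monthly_volume(tickets))          # {'2023-07': 2, '2023-08': 2}
print(avg_technician_changes(tickets))  # {'2023-07': 0.0, '2023-08': 1.5}
```

In this invented data, volume stays flat while handoffs rise in August, which is the same pattern the reports above point to.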

Step 5: Monitor progress

Revisiting SLA compliance rates periodically helps confirm that your changes are having a positive impact on your SLA compliance percentage, and helps you visually spot deviations from the planned path. The SLA compliance trend report compares the current SLA compliance percentage with that from three months ago for various request categories.

SLA compliance trend (Current vs. three months ago)
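The current-versus-three-months-ago comparison is a per-category difference between two compliance snapshots. A minimal sketch, assuming each snapshot is a hypothetical mapping from request category to compliance percentage (for example, values read off the SLA compliance trend report); the function names, categories, and numbers are invented:

```python
def compare_compliance(current, baseline):
    """Return {category: (baseline %, current %, change in points)}.

    A category missing from either snapshot gets None for the change.
    """
    result = {}
    for cat in sorted(set(current) | set(baseline)):
        before = baseline.get(cat)
        now = current.get(cat)
        delta = None if before is None or now is None else round(now - before, 1)
        result[cat] = (before, now, delta)
    return result


def declined(comparison):
    """Categories whose compliance dropped since the baseline."""
    return [cat for cat, (_, _, d) in comparison.items() if d is not None and d < 0]


# Invented snapshots: three months ago vs. now.
three_months_ago = {"Hardware": 88.0, "Network": 90.0, "Software": 95.0}
current = {"Hardware": 92.0, "Network": 85.5, "Software": 95.0}
report = compare_compliance(current, three_months_ago)
print(report["Network"])  # (90.0, 85.5, -4.5)
print(declined(report))   # ['Network']
```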

Step 6: Iterate goals

The final step in implementing analytics for the IT service desk is to revisit the goals set in the first step. Here you can either raise the bar for your SLAs, or focus on other areas of service management, such as decreasing resolution times, increasing customer satisfaction, or improving service delivery.

Implementing analytics for ITOM

Operations monitoring tools churn out enormous volumes of data that, if tracked and analyzed efficiently, can unlock insights of tremendous value. Analyzing operational data can also give network operations center (NOC) teams foresight into outages, breakdowns, and service disruptions, buying them time to address these issues proactively.

However, the challenge is to determine how to implement analytics in ITOM. The following steps can help.

Step 1: Clearly define your goals

Well-defined goals help streamline actions leading to your objective. In this step, let's consider achieving 99 percent service availability as our primary goal.

Step 2: Determine KPIs

The next step is to choose a KPI that tracks progress toward your goal. In our case, we need to track our SLA compliance level, which tells us the percentage of total time the services were up. The SLA compliance trend report below paints a clear picture of SLA compliance levels over the past 12 months.

SLA compliance trend
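Service availability, the KPI behind the 99 percent goal, is uptime divided by total time in the period. A minimal sketch, assuming you have a list of downtime durations for one service in a given month; the function names and numbers are hypothetical. It also makes the target concrete: in a 30-day month of 720 hours, 99 percent availability allows at most 7.2 hours of downtime.

```python
from datetime import timedelta


def availability_pct(period, downtimes):
    """Percentage of `period` during which the service was up.

    Assumes downtime intervals do not overlap and fall inside the period.
    """
    total_down = sum(downtimes, timedelta())
    return 100.0 * ((period - total_down) / period)


def meets_goal(pct, goal=99.0):
    # Hypothetical helper: compare against the 99 percent availability goal.
    return pct >= goal


month = timedelta(days=30)  # 720 hours
outages = [timedelta(hours=5), timedelta(hours=3)]  # invented: 8 hours down
pct = availability_pct(month, outages)
print(round(pct, 2), meets_goal(pct))  # 98.89 False
```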

Step 3: Leverage analytics for forensic analysis

In the previous step, we discussed the KPI that helps you track your progress. The next step is to perform forensic analysis to understand what drives performance and identify performance gaps. This is a crucial step in operations management because once you identify the factors that drag down performance, you can prescribe remedial measures to improve it.

In the above example, there's a noticeable dip in the SLA compliance percentage in May and June. Deeper analysis can reveal the possible reasons for this dip. Common reasons for such a drastic downturn include problematic network devices in the production environment, unavailability of technicians to resolve alarms, and a high mean time to resolution (MTTR).
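MTTR, one of the suspects named above, is the total time from an alarm being raised to its resolution, divided by the number of resolved alarms. A minimal sketch, assuming each resolved alarm is available as a hypothetical `(raised_at, resolved_at)` pair of timestamps; the function name and sample alarms are invented:

```python
from datetime import datetime, timedelta


def mttr(resolved_alarms):
    """Mean time to resolution over (raised_at, resolved_at) pairs."""
    durations = [resolved - raised for raised, resolved in resolved_alarms]
    if not durations:
        raise ValueError("no resolved alarms in the period")
    return sum(durations, timedelta()) / len(durations)


# Invented alarms: one resolved in 30 minutes, one in 90 minutes.
alarms = [
    (datetime(2023, 5, 1, 9, 0), datetime(2023, 5, 1, 9, 30)),
    (datetime(2023, 5, 1, 10, 0), datetime(2023, 5, 1, 11, 30)),
]
print(mttr(alarms))  # 1:00:00
```

Computing this per month and comparing it with the dip in compliance is one way to check whether slow resolution, rather than alarm volume, is the driver.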

Here's a report that shows the priority-wise distribution of alarms among available technicians.

Technician-wise alarm ownership

According to the report, a few technicians are handling the bulk of incoming alarms, while the rest are working on relatively few. This indicates an imbalance in technician utilization. An extremely high workload can eventually cause burnout and, in turn, technician turnover, so NOC teams should aim to keep the workload evenly distributed. The insight from this report is that some alarms should be reassigned to balance the workload.
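One simple way to turn the ownership report into an action list is to flag technicians whose alarm count is well above the team average. The sketch below assumes a hypothetical list of alarm owners and a team roster, and uses an arbitrary threshold of 1.5 times the average. It ignores alarm priority, which the report above also breaks out, so treat it as a starting point rather than a rule.

```python
from collections import Counter


def overloaded_technicians(alarm_owners, team, factor=1.5):
    """Return {technician: alarm count} for those above `factor` x the team average.

    Roster members with no alarms are counted as 0, so they lower the average.
    """
    counts = Counter({tech: 0 for tech in team})
    counts.update(alarm_owners)
    if not counts:
        return {}
    average = sum(counts.values()) / len(counts)
    return {tech: n for tech, n in counts.items() if n > factor * average}


team = ["ana", "ben", "cho", "dev"]
owners = ["ana"] * 6 + ["ben"] * 5 + ["cho"]  # invented: dev has no alarms
print(overloaded_technicians(owners, team))  # {'ana': 6, 'ben': 5}
```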

Here's another report that shows the alarms generated by problematic network devices. You can see certain devices raising alarms repeatedly. All these alarms add to your technicians' workload, but you can reduce them by scheduling regular maintenance or developing a replacement plan for problematic devices. These activities go a long way toward freeing up your technicians' time so they can focus on other high-priority alarms.

Problematic devices
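Finding the devices that raise alarms repeatedly is a frequency count over the alarm log. A minimal sketch, assuming a hypothetical list of `(device_name, message)` tuples and an arbitrary cut-off of three alarms; device names and messages are invented:

```python
from collections import Counter


def repeat_offenders(alarms, min_alarms=3):
    """Devices with at least `min_alarms` alarms, most frequent first."""
    counts = Counter(device for device, _message in alarms)
    return [(device, n) for device, n in counts.most_common() if n >= min_alarms]


alarms = (
    [("core-switch-01", "CPU utilization high")] * 4
    + [("edge-router-02", "Interface down")] * 3
    + [("printer-07", "Toner low")]
)
print(repeat_offenders(alarms))  # [('core-switch-01', 4), ('edge-router-02', 3)]
```

The resulting list is a natural input for the maintenance schedule or replacement plan described above.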

Step 4: Monitor progress

Once the root cause of the problem is identified and fixed, the next step is to monitor progress regularly to make sure the fix is effective and that you stay on track to meet your goals.

The report below, which compares the historical trend of alarms, shows a decrease in alarms in August.

Overall alarm trend
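A month-over-month percentage change makes a decrease like this easy to quantify. A minimal sketch, assuming hypothetical monthly alarm counts keyed by `"YYYY-MM"`; the counts are invented:

```python
def month_over_month(counts):
    """Percent change in alarm count versus the previous month.

    Returns None for a month whose previous month had zero alarms.
    """
    months = sorted(counts)
    changes = {}
    for prev, cur in zip(months, months[1:]):
        if counts[prev] == 0:
            changes[cur] = None
        else:
            changes[cur] = round(100.0 * (counts[cur] - counts[prev]) / counts[prev], 1)
    return changes


alarm_counts = {"2023-06": 400, "2023-07": 380, "2023-08": 300}  # invented
print(month_over_month(alarm_counts))  # {'2023-07': -5.0, '2023-08': -21.1}
```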

You can verify your progress by comparing the current SLA compliance percentages against the earlier months, as given in the report below.

SLA compliance trend

Step 5: Iterate goals

Establishing analytics for IT operations is important, and the key to doing it successfully is to monitor operational metrics and apply operations analytics continuously. This yields insights that can consistently improve IT operations and optimize operations management procedures and processes to build a cohesive IT engine. In our case, your next set of goals could be to establish early warning signs of outages or to create dynamic thresholds for monitoring systems.
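Dynamic thresholds, one of the suggested next goals, replace a fixed alert limit with one derived from recent behavior. A common, simple approach (not necessarily what any particular monitoring product uses) is to alert when a sample exceeds the mean of a trailing window plus k standard deviations. The window size of 5, k = 3, the function names, and the CPU samples below are all arbitrary, invented values.

```python
from statistics import mean, pstdev


def dynamic_threshold(history, k=3.0):
    """Threshold = mean + k * population standard deviation of recent samples."""
    if len(history) < 2:
        raise ValueError("need at least two samples")
    return mean(history) + k * pstdev(history)


def breaches(samples, window=5, k=3.0):
    """Yield (index, value, threshold) for samples above the threshold
    computed from the preceding `window` samples."""
    for i in range(window, len(samples)):
        threshold = dynamic_threshold(samples[i - window:i], k)
        if samples[i] > threshold:
            yield i, samples[i], threshold


cpu = [40, 42, 41, 43, 39, 41, 95, 42]  # invented CPU utilization samples (%)
for i, value, threshold in breaches(cpu):
    print(i, value, round(threshold, 1))  # flags only the 95 percent spike
```

Because the threshold follows the recent baseline, a spike stands out even if a fixed limit would have been set too high to catch it; the trade-off is that the window and k need tuning per metric.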

Implementing analytics in IT: A continuous process

Implementing analytics for IT does not end with putting together a few reports and dashboards for ITSM and ITOM. Analytics works best when it draws on information from all facets of IT. As new sub-departments are created in IT, it's important to extend analytics to them as well. This helps you establish a relationship tree among outages, services, resources, performance, and people, so you can gain deeper visibility into IT, predict future outcomes, and continuously optimize resources and processes to achieve better results.

About Analytics Plus

Analytics Plus is self-service IT analytics software that lets you visualize your IT data as colorful charts, reports, and dashboards. It offers out-of-the-box integrations with ServiceDesk Plus and other ManageEngine tools, giving you an in-depth look at your IT infrastructure. Its simple drag-and-drop reporting interface lets help desk managers optimize operations and improve service delivery without needing a data analyst.