I still remember that day the boss walked into my office. And I still remember her fateful yet enticing words that talked me into tackling the project, The Project That Would Change Everything. "You're the Scripting Guy. You'll make it work," she said. I agreed, and the rest is history.
You're probably familiar with this story: Big projects in IT start that way. Big projects that may seem great on paper but then get very, very complex as they're developed. Big projects whose multiple applications across multiple platforms require substantial integration effort.
But this story isn't only about that one big project. It's also about all the other little automations—scripts for Windows and Oracle and Active Directory (AD), SQL packages, Linux cron jobs, and so on—that creep into every IT environment over time. Those automations absolutely solve a business's immediate needs. But without central management, they also come with a cost. That cost arises as scripts age, technology changes, and the script owners relocate or leave the company.
Back then, I was known as The Scripting Guy. If you needed a quick data transformation, or a scheduled movement of files from one system to another, I was your go‐to person. I had developed a command of the major scripting languages, along with all the other necessary add‐ons one needed to be The Guy. Over the course of several years, my integration prowess had grown to include platform‐ and application‐specific technologies such as WMI, ADSI, SQL, some Oracle, and even a bit of Linux and IBM AIX.
I put that knowledge to what I thought was good use. Over the years, I had gotten to the point where much of my daily responsibility was automated…sort of. Sometimes my little automations broke. Sometimes they were accidentally deleted or otherwise wiped out through the regular change that happens in any data center. Sometimes their need went away, or a server's configuration was updated, and as a result, I had to go find them once again and remember what they were intended to do.
The result was a not‐very‐well‐oiled machine; one that created even bigger problems the day I moved on to a new employer. You see, I had little automations scattered around the company servers with my name on them. Last I heard, they're still finding them years later, usually after some process breaks that nobody ever knew existed.
You can probably guess what we were missing. We needed enterprise IT job scheduling. That's why I'm writing this book, to explain what this approach is and help you realize you could probably use it as well. Throughout this book, I intend to return to that big project's story, along with a few of the other little ones, to show you why.
This book wouldn't exist if every IT technology seamlessly talked with each other, transferring data, events, and instructions across platforms and applications with ease. If every technology could perfectly schedule its activities with itself and others, you wouldn't be reading these pages.
But you are, and consequently this guide indeed exists.
It exists because IT job scheduling is a task that every enterprise needs, as do many small and midsize organizations. "Jobs" in this sense represent those little packages of automation I discussed earlier. Some work with a single system. Others integrate the services of multiple systems to create a kind of "mixed workload" that produces a result your business needs.
Consider a few of those jobs you're probably already creating today:
Your jobs might be less complex, working with only a single system or application. Or they might be exceedingly so, requiring the participation of multiple applications across different platforms in different parts of the world. At issue here isn't necessarily how complex your jobs are. The "simple" ones have many of the same requirements and necessitate as much due diligence as the "complex" ones. Rather, the issue has more to do with the workflow that surrounds those jobs, and the solutions you implement to manage, monitor, audit, and otherwise keep tabs on every activity at once.
It also has to do with the very different languages and techniques that each IT technology uses and requires. Those differences represent a big headache inside today's heterogeneous data centers. Your IT operating environment surely has Windows systems. But it probably also has Linux, Oracle, HP‐UX, Solaris, and others. You probably need to transfer XML documents, DOCX files, and XLSX spreadsheets over multiple protocols like SMB, FTP, and SSH. Even your monitoring comes in many flavors: SNMP for switches and routers, WMI for Windows systems, and all the various UNIX and Linux widgets for keeping tabs on their activities.
You can imagine just a few of the headaches these radical differences in applications and platforms create:
Solving these five problems is the primary mission of an IT job scheduling solution. From a central location (see Figure 1.1), an IT job scheduling solution creates a platform on which to run all your little "automation packages" that might otherwise be spread across technologies. Using a centralized solution, a database job, a UNIX/Linux job, and an FTP job are all parts of the same management framework. All begin their execution from the same place, and all are managed and monitored from that single location.
Figure 1.1: An IT job scheduling solution centralizes jobs of all types, across all platforms and applications.
As you can probably guess, such a solution has to be exceptionally comprehensive. A solution that works for your company not only needs to support your management, monitoring, and workflow needs. It must be more than just a more‐powerful version of the Windows Task Scheduler. It also needs to support the integrations into every OS, platform, and technology that your business processes incorporate. That's why finding a solution for automating IT job scheduling can be such a challenging activity.
To help you out, you can consider this guide to be a kind of automation "idea factory." Its four chapters will present you with questions to ask yourself, helping you frame your need for a job scheduling solution. It delivers a set of real‐world use cases for seeing scheduling in action. It deconstructs an IT job so that you can peer inside its internal machinery and understand the power of a centralized solution. And it will conclude with a checklist of requirements you should consider when seeking the software that creates your solution. I'll be your guide, and throughout this process I'll share a few of my own stories to bring some real‐world experience into this complex topic.
Note: In this book, you'll hear me use the term job scheduling. Another commonly used term for job scheduling is workload automation. For the purposes of this book, you can assume that the two are interchangeable.
Now, back to my story from long ago. Every OS and application comes equipped with multiple ways to perform its core functions. You already know this. An OS includes one or more scripting languages to enact change and read data. Every modern database has its own scheduling and automation functions, enabling the creation of packages for inserting and selecting data. Even middleware technologies and applications have their own APIs, which can be interfaced either inside or outside the application.
But the internal languages and automations that come with a product are rarely equipped to handle actions outside that product. Ever try to use an XML document to instruct a SQL Server to update an Oracle database row so that an SAP application can provision a process to an AIX mainframe? Whew! That's pretty close to the situation I experienced as I started on The Project That Would Change Everything.
Let's start with a little background. Why that project was needed is really unimportant, as is what we were doing with its data. What is important, however, are the interconnections between each of its disparate elements. Multiple applications running atop more than one OS, integrating with different databases, and requiring data from both inside and outside the organization was just the start.
To get going, I attempted to diagram its components, creating something close to what you see in Figure 1.2. At a high level, this system was constructed to aggregate a set of data from outside our organization with another set on the inside. Our problem was the many different locations where that data needed to go.
Figure 1.2: My unfriendly application.
Let me break down the mass of arrows you see in Figure 1.2. The flow of data in this system started via an FTP from an external data source. That data, along with all its metadata, needed to be stored in a single, centralized SQL database. There, permissions from Windows AD would be applied to various parts of the data set. Some data was appropriate for certain users, with other data restricted to only a certain few. Information inside the FTP data stream would identify who should have access to what.
Users could interact with that data through a Microsoft IIS server running a homegrown Web application. That Web application used XML to transfer data to and from the SQL database. Certain types of data also needed to be added to our company SAP system running atop Oracle, requiring data transformations and delivery between those two systems.
Occasionally, portions of that data would need to be ingested into a UNIX mainframe for further processing. There, it would be consolidated with data from other locations for greater use elsewhere in the company. An email server would ensure users were notified about updates, new data sets, and other system‐wide notifications.
That's a lot of arrows, and each of those arrows represents an integration that must be put in place for the entire system to function. Each arrow also represents an activity that needs to happen at a particular moment in time. Data heading toward the Oracle database obviously couldn't be scheduled to go there until it was actually received at the SQL Server system. Users shouldn't be notified unless something important to them was actually processed. Just the scheduling surrounding each arrow's integration was a complex task unto itself.
Does this look like one or more of the systems that are currently in your data center? If you're doing much with data transformation and movement, you might have the same scheduling headaches yourself. That's why there are four critical points that are important to recognize:
It is the combination of these four realizations that helped me understand that I needed to step outside my application‐specific mindset. It helped me realize I needed to look to solutions that schedule activities across every platform and every application. That's when I started looking into enterprise IT job scheduling solutions.
Let's now take a step back from the storyline and think for a minute about what IT job scheduling should be. I've already suggested that a "job" represents some sort of automation that occurs within an IT system. But let's get technical with that definition. I submit that an IT job represents an action to be executed. An IT job might be running a batch file or script file. It might be running a shell command. It could also be the execution of a database job or transformation. Essentially, anything that enacts a change on a system is wrapped into this object we'll call a job.
Using an object‐oriented approach, it makes sense to consolidate individual actions into separate jobs. This single‐action‐per‐job approach ensures that jobs are re‐usable elsewhere and for other purposes. It means that I can create a job called "Connect to Oracle Database" and use that job any time I need to make an Oracle database connection anywhere.
Now if each job accomplishes one thing, this means that I can string together multiple jobs to fully complete some kind of action. I'll call that string an IT plan. A plan represents a series of related jobs that can be executed with the intended goal of carrying out some change. Figure 1.3 shows a graphical representation of how this might work.
Figure 1.3: Multiple jobs are connected to create a plan.
In Figure 1.3, you can see how three different jobs are connected to create the plan. Job 27 connects to an Oracle database. It passes its result to Job 19, which then extracts a set of data from that database. Once extracted, the data needs to be sent somewhere; Job 42 completes that task by FTPing the data to its destination.
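To make the jobs-and-plans idea concrete, here's a minimal sketch in Python. The class and job names are purely illustrative, not any product's actual API; the point is only that each job wraps exactly one action and a plan chains them, passing each job's result to the next:

```python
# A job wraps exactly one action; a plan chains jobs, feeding each
# job's result into the next. All names here are illustrative only.

class Job:
    def __init__(self, name, action):
        self.name = name
        self.action = action  # a callable taking the previous job's result

    def run(self, previous_result=None):
        return self.action(previous_result)

class Plan:
    def __init__(self, *jobs):
        self.jobs = jobs

    def run(self):
        result = None
        for job in self.jobs:  # each job receives the prior job's output
            result = job.run(result)
        return result

# Mirroring Figure 1.3: connect, extract, then transfer.
connect = Job("Connect to Oracle Database", lambda _: "connection")
extract = Job("Extract Data", lambda conn: f"rows via {conn}")
transfer = Job("FTP Data", lambda rows: f"sent {rows}")

plan = Plan(connect, extract, transfer)
```

Because each job does one thing, the same "Connect to Oracle Database" job can be dropped into any other plan that needs a database connection.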
There's obviously an art to creating good jobs. That's a topic that I'll discuss in greater detail in Chapter 3, but I need to introduce some of the basics here. A good job, for example, might not necessarily have any specific data or hard information that's stored inside the job. Rather than a connection string to a specific server for Job 27, a much better approach would be to use some kind of variable instead.
Developers use the techie term parameterization to represent this generalizing of job objects and the subsequent application of variables at their execution. Figure 1.4 shows how a parameterized plan can link three generic jobs. At the point this plan is run, those jobs are fed the variable information they need to connect to the right database, extract the right data, and eventually pass it on to the correct FTP site.
Figure 1.4: Feeding parameters to jobs in a plan.
By parameterizing the plan in this way, I now get reusability of the plan in addition to all the individual jobs that make up that plan. Should I down the road need to attach to a different database somewhere, pull off a different set of data, and send it to some other FTP site, I can accomplish this by reusing the plan and modifying its variable information. That's reusability on top of reusability!
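A quick sketch shows what that reusability-on-top-of-reusability looks like in practice. Again, the structure and names are hypothetical, not a real scheduler's interface; the same generic jobs run twice against entirely different targets simply by swapping the parameter set:

```python
# A parameterized job stores no hard-coded server names or paths;
# variables are supplied only when the plan runs. Illustrative sketch.

class ParameterizedJob:
    def __init__(self, name, action):
        self.name = name
        self.action = action

    def run(self, params, previous=None):
        return self.action(params, previous)

def run_plan(jobs, params):
    """Execute jobs in order, feeding each the shared parameter set."""
    result = None
    for job in jobs:
        result = job.run(params, result)
    return result

connect = ParameterizedJob("Connect", lambda p, _: f"connected to {p['db_host']}")
extract = ParameterizedJob("Extract", lambda p, prev: f"{p['dataset']} via {prev}")
send = ParameterizedJob("Send", lambda p, prev: f"{prev} -> ftp://{p['ftp_host']}")

jobs = [connect, extract, send]

# The same plan, reused against two different environments:
run_a = run_plan(jobs, {"db_host": "oradb1", "dataset": "sales",
                        "ftp_host": "ftp.example.com"})
run_b = run_plan(jobs, {"db_host": "oradb2", "dataset": "hr",
                        "ftp_host": "ftp2.example.com"})
```

Nothing inside any job changed between the two runs; only the parameters did.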
There's obviously quite a bit more to this whole concept of working with jobs and plans. I'll spend more time in Chapter 3 helping you understand the various characteristics that can be assigned to a job and a plan as well as other objects a typical IT job scheduling solution will use.
But there is one characteristic that merits attention before moving on. That characteristic is the schedule itself, which needs to be applied to the object to tell it when to run. I mentioned earlier that scheduling for large systems like The Project requires a kind of flexibility you just can't get by looking at the clock on the wall. Rather, the kinds of jobs that project needs tend to be more related to actions or state changes that occur within the system.
Let's assume that Figure 1.4's "Plan 7" relates to some data transfer that needs to happen inside The Project. In this case, let's assume that the data transfer occurs between its SQL Server and UNIX mainframe. Figure 1.5 shows a graphical representation of how this might be applied. There, you can see how three different schedules could potentially be attached to the newly‐created plan:
Figure 1.5: Applying a schedule to a plan.
Any of these three schedules can be appropriate, depending on the needs of the system and its components. The first might be appropriate if a daily data dump is all that's needed; in that case, a date/time‐centric schedule completes the action. Very simple.
The second and third schedules highlight some of the more powerful scheduling options that could also drive the invocation of the plan. The second executes the plan based not on any time of day but on when a set quantity of new data has been added to the database. This can be a smart approach if you want these two databases to stay roughly in sync with each other. It is really powerful when you consider how difficult that kind of scheduling would be to create using just the native SQL or UNIX tools alone.
That third schedule is particularly interesting, because it could be used alone or in combination with the second. That third schedule instructs the plan to run only if the server isn't terribly busy. Using it in combination with the second allows you to maintain a level of synchronization while still throttling the use of the server. A good job scheduling solution will include a wide range of conditions that you can apply to plans to direct when they should kick off.
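The three schedule styles can be sketched as simple predicates a scheduler might evaluate before firing the plan. The thresholds and function names below are my own illustrations, not taken from any product; note how the second and third combine to give you synchronization plus throttling:

```python
# Three trigger styles from Figure 1.5, sketched as predicates.
# Hour, row-count, and CPU thresholds are all illustrative.
import datetime

def time_trigger(now, hour=2):
    """Date/time-centric: fire at a fixed hour of the day."""
    return now.hour == hour

def data_trigger(new_rows, threshold=1000):
    """Event-centric: fire when enough new data has accumulated."""
    return new_rows >= threshold

def load_trigger(cpu_percent, ceiling=50):
    """Condition-centric: fire only when the server isn't busy."""
    return cpu_percent < ceiling

def should_run(new_rows, cpu_percent):
    # Combining the second and third schedules: stay roughly in
    # sync, but throttle so a busy server isn't made busier.
    return data_trigger(new_rows) and load_trigger(cpu_percent)
```

A real scheduling solution would offer far richer conditions than these, but the composition idea is the same: triggers are small tests that can be stacked.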
Again, I'll dive deeper into this deconstruction of an IT workflow's components in Chapter 3. But before you can truly appreciate the power of this modular approach, there are a few questions you're probably asking yourself. If you're not, let me help you out with a list of ten good questions about your own environment that you should probably ask. Your answers to these ten questions will determine whether you'll want to turn to the next chapters. If you're experiencing zero headaches with the tools you have today for scheduling your IT activities, you won't need the rest of this book.
Everyone else will.
It wasn't many years ago that one of my jobs was keeping a set of servers updated. Monthly updates were de rigueur, with some on even shorter schedules. Each came with a very short time window when it could and should be applied. The big problem resulted from the fact that these updates typically required a server reboot to get them applied.
At that time, our reboot window was in the wee hours of the morning, many hours past the usual 8‐to‐5 workday. For me, sticking around once a month to complete these updates was a hardship on myself and my family. That's why I created my own automation that wrapped around these updates' installation: when updates were dropped into a particular location, they were applied at the next window, and my mobile device notified me should any problems occur.
From that point on, adding updates to servers meant simply adding them to the right location and making sure my mobile device was near the bedside. Yes, sometimes they'd experience a problem, but those could be fixed through a remote control session. Successful months could go by without loss of sleep or important family time.
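The drop-folder pattern I describe above is easy to sketch. This is not the script I actually used back then; it's a hedged reconstruction in Python, with the apply and notify hooks left as placeholders for whatever installer and paging mechanism an environment really has:

```python
# Drop-folder sketch: updates staged in a directory are applied
# during the next maintenance window; the admin is paged only on
# failure. Paths, extensions, and hooks are placeholders.
import os

def pending_updates(drop_dir):
    """List update packages waiting in the drop folder."""
    return sorted(f for f in os.listdir(drop_dir) if f.endswith(".pkg"))

def run_window(drop_dir, apply_fn, notify_fn):
    """Apply each queued update; notify only when something breaks."""
    results = {}
    for pkg in pending_updates(drop_dir):
        try:
            apply_fn(os.path.join(drop_dir, pkg))
            results[pkg] = "ok"
        except Exception as exc:
            results[pkg] = f"failed: {exc}"
            notify_fn(pkg, exc)  # e.g., a message to a mobile device
    return results
```

Scheduling `run_window` inside the reboot window gives you exactly the arrangement described: add files by day, sleep by night.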
Although your IT job scheduling needs might not go down the path of system update installation, this story of time (and money) savings is an instructive parable. Computers are by nature automation machines. Thus, it stands to reason that any manual activity should have an automation‐friendly adjunct, and that adjunct can become a part of your greater scheduling solution.
Can you afford the risk of inappropriate execution, forgetfulness, or user error in your critical activities? If not, creating flexible and reusable workflows via an IT job scheduling solution should pay for itself in a very short period of time.
That first question introduces the possibility of three kinds of risk in any manual system. First is the risk of inappropriate execution. Any task that requires manual intervention also introduces the notion that it could be executed at an inappropriate time. Or, more dangerously, such a task could be re‐parameterized to send data to the wrong location or execute it in an inappropriate way. There is a recognizable cost associated with this risk.
I remember a situation where a script was created that would apply a set of data to a specific server upon execution. That script took as parameters a list of servers to send the data. One day, a junior administrator accidentally invoked the script with the "*" wildcard in place of a list of servers. As a result, data was distributed to every server all across the company. That single invocation cost the company significantly to clean up the mess.
Forgetfulness and user error are both additional risks that can be addressed through a job scheduling solution. In such a solution, jobs and plans run within the confines of the system and its security model. Dangerous jobs can be specifically restricted from certain individuals or execution models. Centralizing your job execution security under a single model protects the environment against all three of these costly manual errors.
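A guard like the following could have caught the wildcard incident before any data moved. The policy shown here (no wildcards, a cap on fan-out) is my own illustration of the kind of parameter restriction a scheduling solution's security model can enforce, not a feature of any specific product:

```python
# Validate a job's server-list parameter before execution.
# The specific rules below are illustrative policy choices.

def validate_server_list(servers, max_targets=10):
    """Refuse wildcards and overly broad target lists."""
    if any("*" in s or "?" in s for s in servers):
        raise ValueError("wildcards are not permitted in server lists")
    if len(servers) > max_targets:
        raise ValueError(f"refusing to target {len(servers)} servers at once")
    return list(servers)
```

Run centrally, a check like this applies to every invocation by every administrator, junior or otherwise.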
You probably have monitoring in place to watch servers. You've probably got similar monitoring for network components, perhaps even as part of the same product. But does your data center also leverage a unified heads‐up display for monitoring jobs along with their execution success? A failure in a job can cause the same kinds of outages and service losses as a failure in the network or its servers.
If your business systems interconnect through multiple scheduling utilities across multiple products and platforms, there usually isn't a way to centralize all those activities under one pane of glass. What you need is the same kind of monitoring for IT jobs that you've already got in place for your other components.
As you can see in Figure 1.6's mock‐up, you can get that by using a centralized approach. There, every action across every system and application is centralized into a single screen, and determining which jobs ran successfully is accomplished by looking in one place.
Figure 1.6: Daily activity under one pane of glass.
Your data center environment already has multiple scheduling engines in place today. Nearly every major business service technology comes with its own mechanism for scheduling its activities. In fact, those mechanisms are likely already performing a set of duties for your services.
Yet the problem, as you can see in Figure 1.7, has to do with the languages each of these platform‐specific and application‐specific scheduling tools speaks. SQL, for example, comes equipped with a wide range of tools for manipulating SQL data and SQL Server systems; but how rich are those tools when data needs to exit a SQL Server and end up on a UNIX mainframe?
Figure 1.7: Multiple scheduling engines.
Often, the native tools aren't sufficient, forcing an external solution to bridge the gap. That solution can take the form of individual "little automations" like the scripts this chapter started with, or it can be wrapped under the banner of a holistic job scheduling solution. Chapter 4 will discuss the capabilities you'll want to look for in the best‐fit solution.
Considering the answer to question 4, some platform‐ and application‐specific scheduling tools indeed include limited cross‐platform support. Their scheduling capabilities may be able to fire jobs based on actions or state changes.
However, one state change that is particularly difficult to measure across platforms is when tasks take too much time. Task idling in a state‐based scheduling system can cause an entire set of plans to come to a halt if not properly compensated for. Essentially, this idle time represents the period during which a task has not completed, leaving the next one waiting.
Figure 1.8: Unmanaged task idling can kill a nonautomated workflow.
Idling need not necessarily be a problem within a piece of code or script. It can be simply the waiting that is natural in some types of on‐system activities: not knowing when a person will submit a file, for example, or when a piece of data is ready for the next step in its processing.
These idle states are notoriously difficult to plan for using time‐based scheduling alone. With time‐based scheduling, your jobs are built with no intelligence about changes that occur within a system. Rather, they simply run an action at some set point in time. Your job scheduling solution must include the logic necessary to add that intelligence. As you'll learn in later chapters, that intelligence can occur through event‐based scheduling or trigger‐based scheduling. In either of these cases, an on‐system event or trigger recognizes when a change has been made and initiates the next step in processing.
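The difference is easy to see in a small sketch. Rather than running at a fixed clock time, the next step fires the moment its awaited state change occurs. The condition, step, and wait hooks below are placeholders for real instrumentation (a file watcher, a database poll, a message queue), so treat this as a shape, not an implementation:

```python
# Event-based scheduling sketch: poll a condition and run the next
# step the moment it becomes true. Hooks are placeholders.

def run_when(condition, step, attempts, wait_fn):
    """Run `step` as soon as `condition` fires.

    Returns the step's result, or None if the condition never fired
    within the allotted attempts (the "idle" case described above).
    """
    for _ in range(attempts):
        if condition():
            return step()
        wait_fn()  # in practice: sleep, or block on an event/queue
    return None

arrived = {"flag": False}

def file_has_arrived():
    return arrived["flag"]

def process_file():
    return "processed"

# Simulate the awaited file showing up on the third poll.
polls = {"n": 0}
def wait():
    polls["n"] += 1
    if polls["n"] == 2:
        arrived["flag"] = True

result = run_when(file_has_arrived, process_file, attempts=5, wait_fn=wait)
```

A purely time-based schedule would either run too early (nothing to process) or too late (data sits idle); the event-driven version does neither.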
If you haven't yet standardized on an enterprise job scheduling solution, can you honestly say how many tasks are operating everywhere in your data center? I used to think I knew where all of them were in that former job of mine. But then I left, and took with me the sum total of that knowledge. As I mentioned at the beginning of this chapter, those little automations are still being found years later—often after one of them breaks and causes downtime. More importantly, those are only my scripts. There were others in that company as well with scripts of their own that probably eventually got lost.
A centralized job scheduling solution creates a single point of control for automation. It enables auditors and IT teams to know where changes are being sourced from, and that consolidation makes auditors, security officers, and troubleshooting administrators very happy.
How can you correlate issues across those teams, platforms, and applications? If you can't, troubleshooting becomes a game of finger‐pointing, with each team left to prove the problem isn't theirs.
I was once told a story about a company in real need of an enterprise‐wide job scheduling solution. Their business system was much like The Project in that it involved multiple technologies across some very different platforms. Like The Project, managing that system fell to a somewhat distributed group of individuals. SQL Server was managed by the SQL Server team. SAP was administered by SAP administrators. Even the AD had its own group of people responsible for its daily care and feeding.
The problem in this company was not necessarily its application‐specific scheduling tools. It was in its people. Those widely‐distributed people feared the centralization that a job scheduling solution brings. That fear in part was due to the usual technologist's fear of centralization, but it was also a result of the assumption that a centralized tool would mean re‐creating SQL, SAP, AD, and other jobs on a new and completely different system outside their direct control.
An effective enterprise job scheduling solution shouldn't require the complete re‐creation of existing jobs within each platform and application. Recall that a job itself represents the change that is to be made, the individual script or package that must be executed. A job scheduling solution represents the wrapper around that invoked action.
This story ends as you'd expect: a very small problem in one subsystem had a very major impact on the system as a whole. Unable to track down that minor job with its major impact, the company discovered why centralizing is a good idea.
I once built my own scheduling system in the now‐ancient scripting language of VBScript. VBScript is still in use in many places, and it has a long history, but it's not known for superior built‐in methods for scheduling activities. That said, my scheduler worked fine for the task I assigned to it. The next time I needed a scheduler, however, I found myself reinventing the wheel. Even with the limited code modularization VBScript can bring to a script, my scheduler's reusability was very limited.
Imagine having to replicate that scheduling across multiple applications and platforms using different languages—and even using different approaches, both object‐oriented and structured. Homegrown schedulers are indeed an acceptable way of handling the triggering needs of individual scripts and packages; however, a global scheduler that works across all jobs and plans obviously creates a superior framework for job execution.
More importantly, the human resources necessary to maintain a homegrown job scheduler can be much greater than they seem at first blush. Those resources need to keep an eye on the scheduler's code itself alongside the jobs the scheduler attempts to run. In many cases, the biggest extra cost of creating your own scheduler is the time it takes away from other, more value‐oriented projects.
Nowhere is that value‐add more pertinent than when two jobs rely on each other. This kind of job construction happens all the time within distributed systems: the first task in a string completes with a set of data, and that data is needed by the next task in the string. This sharing of information can be handled through file drop‐boxes or through richer mechanisms like Microsoft Message Queuing or database triggers.
Like with the problem of multiple languages, these queuing solutions tend towards being very platform‐centric. It becomes very difficult using a database trigger to invoke an action in AD, for example. A centralized job scheduling solution with rich support for applications will become the central point of control for all cross‐task action linking.
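The simplest of those handoff mechanisms, the file drop-box, can be sketched in a few lines. The file name, format, and function names here are all hypothetical; the point is the dependency: the consumer has nothing to do until the producer's output exists, which is exactly the linkage a scheduler must manage:

```python
# File "drop-box" handoff between two dependent tasks. The path,
# format, and names are illustrative only.
import json, os, tempfile

def producer(dropbox):
    """First task in the string: finish by writing a data drop."""
    data = {"rows": [1, 2, 3]}
    path = os.path.join(dropbox, "extract.json")
    with open(path, "w") as f:
        json.dump(data, f)
    return path

def consumer(dropbox):
    """Next task: can only proceed once the drop exists."""
    path = os.path.join(dropbox, "extract.json")
    if not os.path.exists(path):  # dependency not yet satisfied
        return None
    with open(path) as f:
        return sum(json.load(f)["rows"])

dropbox = tempfile.mkdtemp()
early = consumer(dropbox)  # runs too early: nothing to consume
producer(dropbox)
total = consumer(dropbox)  # now the dependency is satisfied
```

A centralized scheduler replaces the "check if the file exists yet" guesswork with an explicit cross-task link, and it can do so even when the producer and consumer live on different platforms.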
Last is the handling of error messages in custom‐coded scripts, a process that itself consumes a vast quantity of script development time. Errors are notoriously difficult to track down, and become even more challenging when scripts need to span platforms and applications. Error handling requires special skills in trapping variables and determining their intended and actual values. All of these activities grow even more difficult when scripts are run automatically as opposed to interactively because error messages in many cases cannot be captured. Chapter 3 will go into greater detail on the error‐handling functionality of a good job scheduler. For now, recognize that a homegrown script without error handling is ripe for troubleshooting headaches down the road.
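The core of the problem is that an unattended run has no console: an error nobody captures is an error nobody sees. A minimal sketch of the kind of wrapper a homegrown script would need (names and log structure are my own, not any scheduler's API):

```python
# Wrap each job so that failures in unattended runs are recorded
# rather than lost. Structure and names are illustrative.
import traceback

def run_job_logged(name, action, log):
    """Run one job, capturing success or the full error for review."""
    try:
        result = action()
        log.append({"job": name, "status": "ok", "result": result})
        return result
    except Exception:
        # Keep the traceback: automated runs have no console to read.
        log.append({"job": name, "status": "error",
                    "detail": traceback.format_exc()})
        return None

log = []
run_job_logged("good", lambda: 42, log)
run_job_logged("bad", lambda: 1 / 0, log)
```

Writing and maintaining this wrapper for every script, on every platform, in every language is precisely the burden a job scheduling solution's built-in error handling removes.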
So do you have good answers to these questions? Do you feel that your existing scheduling tools bring you zero headaches? If yes, then thanks for reading. If no, then it is likely that your next thoughts will be toward the types of IT challenges that an enterprise job scheduling solution can support. With the basics discussed and the initial questions answered, the next chapter introduces seven real‐world use cases for automating IT job scheduling. It should be an interesting read because the types of use cases outlined in that chapter are probably pretty close to those you've already got under management today.
Find me the business that runs atop a single application—a single instance!—and I'll show you a business that doesn't need IT job scheduling. Everyone else probably does.
In fact, most data centers have far more. Your average midsize data center runs applications for handling its databases, along with middleware systems for processing the data. That data center requires servers and protocols for staging of data in and out of the organization. Applications run atop client/server operating systems (OSs) and mainframes, servers, and perhaps even a few desktops. All of these elements need to communicate with each other, many don't share the same OS, and all suffer under the management complexities brought about by product‐specific toolsets.
Today's IT technologies are fantastic in the business processes they automate, but rare are two that seamlessly talk with each other. Rarer still is the IT product that is superior all by itself at creating and scheduling workflows that meet business requirements. Integrating activities among disparate technologies requires a central solution that can interact with each of them at once.
An IT job scheduling solution is that Rosetta Stone between different platforms, OSs, and applications. It is intended to be the data center's solution for converting raw technology into business processes. In this book, I hope to show you how to incorporate such a solution into your own business.
You've already experienced a taste of how an IT job scheduling solution might work. Chapter 1 was constructed to help you recognize that job scheduling is a service your IT organization probably needs. That said, Chapter 1's discussion intentionally stayed at a high level. You haven't yet explored deeply the features and capabilities such a solution might bring.
You won't get that deep dive in this chapter either. That's because I've found that the best explanation of IT job scheduling requires first a look at the problems it intends to solve. Once you understand where it fits, you'll then appreciate the logic behind its behaviors. It is my hope that by the conclusion of Chapter 1 you began nodding your head, affirming that this purported solution is something your data center desperately needs.
My task now is to further enlighten you with a series of ideas to help you find that best fit. These ideas take the form of seven use cases; essentially, seven little "stories" about issues that have been resolved—or made easier—through the incorporation of a job scheduling solution. The stories themselves are mostly fictitious but are based on real events and real problems. I'll use faux names to keep the narrative interesting.
There's an important point here. Even if some portion of these stories is made up, you should find that the problems and solutions in each aren't far from those you're experiencing.
The first of these stories has nothing to do with a customer‐facing solution. Neither is it directly related to a line‐of‐business application. Rather, the first of these stories starts simple. It explains the administrative situation at Company A, a mature company with a procedurally‐immature IT organization. Lacking many centralized processes, operating with marginally‐effective change and configuration control, and managed by five different administrators, Company A's data center is a mish‐mash of fiefdoms and technology silos. Problem is, these fiefdoms need to communicate with each other, even if their managing IT administrators won't.
John, Bob, Jane, Sara, and Jim are those five IT administrators (see Figure 2.1). Each is responsible for some portion of the data center infrastructure, with each having some overlap of responsibilities. To accomplish administration, they've created scripts, tasks, and packages that keep the individual business workflows running. Those automations indeed enact change on servers and get data moved from system to system but with no interconnection of intelligence.
Figure 2.1: An interconnection of automations.
Figure 2.1 explains this problem in graphical form. In it, you can see that individual automations are created without considering their context. If John creates a job, it cannot act on instrumentation gathered by another job created by Sara. As a result, there is no way to orchestrate activities between individuals, no way to schedule activities so that they do not conflict, and no way to base one job's information or scheduling on the results of another.
A much better solution is to aggregate these five people's automations into a single, centralized solution. Through that single solution, each administrator's jobs can be seen by the others. Each person's jobs can also be aligned with the needs of the others to ensure resources aren't oversubscribed. Additionally, because jobs are collocated in a single location, information and instrumentation from any automation can be used to drive other automations—or feed into their future scheduling.
Figure 2.2: Sourcing automations through an IT job scheduling solution.
In short, even if your automations are administrative in nature, an IT job scheduling solution can bring substantial benefit.
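To make that idea concrete, here's a tiny Python sketch, with every server and job name invented for illustration, of what centralizing those five administrators' automations makes possible: once every job is registered in one place, the scheduler can spot two jobs contending for the same resource before either one runs.

```python
# A minimal, hypothetical central job registry. Each administrator
# registers a job along with the resources it touches; the scheduler
# can then detect contention that siloed automations never could.

class CentralScheduler:
    def __init__(self):
        self.jobs = []  # list of (owner, job_name, resource set)

    def register(self, owner, job_name, resources):
        """Record a job and the servers/databases it touches."""
        self.jobs.append((owner, job_name, set(resources)))

    def conflicts(self):
        """Return pairs of jobs that touch at least one common resource."""
        found = []
        for i, (o1, j1, r1) in enumerate(self.jobs):
            for o2, j2, r2 in self.jobs[i + 1:]:
                shared = r1 & r2
                if shared:
                    found.append((j1, j2, shared))
        return found

scheduler = CentralScheduler()
scheduler.register("John", "nightly-etl", ["SQL01", "FTP01"])
scheduler.register("Sara", "index-rebuild", ["SQL01"])
scheduler.register("Jane", "log-rotate", ["WEB01"])

for job_a, job_b, shared in scheduler.conflicts():
    print(f"{job_a} and {job_b} both touch {sorted(shared)}")
```

Nothing this simple ships as a product, of course, but it shows the essential win: visibility across every administrator's jobs from a single vantage point.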
Yet an IT job scheduling solution isn't solely about its actors. In fact, in many ways, the actors can be one of that solution's least-important impacts. An IT job scheduling solution really has more to do with the data in a data center. That's why the second story in this chapter deals with the different applications that are used by Company B.
Different from the IT administration example told in the previous story, Company B's story centers on their line-of-business (LOB) application. That LOB application comprises several components, each of which is represented in Figure 2.3. Transactions among these systems occur through a carefully choreographed set of tasks, jobs, packages, and workflows. As you can probably imagine, the system in aggregate crosses Windows and UNIX boundaries, and includes multiple database management systems and even a bit of middleware. It is the classic business service.
Figure 2.3: Individual schedulers for each individual application.
All of these individual components enable the functionality of the LOB application. But all also leverage their own built-in toolsets for scheduling activities: The SQL server runs its SSIS packages, the Linux SAP server runs its own tasks and cron jobs, and even the Informatica server enacts change through its workflows.
You are correct in assuming that this one‐scheduler‐per‐component configuration can indeed work for many systems. Data and actions that occur inside Informatica can be based on wall‐clock time or other schedule characteristics. The SQL database can run its SSIS packages based on its own settings, and so on. However, like the actors in the first story, this environment is likely to experience problems as individual system activities conflict with those on other systems.
Contrast this situation with the superior design one gets through job consolidation. In Figure 2.4, the individual task schedulers in that same LOB application have been replaced by a single, centralized IT job scheduling solution. This is possible because, as I mentioned in Chapter 1, a primary benefit of such a solution is its ability to speak the language of every application in the business service.
Figure 2.4: Consolidating tasks across applications.
With that centralization of data and actions comes an enhancement to job scheduling, based on results or data in other jobs. This chapter's sixth story will explain in greater detail how triggering capability dramatically improves service performance; but know here that centralization of scheduling brings to bear greater instrumentation about the health of jobs across the entire application infrastructure.
John is an Oracle DBA who has been with Company C since the very beginning. In his role as database administrator, he built the company's business system infrastructure nearly from the ground up. As a result, he understands those systems inside and out: He has tuned the system over time to improve its performance and weed out non‐optimizations. He's built numerous scripts and other automations that gather data, translate it, process it between application components, and present reports for review by stakeholders. That system is critical to the company. It provides important revenue data for its sales teams and executives. It also means a lot to John.
Then one day the company grew. Substantially. Overnight. Acquiring a completely new line of business, Company C suddenly found its internal IT systems insufficient to handle the new reality of work and its associated data needs.
John was approached shortly after the merger by some very important people in the now larger company. Those people recognized his strengths in creating and managing the original revenue system that brought much value to the smaller company. They wanted another system, "…just like the first one, but this time for selling sprockets instead of cogs." Graciously accepting the offer to improve the company and fortify his resume, John immediately realized that simply replicating the original system would not be a trivial task. Although his scripts absolutely did everything requested of them in the original system, they were also hard‐coded into that original system. Its database architecture was designed to deal with cogs. His transformations were cog‐based in nature. Even the server names and script names were hard‐coded into each individual script, task, and package. Worse yet, there were hundreds of tiny automations spread everywhere.
Translating even a simple database job from cogs to sprockets, like what you see graphically in Figure 2.5, would take months of detective work, recoding, and regression testing. John was in for a great deal of work, and the result might not be as seamlessly valuable as his original system.
Figure 2.5: Replicating an automation to a completely new system.
Had John's scripts, tasks, and packages instead been objects within a central job management system (see Figure 2.6), this new company need might not be fraught with so much risk.
Figure 2.6: Objects remain objects as they're translated into a new system.
Recall Chapter 1's conversation about how good IT jobs are those that are coded for reusability. IT plans are then constructed out of individual job objects to enact change. In good jobs, variables are used to abstract things like server names and script names—sprockets and cogs, if you will—so that entire plans can be re-baselined to new systems with a minimum of detective work, recoding, and regression testing. An IT job scheduling solution takes the risk out of business expansion, giving IT the flexibility to augment services as the business requires.
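If you're curious what that variable abstraction might look like in practice, here's a minimal Python sketch. The server, database, and product names are entirely hypothetical; the point is that re-baselining John's cog system for sprockets becomes a matter of supplying new variable values rather than months of recoding.

```python
# One reusable job definition with $server, $database, and $product
# abstracted out as variables. Re-baselining the plan to a new line of
# business is just a new substitution, not a rewrite.

from string import Template

export_job = Template(
    "SELECT * FROM $database.orders WHERE product = '$product' "
    "-- runs on $server"
)

# Original system: filled in for cogs.
cogs = export_job.substitute(server="DB-COGS-01",
                             database="cogs_db", product="cog")

# New line of business: the same job object, new variable values.
sprockets = export_job.substitute(server="DB-SPKT-01",
                                  database="sprocket_db",
                                  product="sprocket")

print(cogs)
print(sprockets)
```

Compare this with John's hard-coded scripts, where every one of those values was baked into hundreds of individual files.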
Company D's current situation is a product of its own success. Starting as a small organization with a single mission, their focus has changed and evolved over the years as lines of business come and go. Indeed, even whole businesses have been grafted on and later spun off as the winds shift in Company D's industry.
As a result, Company D suffers under many of the problems you would associate with any classic enterprise company. It has thousands of applications under management, some of which are used by only a very few people. Some homegrown solutions only remain because they were coded years ago to solve a specific problem for a specific need that has not changed.
Being a company that is more like the summation of lots of little companies, these business applications are nowhere near homogeneous. One budgeting application might store its data in SQL, another in Oracle, a third in some obscure database language spoken only by IT professionals long past retirement. Tying these applications and their data together is a big job with big consequences to the organization.
Many enterprise organizations that rely on disparate databases leverage Business Intelligence (BI) solutions like Crystal Reports. These BI solutions aggregate information across the different architectures. Using BI tools like Crystal Reports, data in an Oracle format can be compared and calculated against data in a SQL format, and so on. These tools come equipped with rich integrations, enabling them to interconnect nearly all database formats all at once (see Figure 2.7). Company D uses Crystal Reports to gather budgetary data across business units and individual project teams.
Figure 2.7: Connecting solutions like Crystal Reports to multiple databases.
Yet Figure 2.7 doesn't fully show the reality of how Company D's data is generated. Before that data ever becomes something tangible that can be ingested into a BI solution, it starts its life inside any number of down‐level systems. Figure 2.8 shows just a few of those underlying systems, all of which integrate to create the kind of data a BI solution desires to manipulate.
Figure 2.8: Underlying jobs make Business Intelligence data usable.
Notice in Figure 2.8 how a portion of the first business unit's information comes from a partner company external to the data center. The second and third business units have projects in combination that require orchestration and synchronization between databases. The second business unit has further integrations into an e‐commerce server in order to gather a full picture of budget levels.
BI solutions can indeed present a more-unified view of data across different platforms, but they do not provide a mechanism to unify transactions between down-level systems. For Company D, creating that unified workflow lies within the realm of an enterprise IT job scheduling system. When gathering data involves more than simply querying a database, a job scheduling system ties the entire system together to accomplish what you really need.
Company E had a big problem not long ago when they began extending customer services onto the Internet. They quickly found out that you can indeed interact with customers there, but creating a holistic system that gets customer data in all its various types into the hands of the right person isn't as easy as it looks.
What Company E found out is that dealing with customers over the Internet automatically creates a lot of data. That data arrives in various formats, with each format requiring a different mechanism for handling.
You can see a graphical representation of Company E's story in Figure 2.9. Internet customers that desired services interacted primarily through a Web server. That Web server contained the requisite logic to inform customers about products and interact with them as purchases were made. What's interesting about Company E is that their system involved two-way communication with customers not only via the Web site and email but also in the transfer of data files.
Figure 2.9: Different data formats require different data handling.
Data files, as you know, are much different from XML payloads in Web transactions or emails passed back and forth through an email server. They're larger; they can come in many different formats, which creates particular issues when customers don't follow a standard; and the management tools to work with them don't necessarily integrate well with other formats and workflows.
Company E needed a multipart mechanism to solve their formatting workflow problem. They needed to recognize when an order was placed, generate an FTP URL for the user to upload data, move that uploaded data from a low‐security FTP server to high‐security database server, and finally notify the user when the transaction was complete. Adding to the complexity, those exact same steps were required in reverse at the time the order was fulfilled.
You can imagine the protocols and file formats at play here: XML, SMTP, FTP and SFTP, along with a little SSH and SOAP to tie the pieces together, just like you see in Figure 2.10. Complex needs like Company E's require data transfer handling that can recognize when files have arrived. Such handling can either monitor for a file's presence or use event- or message-based notification. An IT job scheduling solution wraps file transfer logic into the larger workflow, enabling XML to trigger SSH, to fire off SMTP, and finally to invoke SOAP at the point the application requires.
Figure 2.10: Multiple protocols at play, each with its own management.
For complex file transfer needs, those that must exist within a business workflow, IT job scheduling solutions solve the problem without resorting to low‐level development.
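As a thought experiment, Company E's four transfer steps might be sketched as one chained job like the following Python mockup. Every server name and path is invented; what matters is that each step hands its result to the next in a single orchestrated sequence rather than four disconnected scripts.

```python
# Hypothetical chained workflow for Company E: recognize the order,
# mint a one-time FTP upload location, relocate the upload to secure
# storage, and notify the customer. All locations are simulated.

import uuid

def generate_upload_url(order_id):
    """Steps 1-2: order recognized; mint a one-time FTP upload URL."""
    token = uuid.uuid4().hex[:8]
    return f"ftp://ftp.example.com/incoming/{order_id}-{token}"

def move_to_secure_storage(upload_url):
    """Step 3: relocate the upload from the low-security FTP server
    to the high-security database server (simulated as a path)."""
    filename = upload_url.rsplit("/", 1)[-1]
    return f"/secure/db-server/staged/{filename}"

def notify_customer(order_id, secure_path):
    """Step 4: tell the customer the transaction is complete."""
    return f"Order {order_id}: your data is staged at {secure_path}"

def run_transfer_workflow(order_id):
    url = generate_upload_url(order_id)
    path = move_to_secure_storage(url)
    return notify_customer(order_id, path)

message = run_transfer_workflow("A-1001")
print(message)
```

Running the same chain in reverse at fulfillment time, as Company E required, would simply be another plan built from the same steps.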
This chapter's sixth story brings me back to the one I started in Chapter 1. There I explained some of the introductory pieces in The Project That Would Change Everything. I also explained how an IT job scheduling system was very quickly identified as the only class of solution that could enable the kind of functionality our complex project needed. Let me tell you a little more of that story.
Our determination happened shortly after whiteboarding the various components we knew the project needed. Figure 2.11 shows a re‐enactment of that whiteboarding activity, with only a few of its myriad interconnecting magic marker lines in place. My team knew that when external company data arrived on the FTP server, transferring that data to the SQL server must occur in as close to real‐time as possible.
Figure 2.11: Whiteboarding the triggers for The Project.
Fulfilling that requirement with traditional FTP alone created a fantastically problematic design. The idea of constructing an always-on FTP session that constantly swept for new data was a ridiculous notion. And it wasn't a good idea for security. What we needed was some kind of agent (or, better yet, an agentless solution) that would simply know when data arrived. Then it could provision that data from the FTP server's data storage to the SQL database.
But that wasn't our only challenge. At the same time, our SQL and Oracle databases needed to remain in strong synchronization. Changes to specific values in SQL must replicate to Oracle, also in as real‐time as computationally possible. Synchronizing Oracle with SQL meant also synchronizing metadata with SAP.
Even a single one of these near-real-time requirements can be challenging for a developer to build. Lacking developers and a development budget, and desiring to complete this project using off-the-shelf components, we demanded a solution that would accomplish the task without resorting to low-level coding.
What we needed were triggers.
Triggers are the real juice in an IT job scheduling solution. The kinds and capabilities of triggers a job scheduling solution supports make the difference between one that's enterprise ready and one that's not much more than the Windows Task Scheduler.
The Project required a wide range of these triggers to get the job done: Our project's FTP-to-SQL integration required a file-based trigger, kicking off a job or plan when a file appeared at the FTP server. Message-based triggers were also necessary for the SQL-to-Oracle integration, enabling the two applications to notify each other about synchronization activities. Event triggers were necessary in the Oracle-to-SAP integration, allowing Oracle to create events about changes to its state and alerting SAP to make associated changes based on event characteristics. Those same event triggers gave Active Directory (AD) the data to quickly tag permissions into the data. Finally, time-based triggers kicked off occasional data transfers between the SQL database and the UNIX mainframe.
Surprisingly, simple triggers alone weren't sufficient. All by itself, even the best trigger couldn't fulfill the multi-server and multi-action real-time requirement our system demanded. We also needed the ability to tether or "chain" individual triggers together. By chaining triggers, we could speed the process, get data where needed, and ensure the system remained in convergence. I'll talk more about chaining triggers in the next chapter.
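To illustrate what chaining might look like, consider this toy Python mockup, with all trigger and job names invented. A file-based trigger starts the FTP-to-SQL job, whose completion raises a message trigger for the SQL-to-Oracle sync, which in turn raises an event trigger for the Oracle-to-SAP update.

```python
# A hypothetical trigger bus. Each registration names a trigger, the
# job it launches, and (optionally) a follow-on trigger that fires
# when the job completes -- that follow-on is the "chaining."

class TriggerBus:
    def __init__(self):
        self.handlers = {}   # trigger name -> list of (job, next trigger)
        self.log = []        # ordered record of jobs that ran

    def on(self, trigger, job_name, then=None):
        """Run job_name when trigger fires; optionally fire a
        follow-on trigger afterward."""
        self.handlers.setdefault(trigger, []).append((job_name, then))

    def fire(self, trigger):
        for job_name, then in self.handlers.get(trigger, []):
            self.log.append(job_name)  # "run" the job
            if then:
                self.fire(then)        # chain to the next trigger

bus = TriggerBus()
bus.on("file:arrived", "ftp-to-sql-load", then="msg:sql-updated")
bus.on("msg:sql-updated", "sql-to-oracle-sync", then="evt:oracle-changed")
bus.on("evt:oracle-changed", "oracle-to-sap-metadata")

# Simulate a file landing on the FTP server.
bus.fire("file:arrived")
print(bus.log)
```

One arriving file cascades through three systems with no polling loop anywhere, which is precisely what our always-on FTP sweep could never do.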
Pay careful attention to the triggering capability of your selected IT job scheduling solution. The very best will come with the richest suite of triggering abilities.
Company F was a midsize company with a midsize IT organization. Responsible for all the tasks typically associated with IT, its IT team got along well and generally provided good service to the company at large.
But even the most well‐meaning of IT organizations occasionally makes mistakes, and sometimes those mistakes are large in impact. That's the situation that occurred one day after two administrators began sharing their quiver of scripts, tasks, and packages.
You already know that one of the primary benefits of moving to script-based automation is consistency in reuse. By packaging a series of tasks into a script, that script can be executed over and over with a known result. Parameterizing those same scripts makes them even more valuable to IT operations. Once created, the same script can be used again and again across a range of different needs. Yet reusability can sometimes be a risk as well. When a person can simply double-click a script to activate it, there's always a chance that double-click will happen inappropriately.
That's exactly what happened the day Sara "borrowed" one of Jane's scripts for use in another system. Jane's scripts were brilliantly designed, smartly parameterized, and well documented. Built right into each script was all the necessary information another administrator would need to reuse the script elsewhere. For Sara, Jane's script perfectly solved her problem at hand.
The central problem, however, was that Sara wasn't really authorized to run Jane's script. In fact, the well‐meaning Sara wasn't supposed to be working on the system at all. When she executed Jane's script, it brought the system down unexpectedly. Company F learned a valuable lesson that day in the openness of simple scripts.
That's why shortly thereafter Company F invested in an IT job scheduling solution for aggregating their automations into a unified store. Unifying automations within a restricted-run framework enabled Company F to apply privileges to their scripts. Because they chose a best-in-class IT job scheduling solution, they were able to apply privileges not only to the scripts themselves but also to the jobs, plans, and even variables associated with those scripts. Having correct permissions in place reduces the risk that a Sara will inappropriately execute a script. But, more importantly, it also reduces the risk that a Sara-type worker will inappropriately attach the wrong variables to the right script, or the right variables to the wrong script.
Figure 2.12: Applying security at various levels in script execution.
Consolidating IT automations into a job scheduling framework provides visibility into scripts across the enterprise. It adds security to what might otherwise be highly-dangerous text files. It creates a location where their successful execution can be proven to administrators as well as auditors. And it creates an auditable environment of approved execution that protects the data center.
These seven stories are told to help you understand the value an IT job scheduling solution adds. In them, you've learned how IT job scheduling works for administration as well as for complex tasks that might otherwise be relegated to low-level developers. You've discovered how triggers and file manipulations are as important as database tasks and middleware actions. You've also learned how job scheduling creates that framework for approved execution that your auditors—and, indeed, your entire business—truly appreciate.
Yet as I mentioned as this chapter began, I still haven't dug deeply into the inner workings of jobs themselves. Now that you've gained an appreciation for where job scheduling benefits the data center, let's spend some time discussing a technical deconstruction of an IT workflow. At the conclusion of the next chapter, you'll gain an even greater appreciation for how these job objects and their plans fit perfectly into the needs of your business services.
Computers are useful because they'll perform an activity over and over without fail. The art is in telling them exactly what to do.
I hope that reading Chapter 2 was as enjoyable as writing it. Although I'll admit I took a little literary license in telling its stories, I did so to highlight the use cases where IT job scheduling makes perfect sense. Coordinating administrator activities, consolidating tasks, generalizing workflows, gathering data, orchestrating its transfer, triggering, and security are all important facets of regular data center administration. Yet too often these facets are administered using approaches that don't scale, introduce the potential for error, or can't be linked with other activities. The ultimate desire in each of Chapter 2's stories was the creation of workflow. That workflow absolutely involved each story's actors; but, more importantly, it involved the appropriate handling of those actors' data.
In many ways, workflows, jobs, and plans represent different facets of the same desire: Telling a computer what to do. You can consider them the logical representations of the "little automation packages" I referenced in the first two chapters. Although I spent much of those chapters explaining why they're good for your data center and how they'll benefit your distributed applications, I haven't yet shown you what they might look like.
That's what you'll see in this chapter. In it, you'll get an understanding of how a workflow quantifies an IT activity. You'll also walk through a set of mockups from a model IT job scheduling solution. Those mockups and the story that goes with them are intended to solidify your understanding of how an IT job scheduling solution might look once deployed.
But for now, let's stay at a high level for just a bit more. In doing so, I want to explain how workflows bring quantification to IT activities.
A workflow has been described as a sequence of connected steps. More importantly, a workflow represents an abstraction of real work. It is a model that defines how data gets processed, where it goes, and what actions need to be accomplished with that data throughout its life cycle.
You can find workflows everywhere in business, and not all are technical in nature. Think about the last time you took a day off from work. You know that taking that day off requires first submitting a request. That request requires approval. Once approved, you notify teammates and coworkers of your impending unavailability. In the world of paid time off, you can't just miss a day without following that process.
And yet sometimes people do just miss days. Perhaps they were very sick, or got stuck on the side of the road far from cell phone service. In any of these cases, the workflow breaks down because the process isn't followed. What results is confusion about the person's whereabouts, and extra effort in figuring out what they were responsible for accomplishing during their absence.
You can compare this "people" workflow to the "data" workflows in an IT system. Data in an IT system needs to be handled appropriately. Actions on it must be scheduled with precision. Data must be transferred between systems in a timely manner. Failure states in processing need to be understood and handled. The result in any situation is a system where data and actions can be planned on.
To that end, let's explore further the IT plan first introduced back in Chapter 1. Figure 3.1 gives you a reproduction of the graphic you saw back in that chapter. There, you can see how three jobs have been gathered together to create Plan 7 – Send Data Somewhere.
Figure 3.1: An example IT plan.
I won't explain again what this plan intends to do; the activities should be self‐explanatory. More important is the recognition that this example shows how an IT workflow quantifies an activity along a set of axes: capturability, monitorability and measurability, repeatability and reusability, and finally security. Let's explore each.
I find myself often repeating the statement, "Always remember, computers are deterministic!" Given the same input and processing instructions, they will always produce the same result. Yet even with this assertion, why do they sometimes not produce the result we're looking for?
That problem often centers on how well the established workflow captures the environment's potential states. A well‐designed workflow (and the solution used to create it) must have the ability to capture a system's states and subsequently do something based on what it sees.
I recently heard a story that perfectly highlights this need for capturability. In that story, a company ran numerous mission‐critical databases across more than one database platform. Most of these databases were part of homegrown applications that the company had created over time.
Backing up these databases was a regular chore for the IT department. Although the company's backup solution could indeed complete backups with little administrator input, the configuration of many databases required manual steps for backups to complete correctly. Due to simple human error, those manual steps sometimes weren't completed correctly. With more than 25 databases to manipulate, that human error became the biggest risk in the system. Fixing the problem was accomplished by implementing a solution that could capture the manual portions of the activity into an IT job.
Such capture is only possible when an IT job scheduling solution is richly instrumented. That solution must include the necessary vision into backup solutions, database solutions, and even custom codebases. Vision into every system component means knowing when the task needs accomplishing.
You can't capture something unless you can monitor and measure it. Just as important as visibility into a system is visibility into the workflow surrounding that system. An effective IT job scheduling solution must be able to instrument its own activities so that the job itself can recover from any failure states.
This is of particular importance because most IT jobs don't operate interactively. Once created, tested, and set into production, a typical IT job is expected to accomplish its tasks without further assistance. This autonomy means that well‐designed jobs must include monitoring and measurement components to know when data or actions are different from expected values.
It's easiest to understand this requirement by looking at the simple IT plan in Figure 3.1. Such a workflow is only useful when its activities are measurable. More important, measurement of a plan's logic must occur at multiple points throughout the plan's execution. Figure 3.2 shows how this built‐in validation can be tagged to each phase of the plan's execution. In it, you see how the hand‐off between Job 27 and Job 19 requires measuring the success of the first job. If Job 27 cannot successfully connect to the database, then continuing the plan will be unsuccessful at best and damaging at worst. You don't want bad data being eventually sent via FTP to a remote location.
Figure 3.2: Validation logic ensures measurability.
Similar measurements must occur in the hand‐off between Job 19 and Job 42 and again at plan completion. A successful IT job scheduling solution will create the workbench where validation logic like that shown in Figure 3.2 can be tagged throughout an IT plan. This logic should not impact the execution of individual jobs, nor is it necessarily part of whatever code runs beneath the job object. Effective solutions implement validation logic in such a way to be transparent to the execution of the job itself.
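Here's a hypothetical Python sketch of that transparent validation, using the job numbers from Figure 3.1. The jobs themselves are trivial stand-ins; the validation logic wraps them from the outside, so the plan halts at a failed hand-off without any change to the jobs' own code.

```python
# Three stand-in jobs from the example plan. Each returns a small
# context dictionary that the next job builds on.

def job_27_connect():
    return {"connected": True}

def job_19_extract(ctx):
    return {**ctx, "rows": 120}

def job_42_ftp_send(ctx):
    return {**ctx, "sent": True}

def run_plan():
    ctx = job_27_connect()
    # Validation at the Job 27 -> Job 19 hand-off: no connection, no plan.
    if not ctx.get("connected"):
        raise RuntimeError("Plan halted: database connection failed")

    ctx = job_19_extract(ctx)
    # Validation at the Job 19 -> Job 42 hand-off: don't FTP bad data.
    if ctx.get("rows", 0) <= 0:
        raise RuntimeError("Plan halted: extract produced no rows")

    ctx = job_42_ftp_send(ctx)
    # Validation at plan completion.
    if not ctx.get("sent"):
        raise RuntimeError("Plan halted: FTP transfer failed")
    return ctx

result = run_plan()
print(result)
```

Notice that none of the three job functions knows it is being measured; that separation is what keeps validation logic transparent to job execution.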
Transparency of measurement along with parameterization of job objects combine to create a repeatable and reusable solution. You can imagine that creating a well-instrumented IT plan like Figure 3.2 is going to take some effort. Once expended, that effort gains extra value when it can be reused elsewhere.
Reusability comes into play not only within each IT job but also within each plan. Recall my assertion back in Chapter 1 that an IT job is "an action to be executed." This definition means that the boundary of an IT job must remain with the execution of an action. Figure 3.3 shows a graphical representation of an Integrated Jobs Library. In that library is a collection of previously-created jobs: Job 17 updates a database row, Job 27 opens a connection to a database, and so on.
Figure 3.3: Reusing IT jobs in a plan; reusing IT plans in a workflow.
Each of those discrete jobs can be assigned to a workflow for the purposes of accomplishing some task. They can also be strung together in infinite combinations to create a more‐powerful IT plan. You can see an example of this in Figure 3.3. Notice how Job 27 represents the beginning step of Plan 15; it also represents a middle step for Plan 22.
Once created, both jobs and plans reside in an IT job scheduling solution's Integrated Jobs Library. From there, created jobs can be reused repeatedly as similar tasks are required. In Figure 3.3, two new databases require synchronization. Since a plan has already been created to accomplish this task, reusing that plan elsewhere can be as simple as a drag-and-drop. After dragging to create a new instance of the plan, the only remaining activities involve populating that plan with new server characteristics.
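A miniature, purely illustrative version of that Integrated Jobs Library might look like the following Python sketch. Jobs are reusable objects, a plan is just an ordered list of job IDs, and "dragging" the plan onto a new database amounts to re-running it with new server characteristics.

```python
# A toy jobs library keyed by job ID (the IDs and actions mirror the
# chapter's examples but are otherwise invented). Each job takes a
# parameter set, so nothing about a server is hard-coded.

library = {
    "job-17": lambda p: f"update row on {p['server']}",
    "job-27": lambda p: f"open connection to {p['server']}",
    "job-19": lambda p: f"extract data from {p['server']}",
}

def run_plan(plan, params):
    """Execute a plan (an ordered sequence of job IDs) against one
    set of server characteristics."""
    return [library[job_id](params) for job_id in plan]

# The synchronization plan is built once and never rewritten.
sync_plan = ["job-27", "job-19", "job-17"]

# Two new databases need the same synchronization; only the
# parameters change between instances of the plan.
first = run_plan(sync_plan, {"server": "DB-NEW-01"})
second = run_plan(sync_plan, {"server": "DB-NEW-02"})
print(first)
print(second)
```

The same `job-27` object could just as easily begin one plan and sit in the middle of another, exactly as Figure 3.3 depicts.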
Chapter 2 introduced the notion of job security. In the seventh story, you read how individual jobs and entire plans can be assigned security controls to prevent misuse. That level of security is indeed an important part of any IT system; however, Chapter 2 only began the conversation.
Consider the situation where an IT plan updates data in a database. Correctly constructing this IT plan requires parameterizing it so that the specific row values or items of data to update aren't hard-coded. However, parameterizing the plan in this way introduces the possibility that someone could accidentally (or maliciously) reassign the plan and update the wrong data.
This risk highlights why deep‐level security is fundamentally important to an IT job scheduling solution. You want controls in place to prevent someone from invoking a plan inappropriately. But you also want controls over which instances of—and triggers for—that plan can be executed. Each platform and application tied into your IT job scheduling solution has its own security model, as does the job scheduling solution itself. Mapping these layers together is what enables a job scheduling solution to, for example, apply Active Directory (AD) security principals to an application with a non‐Windows security model. Doing so enables you to lean on your existing AD infrastructure for the purpose of assigning rights and privileges in other platforms and applications. Figure 3.4 shows how such an extended access control list (ACL) might look, with triggers, trigger characteristics, and even instances of such a plan being individually securable.
Figure 3.4: Applying deep security to a job or plan.
At its core, an IT workflow is still a piece of code. Some of that code a solution's vendor will create and include within a job scheduling solution; these pieces represent the built‐in job objects in your solution's Integrated Jobs Library. Other code must be custom‐created by the administrators who use that solution. No vendor can create objects for every situation, so sometimes you'll be authoring your own. Regardless of who creates the code, at the end of the day, it is that code that needs to be scheduled for execution.
With this in mind, let's walk through an extended example of constructing a workflow out of individual parts. You can assume in this example that an IT job scheduling solution has been implemented and will be used to author the workflow.
A diagram of that workflow is shown in Figure 3.5. In it, each block represents an activity to be scheduled. Its story goes like this: Data in a system needs to be monitored for changes. As changes occur, an IT plan must be invoked to gather the changes, run scripts against the data, and move it around through file copy and FTP transfers. While all these processes occur, individual jobs within the workflow must trigger each other for execution as well as monitor for service availability.
Figure 3.5: An example workflow.
You should immediately notice that scheduling is an important component of this workflow. That scheduling isn't accomplished through some clock‐on‐the‐wall approach. It is instead based on monitoring the states present within the system (presence of files, WMI queries, log file changes, and so on), and firing subsequent actions based on changes in those states. This intra‐workflow triggering is the foundation of IT job scheduling. Without it, scheduling jobs is little more than a function of time and date. A workflow like this requires a much faster response, one that moves from step to step based on the results of the just‐completed step. You only get that through triggering.
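The state-based scheduling just described can be sketched as a monitoring loop: each pass compares the current system state against the last observed state and fires the next step the moment a change is detected. The state values and step mechanics below are invented for illustration; in a real solution the "state" would come from a file check, a WMI query, or a log watch.

```python
def run_state_triggered(check_state, fire_next_step, passes):
    """Fire the next workflow step whenever the monitored state changes."""
    last_state = check_state()
    fired = 0
    for _ in range(passes):
        state = check_state()
        if state != last_state:        # a change in state, not a time of day,
            fire_next_step(state)      # is what triggers the following step
            fired += 1
            last_state = state
    return fired

# Simulate a monitored state (e.g., a Web service's report of data changes)
states = iter(["idle", "idle", "data-changed", "data-changed", "processed"])
observed = []
count = run_state_triggered(lambda: next(states),
                            lambda s: observed.append(s), passes=4)
print(count, observed)  # fires twice: on "data-changed" and on "processed"
```

Contrast this with a clock-on-the-wall scheduler: the loop reacts within one polling pass of the change, rather than waiting for the next prescribed time of day.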
Explaining Figure 3.5's workflow begins at its second step with the creation of an Oracle PL/SQL job object. This job object is necessary to run a query against the workflow's Oracle database. This object and its underlying query string should already be a component of your IT job scheduling solution. As a result, creating that job probably starts by clicking and dragging a representative SQL block (an example of which is shown in Figure 3.6) from a palette of options into the plan designer's workspace.
Figure 3.6: Oracle PL/SQL object.
Once added to the workspace, specifics about this job object's use will then be added into the SQL block's properties screen. In Figure 3.6, you see how a SELECT statement is created to connect to an Oracle database and gather data. You should also notice how a variable—$DATA_SOURCE—is used in this case to maintain the reusability of the job object.
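The variable mechanism can be illustrated with a short sketch: the job object stores a query with a placeholder, and the placeholder is resolved only when the object is put to a concrete use. The $DATA_SOURCE syntax mirrors Figure 3.6; the table and variable values are invented.

```python
from string import Template

class SqlJob:
    """A reusable SQL job object whose query carries an unresolved variable."""
    def __init__(self, query_template):
        self.query_template = Template(query_template)

    def render(self, variables):
        """Resolve the job's variables for one specific use of the object."""
        return self.query_template.substitute(variables)

# Defined once in the library; $DATA_SOURCE keeps it reusable.
select_job = SqlJob("SELECT * FROM orders WHERE source = '$DATA_SOURCE'")

# The same job object serves two different plans with different sources.
print(select_job.render({"DATA_SOURCE": "warehouse_east"}))
print(select_job.render({"DATA_SOURCE": "warehouse_west"}))
```

Because the variable lives in the job's properties rather than the query text itself, nothing about the object needs editing when it is dragged into a new plan.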
Constructing that Oracle object is only the first step. By definition, there is no logic in it to define when it should be invoked. Accomplishing this requires creating one or more conditional statements. In this case, the workflow needs to query a Web service to see when data has changed. When it has, the Oracle SELECT statement is invoked. Figure 3.7 shows an example screen where such a Web services binding might be created. This binding identifies the methods that the Web service exposes, and is the first step in creating the necessary conditional logic.
Figure 3.7: Web services connection.
Our example now includes conditional logic for monitoring the Web service for data changes. It also includes connection logic for gathering data from the Oracle database. The next step in the workflow requires processing that data through the use of a script block. Such a script block might be entered into an IT job scheduling solution using a wizard similar to Figure 3.8.
Figure 3.8: A scripting job.
In this mockup, a job object is created to encapsulate a script. Scripting jobs are exceptionally malleable in that they can contain any code that is understood by the IT job scheduling solution and target application. In the case of Figure 3.8, the code is VBScript, although any supported code could be used.
The script's code is entered into the script block, along with other parameters like those seen in Figure 3.8: parameters associated with the code itself, completion status, script extensions, and so on. Once created, the script becomes a job object just like the others in this workflow.
Note: As you can imagine, using custom code introduces the possibility for error into any IT plan. Your IT job scheduling solution will include scripting guidelines, but it should also include instrumentation to validate script variables and handle and alert on errors as they occur.
Figure 3.9: File copy job.
The next step in constructing the workflow is twofold. Figure 3.5's branching pattern illustrates the need to transfer the script's results to two locations using two different mechanisms. The first, seen in Figure 3.9, might be through a file copy job object.
Such an object is likely to be a built‐in object within an IT job scheduling solution's Integrated Jobs Library. Thus, adding that job object to the plan may require little more than dragging it into the workspace just like with the SQL object. Once added, parameters associated with the file transfer are then added along with actions should a failure occur. Note again here how a variable is used in the file copy object's parameters to maintain reusability.
File copy jobs typically perform file transfers between similar operating systems (OSs), such as Microsoft Windows. But getting data off a Windows system and onto a Linux or UNIX system requires bridging protocols. That's why FTP jobs exist. Figure 3.10 shows how an FTP (technically, an SFTP) job object might look once dragged into the workspace. Added as parameters to that job are the FTP commands required to transfer the data as well as server names and credentials.
Figure 3.10: FTP job.
Note: Securing these credentials is an equally important part of security. No regulated business or its auditors will look kindly on storing authentication credentials within an FTP command string. Thus, an effective IT job scheduling solution should provide a secured credentials store for such jobs. That store maintains credential security while allowing credentials to be reused across multiple FTP jobs.
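The credentials-store idea can be sketched as follows: the FTP job definition carries only a named reference to a credential, never the secret itself, and the secret is resolved from the store only at execution time. The base64 step here is a mere stand-in for real encryption at rest, and every name (server, credential, commands) is invented for illustration.

```python
import base64

class CredentialStore:
    """A toy secured store: jobs reference secrets by name, never by value."""
    def __init__(self):
        self._secrets = {}

    def put(self, name, username, password):
        # Stand-in for real encryption at rest -- do NOT use base64 as security
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        self._secrets[name] = token

    def resolve(self, name):
        raw = base64.b64decode(self._secrets[name]).decode()
        username, password = raw.split(":", 1)
        return username, password

store = CredentialStore()
store.put("sftp-prod", "transfer_svc", "s3cret")

def build_ftp_job(server, credential_name):
    """The job definition carries only a credential reference, not the secret."""
    return {"server": server, "credential": credential_name,
            "commands": ["put results.csv"]}

job = build_ftp_job("sftp.example.com", "sftp-prod")
print(job)                         # no password appears in the job definition
print(store.resolve("sftp-prod"))  # resolved only when the job executes
```

An auditor reviewing the job library sees only credential names; the secrets live in one place, can be rotated once, and are reusable across every FTP job that references them.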
I mentioned earlier that monitoring and measurement are key components of good IT plan creation. If you're not monitoring your environment, you won't be prepared for unexpected states. One way to do that monitoring is through a trigger. I show a portion of such a trigger in Figure 3.11.
Figure 3.11: WMI‐based trigger.
This trigger is used to facilitate the Monitor Service element in the workflow. For it, a Microsoft WMI query verifies the state of a service (in this case the TlntSrv or Telnet service). Not shown in the figure, but an important part of the job creation, is the action the trigger will accomplish when it discovers a stopped service. Assuming this sample workflow requires use of the service being monitored, the action associated with Figure 3.11 will be to restart that service if it is down.
This example is important because it highlights the kinds of state‐correcting actions an IT job scheduling solution can automatically perform. If your workflow requires specific servers and their services (or daemons) to be operational, building those corrective measures directly into the workflow goes a long way toward ensuring the continued operation of the distributed business system.
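The trigger-plus-corrective-action pattern can be simulated in a few lines. On a real Windows host the check would be a WMI query against the service (TlntSrv in the figure's example); the in-memory service table below stands in for that query, and all names are illustrative.

```python
# Simulated service table; a real trigger would issue a WMI query such as
# "SELECT State FROM Win32_Service WHERE Name = 'TlntSrv'".
services = {"TlntSrv": "Stopped", "W32Time": "Running"}
actions_taken = []

def monitor_service(name):
    """Trigger: verify the service state; restart it if it is found down."""
    if services[name] == "Stopped":
        services[name] = "Running"            # the state-correcting action
        actions_taken.append(f"restarted {name}")
        return True                            # trigger fired
    return False                               # nothing to correct

fired = monitor_service("TlntSrv")
print(fired, services["TlntSrv"], actions_taken)
```

Run once against a stopped Telnet service, the trigger fires and the service is restarted; a second pass finds it running and takes no action, which is exactly the steady-state behavior you want from a Monitor Service element.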
Our sample workflow needs to process two scripts to manipulate its data. The first you saw in Figure 3.8. I won't show you a similar view of the second script. Instead, I'll show you a constraint that might be applied (see Figure 3.12). Such a constraint can define when that script needs to be executed.
Figure 3.12: File constraint.
Recall that intra‐workflow scheduling needs to be more than just time‐based. Time‐based schedulers are by nature insufficient because they can only process data at prescribed times of the day, which creates inappropriate delay for workflow processing. What you really want is for steps in a workflow to fire once a successful result from previous steps is verified.
You could achieve this by running the workflow line by line. However, doing so doesn't necessarily base the execution of following steps on the results of previous steps. That's why Figure 3.12's file constraint is useful. Constraining an IT job's execution to occur only when a file is present allows that job to kick off only at the most appropriate time.
Our example workflow needs to process its second script after a file is copied. One can assume then that the copied file will be present on the target system. Thus, adding a file constraint to a job object means running the job only when the file is present and the previous step is complete.
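A file constraint can be sketched as a simple gate in front of the job: if the expected file isn't there yet, the job doesn't fire. The file and job names are invented; a real solution evaluates this constraint natively rather than in user code.

```python
import os
import tempfile

def run_with_file_constraint(required_file, job):
    """Run the job only if its file constraint is satisfied."""
    if not os.path.exists(required_file):
        return "waiting"        # constraint not met; the job does not fire
    return job(required_file)

with tempfile.TemporaryDirectory() as workdir:
    copied_file = os.path.join(workdir, "copied_data.csv")

    # Before the copy completes, the constrained script must not run.
    first = run_with_file_constraint(copied_file, lambda f: "processed")

    # The preceding file copy job delivers the file; now the job may fire.
    with open(copied_file, "w") as f:
        f.write("col1,col2\n")
    second = run_with_file_constraint(copied_file, lambda f: "processed")

print(first, second)
```

The second script in the workflow thus starts at the earliest correct moment: the instant its input file exists, and not one scheduled interval later.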
Although not necessarily related to this example, a pair of additional constraints is worth exploring. The first can be seen in Figure 3.13 where a job constraint has been placed on a job. For those plans where you simply want one job to follow another after its successful completion, job constraints can ensure that path is followed. Important to recognize here is that, as configured, whatever job follows the one in Figure 3.13 will only begin if the previous job is successful. Your IT job scheduling solution should include multiple options for defining when jobs in a plan are allowed to begin.
Figure 3.13: Job constraint.
The other half of this equation is telling the solution which job to trigger after a successful completion. You can see an example of this in Figure 3.14. Here, a job (not identified in the figure) can be instructed to trigger upon the success of the previous job. Using combinations of constraints and triggers ensures that subsequent steps in the workflow execute only when the state of the system is appropriate.
Figure 3.14: Completion trigger.
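The constraint-plus-trigger pairing just described can be sketched as a small chain runner: JobB begins only on JobA's successful completion, and a failure in JobA blocks JobB entirely. Job names and the success/failed statuses are illustrative.

```python
def run_chain(job_a, job_b):
    """Fire job_b on job_a's successful completion; otherwise stop the plan."""
    status_a = job_a()
    if status_a != "success":         # job constraint: previous job must succeed
        return [status_a]
    return [status_a, job_b()]        # completion trigger: fire the next job

# Successful path: JobA succeeds, so its completion trigger fires JobB.
print(run_chain(lambda: "success", lambda: "success"))
# Failed path: the constraint blocks JobB from ever starting.
print(run_chain(lambda: "failed", lambda: "success"))
```

Notice that the decision is made from the result of the just-completed step, not from a clock, which is the whole point of intra-workflow triggering.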
Although time‐of‐day scheduling is of comparatively minor use, it is still helpful from time to time. Figure 3.15 shows an example scheduler that can be used for identifying when jobs should initiate. A good scheduler will include not only date‐ and time‐based triggers but also support for complex scheduling needs.
Figure 3.15: Time‐based schedule.
Whatever IT job scheduling solution you choose needs to arrive with a suite of potential triggers that define when jobs are fired. These triggers perform multiple functions. They enable actions to be fired based on known states rather than requiring periodic "wake the script up and verify" batch jobs. They provide a mechanism to simplify event handling on external systems, a process that can be very complex when handled within a job object itself. They also create the potential for new types of actions, enacting change based on states that would otherwise be difficult to monitor within a script.
Consider the following possible triggers as a starting point for defining when you might want actions fired in your data center. This list gets you going; I'll expand on it in the next chapter, where I deliver a shopping list of capabilities you should look for in a solution:
- File triggers, which fire when files are created, modified, or deleted
- Event triggers, which fire on events raised by the OS or by applications
- Message triggers, which fire on signals from messaging systems
- Monitoring triggers, which fire on state changes such as a stopped service
- Time triggers, which fire on date‐ and time‐based schedules
Finally, although the core of any IT job scheduling solution is indeed the code that enacts changes on systems, the last thing you want to do is begin creating scripts when pre‐created objects are already available. This chapter has discussed how an Integrated Jobs Library creates a palette of potential actions that you can add to your workspace. Figure 3.16 shows a representative sample of what one might look like. Pay careful attention to the actions that are available right out of the box in your chosen solution. You may find that leaning on your vendor for creating, testing, and validating objects greatly reduces your effort and risk of failure.
Figure 3.16: Integrated Jobs Library.
Telling computers what to do is indeed an art, one that's bounded by the science of logic. Purchasing and implementing an IT job scheduling solution only nets you an empty palette within which you can create your own automations. That empty palette does, however, come with substantial capabilities for creating those instructions. This chapter has attempted to show you ways in which that might occur.
There's still one more story left to tell. That story deals with highlighting the capabilities that you might want in setting up that palette. That's the topic for the final chapter. In it, I'll share a shopping list of capabilities that you might look for in an IT job scheduling solution. Some of those features will probably make sense, while others might surprise you.
Purchasing and implementing an IT job scheduling solution nets you only an empty palette within which you can create your own automations. Filling that palette to meet the needs of your environment is the next step.
You might remember this idea as the closing thought of the previous chapter. It highlights an important realization to keep in mind as you're considering an IT job scheduling solution: Once you've selected, purchased, downloaded, and incorporated into your infrastructure an IT job scheduling solution, what do you have? With many solutions, not much. Once installed, some solutions expose what amounts to an empty framework inside which you'll add your own jobs, plans, and schedules.
An IT job scheduling solution is, at the end of the day, only what you make of it. Right out of the box, a freshly installed solution won't immediately begin automating your business systems. Creating all those "little automations" is a task that's left up to you and your imagination.
That's why finding the right IT job scheduling solution is so fundamentally critical to this process. The right solution will include the necessary integrations to plug into your data center infrastructure. The right solution comes equipped with a rich set of triggers that bring infinite flexibility in determining when jobs are initiated. And the right solution helps you accomplish those automations easily, carefully, and with all the necessary tools in place to orchestrate entire teams of individuals. Integrations, triggers, and administration—these should represent your three areas of focus in finding the solution that works for you.
Just three things, eh? That's easy to say when you're just the author of some book on solutions for automating IT job scheduling. The real world simply isn't as cut and dried. The reality is that businesses today require justification—and often formal justification—in order to convert a tool that's desired into a tool that actually gets purchased.
Oftentimes, IT professionals know via gut instinct that they need something to solve their current problem. They often even have a vague notion of what that something probably looks like. The difficult part for many is in translating their instinct into a set of requirements that lay out exactly what they need.
That's why I've dedicated this final chapter to assisting you, the IT professional, in creating a formal requirements specification. I'll outline a set of requirements that are remarkably similar to those used to find the solution for my project, The Project That Would Change Everything.
You remember that project, first introduced in Chapter 1? Its architecture is reprinted as Figure 4.1. The Project That Would Change Everything, as you can see, incorporated a range of technologies along with associated triggers for moving data around while processing it at the speed its business required. Finding a single‐source solution to accomplish all of this wasn't an easy task. Thus, locating the solution that worked for us required a set of formal requirements.
Figure 4.1: The Project That Would Change Everything.
In the next sections, I'll lay out the most important of those in formal requirements language. For each requirement, I'll add a bit of extra commentary to its story and, where possible, show you a mock‐up of what a potential fulfilling solution might look like. You're welcome and encouraged to reuse these requirements along with their justifications in your own specification for finding the product you ultimately need.
Oh, and you're welcome. Consider these your requirements for finding an IT job scheduling solution that'll work for your needs.
Not to belabor the point, but any IT job scheduling solution you select must work with every technology if it's to be useful. That means support for your databases, along with their query and management languages. It means integrating with your applications either directly or through exposed Web services. It requires direct and indirect integration with all forms of file transfer because data that's processed almost always needs to be moved somewhere else at some point. Finally, it must be able to handle data transformation, converting data between formats as it is processed or relocated.
With a sketch of the integration points that comprise your business systems in hand, compare that list of products and technologies with those supported by the IT job scheduling solutions you're considering. Those that don't support every technology should be immediately removed from your candidate list.
Platforms, applications, and technologies are only the first level of integration an IT job scheduling solution requires. In addition to general support for an application, such a solution must be able to dig into that application's activities and behaviors if it is to process and move around data.
Equally important as the support of those properties and methods is their exposure within the IT job scheduling solution. Not every business system is well documented, and not every property, method, or action has a well‐known reason for being. Thus, a solution that can interrogate its integrations for what's available becomes critically important. Figure 4.2 shows how this might look for a Web service, where a mock‐up IT job scheduling solution exposes a list of potential actions and data (in this case, properties and methods) with a single click.
Figure 4.2: Exposing the properties and methods of a Web services object.
Orchestrating activities across platforms, applications, and technologies is only useful when the IT job scheduling solution can do so across the entire field of scripting languages. Script language independence refers to the requirement that any appropriate scripting language can be used within any job object and against any applicable platform, application, or technology.
Figure 4.3 shows how this might be implemented in a sample solution. Here, the job object itself does not place constraints on the type of script launched within the properties of the job. In this figure, any script can be inserted into the Job Properties location. That same script, irrespective of its language, can be further constrained via parameters, completion status, and other factors including pre‐ and post‐execution steps. This flexibility is necessary because you'll be connecting your scheduling solution to many types of technologies, any of which may require a specific language for interaction.
Figure 4.3: Scripts of any language are components of each job or plan.
Queues in an IT job scheduling solution represent a mechanism to manage and prioritize job and plan activities. A fully‐functioning IT job scheduling solution will leverage multiple queues of differing priorities in order to preserve performance across both the scheduling system and those it connects with. Working with a series of prioritized queues also enables a kind of failover when the resources needed by job objects are for some reason not available. In this scenario, job objects in one queue can be failed over to subsequent queues for processing. The result is a better assurance that jobs will succeed when the system experiences resource outages or other transient problems.
You can see a mock‐up of how this might look in Figure 4.4. Here, a job is configured to execute within a specific submission queue. That queue is given a priority along with other parameters that control its performance. Running jobs in this manner ensures that they execute based on priorities that are driven by business rules.
Figure 4.4: Individual jobs or plans are assigned to queues.
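The prioritized-queue-with-failover behavior can be sketched as a dispatcher: a job lands in the highest-priority queue whose resources are available, and an outage in one queue fails the job over to the next rather than losing it. The queue names, priorities, and availability flags are all invented for illustration.

```python
def dispatch(job, queues):
    """Place the job in the first available queue, in priority order."""
    for queue in sorted(queues, key=lambda q: q["priority"]):
        if queue["available"]:
            queue["jobs"].append(job)
            return queue["name"]
    return None  # no queue available; the job must be held for retry

queues = [
    {"name": "high",   "priority": 1, "available": False, "jobs": []},  # outage
    {"name": "normal", "priority": 2, "available": True,  "jobs": []},
]

# The high-priority queue's resources are down, so the job fails over.
placed = dispatch("sync-databases", queues)
print(placed, queues[1]["jobs"])
```

The result is the assurance the requirement asks for: a transient resource outage degrades a job's priority rather than causing it to fail outright.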
There's an idea in the sixth story of Chapter 2 that warrants revisiting: "Triggers are the real juice in an IT job scheduling solution. The kinds and capabilities of triggers a job scheduling solution supports make the determination between one that's enterprise ready and one that's not much more than the Windows Task Scheduler."
It is indeed the flexibility of those triggers (along with their associated constraints) that separates the best‐in‐class IT job scheduling solutions from those you won't want. Requirements 5, 6, 7, and 8 all deal with the need for different types of triggers that fire based on state changes or other behaviors on target systems.
A file‐based trigger initiates job execution based on the presence or characteristics of a file on a system. These triggers are particularly useful for identifying when a file is created, then firing the job's next step based on that file creation. They can do the same when files are modified, deleted, or any other action associated with that piece of data. File triggers become important for eliminating lag in distributed systems because they initiate processing steps immediately as data experiences a change.
Messaging systems such as CORBA, the Java Message Service, and Microsoft Message Queuing, among others, are low‐level solutions for orchestrating activities across applications and platforms. Their centralized approach to signaling across system components creates an easy framework for developers. They can be similarly easy for IT job scheduling solutions to work with.
Interrogation and integration enable message‐based triggers to coordinate the activities between low‐level systems and their accompanying shrink‐wrapped solutions. Message‐based triggers are similar to file‐based triggers in that they improve job execution performance by executing actions at exactly the moment they're needed. Your chosen solution should tie into the messaging systems that are used by your business systems, enabling you to extend the reach of their signaling in and among distributed systems.
Like messaging systems, events are a rich source of information about on‐system behaviors. With virtually every application reporting its state through OS and other onboard event systems, an IT job scheduling solution with event‐based triggers gains the ability to orchestrate application events with other activities.
Most important here is the ability to customize and tailor events inside the business system. Events can be fired based on the activities within a system, so monitoring for their creation allows an IT job scheduling solution to immediately invoke resulting actions elsewhere.
Although most of this book has been dedicated to highlighting why time‐based triggers aren't good enough for most business systems, there still comes the time when a job must be fired based on wall clock time. Most important to recognize here is that date‐ and time‐based scheduling can be done well (when not done well, it can be a significant limiter). An IT job scheduling solution that does not include support for multiple schedules, irregular schedules, and highly‐custom schedules won't be enough for your needs, particularly in today's global workplace where jobs that span time zones may be common. Seek out solutions that provide high levels of customization for date‐ and time‐based schedules.
Chapter 1 introduced the notion of parameterization when it comes to IT jobs and plans. This activity essentially abstracts every piece of data into variables that can be used anywhere. Variables and other types of dynamic data are critical to reusability in an IT job scheduling solution. Your chosen solution must support variables both within jobs and plans as well as across them.
Oftentimes, the reusability of variables and other dynamic data across job objects is referred to as creating "profiles" of data. Those profiles provide an easy way to reference data no matter where it becomes needed. Figure 4.5 shows how such variables can be instantiated within a job object. There, $ID, @ExecutionUser, and @ExecutionMachine variables have been created for later use.
Figure 4.5: Variables are created for specific use or across all jobs and plans.
Once installed, you'll be quickly creating lots of individual jobs and plans for automating your environment. As you learned in Chapter 3, those jobs and plans are the discrete actions that ultimately connect to create a workflow. An effective IT job scheduling solution will enable the reuse of variables both within and across workflows so that very large automations can be much more easily laid into place.
The previous chapter showed an example of how communication within and across workflows is useful. This notion of exchanging data is important for simplifying workflows and for achieving parallelism of job processing to improve performance.
You might think that a solution whose primary goal is job execution performance wouldn't need to consider the business calendar. On the contrary, it is important to recognize the impact of jobs on actual system performance. You don't want to run particularly resource‐intensive jobs against production systems during periods of heavy use. Just the act of running those jobs can have a negative impact on the system as a whole.
Determining that exact "period of low use" often isn't an easy task. Business systems, particularly those that service users across multiple time zones, may experience unexpected hours of high and low utilization. The complexities of global utilization drive the need for scheduling based on a business calendar. Figure 4.6 shows one representation of how that business calendar implementation might look. Using such a calendar, the execution of entire series of jobs can be visually identified and scheduled to prevent resource overutilization.
Figure 4.6: Scheduling of jobs via a business calendar.
Business calendars aren't only for resource preservation. They can also be used to schedule job activities based on when data gets created, mirroring the job execution to the business rules that drive its data. For example, if you know that a set of end‐of‐day data will be available at the close of each day, using a business calendar can orchestrate the collection of that data across time zones and in accordance with other business rules. By following a business calendar, it becomes possible to align the technical activities on the system with the personnel activities in the real world.
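A minimal sketch of calendar-gated scheduling: a resource-intensive job is admitted only inside a low-use window defined on the business calendar. The window hours here are invented for illustration; a real business calendar would also carry holidays, end-of-day rules, and per-time-zone windows.

```python
from datetime import time

# A quiet period defined on the business calendar (illustrative hours)
LOW_USE_WINDOW = (time(1, 0), time(5, 0))   # 01:00 to 05:00

def may_run_heavy_job(now):
    """Allow heavy jobs only during the calendar's low-use window."""
    start, end = LOW_USE_WINDOW
    return start <= now < end

print(may_run_heavy_job(time(2, 30)))   # inside the quiet window
print(may_run_heavy_job(time(14, 0)))   # peak hours: hold the job
```

The same gate, pointed at an end-of-day marker instead of a quiet window, is how a calendar aligns job execution with when the business actually produces its data.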
Workflows comprising numerous job objects will grow unwieldy over time. This happens as individual items become ever more interwoven throughout the greater system. Reuse of job objects at the same time creates a web of interdependencies between those very objects, which then requires management. Lacking visualization of dependencies, administrators can too easily manipulate a job object or item without realizing its downstream effects.
Figure 4.7 shows a sample report where one job object's dependencies are listed, along with their label, name, and path. This simple report is a powerful tool in a complex system where job objects find themselves reused across systems.
Figure 4.7: A report on the dependent objects of an object.
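A dependency report like Figure 4.7's reduces to a reverse lookup: given which plans reuse which job objects, list every plan that depends on a job before anyone edits it. The job and plan names below are invented for illustration.

```python
# Which plans reuse which job objects (illustrative data)
plans = {
    "Plan 15": ["Job 27", "Job 17"],
    "Plan 22": ["Job 17", "Job 27", "Job 17"],
    "Plan 30": ["Job 31"],
}

def dependents_of(job_name):
    """Report every plan whose execution depends on the given job object."""
    return sorted(p for p, jobs in plans.items() if job_name in jobs)

# Before changing Job 27, check its downstream effects.
print(dependents_of("Job 27"))
```

Even this trivial report answers the question that matters: "if I change this object, what breaks?" — the question administrators too often answer only after the fact.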
Your IT teams will find themselves quickly growing reliant on your IT job scheduling solution for more of their daily activities. That's because any action or behavior that can be characterized in a script or other job can be trivially implemented in your IT job scheduling solution.
For these and other reasons, that same solution must support multiple interfaces for management and administration. Obviously, a client‐based solution will provide the richest interface for manipulating jobs and their characteristics; however, not every administrator is always in a location where that client GUI is accessible. Web browser or mobile device interfaces become valuable tools when you're in the data center and nowhere near a rich client. They become even more valuable (particularly in the case of mobile device support) when critical jobs might alert in the middle of the night. Choose a solution that includes numerous interface options, and you'll thank yourself down the line.
A large portion of Chapter 3 was dedicated to deconstructing the elements in a typical IT workflow. Each of those individual items can be encapsulated into a job or plan. Each performs some action, and interconnecting them in complex ways—such as nesting and chaining—is what makes job scheduling so extensible across the range of business services.
Figure 4.8 shows what a view might look like in an IT job scheduling solution. There JobA and JobB are graphically connected to show how results from JobA are used in the execution of JobB. Although simple in this example, the chaining of input and output represents one of the core value propositions of an IT job scheduling solution. Its orchestration of activities across all jobs and all system components is what enables this chaining to occur.
Figure 4.8: Two jobs, the execution of which is chained together.
The same holds true with job nesting, where the execution of one job occurs within another. Job nesting furthers the reusability of jobs by enabling an individual job to perform an action within the confines of another. Input and resulting output can be used between jobs.
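Chaining and nesting can be sketched together in one small example: JobA's output feeds JobB's input (chaining), while a reusable validation job runs inside the confines of the outer job (nesting). The names echo Figure 4.8's JobA/JobB, but the logic is illustrative.

```python
def job_a():
    """Produce a result set for downstream use."""
    return {"rows": 3}

def job_b(upstream_output):
    """Consume JobA's output as its input -- chaining."""
    return f"archived {upstream_output['rows']} rows"

def nested_validate(payload):
    """A reusable job invoked inside another job -- nesting."""
    return "rows" in payload

def outer_job():
    output = job_a()
    if not nested_validate(output):   # nested job runs within the outer job
        return "validation failed"
    return job_b(output)              # chained: JobB begins from JobA's output

print(outer_job())
```

The design payoff is reusability in both directions: nested_validate can be dropped inside any job that produces a payload, and job_b never needs to know how its input was produced.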
There's a third activity that's an important part of this requirement. Consolidating chaining and nesting into an overarching system highlights the power gained through job load balancing. Already discussed to some extent as a function of Requirement #11's business calendar, job load balancing enables an administrator to enact change across dozens or hundreds of system components at once. An effective solution will enable that action to occur without the fear that running a massively parallel job will impact the platform or application, or the greater system as a whole.
You can really only reach this nirvana of complete automation if the automation itself can be properly visualized. In fact, one of the greatest limitations of many applications and platforms lies in their lack of visualization tools. You simply can't see their activities as they fire. The IT job scheduling solution you want will expose a workspace into which job objects can be laid out, interconnected, and watched as they execute.
This visual approach to job creation becomes particularly important as the scope and complexity of plans increase. As you can imagine, it's not that difficult to connect two jobs together like what you saw back in Figure 4.8. Yet the situation changes dramatically when greater numbers of tasks require orchestration, all with their own execution triggers and constraints.
An IT job scheduling solution's workspace designer functionality grows more important as complexity increases. Figure 4.9 shows what is still a relatively simple plan, this time composed of five separate jobs. Connecting these jobs are the triggers (marked as CT) and constraints (marked as JC) that combine to determine when the next set of actions is to be executed. In this example, three jobs must coordinate their activities prior to the fourth one executing. Only after that fourth job executes can the fifth and final job complete.
Figure 4.9: A collection of jobs in a plan with associated triggers and constraints.
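Underneath the visual workspace, a plan like Figure 4.9's is a dependency graph. The sketch below models the same shape (three jobs constraining a fourth, which constrains a fifth) using Python's standard-library topological sorter; the job names are hypothetical and no specific product works exactly this way.

```python
from graphlib import TopologicalSorter

# Each key lists the jobs that must complete before it may run,
# mirroring the constraint arrows in Figure 4.9.
plan = {
    "Job4": {"Job1", "Job2", "Job3"},
    "Job5": {"Job4"},
}

# static_order() yields a valid execution order: Job1-3 in some
# order first, then Job4, and finally Job5.
order = list(TopologicalSorter(plan).static_order())
```

A scheduler's advantage over this sketch is that it evaluates the graph continuously, firing each job the moment its triggers and constraints are satisfied rather than computing one fixed order up front.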
Look for an easy‐to‐use workspace designer tool in your chosen solution. Lacking one that presents visualizations in an easy‐to‐understand format makes your work more difficult and introduces the chance of failure or error in plan creation.
Scripts are obviously the backbone of any IT job scheduling solution, but many common actions in an IT environment are repetitive and/or easily captured into a reusable object. This chapter in fact began with the assertion that a freshly‐installed IT job scheduling solution creates an empty framework that you're responsible for filling with automation. The reality is that, depending on the solution chosen, the framework may come pre‐populated with common actions that are immediately usable.
As you can imagine, having a collection of jobs readily at hand can significantly speed the creation of new automations. Need to email a document? Just drag the "Email" job step into your workspace designer. Figure 4.10 shows a mock‐up of how such a job steps editor might look. There, you can see a range of common activities that span the breadth of data center platforms and applications.
Figure 4.10: An editor that includes commonly used jobs.
Although these job steps alone won't be specific to your needs, an effective solution will include ways of incorporating variables and other dynamic data to customize the job steps for the needs of the automation under construction. More importantly, these job steps come pre‐generated and pre‐tested by the vendor, which reduces both the risk of scripting error and the level of effort in testing.
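The variable-substitution idea can be shown with a trivial sketch: a generic "Email" step whose fields are filled in with plan variables at run time. The placeholder names and values here are hypothetical, chosen only for illustration.

```python
from string import Template

# A reusable job step is generic text plus named placeholders;
# the plan supplies the concrete values when the step runs.
email_step = Template("Send $attachment to $recipient via $server")

rendered = email_step.substitute(
    attachment="report.pdf",
    recipient="ops@example.com",
    server="smtp.example.com",
)
```

Commercial solutions go further, validating variable types and resolving values from a central variable store, but the pattern of a pre-tested template plus runtime substitution is the same.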
Easily one of the most difficult activities in creating automations is recognizing their output, whether that be the data you're looking for or an error message. Most automations are not run in interactive mode. Instead, they run as background processes that work with platforms and applications without exposing their activities to the logged‐in user. Thus, the resulting data and error messages from these scripts aren't easily captured using simple native tools.
An effective IT job scheduling solution will often execute its scripts within its own runtime environment, or within one where output and error messages can be captured. Executing scripts and other objects in this way enables the IT job scheduling solution to return this information to an administrator's console for review. Knowing output messages from executed scripts assists greatly in the generation of those scripts, easing their development process and reducing the risk of error. Look for an IT job scheduling solution that supports script execution reporting that includes output data as well as runtime error messages.
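The capture mechanism itself is straightforward to sketch: run the child script in a controlled process and collect its streams and exit code. This is a simplified illustration of the pattern, assuming nothing about any particular product's runtime.

```python
import subprocess
import sys

# Run a child script so that stdout, stderr, and the exit code can
# all be captured and returned to a central console for review.
result = subprocess.run(
    [sys.executable, "-c", "print('42 rows processed')"],
    capture_output=True,  # collect stdout and stderr
    text=True,            # decode streams as text
    timeout=60,           # don't let a hung job block the scheduler
)

# result.returncode, result.stdout, and result.stderr now hold
# everything an administrator's console would need to display.
```

A scheduling solution wraps this capture with persistence and alerting, so that output from last night's 2 AM run is still reviewable the next morning.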
Chapter 2's final story relayed the painful situation where a script gets misused. Script misuse, accidental use, or malicious use are all common risks in any data center environment where multiple individuals work together. That's why an effective IT job scheduling solution will include a permissions structure that can lock down jobs, plans, and even variables to specific users and/or uses.
Having a centralized security model significantly reduces the risk that a high‐impact script will be accidentally or maliciously run against data center equipment. It also provides a point of control for change management administrators and auditors to monitor. Data centers that operate under heavy regulation or security controls will greatly benefit from centralizing the permissions structure for script execution into a single solution.
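At its core, such a permissions structure is an access-control list consulted before any job runs. The sketch below shows the shape of that check; the job names and roles are hypothetical, and real solutions extend this to plans and variables as well.

```python
# Each job lists the roles permitted to run it; the scheduler
# refuses execution unless the user holds at least one of them.
JOB_ACL = {
    "rebuild_indexes": {"dba"},
    "send_invoices": {"finance", "admin"},
}

def can_run(job, user_roles):
    """Return True only if the user's roles intersect the job's ACL."""
    return bool(JOB_ACL.get(job, set()) & set(user_roles))
```

Because every execution request passes through this one gate, the same check point doubles as the audit trail that regulators and change managers want to see.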
Security isn't the only mission‐critical requirement in a solution that could potentially make massive changes across hundreds of systems at once. No less important are the needs for change control and revision history of any automations that have been introduced into the system.
You've surely experienced the situation where "something got changed." Whether that change is to a setting on a server or a line in a script, figuring out exactly what got changed—and who changed it—in this scenario is a challenging task that isn't often successful. When changes are made that inappropriately alter data, finding the exact line or character at fault adds even more difficulty.
That's why the IT job scheduling solution you choose should store revisions of scripts and other automations for review. An excellent solution will provide a mechanism for you to analyze the individual changes between revisions, as well as note which user made each change. Figure 4.11 shows an example screen where 10 revisions of a script have been logged. There, each version can be viewed to identify "what got changed."
Figure 4.11: Revision history.
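Answering "what got changed" between two stored revisions is a line-by-line comparison. The sketch below uses Python's standard difflib to surface the changed line; the script contents are invented for illustration, and a real solution attaches user and timestamp metadata to each revision automatically.

```python
import difflib

# Two stored revisions of the same (hypothetical) script.
rev_9 = ["copy /data/in.csv /staging\n", "run transform.sql\n"]
rev_10 = ["copy /data/in.csv /archive\n", "run transform.sql\n"]

# A unified diff pinpoints the exact line that differs between them.
diff = list(difflib.unified_diff(rev_9, rev_10, "rev9", "rev10"))
```

With the diff in hand, the troubleshooting question shifts from "did something change?" to "should this change have been made?", which is the question auditors actually care about.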
The final requirement here ties each of the last few into a centralized database for auditing, monitoring, and alerting purposes. It has been said repeatedly in this chapter that (in addition to enhancing job scheduling itself) a primary reason for implementing an IT job scheduling solution is for centralization of job execution. By default, this centralization automatically creates a single location where all actions to your business systems can be logged and monitored.
Administrator and even user alerting represent useful additions to the feature set of such a solution. Remember that any IT job scheduling solution sits in the center of your business service, orchestrating the communication and processing of data between disparate components. From this location, it is uniquely positioned to watch for and alert on behaviors in data. Those behaviors can be things of interest to administrators; or, more often, they are of interest to the users themselves. Creating alerts across all the usual approaches such as email, text messaging, and instant messaging, and even more‐modern techniques such as social media outlets, provides a way to notify users when conditions of interest occur. Figure 4.12 shows a simple example of an email alert that can be initiated based on either a trigger or other preconfigured condition.
Figure 4.12: An example alert.
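An email alert like Figure 4.12's is just a message assembled from job results and handed to a mail server. The sketch below builds such a message with Python's standard email library; the addresses, server name, and job details are hypothetical, and the actual send is shown commented so the example stays self-contained.

```python
from email.message import EmailMessage

# Assemble a job-completion alert from (hypothetical) run results.
msg = EmailMessage()
msg["From"] = "scheduler@example.com"
msg["To"] = "ops@example.com"
msg["Subject"] = "JobB completed: 325 records processed"
msg.set_content("Plan NightlyETL finished at 02:14 with no errors.")

# A real deployment would then hand the message to its mail server:
# import smtplib
# with smtplib.SMTP("mail.example.com") as s:
#     s.send_message(msg)
```

The point is less the mechanics than the vantage point: because the scheduler already sees every job's outcome, building the alert requires no extra instrumentation in the scripts themselves.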
So, there you have it—20 high‐level requirements for quantifying the types of capabilities you need out of an IT job scheduling solution. These 20 requirements highlight the most critical pieces that any distributed business system and its administrators will need to improve job execution performance while maintaining consistency of workflows.
And, at the same time, that's my story. The Project That Would Change Everything was eventually implemented successfully. It took time to create the necessary automations, I'll admit. But the workflow assistance gained through the use of a centralized system ensured that all our changes were logged, monitored, and carefully categorized. In the end, given the same project and scope of work, I'd do it again in the very same way.