Microsoft System Center 2012 - Operations Manager

The Planning and Design Series Approach

This guide is one in a series of planning and design guides that clarify and streamline the planning and design process for Microsoft® infrastructure technologies.

Each guide in the series addresses a unique infrastructure technology or scenario. These guides include the following topics:

  • Defining the technical decision flow (flow chart) through the planning process.
  • Describing the decisions to be made and the commonly available options to consider in making the decisions.
  • Relating the decisions and options to the business in terms of cost, complexity, and other characteristics.
  • Framing the decision in terms of additional questions to the business to ensure a comprehensive understanding of the appropriate business landscape.

The guides in this series are intended to complement and augment the product documentation.

Benefits of Using This Guide

Using this guide will help an organization to plan the best architecture for the business and to deliver the most cost-effective Microsoft System Center 2012 - Operations Manager technology.

Benefits for Business Stakeholders/Decision Makers:

  • Most cost-effective design solution for an implementation. Infrastructure Planning and Design (IPD) eliminates over-architecting and overspending by precisely matching the technology solution to the business needs.
  • Alignment between the business and IT from the beginning of the design process to the end.

Benefits for Infrastructure Stakeholders/Decision Makers:

  • Authoritative guidance. Microsoft is the best source for guidance about the design of Microsoft products.
  • Business validation questions to ensure the solution meets the requirements of both business and infrastructure stakeholders.
  • High-integrity design criteria that include product limitations.
  • Fault-tolerant infrastructure, where necessary.
  • Proportionate system and network availability to meet business requirements.
  • Infrastructure that is sized appropriately to meet business requirements.

Benefits for Consultants or Partners:

  • Rapid readiness for consulting engagements.
  • Planning and design template to standardize design and peer reviews.
  • A "leave-behind" for pre- and post-sales visits to customer sites.
  • General classroom instruction/preparation.

Benefits for the Entire Organization:

Using this guide should result in a design that will be sized, configured, and appropriately placed to deliver a solution for achieving stated business requirements, while considering the performance, capacity, manageability, and fault tolerance of the system.

Introduction to the Microsoft System Center 2012 - Operations Manager Guide

Operations Manager is a component of Microsoft System Center 2012 that helps the organization monitor services, devices, and operations for multiple computers from a single console. This guide leads the reader through the process of planning the Operations Manager infrastructure by addressing the following fundamental decisions and tasks:

  • Identifying which services, applications, and infrastructure need to be monitored.
  • Determining the resources needed to employ Operations Manager to monitor the selected resources.
  • Designing the components, layout, security, and connectivity of the Operations Manager infrastructure.

Figure 1. System Center 2012 capabilities and components

Business objectives should be prioritized at the start of the project so that they are clearly understood and agreed on by IT and business managers. Certain features require additional licensing or infrastructure costs; before adding those features, planners should inform the business of the extra costs involved.

What's New in System Center 2012 - Operations Manager

This guide has been adapted from the Infrastructure Planning and Design Guide for Microsoft System Center Operations Manager 2007. For a listing of what is new in System Center 2012 - Operations Manager, see http://technet.microsoft.com/en-us/library/hh551139.aspx. Another useful resource is the System Center: Operations Manager Engineering Blog article, "Topology changes in System Center 2012 Operations Manager (Overview)" at http://blogs.technet.com/b/momteam/archive/2011/08/22/topology-changes-in-system-center-2012-operations-manager-overview.aspx.

The most relevant infrastructure changes include:

  • RMS removal and the new RMS emulator. The single largest change impacting design and planning is the removal of the root management server (RMS). All management servers are now peers, and each hosts the services previously hosted only by the RMS, so the RMS is no longer a single point of failure. If one management server becomes unavailable, its responsibilities are automatically redistributed. An RMS emulator role provides backward compatibility for management packs that target the RMS. If the organization does not have any management packs that previously targeted the RMS, the RMS emulator will not be required.
  • Data warehouse. The data warehouse is now required.
  • Resource pools. A resource pool is a collection of management servers, or gateway servers, used to distribute work among themselves and take over work from a failed member. All management servers are members of the All Management Servers resource pool, which balances the monitoring load of the management group as new management servers are added, and provides automatic failover for management server workloads such as dependency monitoring, network monitoring, and cross-platform monitoring.
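The failover behavior described for resource pools can be pictured with a small toy model. This is only an illustrative sketch: the `ResourcePool` class, the server names, and the round-robin assignment are assumptions made for the example and do not describe how Operations Manager actually distributes work among pool members.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourcePool:
    """Toy model of a pool whose members share a set of workloads."""
    members: List[str]
    workloads: List[str]
    assignments: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.rebalance()

    def rebalance(self) -> None:
        """Spread workloads round-robin across the current members (assumed policy)."""
        if not self.members:
            raise RuntimeError("no members left to run the workloads")
        self.assignments = {member: [] for member in self.members}
        for index, workload in enumerate(self.workloads):
            owner = self.members[index % len(self.members)]
            self.assignments[owner].append(workload)

    def fail(self, member: str) -> None:
        """Remove a failed member and hand its work to the surviving members."""
        self.members.remove(member)
        self.rebalance()


# Hypothetical management servers and the workloads named in this guide.
pool = ResourcePool(
    members=["ms01", "ms02", "ms03"],
    workloads=["dependency monitoring", "network monitoring",
               "cross-platform monitoring"],
)
pool.fail("ms02")  # the remaining members now cover all three workloads
```

After `pool.fail("ms02")`, every workload is still assigned to one of the surviving members, which is the property the resource pool is meant to provide.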

The most current information on Windows Server® 2012 support with System Center 2012 - Operations Manager at the time of this guide's publishing is available at http://blogs.technet.com/b/momteam/archive/2012/09/05/windows-server-2012-system-center-operations-manager-support.aspx.

Assumptions

To limit the scope of material in this guide, the following assumptions have been made:

  • This design is for use in a production environment. It is expected that a test environment will also be created to mirror the configuration of the production environment.
  • The reader is familiar with Microsoft infrastructure solutions. This guide does not attempt to educate the reader on the features and capabilities of Microsoft products. The product documentation covers that information.
  • The supporting infrastructure requirements for Active Directory® Domain Services (AD DS), Domain Name System (DNS), and domain space naming described in "Supporting Infrastructure" at http://technet.microsoft.com/en-us/library/hh487285.aspx have been reviewed and met.

Operations Manager Design Process

The steps that follow represent the most critical design elements in a well-planned Operations Manager design:

  • Step 1: Define the Project Scope and Requirements
  • Step 2: Determine the Number of Management Groups
  • Step 3: Design the Operations Manager Management Server Infrastructure
  • Step 4: Design the Operational Database
  • Step 5: Design the Data Warehouse and Reporting Server
  • Step 6: Design the ACS Database Server
  • Step 7: Design the Notification System
  • Step 8: Design the Network Connections

Some of these items represent decisions that must be made. Where this is the case, a corresponding list of common response options is presented.

Other items in this list represent tasks that must be carried out. These items are addressed because they are necessary to complete the infrastructure design.

Figure 2 provides a graphic overview of the steps involved in designing an Operations Manager infrastructure.

Figure 2. The Operations Manager infrastructure decision flow

Installing Operations Manager creates a management group. The management group is the basic functional unit of an Operations Manager infrastructure that can perform monitoring. Each management group contains a Microsoft SQL Server® database server to host the operational database, one or more management servers, one or more consoles, and the agents and other resources that are managed. It can also contain additional management servers and gateway servers, as well as Audit Collection Services (ACS) components. At a minimum, a management group consists of a management server, the operational database, and the data warehouse database.

  • The management server is the focal point for administering the management group and communicating with the database. When the Operations console is opened and connected to a management group, the console connects to a management server for that management group. Depending on the size of the computing environment, a management group can contain a single management server or multiple management servers.
  • The operational database is a SQL Server database that contains all configuration data for the management group and stores all monitoring data collected and processed for the management group. The operational database retains short-term data, by default seven days.
  • The data warehouse database is a SQL Server database that stores monitoring and alerting data for historical purposes. Data written to the operational database is also written to the data warehouse database, so reports always contain current data. The data warehouse database retains long-term data.
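As a planning aid, the minimum composition of a management group described above could be captured in a small record that reports which required components are still missing from a draft design. This is a sketch only: the `ManagementGroup` class, its field names, and the server names are hypothetical and are not part of any Operations Manager API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ManagementGroup:
    """Hypothetical planning record for one management group."""
    name: str
    management_servers: List[str] = field(default_factory=list)
    operational_db: Optional[str] = None      # SQL Server hosting the operational database
    data_warehouse_db: Optional[str] = None   # SQL Server hosting the data warehouse database
    gateway_servers: List[str] = field(default_factory=list)  # optional component
    reporting_server: Optional[str] = None                    # optional component

    def missing_components(self) -> List[str]:
        """List the minimum components that the draft design does not yet include."""
        missing = []
        if not self.management_servers:
            missing.append("management server")
        if self.operational_db is None:
            missing.append("operational database")
        if self.data_warehouse_db is None:
            missing.append("data warehouse database")
        return missing


draft = ManagementGroup(name="MG-PROD",
                        management_servers=["ms01"],
                        operational_db="sql01")
print(draft.missing_components())  # ['data warehouse database']
```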

Two examples of architecture designs are shown in Figure 3.

Figure 3. Example Operations Manager architecture

When Operations Manager Reporting functionality is installed, the management group also contains a reporting server that builds and presents reports from data in the data warehouse database.

The role of the management server is to administer the management group configuration, administer and communicate with agents, and communicate with the databases in the management group. The management group can contain multiple management servers to provide additional capacity and continuous availability. When two or more management servers are added to a management group, the management servers become part of a resource pool and work is spread across the members of the pool.

The workflows that the System Center Management service runs are defined by management packs. Management packs define the information that the agent collects and returns to the management server for a specific application or technology.

A specialized type of management server is the gateway server. A gateway server enables the monitoring of computers in untrusted domains.

An Operations Manager agent is a service installed on a computer. The agent collects data, compares sampled data to predefined values, creates alerts, and runs responses. A management server receives and distributes configurations to agents on monitored computers. Every agent reports to a management server in the management group. This management server is referred to as the agent's primary management server.

The Operations Manager agent sends alert and discovery data to the primary management server, which writes the data to the operational database. The agent also sends events, performance, and state data to the primary management server for that agent, which writes the data to the operational and data warehouse databases simultaneously.
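The data flow in the preceding paragraph can be summarized as a lookup from data type to destination database. The names below (`DESTINATIONS`, `destinations_for`) are illustrative assumptions for this sketch, which mirrors only what the paragraph states.

```python
from typing import Dict, Tuple

OPERATIONAL = "operational database"
WAREHOUSE = "data warehouse database"

# Where the primary management server writes each kind of agent data,
# following the description in the paragraph above.
DESTINATIONS: Dict[str, Tuple[str, ...]] = {
    "alert": (OPERATIONAL,),
    "discovery": (OPERATIONAL,),
    "event": (OPERATIONAL, WAREHOUSE),
    "performance": (OPERATIONAL, WAREHOUSE),
    "state": (OPERATIONAL, WAREHOUSE),
}


def destinations_for(data_type: str) -> Tuple[str, ...]:
    """Return the databases that receive the given type of agent data."""
    try:
        return DESTINATIONS[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None
```

A lookup of this kind makes it easy to see which data types contribute to the load on both databases when the operational database and data warehouse are sized in later steps.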

Figure 4 provides a more complex architecture diagram.

Figure 4. Example Operations Manager architecture with fault tolerance

Applicable Scenarios

This guide addresses the planning and design decisions involved in creating a successful Operations Manager infrastructure. It has been written to address the needs of the following groups:

  • Organizations with no monitoring solution that are planning to monitor services, applications, and infrastructure with Operations Manager
  • Organizations now using another monitoring solution that are planning to move to Operations Manager
  • Organizations consolidating multiple monitoring solutions to Operations Manager
  • Organizations with multiforest environments where Operations Manager will be used to monitor and manage resources that span Active Directory Domain Services forest boundaries
  • Organizations that have distributed environments with systems separated by wide area network (WAN) links
  • Organizations with services in perimeter networks separated by firewalls
  • Organizations interested in implementing centralized security event log collection and reporting to meet internal audit or regulatory compliance requirements
  • Organizations upgrading from Microsoft System Center Operations Manager 2007 to System Center 2012 - Operations Manager
  • Organizations requiring coexistence with existing management systems

Customers with complex scenarios should consider having their architecture reviewed by Microsoft Services prior to implementation because that organization is best able to comment on the supportability of a particular design.

Out of Scope

This guide does not address the following:

  • Multi-tenancy. Service provider scenarios incorporating Operations Manager functionality.
  • System Center Essentials. This is a separate product designed for midmarket businesses.
  • OEM management packs. Original equipment manufacturer (OEM) management packs have varying resource and security requirements. Necessary resources should be obtained from the OEM vendor offering the management pack.
  • Management pack development. Creation of custom management packs.

Step 1: Define the Project Scope and Requirements

Before designing an Operations Manager infrastructure, the organization needs to determine the objectives for the project and which parts of its environment to include in the design. In this step, the business requirements will be identified. A component map will also be created for services that will need monitoring, the resources needed, any monitoring or administrative processes already in place, plus any reporting that may be needed.

The outputs of this step will be a list of resources in scope for monitoring, coexistence requirements with any existing management systems, and any IT support requirements affecting the Operations Manager infrastructure design. The information collected will be used to identify required management packs, to determine how certain components or business services will be monitored, and to design the Operations Manager infrastructure. The requirements gathered in this step are the main ones that influence the infrastructure design. However, the Operations Manager engineering team has published a more comprehensive inventory that may be useful for administration and operations. See "Inventory of Operations Manager Infrastructure" at http://technet.microsoft.com/en-us/library/hh431838.aspx.

Task 1: Determine the Business Requirements

In this task, the functional requirements for business stakeholders are documented. When the requirements and budget are known, accurate technical decisions can be made on how to best meet solution requirements.

Below is a list of key data-collection tasks and descriptions of how the information will be used in later steps. Document all of the following information in the order listed:

  1. Which business services are in scope for monitoring? Is Operations Manager being implemented to monitor a specific service such as Microsoft Exchange, or will the organization use it as the primary enterprise-monitoring platform?

Using Table A-1 in Appendix A, "Job Aids," document the following:

  • The business services in scope. Record the description of the service and the service owner. Is the service dependent on any applications, subservices, servers, and devices, and if so, what are they?
  • The description of a sample transaction. Record where the service is located, where the clients that use the service are located, and the type of transaction. For instance, if it's an Internet Information Services (IIS)-hosted .NET application, then application performance monitoring (APM) will be needed to monitor from both server- and client-side perspectives to get details about application performance and reliability, and this will impact the scale and load of the operational database.
  • The impact of a failure on the business. Assess this for each service in scope. A suggested categorization is given in Table 1.

Table 1. Impact of a Failure on the Business

Impact of component failure on business service | Component priority
Service outage | Critical
Service operates, but at substantially reduced functionality or capacity | High
Service operates with full functionality, but at reduced capacity | Medium
Service operates with full functionality and capacity | Low

This information makes it possible to identify the dependent applications, servers, and devices, as well as the underlying technologies on which they depend (for example, SQL Server and AD DS) that must be considered for monitoring.

  2. Is long-term data collection required for capacity planning, performance tracking, or trend analysis? The Operations console can provide a limited view of data, but the operational database should be limited in size to maintain performance. Long-term retention should occur in the data warehouse. Determine the length of the retention period; this will be used to determine the size of the data warehouse.
  3. What are the availability requirements for the monitoring infrastructure? Give careful consideration to the availability requirements for each functional area. The organization should understand and rate the significance of the risk of possible data loss or interruption of business, and then the architect will use this information to select an appropriate fault-tolerance approach for the systems involved.
  4. Are there regulatory compliance or internal audit requirements? Obtain answers to the following questions before proceeding:
    • Does the organization have any external or internal requirements for security auditing?
    • If so, has the organization implemented a security auditing solution that satisfies these requirements?

External regulations, such as the Sarbanes-Oxley Act (SOX) or the Health Insurance Portability and Accountability Act (HIPAA), might require implementation of ACS if a security-auditing solution is not currently in place. Likewise, internal security policies mandating a security audit can also create a need for ACS. If security logs must be recorded and stored centrally, record in Table A-1 in Appendix A whether security logs should be retained and how long they must be stored.
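The failure-impact categorization in Table 1 lends itself to a simple lookup when the component map for a business service is being filled in. The sketch below is one possible way to record it; the `Impact` enumeration, the `prioritize` helper, and the component names are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Impact(Enum):
    """Impact of a component failure on the business service (Table 1)."""
    OUTAGE = "service outage"
    REDUCED_FUNCTIONALITY = "substantially reduced functionality or capacity"
    REDUCED_CAPACITY = "full functionality, but reduced capacity"
    NONE = "full functionality and capacity"


# Component priority for each impact level, as given in Table 1.
PRIORITY: Dict[Impact, str] = {
    Impact.OUTAGE: "Critical",
    Impact.REDUCED_FUNCTIONALITY: "High",
    Impact.REDUCED_CAPACITY: "Medium",
    Impact.NONE: "Low",
}

PRIORITY_ORDER = ["Critical", "High", "Medium", "Low"]


def prioritize(components: Dict[str, Impact]) -> List[Tuple[str, str]]:
    """Return (component, priority) pairs, most critical first."""
    ranked = [(name, PRIORITY[impact]) for name, impact in components.items()]
    return sorted(ranked, key=lambda pair: PRIORITY_ORDER.index(pair[1]))


# Hypothetical components of one business service.
service_components = {
    "web front end": Impact.REDUCED_CAPACITY,
    "SQL Server": Impact.OUTAGE,
    "AD DS": Impact.REDUCED_FUNCTIONALITY,
}
ranked_components = prioritize(service_components)
```

Sorting the components by priority gives a quick view of which dependencies most need monitoring coverage for each service recorded in Table A-1.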

Task 2: Determine the Technical Requirements

In this task, the technical requirements are documented.

Before proceeding to the next step, document all of the following information in the order listed in Table A-2 in Appendix A:

  1. Will Microsoft System Center Virtual Machine Manager be used in this environment, with reporting enabled? Reporting data from Operations Manager can be integrated into the Virtual Machine Manager (VMM) console. If this is done, the reporting data from the Operations Manager data warehouse appears under a tab in the VMM console. If this integration will be required in the VMM console, the Operations Manager data warehouse will need to be implemented as well as the Operations Manager virtualization management pack.

If the answer was yes to any of the questions, an infrastructure design for Operations Manager Reporting is required. If the answer was no to all questions, go to the next step.

  2. What management packs and integration packs will be required by the infrastructure? To determine the appropriate packs needed for the design, consider the following options:
    • Native management packs. These are released by Microsoft. Compare the services in scope for monitoring to the list of available management packs in the System Center management pack catalog page at http://pinpoint.microsoft.com/en-US/systemcenter/managementpackcatalog.
    • Third-party management packs. Many devices and applications from Microsoft partners have management packs that can be used with Operations Manager.
    • Custom management packs. For situations where existing management packs aren't available, determine if there are in-house IT resources with the skills to develop a custom management pack, or if the development of a custom management pack could be outsourced to a specialist vendor.
    • Orchestrator runbooks. Microsoft System Center 2012 - Orchestrator provides the ability to create and run automated workflows, called runbooks, made of multiple activities that each perform a distinct function. The ability to synchronize alerts with remote systems is achieved by creating runbooks, using activities that interact with one or more other products. Because runbooks can include sophisticated logic and activities from any number of integration packs, scenarios can be implemented that are difficult to achieve with connectors. Integration packs will be delivered for each System Center component and provide additional activities specific to a particular component.

Capacity Requirements

To effectively implement Operations Manager in an enterprise, it is critical to understand the key performance characteristics to which the service will be subjected. Refer to the information that the organization provided in Task 1 of this step for the administrative boundaries that will be included, and then ask the following questions of the technical personnel in the organization. Record the answers to these questions in Table A-3 in Appendix A.

What is the approximate number of each of the following computers, and where are they located?

  • Agent-monitored computers. An agent is the feature installed on a Windows-based computer that performs management, collects data, compares sampled data to predefined values, creates alerts, and runs responses.
  • Agentless Exception Monitoring (AEM) computers. AEM redirects hardware, operating system, and application crash information to Operations Manager, which can aggregate, view, and report on error reports sent by the Windows Error Reporting service.
  • Agentless-managed computers. An agentless-managed computer is a Windows-based computer discovered by using the Operations console. It is assigned a management server or agent-managed computer to provide remote (proxy) agent functionality for the computers. Agentless-managed computers are managed as if there is an agent installed on them. Not all management packs work in agentless mode.

A computer can be monitored without an agent by using agentless monitoring, AEM, or both. Use agentless monitoring of computers when it is not possible or desirable to install an agent on a computer.

  • Computers running UNIX or Linux. Operations Manager provides monitoring of UNIX- and Linux-based computers similar to monitoring of Windows-based computers. For supported operating systems, see http://technet.microsoft.com/en-us/library/hh205990.aspx#BKMK_RBF_UnixAgent.
  • Network devices. Operations Manager provides the ability to discover and monitor network routers and switches, including the network interfaces and ports on those devices and the virtual LAN (VLAN) that they participate in. Because network monitoring workflows run on management servers (on the SNMP module), and not on agents, a heavy load is placed on the management servers.
  • .NET applications monitored via APM. Because application performance monitoring in Operations Manager is Health service-based, the scale and load placed on the Operations Manager database is an important factor to consider. When many applications are monitored and each one continually generates events (state changes), the Operations Manager Health service can be overloaded if this load is not considered in the design phase. For example, the average application generates 0.3 events per second. With IIS supporting hundreds of applications per host, this can result in 30 or more events per second (100 applications × 0.3 events per second) being raised through the Operations Manager Health service.
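The APM load estimate above is simple multiplication, and it can be repeated for each IIS host recorded in Table A-3. The following is a minimal sketch of that arithmetic; the function names and host names are hypothetical, and the 0.3 events per second default is the average figure quoted in this guide rather than a measured value for any particular application.

```python
from typing import Dict

AVERAGE_EVENTS_PER_APP_PER_SEC = 0.3  # average figure quoted in this guide


def host_event_rate(apps_on_host: int,
                    events_per_app_per_sec: float = AVERAGE_EVENTS_PER_APP_PER_SEC) -> float:
    """Estimated events per second raised through the Health service by one host."""
    return apps_on_host * events_per_app_per_sec


def total_event_rate(apps_per_host: Dict[str, int],
                     events_per_app_per_sec: float = AVERAGE_EVENTS_PER_APP_PER_SEC) -> float:
    """Estimated events per second across all APM-monitored hosts."""
    return sum(host_event_rate(count, events_per_app_per_sec)
               for count in apps_per_host.values())


# Hypothetical inventory: number of monitored .NET applications per IIS host.
inventory = {"iis01": 100, "iis02": 250}
estimated_rate = total_event_rate(inventory)  # roughly 105 events per second
```

Measured event rates from a pilot deployment, where available, are a better input than the average and can be passed in through `events_per_app_per_sec`.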

Step Summary

Aligning technical decisions with business requirements is a critical component of a successful project. Failure to clearly identify the functional requirements of the business and the budgetary resources available to meet them can result in a design that fails to meet requirements within the resource constraints defined for the project.

On completion of this step, a detailed list of the resources in scope for monitoring, coexistence requirements with any existing monitoring systems, and any IT support requirements affecting the Operations Manager infrastructure design has been recorded in Tables A-1, A-2, and A-3 in Appendix A. The recorded items include:

  • Business requirements for the Operations Manager solution
  • Business services in scope for monitoring with Operations Manager
  • Individual components containing the resources that must be monitored
  • Coexistence and interoperability requirements for Operations Manager with existing management systems (monitoring or ticketing systems) if other solutions exist and will continue to be present
  • Need for reporting
  • Required native management packs
  • Third-party management packs for monitoring third-party line-of-business (LOB) applications
  • Orchestrator integration packs required to integrate Operations Manager with existing management systems

This information will be used to identify required management packs, determine how some components or business services will be monitored, and design the Operations Manager infrastructure.

Additional Reading

  • "Connecting to External Systems by Using Operations Manager Connectors": http://msdn2.microsoft.com/en-us/library/bb437511.aspx
  • System Center management pack catalog page: http://pinpoint.microsoft.com/en-US/systemcenter/managementpackcatalog
  • "Operations Manager Management Pack Development Kit": http://msdn.microsoft.com/en-us/library/hh770065.aspx
  • "Inventory of Operations Manager Infrastructure": http://technet.microsoft.com/en-us/library/hh431838.aspx

Step 2: Determine the Number of Management Groups

In the previous step, the business and technical requirements of the organization were established. The goal of Step 2 is to determine the smallest number of management groups necessary to meet the organization's monitoring objectives. Infrastructure and environment data from Step 1 will be compared to the criteria for multiple management groups to determine the need for additional management groups.

The primary considerations in determining the number of management groups required are:

  • The number of resources in scope for monitoring.
  • The location of these resources.
  • The IT groups responsible for monitoring the resources.

This information will be used to help determine the number and sizing of Operations Manager server roles and the distribution of these roles across servers, as well as how many iterations are necessary to complete all management group infrastructure planning activities.

Note If the organization is running both Service Manager and Operations Manager, the management group names for the Service Manager management group, the Service Manager data warehouse management group, and the Operations Manager management group must be unique.

Task 1: Determine the Number of Management Groups

The output of this task is the number of management groups required and the justification and function of each.

Begin this task with a design that includes one production management group. The need for additional groups can be determined via the following factors:

  • Scaling. Recommended scale limits for Operations Manager are published under the "Monitored Item Capacity" sections of "System Requirements for System Center 2012 - Operations Manager" at http://technet.microsoft.com/en-us/library/hh205990.aspx. Operations Manager supports the following number of monitored items per management group:
    • 100,000 AEM computers
    • 60 agentless-managed computers
    • 6,000 agent-managed and UNIX- or Linux-based computers (with 50 consoles open), or 15,000 agent-managed and UNIX- or Linux-based computers (with 25 consoles open)
    • 12,000 monitored URLs per dedicated management group
    • 700 agents performing APM
    • 400 applications being monitored using APM

Refer to the expected capacity requirements gathered in Step 1. If they exceed the limits, additional management groups will be needed.

  • Agents separated from their management server by WAN-speed network links. Slow network links may require separate management groups. Use the data in the "Minimum Network Connectivity Speeds" sections of "System Requirements for System Center 2012 - Operations Manager" at http://technet.microsoft.com/en-us/library/hh205990.aspx to determine whether bandwidth constraints indicate a need for a separate management group at the location.
  • Political, administrative or security requirements within the organization requiring separate management groups. The Operations Manager administrator role maintains full control of all resources in the management group and cannot be limited in the resources it controls. If the organization has multiple autonomous IT support units unwilling or unable to share administrative control of the Operations Manager infrastructure, then additional management groups are required.
  • A view of AD DS topology required across multiple forests. The AD DS management pack does not provide out-of-forest topology monitoring, so unless a cross-forest trust is in place, a separate management group must be designed for each forest in which AD DS topology function is needed.
  • A dedicated management group required for auditing purposes. If regulatory compliance or internal security policies require that administration of security event log data must be separated from management of operational events and alerts, add an ACS management group for each management group identified in this task.
  • Disaster recovery functionality required. If the organization's service level requirements for the Operations Manager infrastructure include disaster recovery functionality, a replica copy of the operational database can be created using SQL Server log shipping to send SQL Server logs to another SQL Server-based computer, and the logs can be applied to a copy of the operational database in the failover location.
  • Consolidated views of connected management groups required in Operations Manager. An additional local management group can be used to provide a centralized management view and to centrally connect to other event and alert management systems.

A centralized management model with large remote locations works best with a management group in each region and a local management group (which provides a consolidated view of alerts and status) in the parent location. In this case, the centralized management group connects through the software development kit (SDK) and functions as an additional console on each of the connected management groups. Note that performance data cannot be viewed from the local management group.

There is no official sizing guidance on how many connected management groups a local management group may connect to.

  • Operations Manager integration with the VMM console. VMM can connect to a single Operations Manager management group, so the VMM console can display reporting and health data within the scope of that management group. This means that if two different VMM hosts are in different Operations Manager management groups, two VMM instances will be required to provide reporting integration in the VMM console. Each VMM instance will then only provide reports for the management group to which it is connected.

This may prompt the planner to design fewer Operations Manager management groups. Alternatively, side-by-side use of the VMM and Operations consoles may be sufficient without their integration.

Record any additional management groups required in Table A-4 in Appendix A.

Step Summary

After completing this step, the following information was recorded in Table A-4 in Appendix A:

  • A list of required management groups
  • The justification and function of each

This information will be used as an aid in decision making about server size, count, and Operations Manager server role distribution. This information also determines the number of iterations needed to complete all infrastructure planning activities required for each management group.

Additional Reading

  • "Deploying System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh278852.aspx
  • "System Requirements for System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh205990.aspx
  • "System Center 2012 - Operations Manager SDK": http://msdn.microsoft.com/en-us/library/hh329086.aspx

Step 3: Design the Operations Manager Management Server Infrastructure

Step 2 established the number of management groups needed. The goal of Step 3 is to determine the infrastructure for the Operations Manager management server. This step must be repeated for each management group.

Determining appropriate size and distribution of server roles is an important element in delivering the required monitoring functionality at a level of performance and fault tolerance expected by the organization. This decision depends on both the number of monitored objects and the fault-tolerance requirements of the organization served by that management group. This step will establish:

  • The appropriate number and distribution of management servers and gateway servers to support agent load.
  • The network topology to determine whether additional gateway servers are needed to optimize bandwidth utilization where poor connectivity exists.
  • Server configurations to meet the fault-tolerance requirements of the organization.
  • Hardware resources for each server to be implemented.

The information gathered in this step will be used to determine the size and placement of the operational database, ACS database, and AEM file share. The design for Operations Manager server roles will also be considered during design of the network connections.

Task 1: Determine the Number of Management Servers Required for Scaling

In this task, the expected agent load and console connections in each management group are used as criteria in determining how many management servers are needed per management group.

A management group can contain multiple management servers to provide additional capacity. When two or more management servers are added to a management group, the management servers become part of a resource pool and some of the workloads are spread across the members of the pool.

To determine whether to add additional management servers, consider the following:

  • Scale limits. Add additional management servers if the management group is expected to exceed these limits:
    • 3,000 agent-monitored Windows-based computers
    • 25,000 AEM computers
    • 2,500 collective client monitored computers
    • 10 agentless-managed computers
    • 500 monitored UNIX- or Linux-based computers
    • 3,000 monitored URLs

For example, use the following formula to calculate the number of UNIX- or Linux-based agents supported in the management server pool (500 per management server, as listed above):

MaxAgents/Pool = (NumServers in Pool – Max Number of Failed Servers) * 500

Or for URL monitoring, use the following formula for calculating the number of URLs supported in the management server pool:

MaxURLs/Pool = (NumServers in Pool – Max Number of Failed Servers) * 3,000

  • Agentless Exception Monitoring. When an application error occurs in a Windows® operating system, Windows Error Reporting can capture the details so that they can be used to diagnose the cause of the problem. When AEM is enabled, those details can be forwarded to a management server and aggregated. They can then be used in centralized error analysis and diagnosis.

Decide whether AEM will be used and, if so, whether it will be implemented on dedicated server hardware.

  • Audit Collection Services. ACS collects security event data from domain controllers, member servers, and client computers. Its use enables central security monitoring and reporting.

If ACS will be used in the management group, decide whether the anticipated load is enough to warrant additional servers, and then add those servers as new rows in Table A-5 in Appendix A. This data will be used to help determine fault-tolerance configurations.

  • Network monitoring. Operations Manager provides the ability to discover and monitor network routers and switches, including the network interfaces and ports on those devices and the VLAN that they participate in. Discovered network devices can be deleted and prevented from being rediscovered the next time discovery runs.

Because network monitoring workflows run on management servers (on the SNMP module) rather than on agents, they place a heavy load on the management servers. Therefore, for better performance it is recommended that dedicated management servers be used in dedicated resource pools for network monitoring. Create a resource pool dedicated to network management, add the dedicated management servers to the newly created resource pool, and then remove those servers from any other resource pool.

Note Windows agents do not report to a resource pool.

Decide the number of management servers required, and then for each server, document the applicable information in Table A-5 in Appendix A.

The server roles and configuration information recommended here will be used to decide whether to implement additional Operations Manager management servers to optimize performance and adjust for load and fault tolerance.
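
The pool formulas in this task can be sketched in a few lines of Python. This is a minimal illustration, not part of the Sizing Helper tool: the function name pool_capacity and the example pool of three servers are hypothetical, and the multipliers are simply the values that appear in the MaxAgents/Pool and MaxURLs/Pool formulas above.

```python
# Multipliers taken from the MaxAgents/Pool and MaxURLs/Pool formulas in this task.
AGENTS_PER_SERVER = 500
URLS_PER_SERVER = 3_000


def pool_capacity(num_servers: int, max_failed: int, per_server_limit: int) -> int:
    """Return (NumServers in Pool - Max Number of Failed Servers) * per-server limit."""
    if num_servers < 1:
        raise ValueError("a pool needs at least one server")
    if not 0 <= max_failed < num_servers:
        raise ValueError("max_failed must be between 0 and num_servers - 1")
    return (num_servers - max_failed) * per_server_limit


if __name__ == "__main__":
    # Hypothetical pool: three management servers, sized to survive one failure.
    print(pool_capacity(3, 1, AGENTS_PER_SERVER))  # 1000
    print(pool_capacity(3, 1, URLS_PER_SERVER))    # 6000
```

Sizing the pool against the number of servers that remain after a failure, rather than against the full pool, keeps the surviving servers within their limits while a failed server is being restored.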

Task 2: Determine Placement of Web Console Role

Operations Manager provides a web console that enables the environment to be administered from a web browser. The console is delivered by an IIS server, which connects to a management server or resource pool.

The Operations Manager engineering team recommends that the Operations Manager web console be on a dedicated computer. If it is decided that the web console will be co-located with a management server, the web console must be installed and configured at the same time the management server is installed.

The Operations Manager engineering team does not have specific guidelines on when to scale out to additional web servers. Because the web console runs on IIS, scale-out and fault tolerance can be achieved by using Network Load Balancing or hardware load balancers if performance is inadequate.

For further design details, see the Infrastructure Planning and Design Guide for Internet Information Services 7.0 and Internet Information Services 7.5 at http://go.microsoft.com/fwlink/?LinkId=157703.

Decide whether the web console feature will be used in the management group. If so, decide whether it will be implemented on one or more dedicated servers or on an existing management server. Update Table A-6 in Appendix A to reflect these decisions. This data will be used to help determine fault-tolerance configurations.

Task 3: Determine the Need for Gateway Servers

The focus of this task is to determine whether gateway servers are required, and if so, how many. This involves examining agent location and network connectivity to determine where the core server infrastructure of each management group should be augmented.

A gateway server is a specialized type of management server. By default, it is not added to the All Management Servers Resource Pool because it exists only to forward data from its connected agents to an upstream management server, which then inserts the data into the operational database. The benefits of a gateway must be balanced with the cost of administrative overhead, bandwidth utilization, hardware, and software.

Gateway servers can be implemented to:

  • Reduce administrative overhead. Agents across trust boundaries require certificate authentication. Determine whether any agent-managed computers are located in a separate workgroup or AD DS forest. If so, a gateway server can be placed within the trust boundary of the agents and can participate in the mandatory mutual authentication. Because the gateway server lies within the same trust boundary as the agents, the Kerberos version 5 protocol for Active Directory is used between the agents and the gateway server. Each agent then communicates only with the gateway servers that it is aware of, and the gateway servers communicate with the management servers.
  • Minimize security concerns. Agents behind a firewall require multiple "allow" rules to permit agent traffic to pass through the firewall, raising potential security concerns. Gateway servers can act as a point of consolidation for agents to minimize the number of points of outbound traffic for environments separated by a firewall.
  • Reduce network bandwidth utilization. Agents located across WAN links consume network bandwidth, potentially affecting service delivery to and from the remote location. A gateway server can consolidate the traffic.

If gateway servers will be required for certain locations, then multiple gateway servers may be required when the following scaling limits are exceeded per dedicated gateway server:

  • 2,000 agent-monitored Windows-based computers
  • 100 monitored UNIX- or Linux-based computers

Use the following formula to calculate the number of UNIX- or Linux-based agents supported in the gateway server pool (100 per gateway server, as listed above):

MaxAgents/Pool = (NumServers in Pool – Max Number of Failed Servers) * 100

Add additional gateway servers to support more than these limits. It is also possible to chain multiple gateway servers to monitor across multiple untrusted boundaries. The requirements for creating a chained gateway are the same as the initial gateway server. See "Using Multiple Gateway Servers" at http://technet.microsoft.com/en-us/library/hh212790.aspx for more information.

Determine whether agent location or network connectivity indicates gateway servers are required, and if so, which management servers they will connect to, and record the answer in Table A-7 in Appendix A.
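
The gateway pool formula in this task can also be inverted to estimate how many gateway servers a location needs. The sketch below is an illustration only: the function name gateway_servers_needed and the example agent count are hypothetical, and the multiplier of 100 is the value used in the gateway MaxAgents/Pool formula above.

```python
import math

# Multiplier from the gateway MaxAgents/Pool formula in this task.
GATEWAY_MULTIPLIER = 100


def gateway_servers_needed(agents: int, max_failed: int = 1,
                           per_server: int = GATEWAY_MULTIPLIER) -> int:
    """Smallest pool size whose capacity still covers `agents` with `max_failed` servers down."""
    if agents < 1:
        raise ValueError("agents must be at least 1")
    if max_failed < 0:
        raise ValueError("max_failed cannot be negative")
    # Servers needed to carry the load, plus spares for the tolerated failures.
    return math.ceil(agents / per_server) + max_failed


if __name__ == "__main__":
    # Hypothetical location with 250 agents that must survive one gateway failure.
    print(gateway_servers_needed(250, max_failed=1))  # 4
```

With four servers and one failure tolerated, the formula gives (4 - 1) * 100 = 300 agents, which covers the 250 agents in the example; three servers would give only 200.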

Task 4: Determine Resource Requirements for the AEM File Share

Agents configured to participate in Agentless Exception Monitoring upload exception data in .cab file format to a management server configured to host the AEM file share. The goal of this task is to determine the storage requirements of management servers hosting the AEM file share.

If it was determined that AEM will be implemented, determine the storage requirements for AEM and document findings in Table A-8 in Appendix A.

Task 5: Apply the Fault-Tolerance Requirements

If fault tolerance was named as a requirement during Step 1, determine the need for fault tolerance in each server role based on the criticality of the resources being monitored. If fault-tolerance options will be deployed, the hardware should be configured so that it will not be overloaded while operating in a fault-tolerant state.

Note Clustering of management servers is not supported in System Center 2012 - Operations Manager.

For regular management servers, add an additional server to the All Management Servers Resource Pool. The pool balances the monitoring load of the management group as new management servers are added and provides automatic failover for management server workloads such as dependency monitoring, network monitoring, and cross-platform monitoring. This minimizes the impact of a management server failure in a distributed environment, although the failure increases the workload on the remaining management servers in the management group until the failed server is restored. For other resource pools that may have been identified for network monitoring or other specific purposes, add additional servers to those pools to achieve redundancy.

Note Windows agents do not report to a resource pool. A Windows agent is configured to report to a single management server, and is automatically configured to fail over to any other management server. The failover server can also be specified via Windows PowerShell script.

For the specialized gateway servers, add a second server and configure the agents to use the second as a failover gateway. Then, configure the gateway servers to communicate with a management server resource pool rather than a specific management server. See "How to Configure a Gateway Server to Failover Between Multiple Management Servers" at http://technet.microsoft.com/en-us/library/hh212904.aspx and "How to Configure Agent Failover to Multiple Gateway Servers" at http://technet.microsoft.com/en-us/library/hh212733.aspx for more information.

If the web console server role needs to be fault tolerant, the applicable fault-tolerance method is Network Load Balancing or hardware load balancers.

The AEM file share can be made fault tolerant by storing the file on a file server in a failover cluster. For more information, see the Infrastructure Planning and Design Guide for Windows Server 2008 and Windows Server 2008 R2 File Services at http://go.microsoft.com/fwlink/?LinkId=160976.

For more information on failover behavior and limitations, see "How does the failover process work in OpsMgr 2012? (#SCOM, #SYSCTR)" at http://blogs.catapultsystems.com/cfuller/archive/2012/06/05/how-does-the-failover-process-work-in-opsmgr-2012-scom-sysctr.aspx.

Document any additional servers required for fault tolerance in Table A-9 in Appendix A. The output of this task is a list of the infrastructure necessary to meet the fault-tolerance requirements of the organization.

Task 6: Determine the Hardware Configuration

In this task, the hardware configurations for the Operations Manager components are determined. Operations Manager supports a variety of deployment topologies. Each of the main Operations Manager components can be installed either separately or in some combination, as described below.

The Operations Manager server roles can be run in a virtualized environment or in a physical server environment.

  • If a virtual machine will be used, ensure that it has access to CPU and memory resources equivalent to those specified for a physical machine.
  • If it was determined that a server will host multiple Operations Manager components, it should be sized to support the sum of the peak workloads.

For more information, see the "Operations Manager Virtualization Support" sections in "System Requirements for System Center 2012 - Operations Manager" at http://technet.microsoft.com/en-us/library/hh205990.aspx#BKMK_Virtualization.

The minimum configuration in the System Center 2012 - Operations Manager Sizing Helper tool has two management servers (the second provides fault tolerance) managing up to 500 agents with 5 console connections, and uses this hardware configuration:

  • 4 disks as RAID 10
  • 8 gigabytes (GB) of RAM
  • 4 processor cores

Use the Operations Manager Sizing Helper tool, available for download at http://go.microsoft.com/fwlink/?LinkId=231853, to determine hardware requirements for each Operations Manager server feature. If multiple features will be installed on the same computer, use the higher of the recommended hardware requirements for any of the combined features.
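
The rule of taking the higher of the recommended requirements for combined features can be sketched as follows. This is a minimal illustration under stated assumptions: the HardwareSpec type, the combine_highest function, and the second feature's figures are hypothetical, while the first specification repeats the minimum management server configuration listed in this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareSpec:
    disks: int
    ram_gb: int
    cores: int


def combine_highest(*specs: HardwareSpec) -> HardwareSpec:
    """Take the highest recommendation for each resource across the combined features."""
    if not specs:
        raise ValueError("at least one specification is required")
    return HardwareSpec(
        disks=max(s.disks for s in specs),
        ram_gb=max(s.ram_gb for s in specs),
        cores=max(s.cores for s in specs),
    )


if __name__ == "__main__":
    management_server = HardwareSpec(disks=4, ram_gb=8, cores=4)  # from this task
    other_feature = HardwareSpec(disks=2, ram_gb=16, cores=4)     # hypothetical figures
    print(combine_highest(management_server, other_feature))
    # HardwareSpec(disks=4, ram_gb=16, cores=4)
```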

Capacity shortage in the AEM file share will result in the inability of AEM to collect events. Ensure that adequate disk space is available for the AEM file share.

Record the answers in Table A-10 in Appendix A.

Step Summary

After completing this step, the following information was recorded in Tables A-5 through A-10 in Appendix A:

  • Any necessity (via agent location and/or network connectivity) for gateway servers
  • The implementation and placement of the web console feature
  • The number of management servers needed, plus their resource requirements, fault-tolerance configurations, and hardware requirements

Repeat this step for each management group needed.

This information will be used during operational and ACS database sizing and placement. The design for Operations Manager server roles will also be considered during the design of network connections.

Additional Reading

  • "Deploying System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh278852.aspx
  • "System Requirements for System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh205990.aspx
  • System Center 2012 - Operations Manager Sizing Helper Tool v1: www.microsoft.com/download/en/details.aspx?displaylang=en&id=29270
  • "System Center 2012 - Operations Manager SDK": http://msdn.microsoft.com/en-us/library/hh329086.aspx
  • Infrastructure Planning and Design Guide for Windows Server 2008 and Windows Server 2008 R2 Active Directory Domain Services: http://go.microsoft.com/fwlink/?LinkId=157704
  • "Security Considerations": http://technet.microsoft.com/en-us/library/hh487288.aspx
  • "System Center: Operations Manager Engineering Blog - Event, Alerts, Perf Data Flow in OpsMgr 2007": http://blogs.technet.com/b/momteam/archive/2007/10/30/event-alerts-perf-data-flow-in-opsmgr-2007.aspx
  • "How and when is data written (or synchronized) to the Data Warehouse?": http://blogs.technet.com/b/jonathanalmquist/archive/2010/01/24/how-and-when-is-data-written-or-synchronized-to-the-data-warehouse.aspx
  • Infrastructure Planning and Design Guide for Internet Information Services 7.0 and Internet Information Services 7.5: http://go.microsoft.com/fwlink/?LinkId=157703
  • Infrastructure Planning and Design Guide for Windows Server 2008 and Windows Server 2008 R2 File Services: http://go.microsoft.com/fwlink/?LinkId=160976
  • "Using Multiple Gateway Servers": http://technet.microsoft.com/en-us/library/hh212790.aspx
  • "How to Configure a Gateway Server to Failover Between Multiple Management Servers": http://technet.microsoft.com/en-us/library/hh212904.aspx
  • "How to Configure Agent Failover to Multiple Gateway Servers": http://technet.microsoft.com/en-us/library/hh212733.aspx
  • "How does the failover process work in OpsMgr 2012? (#SCOM, #SYSCTR)": http://blogs.catapultsystems.com/cfuller/archive/2012/06/05/how-does-the-failover-process-work-in-opsmgr-2012-scom-sysctr.aspx

Step 4: Design the Operational Database

Step 3 saw the design of the Operations Manager management server infrastructure. The goal of Step 4 is to create infrastructure designs for the operational database. The outputs of this step are infrastructure design hardware specifications and fault-tolerance configurations for the operational database. This information will be used to design the network connections for the Operations Manager infrastructure.

This step is important because operational database design has a direct bearing on console performance, and an inadequately sized ACS database infrastructure will result in queuing at the ACS collector, causing delays in insertion of Security Event Log events and denial of connections from ACS forwarders. This step should be repeated for each management group identified earlier.

Note SQL Server 2008 is available in both Standard and Enterprise editions. Operations Manager will function with both editions.

Note Operations Manager does not support hosting its databases or SQL Server Reporting Services on a 32-bit edition of SQL Server.

Note Using a different version of SQL Server for different Operations Manager features is not supported. The same version should be used for all features.

Task 1: Determine Resource Requirements for Operational Database Server

The operational database contains the configuration for the management group as well as all the recent operational data (event, alert, performance, and state data) collected from agent computers. Performance of the operational database role is one of the primary determinants in the performance of the Operations console. The Operations Manager engineering team recommends periodic grooming of the database to maintain acceptable performance levels. For more information, see "How to Configure Grooming Settings for the Operations Manager Database" at http://technet.microsoft.com/en-us/library/hh230753.aspx.

The operational database size and load are based on two primary factors:

  • The rate of data collection, which varies by the number of monitored devices and the management packs deployed.
  • The rate of instance space change, which is the rate of change for the data that Operations Manager maintains to describe all the monitored computers, services, and applications in the management group. Updates to this data are expensive (in terms of performance) compared to writing new operational data.

The System Center 2012 - Operations Manager Sizing Helper tool, available for download at www.microsoft.com/download/en/details.aspx?displaylang=en&id=29270, can estimate the size of the database based on these inputs:

  • Number of days for data retention
  • Number of server computers
  • Number of network devices
  • Number of APM-enabled computers

Additionally, the tool provides estimates of the IOPS (input/output operations per second) for the operational database.

Table 2. DB Estimated Random IO per Second for Maximum Load Configuration (Assuming 80% Write, 20% Read)

Number of agents    Estimated IOPS
1-500               250
501-1,000           500
1,001-3,000         750
3,001-6,000         1,125
6,001-10,000        1,250
10,001-15,000       1,500
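
For planning scripts, the bands in Table 2 can be encoded as a simple lookup. The sketch below is only an illustration: the function name operational_db_iops is hypothetical, and the figures are copied from Table 2 rather than produced by the Sizing Helper tool.

```python
from bisect import bisect_left

# Upper bound of each agent band in Table 2 and the estimated IOPS for that band.
_BAND_UPPER = [500, 1_000, 3_000, 6_000, 10_000, 15_000]
_BAND_IOPS = [250, 500, 750, 1_125, 1_250, 1_500]


def operational_db_iops(agents: int) -> int:
    """Return the Table 2 IOPS estimate for the band that contains `agents`."""
    if not 1 <= agents <= _BAND_UPPER[-1]:
        raise ValueError("Table 2 covers 1 to 15,000 agents")
    # bisect_left finds the first band whose upper bound is >= agents.
    return _BAND_IOPS[bisect_left(_BAND_UPPER, agents)]


if __name__ == "__main__":
    print(operational_db_iops(500))    # 250
    print(operational_db_iops(501))    # 500
    print(operational_db_iops(4_200))  # 1125
```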

Because application performance monitoring (APM) in Operations Manager is Health service-based, the scale of APM and the load that it places on the operational database are important factors to consider.

Document the resource requirements for the operational database server in Table A-11 in Appendix A.

Task 2: Apply the Fault-Tolerance Requirements

The operational database contains the configuration of the management group and all operational data used to populate the Operations console. Because there is only one operational database in a management group, it must be available for the management group to function. Fault-tolerant configurations can be used to provide service and data redundancy to eliminate the operational database as a single point of failure.

Fault-tolerance options for this database are:

  • Clustering. The cluster can sustain the failure of one server and fail over to the remaining server without user intervention, resulting in only a brief interruption. Only active-passive cluster configurations are supported. A server cluster does not provide data redundancy, because only one instance of the Operations Manager database is present on the server.
  • SQL Server log shipping. Log shipping provides data redundancy. It is the process of automating the backup of database and transaction log files on a production SQL Server computer, and then restoring them onto a standby server.

While the Operations Manager database can be recovered, the process is not completely automatic. Management server settings must be updated and services restarted on each management server to redirect them to the standby copy of the Operations Manager database after it is online. This could be scripted, but would involve some effort.

Record the information in Table A-11 of Appendix A.

Task 3: Determine the Hardware Configuration

The goal of this task is to determine the most appropriate type of hardware on which to deploy the operational database servers.

The bare minimum configuration in the Sizing Helper tool has the operational database, data warehouse, web console server, and SQL Server Reporting Services server co-located, on this configuration:

  • 8 disks as RAID 10 (Data) (300 GB)
  • 2 disks as RAID 1 (Log)
  • 16 GB of RAM
  • 4 processor cores

Microsoft supports running all Operations Manager server features in any physical or virtual environment that meets the minimum requirements.

Use the Operations Manager Sizing Helper tool at http://go.microsoft.com/fwlink/?LinkId=231853 to determine hardware requirements for each Operations Manager server feature. If multiple features will be installed on the same computer, use the higher of the recommended hardware requirements for any of the combined features.

Document the selected hardware configuration for servers in Table A-11 in Appendix A.

Step Summary

After completing this step, the following design information was recorded in Table A-11 in Appendix A:

  • An infrastructure design, fault-tolerance requirements, and hardware configuration for the operational database

This information will be used during design of the network connections.

Additional Reading

  • "Deploying System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh278852.aspx
  • "System Requirements for System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh205990.aspx
  • "Clarification on SQL Server Collation Requirements for System Center 2012": http://blogs.technet.com/b/servicemanager/archive/2012/05/24/clarification-on-sql-server-collation-requirements-for-system-center-2012.aspx
  • "How to Configure Grooming Settings for the Operations Manager Database": http://technet.microsoft.com/en-us/library/hh230753.aspx

Step 5: Design the Data Warehouse and Reporting Server

Step 4 established the design of the operational database. In Step 5, data collected in Step 1 is used to design the data warehouse and the Operations Manager reporting server.

Operations Manager Reporting uses the following databases:

  • Operations Manager data warehouse (data warehouse database)
  • SQL Server Reporting Services databases (ReportServer and ReportServerTempDB)

Reporting stores monitoring and alerting data, aggregating the performance data on an hourly and daily basis. It enables reporting of long-term trends that the operational database cannot deliver because the operational database quickly fills with records from individual events, and so must be regularly groomed. The data warehouse used for reporting can also receive data from multiple management groups, which enables an aggregated view across resources in different management groups.

Failure to plan for reporting can result in a significant failure on the part of IT to meet the needs of the organization, as well as its own needs in troubleshooting and forecasting activities.

This step will determine:

  • Whether the data warehouse will be used to consolidate reporting data across management groups.
  • Projected database size based on the chosen data retention period.
  • Redundancy requirements for Operations Manager Reporting.
  • Server size and role distribution based on database size and the architectural guidance.

The output of this step is the detailed infrastructure design for Operations Manager Reporting, including server hardware specifications, role distribution, and any fault-tolerance configurations. This data will be used in the implementation phase to build the Operations Manager Reporting infrastructure.

Note SQL Server 2008 is available in both Standard and Enterprise editions. Operations Manager will function with both editions.

Note Operations Manager does not support hosting its databases or SQL Server Reporting Services on a 32-bit edition of SQL Server.

Note Using a different version of SQL Server for different Operations Manager features is not supported. The same version should be used for all features.

Note Side-by-side installation of System Center Operations Manager 2007 R2 Reporting and System Center 2012 - Operations Manager Reporting on the same server is not supported.

Task 1: Determine the Data Consolidation Strategy Across Management Groups

The data warehouse can be used to store data from different management groups; it can then provide reports on resources across those management groups.

Refer to the reporting requirements established in Step 1 and to the management group design generated in Step 2 to determine whether reporting is required across management groups and, if so, across which groups. Use this information to create a row for each data warehouse that will be required in Table A-12 in Appendix A.

Proceed through the remaining tasks for each data warehouse instance.

Task 2: Determine Data Retention Requirements

The goal of this task is to determine how long reporting data needs to be kept.

To identify the appropriate retention period, determine how far into the past (weeks or months) data is of interest to business units, as well as to IT. There may be regulatory requirements that dictate how long data must be stored; these must be determined by consulting with departments responsible for regulatory compliance.

The output of this task is the required retention period for data housed in the reporting data warehouse. This will be used as input to design the data warehouse in the next task.

Record the information in Table A-12 in Appendix A.
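
One way to reconcile the different stakeholder answers gathered in this task is to keep the longest period requested and note which requirement drives it. The sketch below is illustrative only: the function name retention_requirement and the example periods are hypothetical.

```python
def retention_requirement(requirements: dict[str, int]) -> tuple[str, int]:
    """Return the stakeholder whose need is longest, with that period in days."""
    if not requirements:
        raise ValueError("at least one retention requirement is needed")
    if any(days <= 0 for days in requirements.values()):
        raise ValueError("retention periods must be positive")
    driver = max(requirements, key=requirements.get)
    return driver, requirements[driver]


if __name__ == "__main__":
    # Hypothetical answers collected from business units, IT, and compliance.
    answers = {"business units": 180, "IT": 90, "regulatory compliance": 365}
    print(retention_requirement(answers))  # ('regulatory compliance', 365)
```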

Task 3: Determine Resource Requirements

It is possible to estimate the size of the data warehouse based on the data retention requirements documented in Task 2 and the number of devices in scope for monitoring. The System Center 2012 - Operations Manager Sizing Helper tool at www.microsoft.com/download/en/details.aspx?displaylang=en&id=29270 can estimate the size of the database based on these inputs:

  • Number of days for data retention
  • Number of server computers
  • Number of network devices
  • Number of APM-enabled computers

Additionally, the tool provides estimates of the IOPS for the data warehouse.

Table 3. Data Warehouse Estimated Random IO per Second for Maximum Load Configuration (Assuming 80% Write, 20% Read)

Number of agents    Estimated IOPS
1-500               500
501-1,000           875
1,001-3,000         1,000
3,001-6,000         1,500
6,001-10,000        2,000
10,001-15,000       2,500

The Operations Manager engineering team recommends periodic grooming of the database. For more information, see "How to Configure Grooming Settings for the Reporting Data Warehouse Database" at http://technet.microsoft.com/en-us/library/hh212806.aspx.

The output of this tool is the estimated amount of storage required to contain the data in the data warehouse. Record the information in Table A-12 in Appendix A. This will be used to design the hardware required.

Task 4: Apply the Fault-Tolerance Requirements

Based on data collected in Step 1, determine the fault-tolerance requirements for reporting. To improve the fault tolerance of Operations Manager Reporting, specify the following:

  • The data warehouse database deployed on a single active-passive cluster, with no other Operations Manager features permitted to be installed on the cluster or nodes of the cluster, is the supported and recommended configuration. Co-locating the data warehouse with other databases is supported but not recommended because of potential performance issues with SQL Server. For more information, see "Supported Cluster Configurations" at http://technet.microsoft.com/en-us/library/jj656649.aspx#BKMK_ClusterConfig.

The output of this task is the fault-tolerance strategy for Operations Manager Reporting. Record the decisions in Table A-12 in Appendix A.

Task 5: Determine Hardware Configuration

The goal of this task is to determine the most appropriate type of hardware on which to deploy the following servers:

  • The SQL Server-based server used to host the data warehouse
  • The SQL Server Reporting Services-based server used to create and deliver reports from the data warehouse

The Sizing Helper tool referenced previously can give an estimate of the RAID disks needed to provide the appropriate capacity and performance for the data warehouse database based on the estimated IOPS and estimated disk space from Task 3. However, this is only a recommendation.
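
To show how an IOPS estimate and a disk space estimate translate into a RAID disk count, the following sketch works through the arithmetic for RAID 10. It is a rough illustration under stated assumptions and does not reproduce the Sizing Helper tool's method: the function names, the per-disk IOPS and capacity figures, and the 500 GB space estimate are hypothetical. The 80% write share comes from Table 3, and the write penalty of 2 reflects that RAID 10 writes each block to both disks of a mirror.

```python
RAID10_WRITE_PENALTY = 2  # each logical write lands on both disks of a mirror
RAID10_MIN_DISKS = 4


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def raid10_disks_needed(iops: int, capacity_gb: int, disk_iops: int,
                        disk_gb: int, write_pct: int = 80) -> int:
    """Estimate the RAID 10 disk count that satisfies both IOPS and capacity."""
    if min(iops, capacity_gb, disk_iops, disk_gb) <= 0 or not 0 <= write_pct <= 100:
        raise ValueError("inputs must be positive and write_pct between 0 and 100")
    # Back-end IOPS (scaled by 100 to stay in integers): reads + writes * penalty.
    backend_x100 = iops * ((100 - write_pct) + write_pct * RAID10_WRITE_PENALTY)
    disks_for_iops = ceil_div(backend_x100, disk_iops * 100)
    # Mirroring halves usable space, so raw capacity must be twice the data size.
    disks_for_capacity = ceil_div(capacity_gb * 2, disk_gb)
    disks = max(RAID10_MIN_DISKS, disks_for_iops, disks_for_capacity)
    return disks + (disks % 2)  # RAID 10 uses disks in mirrored pairs


if __name__ == "__main__":
    # 1,000 IOPS is the Table 3 estimate for 1,001-3,000 agents; the other
    # figures (500 GB of data, 150 IOPS and 300 GB per disk) are assumptions.
    print(raid10_disks_needed(iops=1_000, capacity_gb=500, disk_iops=150, disk_gb=300))  # 12
```

In this example the IOPS requirement (12 disks) dominates the capacity requirement (4 disks), which is typical when the write share is high.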

Use the requirements determined in Step 1 to decide how many reporting users will be on the system concurrently and whether reports will be run on demand during peak hours or automatically published during off-peak hours. This will provide insight into whether the read load of reporting will run at the same time as the maximum write load into the data warehouse.

Refer to the Infrastructure Planning and Design Guide for Microsoft SQL Server 2008 and SQL Server 2008 R2 at http://go.microsoft.com/fwlink/?LinkId=163302 and the Operations Manager Sizing Helper at http://go.microsoft.com/fwlink/?LinkId=231853. Then select a hardware configuration for the servers running SQL Server for the data warehouse and the SQL Server Reporting Services.

Microsoft supports running all Operations Manager server features in any physical or virtual environment that meets the minimum requirements. For more information, see the "Operations Manager Virtualization Support" sections in "System Requirements for System Center 2012 - Operations Manager" at http://technet.microsoft.com/en-us/library/hh205990.aspx#BKMK_Virtualization.

Document the selected hardware configuration in Table A-12 in Appendix A, and then proceed to the next step.

Step Summary

In this step, data collected in Step 1 was used to estimate growth and size of the reporting data warehouse.

After completing this step, the detailed infrastructure design for Operations Manager reporting was recorded in Table A-12 in Appendix A, including server hardware specifications, role distribution, and any fault-tolerance configurations such as clustered SQL Server-based computers. The recorded design decisions include the following:

  • Determination of projected database size based on database growth estimates and free space requirements for a given retention period
  • Determination of redundancy requirements for Operations Manager reporting

This data will be used in the implementation phase to build the Operations Manager reporting infrastructure.

Additional Reading

  • "Configuring Reporting Services for Scale-Out Deployment": http://msdn.microsoft.com/en-us/library/ms156453.aspx
  • "Planning the System Center 2012 - Operations Manager Deployment": http://technet.microsoft.com/en-us/library/hh473583.aspx
  • "How to Configure Grooming Settings for the Reporting Data Warehouse Database": http://technet.microsoft.com/en-us/library/hh212806.aspx
  • "Supported Cluster Configurations": http://technet.microsoft.com/en-us/library/jj656649.aspx#BKMK_ClusterConfig
  • Infrastructure Planning and Design Guide for Microsoft SQL Server 2008 and SQL Server 2008 R2: http://go.microsoft.com/fwlink/?LinkId=163302
  • "System Requirements for System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh205990.aspx#BKMK_Virtualization

Step 6: Design the ACS Database Server

If it was decided in Step 2 that a separate management group should be implemented for ACS monitoring, then ACS will require dedicated management servers. Step 3 designed the management servers for either non-dedicated or dedicated management groups. Step 6 should be repeated for each ACS database required.

Using ACS, organizations can consolidate individual security logs into a centrally managed database and can filter and analyze events using the data analysis and reporting tools provided by Microsoft SQL Server.

ACS requires the following components:

  • ACS forwarders. The service that runs on ACS forwarders is included in the Operations Manager agent. By default, this service is installed but not enabled when the Operations Manager agent is installed. After enabling this service, all security events will be sent to the ACS collector in addition to the local security log. All computers from which security events should be captured must be ACS forwarders.

The number of ACS forwarders that can be supported by a single ACS collector and ACS database can vary because it depends on the number of events generated by the audit policy, the role of the computers that the ACS forwarders monitor (such as domain controller versus member server), the level of activities on the computer, and the hardware on which the ACS collector and ACS database run.

  • ACS collector. The ACS collector receives and processes events from ACS forwarders and then sends this data to the ACS database. This processing includes disassembling the data so that it can be spread across several tables within the ACS database, minimizing data redundancy, and applying filters so that unnecessary events are not added to the ACS database. The Operations Manager ACS collector service is installed on an Operations Manager management server. Refer to the design decisions made in Steps 2 and 3 as to whether ACS will have dedicated management servers or be co-located with the operational aspects.

Each ACS collector must have its own ACS database.

  • ACS database. The ACS database is the central repository for events generated by an audit policy within an ACS deployment. The ACS database can be located on the same computer as the ACS collector, but for best performance, install the database on a dedicated server.

When an organization has either large numbers of computers with audit requirements, aggressive security audit policies, or both, this database role will be very busy and the database can grow quickly.

Task 1: Determine Scaling

No specific scaling limits are available for when to add more ACS collectors. The number of ACS forwarders that can be supported by a single ACS collector and ACS database can vary, depending on the following factors:

  • The number of events that the audit policy generates
  • The role of the computers that the ACS forwarders monitor (such as domain controller versus member server)
  • The level of activities on the computer
  • The hardware on which the ACS collector and ACS database run

If the environment contains too many ACS forwarders for a single ACS collector, install additional ACS collectors as needed. Each ACS collector must have its own ACS database.
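
As a rough illustration of the scaling reasoning above, the following Python sketch estimates how many ACS collector and ACS database pairs a deployment might need. The function name, the per-forwarder event rates, and especially the per-collector throughput figure are hypothetical inputs. Because no fixed scaling limit is available, the sustainable rate would have to come from testing on the organization's own hardware.

```python
import math

def acs_collector_pairs(forwarder_event_rates, collector_capacity_eps):
    """Estimate the number of ACS collector/database pairs.

    forwarder_event_rates: estimated events per second for each ACS
        forwarder (hypothetical planning input).
    collector_capacity_eps: events per second that one collector and its
        database can sustain. This is an assumption to be measured locally,
        not a documented product limit.
    """
    if collector_capacity_eps <= 0:
        raise ValueError("collector_capacity_eps must be positive")
    total_eps = sum(forwarder_event_rates)
    collectors = max(1, math.ceil(total_eps / collector_capacity_eps))
    # Each ACS collector must have its own ACS database (1:1 ratio).
    return {"collectors": collectors, "databases": collectors, "total_eps": total_eps}

# Illustrative numbers only: 2 busy domain controllers and 200 member servers.
rates = [40.0] * 2 + [2.0] * 200
plan = acs_collector_pairs(rates, collector_capacity_eps=300.0)
print(plan)
```

The result is only a starting point for Table A-13. Audit policy changes or collector filters can change the event rate considerably.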

For more information on the number of ACS forwarders that an ACS collector can support, see "Collecting Security Events Using Audit Collection Services in Operations Manager" at http://technet.microsoft.com/en-us/library/hh212908.aspx.

Record the scaling information in Table A-13 in Appendix A.

Task 2: Determine Resource Requirements for ACS Database

The goal of this task is to determine the resource requirements for the ACS database role. Because the ratio of ACS collector servers to ACS database servers is 1:1, the number of ACS databases required follows from the scaling decisions reached in Task 1.

A process to estimate the requirements for the disk subsystem in terms of storage space, disk performance, and physical disks needed to meet the expected load has been discussed on the Operations Manager Engineering Blog at http://blogs.technet.com/momteam/archive/2008/07/02/audit-collection-acs-database-and-disk-sizing-calculator-for-opsmgr-2007.aspx. Although this was written for an older version of Operations Manager, the principles apply to the 2012 version of Operations Manager.

This is based on the number of events per second generated on the computers on which ACS is enabled, along with the number of days data will be retained. The number of events per second can be estimated using the process in the "Designing Audit Collection Services" section of "Mapping Requirements to a Design for Operations Manager 2007" at http://technet.microsoft.com/en-us/library/bb735402.aspx. The information in this article is still applicable to the 2012 version of Operations Manager.

Additional information regarding audit policy configuration and ACS configuration can be found in "Managing Audit Collection Services in Operations Manager 2007" at http://technet.microsoft.com/en-us/library/cc974475.aspx. Although written for an earlier version of Operations Manager, it is relevant for the 2012 version of Operations Manager as well.

The total size of the database can be calculated by the formula:

[Database size in TB] = [Events per second for all computers] * [0.4 KB, which is the size of an event] * 60 sec * 60 min * 24 hr * [retention period in days] / 1,024 (KB to MB) / 1,024 (MB to GB) / 1,024 (GB to TB)
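
The formula above can be sketched in Python as a quick planning aid. The function names are hypothetical. The 0.4 KB event size and the unit conversions come from the formula, and the input values in the example are illustrative only.

```python
KB_PER_EVENT = 0.4               # average event size used by the formula
SECONDS_PER_DAY = 60 * 60 * 24   # 60 sec * 60 min * 24 hr

def acs_database_size_tb(events_per_second, retention_days):
    """Estimated ACS database size in TB for all forwarders combined."""
    size_kb = events_per_second * KB_PER_EVENT * SECONDS_PER_DAY * retention_days
    return size_kb / 1024 / 1024 / 1024   # KB -> MB -> GB -> TB

def acs_database_size_gb(events_per_second, retention_days):
    """Same estimate expressed in GB, which is often easier to read."""
    return acs_database_size_tb(events_per_second, retention_days) * 1024

# Illustrative input: 1,000 events per second retained for 14 days.
size_gb = acs_database_size_gb(1000, 14)
print(f"Estimated ACS database size: {size_gb:.1f} GB")
```

With these sample inputs the estimate is roughly 461 GB. Halving either the event rate or the retention period halves the result, which is why audit policy tuning and collector filters affect storage so directly.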

To reduce the number of events written to the ACS database, change the audit policy to reduce the number of generated events or use filters, applied at the ACS collector, to discard unnecessary events and keep them out of the ACS database. Alternatively, reduce the number of ACS forwarders that send events to the ACS database by deploying an additional ACS collector and database so that fewer ACS forwarders are serviced by each ACS collector.

Record the resource requirements in Table A-13 in Appendix A.

Task 3: Apply the Fault-Tolerance Requirements for ACS Database

The ACS database stores collected audit information and thus may be critical to organizations that need to maintain complete audit data. Fault-tolerant configurations can ensure that recording and access to audit data continues uninterrupted.

Clustering and log shipping are the supported SQL Server fault-tolerance options for ACS. See the Infrastructure Planning and Design Guide for Microsoft SQL Server 2008 and SQL Server 2008 R2 at http://go.microsoft.com/fwlink/?LinkId=163302 for more information on SQL Server fault-tolerance options.

Determine which one of the SQL Server fault-tolerance options will be implemented, and record the answer in Table A-13 in Appendix A.

Task 4: Determine the Hardware Configuration for the ACS Database Servers

The goal of this task is to determine the most appropriate type of hardware on which to deploy the ACS database servers.

Microsoft supports running all System Center 2012 - Operations Manager server features in any physical or virtual environment that meets the minimum requirements.

The number of disks required to handle I/O for the transaction log can be calculated by the formula:

[1.384, which is the average number of disk I/O per event for the transaction log] * [Events per second for all computers] / [drive RPM] * 60 sec/minute = [number of required drives]; multiply this number by 2 for RAID 1

Note The average number of logical disk I/O per event (for the transaction log file) has been calculated by the Operations Manager engineering team to be 1.384.

The number of disks required to handle I/O for the database can be calculated by the formula:

[0.138, which is the average number of disk I/O per event for the database file] * [Events per second for all computers] / [drive RPM] * 60 sec/minute = [number of required drives]; multiply this number by 2 for RAID 1

Note The average number of logical disk I/O per event (for the database file) has been calculated by the Operations Manager engineering team to be 0.138.
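
The two disk formulas can be sketched in Python as follows. The function name is hypothetical. Rounding up to whole drives and requiring at least one drive before mirroring are assumptions added for the sketch, because the formulas themselves do not say how to handle fractional results. Dividing by drive RPM and multiplying by 60 is read here as dividing by revolutions per second.

```python
import math

LOG_IO_PER_EVENT = 1.384   # logical disk I/O per event, transaction log file
DB_IO_PER_EVENT = 0.138    # logical disk I/O per event, database file

def raid1_drive_count(io_per_event, events_per_second, drive_rpm):
    """Drives needed for the I/O load, doubled for RAID 1 mirroring."""
    if drive_rpm <= 0:
        raise ValueError("drive_rpm must be positive")
    # [I/O per event] * [events per second] / [drive RPM] * 60 sec/minute
    drives = io_per_event * events_per_second * 60 / drive_rpm
    # Assumption: round up to whole drives and use at least one drive.
    whole_drives = max(1, math.ceil(drives))
    return whole_drives * 2   # 2 for RAID 1

# Illustrative input: 1,000 events per second on 10,000 RPM drives.
log_drives = raid1_drive_count(LOG_IO_PER_EVENT, 1000, 10000)
db_drives = raid1_drive_count(DB_IO_PER_EVENT, 1000, 10000)
print(f"Transaction log: {log_drives} drives, database file: {db_drives} drives")
```

With these sample inputs the transaction log needs far more spindles than the database file, which reflects the tenfold difference between the two per-event I/O figures.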

The requirements for an ACS collector are as follows:

Table 4. ACS Collector Requirements

| Requirements | Minimum | Recommended |
|---|---|---|
| RAM | 1 GB | 2 GB |
| Processor | 1.8 gigahertz (GHz) | 2.8 GHz |
| Hard disk space | 10 GB available | 50 GB |

The requirements for an ACS database are as follows:

Table 5. ACS Database Requirements

| Requirements | Minimum | Recommended |
|---|---|---|
| RAM | 1 GB | 2 GB |
| Processor | 1.8 GHz | 2.8 GHz |
| Hard disk space | 20 GB available | 100 GB |

Document the selected hardware configuration for servers in Table A-13 in Appendix A.

Task 5: Determine SSRS Location

ACS reporting can be installed in two configurations:

  • A Microsoft SQL Server Reporting Services (SSRS) instance with Operations Manager reporting already installed. A benefit of this configuration is the ability to view ACS reports in the Operations console.
  • An SSRS instance without Operations Manager reporting installed

The installation procedures for ACS reporting do not differ, but the application of access control is different. By deploying ACS reporting on the same SSRS instance as the Operations Manager reporting, the same role-based security applies to all reports. This means that ACS reporting users need to be assigned to the Operations Manager Report Operator Role to access the ACS reports.

In addition to membership in the Operations Manager Report Operator Role, ACS report users must also be assigned the db_datareader role on the ACS database to run ACS reports. The db_datareader requirement applies whether or not Operations Manager Reporting is present.

If ACS reporting is installed independently of Operations Manager Reporting, SSRS security can also be used to secure the reports. For more information, see "Setting Permissions in Reporting Services" in the SQL Server 2008 R2 Books Online tutorial at http://msdn.microsoft.com/en-us/library/aa337491(v=sql.105).

Determine whether to install ACS reporting with the Operations Manager Reporting or in a separate SSRS instance without Operations Manager Reporting installed, and record the answer in Table A-13 in Appendix A.

Step Summary

After completing this step, the following design information was recorded in Table A-13 in Appendix A:

  • An infrastructure design, fault-tolerance requirements, and hardware configuration for the ACS database
  • Whether to install ACS reporting on an SSRS instance with or without Operations Manager reporting installed

This information will be used during design of the network connections.

Additional Reading

  • "Deploying System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh278852.aspx
  • "System Requirements for System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh205990.aspx
  • "Audit Collection Services Capacity Planning": http://technet.microsoft.com/en-us/library/hh212872.aspx
  • "Collecting Security Events Using Audit Collection Services in Operations Manager": http://technet.microsoft.com/en-us/library/hh212908.aspx
  • Operations Manager Engineering Blog: http://blogs.technet.com/momteam/archive/2008/07/02/audit-collection-acs-database-and-disk-sizing-calculator-for-opsmgr-2007.aspx
  • "Mapping Requirements to a Design for Operations Manager 2007": http://technet.microsoft.com/en-us/library/bb735402.aspx
  • "Managing Audit Collection Services in Operations Manager 2007": http://technet.microsoft.com/en-us/library/cc974475.aspx
  • Infrastructure Planning and Design Guide for Microsoft SQL Server 2008 and SQL Server 2008 R2: http://go.microsoft.com/fwlink/?LinkId=163302
  • "Setting Permissions in Reporting Services": http://msdn.microsoft.com/en-us/library/aa337491(v=sql.105)

Step 7: Design the Notification System

In Step 6, the ACS database was designed. The goal of Step 7 is to design the infrastructure to provide timely notification of alerts that require attention by operations staff members, even when they are not logged on to the Operations console. This step is crucial to ensuring that IT support staff members are notified, even if parts of the infrastructure become unavailable. Work through this step for each management group.

In this step, data collected on resources in scope for monitoring is used to identify which channels are necessary to ensure notification in a variety of circumstances. Additionally, the requirements from Step 1 are used to assess the need for redundancy in the notification infrastructure.

The output of this step is a design for the notification interface to the management servers. This will be used to determine necessary infrastructure additions to the Operations Manager environment to support the organization's alert notification requirements.

Task 1: Determine the Required Notification Channels

This task involves deciding which notification channels to use to meet the organization's needs. Effective planning will ensure that alert notifications are delivered in a timely manner and in an easily consumable format.

To identify the infrastructure necessary for notification delivery, use the requirements from Step 1 to select which notification channels are appropriate for the organization. The following notification channels are available in Operations Manager:

  • Email. Uses any Simple Mail Transfer Protocol (SMTP) server, such as Microsoft Exchange Server, Windows Server, or a third-party server, to deliver alert notifications by email.
  • Instant message. Uses a Session Initiation Protocol (SIP) server, such as Microsoft Lync® Server, to deliver alert notifications by instant message.
  • Short Message Service. Uses a Global System for Mobile Communications (GSM) modem to deliver alert notifications. (When determining whether to use a GSM modem in a data center environment, be sure to validate that an adequate signal is available before determining the best solution for the organization.)
  • Command. Executes a response through a Windows Command Prompt window. (This can be used to execute any number of command-line utilities, including programs that can run scripts capable of performing complex notifications that cannot be performed by any other method.)

Record the selected notification channels in Table A-14 in Appendix A. After these have been selected, the appropriate fault-tolerance strategy can be designed.

Task 2: Determine the Fault-Tolerance Strategy in Notifications

The goal of this task is to establish the fault-tolerance strategy for the infrastructure used in delivering Operations Manager alert notifications. Notifications are generated by the management servers and then flow through the notification channel (email, instant message, Short Message Service). Any one of these can be a single point of failure.

Fault tolerance in notification can be achieved using the following techniques in combination with each other:

  • Provide redundancy in the link from the management servers to the notification channel. The network link to the notification channel can be set to fail over to an alternate link in the event of a problem.
  • Provide redundancy within the notification channel. The email and command channels are the only notification channels that allow redundant configuration. For email notification, this requires purchasing additional hardware if existing servers cannot be configured to function as SMTP servers.
  • Use multiple notification channels. By using multiple channels (such as email and Short Message Service), notifications will still be received in the event one channel is unavailable, such as in case of an Exchange Server outage.

Note   Configuring multiple SMTP servers does not guarantee timely notification in the event of a Microsoft Exchange Server messaging issue. For example, if Internet connectivity fails, notifications will be queued on the Exchange Server-based computer. The best way to guarantee notification when email is down is through use of multiple notification channels.

Record the fault-tolerance strategy in Table A-14 in Appendix A. This data will be used in the implementation phase to make hardware purchases, if necessary, as well as in configuring the notification channels during installation.

Step Summary

After completing this step, the following design information was recorded in Table A-14 in Appendix A:

  • Which channels will be used for notification delivery
  • What fault tolerance in the infrastructure will be used to deliver the notifications

This information will be used as a design for the notification infrastructure to be used by Operations Manager.

Additional Reading

  • "Subscribing to Alert Notifications": http://technet.microsoft.com/en-us/library/hh212725.aspx

Step 8: Design the Network Connections

In Step 7, the notification system was designed. The goals of this final step are to ensure that network connectivity between server roles, and between server roles and agents, provides sufficient bandwidth and that the required firewall rules are in place to allow traffic to flow as necessary.

In this step, the following data will be used to determine network bandwidth and network port requirements:

  • Data collected in Step 1 on network topology and inventory of resources in scope for monitoring
  • The management packs that will be deployed, as determined in Step 1
  • Server role distribution, determined in Step 3
  • Database role distribution, decided in Step 4
  • Reporting infrastructure requirements from Step 5

This information is used in the implementation phase of the project to determine necessary changes in network firewalls as well as any network links requiring additional network bandwidth to support the minimum requirements of Operations Manager.

Task 1: Determine Where Additional Bandwidth Is Required

The goal of this task is to identify and record the network bandwidth required, as well as the bandwidth available, between each of the Operations Manager components.

To make these determinations, perform the following steps and record the information in Table A-15 in Appendix A:

  1. For each of the Operations Manager components established earlier, map the connections between the roles and the bandwidth requirements of each, using the information in the "Minimum Network Connectivity Speeds" sections in "System Requirements for System Center 2012 - Operations Manager" at http://technet.microsoft.com/en-us/library/hh205990.aspx#BKMK_NetworkConnectivity.
  2. Measure the available bandwidth on these connections by requesting the average available bandwidth during peak usage periods.
  3. Compare the required bandwidth against the available bandwidth to determine whether additional bandwidth will be required.
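
The comparison in the steps above can be sketched in Python as follows. The class name, function name, and all sample figures are hypothetical. The required speeds would be taken from the "Minimum Network Connectivity Speeds" documentation, and the available speeds from measurements during peak usage periods.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    source: str
    target: str
    required_kbps: float    # minimum speed from the system requirements
    available_kbps: float   # measured average available bandwidth at peak

def bandwidth_shortfalls(connections):
    """Return (connection, shortfall_kbps) for links that fall short."""
    return [
        (c, c.required_kbps - c.available_kbps)
        for c in connections
        if c.available_kbps < c.required_kbps
    ]

# Placeholder values for illustration; replace with documented and measured figures.
connections = [
    Connection("Branch agents", "Management server", 64, 512),
    Connection("Management server", "Operational database", 256, 128),
]
for conn, gap in bandwidth_shortfalls(connections):
    print(f"{conn.source} -> {conn.target}: short by {gap} Kbps")
```

Only the links returned by the function need to be listed in Table A-15 as requiring additional bandwidth.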

Task 2: Determine Network Port Requirements

The goal of this task is to map the supported firewall scenarios to the locations of the roles to identify the network ports that must be opened. The network ports required for communication depend on the placement of server roles throughout the network.

To determine firewall port requirements, review server placement decisions and compare them with the port requirements in the "Operations Manager Feature Firewall Exceptions" sections in "System Requirements for System Center 2012 - Operations Manager" at http://technet.microsoft.com/en-us/library/hh205990.aspx#BKMK_FeatureFirewallException.
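
The review described above can be sketched in Python as a lookup of role placements against a port table. The function name, zone names, and data layout are hypothetical. The two port numbers are sample entries only, and every entry should be confirmed against the "Operations Manager Feature Firewall Exceptions" documentation before use.

```python
def firewall_rules(connections, port_table):
    """List the ports to open for connections that cross a firewall.

    connections: iterable of (source_role, source_zone, target_role, target_zone).
    port_table: dict mapping (source_role, target_role) to a list of ports,
        filled in from the firewall exceptions documentation.
    Returns (rules, unmapped): the rules to open, and the role pairs that
    have no entry in the port table yet.
    """
    rules, unmapped = [], []
    for src_role, src_zone, dst_role, dst_zone in connections:
        if src_zone == dst_zone:
            continue  # assumption: no firewall between roles in the same zone
        ports = port_table.get((src_role, dst_role))
        if ports is None:
            unmapped.append((src_role, dst_role))
        else:
            rules.append({"roles": (src_role, dst_role),
                          "from": src_zone, "to": dst_zone, "ports": ports})
    return rules, unmapped

# Sample entries only; verify each port before relying on it.
PORT_TABLE = {
    ("agent", "management server"): [5723],
    ("management server", "operational database"): [1433],
}
CONNECTIONS = [
    ("agent", "perimeter", "management server", "datacenter"),
    ("management server", "datacenter", "operational database", "datacenter"),
    ("gateway server", "branch", "management server", "datacenter"),
]
rules, unmapped = firewall_rules(CONNECTIONS, PORT_TABLE)
print(rules)
print("Still to look up:", unmapped)
```

Returning the unmapped role pairs separately makes gaps in the port table visible instead of silently omitting a firewall rule.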

Record the network port requirements in Table A-15 in Appendix A.

Step Summary

After completing this step, the bandwidth requirements between physical network sites and the network ports that must be opened through network firewalls were recorded in Table A-15 in Appendix A.

This information will be used in the implementation phase of the project to identify necessary changes in network firewalls as well as any network links requiring additional network bandwidth to support the minimum requirements of Operations Manager.

Additional Reading

  • "Index to Security-Related Information for System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh771596.aspx
  • "System Requirements for System Center 2012 - Operations Manager": http://technet.microsoft.com/en-us/library/hh205990.aspx
  • System Center Central home page: www.systemcentercentral.com
  • "Agent and Agentless Monitoring": http://technet.microsoft.com/en-us/library/hh487284.aspx

Conclusion

This guide has summarized the critical design decisions, activities, and tasks required to enable a successful design of System Center 2012 - Operations Manager. It focused on decisions involving:

  • The resources to be monitored by Operations Manager
  • The number of management groups
  • The infrastructure necessary to employ Operations Manager to monitor the selected resources
  • The server roles, role placement, databases, and connectivity of the Operations Manager infrastructure
  • The design and resources of the data warehouse, reporting server, and notification system
  • The placement and port requirements for the network connections

This was done by leading the reader through the eight steps in the decision flow to arrive at a successful design. Where appropriate, the decisions and tasks have been illustrated with typical usage scenarios.

The guide has discussed the technical aspects, service characteristics, and business requirements needed to complete a comprehensive review of the decision-making process.

As stated in the introduction, it is very important at the start of an Operations Manager project to have a full understanding of the business objectives for the project:

  • What benefits does the business expect to achieve through the use of resource monitoring?
  • What is the value of those benefits, and therefore the cost case for using Operations Manager to deliver those benefits?

The business objectives should be prioritized at the start of the project so that they are clearly understood and agreed upon between IT and the business.

When an architecture has been drafted, limited "pilot" tests should be conducted before a major rollout begins so that lessons can be learned and incorporated back into the design.

This guide, when used in conjunction with the Operations Manager product documentation, allows organizations to confidently plan the implementation of Operations Manager.

Additional Reading

  • "Recommendations for Daily, Weekly, and Monthly Operations Manager Tasks": http://technet.microsoft.com/en-us/library/hh212937.aspx

Appendix A: Job Aids

Step 1. Use Table A-1 to record the answers to the questions asked of the business to determine the features and scope of the project.

Table A-1. Business Requirements

| Question | Answer |
|---|---|
| Which business services are in scope? | |
| Provide a description of a sample transaction. | |
| Assess the impact of the failure on the business. | |
| Is long-term data collection required for capacity planning, performance tracking, or trend analysis? | |
| What are the availability requirements for the monitoring infrastructure? | |
| Does the organization have any external or internal requirements for security auditing? | |
| If so, has the organization implemented a security auditing solution that satisfies these requirements? | |

Step 1. Use Table A-2 to record the answers to the questions asked of the technical decision makers to determine the scope and requirements of the project.

Table A-2. Technical Requirements

| Question | Answer |
|---|---|
| Will Microsoft System Center Virtual Machine Manager be used, with reporting enabled? | |
| What native management packs (if any) will be used? | |
| What third-party management packs (if any) will be used? | |
| What custom management packs (if any) will be used? How will they be procured? | |
| What Orchestrator integration packs (if any) will be used? | |

Step 1. Use Table A-3 to record the answers to the questions asked of the business to determine the capacity requirements of the project.

Table A-3. Capacity Requirements

| Type | Number | Location |
|---|---|---|
| Agent-monitored computers | | |
| Agentless Exception Monitoring (AEM) computers | | |
| Agentless-managed computers | | |
| UNIX- or Linux-based computers | | |
| Network devices | | |
| .NET applications monitored via Application Performance Monitoring (APM) | | |

Step 2. Use Table A-4 to record one management group. Add others only if needed for specific scenarios.

Table A-4. Determining the Need for Additional Management Groups

| Scenario | Yes/no | Number |
|---|---|---|
| Scaling | | |
| Agents separated from their management server by WAN-speed network links | | |
| Political, administrative or security requirements within the organization requiring separate management groups | | |
| A view of AD DS topology required across multiple forests | | |
| A dedicated management group required for auditing purposes | | |
| Disaster recovery functionality required | | |
| Consolidated views of connected management groups required in Operations Manager | | |
| Operations Manager integration with the VMM console | | |

Step 3. Decide the number of management servers required, and then for each server, document the applicable information in Table A-5.

Table A-5. Management Servers

| Factor | Yes/no | Number |
|---|---|---|
| Are scale limits exceeded? | | |
| Will Agentless Exception Monitoring be used? | | |
| If so, will it be on a dedicated server? | | |
| Will Audit Collection Services be used in the management group? | | |
| If so, will they warrant additional servers? | | |
| Will network monitoring be allotted its own management servers? | | |

Step 3. Decide whether the web console feature will be used in the management group. If so, decide whether it will be implemented on one or more dedicated servers or on an existing management server. Record the answers in Table A-6.

Table A-6. Web Consoles

| Question | Answer |
|---|---|
| Will the web console feature be used in the management group? | |
| If so, will it be implemented on one or more dedicated servers, or be implemented on an existing management server? | |
| If implemented on one or more dedicated servers, how many will it be implemented on? | |

Step 3. Determine whether agent location or network connectivity indicates gateway servers are required, and if so, which management servers they will connect to. Record the answers in Table A-7.

Table A-7. Gateway Servers

| Factor | Yes/no | Number |
|---|---|---|
| Does agent location warrant gateway servers? | | |
| Does network connectivity warrant gateway servers? | | |
| Is scaling a consideration? | | |
| Which management groups will they connect to? | | |

Step 3. If it was determined that AEM will be implemented, determine the storage requirements for AEM and document the findings in Table A-8.

Table A-8. Resource Requirements for AEM

| Question | Answer |
|---|---|
| What are the storage requirements for AEM? | |

Step 3. Document fault tolerance in Table A-9.

Table A-9. Fault Tolerance for Management Servers

| Question | Answer |
|---|---|
| What fault-tolerance measures have been determined for regular management servers? | |
| What fault-tolerance measures have been determined for specialized gateway servers? | |
| What fault-tolerance measures have been determined for web console servers? | |
| What fault-tolerance measures have been determined for AEM file share? | |

Step 3. Use Table A-10 to record the Operations Manager management server hardware configuration.

Table A-10. Operations Manager Management Server Hardware Configuration

| Server component | Hardware configuration |
|---|---|
| Regular management server | |
| Gateway server | |
| Web console server | |
| AEM file share | |

Step 4. Use Table A-11 to record the resource requirements, fault tolerance, and hardware configuration for the operational database server.

Table A-11. Operational Database Server Requirements

| Requirements | Operational database server |
|---|---|
| Resource requirements | |
| Fault tolerance | |
| Hardware configuration | |

Step 5. Use Table A-12 to record the data consolidation strategy, data retention requirements, resource requirements, fault tolerance, and hardware configuration for the data warehouse and reporting server.

Table A-12. Data Warehouse and Reporting Server Requirements

| Requirements | Data warehouse and reporting server |
|---|---|
| Data consolidation strategy | |
| Data retention | |
| Resources | |
| Fault tolerance | |
| Hardware configuration | |

Step 6. Use Table A-13 to record the scaling, resource requirements, fault tolerance, hardware configuration, and SSRS location for the ACS database server.

Table A-13. ACS Database Server Requirements

| Requirements | ACS database server |
|---|---|
| Scaling | |
| Resources | |
| Fault tolerance | |
| Hardware configuration | |
| SSRS location | |

Step 7. Use Table A-14 to record the selected notification channel and fault tolerance strategy for the notification system.

Table A-14. Notification System Requirements

| Notification type | Notification system requirements |
|---|---|
| Notification channel | |
| Fault-tolerance strategy | |

Step 8. Use Table A-15 to record where additional bandwidth will be required and the port requirements for the network connections.

Table A-15. Network Connections Requirements

| Bandwidth and port | Network connections requirements |
|---|---|
| Where will additional bandwidth be required? | |
| Port requirements | |

Appendix B: IPD in Microsoft Operations Framework 4.0

Microsoft Operations Framework (MOF) 4.0 offers integrated best practices, principles, and activities to assist an organization in achieving reliable solutions and services. MOF provides guidance to help individuals and organizations create, operate, and support technology services, while helping to ensure the investment in technology delivers expected business value at an acceptable level of risk. MOF's question-based guidance helps to determine what is needed for an organization now, as well as providing activities that will keep the organization running efficiently and effectively in the future.

Use MOF with IPD guides to ensure that people and process considerations are addressed when changes to an organization's technology services are being planned.

  • Use the Plan Phase to maintain focus on meeting business needs, consider business requirements and constraints, and align business strategy with the technology strategy. IPD helps to define an architecture that delivers the right solution as determined in the Plan Phase.
  • Use the Deliver Phase to build solutions and deploy updated technology. In this phase, IPD helps IT pros design their technology infrastructures.
  • Use the Operate Phase to plan for operations, service monitoring and control, and troubleshooting. The appropriate infrastructure, built with the help of IPD guides, can increase the efficiency and effectiveness of operating activities.
  • Use the Manage Layer to work effectively and efficiently to make decisions that are in compliance with management objectives. The sound architectural practices embodied in IPD help deliver value to the top levels of a business.

Figure B-1. The architecture of Microsoft Operations Framework (MOF) 4.0

Appendix C: System Center 2012 - Operations Manager in Microsoft Infrastructure Optimization

The Infrastructure Optimization (IO) Model at Microsoft groups IT processes and technologies across a continuum of organizational maturity. (For more information, see http://go.microsoft.com/fwlink/?LinkId=229236.) The model was developed with input from industry analysts, the Massachusetts Institute of Technology (MIT) Center for Information Systems Research (CISR), and Microsoft's own experiences with its enterprise customers. A key goal for Microsoft in creating the Infrastructure Optimization Model was a maturity framework that is simple to use, flexible, and easily applied as a benchmark for technical capability and business value.

IO is structured around three information technology models: Core Infrastructure Optimization, Application Platform Optimization, and Business Productivity Infrastructure Optimization. According to the Core IO Model, System Center 2012 - Operations Manager can be used to move an organization from a Standardized to a Dynamic level of maturity. At the Standardized level, monitoring may occur for 80 percent or more of critical servers. At the Rationalized level, service level agreement (SLA) monitoring of mission-critical servers and IT service level reporting would occur. At the Dynamic level, service level monitoring of desktops, servers, and applications is required.
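The maturity criteria in the preceding paragraph can be read as a simple decision ladder. The following Python sketch is illustrative only and is not part of the IO Model. The function name, the parameter names, and the assumption that each level also requires the criteria of the level below it are hypothetical. The "Below Standardized" label is a placeholder for any organization that does not meet the 80 percent threshold.

```python
def core_io_monitoring_level(
    critical_server_coverage: float,
    sla_monitoring: bool,
    service_level_reporting: bool,
    monitors_desktops_servers_apps: bool,
) -> str:
    """Map monitoring capabilities to a Core IO maturity level.

    critical_server_coverage is the fraction (0.0 to 1.0) of critical
    servers that are monitored. The levels are assumed to be cumulative;
    the source text does not state this explicitly.
    """
    # Standardized: monitoring for 80 percent or more of critical servers.
    if critical_server_coverage < 0.80:
        return "Below Standardized"  # placeholder label, not from the source

    # Rationalized: SLA monitoring of mission-critical servers
    # plus IT service level reporting.
    if not (sla_monitoring and service_level_reporting):
        return "Standardized"

    # Dynamic: service level monitoring of desktops, servers, and applications.
    if not monitors_desktops_servers_apps:
        return "Rationalized"

    return "Dynamic"


if __name__ == "__main__":
    # Example input values are made up for illustration.
    print(core_io_monitoring_level(0.85, True, True, False))
```

A function like this could be used during the business validation discussions to confirm which criteria still separate the organization from its target maturity level.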

Figure C-1. Mapping of System Center 2012 - Operations Manager technology into the Core Infrastructure Optimization Model