The Essential DBA Playbook for Optimized SQL Server Management

Whether you've just started as a DBA at a new company or you've been with your current company for years, the challenge is the same: not having the optimal hardware, software, workflow or culture you need to run a smooth environment. In fact, it's often quite the opposite: the department may have poor security practices, nonexistent disaster recovery plans, poor documentation and substandard hardware or network configuration. Then there are the cultural or organizational challenges: poor management, under-resourced departments or poorly thought-out processes.

While it would be impractical to formulate a strategy for every possible situation, you can put in place a handful of strategies to overcome panic and methodically focus on issues within your control.

What if you could learn tips to get the most value from your SQL Server deployments?

What if you could increase the efficiency of your database servers to free up system resources for other tasks such as business reports or ad hoc requests?

This e-book shows you how to implement the right approach to tuning, along with effective backup and recovery procedures, to ensure the SQL Server workloads running on your database servers execute as quickly and efficiently as possible.

Read on to see how to thrive in the new database world.

Assess Your SQL Server Environment

Today's DBAs are stuck in firefighting mode, trying to keep up with the continued growth of SQL Server in their IT environment.

Without a triage framework for prioritizing database issues, you'll be stuck in this state of perpetual near-panic — and you'll be more likely to compound or misevaluate problems.

To avoid this problem, whenever you take on any new database management assignment, begin by assessing the overall condition of the SQL Server environment. Start organizing your efforts as follows:

  1. Take inventory.
  2. Prioritize issues by severity.
  3. Identify required resources.
  4. Assess security measures for protecting information and the environment.

Take inventory

How many instances and databases in the enterprise are you managing?

An up-to-date inventory of the hardware and systems in your purview is critical. This helps identify the areas you need to address.

Things you need to know:

  • The total inventory of the databases on your network and the amount of data they contain
  • Database history, including installation and creation dates
  • Which SQL Server versions and service packs are deployed
  • Server, instance and operating system names
  • Who your end users are
  • Which databases are being backed up and how frequently — and which databases aren't being backed up at all
  • The retention periods for backups
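
If you want a jump-start on gathering this inventory, a query sketch along the following lines pulls version details from SERVERPROPERTY and the most recent full backup of each database from the msdb history tables. It is only a starting point, run per instance, and the output still needs to be consolidated into your inventory document:

    -- Instance-level details: version, edition and service pack / update level
    SELECT
        SERVERPROPERTY('MachineName')    AS server_name,
        SERVERPROPERTY('InstanceName')   AS instance_name,
        SERVERPROPERTY('ProductVersion') AS product_version,
        SERVERPROPERTY('ProductLevel')   AS product_level,   -- e.g., RTM, SP1
        SERVERPROPERTY('Edition')        AS edition;

    -- Per-database creation date, size (data and log files) and last full backup
    SELECT
        d.name        AS database_name,
        d.create_date,
        (SELECT SUM(mf.size) * 8 / 1024
           FROM sys.master_files AS mf
          WHERE mf.database_id = d.database_id) AS size_mb,
        (SELECT MAX(b.backup_finish_date)
           FROM msdb.dbo.backupset AS b
          WHERE b.database_name = d.name
            AND b.type = 'D')                   -- 'D' = full database backup
                                                AS last_full_backup
    FROM sys.databases AS d
    ORDER BY last_full_backup;                  -- NULL = never backed up, sorts first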

This information becomes the framework for locating and evaluating trouble spots in your infrastructure. Equipped with that wider view, you can start to identify and prioritize problem areas.

Prioritize issues by severity

With your framework in hand, you can determine the severity of issues and assign priorities. Analyze your environment with the big three resources in mind: disk I/O, memory and CPU. The capacity and performance of these resources will delineate what you can and cannot do in terms of optimizing SQL Server performance.
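
One quick way to see which of the big three is under the most pressure is to look at the instance's cumulative wait statistics. The sketch below is illustrative rather than definitive; the list of benign waits to exclude is only a sample:

    -- Top waits since the last instance restart; benign system waits filtered out
    -- (the exclusion list below is a sample, not a complete one)
    SELECT TOP (10)
        wait_type,
        wait_time_ms / 1000.0 AS wait_time_sec,
        waiting_tasks_count,
        100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS pct_of_total
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                            N'BROKER_TASK_STOP', N'XE_TIMER_EVENT',
                            N'CHECKPOINT_QUEUE', N'WAITFOR')
    ORDER BY wait_time_ms DESC;
    -- PAGEIOLATCH_* waits point to disk I/O, RESOURCE_SEMAPHORE to memory,
    -- and SOS_SCHEDULER_YIELD / CXPACKET to CPU and parallelism pressure.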

Identify and prioritize the most critical handful of issues in terms of environment performance, security or (in extreme cases) viability. It helps to think of database performance through the lens of the business rather than as a purely technological issue.

Once your systems are ranked in order of importance, rank them again in order of the severity of issues. For help, use SQL Server's error management system, which assigns severity levels from 0 to 24, ranging from informational messages to fatal errors.
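
For context, you can browse the catalog of error messages and their severity levels directly from the sys.messages catalog view; a small sketch such as the following shows how the 0-24 range is distributed:

    -- How many error messages exist at each severity level (0-24)?
    -- Severity 17-19 indicate resource or software problems;
    -- severity 20-24 indicate fatal, connection-terminating errors.
    SELECT severity, COUNT(*) AS message_count
    FROM sys.messages
    WHERE language_id = 1033          -- English messages only
    GROUP BY severity
    ORDER BY severity;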

Identify required resources

Once you've captured tactical data and prioritized which issues represent the biggest and most immediate threats to business operations, you need to identify which resources are required to stabilize the SQL Server environment in the short term. Possible resources include technical expertise, software tools, server capacity and backup storage. Ask questions like the following:

  • If backups or database performance are an issue, what steps can be taken now, and roughly when can a long-term fix be identified?
  • Is the environment sufficiently secure to protect information?
  • If additional resources are not available, what can you do with existing resources?
  • Does the organization understand the risk?

The easiest way to proceed is to identify issues you can fix with the hardware you have on hand, and then categorize the resources you need to add, such as a solid-state drive, another disk cabinet for a SAN or extra CPUs.
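
For example, before requesting new storage, it is worth confirming how much free space the volumes hosting your database files actually have. A rough sketch (it assumes SQL Server 2008 R2 SP1 or later, where sys.dm_os_volume_stats is available):

    -- Free space on every volume that hosts a database file
    SELECT DISTINCT
        vs.volume_mount_point,
        vs.total_bytes     / 1024 / 1024 / 1024 AS total_gb,
        vs.available_bytes / 1024 / 1024 / 1024 AS free_gb
    FROM sys.master_files AS mf
    CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
    ORDER BY free_gb;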

Once you've identified resource demands, communicate those needs to your superior and appropriate colleagues. We discuss communication strategy in more detail in chapter 3, but at a minimum, it's important to convey a few key points:

  • Issues that you have identified, prioritized and justified with actionable information — your interpretation of relevant data, tailored to your internal audience
  • The impact to the business if these issues are not addressed
  • Which issues you can address with resources at hand, and which will require additional resources

Your triage report should show that you have methodically assessed the situation and its importance. Most SQL Server databases have problems, but not all DBAs have a plan of action. Address issues and take a proactive approach to long-term improvement by alerting managers to both the problems and potential solutions at once, constructively.

Assess security measures

While performance and availability will naturally be front of mind during the first stages of triage, you'll also need to assess security liabilities, even though some may represent medium- or long-term threats. Be sure to keep the following best practices in mind:

Review user access, permissions and passwords

When creating database objects, you must grant permissions to make those objects accessible to users. Every securable object has permissions that can be granted to, or revoked from, a principal using permission statements. Granting permissions to roles, rather than to users, simplifies security administration. The permission set designated by a given role is inherited by all members of that role.

It is far easier to add or remove users from a role than it is to create separate permission sets for each individual user. Roles can be nested; however, too many levels of nesting can degrade performance.
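
As a minimal sketch of that role-based pattern, the statements below grant read access through a role rather than to individual users. The role, schema and user names are hypothetical, and the sketch assumes the database user already exists:

    -- Create a role, grant it the permissions once, then manage membership
    CREATE ROLE reporting_reader;                          -- hypothetical role name

    GRANT SELECT ON SCHEMA::Sales TO reporting_reader;     -- hypothetical schema

    -- Adding or removing a user is now a one-line change
    -- (assumes the database user DOMAIN\jsmith already exists)
    ALTER ROLE reporting_reader ADD MEMBER [DOMAIN\jsmith];
    -- ALTER ROLE reporting_reader DROP MEMBER [DOMAIN\jsmith];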

Find and document compliance liabilities

Log management for security compliance in SQL Server can be complicated. Suggestions come from all directions, including vendors, auditors and lawyers. As a DBA, it's your job to determine what's reasonable based on risk. Depending on your organization, industry and country, your company may have stringent compliance requirements, such as the Health Insurance Portability and Accountability Act (HIPAA) for patient health data or the Payment Card Industry Data Security Standard (PCI DSS) for credit card and transactional data. You can use SQL Server's auditing functionality to find and document areas of concern, thereby lessening your company's potential regulatory exposure. Regardless of the governing body, a big part of any compliance initiative is to monitor activity related to sensitive information and to keep good records.
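
A bare-bones sketch of that auditing functionality might look like the following. The audit names, file path, database and schema are placeholders, and a real specification would cover the action groups your auditors actually require:

    -- Server-level audit object that writes to a file target
    USE master;
    CREATE SERVER AUDIT Audit_Sensitive_Access
        TO FILE (FILEPATH = 'D:\SQLAudit\');               -- placeholder path
    ALTER SERVER AUDIT Audit_Sensitive_Access WITH (STATE = ON);

    -- Database-level specification: record SELECTs against a sensitive schema
    USE YourDatabase;                                      -- placeholder database
    CREATE DATABASE AUDIT SPECIFICATION AuditSpec_PHI_Reads
        FOR SERVER AUDIT Audit_Sensitive_Access
        ADD (SELECT ON SCHEMA::Patient BY public)          -- placeholder schema
        WITH (STATE = ON);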

Institute a code freeze during triage steps

Until you've had time to find and remediate major performance and security liabilities, introducing new code into an environment is a no-go.

Conclusion

What if you could spend less time fighting fires so you can focus on more proactive database performance optimization? It's possible with the right preparation. As this chapter described, your first step in optimizing database management is to carefully assess your SQL Server environment. By taking inventory, prioritizing issues, identifying resources and assessing security, you'll have the information you need to move to the next step: developing best practices for effective backup and recovery.

Establish Effective Backup and Recovery

In chapter 1, you learned tips to effectively assess your SQL Server environment. With your infrastructure continually growing in volume and complexity, you find yourself in a constant balancing act. Disaster recovery remains at the forefront of your responsibilities. With maintenance windows shrinking and data volumes exploding, the need to follow strict recovery time objectives (RTOs) and recovery point objectives (RPOs) is more important than ever.

Many DBAs make the mistake of thinking that because they have high availability (HA), a backup and recovery strategy is unnecessary. However, the possibility of data loss is a risk even in HA scenarios. Moreover, end-user downtime will surely attract unwanted attention. Therefore, it's critical to have a disaster recovery (DR) strategy even if you have high availability.

This chapter explains four best practices for effective backup and recovery:

  • Understand the differences between high availability and disaster recovery.
  • Understand your options for high availability in SQL Server.
  • Establish both a backup strategy and a recovery strategy.
  • Build a disaster recovery strategy for the long term.

Understand the differences between high availability and disaster recovery.

As noted earlier, high availability and disaster recovery are not synonymous:

  • High availability is the measurement of a system's ability to remain accessible in the event of a system component failure. Generally, HA is implemented by building multiple levels of fault tolerance into a system.
  • Disaster recovery is the process by which a system is restored to a previous acceptable state, after a natural or man-made disaster. With DR, there can be a significant loss of service while the DR plan is executed and the environment is restored.

In short, HA is about maintaining service, and DR is about retaining data. It is imperative that both HA and DR strategies be driven by business requirements. Therefore, they should address non-functional requirements such as:

  • Performance
  • System availability
  • Fault tolerance
  • Data retention
  • Business continuity
  • User experience

For HA, determine any service-level agreements (SLAs) expected of your system. For DR, use measurable characteristics, such as an RTO and an RPO.

Understand your options for high availability in SQL Server.

High availability tools and functionality built into SQL Server are designed to ensure uptime. Here are a few options to consider:

  • AlwaysOn Failover Cluster Instances — This option provides instance-level redundancy. It uses Windows Server Failover Clustering (WSFC) to provide local high availability through redundancy. A failover cluster instance (FCI) is a single instance of SQL Server that is installed across WSFC nodes and, possibly, across multiple subnets. On the network, an FCI appears as an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.
  • AlwaysOn Availability Groups — This enterprise-level high-availability solution was introduced in SQL Server 2012 to enable you to maximize availability for one or more user databases. It requires the SQL Server instances to reside on WSFC nodes, but without all the extra hardware typically associated with clustering, such as shared storage (see the sketch after this list).
  • Database mirroring — Database mirroring maintains a single standby database, or mirror database, of the production (principal) database. This approach increases the availability of a SQL Server database by supporting almost instantaneous failover, and it also improves data protection. Note, however, that database mirroring is deprecated in favor of AlwaysOn Availability Groups.
  • Log shipping — Like AlwaysOn Availability Groups and database mirroring, log shipping functions at the database level. It is an automated backup and restore process that allows you to create an additional copy of your database for failover. You can use log shipping to maintain one or more warm standby databases (secondary databases) for a single production database (the primary database).
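
For illustration only, a minimal AlwaysOn Availability Group definition might look like the sketch below. The node names, endpoint URLs and database name are hypothetical, and the sketch assumes the WSFC cluster, the AlwaysOn feature and the database mirroring endpoints are already configured on both replicas:

    -- Run on the primary replica; assumes endpoints and WSFC already exist
    CREATE AVAILABILITY GROUP AG_Sales                     -- hypothetical name
    FOR DATABASE SalesDB                                   -- hypothetical database
    REPLICA ON
        N'SQLNODE1' WITH (
            ENDPOINT_URL      = N'TCP://sqlnode1.contoso.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC),
        N'SQLNODE2' WITH (
            ENDPOINT_URL      = N'TCP://sqlnode2.contoso.local:5022',
            AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
            FAILOVER_MODE     = AUTOMATIC);

    -- Then, on the secondary replica, join it to the group:
    -- ALTER AVAILABILITY GROUP AG_Sales JOIN;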

Establish both a backup strategy and a recovery strategy.

Not surprisingly, backup and recovery go hand in hand. There is no way to implement a recovery strategy without first assessing your backup processes. Backups are instrumental to protecting critical data and giving you the ability to execute point-in-time recovery for the data in your environment. If data is maliciously or inadvertently deleted from a table, no amount of failovers on a cluster is going to bring that data back once the change has been committed. As the old IT adage states: "A DBA is only as good as their last backup."

In chapter 1, we described how to inventory your environment, including which databases are being backed up, how frequently those backups are performed and the retention periods for backups — as well as which SQL Servers aren't being backed up at all.

If you do identify servers that are not currently backed up, you may need some backup of your own:

  • Engage your managers and explicitly declare a backup gap as a major problem.
  • Investigate whether there may be a valid reason for databases not being backed up, and suggest a minimum level of protection based on the organization's risk tolerance.
  • Follow up with the owners of the databases to determine their RTOs and RPOs.

With this information and support, implement the agreed-upon SLAs. Regularly test a range of scenarios to ensure that your data can be restored within the expected time periods and SLAs.
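
As a rough illustration of what an agreed-upon schedule might translate into, the underlying commands are straightforward. The database name, file paths and cadence below are assumptions to adapt to your own RPOs and RTOs:

    -- Weekly full backup (typically scheduled through a SQL Server Agent job)
    BACKUP DATABASE YourDatabase
        TO DISK = N'E:\Backups\YourDatabase_full.bak'
        WITH CHECKSUM, COMPRESSION, INIT;

    -- Nightly differential backup to shorten restore chains
    BACKUP DATABASE YourDatabase
        TO DISK = N'E:\Backups\YourDatabase_diff.bak'
        WITH DIFFERENTIAL, CHECKSUM, COMPRESSION, INIT;

    -- Quick sanity check that a backup file is readable
    RESTORE VERIFYONLY FROM DISK = N'E:\Backups\YourDatabase_full.bak';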

Build a disaster recovery strategy for the long term.

Use transaction log backups

Without transaction log backups, it is impossible to restore your data to a specific point in time. While managing transaction logs is costly — both in terms of operational costs and the capital expenditure on additional on-site and off-site storage — the cost of the data loss that results from not having them is even higher.

As a guardian of the organization's data, you work to specify RPOs for your business. Many organizations schedule transaction log backups every 15–30 minutes; this baseline is a good starting point for your own conversations, during which you should stress that the RPO effectively defines how much data loss the business is prepared to accept.
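
The commands themselves are simple; the discipline is in scheduling and monitoring them. A minimal sketch, assuming a hypothetical database in the full recovery model and an illustrative backup path:

    -- Requires the database to use the FULL (or BULK_LOGGED) recovery model
    ALTER DATABASE YourDatabase SET RECOVERY FULL;

    -- Scheduled every 15-30 minutes, typically via a SQL Server Agent job
    BACKUP LOG YourDatabase
        TO DISK = N'E:\Backups\YourDatabase_log.trn'
        WITH CHECKSUM, COMPRESSION;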

Create a reliable restore process

Restore processes need to be seamless and automated. The best way to ensure effective restoration is to test and document your processes continually, until everybody in the department who may be called upon to perform a restore can carry out the required tasks. Update the restore process frequently so that your RTOs can either be met or revised as required.
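
A documented restore runbook usually boils down to a sequence like the following sketch, which restores to a point in time just before an accidental change. The database name, file paths and timestamp are placeholders:

    -- 1. Restore the last full backup, leaving the database in a restoring state
    RESTORE DATABASE YourDatabase
        FROM DISK = N'E:\Backups\YourDatabase_full.bak'
        WITH NORECOVERY, REPLACE;

    -- 2. Restore the last differential, still without recovery
    RESTORE DATABASE YourDatabase
        FROM DISK = N'E:\Backups\YourDatabase_diff.bak'
        WITH NORECOVERY;

    -- 3. Roll the log forward to just before the mistake, then recover
    RESTORE LOG YourDatabase
        FROM DISK = N'E:\Backups\YourDatabase_log.trn'
        WITH STOPAT = N'2024-01-15T14:29:00', RECOVERY;    -- placeholder time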

In many cases, RTOs can be significantly reduced by employing third-party solutions that reduce backup and restore times using advanced compression algorithms. Some tools also offer object-level restores, which let you avoid restoring your entire database if a single table was destroyed.

Conclusion

Establishing effective backup and recovery processes is critical to maintaining database uptime. What if you could reduce database backup and restore times, while also reducing the recovery times to near zero?

With the assessment and improvement steps discussed in chapter 1 and the implementation of fast, reliable backup and recovery workflows, you can rest assured that your SQL Server environment will be highly available, more reliable and easier to manage. Read on to learn ways to continue optimizing SQL Server management through effective communication, a stable environment and automated processes.

Continue Optimizing SQL Server Management

Once you have assessed your SQL Server environment and established effective backup and recovery, you need to consider how to maintain SQL Server management optimization going forward. After reading this chapter, you'll have new insight into several best practices that will help you transition to rock-star DBA status. You'll learn how to:

  • Communicate proactively across the organization.
  • Shift from reactive mode into proactive mode.
  • Take advantage of automation.

Communicate proactively across your organization.

Proactive communication is a critical skill for any successful DBA. As you work to establish and maintain an optimal SQL Server environment, you'll need to have discussions with appropriate parties within your department, including both your manager and peers, as well as with key extra-departmental players whose needs have a large bearing on your work. Best practices for proactive communication include:

  • Establish good relationships with developers and management — Capture and discuss key database performance issues and events for your team.
  • Gain credibility — Demonstrate awareness of problems and share what you intend to do about them.
  • Establish transparency and accountability — Become a conduit of relevant information for management and colleagues, keeping them abreast of the health, status and ongoing priorities in managing the environment.
  • Report on both problems and solutions — Keep a standardized report in which you update stakeholders on the status and next steps of key issues, to show that you're bringing constructive solutions along with the bad news. This is a career-enhancing habit that you should make part of your daily routine. In addition to enhancing transparency and accountability, these reports also protect you if a major problem that you flagged, and that required attention or resources from others, goes unheeded.

Shift from reactive mode into proactive mode.

In chapters 1 and 2, you ensured that your environment's most immediate problems have been addressed and that your data is recoverable within acceptable parameters. It's now time to look ahead and introduce stability and visibility into your environment. This is your opportunity to switch from reactive to proactive mode by tuning your environment to prevent problems in advance.

Figure 1. The basic steps in a baselining and benchmarking methodology

Introduce baselining and benchmarking methodology

A baseline and benchmark methodology helps you spot problems and engage in a process of continuous improvement. Figure 1 shows the basic steps.
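
One lightweight way to start baselining before investing in tooling is to snapshot a handful of performance counters on a schedule and compare them over time. This is only a sketch; the table name and the counters chosen are assumptions:

    -- One-time setup: a table to hold baseline samples (hypothetical name)
    CREATE TABLE dbo.PerfBaseline (
        sample_time  datetime2     NOT NULL DEFAULT SYSDATETIME(),
        object_name  nvarchar(128) NOT NULL,
        counter_name nvarchar(128) NOT NULL,
        cntr_value   bigint        NOT NULL
    );

    -- Scheduled sample (e.g., every 5 minutes via a SQL Server Agent job);
    -- note that per-second counters are cumulative and must be diffed
    -- between samples to produce a rate
    INSERT INTO dbo.PerfBaseline (object_name, counter_name, cntr_value)
    SELECT RTRIM(object_name), RTRIM(counter_name), cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN (N'Batch Requests/sec',
                           N'Page life expectancy',
                           N'User Connections');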

Read our white paper, "Ten Tips for Optimizing SQL Server Performance," to learn about a methodical approach for:

  • Determining goals
  • Tracking and analyzing data
  • Creating a cycle of improvement that keeps you ahead of major performance problems

The paper also includes ten helpful tips for tuning and optimizing your environment so you can help ensure that your initial triage efforts have set the stage for longer-term performance and stability.

Follow best practices for deploying code

You can further protect your gains by instituting best practices for introducing new code into the environment. Many DBAs face constant requests to push out code from various points in the organization, which results in ad hoc deployments that not only cause a great deal of stress, but also introduce security and performance risks.

Part of the solution lies in developing a standard process for reviewing and scheduling code deployment requests. Working with developers and other colleagues to craft this process might take some doing — be sure to take advantage of the best practices for proactive communication outlined earlier.

Every organization has its unique demands and working culture, but any good code deployment protocol will include the following components:

  • Formal review
  • A defined testing step, in a testing environment
  • A step for discussing and tweaking problematic code
  • Guidelines about when a rollback will be instituted and how it will be performed

Take advantage of automation.

Stabilizing your environment also means stabilizing your routine; as you implement proactive strategies, automation becomes a necessity. It also gives you the time you need to engage with your organization, tackle problems like ensuring your system can scale with the business and become a rock-star DBA.

Any repeatable task is a candidate for automation. As you identify ways to move from being a firefighter to the guardian of an environment that breezes along, look for tasks that can be run automatically and for tools that can help deliver that automation.
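
For instance, a routine integrity check can be turned into a scheduled SQL Server Agent job using the stored procedures in msdb. The job name, schedule and command below are illustrative:

    USE msdb;

    -- Define the job and its single T-SQL step (names are illustrative)
    EXEC dbo.sp_add_job     @job_name = N'Nightly integrity check';
    EXEC dbo.sp_add_jobstep @job_name  = N'Nightly integrity check',
                            @step_name = N'Run DBCC CHECKDB',
                            @subsystem = N'TSQL',
                            @command   = N'DBCC CHECKDB (YourDatabase) WITH NO_INFOMSGS;';

    -- Run daily at 02:00 and bind the job to the local server
    EXEC dbo.sp_add_schedule @schedule_name = N'Daily 2am',
                             @freq_type = 4, @freq_interval = 1,
                             @active_start_time = 020000;
    EXEC dbo.sp_attach_schedule @job_name = N'Nightly integrity check',
                                @schedule_name = N'Daily 2am';
    EXEC dbo.sp_add_jobserver   @job_name = N'Nightly integrity check';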

Automating your monitoring with a third-party tool is the best way to gain a consistent and guaranteed level of analysis across your entire SQL Server estate. No two SQL Server professionals would write the same code or look in the same place to solve a given problem, but consistency and reliability are cornerstones for every DBA. You need a tool that you can rely on to perform the same way every time, 24x7x365.

Conclusion

Solving the challenges of your evolving database environment can be stressful, but also intensely satisfying. We know it's not easy with growing data volumes, increasingly complex database environments and diverse customer demands. But by following the best practices in this e-book, you can proactively manage your SQL Server infrastructure, protect your data and continue to thrive in the new database world. You might even be able to enjoy more evenings and weekends doing the things you love besides database administration.