Windows Server 2008's Server Core installation is a great option for domain controllers: The operating system (OS) has a smaller footprint and has so far required significantly fewer patches than the full Windows installation, reducing downtime and maintenance for your critical domain controllers. In this tip, we'll install a Server Core domain controller from scratch.
The installation begins, ironically, with the lightweight GUI installer that's familiar to all editions of Win2008 and to Windows Vista. Select one of the Server Core options.
Note that this is a one-time decision: You can't later "upgrade" to the full Windows installation nor can you "downgrade" a full install to Server Core.
That's about the only decision you have during installation. When it's finished, you'll be looking at a logon screen and might be wondering what to do. Select the "Other User," and log in as Administrator. Use a blank password; you'll be immediately prompted to create a new password.
After changing the password, you'll be logged in and staring at your new, trimmed-down desktop. That's right—not much to see! This is Server Core, and it has only a few graphical elements available to it. To get it up and running, you'll need to run a few commands. Many of these will be commands you're familiar with already; others are new and are unique to Server Core.
Since we're building a domain controller, you'll probably want to start by assigning a static IP address. Do so using the Netsh command, as shown, to get a list of network interfaces. Use the number in the "Idx" column to refer to the interface in later commands.
Netsh interface ipv4 show address
With your network adapter identified, assign a static IP address, subnet mask, and default gateway using the Netsh command. The Name= parameter is where your chosen adapter's ID number goes.
Netsh interface ipv4 set address name=2 source=static address=10.0.1.57 mask=255.255.255.0 gateway=10.0.1.1
Use the same technique to assign a DNS server. To assign more than one, increment the index= parameter for each additional server; attempting to reuse the same index= value (such as adding index=1 twice) produces an error message. Ipconfig /all will confirm that you've added the correct server address.
Netsh interface ipv4 add dnsserver name=2 address=10.0.1.1 index=1
Server Core still requires activation, a two-step process that uses the Slmgr command: first install a product key, then activate Windows. Note that Server Core is compatible with enterprise key servers if your organization uses one. Run Slmgr without any parameters to get a pop-up dialog box listing the other things it can do; note that the dialog often appears behind the command-line window, and there's no Task Bar to clue you in. If the command's output doesn't show up quickly, try moving the Cmd.exe window out of the way. Don't close it; if you do, press Ctrl+Alt+Delete to open Task Manager, and use the New Task menu option to run a new instance of Cmd.exe.
Slmgr -ipk your-product-key-here
After installing the key, activate it. This can take some time—wait for the dialog box indicating success or failure, and don't forget that it might appear behind the Cmd.exe window.
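After the key is in place, the activation command itself is simply:

Slmgr -ato

If you want to confirm the result later, Slmgr -dli displays the current license state; again, watch for its dialog box hiding behind the Cmd.exe window.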
You'll probably want to customize the computer name at this point. Use the hostname command to find the current computer name, and then the Netdom command to change it to a new one.
Netdom renamecomputer old-name /newname:new-name
A reboot will be required afterwards, so use the Shutdown /r command to reboot.
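For reference, the full command looks like this; the /t 0 switch skips the default delay before restarting:

Shutdown /r /t 0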
I generally like to install the DNS Server role myself so that I can customize it. After installing, you'll need to use the DNS administration console on another computer (such as your workstation) to connect to the Server Core computer and configure DNS. Server Core doesn't run any graphical admin tools. You could also use the Dnscmd command to configure DNS, if you're comfortable with it. To install the role, use the Ocsetup command; I prefer to get this going by using the Start /w command, which suspends the command prompt until Ocsetup finishes. If you don't do so, the command prompt immediately returns while the installation completes in the background, and you won't know when it's done.
Start /w ocsetup DNS-Server-Core-Role
Next, you'll need to create an unattended installation file for Dcpromo because its graphical wizard isn't available in Server Core. http://www.petri.co.il/creating-unattendinstallation-file-dcpromo-windows-server-2008.htm is an excellent reference for Win2008 unattended Dcpromo files; note that the newer Win2008 syntax differs a bit from the Win2003 syntax. Server Core does have Notepad, so you can use it to create your unattended file if needed. Server Core's Notepad uses an older set of file dialog boxes; pay close attention to these Win95-vintage dialog boxes because they work differently from the newer ones you're used to.
The unattend file tells Dcpromo whether you're creating a new domain, a new domain controller in an existing domain, a whole new forest, and so on. Read through the options carefully! You can also use Dcpromo on an existing full Windows installation (although not on an existing domain controller) to create an unattend file; just run through the Dcpromo wizard and, before you commit to installing AD, save your configuration to a file. That file can then be carried to Server Core (on a USB key, for example) and used with Dcpromo there.
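As a sketch of what such a file looks like, here's a minimal unattend file for adding a replica domain controller to an existing domain. The domain name, paths, and passwords are placeholders you'd replace with your own; Password=* causes Dcpromo to prompt for credentials rather than storing a password in the file:

[DCInstall]
ReplicaOrNewDomain=Replica
ReplicaDomainDNSName=company.local
UserDomain=company.local
UserName=Administrator
Password=*
DatabasePath=C:\Windows\NTDS
LogPath=C:\Windows\NTDS
SYSVOLPath=C:\Windows\SYSVOL
SafeModeAdminPassword=placeholder-password
RebootOnCompletion=Yes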
With your unattended file ready, run Dcpromo /unattend:filename to start the AD installation process. You'll see plenty of output telling you what's happening.
Of course, a reboot is in order afterwards, and Dcpromo will handle that automatically. Once the server restarts, you can use Active Directory Users & Computers—again, from another computer—to begin managing your domain.
Read-Only Domain Controllers (RODCs) are a new feature in Windows Server 2008 designed specifically for branch offices where the domain controller might not be as physically secure as you would like. A risk with less-secure computers is that the computer or its system hard drive might be stolen, giving an attacker the opportunity to break the encryption on the Active Directory database and then run a dictionary attack against stored passwords, potentially compromising every password in your domain. This isn't farfetched; while breaking the database encryption would be time-consuming, a dictionary attack using a pre-generated "rainbow table" (such tables are readily available) can begin cracking passwords in just minutes. The idea with an RODC is that it doesn't store any passwords, so stealing it (or the hard drive) really limits the amount of useful information an attacker can get hold of.
A seeming downside to an RODC is that it doesn't store passwords, meaning the primary function of a domain controller, authentication, couldn't be performed. In fact, RODCs can perform authentication. What they do is contact a writable domain controller, which does have passwords stored, to handle the authentication; the RODC can then cache the password information locally. This allows authentication to occur when a writable domain controller isn't available, provided the user's password information was cached in advance. If the RODC is stolen, any cached passwords represent potential security vulnerabilities, but only those passwords need to be changed, not the entire domain's. Simply force a password change on everyone in that office, and you're fine. You can specify, in advance, which accounts an RODC will cache; any other accounts will only authenticate if a writable domain controller is available at the time.
You can pre-populate the password cache: When adding cache-allowed accounts to the RODC's Password Replication Policy, click Prepopulate Passwords to make this happen. This ensures that all cacheable passwords are cached immediately, without waiting for each of those users to log on.
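If you'd rather script the pre-population, the repadmin tool can do the same job from the command line. As a sketch (the RODC name, hub domain controller name, and the user's distinguished name are placeholders):

repadmin /rodcpwdrepl BRANCH-RODC HUB-DC1 "CN=Jane Doe,OU=Branch,DC=company,DC=local"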
The presence of an RODC doesn't negate the need for a writable domain controller. Any changes made to the domain, including user password changes, require contacting a writable domain controller; Windows clients handle this automatically, but you do need to ensure that branch office connectivity is sufficient to handle these contacts. A branch office that happens to have an active domain administrator might not offer acceptable performance because the administrator would essentially be working over the WAN to administer the domain. Joining a computer to the domain also requires contacting a writable domain controller, and Group Policy administration requires a writable domain controller.
One concern with RODCs is that certain information, in addition to passwords, is stored locally, including account lockout status. When an RODC locks an account, that lockout is forwarded to a writable domain controller but not "replicated" in the AD sense of the term. If the lockout occurs while the WAN link is down, however, no writable domain controller will receive the lockout notice. The AD management tools will not show the lockout, but the account will be locked out on the RODC; even the RODC's management tools will not show the lockout because it isn't officially in the domain database yet. ADSIEdit does show the lockout on the RODC, in the lockoutTime attribute (which isn't the attribute the AD management tools examine to see whether an account is locked). Normal account-unlocking methods won't work because they rely on a writable domain controller, and the RODC isn't one. The main way to unlock the account is to restore WAN connectivity, allowing the user to authenticate normally. Unfortunately, restoring the WAN link will also immediately unlock the account, because the writable domain controllers in your domain will overwrite the RODC's lockout status almost immediately. Thus, if the account was locked for a good reason, such as an attempted attack, the account will now be free for another try, and you might not even know it had been locked on the RODC at all if no user complained about it.
Some third-party applications that store data in AD may store sensitive information that you don't want replicated to RODCs. In these cases, you can configure a set of attributes in the schema that will not replicate to an RODC—this is called the RODC filtered attribute set.
Even if an attacker modifies an RODC and attempts to request replication of these attributes, the domain will deny the request. However, be aware that domain controllers running older versions of Windows will honor a request for these attributes because those older domain controllers don't recognize the filtered attribute set. The filtered attribute set is configured on the domain's Schema Master, which must be running Windows Server 2008 in order for the attribute set to be properly stored.
RODCs can also host the Windows DNS Server service, and the RODC will be able to replicate all application directory partitions that DNS uses. Clients can query the DNS server as they would any other for name resolution. However, the DNS service will be read-only and will be unable to accept updates of any kind.
Typically, clients use the DNS server in their site as their "preferred" DNS server, and send updates—including updates for A, AAAA, SRV, and other record types. An RODC has no means of accepting these updates, however, and when queried for an SOA record, the RODC will return the name of a writable domain controller running the DNS service rather than that of the RODC. This is how a secondary DNS server handles updates for zones that are not AD-integrated zones, and it's a well-established DNS standard operation.
The RODC does have a bit of smarts: When it refers a client to a writable DNS server, it waits for a bit and then tries to query any records related to that client from the DNS server. That gives the client a chance to contact a writable DNS server, submit updates, and lets the RODC quickly pull those updates down so that its local, read-only DNS database is up to date. This works only if at least one of your DNS servers is on a Windows Server 2008 computer, and if that computer has registered an NS record for itself in the DNS database.
RODCs allow you to delegate local administrative authority—such as the ability to run backup and restore operations—without delegating any domain authority. This allows branch office personnel to perform basic administrative tasks on the RODC computer without having any broader permission within AD itself.
Generally speaking, RODCs are compatible with any AD-enabled application. However, write-intensive applications don't do well when they're co-located with only RODCs because write requests have to be referred to a writable domain controller, which might under some circumstances (such as interrupted WAN connectivity) be unavailable. The write referral is potentially the most difficult operation; while applications that use standard directory programming interfaces should have no problem, not every application is built using these standard interfaces. Only testing will determine whether all your applications will be RODC-compatible, and if they're not, the developer will need to make corrections. Applications built using Microsoft's Active Directory Services Interface (ADSI) will automatically handle write referrals; developers often prefer the higher-performance LDAP, however, which carries referrals but does not automatically "chase" them as ADSI does.
Most Microsoft applications work fine against an RODC, although some require special steps if actually installed on an RODC (see http://technet.microsoft.com/en-us/library/cc732790.aspx for the list and details).
Probably the big challenge is Exchange Server, which does not use RODCs. Outlook clients, however, can use an RODC for read-only Global Catalog address book lookups.
Generally, "special steps" means creating appropriate service accounts on a writable domain controller and then ensuring they replicate to the RODC before beginning the software installation.
The best security is achieved when RODCs are combined with two other Windows Server 2008 features: BitLocker and a hardware Trusted Platform Module (TPM). The latter technologies provide volume-wide encryption for the system drive, providing yet another layer an attacker must work through in order to access data. The TPM helps by checking the hardware configuration against what's stored in its secure memory to ensure that nothing has been tampered with before allowing the host to boot—helping to prevent unauthorized hardware modifications that might be used to subvert or compromise the OS. Combined, these three features don't make it impossible to hack a domain controller, but they make it pretty impractical and ultimately unrewarding.
Aside from security and logon performance at branch offices, RODCs offer benefits in a couple of odd scenarios. One is a line-of-business application that will only work if physically installed on a domain controller; that's a poor practice, to be sure, but one which some administrators face. An RODC will work with many of these applications (subject to the caveats mentioned earlier), providing a sort of special-purpose domain controller just for that application. RODCs also provide better security in some extranet scenarios, where you need to expose authentication capabilities but don't necessarily want passwords to be compromised.
In the past, a corrupted file or segment of disk storage could typically only be repaired by taking the entire server offline and running an offline CHKDSK. No more: Under Win2008, a new service detects corrupted files automatically and spawns a thread that attempts to fix them. The affected files remain offline, meaning applications—including the Server service that provides file sharing—can't access the file, but everything else on disk remains accessible and the server itself remains online. Access to the file is restored automatically if Windows is able to repair the corruption; if not, that area of disk is marked off-limits so that no other processes try to write files there.
You don't even need to do anything to take advantage of this feature, but you do need to be aware that it's happening. Client applications may display misleading "access denied" messages, for example, when a file is under repair. It's not a permissions issue but rather the fact that Windows has taken the file "out of service" while attempting to fix it. Your first troubleshooting step, therefore, should be to see whether you can access the file as a full-privilege administrator to eliminate permissions as a possible cause of the error (keeping in mind that with User Account Control enabled on your own workstation, you won't appear to be a real administrator unless you explicitly launch Explorer or another application "as Administrator").
IIS 7 is pretty much a total re-write of IIS. It's such a drastic change, in fact, that Win2008 continues to ship with the old IIS6 management tools so that you can manage existing IIS6 installations! Many of the common IIS management tasks have changed completely, all the way down to how you install and set up FTP services.
As before, IIS maintains a top-level, server-wide set of configuration options, and Web sites can inherit these. You can also configure per-site settings on each individual Web site. What's new is how you do so: The IIS Management console has been vastly extended, so making everything accessible from a single Properties dialog box was no longer practical. Instead, the server and each site present a page of configuration icons, and double-clicking one opens a page for that specific item.
Figure 1: IIS 7 Manager.
In most cases, the layout of these item-specific pages is new, too, because most of them are also extensible. Authentication, for example, is no longer a set of four radio buttons but rather a list of all installed authentication choices, and the ability to enable or disable each.
Figure 2: Authentication configuration.
In some cases, it can be a bit tricky to find the setting you're after: Editing site bindings, for example (which determines the host names, IP addresses, and port numbers a site will respond to), is accessed from the right-hand sidebar, as are functions for stopping and restarting sites.
IIS continues to host sites within Application Pools, which are used to configure the number of threads servicing one or more sites, the user identity the sites operate under, and so forth. Unlike IIS 6, though, IIS 7 will—by default—create a new App Pool for each new Web site you create. It's an easy-to-change setting when you create a new site, but it's also easy to miss, and there are disadvantages to having one Application Pool per site.
Figure 3: Configuring Application Pools.
Each Application Pool consists of at least one thread of execution. Infrequently-used sites can easily share a single thread, while busier sites may benefit from multiple threads for parallel servicing of multiple incoming requests. Each thread, however, brings a small amount of overhead, so having one thread apiece for several less-busy sites may actually hamper server performance. The moral? Don't accept the defaults until you've decided whether they're suitable for your specific situation.
IIS 7 is probably the most extensible version of IIS ever, and Microsoft—as well as third parties—is making numerous extensions available. To make installing all of these easier, Microsoft has created the Web Platform Installer, which is available for free at www.iis.net. This installer queries available extensions and offers to install them for you—up to and including non-Microsoft platforms such as PHP, which enjoys better support than ever under IIS 7.
Figure 4: Web Platform Installer.
Once set up, the Installer is available from the management page of any Web site. It'll remind you a bit of the easy-to-use Web-based management consoles that many hosting companies provide: You can even use it to install selected pre-packaged Web applications such as DasBlog, Drupal, Subtext, WordPress, and more.
Figure 5: Installing Web applications.
The Web Platform Installer is probably the easiest way to extend IIS we've ever had.
Although Win2008 includes the old FTP Publishing Service, you don't want it. In fact, if it's already installed, un-install it using Server Manager (go to the Web Server role, and click "Remove Role Services"), and use the new FTP service available through the Web Platform Installer.
Figure 6: The new FTP service.
This new service, which cannot be installed if the old IIS6-compatible FTP Publishing Service is installed, offers secure FTP, FTP firewall support, better FTP logging, and much more. It's a more scalable and more efficient FTP service that can be managed from within the IIS7 Manager console (the old service requires the use of the old IIS6 console).
One of the most annoying aspects of using IIS, as opposed to something like Apache, has been the lack of URL rewriting. Numerous popular Web applications make use of this feature to provide search engine-friendly URLs as well as other capabilities. Apache makes it easy by using an industry-standard rewriting syntax in a simple text file, named .htaccess. Dropping an .htaccess file into a Web site's root folder, or any subfolder, enables rewriting for that site or folder. Under IIS, third-party commercial tools were required to provide this capability—until IIS7. The Web Platform Installer can be used to get a free URL rewriting module, which appears as a configuration option in IIS Manager.
Figure 7: Editing a URL rewrite rule.
Although IIS still (somewhat irritatingly) doesn't simply use .htaccess files, it can import those files into its own URL rewriting module. You can create custom rules, and a wizard provides shortcuts for creating common types of rules. For example, one rule (see Figure 8) can be used to remove the "www" from incoming requests, redirecting users to "realtimepublishers.com" rather than "www.realtimepublishers.com." This is a common trick for helping search engines see only one version of the site and avoiding the "duplicate content penalty" many engines impose when they think they're seeing the same content on two different Web sites (one starting with www, and the other without).
Figure 8: The "No-WWW" rule.
To create this rule, create a new, blank URL rewrite rule. Set it to match the pattern:
Which is a regular expression (regex) for any URL coming into the site (the site's bindings will ensure that only requests intended for that site make it this far). Under the rule's conditions, specify a single condition:
And set the action to redirect to:
Select the "Append query string" check box and make the redirect a Permanent (301) redirect. This will grab whatever URL the user was trying to reach, if it starts with "www," and redirect to the non-www version of the URL. You can also use this to capture old domain names and permanently redirect them to a new one.
Windows Server Backup has been entirely rewritten for Win2008, and it's finally—after more than a decade of Windows' existence as a server operating system (OS)—a viable choice for many real-world backup and recovery tasks, especially in smaller environments. However, it's not a do-it-all solution; you should be prepared for significant disadvantages and weaknesses.
Like nearly every component of Win2008, Windows Server Backup (WSBackup) isn't installed by default. You'll need to open Server Manager, go to Features, and as shown in Figure 9, manually add the Windows Server Backup feature. It's a good idea to add the Command-line Tools sub-feature because you'll gain the ability to drive backups from Windows PowerShell commands and scripts, making it possible to include backups in other automated processes.
Figure 9: Adding Windows Server Backup.
The need to add this feature can actually be a little confusing because Windows installs a shortcut on the Start menu for Windows Server Backup even if the feature itself isn't installed. Clicking the shortcut opens a console that tells you that you need to install the feature.
Let's be perfectly clear: WSBackup is intended to back up data and applications on the local computer; Microsoft doesn't position this feature as anything more than a very basic, local, bare-bones utility. Operations are primarily wizard-driven, such as the Backup Schedule Wizard that Figure 10 shows. With this wizard, you can select what you want to back up, when you want to back it up, where the backup will be stored (disk only—no tape support), and so on.
Figure 10: Configuring a backup.
You can restore a backup that was made from the local computer or from another computer (if you're trying to recover an entire system, for example, or need to grab a few files from a backup that was made of another computer). In addition, you can restore individual items from a backup as well as the entire thing.
As Figure 11 shows, you can configure backup performance by simply selecting the type of backup that will be made: a full backup (which doesn't hit the server itself hard in terms of performance and cleans up Volume Shadow Copy files) or an incremental backup (which leaves behind Windows' Volume Shadow Copy files and may diminish server performance somewhat). You can also make this decision on a per-volume basis.
Figure 11: Configuring backup performance.
Volume Shadow Copy (VSC) is designed to keep old versions of files handy in a disk-based store for easier recovery; users can use Windows' Previous Versions tab on a file's Properties dialog box to access VSC versions. Upon making a full backup, VSC files are normally cleared because the files protected by VSC are now safely in a backup.
Although WSBackup itself is designed to back up the local computer only, you can use the management console to connect to WSBackup running on other computers, allowing you to manage their local backup operations without having to physically log onto their consoles.
Figure 12 shows this task in action.
Figure 12: Connecting the WSBackup Console to another computer.
Most experienced administrators pretty much ignore Windows' built-in backup, and WSBackup isn't going to change their minds. For a very small environment dealing primarily with file and print servers, WSBackup is a reasonably effective, if bare-bones, means of making the backups you need to be safe. You'll need to move the backups off-server, of course, or a complete disk or system failure will take them down along with everything else, and WSBackup doesn't make it easy to move those files around (it expects them to pretty much remain local). You can't save backups to any disk volume that contains Windows itself or application data, which means you'll need to install a dedicated volume, often not an option on a server that's already had all its disk space allocated.
Although Windows PowerShell isn't specifically new in Win2008 (it was previously made available for Windows XP, Windows Server 2003, and Windows Vista), Win2008 is the first version of Windows that includes Windows PowerShell. A complete discussion of PowerShell is a book unto itself, but there are a few things you should be aware of and plan to take advantage of right away.
Everyone who has heard of PowerShell has an idea of what it is: a command-line tool, a scripting language, or something. It's almost easier to explain what PowerShell isn't:
PowerShell is a standardized means for Microsoft to package administrative functionality. Its commands aren't quite command-line tools, although we humans can access them through a command-line interface. The key word there is standardized, because PowerShell is the first time that Microsoft has created a clear, documented standard for exposing administrative functionality. PowerShell can be accessed from a command-line window, true, but it can also be hosted by graphical applications that run commands in the background. In some cases, you might be using a GUI console and not realize that PowerShell is actually doing all the work behind the scenes.
It's safe, in casual conversation, to refer to PowerShell as a command-line interface because that's how most of us will experience it directly.
Although PowerShell is included with Win2008, it isn't installed by default: As Figure 13 shows, you have to enable its feature in order to start using it. Doing so will also enable the .NET Framework v3.0, which is the version that ships with PowerShell. PowerShell actually requires v2.0, which is a subset of v3.0.
Figure 13: Windows PowerShell is an optional feature.
The "R2" release of Win2008 actually does install Windows PowerShell v2 by default, which means the latest version of the Framework (3.5) is also installed by default. Because of PowerShell v2's new features, Microsoft feels—and I agree—that everyone will want and need PowerShell on every computer.
PowerShell has the ability to execute "script files," which are essentially a batch of commands executed in sequence, so Microsoft has obvious concerns about PowerShell and security. The last scripting language Microsoft pushed out, VBScript, was a dismal failure in terms of security, enabling mass virus attacks such as "I Love You," "Melissa," and other famous malware; the company certainly didn't want PowerShell landing in the same boat.
Understand that the potential danger in PowerShell does not come from running commands interactively. Typing a command, getting the syntax right, and doing anything requires a certain amount of expertise and isn't something you can typically trick someone into doing. In any event, no command will work unless the user has the necessary underlying permissions in the first place—PowerShell isn't a way to bypass Windows security. No, the real danger in PowerShell comes from scripts. That's because a script is something you can trick someone into running, and a script may contain entire sequences of commands that the tricked person might normally know not to run. Tricking an admin is especially deadly, because the admin will usually have permission to do all kinds of dangerous things.
So PowerShell's security focuses on script execution, primarily through a mechanism called the execution policy. By default, this policy is set to "Restricted," which prevents scripts from running entirely. Problem solved.
Changing the policy—by using the Set-ExecutionPolicy command within PowerShell itself—requires local Administrator privileges, as the setting is stored in the HKEY_LOCAL_MACHINE portion of the Windows registry. You can also control this setting centrally using a Group Policy administrative template that's available from http://download.microsoft.com (just punch in "PowerShell adm" in the search box to find the download). A Group Policy-applied setting overrides anything else.
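From an elevated PowerShell prompt, changing the policy is a one-liner; for example, to apply the "AllSigned" setting discussed later:

Set-ExecutionPolicy AllSigned

Run Get-ExecutionPolicy at any time to confirm the setting currently in effect.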
So what might you change the policy to? "Unrestricted," the loosest setting, is stupid; you're putting PowerShell right back into the VBScript days, allowing any script to execute at any time. The next-higher setting, "RemoteSigned," might sound promising. It allows local scripts to execute without restriction but requires remote scripts—those downloaded from the Internet or accessed via UNC—to contain a digital signature. This setting isn't any safer than "Unrestricted," no matter what anyone tells you. I'll explain.
When a script is digitally signed (something you can accomplish using the Set-AuthenticodeSignature command), an encrypted hash of the script is added to the end of the script file in a special block of comments. When running the script, PowerShell decrypts this signature and compares it against a freshly computed hash of the script. If the two match, the signature is "intact" and the script executes. If the signature doesn't match the script, the signature is "broken" and the script won't execute. This in and of itself doesn't prevent maliciousness, but here's what does: Obtaining the necessary digital certificate (a Class III Authenticode Code-Signing Certificate, to be specific) typically requires you to prove your identity, in some fashion, to the certificate issuer. Your identity becomes a part of the certificate and of any signatures you create using that certificate. Thus, if you create a malicious script and sign it, PowerShell will run it, and anyone affected by it will be able to divine your identity and hunt you down. So, in very general terms, signed script = safe script.
That "safe," of course, depends entirely on the certificate issuer doing a good job of actually checking your identity when issuing a certificate. You can configure Windows to trust certificate issuers who you believe do a good job, and to not trust issuers who you don't. If PowerShell encounters a signature that came from an untrusted issuer, the signature and the script are likewise considered untrusted and the script won't run.
So, imagine a scenario: Your computer gets infected with a piece of malware. Only, rather than trying to do anything nasty, it just modifies an innocent little text file on your computer. One with, say, a ".ps1" filename extension—a PowerShell script, in other words, that you've already written. The next time you go to run this local script, the commands added by the malware also execute—and chaos ensues.
Or, even worse, the malware creates a simple text file with the name profile.ps1, inside a folder named "WindowsPowerShell," right in your Documents folder. No big deal, right? Wrong: This is a PowerShell profile script, and it is going to execute automatically the next time you open the shell! Worse, this file doesn't exist by default, so it's easy for a piece of malware to create it without you knowing. User Account Control (UAC) won't save you here
because it's just a simple text file in your Documents folder—nothing you need Administrator privileges to access.
The solution? PowerShell's third execution policy, "AllSigned." This setting requires all scripts to carry a signature, created by using a certificate that came from a trusted issuer. Create your own profile script (a blank one is fine) and sign it to prevent a piece of malware from plopping down a profile script, and you're protected. Sure, you have to sign your scripts before they'll run—no big deal. The better commercial script editors (PrimalScript and PowerShell Plus Professional Edition come to mind) will do that automatically for you, if you want them to. Don't want to buy a certificate? Run help about_signing in PowerShell and read how to use the MakeCert utility to create a free, local-use-only certificate for your own scripts.
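As a sketch of that AllSigned workflow: the following assumes you already have a code-signing certificate in your personal certificate store (a MakeCert-generated one is fine), and it creates and signs a blank profile so that malware can't plant one for you:

```powershell
# Create the profile script if it doesn't exist yet (a blank file is fine)
if (-not (Test-Path $profile)) { New-Item -Path $profile -ItemType File -Force }

# Grab the first code-signing certificate from the personal store
$cert = @(Get-ChildItem Cert:\CurrentUser\My -CodeSigningCert)[0]

# Sign the profile; under AllSigned, an unsigned or tampered-with profile won't run
Set-AuthenticodeSignature -FilePath $profile -Certificate $cert
```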
As a command-line shell, PowerShell works a lot like the Cmd.exe shell you're probably familiar with: type a command, add on any necessary parameters, and you're ready to hit Enter. Need to try again? Hit the up arrow, modify the command, hit Enter, and you're done.
So how do you get around your system in PowerShell? If you've ever navigated a disk drive in Cmd.exe, then you know how to do it in PowerShell.
Type Dir to get a listing of files and folders—or type Ls, if you prefer that. Cd will change folders. Del will delete files; so will Rm. Type will display the contents of a text file, as will Cat. A backslash is a path separator, as is a forward slash. So whether you're comfortable with UNIX- or DOS-style syntax, you're good to go.
There are some caveats. In general, "cd .." needs a space between the command and its argument, because PowerShell assigns a special meaning to the space character: it's the separator between a command and its arguments. (PowerShell does ship a built-in compatibility function named "cd.." to paper over that particular Cmd.exe habit, but the general rule stands.) The space rule is also why Cd c:\program files doesn't work—the space between "program" and "files" makes PowerShell read "files" as a second argument. Add quotes—either single or double—and run Cd "c:\program files" instead. That's pretty much what you would do in Cmd.exe or even most UNIX shells, by the way.
So there you have it: A completely arbitrary (like "Ls" is intuitive?) set of commands that you've probably already memorized and can use to navigate through a hierarchical database. Yes, a database—that's what the file system really is, after all. It's not relational like an Access or SQL Server database, but it's hierarchical, not unlike an Exchange Server mail store, or the Windows registry, or even Active Directory (AD). Speaking of which, would you like to learn a whole new set of commands that let you navigate the registry or even AD?
I hope you said "no" because who wants to learn a whole new set of commands when YOU
ALREADY KNOW A SET that should do the job? In other words, why can't we just run "Cd HKCU:" to change into the HKEY_CURRENT_USER registry hive? Why can't we run "Ls" to get a list of registry keys? Run "Cd Software" to change into that key, and "Del *" to delete everything—whoops.
Well, it turns out you can in PowerShell. Try it. That's because PowerShell has little adapters called PSDrives that allow PowerShell to see different forms of storage as if they were disk drives. The Certificate Store, environment variables, registry, and more are just the beginning. More PSDrive adapters can be added in, and products like SQL Server 2008, AD (in Win2008R2), and others do just that. Run "Get-PSDrive" to see a list of all the drives currently available, and use "New-PSDrive" to create new drive mappings (remember, these are PS Drives, so they only live in PowerShell—you won't see them in Windows Explorer).
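A quick sketch of that idea, read-only except for the last line (the drive name and registry path are just examples):

```powershell
Cd HKCU:            # change into the HKEY_CURRENT_USER hive
Ls                  # list the registry keys, just as if they were folders
Cd Software         # descend into the Software key
Get-PSDrive         # see every drive PowerShell currently knows about

# Map a new drive rooted at a deep registry path (visible only inside PowerShell)
New-PSDrive -Name Run -PSProvider Registry -Root HKCU:\Software\Microsoft\Windows\CurrentVersion\Run
```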
PSDrives illustrate a key part of PowerShell's design philosophy: Take ONE set of skills—preferably a skill that administrators already have—and leverage it as widely as possible. That means less learning for you while expanding the number of things you can do. It's like the moving walkway at the airport: slow people are supposed to keep right so that faster people can pass on the left. It's the same skill we're supposed to use on the highway, leveraged in a new location. Sadly, most people seem to lack the skill in either scenario, but you get the idea. And even that makes a good point: PowerShell is leveraging skills that administrators SHOULD already have. If you've stayed away from any kind of command-line administration, you have done yourself a disservice, because PowerShell assumes you've worked at least a little from the command line. If you haven't, PowerShell won't be impossible to use, but the learning curve will be a bit steeper because you lack some of the background experience that PowerShell is trying to leverage to make things easier on you.
Let THAT be a lesson for you. A big reason to learn PowerShell NOW is because there will be future versions that add MORE functionality. By learning PowerShell NOW, you can start gaining the background experience that will make future versions more incremental and easier to learn; the longer you wait, the harder it will be to learn each successive version.
So all of these things we ran—CD, DIR, LS, and whatnot—are all "commands." Technically, because they're within PowerShell, they're called "cmdlets" (pronounced "command-lets," not "cee-em-dee-lets"). Actually, technically, what we've been using so far are aliases.
Let me back up a bit.
PowerShell's functionality comes primarily from these cmdlets, all of which are written by developers working in a .NET language such as C# or Visual Basic. Cmdlets come packaged in a snap-in, which is basically a DLL file. You can think of them as similar to the snap-ins used by the Microsoft Management Console (MMC), in that they add product-specific functionality to an otherwise empty shell or console.
Cmdlets use a consistent naming scheme devised by Microsoft. Cmdlet names consist of a verb, such as Get, a dash, and then a singular noun, such as Service (for example, Get-Service). The list of verbs is actually fairly short and is intended to be used consistently. Changing something uses the Set verb, so you have cmdlets such as Set-Service and Set-ExecutionPolicy—never Change-Policy or Configure-Service. Using consistent verbs helps folks like us guess the right command name without having to pore through manuals. For example, based solely on what I've written here, can you guess the Exchange Server command that would retrieve user mailboxes? Get-Mailbox.
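You can lean on that convention to discover commands, too; a couple of illustrative searches:

```powershell
Get-Command -Verb Get      # every "retrieve something" cmdlet in the shell
Get-Command -Noun Service  # everything that operates on services
Get-Command Set-Exec*      # wildcards work; this finds Set-ExecutionPolicy
```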
The downside of these command names is that they can be long. Not that long is inherently bad—long also means clearer and easier to remember. But long does mean harder to type, and nobody wants that. So PowerShell has a system for aliases, which are simply nicknames for a command. Dir is an alias for Get-ChildItem, Type is an alias for Get-Content, Ps is an alias for Get-Process, and so forth. The alias is simply a way of shortening the command name or making the cmdlet name look like a familiar command (such as Dir or Del). The alias doesn't change anything about the way the cmdlet works. Run Dir /s and you'll see: it generates an error because the Get-ChildItem cmdlet, which is what's really being run when you type Dir, doesn't support a /s parameter.
Which brings us to parameters, I suppose. In sticking with the "consistency" theme, PowerShell finally brings us a consistent command-line syntax for parameters. Parameters always begin with a dash—not a slash—and the parameter names are really clear: -computerName, -path, -filter, -exclude, -credential, and so forth. The parameter name is followed by a space and then whatever value goes with the parameter, if appropriate. A parameter such as -append wouldn't usually take a value; it's just a switch, telling the cmdlet to append content to existing content. A parameter such as -computerName obviously does need a value—the computer name you want to pass along. So that's why Dir /s doesn't work: the Get-ChildItem command doesn't recognize /s as a parameter. Actually, it'll think it's supposed to be a path because PowerShell uses both / and \ as path separators. However, the command does have a -recurse parameter that'll do what you want.
There's no way to create an alias so that "Dir /s" behaves as "Get-ChildItem –recurse"— aliases are nicknames only for command names, not for anything else, and not for any parameters. Using an alias doesn't change the command syntax in any way; you're simply substituting a shorter name for the command, nothing more.
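A few lines that make the point (note that an alias's definition is only ever a bare command name):

```powershell
Get-Alias dir                        # reveals that Dir really runs Get-ChildItem
Get-Alias -Definition Get-ChildItem  # every nickname for that one cmdlet
New-Alias np Notepad                 # your own shortcut; again, just a name
```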
That said, you don't have to type the whole parameter name—honestly, typing -computerName all the time would be a hassle. You only have to type as much of the parameter name as needed to distinguish it from other parameters. So, for Get-ChildItem, instead of typing -recurse, you could type -r because no other parameter of that command begins with "r." The "r" alone is enough to let the shell figure out which parameter you meant. In other cases, a few more letters may be needed: I usually type -comp for -computerName, for example. It's probably more letters than I technically have to type in most cases, but it's enough to help me visually determine what parameter I meant.
And there's always Help: PowerShell's built-in help system even accepts wildcards, so running Help *Service* will help you find all the commands related to services, while running something like Help Get-WmiObject will offer complete help for that entire command and all its parameters. In PowerShell v2 (with Win2008 R2), the Help command picks up an –online parameter, which pops up the latest and most accurate help in a Web browser, straight from Microsoft's Web site.
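Those help techniques, typed out (the -online switch requires PowerShell v2):

```powershell
Help *Service*               # wildcard search: every command related to services
Help Get-WmiObject -Full     # complete help, every parameter explained
Help Get-WmiObject -Online   # v2 only: opens the latest help in a Web browser
```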
Hyper-V is an exciting new feature of Windows Server 2008. Although much has been, and will be for some time to come, written on Hyper-V and its major competitors—VMware vSphere (ESX Server) and Citrix XenServer—it's important to understand what Hyper-V is and isn't, because it comes with Win2008.
Hyper-V is Microsoft's brand name for their Windows-based hypervisor. A hypervisor is a special type of software that's specifically designed to enable virtualization: the ability for one computer to effectively mimic the operation of many "virtual" computers at the same time. The hypervisor installs on a host computer and has direct (more or less) access to its hardware; it then enables one or more virtual machines to execute in memory. Each virtual machine, or guest, can run its own operating system (OS)—which need not be Windows— and each guest OS thinks it's running on its own dedicated hardware.
Hyper-V is technically a type 1 hypervisor, meaning the hypervisor itself runs on "bare metal," or directly on the server's hardware. Win2008 automatically creates a special virtual machine where the rest of Win2008 is installed. So, when you're using a Win2008 machine that has Hyper-V installed, you're always running at least one virtual machine— the one that Win2008 itself is running on. That "primary" virtual machine is the one that gets to tell the hypervisor what to do. It's not quite a guest virtual machine because it does have a special management relationship to the underlying hypervisor.
You need to own a Win2008 license to run Hyper-V. Beyond that, you'll also need licenses for whatever guest OSs you plan to run inside your virtual machines. The free, downloadable "Windows Hyper-V Server" product doesn't include licenses for anything but Hyper-V itself; any guest OSs will need a license.
When you buy a copy of Win2008, however, it comes with a certain number of licenses for guest virtual machines running copies of Win2008. The Datacenter edition of Win2008, for example, lets you run an unlimited number of virtual machines that run any other editions of Win2008; Win2008's Enterprise edition includes guest licenses for up to four Win2008 guests.
Is Hyper-V really a type 1 hypervisor? Yes. Lots of people like to argue the point because when you install Hyper-V, you appear to be using a full copy of Windows. So, they argue, if Hyper-V requires Windows, it's technically a type 2 hypervisor, meaning the hypervisor doesn't talk directly to the hardware. That was the case with Hyper-V's predecessor, Microsoft Virtual Server. Its architecture looked a bit like what's shown in Figure 14, with the hypervisor clearly running atop Windows and depending on Windows to provide access to the hardware. Here, the hypervisor runs as an application, at the same level as something like Exchange Server.
Figure 14: A type 2 hypervisor.
Hyper-V's architecture is shown in Figure 15. What fools folks about Hyper-V is that it always installs a virtual machine—technically, a partition, to use Microsoft's terminology— containing a full Win2008 install. So you always see Windows, even though Hyper-V itself isn't talking through Windows to get to the hardware.
Figure 15: Hyper-V architecture.
Also shown are some unique features of Hyper-V, such as the ability of OSs that know about Hyper-V to realize that they're running in a guest virtual machine. This lets them feed specific types of information (such as performance) to the host for better manageability, and lets Hyper-V communicate with the guest OS to perform key tasks, such as better managing shutdowns. Non-aware guest OSs can also run but get fewer manageability improvements.
In fact, there is a way to run Hyper-V without running the full copy of Windows: Windows Server Core. The free "Windows Hyper-V Server" downloadable product uses this, and you can set up one yourself. It simply installs Server Core into the "root" partition so that you get a smaller Windows footprint in the root and more resources freed up for running your other partition.
Server Manager has proven to be a great way of administering Win2008's complex set of server roles and features. It offers a central means of adding, configuring, and removing roles and features, and provides central access to a number of security- and configuration-related features that would otherwise be scattered across the operating system (OS) and require a lot of digging. If Server Manager had one significant failing, though, it was its inability to work with remote computers. If you wanted to use Server Manager, you were stuck logging onto the server console directly—which is a real limitation and really breaks the "single-seat administration" model Microsoft has been slowly trying to implement.
In Windows Server 2008 R2 ("R2" for short), though, Server Manager has been improved to support remote management. As Figure 16 shows, this change is subtle and one that's easy to miss: You simply pick up a "Connect to Computer" menu option.
Figure 16: Connecting to a remote computer.
This feature means you can now use a local copy of Server Manager to manage features and roles on all your R2 servers—except those running Server Core; unfortunately, the Server Manager console can't install roles on the stripped-down Server Core version of the OS. Hopefully that capability will come in time, as it would go a long way toward making Server Core more approachable for a wider range of administrators.
R2 offers an improved version of Server Core that makes up for a lot of the shortcomings of previous versions, albeit at a potentially higher level of maintenance overhead. One of the most important new features is the SConfig.exe utility (see Figure 17). This utility offers a text-based menu that helps administrators configure the core operating system (OS) settings such as domain membership, computer name, Windows Update, network settings, and so forth. This is a welcome improvement, as many of these tasks in the past required complex, fairly arcane command-line tools. Those same tools are still in use; they're just called in the background by SConfig. Think of SConfig as a sort of lightweight "Server Manager" specifically for Server Core.
Figure 17: Using SConfig in Server Core R2.
Server Core also offers a subset of the .NET Framework. This subset includes portions of v2.0 and v3.0; it specifically excludes the Windows Forms classes and Windows Presentation Foundation (WPF), which require graphical user interface (GUI) elements not present in Server Core. The inclusion of this Framework subset has a couple of really important, far-reaching consequences. One of those is the potential for additional patches, as the Framework is an additional set of "moving parts" that comes with its own potential problems and the resulting hotfixes and service packs. A major benefit of Server Core has always been that it requires fewer patches—historically, about a third of what the full Windows OS requires. The Framework isn't historically a heavily-patched set of code, but it does get patched.
The tradeoff, however, is significant: Server Core R2 now supports ASP.NET Web applications under IIS 7.5, which is a major improvement over the original Server Core release—which didn't have any Framework and didn't support ASP.NET at all. The inclusion of the Framework in Server Core R2 also permits remote management of IIS through the standard IIS management console—another major benefit for administrators (you have to enable the remote management service to make this happen).
Perhaps the biggest improvement offered by the Framework subset, however, is the inclusion of Windows PowerShell v2 as a pre-installed component of Server Core R2. This addition brings significant new administrative capability to Server Core, including the ability to remotely connect to Server Core's PowerShell instances from remote machines, enabling remote command-line management of single and multiple servers.
Active Directory Certificate Services (ADCS, formerly just Certificate Services) is also supported as a server role on Server Core R2. This means that yet another key infrastructure component—Public Key Infrastructure (PKI)—can now be migrated to this lower-maintenance, smaller-footprint OS.
Keeping in mind that R2 is only being made available in a 64-bit edition, Server Core R2 optionally supports a WoW64 layer that makes it possible to run 32-bit applications. I primarily see this as being used to support older management agents or anti-malware applications, although every effort should be made to acquire native 64-bit versions of these items as quickly as possible.
Finally, Server Core R2 also supports File Server Resource Manager (FSRM), which finally enables advanced file quotas and other FSRM-related functionality in Server Core.
Much has been made about the "Active Directory Recycle Bin" in Windows Server 2008 R2, but the reality falls somewhat short of the hype. Although this feature provides great capabilities, it also has some limitations that aren't immediately obvious—and the term "Recycle Bin" actually implies a level of functionality and ease of access that simply isn't present. But first, some background.
As you may know, deleted objects in Active Directory (AD) aren't deleted immediately. Instead, they're marked with a "tombstone" flag, which is replicated to all domain controllers in the domain. Tombstoned objects, as they're called, continue to hang around in the directory for some time—180 days in the most recent versions of AD. Although they can't be used to log on or for any other purposes, keeping the objects around in this tombstoned condition helps ensure that every domain controller knows about the deletion.
Some third-party Recycle Bin-like tools of the past simply take advantage of the situation, giving you a graphical user interface (GUI) for seeing tombstoned objects, and enabling you to remove the tombstone flag (and replicate that change), bringing the object back to life— reanimating it, to stick with the graveyard terminology. Some third-party recovery tools provide no other functionality, in fact, especially those of the shareware variety, and you don't even need a tool if you're comfortable using ADSIEdit or other free, low-level tools that enable you to change the tombstone attribute yourself.
There's a downside, though: When an object is deleted, AD removes most of its attributes at the same time it applies the tombstone flag. That means many of the object's attributes are no longer available, so the object isn't "complete." This is especially frustrating with user objects, as we tend to populate many of the users' attributes. So simply reanimating an object often isn't that "simple" at all because you may also need to re-populate the majority of its attributes to make it fully functional again.
Windows Server 2008 R2 makes one important change to the deleting process: It places deleted objects into a "recycled" state where their attributes are left intact. Thus, reanimating them, by flipping the tombstone flag, is easier, because the object is preserved in its original form.
Unfortunately, Windows Server 2008 R2 will not provide an actual Recycle Bin in the form of an icon or container that you can use to easily access deleted objects. Deleted objects will still be essentially inaccessible from most native AD management tools, and you'll need to use low-level directory editors, scripting, or other—frankly complex—means to reanimate objects from their "recycled" state. The term "Recycle Bin" is kind of misleading, because although the feature does provide a sort of "undo" capability, it doesn't do so in the same easy-to-access way that the Windows Explorer Recycle Bin does.
Also, this new "recycled" state depends on changes made to AD in Windows Server 2008 R2—meaning you can't leverage this new feature until every domain controller has been upgraded to this new version of Windows. You also have to upgrade every domain in your environment to the Windows Server 2008 R2 functional level, and upgrade your forest to the Windows Server 2008 R2 functional level. That's a serious commitment for most organizations, requiring planning, new software licenses, and a significant amount of effort in order to reduce the risk of outages in a production environment. Figure 18 shows how to make the upgrade using the new Windows PowerShell AD cmdlets included in R2.
Figure 18: Upgrading the forest functional level.
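The raise shown in Figure 18 comes down to two cmdlets from R2's ActiveDirectory module; contoso.com here is a stand-in for your own domain and forest names:

```powershell
Import-Module ActiveDirectory

# Raise the domain first, then the forest, to the R2 functional level
Set-ADDomainMode -Identity contoso.com -DomainMode Windows2008R2Domain
Set-ADForestMode -Identity contoso.com -ForestMode Windows2008R2Forest
```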
But wait, there's more to do: Once your domain controllers, domains, and forests are upgraded, you have to manually enable the "Recycle Bin" functionality in AD. Figure 19 shows this being done from Windows PowerShell.
Figure 19: Enabling the Recycle Bin.
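For reference, the command behind Figure 19 looks something like this (contoso.com is again a placeholder, and remember that this cannot be undone):

```powershell
Import-Module ActiveDirectory

Enable-ADOptionalFeature 'Recycle Bin Feature' `
    -Scope ForestOrConfigurationSet -Target 'contoso.com'
```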
Once you've done that, you can start writing scripts that actually let you recover deleted objects with their attributes intact. Oh, and once the "Recycle Bin" functionality is turned on, you can't turn it off. So before enabling it, make absolutely certain that this new feature won't be in violation of any internal security rules, legislative security requirements, or industry security requirements. For example, in many European countries, it's illegal to retain personally-identifiable information (PII) in certain circumstances; enabling the "Recycle Bin" may unacceptably retain PII without you realizing it, as object attributes aren't deleted.
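Such a recovery script can be surprisingly short. This sketch finds a deleted user by a hypothetical name and reanimates it, attributes intact:

```powershell
Import-Module ActiveDirectory

# Deleted objects are hidden by default; -IncludeDeletedObjects surfaces them
Get-ADObject -Filter { Name -like 'DonJ*' } -IncludeDeletedObjects |
    Restore-ADObject
```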
Accessing deleted objects isn't as simple as opening a "Recycle Bin" icon in the AD management console; far from it. You'll need a lower-level tool, like Ldp.exe, to access the newly-created Deleted Objects container, as shown in Figure 20.
Figure 20: Accessing deleted objects in Ldp.exe.
The Recycle Bin is also only useful for deleted objects: Changes to objects aren't captured and preserved. Restoring multiple objects, especially those in a deep hierarchy, is still complicated. Non-directory objects, including Group Policy Objects (GPOs, which live on the file system, not in the directory) aren't protected by the Recycle Bin. The Recycle Bin also relies on AD itself being functional; if something goes wrong at the domain or forest level, you'll still need to have a backup made by other means.
So the new Recycle Bin feature can certainly be useful—but you need to understand its limitations before you rely on it, and you may still want to have third-party recovery tools in place for other scenarios and for ease of use. You'll certainly still want regular domain controller backups.
An entirely-new feature in Windows Server 2008 R2 is the Windows File Classification Infrastructure (FCI). This feature is designed to help administrators better manage file storage resources, enforce company policies regarding stored data, and so on. FCI is essentially designed to help classify the data on your file servers and to automate otherwise-manual processes using predefined policies that are based on the business value of your data. FCI is an infrastructure feature, meaning it provides a lot of ways for third-party vendors to "hook in" and provide features above and beyond what Windows includes natively.
Here's the basic problem FCI seeks to solve: Organizations would love to be able to clean up their file servers. But some data needs to be preserved for long periods of time, and today it's very difficult and time consuming to sort the "keeper" data from the "don't need it" data. FCI is designed to support predefined rules that help Windows automatically classify data, and then allow management processes—such as file cleanup and archiving, or security audits—to operate from the classifications.
Natively, R2's FCI helps classify files based on content and location. Once classified, sensitive data might be moved or secured differently, backup solutions might prioritize highly-valuable files over less-valuable ones within a backup window, or stale data might be automatically archived or deleted.
The native FCI capabilities are accessed through the File Server Resource Manager (FSRM) console, shown in Figure 21.
Figure 21: Accessing FCI through FSRM.
As you can see, classification starts with a list of classification properties. In this example, files can be classified as having personally-identifiable information (PII) or not, and can have a "secrecy" level applied. These properties essentially define the key aspects of information that might drive a business to make different decisions about the file: Files containing PII might be secured differently, or files with a high "secrecy" level might be backed up more frequently.
Next, rules are created to help automatically populate these properties for each file. Figure 22 shows the creation of a rule, where files in a particular location have a specific secrecy level applied automatically.
Figure 22: Automatic classification rules.
The content of files, rather than just their location, can also drive the classification. Figure 23 shows the Content Classifier being used to set the "PII" classification property.
Figure 23: Defining a Content Classifier rule.
Figure 24 shows the content that's being searched for—in this example, a regular expression that matches on US Social Security Number patterns.
Figure 24: Defining the content to search for.
Third parties can provide additional classifiers, and third parties can also use the FCI application programming interface (API) to apply classification properties or to read those properties—for example, an auditing solution might use these properties to prioritize the files that are included in a security audit.
Windows PowerShell v2 introduces a new form of remote management based upon the industry-standard Web Services for Management (WSMAN) and Microsoft's Windows implementation, Windows Remote Management (WinRM).
WinRM is a Web Services-based protocol, meaning it operates over HTTP. By default, WinRM 2.0 (the version PowerShell v2 relies on) listens on ports 5985 for HTTP and 5986 for HTTPS; earlier WinRM versions used 80 and 443, and the port numbers remain configurable. The WinRM service listens for incoming requests, then passes those requests to registered applications—including PowerShell. For security purposes, administrators can govern the applications that are allowed to register with WinRM. Essentially, WinRM replaces the older and more cumbersome Remote Procedure Call (RPC) protocol; WinRM offers easier compatibility with firewalls.
PowerShell v2 includes a set of cmdlets designed to configure and enable remoting through WinRM, and a set of cmdlets designed to establish sessions with remote computers. Once you have created an authenticated session from your local PowerShell instance to a remote instance, you can engage in two distinct management scenarios: 1:1 and 1:n.
A 1:1 scenario basically provides you with a remote interactive command-line window, not at all unlike SSH on most Unix/Linux operating systems (OSs). A 1:n scenario allows you to invoke PowerShell commands and have them run on multiple remote computers in parallel, with the results being brought back to your computer. This makes multiple-computer management virtually the same as single-computer management and makes it easier to manage even a highly-distributed IT infrastructure.
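Here's a sketch of both scenarios; the server names are placeholders, and Enable-PSRemoting must already have been run, elevated, on each target machine:

```powershell
# 1:1 -- an interactive remote prompt, SSH-style
Enter-PSSession -ComputerName SERVER1
# ...run commands as if sitting at SERVER1's console...
Exit-PSSession

# 1:n -- one command fanned out to many machines in parallel; results come back
Invoke-Command -ComputerName SERVER1, SERVER2, SERVER3 `
    -ScriptBlock { Get-Service | Where-Object { $_.Status -eq 'Stopped' } }
```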
As mentioned in a previous tip, Windows Server 2008 R2's Server Core installation mode offers a new, easier way to perform the initial server configuration: the Sconfig utility. In some ways, Sconfig is kind of like a text-based, mini Server Manager—and it can be used to enable even greater management flexibility.
As Figure 25 shows, Sconfig can be run immediately after the Server Core installation completes and you log onto the console for the first time.
Figure 25: Running Sconfig.
The utility makes it easier to perform all but one of the major initial configuration tasks you need to do on any new server (more on that missing item in a moment). You can join a domain and set the computer name (although computer name really ought to be the first item on the menu, not the second, since renaming should happen before joining a domain). You can configure Windows Update, run a Windows Update check, and configure a variety of remote management options, which you should definitely do. In Figure 26, you'll see that I'm enabling MMC remote management, a task that also enables the necessary firewall exceptions on the server.
Figure 26: Enabling MMC Remote Management.
I also recommend allowing Server Manager Remote Management. A new feature in Windows Server 2008 R2's Server Manager console, Remote Management will enable a remote instance of Server Manager to connect to, and manage, your Server Core instance— making it vastly easier to examine roles and features installed on Server Core, for example. You can also enable Remote Desktop, as Figure 27 shows. Keep in mind that on Server Core, Remote Desktop only buys you a remote command-line window; it doesn't magically give you a full GUI to work with remotely. In fact, although I always enable Remote Desktop, I mainly use it for emergencies—I prefer to use remote GUI-based tools to connect to, and manage, Server Core installations.
Figure 27: Enabling Remote Desktop.
Finally, as Figure 28 shows, Sconfig even allows you to configure network settings for each installed network adapter. Configure a static IP or any other settings. (Although I frankly prefer to leave Server Core using DHCP and to instead configure a DHCP reservation in my DHCP server. That way if I ever re-install Server Core for some reason, I don't have to reconfigure the static IP—it'll just pick up the desired IP from DHCP).
Figure 28: Configuring network adapter settings.
It seems as if Sconfig will do everything you need, but you won't find an option on its menu for activating Windows, which seems like a pretty serious oversight. Instead, you'll still need to manually install your product key using Slmgr, as Figure 29 shows.
Figure 29: Installing a product key in Server Core.
After installing the product key, you'll have to activate Windows. If you're using a normal retail key, just run Slmgr -ato to initiate activation.
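For reference, the full sequence looks like this; the product key shown is only a placeholder, and Slmgr also offers a -dli switch you can use to confirm the result:

```shell
REM Install the product key (placeholder shown; substitute your real key)
slmgr.vbs -ipk XXXXX-XXXXX-XXXXX-XXXXX-XXXXX

REM Activate Windows against Microsoft's activation servers
slmgr.vbs -ato

REM Display current license information to confirm activation
slmgr.vbs -dli
```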
Sconfig is a big help, although it would be nice if it also handled the product activation.
If you thought Windows' Hyper-V was Microsoft's only foray into virtualization, you're in for a bit of a surprise. Microsoft is slapping the "v-word" on many different products and technologies—some of which have been around for years without anyone apparently realizing they were virtualization!
The "real" virtualization in Windows, Hyper-V is a type-1 hypervisor that's designed to emulate PC hardware for the purpose of running "guest" operating systems (OSs). A successor to Microsoft Virtual Server, though built in an entirely different way, Hyper-V is the basis for Microsoft's enterprise virtualization efforts. It competes with VMware's vSphere/ESX products and Citrix's Xen family.
App-V is designed to run on Windows client computers or on Terminal Services servers. It essentially allows you to create images of completely installed applications, then deploy those images rather than actually installing the application on each of your client computers. App-V creates a sort of sandbox or bubble around the application, preventing it from having a permanent impact on the client's file system, registry, and other resources, and protecting applications from conflicting with one another. Central management tools provide deployment, management, de-provisioning, and other functionality. App-V is available as part of the Microsoft Desktop Optimization Pack (MDOP), which is only offered to Microsoft customers who have purchased Software Assurance for their enterprise OSs.
Sort of a stripped-down Virtual Server, Virtual PC is Microsoft's workstation-grade virtualization software. Conceptually, it does the same thing as Hyper-V: running guest OSs, in this case on top of Windows (an earlier incarnation also ran on Macs). Under the hood, it's a very different, hosted type of hypervisor with lower performance. It's useful for software testers and other employees who need to run an alternate OS on their client computer; Windows 7's "Windows XP Mode" is essentially a built-in Virtual PC running a preconfigured Windows XP guest OS. Virtual PC competes with VMware Workstation and similar products from Parallels.
Microsoft Enterprise Desktop Virtualization (MED-V) is also a part of the MDOP. It's designed to provide central management and control of Virtual PC images, enabling you to deploy, manage, and control these images. For example, subcontractors working in your environment might be given a corporate-standard Virtual PC image, which allows them to access corporate resources without joining their laptop or desktop computer to your domain. You can then control the security and use of that Virtual PC image.
Called presentation virtualization, Remote Desktop Services (RDS) used to be known as "Terminal Services." It got its new name in Windows Server 2008 R2 and is now officially part of Microsoft's virtualization efforts. Technically, it has always offered "virtual desktops," although it's "virtualization" in a very different sense than, say, Hyper-V. RDS competes with Citrix products in some respects and is complemented by them in others.
"Virtualization" has taken on so many meanings, thanks in part to the word's popularity and marketing clout, that it has become an almost meaningless term, much as "ActiveX" and ".NET" were back in their day. Suffice it to say that Microsoft has a number of creative and useful products and technologies that "virtualize" something in some way; focus on the individual solutions rather than the "v word."
For more than a decade, Windows administrators have suffered with the native Windows event logs. We've struggled to find relevant events to help us audit and troubleshoot our systems, we've hunted for the meaning behind obscure messages and event ID numbers, and we've tried to make a science out of a pretty raw and low-level store of information. Worse, the logs aren't centralized, meaning you wind up hunting across multiple servers to find the information you need.
In Windows Server 2008, things are a little better. Sure, you get a fancy new user interface (UI) embedded within Server Manager (shown in Figure 30), but you also get some important new features.
Figure 30: New event log viewing in Server Manager.
Now you can create custom views, which contain filter and sort criteria that make it easier for you to repeatedly come back and find specific events. You might set up a view for events related to a specific application, for example. Figure 31 shows an example, using the built-in "Administrative Events" custom view.
Figure 31: Viewing events through a custom view.
Log data has been segregated out into more logs, helping break down information logically by product or technology. As Figure 32 shows, a fairly bare-bones Windows Server 2008 installation has dozens of individual logs; fortunately, those custom views can aggregate events from multiple logs, giving you a consolidated view if you so desire.
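From the command line (handy on Server Core), wevtutil offers similar capabilities; here's a sketch using its standard el (enumerate-logs) and qe (query-events) verbs:

```shell
REM Enumerate every log on the system (there are dozens)
wevtutil el

REM Show the five most recent events from the System log, newest first
wevtutil qe System /c:5 /rd:true /f:text

REM Use an XPath query to pull only error-level (Level=2) events
wevtutil qe Application /q:"*[System[(Level=2)]]" /c:5 /f:text
```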
Figure 32: Multiple logs help categorize information better.
Event forwarding and subscriptions provide a syslog-like capability to forward selected events to a central Windows server for consolidation. It took more than a decade to get this feature in Windows, but you should be glad you have it! You can set it up through event subscriptions, allowing you to set up a central "log server" that aggregates all your logs. Figure 33 shows the configuration for a subscription, and you can see that you can even select specific events to be collected.
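As a rough sketch of the plumbing involved, the collector and each source computer each need one command run before you define the subscription itself:

```shell
REM On the collecting ("log server") computer: configure the
REM Windows Event Collector service for subscriptions
wecutil qc

REM On each forwarding (source) computer: enable WinRM,
REM which is the transport that carries forwarded events
winrm quickconfig
```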
Figure 33: Setting up event subscriptions.
Not everything is perfect, though. In some cases, the old event IDs you're used to seeing are gone, replaced by event IDs that have had 4,096 added to their numeric ID. This was done to help make room for new event IDs, and can be frustrating until you realize what's happened.
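As a sketch of the translation, the well-known security event IDs for logons illustrate the offset; note that the +4,096 rule applies to many security events but not to every event ID:

```python
# Many (not all) security event IDs moved by a fixed offset of 4096
# when Windows Server 2008 introduced its new event schema.
OFFSET = 4096

def old_to_new(old_id: int) -> int:
    """Translate a pre-2008 security event ID to its 2008 equivalent."""
    return old_id + OFFSET

# Successful logon: 528 under Windows Server 2003, 4624 under 2008
print(old_to_new(528))  # 4624
# Failed logon: 529 becomes 4625
print(old_to_new(529))  # 4625
```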
Multi-site clustering is the official name of the improvement to Windows' built-in clustering capabilities that allows cluster nodes to exist in different geographic locations. The idea is that a dispersed cluster can survive even major facility disasters, like fire or flood, because the individual nodes are widely separated. Like other types of Windows Server clusters, multi-site clusters offer automatic failover when a node fails. Each site has at least one storage array, and nodes are connected to storage in such a way that each node can access the storage at its own site in the event of a communications failure.
Thus, the cluster nodes in Denver can access storage in Denver, and cluster nodes in Las Vegas can access storage in Las Vegas. The Denver nodes can function without talking to the Las Vegas storage, as would be the case in the event of a disaster there. The storage fabric must provide a way to mirror or replicate data between sites so that data in Denver is being replicated to Las Vegas and vice versa. This is the tricky bit, and you'll need third-party help to solve it, because Windows doesn't provide that replication functionality itself. However, some Microsoft applications do provide the needed capability, such as Exchange Server 2007's Cluster Continuous Replication feature.
Windows clusters have in the past relied primarily upon a shared storage resource that is always accessible to every node, so some changes needed to be made. Quorums, for example, have moved to the concept of votes. Each node gets a vote to decide which node gets to control which resources; a witness, a Windows server that isn't in the cluster, serves as an independent tiebreaker.
Think of a cluster that has Node 1 and Node 2 and an independent witness. Node 1 votes for itself, Node 2 votes for itself, and the witness votes for one or the other, breaking the tie. That vote determines which cluster node is "active" and which is on "standby" for particular resources. Ideally, the witness lives at a different site from the two nodes; whichever node the witness can still "see" must be alive and online, so it makes sense for that node to be the active one in the cluster.
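The voting arithmetic can be sketched in a few lines of Python; this is an illustration of simple majority voting, not Microsoft's actual quorum implementation:

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """A cluster partition keeps running only if it holds a strict
    majority of all configured votes."""
    return votes_online > total_votes // 2

# Two nodes plus a witness = 3 total votes.
# A partition containing one node and the witness (2 votes) survives:
print(has_quorum(2, 3))  # True
# A lone node (1 vote) cannot claim the cluster:
print(has_quorum(1, 3))  # False
```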
Multi-site clusters are a great new feature, especially for larger organizations with distributed data centers.
No matter how robust, reliable, and secure Windows Server 2008 is, we face risks of data loss. Systems administrators know all too well how often someone accidentally deletes a file and then somehow manages to accidentally empty the Recycle Bin as well. Data loss like that falls at the easy end of the spectrum of problems we face in protecting data on Windows Server 2008 platforms. At the other end of the spectrum, we have a big challenge: disaster recovery. How will the organization continue to function if critical applications are down because servers were destroyed in a fire, flood, hurricane, or some other natural disaster that might be a familiar threat in your area? Between the simple and the complex, we have a whole range of data loss risks:
As we can quickly see from this list, the threats underlying the risk of data loss can be roughly grouped along two dimensions: intentional versus accidental, and programmatic versus human action. Natural disasters are something of a special case, and we will discuss them in different terms. One final note about the grouping: the dimensions are not mutually exclusive. We could have a situation in which a problem in an application, say a bug in a patch, combines with an administrator's mistakes in applying the patch to create a compounded threat. Clearly, there is no shortage of ways in which our data can be lost.
The terms risk and threat are sometimes used in ways that obscure their precise definitions. A risk is a hazard or potential loss, such as the risk of losing data, having data tampered with, or having login information stolen.
A threat is a means for realizing a risk. A single risk can have many threats that can bring about the unwanted outcome. For example, someone could steal your online banking credentials by looking over your shoulder and watching you type or by installing a key logger that captures keystrokes as you type.
Figure 34: The risk of data loss is due to multiple threats.
The first step in understanding the risk of data loss is to understand how these different threats operate to undermine the integrity and availability of our data.
End user errors lead to relatively isolated data loss: deleted files, corrupted records in a database, and overwritten files in a shared directory. These types of errors can be mitigated with access controls that limit delete and write privileges to only necessary users. In the case of application‐related errors, improving usability and prompting for confirmation for destructive changes can help reduce the risk of data loss.
Administrator errors are more difficult to prevent. If you've been in systems administration long enough, you probably have tales of mistakes that still make you cringe. One way to reduce the risk of data loss from administrator errors is to document procedures and use checklists to ensure the procedures are followed. By its nature, systems administration often requires us to perform unique tasks, such as applying a particular service pack; however, after you have performed this task a few times, you can develop a pattern that can be generalized enough to create a checklist of essential steps (not the least of which is creating a backup before you start).
The "blue screen of death" has been a well‐known phrase since the days of Windows NT. If an application made an error and did not properly trap for it and the operating system (OS) did not properly isolate the error, then it was time to abandon all hope and reboot. Today's application code and OSs like Windows Server 2008 are more resilient than their 1990s counterparts, but accidental data loss due to application error is still a problem. Not surprisingly, it is the complexity and interoperability of applications that create significant threats of data loss today. Consider some of the ways a Web application with a rich Internet application (RIA) interface may lose data:
Figure 35: Application stacks are becoming more complex; even browsers are collections of addons each of which can harbor application vulnerabilities that can lead to data loss.
Although there have been advances in some areas of application development, especially in the area of OS robustness, the additional complexity in today's applications harbor the potential for data loss.
Malicious software, commonly known as malware, comes in a number of forms, all of which can either directly or indirectly result in data loss if a server or client device becomes infected. Some typical types of malware include:
Malware can cause data loss either because the malware developers designed their code to destroy data or because the malware interferes with other operations resulting in data loss. Although data loss is a problem, a bigger concern with malware is loss of confidentiality with malicious code that steals files or logs keystrokes.
Disgruntled employees are nothing new. Some may not like their jobs, some may have psychological issues, and some might be looking for payback after a layoff or perceived unjust sanction. Concerns about insider abuse are probably more at the forefront of our minds during challenging economic times that inevitably lead to layoffs. The ways a disgruntled employee can cause data loss are limited only by their imagination.
One particularly difficult form is the logic bomb. It could be included in the list of malware, but we address it here because it is malicious code introduced by an insider. A logic bomb is code that is set to execute at some time after it is introduced and will destroy, corrupt, or otherwise tamper with data or applications. The damage from a logic bomb can extend beyond the business or organization that is the initial victim. In 2008, a former systems administrator at a health services company was convicted of creating a logic bomb that, according to one report, would have destroyed virtually all information on the company servers, including healthcare information (Source: Sharon Gaudin, "Medco Sys Admin Gets 30 Months for Planting Logic Bomb," Computerworld, January 8, 2008. http://www.computerworld.com/s/article/9056284/Medco_sys_admin_gets_30_months_for_planting_logic_bomb).
At some point, the level of data loss caused by intentional human action crosses the boundary into a more disaster‐like situation. For example, if an arsonist succeeds in seriously damaging a data center, the level of data loss would approach that of a loss due to a natural disaster.
There are two aspects of data loss due to natural disaster that distinguish it from other threats: the scale of the data lost and the accompanying loss of infrastructure. Natural disasters do not selectively target data, the way malware might, and they are not limited to a single application or database, the way an application error might be; a natural disaster can wipe it all out. When considering how to mitigate the threat of data loss due to natural disaster, we must consider how we will provide temporary servers and other infrastructure to run critical applications.
Figure 36: Failover systems replicate data from a primary to a secondary system so that the latter can take over in the event of a failure in the primary server.
We also need to consider how long IT services can be down before there is significant adverse effect on the business or organization. If rapid recovery is essential, then we need to consider high‐availability solutions. With these systems, data is replicated from primary servers to standby servers. Primary servers may be monitored and if they fail, applications will failover to the standby server; in other cases, manual intervention is required to switch to the standby server. (We will have much more detail on high‐availability and data replication services in a future volume).
Once the threats that can lead to data loss are understood, we can devise a plan to mitigate the risk. Clearly, backups will play a role in data loss protection, but as we will see, there is much more to reducing the risk of data loss than simply making backups.
A recovery management strategy is a plan for reducing the chance of data loss due to any of the threats described in Tip, Trick, and Technique 17. With an overview understanding of the threats, how do we go about protecting our Windows Server 2008 servers and other infrastructure? It starts with a four-step process:
At the end of the process, we have described the level of protection required to mitigate the risk of data loss balanced against the requirements and resources of the organization.
Think about all the different types of data in a typical midsize business. (The principles we develop apply equally well to non‐business organizations, but for simplicity, we'll use a business example here). There is transaction data about sales, customer details and account summaries, HR data about employees, data warehouses and executive reporting data, emails, documents, and other unstructured data. Now we need to ask, Is all this data equally valuable? Another way to think about it is, How would the business be affected if the data were lost?
From these examples, we can see three categories for data classification: critical, important, and optional. Critical data deserves the greatest level of protection (we'll define what that means in operational terms shortly), important data should be protected but not at the expense of critical data, and finally, optional data should be protected if possible but is of lower priority than the other types of data.
A key benefit of having a data classification scheme is that it allows us to prioritize how we commit resources to protecting data, and that priority is based on business, not technical, requirements. For example, a business may have two Windows Server 2008 systems running SQL Server; one hosts an orders database and the other is used for a data warehouse. It is the type of data in the database, not the fact that the server runs SQL Server, that determines its data protection priority.
Critical: Data that is essential to the continued operation of the organization. If the data were lost, it would severely and adversely impact the organization. Examples: financials, customer database.
Important: Data that is needed for normal operations. If the data were lost, it could be recreated with some effort. Its loss would not have an immediate adverse impact, although long-term loss would. Examples: data warehouse, marketing data.
Optional: Low-value data that would not adversely impact the organization if it were lost for an extended period. It is easily recreated at low cost. Examples: copies of publicly available data (for example, census data used in marketing).
Table 1: Data classification schemes partition data by value and impact on business operations.
Just as some data is more important than other data, some servers are more important to a business or organization. To identify which servers are most important, we need to understand what functions each server carries out, in terms of business processes, and what data it stores. Often, but not always, the two overlap: critical business data resides on critical servers. For example, a development server is critical to a software development group, but it does not (or at least should not) store any critical organizational data for anything other than development purposes.
The first step in identifying critical servers is to create a high‐level map of where different types of data reside. For example, servers could be labeled as storing critical, important, or optional data as well as a combination of multiple types. In the case of multiple types, the server should be considered as having the higher priority category of data.
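The "higher priority category wins" rule is easy to express mechanically. In this Python sketch, the numeric ranking and server contents are illustrative, not prescribed:

```python
# Lower rank number = higher priority, matching the ordering in Table 1.
RANK = {"critical": 0, "important": 1, "optional": 2}

def server_category(data_categories):
    """A server takes the highest-priority category of any data it holds."""
    return min(data_categories, key=lambda c: RANK[c])

# A hypothetical server holding both marketing data (important) and
# the customer database (critical) is classified as critical:
print(server_category(["important", "critical"]))  # critical
print(server_category(["optional", "important"]))  # important
```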
Figure 37: Servers are considered critical if their core function is a critical business function or if another critical server is dependent on them.
Consider a simple scenario of a small business or a department within a larger organization. There are several applications running on an application server, and all of those systems are considered critical. That adds the application server to the critical list. But we can't stop there. The applications running on that server depend on a SQL Server database that is hosted on another server. Authentication to the applications depends on an Active Directory (AD) domain controller on yet another server. Finally, the applications are of little use without the Web interface that allows users to interact with them. What started as one critical server quickly became four because of application dependencies.
When creating these prioritized lists of servers, it helps to have someone with knowledge of the application architecture to help run down all the dependencies. The last thing any of us want is to get into a disaster recovery situation only to learn we missed a critical dependency.
Of course, when we speak of data on the server, we really mean data that is logically managed by that server. The data may actually reside on a storage array that is shared by multiple servers. From the perspective of protecting against data loss, that does not matter. If the server is down, the data it manages is not readily accessible even if the storage array is functioning.
If we conducted a survey and asked IT and business professionals to define the critical applications used in their business, we would probably get many answers about sales order systems, customer relationship management systems, financials, and other back‐office applications. These certainly fit into the critical category, but they do not cover the full spectrum of essential systems. Take messaging, for example. Many of the applications listed make minimal use of email services yet email systems are essential in many organizations. We just have to ask how long we could continue to operate without a functioning email system. Chances are it would be longer than if our online sales system was down, but we would not want to go long without email. This demonstrates the point that critical systems come in many forms.
Depending on your organization and its dependence on email, an Exchange Server might be considered critical or important. Important servers, like the category of data classification, indicate a lower priority than critical servers. Some examples of important servers include:
At the end of this exercise, we have a breakdown of the types of data and servers by criticality. This allows us to organize the servers based on their importance to essential operations. There is just one more step before we can create a summarized, consolidated report of our recovery management needs that will allow us to define an informed set of backup and disaster recovery procedures.
Recovery point objective (RPO) and recovery time objective (RTO) are a couple of terms that are frequently used when discussing recovery management, backup, and recovery. Let's start with a couple of definitions. RPO is the maximum amount of data loss that can be tolerated, expressed as the time between a data loss event and the most recent backup. For example, an RPO of one day means we can accept the loss of one day's worth of data. RTO is the maximum amount of time that data or systems may be unavailable while restoration or disaster recovery procedures occur. If we must have a database backed up and fully restored within one hour of a failure, then we have a one-hour RTO.
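The relationship between backup frequency and RPO reduces to a single comparison. A Python sketch with hypothetical numbers:

```python
def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """Worst-case data loss equals the time since the last backup,
    so the interval between backups must not exceed the RPO."""
    return backup_interval_hours <= rpo_hours

# Nightly backups (every 24 hours) satisfy a one-day RPO...
print(meets_rpo(24, 24))  # True
# ...but not a four-hour RPO; that demands more frequent backups
# or continuous replication.
print(meets_rpo(24, 4))   # False
```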
The last step in putting together the pieces required to formulate a recovery management strategy is to define the RPOs and RTOs for each server or application. Table 2 shows an example of a summarized report with RTOs and RPOs assigned.
Table 2: Example summary assessment of data and server classifications and associated RPOs and RTOs.
In this example scenario, we have limited ourselves to two levels of RPOs and RTOs. In practice, we should craft these objectives as the business requirements demand, but we need to balance those with management considerations. The more variation we have, the more backup policies we will need to define and manage. Remember, complexity is often the enemy of reliability. Keeping backup schedules as simple as possible, but not simpler, can help reduce management overhead and the potential for errors.
Let's use the information we've compiled here to help formulate a disaster recovery policy.
RPOs and RTOs define how long we can be without critical and important systems and their data. These objectives guide our decision making when it comes to deciding a range of issues regarding data loss prevention:
This information also guides us in disaster recovery, especially with regard to how long we have to restore services. That, in turn, influences our choice of architectures for implementing disaster recovery. Key questions that arise in disaster recovery include:
Depending on the answers to these questions, we can formulate the disaster recovery plan. At the highest levels, a disaster recovery plan will document the following:
Figure 38: In disaster recovery mode, multiple virtual servers can be hosted on a single physical host, reducing the cost of maintaining disaster recovery infrastructure.
This tip, trick, and technique has outlined the basic building blocks of a recovery management strategy; unfortunately, analyzing the parts does not always give a comprehensive picture of the whole. One aspect of recovery management that was not addressed here is security, so we will turn to that next.
We expend a lot of effort to keep our data secure. We set up access controls, implement authentication mechanisms, and limit privileges to reduce the risk of someone tampering with data or accessing data they should not see. Many of the mechanisms we use are not able to protect data once it moves from the servers that normally house it to backup media. For example, file‐based operating system (OS) access controls do not limit access to files in a backup set. The need to protect against data loss has implications that conflict with our need to protect the confidentiality of data.
There are a few key drivers behind the need for confidentiality of data. Depending on the type of business or organization, there may be regulations that prescribe levels of privacy protection that must be ensured for personal information. The healthcare and financial services industries are obvious examples. Even in industries without well-defined regulations, there are still incentives to preserve the privacy of customer or client data. A well-publicized data breach can damage a business's image, lead to the loss of customers, and eventually impact the bottom line. Heartland Payment Systems and the TJX Companies, Inc. received quite a bit of press about their record-breaking data breaches in 2009 and 2007, respectively. Intellectual property is a particularly important target in some industries, in which high research and development costs provide an incentive to steal rather than develop intellectual property.
The Open Security Foundation maintains the Data Loss Database at http://datalossdb.org/. The site has a wealth of information about data loss incidents that may be useful if you need statistics to justify a business case for information security. For threats to intellectual property, see Kim Zetter's "Report Details Hacks Targeting Google, Others," Wired, February 3, 2010 (http://www.wired.com/threatlevel/tag/apt/), and the Christian Science Monitor's "US Oil Industry Hit by Cyberattacks: Was China Involved," January 25, 2010 (http://www.csmonitor.com/USA/2010/0125/US-oil-industry-hit-by-cyberattacks-Was-China-involved).
Security breaches come in many forms, including the loss or theft of backup media. Confidential and private data should be encrypted when it is backed up and the backup media leaves the control of the organization. With encryption, even if the media is lost or stolen, there is little chance the data will be compromised.
One other point to keep in mind about encryption is that the definition of strong encryption changes over time. Use strong encryption algorithms and long encryption keys to maximize the protection provided by encryption.
Systems administrators are the last people who need to be told that volumes of data are growing at staggering rates and will probably continue on the same trajectory. As they are the ones responsible for keeping up with this growth, it's worth taking a look at where all this data is coming from. After all, computers are nothing new; they have been running business applications since the 1950s. What is it about today's use of information technology that is generating such high growth rates? The answer is that there is no single culprit; rather, a confluence of technical and organizational issues drives this growth. Some of the most important drivers contributing to this phenomenon are:
Systems administrators will have influence on some of these drivers, such as new applications, while other areas, such as compliance, have more rigid requirements that may not leave much room for optimization.
The days of business running a limited number of back office applications are over. Of course, pretty much any business will be running financial packages that track revenues and expenditures along with whatever form of sales they may have—that is, products or services. Any but the smallest will likely have some kind of customer relationship management, human resources, and inventory management package as well. These will often generate a fairly constant rate of data or grow in proportion to the business activity. These kinds of applications do not generate significant growth in data—that comes from other data‐intensive applications.
Data‐intensive applications come in many forms, including those that capture detailed interactions with customers, instrumentation, data analysis applications, and content management systems. We will consider each of these in turn.
More and more interactions with customers are being tracked. In the past, we could track customer interactions only at a point of sale. For example, when we shop at a national retailer, the business captures its first pieces of data about us at the point of sale system. At that point, we are done shopping, we have a full cart, and we are ready to pay. The retailer can capture information about:
That is a relatively small amount of data compared with what could be gathered through an online catalog. If we were to shop at the same retailer's Web site, the list of data elements that could be tracked would grow to include:
If we were to multiply the size of the new data by the additional number of customers that come to Web sites over retail stores, we would start to get a sense of how much additional data can be generated.
Tracking customer interaction in detail is only useful if we do something with that data, and that is where business intelligence and analytics come in.
Business intelligence and analytics are applications designed for internal use. Customers reviewing product offerings, checking the status of orders, or making purchases work with online transaction processing systems. These are optimized for rapid response to high volumes of concurrent users. Business intelligence systems are a different breed of application.
Figure 39: Business intelligence environments duplicate data found in transaction processing systems.
Business intelligence systems are designed for managers, analysts, and others who need to delve into data and make comparisons across time, product lines, sales regions, and so on. For example, if you want to know how sales in the Southeast sales region are doing this quarter compared with the same time last quarter, you would use a business intelligence system. Similarly, if you wanted to find branch offices with the poorest revenue to expense ratio, you would use a business intelligence system. The problem from a data storage perspective is that business intelligence systems duplicate the data found in transaction processing systems using operations known as extraction, transformation, and load (ETL) processes.
Why duplicate data? After all, if the data is already in the transaction processing system, why not use it there? A complete answer is beyond the scope of this topic, but the quick answer to that question is:
In addition to traditional business intelligence reporting, there is a growing use of statistical analysis and data mining techniques known collectively as business analytics. These applications are consumers of data—and the more data, the better in some cases. Like data warehousing, they require data in a particular format that does not usually correspond to the way online transaction processing systems structure data. As a result, data is copied from source systems and reformatted into a format more amenable to analysis.
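The ETL copying described above can be sketched with a toy example. The schema, table names, and figures below are hypothetical, and SQLite stands in for both the transactional database and the warehouse; the point is simply that the same data ends up stored twice, once in its normalized transactional form and once in a denormalized form shaped for reporting:

```python
import sqlite3

# Hypothetical schemas: a normalized transactional table and a denormalized
# warehouse fact table optimized for quarter-over-quarter reporting.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, region TEXT,
                         amount REAL, placed_on TEXT);
    CREATE TABLE sales_fact (region TEXT, quarter TEXT, total REAL);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [(1, "acme", "Southeast", 100.0, "2009-10-05"),
     (2, "acme", "Southeast", 50.0, "2009-11-12"),
     (3, "zenith", "Northwest", 75.0, "2009-10-20")],
)

# Extract and transform: aggregate individual transactions into
# quarterly totals per sales region.
rows = conn.execute("""
    SELECT region,
           strftime('%Y', placed_on) || '-Q' ||
               ((CAST(strftime('%m', placed_on) AS INTEGER) + 2) / 3),
           SUM(amount)
    FROM orders GROUP BY 1, 2
""").fetchall()

# Load: the same underlying data now lives in both tables.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
```

A production ETL job would also handle incremental loads, slowly changing dimensions, and error records, but the duplication itself is already visible in this minimal form.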
Business intelligence and analytics are formalized processes that duplicate data; everyday information practices contribute to data duplication as well. Microsoft Excel and other spreadsheets are something of a double‐edged sword for data warehouse designers. On the one hand, it is convenient to have the option of exporting data from the data warehouse into a spreadsheet so that users can take advantage of the additional features of the application. On the other hand, users do exactly that, over and over. Data that was originally in an online transaction processing system is now in a data warehouse and in some unknown number of spreadsheets in user directories. Of course, some of these will be emailed to multiple recipients, each of whom may save their own copy.
Figure 40: Useful data warehousing features, such as the ability to export to spreadsheets, can quickly become a means to duplicate data many times over.
Data‐intensive applications are significant drivers behind the growth in data volumes due to both improved methods for collecting data and the need to duplicate data to meet multiple needs. These examples focus on what is typically known as structured data. They tell only part of the story.
Unstructured data is data that does not fit into well‐defined data structures, such as database tables or spreadsheets. Free‐form text, audio, and video are all examples of unstructured data. Unstructured data is ubiquitous in today's organizations with sources including:
In addition to the fact that many of us generate unstructured data on a daily basis, we are constantly duplicating it. When we reply to an email message and embed the original text in our response, we create more unstructured data. When we save attachments as personal copies, we add to the growing volume of unstructured data. The Web makes it easy to bring in additional data from outside the organization as well. Find an especially useful article? You might save a local copy so that you do not have to search again or risk having the site remove the content. The rate at which we create and duplicate unstructured data is yet another driver behind the growth in data volumes.
Calling text unstructured is something of a misnomer. Linguists study the structure of natural languages and can describe their complex structures in detail. If anything, natural language is highly structured. For most IT needs, though, we can safely ignore the structure of natural language. Instead, we treat the entire text as a single object and do not delve into the structure within.
To appreciate the importance of unstructured data, we only have to consider how our organizations would function without email, shared directories, or SharePoint servers. Applications such as these can be just as business critical as application servers and databases. Both structured and unstructured data can be subject to yet another factor in data growth: compliance.
Regulatory compliance and other legal drivers, such as e‐discovery, are shaping the way we generate, store, and archive data. Regulations such as the Sarbanes‐Oxley Act (SOX), the Health Insurance Portability and Accountability Act (HIPAA), and others define certain requirements with regard to how businesses report on their financial status and protect customer privacy. A common aspect of many regulations requires businesses to not only comply with the regulation but also be able to prove that they are in compliance. To do so, they must document compliance in detail with policies, procedures, and audit trails.
At first glimpse, this documentation requirement may sound simple and not terribly data intensive; however, in many cases, the level of detail required can result in significant data generation. For example, consider the type of events that may have to be logged:
A related driver is known as e‐discovery. During legal proceedings, a company may be required to produce electronic documents, such as emails and word processing documents, relevant to the case. In the past, companies that have been unable to produce those documents have been subject to severe fines. In 2008, Qualcomm was fined $8.5 million for e‐discovery violations (Source: Kristine L. Roberts, "Qualcomm Fined for 'Monumental' E‐Discovery Violations—Possible Sanctions Against Counsel Remain Pending" at http://www.abanet.org/litigation/litigationnews/2008/may/0508_article_qualcomm.html). It does not take many such examples to motivate businesses to retain and catalog electronic communications.
Data‐intensive applications, both those that are designed to capture and generate data as well as those designed to analyze it, are significant contributors to data volume growth. Unstructured data is easily created and duplicated, further contributing to that growth. If these factors were not enough, compliance and e‐discovery concerns are prompting businesses to preserve data and to maintain it longer than they might otherwise.
Who is responsible for managing the growing volumes of data? It is a shared responsibility of the business owners, who are responsible for setting policies and procedures governing the generation, use, and destruction of data; application managers, who are responsible for maintaining their applications and ensuring they function as required; and systems administrators. In many ways, it is the systems administrator who is on the front line of managing data in an organization.
Some of the key responsibilities of systems administrators with respect to growing volumes of data are keeping up with
These responsibilities range from the mundane but essential, such as setting and verifying access controls on files, to making recommendations on the use of emerging technologies, such as cloud computing, to accommodate even more data.
Backup and recovery procedures are standard operating tasks for systems administrators, but these tasks become more difficult with growing volumes of data. In particular, systems administrators have to grapple with:
Part of the solution is to understand what has to be backed up and how frequently; a related part is to understand how long different types of data have to be kept. Information life cycle management practices can help here; they are discussed in more detail in Tip, Trick, Technique 22.
Most of the difficulties previously listed can be at least mitigated with deduplication technologies. Backup vendors are incorporating deduplication technologies in their software packages to combat the problem of growing data volumes. The basic idea behind deduplication is that data is often duplicated and rather than storing multiple copies of identical data blocks, a backup can be constructed using a single copy of such data blocks and references or pointers back to that copy.
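The single-copy-plus-pointers idea behind deduplication can be illustrated with a short sketch. This is a simplified, fixed-size-block version (commercial backup products typically use more sophisticated variable-size chunking), with `deduplicate` and `restore` as hypothetical function names:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks for simplicity


def deduplicate(data, store):
    """Split data into blocks, keep one copy of each unique block in
    `store` (hash -> block), and return the list of hash pointers
    that reconstructs the original stream."""
    pointers = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # identical blocks are stored only once
        pointers.append(digest)
    return pointers


def restore(pointers, store):
    """Rebuild the original data by following the hash pointers."""
    return b"".join(store[p] for p in pointers)
```

With highly repetitive data, such as nightly backups of files that rarely change, the block store grows far more slowly than the raw data being backed up, which is exactly the savings deduplicating backup software aims for.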
Security concerns can be distilled to three words: confidentiality, integrity, and availability. In the case of confidentiality, security comes down to the question: How do we ensure that private and sensitive data is accessed only by those with legitimate reasons to have it? To maintain the integrity of data, we have to ensure that only processes that follow established protocols can change data. This translates into the question: How do we ensure that no one tampers with data, for example, by making an unauthorized change to a revenue statement or deleting entries from a system log file? Availability is a bit different from the other two security fundamentals. In one sense, systems administration is all about ensuring availability. In a security context, though, it has more to do with preventing adverse events from disrupting services (think Denial of Service—DoS—attacks), but it also includes recovering from adverse events (for example, restoring from backups made prior to a malware infection).
The challenge of growing volumes of data has not introduced security responsibilities; it has only made them more difficult. Let's just consider some of the ways that increasing volumes of data can tax policies and procedures.
One of the ways we protect confidentiality and integrity of data is with the use of access controls. These consist of three parts:
Figure 41 shows the basic security dialog on the Windows file system.
Figure 41: Windows supports several types of privileges on files, which are used to preserve confidentiality and integrity of file data.
With growing volumes of data, these simple building blocks for protecting confidentiality and integrity become more difficult to apply, track, and monitor. Some of the reasons for this include:
Ironically, with growing volumes of data come growing challenges to protecting the data in the first place.
With more data come more servers, more storage, and potentially more applications, and this ultimately leads to more potential points of failure. Recovery management practices, such as backups and disaster recovery planning, can mitigate the risk of losing data to a hardware failure, human error, natural disaster, and in some cases a malicious attack or other security breach. Another area that should be considered is application vulnerabilities (this applies to protecting confidentiality and integrity as well).
As noted earlier, one of the reasons for growing data volumes is new applications, both customer‐facing applications and internally‐oriented systems such as decision support applications. Each of these new applications increases the options available to malicious attackers looking to either steal private and confidential data or disrupt services. This is called "increasing the attack surface" in security parlance, and it essentially means that the more applications and the more complexity, the more opportunities for vulnerabilities.
One thing we should understand is that attackers do not need detailed knowledge of our applications (although that helps). Automated vulnerability scanning tools can be used to detect vulnerabilities to well‐established attack methods such as cross‐site scripting attacks, which exploit weaknesses in Web applications to compromise them. Another class of tools, known as fuzzers, probes application programming interfaces looking for exploitable errors. Fuzzers, for example, can generate random input of varying sizes to detect unhandled errors in applications accepting user input.
The threat is real. The days of cyber‐vandalism look benign in retrospect. Identity theft and credit card fraud are real threats, but a larger, more costly threat is to businesses, government agencies, and other organizations with valuable sensitive information. Recent Congressional hearings on threats to cyber security summarized the situation as "computer‐based network attacks are slowly bleeding US businesses of revenue and market advantage" (Source: Elinor Mills, "Experts Warn of Catastrophic Cyberattack" at http://news.cnet.com/8301‐27080_3‐10458759‐245.html). Businesses are now facing the kind of sophisticated, long‐term attacks once limited to governments; see "Report Details Hacks Targeting Google and Others" at http://www.wired.com/threatlevel/2010/02/apt‐hacks/ for a glimpse into the world of advanced persistent threats.
The growing volumes of data increase the amount of data to be protected while likely being accompanied by an increasing number of applications for manipulating that data. For systems administrators, this means two things:
Systems administrators not only have to protect the growing volumes of data but also help control that growth by utilizing information life cycle management practices.
Figure 42: Increasing amounts of data combined with increasing numbers of applications expand the opportunities for exploiting existing threats to compromise confidentiality, integrity, and availability of data.
With the growing volumes of data naturally comes a response from those who are tasked with managing it. The term "information life cycle management" is used to describe an array of management practices designed to rationalize the process of creating, collecting, storing, and in some cases destroying data. Information life cycle management makes use of tools and technologies, such as backup and recovery software and information classification systems, but it is primarily a business practice.
In its most basic form, information life cycle management answers key questions about data within an organization:
To answer these questions, we have to look at the business case for keeping data. There are several important drivers:
These drivers show the range of reasons we generate, store, and maintain data. Now let's turn our attention to implementing an information life cycle management practice.
The first step in information life cycle management is differentiating data so that we can treat data according to its value to the organization. For example, confidential engineering diagrams may need to be encrypted when they are backed up; whereas, the contents of the public Web site do not. Classifying data should include determining
Classifying data would be a tedious and costly operation without automation. Fortunately, with Windows Server 2008 R2, additional capabilities in File Server Resource Manager provide the tools we need to classify data efficiently. Windows Server 2008 R2 includes the File Classification Infrastructure (FCI), which can classify files based on attributes such as:
These properties can be used to execute particular commands based on the classification. For example, if a file is on a high‐performance disk array but has not been accessed for more than 2 years, it may be moved to a slower, less expensive disk archive. In addition, the FCI provides reports along with the ability to apply policies according to classifications.
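In practice you would configure FCI classification rules and file management tasks through File Server Resource Manager rather than write them by hand, but the age-based tiering idea is easy to see in a standalone sketch. The function name, directory layout, and two-year threshold below are all hypothetical:

```python
import os
import shutil
import time

TWO_YEARS = 2 * 365 * 24 * 3600  # threshold in seconds; pick to match policy


def classify_by_last_access(root, archive, now=None):
    """Walk `root` and move any file whose last-access time is older
    than two years into `archive`. Returns the list of moved paths."""
    now = now or time.time()
    os.makedirs(archive, exist_ok=True)
    moved = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > TWO_YEARS:
                dest = os.path.join(archive, name)
                shutil.move(path, dest)  # relocate stale file to cheaper storage
                moved.append(dest)
    return moved
```

FCI goes further than this sketch, of course: it attaches persistent classification properties to files and drives reporting and policy enforcement from them, rather than acting on timestamps alone.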
This step is also part of a security management process. The goal here is to ensure that only users with legitimate business need for data have access to that data. This information can be used to
The next step is defining recovery requirements for data. This includes defining Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs). The RPO specifies the point in time to which data should be recoverable, such as any day in the past week or any week in the past month. RTOs define how long the recovery operation should take. Critical systems may have short recovery windows, such as a few minutes, while less critical systems may not be needed for days.
With so much emphasis on protecting data from tampering and keeping multiple backup copies so that we can restore, it is easy to forget about destroying data. Let's face it, not every email we write, spreadsheet we put together, or database we compile is worthy of study by some future archeologist. It often is not worth keeping after a few years. Some examples of data that can be purged include:
The first four steps address the organizational factors of information life cycle management. The fifth step focuses on implementing those policies.
The FCI mentioned in Step 1 for classifying data can also be used to enforce policies. (Policies are essentially rules that fit the pattern "IF a certain set of conditions is met, THEN execute this script.") Policies themselves have to be managed, so once they are defined in the FCI, be sure to:
When dealing with unstructured data, policies may be applied in unintentional ways. For example, a policy may specify that any document with the word "confidential" in the title be categorized as proprietary information. This rule would apply to a document entitled "Introduction to Data Classification Policies for Documents Ranging from Public to Confidential."
Information life cycle management is no panacea for the challenges of data growth. At best, these practices will help:
A couple of problems will continue to challenge information life cycle management. First, automated classification techniques are not foolproof and their results should be reviewed. Second, data is easily copied both informally to employees' directories and workstations and formally to backups and failover servers. Destroying old data may never be 100% successful; copies may linger for years in unexpected places.