Backing Up the Exchange Server 2010 Environment : Understanding the Importance of Backups & Establishing Service Level Agreements

2/14/2011 8:55:00 AM

Understanding the Importance of Backups

Through various improvements and changes in the JET database engine and storage, Microsoft Exchange Server 2010 offers the most stable and resilient database of any Exchange Server implementation to date. The database can recover from dirty shutdowns, hardware failures, and power outages. The database enables both users and administrators to recover recently deleted items. Exchange Server 2010 even introduces new replication options that result in up to 16 independent copies of mailbox data spread across the world. However, even with all this functionality, it is still necessary to perform backups of the Exchange server to address long-term archival, legal discovery, and protection against malicious attacks on the Exchange Server environment.

Traditionally, backups are performed and maintained for three primary purposes:

  • Recovering deleted items past the retention period

  • Offline extraction of messages

  • Disaster recovery

In modern environments, a fourth common purpose can be added: electronic discovery. Legal departments regularly need to access historic email for groups of users to use in legal proceedings. Because maintaining a deleted item retention of multiple years isn’t realistic, it falls to the traditional backup to provide this data over the years. To support these functions, it is critical not only to perform regular backups, but also to understand what you are backing up, how often you are backing it up, and exactly what recovery scenarios you can support. It is equally critical to ensure that a retention policy is clearly defined and that it is supported by both the information technology and legal departments within a company. The job of IT is to use technology to support and enforce the retention policies set forth by legal. This means not only ensuring that enough data is available, but also making sure that no data older than the policy allows remains in the environment.
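The two-sided nature of a retention policy (keep enough, but nothing beyond what is allowed) can be illustrated with a minimal Python sketch. The function name, the two-year retention period, and the backup dates are all hypothetical values chosen for illustration; a real policy would come from the legal department and be enforced by the backup product itself.

```python
from datetime import date, timedelta
from typing import Dict, List


def split_by_retention(
    backup_dates: List[date], retention_days: int, today: date
) -> Dict[str, List[date]]:
    """Separate backups still inside the retention window from those past it."""
    cutoff = today - timedelta(days=retention_days)
    keep = [d for d in backup_dates if d >= cutoff]
    expire = [d for d in backup_dates if d < cutoff]
    return {"keep": sorted(keep), "expire": sorted(expire)}


# Hypothetical backup set and a hypothetical two-year retention policy.
backups = [date(2008, 1, 15), date(2010, 6, 1), date(2011, 2, 1)]
result = split_by_retention(backups, retention_days=365 * 2, today=date(2011, 2, 14))

print("Keep:  ", result["keep"])
print("Expire:", result["expire"])
```

Anything in the "expire" list is data that the policy no longer permits to be resident, so it is a candidate for removal rather than simply an old backup to ignore.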

The goal of this chapter is to show an administrator how to do the following:

  • Evaluate their needs for backup

  • Capture all the necessary information for disaster recovery

  • Properly document their environment

  • Determine a reasonable service level agreement (SLA)

  • Design their backup strategy to support that SLA

  • Build policies and procedures around backup processes

  • Determine what data to back up

  • Take advantage of new backup technologies available in Exchange Server 2010


Exchange Server 2010 running on Windows Server 2008 does not provide any native backup functionality. Exchange Server 2010 supports only Volume Shadow Copy Service (VSS)-based backup technologies. This means that Exchange Server 2010 requires a separate product to perform backups. This can be at the storage level, such as NetApp’s SnapManager for Exchange Server, or it can be at the application level, such as Microsoft’s System Center Data Protection Manager (DPM) 2010 or Symantec’s NetBackup.

Establishing Service Level Agreements

The most common question from Exchange Server administrators is “How should I be doing my backups?” The answer to this question is quite simple. You should be doing them so that they support your service level agreements around recoverability and retention for Exchange Server services.

Based on this concept, it quickly becomes apparent that the first step in planning out your backups is to determine exactly what you’ve committed yourself to. This is commonly referred to as a service level agreement or simply an SLA.

Establishing a Service Level Agreement for Each Critical Service

Exchange Server 2010 is often deployed so that roles are distributed across multiple servers. This distribution of roles might vary from site to site. However, the SLAs will likely remain constant across the enterprise as the goal is actually to keep messaging alive and available to the end users.

It is important to understand the implication of SLAs for each aspect of Exchange Server 2010 because the SLA drives your design and must be considered upfront and not as an afterthought to a deployed Exchange Server 2010 environment. Too often, IT groups implement Exchange Server and later go back to determine how quickly they can restore services or rebuild a failed system. The correct methodology is to determine recovery time objectives and uptime goals and then design the architecture to enable those goals.

Determining SLAs for Mailbox Servers

One of the most important aspects of Exchange Server 2010 is the mailbox server. If the mailbox server isn’t up, users can’t access their mail. This is usually the first thing that triggers the help desk phone to ring. Most companies start their SLAs around the mailbox servers. In most environments, a two-hour recovery for a mailbox database is acceptable. This means that if your database fails, you need to recover that data within two hours. If you know that your system is capable of restoring 100GB of data per hour, you know that, based on your backup process, you can support only 200GB per database.

If your SLA for an entire mailbox server recovery is four hours and you know that it takes two hours to rebuild a new server with Exchange Server 2010, you have only two hours to restore data; based on the preceding example, this means you can have only 200GB of data on the server. If you planned to allow users 2000MB (roughly 2GB) of storage each, this limits the server to about 100 users. If you want to support more users per server, you need to either alter the SLA or change your backup strategy so that you can restore more data in the same period of time. A faster restore path is what enables you to safely support large numbers of users with good SLAs. This is where you have to balance the cost of the backup/restore system against the cost of adding servers.
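The arithmetic in the two preceding examples can be captured in a short Python sketch so that it is easy to rerun with your own numbers. The class and field names are hypothetical, and the 2000MB quota is treated as 2GB for simplicity; the restore rate should come from a measured test restore, not from a vendor specification.

```python
from dataclasses import dataclass


@dataclass
class RestorePlan:
    restore_rate_gb_per_hour: float  # measured throughput of the restore process
    database_rto_hours: float        # SLA for recovering a single mailbox database
    server_rto_hours: float          # SLA for recovering an entire mailbox server
    server_rebuild_hours: float      # time to rebuild the server before restoring data
    mailbox_quota_gb: float          # storage allowed per user

    def max_database_gb(self) -> float:
        """Largest database that can be restored within the database SLA."""
        return self.restore_rate_gb_per_hour * self.database_rto_hours

    def max_server_gb(self) -> float:
        """Most data a server can hold and still meet the server SLA."""
        restore_window = self.server_rto_hours - self.server_rebuild_hours
        return max(restore_window, 0.0) * self.restore_rate_gb_per_hour

    def max_users_per_server(self) -> int:
        """Number of full-quota mailboxes that fit in the server limit."""
        return int(self.max_server_gb() // self.mailbox_quota_gb)


# Figures from the text: 100GB/hour restore rate, two-hour database SLA,
# four-hour server SLA, two-hour rebuild, 2000MB (about 2GB) per user.
plan = RestorePlan(
    restore_rate_gb_per_hour=100.0,
    database_rto_hours=2.0,
    server_rto_hours=4.0,
    server_rebuild_hours=2.0,
    mailbox_quota_gb=2.0,
)

print("Max database size (GB):", plan.max_database_gb())
print("Max data per server (GB):", plan.max_server_gb())
print("Max users per server:", plan.max_users_per_server())
```

With the figures from the text, this reproduces the 200GB database limit, the 200GB server limit, and the 100-user ceiling. Doubling the restore rate or relaxing the server SLA shows immediately how many more users a server could carry.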

Luckily, Exchange Server 2010 offers technologies that enable you to run a significantly tighter SLA. For example, Database Availability Groups enable a replica server to take over for a failed server within a matter of minutes. This would be a prime technology to implement if you have an SLA that allows mailbox access to be down for only a number of minutes.

Determining SLAs for Client Access Servers

Another major component of Exchange Server 2010 is the Client Access server (CAS). These are the systems that enable mobile devices and web browsers to access users’ email. In Exchange Server 2010, the functions of the Client Access servers are greatly extended. Exchange Server 2010 moves MAPI client connectivity to the middle tier, which places much greater importance on the CAS role always being available. When determining SLAs for this function, it is helpful to view the service and the servers as two entities. Although you likely want high availability on the service, you can worry less about the servers individually if they are designed with redundancy in mind. So, if you have at least one more Client Access server than you need for performance purposes, you have plenty of time to rebuild one server if it fails because another is already taking up the load. Keep this in mind when designing your Exchange Server environment. Also keep in mind that the data on a CAS is mostly static. Building a new CAS is often faster than restoring an existing one.

Determining SLAs for Edge Transport Servers

For systems such as the Edge Transport servers in Exchange Server 2010, it is more useful to view the SLA for this role as being for the service as opposed to the servers themselves. In the case of Edge Transport servers, the service they provide is to send and receive external email to and from the Internet. In this sense, most companies try to enforce a fairly aggressive SLA on the service itself. For example, if Internet mail connectivity were to fail, they’d want the service restored within one to two hours. In most environments, this is fairly easy to accomplish because there are typically two or more Edge Transport servers to provide redundancy and minimize wide area network (WAN) traffic. In the case of the SLAs on the servers themselves, typically a one-day recovery is acceptable. Because the Edge Transport servers don’t hold any unique data, they can easily be replaced if a failure occurs.

Remember that the Edge Transport service is dependent on the network itself. If the Edge Transport servers are running but the Internet connection is down, they can’t do their job. One easy way to improve availability and thus support a tight SLA for Edge Transport is to have multiple entry points from the Internet. This can protect against Internet or Internet service provider issues by enabling Internet mail to enter from another location and simply ride the corporate WAN to reach the appropriate Exchange Server 2010 server. The simplest way to do this is to advertise multiple Mail Exchanger (MX) records in the Domain Name System (DNS) on the Internet.
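To see why multiple MX records help, the following Python sketch models how a sending mail server typically works through them: it tries hosts in ascending preference order and moves to the next one when a host is unreachable. The hostnames, preference values, and the reachability check are hypothetical; real senders also randomize among hosts with equal preference, which this sketch does not attempt.

```python
from typing import Callable, List, Optional, Tuple

# (preference, mail host); a lower preference value is tried first.
MxRecord = Tuple[int, str]


def delivery_order(records: List[MxRecord]) -> List[str]:
    """Return mail hosts in the order a sender would normally try them."""
    return [host for _, host in sorted(records)]


def first_reachable(
    records: List[MxRecord], is_reachable: Callable[[str], bool]
) -> Optional[str]:
    """Pick the most preferred host that currently accepts connections."""
    for host in delivery_order(records):
        if is_reachable(host):
            return host
    return None


# Hypothetical MX records for two Internet entry points.
records = [(20, "edge2.example.com"), (10, "edge1.example.com")]

# Simulate the primary site's Internet connection being down.
down = {"edge1.example.com"}
chosen = first_reachable(records, lambda host: host not in down)

print("Delivery order:", delivery_order(records))
print("Host used while edge1 is down:", chosen)
```

In this simulation, mail falls through to the second entry point while the first is unavailable, which is the behavior that lets inbound mail continue over the corporate WAN.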

Determining SLAs for Hub Transport Servers

The role of the Hub Transport server is to transfer mail from one site to another connected site. When a Hub Transport server fails, the site it served is effectively cut off from other sites. Moreover, because the architecture of Exchange Server 2010 requires that all messages first pass through a Hub Transport server, if this role is unavailable in a site, users cannot send mail to each other even when they are hosted on the same mailbox server. A company would therefore most likely want a fairly aggressive SLA on the Hub Transport servers. In most environments, the Hub Transport server role is combined with other roles because, in most cases, it doesn’t justify a dedicated server. As a result, the SLA for recovery is often overridden by the SLA for another role hosted on the same server. It is recommended that, when possible, two or more systems per site host the Hub Transport server role.
