Exchange Server 2010 : Operating Without Traditional Point-in-Time Backups

10/10/2010 10:06:18 AM
An Exchange infrastructure that uses Exchange Native Data Protection is an infrastructure in which you don't need to perform traditional point-in-time backups.

The move toward an infrastructure without traditional point-in-time backups is mainly cost-driven: you save the money you would otherwise spend on expensive backup solutions and on storage facilities for tapes or disks. To deploy an infrastructure without traditional point-in-time backups, you need to consider the following:

  • Create multiple database copies using DAGs to protect your database by having it copied to different Exchange servers—maybe even to different datacenters—to reduce the risk of losing a database because of a malfunction such as a disk crash.

    Note: As a best practice, you should create at least three database copies if you store them on just a bunch of disks (JBOD). At least two database copies are required if you use additional data protection on your disks, such as RAID. You should also consider adding lagged database copies.

  • Provide deleted item recovery by implementing Single Item Recovery and a hold policy. This lets you recover changed or deleted items when a user requests it. Traditionally, recovering items required restoring a brick-level backup or a full database. With Single Item Recovery, you no longer need to do either.

  • To protect your databases from logical corruption, implement a lagged database copy that replays log files after a delay of up to 14 days.

If you implement all these areas, you're well on the way to a backup-less infrastructure. However, don't forget about Public Folders, which this approach does not cover. Public Folder databases can be replicated to multiple servers, so you maintain multiple copies of them in your environment. But what happens if somebody deletes a Public Folder item or folder by mistake? As mentioned in the previous section, you can use deleted item retention to recover deleted items or folders. The big question is whether that solution is sufficient for your organization. If the answer is no, you might still need a third-party backup solution for your Public Folders.

Notes From The Field: An Exchange 2010 Implementation Without Traditional Point-in-Time Backups

Sascha Schmatz

Global Service Manager Messaging, Quimonda AG, Germany

We implemented a fully backup-less solution with Exchange 2010. My company has 12,000 employees spread over 12 locations (North America, Europe, and Asia). All Exchange 2010 servers are centralized at a single site in Germany. To prevent problems resulting from going backup-less, we defined the following corporate policies for the messaging service:

  • A user can recover deleted messages for 14 days; older deleted messages are no longer available. VIPs, a group of fewer than 100 users, get 30 days.

  • Public Folders are only used for Free/Busy Information and nothing else.

To realize a backup-less messaging system, we implemented DAGs with two database copies and one lagged database copy (lag time: 14 days) on RAID. The database copies are stored on different storage systems in the same location, which ensures that a disk failure does not affect all copies at the same time. Because we do not run any backups, we enabled circular logging for all databases. The lagged database copy protects us from logical database corruption, which is why we could easily give up doing backups.

For single item restore, we configured Single Item Recovery with a 14-day retention period and enabled it for every mailbox. This increased user satisfaction, because users previously could recover deleted items for only four days. Using DAGs and Single Item Recovery enabled us to move away from the backup-to-disk solution and snapshots that we had to maintain for Exchange 2007. As you can guess, going backup-less saved us a lot of money without compromising our data-protection policies.

1. Using Lagged Database Copies

A lagged database copy is a database copy that uses a replay lag time to delay committing the log files to the database. Delaying log replay lets you recover the database to a point in the past, up to a maximum of 14 days. Because 14 days is the fixed upper limit for a lagged copy, this might not be the right solution for every scenario. This is especially true for scenarios in which you need to restore items older than 14 days.

Lagged database copies can protect you from the extremely rare cases of logical corruption described in the following scenarios:

  • Database Logical Corruption This is the case when the database page checksum matches but the data on the page is logically wrong. It can occur when ESE attempts to write a database page and the operating system storage stack returns success even though the data either never makes it to disk or is written to the wrong place. This behavior is called a lost flush. To protect against lost flushes, ESE includes a lost flush detection mechanism in the database, together with a single database page restore feature.

  • Store Logical Corruption This means data is added, deleted, or modified in a way the user does not expect, so the user views it as corruption. Typically this is caused by a third-party application that issues a series of valid MAPI operations against the store. An example is a malfunctioning archiving solution that changes all of a user's message items. Single Item Recovery or retention hold provides some protection against this case, because all changed items are kept and can therefore be restored. However, especially when large amounts of data are changed, it might be easier to recover the database to a point in time before the corruption occurred.

  • Rogue Admin Protection This is the case where the organization seeks protection against malicious or rogue administrators, particularly administrators who intentionally add, change, or remove data in a way that users consider undesirable. To protect against this, lagged database copies can be placed on a server that is under separate administrative control.

If you use multiple database copies and Single Item Recovery, only the extremely rare case of catastrophic store logical corruption remains relevant. Depending on which third-party applications you use and your history with store logical corruption, lagged database copies may or may not be worthwhile for you.

In the following scenarios lagged database copies can be used to recover data:

  • Recovering a log file that was deleted on the source

  • Rolling back to a point in time because of a virus outbreak

  • Recovering a deleted item that is outside the retention time

1.1. Planning Lagged Database Copies

When planning for lagged database copies, carefully consider their implications for your storage design. Every lagged database copy needs sufficient disk space to hold the database as well as the log files for the configured lag time.

For example, at Microsoft, 14 days of logs for one database result in about 60,000 log files or 60 GB of data. The log storage design for the lagged database copy needs to accommodate this. In addition to the space requirements, consider the following criteria when deciding the replay lag time:

  • How long does it take you to identify a logical database corruption? Include non-working days such as weekends in your estimate. For example, suppose you configure a replay lag time of two days and a corruption occurs at the start of a weekend. The corrupted changes may already have been replayed into the lagged copy by the time you are back on Monday.

  • Consider the maximum replay lag time that makes sense for you. Fourteen days is the maximum possible, but do you really need the full 14 days? In most cases, 7 days should be sufficient to identify a corruption and recover by using the lagged database copy.

  • Don't underestimate how the space requirements grow as the replay lag time increases. In the previous Microsoft example, you needed to reserve 60 GB for 14 days. A lag of 7 days would therefore save you about 30 GB of storage per database.

  • The duration of replaying the log files is also worth considering. You should plan a test to replay all log files; this might take a considerable amount of time. Replaying 14 days of logs might require several hours before the database is up to date.
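The space and replay-time trade-offs in the list above come down to simple arithmetic. The following Python sketch is illustrative and is not an Exchange tool. It reproduces the Microsoft example under these assumptions:

  • Log files are 1 MB each (the Exchange 2010 transaction log size).
  • 1 GB is counted as 1,000 MB, to match the round numbers in the text.
  • Replay runs at about 7.2 GB of logs per hour, the rule of thumb for 7,200-rpm JBOD disks given in the recovery steps later in this section.

Replace these figures with values measured in your own environment.

```python
"""Rough sizing of log storage and replay time for a lagged database copy.

Illustrative model only. Assumptions: 1 MB log files, a roughly constant
number of logs per day, 1 GB counted as 1,000 MB, and a replay rate of
about 7.2 GB per hour (a rule of thumb, not a measured value).
"""

LOG_FILE_MB = 1           # Exchange 2010 transaction log file size
MAX_LAG_DAYS = 14         # maximum replay lag time
REPLAY_GB_PER_HOUR = 7.2  # assumed replay throughput


def log_storage_gb(logs_per_day: float, lag_days: float) -> float:
    """Disk space in GB needed to hold lag_days worth of log files."""
    if not 0 <= lag_days <= MAX_LAG_DAYS:
        raise ValueError("replay lag time must be between 0 and 14 days")
    return logs_per_day * lag_days * LOG_FILE_MB / 1000


def replay_hours(log_gb: float, gb_per_hour: float = REPLAY_GB_PER_HOUR) -> float:
    """Approximate hours needed to replay log_gb of logs into the database."""
    return log_gb / gb_per_hour


if __name__ == "__main__":
    logs_per_day = 60_000 / 14  # Microsoft example: about 60,000 logs in 14 days
    for days in (14, 7):
        gb = log_storage_gb(logs_per_day, days)
        print(f"{days} days: {gb:.0f} GB, ~{replay_hours(gb):.1f} h to replay")
    # 14 days: 60 GB, ~8.3 h to replay
    # 7 days: 30 GB, ~4.2 h to replay
```

Halving the lag halves both the log storage and the worst-case replay time. This is why the 7-day suggestion above is attractive when 14 days is not strictly required.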

Besides the replay lag time and the storage design, you should carefully plan the following aspects:

  • How many lagged database copies do you need? Normally one lagged copy should be sufficient, but maybe you want more copies because of your disaster-recovery requirements. If lagged database copies are a critical piece of your disaster-recovery strategy, you will probably want to put them on a RAID system or have multiple copies of them.

  • Where should you store the lagged database copies: on a server at the same site or offsite? This decision directly affects the time needed to recover the lagged database copy, because you must take the available bandwidth into account when the copies are stored offsite.

  • On which Exchange server should you place the lagged database copies? You can place them on the same server that stores your active database copies, or you can dedicate a single server to all lagged database copies, such as a dedicated public folder server.

  • Lagged database copies should always be blocked from activation and should have the highest activation preference number available. This prevents them from being activated automatically, whether by mistake or as a result of a system failure.

You should make the best decision for your own situation. Don't start with the maximum of 14 days for replay lag time, but make a decision that suits your needs considering both disaster recovery and budgetary (or storage design) aspects.
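The activation rules in the last planning bullet can be checked mechanically. The Python sketch below is a hypothetical audit helper, not part of Exchange. The CopyConfig record and its field names are assumptions chosen for illustration; in practice you would fill them from EMS output such as Get-MailboxDatabaseCopyStatus. The helper treats any copy with a nonzero replay lag as lagged. It then reports lagged copies that are not activation-suspended, or whose activation preference number is not higher than that of every non-lagged copy.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass
class CopyConfig:
    """Hypothetical, simplified view of one copy of a mailbox database."""
    server: str
    activation_preference: int   # 1 = first choice for activation
    replay_lag: timedelta        # timedelta(0) means "not a lagged copy"
    activation_suspended: bool   # True after Suspend-MailboxDatabaseCopy -ActivationOnly


def lagged_copy_problems(copies: list[CopyConfig]) -> list[str]:
    """Check the lagged copies of one database against the activation rules."""
    regular = [c for c in copies if c.replay_lag == timedelta(0)]
    lagged = [c for c in copies if c.replay_lag > timedelta(0)]
    # Every lagged copy must come after all regular copies in activation order.
    last_regular = max((c.activation_preference for c in regular), default=0)
    problems = []
    for c in lagged:
        if not c.activation_suspended:
            problems.append(f"{c.server}: lagged copy is not blocked from activation")
        if c.activation_preference <= last_regular:
            problems.append(
                f"{c.server}: activation preference {c.activation_preference} "
                "is not higher than every non-lagged copy"
            )
    return problems
```

An empty result means every lagged copy of the database satisfies both rules.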

Note: Lagged database copies cannot be repaired by using the single page restore feature. If a lagged database copy hits a page corruption, you have to reseed it to repair it, and you consequently lose the lag on that copy. It is therefore a best practice either to deploy lagged database copies on RAID or to create multiple lagged database copies when using JBOD.

1.2. Deploying Lagged Database Copies

You configure a lagged database copy using the EMS by following these steps:

  1. Create a database copy on the target server where you want to store the lagged database copy.

  2. Configure the ReplayLagTime of the database copy. The following cmdlet configures a lag time of 7 days (the value uses the days.hh:mm:ss format) for the copy of database DAG01-BERLIN-01 on Berlin-MB01: Set-MailboxDatabaseCopy -Identity DAG01-BERLIN-01\Berlin-MB01 -ReplayLagTime 7.00:00:00

  3. Block auto activation of this database to make sure it is not activated by mistake. You use the following cmdlet to perform this task: Suspend-MailboxDatabaseCopy <database\server> -ActivationOnly -Confirm:$false.

  4. If you use a dedicated Exchange server that hosts all lagged database copies, you can also block automatic activation at the server level by using the following cmdlet: Set-MailboxServer <mailbox server> -DatabaseCopyAutoActivationPolicy Blocked.

When the lagged database copy is configured, you will see the replay queue length of the lagged database copy increase, as shown in Figure 1.

Figure 1. Viewing a lagged database copy in EMC
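The ReplayLagTime value in step 2 uses the time span notation days.hours:minutes:seconds. Both 7.0:0:0 and 7.00:00:00 mean 7 days, whereas 7:00:00 means 7 hours, which is an easy mistake to make. The hypothetical Python helper below parses the subset of that notation used in this section and rejects values above the 14-day maximum. It is a sketch under those assumptions and does not ship with Exchange. A provisioning script could use something like it to sanity-check values before passing them to Set-MailboxDatabaseCopy.

```python
import re
from datetime import timedelta

MAX_REPLAY_LAG = timedelta(days=14)

# [d.]h:m[:s], the subset of the time span syntax used for ReplayLagTime here
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{1,2})(?::(\d{1,2}))?$")


def parse_replay_lag(value: str) -> timedelta:
    """Parse a 'd.hh:mm:ss' string and enforce the 14-day maximum."""
    m = _TIMESPAN.match(value.strip())
    if not m:
        raise ValueError(f"not a d.hh:mm:ss value: {value!r}")
    # Missing optional parts (days, seconds) default to zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError(f"out-of-range component in {value!r}")
    lag = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    if lag > MAX_REPLAY_LAG:
        raise ValueError("ReplayLagTime cannot exceed 14 days")
    return lag
```

Because the day component is optional, a missing "7." prefix silently yields hours instead of days. The helper is mainly meant to catch that mistake.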

Note: To verify that none of the lagged database copies can be activated automatically, run Get-MailboxDatabaseCopyStatus -Server <name> | ft Name, Act* and make sure that the ActivationSuspended property is set to True.

1.3. Using a Lagged Database Copy to Recover Data

Using a lagged database copy to get to a specific point in time is rather difficult, because you have to know the exact time frame in which something occurred. In addition, no tools are available to tell you which log file contains which database change. You therefore have to estimate which log files need to be replayed to bring the database to the point in time you require. In effect, you make an educated guess, take the database and log files, and replay the logs manually before you can recover data from a recovery database.

Recovering a lagged database copy to a specific point in time is a manual process, so follow these steps to retrieve the data you're looking for:

  1. Suspend replication to the lagged database copy by using the Suspend-MailboxDatabaseCopy <database>\<server> cmdlet.

    Note: You should now decide whether to back up or copy the database and log files to a different location, so that they are still available if you don't reach the right point in time. Alternatively, you can create a VSS snapshot by using the VSSAdmin CREATE SHADOW /For=<Volume that includes database> command.

  2. Use Windows Explorer to delete or move all log files whose time stamps are newer than the point in time you want to go back to. For example, suppose the lagged copy holds 14 days of log files and you want to return the database to its state 10 days ago. You then commit to the 14-day-old database only the log files that are 10 or more days old. To do so, delete or move all log files with a time stamp newer than 10 days ago (day 9 or newer).

  3. Note the filename of the database's checkpoint (.chk) file, which is normally something like E00.chk, and then delete the file.

  4. Run the Eseutil.exe /r E00 /a command, replacing E00 with the base name of the .chk file you noted in step 3. Depending on the number of log files that need to be replayed, this might take several hours. As a rule of thumb, on normal 7,200-rpm 3.5-inch JBOD disks you can expect to replay approximately 7.2 GB of transaction log files per hour. The exact value, of course, depends on local factors such as storage performance and CPU.

    Note: If you want to measure how long replaying the log files into the database takes, you can use Jetstress 2010, which includes a Recovery Performance measure option for this exact situation. You can download Microsoft Exchange Server Jetstress 2010 (64 bit) at

  5. When Eseutil finishes, the database is in a clean shutdown state. You can now decide how to continue:

    1. You can create a recovery database using this database, mount it, and recover the data.

    2. You can replace the corrupt database files with the lagged database files and mount the database.
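Steps 2 through 4 can be partly scripted. The Python sketch below is a set of hypothetical helpers, not a Microsoft tool. It does three things:

  • It computes the cutoff time for a chosen number of days back.
  • It moves, rather than deletes, every *.log file whose last-modified time is newer than that cutoff into a holding folder.
  • It builds the Eseutil command line from the checkpoint file name noted in step 3.

The sketch assumes that a log file's last-modified time is the time stamp step 2 refers to. It also assumes that you run it against the lagged copy's log folder only after suspending the copy as in step 1.

```python
import shutil
from datetime import datetime, timedelta
from pathlib import Path


def replay_cutoff(now: datetime, days_back: float, lag_days: float) -> datetime:
    """Point in time to roll the lagged copy forward to (step 2)."""
    if not 0 <= days_back <= lag_days:
        # The lagged copy's database is already about lag_days behind,
        # so it cannot be brought to a point older than that.
        raise ValueError("target point lies outside the lagged copy's window")
    return now - timedelta(days=days_back)


def move_logs_newer_than(log_dir: Path, cutoff: datetime, holding_dir: Path) -> list:
    """Move *.log files modified after cutoff out of log_dir.

    Files are moved, not deleted, so a wrong guess can be undone by moving
    them back. Returns the names of the moved files in sorted order.
    """
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for log in sorted(log_dir.glob("*.log")):
        if datetime.fromtimestamp(log.stat().st_mtime) > cutoff:
            shutil.move(str(log), str(holding_dir / log.name))
            moved.append(log.name)
    return moved


def eseutil_recovery_command(chk_name: str) -> str:
    """Step 4 command line; the log prefix is the .chk base name (E00.chk -> E00)."""
    return f"Eseutil.exe /r {Path(chk_name).stem} /a"
```

For the 10-day example in step 2, you would compute replay_cutoff(datetime.now(), days_back=10, lag_days=14). You would then pass the result to move_logs_newer_than, together with the lagged copy's log folder and a holding folder of your choice. Moving instead of deleting keeps the approach consistent with the advice in step 1 to preserve the files in case your first guess is wrong.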

As you can see, several steps are involved, and the process is time-consuming because of the large number of logs that must be replayed. The process is not difficult, but it is not something you want to do on a daily or weekly basis because of the operational time required. Lagged copies were not designed for the deleted item recovery case. They were designed for the once-in-a-great-while scenario in which multiple database copies within a DAG, combined with retention hold, are not enough protection in a backup-less environment.

Note: As already mentioned, no tools are currently available from Microsoft that allow you to automate the process of recovering a lagged database copy to a specific point in time. However, third-party vendors may soon provide solutions for this situation. Check the Internet regularly for updates.

2. Backups and Log File Truncation

Log file truncation, that is, deleting the transaction log files that are no longer required for a successful database restore, takes place after a successful backup. But if you no longer perform traditional point-in-time backups, how do you make sure the log files are removed so they don't pile up? The short answer: by default, they are never removed.

For this reason, you need to configure log file truncation by enabling circular logging. You can enable circular logging on a database either in EMC or in EMS by using the Set-MailboxDatabase -Identity <DatabaseName> -CircularLoggingEnabled $True cmdlet.

When you enable circular logging on a database that has multiple database copies, you get a new type of circular logging called continuous replication circular logging (CRCL). CRCL behaves differently from the traditional circular logging known from Exchange 2007 and earlier.

CRCL is performed by the Microsoft Exchange Replication service, not the Microsoft Exchange Information Store service. Also, before removing a log file, CRCL must check whether the file is still required for log shipping and replay. This requires special logic to ensure that all database copies have processed the log file before it is removed. Traditional circular logging works differently: there, a log file was deleted as soon as it had been committed to the database.

When CRCL is enabled, log file truncation for database copies that are not lagged occurs when all of the following conditions are met:

  • The log file is below the checkpoint.

  • All other non-lagged database copies have replayed the log file into their databases.

  • The log file has been inspected by all database copies (including any lagged database copies).
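The three conditions for non-lagged copies can be expressed as one predicate. The Python model below is a hypothetical simplification, not the Replication service's actual logic or any Exchange API. Log files are represented by their generation numbers. Each copy is reduced to the highest generation committed to its database and the highest generation it has inspected. A log file may be truncated on a copy only if all three checks pass.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CopyState:
    """Hypothetical, simplified state of one database copy."""
    lagged: bool
    committed_through: int   # highest log generation committed to this copy's database
    inspected_through: int   # highest log generation this copy has received and inspected


def nonlagged_copy_can_truncate(generation: int, this_copy: CopyState,
                                other_copies: list[CopyState]) -> bool:
    """Model of the CRCL truncation conditions for a non-lagged copy."""
    # 1. The log file is below this copy's checkpoint.
    below_checkpoint = generation <= this_copy.committed_through
    # 2. All other non-lagged copies have replayed it into their databases.
    replayed_by_peers = all(c.committed_through >= generation
                            for c in other_copies if not c.lagged)
    # 3. Every copy, lagged ones included, has inspected it.
    inspected_by_all = all(c.inspected_through >= generation for c in other_copies)
    return below_checkpoint and replayed_by_peers and inspected_by_all
```

The third check is why a lagged copy can hold back truncation on the other copies only until it has inspected a log. It does not have to wait until it has replayed that log.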

For lagged database copies, log file truncation occurs when all of the following conditions are met:

  • The log file is below the checkpoint.

  • The log file is older than the sum of ReplayLagTime and TruncationLagTime.

  • The log file is already deleted on an active database copy and all copies agree on the deletion.
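The lagged-copy rules can be modeled in the same way. This minimal Python sketch is an illustration, not Exchange code. It reads "older than ReplayLagTime and TruncationLagTime" as older than the two values added together, which is an assumption worth verifying against Microsoft's documentation for your version. The function and parameter names are hypothetical.

```python
from datetime import timedelta


def lagged_copy_can_truncate(
    below_checkpoint: bool,
    log_age: timedelta,
    replay_lag_time: timedelta,
    truncation_lag_time: timedelta,
    deleted_on_active: bool,
    all_copies_agree: bool,
) -> bool:
    """Model of the CRCL truncation conditions for a lagged copy."""
    # Assumption: the age threshold is ReplayLagTime + TruncationLagTime.
    old_enough = log_age > replay_lag_time + truncation_lag_time
    return below_checkpoint and old_enough and deleted_on_active and all_copies_agree
```

With a 7-day ReplayLagTime and a 2-day TruncationLagTime, for example, an 8-day-old log stays on the lagged copy even after the active copy has deleted it.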

3. Reasons for Traditional Point-in-Time Backups

Even though Exchange Server 2010 supports backup-less scenarios, in some cases your organization may want to maintain its traditional backup methods. Keep in mind the following arguments when discussing the pros and cons of a backup-less infrastructure:

  • No Available DAGs Organizations that do not use DAGs need to consider traditional ways to back up their databases. A reason for not implementing DAGs is often that they are too expensive to deploy—DAGs require a Windows Server Enterprise Edition license.

  • Single Exchange Server Implementation Single Exchange Server implementations are not conducive to DAG usage, because a DAG requires additional server hardware. Traditional backups to disk or tape are the option to follow here.

  • Utilizing an Existing Backup Environment Your company might have an existing backup environment in which all other applications back up their data. In that case, its backup strategy might require Exchange to follow suit: even when you maintain multiple copies of your database, you may be required to keep a copy of it in your backup environment.

  • Compliance Requirements You typically use tape backups if you have an archival reason to preserve data for an extended time, as governed by compliance requirements. You also need to ensure that you can access the data in the future, especially if the storage is long term—sometimes up to 10 years.
