Roles and Responsibilities
With any process that is
likely to include more than one person, it is useful to clearly define
the roles and responsibilities of those people. This ensures that the
people involved know what is expected of them and they know who to go to
in various situations.
Separation of Duties
A
typical Exchange Server environment involves members from potentially
many groups. For example, one group might be responsible for Exchange
Server services and configuration, whereas another group might be tasked
with management of Windows and security patches. Often, yet another
group is responsible for performing backups of the systems. Each of
these groups must be aware of what other groups are doing. For example,
if the Windows group needs to install Windows patches on the Exchange
servers, the backup group also needs to be aware of this because they
might need to change the scheduling of the backup job. This type of
interdependency must be taken into account when configuring the backup
schedule.
Escalation and Notification
If a backup job fails,
it is critical for the support staff to know what they are supposed to
do and who they should contact. It is recommended to build a matrix of
common issues and create an escalation path for various events. It is
also quite useful to have those events automatically notify the
responsible party. For example, the server monitoring group might be
told that in the event of a backup failure, they should do the
following:
Contact the backup group to alert them of the failed job.
Contact the Exchange Server group to alert them of the failed job.
If neither group contacts you within 30 minutes, contact the IT manager.
If the IT manager doesn’t contact you within 60 minutes, contact the IT director.
By knowing who to call, it
is easier to get a qualified party to look at the issue and potentially
fix it in time to allow another backup job to be attempted
before the backup window expires.
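The example escalation path above can be written down as data so that a monitoring script, or simply the on-call documentation, always gives the same answer. The following is a minimal sketch in Python, not a feature of Exchange Server or of any monitoring product; the names EscalationStep, BACKUP_FAILURE_PATH, and contacts_notified are hypothetical, and the wait times are the ones from the example list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    contacts: tuple      # who to notify at this step
    wait_minutes: int    # how long to wait for a response before escalating


# Hypothetical path mirroring the example list in the text.
BACKUP_FAILURE_PATH = [
    EscalationStep(("backup group", "Exchange Server group"), 30),
    EscalationStep(("IT manager",), 60),
    EscalationStep(("IT director",), 0),
]


def contacts_notified(path, minutes_without_response):
    """Return everyone who should have been contacted so far, given the
    number of minutes that have passed without any response."""
    notified = []
    deadline = 0  # minute at which the current step becomes active
    for step in path:
        if minutes_without_response < deadline:
            break
        notified.extend(step.contacts)
        deadline += step.wait_minutes
    return notified
```

With this path, the IT manager is added to the list 30 minutes after the failure and the IT director 60 minutes after that, which matches the example. Keeping the matrix in one place also makes it easy to review when teams or contacts change.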
Developing a Backup Strategy
Developing an
effective backup strategy involves detailed planning around the
logistics of backing up the necessary information or data via backup
software, media type, and accurate documentation. To be truly effective,
a backup strategy should take into account all of the resources
available for recovery.
Along with
planning and documentation, other aspects of a backup strategy include
assigning specific tasks and responsibilities to individual IT staff
members. Consider each person’s strengths and area of expertise when
deciding who is best suited to back up a particular service or server
and to keep its documentation accurate and current.
What Is Important to Exchange Server Backups?
In general, the critical
thing to capture in an Exchange Server backup is any unique data whose
loss would impact users. This typically means that you need to back up
the mailbox databases, public folder databases, and the log files that
go with them. Files such as the operating system or the System State
data are less important, because most of a server’s Exchange Server
configuration is stored in Active Directory (AD) and can be recovered
from there. In the case of Database Availability Groups, the
backup of log files is less important because multiple copies of the
databases and logs are in other locations that can take over if the
primary replica fails. In these configurations, the primary purposes of
the backups are to provide long-term storage of data to protect
against deletion and to truncate log files so that servers don’t run out
of space and shut down Exchange Server.
Creating Standard Backup Procedures
Creating a regular
backup procedure helps ensure that the entire enterprise is backed up
consistently and properly on a regular basis. When a regular procedure
is created, the assigned staff members soon become accustomed to the
procedure because they are given a guide that walks through each
required step. If there is no documented procedure, certain items might
be overlooked and not be backed up, which can be a major problem if a
failure occurs. For example, a regular backup procedure for an Exchange
Server 2010 server might back up the Exchange Server databases on the
local drives every night, and perform a System State backup with Windows
Server Backup Features once a month and whenever a hardware change is
made to a server. These differences might be overlooked if no one is
following regular change control and documented procedures.
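The example procedure above can be expressed as a small rule that decides which jobs are due on a given night. This is only an illustrative sketch in Python: the function name jobs_due and the job labels are hypothetical, and "once a month" is assumed here to mean once per calendar month. It does not call Windows Server Backup or any Exchange Server tool.

```python
from datetime import date


def jobs_due(today, last_system_state, hardware_changed=False):
    """Return the backup jobs due tonight under the example procedure:
    databases every night, System State monthly or after a hardware change."""
    jobs = ["exchange databases"]  # nightly job, always due
    # Assumption: "monthly" means no System State backup yet this calendar month.
    new_month = (today.year, today.month) != (
        last_system_state.year,
        last_system_state.month,
    )
    if hardware_changed or new_month:
        jobs.append("system state")
    return jobs
```

Writing the rule down this way makes the hardware-change trigger explicit, which is the step most likely to be forgotten when the procedure exists only in someone's memory.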
Tip
It is a best
practice to add documentation updates into standard server change
control processes. This ensures that any modifications to server
configurations also get added into server build documents.
Protecting Data if a System Failure Occurs
Server
failures are the primary concern most organizations plan for, because a
complete system failure creates the most impact and, ultimately, a
scenario in which data needs to be restored from backup tape. Server
hardware failures include failed motherboards, processors, memory,
network interface cards, disk controllers, power supplies, and, of
course, hard disks. The impact of each of these failures can be minimized through the
implementation of RAID-configured hard disk drives, error-correcting
memory, redundant power supplies, or redundant controller adapters. In a
catastrophic system failure, however, it is likely that the entire data
backup would need to be restored to a new system or repaired server.
Because data is read from and
written to hard drives on a constant basis, hard drives are frequently
singled out as the most likely cause of a server hardware failure. To
address this, Windows Server 2008 supports hot-swappable hard drives and
RAID storage systems, allowing a failed drive to be replaced without
server downtime. However, this is possible only if the server chassis and disk
controllers support such a change. Windows Server 2008 supports two
types of disks: Basic disks, which provide backward compatibility, and
Dynamic disks, which enable software-level disk arrays to be configured
without a separate disk controller. Both Basic and Dynamic disks, when
used as data disks, can be moved to other servers easily. This provides
data or disk capacity elsewhere if a system hardware failure occurs and
the data on these disks needs to be made available as soon as possible.
Note
If
hardware-level RAID is configured, the controller card configuration
should be backed up using a utility available through the vendor.
With most array
controllers today, dynamic reading of the disk configuration can be done
if the disks are placed into a new system using the same disk order. If
this is not supported, the controller can be moved to the new systems,
or the configuration might need to be re-created from scratch to
complete a successful disk move to a new machine.
This process should
always be tested, verified, and documented in a lab environment before
being considered as a valid recovery option.
To protect against a
system failure, organizations need to have a full image backup that can
then be restored in its entirety to a new or repaired server system.
The restore steps should also be performed and documented in advance to
ensure that the restore can be completed and that administrators
understand the steps involved.
Protecting Data if a Database Corruption Occurs
Data recovery also is
needed if a database corruption occurs in Exchange Server. Unlike a
catastrophic system failure, which can be restored from the last tape
backup, data corruption creates a more challenging situation for
information recovery. If data is corrupt on the server system, a restore
from the last backup might also contain corrupt information in its
database, so a data restore needs to predate the point of corruption.
This typically requires the capability to restore the database from an
older full backup tape and then recover the incremental data recorded
since that clean backup.
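The selection logic described above (pick a full backup that predates the corruption, then the incrementals that follow it) can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the restore procedure of any particular backup product: the function name restore_chain and the tuple layout are hypothetical, and the sketch assumes the time of corruption is known, which in practice often has to be estimated.

```python
from datetime import datetime


def restore_chain(backups, corruption_time):
    """backups: list of (timestamp, kind) tuples, kind is "full" or "incremental".

    Return the newest full backup taken before corruption_time, followed by
    the incrementals taken after that full backup and before the corruption.
    """
    # Keep only backups that predate the corruption, oldest first.
    clean = sorted(b for b in backups if b[0] < corruption_time)
    fulls = [b for b in clean if b[1] == "full"]
    if not fulls:
        raise ValueError("no full backup predates the corruption")
    base = fulls[-1]
    incrementals = [
        b for b in clean if b[1] == "incremental" and b[0] > base[0]
    ]
    return [base] + incrementals
```

If the retention period is shorter than the time it takes to notice corruption, the function raises an error because no clean full backup remains, which is a useful argument for keeping older full backups.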
Providing the Ability to Restore a Message, Folder, or Mailbox
In other situations,
an organization might need to recover a single message, folder, or
mailbox rather than a full database. With most full backups of an
Exchange server, the restore process requires a full restore of all
messages, folders, and mailboxes. If an administrator has only a full
image backup to work with, typically a full restore must be performed on
a spare server and information extracted from the full restore as
necessary.
If message, folder,
or mailbox recovery is required on a regular basis, the organization
might elect to back up information in a format or process that provides
an easier method of information recovery. This might involve the
purchase and use of a third-party tape backup system, or a combination
of various utilities available in Exchange Server 2010 to restore
individual sets of information.
Assigning Tasks and Designating Team Members
Each particular
server or network device in the enterprise has specific requirements for
backing up and creating documentation around hardware and the service
it provides. To make sure that a critical system is backed up properly,
IT staff should designate a single individual to monitor that device and
ensure the backup is completed and documentation is accurate and
current at all times. It is also wise to assign a secondary staff
member who has the same set of skills to act as a backup if the primary
staff member is unavailable, so that there is no single point of failure
among the IT staff performing these tasks.
Assigning only
primary and secondary resources to specific devices or services helps
improve the overall security and reliability of the device and services
provided to network users. By limiting who can back up and restore
data—and even who can manage servers and devices—to just the primary and
secondary qualified staff members, the organization can be assured that
only competent, trained individuals work on systems they are assigned
to manage. Even though the backup and restore responsibilities lie with
the primary and secondary resources, the backup and recovery plans
should still be documented and available to the remaining IT staff for
additional training and a final means of support if needed.
Selecting the Best Devices for Your Backup
Each device used on
any network could have specific backup requirements. As mentioned
earlier, each assigned IT staff member should also be responsible for
researching and learning the backup and recovery requirements of each
device they manage to ensure that the backups contain everything
necessary to recover from a device failure.
As a rule of thumb for
network devices, the device configuration should be backed up whenever
possible, either by using the device manufacturer’s configuration
software or by documenting the configuration for use as a
reference should a device require reconfiguration.
Tip
It is also a best practice
to evaluate the hardware used in your environment to determine which
areas might be the most likely points of failure. Having spare devices
can reduce the overall downtime in case of a failure. For
Exchange Server 2010, useful spares include items such as hard drives
to replace a failed drive in a RAID
configuration.
Understanding How Devices Affect Backups
Depending on how a given
environment is architected, there might be several different options for
how it is backed up. Administrators lucky enough to have network
attached storage (NAS) or storage area networks (SANs) for their
Exchange Server 2010 servers might have significantly faster options for
performing backups than administrators who use direct attached storage
(DAS). Many times, the NAS or SAN devices can perform local snapshots,
or the SAN might be backed up by a tape device that is plugged directly
into the Fibre Channel fabric. This has great advantages when compared
to backing up an Exchange Server 2010 server over the network. For
example, Gigabit Ethernet provides 1Gb/sec of throughput, whereas Fibre
Channel not only offers speeds of 4Gb/sec to 8Gb/sec but is also a more
efficient protocol.
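To put those link speeds in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough, best-case estimate: the function name hours_to_transfer and the 500GB database size are hypothetical, 1GB is treated as 8 gigabits, and the efficiency parameter is an assumed fraction of line rate, since real backups rarely reach the rated speed.

```python
def hours_to_transfer(data_gb, link_gbit_per_sec, efficiency=1.0):
    """Hours to move data_gb gigabytes over a link rated in gigabits/sec.

    efficiency is an assumed fraction (0-1] of line rate actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    gigabits = data_gb * 8                      # 8 bits per byte
    seconds = gigabits / (link_gbit_per_sec * efficiency)
    return seconds / 3600


# Hypothetical 500GB of mailbox data at full line rate:
for name, speed in [("Gigabit Ethernet", 1), ("4Gb Fibre Channel", 4), ("8Gb Fibre Channel", 8)]:
    print(f"{name}: {hours_to_transfer(500, speed):.2f} hours")
# Gigabit Ethernet: 1.11 hours
# 4Gb Fibre Channel: 0.28 hours
# 8Gb Fibre Channel: 0.14 hours
```

These figures are upper bounds on speed only. Disk read rates, the backup device, and protocol overhead usually lower the real numbers, which is why measured test backups remain the better planning input.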
One way to
drastically speed up backups is to use a faster medium for the
final destination. Although current AIT and LTO tape technologies are
very fast, they still can’t compare to an array of hard drives as the
destination. Technologies such as System Center
Data Protection Manager can take regular snapshots of Exchange Server
2010 and store them on disks. Longer-term backups are made from the disk
images to tape. Because this transfer from disk to tape happens on the
backup server, it can be done during the day without impacting end users
or interfering with the regular backups. Most
backup software also supports virtual tape libraries, which are actually
files within a set of disks. These enable you to retain the normal
methodologies of traditional tape backup while taking advantage of the
speed and size of modern hard drives to drastically shrink the backup
window on network attached backups.
Determining Backup Speeds and Times
The time needed to
perform a backup of Exchange Server 2010 is influenced mostly by the
speed of the backup device. Although vendors quote values for MB per
minute that their device can back up, this isn’t always an accurate
value when backing up an Exchange Server 2010 server. It is always
recommended to perform test backups of Exchange servers to determine the
speed at which they can be backed up. By knowing how long jobs take, an
administrator can better select the backup window in which the backups
occur. As Exchange servers grow in terms of the storage used by mail
data, the backups take longer to occur. Pay careful attention to the
network utilization and to the backup device utilization so that you can
watch for bottlenecks that cause backup jobs to take too long.
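The results of a test backup can be turned into a simple projection of when growth will push the job past its backup window. The sketch below is in Python and is only illustrative: the function names are hypothetical, the growth rate is an assumption you would replace with your own trend data, and throughput is assumed to stay constant as the databases grow.

```python
def measured_mb_per_min(test_size_mb, test_minutes):
    """Throughput observed in a test backup, in MB per minute."""
    return test_size_mb / test_minutes


def backup_minutes(data_mb, mb_per_min):
    """Estimated job duration at the measured throughput."""
    return data_mb / mb_per_min


def months_until_window_exceeded(data_mb, mb_per_min, window_minutes,
                                 monthly_growth=0.03, max_months=120):
    """Months of compound growth before the job no longer fits the window.

    Returns 0 if it already does not fit, or None if it still fits after
    max_months. monthly_growth is an assumed fractional growth rate.
    """
    months = 0
    while backup_minutes(data_mb, mb_per_min) <= window_minutes:
        if months >= max_months:
            return None
        data_mb *= 1 + monthly_growth
        months += 1
    return months
```

For example, if a 60,000MB test backup took 50 minutes, the measured rate is 1,200MB per minute, and 300,000MB of mail data would take about 250 minutes. With a 360-minute window and an assumed 10 percent monthly growth, months_until_window_exceeded reports that the job stops fitting after 4 months.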
Tip
Consider backing up
Exchange Server 2010 to a backup server that uses disks as the media for
the backup. This is typically the fastest medium that you can utilize
for “over the network” backups. Then take the locally stored backup and
back that up to tape. Because you are backing up “cold” data, there is
no concern about performing the backup during the day. This allows you
to keep your backup window relatively short. The side benefit is that if
you ever experience a failure that requires you to restore from the
backups, you’ll be doing a disk-to-disk restore, which is much faster
than a tape-to-disk restore.
Validating the Backup Strategy in a Test Lab
Regardless of what
methodology you choose for backups of your Exchange Server 2010
environment, it is critical to test the processes in a lab environment.
The goal of this validation is not only to prove that data can be backed
up and restored, but also to refine and document the exact steps used.
It is much easier to figure out how to perform a restore in the lab than
it is in production when hundreds or thousands of mailbox users are
down. The goal of a production restore is to follow accurate, validated
instructions and not have to figure out what you need to do on the fly.