Note
Exchange Server 2010 running on Windows Server 2008 does not provide
any native backup functionality. Exchange Server 2010 supports only
Volume Shadow Copy Service (VSS)-based backup technologies. This means
that Exchange Server 2010 requires a separate product to perform
backups. This can be at the storage level, such as NetApp’s SnapManager
for Exchange Server, or it can be at the application level, such as
Microsoft’s System Center Data Protection Manager (DPM) 2010 or
Symantec’s NetBackup.
Establishing Service Level Agreements
The
most common question from Exchange Server administrators is “How should
I be doing my backups?” The answer to this question is quite simple.
You should be doing them so that they support your service level
agreements around recoverability and retention for Exchange Server
services.
Based on this concept,
it quickly becomes apparent that the first step in planning out your
backups is to determine exactly what you’ve committed yourself to. This
commitment is commonly referred to as a service level agreement, or
simply an SLA.
Establishing a Service Level Agreement for Each Critical Service
Exchange Server
2010 is often deployed so that roles are distributed across multiple
servers. This distribution of roles might vary from site to site.
However, the SLAs will likely remain constant across the enterprise as
the goal is actually to keep messaging alive and available to the end
users.
It is important to
understand the implication of SLAs for each aspect of Exchange Server
2010 because the SLA drives your design and must be considered upfront
and not as an afterthought to a deployed Exchange Server 2010
environment. Too often, IT groups implement Exchange Server and later go
back to determine how quickly they can restore services or rebuild a
failed system. The correct methodology is to determine recovery time
objectives and uptime goals and then design the architecture to enable
those goals.
Determining SLAs for Mailbox Servers
One of the most
important aspects of Exchange Server 2010 is the mailbox server. If the
mailbox server isn’t up, users can’t access their mail. This is usually
the first thing that triggers the help desk phone to ring. Most
companies start their SLAs around the mailbox servers. In most
environments, a two-hour recovery for a mailbox database is acceptable.
This means that if your database fails, you need to recover that data
within two hours. If you know that your system is capable of restoring
100GB of data per hour, you know that, based on your backup process, you
can support only 200GB per database.
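The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming a constant restore rate and that the whole recovery window is available for restoring data; the function name max_database_size_gb is hypothetical and not part of any Exchange tooling.

```python
def max_database_size_gb(recovery_hours, restore_rate_gb_per_hour):
    """Largest database that can be restored within the recovery window.

    Assumes a constant restore rate and that the entire window is spent
    restoring data (both are simplifications).
    """
    if recovery_hours < 0 or restore_rate_gb_per_hour < 0:
        raise ValueError("inputs must be non-negative")
    return recovery_hours * restore_rate_gb_per_hour


# Figures from the text: two-hour SLA, 100GB restored per hour.
print(max_database_size_gb(2, 100))  # 200
```

Measuring the real restore rate in a test restore, rather than relying on a vendor figure, gives a more trustworthy input to this calculation.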
If your SLA for an
entire mailbox server recovery is four hours and you know that it takes
two hours to rebuild a new server with Exchange Server 2010, you have
only two hours to restore data; based on the preceding example, this
means you can have only 200GB of data on the server. If you planned to
allow users 2000MB of storage each, this limits the server to 100 users.
If you want to support more users per server, you need to either relax
the SLA or change your backup strategy so that you can restore more
data in the same period of time. A faster restore path is what enables
you to safely support large numbers of users with tight SLAs, and it is
where you have to balance the cost of the backup/restore system against
the cost of adding servers.
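The trade-off between SLA, rebuild time, restore rate, and mailbox quota can be made concrete with a short Python sketch. It assumes every mailbox is full at restore time and uses 1GB = 1000MB to match the round figures in the text; the function name and parameters are hypothetical.

```python
def max_users_per_server(sla_hours, rebuild_hours, restore_rate_gb_per_hour,
                         quota_mb_per_user, mb_per_gb=1000):
    """Users a server can host while still meeting the server-recovery SLA.

    Worst case: every mailbox is at its quota when the restore begins.
    """
    restore_hours = sla_hours - rebuild_hours
    if restore_hours <= 0:
        return 0  # the rebuild alone consumes the whole SLA window
    restorable_mb = restore_hours * restore_rate_gb_per_hour * mb_per_gb
    return int(restorable_mb // quota_mb_per_user)


# Figures from the text: 4-hour SLA, 2-hour rebuild, 100GB/hour, 2000MB quota.
print(max_users_per_server(4, 2, 100, 2000))  # 100

# Doubling the restore rate doubles the supportable user count.
print(max_users_per_server(4, 2, 200, 2000))  # 200
```

Running the function with different restore rates or SLA windows shows quickly whether a faster backup system or an additional server is the cheaper way to reach a target user count.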
Luckily, Exchange Server 2010 offers technologies that enable you to
run a significantly tighter SLA. For example, Database Availability
Groups enable a replica server to take over for a failed server within
a matter of minutes. This would be a prime technology to implement if
you have an SLA that allows mailbox access to be down for only a number
of minutes.
Determining SLAs for Client Access Servers
Another major
component of Exchange Server 2010 is the Client Access server (CAS).
These are the systems that enable mobile devices and web browsers to
access users’ email. In Exchange Server 2010, the functions of the
Client Access servers are greatly extended. Exchange Server 2010
utilizes MAPI on the middle tier, which places much greater importance
on the CAS role always being available. When determining SLAs for this
function, it is helpful to view the service and the servers as two
entities. Although you likely want high availability on the service, you
can likely worry less about the servers individually if they are
designed with redundancy in mind. So, if you have at least one more
Client Access server than you need for performance purposes, you have
plenty of time to rebuild one server if it fails because there is
already another that is taking up the load. Keep this in mind when
designing your Exchange Server environment. Also keep in mind that the
data on a CAS is mostly static. Building a new CAS is often faster than
restoring an existing one.
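The "one more server than you need" rule can be expressed as a simple check. This is a minimal sketch that treats all Client Access servers as interchangeable and equally sized; the function name is hypothetical.

```python
def tolerable_cas_failures(deployed, required_for_load):
    """Number of Client Access servers that can fail before capacity is short."""
    return max(deployed - required_for_load, 0)


print(tolerable_cas_failures(3, 2))  # 1
print(tolerable_cas_failures(2, 2))  # 0
```

A result of 1 or more means a single failed server can be rebuilt at a relaxed pace while the service SLA is still met.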
Determining SLAs for Edge Transport Servers
For systems such as
the Edge Transport servers in Exchange Server 2010, it is more useful to
view the SLA for this role as being for the service as opposed to the
servers themselves. In the case of Edge Transport servers, the service
they provide is to send and receive external email to and from the
Internet. In this sense, most companies try to enforce a fairly
aggressive SLA on the service itself. For example, if Internet mail
connectivity were to fail, they’d want the service restored within one
to two hours. In most environments, this is fairly easy to accomplish
because there are typically two or more Edge Transport servers to
provide redundancy and minimize wide area network (WAN) traffic. In the
case of the SLAs on the servers themselves, typically a one-day
recovery is acceptable. Because the Edge Transport servers don’t hold
any unique data, they can easily be replaced if a failure occurs.
Remember that the
Edge Transport service is dependent on the network itself. If the Edge
Transport servers are running but the Internet connection is down, they
can’t do their job. One easy way to improve availability and thus
support a tight SLA for Edge Transport is to have multiple entry points
from the Internet. This can protect against Internet or Internet service
provider issues by enabling Internet mail to enter from another
location and simply ride the corporate WAN to reach the appropriate
Exchange Server 2010 server. The simplest way to do this is to advertise
multiple Mail Exchanger (MX) records in the Domain Name System (DNS) on
the Internet.
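To illustrate how multiple MX records provide a second entry point, the following Python sketch models the order in which a sending server would normally try the advertised hosts. It performs no real DNS lookups; the hostnames are hypothetical examples, and ties between equal preference values (which real senders typically randomize) are simply sorted by name here.

```python
def delivery_order(mx_records):
    """Return hosts in the order a sender would normally try them.

    mx_records is a list of (preference, hostname) tuples; lower
    preference values are tried first.
    """
    return [host for _, host in sorted(mx_records)]


def first_reachable(mx_records, reachable_hosts):
    """Pick the most preferred host that is currently reachable, if any."""
    for host in delivery_order(mx_records):
        if host in reachable_hosts:
            return host
    return None


# Hypothetical records for two Internet entry points.
records = [(20, "mail-west.example.com"), (10, "mail-east.example.com")]

print(delivery_order(records))
# ['mail-east.example.com', 'mail-west.example.com']

# If the eastern Internet connection is down, mail enters in the west.
print(first_reachable(records, {"mail-west.example.com"}))
# mail-west.example.com
```

Giving both records the same preference value would instead spread inbound mail across both entry points rather than using one purely as a fallback.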
Determining SLAs for Hub Transport Servers
The role of the Hub Transport server is to transfer mail from one site
to another connected site. When a Hub Transport server fails, the site
it served is effectively cut off from other sites. Moreover, because the
architecture of Exchange Server 2010 requires that all messages first
pass through a Hub Transport server, if this role is unavailable in a
site, users cannot send mail to each other even though they are hosted
on the same mailbox server. A company would therefore most likely want a
fairly aggressive SLA on the Hub Transport servers. In most
environments, the Hub Transport server role is combined with other roles
because, in most cases, it won’t justify being on an isolated server. As
a result, the SLA for recovery is often overridden by the SLA for
another role on the same server. For this reason, it is recommended
that, when possible, two or more systems per site host the Hub Transport
server role.