A Database Availability Group (DAG) is a group of up to 16
Exchange Server 2010 mailbox role servers that replicate mailbox data to
each other and that can perform automated recovery at the mailbox
database level in the event of a hardware, storage, or network failure.
The members use a subset of Windows Server 2008 failover clustering to
monitor each other’s health. This allows them to determine which node
should be primary for a given database.
To understand
Database Availability Groups and how they work, it’s important to
understand the various technologies that make them possible.
Although Exchange Server
2010 DAGs aren’t built on a traditional Windows cluster, they do take
advantage of Windows Server 2008 failover clustering to establish a
heartbeat among their members and to monitor each other’s availability.
Unlike earlier versions of Exchange Server, Exchange Server 2010 does
not require administrators to manage resources
at the cluster level. The installation of the failover clustering feature
and its ongoing management are handled entirely “under the
hood” by Exchange Server 2010.
Database Portability
is a concept that was introduced in Exchange Server 2007. In short, it
effectively uncoupled the Exchange server’s identity from the security
settings on the mailbox database. This allowed an Exchange Server 2007
server to host a mailbox database that was originally owned by a
different server. In versions of Exchange Server prior to 2007, this
concept didn’t exist and as a result, recovering mailbox databases on
new servers was a rather painful exercise. In Exchange Server 2010, this
concept is what allows multiple Exchange Server 2010 mailbox servers to
be effectively authoritative for the same mailbox database information.
Note
Because a single
mailbox database can be replicated across multiple servers,
Exchange Server 2010 requires that all mailbox databases be created
with unique names. In older versions of Exchange Server, it was
acceptable to reuse a database name because databases were always referenced
by “Server name\Storage Group name\Database name”, which made them
unique within the Exchange Server organization. In Exchange Server 2010,
this is no longer the case, because a replica could otherwise have a name
conflict with a local database.
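
When planning a move from an older organization, it can help to find reused database names ahead of time. The following Python sketch is illustrative only: the function name, the sample paths, and the choice to compare names case-insensitively are all assumptions made for this example, not part of any Exchange tool. It takes legacy paths in the “Server name\Storage Group name\Database name” form and reports names that appear more than once.

```python
from collections import defaultdict


def find_name_collisions(legacy_paths):
    """Group legacy database paths by their final component (the database name).

    Returns a dict mapping each reused name (lower-cased) to the list of
    legacy paths that share it. Names used only once are left out.
    """
    paths_by_name = defaultdict(list)
    for path in legacy_paths:
        # The database name is the last backslash-separated component.
        database_name = path.split("\\")[-1].strip()
        # Assumption for this sketch: names are compared case-insensitively.
        paths_by_name[database_name.lower()].append(path)
    return {name: paths for name, paths in paths_by_name.items() if len(paths) > 1}


# Hypothetical inventory of pre-2010 databases.
legacy = [
    r"EX01\First Storage Group\Mailbox Database",
    r"EX02\First Storage Group\Mailbox Database",
    r"EX02\Second Storage Group\Executives",
]
collisions = find_name_collisions(legacy)
for name, paths in collisions.items():
    print(f"'{name}' is used by {len(paths)} databases: {paths}")
```

Any name reported here would need to be changed before both databases could exist in the same Exchange Server 2010 organization.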
Log Shipping
Replication was introduced into Exchange Server 2007 with the creation
of Cluster Continuous Replication (CCR) and was later reused in Standby
Continuous Replication (SCR). The same base technology is used to replicate
mailbox database transactions between members of a Database Availability
Group. This replication has been improved in terms of resiliency and
recovery through the introduction of Shadow Redundancy and Incremental
Reseeding.
Shadow Redundancy is the
name given to a new process in Exchange Server 2010. It is similar to the
Transport Dumpster function in Exchange Server 2007, in which a message
sent via a Hub Transport server was saved for a period of time in
case it needed to be resent after a CCR or SCR failover. Shadow
Redundancy ensures that a message is not deleted from a transport
database until the next hop has confirmed receipt.
If the next hop doesn’t confirm receipt, the message is
resubmitted for delivery. This is especially useful in high-traffic
environments where large numbers of messages are potentially “in flight”
on a Hub Transport server. If that server were to fail before the messages
were sent to their next hop, the mailbox server would detect this
and resubmit the unconfirmed messages to another available Hub Transport
server. Should those messages later become available through a “fix” of
the failed Hub Transport server, the destination would recognize them as
duplicates and suppress them from the target mailbox.
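
The hold-until-confirmed behavior described above can be modeled in a few lines of Python. This is a simplified sketch, not Exchange's implementation: the class names (`Sender`, `HubTransport`, `Mailbox`), the message ID scheme, and the `online` flag are all invented for illustration. It shows a sender keeping each message until delivery is confirmed, resubmitting through a second hub after a failure, and the mailbox suppressing the duplicate when the failed hub returns.

```python
class Mailbox:
    """Destination that suppresses messages it has already seen."""

    def __init__(self):
        self.seen_ids = set()
        self.messages = []

    def deliver(self, message_id, body):
        if message_id in self.seen_ids:
            return False  # duplicate suppressed
        self.seen_ids.add(message_id)
        self.messages.append(body)
        return True


class HubTransport:
    """Hypothetical hub that queues messages and delivers them on flush()."""

    def __init__(self, name, mailbox):
        self.name = name
        self.mailbox = mailbox
        self.online = True
        self.queue = {}  # message_id -> body, accepted but not yet delivered

    def accept(self, message_id, body):
        if not self.online:
            return False
        self.queue[message_id] = body
        return True

    def flush(self):
        """Deliver queued messages; return the IDs to confirm to the sender."""
        if not self.online:
            return []
        confirmed = []
        for message_id, body in list(self.queue.items()):
            self.mailbox.deliver(message_id, body)
            confirmed.append(message_id)
            del self.queue[message_id]
        return confirmed


class Sender:
    """Previous hop: keeps every message until the next hop confirms it."""

    def __init__(self):
        self.unconfirmed = {}

    def submit(self, message_id, body, hub):
        self.unconfirmed[message_id] = body  # retain a copy first
        hub.accept(message_id, body)

    def confirm(self, message_ids):
        for message_id in message_ids:
            self.unconfirmed.pop(message_id, None)

    def resubmit(self, hub):
        for message_id, body in list(self.unconfirmed.items()):
            hub.accept(message_id, body)


mailbox = Mailbox()
hub_a = HubTransport("HUB-A", mailbox)
hub_b = HubTransport("HUB-B", mailbox)
sender = Sender()

sender.submit("msg-1", "hello", hub_a)
hub_a.online = False                # HUB-A fails with the message in flight
sender.confirm(hub_a.flush())       # nothing is confirmed
sender.resubmit(hub_b)              # unconfirmed mail goes to another hub
sender.confirm(hub_b.flush())       # delivered and confirmed via HUB-B

hub_a.online = True                 # HUB-A is repaired, its queue intact
hub_a.flush()                       # the stale copy is suppressed as a duplicate
```

After the last step the mailbox still holds a single copy of the message, and the sender has nothing left unconfirmed.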
Incremental Reseeding
is another new function of Exchange Server 2010 that reduces the impact
of replicating mailbox data to a database that was offline for a period
of time. In Exchange Server 2007, if a CCR or SCR replica was too far
out of sync, the only solution was
to delete the database and start replication from scratch. This
resulted in potentially hundreds of gigabytes of data being replicated
in order to get the database back to a point where it could accept and
process log files. During this time, the databases were no longer
redundant and the mailbox data was at risk. With Incremental Reseeding,
the out-of-sync database is compared to the source database and only the
necessary updates are sent to the database in order to bring it back to
a level where replication can resume normally. This greatly reduces the
time taken to reseed a database and thus reduces the window of
exposure for the mailbox data.
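
The compare-and-send-only-differences idea can be illustrated with a small Python sketch. The text does not describe how Exchange performs the comparison, so everything here is an assumption made for illustration: databases are modeled as dictionaries of numbered pages, differences are detected with SHA-256 digests, and the function name `incremental_reseed` is invented.

```python
import hashlib


def page_digest(page: bytes) -> str:
    """Fingerprint a page so copies can be compared without a byte-by-byte diff."""
    return hashlib.sha256(page).hexdigest()


def incremental_reseed(source_pages, replica_pages):
    """Update replica_pages in place so that it matches source_pages.

    Only pages that are missing or different are copied. Returns the sorted
    list of page numbers that had to be transferred.
    """
    transferred = []
    for page_number, page in source_pages.items():
        stale = replica_pages.get(page_number)
        if stale is None or page_digest(stale) != page_digest(page):
            replica_pages[page_number] = page
            transferred.append(page_number)
    # Remove pages that no longer exist on the source.
    for page_number in set(replica_pages) - set(source_pages):
        del replica_pages[page_number]
    return sorted(transferred)


# Hypothetical four-page source and a replica that fell behind.
source = {1: b"alpha", 2: b"bravo-v2", 3: b"charlie", 4: b"delta"}
replica = {1: b"alpha", 2: b"bravo-v1", 3: b"charlie"}

changed = incremental_reseed(source, replica)
print(f"Transferred pages {changed} out of {len(source)}")
```

In this toy case two of the four pages are sent rather than the whole database, which is the saving that Incremental Reseeding is meant to provide at a much larger scale.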
The last piece of Database
Availability Groups is the removal of the concept of a Storage Group
from Exchange Server 2010. In Exchange Server 2007, any of the
replication technologies required that the system be configured with
only one database per storage group. Log files for all databases within a
storage group were grouped together, and recovery of a database read the
log files from this storage group. In an Exchange Server 2010 DAG, because
the databases are no longer associated with a specific server, the need
to manage by Storage Group was removed. Databases are now associated
with the DAG instead.
Note
Database replication within a
DAG is only supported between mailbox servers with less than 250 ms of
round-trip latency. As such, it’s important to be aware of the typical
latency between sites that might potentially house replicas of your
mailbox data. Although physics tells us that a signal moving at the speed
of light could travel the circumference of the earth in around 135 ms,
network-induced latency as well as indirect paths can make this number
significantly higher.
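
The arithmetic behind that figure, and a simple check against the supported limit, can be sketched in Python. The constant and function names are chosen for this example; the circumference and the vacuum speed of light are standard physical values, and the 250 ms limit is the one stated in the note. Real links are slower than this ideal case.

```python
EARTH_CIRCUMFERENCE_KM = 40_075       # equatorial circumference
SPEED_OF_LIGHT_KM_S = 299_792.458     # in a vacuum; real media are slower
MAX_SUPPORTED_RTT_MS = 250            # limit stated in the note


def ideal_transit_ms(distance_km, speed_km_s=SPEED_OF_LIGHT_KM_S):
    """Best-case one-way travel time in milliseconds over distance_km."""
    return distance_km / speed_km_s * 1000


def within_dag_limit(measured_rtt_ms):
    """True if a measured round-trip time is below the supported limit."""
    return measured_rtt_ms < MAX_SUPPORTED_RTT_MS


around_earth_ms = ideal_transit_ms(EARTH_CIRCUMFERENCE_KM)
print(f"Ideal trip around the earth: {around_earth_ms:.1f} ms")

# Hypothetical measurements between two sites, in milliseconds.
for rtt in (85, 240, 310):
    status = "supported" if within_dag_limit(rtt) else "not supported"
    print(f"{rtt} ms round trip: {status}")
```

The ideal figure comes out at roughly 134 ms, which leaves less headroom under the 250 ms limit than it might first appear once routing and equipment delays are added.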