Improving Availability
Exchange 2010 lets you protect mailbox databases and the data they contain by making them highly available through database availability groups. Database
availability groups allow you to group databases logically according to
the servers that host a set of databases. Each Mailbox server can have
multiple databases, and each database can have as many as 16 copies. A
single database availability group can have up to 16 Mailbox servers
that host databases and provide automatic database-level recovery from
failures that affect individual databases. Any server in a database
availability group can host a copy of a mailbox database from any other
server in the database availability group.
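The membership and copy limits above can be expressed as a small data model. The following Python sketch is a hypothetical illustration, not an Exchange API: the class, method, and constant names are invented. It encodes only the rules stated in this section: up to 16 Mailbox servers per group, up to 16 copies per database, copies only on member servers, and (as noted later) at most one copy of a database per server.

```python
from dataclasses import dataclass, field

MAX_SERVERS_PER_DAG = 16      # Mailbox servers per database availability group
MAX_COPIES_PER_DATABASE = 16  # copies of one mailbox database


@dataclass
class DatabaseAvailabilityGroup:
    """Hypothetical in-memory model of a DAG; not an Exchange API."""

    name: str
    members: set[str] = field(default_factory=set)
    # Maps each database name to the set of member servers holding a copy of it.
    copies: dict[str, set[str]] = field(default_factory=dict)

    def add_member(self, server: str) -> None:
        if server not in self.members and len(self.members) >= MAX_SERVERS_PER_DAG:
            raise ValueError(f"{self.name} already has {MAX_SERVERS_PER_DAG} members")
        self.members.add(server)

    def add_copy(self, database: str, server: str) -> None:
        # Any member can host a copy of a database from any other member...
        if server not in self.members:
            raise ValueError(f"{server} is not a member of {self.name}")
        hosts = self.copies.setdefault(database, set())
        # ...but a server can hold only one copy of a given database.
        if server in hosts:
            raise ValueError(f"{server} already holds a copy of {database}")
        if len(hosts) >= MAX_COPIES_PER_DATABASE:
            raise ValueError(f"{database} already has {MAX_COPIES_PER_DATABASE} copies")
        hosts.add(server)


dag = DatabaseAvailabilityGroup("DAG1")
for n in (1, 2, 3):
    dag.add_member(f"MBX{n}")
dag.add_copy("DB1", "MBX1")
dag.add_copy("DB1", "MBX2")
print(sorted(dag.copies["DB1"]))  # ['MBX1', 'MBX2']
```

Because a member can hold only one copy of a database, the 16-member limit already caps each database at 16 copies; the explicit copy check simply mirrors the documented limit.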
Servers in a database availability group can host other Exchange
roles. Member servers must be in the same Active Directory domain.
Unlike Exchange 2007, where achieving a high level of uptime could
require a high level of administrator intervention, Exchange 2010
integrates high availability and messaging resilience into the core
architecture, providing a simple unified framework for both high
availability and disaster recovery. This new approach reduces the cost
and complexity of deploying a highly available solution. How does this
work? Exchange 2010 has enhanced continuous
replication and has replaced clustering features in Exchange 2007 with
a more robust solution that doesn't require expensive hardware and also
requires less maintenance.
In previous versions of Exchange, Exchange was a clustered application that used the cluster resource management model
for high availability. Exchange 2010 is not a clustered application and
does not use the cluster resource model for high availability. Instead,
Exchange 2010 uses its own internal high-availability model. Although
some components of Windows Failover Clustering are still used, these
components are now managed exclusively by Exchange 2010.
To support continuous replication, Exchange 2007 offered several approaches, including Local Continuous Replication (LCR), Cluster Continuous Replication (CCR), and Standby
Continuous Replication (SCR). LCR was a single-server solution for
asynchronous log shipping, replay, and recovery. CCR combined the
asynchronous log shipping, replay, and recovery features with the
failover and management features of the Cluster service, and it was
designed for configurations in which you had clustered mailbox servers
with dedicated active and passive nodes. SCR was an extension of LCR
and CCR that used the same log shipping, replay, and recovery features
of LCR and CCR but was designed for configurations in which you used or
enabled the use of standby recovery servers.
Exchange 2010 includes some aspects of the continuous replication
technology previously found in CCR and SCR, but the technology has
changed substantially. Because storage groups have been removed from
Exchange 2010, continuous replication operates at the database level.
Exchange 2010 still uses an Extensible
Storage Engine (ESE) database that produces transaction logs that are
replicated and replayed into copies of mailbox databases. Because each
mailbox database can have as many as 16 copies and a server can hold
only one copy of a given database, the copies of a database can be
spread across up to 16 different servers.
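To make the replicate-then-replay cycle concrete, here is a hypothetical Python model of one passive copy. It tracks only log generation numbers: the copy queue is the number of logs generated on the active copy but not yet copied to this server, and the replay queue is the number copied but not yet replayed into the database copy. The class and method names are invented; real log shipping, ESE replay, and log file naming are not modeled.

```python
from dataclasses import dataclass


@dataclass
class PassiveCopy:
    """Hypothetical model of one passive database copy, tracking log generations."""

    last_copied: int = 0    # highest log generation copied to this server
    last_replayed: int = 0  # highest log generation replayed into this copy

    def copy_logs(self, up_to: int) -> None:
        # Log shipping: receive closed logs from the active copy (never go backward).
        self.last_copied = max(self.last_copied, up_to)

    def replay_logs(self, count: int) -> None:
        # Replay: apply up to `count` copied logs, in order, to the database copy.
        self.last_replayed = min(self.last_copied, self.last_replayed + count)

    def copy_queue_length(self, last_generated: int) -> int:
        # Logs generated by the active copy that have not reached this server yet.
        return last_generated - self.last_copied

    def replay_queue_length(self) -> int:
        # Logs already on this server that have not been replayed yet.
        return self.last_copied - self.last_replayed


db1_copy = PassiveCopy()
db1_copy.copy_logs(up_to=120)    # the active copy has generated 125 logs; 120 have shipped
db1_copy.replay_logs(count=100)
print(db1_copy.copy_queue_length(last_generated=125), db1_copy.replay_queue_length())  # 5 20
```

These two queue lengths are the same measures Active Manager weighs when it chooses which copy to activate.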
Instead of using Server
Message Block (SMB) for data transfer during log shipping and seeding,
continuous replication uses a single administrator-defined TCP port for
data transfer, and there are built-in options for network encryption
and compression for the data stream. In Exchange 2007, the Microsoft
Exchange Replication service was responsible for replaying logs into
passive database copies. When the passive copy was activated, the
database cache was lost when the Microsoft Exchange Information Store
service mounted the database. Exchange 2010 does not use the Microsoft
Exchange Replication service for this purpose. Instead, the Exchange
Replication service periodically monitors the health of all mounted
databases and the ESE. If the service detects a failure, it notifies
Active Manager, and Active Manager then handles the failure.
Microsoft moved the passive
copy replay functionality into the Microsoft Exchange Information Store
service. Because active databases and passive database copies are all
managed by the same service, the database cache is retained after a
failover or switchover, so the newly activated copy does not start with
an empty cache. Having one service manage both
active and passive databases has other benefits as well. For example,
while failover of a clustered mailbox server in a CCR environment for
Exchange 2007 took about 2 minutes to complete, failover of a mailbox
database in Exchange 2010 typically completes in about 30 seconds.
As with SCR, the concepts of replay lag time and truncation lag time
apply to database copies. Database copies can be backed up using
Exchange-aware, Volume Shadow Copy Service (VSS)–based backup
applications.
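Replay lag time is how long a passive copy waits before replaying a copied log file, and truncation lag time is how long it waits before truncating log files that have already been replayed. A minimal Python sketch of both timers follows. The function names are hypothetical, and the sketch deliberately ignores the other conditions that also govern log truncation in practice.

```python
from datetime import datetime, timedelta


def logs_ready_for_replay(copied_at: dict[int, datetime], now: datetime,
                          replay_lag: timedelta) -> list[int]:
    """Log generations (in order) whose replay lag has elapsed (hypothetical model)."""
    ready = []
    for generation in sorted(copied_at):
        if now - copied_at[generation] < replay_lag:
            break  # logs replay in order, so stop at the first one still held back
        ready.append(generation)
    return ready


def logs_ready_for_truncation(replayed_at: dict[int, datetime], now: datetime,
                              truncation_lag: timedelta) -> list[int]:
    """Replayed log generations whose truncation lag has elapsed (hypothetical model)."""
    return sorted(g for g, t in replayed_at.items() if now - t >= truncation_lag)


now = datetime(2010, 6, 1, 12, 0)
copied = {1: now - timedelta(hours=30), 2: now - timedelta(hours=25), 3: now - timedelta(hours=2)}
print(logs_ready_for_replay(copied, now, replay_lag=timedelta(hours=24)))  # [1, 2]
```

A lagged copy therefore trails the active copy on purpose, which gives you a window to recover from logical corruption before it is replayed.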
In Exchange 2010, databases are defined at the organization level
rather than at the server level. When an administrator establishes a
database copy as the active mailbox database, this process is known as
a switchover. When a failure affecting a database occurs and a different
copy of the database becomes the active copy, this process is known as a failover.
Failover and switchover occur at the database level for individual
databases and at the server level for all active databases hosted by a
server. When either a switchover or failover occurs, other Exchange
2010 server roles become aware of the change almost immediately and
redirect client and messaging traffic automatically as appropriate.
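A server-level switchover or failover is the database-level operation applied to every database that is active on the affected server. The Python sketch below illustrates only that relationship. The function name is hypothetical, and picking the first other copy from an assumed preference-ordered list stands in for the selection process that Active Manager actually performs.

```python
def server_switchover(active_host: dict[str, str],
                      copy_hosts: dict[str, list[str]],
                      server: str) -> dict[str, str]:
    """Move every database active on `server` to another copy (hypothetical model).

    copy_hosts lists each database's copies in an assumed preference order.
    """
    result = dict(active_host)  # leave the caller's mapping untouched
    for database, host in active_host.items():
        if host != server:
            continue  # databases active on other servers are unaffected
        targets = [s for s in copy_hosts.get(database, []) if s != server]
        if not targets:
            raise RuntimeError(f"{database} has no other copy to activate")
        # A database-level switchover is this same step applied to one database.
        result[database] = targets[0]
    return result
```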
Although you can perform most management tasks for availability
groups in the Exchange Management Console, you have additional options
when you work with the Exchange Management Shell. Table 2 provides an overview of commands you can use to manage availability groups and their various features.
Table 2. Cmdlets for Working with Database Availability Groups
MANAGEMENT AREA | RELATED COMMANDS
---|---
Database availability group management | Get-DatabaseAvailabilityGroup, New-DatabaseAvailabilityGroup, Remove-DatabaseAvailabilityGroup, Set-DatabaseAvailabilityGroup
Database copy management | Add-MailboxDatabaseCopy, Get-MailboxDatabaseCopyStatus, Remove-MailboxDatabaseCopy, Resume-MailboxDatabaseCopy, Set-MailboxDatabaseCopy, Suspend-MailboxDatabaseCopy, Update-MailboxDatabaseCopy
Database management | Clean-MailboxDatabase, Dismount-Database, Get-MailboxDatabase, Move-DatabasePath, New-MailboxDatabase, Remove-MailboxDatabase, Set-MailboxDatabase
Network configuration | Get-DatabaseAvailabilityGroupNetwork, New-DatabaseAvailabilityGroupNetwork, Remove-DatabaseAvailabilityGroupNetwork, Set-DatabaseAvailabilityGroupNetwork
Switchover management | Move-ActiveMailboxDatabase, Start-DatabaseAvailabilityGroup, Stop-DatabaseAvailabilityGroup, Restore-DatabaseAvailabilityGroup
Server membership | Add-DatabaseAvailabilityGroupServer, Remove-DatabaseAvailabilityGroupServer
As part of database availability group planning, keep in mind that you can create database copies
only on Mailbox servers in the same database availability group that do
not host the active copy of a database. An active copy differs from a
passive copy in that it is mounted and accessed by users, whereas a
passive copy is not mounted and is kept current through log replay.
You cannot create two copies of the same database on the
same server. Other things to keep in mind when working with database
copies include the following:
- Exchange 2010 mailbox databases can be replicated only to other Exchange 2010 Mailbox servers, and the servers must be in the same database availability group. You cannot replicate a database outside a database availability group, nor can you replicate an Exchange 2010 mailbox database to a server running Exchange 2007.
- All copies of a database use the same path on each server containing a copy. The database and log file paths for a database copy on each Mailbox server must not conflict with any other database paths.
- All Mailbox servers in a database availability group must be in the same Active Directory domain. Database copies can be created in the same or different Active Directory sites and on the same or different network subnets. However, database copies are not supported between Mailbox servers with round-trip network latency greater than 250 milliseconds (by default).
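The placement rules in this list lend themselves to a simple pre-check. The Python sketch below is hypothetical (the class, function, and field names are invented and are not Exchange objects); it checks only the rules stated above, and it takes the measured round-trip latency as an input rather than measuring it.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ROUND_TRIP_MS = 250  # supported round-trip latency limit stated above


@dataclass(frozen=True)
class MailboxServer:
    """Hypothetical description of a Mailbox server; not an Exchange object."""

    name: str
    exchange_version: int  # e.g. 2010 or 2007
    dag: Optional[str]     # database availability group, if any
    ad_domain: str
    ad_site: str           # sites may differ, so this is never checked


def copy_placement_problems(active: MailboxServer, target: MailboxServer,
                            database_path: str, paths_in_use_on_target: set[str],
                            round_trip_ms: float) -> list[str]:
    """Return the rules a proposed copy would break (an empty list means none)."""
    problems = []
    if active.exchange_version != 2010 or target.exchange_version != 2010:
        problems.append("both servers must run Exchange 2010")
    if active.dag is None or active.dag != target.dag:
        problems.append("both servers must be in the same database availability group")
    if active.ad_domain != target.ad_domain:
        problems.append("both servers must be in the same Active Directory domain")
    if active.name == target.name:
        problems.append("the target already hosts the active copy of this database")
    if database_path in paths_in_use_on_target:
        # Every copy uses the same path, so it must be free on the target.
        problems.append("the database path conflicts with another database on the target")
    if round_trip_ms > MAX_ROUND_TRIP_MS:
        problems.append(f"round-trip latency {round_trip_ms} ms exceeds {MAX_ROUND_TRIP_MS} ms")
    return problems
```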
Note
Database copies are for mailbox databases only. For redundancy and high availability
of public folder databases, you should use public folder replication.
Unlike when you used CCR with Exchange 2007, you can use public folder
replication to replicate multiple public folder databases between
servers in a database availability group. Because database availability
groups can be stretched across sites, it is possible for a mailbox
database to be moved between sites.
Introducing Active Manager
In Exchange 2010, Active Manager provides the resource model and failover management features previously provided by the Cluster service.
When you create your first database availability group in an Exchange
organization, a Windows Failover Cluster is created by Exchange, but
there are no cluster groups for Exchange and no storage resources in
the cluster. As a result, as shown in Figure 1, Failover
Cluster Manager shows only basic information about the cluster, which
includes the cluster name, the cluster networks, and the quorum
configuration. Cluster nodes and networks will also exist, and their
status can be checked in Failover Cluster Manager. However, all cluster
resources, including nodes and networks, are managed for you by
Exchange. Exchange makes use of the cluster's node and network
management functions, and you can check the node and network status in
the Exchange Management Console.
Note
Failover Cluster Manager is the primary management tool for working with the Cluster service.
Although you need to use the Exchange Management tools to view and
manage database availability groups and related features, Failover
Cluster Manager does show the status of clustering.
- By selecting the cluster name in the left pane, you get a quick overview of the cluster configuration, including the current quorum configuration, which can be either Node Majority or Node and File Share Majority depending on the number of nodes in the database availability group.
- By selecting the Nodes entry in the left pane, you can quickly check the status of all the nodes in the database availability group.
- By expanding the Networks entry in the left pane and then selecting available cluster networks, you can check the status of the network as well as individual network connections.
- By selecting the cluster name in the left pane and then clicking the link for Recent Cluster Events, you can check the event logs on all cluster nodes for errors and warnings.
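The quorum model depends on whether the group has an odd or an even number of members: an odd number uses Node Majority, and an even number uses Node and File Share Majority, in which a file share witness contributes a tie-breaking vote. A minimal Python sketch of that rule and the resulting majority follows; the function name is hypothetical.

```python
def dag_quorum(member_count: int) -> tuple[str, int]:
    """Return (quorum model, votes needed for quorum) for a DAG (hypothetical helper)."""
    if member_count < 1:
        raise ValueError("a DAG needs at least one member")
    if member_count % 2 == 1:
        model, votes = "Node Majority", member_count
    else:
        # With an even number of members, the file share witness adds one vote.
        model, votes = "Node and File Share Majority", member_count + 1
    return model, votes // 2 + 1  # a strict majority of all votes


print(dag_quorum(4))  # ('Node and File Share Majority', 3)
```

The extra witness vote is what lets an even-sized group avoid a tie when it splits in half.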
Active Manager runs on all Mailbox servers that are members of a database availability group. Active
Manager on each member server operates as either the primary role holder or a standby
secondary role holder. The primary role holder, referred to as the Primary
Active Manager, decides which database copies will be active and which
will be passive. It also receives topology change notifications and
reacts to server failures. Only one copy of a database can be active at
any given time, and that copy can be mounted or dismounted.
The group member that holds the primary role is always the member
that currently owns the cluster quorum resource and the default cluster
group. If the server that owns the cluster quorum resource fails, the
primary role automatically moves to another server in the group and
that server takes ownership of the default cluster group. Before you
take the server that hosts the cluster quorum resource offline for
maintenance or an upgrade, you must first move the primary role to
another server in the group.
Secondary role holders, referred to as Standby
Active Managers, provide information about which server hosts the
active copy of a mailbox database to other Exchange components, such as
the RPC Client Access service or Hub Transport service. The secondary
role holder detects failures of replicated, local databases and the
local information store, and it issues failure notifications to the
primary role holder and asks the primary role holder to initiate a
failover. The secondary role holder does not determine which server
takes over, nor does it update the database location state in the
primary role holder. With respect to its local system, the primary role
holder also performs the functions of the secondary role by detecting
local database and local information store failures and issuing related
notifications.
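The division of labor between the two roles can be sketched as a small Python model. Everything here is hypothetical: the class and method names are invented, all instances share one dictionary of database locations (in Exchange, the roles communicate across servers), and the model assumes the failed copy was the active one. It shows the rules described above: the quorum owner holds the primary role, any instance can report a local failure, and only the primary role holder picks the new active copy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailureNotification:
    reporter: str  # server whose Active Manager detected the local failure
    database: str


class ActiveManager:
    """Hypothetical model of Active Manager on one DAG member; not an Exchange API."""

    def __init__(self, server: str, quorum_owner: str, location: dict[str, str]):
        self.server = server
        # The member that owns the cluster quorum resource holds the primary role.
        self.is_primary = server == quorum_owner
        # Simplification: every instance shares one database -> active-server map.
        self.location = location

    def active_server(self, database: str) -> str:
        # Answers "which server hosts the active copy?" for other components.
        return self.location[database]

    def detect_local_failure(self, database: str) -> FailureNotification:
        # Secondary role (the primary also performs it for its own server):
        # report the failure, but do not choose a new server.
        return FailureNotification(self.server, database)

    def handle(self, note: FailureNotification, copy_hosts: list[str]) -> Optional[str]:
        # Only the primary role holder picks the new active copy and records it.
        if not self.is_primary:
            raise PermissionError("only the Primary Active Manager initiates a failover")
        target = next((s for s in copy_hosts if s != note.reporter), None)
        if target is not None:
            self.location[note.database] = target
        return target
```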
Active Manager determines which database copy should be activated by
looking for a copy that has the following characteristics:
- The database copy has a status of Healthy, DisconnectedAndHealthy, or DisconnectedAndResynchronizing.
- The database copy has a content index with a status of Healthy.
- The database copy has a copy queue length of less than 10 log files.
- The database copy has a replay queue length of less than 50 log files.
If no database copy meets all of these criteria, Active Manager continues looking for the best choice by lowering the selection requirements through successive iterations.
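The selection process can be sketched as a list of passes, strictest first, in which each later pass drops a requirement. In the Python sketch below, the four checks come from the criteria above, but the relaxation order and the tie-break (smallest copy queue, then smallest replay queue) are assumptions made for illustration; Exchange's actual passes may differ. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ACCEPTABLE_STATUSES = {"Healthy", "DisconnectedAndHealthy", "DisconnectedAndResynchronizing"}


@dataclass(frozen=True)
class CopyState:
    server: str
    status: str
    content_index: str
    copy_queue: int    # log files not yet copied to this server
    replay_queue: int  # log files copied but not yet replayed


def status_ok(c: CopyState) -> bool:
    return c.status in ACCEPTABLE_STATUSES


def index_ok(c: CopyState) -> bool:
    return c.content_index == "Healthy"


def copy_queue_ok(c: CopyState) -> bool:
    return c.copy_queue < 10


def replay_queue_ok(c: CopyState) -> bool:
    return c.replay_queue < 50


Check = Callable[[CopyState], bool]

# Assumed relaxation order, strictest pass first.
PASSES: list[list[Check]] = [
    [status_ok, index_ok, copy_queue_ok, replay_queue_ok],
    [status_ok, index_ok, copy_queue_ok],
    [status_ok, copy_queue_ok],
    [status_ok],
]


def select_copy(copies: list[CopyState]) -> Optional[CopyState]:
    """Return the copy to activate, or None if no copy passes any pass."""
    for checks in PASSES:
        matches = [c for c in copies if all(check(c) for check in checks)]
        if matches:
            # Assumed tie-break: least data still to copy, then least to replay.
            return min(matches, key=lambda c: (c.copy_queue, c.replay_queue))
    return None
```

Keeping the passes as data makes the relaxation order easy to inspect and change without touching the selection loop.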