Exchange Server 2010 : Day-to-day DAG management and operations (part 2) - Building the DAG

10/3/2011 9:14:45 AM

1. Building the DAG

Immediately after creation, a new DAG is an empty container waiting to be filled with the mailbox servers and databases that give it functionality. Being able to construct a DAG gradually over time is one of the advantages that the implementation brings to Exchange. The alternative, which we see in the implementation of clustered mailbox servers in Exchange 2007, is to install all of the servers in the cluster at one time. After the DAG is created and its properties verified, you can start to add mailbox servers to the DAG and then create the new database copies that those servers will manage. This process doesn’t have to begin immediately; you can keep an empty DAG in the organization for as long as you need before you begin to add servers.

Servers are added to and removed from the DAG using the Manage Database Availability Group Wizard through EMC or by using the Add-DatabaseAvailabilityGroupServer and Remove-DatabaseAvailabilityGroupServer cmdlets. To begin to manage DAG membership with EMC, go to the Mailbox section of the Organization Configuration node and select the DAG from the Database Availability Groups tab, then right-click to reveal the menu options for DAG management (Figure 6).

Figure 6. Option to manage DAG membership.

Warning:

Microsoft recommends that you don’t install Exchange on a domain controller. It therefore follows that they’re not particularly excited if you add a mailbox server that happens to run on a domain controller to a DAG. Exchange doesn’t block this action and you can go ahead and use a domain controller if you really must; hopefully, you will only do this in a test configuration and not in production use.

After a mailbox server is added to a DAG, you can manage its membership in the DAG from another Exchange server. Some administrators use Server Manager to install the Windows Failover Clustering feature (Figure 7) before they attempt to add a server to a DAG because they prefer to make sure that all of the prerequisites are in place before they proceed.

Figure 7. Installing Windows Failover Clustering.
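If you would rather script this prerequisite than click through Server Manager, the following is a minimal sketch. It assumes Windows Server 2008 R2, an elevated Windows PowerShell session on the mailbox server, and the ServerManager module that ships with that platform. It only installs the feature; Exchange still creates and configures the cluster when the server joins the DAG.

```powershell
# Run in an elevated Windows PowerShell session on the mailbox server
# (Windows Server 2008 R2 is assumed here).
Import-Module ServerManager

# Check whether the Failover Clustering feature is already installed.
$feature = Get-WindowsFeature -Name Failover-Clustering

if (-not $feature.Installed) {
    # Install the feature ahead of time so that adding the server
    # to the DAG has less work to do.
    Add-WindowsFeature -Name Failover-Clustering
}
```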

Figure 8 illustrates the EMC Wizard adding a new server to a DAG.

Figure 8. Adding a new server to a DAG.

The following steps occur to instantiate a DAG with the addition of the first mailbox server. Exchange has to do quite a lot of work to bring a server into a DAG, especially if it has to install Windows Clustering, so this process does not happen quickly.

Exchange validates that the server has the mailbox server role installed and does not host the FSW resource for the DAG.
If not present, Exchange installs Windows Failover Clustering on the mailbox server.
A failover cluster is created using the name of the DAG.
A cluster name object (CNO) is created in the Computers organizational unit (OU) in Active Directory (Figure 9).
The name and IP address of the DAG are registered in DNS as a Host (A) record.
The mailbox server is linked to the DAG object by populating the MSExchMDBAvailabilityGroupLink property on the server object in Active Directory.
The cluster database is updated with information about the databases hosted by the newly added server. These databases remain as standalone active copies until you create additional passive copies through replication to other servers in the DAG.

Figure 9. Properties of the DAG object added to the Computers OU.

Although it is easiest to add servers using EMC, you can also do this through EMS. For example:

Add-DatabaseAvailabilityGroupServer -Identity 'DAG-Dublin' -MailboxServer 'ExServer2'
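After the command completes, it is worth confirming that the membership change took effect. The sketch below reuses the example names from the command above and reads back the DAG and server properties. The property list is a selection that seems useful here rather than an exhaustive one.

```powershell
# Run in the Exchange Management Shell.
# 'DAG-Dublin' and 'ExServer2' are the example names used above.

# -Status asks Exchange to query the cluster for real-time values
# such as OperationalServers.
Get-DatabaseAvailabilityGroup -Identity 'DAG-Dublin' -Status |
    Format-List Name, Servers, OperationalServers, WitnessServer, WitnessDirectory

# Confirm from the server's side which DAG it now belongs to.
Get-MailboxServer -Identity 'ExServer2' |
    Format-List Name, DatabaseAvailabilityGroup

# The reverse operation, left commented out on purpose. A server can be
# removed only after it no longer hosts replicated database copies.
# Remove-DatabaseAvailabilityGroupServer -Identity 'DAG-Dublin' -MailboxServer 'ExServer2'
```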

When you add additional mailbox servers to the DAG, Exchange does the following:

Validates that the server has the mailbox role installed.
Joins the server to the cluster.
Adjusts the quorum model. A node majority model is used for DAGs with an odd number of members, whereas a node and file share majority model is used for DAGs with an even number of members. The quorum model is automatically adjusted as servers join and leave the DAG, including when they are taken offline for maintenance or suffer a failure. The adjustment occurs in the background and does not require any administrator intervention.
Links the server to the DAG object in Active Directory.
Updates the cluster database with information about the databases hosted by the newly added server.
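The quorum rule described in the list above can be expressed as a small helper. This is only an illustration of the odd/even rule as stated in the text; Get-ExpectedDagQuorumModel is a hypothetical function name, not an Exchange or Windows cmdlet, and it does not query or change a real cluster.

```powershell
# Hypothetical helper that mirrors the rule in the text:
# odd member count  -> Node Majority
# even member count -> Node and File Share Majority
function Get-ExpectedDagQuorumModel {
    param([int]$MemberCount)

    if ($MemberCount -lt 1) {
        throw 'A DAG needs at least one member before it has a quorum model.'
    }

    if (($MemberCount % 2) -eq 1) {
        'Node Majority'
    }
    else {
        'Node and File Share Majority'
    }
}

Get-ExpectedDagQuorumModel -MemberCount 2   # Node and File Share Majority
Get-ExpectedDagQuorumModel -MemberCount 3   # Node Majority
```

Comparing the value this returns with what Failover Cluster Manager displays is a quick way to check that the automatic adjustment has happened after a membership change.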

Figure 10 shows details of a DAG with two member servers as viewed through the Windows Failover Cluster Manager. The networks have been configured automatically using DHCP and the cluster is configured to use a node and file share majority quorum (because the cluster is formed by an even number of servers). In this case, the FSW is hosted on a server that doesn’t run Exchange, so we had to add the Exchange Trusted Subsystem to the local Administrators group on the server before using it to host the FSW. As you can see, there is no obvious indication that the FSW is on a non-Exchange server.

Figure 10. DAG details seen through Failover Cluster Manager and EMC.
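The group membership change mentioned above for a non-Exchange witness server can be made from the command line as well as through the Windows management consoles. This is a sketch only: CONTOSO is a hypothetical domain name, and the commands are assumed to be run from an elevated prompt on the server that will host the FSW.

```powershell
# Run in an elevated prompt on the non-Exchange server that will host the FSW.
# 'CONTOSO' is a hypothetical domain name; substitute your own.
net localgroup Administrators "CONTOSO\Exchange Trusted Subsystem" /add

# List the group members to confirm the change.
net localgroup Administrators
```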

INSIDE OUT: Do not manage DAG resources through Failover Cluster Manager

Although a DAG is visible to the Failover Cluster Manager, you should never attempt to manage any of the resources used by the DAG through this console. If you do, don’t expect much sympathy from Microsoft Support if one of your changes compromises the integrity of the DAG. Exchange stores many important properties for a DAG in Active Directory and the only way that you can manipulate DAG settings properly is through EMC or EMS. If necessary, the underlying code in the DAG management cmdlets will update the settings of the Windows cluster.

I’ve already stated that the more developed state of Windows Server 2008 R2 makes it my preferred platform for Exchange 2010. Development occurs over time in response to real-life operational experience and Windows clustering is no different. An example is the design change in Windows Server 2008 R2 to handle situations where clusters could enter a lost quorum state because the FSW resource was in a failed state even though the witness directory was available. Microsoft provides a retrofit update in KB978790 that you can apply to Windows 2008 servers. The change kicks in when the cluster determines that it is necessary to use the FSW to maintain quorum. If the FSW resource is failed, the cluster attempts to kickstart it back into action by bringing the resource online. If the server that hosts the FSW is available, it can respond and report that the FSW is available and accessible and the cluster can maintain quorum. However, if the FSW cannot be brought online, the cluster is in a lost quorum condition that has to be resolved by an administrator.

Before a server can join a DAG, it must be able to communicate with the cluster service running on every other member server that is currently in the DAG. In other words, you cannot expect to populate a DAG if some servers are offline or experiencing network problems. There are a number of reasons why communication might not be possible, including the following:

Servers are powered off or otherwise unavailable.
The cluster service is not running on a server.
Firewall rules on a server are blocking communication to the cluster service.
The DNS service is unavailable.
Authentication problems (Kerberos, Active Directory, or NTLM) are interfering with secure server-to-server communications.
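Some of the causes listed above can be checked quickly before you try to add a server. The sketch below is an assumption about what a reasonable pre-check might look like, not a procedure taken from Exchange itself. The server names are hypothetical, it covers only name resolution, basic reachability, and the state of the cluster service, and it says nothing about firewall rules for cluster traffic or authentication problems.

```powershell
# Hypothetical names of the servers that are already members of the DAG.
$members = 'ExServer1', 'ExServer2'

foreach ($server in $members) {
    # DNS: can the name be resolved at all?
    $resolves = $true
    try { [System.Net.Dns]::GetHostAddresses($server) | Out-Null }
    catch { $resolves = $false }

    # Reachability: does the server answer a ping?
    # (ICMP can be blocked by a firewall even when the server is up.)
    $online = Test-Connection -ComputerName $server -Count 1 -Quiet

    # Cluster service: ClusSvc is the Windows service name of the Cluster service.
    $service = Get-Service -Name ClusSvc -ComputerName $server -ErrorAction SilentlyContinue
    $clusterStatus = if ($service) { $service.Status } else { 'Unknown' }

    # Emit one summary object per server.
    New-Object PSObject -Property @{
        Server         = $server
        ResolvesInDns  = $resolves
        RespondsToPing = $online
        ClusterService = $clusterStatus
    }
}
```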

Production servers should have two network interface cards (NICs) to allow them to isolate MAPI traffic (interaction with other servers including CAS and Active Directory) and replication traffic (log shipping and database seeding). This is not a hard technical requirement from an Exchange perspective because it is perfectly feasible to have all traffic routed across a single NIC, assuming that the NIC has sufficient capacity to handle the network traffic (Microsoft recommends Gigabit Ethernet for single NICs). However, Windows Failover Clustering also has a dependency on a solid network and because dependable replication is so important to the smooth operation of the DAG, it’s really best if you deploy servers equipped with dual NICs for DAGs. Additional replication networks can be added as required and you can take advantage of techniques such as NIC teaming to improve overall network resilience against failure. You can also consult TechNet to determine how best to configure network settings for DAG operations.
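To see how Exchange has classified the networks it discovered for a DAG, you can list the DAG networks from EMS. The sketch below reuses the DAG-Dublin example name from earlier and selects the properties that indicate whether each network carries MAPI traffic, replication traffic, or both.

```powershell
# Run in the Exchange Management Shell.
# 'DAG-Dublin' is the example DAG name used earlier in this article.
Get-DatabaseAvailabilityGroupNetwork -Identity 'DAG-Dublin' |
    Format-List Name, Subnets, MapiAccessEnabled, ReplicationEnabled
```

On a server with a single NIC you would expect to see one network with both MAPI access and replication enabled.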