SQL Server 2008 : High availability with database mirroring - Failover options

10/28/2013 7:44:11 PM

The three types of failover options are automatic, manual, and forced service. Table 1 summarizes the availability of these options based on the mirroring mode and presence of a witness instance.

Table 1. Supported failover modes by mirroring type
Mirroring mode	Witness	Supported failover modes
High performance	No	Forced service
High safety	No	Manual or forced service
High safety	Yes	Automatic, manual, or forced service

Let's begin this section with a look at automatic failover.

1. Automatic failover with SNAC

If the principal instance fails, automatic failover brings the mirror database online as the new principal database. For this to occur, all of the following conditions must be true:

Mirroring must be operating in the high-safety (synchronous) mode.
A witness instance must be present.
The mirror database must be synchronized—that is, all outstanding transactions from the principal should be written to the transaction log on the mirror database.
The principal instance must lose communication with the witness and mirror.
The witness and mirror instance must remain in contact.
The mirror instance must detect the loss of the principal instance.

The last bullet is a crucial item and requires more explanation. There are various types of failure conditions that may lead to the loss of the principal and are broadly categorized into hard and soft errors. An example of a hard error is a power failure; a TCP/IP timeout is regarded as a soft error.

Unlike soft errors, hard errors are typically reported immediately. In either case, database mirroring uses a timeout setting in combination with a heartbeat between the mirroring partners in order to detect failure. The default timeout value is 10 seconds. If a mirroring partner doesn't receive a response within that time, a failure is assumed. For synchronous mirroring, you can adjust this value, although you shouldn't use a value of less than 10 seconds to prevent unwanted failovers due to temporary network issues.

With these conditions in mind, let's walk through the process of an automatic failover, starting from the point of assumed failure:

The principal database, if still available, sets its status to disconnected, and drops any client connections.
The witness and mirror instances register the principal instance as unavailable.
The mirror database rolls forward any outstanding transactions in its redo queue.
The mirror database comes online as the new principal.
When the original principal database comes online, it's registered as the mirror, and synchronizes missing transactions.

Let's consider step 1 in more detail. Imagine a case where a private network connection to the mirror and witness instances breaks but the public network used for client connections is still available. In this case, the database is still up and running and available to clients, but a mirroring failure is assumed. By actively disconnecting the client connections, a situation is prevented whereby both databases may temporarily receive updates to the database, resulting in a loss of data.

Mirroring quorum

Enabling a witness role in a mirroring session introduces the concept of quorum. A mirroring session is said to have quorum when at least two of the three instances (principal, mirror, and witness) in the mirroring relationship are connected. For a mirroring database to be available, quorum must exist. When all three instances are connected, full quorum exists. If the principal instance fails, the mirror instance has quorum with the witness and coordinates with it to take on the role of principal. If the new principal then loses the connection to the witness, no quorum exists, and the database is taken offline.

To fully benefit from automatic failover, you should consider how clients can be automatically reconnected to the mirror database. One of the great things about database mirroring is that client connections using SQL Server Native Client (SNAC) can benefit from its understanding and awareness of database mirroring. A SNAC connection string includes the Failover Partner option, as shown in this example:

Data Source=SV1\Sales; Failover Partner=SV2\Sales; Initial Catalog=Sales;

Using the SNAC's failover partner option automates a lot of the hard work in application reconnection logic required for other high-availability options. However, even with database mirroring in place, client connections originating from non-SNAC sources won't be able to take advantage of automatic reconnection, unless the reconnection logic is coded into the application. This is an important high-availability consideration; while the database may fail over immediately and without any data loss, if the clients can't automatically reconnect to the mirror, it can hardly be considered a success, thus devaluing part of the appeal of automatic failover.

High-safety mirroring sessions can also use the manual failover method.

2. Manual failover

The manual failover method, available only in high-safety (synchronous) mode, is typically used when preparing for a planned outage such as a hardware or service pack upgrade.

The DBA enacts the manual failover, which swaps the mirroring roles between the principal and mirror.
Applications are reconnected to the new principal database once it completes the processing of its redo queue and comes online.
Mirroring is suspended and the mirror instance is taken offline for upgrade, during which time the principal database runs exposed—that is, failover of any type isn't possible, so if the new principal instance fails, an outage will result.
Once the upgrade is complete, the mirror database rejoins the mirroring session and synchronizes outstanding transactions (catches up).

At this point, the mirroring roles can be reversed to return the instances to their original state, although assuming both servers are configured with the same processing capacity and load, this step shouldn't be required, so the current roles could remain in place.

Of course, if there are multiple databases on the server being taken offline for a planned outage, and some of them aren't mirrored, then this approach obviously needs to be reconsidered.

The final failover mode is forced service.

3. Forced service

Typically used in disaster-recovery scenarios with high-performance (asynchronous) mirroring, this option brings the mirror database online and makes it available for client connections only if the link between the principal and mirror instances is lost.

The critical consideration before enacting this failover mode is the possibility of data loss. If a network connection drops and the principal database continues processing transactions before failover, these transactions won't be available for recovery on the mirror database. As such, forced service is typically only used when service must be resumed as soon as possible and the possibility of data loss is accepted.

In closing our section on failover modes, let's walk through a number of failover scenarios.

4. Failure scenarios

To understand how failure is handled based on the mirroring mode and which instance fails, consider table 2. Read the notes that follow the table in conjunction with each example.

Table 2. Failure scenarios by mirroring topology and transaction safety
Failure	Mirroring mode	Witness	Action
Principal	Synchronous	Yes	Automatic failover (see note 1)
Principal	Synchronous	No	Manual failover (see note 2)
Principal	Asynchronous	No	Force service to make mirror available (see note 3)
Mirror	Both	Both	Principal runs exposed and transactions retained (see note 4)
Witness	Synchronous	Yes	Automatic failover not possible

Action Notes:

Assuming the mirror and witness are in contact, automatic failover occurs after the nominated timeout.
Mirroring is stopped on the mirror database using ALTER DATABASE <dbname> SET PARTNER OFF. The mirroring database is then recovered using RESTORE DATABASE <dbname> WITH RECOVERY. When the principal server becomes available, mirroring would need to be reestablished, this time in the reverse direction.
Service is forced by running this command on the mirror database: ALTER DATABASE <dbname> SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS.
For as long as database mirroring exists, transactions generated on the principal must remain in the transaction log, so disk space usage would need to be closely monitored while the mirror instance is unavailable.

Armed with an overview of database mirroring principals and options, let's roll up our sleeves and get into the details of the implementation.