The three types of failover options are automatic, manual, and forced service. Table 1 summarizes the availability of these options based on the mirroring mode and presence of a witness instance.
Table 1. Supported failover modes by mirroring type
Mirroring mode | Witness | Supported failover modes |
---|
High performance | No | Forced service |
High safety | No | Manual or forced service |
High safety | Yes | Automatic, manual, or forced service |
Let's begin this section with a look at automatic failover.
1. Automatic failover with SNAC
If
the principal instance fails, automatic failover brings the mirror
database online as the new principal database. For this to occur, all
of the following conditions must be true:
Mirroring must be operating in the high-safety (synchronous) mode.
A witness instance must be present.
The mirror database must be synchronized—that is, all outstanding transactions from the principal should be written to the transaction log on the mirror database.
The principal instance must lose communication with the witness and mirror.
The witness and mirror instance must remain in contact.
The mirror instance must detect the loss of the principal instance.
The
last bullet is a crucial item and requires more explanation. There are
various types of failure conditions that may lead to the loss of the
principal and are broadly categorized into hard and soft errors. An example of a hard error is a power failure; a TCP/IP timeout is regarded as a soft error.
Unlike
soft errors, hard errors are typically reported immediately. In either
case, database mirroring uses a timeout setting in combination with a heartbeat
between the mirroring partners in order to detect failure. The default
timeout value is 10 seconds. If a mirroring partner doesn't receive a
response within that time, a failure is assumed. For synchronous
mirroring, you can adjust this value, although you shouldn't use a
value of less than 10 seconds to prevent unwanted failovers due to
temporary network issues.
With these
conditions in mind, let's walk through the process of an automatic
failover, starting from the point of assumed failure:
The principal database, if still available, sets its status to disconnected, and drops any client connections.
The witness and mirror instances register the principal instance as unavailable.
The mirror database rolls forward any outstanding transactions in its redo queue.
The mirror database comes online as the new principal.
When the original principal database comes online, it's registered as the mirror, and synchronizes missing transactions.
Let's
consider step 1 in more detail. Imagine a case where a private network
connection to the mirror and witness instances breaks but the public
network used for client connections is still available. In this case,
the database is still up and running and available to clients, but a
mirroring failure is assumed. By actively disconnecting the client
connections, a situation is prevented whereby both databases may
temporarily receive updates to the database, resulting in a loss of
data.
Enabling a witness role in a mirroring session introduces the concept of quorum. A mirroring session is said to have quorum
when at least two of the three instances (principal, mirror, and
witness) in the mirroring relationship are connected. For a mirroring
database to be available, quorum must exist. When all three instances
are connected, full quorum exists. If
the principal instance fails, the mirror instance has quorum with the
witness and coordinates with it to take on the role of principal. If
the new principal then loses the connection to the witness, no quorum
exists, and the database is taken offline.
|
To
fully benefit from automatic failover, you should consider how clients
can be automatically reconnected to the mirror database. One of the
great things about database mirroring is that client connections using SQL Server Native Client
(SNAC) can benefit from its understanding and awareness of database
mirroring. A SNAC connection string includes the Failover Partner
option, as shown in this example:
Data Source=SV1\Sales; Failover Partner=SV2\Sales; Initial Catalog=Sales;
Using
the SNAC's failover partner option automates a lot of the hard work in
application reconnection logic required for other high-availability
options. However, even with database mirroring in place, client
connections originating from non-SNAC sources won't be able to take
advantage of automatic reconnection, unless the reconnection logic is
coded into the application. This is an important high-availability
consideration; while the database may fail over immediately and without
any data loss, if the clients can't automatically reconnect to the
mirror, it can hardly be considered a success, thus devaluing part of
the appeal of automatic failover.
High-safety mirroring sessions can also use the manual failover method.
2. Manual failover
The
manual failover method, available only in high-safety (synchronous)
mode, is typically used when preparing for a planned outage such as a
hardware or service pack upgrade.
The DBA enacts the manual failover, which swaps the mirroring roles between the principal and mirror.
Applications are reconnected to the new principal database once it completes the processing of its redo queue and comes online.
Mirroring
is suspended and the mirror instance is taken offline for upgrade,
during which time the principal database runs exposed—that is, failover
of any type isn't possible, so if the new principal instance fails, an
outage will result.
Once
the upgrade is complete, the mirror database rejoins the mirroring
session and synchronizes outstanding transactions (catches up).
At
this point, the mirroring roles can be reversed to return the instances
to their original state, although assuming both servers are configured
with the same processing capacity and load, this step shouldn't be
required, so the current roles could remain in place.
Of
course, if there are multiple databases on the server being taken
offline for a planned outage, and some of them aren't mirrored, then
this approach obviously needs to be reconsidered.
The final failover mode is forced service.
3. Forced service
Typically
used in disaster-recovery scenarios with high-performance
(asynchronous) mirroring, this option brings the mirror database online
and makes it available for client connections only if the link between
the principal and mirror instances is lost.
The
critical consideration before enacting this failover mode is the
possibility of data loss. If a network connection drops and the
principal database continues processing transactions before failover,
these transactions won't be available for recovery on the mirror
database. As such, forced service is typically only used when service
must be resumed as soon as possible and the possibility of data loss is
accepted.
In closing our section on failover modes, let's walk through a number of failover scenarios.
4. Failure scenarios
To understand how failure is handled based on the mirroring mode and which instance fails, consider table 2. Read the notes that follow the table in conjunction with each example.
Table 2. Failure scenarios by mirroring topology and transaction safety
Failure | Mirroring mode | Witness | Action |
---|
Principal | Synchronous | Yes | Automatic failover (see note 1) |
Principal | Synchronous | No | Manual failover (see note 2) |
Principal | Asynchronous | No | Force service to make mirror available (see note 3) |
Mirror | Both | Both | Principal runs exposed and transactions retained (see note 4) |
Witness | Synchronous | Yes | Automatic failover not possible |
Action Notes:
Assuming the mirror and witness are in contact, automatic failover occurs after the nominated timeout.
Mirroring is stopped on the mirror database using ALTER DATABASE <dbname> SET PARTNER OFF. The mirroring database is then recovered using RESTORE DATABASE <dbname> WITH RECOVERY. When the principal server becomes available, mirroring would need to be reestablished, this time in the reverse direction.
Service is forced by running this command on the mirror database: ALTER DATABASE <dbname> SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS.
For
as long as database mirroring exists, transactions generated on the
principal must remain in the transaction log, so disk space usage would
need to be closely monitored while the mirror instance is unavailable.
Armed
with an overview of database mirroring principals and options, let's
roll up our sleeves and get into the details of the implementation.