There are many aspects to
consider when designing and planning high availability with SQL Server
clustering. Chief among them is the number of dedicated passive, or standby, servers to include in the cluster for failover purposes.
In simple clusters with a 1:1 working:standby
ratio, each working server is matched with a standby server that,
during normal operations, isn't used for any purpose other than to
simply wait for a failure (or planned failover). Alternatively, both (or
all) servers can be active, but this introduces resource complications
during failovers; for example, will a server be able to handle its own
load plus that of the failed server?
In previous versions of clustering, Active/Passive, or Active/Active terminology was used to define the usage of two-node server clusters. The current terms are single-instance or multi-instance
and are more appropriate terms considering the ability of today's
clusters to contain up to 16 servers and many more SQL instances, with
one or more standby servers available for failover from any number of
other active servers.
This section will focus on the various clustering topologies commonly used and the resource considerations for each.
1. Single-instance clusters
Under
normal operating conditions, one node is in the standby status, existing
purely to take on the load in the event of failure or a planned outage.
When such action occurs, the instance moves between one server and the
next, as shown in figure 1.
This clustering topology, known as a single-instance cluster,
is the simplest to understand and administer, and also provides the
highest availability while avoiding any performance impacts after
failover. As such, it's commonly used to protect mission-critical SQL
Server applications.
The major downside of this
topology is the cost. During normal operation, one server isn't used in
any capacity, and depending on the cost of the server, this can be an
expensive approach. In addressing this, the multi-instance topology is
frequently used.
2. Multi-instance clusters
As the name suggests, a multi-instance
cluster contains multiple SQL Server instances. In a typical two-node
multi-instance cluster, each cluster node runs one or more instances,
and a failover situation causes an instance to assume another node's
workload in addition to its own. In the example in figure 2, the Sales SQL instance resides on the Sales1 server, and the Marketing
SQL Instance resides on the Sales2 server. In the event of failure of
Sales1, Sales2 will run both the Sales and Marketing instances,
potentially reducing the performance of both instances.
Because of the
increase in server utilization, multi-instance clusters are typically
used in budget-constrained environments, or in those valuing high
availability much higher than reduced performance in a failed state.
The multi-instance example in figure 2
achieves higher resource utilization than the single-instance example
presented earlier; both servers are utilized in the normal working
status.
Multi-instance
clusters require careful consideration in terms of resource
configuration. In our example, if the Sales1 and Sales2 servers each
have 32GB of RAM, and both the Sales and Marketing SQL instances are
using 28GB of RAM, a failover situation would see one (or both)
instances with a potentially drastically reduced memory allocation after
failover, a scenario I refer to as a resource crunch.
To avoid this
issue, configure instances so that the sum total of maximum resource
usage across both (or all) instances doesn't exceed the total resources
of a single node that the instances could end up running on.
The two topologies we've
covered so far represent the opposite ends of the scale; single-instance
clusters with a 50 percent node utilization, and multi-instance
clusters with 100 percent resource utilization. In between these two
lies the N+1/M cluster.
3. N+1/M clusters
To avoid the cost of idle
servers and limit the effects of a failover-induced resource crunch, a
commonly used cluster configuration is an N+1/M
cluster, whereby one or more standby servers exist for more than one
working server. For example, in a three-node cluster, two nodes may be
active, with a third existing as a failover node for both active nodes.
Similarly, a five-node cluster with three active nodes and two failover
nodes is a common cluster configuration.
As the number of cluster
nodes and SQL Server instances increases, so too does the importance of
considering the failover rules governing which node(s) a SQL Server
instance can fail over to.
4. Failover rules
By regularly polling cluster nodes using a series of mechanisms called LooksAlive and IsAlive
checks, a Windows cluster may conclude that a cluster node has failed.
At that point the resources hosted by the node are failed over to
another cluster node.
In a simple two-node cluster, the failover process is straightforward, as demonstrated in figure 5.4.
But consider a five-node cluster containing two passive/standby nodes.
Which (if any) of the standby nodes will be used for failover purposes
when one of the active nodes fails? In large clusters containing many
SQL Server instances, this is a particularly important consideration in
maintaining a well-balanced cluster, ensuring that a single node doesn't
carry a disproportionate burden of load.
There are several common strategies and techniques used
in controlling failover, and central to them is the use of the Preferred Owners, Possible Owners, and Failback settings.
The properties of the SQL Server resource shows the Possible Owners
property. This property lets you specify which nodes the instance is
permitted to fail over to. In a similar manner, the Preferred Owner
setting (not shown) is used to set the preferred failover node (this may
not be chosen in some situations—for example, if the node is
unavailable).
Finally, the
Failback options (not shown) let you determine whether or not resources
fail back to the original node if the failed node comes back online.
With
this background in mind, let's continue by walking through the
installation of a clustered SQL Server instance into a two-node Windows
Server 2008 cluster.