
Windows Server 2003 : Server Clustering (part 1) - Cluster Terminology, Types of Resources, Planning a Cluster Setup

8/24/2012 9:04:23 PM
If an NLB cluster is too limited in functionality for you, you should investigate a true server cluster. In a true server cluster, a group of machines shares a single identity and works in tandem to manage applications and, in the event of a failure, to migrate them away from problematic nodes and onto functional ones. The nodes of the cluster use a common, shared resource database and log storage facility provided by a physical storage device located on a hardware bus shared by all members of the cluster.

The shared data facility does not support IDE disks, software RAID (including Windows-based dynamic RAID), dynamic disks or volumes, the Encrypting File System (EFS), mounted volumes and reparse points, or remote storage devices such as tape backup drives.


Windows Server 2003 Enterprise Edition and Datacenter Edition support three types of clusters. Single node clusters are useful in test and laboratory environments for checking that applications and resources behave as intended, but they offer no fault tolerance. Single quorum device clusters, which have multiple nodes, are the most common and most functional type of cluster used in production. Majority node set clusters function as a cluster without a shared physical storage device, which the other two types require. Majority node set clusters are useful if you do not have a SCSI-based SAN or if the members of a cluster are spread out over several different sites, making a shared storage bus unfeasible. Both the Enterprise Edition and the Datacenter Edition support up to eight cluster nodes.

Except in single node clusters, clusters manage failure using failover and failback policies. Failover policies dictate how cluster resources behave when a failure occurs: which nodes the failed resources can migrate to, how long after the failure the failover takes place, and other properties. A failback policy specifies what happens when the failed node comes back online. How quickly should the migrated resources and applications be returned to the original node? Should the migrated objects stay at their new home? Should the repaired node be ignored? You can specify all of this behavior through policies.

1. Cluster Terminology

A few specific terms have special meanings when used in the context of clustering. They include the following:


Networks

Networks, also called interconnects, are the paths over which cluster nodes communicate with one another and with the public network. The network is the most common point of failure in cluster nodes, so always make network cards redundant in a true server cluster.


Nodes

Nodes are the actual members of the cluster. The clustering service supports only member nodes running Windows Server 2003 Enterprise Edition or Datacenter Edition. Other requirements include the TCP/IP protocol, connection to a shared storage device, and at least one interconnect to other nodes.


Resources

Resources are simply anything that can be managed by the cluster service and that the cluster can use to provide a service to clients. Resources can be logical or physical and can represent real devices, network services, or file system objects. A special type of physical disk resource called the quorum disk provides a place for the cluster service to store recovery logs and its own database. I'll provide a list of some resources in the next section.


Groups

Resources can be collected into resource groups, which are simply units by which failover and failback policy can be specified. A group's resources all fail over and fail back according to a policy applied to the group, and all the resources move to other nodes together upon a failure.
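To make the group idea concrete, here is a minimal Python sketch; it is not the Cluster service's actual logic or API. The `ResourceGroup` class, its `preferred_owners` list, the `allow_failback` flag, and the node names are all hypothetical. The point is only that changing a group's owner moves every resource in it at once, and that failback happens only when the group's policy permits it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ResourceGroup:
    """Hypothetical model of a resource group, the unit of failover/failback policy."""
    name: str
    resources: List[str]
    preferred_owners: List[str]      # nodes, most preferred first
    allow_failback: bool = False     # policy: return home when the preferred node recovers?
    owner: Optional[str] = None      # node currently hosting every resource in the group


def fail_over(group: ResourceGroup, failed_node: str, online_nodes: Set[str]) -> None:
    """Move the whole group to the first surviving node in its preferred-owner list."""
    if group.owner != failed_node:
        return                       # this group was not on the failed node
    for node in group.preferred_owners:
        if node != failed_node and node in online_nodes:
            group.owner = node       # all resources in the group move together
            return
    group.owner = None               # no eligible node: the group goes offline


def fail_back(group: ResourceGroup, recovered_node: str) -> None:
    """Return the group to its most preferred node, but only if the policy allows it."""
    if group.allow_failback and group.preferred_owners[:1] == [recovered_node]:
        group.owner = recovered_node


if __name__ == "__main__":
    sql = ResourceGroup("SQL", ["Disk S:", "IP Address", "Network Name", "SQL Server"],
                        preferred_owners=["NODE1", "NODE2"], allow_failback=True, owner="NODE1")
    fail_over(sql, "NODE1", online_nodes={"NODE2"})
    print(sql.owner)  # NODE2
    fail_back(sql, "NODE1")
    print(sql.owner)  # NODE1
```

Setting `allow_failback=False` models the "should the migrated objects stay at their new home?" choice: the group simply remains on the node it failed over to.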


Quorum

A quorum is the shared storage facility that keeps the cluster resource database and logs. As noted earlier in this section, this needs to be a SCSI-based physical disk that uses none of the unsupported software features listed above.

2. Types of Resources

A variety of resources are supported out of the box by the clustering service in Windows Server 2003. They include the following:


DHCP

This type of resource manages the DHCP service, which can be clustered to ensure its availability to client computers. The DHCP database must reside on the shared cluster storage device, otherwise known as the quorum disk.


File Share

Aside from using the Dfs service, you can make shares on servers redundant and fault-tolerant by using the File Share resource inside a cluster. You can put shared files and folders into a cluster as a standard file share with only one level of folder visibility, as a shared subfolder system in which the root folder and all immediate subfolders are shared with distinct names, or as a standalone Dfs root.

Fault-tolerant Dfs roots cannot be placed within a cluster.



Generic Application

Applications that are not cluster-aware (meaning they lack their own fault tolerance features that can hook into the cluster service) can be managed within a cluster using the Generic Application resource. Applications managed in this way must be able to store any data they create in a custom location, use TCP/IP to connect clients, and accept clients that attempt to reconnect in the event of a failure. You can install a cluster-unaware application onto the shared cluster storage device; that way, you need to install the program only once, and the entire cluster can use it.
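The reconnect requirement has a client-side counterpart: while a group fails over, connections to the cluster's address are refused or dropped for a short time, so clients should retry. The following Python sketch assumes a hypothetical `connect` callable (for example, one that wraps `socket.create_connection`); the attempt count and delay are illustrative values, not recommendations.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 5,
                       delay_seconds: float = 2.0) -> T:
    """Call `connect` until it succeeds or the attempts run out.

    Network errors (OSError and its subclasses, such as ConnectionRefusedError)
    are treated as transient, as they would be while a group is failing over.
    """
    last_error: Optional[OSError] = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)   # give the cluster time to bring the group online elsewhere
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

A caller might pass `lambda: socket.create_connection(("app.example.local", 5000), timeout=5)`, where the host name and port are placeholders for the clustered application's network name and port.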


Generic Script

This resource type is used to manage operating system scripts. You can cluster login scripts and account provisioning scripts, for example, if you regularly use those functions and need their continued availability even in the event of a machine failure. Hotmail's account provisioning functions, for instance, are a good fit for this feature, so users can sign up for the service at all hours of the day.


Generic Service

You can manage Windows Server 2003 core services, if you require them to be highly available, using the Generic Service resource type. Only the bundled services are supported.


IP Address

The IP Address resource manages a static, dedicated IP address assigned to a cluster.


Local Quorum

This type of resource is used to represent the disk shared by the cluster for activity logs and the cluster resource database. Local quorums do not have failover capabilities.


Majority Node Set

The Majority Node Set resource represents cluster configurations that don't reside on a quorum disk. Because there is no quorum disk, which is typical when the nodes of a cluster are in separate, geographically distinct sites, the cluster needs a mechanism that keeps every node updated on the cluster configuration and on the logs each node creates. Only one Majority Node Set resource can be present in a cluster. With a majority node set, more than half of the cluster's n nodes, that is, n/2 + 1 with n/2 rounded down, must be functioning for the cluster to stay online, so if you have four members of the cluster, three must be functioning.
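The majority rule is simple arithmetic. This Python sketch (the function names are illustrative) computes how many nodes must be running and how many may fail. It also shows a consequence worth planning around: a four-node majority node set cluster tolerates only one failure, the same as a three-node cluster.

```python
def nodes_needed_for_quorum(total_nodes: int) -> int:
    """Smallest strict majority of the cluster: n/2 rounded down, plus one."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, running_nodes: int) -> bool:
    """True if enough nodes are running for a majority node set cluster to stay online."""
    return running_nodes >= nodes_needed_for_quorum(total_nodes)


if __name__ == "__main__":
    for n in range(1, 9):
        need = nodes_needed_for_quorum(n)
        print(f"{n} nodes: {need} must be running, {n - need} may fail")
```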


Network Name

The Network Name resource represents the shared DNS or NetBIOS name of the cluster, an application, or a virtual server contained within the cluster.


Physical Disk

Physical Disk resources manage storage devices that are shared by all cluster members. The drive letter assigned to the physical device is the same on all cluster nodes. The Physical Disk resource is required by default for all cluster types except the majority node set.


Print Spooler

Print services can be clustered using the Print Spooler resource. This represents printers attached directly to the network, not printers attached directly to a cluster node's ports. Printers that are clustered appear normally to clients, but in the event that one node fails, print jobs on that node will be moved to another, functional node and then restarted. Clients that are sending print jobs to the queue when a failure occurs will be notified of the failure and asked to resubmit their print jobs.


Volume Shadow Copy Service Task

This resource type is used to create shadow copy jobs in the Scheduled Tasks folder on the node that currently owns the resource group hosting the resource. You can use this resource only to provide fault tolerance for the shadow copy process.


WINS

The WINS resource type is associated with the Windows Internet Naming Service, which maps NetBIOS computer names to IP addresses. To use WINS and make it a clustered service, the WINS database needs to reside on the quorum disk.

3. Planning a Cluster Setup

Setting up a server cluster can be tricky, but you can take much of the guesswork out of the process by having a clear plan of exactly what you want the cluster to accomplish. Are you interested in achieving fault tolerance and load balancing at the same time? Do you not care about balancing load, preferring to focus entirely on providing five-nines (99.999 percent) availability? Or would you like to provide only critical fault tolerance and thereby reduce the expense of creating and deploying the cluster?
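To put a number on "five nines": 99.999 percent availability leaves about 5.26 minutes of downtime per year. A short Python calculation (ignoring leap years) shows how quickly the downtime budget shrinks with each extra nine.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```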

If you want a balance between load balancing and high availability, allow applications and resources in the cluster to "fail over," or migrate, to other nodes in the cluster in the event of a failure. The benefit is that they continue to operate and remain accessible to clients; the cost is that they increase the load on the remaining, functioning nodes of the cluster. This extra load can cause cascading failures: as nodes fail, the load on the remaining nodes increases until their hardware or software can no longer handle it, those nodes fail in turn, and the process continues until all nodes are dead. That outcome makes your fault-tolerant cluster pointless. The moral is that you need to examine your application and size each node to handle an average load plus an "emergency reserve" that can absorb increased loads in the event of failure. You should also have policies and procedures for managing loads quickly when nodes fail. This setup is shown in Figure 1.

Figure 1. A balance between load balancing and high availability
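The "emergency reserve" can be estimated with a little arithmetic. If n identical nodes share the load evenly and the cluster must survive f node failures, each node should normally run at no more than (n - f)/n of its capacity; a four-node cluster that must survive one failure should keep each node at or below 75 percent. The Python sketch below computes that ceiling and simulates the cascade described above. It assumes load spreads evenly across the survivors, which is a simplification: real clusters move whole resource groups, so the redistribution is lumpier.

```python
def max_normal_utilization(nodes: int, tolerated_failures: int = 1) -> float:
    """Highest steady-state utilization per node that still absorbs the given failures."""
    if not 0 <= tolerated_failures < nodes:
        raise ValueError("tolerated_failures must be between 0 and nodes - 1")
    return (nodes - tolerated_failures) / nodes


def simulate_cascade(node_capacity: float, total_load: float, nodes: int) -> int:
    """Fail one node, then keep failing overloaded nodes; return how many survive.

    Load is assumed to be spread evenly across the surviving nodes.
    A result of 0 means the cluster suffered a total cascading failure.
    """
    survivors = nodes - 1                                  # the initial failure
    while survivors > 0 and total_load / survivors > node_capacity:
        survivors -= 1                                     # an overloaded node fails too
    return survivors


if __name__ == "__main__":
    print(max_normal_utilization(4))            # 0.75
    print(simulate_cascade(100, 300, nodes=4))  # 3: each survivor runs at exactly 100
    print(simulate_cascade(100, 320, nodes=4))  # 0: the overload cascades through every node
```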

If your be-all and end-all goal is true high availability, consider running a cluster member as a hot spare, ready to take over operations if a node fails. In this case, if you have n cluster nodes, you would specify that the applications and resources in the cluster run on n-1 nodes and configure the one remaining node to be idle. When failures occur, the applications migrate to the idle node and continue functioning. A nice feature is that your hot spare node can change: there is not necessarily a need to migrate failed-over processes back to the previously failed node when it comes back up, because that node can remain idle as the new hot spare. This reduces your management responsibility a bit. This setup is shown in Figure 2.

Figure 2. A setup with only high availability in mind
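The rotating hot spare can be modeled in a few lines. In this hypothetical Python sketch, `active` maps each working node to the resource groups it hosts, and a repaired node is never failed back to; it simply joins the pool of idle spares. The class, node names, and group names are placeholders, not part of any Windows API.

```python
from typing import Dict, List


class HotSpareCluster:
    """Hypothetical N+1 model: workloads run on n-1 nodes while a spare sits idle."""

    def __init__(self, active: Dict[str, List[str]], spare: str) -> None:
        self.active = {node: list(groups) for node, groups in active.items()}
        self.spares: List[str] = [spare]

    def node_failed(self, node: str) -> None:
        """Move everything the failed node hosted onto an idle spare."""
        if node in self.spares:
            self.spares.remove(node)        # an idle spare failed; nothing to move
            return
        if not self.spares:
            raise RuntimeError("no idle spare; the surviving nodes must absorb the load")
        new_home = self.spares.pop(0)
        self.active[new_home] = self.active.pop(node)

    def node_repaired(self, node: str) -> None:
        """No failback: the repaired node becomes the new hot spare."""
        self.spares.append(node)
```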

Also consider a load-shedding setup. In load shedding, you designate a certain set of resources or applications as "critical," and those automatically fail over when one of your cluster nodes breaks. You also designate another set of applications and resources as "non-critical"; these do not fail over. This type of setup helps prevent cascading failures when load is migrated between cluster nodes, because allowing non-critical applications and resources to simply fail sheds some of the processing demand. Once the nonfunctional cluster node has been repaired, you can bring the non-critical applications and resources back up, and the situation returns to normal. This setup is shown in Figure 3.

Figure 3. A sample load-shedding setup
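The load-shedding decision itself is just a partition of the failed node's workloads by criticality. The Python sketch below assumes a hypothetical `criticality` mapping from resource group name to a critical/non-critical flag; any group missing from the mapping is treated as non-critical.

```python
from typing import Dict, List, Tuple


def shed_load(criticality: Dict[str, bool],
              failed_node_groups: List[str]) -> Tuple[List[str], List[str]]:
    """Split a failed node's groups into (fail_over, left_offline).

    Groups marked critical fail over; everything else, including groups that
    are missing from `criticality`, is left offline until the node is repaired.
    """
    fail_over = [g for g in failed_node_groups if criticality.get(g, False)]
    left_offline = [g for g in failed_node_groups if not criticality.get(g, False)]
    return fail_over, left_offline


if __name__ == "__main__":
    criticality = {"Payroll DB": True, "Mail": True, "Intranet": False}
    moved, shed = shed_load(criticality, ["Payroll DB", "Intranet", "Print"])
    print(moved)  # ['Payroll DB']
    print(shed)   # ['Intranet', 'Print']
```

Defaulting unknown groups to non-critical is a deliberate choice in this sketch: it errs toward shedding load, which is the point of the technique, at the cost of requiring every important group to be listed explicitly.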