1. INTRODUCTION
Storage systems have been confounding
database administrators and designers since Microsoft first released
SQL Server. Today DBAs are not only required to design and maintain SQL
Server, but are also often pressed into service as storage
administrators. For DBAs working in the enterprise, communication with
the server, networking, and especially storage teams is always a challenge.
By examining various storage hardware
components, you will learn how best to protect your data with RAID
technology. You will also see how storage area networks assist in data
protection. Finally, you will learn how to validate your functional
configuration and performance.
2. SQL SERVER I/O
Let’s begin by investigating how SQL
Server generates I/O. We are concerned with reading existing data and
writing new data. At its most basic SQL Server is made up of a few
files that reside within the server file system. As a rule, different
computer system components perform at different rates: it is always faster to process items in the CPU than to serve requests from processor cache, and main memory is in turn far faster than the underlying I/O subsystem.
SQL Server attempts to mitigate the relatively slow I/O system by caching whatever it can in system memory. Through write-ahead logging (WAL), newly received data is first written to the SQL Server transaction log. The data is then written to buffer pages hosted in server memory. This process ensures that the database can be recovered in the event of a failure.
Contrast these write operations with read
requests that are generated by SQL Server worker threads. The workers
initiate I/O read operations using the SQL Server asynchronous I/O
engine. Because the operation is asynchronous, worker threads can perform other tasks while the read request completes. The
asynchronous I/O engine depends on Windows and the underlying storage
systems to successfully read and write data to permanent storage.
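As a rough illustration of that pattern, the following C sketch issues an overlapped read through the Win32 API and only waits for it at the last moment. The file name data.mdf and the 8KB read size are placeholder values; real SQL Server code paths are considerably more involved.

/* A minimal sketch of an overlapped (asynchronous) read using the Win32 API. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* FILE_FLAG_OVERLAPPED asks Windows to perform the I/O asynchronously. */
    HANDLE h = CreateFileA("data.mdf", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    BYTE buffer[8192];             /* one 8KB page */
    OVERLAPPED ov = {0};
    ov.Offset = 0;                 /* byte offset within the file */
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    /* Issue the read; ERROR_IO_PENDING means "in flight", not "failed". */
    if (!ReadFile(h, buffer, sizeof(buffer), NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    /* ...the calling thread is free to do other work here... */

    DWORD bytesRead = 0;
    GetOverlappedResult(h, &ov, &bytesRead, TRUE);   /* TRUE = wait for completion */
    printf("read %lu bytes asynchronously\n", (unsigned long)bytesRead);

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}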
SQL Server takes advantage of the WriteFileGather
and ReadFileScatter Win32 APIs. WriteFileGather collects data from
discontinuous buffers and writes this data to disk. ReadFileScatter
reads data from a file and disperses data into multiple discontinuous
buffers. These scatter/gather APIs allow the bundling of potential I/O operations, thus reducing the actual number of physical read and write operations.
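The sketch below shows what a scatter read can look like at the Win32 level: eight separately allocated, page-aligned buffers filled by a single ReadFileScatter call. The file name and buffer count are arbitrary, error handling is trimmed, and this illustrates the API rather than SQL Server's own code.

/* A minimal sketch of a scatter read. ReadFileScatter requires a handle
 * opened with FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED, and each
 * target buffer must be one system memory page, page aligned. */
#include <windows.h>
#include <stdio.h>

#define PAGES 8

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    DWORD pageSize = si.dwPageSize;

    HANDLE h = CreateFileA("data.mdf", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* Allocate eight separate (discontinuous) page-aligned buffers and
     * describe them in a segment array; the final element stays zeroed. */
    BYTE *pages[PAGES];
    FILE_SEGMENT_ELEMENT segs[PAGES + 1] = {0};
    for (int i = 0; i < PAGES; i++) {
        pages[i] = VirtualAlloc(NULL, pageSize, MEM_RESERVE | MEM_COMMIT,
                                PAGE_READWRITE);
        segs[i].Buffer = PtrToPtr64(pages[i]);
    }

    OVERLAPPED ov = {0};
    ov.Offset = 0;                        /* sector-aligned file offset */
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    /* One call disperses PAGES * pageSize bytes into the separate buffers. */
    if (!ReadFileScatter(h, segs, PAGES * pageSize, NULL, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    DWORD bytesRead = 0;
    GetOverlappedResult(h, &ov, &bytesRead, TRUE);   /* wait for completion */
    printf("scatter read completed: %lu bytes into %d buffers\n",
           (unsigned long)bytesRead, PAGES);

    for (int i = 0; i < PAGES; i++)
        VirtualFree(pages[i], 0, MEM_RELEASE);
    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}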
3. STORAGE TECHNOLOGY
The Host Bus Adapter (HBA) handles
connections from the server to storage devices and can also perform
several other roles. While a basic HBA provides connectivity to
storage, more advanced HBAs have embedded array controllers. When the
storage is located within or attached to the server, it is called
Direct Attached Storage (DAS). A storage device managed by a dedicated
external array controller is called Storage Area Network (SAN) attached
storage. Figure 1 shows the basic building blocks of a storage subsystem.
Storage devices connected to a storage network
that are not logically grouped are called, rather inelegantly, a JBOD,
for “just a bunch of disks (or drives).” Figure 2
shows an example of a JBOD. JBODs can be accessed directly by SQL
Server as individual physical disk drives. Just remember that JBODs do
not offer any protection against failure.
Storage array controllers group disks into
volumes called a redundant array of inexpensive disks (RAID).
Not every RAID-constructed volume protects against failure, however; some simply aggregate capacity. The simplest type of unprotected RAID set is often called disk striping, or RAID 0.
To understand a RAID 0 set, imagine a series of
four disk drives lined up in a row. Data written to a stripe set will
fill the first drive with a small amount of data. Each subsequent drive
will then be filled with the same amount of data, at which point the
process is repeated starting with the first disk drive. Figure 3
shows how data looks after it has been written to a RAID 0 disk
subsystem. Each data stripe is made up of uniformly sized chunks, and most RAID systems allow the user to modify the size of this stripe.
Concatenated disk arrays are similar to striped datasets, differing in the method used to load data. You can think of a concatenated dataset as a group of disk drives that are filled in series: the first drive (or drive group) is filled, then the second, and so on.
Figure 4
shows the contrast between striped RAID, which is serpentine in its
layout, and the waterfall pattern of a concatenated disk array.
Concatenated systems don’t necessarily lack data protection. Many
storage arrays layer different types of RAID. One example is a system
that combines mirrored physical disks into a concatenated RAID set.
This combined system offers the benefits of protected data and the ease
of adding more capacity on demand since each new concatenated mirror
will be appended to the end of the overall RAID set.
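To make the difference concrete, this small C sketch maps a logical chunk number onto a drive and offset under both layouts just described. The drive count and per-drive capacity are made-up illustration values.

/* A minimal sketch contrasting striped and concatenated layouts. */
#include <stdio.h>

#define DRIVES            4
#define CHUNKS_PER_DRIVE  1000   /* capacity of each drive, in chunks */

int main(void)
{
    long chunk = 2500;           /* an arbitrary logical chunk number */

    /* RAID 0 striping: walk across the drives, then wrap to the next row. */
    int  stripe_drive  = (int)(chunk % DRIVES);
    long stripe_offset = chunk / DRIVES;

    /* Concatenation: fill drive 0 end to end, then drive 1, and so on. */
    int  concat_drive  = (int)(chunk / CHUNKS_PER_DRIVE);
    long concat_offset = chunk % CHUNKS_PER_DRIVE;

    printf("striped layout:      drive %d, offset %ld chunks\n",
           stripe_drive, stripe_offset);
    printf("concatenated layout: drive %d, offset %ld chunks\n",
           concat_drive, concat_offset);
    return 0;
}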
RAID defines two ways to provide failure protection: disk mirroring and parity generation. RAID 1, often called disk mirroring, writes identical copies of the data to separate physical disks. If one disk
fails, the array controller will mirror data from the remaining good
disk onto a new replacement disk. Figure 5 details the frequent combination of mirroring and striping. This system is often called RAID 1+0 or simply RAID 10.
The storage array uses an exclusive OR (XOR) calculation to generate parity data. Parity enables the array to recreate missing data by combining the parity information with the data distributed across the remaining disks. Parity makes efficient use of your capacity, but at a cost in performance, as generating and updating the parity data requires additional reads, writes, and XOR calculations.
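A short C sketch makes the mechanism concrete: parity is simply the XOR of the data chunks in a stripe, and a missing chunk can be rebuilt by XORing the surviving chunks with the parity. The chunk size and contents here are arbitrary illustration values.

/* A minimal sketch of XOR parity: generate parity, then rebuild a lost chunk. */
#include <stdio.h>
#include <string.h>

#define CHUNK 8   /* tiny chunks for illustration; real stripes use KB-sized chunks */

static void xor_into(unsigned char *dst, const unsigned char *src)
{
    for (int i = 0; i < CHUNK; i++)
        dst[i] ^= src[i];
}

int main(void)
{
    unsigned char d0[CHUNK] = "SQLDATA";   /* three data chunks in one stripe */
    unsigned char d1[CHUNK] = "ROWSROW";
    unsigned char d2[CHUNK] = "PAGESPG";

    /* parity = d0 XOR d1 XOR d2 */
    unsigned char parity[CHUNK] = {0};
    xor_into(parity, d0);
    xor_into(parity, d1);
    xor_into(parity, d2);

    /* Simulate losing the disk that held d1, then rebuild it:
     * d1 = d0 XOR d2 XOR parity */
    unsigned char rebuilt[CHUNK] = {0};
    xor_into(rebuilt, d0);
    xor_into(rebuilt, d2);
    xor_into(rebuilt, parity);

    printf("rebuilt chunk: %s (matches original: %s)\n", (char *)rebuilt,
           memcmp(rebuilt, d1, CHUNK) == 0 ? "yes" : "no");
    return 0;
}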
Many different parity RAID configurations have
been defined. The two most common types are disk striping with parity
(RAID 5) and disk striping with double parity (RAID 6). Examples of
both are shown in Figure 6 and Figure 7.
RAID 5 protects a system against a single disk drive failure. RAID 6
protects against a double disk failure. RAID 5 and 6 offer disk failure
protection while minimizing the amount of capacity dedicated to
protection. Contrast this with RAID 1, which consumes half of the
available storage in order to protect the data set.
To create the parity information, the RAID engine reads data from the data disks and computes the parity with the XOR calculation. The parity chunk is then written to one of the drives in the stripe, and its location shifts to a different drive with each subsequent stripe. Rotating the parity in this way spreads the parity workload across the set, and the parity itself ensures that no single drive failure causes catastrophic data loss.
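The rotation itself can be pictured in a few lines of C. This uses one simple rotation scheme for a four-disk set; real arrays implement several different parity layouts (left/right, symmetric/asymmetric).

/* A minimal sketch of rotating parity placement on a four-disk RAID 5 set. */
#include <stdio.h>

#define DISKS 4   /* three data chunks plus one parity chunk per stripe */

int main(void)
{
    for (int stripe = 0; stripe < 8; stripe++) {
        /* shift the parity chunk to a different disk on each stripe */
        int parity_disk = (DISKS - 1) - (stripe % DISKS);
        printf("stripe %d: parity on disk %d, data on the remaining %d disks\n",
               stripe, parity_disk, DISKS - 1);
    }
    return 0;
}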
RAID 6 generates two parity chunks per stripe and places each on a different physical disk. This double-parity system protects against a double disk drive fault. As disk drives become larger and larger, there is a significant chance that a second failure will occur before the first failed drive can be rebuilt.
RAID 5 and RAID 6 become more space efficient on larger sets of disk drives. A RAID 5 disk set using seven data drives and one parity drive dedicates only one-eighth (12.5 percent) of its raw capacity to protection, whereas a set using three data drives and one parity drive dedicates one-quarter (25 percent).
Each of these RAID sets represents a failure
domain. That is to say, failures within the domain affect the entire
dataset hosted by a given failure domain. Large failure domains can
also incur a performance penalty when calculating the parity bits. In a
four-disk RAID 5 set, only three data drives are accessed for parity
calculation. Given an eight-disk RAID set, seven drives are accessed.
You can combine RAID types into the same volume.
Striping or concatenating several RAID 5 disk sets enables the use of
smaller failure domains while increasing the potential size of a given
volume. A striped, mirrored volume is called RAID 1+0 (or simply RAID
10). This RAID construct can perform extremely well at the cost of
available capacity.
Many storage controllers monitor how RAID sets are accessed. In a RAID 10 dataset, for example, read requests sent to a given mirrored drive pair are serviced by whichever drive has the least pending work. This work-based access enables RAID sets to perform reads more rapidly than writes.
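A minimal C sketch of that scheduling decision follows; the queue-depth counters are hypothetical stand-ins for what a real controller tracks in firmware.

/* A minimal sketch of least-pending-work read scheduling on a mirrored pair. */
#include <stdio.h>

typedef struct {
    const char *name;
    int pending_ios;   /* requests currently outstanding against this drive */
} mirror_side_t;

/* Pick whichever copy of the mirror currently has the shorter queue. */
static mirror_side_t *pick_read_target(mirror_side_t *a, mirror_side_t *b)
{
    return (a->pending_ios <= b->pending_ios) ? a : b;
}

int main(void)
{
    mirror_side_t left  = { "disk A", 4 };
    mirror_side_t right = { "disk B", 1 };

    mirror_side_t *target = pick_read_target(&left, &right);
    target->pending_ios++;                      /* queue the read on that drive */
    printf("read dispatched to %s\n", target->name);
    return 0;
}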