In my experience, perhaps 85% of SAP performance problems turn out to be storage-related. This is not to say that 85% of all SAP problems can be traced back to the storage system—but by the time I am called in, the easy fixes have typically been taken care of. So I rarely run into simple profile parameter problems, issues with other hardware subsystems, or problems with the network. Instead, after a quick review of the entire solution stack, I usually find myself drilling down into the following:
- The model and features of the disk subsystem servicing the systems that exhibit the slow performance
- Details as to how the disk subsystem has been configured
- Database statistics, to quantify how well the database itself performs on the given disk subsystem platform
- Specific transaction loads on the system (that is, batch processes or especially heavy online user transactions), especially the top 40 or so transactions identified by transaction ST03 as consuming the most database request time (see the short sketch following this list)
- Finally, the ABAP programs (or other code) that are actually being executed by these top 40 transactions
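For the ST03 piece of that list, a simple ranking is usually where I start. Below is a minimal sketch of that idea in Python, assuming the workload statistics have been exported to a flat CSV file; the file name and the column headings are my own assumptions, not anything ST03 itself produces, so adjust them to match whatever your export actually contains.

```python
# Hypothetical example: rank transactions from an exported ST03 workload
# file by total database request time. The file name and the column
# headings ("Transaction", "DB time (ms)") are assumptions -- change them
# to match the layout of your own export.
import csv
from collections import defaultdict

def top_db_time(path, n=40):
    totals = defaultdict(float)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[row["Transaction"]] += float(row["DB time (ms)"])
    # highest cumulative database request time first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

if __name__ == "__main__":
    for tcode, db_ms in top_db_time("st03_workload_export.csv"):
        print(f"{tcode:12s} {db_ms:12.0f} ms of database request time")
```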
Although a variety of disk subsystems are deployed today in support of SAP solutions, including direct-attached SCSI, Fibre Channel Arbitrated Loop-based storage, and a few NAS (Network Attached Storage) systems, the discussions that follow focus on the predominant disk subsystem being deployed today—Storage Area Networks, or SANs.
Special Considerations for Storage Area Networks
Deploying Storage Area Networks, or SANs, for current and new SAP system landscapes has kept many SAP professionals quite busy for the last two years. The latest iterations are not only some of the fastest disk subsystems ever manufactured, but also offer everything that customers need in a highly available and scalable disk solution. A SAN not only gives us the ability to expand quickly, but also allows us to move data around with ease. Snapshots, disk clones, data replication, and more satisfy high-availability requirements, and help us to address disaster recovery, too.
With these new capabilities come new complexity, of course, and certainly new paradigms. In some cases, we
also have a new set of issues that must be addressed when deploying the
latest in SAN technology—switched fabric designs, planning for different
data access models within the same SAN, providing connectivity to new shared
resources like tape drives and tape libraries, and all of the
complexity that comes with deploying new software solution sets that
interoperate seamlessly (or nearly so!) within the SAN environment.
Moving on, let’s sneak a quick look at some basic best practices for
implementing SAP in a SAN environment, with the understanding that the
newer “virtual” SANs will be covered in detail later.
Deploying SAP on a SAN—General Best Practices and Observations
To approach this in an organized manner, we will
start with some general SAN observations, and then cover the servers
that play a role in the SAN, move to the SAN infrastructure, and finally
wrap up with the disk subsystem itself. The following list has been
assembled as a result of more than a hundred SAP-on-SAN design or
deployment engagements:
- As the SAN is a special network encompassing all components from the HBA to the disk, it is incorrect to call each cabinet of controllers and disks a SAN. Rather, the entire connected solution is the SAN. If the development environment is off by itself and is not connected in any way to production, for example, we have in place a Development SAN and a Production SAN. Once these SANs are connected (that is, via a fibre cable connecting one SAN's switch to the other SAN's switch), they become a single SAN.
- Each server needs at least a single Host Bus Adapter, or HBA. Because a single HBA represents a glaring single point of failure, it is always recommended to install and configure two where high availability is important. Both HBAs are cabled to the same SAN, but to redundant switches (discussed later).
- A server should not connect to two different SANs; connecting to multiple SANs concurrently is not supported. In other words, a single server with two HBAs should not be connected to two different models or implementations of a SAN.
- If a server contains two HBAs, it now has two paths by which to access data on the SAN. To manage this condition, an OS or OS-supported software utility is typically employed on each server connected to the SAN. Hewlett-Packard's line of StorageWorks SANs uses a utility called SecurePath, for example. SecurePath requires a license for each server, but ships with all StorageWorks cluster kits. SecurePath can also help ensure that the HBAs and data access paths are balanced from a performance perspective.
- Each HBA requires what used to be called a GBIC, or gigabit interface converter, as might each port in your fibre switches. Today, GBICs are more commonly called transceivers.
- Fibre cables then run from the GBIC in each HBA to a GBIC (or otherwise pre-enabled port) in each of the redundant fibre switches.
- With regard to fibre switches, different switch vendors may have different rules for cascading their switches (to increase the port count available to the SAN, for example). I highly recommend referencing the specific vendor's documentation for details, with the understanding that many SAN fabrics in use today are limited to very few levels of cascading, or hop counts.
- Fabric switches should be configured with dual power supplies, and a minimum of two switches is required to address high availability. A fully redundant SAN/server configuration consists of four fibre connections—two to the server, and two to the disk subsystem. These fibre connections should be carefully mapped to the switches so that each controller is connected to two switches, not one.
- Either "zoning" or "selective storage presentation" is used to ensure that a specific server can access only a specific set of disk drives or virtual drives. The implementation varies with the vendor as well as the product set. Regardless, be sure to implement one of these access controls; failure to do so risks corrupting data.
- Now that we have established connectivity from each server to the SAN, we need to carve up disk drives on the SAN. Until November of 2001, some of the highest-performing SANs on the market leveraged up to six separate SCSI controllers across six different SCSI busses, or drive shelves. Each drive shelf was capable of housing 14 drives, for 84 disk drives total. It is best to view the disks vertically, in that we stripe data "up and down" the disk subsystem. A maximum of six drives (when six shelves are deployed) can be configured per logical drive in this manner. By addressing our SAN disk configuration in this way, we mitigate the risk inherent in a bus, shelf, or shelf-power failure, helping to ensure that no data is lost should one of these failures occur.
- Whenever possible, I promote the idea of using a graphical user interface for day-to-day SAN management, and a scripted command-line approach for actually configuring a SAN, to allow for rapid cookie-cutter standard SAN installations (a simple sketch of such a scripted approach follows this list).
- For SAP we need several sets of disks, depending on which database vendor has been selected. All data and log files (including redo logs, archive logs, SQL transaction logs, TempDB, and so on) must reside on the SAN. Further, as long as we are not using a clustering technology that requires local access to database executables, the Oracle or SQL Server (or Informix, DB2, SAPDB, and so on) executables can be located on the SAN as well.
- If you are clustering, you need to add a quorum drive or similar such drive as well (actually, a mirrored disk pair distributed across two different busses is recommended). The quorum must be located on the SAN. Also, clustering SQL Server or Oracle means that these database executables must be installed on each node in the cluster, not out on the SAN.
- For the production database, I nearly always recommend that the production datafiles reside on LUNs (drive partitions) set up for hardware-based mirroring and striping (RAID 1+0 or 0+1, depending on the storage vendor's implementation). For simplicity, I like to create LUNs where the physical drives are vertically situated one on top of the other. SAP BW represents a potential exception to this general configuration rule of thumb for production systems. In the fast disk subsystems available today, the trade-off between performance and usable disk space across the different RAID levels is not as significant as it has been in the past.
- If all of the preceding items are in place, you are well prepared to address database growth. As the database grows, you can add another RAID 1+0 set, for example. The OS will immediately recognize this additional disk space (if not, go to Disk Administrator and execute the option to "scan for new devices") and allow you to format the space and create a new drive letter. Then, to increase your effective database size, the database administrator simply needs to create more tablespaces or datafiles on the new disk partitions (see the second sketch following this list).
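On the scripted command-line point above, the value lies less in any particular vendor syntax than in being able to replay exactly the same configuration for every cookie-cutter installation. The sketch below illustrates only the idea: the host names, LUN sizes, and especially the command strings are placeholders I made up, not a real storage vendor's CLI, so substitute the syntax your vendor's utility actually expects.

```python
# A minimal sketch of the "scripted command-line" idea: generate the same
# sequence of configuration commands for every cookie-cutter SAN install.
# The command strings below are placeholders, not a real vendor CLI.
HOSTS = ["sapdb1", "sapdb2"]
LUNS = [
    {"name": "sapdata1", "size_gb": 100, "raid": "1+0"},
    {"name": "saplog1", "size_gb": 20, "raid": "1+0"},
]

def build_commands(hosts, luns):
    commands = []
    for host in hosts:
        commands.append(f"add host {host}")          # register each server/HBA pair
    for lun in luns:
        commands.append(
            f"create lun {lun['name']} size={lun['size_gb']}GB raid={lun['raid']}"
        )
        for host in hosts:
            # selective storage presentation: only these hosts see this LUN
            commands.append(f"present lun {lun['name']} to {host}")
    return commands

if __name__ == "__main__":
    for cmd in build_commands(HOSTS, LUNS):
        print(cmd)   # review the script, then feed it to the vendor's CLI
```

Keeping the plan in a small data structure like this also leaves you with a reviewable record of exactly which LUNs were created and presented to which servers.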
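And on the database-growth point: once the new LUN has been formatted and given a drive letter, the DBA's work is largely mechanical. Here is a second small sketch, again only a sketch, that prints the Oracle ALTER TABLESPACE statements for a new batch of datafiles; the tablespace names, drive letter, path layout, and file sizes are illustrative assumptions rather than anything prescribed by SAP or Oracle.

```python
# Print the Oracle statements a DBA might run after a new RAID 1+0 LUN has
# been presented and formatted. Tablespace names, the drive letter, and
# file sizes are illustrative assumptions -- use your own SAPDATA layout.
NEW_DRIVE = "H:"                       # the freshly presented LUN
TABLESPACES = {
    "PSAPSTABD": 4,                    # tablespace -> number of datafiles to add
    "PSAPBTABD": 8,
}

def datafile_statements(drive, tablespaces, size_mb=2000):
    statements = []
    for ts, count in tablespaces.items():
        for i in range(1, count + 1):
            path = f"{drive}\\oracle\\PRD\\sapdata_new\\{ts.lower()}_{i}.data"
            statements.append(
                f"ALTER TABLESPACE {ts} ADD DATAFILE '{path}' SIZE {size_mb}M;"
            )
    return statements

if __name__ == "__main__":
    for stmt in datafile_statements(NEW_DRIVE, TABLESPACES):
        print(stmt)                    # review, then execute via SQL*Plus
```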
A good example of what a traditional SAN deployment might look like from a disk-layout perspective can be seen in Figure 1.
With your new understanding of how to take advantage of the features and capabilities a classic SAN environment can provide an SAP landscape, let's move ahead and discuss the latest in SAN technology—the virtual SAN.
The Latest in SAN Technology—Leveraging Storage Virtualization
In review, Storage Virtualization, or SV, is defined as "the transparent abstraction of storage at the block level." The idea is to minimize the need for parameters that allow fine-tuning, low-level optimization, and "tweaking," in favor of allowing the intelligent virtual disk subsystem to take care of everything. It's a paradigm shift in the truest sense of the term—hands-off!
The ultimate in high-performance SV equates to
striping and then mirroring all disk partitions across all 84–168 or
more disk drives in a virtual storage disk subsystem. The ultimate in
storage capacity equates to creating a RAID 5 partition across all of
these drives. In either case, multiple partitions, or drives, can be
created and presented to the operating system, or a single enormous
drive can be created and presented instead. And many storage
virtualization technologies automatically create optimally sized and
configured RAID drives based on the space available in conjunction with
historical data. It’s that simple. The various block size parameters,
read/write caching algorithms, read-ahead logic, and so on are all under
the covers, so to speak. The best SV disk subsystems allow the few
remaining configuration choices to be made via a browser-based graphical
user interface, too.
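To put a rough number on the capacity side of that trade-off, the arithmetic below compares usable space for striped-and-mirrored storage against RAID 5 across all drives. The drive count and drive size are assumptions, and real arrays reserve additional space for sparing and metadata (and typically split RAID 5 into smaller parity groups rather than one giant set), so treat this as illustration only.

```python
# Illustrative arithmetic only: usable capacity when all drives in a
# virtual array are configured as RAID 1+0 versus a single RAID 5 set.
# Drive count and size are assumptions; sparing and metadata are ignored.
def usable_capacity_gb(drives, drive_gb, raid):
    if raid == "1+0":
        return drives * drive_gb / 2      # half the raw space holds mirror copies
    if raid == "5":
        return (drives - 1) * drive_gb    # roughly one drive's worth of parity
    raise ValueError(f"unsupported RAID level: {raid}")

for raid in ("1+0", "5"):
    print(raid, usable_capacity_gb(drives=168, drive_gb=72, raid=raid), "GB usable")
```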
Virtual storage eliminates the physical
one-to-one relationship between servers and storage devices, or
databases and physical drives. Some important, albeit limited,
configuration options still exist, however, and it is these options that
concern us next.
Storage Virtualization Best Practices
When it comes to implementing and configuring a Virtual Array, you have control over the following options:
- As with a traditional SAN, you can still design and implement high-availability SAN switch architectures, including the ability to create zones for effectively segregating specific servers and their storage across multiple storage systems.
- You can create one or more groups within a physical Virtual Array storage system. Groups represent another way to segregate storage and servers, though this capability is limited to disks that are physically attached to the same disk controller subsystem (normally in one or more "cabinets" or "disk enclosures"). Thus, in an 84-drive Virtual Array, you might choose to create two groups of 42 physical drives each, three groups of varying sizes, or just a single group containing all 84 drives.
- You can manually create LUNs, or disk partitions, and select the group (and therefore the disk drives) over which each LUN will be created. You can create many LUNs, or one LUN, whatever is optimal for the need at hand. For example, you might create three 100GB LUNs in a single group consisting of 42 drives, and place your SAPDATA files on these.
- Upon creating the LUNs, you can typically specify what RAID level each LUN is to use, and whether caching should be enabled or disabled (although in some virtualization product sets, this is simply not possible). In this way, if required of your solution, you can mix high-density RAID 5 LUNs (for database disk dumps, for example) with high-performance RAID 1+0 LUNs (for database data files and logs).
Some of the preceding points defy the preaching
of many a database administrator or basis consultant. I can hear them
now. “What, one giant virtual disk chopped up into pieces? You can’t do
that, we’re supposed to keep our logs separate from our data! What
happened to best practices?” Let them know that best practices differ
now, based on the storage solution employed. And recommend that they
read the PDF files found on the Planning CD that relate to configuring
and optimizing, for example, HP’s Enterprise Virtual Array Storage
System.
Finally, let the facts speak for themselves, as presented in Figure 2—the Virtual Array easily handles a variety of workloads, while other traditional high-end storage systems struggle.