Physical requirements mean different things to different people. In the context of building your data center, I could go into minute detail regarding the actual construction of the facility, for example. In that case, physical requirements would exist with regard to the following:
- Physical construction materials and related factors (fire codes, weight-bearing members for roof-mounted AC/environmental units, load ratings for the raised-floor construction, cable risers or trays, and so on).
- Location (avoid the first floor or top floor of a building, as well as flood zones and sites where access is limited to a single stairwell, elevator, or road).
- Environmental systems (cooling, heating, humidity levels), including access to published thermal specifications for each component to be housed in the data center.
- Accessibility (loading dock/freight elevator access, as well as double-protected public access points).
- Physical security/access, including monitoring systems (card or other access systems for doors and windows, and attention to vertical security above the dropped ceiling and below the raised floor).
- Controlling systems (temperature/fire suppression, smoke and water sensors, and so on).
- Lighting and plumbing, as required.
- A central operations/monitoring station.
- Access to high-bandwidth, multipath data communications circuits or network/Internet connections.
- General dual power infrastructure, in terms of 208 volts AC versus other options, including the availability of generators, battery backup, and so on. This can also include access to two discrete city or state power grids, should high-availability or disaster recovery requirements dictate such a robust power infrastructure.
These details are best left to the experts who design and build data center facilities. Ensure, however, that the preceding points are addressed at a minimum.
Power Requirements
Power problems can plague an otherwise bulletproof solution architecture; power therefore needs to be planned for the long term, not merely for the demands of Go-Live, because your SAP environment will grow, grow, grow. When addressing the power needs of the SAP data center, it is helpful to first analyze each specific server, disk subsystem, network, or other hardware component requiring power, and then work “back” to the ultimate power source. For maximum availability, ensure the following:
- Where the highest levels of availability are necessary, each hardware component must support redundant power supplies (otherwise, the remainder of this list is of no use). Preferably, these power supplies should also be “hot swappable” or “hot pluggable.” In this way, in the event of a power supply failure, not only would the server remain available and powered up on its second power supply, but the failed power supply could be pulled out and replaced without incurring downtime.
- Each power supply alone should be capable of keeping the hardware component up and running. That is, if the average load being pulled from one of the power supplies in a dual-power-supply configuration exceeds 50% of its rated capacity, you actually don’t have protection from a failure of the other power supply! The alternatives are clear: lower the capacity requirements by reducing the number of disk drives, CPUs, and so on; increase the number of power supplies to three or more; or, in rare cases, simply replace the current power supplies with higher-rated alternatives. (A quick way to sanity-check this rule is sketched in the code following this list.)
- Each power supply must have its own unique power cable. This is a very common oversight with some of the second-tier server and disk subsystem vendors, where highly available systems might be touted but reality differs. These high-availability wannabes often provide only a single power cable receptacle, even in their “redundant” power supply configurations. A single anything represents a single point of failure and should therefore be avoided. I have actually seen a couple of power cables fail in the real world; as silly or unlikely as that sounds, it happens. Even more likely is the potential to pull out a single power cable, effectively bringing down the most available server or disk subsystem.
- As I indicated previously, color-coding or otherwise differentiating power cables makes it very clear to everyone when things are cabled correctly. The most common implementation involves a black cable connected to the primary power supply and a gray or white cable connected to the redundant power supply.
- Each power cable must be routed to a dedicated, separate power distribution unit (PDU, power strip, or whatever is used by the company to centralize many power feeds into fewer larger-capacity connections). Each PDU needs to be analyzed to ensure that the load placed on this single component, should the other PDU fail, can still be carried by the remaining PDU (this failover check also appears in the sketch following this list). And as I said earlier, the most common implementation of this is black cables to one PDU, and gray or white cables to the redundant PDU.
- Each PDU must in turn be cabled to redundant uninterruptible power supplies, or UPSes. Like the PDUs, these need to be regularly tested and analyzed to ensure that they are indeed “redundant.” Note that a UPS tends to handle only short-term power losses, thus necessitating our next power component.
- Primary power for each redundant power supply should culminate in redundancy at the breaker boxes as well. That is, each power supply should ultimately receive its power from a dedicated breaker panel, like that illustrated in Figure 1.
- The “back-up generator” is a necessity for mission-critical SAP shops. Whereas the UPS provides short-term relief from blackouts and brownouts, a generator can conceivably provide power for days, as long as fuel is available. Select a generator that runs on whatever fuel is most readily available, whether diesel, propane, or natural gas.
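The two redundancy checks called out in this list, the 50% rule for dual power supplies and the PDU failover analysis, boil down to simple arithmetic. The following Python sketch shows both; the device names, wattages, and ratings are hypothetical placeholders, not figures from any real system:

# Minimal redundancy sanity checks; all names and wattages below are
# hypothetical examples, not measurements from a real system.

def supply_is_redundant(total_draw_watts, supply_rating_watts, num_supplies=2):
    """True if the device stays powered after losing one supply.

    The surviving (num_supplies - 1) supplies must carry the device's
    entire draw; for a dual-supply device this is the 50% rule above.
    """
    surviving_capacity = (num_supplies - 1) * supply_rating_watts
    return total_draw_watts <= surviving_capacity

def pdu_failover_ok(device_draws_watts, pdu_rating_watts):
    """True if one surviving PDU can carry every device in the rack."""
    return sum(device_draws_watts) <= pdu_rating_watts

# Hypothetical rack: two servers and a drive shelf, dual 1500 W supplies each.
devices = {"db_server": 1400.0, "app_server": 1400.0, "drive_shelf": 600.0}

for name, draw in devices.items():
    ok = supply_is_redundant(draw, supply_rating_watts=1500.0)
    print(f"{name}: {'protected' if ok else 'NOT protected'} against supply failure")

ok = pdu_failover_ok(list(devices.values()), pdu_rating_watts=5000.0)
print(f"rack: {'survives' if ok else 'does NOT survive'} the loss of one PDU")

Run the same checks any time drives or CPUs are added, because yesterday’s comfortable headroom has a way of quietly disappearing.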
It is of utmost importance that the generator and the UPS be properly sized to handle the loads placed upon them. Generators must leverage an Automatic Transfer Switch (ATS) to tie into both the power company (the power utility, or “utility power”) and the SAP data center, as you see in Figure 2. Critical systems need to be identified and earmarked for generator backup. These systems typically include emergency lighting, emergency environmental and safety controls, the critical SAP data center gear, and in some cases the contents of the entire data center.
Best practices also dictate that critical computing systems be isolated from main facility/operational power, but that each source of power “back up” the other. In this way, operational failures do not impact the enterprise system, and vice versa, and both are protected from failure by two power sources.
UPSes are rated by KVA, or kilovolt-amps. The formula to calculate KVA is amps × volts / 1000 = KVA. So if your rack is capable of pulling 69.5 amps at 240 volts, that works out to 69.5 × 240 = 16,680 volt-amps, or 16.68 KVA. Never allow your UPS to run above 80% capacity. Dividing 16.68 KVA by 0.8 yields 20.85 KVA, so at a minimum the rack should have 21 KVA worth of UPS.
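Expressed as a small Python helper (a sketch of the arithmetic above, nothing more):

def required_ups_kva(amps, volts, max_load_fraction=0.8):
    """UPS rating needed so the given draw stays at or below 80% of capacity."""
    kva = amps * volts / 1000.0      # rack draw in KVA
    return kva / max_load_fraction   # minimum UPS rating

# The rack from the example: 69.5 amps at 240 volts.
print(required_ups_kva(69.5, 240))   # 16.68 / 0.8 = 20.85, so buy ~21 KVA of UPS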
Power Oversights in the Real World
One of my favorite enterprise SAP servers, the HP ProLiant, serves as my next example. A wealth of information is available on the ProLiant in the form of something HP calls “quickspecs.” These technical specifications have been published and updated for years, and describe in great detail much of the minutiae of little interest to anyone but hardcore techies once the SAP system landscape is in place. Prior to that time, though, these quickspecs fulfill a number of critical roles.
First, quickly perusing this document reveals that the ProLiant DL760 8-CPU server will draw a maximum of 10 amps on a 208- to 240-volt line while producing a moderate 5309 BTUs every hour. These little tidbits of information will help ensure that the data center facilities folks understand how many and what type of power circuits to run. The BTUs, on the other hand, should be fed into a simple model that will determine the minimum rating of the air handlers (cooling/heating system).
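As a rough illustration of that kind of model, the sketch below sizes 30-amp circuits and totals heat output for a hypothetical rack of DL760s. The 80% continuous-load derating is a common electrical-code practice rather than anything from the quickspecs, so treat it, and the server count, as assumptions to verify with your facilities engineers:

import math

MAX_AMPS = 10.0       # max draw per DL760, per the quickspecs
BTU_PER_HOUR = 5309   # heat output per DL760, per the quickspecs

def circuits_needed(num_servers, circuit_amps=30.0, derating=0.8):
    usable_amps = circuit_amps * derating               # 24 A usable on a 30 A circuit
    servers_per_circuit = int(usable_amps // MAX_AMPS)  # 2 servers per circuit
    return math.ceil(num_servers / servers_per_circuit)

servers = 8
print(circuits_needed(servers))   # -> 4 thirty-amp circuits
print(servers * BTU_PER_HOUR)     # -> 42472 BTU/hr to feed the air-handler model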
Another bit of information is provided in the quickspecs as well: the power plug connector. As with most servers, the power cable from the server to the PDU or UPS is ordered as an option with the PDU or UPS. The real challenge then becomes matching the PDU’s power plug with the appropriate power receptacle. Typically, either an L6-20P or L6-30P is called for. In the case of certain PDUs deployed at one particular customer’s new SAP data center site, the L6-30P (a 30-amp plug) was specified by the quickspecs.
However, the customer got ahead of themselves and, in the interest of meeting deadlines for power, laid enough L6-30P power cabling for the new Storage Area Network, too, which was to arrive shortly after the servers. When the SAN cabinets showed up, though, they couldn’t plug them in. The connectors were different: the SAN’s plugs required L6-20R receptacles. To this day my colleagues and I are still not quite sure how our motivated customer managed this, but they actually forced their million-dollar SAN power cables into the wrong receptacles, and ran the system this way for perhaps a week before someone noticed that “something just didn’t look quite right” under the subfloor. Understand that these two connectors are physically quite different, and this little engineering feat will begin to sink in: they somehow managed to squeeze and twist the male plugs most of the way into the female receptacles. Not only did this pose a potential safety hazard, but they also risked blowing up their 20-amp SAN gear with 30 amps of juice.
They got lucky. The SAN had very few drives actually installed and spinning (and therefore drawing power) that first week. However, this simple oversight caused a one-week delay in the project plan while the system sat effectively shut down, awaiting the proper wiring. In the end, the lost time was made up during the operating system and SAP Basis installations, and neither gear nor people were any the worse. But the moral of my little story should be clear: not only should the technical specifications for each piece of equipment be checked and verified, but we should never solve our power problems by brute force. Besides, because all of this information is just a click or quickspec away, there’s really little excuse for misconfiguring or underallocating power and cooling resources.
Another common mistake illustrates how the redundancy of power-related components can be rendered useless through lack of attention to cabling and overall power architecture. My customer in this instance wanted to factor redundancy into the physical layer of their SAP deployment. Their high-end servers, disk subsystem, and network equipment all supported redundant power supplies, so they took advantage of them. Each power supply on the back left side of each server and disk subsystem drive shelf received a black power cable, which was carefully routed along the left side of the rack enclosures to a power distribution unit dedicated to this purpose. Similarly, each power supply located on the back right side was fitted with a gray power cable, and these gray cables were also carefully routed to their own PDU. So far, so good: no single points of failure existed, in that half of the power supplies, cables, and PDUs could fail and the system could still remain up and powered.
However, all of the careful preparation and planning that went into this phase of the project was tossed out the window after another few minutes of work. My customer plugged both PDUs into the same UPS, which was then cabled to two large redundant data center UPSes. Like the 1975 Chevrolet Corvette’s exhaust system, a dual-power approach to high availability is just “smoke and mirrors” after everything merges into a common pipe. The Corvette never realized its peak power potential that year, and my customer lost any hope of achieving 100% reliability, even though the solution “looked good” from many angles. By plugging the PDUs into the same UPS, they defeated the purpose of redundant power: if their single-point-of-failure UPS were to fail, the entire system would grind to a halt.
Like power, the next layer in the solution stack also represents a basic necessity for supporting your SAP enterprise—cooling.
Cooling and Other Environmental Controls
One of the biggest causes of hardware component failure is heat. Luckily, planning for cooling requirements has become a lot easier with the popularity of the World Wide Web: nearly every hardware vendor out there today publishes BTU/thermal specifications. Your job is then simply to pull down and “add up” these technical specifications for every piece of equipment you plan to deploy. Don’t forget to allow for future growth, either; with server and disk form factors shrinking every year, the heat generated per cubic foot of data center space just continues to grow and grow. To conservatively address the next three years in your data center planning efforts, determine the average BTU output per cubic foot, then double that number and apply it to any remaining floor space that could conceivably house incremental SAP gear. Also factor in that air might not move uniformly through your data center, along with any other cooling dynamics inherent to your facility. In doing so, you will be eminently ready for the day the VP of Operations says, “Hey! We’re gonna go ahead with that SAP PLM project, so make room for 20 new servers and a couple of SAN cabinets in the next few weeks.”
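A minimal sketch of that three-year rule of thumb follows; the BTU figure and room volumes are hypothetical placeholders for your own facility’s numbers:

def three_year_btu_estimate(current_btu_per_hr, occupied_cubic_feet, remaining_cubic_feet):
    """Current heat load plus a doubled average BTU/cubic-foot figure applied to free space."""
    avg_btu_per_cuft = current_btu_per_hr / occupied_cubic_feet
    future_btu = 2 * avg_btu_per_cuft * remaining_cubic_feet  # doubled for growth
    return current_btu_per_hr + future_btu

# e.g., 150,000 BTU/hr today across 4,000 cubic feet, with 2,000 cubic feet free:
print(three_year_btu_estimate(150_000, 4_000, 2_000))  # -> 300000.0 BTU/hr

A facilities engineer would still need to adjust the result for airflow patterns and hot spots, per the caveat above.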
Beyond cooling alone, air handlers exist that allow for controlling and exhausting heat, monitoring and fine-tuning humidity, and so on. Ensure that new hardware additions to the SAP data center are plugged into the BTU model as soon as possible, so that any new requirements for cooling can be addressed in time. Air handlers and other large environmental gear like this require significant lead times when new components or upgrades/replacements loom in the future.
Don’t forget to load the proper OS drivers or applets that may be required by your hardware to shut itself down in the event of overheating or loss of cooling. In the case of the ProLiant, the Compaq System Shutdown Service shuts down the server when the heat exceeds a predefined threshold, acting in response to commands from the integrated management features inherent to the ProLiant server platform.
And consider some of the newer trends in air handling and monitoring. For example, HP recently developed a robot that literally rolls around your data center floor looking for hot spots. Upon finding a hot spot, the robot analyzes the conditions and may, for example, signal your cooling system to increase airflow to the area. Or it may instead communicate with your hardware systems to relocate workloads from one system to another. Utilizing a combination of these approaches, HP believes that it can reduce cooling costs for its customers in the neighborhood of 25 percent.