Developing the Server Architecture
CPU, RAM, and disk I/O
are the three most important items when planning and configuring server
hardware. The size, or robustness, of the server provisioned for any
given role dictates how well it will handle the load. When discussing
expectations of the overall solution, some level of understanding needs
to be communicated and agreed on. ConfigMgr has many dependencies,
including business and user requirements in addition to the overall
infrastructure and network services requirements. This makes it
difficult to predict expectations for the overall solution. Because each
environment is different and has different requirements, there is no
“one size fits all” solution.
Database Servers
The site database
server is the most memory-intensive role in the ConfigMgr hierarchy. The
amount of memory used is configurable in SQL Server and limited to 3GB,
unless you are running SQL Server on an x64 platform and operating
system. If SQL Server will require more than 3GB, as in
instances when it is not dedicated to ConfigMgr, using a separate SQL
Server running on x64 becomes a compelling solution.
Several counters are listed in Table 4 that you will want to evaluate on your ConfigMgr database server.
Table 4. Site Database Server Counters to Be Monitored
| Object | Instance | Comments |
|---|---|---|
| Physical Disk | Avg. Disk Queue Length: Volume | Select one of these counters for each volume involved in Configuration Manager processes. This includes the operating system installation volume, the ConfigMgr installation (inbox) volume, and the SQL Server tempdb, site database, and log volumes. |
| Physical Disk | % Disk Time | Select one of these counters for each volume involved in Configuration Manager data processing. These include the operating system installation volume, the ConfigMgr installation (inbox) volume, and the SQL Server tempdb, site database, and log volumes. |
| SQLServer:General Statistics | Temp Tables Creation Rate | General SQL Server statistics. |
| SQLServer:General Statistics | Logouts/sec | General SQL Server statistics. |
| SQLServer:General Statistics | Logins/sec | General SQL Server statistics. |
| SQLServer:SQL Statistics | SQL Re-Compilations/sec | General SQL Server statistics. |
| SQLServer:SQL Statistics | SQL Compilations/sec | General SQL Server statistics. |
| SQLServer:SQL Statistics | Batch Requests/sec | General SQL Server statistics. |
| SQLServer:Memory Manager | Lock Memory | General SQL Server statistics. |
| SQLServer:Locks | Lock Requests/sec | General SQL Server statistics. |
| SQLServer:Locks | Number of Deadlocks/sec | General SQL Server statistics. |
| SQLServer:Databases | Transactions/sec: SCCM db | General SQL Server statistics. |
| SQLServer:Databases | Transactions/sec: WSUS db | General SQL Server statistics. |
You will also want to
understand some basic SQL Server best practices. Some of these options
will vary depending on your site size, hierarchy, which roles you are
using, and how you are using them.
Microsoft has also
produced a SQL Server 2005 Best Practice Analyzer (BPA) that gathers
data from SQL Server configuration settings. The SQL BPA produces a
report using predefined recommendations to determine if there are issues
with the SQL Server implementation.
General Performance
There
is no ideal performance target for a given ConfigMgr solution. Perform a
cost/benefit analysis to weigh the cost of additional performance
against the actual requirements.
Across any ConfigMgr
role, it is important to understand the overall load the role places on
a system, and how the system will handle that load. Table 5
illustrates a general array of performance counters system
administrators should be aware of and use to gauge the overall
performance, or health, of their systems. These counters are not
specific to servers or roles, and can be applied to any Microsoft
Windows operating system.
Table 5. General System Performance Counters
| Object | Counter | Instance | Notes |
|---|---|---|---|
| System | % Total Processor Time | N/A | Less than 80% is acceptable. Consistently exceeding that level means more CPU is needed or the load needs to be reduced. |
| System | Processor Queue Length | N/A | Two or fewer means CPU utilization is acceptable. |
| Thread | Context Switches/sec | _total | Lower is better. Measure the thread counter to enable the processor queue length counter. |
| Physical Disk | % Disk Time | Each disk | Less than 80% is acceptable. |
| Physical Disk | Current Disk Queue Length | Each disk | Tells you how many I/O operations are waiting for the hard disk to become available. Opinions vary widely on this one; the common rule is to multiply the number of spindles in the array by two and make sure the value stays below that number. |
| Memory | Committed Bytes | N/A | Should be less than the installed RAM. |
| Memory | Page Reads/sec | N/A | If consistently exceeding 5, add RAM. |
| SQL Server | Cache Hit Ratio | N/A | 98% or more is acceptable; lower means SQL Server is being delayed by paging. |
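As a rough illustration, the thresholds in Table 5 can be folded into a simple health check. The function and sample values below are hypothetical (Perfmon itself, not this sketch, is the collection mechanism), but they show how the rules of thumb combine:

```python
# Hypothetical health check applying the Table 5 thresholds to a set of
# sampled counter values. The counter names mirror Table 5; the sample
# numbers are made up for illustration.

def check_health(samples, spindles=4):
    """Return warnings for any counters outside the Table 5 thresholds."""
    warnings = []
    if samples["% Total Processor Time"] >= 80:
        warnings.append("CPU: add capacity or reduce load")
    if samples["Processor Queue Length"] > 2:
        warnings.append("CPU queue: utilization too high")
    if samples["% Disk Time"] >= 80:
        warnings.append("Disk: busy time too high")
    # Common rule of thumb: queue length should stay below spindles * 2.
    if samples["Current Disk Queue Length"] >= spindles * 2:
        warnings.append("Disk queue: too many pending I/Os")
    if samples["Page Reads/sec"] > 5:
        warnings.append("Memory: add RAM")
    if samples["Cache Hit Ratio"] < 98:
        warnings.append("SQL: cache hit ratio low, likely paging")
    return warnings

sample = {
    "% Total Processor Time": 72,
    "Processor Queue Length": 1,
    "% Disk Time": 85,
    "Current Disk Queue Length": 9,
    "Page Reads/sec": 3,
    "Cache Hit Ratio": 99.1,
}
print(check_health(sample, spindles=4))  # flags only the two disk counters
```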
Disk Performance
Disks
today are the weakest point in a computer’s performance, and you will
want to give serious attention to designing the right disk subsystem for
the various ConfigMgr roles. Due to the increasing demand to lower
server prices, vendors now make server systems available using hardware
designed for the desktop-level system. This may lead to performance
bottlenecks and disk failures with ConfigMgr site systems. Although
performance using SCSI (Small Computer System Interface) devices may be
adequate for server specs, technologies such as SATA (Serial Advanced
Technology Attachment) have a much higher Mean Time Between Failure
(MTBF), which is calculated during Phase 2 of a hard drive’s life.
It is important to
understand the implications of drive failure in servers. Although a
drive may fail and the system may continue to run, if another drive
fails, the entire volume goes down, ultimately creating an outage. If
you are dealing with an enterprise environment, outages are never
welcome. Here are the three phases of a drive’s life:
- Phase 1 of a drive’s life is the burn-in phase, during which the failure rate is very high.
- In Phase 2, the drive runs for an extended period with a minimal failure rate. This equates to the normal operational lifetime of the drive and is how the MTBF value is calculated.
- Phase 3 is where failure rates increase as the drive reaches the end of its life, or warranty (ironically).
Table 6 lists characteristics of several types of drives.
Table 6. Disk Drive Characteristics
| Drive Type | Rotational Speed | Average Seek/Access Time | MTBF |
|---|---|---|---|
| EIDE | 5400–7500 rpm | Seek time: 8 to 10 ms | 300,000–500,000 hours at 20% duty cycle |
| SCSI | 7500–15000 rpm | Access time: 5 to 10 ms | 600,000–1,200,000 hours at 100% duty cycle |
| SATA | 5400–10000 rpm | Access time: 3 to 7 ms | 500,000–1,500,000 hours; mostly less expensive drives than SCSI, with MTBFs defined at less than 100% duty cycle |
Disk I/O is the
biggest performance bottleneck on ConfigMgr implementations and can have
a large impact on overall site health. When a site cannot keep up with
client demands, a snowball effect occurs—unless the load decreases, the
server cannot catch up and performance continuously deteriorates.
Modern-day best practices for disk architecture include the following:
- Use SCSI or SAS devices when possible.
- Use hardware RAID (Redundant Array of Independent Disks) instead of software RAID. Software RAID uses the CPU of the server, taking away from its ability to process computations.
- Use battery-backed cache controller cards. A battery-backed cache lets the disks run with write caching enabled at a higher performance level, without risking corruption from a possible power loss.
- More spindles of smaller size are better than fewer spindles of larger disks.
- Utilize eight or more drives in RAID 1+0 when serious I/O or performance concerns are present.
- Make sure you have adequate network bandwidth to support data transfers. As an example, it takes 2.5Gbps to equal the transfer rate of a SCSI Ultra 320 drive.
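The bandwidth figure in that last point can be sanity-checked with a quick unit conversion (320 MB/s is the Ultra 320 bus rate):

```python
# Convert the Ultra 320 SCSI bus rate (320 MB/s) to gigabits per second.
ultra320_mb_per_s = 320
gbps = ultra320_mb_per_s * 8 / 1000  # 8 bits per byte, 1000 Mb per Gb
print(f"{gbps:.2f} Gbps")  # prints "2.56 Gbps", i.e., roughly 2.5Gbps
```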
Smaller
sites may be able to run sufficiently on a small array, such as a RAID1
array, which uses two disks. However, larger implementations will
falter on such a small backend disk subsystem. As scale increases in the
enterprise or demand increases on the disk subsystem used by ConfigMgr,
a larger array becomes necessary to support the load. Unfortunately,
there is no formula where x number of ConfigMgr clients equals y
number of disks—there are just too many possible implementation paths
in ConfigMgr 2007 to allow a standard formula to dictate disk I/O load.
When dealing with
larger enterprises or more aggressive policy evaluation intervals, such
as daily or hourly inventories, know that adding spindles always
increases performance of the disk subsystem. Arrays composed of many
disks will yield significantly better performance than arrays just
several disks smaller. An easy way to understand this is to think of
each disk as a worker going to find information; with additional
workers, the information is returned more quickly.
If ConfigMgr
console performance is important to your ConfigMgr administrators, you
will want to explore SANs (Storage Area Networks) and other storage
solutions for the ConfigMgr database and binaries. Although SANs and
iSCSI (Internet SCSI) solutions are options for any environment, you
will want to explore them in large-scale enterprises with 20,000 or more
clients reporting to a site server. This
does not imply that if there are fewer than 20,000 clients that you
should not look at using a SAN for your SQL Server databases or
distribution points. Storage solutions offer a variety of other
benefits, including disaster recovery, backup, and other options that
are frequently vendor specific.
Disk optimization steps include the following:
- The ConfigMgr SQL database should be on its own array.
- The ConfigMgr SQL transaction log should be on its own array.
- The Windows operating system should be on its own array.
- Any distribution point should be on its own array.
- Any software update point should be on its own array.
Operating
systems perform best when loaded on RAID 1 arrays. Consult with your
company’s standard on whether you use RAID1+0 or external storage
solutions such as SANs. The principle here is that two disks give good
performance, redundancy, and the lowest possible failure rate. (That is
correct. With only two disks in a RAID1 array, you are four times less
likely to have a failure than in a RAID5 array with eight disks!)
Databases
typically need to be placed on RAID5 or RAID10 arrays, due to the sheer
number of disks required to support the database size. Fortunately,
ConfigMgr has a relatively small database size, although its size is
dependent on a multitude of variables such as inventories, packages,
number of clients, features in use, and such. SMS 2003’s SQL sizing was
based on 50MB + (N × 250KB), where N is the number of clients. This means that if there were 5,000 clients, the formula would read as follows:
50MB + (5,000 × 250KB) = 1.27GB
This sizing formula was
found to be unrealistically low, and most administrators doubled or
tripled the value. With ConfigMgr 2007, you can use the same rule of
thumb for database sizing, but should increase the 250KB multiplier to
support the new features, including patch management, configuration
management, and expanded inventory. Experience has shown that 2MB per
client is a more realistic value to use than 250KB as a starting point
for sizing the ConfigMgr database. This means you should use the
following formula to determine the required database size:
50MB + (N × 2,048KB), where N is the number of clients
Using this new formula for the same size (5,000 clients) gives a considerably higher number:
50MB + (5,000 × 2,048KB) = 9.8GB
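The rule of thumb is easy to turn into a small sizing calculator. This sketch uses binary units (1GB = 1024MB = 1024 × 1024KB), which reproduces the 9.8GB figure above; the function name and defaults are illustrative:

```python
# Database sizing per the ConfigMgr 2007 rule of thumb: 50MB base plus
# 2MB (2,048KB) per client, computed in binary units. Adjust the
# per-client multiplier upward for aggressive inventory or many features.
def configmgr_db_size_gb(clients, per_client_kb=2048, base_mb=50):
    total_kb = base_mb * 1024 + clients * per_client_kb
    return total_kb / 1024 / 1024  # KB -> MB -> GB

print(round(configmgr_db_size_gb(5000), 1))  # prints 9.8
```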
SQL Server
transaction logs can usually reside on a RAID1 array, because
point-in-time restores are not commonly required for ConfigMgr. This
means selecting the simple database recovery model, so the transaction
log will not need to be extraordinarily large.
DPs and state
migration points are the most critical in terms of disk I/O. Memory and
CPU on these roles are a minor concern, and are not an issue as long as
there’s sufficient RAM on the system to prevent unnecessary swapping.
Distribution
points can have the most widely varying requirements depending on how
they are used. As an example, if a company performs routine software
patching and package pushes, the size of its distribution point may be
minimal, particularly if BITS is used extensively to download and
execute content. Anything from a single disk to a RAID1 array could be
effective in a branch DP or a conventional DP.
If you introduce the OSD
functionality into your ConfigMgr solution, the requirements jump
substantially. Conventional packages are relatively small, between 1MB
and 200MB, depending on the average package. Microsoft Office usually is
one of the largest at 1GB for the 2007 version. Operating system
images, regardless of the applications being in the images or called
from outside them, average around 1GB for Windows XP and 3GB to 4GB for
Vista images in the ImageX WIM (Windows Imaging) format. In addition,
because download-and-execute is not an option for operating system
deployments, you can have a DP with a very large data demand for many
machines in parallel. The best solution for this scenario is many disk
spindles. You should seriously consider SANs if your ConfigMgr
implementation requires supporting large operating system deployments.
State migration points may have similar disk I/O requirements. Disk I/O
for this role is difficult to calculate, with each user’s state volume
size being an unknown.
Tip: Calculating User State Volume
You can use
ConfigMgr to calculate user state volume size, thus helping to define
capacity requirements and expected timeframes for OS migrations. Simply
run a script, deployed as a package, that measures the size of the user
data you want to capture, and store that size in Windows Management
Instrumentation (WMI) on the client.
server, where it can be used to populate reports. Microsoft partners
such as SCCM Experts (formerly known as SMS Experts) specialize in
solutions such as this.
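As a minimal sketch of the measurement step the tip describes, the following totals the size of a user's data directories. Writing the result to WMI and the inventory upload are ConfigMgr-specific steps not shown here, and the paths are hypothetical examples:

```python
# Sketch of the measurement step only: sum the bytes under each of a
# user's data directories. Storing the result in WMI (for pickup by the
# next hardware inventory) is a separate, ConfigMgr-specific step.
import os

def state_volume_bytes(paths):
    """Sum file sizes under each path, skipping files that vanish mid-walk."""
    total = 0
    for root_path in paths:
        for dirpath, _dirnames, filenames in os.walk(root_path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    pass  # file deleted or inaccessible; skip it
    return total

# Example (hypothetical profile locations):
# size = state_volume_bytes([r"C:\Users\jdoe\Documents", r"C:\Users\jdoe\Desktop"])
```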
Monitoring Performance
If available,
utilize tools such as System Center Operations Manager (OpsMgr) 2007 to
baseline performance and monitor ConfigMgr site health. When external
monitoring solutions are not available, use a tool such as Performance
Monitor (Perfmon), which is built into each version of Windows. Perfmon
enables administrators to collect a myriad of performance data and log
it to a file for later analysis. Realize that this method of using
Performance Monitor can place a load on the system when the samples are
captured at an aggressive interval! Because you only need to look at
average performance over a broad period, sampling every 10 or 15 minutes
is acceptable and provides a multitude of useful data to analyze when
tuning the system.
Tip: Benchmarking
Consider
periodically collecting performance metrics from the site systems when
they are utilized during business hours. This data ultimately will
provide a baseline by which you can measure performance. This data is
useful for scaling out or up, depending on how the load increases on the
site systems. CPU, memory, disk, and network throughput are the four
areas to evaluate periodically.