Optimizing an Exchange Server 2010 Environment - Analyzing and Monitoring Core Elements

2/13/2011 11:40:15 AM

The capacity analysis and performance optimization process can be intimidating because there can be an enormous amount of data to work with. In fact, it can easily become unwieldy if not done properly. The process is not just about monitoring and reading counters; it is also an art.

As you monitor and catalog performance information, keep in mind that more information does not necessarily yield better optimization. Tailor the number and types of counters that are being monitored based on the server’s role and functionality within the network environment. It’s also important to monitor the four common contributors to bottlenecks: memory, processor, disk, and network subsystems. When monitoring Exchange Server 2010, it is equally important to understand the various Exchange roles to keep the number of counters being monitored to a minimum.

Memory Subsystem Optimizations

At the risk of sounding cliché, forget everything you knew about memory optimization in 32-bit Windows. Because Exchange Server 2010 is a 64-bit application, it requires a 64-bit operating system. 64-bit Windows Server 2008 handles memory in an entirely different way than 32-bit Windows Server 2003 did. Physical Address Extension (PAE) is no longer needed because there are now enough bits to address memory natively, and old tricks such as the "/3GB" and "/USERVA=3030" switches in the boot.ini file have gone away. Table 1 summarizes some of the key improvements in memory management that will greatly enhance the performance of Exchange Server 2010.

Table 1. Key Improvements in Memory Management with 64-bit Windows
Architectural Component	64-bit Windows	32-bit Windows
Virtual memory	16TB	4GB
Paging file size	512TB	16TB
Hyperspace	8GB	4MB
Paged pool	128GB	470MB
Non-paged pool	128GB	256MB
System cache	1TB	1GB
System PTEs	128GB	660MB

Virtual memory refers to the memory space made from a combination of physical memory and page file space. Each process in Windows is constrained by this virtual memory size. In 32-bit Windows, this meant that store.exe, traditionally the largest consumer of memory in Exchange Server, was limited to 4GB of address space. In 64-bit Windows, store.exe can access 16TB of address space, which is 4,096 times as much as before. As a result, Exchange Server 2010 can use significantly more physical memory and fall back on the page file, which resides on much slower disk, less often. Because more of the Exchange Server database can be cached in this larger memory space, the requirements for disk I/O are greatly reduced.
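The 4,096 figure follows directly from the virtual memory row of Table 1. A minimal sketch of the arithmetic, assuming binary units (1GB = 2^30 bytes, 1TB = 2^40 bytes):

```python
# Binary units are an assumption; the table does not say which units it uses.
GB = 2**30
TB = 2**40

addr_32bit = 4 * GB    # 32-bit virtual address space from Table 1
addr_64bit = 16 * TB   # 64-bit virtual address space from Table 1

ratio = addr_64bit // addr_32bit
print(ratio)  # 4096
```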

The page file refers to the disk space allocated for scratch space where the operating system will place “memory pages” when it no longer has room for them and they aren’t being actively used. This increased value allows for the support of the greater virtual memory size.

Hyperspace is the special region that is used to map the process working set list. It is also used to temporarily map other physical pages for such operations as zeroing a page on the free list, invalidating page table entries in other page tables, and for setting up the address space of a new process.

Paged pool is the region of virtual memory that can be paged in and out of the working set of the system process. It is used by Kernel mode components to allocate system memory.

Non-paged pool is the memory pool that consists of ranges of system virtual addresses. These virtual addresses are guaranteed to be resident in physical memory at all times. Thus, they can be accessed from any address space without incurring paging I/O to the disks. This pool is also used by Kernel mode components to allocate system memory.

System cache refers to the pages that are used to map open files in the system cache.

System PTEs are the Page Table Entries that are used to map system pages. 64-bit programs use a model of 8TB for User and 8TB for Kernel, whereas 32-bit programs use 2GB for User and 2GB for Kernel.

With the Performance Monitor console, a number of important memory-related counters can help in establishing an accurate representation of the system’s memory requirements. The primary memory counters that provide information about hard page faults (faults that force pages to be moved between memory and the hard disk) are as follows:

Memory— Pages/sec—The values of this counter should range from 5 to 20. Values consistently higher than 10 are indicative of potential performance problems, whereas values consistently higher than 20 might cause noticeable and significant performance hits. The trend of these values is impacted by the amount of physical memory installed in the server.
Memory— Page Faults/sec—This counter, together with the Memory—Cache Faults/sec and Memory—Transition Faults/sec counters, can provide valuable information about page faults that are not committed to disk. They were not committed to disk because the memory manager allocated those pages to a standby list. Most systems today can handle a large number of page faults, but it is important to correlate these numbers with the Pages/sec counter as well to determine whether Exchange Server is configured with enough memory.
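The Pages/sec thresholds above can be turned into a simple check. The sketch below is illustrative only: the function name is hypothetical, and "consistently higher" is interpreted as the median of the collected samples exceeding the threshold, which is an assumption rather than something the counter documentation defines.

```python
from statistics import median


def classify_pages_per_sec(samples):
    """Classify a series of Pages/sec samples against the 10 and 20 thresholds.

    Assumption: "consistently higher" means the median sample is above
    the threshold, so a single short spike does not trigger a warning.
    """
    if not samples:
        raise ValueError("need at least one sample")
    typical = median(samples)
    if typical > 20:
        return "significant"   # noticeable, significant performance hit likely
    if typical > 10:
        return "potential"     # potential performance problem
    return "ok"


print(classify_pages_per_sec([12, 15, 11]))  # potential
```

A "potential" or "significant" result is a prompt to correlate with Page Faults/sec and the amount of installed memory, not a diagnosis in itself.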

Improving Virtual Memory Usage

Calculating the correct amount of virtual memory is one of the more challenging parts of planning a server’s memory requirements. While trying to anticipate growing usage demands, it is critical that the server has an adequate amount of virtual memory for all applications and the operating system. This is no different for Exchange Server 2010.

Virtual memory refers to the amount of disk space that is used by Windows Server 2008 and applications as physical memory gets low or when applications need to swap data out of physical memory. Windows Server 2008 uses 1.5 times the amount of random access memory (RAM) as the minimum paging file size by default, which for many systems is adequate. However, it is important to monitor memory counters to determine whether this amount is truly sufficient for that particular server’s resource requirements. Another important consideration is the maximum size setting for the paging file. As a best practice, this setting should be at least 50% more than the minimum value to enable paging file growth, should the system require it. If the minimum and maximum settings are configured with the same value, there is a greater risk that the system could experience severe performance problems or even crash.
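As a worked example of the sizing guidance above, the following sketch computes a minimum of 1.5 times RAM and a maximum that is 50% larger than the minimum. The function name and its parameters are hypothetical, and the result is only a starting point to be validated against the memory counters.

```python
def paging_file_sizes_mb(ram_mb, min_factor=1.5, growth=0.5):
    """Return (minimum, maximum) paging file sizes in MB.

    min_factor: multiple of RAM used for the minimum size (1.5 per the text).
    growth: extra headroom for the maximum, as a fraction of the minimum
            (0.5 means "at least 50% more than the minimum").
    """
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    minimum = int(ram_mb * min_factor)
    maximum = int(minimum * (1 + growth))
    return minimum, maximum


# A server with 8GB (8192MB) of RAM:
print(paging_file_sizes_mb(8192))  # (12288, 18432)
```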

The clearest sign of low virtual memory is the presence of Event ID 9582 warnings logged by the Microsoft Exchange Information Store service. Low virtual memory can severely degrade the Exchange server’s message-processing abilities. These warnings indicate that virtual memory has dropped below 32MB. If they go unnoticed or are left unattended, services might stop or the entire system might crash.

Tip

Use the Performance snap-in to set an alert for Event ID 9582. This helps proactively address any virtual memory problems and possibly prevent unnecessary downtime.

To get an accurate portrayal of how Exchange Server 2010 is using virtual memory, monitor the following counters within the MSExchangeIS object:

VM Largest Block Size— This counter should consistently be above 32MB.
VM Total 16MB Free Blocks— This counter should remain over three 16MB blocks.
VM Total Free Blocks— This value is specific to your messaging environment.
VM Total Large Free Block Bytes— This counter should stay above 50MB.
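The three fixed thresholds in this list lend themselves to an automated check. The sketch below assumes the counter values have already been collected into a dictionary keyed by counter name, with sizes in bytes; the dictionary layout and function name are hypothetical, and the code does not read the counters itself.

```python
MB = 1024 * 1024

# Floors taken from the list above; a reading at or below its floor is flagged.
THRESHOLDS = {
    "VM Largest Block Size": 32 * MB,
    "VM Total 16MB Free Blocks": 3,
    "VM Total Large Free Block Bytes": 50 * MB,
}


def vm_warnings(readings):
    """Return the names of counters whose reading is at or below its floor."""
    return [
        name
        for name, floor in THRESHOLDS.items()
        if name in readings and readings[name] <= floor
    ]


sample = {
    "VM Largest Block Size": 20 * MB,            # too small
    "VM Total 16MB Free Blocks": 5,              # fine
    "VM Total Large Free Block Bytes": 60 * MB,  # fine
}
print(vm_warnings(sample))  # ['VM Largest Block Size']
```

VM Total Free Blocks is left out deliberately, because its acceptable range is specific to each messaging environment.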

Other important counters to watch closely are as follows:

Memory— Available Bytes—This counter can be used to establish whether the system has adequate amounts of RAM. The recommended absolute minimum value is 4MB.
Paging File— % Usage—% Usage is used to validate the amount of the paging file used in a predetermined interval. High usage values might be indicative of requiring more physical memory or needing a larger paging file.

Monitoring Processor Usage

Analyzing processor usage can reveal valuable information about system performance and provide reliable results that can be used for baselining purposes. Two major Exchange-related processor counters are used for capacity analysis of an Exchange Server 2010 server:

% Privileged Time— This counter indicates the percentage of nonidle processor time spent in privileged mode. The recommended ideal for this value is under 55%.
% Processor Time— This counter specifies the processor use of each processor or the total processor use. If these values are consistently higher than 50%–60%, consider upgrade options or segmenting workloads.

Tracking these values long term, for trend analysis, makes it much easier to spot accountable anomalies, such as a processor time spike during the online defragmentation or interactions with other systems. Tracking a “weighted average” of these processor values allows you to predict the point in time at which a system needs to be upgraded or when an additional system needs to be deployed to share the load.
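One way to make this concrete is sketched below. The text does not specify a weighting scheme, so the linear weights (newer samples count more) are an assumption, and the least-squares trend line used for the projection is a substituted technique chosen for simplicity, not a method prescribed here. Both function names are hypothetical.

```python
def weighted_average(values):
    """Linearly weighted average: the i-th sample (1-based) gets weight i."""
    if not values:
        raise ValueError("need at least one value")
    weights = range(1, len(values) + 1)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def periods_until(values, limit):
    """Fit a least-squares line to evenly spaced samples and return how many
    periods after the last sample the line reaches `limit`.
    Returns None when the trend is flat or falling."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return max(0.0, (limit - intercept) / slope - (n - 1))


# Monthly averages of % Processor Time (illustrative numbers only):
monthly = [30, 35, 40, 45]
print(weighted_average(monthly))   # 40.0
print(periods_until(monthly, 60))  # 3.0
```

With these illustrative numbers, the trend reaches the 60% mark about three months after the last sample, which is the kind of lead time needed to plan an upgrade or an additional server.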

Monitoring the Disk Subsystem

Exchange Server 2010 relies heavily on the disk subsystem, which makes it a critical component to design and monitor properly. Although the disk object monitoring counters are enabled by default in Windows Server 2008, it is recommended that they be disabled until an administrator is ready to monitor them, because collecting them consumes resources that can affect overall system performance. The syntax to disable and enable these counters is as follows:

diskperf -n (to disable)
diskperf -y [\\computer_Name] (to reenable)

Nevertheless, it is important to gather disk subsystem performance statistics over time.

The primary Exchange-related performance counters for the disk subsystem are located within the Physical and Logical Disk objects. Critical counters to monitor include, but are not limited to, the following:

Physical Disk— % Disk Time—This counter analyzes the percentage of elapsed time that the selected disk spends on servicing read or write requests. Ideally, this value should remain below 50%.
Logical Disk— % Disk Time—This counter displays the percentage of elapsed time that the selected disk spends fulfilling read or write requests. It is recommended that this value be 60%–70% or lower.
Current Disk Queue Length (Both Physical and Logical Disk Objects)— This counter has different performance indicators depending on the monitored disk drive (Database or Transaction Log volume). On disk drives storing the Exchange Server database, this value should be below the number of spindled drives divided by 2. On disk drives storing transaction log data, this value should be below 1.
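The Current Disk Queue Length guidance depends on what the volume stores, which is easy to get wrong when checking many volumes. A minimal sketch of the rule, with hypothetical function names and volume-type labels:

```python
def queue_length_limit(volume_type, spindles=None):
    """Return the value Current Disk Queue Length should stay below.

    volume_type: "database" or "log" (labels are assumptions for this sketch).
    spindles: number of physical drives behind a database volume.
    """
    if volume_type == "database":
        if spindles is None or spindles < 1:
            raise ValueError("database volumes need a spindle count")
        return spindles / 2
    if volume_type == "log":
        return 1
    raise ValueError(f"unknown volume type: {volume_type!r}")


def queue_ok(current_queue_length, volume_type, spindles=None):
    """True when the observed queue length is below the limit."""
    return current_queue_length < queue_length_limit(volume_type, spindles)


print(queue_ok(3, "database", spindles=8))  # True
print(queue_ok(1, "log"))                   # False
```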

If there appears to be an excessive load on the disks, consider adding more memory to the Exchange Server 2010 server. Improvements in cache in the Exchange Server database engine allow more information to be read and cached into memory. This decreases the workload on the disks and might alleviate the need to add more disks. For large Exchange Server 2010 servers, it is usually less expensive to add more memory than to add more disks to address this type of issue.

Monitoring the Network Subsystem

The network subsystem is one of the more challenging elements to monitor because a number of factors make up a network. In an Exchange Server messaging environment, site topologies, replication architecture, network topologies, synchronization methods, the number of systems, and more are among the many contributing factors.

To satisfactorily analyze the network, all facets must be considered. This most likely requires using third-party network monitoring tools in conjunction with built-in tools such as the Performance snap-in and Network Monitor. The current version of Network Monitor is 3.3 and can be downloaded from Microsoft at the following URL:

http://www.microsoft.com/downloads/details.aspx?FamilyID=983b941d-06cb-4658-b7f6-3088333d062f&displaylang=en

From a performance standpoint, consider implementing Gigabit Ethernet adapters in your Exchange Server 2010 servers. Given the amount of memory and disk likely to be in the server, it could easily saturate a 100Mbps connection. If your server hardware offers it, consider using fault-tolerant configurations for your Ethernet connections that will not be participating in clusters or load-balance groups. Most of the fault-tolerant configurations on the market today separate out input and output to different interfaces, resulting in better overall throughput for the network interfaces.

If you are connecting your storage via iSCSI, strongly consider running dedicated Gigabit Ethernet interfaces for the connection to the iSCSI network with an appropriate Device Specific Module (DSM) to support MultiPath I/O (MPIO). This separates the load of the iSCSI from the load for the users and results in better overall performance for the users.