IIS 7.0 : Performance and Tuning - Configuring for Performance

3/6/2011 3:19:12 PM

Configuring your environment for performance requires that you truly understand your applications and how your users will interact with them. How many times have you heard administrators tell the tale that an application worked great in the lab, but when it was deployed to production, it crashed? Sound familiar? I hope not! Adequate testing should be a requirement before you purchase production hardware.

You should consider a few scenarios when configuring your servers. First, you need to know some basic information, such as the following:

How many users do you expect to use your application?
At peak usage, how many concurrent users do you expect?
Where are your users based—are they on well-connected networks such as a local area network, or are they connecting over the Internet?
How long will a typical session last?
Does your application require session state?
Does your application interact with a database?
How much RAM does a typical session take?
Will your application be disk-intensive?

These questions are important to understand when you are configuring your environment. Your answers can influence the decision to run everything on a stand-alone server or on a set of servers. A set of servers could include a Web farm front end with a separate database server.

If your current application is in production, you can use the Reliability and Performance Monitor to capture baseline numbers that will help you load-test your environment. If your application is brand new, you’ll have to calculate usage numbers and estimate how much traffic you expect.

Server Level

You can do many things at the server level to configure your environment for performance. Before you do anything at server level, however, understanding the application you want to support will go a long way toward creating a smooth running environment. Here are a few things you can do to make sure your server will perform at acceptable levels:

If possible, run the 64-bit version of Windows Server 2008.
Configure your application pools to run in 32-bit mode.
Make sure your server has enough RAM.
Make sure your server firmware and basic input/output system (BIOS) are up to date.
If running a Web farm, try to keep your servers configured identically.
Configure your system to use RAID.
Install only necessary software such as monitoring agents, antivirus software, and so on.

IIS

IIS has always been pretty much a self-tuning service. There have been a few registry tweaks here and a metabase adjustment there, but overall, IIS has a good reputation for self-tuning based on a wide variety of workloads. IIS 7.0 continues this behavior. Before we say that IIS 7.0 will scale on any type of hardware, though, you need to take into account the types of applications that you will be deploying. The application types that you choose can greatly change IIS performance, but they have little to do with IIS 7.0 itself. The hardware setup, amount of RAM, and speed of the disk all contribute to the overall Web server experience.

Optimizing for the Type of Load

How you optimize your environment really depends on your needs. For example, do you need to support a large base of concurrently connected users, or is it more important to process a high number of transactions? The following sections provide a couple of examples illustrating what to take into account when optimizing your environment.

Load

Load is one of the first performance requirements that should be identified in the planning stages for your application. If your application will have spikes of load, you’ll need to account for that in your design. An application that has peaks and valleys can be harder to design for. Most likely, you’ll need to account for spikes no matter what the load is.

Consider this example: You are the director of IT for Contoso Ltd. You are aware that every year on December 31, the Contoso.com Web site experiences a thousand times more than normal traffic. The visitors are submitting expenses for the previous year, so that they can be reimbursed. Does that mean you scale your environment to meet the volume on December 31, and the rest of the year you have excess capacity? Probably not, but realistically, you need to take into account how to plan for the additional load created by so many visitors accessing the site at one time. You may want to consider using staggered schedules or some other option to prevent all the users from being on the system at once. This might seem like an unreal example, but it provides a picture of what you should define as load and what kinds of things you need to think about when testing. In this scenario, for example, when a user finishes submitting expenses and clicks Submit, that person is going to want the transaction to work so that they don’t have to enter the information again—and of course so they get paid back for their expenses sooner rather than later.

As another example, consider that you are in a situation in which you have to scale your application to meet a lot of concurrent connections. In such a case, you would want to look at having multiple servers in a Web farm to handle not only the load, but also the site availability.

Required Availability

You usually will work with your business partners to determine what an acceptable level of application availability is. As the Internet has matured, applications need to be available 24 hours a day, 7 days a week. When you have to maintain this type of availability, performing routine maintenance or security patching can be tough. One recommendation is to schedule a standard maintenance window. This can be difficult, because Internet applications can be accessed from anywhere. The standard window helps provide some consistency when changes happen, however.

Performance Requirement

Having a defined SLA (service level agreement) can help you understand your performance requirements. Frequent and critical functions need to be identified and tested to ensure they perform properly. Frequently used transactions, intensive operations, and business-critical operations need to perform under load and must be available to users.

When testing your application, it is critical that you use data that is as close to real-world data and user patterns as possible to make sure your system performs optimally. Creating an application baseline in the real world is a key component to being successful.

Type of Content

The type of content in your application has a significant impact on performance and how you might design your Web server environment. You can start by looking at the amount of static content versus dynamic content used on your Web site. The server should be tuned in a way that accommodates the type of content that is more prevalent on the server. A server that contains mostly static content, for example, can take advantage of Kernel-mode caching and output caching.

IIS 7.0 output caching can handle content scenarios that are semi-dynamic, such as an ASP.NET application that pulls data from a database with content that remains relatively unchanged. One suggested technique is to cache your content for just a few seconds; even that short a duration can produce a huge performance boost.

Server-Side Tools

IIS 7.0 tries to cache content in many places. Each feature covered in this section determines what type of content is cached along with how the content will be cached. Each has its own place, and each can help with overall server performance.

HTTP.sys Cache

Microsoft introduced Kernel-mode caching in IIS 6.0. This feature eliminates the need for accessing User-mode cache in many cases. The HTTP.sys cache helps increase server performance and reduces disk cost. The content has to be requested a few times before IIS 7.0 considers caching a URL. You can configure the frequentHitTimePeriod and frequentHitThreshold attributes in the serverRuntime section of applicationHost.config. Here are the default serverRuntime settings, which are defined in the %windir%\system32\inetsrv\config\schema\IIS_schema.xml file. These settings are inherited by default.

  <sectionSchema name="system.webServer/serverRuntime">
    <attribute name="enabled" type="bool" defaultValue="true" />
    <attribute name="appConcurrentRequestLimit" type="uint" defaultValue="5000" />
    <attribute name="maxRequestEntityAllowed" type="uint" defaultValue="4294967295" />
    <attribute name="uploadReadAheadSize" type="uint" defaultValue="49152" validationType="integerRange" validationParameter="0,2147483647" />
    <attribute name="alternateHostName" type="string" />
    <attribute name="enableNagling" type="bool" defaultValue="false" />
    <attribute name="frequentHitThreshold" type="uint" defaultValue="2" validationType="integerRange" validationParameter="1,2147483647" />
    <attribute name="frequentHitTimePeriod" type="timeSpan" defaultValue="00:00:10" />
  </sectionSchema>
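The two frequent-hit attributes in this schema can be read as a sliding-window counter per URL. The following Python sketch only illustrates that idea; it is not how HTTP.sys or IIS 7.0 is implemented. The FrequentHitTracker class and its method name are hypothetical, and the sketch assumes that a URL qualifies once it has been requested frequentHitThreshold times within frequentHitTimePeriod, which matches the description of the defaults as content "requested twice in any 10-second period."

```python
from collections import defaultdict, deque


class FrequentHitTracker:
    """Toy model of the frequentHitThreshold / frequentHitTimePeriod rule."""

    def __init__(self, threshold=2, period_seconds=10.0):
        self.threshold = threshold
        self.period_seconds = period_seconds
        self._hits = defaultdict(deque)  # URL -> timestamps of recent requests

    def record_hit(self, url, now):
        """Record a request at time `now` (in seconds).

        Returns True if the URL now counts as frequently hit. Assumption:
        "frequently hit" means at least `threshold` requests in the window.
        """
        hits = self._hits[url]
        hits.append(now)
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] > self.period_seconds:
            hits.popleft()
        return len(hits) >= self.threshold


tracker = FrequentHitTracker(threshold=2, period_seconds=10.0)
print(tracker.record_hit("/Default.aspx", now=0.0))   # first request
print(tracker.record_hit("/Default.aspx", now=4.0))   # second request, 4 s later
print(tracker.record_hit("/Default.aspx", now=30.0))  # earlier hits have expired
```

Keeping one short queue of timestamps per URL makes the window explicit: raising the threshold or shortening the period both make it harder for a URL to qualify for caching.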

A response is cached if more than the number of frequentHitThreshold requests for a cacheable URL arrive within the frequentHitTimePeriod setting. Following is an example of the benefits you can get by using Kernel-mode caching rather than User-mode caching or no caching at all.

The example shows how to set up the Web Capacity Analysis Tool (WCAT), run the WCAT controller and the WCAT client, and set up Output Caching policies in IIS Manager. There are three tests using different Output Caching policies.

First, the following steps allow you to configure WCAT to use this example in your environment:

1.	Download the IIS6 Resource Kit Tools (go to http://www.iis.net/go/1352), do a custom install, and install only WCAT. You’ll use three files to help configure WCAT:
	A script file that tells WCAT which URLs to request. Each URL gets a unique ClassID.
	A distribution file that tells WCAT how the requests should be distributed across the URLs specified in the script file.
	A configuration file that configures the parameters of a particular performance run, for example, the duration of the tests, how many HTTP clients to simulate, and so on.
2.	Create a folder named C:\LoadTest to hold the configuration files.
3.	Create a file called Default.aspx in C:\LoadTest. Type <% = Datetime.Now() %> and save Default.aspx. This file will be used for load-testing.
4.	Create a new file called script.cfg in C:\LoadTest and type the following text:
    NEW TRANSACTION
        classId = 1
        NEW REQUEST HTTP
            Verb = "GET"
            URL = "http://localhost/Default.aspx"
5.	Create a file called distribution.cfg and type the following text inside the file: 1 100
6.	Create a file called config.cfg and type the following text inside the file:
    Warmuptime 5s
    Duration 30s
    CooldownTime 5s
    NumClientMachines 1
    NumClientThreads 20
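If you rebuild the test folder often, steps 2 through 6 can be scripted. The Python sketch below is one way to do that. The write_load_test_files function is a hypothetical helper, and the line breaks and indentation inside script.cfg and config.cfg are an assumed layout, so compare the generated files with the WCAT documentation before relying on them.

```python
from pathlib import Path

# File contents taken from steps 3 to 6. The line breaks and indentation
# are an assumed layout, not something WCAT is known to require.
FILES = {
    "Default.aspx": "<% = Datetime.Now() %>\n",
    "script.cfg": (
        "NEW TRANSACTION\n"
        "    classId = 1\n"
        "    NEW REQUEST HTTP\n"
        '        Verb = "GET"\n'
        '        URL = "http://localhost/Default.aspx"\n'
    ),
    "distribution.cfg": "1 100\n",
    "config.cfg": (
        "Warmuptime 5s\n"
        "Duration 30s\n"
        "CooldownTime 5s\n"
        "NumClientMachines 1\n"
        "NumClientThreads 20\n"
    ),
}


def write_load_test_files(folder):
    """Create `folder` if needed, write the four files, and list what is there."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    for name, content in FILES.items():
        (target / name).write_text(content, encoding="ascii")
    return sorted(path.name for path in target.iterdir())
```

On the test server you would call write_load_test_files(r"C:\LoadTest") once and then continue with the controller and client commands.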

Next, after you have configured the LoadTest folder and supporting WCAT files, you can enable the WCAT controller. This is required to perform the tests. Open a command prompt and type the following syntax:

cd \LoadTest
"%programfiles%\IIS Resources\WCAT Controller\wcctl" -c config.cfg -s script.cfg -d distribution.cfg -a localhost

After enabling the WCAT controller, you can run the WCAT client to perform your performance tests. You’ll need to open another command prompt window to start the WCAT client.

"%programfiles%\IIS Resources\WCAT Client\wcclient.exe" localhost

The first test has no Output Cache policy enabled. You can set the Output caching policy in IIS Manager. Figure 1 shows no Output Cache policy enabled.

Figure 1. No Output Cache policy enabled.

Per Table 1, running the test with no cache policy yields 575 requests per second, which is an acceptable number. Let’s see how caching can help improve performance.

Table 1. Output Caching Results
Requests per Second	Cache Level
575	No cache enabled
656	User-mode caching only
946	Kernel-mode caching only

The second test enables User-mode policy only. The test using User-mode policy assumes you are using the file notifications option displayed in Figure 2. Figure 2 also shows how to enable a User-mode cache policy.

Figure 2. User-mode cache policy.

As you can see in Table 1, after running a test with User-mode caching enabled, the results are 656 requests per second, which is an increase of about 14 percent over no cache policy enabled.

For the third test, disable the User-mode caching policy and enable Kernel-mode caching. The Kernel-mode caching test assumes you are using the file notifications option displayed in Figure 3. Figure 3 shows how to configure a Kernel-mode caching policy.

Figure 3. Kernel-mode caching policy.

After running the test with Kernel-mode caching enabled, the results are 946 requests per second, which is about 65 percent more than with no cache policy. Table 1 shows results from the three performance tests. The results can vary depending on what type of hardware you are using.
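The relative gains can be recomputed directly from the requests-per-second figures in Table 1. The short Python sketch below does that arithmetic; percent_gain is a hypothetical helper name, and the numbers are the ones from the table, not new measurements.

```python
# Requests per second, copied from Table 1.
results = {
    "No cache enabled": 575,
    "User-mode caching only": 656,
    "Kernel-mode caching only": 946,
}

baseline = results["No cache enabled"]


def percent_gain(value, base):
    """Relative improvement of `value` over `base`, in percent."""
    return (value - base) / base * 100.0


for label, rps in results.items():
    print(f"{label}: {rps} req/s, {percent_gain(rps, baseline):.1f}% over baseline")
```

Running the same calculation on your own WCAT results gives a quick way to compare cache policies on your hardware.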

You should be aware of limitations when you are using Kernel-mode caching. Kernel-mode caching does not support modules and features that run in user mode, such as basic authentication, Windows Authentication, or authorization. If your application uses them, the content will be served, but it won’t be cached. The Kernel-mode caching option supports the varyByHeaders attribute, but not varyByQueryString. To see if a request is in the Kernel-mode cache, type netsh http show cachestate.

Note

For more information on Http.sys changes in Vista and Windows Server 2008, go to http://technet.microsoft.com/en-us/library/bb726965.aspx and search for HTTP.sys within the article.

User-mode Caching

One important change with User-mode caching is that any content type can be cached, not just Classic ASP or ASP.NET.

Direct from the Source: Native Output Cache Changes in IIS 7.0

Native output cache is the new user mode response cache added in IIS 7.0. This module provides functionality that is similar to that provided by the managed output cache module in ASP.NET. You can control this module’s functionality by editing the system.webServer/caching section, located in applicationHost.config, or by using IHttpCachePolicy intrinsic. The IHttpCachePolicy intrinsic is for getting/setting kernel-response cache or user-mode output cache policy from code. You can set the following properties in the system.webServer/caching section:

enabled This property tells if output caching is enabled or not for this URL. If disabled, output cache module won’t do anything in ResolveRequestCache and UpdateRequestCache stages.
Note that setting the enabled property to true doesn’t ensure response caching. Some modules must set User-cache policy.
enableKernelCache Controls if kernel caching is enabled for this URL. The output cache module calls IHttpResponse::DisableKernelCache if this property is set to false. The output cache module does kernel caching work in the SendResponse stage if no one called DisableKernelCache in the pipeline. Note that setting enableKernelCache to true doesn’t ensure kernel caching of the response. Some modules must set the kernel cache policy.
maxCacheSize This is the maximum size of the output cache in megabytes. A value of 0 means the maximum cache size is calculated automatically by IIS 7.0. IIS 7.0 uses half of the available physical memory or the available virtual memory—whichever is less.
maxResponseSize This is the maximum size of the response in bytes that can be stored in the output cache. A value of 0 means no limit.
Note that although you can set maxCacheSize and maxResponseSize for a URL, the output cache module uses values set at the root level only.

In the future, these properties will be configurable for each application pool. If the output cache is enabled, you can control its behavior for different file types by adding profiles for different file extensions. These profiles make the output cache module populate the IHttpCachePolicy intrinsic, which enables user/kernel caching of the response. Properties that you can set in a profile are similar to those available for system.web/caching/outputCacheSettings profiles. The following properties are allowed for system.webServer/caching profiles:

extension For example, .asp, .htm. Use * as a wildcard entry. If the profile for a particular extension is not found, the profile for extension * will be used if it is present.
policy Can be DontCache, CacheUntilChange, CacheForTimePeriod, or DisableCache (only in the server). Output cache module changes IHttpCachePolicy intrinsic, depending on the value of this property.
Note that DontCache means that intrinsic is not set, but that doesn’t prevent other modules from setting it and enabling caching. In the server, we have added the DisableCache option, which ensures that the response is not cached even if some other module sets the policy telling output cache module to cache the response.
kernelCachePolicy Can be DontCache, CacheUntilChange, CacheForTimePeriod, or DisableCache (only in the server). As previously mentioned, DontCache doesn’t prevent other modules from setting kernel cache policy. For static files, the static file handler sets kernel cache policy, which enables kernel caching of the response. In the server, the DisableCache option ensures that the response doesn’t get cached in kernel.
duration The duration property is used only when policy or kernelCachePolicy is set to CacheForTimePeriod.
location Sets cache-control response header for client caching. The cache-control response header is set depending on value of this property, as follows:
```
Any | Downstream: public
ServerAndClient | Client: private
None | Server: no-cache
```
varyByHeaders Comma-separated list of request headers. Multiple responses to requests having different values of these headers will be stored in the cache. You might be returning different responses based on Accept-Language or User-Agent or Accept-Encoding headers. All the responses will get cached in memory.
varyByQueryString Comma-separated query string variables. Multiple responses get cached if query string variable values are different in different requests. In the server, you can set varyByQueryString to *, which makes the output cache module cache a separate response if any of the query string variable values are different.
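The extension fallback and the location mapping described in this list can be sketched in a few lines of Python. This only illustrates the documented rules and is not the output cache module's code. The find_profile and cache_control_for helpers and the sample profiles are hypothetical.

```python
# Mapping from the `location` property to the Cache-Control value,
# as listed for the location property.
CACHE_CONTROL_BY_LOCATION = {
    "Any": "public",
    "Downstream": "public",
    "ServerAndClient": "private",
    "Client": "private",
    "None": "no-cache",
    "Server": "no-cache",
}


def find_profile(profiles, extension):
    """Return the profile for `extension`, else the "*" profile, else None."""
    by_extension = {profile["extension"]: profile for profile in profiles}
    return by_extension.get(extension) or by_extension.get("*")


def cache_control_for(profile):
    """Cache-Control value implied by the profile's location, if it has one."""
    location = profile.get("location")
    return CACHE_CONTROL_BY_LOCATION.get(location) if location else None


# Hypothetical profiles, for illustration only.
profiles = [
    {"extension": ".asp", "policy": "CacheForTimePeriod", "location": "Client"},
    {"extension": "*", "policy": "DontCache", "location": "Server"},
]

for ext in (".asp", ".htm"):
    profile = find_profile(profiles, ext)
    print(ext, profile["policy"], cache_control_for(profile))
```

In this sample, .asp resolves to its own profile, while .htm has no profile of its own and falls back to the wildcard entry.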

Only user mode cache uses location headers and varyBy. These properties have no effect on kernel caching. So if policy is set to DontCache, these properties are not used. Suppose an ASP page returns different responses based on the value of the query string variable “action” and on the request header “User-Agent.” To make the output cache module cache multiple responses from that page for 30 minutes, the caching section in the applicationHost.config file will look like the following:

<caching enabled="true">
  <profiles>
    <add extension=".asp" policy="CacheForTimePeriod" duration="00:30:00"
         varyByQueryString="action" varyByHeaders="User-Agent" />
  </profiles>
</caching>
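One way to picture what varyByQueryString and varyByHeaders do is to think of them as extra parts of the cache key. The Python sketch below is a simplified model under that assumption and says nothing about how the output cache module stores entries internally. The cache_key function and the sample URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def cache_key(url, headers, vary_by_query_string=(), vary_by_headers=()):
    """Build a key so that requests differing in the listed query string
    variables or request headers map to different cache entries."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    lowered = {name.lower(): value for name, value in headers.items()}

    query_part = tuple(
        (name, tuple(query.get(name, ()))) for name in vary_by_query_string
    )
    header_part = tuple(
        (name.lower(), lowered.get(name.lower(), "")) for name in vary_by_headers
    )
    return (parts.path, query_part, header_part)


# Mirrors varyByQueryString="action" varyByHeaders="User-Agent".
settings = {"vary_by_query_string": ("action",), "vary_by_headers": ("User-Agent",)}

a = cache_key("/page.asp?action=list&page=1", {"User-Agent": "BrowserA"}, **settings)
b = cache_key("/page.asp?action=list&page=2", {"User-Agent": "BrowserA"}, **settings)
c = cache_key("/page.asp?action=edit&page=1", {"User-Agent": "BrowserA"}, **settings)
d = cache_key("/page.asp?action=list&page=1", {"User-Agent": "BrowserB"}, **settings)

print(a == b)  # same action and User-Agent; "page" is not part of this model's key
print(a == c)  # different "action" value
print(a == d)  # different User-Agent header
```

The more variables and headers you vary by, the more distinct responses end up in memory, which is worth weighing against the maxCacheSize setting.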

The output cache module populates the IHttpCachePolicy intrinsic in the BeginRequest stage if a matching profile is found. Other modules can still change cache policy for the current request, which might change User-mode or Kernel-mode caching behavior. The output cache caches only responses with HTTP status code 200 to GET requests. If some module already flushed the response by the time the request reaches the UpdateRequestCache stage, or if headers are suppressed, the response is not cached in the output cache module.

The output cache module caches the response only if some other module hasn’t already cached it, as indicated by IHttpCachePolicy::SetIsCached. In addition, caching happens only for frequently hit content. The definition of frequently hit content is controlled by the frequentHitThreshold and frequentHitTimePeriod properties, which are defined in the system.webServer/serverRuntime section located in applicationHost.config. Default values define frequently hit content as content that is requested twice in any 10-second period.

Kanwaljeet Singla

IIS Team Microsoft

Compression

IIS 7.0 provides static and dynamic compression capabilities. Most of the properties are managed under system.webServer/httpCompression, which is located in applicationHost.config.

Direct from the Source: Changes Made to Compression Modules in IIS 7.0

Static compression is on by default in IIS 7.0. Dynamic compression is still off by default, and you can turn it on for all content by using the following syntax.

Appcmd set config -section:urlCompression /doDynamicCompression:true

In IIS 6.0, static compression happens on a separate thread. So, upon receiving a request, the first response is uncompressed, and IIS 6.0 starts a separate thread to compress the file and keep it in compressed files cache. Requests for compressed content reaching IIS 6.0 after the compression is complete receive a compressed response.

In IIS 7.0, compression happens on the main thread. But to avoid the cost of compression for all requests, compression happens only for frequently requested content. The definition of frequently requested content is controlled by the properties frequentHitThreshold and frequentHitTimePeriod under the section system.webServer/serverRuntime. If IIS 7.0 receives more than the threshold number of requests in frequentHitTimePeriod for the same URL, IIS 7.0 will go ahead and compress the file to serve a compressed response for the same request that made IIS reach the threshold.

This compressed response is saved in the compressed files cache, as in IIS 6.0. If the compressed response was already present in the compression cache, the frequentHitThreshold logic is not applied, because compressed content will be picked from cache and there will be no additional cost for compressing the content. Hit count is maintained per URL. So sending the first request with Accept-Encoding: gzip and the second with deflate will still qualify as frequently hit content, and IIS will go ahead and compress the response. This requires cachuri.dll to be present in the globalModules section, because it is the module that keeps the URL hit count.

The temporary compressed files folder has a nested directory structure in IIS 7.0, whereas it is flat in IIS 6.0. IIS 7.0 creates folders for each application pool in temporary compressed files and then creates separate folders for different schemes under each application pool. Under these scheme folders, IIS 7.0 creates a folder structure similar to the folder from which the content was picked. So, if iisstart.htm from D:\inetpub\wwwroot was compressed using gzip, a cache entry will be created in the D:\inetpub\temp\IIS Temporary Compressed Files\DefaultAppPool\$^_gzip_D^\INETPUB\WWWROOT folder.

IIS 7.0 will ACL (access control list) the application pool folder with worker process identity to protect the content from worker processes serving other application pools. You can still configure the directory from config, but the default is moved from %windir%\iis temporary compressed files to %SystemDrive%\inetpub\temp\iis temporary compressed files.

Also with this change, the maxDiskSpaceUsage limit is applied per application pool. So, if you have a value of 100 MB for HcMaxDiskSpaceUsage in IIS 6.0, then that limit is applied to all the compressed content in the compressed files cache. In IIS 7.0, this limit applies to compressed files per application pool. If you have 10 application pools and have maxDiskSpaceUsage set to 100 MB, total space allocated to compressed files cache is actually 1 GB.

Because static compression is enabled by default, and compression is happening on the main thread, on-the-fly compression shuts off or resumes, depending on CPU load. Four properties are added to the system.webServer/httpCompression section to control this behavior. These are as follows:

staticCompressionDisableCpuUsage Compression is disabled when average CPU usage over a specified period of time is above this number.
staticCompressionEnableCpuUsage Compression is enabled if average CPU usage over a specified period of time falls below this number.
dynamicCompressionDisableCpuUsage and dynamicCompressionEnableCpuUsage Enable or disable dynamic compression depending on the CPU load. IIS 7.0 will calculate average CPU utilization every 30 seconds.
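Having separate disable and enable thresholds gives the on/off switch some hysteresis, so compression does not flap when CPU usage hovers around a single value. The Python sketch below models that behavior for one pair of properties. CompressionGate is a hypothetical class, the threshold values are illustrative rather than the IIS defaults, and each call to update stands for one averaged CPU reading.

```python
class CompressionGate:
    """Toy model of a Disable/Enable CPU usage pair for one compression type."""

    def __init__(self, disable_cpu_usage, enable_cpu_usage):
        self.disable_cpu_usage = disable_cpu_usage
        self.enable_cpu_usage = enable_cpu_usage
        self.enabled = True

    def update(self, average_cpu):
        """Feed one averaged CPU reading (percent) and return the new state."""
        if self.enabled and average_cpu > self.disable_cpu_usage:
            self.enabled = False  # load rose above the disable threshold
        elif not self.enabled and average_cpu < self.enable_cpu_usage:
            self.enabled = True   # load fell below the enable threshold
        return self.enabled


# Illustrative thresholds, not necessarily the IIS defaults.
gate = CompressionGate(disable_cpu_usage=90, enable_cpu_usage=50)
for reading in (40, 95, 70, 45):
    print(reading, gate.update(reading))
```

Note that the reading of 70 leaves compression off: once disabled, it resumes only after the average drops below the lower threshold.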

In IIS 7.0, you can enable/disable compression depending on the content type of the response. In IIS 6.0, this is possible on an extension basis. In IIS 7.0, you can have just one entry in the configuration to enable static or dynamic compression for text/HTML responses. You no longer need to pick up all extensions that return text/HTML responses. When configuring these MIME types under the httpCompression section, you can use * as a wildcard. If the response type is text/HTML, look for an entry for text/HTML. If you find it, use the corresponding enabled value. If text/HTML is not found, look for text/* or */html. If both are present, pick the one that comes first and use that enabled property value. If you don’t find them, look for */* and use the corresponding enabled value. For enabling compression for all content types, add an entry under the httpCompression section in applicationHost.config as shown here.

   <staticTypes>
     <add mimeType="*/*" enabled="true" />
   </staticTypes>
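The lookup order for MIME type entries described above (exact match, then the first partial wildcard, then */*) can be written out as a small function. The Python sketch below follows that description only. The compression_enabled function is hypothetical, and comparing MIME types case-insensitively is an assumption of the sketch.

```python
def compression_enabled(entries, mime_type):
    """Resolve the `enabled` value for `mime_type`.

    `entries` is an ordered list of (mimeType, enabled) pairs, in the same
    order as they appear in configuration. Returns None when nothing matches.
    """
    mime_type = mime_type.lower()  # assumption: matching ignores case
    major, _, minor = mime_type.partition("/")
    normalized = [(pattern.lower(), enabled) for pattern, enabled in entries]

    # 1. An exact entry wins.
    for pattern, enabled in normalized:
        if pattern == mime_type:
            return enabled
    # 2. Otherwise use the first partial wildcard in configuration order.
    for pattern, enabled in normalized:
        if pattern in (f"{major}/*", f"*/{minor}"):
            return enabled
    # 3. Otherwise fall back to the catch-all entry.
    for pattern, enabled in normalized:
        if pattern == "*/*":
            return enabled
    return None


entries = [("text/*", True), ("*/html", False), ("*/*", False)]
print(compression_enabled(entries, "text/html"))  # resolved by "text/*", listed first
print(compression_enabled(entries, "image/png"))  # resolved by "*/*"
```

Because the first partial wildcard wins, the order of entries in configuration matters whenever both text/* and */html are present.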

The maxDiskSpaceUsage entry in IIS 7.0 is configured in megabytes rather than bytes. We realized that people don’t really want to configure the limit to the byte level, but the main reason we made this decision was because the limit is UINT, and we didn’t want users to set it to a value that cannot be stored in UINT. With large disks today, having a large value won’t be uncommon.

With static compression enabled by default, IIS 7.0 caches only compressed responses in the kernel (HTTP.sys). So if compression is enabled for a file, but the current request doesn’t contain an Accept-Encoding header (or compression didn’t happen because it was the first request), IIS 7.0 won’t tell HTTP.sys to cache it. For content that has compression enabled, only the compressed response is cached in the kernel. Dynamically compressed responses are not cached in any of the caches (even in compressed files, as happens in IIS 6.0).

Deflate is removed in the default configuration, but the functionality is still present in gzip.dll. To add the deflate scheme, add the following in the httpCompression section.

   <scheme name="deflate" dll="%Windir%\system32\inetsrv\gzip.dll" />

Alternatively, you can use the following Appcmd command.

Appcmd set config /section:httpCompression /+[name='deflate',dll='%Windir%\system32\inetsrv\gzip.dll']

Because static compression is enabled by default, the default value of staticCompressionLevel is changed from 10 in IIS 6.0 to 7 in IIS 7.0.

Kanwaljeet Singla

IIS Team Microsoft

Application Pools

IIS 7.0 offers a new behavior to allow you to create an application pool for each Web site. This is a significant change from IIS 6.0, which required you to have a pre-existing application pool. This new change can have a significant impact on, for example, Web hosters, because many of them configured sites in shared application pools.

Microsoft recommends that you isolate each site in its own application pool. If there is an issue, you can set the application pool to recycle based on a number of options.

NLB

Network Load Balancing (NLB) refers to having two or more servers handling your Web site. Spreading the load across two or more machines requires load balancing. You can use the built-in Network Load Balancing application provided by Windows Server 2008, or you can use a third-party hardware device. Using NLB can be a great way to help your application and Web site achieve good performance.

Application

Configuring your application for performance starts literally the moment you start discussing application needs. Certain architecture and design considerations have the most impact on how your application will perform in the long term. How will you design your database? How much data will you store? Where will you get your data from? Will the data have to be accessed in real time, or can it be cached? How much data will you cache? Will your application use session state, or will it be stateless? How will you authenticate your visitors? What kind of security model will your application use?

The type of data you have will have an impact on your application and is something else to consider when configuring your application. Things like controlling memory usage and how many database calls will be made have an impact on performance.

Identifying and Isolating Bottlenecks

IIS 7.0 has a great story when it comes to helping isolate and identify bottlenecks in your Web applications. IIS 7.0 offers instrumentation completely through the request pipeline. IIS Manager puts Web request data at administrators’ fingertips. This data was first exposed in Windows Server 2003 SP1, but it is much more user-friendly in IIS 7.0. Imagine, for example, that you are experiencing a problem with one of your application pools. With IIS 7.0, you can quickly identify which Web pages are executing at run time.

Failed Request Tracing (FRT) can be implemented on all content types, not just ASP or ASP.NET. You can use the data available to help make your application perform better. Following is one example: Your application makes an LDAP (Lightweight Directory Access Protocol) call to a directory service such as Microsoft Active Directory directory service to look up some group membership information. This process happens on the very first step when the application starts. This step is going to make or break the application, because it’s the first thing the user will see. Using the new tracing tools, IIS 7.0 enables you to trace completely through the entire request pipeline to identify how long a step like this could take. If it takes milliseconds, that is pretty good (when compared to it taking five to seven seconds).

Before IIS 7.0, developers and administrators would have to troubleshoot where the bottleneck was occurring. The developer would point to the administrator and say, “My application performs well after it gets the group information back.” The administrator would say the opposite: “My Active Directory is redundant, is running on fast servers, and has fast network connectivity. What are you trying to do inside your application?”

The administrator’s point is valid if the application is round-tripping back and forth a few times. If you have the user credentials ahead of time, make one call to the directory for your group information. Doing so can cut down on latency and improve the user experience. This is one example of working together with both administrators and developers to achieve good application performance.

Best Practices

One of the key concepts to keep in mind when tuning your applications is TTFB (time to first byte) and TTLB (time to last byte). How many seconds does it take to get TTFB and how many seconds are there between TTFB and TTLB? What is the latency? Is there a quick load of the page, but then the application has to retrieve a lot of data from Web services and other data sources?
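To put numbers on TTFB and TTLB for a single page, you can time a request from a client machine. The Python sketch below is one rough way to do it for a plain HTTP URL. The measure_timings function is hypothetical, and it treats the first byte of the response body as the "first byte," which is an approximation because the status line and headers arrive slightly earlier.

```python
import http.client
import time
from urllib.parse import urlsplit


def measure_timings(url, timeout=10.0, chunk_size=8192):
    """Return (ttfb, ttlb, size) for a plain-HTTP GET: seconds, seconds, bytes."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path)          # connects and sends the request
        response = conn.getresponse()      # returns once the headers are read
        first = response.read(1)           # first byte of the body
        ttfb = time.perf_counter() - start
        size = len(first)
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
        ttlb = time.perf_counter() - start
    finally:
        conn.close()
    return ttfb, ttlb, size
```

For example, measure_timings("http://localhost/Default.aspx") returns the two timings and the body size. A large gap between the two timings points at time spent producing or transferring the rest of the page rather than at initial latency.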

Your hardware can make or break an application. However, how an application is architected, designed, and tested will go a long way toward determining if it is successful in production.