Configuring your environment for performance requires
that you truly understand your applications and how your users will
interact with the applications. How many times have you heard
administrators tell the tale that an application worked great in the
lab, but when it was deployed
to production, it crashed? Sound familiar? I hope not! Performing
adequate testing beforehand should be a requirement before you purchase
You should consider a few
scenarios when configuring your servers. First, you need to know some
basic information, such as the following:
How many users do you expect to use your application?
At peak usage, how many concurrent users do you expect?
Where are your users based: are they on well-connected networks such as a
local area network, or are they connecting over the Internet?
How long will a typical session last?
Does your application require session state?
Does your application interact with a database?
How much RAM does a typical session take?
Will your application be disk-intensive?
These questions are
important to understand when you are configuring your environment. Your
answers can influence the decision to run everything on a stand-alone server or
on a set of servers. A set of servers could include a Web farm front
end with a separate database server.
If your current application
is in production, you can use the Reliability and Performance Monitor
to capture baseline numbers that will help you load-test your
environment. If your application is brand new, you’ll have to calculate
usage numbers and estimate how much traffic you expect.
You can do many things
at the server level to configure your environment for performance.
Before you do anything at the server level, however, understanding the
application you want to support will go a long way toward creating a
smooth-running environment. Here are a few things you can do to make
sure your server will perform at acceptable levels:
If possible, run the 64-bit version of Windows Server 2008.
Configure your application pools to run in 32-bit mode.
Make sure your server has enough RAM.
Make sure your server firmware and basic input/output system (BIOS) are up to date.
If running a Web farm, try to keep your servers configured identically.
Configure your system to use RAID.
Install only necessary software such as monitoring agents, antivirus software, and so on.
IIS has always been pretty
much a self-tuning service. There have been a few registry tweaks here
and a metabase adjustment there, but overall, IIS has a good reputation
for self-tuning based on a wide variety of workloads. IIS 7.0 continues
this behavior. Before we say that IIS 7.0 will scale on any type of
hardware, though, you need to take into account the types of
applications that you will be deploying. The application types that you
choose can greatly change IIS performance, but they have little to do
with IIS 7.0 itself. The hardware setup, amount of RAM, and speed of the
disk all contribute to the overall Web server experience.
Optimizing for the Type of Load
How you optimize your
environment really depends on your needs. For example, do you need to
support a large concurrent connection base of users, or is it important
to sustain a high number of transactions? The following sections provide a
couple of examples illustrating what to take into account when optimizing
Load is one of the first
performance requirements that should be identified in the planning
stages for your application. If your application will have spikes of
load, you’ll need to account for that in your design. An application
that has peaks and valleys can be harder to design for. Most likely,
you’ll need to account for spikes no matter what the load is.
Consider this example:
You are the director of IT for Contoso Ltd. You are aware that every
year on December 31, the Contoso.com Web site experiences a thousand
times more than normal traffic. The visitors are submitting expenses for
the previous year, so that they can be reimbursed. Does that mean you
scale your environment to meet the volume on December 31, and the rest
of the year you have excess capacity? Probably not, but realistically,
you need to take into account how to plan for the additional load
created by so many visitors accessing the site at one time. You may want
to consider using staggered schedules or some other option to prevent
all the users from being on the system at once. This might seem like an
unrealistic example, but it provides a picture of what you should define as
load and what kinds of things you need to think about when testing. In
this scenario, for example, when a user finishes submitting expenses and
clicks Submit, that person is going to want the transaction to work so
that they don’t have to enter the information again—and of course so
they get paid back for their expenses sooner rather than later.
As another example, consider that you are in a situation in which you have
to scale your application to meet a lot of concurrent connections. In
such a case, you would want to look at having multiple servers in a Web
farm to handle not only the load, but also the site availability.
You usually will work
with your business partners to determine what an acceptable level of
application availability is. As the Internet has matured, applications
need to be available 24 hours a day, 7 days a week. To maintain this type
of availability, performing routine maintenance or security patching
can be tough. One recommendation is to schedule a standard maintenance
window. This can be difficult, because Internet applications can be
accessed from anywhere. The standard window helps provide some
consistency when changes happen, however.
Having a defined
SLA (service level agreement) can help you understand your performance
requirements. Frequent and critical functions need to be identified and
tested to ensure they perform properly. Frequently used transactions,
intensive operations, and business-critical operations need to perform
under load and must be available to users.
When testing your
application, it is critical that you use data that is as close to
real-world data and user patterns as possible to make sure your system
performs optimally. Creating an application baseline in the real world
is a key component to being successful.
Type of Content
The type of content in
your application has a significant impact on performance and how you
might design your Web server environment. You can start by looking at
the amount of static content versus dynamic content used on your Web
site. The server should be tuned in a way that accommodates the type of
content that is more prevalent on the server. A server that contains
mostly static content, for example, can take advantage of Kernel-mode
caching and output caching.
IIS 7.0 output
caching can handle content scenarios that are semi-dynamic, such as an
ASP.NET application that pulls data from a database with content that
remains relatively unchanged. When using output caching, you can often
see a large performance boost by caching your content for just a few
seconds.
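The effect of a short cache lifetime can be sketched with simple arithmetic. The following is a minimal illustration, not a measurement: the request rate (500 per second), the one-minute window, and the 5-second cache lifetime are hypothetical values chosen for the example, and the function name is made up for this sketch.

```python
import math


def page_renders(requests_per_second, window_seconds, cache_seconds):
    """Estimate how often a page must be regenerated during a time window.

    Without caching, every request regenerates the page. With a time-based
    cache, the page is regenerated at most once per cache lifetime.
    """
    total_requests = int(requests_per_second * window_seconds)
    if cache_seconds <= 0:
        return total_requests
    renders = math.ceil(window_seconds / cache_seconds)
    return min(renders, total_requests)


# Hypothetical load: 500 requests per second for one minute.
uncached = page_renders(500, 60, 0)
cached = page_renders(500, 60, 5)
print(f"Renders without caching: {uncached}")
print(f"Renders with a 5-second cache: {cached}")
```

Under these assumed numbers, even a very short cache lifetime removes almost all of the page generation work, which is why semi-dynamic content benefits so much.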
IIS 7.0 tries to cache
content in many places. Each feature covered in this section determines
what type of content is cached along with how the content will be
cached. Each has its own place, and each can help with overall server
performance.
Microsoft introduced Kernel-mode caching in IIS 6.0. This feature eliminates the
need for accessing User-mode cache in many cases. The HTTP.sys cache
helps increase server performance and reduces disk cost. The content has
to be requested a few times before IIS 7.0 considers caching a URL.
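The "requested a few times" rule can be modeled as a sliding-window hit counter. The sketch below is only a model of the idea, not the actual HTTP.sys or IIS bookkeeping, which is internal. The class and method names are hypothetical, and the defaults mirror the frequentHitThreshold (2) and frequentHitTimePeriod (10 seconds) settings described next. It assumes a URL qualifies once it has been requested at least the threshold number of times within the period.

```python
from collections import defaultdict, deque


class FrequentHitTracker:
    """Toy model of a per-URL frequent-hit check (not the IIS implementation)."""

    def __init__(self, threshold=2, period_seconds=10.0):
        self.threshold = threshold
        self.period = period_seconds
        self._hits = defaultdict(deque)  # URL -> timestamps of recent requests

    def record(self, url, now):
        """Record a request at time `now` (in seconds).

        Returns True if the URL now counts as frequently hit.
        """
        hits = self._hits[url]
        hits.append(now)
        # Discard requests that fall outside the sliding time window.
        while hits and now - hits[0] > self.period:
            hits.popleft()
        return len(hits) >= self.threshold


tracker = FrequentHitTracker()
print(tracker.record("/Default.aspx", now=0.0))  # first request
print(tracker.record("/Default.aspx", now=5.0))  # second request inside the window
```

A first request never qualifies under the default threshold, which matches the behavior described for both caching and compression of frequently hit content.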
You can configure the frequentHitTimePeriod and frequentHitThreshold attributes in the serverRuntime section of applicationHost.config. Here are the default serverRuntime settings, as defined in the %windir%\system32\inetsrv\config\schema\IIS_schema.xml file. These settings are inherited by default.
<attribute name="enabled" type="bool" defaultValue="true" />
<attribute name="appConcurrentRequestLimit" type="uint" defaultValue="5000" />
<attribute name="maxRequestEntityAllowed" type="uint" defaultValue="4294967295" />
<attribute name="uploadReadAheadSize" type="uint" defaultValue="49152" validationType="integerRange" validationParameter="0,2147483647" />
<attribute name="alternateHostName" type="string" />
<attribute name="enableNagling" type="bool" defaultValue="false" />
<attribute name="frequentHitThreshold" type="uint" defaultValue="2" validationType="integerRange" validationParameter="1,2147483647" />
<attribute name="frequentHitTimePeriod" type="timeSpan" defaultValue="00:00:10" />
A response is cached when a cacheable URL is requested at least frequentHitThreshold times within the frequentHitTimePeriod
window (by default, twice within 10 seconds). Following is an example of the benefits you can get by using
Kernel-level cache rather than User-mode caching or no caching enabled.
The example shows
how to set up the Web Capacity Analysis Tool (WCAT), run the WCAT
controller and WCAT client, and set up Output Caching policies in IIS
Manager. There are three tests using different Output Caching policies.
First, the following steps allow you to configure WCAT to use this example in your environment:
Download the IIS6 Resource Kit Tools (go to http://www.iis.net/go/1352), do a custom install, and install only WCAT. You’ll use three files to help configure WCAT:
A script file that tells WCAT which URLs to request. Each URL gets a unique ClassID.
A distribution file that tells WCAT how the requests should be distributed across the URLs specified in the script file.
A configuration file that configures the parameters of a particular
performance run, for example, the duration of the tests, how many HTTP
clients to simulate, and so on.
Create a folder named C:\LoadTest to hold the configuration files.
Create a file called Default.aspx in C:\LoadTest. Type <% = Datetime.Now()
%> and save Default.aspx. This file will be used for load-testing.
Create a new file called script.cfg in C:\LoadTest and type the following text:
classId = 1
NEW REQUEST HTTP
Verb = "GET"
URL = "http://localhost/Default.aspx"
Create a file called distribution.cfg and type the following text inside the file:
Create a file called config.cfg and type the following text inside the file:
Next, after you
have configured the LoadTest folder and supporting WCAT files, you can
enable the WCAT controller. This is required to perform the tests. Open a
command prompt and type the following syntax:
"%programfiles%\IIS Resources\WCAT Controller\wcctl"
-c config.cfg -s script.cfg -d distribution.cfg -a localhost
After enabling the WCAT controller, you can run the WCAT client to perform your
performance tests. You’ll need to open another command prompt window to
start the WCAT client.
"%programfiles%\IIS Resources\WCAT Client\wcclient.exe" localhost
The first test has no Output Cache policy enabled. You can set the Output caching policy in IIS Manager. Figure 1 shows no Output Cache policy enabled.
Figure 1. No Output Cache policy enabled.
Per Table 1,
the test with no cache policy yields 575 requests per second, which is an
acceptable number. Let’s see how caching can help improve performance.
Table 1. Output Caching Results
| Requests per Second | Cache Level |
| 575 | No cache enabled |
| 656 | User-mode caching only |
| 946 | Kernel-mode caching only |
The second test
enables User-mode policy only. The test using User-mode policy assumes
you are using the file notifications option displayed in Figure 2. Figure 2 also shows how to enable a User-mode cache policy.
Figure 2. User-mode cache policy.
As you can see in Table 1,
after running a test with User-mode caching enabled, the results are
656 requests per second, which is about a 14 percent increase over having no cache policy.
For the third test,
disable the User-mode caching policy and enable Kernel-mode caching. The
Kernel-mode caching test assumes you are using the file notifications
option displayed in Figure 3. Figure 3 shows how to configure a Kernel-mode caching policy.
Figure 3. Kernel-mode caching policy.
After running the test
with Kernel-mode caching enabled, the results are 946 requests per
second, which is about 65 percent more than with no cache policy. Table 1 shows results from the three performance tests. The results can vary depending on what type of hardware you are using.
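As a quick check, the relative gains can be recomputed directly from the requests-per-second figures in Table 1. This small sketch uses only those three numbers; the helper name percent_gain is made up for the example.

```python
# Requests per second reported in Table 1 for each cache level.
results = {
    "No cache enabled": 575,
    "User-mode caching only": 656,
    "Kernel-mode caching only": 946,
}


def percent_gain(value, baseline):
    """Return the percentage increase of value over baseline."""
    return (value - baseline) / baseline * 100


baseline = results["No cache enabled"]
for level, rps in results.items():
    gain = percent_gain(rps, baseline)
    print(f"{level}: {rps} requests/sec ({gain:+.1f}% vs. no cache)")
```

Running the same calculation against your own WCAT results gives a consistent way to compare cache policies on your hardware.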
You should be
aware of limitations when you are using Kernel-mode caching. Kernel-mode
caching does not support modules and features that run in user mode.
For example, if your application uses Basic authentication, Windows
authentication, or authorization, the content will be served, but it
won’t be cached. The Kernel-mode caching option supports the varyByHeaders attribute, but not varyByQueryString. To see if a request is in the Kernel-mode cache, type netsh http show cachestate.
One important change with User-mode caching is that any content type can be cached, not just Classic ASP or ASP.NET.
The output cache is the new user-mode response cache added in IIS 7.0. This module
provides functionality that is similar to that provided by the managed
output cache module in ASP.NET. You can control this module’s
functionality by editing the system.webServer/caching section, located in applicationHost.config, or by using the IHttpCachePolicy intrinsic. The IHttpCachePolicy intrinsic is for getting or setting the kernel-response cache or user-mode output cache policy from code. You can set the following properties in the system.webServer/caching section:
enabled This property tells if output caching is enabled or not for this URL. If disabled, output cache module won’t do anything in ResolveRequestCache and UpdateRequestCache stages.
Note that setting the enabled property to true doesn’t ensure response caching. Some modules must set User-cache policy.
enableKernelCache Controls if kernel caching is enabled for this URL. The output cache module calls IHttpResponse::DisableKernelCache if this property is set to false. The output cache module does kernel caching work in the SendResponse stage if no one called DisableKernelCache in the pipeline. Note that setting enableKernelCache to true doesn’t ensure kernel caching of the response. Some modules must set the kernel cache policy.
maxCacheSize This is the maximum size of the output cache in megabytes. A value of 0
means the maximum cache size is calculated automatically by IIS 7.0.
IIS 7.0 uses half of the available physical memory or the available
virtual memory, whichever is less.
maxResponseSize This is the maximum size of the response in bytes that can be stored in the output cache. A value of 0 means no limit.
Note that although you can set maxCacheSize and maxResponseSize for a URL, the output cache module uses values set at the root level only.
In the future, these properties
will be configurable for each application pool.
If the output cache is enabled, you can control its behavior for
different file types by adding profiles for different file extensions.
These profiles make the output cache module populate IHttpCachePolicy
intrinsic, which enables user/kernel caching of the response. Properties
that you can set in a profile are similar to those available for
system.web/caching/outputCacheSettings profiles. The following
properties are allowed for system.webServer/caching profiles:
extension For example, .asp, .htm. Use * as a wildcard entry. If the profile for a
particular extension is not found, the profile for extension * will be
used if it is present.
policy Can be DontCache, CacheUntilChange, CacheForTimePeriod, or DisableCache (only in the server). Output cache module changes IHttpCachePolicy intrinsic, depending on the value of this property.
Note that DontCache
means that the intrinsic is not set, but that doesn’t prevent other modules
from setting it and enabling caching. In the server, we have added the DisableCache
option, which ensures that the response is not cached even if some
other module sets the policy telling the output cache module to cache the
response.
kernelCachePolicy Can be DontCache, CacheUntilChange, CacheForTimePeriod, or DisableCache (only in the server). As previously mentioned, DontCache
doesn’t prevent other modules from setting kernel cache policy. For
static files, the static file handler sets kernel cache policy, which
enables kernel caching of the response. In the server, the DisableCache option ensures that the response doesn’t get cached in kernel.
duration The duration property is used only when policy or kernelCachePolicy is set to CacheForTimePeriod.
location Sets the cache-control response header for client caching. The
cache-control response header is set depending on value of this
property, as follows:
Any | Downstream: public
ServerAndClient | Client: private
None | Server: no-cache
varyByHeaders Comma-separated list of request headers. Multiple responses to requests
having different values of these headers will be stored in the cache.
You might be returning different responses based on Accept-Language or
User-Agent or Accept-Encoding headers. All the responses will get cached.
varyByQueryString Comma-separated query string variables. Multiple responses get cached
if query string variable values are different in different requests. In
the server, you can set varyByQueryString to *, which makes the output cache module cache a separate response if any of the query string variable values are different.
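One way to picture how varyByQueryString and varyByHeaders partition cached responses is as a cache key built from the URL path plus only the listed query string variables and request headers. The following is a conceptual sketch, not the output cache module's real key format; the function name cache_key and the key layout are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def cache_key(url, headers, vary_by_query_string=(), vary_by_headers=()):
    """Build an illustrative cache key from the parts a response varies by.

    Query string variables and headers that are not listed are ignored,
    so requests that differ only in those parts share one cached response.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    # Header names are case-insensitive, so normalize them before lookup.
    header_lookup = {name.lower(): value for name, value in headers.items()}
    query_part = tuple(
        (name, tuple(query.get(name, ()))) for name in vary_by_query_string
    )
    header_part = tuple(
        (name.lower(), header_lookup.get(name.lower(), ""))
        for name in vary_by_headers
    )
    return (parts.path, query_part, header_part)


# Two requests that differ only in "page" map to the same key when the
# response varies by "action" and User-Agent only.
first = cache_key("/report.asp?action=view&page=1", {"User-Agent": "A"},
                  ["action"], ["User-Agent"])
second = cache_key("/report.asp?action=view&page=2", {"User-Agent": "A"},
                   ["action"], ["User-Agent"])
print(first == second)
```

Each distinct key corresponds to a separately cached response, so listing many headers or variables increases the number of cached copies.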
Only the user-mode cache uses the location and varyBy properties. These properties have no effect on kernel caching. So if policy is set to DontCache,
these properties are not used. To make the output cache module cache
multiple responses from an ASP page for 30 minutes, where the page returns
different responses based on the value of the query string variable “action” and
on the request header “User-Agent,” the caching section in
applicationHost.config will look like the following:
<add extension=".asp" policy="CacheForTimePeriod" duration="00:30:00"
Output cache module populates the IHttpCachePolicy intrinsic in the BeginRequest
stage if a matching profile is found. Other modules can still change
cache policy for the current request, which might change User-mode or
Kernel-mode caching behavior. The output cache caches 200 responses to
GET requests only. If some module already flushed the response by the
time the request reaches the UpdateRequestCache stage, or if headers are suppressed, the response is not cached in the output cache module.
The output cache module caches the response only if some other module hasn’t already cached it, as indicated by IHttpCachePolicy::SetIsCached. In addition, caching happens only for frequently hit content. The definition of frequently hit content is controlled by the frequentHitThreshold and frequentHitTimePeriod properties, which are defined in the system.webServer/serverRuntime
section located in applicationHost.config. Default values define
frequently hit content as content that is requested twice in any
10-second period.
IIS Team Microsoft
IIS 7.0 provides static and dynamic compression capabilities. Most of the
properties are managed under system.webServer/httpCompression, which is
located in applicationHost.config.
Static compression is
on by default in IIS 7.0. Dynamic compression is still off by default,
and you can turn it on for all content by using the following syntax.
Appcmd set config -section:urlCompression /doDynamicCompression:true
In IIS 6.0,
static compression happens on a separate thread. So, upon receiving a
request, the first response is uncompressed, and IIS 6.0 starts a
separate thread to compress the file and keep it in compressed files
cache. Requests for compressed content reaching IIS 6.0 after the compression is complete receive a compressed response.
In IIS 7.0,
compression happens on the main thread. But to avoid the cost of
compression for all requests, compression happens only for frequently
requested content. The definition of frequently requested content is
controlled by the properties frequentHitThreshold and frequentHitTimePeriod under the section system.webServer/serverRuntime. If IIS 7.0 receives at least the threshold number of requests in frequentHitTimePeriod
for the same URL, IIS 7.0 will go ahead and compress the file to serve a
compressed response for the same request that made IIS reach the threshold. The compressed
response is saved in the compressed files cache, as in IIS 6.0. If the
compressed response was already present in the compression cache, the frequentHitThreshold
logic is not applied, because compressed content will be picked from
cache and there will be no additional cost for compressing the content.
Hit count is maintained per URL. So sending the first request with
Accept-Encoding: gzip and the second with deflate will still qualify as
frequently hit content, and IIS will go ahead and compress the response.
This requires cachuri.dll to be present in the globalModules section, because it is the module that keeps the URL hit count.
The compressed files folder has a nested directory structure in IIS 7.0,
whereas it is flat in IIS 6.0. IIS 7.0 creates folders for each
application pool in temporary compressed files and then creates separate
folders for different schemes under each application pool. Under these
scheme folders, IIS 7.0 creates a folder structure similar to the folder
from which the content was picked. So, if iisstart.htm from
D:\inetpub\wwwroot was compressed using gzip, a cache entry will be
created in the D:\inetpub\temp\IIS Temporary Compressed
IIS 7.0 will ACL
(access control list) the application pool folder with worker process
identity to protect the content from worker processes serving other
application pools. You can still configure the directory from config,
but the default is moved from %windir%\iis temporary compressed files to
%SystemDrive%\inetpub\temp\iis temporary compressed files.
Also with this change, the maxDiskSpaceUsage limit is applied per application pool. So, if you have a value of 100 MB for HcMaxDiskSpaceUsage
in IIS 6.0, then that limit is applied to all the compressed content in
the compressed files cache. In IIS 7.0, this limit applies to
compressed files per application pool. If you have 10 application pools
and have maxDiskSpaceUsage set to 100 MB, total space allocated to compressed files cache is actually 1 GB.
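The difference between the two scoping models is simple arithmetic, sketched below. The function name and the per_pool flag are hypothetical; the 100 MB limit and 10 application pools are the figures from the example above.

```python
def total_compressed_cache_mb(max_disk_space_usage_mb, application_pools,
                              per_pool=True):
    """Worst-case disk space used by the compressed files cache, in MB.

    per_pool=True models the IIS 7.0 behavior described above, where the
    limit applies to each application pool separately. per_pool=False
    models a single limit shared by all compressed content.
    """
    if per_pool:
        return max_disk_space_usage_mb * application_pools
    return max_disk_space_usage_mb


print(total_compressed_cache_mb(100, 10))                  # limit applied per pool
print(total_compressed_cache_mb(100, 10, per_pool=False))  # one shared limit
```

When you plan disk capacity for a server with many application pools, multiply the configured limit by the number of pools rather than treating it as a server-wide cap.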
Because static compression is enabled by default, and compression is happening on the
main thread, on-the-fly compression shuts off or resumes, depending on
CPU load. Four properties are added to the system.webServer/httpCompression section to control this behavior. These are as follows:
staticCompressionDisableCpuUsage Compression is disabled when average CPU usage over a specified period of time is above this number.
staticCompressionEnableCpuUsage Compression is enabled if average CPU usage over a specified period of time falls below this number.
dynamicCompressionDisableCpuUsage and dynamicCompressionEnableCpuUsage
Enable or disable dynamic compression depending on the CPU load. IIS
7.0 will calculate average CPU utilization every 30 seconds.
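Having separate disable and enable thresholds creates a hysteresis band, so compression does not flip on and off when CPU usage hovers around a single value. The sketch below models that behavior only; it is not IIS code. The class name is hypothetical and the threshold values (90 and 50) are illustrative assumptions, not values taken from this text.

```python
class CompressionGovernor:
    """Toy model of CPU-based compression throttling with two thresholds."""

    def __init__(self, disable_above, enable_below):
        if enable_below > disable_above:
            raise ValueError("enable_below must not exceed disable_above")
        self.disable_above = disable_above
        self.enable_below = enable_below
        self.enabled = True

    def update(self, average_cpu_percent):
        """Feed one averaged CPU sample; return whether compression is on."""
        if self.enabled and average_cpu_percent > self.disable_above:
            self.enabled = False
        elif not self.enabled and average_cpu_percent < self.enable_below:
            self.enabled = True
        return self.enabled


# Illustrative thresholds: disable above 90 percent, re-enable below 50 percent.
governor = CompressionGovernor(disable_above=90, enable_below=50)
for sample in (40, 95, 70, 60, 45, 80):
    print(sample, governor.update(sample))
```

In this model, compression that was switched off at 95 percent stays off at 70 and 60 percent and resumes only after a sample falls below the enable threshold.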
In IIS 7.0, you can enable/disable compression depending on the content type of
the response. In IIS 6.0, this is possible on an extension basis. In
IIS 7.0, you can have just one entry in the configuration to enable
static or dynamic compression for text/html responses. You no longer
need to pick up all extensions that return text/html responses. When
configuring these MIME types under the httpCompression
section, you can use * as a wildcard. If the response type is
text/html, IIS looks for an entry for text/html. If it finds one, it uses the
corresponding enabled value. If text/html is not found, it looks for text/*
or */html. If both are present, it picks the one that comes first and uses
that enabled property value. If neither is found, it looks for */* and
uses the corresponding enabled value. To enable compression for all
content types, add an entry under the httpCompression section in applicationHost.config as shown here.
<add mimeType="*/*" enabled="true" />
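The MIME type lookup order described above (exact match, then a partial wildcard in configuration order, then */*) can be sketched as follows. This is an illustration of the described rules, not the IIS implementation. The function name is hypothetical, and two details are assumptions: parameters such as a charset are stripped before matching, and a response with no matching entry is treated as not compressed.

```python
def compression_enabled(response_type, entries):
    """Resolve the enabled flag for a response MIME type.

    `entries` is an ordered list of (mimeType pattern, enabled) pairs,
    in the same order as they appear in the configuration.
    """
    # Assumption: ignore parameters such as "; charset=utf-8".
    response_type = response_type.split(";")[0].strip().lower()
    main_type, _, sub_type = response_type.partition("/")
    lookup_order = [
        {response_type},                          # 1. exact match
        {f"{main_type}/*", f"*/{sub_type}"},      # 2. partial wildcard, first listed wins
        {"*/*"},                                  # 3. catch-all
    ]
    for candidates in lookup_order:
        for pattern, enabled in entries:
            if pattern.lower() in candidates:
                return enabled
    return False  # Assumption: no matching entry means no compression.


entries = [("text/*", True), ("*/html", False), ("*/*", False)]
print(compression_enabled("text/html", entries))  # text/* is listed before */html
print(compression_enabled("image/png", entries))  # falls through to */*
```

Because the first partial wildcard in configuration order wins, the order of entries in the httpCompression section matters when both text/* and */html are present.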
The maxDiskSpaceUsage entry in IIS 7.0 is configured in megabytes rather than bytes. We
realized that people don’t really want to configure the limit to the
byte level, but the main reason we made this decision was because the
limit is UINT, and we didn’t want users to set it to a value that cannot
be stored in UINT. With large disks today, having a large value won’t
With static compression enabled by default, IIS 7.0 caches only compressed
responses in the kernel (HTTP.sys). So if compression is enabled for a
file, but the current request doesn’t contain an Accept-Encoding header
(or compression didn’t happen because it was the first request), IIS 7.0
won’t tell HTTP.sys to cache it. For content that has compression
enabled, only the compressed response is cached in the kernel. Dynamically
compressed responses are not cached in any of the caches (even in
compressed files, as happens in IIS 6.0).
Deflate is removed
in the default configuration, but the functionality is still present in
gzip.dll. To add the deflate scheme, add the following in the httpCompression section.
<scheme name="deflate" dll="%Windir%\system32\inetsrv\gzip.dll" />
Alternatively, you can use the following Appcmd command.
Appcmd set config
Because static compression is enabled by default, the default value of staticCompressionLevel is changed from 10 in IIS 6.0 to 7 in IIS 7.0.
IIS Team Microsoft
IIS 7.0 offers a new behavior to allow you to create an application pool
for each Web site. This is a significant change from IIS 6.0, which
required you to have a pre-existing application pool. This new change
can have a significant impact on, for example, Web hosters, because many
of them configured sites in shared application pools.
you have each site isolated in its own application pool. If there is
an issue, you can set the application pool to recycle based on a number
Network Load Balancing
(NLB) refers to having two or more servers handling your Web site.
Spreading the load across two or more machines requires load balancing.
You can use the built-in Network Load Balancing application provided by
Windows Server 2008, or you can use a third-party hardware device. Using
NLB can be a great way to help your application and Web site achieve
application for performance starts literally the moment you start
discussing application needs. Certain architecture and design
considerations have the most impact on how your application will perform
in the long term. How will you design your database? How much data will
you store? Where will you get your data from? Will the data have to be
accessed in real time, or can it be cached? How much data will you
cache? Will your application use session state, or will it be stateless?
How will you authenticate your visitors? What kind of security model
will your application use?
The type of data you have
will have an impact on your application and is something else to
consider when configuring your application. Things like controlling
memory usage and how many database calls will be made have an impact on
Identifying and Isolating Bottlenecks
IIS 7.0 has a great story when it comes to helping isolate and identify
bottlenecks in your Web applications. IIS 7.0 offers instrumentation
completely through the request pipeline. IIS Manager puts Web request
data at administrators’ fingertips. This data was first exposed in
Windows Server 2003 SP1, but it is much
more user-friendly in IIS 7.0. Imagine, for example, that you are
experiencing a problem with one of your application pools. With IIS 7.0,
you can quickly identify which Web pages are executing at run time.
Failed Request Tracing (FRT) can be implemented on all content types, not just ASP or
ASP.NET. You can use the data available to help make your application
perform better. Following is one example: Your application makes an LDAP
(Lightweight Directory Access Protocol) call to a directory
service such as the Microsoft Active Directory directory service to look up
some group membership information. This process happens on the very
first step when the application starts. This step is going to make or
break the application, because it’s the first thing the user will see.
Using the new tracing tools, IIS 7.0 enables you to trace completely
through the entire request pipeline to identify how long a step like
this could take. If it takes milliseconds, that is pretty good (when
compared to it taking five to seven seconds).
Before IIS 7.0, developers and administrators would have to troubleshoot where the
bottleneck was occurring. The developer would point to the administrator
and say, “My application performs well after it gets the group
information back.” The administrator would say the opposite: “My Active
Directory is redundant, is running on fast servers, and has fast network
connectivity. What are you trying to do inside your application?”
The administrator’s point is valid if the application is round-tripping back
and forth a few times. If you have the user credentials ahead of time,
make one call to the directory for your group information. Doing so can
cut down on latency and improve the user experience. This is one example
of working together with both administrators and developers to achieve
good application performance.
One of the key
concepts to keep in mind when tuning your applications is TTFB (time to
first byte) and TTLB (time to last byte). How many seconds does it take
to get TTFB and how many seconds are there between TTFB and TTLB? What
is the latency? Is there a quick load of the page, but then the
application has to retrieve a lot of data from Web services and other
Your hardware can make
or break an application. However, how an application is architected,
designed, and tested will go a long way toward determining if it is
successful in production.