THE DATA COLLECTION DILEMMA
One of the biggest challenges a SQL Server professional faces when troubleshooting a complex SQL Server problem is the data collection task. The classic catch-22 always presents itself: install data collection utilities on the server, or go without all the data required to address the root cause of the problem at hand.
The most common dilemma is finding a single tool that can capture all the required data simultaneously when faced with a critical production issue. Without tools that can be configured quickly, we are left gambling with the prospect of continued service unavailability and missed service-level agreements while we set up the data collection. Seasoned SQL Server professionals keep their own toolset handy, their equivalent of a superhero’s utility belt, and it serves them well in a crisis.
Once you have successfully fought the demons
inside your SQL Server environment that were wreaking havoc and causing
the problems, you will be tasked with identifying the root cause of the
issue. Identifying the root cause in itself is not a bad thing.
However, this noble task may soon take an ill-fated turn when you
realize that the data required to perform a complete post-mortem
analysis is missing. When battling a critical production service issue, the data necessary for post-mortem analysis is often not collected, because the need of the moment is to restore service as soon as possible. This makes it all the more important to have a data collection utility in place, one that not only collects all the data necessary for a given situation but is also configurable at the drop of a hat!
SQLdiag,
which started shipping with SQL Server 2005, is a utility used to
collect diagnostic data from a SQL Server instance. It is a
general-purpose diagnostics utility that can be run as a console
application or a service. SQLdiag can help you collect SQL Server
Profiler traces, Windows Performance Monitor logs, and outputs of
various VBScript, T-SQL, and DOS scripts through an extensible
interface exposed by the SQLdiag Configuration Manager. This close-knit integration of data collection capabilities makes SQLdiag a “must-have” tool in the SQL Server professional’s repertoire.
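To illustrate the kind of custom T-SQL collector that can be plugged into such a collection, the following is a minimal sketch that captures the user requests executing at the moment it runs. The script is an illustrative assumption, not something that ships with SQLdiag, and a custom collector like it would have to be registered through the configuration interface mentioned above.

-- Illustrative custom diagnostic script (not part of SQLdiag itself):
-- capture the user requests executing at the moment of collection.
SELECT GETDATE()              AS collection_time,
       r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.wait_time,
       r.blocking_session_id,
       r.cpu_time,
       r.total_elapsed_time,
       t.text                 AS batch_text
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id > 50;      -- skip most system sessions

Output from a script like this, captured repeatedly during the problem window, is exactly the sort of data that later feeds a post-mortem analysis.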
The data collected by the SQLdiag utility can be imported into a SQL Server database using SQL Nexus, a GUI tool that provides an aggregated view of the collected data in report format.
AN APPROACH TO DATA COLLECTION
In this section, we delve into the specifics of how data can be collected to analyze the performance of your SQL Server instance. SQL Server data analysis can be approached in the following ways:
- Baseline Analysis — A baseline of your SQL Server instance tells you what the resource usage for the instance in question looks like on average at a particular time of day. You will know whether the delta you see for a particular set of data has a positive or a negative connotation only if you have the appropriate baselines established for your SQL Server instance. When an existing baseline is available, it makes sense to capture data during the problem period for a short time in order to establish a comparative study between the current state and the established baseline. This enables you to look for seemingly innocuous patterns in the data that might prove to be the root cause of your SQL Server troubles. This is what is referred to as a baseline analysis; a minimal sketch of capturing such a baseline appears after this list.
- Bottleneck Analysis — The second option, bottleneck analysis, is the approach to which most SQL Server professionals are accustomed. It is used when a baseline is not readily available, or when an available baseline is not pertinent to the current state of the environment. In such a situation, you need to collect data both for a period when the issue is not occurring and for the period when the problem manifests itself. The two sets of data are then compared to tease out the differences and the symptoms that were exhibited when the problem occurred. The ideal scenario for a bottleneck analysis is to start data collection a little before the problem manifests itself, capturing the transition from a serenely functional SQL Server environment to one that raises all sorts of red lights on service-level scorecards. Sometimes we have to be content with a comparative analysis and compare two sets of collected data that may not even belong to the same environment. This may sound appalling, but it is a harsh reality in the production world, where it is not always feasible to add workload or bring in new executables to collect diagnostic data. Bottleneck analysis helps you arrive at the top N bottlenecks that your SQL Server instance is experiencing by identifying the roadblocks preventing its smooth functioning; a sketch of one common technique appears at the end of this section.
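As a minimal sketch of what establishing the baseline described above might look like, the following T-SQL snapshots a handful of counters from sys.dm_os_performance_counters into a user table. The table name and counter choices are illustrative assumptions, not a prescribed baseline set.

-- Illustrative baseline capture: snapshot a few counters into a user table
-- so that later samples can be compared against them. Table name and
-- counter choices are examples only.
IF OBJECT_ID(N'dbo.PerfBaseline') IS NULL
    CREATE TABLE dbo.PerfBaseline
    (
        capture_time   DATETIME      NOT NULL DEFAULT GETDATE(),
        counter_name   NVARCHAR(128) NOT NULL,
        instance_name  NVARCHAR(128) NOT NULL,
        cntr_value     BIGINT        NOT NULL
    );

INSERT INTO dbo.PerfBaseline (counter_name, instance_name, cntr_value)
SELECT RTRIM(counter_name), RTRIM(instance_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy',
                       N'User Connections',
                       N'Batch Requests/sec');  -- cumulative counter; two
                                                -- samples are needed for a rate

Repeated captures like this, scheduled at representative times of day (for example, through a SQL Server Agent job), give you the reference point against which problem-period data can be compared.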
Bottleneck analysis requires precise data collection capabilities, along with knowing which system catalogs and dynamic management views to query, which tools to run to collect the required data, and so on.
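One widely used technique, sketched below under the assumption that you can run T-SQL at the start and end of the problem window, is to snapshot the sys.dm_os_wait_stats DMV twice and rank the waits that grew the most in between; the temporary table names are illustrative.

-- Illustrative wait-statistics delta for bottleneck analysis:
-- snapshot sys.dm_os_wait_stats before and after the problem window,
-- then rank the waits that grew the most in between.
SELECT wait_type, wait_time_ms, waiting_tasks_count
INTO #wait_before
FROM sys.dm_os_wait_stats;

-- ... problem period elapses here ...

SELECT wait_type, wait_time_ms, waiting_tasks_count
INTO #wait_after
FROM sys.dm_os_wait_stats;

-- Top waits by accumulated wait time during the window. In practice,
-- benign waits (e.g., SLEEP_TASK, LAZYWRITER_SLEEP) are usually filtered out.
SELECT TOP (10)
       a.wait_type,
       a.wait_time_ms        - b.wait_time_ms        AS wait_time_ms_delta,
       a.waiting_tasks_count - b.waiting_tasks_count AS waiting_tasks_delta
FROM #wait_after AS a
JOIN #wait_before AS b
  ON b.wait_type = a.wait_type
WHERE a.wait_time_ms - b.wait_time_ms > 0
ORDER BY wait_time_ms_delta DESC;

The same snapshot-and-diff pattern applies to other system views, which is why knowing which catalogs to query matters as much as the collection tooling itself.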