There are two times when you collect data for troubleshooting a SQL Server issue:
- You are faced with a critical problem in your production
environment and the data collection is being done to identify and fix
the root cause.
- You are trying to reproduce an issue for which sufficient data was not collected initially to ascertain a definitive root cause.
The latter scenario is always cumbersome because you can end up playing a waiting game with intermittent issues. The frustration of waiting for the actual issue to occur is compounded by the overhead of managing the data that diagnostic tools collect while left running.
This is where SQLdiag becomes a SQL Server
professional’s best friend, living up to the adage that “a friend in
need is a friend indeed.” SQLdiag is a command-line utility that is
available by default in C:\Program Files\Microsoft SQL Server\110\Tools\Binn. This assumes that the binary installation path of the SQL Server instance on the machine was done on the C: drive.
NOTE If
you have SQL Server 2005 or SQL Server 2008/2008 R2 instances installed
on the server, you will find the SQLdiag utility available under the C:\Program Files\Microsoft SQL Server\90\Tools\Binn or C:\Program Files\Microsoft SQL Server\100\Tools\Binn folders, respectively.
You can run SQLdiag in three modes:
- Snapshot mode using the /X command line parameter
- As a command-line application by specifying one or more command-line parameters
- As a service
WARNING The
user who runs SQLdiag must be a member of the Windows Administrators
group and a member of the SQL Server sysadmin fixed server role. When
the /G command-line
switch is specified, on startup SQLdiag does not enforce SQL Server
connectivity checks or verify that the user is a member of the sysadmin
fixed server role. Instead, SQLdiag defers to Windows to determine
whether a user has the appropriate rights to gather each requested
diagnostic. If /G is
not specified, SQLdiag checks whether the user is a member of the
Windows Administrators group; if not, it will not collect SQL Server
diagnostics. If the machine has User Account Control (UAC) enabled, an
elevated command prompt should be used to run SQLdiag to prevent
possible access denied errors.
Using SQLdiag in Snapshot Mode
When SQLdiag is run in snapshot mode,
it collects a snapshot of all the configured diagnostic data (mentioned
in the SQLdiag configuration file discussed in the next section) and
then shuts down automatically. If you execute SQLDIAG /X
from a command prompt window, the SQLdiag utility creates a SQLDIAG
folder in the Binn folder, containing all the data it collected. You
will additionally find an “internal” folder containing log files
created by the utility, along with the XML configuration file that it
used to collect the data, among other files. This method of collecting
data is quite useful when you want a quick snapshot of the state of the
SQL Server instance. You get a plethora of information, including SQL Server and Windows configuration details, recently encountered errors, waits experienced by the database engine, and much more.
After data collection is completed, the utility shuts down automatically with the message SQLDIAG Collection complete. Collector exiting. A default snapshot data collection collects the following information:
- SQL Server default traces
- MSINFO32 output in a text file
- A <MachineName>_<InstanceName>_sp_sqldiag_Shutdown.OUT text file, which contains the following data:
- All Errorlogs
- Output of various system stored procedures, such as sp_configure, sp_who, sp_lock, sp_helpdb, to obtain the SQL Server instance and database configuration details
- Output of various DMVs and system catalogs, such as sys.sysprocesses, sys.dm_exec_sessions, and sys.dm_os_wait_stats, to obtain additional information, with the following being of key importance:
- Insight into the instance’s memory usage
- A snapshot of the current values for all the SQL Server PerfMon counters
- A snapshot of the SQL Server wait statistics
- Status of the sessions and requests active on the SQL Server instance, along with the associated input buffers
- Status of the SQL Server schedulers
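As a rough illustration of what lands in the output folder after a snapshot run, the following Python sketch groups the collected files by collector type. The name patterns are assumptions based on the defaults described above (default traces, MSINFO32 output, and the sp_sqldiag shutdown file); adjust them if your collection was customized.

```python
import os

def summarize_sqldiag_output(folder):
    """Group files from a SQLdiag snapshot output folder by collector type.

    The name patterns below are assumptions based on the default snapshot
    collection; customized collections may produce different file names.
    """
    summary = {"default_traces": [], "msinfo32": [], "sqldiag_output": [], "other": []}
    for name in sorted(os.listdir(folder)):
        lower = name.lower()
        if lower.endswith(".trc"):
            summary["default_traces"].append(name)      # SQL Server default traces
        elif "msinfo" in lower:
            summary["msinfo32"].append(name)            # MSINFO32 text output
        elif lower.endswith("_sp_sqldiag_shutdown.out"):
            summary["sqldiag_output"].append(name)      # sp_sqldiag shutdown file
        else:
            summary["other"].append(name)               # internal logs, etc.
    return summary
```

For example, a folder containing a hypothetical SERVER1_MSSQLSERVER_sp_sqldiag_Shutdown.OUT and log_100.trc would be grouped under sqldiag_output and default_traces, respectively.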
Using SQLdiag as a Command-line Application
Before looking at the basics of running
a SQLdiag data collection, you should first understand the anatomy of
the SQLdiag configuration file in order to appropriately configure the
various data collectors for capturing diagnostic data. Figure 1 shows a screenshot of the key elements in the SQLDIAG.XML configuration file.
SQLdiag Configuration File Key Elements
- The full-stop (.) for the Machine Name
value signifies that the data collection is to be done from the local
machine (i.e., the machine from where the SQLdiag execution was
initiated).
- The asterisk (*) for the Instance Name value specifies that the utility should collect data from all instances installed on the machine.
- The EventlogCollector element
specifies whether the Windows Event Logs need to be collected and if
so, whether they should be collected when the utility starts up or
shuts down, or both.
- The PerfmonCollector element specifies whether PerfMon data will be collected, and the pollinginterval and maxfilesize parameter values define the time interval between two consecutive samples and the maximum size of the .blg PerfMon file, respectively.
- The SqldiagCollector element
specifies whether the utility will collect the SQLdiag output file,
discussed earlier in the “Using SQLdiag in Snapshot Mode” section.
- The ProfilerCollector element
specifies the SQL Server Profiler events to be captured, along with all
the Profiler trace configuration settings. Note that SQLdiag always
captures a server-side Profiler trace to ensure minimal impact of a
Profiler trace capture on the SQL Server instance.
- The BlockingCollector element enables trace flag 1222 to facilitate the capture of deadlock graphs in the SQL Server Error Log.
You can control the collectors used by SQLdiag through the XML templates. Every collector can be enabled or disabled by setting enabled="true" or enabled="false" on its element in the configuration file. You can modify the XML configuration file with your favorite text editor, or with Management Studio, which supports XML document editing.
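If you prefer to toggle collectors programmatically rather than by hand, a minimal Python sketch along the following lines works against any well-formed configuration file. The element and attribute names here mirror the enabled="true"/"false" convention described above, but the sample XML is illustrative, not the exact SQLDIAG.XML schema.

```python
import xml.etree.ElementTree as ET

def set_collector_enabled(xml_text, collector_tag, enabled):
    """Set the 'enabled' attribute on every element with the given tag.

    collector_tag might be e.g. 'PerfmonCollector'; the 'true'/'false'
    values mirror the enabled="true"/"false" convention in SQLDIAG.XML.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter(collector_tag):
        element.set("enabled", "true" if enabled else "false")
    return ET.tostring(root, encoding="unicode")
```

Remember the warning above: this flips attributes in place but never deletes collector elements, so the file still validates against the XSD.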
WARNING Don’t delete collectors from the XML files; SQLdiag validates the configuration file against an XSD and will fail to load an XML file that doesn’t match it.
If you are wondering what happened to the CustomDiagnostics
variable explanation, it is covered in the section “Using SQLdiag
Configuration Manager.” The following code snippet shows all the
parameters that the SQLdiag utility can accept.
Usage: sqldiag [START | STOP | STOP_ABORT] [/O outputpath] [/I cfgfile] [/M machine1
[machine2 machineN]|@machinelistfile] [/Q] [/C #] [/G] [/R] [/U] [/N #] [/A appname]
[/T {tcp[,port]|np|lpc|via}] [/X] [/L] [/B YYYYMMDD_HH:MM:SS] [/E YYYYMMDD_HH:MM:SS]
Now take a look at how each of these parameters
is used, along with some best practices to prevent the data collection
effort from becoming an accessory to a performance degradation crime.
SQLdiag Parameter List
- START | STOP | STOP_ABORT — START and STOP are used to start and stop the data collection, respectively. STOP_ABORT
is used when data collection needs to be terminated immediately without
waiting for all the collectors to complete, which proves quite useful
when your data collection might start competing for resources with a
production workload or your data collection has exceeded your intended
data collection time window.
- /O outputpath — This
specifies the directory in which the utility will write the collected
data. It is advisable to use a local disk on which SQL Server database
files do not reside.
- /I cfgfile — This specifies the configuration file to be used for the data collection. The default configuration file is always SQLDIAG.XML.
- /M machinename or machinelist file
— This specifies the machine or machines from which the utility should
collect data; multiple machine names are separated by spaces, or
supplied in a file referenced as @machinelistfile. This parameter is
rarely used, as remote data collection should be avoided. A machine
name provided using this parameter overrides the machine name value in
the configuration file.
- /Q — Runs in quiet mode and prevents any prompts
- /C — This specifies the
compression option, which can be useful when you are collecting
diagnostic data for long periods of time, especially for the large
PerfMon and Profiler trace files. /C 1 directs SQLdiag to compress the output folder files using NTFS compression.
- /G — Connectivity checks are
skipped when this switch is specified, and data collection is prevented
only due to lack of permission or connectivity issues. SQLdiag will not
collect data by default if the user running the utility is not a member
of the Windows Administrators group on the server.
- /R — Registers the utility as a service
- /U — Unregisters the utility as a service
- /N — This parameter defines how SQLdiag controls management of the output folder. /N 2
renames the existing output folder (if present) and instead writes to
the output path specified. This can be a helpful option when you
perform a data collection for a specific period of time and the
captured data is analyzed later. This scenario requires preservation of
the output folder and not an overwrite of the existing data. The
default option is to overwrite.
- /A appname — Provides an
application name for the utility and enables the option of running
multiple concurrent data collection executables with different
application names.
- /T — Tells SQLdiag to connect
to the SQL Server instance using a specified protocol such as TCP,
Named Pipes, Local Procedure Call, or Virtual Interface Adapter.
Various environments are secure and only allow connections using a
specific port or protocol. In such scenarios, this parameter comes to
the rescue and facilitates the important data connection required to
collect diagnostic information.
- /L — Specifies running the
data collection in continuous mode. This has to be defined in
conjunction with either a start time or an end time using the /B or /E
parameters. This parameter enables collecting data around the clock
when you don’t know when the problem will occur or it occurs
intermittently. This ensures that SQLdiag collects the data even if you
are not around monitoring the server during the data collection period.
- /B and /E
— These specify the starting and ending times of the data collection,
respectively. These two parameters can be used together to collect data
while troubleshooting an issue that occurs during a specific time
period. They enable you to schedule your data collection without being
logged onto the server when the issue occurs at inconvenient hours of
the day. The date and time format for these parameters is YYYYMMDD_HH:MM:SS and you can even specify the time using a + sign. For example, /B +02:00:00 specifies that data collection should start two hours from the time the command is issued.
- /P — Sets the support folder path. By default, /P
is set to the folder in which the SQLdiag executable resides. This
folder contains SQLdiag support files, such as the XML configuration
file, Transact-SQL scripts, and other files that the utility uses
during diagnostics collection.
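Pulling several of these parameters together, the following Python sketch assembles a SQLdiag argument list with an absolute /B start time two hours out, in the YYYYMMDD_HH:MM:SS format the utility expects. The paths are placeholders, and the list is only built here, not executed.

```python
from datetime import datetime, timedelta

def build_sqldiag_command(output_path, config_file, start_in_hours=2):
    """Return a SQLdiag argument list with /O, /I, /Q, and an absolute /B time.

    The /B value uses the YYYYMMDD_HH:MM:SS format SQLdiag expects;
    passing '/B +02:00:00' instead would express the same start time
    relative to when the command is issued.
    """
    start = datetime.now() + timedelta(hours=start_in_hours)
    return [
        "SQLDIAG",
        "/O", output_path,    # output folder (ideally not a SQL data disk)
        "/I", config_file,    # configuration file to use
        "/Q",                 # quiet mode: suppress prompts
        "/B", start.strftime("%Y%m%d_%H:%M:%S"),
    ]
```

On the server itself, the resulting list could be handed to subprocess.run; building it as a list rather than a single string avoids quoting problems with paths that contain spaces.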
Now that you are familiar with the parameters, a
few examples of their values and behaviors are shown when used in
conjunction with other parameters:
The following command tells SQLdiag to start data collection at 12:01 AM on December 25, 2012, terminate it after two hours, and write to a specific output folder. The /G parameter skips connectivity checks.
SQLDIAG /O D:\SQLDIAG_Data\ /G /B 20121225_00:01:00 /E +02:00:00
The command prompt window will display the
following line on successful execution of the above code indicating
that SQLdiag is successfully initialized:
SQLDIAG Begin time 20121225_00:01:00 specified. Waiting
The following command instructs SQLdiag
to collect data using a configuration file with file compression. The
application name will be shown as DemoDiag with quiet mode enabled to
suppress any prompts.
SQLDIAG /O D:\SQLDIAG_Data\ /A DemoDiag /Q /C 1 /I D:\SQLDIAG_Custom.xml
The following lines are what you will see in the command prompt window when SQLdiag initializes successfully:
DIAG$DemoDiag Collector version
IMPORTANT: Please wait until you see ‘Collection started’ before attempting to
reproduce your issue
NOTE When
attempting to collect diagnostic data, always wait for the message
“SQLdiag Collection started. Press Ctrl+C to stop” to appear (in a
green font) in the command prompt window before attempting to reproduce
a problem scenario for which diagnostic data needs to be collected.
The time SQLdiag requires to initialize varies according to the state of the Windows machine from which the data is being collected. Because some of the servers used to collect diagnostic data may be under severe stress in terms of physical resources, initialization may take a little more time than usual to complete. Shutdown of the SQLdiag utility can likewise be extended by any custom diagnostics configured. The most common sources of delays are as follows:
- Large SQL Server Error Logs collected during shutdown
- The long time taken to collect MSINFO32 output
- T-SQL scripts captured during shutdown, which take a long time to execute
The SQLdiag console output and the verbose log of the utility can be found in the internal folder of the output directory in the files ##console.log and ##SQLDIAG.LOG,
respectively. When the utility experiences a failure or does not behave
as expected, these two files can provide you with additional insight
about why the failure occurred.
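When a collection misbehaves, a quick pass over the verbose ##SQLDIAG.LOG often pinpoints the failing collector. The following Python sketch flags suspicious lines; the keyword list is an assumption, not an exhaustive set of SQLdiag error markers.

```python
def find_log_failures(log_path, keywords=("error", "failed", "access is denied")):
    """Return (line number, line) pairs from a SQLdiag log that contain
    any failure keyword.

    Matching is case-insensitive; extend `keywords` with other markers
    you care about.
    """
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line_number, line in enumerate(log, start=1):
            lowered = line.lower()
            if any(keyword in lowered for keyword in keywords):
                hits.append((line_number, line.rstrip("\n")))
    return hits
```

An "access is denied" hit, for instance, usually points back to the permission requirements covered in the WARNING at the start of this section.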
WARNING Do not shut down the data collection process by closing the command prompt window. Doing so can leave your Profiler traces or PerfMon files orphaned, compounding an already existing problem. The telltale sign of this scenario is that the Profiler and PerfMon files continue to grow in size and you are unable to move or delete them. An orphaned Profiler trace can be stopped easily using the sp_trace_setstatus command. Orphaned PerfMon files, however, are a different story: most often you can correct the situation only by restarting the SQL Server service, which is definitely not a good scenario to run into in a production environment.