SQL Server 2012 : Troubleshooting Methodology and Practices - Defining the Problem

- How To Install Windows Server 2012 On VirtualBox
- How To Bypass Torrent Connection Blocking By Your ISP
- How To Install Actual Facebook App On Kindle Fire
1/1/2015 8:32:09 PM

Investing time to understand the problem and application environment often leads to a higher-quality and faster problem resolution. While it is tempting to focus on immediately resolving the problem, complex problems are rarely resolved until causes are fully understood. A thorough understanding of the configuration, patterns, and characteristics of the problem will position you well for resolving the problem.

To learn about the problem, you need to identify the major software and hardware components, review the impact of recent changes, and understand the specific circumstances that cause the problem condition to occur. The following section provides a framework for these aspects. Decomposing the problem into constituent components will help isolate the cause of the problem and identify bottlenecks.

Guidelines for Identifying the Problem

Use the following guidelines to fully comprehend the exact problem you are facing:

  • Construct a diagram of the end-to-end application environment.
  • Obtain visibility of major hardware components, paying special attention to components that may complicate troubleshooting, such as geographically dispersed configurations, local caching, and network load balancing (NLB). Network load balancers can mask a problem with an individual server because the problem server may only serve traffic for 25% of requests (assuming four active servers); therefore, occurrences of the problem can appear random or inconsistent.
  • Gather all relevant logs to a single location:
    • Windows and System Event logs
    • SQL Server Error Logs
    • Dump files
    • Application logs
  • Construct a timeline of activities and events leading up to the failure.
  • Retrieve change logs, including any information relating to changes before the problem occurred and any changes or steps carried out in an attempt to resolve the problem.
  • Understand the steps necessary to reproduce the problem. If possible, ensure that you have a repeatable process to reproduce the problem and validate on a test environment if possible.
  • Agree on success criteria. Where the problem is repeatable, this is easy. With intermittent problems this can be more difficult, although agreeing to a period of non-occurrence may be valid (e.g., before troubleshooting the problem occurred daily, so if one week passes without the problem you can consider the issue resolved).
  • Understand log context, (e.g., client, middle tier, or SQL Server). Pay attention to the time zone on each machine. It may be necessary to synchronize the time zones for data from multiple sources.
  • Understand the rhythm of the business. This enables you to determine whether the current workload is typical, a seasonal spike, or an unusual pattern.
  • Capture any situations when the problem does not occur. Understanding these scenarios can be useful in refining the scope of the problem too.

Part of understanding the problem is understanding why the issue is occurring now. If this is a new system, perhaps you haven’t seen this level of load on the system before. If it is an existing system, review your change control documents to see what has changed recently on the system. Any change, even if seemingly unrelated, should be reviewed. This can mean any alteration, no matter how small, such as a Windows or SQL Server patch, a new policy or removed permission, a configuration option, or an application or database schema change.

Isolating the Problem

Are you certain the problem is related to the database tier? How do you know it’s a database problem? Many problems begin life as an application behavior or performance issue, and there may be other software components or interactions that could affect the database platform.

Once you have a good understanding of the problem, decompose it into manageable elements; isolating each component enables you to focus on the problem area fast. The intention of this approach is to eliminate or incriminate each area of the environment. Approach troubleshooting as a series of mini-experiments, each looking to prove or disprove that a specific feature or component is functioning correctly.

The following list describes what to look for when troubleshooting each major problem category:

  • Connectivity issues — Does the problem only occur with one protocol, such as named pipes or TCP/IP? Are some applications, users, client workstations, or subnets able to connect while others cannot? Does the problem occur only with double hops, whereas direct connections work? Will local connections work but remote connections fail? Is the problem related to name resolution (does ping by name work)? Could network routing be the issue (check ping or tracert)? Can you connect using the dedicated administrator connection (DAC)? Try to connect with SQL Authentication as well as using a domain account.
  • Performance issues — For a performance problem you need to determine if the problem is on the client, the middle tier, the server on which SQL Server runs, or the network. If it is an application performance problem, it is essential to establish how much time is consumed in the database tier; for example, if application response time is 10 seconds, is 1 second or 9 seconds consumed by the database response time? Capture slow-running stored procedures, execute these directly on the server, and confirm execution times.
  • Hardware bottlenecks — Identify resource contention around disk, CPU, network, or memory.
  • SQL Server issues — As well as hardware contention, SQL Server has finite internal resources, such as locks, latches, worker threads, and shared resources such as tempdb. Isolate these problems with wait stats analysis and DMVs, then investigate queries that are causing the resource consumption.
  • Compilation issues — If possible, identify one user query that is slow, the most common causes are insufficient resources. This could be caused by a sub-optimal query plan as a result of missing or outdated statistics, or inefficient indexes. Analyze the plan cache to help identify this problem.

Performance Bottlenecks

Performance troubleshooting involves identifying the bottleneck. This may be done live on the system, or via a post-mortem review by analyzing data collected during problem occurrence. This is often an iterative process, each cycle identifying and resolving the largest bottleneck until the problem is resolved. Often, fixing one bottleneck uncovers another and you need to start the troubleshooting cycle again with the new bottleneck.


If you identify a SQL Server memory bottleneck, you have several options to improve performance. The first is to increase physical memory or change the memory configuration. Another approach is to review queries and optimize performance to consume less memory.

If you decide to increase the memory available to SQL Server, you could consider adding more physical memory, or increasing the memory assignment for virtual machines (VMs). Improving the use of existing memory without adding more is often more scalable and yields better results. While x86 (32-bit) systems are becoming less common, if you are running SQL Server 2005 or 2008 on 32-bit systems or VMs, consider using the Address Window Extension (AWE) or /3GB to increase the buffer pool available to SQL Server (the AWE feature was discontinued in SQL Server 2012). However, if you do see memory contention on a x86 server, consider a plan to migrate to an × 64 system to resolve this issue. The × 64 platform provides increased virtual memory and better memory management.

Aside from physical memory and server configuration, significant performance gains can be made through query tuning to reduce memory requirements. Identify queries that require significant memory grants, such as sorts or hashes, and review the query plans for these scenarios. Try to identify better indexes, and avoid table scans and other operations that force a large number of rows to be read from disk and manipulated in memory.


CPU problems could be sustained or occasional spikes. Occasional CPU spikes, especially for a small number of CPUs, can often be safely ignored. Wait statistics record the resource SQL Server or a query is waiting on. Capturing wait statistics information can prove a useful tool in understanding resource bottlenecks and to identify whether CPU contention is the cause of performance problems. Consider server build and configuration options to improve CPU performance, such as increasing the number and speed of CPU cores. In terms of configuration options, review the maximum degree of parallelism to ensure it is optimal for the intended workload.

In many situations, overall performance may be acceptable while the server demonstrates high CPU. As with memory, once you have established CPU is the dominant wait type, identify the top 10 worst-performing queries by CPU and then work through each of these in turn. Look at the query execution plan and identify expensive CPU operations, such as hash joins, sorts, and computed columns. Look for opportunities to reduce CPU workload with new indexes, consolidated indexes, XML indexes, or to improve query design.

Storage I/O

Storage input/output (I/O) is typically the slowest resource within a server (memory and CPU are orders of magnitude quicker). Therefore, optimizing the storage solution design and configuration (ensuring the solution performs optimally) as well as being considerate with I/O requests (making fewer I/O requests) is essential to achieve scalable systems with good performance. Review the PerfMon disk counters for Average Disk Sec/Read and Average Disk Sec/Write to verify that the time to make a read or write is ideally below 20 milliseconds for OLTP systems, higher for decision support systems. Generally speaking, if storage is performing slower than this, database performance will be affected. When reviewing storage performance, consider the end-to-end solution. Following are some elements that may affect performance:

  • RAID levels
  • Disk types (enterprise flash Disk, SCSI)
  • Dedicated or shared disk arrays
  • Connectivity (InfiniBand, Fibre Channel, iSCSI)
  • HBA cache and queue settings
  • HBA load balancing policy (active; active vs. active; or passive)
  • NTFS cluster size
  • Layout and isolation of data, index, log, and tempdb files
  • Storage cache and controllers policy

In addition to ensuring optimal storage performance, be smart with I/O and ensure that the database is not making unnecessary requests. Reviewing and optimizing a query plan to eliminate index scans and replace them with seeks can often deliver an order of magnitude benefit in I/O reduction. It is common to overwhelm the storage solution with inefficient queries, saturating controllers and cache on the storage array.

Reduce I/O workload by improving indexes for more efficient access, make sure statistics are current, tune or increase memory to improve cache performance, or alter queries to avoid unnecessary I/O. Rationalize and consolidate indexes to minimize the overhead of index maintenance. Use Profiler or DMVs to identify the worst-performing queries by reads and writes. In addition, use STATISTICS IO to identify batches within a query that contain high logical I/Os. Usually, identifying the table or view that has the highest number of logical I/Os is sufficient to identify the table or view requiring optimization.


Network bottlenecks can look like SQL Server performance problems. When query results are not sent or received by the client as fast as SQL Server can send them, SQL Server can appear slow. Often a particular function within an application is described as slow. In this case, you should try to determine the database interaction used by this functionality.

SQL Server Profiler can find which stored procedures, functions, and queries are executed when the application feature is accessed. Sometimes this indicates that each query executes quickly, but either very many queries are executed or there is a large delay between the calls to each query. The latter case usually indicates that the performance problem is somewhere outside of SQL Server.

TCP Chimney is a network interface card (NIC) technology that by default allows servers to offload some TCP workload to the network card itself. This works well on desktop PCs and application servers, but database servers often transfer large amounts of data to clients.
In this scenario, the offload activity may overwhelm the NIC, and the processing capability on the network card can become a bottleneck. Disable TCP offloading using the NETSH command utility and NIC drivers.

If you are able to narrow the problem down to a single stored procedure as the main contributor to the problem, break that stored procedure down into individual queries. Often there will be a single query within that procedure — this is the area to focus on for tuning and optimization.

Top 10
- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Finding containers and lists in Visio (part 2) - Wireframes,Legends
- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Finding containers and lists in Visio (part 1) - Swimlanes
- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Formatting and sizing lists
- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Adding shapes to lists
- Microsoft Visio 2013 : Adding Structure to Your Diagrams - Sizing containers
- Microsoft Access 2010 : Control Properties and Why to Use Them (part 3) - The Other Properties of a Control
- Microsoft Access 2010 : Control Properties and Why to Use Them (part 2) - The Data Properties of a Control
- Microsoft Access 2010 : Control Properties and Why to Use Them (part 1) - The Format Properties of a Control
- Microsoft Access 2010 : Form Properties and Why Should You Use Them - Working with the Properties Window
- Microsoft Visio 2013 : Using the Organization Chart Wizard with new data
- First look: Apple Watch

- 3 Tips for Maintaining Your Cell Phone Battery (part 1)

- 3 Tips for Maintaining Your Cell Phone Battery (part 2)
programming4us programming4us