Performance problems are rarely caused
by a single large query executing on a server. More often, the query
with the highest cumulative cost is a relatively short and fast query,
but one that might be executed thousands of times per minute. A stored
procedure that takes 200 ms to execute and is called thousands of times
per minute will have a greater impact on server performance than a
single query that takes 1.5 seconds to complete. As such, focus your
attention on queries with the highest cumulative cost.
When analyzing data, use aggregates to
consider the total cumulative time (duration, CPU, read/writes, etc.),
rather than identifying the single longest-running query. You can use
the Performance Dashboard reports or SQL Nexus to identify these
queries.
VALIDATING AND IMPLEMENTING RESOLUTION
Once the solution has been identified,
it should be validated through testing and implemented in production.
This process should be as controlled and disciplined as the iterations
of collecting and analyzing data. A production problem does not justify
a cavalier attitude toward production changes, and professionalism must
be maintained even under pressure.
Validating Changes
Changes should always be made in a test
environment prior to production. In an ideal scenario, the problem can
be reproduced in the test environment, which provides an opportunity to
confirm, or validate, that the fix has the desired impact. It is also
important to carry out confidence tests to ensure that the change has
no undesired impact.
Testing Changes in Isolation
If possible, test each change in
isolation. Changing several settings at once can make it harder to
identify which change resolved the problem or caused other issues. In
addition, it can be harder to roll back multiple changes than single,
individual changes. Ensure that you have a thorough understanding of
the consequences of any change, including rollback options.
Implementing Resolution
The final step is to implement the
resolution in production. Ensure that the change is documented and any
impact (such as service restarts) communicated. Note the behavior of
the database or application before and after the change, as well as
exactly what change was made. Ensure that the success criteria are met
once the resolution is implemented, and share your results with the
stakeholders.
Once the resolution is implemented and
the solution is stabilized, carry out post-mortem analysis and ensure
that the root causes are communicated to relevant parties. Identify any
other vulnerable systems within the organization and communicate any
lessons learned that may help you avoid a recurrence of similar
problems in the future.