Monitoring the performance of an individual database is one of the few
areas in the MySQL feature set where community and third-party developers
have improved the MySQL experience. There are basic tools included with
MySQL that you can use to improve performance, but they do not have the
sophistication of some other system-tuning tools. Due to this limitation,
most MySQL DBAs earn their pay through experience in relational query
optimization techniques. We recognize that several excellent
references cover database performance in great detail, and that many
readers are already well versed in basic database optimization. Rather than reintroducing query optimization techniques, we will
concentrate on how you can work with the tools available in MySQL to
assist in optimizing databases. We will use a simple example and a known
sample database to illustrate the use of the query performance command in
MySQL. In the next section we list best practices for improving database
performance.
1. Measuring Database Performance
Traditionally, database management systems have provided profiling and
indexing tools that report statistics you can use to fine-tune
indexes. While there are some basic elements that can help
you improve database performance in MySQL, there is no (free) advanced
profiling tool available.
While the basic MySQL installation does not include formal tools
for monitoring database improvement, the MySQL Enterprise Monitor suite
offers a host of performance monitoring features.
Fortunately, MySQL provides a few simple tools to help you
determine if your tables and queries are optimal. They are all SQL
commands and include EXPLAIN,
ANALYZE TABLE, and OPTIMIZE TABLE. The following sections
describe each of these commands in greater detail.
1.1. Using EXPLAIN
The EXPLAIN command
provides information about how a SELECT statement will be executed
(EXPLAIN works only for SELECT statements). The
syntax for the EXPLAIN command is
shown below. Notice that EXPLAIN is
a synonym for the DESCRIBE command
found in other database systems.
[EXPLAIN | DESCRIBE] [EXTENDED] SELECT <select options>
You can also use the EXPLAIN
and DESCRIBE commands to
view details about the columns or partitions of a table. The syntax
for these versions of the command is shown below.
[EXPLAIN | DESCRIBE] <table_name>
EXPLAIN PARTITIONS SELECT * FROM <table_name>
Note:
A synonym for EXPLAIN
<table_name> is SHOW
COLUMNS FROM
<table_name>.
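As a quick illustration (run against the sakila sample database; output abbreviated), the following statements all return the same column listing for a table:

```sql
-- These three forms are equivalent ways to list a table's columns.
EXPLAIN film;
DESCRIBE film;
SHOW COLUMNS FROM film;
```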
We will discuss the first use of the EXPLAIN command: examining a SELECT statement to see how the MySQL
optimizer executes it. The results contain a
stepwise list of the join operations the optimizer predicts it will
require to execute the statement.
Note:
Query processing for ORDER BY and GROUP BY is not shown in a
stepwise format.
The best use of this command is determining if you have the
correct indexes on your tables to allow for more precise targeting of
candidate rows. You can also use the results to test the various
optimizer override options. While this is an advanced technique and
generally discouraged, under the right circumstances you may encounter
a query that runs faster with certain optimizer options. We will see
an example of this later in this section.
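As a preview sketch of such an override, MySQL provides index hints. The statements below assume the film_rating index that we add later in this section; always verify the resulting plan with EXPLAIN, since forcing an index can make performance worse:

```sql
-- USE INDEX suggests an index, FORCE INDEX insists on it, and
-- IGNORE INDEX excludes it; compare the resulting plans with EXPLAIN.
EXPLAIN SELECT * FROM film USE INDEX (film_rating) WHERE rating = 'R' \G
EXPLAIN SELECT * FROM film FORCE INDEX (film_rating) WHERE rating = 'R' \G
EXPLAIN SELECT * FROM film IGNORE INDEX (film_rating) WHERE rating = 'R' \G
```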
Now let’s look at some examples of the EXPLAIN command in action. The following
examples are queries executed on the sakila
sample database provided for MySQL development and experimentation.
Note: We will use the \G (vertical display) format for clarity.
Let’s begin with a simple and seemingly harmless query. Let’s
say we want to see all of the films rated higher than a PG rating. The
result set contains a single row with the following columns:
id
Sequence number of the statement in order of
execution
select_type
The type of statement executed
table
The table operated on for this step
type
The type of join to be used
Note:
If this column shows ALL, you are
doing a full table scan. You should strive to avoid these
operations by adding indexes or rewriting your query.
Similarly, if this column shows INDEX, you
are doing a full index scan, which is usually faster than a full
table scan but still inefficient.
possible_keys
A list of the indexes the optimizer could use for this query, if
any are available
key
The key selected by the optimizer
key_len
The length of the key or portion of the key used
ref
Constraints or columns to be compared
rows
An estimate of the number of rows to process
extra
Additional information from the optimizer
Example 1 shows how the MySQL
optimizer executes this statement.
Example 1. A simple SELECT statement
mysql> EXPLAIN SELECT * FROM film WHERE rating > 'PG' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.01 sec)
You can see from this output that the optimizer has only one
step to execute and it is not using any indexes. This is sensible,
because we are not using any columns with indexes. Furthermore, even
though there is a WHERE clause, the
optimizer will still have to do a full table scan. This may be the
right choice when you consider the columns used and the lack of
indexes. However, if we ran this query hundreds of thousands of times,
the full table scan would be a very poor use of time. In this case, we
know from looking at the results that adding an index should improve
execution. Let’s add an index to the table and try again. Example 2 shows the improved query plan.
Example 2. Improved query plan
mysql> ALTER TABLE film ADD INDEX film_rating (rating);
Query OK, 0 rows affected (0.42 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM film WHERE rating > 'PG' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: film_rating
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
Note:
For those of you with sharp eyes who have already spotted the
problem, bear with us as we work through it.
Here we see the query has now identified an index (possible_keys) but is still not using the
index, because the key field is
NULL. So what can we do? For this
simple example, you may note that only 892 rows are expected to be
read. The actual row count is 1,000 rows and the result set would
contain only 418 rows. Clearly, it would be a much faster query if it
only read 42 percent of the rows!
Now let’s see if we can get any additional information from the
optimizer by using the EXTENDED
keyword. This keyword allows us to see extra information
via the SHOW WARNINGS
command. You should issue the command immediately after the EXPLAIN command. The warning text describes
how the optimizer identifies table and column names in the statement,
the internal rewrite of the query, any optimizer rules applied, and
any additional notes about the execution. Example 3 shows the
results of using the EXTENDED keyword.
Example 3. Using the EXTENDED keyword for more information
mysql> EXPLAIN EXTENDED SELECT * FROM film WHERE rating > 'PG' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: film_rating
key: NULL
key_len: NULL
ref: NULL
rows: 892
filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `sakila`.`film`.`film_id` AS `film_id`,
`sakila`.`film`.`title` AS `title`,`sakila`.`film`.`description` AS `description`,
`sakila`.`film`.`release_year` AS `release_year`,
`sakila`.`film`.`language_id` AS `language_id`,
`sakila`.`film`.`original_language_id` AS `original_language_id`,
`sakila`.`film`.`rental_duration` AS `rental_duration`,
`sakila`.`film`.`rental_rate` AS `rental_rate`,
`sakila`.`film`.`length` AS `length`,
`sakila`.`film`.`replacement_cost` AS `replacement_cost`,
`sakila`.`film`.`rating` AS `rating`,
`sakila`.`film`.`special_features` AS `special_features`,
`sakila`.`film`.`last_update` AS `last_update`
from `sakila`.`film` where (`sakila`.`film`.`rating` > 'PG')
1 row in set (0.00 sec)
This time, there is one warning, which contains the information
from the optimizer: a rewritten form of the query that
includes all columns and explicitly references the column in the
WHERE clause. Unfortunately, this
simple query is in need of a bit more consideration to make it more
efficient. You are likely to encounter queries like this that may
force you to redesign the query, reconsider using it frequently, or
(more likely) consider redesigning the table to support a better
index.
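As one hypothetical redesign sketch (the rating_rank column and index names below are our own invention, not part of the sakila schema), storing the rating as an ordered numeric value makes the range comparison index-friendly:

```sql
-- Hypothetical sketch: map the enumerated rating to an ordered number
-- (G=1, PG=2, PG-13=3, R=4, NC-17=5) so a range predicate can use an index.
ALTER TABLE film ADD COLUMN rating_rank TINYINT;
UPDATE film SET rating_rank = FIELD(rating, 'G', 'PG', 'PG-13', 'R', 'NC-17');
ALTER TABLE film ADD INDEX film_rating_rank (rating_rank);
EXPLAIN SELECT * FROM film WHERE rating_rank > 2 \G
```

Verify with EXPLAIN that the optimizer actually chooses a range scan; on a table this small, it may still prefer a full table scan.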
Let’s see what happens when we issue a query for a specific
rating rather than using a range query. We will see the optimization
with the index and without. Example 4 shows the results.
Example 4. Removing the range query
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ref
possible_keys: film_rating
key: film_rating
key_len: 2
ref: const
rows: 195
Extra: Using where
1 row in set (0.00 sec)
mysql> ALTER TABLE film DROP INDEX film_rating;
Query OK, 0 rows affected (0.37 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
Now we see a little improvement. Notice that the first query
plan does indeed use the index and results in a much improved plan
over not having the index at all. The question then remains, why
doesn’t the optimizer use the index? In this case, we’ve used a
nonunique index on an enumerated field. What sounded like a really
good idea is actually not much help at all for a range query of
enumerated values. However, we could rewrite the above query
differently (in several ways, actually) to produce better performance.
Let’s look at the query again.
We know we want all films rated higher than PG-13 (strictly, the
earlier range query, rating > 'PG', also matches PG-13; we narrow the
goal here to keep the example simple). We assumed
that the rating is ordered and that the enumerated field reflects the
order. In the sakila database, the rating field
is defined as having values (G, PG, PG-13, R, NC-17). Thus, it appears
the order is maintained if we accept the enumeration index for each
value that corresponds to the order (e.g., G = 1, PG = 2, etc.). But
what if the order is incorrect or if (like in this example) the list
of values is incomplete?
In the example we’ve chosen, where we want all of the films that
have a rating higher than PG-13, we know from our list of ratings that
this includes films with a rating of R or NC-17. Rather than using a
range query, let’s examine what the optimizer would do if we listed
these values.
Recall that we removed the index, so we will try the query first
without the index, then add the index and see if we have an
improvement. Example 5 shows the
improved query.
Example 5. Improved query without range
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' OR rating = 'NC-17' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
mysql> ALTER TABLE film ADD INDEX film_rating (rating);
Query OK, 0 rows affected (0.40 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' OR rating = 'NC-17' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: film_rating
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
Alas, that didn’t work either. Again, we have chosen to query on
a column that has an index but is not an index the optimizer can use.
We know the optimizer can and will use the index for a simple equality
comparison. Let’s try rewriting the query as the union of two queries.
Example 6 shows the rewritten
query.
Example 6. Query rewritten using UNION
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' UNION
SELECT * FROM film WHERE rating = 'NC-17' \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: ref
possible_keys: film_rating
key: film_rating
key_len: 2
ref: const
rows: 195
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: UNION
table: film
type: ref
possible_keys: film_rating
key: film_rating
key_len: 2
ref: const
rows: 210
Extra: Using where
*************************** 3. row ***************************
id: NULL
select_type: UNION RESULT
table: <union1,2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
3 rows in set (0.00 sec)
Success! Now we can see we have a query plan that is using the
index and processing far fewer rows. We can see from the result of the
EXPLAIN command that the optimizer
is running each query individually (steps execute from row 1 down to
row n) and combines the result in the last
step.
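Another rewrite worth testing expresses the condition with IN(). Whether the optimizer uses the index for this form depends on your MySQL version, so treat this as an experiment and check the plan yourself:

```sql
-- A more compact alternative to the UNION rewrite; confirm with EXPLAIN
-- whether the film_rating index is chosen on your version.
EXPLAIN SELECT * FROM film WHERE rating IN ('R', 'NC-17') \G
```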
Note:
MySQL has a session status variable named last_query_cost that stores the cost of
the last query executed. Use this variable to compare two query
plans for the same query. For example, after each EXPLAIN, check the value of the variable.
The query with the lowest cost value is considered to be the more
efficient (less time-consuming) query. A value of 0 indicates no
query has been submitted for compilation.
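For example, a sketch of comparing two candidate queries (note that Last_query_cost is maintained per session and, in some versions, is calculated only for simple, single-SELECT queries):

```sql
-- Run each candidate query, then check the optimizer's estimated cost.
SELECT * FROM film WHERE rating = 'R' OR rating = 'NC-17';
SHOW SESSION STATUS LIKE 'last_query_cost';
```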
While this exercise may seem to be a lot of work for a little
gain, consider that there are many such queries being executed in
applications without anyone noticing the inefficiency. Normally we
encounter these types of queries only when the row count gets large
enough to notice. In the sakila database, there
are only 1,000 rows, but what if there were a million or tens of
millions of rows?
Aside from EXPLAIN, there is
no single tool in a standard MySQL distribution that you can use to
profile a query.
1.2. Using ANALYZE TABLE
The MySQL optimizer, like most traditional optimizers, uses
statistical information about tables to perform its analysis of the
optimal query execution plan. These statistics include information
about indexes, distribution of values, and table structure, among
other things.
The ANALYZE TABLE
command recalculates the key distribution for one or more tables. This
information determines the table order for a join operation. The
syntax for the ANALYZE TABLE
command is shown below:
ANALYZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE <table_list>
You can update the key distribution for MyISAM and InnoDB
tables. This is very important to note because it is not a general
tool that applies to all storage engines. However, all storage engines
must report index cardinality statistics to the optimizer if they
support indexes. Some storage engines, like third-party engines, have
their own specific built-in statistics. A typical execution of the
command is shown in Example 7. Running the
command on a table with no indexes has no effect, but will not result
in an error.
Example 7. Analyzing a table to update key distribution
mysql> ANALYZE TABLE film;
+-------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| sakila.film | analyze | status | OK |
+-------------+---------+----------+----------+
1 row in set (0.00 sec)
In this example, we see the analysis is complete and there are
no unusual conditions. Should there be any unusual events during the
execution of the command, the Msg_type field can indicate “info,” “error,”
or “warning.” In these cases, the Msg_text field will give you additional
information about the event. You should always investigate the
situation if you get any result other than “status” and “OK.”
You can see the status of your indexes using the SHOW INDEX command.
A sample of the output of the film table is shown in Example 8. In this case, we’re
interested in the cardinality of each index, which is an estimate of
the number of unique values in the index. We omit the other columns
from the display for brevity.
Example 8. The indexes for the film table
mysql> SHOW INDEX FROM film \G
*************************** 1. row ***************************
Table: film
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: film_id
Collation: A
Cardinality: 1028
...
*************************** 2. row ***************************
Table: film
Non_unique: 1
Key_name: idx_title
Seq_in_index: 1
Column_name: title
Collation: A
Cardinality: 1028
...
*************************** 3. row ***************************
Table: film
Non_unique: 1
Key_name: idx_fk_language_id
Seq_in_index: 1
Column_name: language_id
Collation: A
Cardinality: 2
...
*************************** 4. row ***************************
Table: film
Non_unique: 1
Key_name: idx_fk_original_language_id
Seq_in_index: 1
Column_name: original_language_id
Collation: A
Cardinality: 2
...
*************************** 5. row ***************************
Table: film
Non_unique: 1
Key_name: film_rating
Seq_in_index: 1
Column_name: rating
Collation: A
Cardinality: 11
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
5 rows in set (0.00 sec)
The LOCAL or NO_WRITE_TO_BINLOG keyword prevents the command from being written to the binary
log (and thereby from being replicated in a replication topology).
This can be very useful if you want to experiment or tune while
replicating data, or if you want to omit this step from your binary
log so it is not replayed during point-in-time recovery (PITR).
You should run this command whenever there have been significant
updates to the table (e.g., bulk-loaded data). The system must have a
read lock on the table for the duration of the operation.
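As a sketch, the two forms below are equivalent and keep the statement out of the binary log:

```sql
-- LOCAL is a shorter alias for NO_WRITE_TO_BINLOG; replicas will not
-- repeat the analysis.
ANALYZE LOCAL TABLE film;
ANALYZE NO_WRITE_TO_BINLOG TABLE film;
```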
1.3. Using OPTIMIZE TABLE
Tables that are updated frequently with new data and deletions can
become fragmented quickly and, depending on the storage engine, can
have gaps of unused space or suboptimal storage structures. A badly
fragmented table can result in slower performance, especially during
table scans.
The OPTIMIZE TABLE command
reorganizes the storage of one or more tables. This is
especially beneficial for row formats with variable-length fields
(rows). The syntax for the OPTIMIZE
TABLE command is shown below:
OPTIMIZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE <table_list>
You can use this command for MyISAM and InnoDB tables. This is very important to note because it
is not a general tool that applies to all storage engines. If the
table cannot be reorganized (e.g., there are no variable length
records or there is no fragmentation), the command will revert to
re-creating the table and updating the statistics. A sample output
from this operation is shown in Example 9.
Example 9. The optimize table command
mysql> OPTIMIZE TABLE film \G
*************************** 1. row ***************************
Table: sakila.film
Op: optimize
Msg_type: note
Msg_text: Table does not support optimize, doing recreate + analyze instead
*************************** 2. row ***************************
Table: sakila.film
Op: optimize
Msg_type: status
Msg_text: OK
2 rows in set (0.44 sec)
Here we see two rows in the result set. The first row tells us
the OPTIMIZE TABLE command could
not be run and that the command will instead re-create the table and
run the ANALYZE TABLE command. The
second row is the result of the ANALYZE
TABLE step.
Like the ANALYZE TABLE
command above, any unusual events during the execution of the command
are indicated in the Msg_type field
by “info,” “error,” or “warning.” In these cases, the Msg_text field will give you additional
information about the event. You should always investigate the
situation if you get any result other than “status” and “OK.”
The LOCAL or NO_WRITE_TO_BINLOG keyword prevents the command from being written to the
binary log (it will therefore not be replicated in a replication
topology). This can be very useful if you want to experiment or tune
while replicating data or if you want to omit this step from your
binary log and thereby not replay it during PITR.
You should run this command whenever there have been significant
updates to the table (e.g., a large number of deletes and inserts).
This operation is designed to rearrange data elements into a more
optimal structure and could run for longer than expected. This is one
operation that is best run during times of lower loads.
Warning:
When using InnoDB, especially when there are secondary indexes
(which usually get fragmented), you may not see any improvement or
you may encounter long processing times for the operation unless you
use the InnoDB “fast index create” option.