Monitoring the performance of an individual database is one of the few
areas in the MySQL feature set where community and third-party developers
have improved the MySQL experience. There are basic tools included with
MySQL that you can use to improve performance, but they do not have the
sophistication of some other system-tuning tools. Due to this limitation,
most MySQL DBAs earn their pay through experience in relational query
optimization techniques. We recognize that several excellent
references cover database performance in great detail, and that many
readers are already well versed in basic database optimization. Rather than reintroducing query optimization techniques, we will
concentrate on how you can work with the tools available in MySQL to
assist in optimizing databases. We will use a simple example and a known
sample database to illustrate the use of the query performance command in
MySQL. In the next section we list best practices for improving database
performance.
1. Measuring Database Performance
Traditionally, database management systems have provided profiling and
indexing tools that report statistics you can use to fine-tune
indexes. While there are some basic elements that can help
you improve database performance in MySQL, there is no (free) advanced
profiling tool available.
While the basic MySQL installation does not include formal tools
for monitoring database improvement, the MySQL Enterprise Monitor suite
offers a host of performance monitoring features.
Fortunately, MySQL provides a few simple tools to help you
determine if your tables and queries are optimal. They are all SQL
commands and include EXPLAIN,
ANALYZE TABLE, and OPTIMIZE TABLE. The following sections
describe each of these commands in greater detail.
1.1. Using EXPLAIN
The EXPLAIN command
provides information about how a SELECT statement will be executed
(EXPLAIN works only for SELECT statements). The
syntax for the EXPLAIN command is
shown below. Notice that EXPLAIN is
a synonym for the DESCRIBE command
found in other database systems.
[EXPLAIN | DESCRIBE] [EXTENDED] SELECT <select options>
You can also use the EXPLAIN
and DESCRIBE commands to
view details about the columns or partitions of a table. The syntax
for these versions of the command is shown below.
[EXPLAIN | DESCRIBE] <table_name>
EXPLAIN PARTITIONS SELECT * FROM <table_name>
Note:
A synonym for EXPLAIN
<table_name> is SHOW
COLUMNS FROM
<table_name>.
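As a quick illustration (run against the sakila sample database; output abbreviated), the following statements all return the same column listing for a table:

```sql
-- These three forms are equivalent ways to list a table's columns.
EXPLAIN film;
DESCRIBE film;
SHOW COLUMNS FROM film;
```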
We will discuss the first use of the EXPLAIN command: examining a SELECT statement to see how the MySQL
optimizer executes it. The results contain a
stepwise list of the join operations the optimizer predicts it will
require to execute the statement.
Note:
Query processing for ORDER BY and GROUP BY is not shown in a
stepwise format.
The best use of this command is determining if you have the
correct indexes on your tables to allow for more precise targeting of
candidate rows. You can also use the results to test the various
optimizer override options. While this is an advanced technique and
generally discouraged, under the right circumstances you may encounter
a query that runs faster with certain optimizer options. We will see
an example of this later in this section.
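As a preview sketch of such an override, MySQL provides index hints. The statements below assume the film_rating index that we add later in this section; always verify the resulting plan with EXPLAIN, since forcing an index can make performance worse:

```sql
-- USE INDEX suggests an index, FORCE INDEX insists on it, and
-- IGNORE INDEX excludes it; compare the resulting plans with EXPLAIN.
EXPLAIN SELECT * FROM film USE INDEX (film_rating) WHERE rating = 'R' \G
EXPLAIN SELECT * FROM film FORCE INDEX (film_rating) WHERE rating = 'R' \G
EXPLAIN SELECT * FROM film IGNORE INDEX (film_rating) WHERE rating = 'R' \G
```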
Now let’s look at some examples of the EXPLAIN command in action. The following
examples are queries executed on the sakila
sample database provided for MySQL development and experimentation.
Note: We will use the \G (vertical display) format for clarity.
Let’s begin with a simple and seemingly harmless query. Let’s
say we want to see all of the films rated higher than a PG rating. The
result set contains a single row with the following columns:
id
Sequence number of the statement in order of
execution
select_type
The type of statement executed
table
The table operated on for this step
type
The type of join to be used
Note:
If this column shows ALL, you are
doing a full table scan. You should strive to avoid these
operations by adding indexes or rewriting your query.
Similarly, if this column shows INDEX, you
are doing a full index scan, which is usually faster than a full
table scan but still inefficient.
possible_keys
A list of the indexes the optimizer could use for this query, if
any are available
key
The key selected by the optimizer
key_len
The length of the key or portion of the key used
ref
Constraints or columns to be compared
rows
An estimate of the number of rows to process
extra
Additional information from the optimizer
Example 1 shows how the MySQL
optimizer executes this statement.
Example 1. A simple SELECT statement
mysql> EXPLAIN SELECT * FROM film WHERE rating > 'PG' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.01 sec)
You can see from this output that the optimizer has only one
step to execute and it is not using any indexes. This is sensible,
because we are not using any columns with indexes. Furthermore, even
though there is a WHERE clause, the
optimizer will still have to do a full table scan. This may be the
right choice when you consider the columns used and the lack of
indexes. However, if we ran this query hundreds of thousands of times,
the full table scan would be a very poor use of time. In this case, we
know from looking at the results that adding an index should improve
execution. Let’s add an index to the table and try again. Example 2 shows the improved query plan.
Example 2. Improved query plan
mysql> ALTER TABLE film ADD INDEX film_rating (rating);
Query OK, 0 rows affected (0.42 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM film WHERE rating > 'PG' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: film_rating
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
Note:
For those of you with sharp eyes who have already spotted the
problem, bear with us as we work through it.
Here we see the query has now identified an index (possible_keys) but is still not using the
index, because the key field is
NULL. So what can we do? For this
simple example, you may note that only 892 rows are expected to be
read. The actual row count is 1,000 rows and the result set would
contain only 418 rows. Clearly, it would be a much faster query if it
only read 42 percent of the rows!
Now let’s see if we can get any additional information from the
optimizer by using the EXTENDED
keyword. This keyword allows us to see extra information
via the SHOW WARNINGS
command. You should issue the command immediately after the EXPLAIN command. The warning text describes
how the optimizer identifies table and column names in the statement,
the internal rewrite of the query, any optimizer rules applied, and
any additional notes about the execution. Example 3 shows the
results of using the EXTENDED keyword.
Example 3. Using the EXTENDED keyword for more information
mysql> EXPLAIN EXTENDED SELECT * FROM film WHERE rating > 'PG' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: film_rating
key: NULL
key_len: NULL
ref: NULL
rows: 892
filtered: 100.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)
mysql> SHOW WARNINGS \G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `sakila`.`film`.`film_id` AS `film_id`,
`sakila`.`film`.`title` AS `title`,`sakila`.`film`.`description` AS `description`,
`sakila`.`film`.`release_year` AS `release_year`,
`sakila`.`film`.`language_id` AS `language_id`,
`sakila`.`film`.`original_language_id` AS `original_language_id`,
`sakila`.`film`.`rental_duration` AS `rental_duration`,
`sakila`.`film`.`rental_rate` AS `rental_rate`,
`sakila`.`film`.`length` AS `length`,
`sakila`.`film`.`replacement_cost` AS `replacement_cost`,
`sakila`.`film`.`rating` AS `rating`,
`sakila`.`film`.`special_features` AS `special_features`,
`sakila`.`film`.`last_update` AS `last_update`
from `sakila`.`film` where (`sakila`.`film`.`rating` > 'PG')
1 row in set (0.00 sec)
This time, there is one warning, which contains the information
from the optimizer: a rewritten form of the query that
includes all columns and explicitly references the column in the
WHERE clause. Unfortunately, this
simple query is in need of a bit more consideration to make it more
efficient. You are likely to encounter queries like this that may
force you to redesign the query, reconsider using it frequently, or
(more likely) consider redesigning the table to support a better
index.
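As one hypothetical redesign sketch (the rating_rank column and index names below are our own invention, not part of the sakila schema), storing the rating as an ordered numeric value makes the range comparison index-friendly:

```sql
-- Hypothetical sketch: map the enumerated rating to an ordered number
-- (G=1, PG=2, PG-13=3, R=4, NC-17=5) so a range predicate can use an index.
ALTER TABLE film ADD COLUMN rating_rank TINYINT;
UPDATE film SET rating_rank = FIELD(rating, 'G', 'PG', 'PG-13', 'R', 'NC-17');
ALTER TABLE film ADD INDEX film_rating_rank (rating_rank);
EXPLAIN SELECT * FROM film WHERE rating_rank > 2 \G
```

Verify with EXPLAIN that the optimizer actually chooses a range scan; on a table this small, it may still prefer a full table scan.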
Let’s see what happens when we issue a query for a specific
rating rather than using a range query. We will see the optimization
with the index and without. Example 4 shows the results.
Example 4. Removing the range query
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ref
possible_keys: film_rating
key: film_rating
key_len: 2
ref: const
rows: 195
Extra: Using where
1 row in set (0.00 sec)
mysql> ALTER TABLE film DROP INDEX film_rating;
Query OK, 0 rows affected (0.37 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
Now we see a little improvement. Notice that the first query
plan does indeed use the index and results in a much improved plan
over not having the index at all. The question then remains, why
doesn’t the optimizer use the index? In this case, we’ve used a
nonunique index on an enumerated field. What sounded like a really
good idea is actually not much help at all for a range query of
enumerated values. However, we could rewrite the above query
differently (in several ways, actually) to produce better performance.
Let’s look at the query again.
We know we want all films rated higher than PG-13 (strictly, the
earlier range query, rating > 'PG', also matches PG-13; we narrow the
goal here to keep the example simple). We assumed
that the rating is ordered and that the enumerated field reflects the
order. In the sakila database, the rating field
is defined as having values (G, PG, PG-13, R, NC-17). Thus, it appears
the order is maintained if we accept the enumeration index for each
value that corresponds to the order (e.g., G = 1, PG = 2, etc.). But
what if the order is incorrect or if (like in this example) the list
of values is incomplete?
In the example we’ve chosen, where we want all of the films that
have a rating higher than PG-13, we know from our list of ratings that
this includes films with a rating of R or NC-17. Rather than using a
range query, let’s examine what the optimizer would do if we listed
these values.
Recall that we removed the index, so we will try the query first
without the index, then add the index and see if we have an
improvement. Example 5 shows the
improved query.
Example 5. Improved query without range
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' OR rating = 'NC-17' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
mysql> ALTER TABLE film ADD INDEX film_rating (rating);
Query OK, 0 rows affected (0.40 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' OR rating = 'NC-17' \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: film_rating
key: NULL
key_len: NULL
ref: NULL
rows: 892
Extra: Using where
1 row in set (0.00 sec)
Alas, that didn’t work either. Again, we have chosen to query on
a column that has an index but is not an index the optimizer can use.
We know the optimizer can and will use the index for a simple equality
comparison. Let’s try rewriting the query as the union of two queries.
Example 6 shows the rewritten
query.
Example 6. Query rewritten using UNION
mysql> EXPLAIN SELECT * FROM film WHERE rating = 'R' UNION
SELECT * FROM film WHERE rating = 'NC-17' \G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: ref
possible_keys: film_rating
key: film_rating
key_len: 2
ref: const
rows: 195
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: UNION
table: film
type: ref
possible_keys: film_rating
key: film_rating
key_len: 2
ref: const
rows: 210
Extra: Using where
*************************** 3. row ***************************
id: NULL
select_type: UNION RESULT
table: <union1,2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
3 rows in set (0.00 sec)
Success! Now we can see we have a query plan that is using the
index and processing far fewer rows. We can see from the result of the
EXPLAIN command that the optimizer
is running each query individually (steps execute from row 1 down to
row n) and combines the result in the last
step.
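Another rewrite worth testing expresses the condition with IN(). Whether the optimizer uses the index for this form depends on your MySQL version, so treat this as an experiment and check the plan yourself:

```sql
-- A more compact alternative to the UNION rewrite; confirm with EXPLAIN
-- whether the film_rating index is chosen on your version.
EXPLAIN SELECT * FROM film WHERE rating IN ('R', 'NC-17') \G
```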
Note:
MySQL has a session status variable named last_query_cost that stores the cost of
the last query executed. Use this variable to compare two query
plans for the same query. For example, after each EXPLAIN, check the value of the variable.
The query with the lowest cost value is considered to be the more
efficient (less time-consuming) query. A value of 0 indicates no
query has been submitted for compilation.
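For example, a sketch of comparing two candidate queries (note that Last_query_cost is maintained per session and, in some versions, is calculated only for simple, single-SELECT queries):

```sql
-- Run each candidate query, then check the optimizer's estimated cost.
SELECT * FROM film WHERE rating = 'R' OR rating = 'NC-17';
SHOW SESSION STATUS LIKE 'last_query_cost';
```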
While this exercise may seem to be a lot of work for a little
gain, consider that there are many such queries being executed in
applications without anyone noticing the inefficiency. Normally we
encounter these types of queries only when the row count gets large
enough to notice. In the sakila database, there
are only 1,000 rows, but what if there were a million or tens of
millions of rows?
Aside from EXPLAIN, there is
no single tool in a standard MySQL distribution that you can use to
profile a query.
1.2. Using ANALYZE TABLE
The MySQL optimizer, like most traditional optimizers, uses
statistical information about tables to perform its analysis of the
optimal query execution plan. These statistics include information
about indexes, distribution of values, and table structure, among
other things.
The ANALYZE TABLE
command recalculates the key distribution for one or more tables. This
information determines the table order for a join operation. The
syntax for the ANALYZE TABLE
command is shown below:
ANALYZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE <table_list>
You can update the key distribution for MyISAM and InnoDB
tables. This is very important to note because it is not a general
tool that applies to all storage engines. However, all storage engines
must report index cardinality statistics to the optimizer if they
support indexes. Some storage engines, like third-party engines, have
their own specific built-in statistics. A typical execution of the
command is shown in Example 7. Running the
command on a table with no indexes has no effect, but will not result
in an error.
Example 7. Analyzing a table to update key distribution
mysql> ANALYZE TABLE film;
+-------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+-------------+---------+----------+----------+
| sakila.film | analyze | status | OK |
+-------------+---------+----------+----------+
1 row in set (0.00 sec)
In this example, we see the analysis is complete and there are
no unusual conditions. Should there be any unusual events during the
execution of the command, the Msg_type field can indicate “info,” “error,”
or “warning.” In these cases, the Msg_text field will give you additional
information about the event. You should always investigate the
situation if you get any result other than “status” and “OK.”
You can see the status of your indexes using the SHOW INDEX command.
A sample of the output of the film table is shown in Example 8. In this case, we’re
interested in the cardinality of each index, which is an estimate of
the number of unique values in the index. We omit the other columns
from the display for brevity.
Example 8. The indexes for the film table
mysql> SHOW INDEX FROM film \G
*************************** 1. row ***************************
Table: film
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: film_id
Collation: A
Cardinality: 1028
...
*************************** 2. row ***************************
Table: film
Non_unique: 1
Key_name: idx_title
Seq_in_index: 1
Column_name: title
Collation: A
Cardinality: 1028
...
*************************** 3. row ***************************
Table: film
Non_unique: 1
Key_name: idx_fk_language_id
Seq_in_index: 1
Column_name: language_id
Collation: A
Cardinality: 2
...
*************************** 4. row ***************************
Table: film
Non_unique: 1
Key_name: idx_fk_original_language_id
Seq_in_index: 1
Column_name: original_language_id
Collation: A
Cardinality: 2
...
*************************** 5. row ***************************
Table: film
Non_unique: 1
Key_name: film_rating
Seq_in_index: 1
Column_name: rating
Collation: A
Cardinality: 11
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
5 rows in set (0.00 sec)
The LOCAL or NO_WRITE_TO_BINLOG keyword prevents the command from being written to the binary
log (and thereby from being replicated in a replication topology).
This can be very useful if you want to experiment or tune while
replicating data, or if you want to omit this step from your binary
log so it is not replayed during point-in-time recovery (PITR).
You should run this command whenever there have been significant
updates to the table (e.g., bulk-loaded data). The system must have a
read lock on the table for the duration of the operation.
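As a sketch, the two forms below are equivalent and keep the statement out of the binary log:

```sql
-- LOCAL is a shorter alias for NO_WRITE_TO_BINLOG; replicas will not
-- repeat the analysis.
ANALYZE LOCAL TABLE film;
ANALYZE NO_WRITE_TO_BINLOG TABLE film;
```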
1.3. Using OPTIMIZE TABLE
Tables that are updated frequently with new data and deletions can
become fragmented quickly and, depending on the storage engine, can
have gaps of unused space or suboptimal storage structures. A badly
fragmented table can result in slower performance, especially during
table scans.
The OPTIMIZE TABLE command
reorganizes the storage of one or more tables. This is
especially beneficial for row formats with variable-length fields
(rows). The syntax for the OPTIMIZE
TABLE command is shown below:
OPTIMIZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE <table_list>
You can use this command for MyISAM and InnoDB tables. This is very important to note because it
is not a general tool that applies to all storage engines. If the
table cannot be reorganized (e.g., there are no variable length
records or there is no fragmentation), the command will revert to
re-creating the table and updating the statistics. A sample output
from this operation is shown in Example 9.
Example 9. The optimize table command
mysql> OPTIMIZE TABLE film \G
*************************** 1. row ***************************
Table: sakila.film
Op: optimize
Msg_type: note
Msg_text: Table does not support optimize, doing recreate + analyze instead
*************************** 2. row ***************************
Table: sakila.film
Op: optimize
Msg_type: status
Msg_text: OK
2 rows in set (0.44 sec)
Here we see two rows in the result set. The first row tells us
the OPTIMIZE TABLE command could
not be run and that the command will instead re-create the table and
run the ANALYZE TABLE command. The
second row is the result of the ANALYZE
TABLE step.
Like the ANALYZE TABLE
command above, any unusual events during the execution of the command
are indicated in the Msg_type field
by “info,” “error,” or “warning.” In these cases, the Msg_text field will give you additional
information about the event. You should always investigate the
situation if you get any result other than “status” and “OK.”
The LOCAL or NO_WRITE_TO_BINLOG keyword prevents the command from being written to the
binary log (it will therefore not be replicated in a replication
topology). This can be very useful if you want to experiment or tune
while replicating data or if you want to omit this step from your
binary log and thereby not replay it during PITR.
You should run this command whenever there have been significant
updates to the table (e.g., a large number of deletes and inserts).
This operation is designed to rearrange data elements into a more
optimal structure and could run for longer than expected. This is one
operation that is best run during times of lower loads.
Warning:
When using InnoDB, especially when there are secondary indexes
(which usually get fragmented), you may not see any improvement or
you may encounter long processing times for the operation unless you
use the InnoDB “fast index create” option.