In order to best understand
how to submit a query through MySQL for Python, it is important to
ensure you understand how to submit a query in MySQL itself. The
similarities between the two outnumber the differences, but the first
may seem confusing if you don't properly understand the second.
MySQL statements have a basic
structure. In following a set structure, they are formed like natural
language statements. Being a computer program, it understandably
responds very poorly to informational statements and only moderately
well to questions. Almost all MySQL statements have an imperatival tone,
expressing your command. This is reflective of the client-server
relationship. The computer is the servant who exists to do the bidding
of yourself as the client or, if you prefer, master.
The syntactic structure of a
simple MySQL statement is not that different from the language you use
every day. Where English would have:
MySQL would need to hear:
Let's look at the MySQL statement, comparing it to the English in detail.
SELECT
MySQL does not support natural language searching like Give me.
Rather, like other programming languages including Python, MySQL has a
set of reserved key words. These are largely single synonyms for common,
core actions. For data retrieval, the key word is SELECT. It could have been GIMME or any of a score of similar ways of saying the same thing, but MySQL is consonant with the Zen of Python:
There should be one—and preferably only one—obvious way to do it
Therefore, the MySQL developers
settled on a single keyword—one that just happens to be
compliant with the SQL standard.
* (asterisk)
Being read up on your regular
expressions, I am sure you recognize this universal quantifier. While it
is one of the most commonly used, MySQL supports several metacharacters
that you can use to nuance your searches.
MySQL supports
different kinds of metacharacters in different contexts. The following
is a full list of metacharacters. (Note that not all of them may be
supported in a given situation.)
.: To match any single character
?: To match zero or one character
*: To match zero or more characters
+: To match one or more characters
{n}: To match an expression n times
{m,n}: To match an expression a minimum of m and a maximum of n times
{n,}: To match an expression n or more times
^: Indicates the bit-level indicator at the beginning of a line
$: Indicates the bit-level indicator at the end of a line
[[:<:]]: To match the beginning of words
[[:>:]]: To match the ending of words
[:class:]: To match a character class
[:alpha:]: For letters
[:space:]: For whitespace
[:punct:]: For punctuation
[:upper:]: For upper case letters
[abc]: To match one of the enclosed characters
[^xyz]: To match any character other than those enclosed
|: Separates alternatives within an expression
In the case of the SELECT command, the asterisk is the only metacharacter supported. In addition to the asterisk, however, SELECT also supports several arguments used to quantify results:
ALL: All matching rows (synonymous to using an asterisk (*)
DISTINCT: Sort the results set into unique values
DISTINCTROW: Where the entire record is unique
Each of these can be
prefaced before the field to be quantified :
SELECT DISTINCT id FROM menu;
This would return the values of the id column from the menu table and remove any duplicates from the results.
FROM
As with the English
equivalent, MySQL needs some context in order to retrieve anything. In
English, if one simply said Give me! without non-verbal cues for the
intended context, the listener would rightly be confused. So here we
tell MySQL from which table in the current database we want it to
extract information.
Note that this is technically
optional. Leaving it off, however, typically means that you are using
MySQL's built-in datasets. For example, here is a statement using the
built-in functions for the current date, user, and version (the \G is the same command as \g, but it tells MySQL to display the results vertically):
mysql> SELECT NOW(), USER(), VERSION()\G
*************************** 1. row ***************************
NOW(): 2009-08-29 12:29:23
USER(): skipper@localhost
VERSION(): 5.1.31-1ubuntu2
1 row in set (0.00 sec)
staff
This is merely the name of the
table to be searched. In English, there are many locations from which
one may desire something. That is why we would need to clarify that we
want the items from the far table. MySQL, on the other hand, only
understands things in terms of databases and tables and so understands
as the name of a table whatever immediately follows the FROM keyword.
; (semicolon)
The semicolon is the
default statement delimiter in MySQL. When creating a MySQL script or
interacting with MySQL dynamically through its shell, leaving off a
semicolon at the end of a statement will result in either a second
prompt or, if you press the matter, an error being thrown. As we will
see shortly, the syntax of MySQL for Python and Python itself mandates a
different way of showing the end of the line. Therefore when passing
MySQL queries in Python, we do not need to end any statements with the
semicolon.
You may wonder why certain
portions of the MySQL query are capitalized. It is a standard
presentation format for MySQL statements to present the static or
standard elements of a MySQL statement in capitals. Variable parts of
the statement, however, are case sensitive and must be called with the
same case in which they were created (otherwise, MySQL will throw an
error). This matter of capitalization is not significant if you interact
with MySQL directly, from a MySQL prompt. There, MySQL will understand
your statements whether they be in all caps or lowercase. However, in
your code, proper capitalization is critical to making your SQL
statements readable to the next developer—both in Python and
in MySQL scripts.
Where the semicolon is the
statement delimiter in the MySQL shell, the backslash (\) is used to
delimit lines within a statement. So, if you want to break up a
statement but not have it executed when you press return, simply use a
backslash at the end of each line of the statement. For example:
mysql> SELECT \
-> * \
-> FROM \
-> menu;
Other helpful quantifiers
The previous discussion offers an overview of the SELECT command and its most common arguments. There are many other ways to nuance the data. In addition to FROM, you can also employ SELECT with the following optional arguments.
WHERE
WHERE is used to declare a condition under which MySQL is to narrow the results of the search. The basic syntax of the clause is:
For example:
mysql> SELECT * FROM menu WHERE id='5';
+----+-------+-------+
| id | name | price |
+----+-------+-------+
| 5 | trout | 6.00 |
+----+-------+-------+
1 row in set (0.00 sec)
GROUP BY
GROUP BY allows you to group results according to one of the following three parameters:
col_name: Is the name of one of the table's columns
expr: Is a regular expression
position: Is a position in the table
Once grouped, you can then tell MySQL to list the results in either ASCending or DESCending order through ASC and DESC,
respectively. The former is the default. Additionally, MySQL provides
for a summative line at the end of the results through the use of WITH ROLLUP.
The syntax of a GROUP BY clause is:
GROUP BY {col_name | expr | position} [ASC | DESC], [WITH ROLLUP]
To appreciate the effect of GROUP BY, you can retrieve all of the values from a table.
mysql> SELECT * FROM menu;
+----+---------------+-------+
| ID | NAME | PRICE |
+----+---------------+-------+
| 4 | catfish | 5.00 |
| 2 | bass | 6.75 |
| 6 | haddock | 6.50 |
| 3 | salmon | 9.50 |
| 5 | trout | 6.00 |
| 1 | tuna | 7.50 |
| 7 | yellowfin tuna | 12.00 |
+----+---------------+-------+
7 rows in set (0.00 sec)
Using GROUP BY on just one column can give us the same list in alphabetical order:
mysql> SELECT * FROM menu GROUP BY name;
+----+----------------+-------+
| id | name | price |
+----+----------------+-------+
| 2 | bass | 6.75 |
| 4 | catfish | 5.00 |
| 6 | haddock | 6.50 |
| 3 | salmon | 9.50 |
| 5 | trout | 6.00 |
| 1 | tuna | 7.50 |
| 7 | yellowfin tuna | 12.00 |
+----+----------------+-------+
7 rows in set (0.00 sec)
If we had multiple entries for some of the fish (for example, tuna and yellowfin tuna), it could also be used to give a count by field value.
mysql> SELECT name, count(*) FROM menu GROUP BY name;
+----------------+----------+
| name | count(*) |
+----------------+----------+
| bass | 1 |
| catfish | 1 |
| haddock | 1 |
| salmon | 1 |
| trout | 1 |
| tuna | 2 |
| yellowfin tuna | 2 |
+----------------+----------+
7 rows in set (0.00 sec)
HAVING
As the WHERE clause has already been discussed, one might wonder rightly—what is the point of the HAVING clause? The WHERE clause is used for simple facts and does not support aggregate evaluations. The HAVING clause is used for aggregate functions. It can be used to replace WHERE, but to do so is generally viewed as poor coding because it violates the SQL standard.
The HAVING clause
is used to quantify results according to aggregate functions. For this
reason, it is usually used in conjunction with the GROUP BY clause.
The basic syntax of the HAVING clause is:
Carrying on with the previous menu example, a basic example of this is:
mysql> SELECT * FROM menu GROUP BY name HAVING id>'3';
+----+----------------+-------+
| id | name | price |
+----+----------------+-------+
| 4 | catfish | 5.00 |
| 6 | haddock | 6.50 |
| 5 | trout | 6.00 |
| 7 | yellowfin tuna | 12.00 |
+----+----------------+-------+
4 rows in set (0.00 sec)
For an example closer to real
life a video rental store that wants to know which customers rent the
most videos might use a query like this one:
mysql> SELECT customer_id,count(*) AS cnt FROM rental GROUP BY customer_id HAVING cnt> 40;
+-------------+-----+
| customer_id | cnt |
+-------------+-----+
| 75 | 41 |
| 144 | 42 |
| 148 | 46 |
| 236 | 42 |
| 526 | 45 |
+-------------+-----+
5 rows in set (0.05 sec)
This shows the customer
number followed by the number of total rentals in the record of rentals
for each customer whose aggregate custom exceeds 40 videos.
ORDER BY
As the name implies, the ORDER BY clause is used to tell MySQL how to order the results of a query. The basic syntactical structure of this clause is as follows:
[ORDER BY {col_name | expr | position} [ASC | DESC], ...]
While the ORDER BY clause can be used in conjunction with the GROUP BY modifiers, this is typically not necessary. The following two examples illustrate why:
mysql> SELECT * FROM menu GROUP BY name ORDER BY id DESC;
+----+----------------+-------+
| id | name | price |
+----+----------------+-------+
| 7 | yellowfin tuna | 12.00 |
| 6 | haddock | 6.50 |
| 5 | trout | 6.00 |
| 4 | catfish | 5.00 |
| 3 | salmon | 9.50 |
| 2 | bass | 6.75 |
| 1 | tuna | 7.50 |
+----+----------------+-------+
7 rows in set (0.00 sec)
mysql> SELECT * FROM menu ORDER BY id DESC;
+----+----------------+-------+
| id | name | price |
+----+----------------+-------+
| 7 | yellowfin tuna | 12.00 |
| 6 | haddock | 6.50 |
| 5 | trout | 6.00 |
| 4 | catfish | 5.00 |
| 3 | salmon | 9.50 |
| 2 | bass | 6.75 |
| 1 | tuna | 7.50 |
+----+----------------+-------+
7 rows in set (0.00 sec)
Because the ORDER BY is applied after the GROUP BY, it largely abrogates the need for the grouping.
LIMIT
The LIMIT clause is
used to restrict the number of rows that are returned in the result set.
It takes two positive integers as arguments. The first number indicates
the point at which to start counting and counts from zero for that
process. The second number indicates how many times to increment the
first number by one in order to determine the desired limit.
The syntax of the LIMIT clause is as follows:
LIMIT {[offset,] row_count | row_count OFFSET offset}
The following four examples show how LIMIT may be used to reduce the returned results neatly. Used in an iterative fashion, incrementing the parameters of a LIMIT clause allows you to step through results.
In this first example, LIMIT is applied to an alphabetic listing of fish names. The table in question is the same one we used previously for GROUP BY. Note that the id numbers are out of sequence.
mysql> SELECT * FROM menu GROUP BY name LIMIT 3,4;
+----+----------------+-------+
| id | name | price |
+----+----------------+-------+
| 3 | salmon | 9.50 |
| 5 | trout | 6.00 |
| 1 | tuna | 7.50 |
| 7 | yellowfin tuna | 12.00 |
+----+----------------+-------+
4 rows in set (0.00 sec)
In order to get the id numbers sequenced correctly, we employ an ORDER BY clause to prep the data before applying the terms of the LIMIT clause to it.
mysql> SELECT * FROM menu ORDER BY id LIMIT 3,4;
+----+----------------+-------+
| id | name | price |
+----+----------------+-------+
| 4 | catfish | 5.00 |
| 5 | trout | 6.00 |
| 6 | haddock | 6.50 |
| 7 | yellowfin tuna | 12.00 |
+----+----------------+-------+
4 rows in set (0.00 sec)
These final two examples illustrate how to apply LIMIT to searches that could easily return scores, if not hundreds or thousands, of hits.
mysql> SELECT * FROM menu ORDER BY id LIMIT 2,3;
+----+---------+-------+
| id | name | price |
+----+---------+-------+
| 3 | salmon | 9.50 |
| 4 | catfish | 5.00 |
| 5 | trout | 6.00 |
+----+---------+-------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM menu LIMIT 2,4;
+----+---------+-------+
| id | name | price |
+----+---------+-------+
| 3 | salmon | 9.50 |
| 4 | catfish | 5.00 |
| 5 | trout | 6.00 |
| 6 | haddock | 6.50 |
+----+---------+-------+
4 rows in set (0.00 sec)
LIMIT and HAVING may seem very
similar as they both work to narrow the aggregate. The difference
between them lies in the timing of their application by MySQL. HAVING is applied as a parameter of the search before MySQL actions the query. The LIMIT clause, on the other hand, is applied after the search results have been returned.
If you are programming for a web
application and your database and web server are located on a single
machine, you need to conserve your server resources. Therefore, you
almost certainly want to use HAVING instead of LIMIT. If you are trying to reduce your search time, again, use HAVING.
However, if your desired hits will comprise a sizable portion of the
results otherwise, or your database server, application server, and web
server are each discrete systems from each other, then you might
consider using LIMIT. In the main, however, LIMIT allows MySQL to use more resources than HAVING because the former is applied after the query is already processed.
INTO OUTFILE
INTO OUTFILE allows for the rapid output of tabular results to a text file on the local host. Its basic syntax is as follows:
For example, one could use:
mysql> SELECT * FROM menu ORDER BY id LIMIT 3,4 INTO OUTFILE '/tmp/results.txt';
Query OK, 4 rows affected (0.00 sec)
This would output the results of the query to a file results.txt in the /tmp directory of the server.