1 The Keyword DISTINCT
Not only is the information
retrieved from a database important, but how the results are presented
to users is also important. Although tables themselves are not allowed
to have duplicate rows, query results can easily contain duplicates.
Suppose you ran Example Query 1.
Example Query 1
SELECT roleID
FROM RoleQuote;
The output of Example
Query 1 would start as shown in the Example Query 1 Results. There
would be as many rows in the results as appear in the RoleQuote table.
Because individual roles can have more than one quote, the query
results would include duplicate roleID values.
Example Query 1 Results
roleID |
---|
00001 |
00001 |
00001 |
00002 |
00003 |
... |
If a user is interested
in identifying only those roles that had quotes, the duplication in the
results is not helpful: You can't see the forest for the trees. The
keyword DISTINCT has the effect
of removing such duplicate rows from query results. Example Query 2
is the same as Example Query 1 except for the inclusion of DISTINCT after the keyword SELECT.
Example Query 2
SELECT DISTINCT roleID
FROM RoleQuote;
The output of Example Query 2 would start as shown in the Example Query 4-2 Results.
Example Query 2 Results
roleID |
---|
00002 |
00003 |
00005 |
00007 |
... |
The keyword DISTINCT can also be used when the query selects more than one column. In such a query, DISTINCT
comes before the list of columns and does not require parentheses or
other punctuation. In this situation, rows are considered to be
duplicates if they contain the same complete set of column values. DISTINCT has the effect of leaving only one row with a given set of values in it and removing any duplicate rows from the query results.
2 The Keywords ORDER BY
Presenting query
results in a particular order can be very helpful to users. For
example, a user might want to see rows of query results alphabetized by
one of the text columns in it or arranged in ascending or descending
order by one of the number columns in it. In other words, it might be
desirable to see query results in sorted order. SQL does not have a
keyword "alphabetize" or "sort." Instead, presenting query results in
sorted order is accomplished with the keywords ORDER BY. These keywords appear at the end of the query, along with the name of the column to sort on.
Suppose you wanted to find the
movieID and title of all movies in the Movie table, in order by title.
Example Query 3 shows this query.
Example Query 3
SELECT movieID, title
FROM Movie
ORDER BY title;
The output of Example Query 3 would start as shown in the Example Query 3 Results.
Example Query 3 Results
movieID | title |
---|
024 | Amadeus |
001 | Apollo 13 |
019 | Batman |
022 | Batman & Robin |
. . . | . . . |
The movie A Few Good Men
is in the database, but it is listed as "Few Good Men, A." Example
Query 3 illustrates why: If the movie title were given with the
article "A" first, it would appear in the results of the query along
with the other movie titles starting with the letter "A."
3 The Keywords ASC and DESC
When ordering results,
the default is to sort them in ascending order. If you want to make
this order explicit, you put the keyword ASC after the name of the column you're ordering on. If you would like results to appear in descending order, use the keyword DESC instead. Example Query 4 illustrates the use of this keyword.
Example Query 4
SELECT movieID, title
FROM Movie
ORDER BY title DESC;
The output of Example Query 4 would start as shown in the Example Query 4 Results.
Example Query 4 Results
movieID | title |
---|
016 | Wizard of Oz, The |
017 | Tootsie |
014 | Terminator, The |
015 | Terminator 2: Judgment Day |
. . . | . . . |
Example Query 4 again
illustrates the usefulness of removing an article from the beginning of
a movie title and putting it at the end. If The Wizard of Oz were listed with "The" first, "Tootsie" would appear at the top of the results and "The Wizard of Oz" would come second.
4 Ordering on More Than One Column
It is possible to order the results of a query on more than one column. The columns of interest are given after the keywords ORDER BY,
separated by commas. The results are ordered first by the first column
given, then by the next column given, and so on. It is possible to mix
ascending and descending order by putting one of the ASC and DESC keywords after the names of the columns. The default is ascending order.
Suppose you wanted to find
the actorID, lastName, and firstName of all actors, ordered first by
lastName and then by firstName. In other words, you wanted the actors'
names alphabetized, last name first. To keep the query manageable,
middleName and suffix are not included in the ordering, although they
would need to be included in a complete example. Example Query 5
shows this query.
Example Query 5
SELECT actorID, lastName, firstName
FROM Actor
ORDER BY lastName, firstName;
Partial output of Example Query 5 is given in the Example Query 5 Results. This output includes
some of the rows where actors have the same last name, showing that
overall, rows are ordered by lastName, and rows with the same lastName
value are ordered by firstName.
Example Query 5 Results
actorID | lastName | firstName |
---|
00069 | | Terry |
00098 | Abraham | F. |
00003 | Bacon | Kevin |
. . . | . . . | . . . |
00058 | Hamilton | Linda |
00067 | Hamilton | Margaret |
. . . | . . . | . . . |
00070 | Hoffman | Dustin |
00025 | Hoffman | Gaby |
. . . | . . . | . . . |
Notice the lastName column of
the very first row; it is null. When presenting the results in order,
the null value comes first. Null is treated as "less than" any actual
value.
5 Calculated Columns in Results
It can be convenient to have
query results showing the results of calculations on number columns in
tables. Arithmetic expressions can be built out of the names of number
columns, numeric constants, addition, subtraction, multiplication,
division, and parentheses. The symbols +, -, *, and / are used for
addition, subtraction, multiplication, and division, respectively.
Suppose you wanted to find the
movieID, title, DVDPrice, and discount-Price of movies, along with the
difference between the regular DVDPrice and the discountPrice. DVDPrice
and discountPrice are both currency columns, so the difference can be
found by subtracting the one price column from the other. Example Query 6 shows this query.
Example Query 6
SELECT movieID, title, DVDPrice, discountPrice,
DVDPrice - discountPrice
FROM Movie;
The output of Example Query 6 would start as shown in the Example Query 6 Results.
Example Query 6 Results
movieID | title | DVDPrice | discountPrice | Expr1004 |
---|
001 | Apollo 13 | $12.99 | $9.99 | $3.00 |
002 | Casablanca | $19.99 | $14.99 | $5.00 |
003 | Dirty Harry | $16.99 | $16.99 | $0.00 |
. . . | . . . | . . . | . . . | . . . |
Notice that the system makes
up an arbitrary expression name for the calculated column in the
results. This is a good place to use the keyword AS
and create a more informative column name. The next example does so and
also illustrates a more complicated arithmetic expression.
Suppose these conditions apply when buying or renting a movie:
You
estimate that if you buy a movie outright, on average you watch it four
times. Also, if you buy a movie, you buy it at the discount price.
You
estimate that the total cost to you of renting a movie is $3.00. This
may include gas money if you go to a video store, the value of the time
spent searching online if you use such a service, and so on.
Now suppose that you wanted
to find the movieID and title of a movie, along with the difference
between the average cost of viewing it if bought at the discount price
and the cost of viewing it if rented. This difference is the actual
additional cost per viewing if you add the movie to your collection.
Let the calculated column have the heading "cost per viewing if
bought." Example Query 7 shows the query you need.
Example Query 7
SELECT movieID, title, (discountPrice / 4) - 3 AS [cost per
viewing if bought]
FROM Movie;
The output of Example Query 7 would start as shown in the Example Query 7 Results.
Example Query 7 Results
movieID | title | cost per viewing if bought |
---|
001 | Apollo 13 | -0.5025 |
002 | Casablanca | 0.7475 |
003 | Dirty Harry | 1.2475 |
. . . | . . . | . . . |
It is apparent from Example Query 7 that if you bought Apollo 13, you would save money. If you bought Casablanca or Dirty Harry,
you would lose money. Of course, after seeing the results, you might
conclude that you would watch these movies more than the average number
of times, and it would pay off to buy them anyway.
6 The Date Functions YEAR, MONTH, and DAY
SQL also includes a number of
built-in functions. The date data type illustrates the kind of function
that computes or extracts a value from a column for each row included
in the query. The names of the functions that apply to dates are YEAR, MONTH, and DAY.
In a query, these keywords are followed by parentheses containing the
name of a date column. The functions extract that part of the date for
which they are named.
Suppose you wanted to find the actorID, lastName, and year of birth of all actors. Example Query 8 shows this query.
Example Query 8
SELECT actorID, lastName, YEAR(birthDate)
FROM Actor;
The output of Example Query 8 would start as shown in the Example Query 8 Results.
Example Query 8 Results
actorID | lastName | Expr1002 |
---|
00001 | Hanks | 1956 |
00002 | Paxton | 1955 |
00003 | Bacon | 1958 |
. . . | . . . | . . . |
Because the component parts
of dates have a numeric nature, it is possible to do arithmetic on
them. You can subtract dates directly, although you will get results
that you may not expect. Suppose you wanted to find the actorID,
last-Name, and the difference between the deathDate and the birthDate
of all actors where the deathDate is not null. Example Query 9 shows
this query.
Example Query 9
SELECT actorID, lastName, deathDate - birthDate
FROM Actor
WHERE deathDate IS NOT NULL;
The output of Example Query 9 would start as shown in the Example Query 9 Results.
Example Query 9 Results
actorID | lastName | Expr1002 |
---|
00008 | Bergman | 25202 |
00068 | Blandick | 29899 |
00007 | Bogart | 21175 |
. . . | . . . | . . . |
The difference between deathDate
and birthDate is an unexpectedly large number because it is expressed
in days instead of years. If you wanted to find the difference in
years, you could rewrite the query using the YEAR function. Example Query 10 shows the new query.
Example Query 10
SELECT actorID, lastName, YEAR(deathDate) - YEAR(birthDate)
FROM Actor
WHERE deathDate IS NOT NULL;
The output of Example Query 10 would start as shown in the Example Query 10 Results.
Example Query 10 Results
actorID | lastName | Expr1002 |
---|
00008 | Bergman | 69 |
00068 | Blandick | 82 |
00007 | Bogart | 58 |
. . . | . . . | . . . |
Although the results in
years are convenient for review purposes, Example Query 10 is not
entirely precise. It does not take into account whether the month and
day of a person's death fall before or after the month and day of that
individual's birth. The results shown may be off by a year. It would be
necessary to write yet another query to solve this problem.
7 The Built-In Function COUNT
The keyword COUNT
is a built-in function that makes it possible to find the number of
rows that would be retrieved by a query with a given set of conditions.
Suppose you wanted to find the number of rows in the Actor table.
Example Query 11 shows this query.
Example Query 11
SELECT COUNT(*)
FROM Actor;
The complete output of Example Query 11 is shown in the Example Query 11 Results.
Example Query 11 Results
The count is a single
number, so the results of Example Query 11 are a table consisting of
a single row and a single column. The column heading is an arbitrary
expression unless the keyword AS is used in the query to define a specific heading.
In Query 11, the wildcard
symbol * does not signify "count all rows." Rather, it signifies that
when doing the count, all columns are taken into consideration. A row
will be included in the count as long as at least one of the columns
contained in it is not null. All rows should meet this condition, and
there is no WHERE condition restricting the number of rows included in the query results, so all rows in the table are counted.
Suppose you wanted to find the number of actors whose last names start with the letter "S." Example Query 12 shows this query.
Example Query 12
SELECT COUNT(*)
FROM Actor
WHERE lastName LIKE 'S*';
The complete output of
Example Query 12 is shown in the Example Query 12 Results. Example
Query 12 counts the number of rows in the Actor table that meets the
condition following the WHERE.
Example Query 12 Results
The keyword COUNT can also be applied to one or more listed columns rather than to *. COUNT
will return a count of all rows where those columns are not null.
Suppose you'd like to find the number of actors for whom one column,
lastName, is not null. Example Query 13 shows this query.
Example Query 13
SELECT COUNT(lastName)
FROM Actor;
The complete output of Example Query 13 is shown in the Example Query 13 Results.
Example Query 13 Results
Notice that it would be redundant to include the condition WHERE lastName IS NOT NULL in Example Query 13.
8 The Built-In Functions MAX, MIN, SUM, and AVG
The MAX, MIN, SUM, and AVG
keywords refer to the mathematical concepts of maximum, minimum, sum,
and average, respectively. These keywords are the names of built-in
functions that can be applied to number columns in tables. To use one
of these functions, you put the number column of interest in
parentheses after the function name. These functions give values that
are based on the contents of more than one row in a table.
Suppose you wanted to find the average DVDPrice of all movies in the Movie table. Example Query 14 shows this query.
Example Query 14
SELECT AVG(DVDPrice)
FROM Movie;
The complete output of Example Query 14 is shown in the Example Query 14 Results.
Example Query 14 Results
Suppose you wanted to find
the sum of the discountPrice values of all movies in the Movie table.
Example Query 15 shows this query.
Example Query 15
SELECT SUM(discountPrice)
FROM Movie;
The complete output of Example Query 15 is shown in the Example Query 15 Results.
Example Query 15 Results
As you saw earlier,
calculated columns may be defined using built-in functions such as the
date functions, because those functions extract a single value for each
row in the query. However, built-in functions such as AVG, which group together values of more than one row, do not mix with arithmetic calculations in some implementations of SQL.
For example, suppose you
wanted to find the difference between the DVD-Price of each movie and
the average DVDPrice of all of the movies. Example Query 16
illustrates this idea.
Example Query 16
SELECT DVDPrice - AVG(DVDPrice)
FROM Movie;
Unfortunately, depending
on the system you are using, this query might not work. This example is
saved as Example Query 16 in this book's Microsoft Access database,
but trying to run it will generate an error message rather than results.
9 The Keywords GROUP BY
Grouping is a powerful
capability in SQL. It allows you to generate query results that
summarize table information in a way that is similar to a report.
Grouping is based on SQL's built-in functions that aggregate the
information found in tables.
Grouping has similarities to ordering in its logic. The general plan of a query using GROUP BY
is this: From a given table, the query selects a column of interest,
column X, for example. The query also selects one of the built-in
functions, COUNT, for example. At the end of the query, the keywords GROUP BY
are followed by the name of the column of interest, X. Different values
for column X may occur more than once in the table. This GROUP BY query will count how many times each different value of X occurs.
In the Movie table, each
movie appears only once; however, different movies are produced by the
same company, so the companies may appear more than once in the table.
Suppose you wanted find out how many movies each company produced. In
other words, you want to count how many times each company appears in
the Movie table. Example Query 17 shows this query.
Example Query 17
SELECT company, COUNT(*)
FROM Movie
GROUP BY company;
The output of Example
Query 17 would end as shown in the Example Query 17 Results. The
expression column contains the counts.
Example Query 17 Results
company | Expr1001 |
---|
. . . | . . . |
Steve Tisch/Wendy Finerman | 1 |
The Malpaso Company | 2 |
TriStar Pictures | 1 |
Warner Brothers | 6 |
Now suppose instead that you wanted to find the average DVDPrice grouped by company. Example Query 18 shows this query.
Example Query 18
SELECT company, AVG(DVDPrice)
FROM Movie
GROUP BY company;
The output of Example Query 18 would end as shown in the Example Query 18 Results. The
expression column contains the averages.
Example Query 18 Results
company | Expr1001 |
---|
. . . | . . . |
Steve Tisch/Wendy Finerman | $14.99 |
The Malpaso Company | $13.49 |
TriStar Pictures | $14.99 |
Warner Brothers | $22.66 |
A GROUP BY query has to contain a built-in function that aggregates values from different rows. The SELECT operation can include only what you're grouping on and the built-in function. Technically, the SELECT expression doesn't have to include the grouping column—but without it, the meaning of the results may not be clear.
10 GROUP BY and ORDER BY
GROUP BY and ORDER BY
can be used in the same query. In the previous example, the query
results happened to be ordered by company. You can change the order of
the results if you like—for example, ordering them by the value of the
results of the built-in function.
Suppose you wanted to find the
average DVDPrice grouped by company, presented in ascending order by
the average DVDPrice. Example Query 19 illustrates this.
Example Query 19
SELECT company, AVG(DVDPrice)
FROM Movie
GROUP BY company
ORDER BY AVG(DVDPrice);
The output of Example Query 19 would start as shown in the Example Query 19 Results.
Example Query 19 Results
company | Expr1001 |
---|
CarolCo Pictures | $9.99 |
Imagine Entertainment and Universal Pictures | $12.99 |
Gordon Company | $12.99 |
The Malpaso Company | $13.49 |
. . . | . . . |
11 The Keyword HAVING
In a simple query, you can limit the number of rows in the results by using the keyword WHERE, which puts a condition on one or more of the columns of the table. In a GROUP BY query, you can use the keyword HAVING to put a condition on those values of the built-in function that will be included in the results. HAVING
appears at the end of the query and can use a constant value and the
comparison operators (<, <=, =, >=, >, and <>).
Running the previous
query would have produced results that included all companies and the
average values of their DVDPrice. To save space, only partial results
were shown. Suppose, however, that you wanted to see just the results
for those companies where the average DVDPrice is less than or equal to
$14.00. Example Query 20 illustrates this idea.
Example Query 20
SELECT company, AVG(DVDPrice)
FROM Movie
GROUP BY company HAVING AVG(DVDPrice) <= 14.00
ORDER BY AVG(DVDPrice);
The complete output of Example Query 20 is shown in the Example Query 20 Results.
Example Query 20 Results
company | Expr1001 |
---|
CarolCo Pictures | $9.99 |
Gordon Company | $12.99 |
Imagine Entertainment and Universal Pictures | $12.99 |
The Malpaso Company | $13.49 |
12 Grouping by More Than One Column
Just as it's possible to order
by more than one column at a time, it's also possible to group by more
than one column at a time. The different grouping columns are included
in the SELECT expression, followed by the built-in function. The different grouping columns are also included after the keywords GROUP BY, separated by commas. Every different combination of values in the grouping columns will give a separate row in the results.
In the example database,
different movies might have the same DVDPrice. Among those movies with
the same DVDPrice, there may be different values for the discountPrice.
Suppose you wanted to find the number of movie DVDs grouped by their
DVDPrice and their discountPrice. Example Query 21 shows this query.
Example Query 21
SELECT DVDPrice, discountPrice, COUNT(*)
FROM Movie
GROUP BY DVDPrice, discountPrice;
The output of Example Query 21 would start as shown in the Example Query 21 Results.
Example Query 21 Results
DVDPrice | discountPrice | Expr1002 |
---|
$9.99 | $7.99 | 1 |
$9.99 | $9.99 | 1 |
$12.99 | $9.99 | 2 |
$14.99 | $9.99 | 7 |
$14.99 | $14.99 | 2 |
. . . | . . . | . . . |
The last two lines of
the Example Query 21 Results table illustrate the idea of multiple
groups most clearly. There are two groups of movies where the DVDPrice
is $14.99. The first group consists of seven movies with a DVDPrice of
$14.99 and a discountPrice of $9.99. The second group consists of two
movies with a DVDPrice of $14.99 and a discountPrice of $14.99.