SQL Server 2008 : T-SQL Tips and Tricks (part 3) - Generating T-SQL Statements with T-SQL & De-Duping Data with Ranking Functions

5/17/2011 4:08:23 PM

Generating T-SQL Statements with T-SQL

The system catalogs in SQL Server 2008 contain a wealth of information you can use to save a lot of time and effort when generating SQL statements that need to be run repeatedly on a large number of database objects or when trying to build a column list for a query. You can use T-SQL code to select information from the system catalogs, system tables, and system views to generate SELECT statements and the like. For example, say you want to grant EXECUTE permission to the user fred on each stored procedure in the bigpubs2008 database. This can be a tedious task to perform using SSMS because in the Securables dialog for the user fred, you have to select each procedure, one at a time, and click the Grant Execute check box. If there were a large number of procedures, this could be pretty time-consuming, and your mouse-clicking finger would probably get pretty tired.

The quicker and easier way would be to build the SQL statements necessary to grant EXECUTE permission to fred on all the stored procedures. The following SELECT statement can be used to generate a SQL script with those commands:

select 'grant execute on ' +  name + ' to fred'
   from sys.procedures
   order by name
go

--------------------------------------------------------
grant execute on byroyalty to fred
grant execute on cursor_proc to fred
grant execute on error_handler to fred
grant execute on find_books_by_type2 to fred
grant execute on gen_sequence to fred
grant execute on get_authors to fred
grant execute on get_next_item_from_queue to fred
grant execute on get_titles to fred
grant execute on p_explicit_cols to fred
grant execute on p_fetch_explicit_cols to fred
grant execute on p_insert_explicit_cols to fred
grant execute on reptq1 to fred
grant execute on reptq2 to fred
grant execute on reptq3 to fred
grant execute on SHOW_PARTS_LIST to fred
grant execute on title_authors to fred
grant execute on trantest to fred
grant execute on ytd_sales to fred
grant execute on ytd_sales2 to fred

You can copy and paste the output from this statement into a query window in SSMS and execute it to grant the desired permissions. When you get to know your system catalog views, you can begin to automate the generation of a number of SQL operations in this manner, freeing up your time to spend on more interesting projects.

Working with @@ERROR and @@ROWCOUNT

When you are writing T-SQL code that needs to check for both errors and the number of rows affected after your SQL statements, one of the common pitfalls is trying to get both the error status and the number of rows after a SQL statement runs. You have to remember that all SQL statements except the DECLARE statement reset the value of @@ROWCOUNT and @@ERROR to the status of the last command executed.

If after a SQL statement you check the value of @@ERROR, the statement used to check @@ERROR resets @@ROWCOUNT. If you check @@ROWCOUNT first, it resets the value of @@ERROR. To check both values, you need to use an assignment SELECT immediately after the SQL statement you are checking and capture both values into local variables. Note that you cannot accomplish this with the SET statement because the SET statement allows setting a value to only a single variable at a time.

The example in Listing 14 provides a way to capture and check both @@ROWCOUNT and @@ERROR after an UPDATE statement in a T-SQL batch.

Listing 14. Capturing Both @@ROWCOUNT and @@ERROR After an UPDATE Statement

declare @rowcnt int,
        @error int
UPDATE dbo.titles set price = price * 1.10
   where type = 'fiction'
select @rowcnt = @@ROWCOUNT, @error = @@ERROR
if @rowcnt = 0
    print 'no rows updated'
if @error <> 0
    raiserror ('Update of titles failed', 16, 1)
return

De-Duping Data with Ranking Functions

One common problem encountered with imported data is unexpected duplicate data rows, especially if the data is being consolidated from multiple sources. In previous versions of SQL Server, de-duping the data often involved the use of cursors and temp tables. Since the introduction of the ROW_NUMBER ranking function and common table expressions in SQL Server 2005, you are able to de-dupe data with a single statement.

To demonstrate this approach, Listing 14 shows how to create an authors_import table and populate it with some duplicate rows.

Listing 15. Script to Create and Populate the authors_import Table

USE bigpubs2008
GO
CREATE TABLE dbo.authors_import(
    au_id dbo.id NOT NULL,
    au_lname varchar(30)  NOT NULL,
    au_fname varchar(20)  NOT NULL)
go

INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('681-61-9588', 'Ahlberg', 'Allan')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('739-35-5165', 'Ahlberg', 'Janet')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('499-84-5672', 'Alexander', 'Lloyd')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('499-84-5672', 'Alexander', 'Lloyd')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('432-31-3829', 'Bate', 'W. Jackson')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('432-31-3829', 'Bate', 'W. Jackson')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('432-31-3829', 'Bate', 'W. Jackson')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('437-99-3329', 'Bauer', 'Caroline Feller')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('378-33-9373', 'Benchley', 'Nathaniel')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('378-33-9373', 'Benchley', 'Nate')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
  VALUES('409-56-7008', 'Bennet', 'Abraham')
GO

You can see in the data for Listing 15 that there are two duplicates for au_id 499-84-5672 and three for au_id 432-31-3829. To start identifying the duplicates, you can write a query using the ROW_NUMBER() function to generate a unique row ID for each data row, as shown in Listing 16.

Listing 16. Using the ROW_NUMBER() Function to Generate Unique Row IDs

SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *
   FROM dbo.authors_import
go

ROWID                au_id       au_lname                       au_fname
-------------------- ----------- ------------------------------ ----------------
1                    378-33-9373 Benchley                       Nate
2                    378-33-9373 Benchley                       Nathaniel
3                    409-56-7008 Bennet                         Abraham
4                    432-31-3829 Bate                           W. Jackson
5                    432-31-3829 Bate                           W. Jackson
6                    432-31-3829 Bate                           W. Jackson
7                    437-99-3329 Bauer                          Caroline Feller
8                    499-84-5672 Alexander                      Lloyd
9                    499-84-5672 Alexander                      Lloyd
10                   681-61-9588 Ahlberg                        Allan
11                   739-35-5165 Ahlberg                        Janet

Now you can use the query shown in Listing 16 to build a common table expression to find the duplicate rows. In this case, we keep the first row found. To make sure it works correctly, write the query first as a SELECT statement to verify that it is identifying the correct rows, as shown in Listing 17.

Listing 17. Using a Common Table Expression to Identify Duplicate Rows

WITH authors_import AS

(SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *
   FROM dbo.authors_import)
select * FROM authors_import WHERE ROWID NOT IN
(SELECT MIN(ROWID) FROM authors_import
   GROUP BY au_id,au_fname, au_lname);
GO

ROWID                au_id       au_lname                       au_fname
-------------------- ----------- ------------------------------ -----------------
5                    432-31-3829 Bate                           W. Jackson
6                    432-31-3829 Bate                           W. Jackson
9                    499-84-5672 Alexander                      Lloyd

Now you simply change the final SELECT statement in Listing 17 into a DELETE statement, and it removes the duplicate rows from authors_import:

WITH authors_import AS
(SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *
   FROM dbo.authors_import)
delete FROM authors_import WHERE ROWID NOT IN
(SELECT MIN(ROWID) FROM authors_import
   GROUP BY au_id,au_fname, au_lname);
GO
select * from authors_import
go

au_id       au_lname                       au_fname
----------- ------------------------------ --------------------
681-61-9588 Ahlberg                        Allan
739-35-5165 Ahlberg                        Janet
499-84-5672 Alexander                      Lloyd
432-31-3829 Bate                           W. Jackson
437-99-3329 Bauer                          Caroline Feller
378-33-9373 Benchley                       Nathaniel
378-33-9373 Benchley                       Nate
409-56-7008 Bennet                         Abraham

If you want to retain the last duplicate record and delete the previous ones, you can replace the MIN function with the MAX function in the DELETE statement.

Notice that the uniqueness of the duplication is determined by the columns specified in the GROUP BY clause of the subquery. Notice that there are still two records for au_id 378-33-9373 remaining in the final record set. The duplicates removed were based on au_id, au_lname, and au_fname. Because the first name is different for each of the two instances of au_id 378-33-9373, both Nathaniel Benchley and Nate Benchley remain in the authors_import table. If you remove au_fname from the GROUP BY clause, the earlier record for Nathaniel Benchley would remain, and Nate Benchley would be removed. However, this result may or may not be desirable. You would probably want to resolve the disparity between Nathaniel and Nate and confirm manually that they are duplicate rows before deleting them. Running the query in Listing 15 with au_fname removed from the GROUP BY clause helps you better determine what your final record set would look like.

SQL Server 2008 : T-SQL Tips and Tricks (part 2) - Using CONTEXT_INFO & Working with Outer Joins

SQL Server 2008 : T-SQL Tips and Tricks (part 1) - Date Calculations & Sorting Results with the GROUPING Function

Other

SQL Server 2008 : General T-SQL Performance Recommendations

SQL Server 2008 : General T-SQL Coding Recommendations (part 2) - Avoid SQL Injection Attacks When Using Dynamic SQL & Comment Your T-SQL Code

SQL Server 2008 : General T-SQL Coding Recommendations (part 1) - Provide Explicit Column Lists & Qualify Object Names with a Schema Name

SQL Server 2008 : Advanced Stored Procedure Programming and Optimization - Using Extended Stored Procedures

SQL Server 2008 : Advanced Stored Procedure Programming and Optimization - Installing and Using .NET CLR Stored Procedures

SQL Server 2008 : Advanced Stored Procedure Programming and Optimization - Using Dynamic SQL in Stored Procedures

SQL Server 2008 : Advanced Stored Procedure Programming and Optimization - Stored Procedure Performance

SQL Server 2008 : Using Remote Stored Procedures

SQL Server 2008 : Using Temporary Tables in Stored Procedures

SQL Server 2008 : Advanced Stored Procedure Programming and Optimization - Nested Stored Procedures