Partitioned views are used to access data that has
been horizontally split, or partitioned, across multiple tables. These
tables can be in the same or different databases—or even spread across
multiple servers. Partitioning of tables is done to spread the I/O and
processing load of large tables across multiple disks or servers.
You combine the tables in a partitioned view by using a UNION ALL
statement that causes the data from the separate tables to appear as if
they were one table. These separate tables are referred to as member tables or base tables. The member tables in a SELECT statement of the view must all be structured in the same way, and the view must adhere to the following restrictions:
All the columns from the member tables should be included in the view definition.
Columns with the same ordinal position in the SELECT list should have the same data type.
The same column cannot be used multiple times in the SELECT list.
A
partitioning column that segments the data must be identified and needs
to have the same ordinal position across all the member table SELECT statements.
The partitioning column cannot be a computed column, an identity, a default, or a time stamp.
The data values in the partitioning column cannot overlap in the underlying tables.
The partitioning column must be part of the primary key of the member table.
The member tables in the partitioned view need a CHECK constraint on the partitioning column.
A table can appear only once as part of the UNION ALL statement.
The member tables cannot have indexes created on computed columns in the table.
The number of columns in the member table primary key constraints should be the same.
All member tables should have the same ANSI PADDING setting when created.
The list of restrictions
for creating partitioned views is extensive, but the creation of a
partitioned view is relatively straightforward and intuitive. Consider,
for example, the Sales.SalesOrderHeader table in the Adventureworks2008
database. This table is relatively small, but it is the type of table
that could have a large number of rows and experience heavy utilization.
To balance the workload against this table, you could use a partitioned
view that utilizes base tables that each contain a separate year’s
data. Listing 1 shows the CREATE TABLE
statements to create the base tables for each year. The yearly tables
are intended to hold summarized daily numbers, and each contains only a
subset of the columns in the Sales.SalesOrderHeader table.
Listing 1. Creating the Base Tables for a Partitioned View
CREATE TABLE Sales.Sales_2001 ( OrderDay datetime NOT NULL CHECK (OrderDay BETWEEN '20010101' AND '20011231'), SubTotal money NOT NULL , TaxAmt money not null, Freight money not null, CONSTRAINT PK_Sales_2001_OrderDay PRIMARY KEY CLUSTERED (OrderDay ASC) )
CREATE TABLE Sales.Sales_2002 ( OrderDay datetime NOT NULL, CHECK (OrderDay BETWEEN '20020101' AND '20021231'), SubTotal money NOT NULL , TaxAmt money not null, Freight money not null, CONSTRAINT PK_Sales_2002_OrderDay PRIMARY KEY CLUSTERED (OrderDay ASC) )
CREATE TABLE Sales.Sales_2003 ( OrderDay datetime NOT NULL CHECK (OrderDay BETWEEN '20030101' AND '20031231'), SubTotal money NOT NULL , TaxAmt money not null, Freight money not null, CONSTRAINT PK_Sales_2003_OrderDay PRIMARY KEY CLUSTERED (OrderDay ASC) )
CREATE TABLE Sales.Sales_2004 ( OrderDay datetime NOT NULL CHECK (OrderDay BETWEEN '20040101' AND '20041231'), SubTotal money NOT NULL , TaxAmt money not null, Freight money not null, CONSTRAINT PK_Sales_2004_OrderDay PRIMARY KEY CLUSTERED (OrderDay ASC) )
|
Notice that each table has a primary key on OrderDay, the partitioning column. Also notice that a CHECK constraint is defined for each table; it ensures that only orders for the given year can be stored in the table.
To demonstrate the power of a partitioned view, it is best to populate the base tables that will be used by the view. Listing 2 contains a series of INSERT statements that select from the Sales.SalesOrderHeader table and populate the base tables. The SELECT statements summarize several key columns by day and contain a WHERE clause that limits the result to orders for the respective years.
Listing 2. Populating the Base Tables for a Partitioned View
INSERT Sales.Sales_2001 SELECT CONVERT(VARCHAR(8),OrderDate,112), SUM(SubTotal), SUM(TaxAmt), SUM(Freight) FROM Sales.SalesOrderHeader WHERE OrderDate between '20010101' AND '20011231' GROUP BY CONVERT(VARCHAR(8),OrderDate,112) INSERT Sales.Sales_2002 SELECT CONVERT(VARCHAR(8),OrderDate,112), SUM(SubTotal), SUM(TaxAmt), SUM(Freight) FROM Sales.SalesOrderHeader WHERE OrderDate between '20020102' AND '20021231' GROUP BY CONVERT(VARCHAR(8),OrderDate,112)
INSERT Sales.Sales_2003 SELECT CONVERT(VARCHAR(8),OrderDate,112), SUM(SubTotal), SUM(TaxAmt), SUM(Freight) FROM Sales.SalesOrderHeader WHERE OrderDate between '20030101' AND '20031231' GROUP BY CONVERT(VARCHAR(8),OrderDate,112)
INSERT Sales.Sales_2004 SELECT CONVERT(VARCHAR(8),OrderDate,112), SUM(SubTotal), SUM(TaxAmt), SUM(Freight) FROM Sales.SalesOrderHeader WHERE OrderDate between '20040102' AND '20041231' GROUP BY CONVERT(VARCHAR(8),OrderDate,112)
|
Now that you have the
populated base table, you can create a partitioned view and ensure that
the view is selecting only from the base tables that it needs.
Two types of partitioned views
are discussed in this article: local and distributed. A local
partitioned view utilizes base tables found on the same server. A
distributed partitioned view contains at least one base table that
resides on a different (remote) server. The focus in the section is on
local partitioned views. The T-SQL for creating a local partitioned view
named Sales.vw_Sales_Daily is shown in Listing 3.
Listing 3. Creating a Local Partitioned View
Create View Sales.vw_Sales_Daily as SELECT * FROM Sales.Sales_2001 UNION ALL SELECT * FROM Sales.Sales_2002 UNION ALL SELECT * FROM Sales.Sales_2003 UNION ALL SELECT * FROM Sales.Sales_2004
|
The best way to validate that a partitioned view is working properly is to run a conditional SELECT
against the view and display the execution plan. If the partitioned
view is functioning properly, it should be accessing only the base
tables it needs to satisfy the SELECT and should not access all the tables in the view unless it needs to. The following example shows a sample SELECT against the new partitioned view:
SELECT * FROM Sales.vw_Sales_Daily
WHERE OrderDay > '20040701'
and SubTotal > 2000
If you execute this statement and review the actual execution plan, you see that an index seek is performed against the Sales.Sales_2004 table. This is the correct result, given that the SELECT statement is targeting order data from 2004.
Modifying Data Through a Partitioned View
You can modify data
via a partitioned view if the SQL statement performing the modification
meets certain conditions, as described here:
All columns in the partitioned view must be specified in the INSERT statement. Columns that include a DEFAULT constraint or allow nulls are also subject to this requirement.
The DEFAULT keyword cannot be used on inserts to partitioned views or on updates to partitioned views.
UPDATE statements cannot modify PRIMARY KEY columns if the member tables have text, ntext, or image columns.
Inserts and updates to a partitioned view are not allowed if the view contains a time stamp.
Identity columns in a partitioned view cannot be modified by an INSERT or UPDATE statement.
INSERT, UPDATE, and DELETE
statements are not allowed against a partitioned view if there is a
self-join with the same view or with any of the member tables in the
statement.
Note
Data can be modified through partitioned views only in the Enterprise and Developer Editions of SQL Server 2008.
In
addition to the conditions shown in this list, you must also satisfy
any restrictions that apply to the member tables. Check constraints,
foreign key constraints, and any other table-level restrictions must be
accounted for in the modification statement. The user executing the
modification against the partitioned view must have the appropriate INSERT, UPDATE, or DELETE permissions on the member tables for the update to succeed.
Distributed Partitioned Views
Microsoft provides distributed
partitioned views (DPVs) as a primary means to scale out a database
server. Scalability allows an application or a database to utilize
additional resources, which allows it to perform more work. There are
two kinds of scalability: scaleup and scaleout. A scaleup solution
focuses on a single server scaled to provide more processing power than
its predecessor. An example of scaleup would be migrating from a server
with a single dual-core processor to a machine with 4-quad-core
processor. Scaleout solutions include the addition of servers to augment
the overall processing power.
DPVs are similar to local
partitioned views, but they utilize one or more tables located on a
remote server. The placement of partitioned data on remote servers
allows the processing power of more than one server to be utilized. The
partitioning is intended to be transparent to the application and allow
for additional partitions and servers as the application’s needs scale.
The following list outlines the basic requirements for creating a DPV:
A linked server
definition is added to each member server that will contain the
partitioned data. The linked server contains the connection information
required to run distributed queries on another member server.
The lazy schema validation option is set to true on each of the member servers, using sp_serveroption.
This option is set for performance reasons and allows the query
processor to skip schema checking of remote tables if the query can be
satisfied on a single member server.
A
DPV is created on each member server. This DPV references the local
tables in addition to the tables found on the other member servers.
Listing 3
shows SQL commands that can be used to satisfy the requirements in the
preceding list. The DPV created in the last portion of the script is
similar to the local partitioned view created in the previous section.
The key difference in this DPV example is the inclusion of a distributed
query that retrieves records for Sales.Sales_2002 from a remote server. The remote server in this example is named DbSvrXP.
Listing 3. Creating a Distributed Partitioned View
Exec sp_addlinkedserver @server='dbsvrxp', @srvproduct='', @provider='MSDASQL', @provstr='DRIVER={SQL Server}; SERVER=dbsvrxp;UID=linklogin;PWD=pw;Initial Catalog=Adventureworks2008' —Set the server option for improved DPV performance exec sp_serveroption dbsvrxp, 'lazy schema validation', true
Create View Sales.vw_Sales_Daily as SELECT * FROM Sales.Sales_2001 UNION ALL SELECT * FROM dbsvrxp.Adventureworks2008.Sales.Sales_2002 UNION ALL SELECT * FROM Sales.Sales_2003 UNION ALL SELECT * FROM Sales.Sales_2004
|
The DPV created in Listing 3 contains only one remote table. The example could be further expanded to have each table in the UNION clause on a different remote server. Keep in mind that the DPV CREATE
statement needs to be adjusted when run on the remote server(s). The
tables that are local on one server are now remote on the other server,
and those that are remote can now be local.
If the DPVs are properly
defined, SQL Server 2008 attempts to optimize their performance by
minimizing the amount of data transferred between member servers. The
query processor retrieves the CHECK
constraint definitions from each member table. This allows the query
processor to map the specified search arguments to the appropriate
table(s). The query execution plan then accesses only the necessary
tables and retrieves only the remote rows needed to complete the SQL
statement.
Data can be modified through a
DPV as well. Updatable DPVs, which were introduced in SQL Server 2000,
are still available in SQL Server 2008. Data modifications are performed
against a view, allowing true transparency. The view is accessed as if
it were a base table, and the user or application is unaware of the
actual location of the data. If it is configured properly, SQL Server
determines via the WHERE clause
specified in the update query which partition defined in the view must
be updated rather than updating all tables in the join.
Note
Data can be modified through distributed partitioned views only in the Enterprise and Developer Editions of SQL Server 2008.