Nested Tables
Nested tables are
perhaps one of the most confusing features of SSAS data mining. Once you
understand them, however, they will likely become an integral component
of your data mining toolset. Nested tables allow you to include
transactional data in your mining models without changing your data structure.
For example, suppose we want to include data about the types
of products customers bought in our profitability analysis. Adventure
Works sells four categories of products: bikes, clothing, accessories,
and components. What kinds of products do the high-profit customers buy?
Are bike buyers high-profit customers? What about clothing, accessory,
or component buyers? Although we could add four new columns to the case
table to flag the types of products a customer bought, what if Adventure
Works later decided to sell 10 categories of products? Or what if we
want to know whether particular models
of bikes, of which there are 50, predict profitability? Adding distinct
columns to our case table for each of these values would clearly be
inefficient, both in terms of storage for the case table and the ETL
work needed to build it. Nested tables eliminate these inefficiencies,
allowing you to leave intact the transactional data’s existing
structure.
To understand this a
little better in the context of the mining structure and models we’ve
built so far, let’s look at the create statement for the
vCustomerPurchases view shown in Listing 1.
Listing 1. A view with transaction data
CREATE VIEW [dbo].[vCustomerPurchases] AS SELECT f.SalesOrderNumber ,f.SalesOrderLineNumber ,f.CustomerKey ,t.FullDateAlternateKey ,pc.EnglishProductCategoryName AS ProductCategory ,psc.EnglishProductSubcategoryName AS ProductSubcategory ,isnull(p.[ModelName], p.[EnglishProductName]) AS [Model] ,f.OrderQuantity ,Row_Number ( ) OVER ( Partition BY f.CustomerKey ORDER BY t.FullDateAlternateKey, f.SalesOrderLineNumber) AS PurchaseSequence FROM [dbo].[FactInternetSales] AS f INNER JOIN dbo.dimProduct AS p ON f.ProductKey = p.ProductKey INNER JOIN dbo.dimProductSubCategory AS psc ON p.ProductSubCategoryKey = psc.ProductSubcategoryKey INNER JOIN dbo.dimProductCategory AS pc ON psc.ProductCategoryKey = pc.ProductCategoryKey INNER JOIN dbo.dimTime AS t ON f.OrderDateKey = t.TimeKey
|
The view is constructed with a simple SELECT
statement from the fact table with a join to a few dimension tables to
pick up product names and product categories. (We’ll discuss the
PurchaseSequence column, which uses the new T-SQL Row_Number() function, in the next section.)
Table 1
shows customer attributes along with transactional data about customer
purchases. Suppose we want to identify Melvin as an Accessory Buyer and
Jennifer as both a Bike Buyer and an Accessory Buyer in our mining
models without explicitly adding the attributes “Bike Buyer” and
“Accessory Buyer” to our customer profile data (which in our analysis
comes from vCustomerProfitability). Adding the transactional data (in
our analysis, vCustomerPurchases) as a nested table to the mining
structure will do the trick.
Table 1. Nested Data
| | | | | Customer Purchases |
---|
CustKey | Name | Profit | Gender | Income Group | Product Category | Product Name | Seq |
---|
11580 | Melvin | Low | M | Low | Accessories | Sport-100 Helmet, Blue | 1 |
11036 | Jennifer | Low | F | Moderate | Bikes | Mountain-200 Silver, 46 | 1 |
| | | | | Accessories | Sport-100 Helmet, Black | 2 |
11376 | Lance | Low | M | Moderate | Clothing | Half-Finger Gloves, M | 1 |
| | | | | Clothing | Short-Sleeve Classic Jersey, M | 2 |
11136 | Brianna | High | F | Moderate | Bikes | Mountain-200 Black, 38 | 1 |
| | | | | Accessories | Fender Set, Mountain | 2 |
| | | | | Clothing | Classic Vest, M | 3 |
To use vCustomerPurchases as nested table in our mining structure, we must first add it to our DSV:
1. | Double-click AdventureWorks DW.dsv in Solution Explorer to bring it up in the data source view designer.
|
2. | Click
the Add/Remove Objects toolbar button (first from the left),
right-click the designer window and choose Add/Remove Tables, or choose
Add/Remove Tables from the Data Source View menu. Any of these three
actions opens the Add/Remove Tables dialog box.
|
3. | Locate vCustomerPurchases in the Available Objects list, move it to the Included Objects list, and then click OK.
|
4. | To
complete our DSV changes, we must establish the relationship between
vCustomerPurchases and vCustomerProfitability. Click the CustomerKey
column in vCustomerPurchases and drag it onto the CustomerKey column in
vCustomerProfitability.
|
5. | Save your changes. The data source view designer should appear as shown in Figure 21.
|
6. | Back in the Mining Structure designer, click the Mining Structure tab.
|
7. | Locate, but do not click, the Add A Mining Structure Nested Table Column toolbar button (fourth from the left).
Note the term “nested table column.” SSAS considers the nested table to be an additional column in the mining structure—that is, it considers it to be a columntable.
that has a “data type” of
Note In keeping with the SSAS mapping of nested tables to a special “data type” of table, note that in Table 20-1
we portrayed vCustomerPurchases as just another customer attribute
(albeit a hierarchical one), along with Profit, Gender, and Income
Group. We did this because we knew we would use vCustomerPurchases as a
nested table column. |
|
8. | Click
the Add A Mining Structure Nested Table Column toolbar button. This
brings up the Select A Nested Table Key Column dialog box.
|
9. | Determining
the key of a nested table can be counterintuitive initially; your first
instinct might be to select CustomerKey, SalesOrderNumber, or a
combination of SalesOrderNumber and SalesOrderLineItem as the nested
table key. In fact, none of these is an appropriate key for the nested
table.
When you specify a column as the nested table key column, the
distinct values in that column become usable attributes in your mining
model. As such, the correct key for the nested table in this scenario is
ProductCategory, which has the possible values “Bike,” “Clothing,” and
“Accessory.”
Select vCustomerPurchases from the Nested Table list, and then
select ProductCategory from the Source Column list. Click OK.
As you will see after we modify and process the mining models,
selecting ProductCategory as the nested table key column adds attributes
such as vCustomerPurchases(Bike) and vCustomerPurchases(Clothing) to
our mining models.
Notice that the nested table has been added to the mining
structure tree view as a column of the CustomerProfitCategory structure
(appearing with the name “v Customer Purchases”). Unlike other columns
in the mining structure, it has a plus sign next to it.
|
10. | Click
the plus sign to expand the table. Now expand the columns of the nested
table. The mining structure tab will appear as shown in Figure 22.
Figure 22. A mining structure with a nested table column
|
When you add new
columns to an existing mining structure, they will by default be ignored
by existing mining models. Therefore, you must specify the usage of the
new columns in the models where you want to include them:
1. | Click
the Mining Models tab and then click the plus sign next to v Customer
Purchases. You will see a new row for the Product Category child column,
which is also ignored by all models.
|
2. | Click
the cell corresponding to the intersection of the v Customer Purchases
column (the “parent” row) and the CustomerProfitCategory_CL model.
|
3. | Click
the drop-down arrow in the cell to view the entries in the drop-down
list, and select Input. The setting for the Product Category child
column is automatically changed from Ignore to Key.
|
4. | Repeat
the process for each of the other mining models in the structure.
Afterwards, the Mining Models tab should appear as shown in Figure 23.
|
Deploy and process
your mining models. This might take several minutes. When processing is
complete, click the Mining Model Viewer tab.
Click the Cluster
Diagram tab of the CustomerProfitCategory_CL5 model (the model with five
clusters). Using the Shading Variable drop-down list (note the three
variables that correspond to product category), change the shading
variable to v Customer Purchases(Bikes). In the State drop-down list,
the two possible states of the variable are Existing and Missing. If the
Bikes key exists in the nested table, the customer is a bike buyer; if
the Bikes key is missing, the customer is not a bike buyer. Select
Existing.
Take note of Cluster 1
and Cluster 5, which have the highest concentration of bike buyers.
Change the shading variable to Is New Customer, and select the Y state.
Cluster 5 has more new customers than Cluster 1. Change the shading
variable to ProfitCategory, select the High state, and note that the two
clusters with the highest concentration of bike buyers also have the
highest concentration of high-profit customers.
Figure 24
compares the five clusters on the Cluster Profiles tab. When the key of
a nested table column is included in a mining model, the distinct
values found in the key column form the basis of existence attributes
that are used to train the mining model. The value of the key column
represented by the existence attribute is shown in parentheses after the
nested table column name. Cluster 1 and Cluster 5 have a markedly
higher proportion of bike buyers than do the other clusters. The
proportion of accessory and clothing buyers is similar across all
clusters.
Use
the mining model viewers to explore other mining models. Take note of
the input attributes added to the models as a result of adding the
nested table to the models. Close the project and exit Visual Studio.