DATABASE

Programming Microsoft SQL Server 2005: Using the Data Mining Wizard and Data Mining Designer (part 2) - Creating a Mining Model

1/14/2011 5:31:14 PM

Creating a Mining Model

To create the initial mining model, follow these steps:

1.
Right-click the Mining Structures node in Solution Explorer and choose New Mining Structure. This starts the Mining Structure Wizard.

2.
On the Welcome page, click Next.

3.
On the Select the Definition Method page, select From Existing Relational Database or Data Warehouse and then click Next.

4.
On the Select the Data Mining Technique page, select a data mining technique for your initial mining model, which you must configure when you create a mining structure. After the mining structure is created, you can add additional mining models to that structure. Select Microsoft Clustering from the drop-down list and then click Next.

5.
On the Select Data Source View page, select the DSV that you created earlier and then click Next.

6.
On the Specify Table Types page, you will see vCustomerProfitability from your DSV along with check boxes for specifying the table type. Select the Case check box next to vCustomerProfitability. Click Next.

Tables for Mining Models

SSAS has two types of tables for use in mining models: case tables and nested tables. Every mining structure must have a case table. Simply put, the case table defines the entities to be analyzed for your mining model. In the customer profitability analysis, the basic unit of analysis is the customer; therefore, we will define vCustomerProfitability, which has one row per customer, as the case table. Nested tables are used to provide additional detail about your cases. Not every mining structure will have a nested table. For now, our mining structure will have only a case table.


7.
On the Specify The Training Data page, you will see a list of all the columns in your case table. To the right of each column is a set of three check boxes for specifying whether each column is a Key, Input, or Predictable column in your model, as shown in Figure 2.

Figure 2. The Data Mining Wizard’s Specify The Training Data page


The Key column uniquely identifies each record in your case table. Input columns form the basis on which data pattern discoveries are made, whether they be descriptive or predictive patterns. For example, the clustering algorithm creates clusters based on the values of the input columns. The Decision Trees algorithm decides tree splits based on how well the values of an input column predict a particular outcome.

Predictive and Descriptive Mining Models

From a technical standpoint, marking a column as predictable means that it can be selected from the model in a DMX prediction query. This is true of the predictable columns in any mining model, even those that do not have prediction as a goal. This definition doesn’t really convey what predictable columns are for. For a common-sense definition, we need to distinguish between predictive and descriptive mining models.

Predictive mining models are designed to forecast (predict) the value of their predictable columns. For descriptive mining models, predicted columns have a technical usage, and that usage varies. In a clustering model, input columns are used to determine the clusters. Selecting only the Predict check box for a column means that the column will not be used to determine the clusters; however, including the column as a predictable column in the model allows you to view its distribution within each cluster and compare its distribution across clusters after the model has been processed.


8.
Back on the Specify the Training Data page, select CustomerKey as the Key column of your case table and select the Input check box for the following columns: Age, CommuteDistance, EnglishEducation, Gender, HasKidsAtHome, IncomeGroup, IsCarOwner, IsHomeOwner, IsNewCustomer, MaritalStatus, NumProdGroup, RecencyGroup, and Region.

9.
Select the Predict check box for the ProfitCategory column, and then click Next.

10.
On the Specify Columns’ Data Type and Content page, click Detect.

This forces SSAS to determine whether your input and predictable columns are discrete (categorical) or continuous. You can also modify the content and data type by clicking in the cell and selecting the type from the drop-down list that appears, as shown in Figure 3.

Figure 3. Content and data types of columns are specified at the mining structure level in the Data Mining Wizard.


11.
Verify that the content type is continuous for the Age column, key for the CustomerKey column, and discrete for all other columns. Click Next.

12.
Name the mining structure CustomerProfitCategory and the mining model CustomerProfitCategory_CL and then click Finish.

This brings you to the Mining Structure tab of the data mining structure designer, as shown in Figure 4.

                                       Figure 4. The data mining structure designer’s Mining Structure tab shows the columns in the mining structure.

The left pane of the tab contains a treeview display of our mining structure. From here, you can add and delete columns and nested tables by using the tab’s toolbar or by right-clicking anywhere in the treeview pane (on the left) and selecting options from the context menu. The Data Source View pane (on the right) allows you to explore and edit the DSV.
Other  
  •  Microsoft SQL Server 2005 : Report Management
  •  Microsoft SQL Server 2005 : Report Definition and Design (part 5) - Report Builder
  •  Defensive Database Programming with SQL Server : Using TRY...CATCH blocks to Handle Errors
  •  Microsoft SQL Server 2005 : Report Definition and Design (part 4)
  •  Microsoft SQL Server 2005 : Report Definition and Design (part 3)
  •  Microsoft SQL Server 2005 : Report Definition and Design (part 2) - Business Intelligence Development Studio
  •  The SQL Programming Language : Creating Tables and Entering Data
  •  Microsoft SQL Server 2005 : Report Definition and Design (part 1) - Data Sources
  •  The SQL Programming Language : Complex Queries and Join Queries (part 4)
  •  The SQL Programming Language : Complex Queries and Join Queries (part 3)
  •  The SQL Programming Language : Complex Queries and Join Queries (part 2)
  •  The SQL Programming Language : Complex Queries and Join Queries (part 1)
  •  SQL Server 2008 : Implementing Objects - Viewing and Modifying Data
  •  SQL Server 2008 : Implementing Objects - Understanding DDL and DML Language Elements
  •  SQL Server : Reporting Services - Report Access and Delivery
  •  SQL Server : Reporting Services - Report Server Architecture
  •  SQL Programming Language : Ordering, Calculating, and Grouping in Queries
  •  Automating Blind SQL Injection Exploitation
  •  Blind SQL Injection Exploitation : Using Alternative Channels
  •  Blind SQL Injection Exploitation : Using Response-Based Techniques
  •  
    Most View
    SQL Server 2008 : Transact-SQL Programming - Common Table Expressions
    EA4500 – Cheap Router With Wavebands
    Windows Server 2008 R2 networking : Overview of Windows Server 2008 R2 Networking
    Palicomp Alpha Inferno PC - Super-Fast PC
    Surviving Changes to the Definition of a Primary or Unique Key
    AOC MyStage E2343Fi 23" LED Monitor
    MySQL Server Monitoring (part 1) - SQL Commands
    PC Doctor 7 - Free Tune-Up And Security Tool
    WAP and Mobile HTML Security : Application Attacks on Mobile HTML Sites
    Sennheiser PC 323D G4ME – Reasonable Headphone
    Top 10
    Windows Phone 8 In-Depth Review (Part 6)
    Windows Phone 8 In-Depth Review (Part 5)
    Windows Phone 8 In-Depth Review (Part 4)
    Windows Phone 8 In-Depth Review (Part 3)
    Windows Phone 8 In-Depth Review (Part 2)
    Windows Phone 8 In-Depth Review (Part 1)
    Xiaomi Phone 2 - High-End Specifications In A Surprisingly Cheap Package (Part 5)
    Xiaomi Phone 2 - High-End Specifications In A Surprisingly Cheap Package (Part 4)
    Xiaomi Phone 2 - High-End Specifications In A Surprisingly Cheap Package (Part 3)
    Xiaomi Phone 2 - High-End Specifications In A Surprisingly Cheap Package (Part 2)