Creating a Mining Model
To create the initial mining model, follow these steps:
1. | Right-click
the Mining Structures node in Solution Explorer and choose New Mining
Structure. This starts the Mining Structure Wizard.
| 2. | On the Welcome page, click Next.
| 3. | On the Select the Definition Method page, select From Existing Relational Database or Data Warehouse and then click Next.
| 4. | On
the Select the Data Mining Technique page, select a data mining
technique for your initial mining model, which you must configure when
you create a mining structure. After the mining structure is created,
you can add additional mining models to that structure. Select Microsoft
Clustering from the drop-down list and then click Next.
| 5. | On the Select Data Source View page, select the DSV that you created earlier and then click Next.
| 6. | On
the Specify Table Types page, you will see vCustomerProfitability from
your DSV along with check boxes for specifying the table type. Select
the Case check box next to vCustomerProfitability. Click Next.
SSAS has two types of
tables for use in mining models: case tables and nested tables. Every
mining structure must have a case table. Simply put, the case table
defines the entities to be analyzed for your mining model. In the
customer profitability analysis, the basic unit of analysis is the
customer; therefore, we will define vCustomerProfitability, which has
one row per customer, as the case table. Nested tables are used to
provide additional detail about your cases. Not every mining structure
will have a nested table. For now, our mining structure will have only a
case table.
|
| 7. | On
the Specify The Training Data page, you will see a list of all the
columns in your case table. To the right of each column is a set of
three check boxes for specifying whether each column is a Key, Input, or
Predictable column in your model, as shown in Figure 2.
The Key column uniquely identifies each record in your case
table. Input columns form the basis on which data pattern discoveries
are made, whether they be descriptive or predictive
patterns. For example, the clustering algorithm creates clusters based
on the values of the input columns. The Decision Trees algorithm decides
tree splits based on how well the values of an input column predict a
particular outcome.
From a technical
standpoint, marking a column as predictable means that it can be
selected from the model in a DMX prediction query. This is true
of the predictable columns in any mining model, even those that do not
have prediction as a goal. This definition doesn’t really convey what
predictable columns are for. For a common-sense definition, we need to
distinguish between predictive and descriptive mining models.
Predictive mining
models are designed to forecast (predict) the value of their predictable
columns. For descriptive mining models, predicted columns have a
technical usage, and that usage varies. In a clustering model, input
columns are used to determine the clusters. Selecting only the Predict
check box for a column means that the column will not be used to
determine the clusters; however, including the column as a predictable
column in the model allows you to view its distribution within each
cluster and compare its distribution across clusters after the model has
been processed.
|
| 8. | Back
on the Specify the Training Data page, select CustomerKey as the Key
column of your case table and select the Input check box for the
following columns: Age, CommuteDistance, EnglishEducation, Gender,
HasKidsAtHome, IncomeGroup, IsCarOwner, IsHomeOwner, IsNewCustomer,
MaritalStatus, NumProdGroup, RecencyGroup, and Region.
| 9. | Select the Predict check box for the ProfitCategory column, and then click Next.
| 10. | On the Specify Columns’ Data Type and Content page, click Detect.
This forces SSAS to determine whether your input and predictable
columns are discrete (categorical) or continuous. You can also modify
the content and data type by clicking in the cell and selecting the type
from the drop-down list that appears, as shown in Figure 3.
| 11. | Verify
that the content type is continuous for the Age column, key for the
CustomerKey column, and discrete for all other columns. Click Next.
| 12. | Name the mining structure CustomerProfitCategory and the mining model CustomerProfitCategory_CL and then click Finish.
This brings you to the Mining Structure tab of the data mining structure designer, as shown in Figure 4.
Figure 4. The data mining structure designer’s Mining Structure tab shows the columns in the mining structure.
The left pane of
the tab contains a treeview display of our mining structure. From here,
you can add and delete columns and nested tables by using the tab’s
toolbar or by right-clicking anywhere in the treeview pane (on the left)
and selecting options from the context menu. The Data Source View pane
(on the right) allows you to explore and edit the DSV. |
|