
Programming Microsoft SQL Server 2005: Using the Data Mining Wizard and Data Mining Designer (part 3) - Editing and Adding Mining Models

1/14/2011 5:33:37 PM

Editing and Adding Mining Models

With your mining structure and first mining model created, you are ready to create additional models within the structure. On the Mining Models tab of the data mining structure designer, you can view the definition of your mining model, as shown in Figure 5. You will see the columns in your mining structure configured according to the choices you previously made for how those columns should be used in your clustering model.

Figure 5. On the Mining Models tab, you can see how the mining structure columns are used in each mining model.

Editing a Mining Model

You can also edit the definition of your mining model from the Mining Models tab. Profit Category is currently specified as a PredictOnly column in the mining model, and we want to change its usage type to Predict. To edit column usage, take these steps:

1.
On the Mining Models tab of the data mining structure designer, click the cell next to Profit Category.

2.
In the drop-down list that appears, change the column usage to Predict, as shown in Figure 6.

Figure 6. Clicking a cell next to a column to change its usage in a mining model

Changing the column usage from PredictOnly to Predict allows customer profit category to be a factor in how the clusters are determined, and not just a descriptive characteristic analyzed after the fact.

Adding a Mining Model

By default, the Microsoft Clustering algorithm creates 10 clusters from your training data. As a thoughtful data analyst, you might be concerned that users will have trouble telling so many clusters apart. You might therefore want to create a second clustering model that segments your customers into only five clusters. To do this, follow these steps:

1.
Right-click anywhere on the Mining Models tab and then choose New Mining Model.

2.
In the New Mining Model dialog box, type CustomerProfitCategory_CL5 in the Model Name text box and then select Microsoft Clustering from the Algorithm Name drop-down list. Click OK. A new mining model is added to the right of the existing model.

3.
To change the number of clusters created from 10 to 5, right-click the CustomerProfitCategory_CL5 column header or any of its column cells and then choose Set Algorithm Parameters.

4.
In the Algorithm Parameters dialog box that appears (Figure 7), you can set algorithm parameters that are specific to the selected mining model.

Figure 7. The Algorithm Parameters dialog box for a Microsoft Clustering model


This can be a bit perplexing at first because the designers and wizards hide much of the complexity of data mining behind their UI. This dialog box, by contrast, cracks open the black box and allows you to fine-tune your model. Click the various parameters in the parameter list and review the descriptions in the Description area below.

5.
When you’re done exploring the parameters, click the cell at the intersection of the CLUSTER_COUNT row and the Value column, type 5, and then click OK.

Adding a Model That Uses a Different Algorithm

We now have two clustering models in our mining structure. To validate the accuracy of one model in a mining structure, it is often useful to create an additional model using a different algorithm. We’ll add a new model using the Decision Trees algorithm. This can be done with surprisingly little effort:

1.
Right-click anywhere on the Mining Models tab and choose New Mining Model.

2.
In the New Mining Model dialog box, name your model CustomerProfitCategory_DT, select the Microsoft Decision Trees algorithm (it should be selected by default), and then click OK.

The CustomerProfitCategory_DT model appears to the right of the existing clustering models with the same columns and usage.

Changing Column Usage

Adding the model is helpful, but it would be even better if we could use it to predict the number of products people buy (in addition to predicting profit category). To change the column usage for NumProductGroup in the CustomerProfitCategory_DT mining model, click the cell corresponding to NumProductGroup under CustomerProfitCategory_DT and select Predict from the drop-down list. By setting the usage of both the ProfitCategory and the NumProductGroup columns to Predict, we are allowing each to be an input in the decision tree of the other. If we had instead set the usage of NumProductGroup to PredictOnly, the number of products purchased would not be a factor in creating the ProfitCategory decision tree; that is, it would not be considered in the splits of the ProfitCategory tree.
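The difference between Predict and PredictOnly can be sketched in a few lines of Python. This is only an illustration of the rule described above, not code that SSAS exposes: the function name tree_inputs and the usage dictionaries are hypothetical, and the Age column is assumed to be an Input here.

```python
# Usage flags as they appear in the Mining Models tab drop-down list.
INPUT, PREDICT, PREDICT_ONLY, IGNORE = "Input", "Predict", "PredictOnly", "Ignore"


def tree_inputs(usage):
    """Map each predictable column to the columns allowed in its splits."""
    # Input and Predict columns can both act as inputs.
    inputs = [col for col, flag in usage.items() if flag in (INPUT, PREDICT)]
    # Predict and PredictOnly columns each get their own tree.
    targets = [col for col, flag in usage.items() if flag in (PREDICT, PREDICT_ONLY)]
    # A column is never an input to its own tree.
    return {t: [c for c in inputs if c != t] for t in targets}


# Hypothetical usage settings, loosely modeled on CustomerProfitCategory_DT.
both_predict = {
    "Age": INPUT,
    "NumProductGroup": PREDICT,
    "ProfitCategory": PREDICT,
}
one_predict_only = dict(both_predict, NumProductGroup=PREDICT_ONLY)

print(tree_inputs(both_predict))
print(tree_inputs(one_predict_only))
```

With both columns set to Predict, each appears in the other's input list. With NumProductGroup set to PredictOnly, it still gets its own tree but drops out of the inputs for ProfitCategory.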

More Info

Splits are the branches at each node of a decision tree. Each node of a decision tree represents a subset of the population. This subset is characterized by the parentage of the node. Splits at each node are determined by identifying the input characteristic by which the distribution of the predicted variable differs most for the subset defined at that node. When the distribution of the predicted variable at a node does not vary significantly by any input characteristic, there are no further splits and the branch ends. A node from which there are no splits is referred to as a leaf.
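To make the idea of choosing a split concrete, here is a minimal Python sketch. It uses entropy-based information gain as a stand-in for "the input by which the distribution of the predicted variable differs most"; the scoring method that the Microsoft Decision Trees algorithm actually applies may differ. The Region column, the sample rows, and the function names are all made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, input_col, target_col):
    """Drop in target entropy after partitioning rows by input_col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[input_col], []).append(row[target_col])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target_col] for row in rows]) - remainder


def best_split(rows, input_cols, target_col, min_gain=1e-9):
    """Return the most informative input column, or None for a leaf."""
    gains = {col: information_gain(rows, col, target_col) for col in input_cols}
    best = max(gains, key=gains.get)
    return best if gains[best] > min_gain else None


# Made-up rows for illustration only; not taken from the book's data.
rows = [
    {"Region": "East", "NumProductGroup": "1", "ProfitCategory": "Low"},
    {"Region": "East", "NumProductGroup": "2+", "ProfitCategory": "High"},
    {"Region": "West", "NumProductGroup": "1", "ProfitCategory": "Low"},
    {"Region": "West", "NumProductGroup": "2+", "ProfitCategory": "High"},
]

print(best_split(rows, ["Region", "NumProductGroup"], "ProfitCategory"))
```

In this sample the profit category is split evenly within each region, so Region offers no gain, while NumProductGroup separates the two categories completely and is chosen. When no input yields a gain above the threshold, best_split returns None, which corresponds to a leaf.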


Mining Models and Data Types

Let’s continue building our mining structure by adding another model, this time using the Naïve Bayes algorithm:

1.
Right-click the CustomerProfitCategory_CL model, and choose New Mining Model.

2.
Name your model CustomerProfitCategory_NB, select the Naïve Bayes algorithm, and click OK.

A message box will appear explaining that the Age column will be ignored because the Naïve Bayes algorithm does not support continuous columns. Click Yes. Notice that the usage for the Age column is set to Ignore, the usage for the NumProductGroup column is set to Input, and the usage for the ProfitCategory column is set to Predict. For this model, we want to predict only ProfitCategory, so no usage changes are necessary.

Important

Because we created the Naïve Bayes model by right-clicking the CustomerProfitCategory_CL model and choosing New Mining Model, the CustomerProfitCategory_CL usage settings were copied to the new model, making ProfitCategory the only predicted column. Had we instead right-clicked the CustomerProfitCategory_DT model and chosen New Mining Model, that model’s usage settings would have been copied, and NumProductGroup, in addition to ProfitCategory, would have been a predicted column.


We mentioned earlier that age is ignored in the Naïve Bayes model because it is a continuous variable. If age is a significant determinant of whether a customer is a high-profit or low-profit customer, the Naïve Bayes model will appear to perform worse than other models where age is included. Therefore, we want to include at least some indication of age in the Naïve Bayes model. To do that, we must add a “discretized” version of the age column to our mining structure and include it in our Naïve Bayes model.
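A rough way to see why a count-based Naïve Bayes model needs discrete inputs is the following Python sketch. It is a generic categorical Naïve Bayes with Laplace smoothing, written for illustration; it is not the SSAS implementation, and the training rows and function names are invented. Each probability is estimated by counting how often a value occurs within a class, which is only meaningful when the input takes a small set of distinct values such as age buckets.

```python
from collections import Counter, defaultdict


def train(rows, input_cols, target_col):
    """Count class frequencies and per-class input value frequencies."""
    class_counts = Counter(row[target_col] for row in rows)
    value_counts = defaultdict(Counter)  # (column, class) -> Counter of values
    values_seen = defaultdict(set)       # column -> distinct values
    for row in rows:
        for col in input_cols:
            value_counts[(col, row[target_col])][row[col]] += 1
            values_seen[col].add(row[col])
    return class_counts, value_counts, values_seen


def predict(model, case):
    """Return the class with the highest Laplace-smoothed posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # prior probability of the class
        for col, value in case.items():
            seen = value_counts[(col, cls)][value]
            score *= (seen + 1) / (n + len(values_seen[col]))
        scores[cls] = score
    return max(scores, key=scores.get)


# Invented training rows; the bucket labels are illustrative.
rows = [
    {"AgeGroup": "Under 30", "ProfitCategory": "Low"},
    {"AgeGroup": "Under 30", "ProfitCategory": "Low"},
    {"AgeGroup": "Under 30", "ProfitCategory": "High"},
    {"AgeGroup": "Age 36 through 45", "ProfitCategory": "High"},
    {"AgeGroup": "Age 36 through 45", "ProfitCategory": "High"},
    {"AgeGroup": "Age 36 through 45", "ProfitCategory": "Low"},
]

model = train(rows, ["AgeGroup"], "ProfitCategory")
print(predict(model, {"AgeGroup": "Under 30"}))
print(predict(model, {"AgeGroup": "Age 36 through 45"}))
```

If the raw Age value were used instead, almost every age would be its own category with only a handful of rows behind it, and the counts would say very little.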

The Mining Models tab supports the deletion of columns from a mining structure, but to add a column to the structure, we need to go back to the Mining Structure tab.

  1. On the Mining Structure tab, right-click the tree view and choose Add A Column.

  2. In the Select a Column dialog box (Figure 8), select the AgeGroup column in the Source Column list and then click OK to add AgeGroup to the mining structure.

    Figure 8. The Select a Column dialog box for adding a column to your mining structure

    Note

    The AgeGroup column in vCustomerProfitability uses a CASE statement to categorize customers into groups such as “Under 30,” “Age 30 through 35,” and “Age 36 through 45.” As an alternative, SSAS offers a column content type called Discretized that buckets a continuous attribute for you.


  3. Return to the Mining Models tab. You will see that AgeGroup appears in the mining structure, although its usage is set to Ignore in all of the defined models. To include it in the Naïve Bayes model, click in the cell corresponding to the AgeGroup column under the Naïve Bayes model and change the usage to Input. Your Mining Models tab should appear as shown in Figure 9.

    Figure 9. A mining structure with several mining models and a “discretized” Age Group column
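The bucketing behind the AgeGroup column, described in the note under step 2, can be approximated in Python as follows. This is a sketch of the idea only: the actual CASE statement in vCustomerProfitability is not shown in this section, so the boundary handling assumes whole-number ages, and the final "Over 45" label is an assumption because the text names only the first three groups.

```python
def age_group(age):
    """Map a whole-number age to a label, in the style of a SQL CASE expression."""
    if age < 30:
        return "Under 30"
    if age <= 35:
        return "Age 30 through 35"
    if age <= 45:
        return "Age 36 through 45"
    # Assumption: the source names only the three groups above.
    return "Over 45"


print([age_group(a) for a in (24, 30, 35, 36, 45, 52)])
```

Like a CASE expression, the conditions are checked in order and the first match wins, so each age lands in exactly one group.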