DATABASE

Programming Microsoft SQL Server 2005: Using the Data Mining Wizard and Data Mining Designer (part 6) - Validating and Comparing Mining Models

1/14/2011 5:40:51 PM

Validating and Comparing Mining Models

We have processed the mining models and analyzed their predictive contents, which, though similar, are not identical. Although it is easy to conclude that being a long-time customer is an important determinant of customer profitability, the models differ somewhat in their conclusions about which other customer characteristics or buying behaviors are important indicators of profitability. Given these differences, how can you determine which of these secondary predictions are correct? You can do so by using the data mining structure designer’s built-in “lift charts” and classification matrix facilities, which make it easy to validate the relative accuracy of various mining models in your mining structure and compare their predictive ability.

Click the Mining Accuracy Chart tab. It has three subtabs—Column Mapping, Lift Chart, and Classification Matrix—which you can use to compare the accuracy of your models based on supplied test data.

We need to tell SSAS how to map the columns from the test set (vCustomerProfitability in our example) to the mining models. The correct mappings are shown in Figure 18.
Figure 18. The Column Mapping tab of the Microsoft Mining Accuracy Chart designer

The Column Mapping tab has three sections. In the top section, you specify the table that has the test data set and how the columns of the test data set will map to the columns of your mining models. In the middle section, you can specify filters, if any, for your test data. This is useful if you want to test the accuracy of your mining models for a particular subset of data. Lastly, you specify the models, predictable columns, and predictable column values from which lift charts and classification matrices will be created.

Starting at the top section of the tab, verify that the CustomerProfitCategory mining structure is selected in the Mining Structure mini-window on the left side. If it is not, click the Select Structure button in the Mining Structure mini-window, select the CustomerProfitCategory mining structure, and click OK. In the Select Input Table(s) mini-window farther to the right, click the Select Case Table button to specify the test set. In the resulting Select Table dialog box, select vCustomerProfitability and click OK.

Columns in the input table will be mapped to columns in the mining structure that have the same name. To fine-tune these mappings, right-click anywhere on the Column Mapping tab and choose Modify Connections (or choose Modify Connections from the Mining Model menu, which is available only when the Column Mapping tab is active and when you have clicked it). This brings up the Modify Connections dialog box, which allows editing of the mappings between the mining structure and the input table. Click a cell below the Table Column header in the dialog box to reveal a drop-down list of all the columns in the input table. You can change a default mapping, if necessary, by clicking a cell and selecting the desired column from the drop-down list. The proper settings for our mining structure and test data table are (partially) shown in Figure 19. After you verify that the mappings are correct, click OK.

Figure 19. The Modify Mapping dialog box, where you map columns from the test data set to mining model columns

More Info

The Column Mapping tab’s input filtering grid (in the center pane, below the column mappings) allows you to create a WHERE clause for your input table. This is useful if you are interested in testing the mining model for only a particular subset of the test data. For example, to test the model for customers in North America only, you can select vCustomerProfitability as the Source, select Region as the Field, and type = ‘North America’ as the Criteria/Argument.


You can create two types of lift charts on the Mining Accuracy Chart tab, one of which allows you to specify the state of the predictable column. (The other type of lift chart does not allow you to do this.) If you specify the state of the predictable column, the lift chart shows how well each model predicts that particular state. If you do not specify the state of the predictable column, the chart shows the relative accuracy of the models in predicting the column value overall.

By default, all mining models in the mining structure are selected, and no state is specified for the predictable column. Click any cell in the Predict Value column of the grid in the bottom pane of the Column Mapping tab, select the High option from the drop-down list that appears, and then click the Lift Chart tab. After the chart has finished loading, make sure the Lift Chart option is selected from the Chart Type drop-down list. The result, which compares how well each mining model predicts high-profit customers, is shown in Figure 20.

Figure 20. A lift chart comparing models of customer profitability

The x-axis of the lift chart shows the percentage of the overall population, and the y-axis shows the percentage of the population that is high profit. Recall that we defined high-profit customers as those in the top quartile of customers. In an ideal model, in which you can predict perfectly who the high-profit customers are, you need to look at only 25 percent of the overall population to identify 100 percent of the high-profit customers. The ideal model is represented by the salmon-colored line in the figure. Note how it hits the top of the chart at the 25-percent mark along the x-axis. The cornflower blue line in the figure represents random guessing. By randomly guessing, you would need to look at 100 percent of the overall population to identify 100 percent of the high-profit customers. The mining models are an improvement over random guessing; the decision tree model (represented by the mustard-colored line) is the most accurate, followed by the Naïve Bayes model (represented by the light blue line). The colors of lines may differ in your version of the graph, but the results will be the same.

Other  
 
Video
Top 10
SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
The latest Audi TT : New angles for TT
Era of million-dollar luxury cars
Game Review : Hearthstone - Blackrock Mountain
Game Review : Battlefield Hardline
Google Chromecast
Keyboards for Apple iPad Air 2 (part 3) - Logitech Ultrathin Keyboard Cover for iPad Air 2
Keyboards for Apple iPad Air 2 (part 2) - Zagg Slim Book for iPad Air 2
Keyboards for Apple iPad Air 2 (part 1) - Belkin Qode Ultimate Pro Keyboard Case for iPad Air 2
Michael Kors Designs Stylish Tech Products for Women
REVIEW
- First look: Apple Watch

- 3 Tips for Maintaining Your Cell Phone Battery (part 1)

- 3 Tips for Maintaining Your Cell Phone Battery (part 2)
Popular Tags
Video Tutorail Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Exchange Server Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe Photoshop CorelDRAW X5 CorelDraw 10 windows Phone 7 windows Phone 8 Iphone