SSAS data mining is no less programmable than OLAP,
and although SSAS data mining can sometimes be more straightforward
than OLAP development, for the most part the two are comparable in
complexity.
Then again, data mining development is
different from OLAP programming. To start with, data mining queries are
typically not drilldown-oriented, making Excel PivotTables far less
compelling as a front-end development tool. On the other hand, the model
content browsers available in Visual Studio and SQL Server Management
Studio are available to developers as Windows Forms controls, and
building on that visualization programmability, the data mining team at
Microsoft has also included sample code that implements a few of those
browsers as ASP.NET server controls. Browsing models from within your
applications using these controls is important and merits discussion.
Whether it be ADO MD.NET,
AMO, or XMLA used programmatically or from Management Studio, virtually
everything you’ve learned in the context of OLAP cubes works with SSAS
data mining models. This is true for both client-side development and
managed code running on the server. In fact, you can easily create
stored procedures that execute DMX prediction queries and return result
sets to the client, and you can do so without using the AdomdClient library and without granting the assembly Unrestricted permissions.
We will provide a few applied examples of using model content browser controls and AdomdClient
code in both Windows Forms and ASP.NET 2.0 environments. We’ll also
show you how easy it is to create a server-side stored procedure that
executes a DMX prediction query and returns the results, and an XMLA
query that does likewise. We’ll finish up by showing you how to take
advantage of the direct support for data mining prediction queries
provided by SQL Server Reporting Services.
Data Mining and API Programming
As we just said, ADO MD.NET provides full support for data mining. This
is true on the metadata side as well as the query side. For example, the
AdomdConnection object has both a MiningStructures and a MiningModels collection. Each member of the MiningStructures collection has its own MiningModels collection as well. The MiningModel object also has a Parent property pointing back to the MiningStructure
object that contains it. This flexibility in the ADO MD.NET object
model allows your code to browse the models in a database as a flat
database-wide collection or through the hierarchical collection of
structures and models. The MiningModel object also contains an Algorithm property and an Attributes
collection, allowing you to determine programmatically the input and
predictable columns in a model and the algorithmic context in which they
are used.
Using the WinForms Model Content Browser Controls
One really good
application of all of this metadata is to provide an interface where
users can select a model and display its contents in the appropriate
content browser control, in effect providing much of the same
functionality that SQL Server Management Studio provides via its Object
Explorer window and the Browse shortcut menu command, but without
requiring users to install and use Management Studio itself.
You can get to the content
browser controls in a couple of ways: if you have Management Studio
installed on your machine, the controls are already there—you simply set
a reference to the Microsoft.AnalysisServices.Viewers.DLL assembly,
which by default is installed in the C:\Program Files\Microsoft Visual
Studio 8\Common7\IDE\PrivateAssemblies folder. If your development
machine does not have Management Studio installed, you can download the
controls from the SQL Server 2005 “feature pack” page at microsoft.com.
More Info
At the time of this book’s publication, the SQL Server 2005 feature pack page was located at http://www.microsoft.com/downloads/details.aspx?familyid=d09c1d60-a13c-4479-9b91-9e8b9d835cdc&displaylang=en.
An Internet shortcut to this URL, which will save you the trouble of
typing it in manually, is provided with the sample code. If the link is
broken when you attempt to navigate to it, try performing a Web search
on “Feature Pack for Microsoft SQL Server 2005” or “Analysis Services
2005 Data Mining Viewer Controls.” Each
component of the feature pack is available as a separately downloadable
file; the content model viewer controls require first downloading and
installing the Analysis Services 9.0 OLE DB Provider, which is also part
of the feature pack. |
After installing the
controls, if necessary, you might want to add them to the Visual Studio
Toolbox. To do so, use the Browse button in the Choose Toolbox Items
dialog box in Visual Studio and select the controls’ assembly. Each of the constituent controls
will be selected, and you can just click OK to add them to the Toolbox.
Using the controls is a snap; simply drag one onto a form and then set its ConnectionString and MiningModelName
properties through the Visual Studio Properties window or in code
(making sure the algorithm of the model you specify matches the viewer
type you selected from the Toolbox). The controls use OLE DB, so make
sure the ConnectionString value you assign includes the clause “Provider=MSOLAP.3;” in addition to Data Source and Initial Catalog clauses. Then all you do is call the LoadViewerData method on the control, passing null as the required parameter value. The following three lines of code show how to do this, assuming that objViewer is an instance of one of the viewer controls, that strConnectionString contains an ADO MD.NET–compliant connection string (specifying Data Source and Initial Catalog) as its value, and that cboMiningModels is a ComboBox control containing a list of mining models on the server, with one having been selected by the user:
objViewer.ConnectionString = "Provider=MSOLAP.3;" + strConnectionString;
objViewer.MiningModelName = cboMiningModels.Text;
objViewer.LoadViewerData(null);
The Chapter20WinClient project includes a multiple-document interface (MDI) child form called frmModelViewer
(file name DMMViewerDemo.cs) that allows the user to specify a server,
database, and model to view and then displays the model’s content in the
appropriate viewer control type. Rather than using a control created at
design time, before it executes the code shown previously, the code in
the form declares objViewer as type Microsoft.AnalysisServices.Viewers.MiningModelViewerControl
(the class from which each viewer control inherits) and then assigns a
new instance of the appropriate algorithm-specific viewer type to that
variable, based on the Algorithm property of the selected MiningModel object. It then adds the object to the form’s Controls collection and sizes, positions, and anchors it within the form.
The UI of the form lets the user click a Show Databases button after specifying a server, the Click event handler for which opens an AdomdConnection to the server and retrieves a Schema DataSet containing all of the catalogs (databases) on the server, which it uses to populate a ComboBox control. Here’s a simplified version of that code, in which txtServer is a TextBox control containing the server name, and cboCatalogs is the target ComboBox control:
AdomdConnection conn = new AdomdConnection("Data Source=" +
txtServer.Text);
conn.Open();
foreach (DataRow dr in conn.GetSchemaDataSet(AdomdSchemaGuid.Catalogs,
null).Tables[0].Rows)
{
cboCatalogs.Items.Add((string)dr[0]);
}
conn.Close();
Notice that only a Data Source clause, and not an Initial Catalog clause, is supplied in the connection string. We need server-level information only, so this is sufficient.
When the user makes a selection from cboCatalogs, the code fetches a list of, and populates a ComboBox control with, names of all the models belonging to the selected database. Here’s a simplified version of that code (with cboMiningModels as the target ComboBox):
strConnectionString = "Data Source=" + txtServer.Text + ";Initial Catalog="
+ cboCatalogs.Text;
AdomdConnection conn = new AdomdConnection(strConnectionString);
conn.Open();
foreach (MiningModel mm in conn.MiningModels)
{
cboMiningModels.Items.Add(mm.Name);
}
conn.Close();
As soon as a model is selected from cboMiningModels, the code that instantiates a viewer control and connects it to a specific model can be executed.
Executing Prediction Queries with ADO MD.NET
The
programming techniques we have explored so far are fairly
straightforward, but executing prediction queries is even easier. You can use the ADO MD.NET AdomdCommand and AdomdDataAdapter objects to execute MDX queries and return flattened result sets in the form of an AdomdDataReader or a DataSet/DataTable.
Similarly, you can use them to execute DMX prediction queries and
return their result sets in tabular form. The following code, a
simplified version of the code in the Chapter20WinClient project’s form frmDMXAdHoc (file name DMXAdHoc.cs), executes a DMX query, retrieves the results as a DataTable, and then binds the DataTable to dgvResults (a DataGridView control):
AdomdConnection conn = new AdomdConnection("Data Source=" + txtServer.Text
+ ";Initial Catalog=" + cboCatalogs.Text);
AdomdCommand cmd = new AdomdCommand(txtQuery.Text, conn);
AdomdDataAdapter da = new AdomdDataAdapter(cmd);
DataTable dt = new DataTable();
da.Fill(dt);
dgvResults.DataSource = dt;
In the code, txtServer and cboCatalogs are equivalent to their namesakes in the form frmModelViewer (discussed previously), and txtQuery is a multiline TextBox control containing the DMX query.
Note that frmDMXAdHoc
is functionally equivalent to Management Studio’s DMX window. This
means that almost all of the DMX prediction queries in the sample code’s
Management Studio project can be run from this form (and even the
comments can be left in the query text). One exception is that the code
will not properly handle result sets with nested columns (such as the
one in Cluster5ProbabilityDist.dmx); be sure to use the FLATTENED keyword to avoid this problem (as was done in Cluster5ProbabilityDistFlattened.dmx).
Model Content Queries
Other DMX query types can be handled by the ADO MD code used in the viewer in frmDMXAdHoc.
For example, each of the DML queries in the sample code’s Management
Studio project can be executed in the form. However, none of these
returns any data, and it would be more reasonable to run them using the AdomdCommand object’s ExecuteNonQuery
method. Finally, model content queries (which we briefly discussed
earlier in the context of DMX templates) can be executed as well. For
example, the following query returns textual information about the
CustomerProfitCategory_CL clustering model:
SELECT NODE_CAPTION, NODE_DESCRIPTION, NODE_PROBABILITY
FROM [CustomerProfitCategory_CL].CONTENT
This query can be run from a DMX Query window in Management Studio or from frmDMXAdHoc
in Chapter20WinClient. (Try it and see.) The content data in text form
is much less elegant than the visualization of that data, but its
programmatic use is intriguing. Your application could, for example,
query the model in advance, learn of certain patterns by parsing the
NODE_DESCRPTION column, and then execute predictive business logic upon
entry of new data.
ADO MD.NET and ASP.NET
Most of what we’ve covered
so far in the context of Windows Forms development can be implemented
similarly in ASP.NET. The Web site Chapter20WebClient, which is included
with the sample code’s Visual Studio solution, shows an example of how
to code an ASP.NET equivalent of Chapter20WinClient. The model content
viewers must be handled in a different way from their Windows Forms
counterparts (as we will discuss shortly). However, the execution of DMX
queries is handled in much the same way as shown previously.
For example, you can populate a GridView control called gvResults with the non-hierarchical results of a DMX query, as follows:
AdomdConnection conn = new AdomdConnection("Data Source=" + txtServer.Text
+ ";Initial Catalog=" + ddlCatalogs.Text);
AdomdCommand cmd = new AdomdCommand(txtQuery.Text, conn);
AdomdDataAdapter da = new AdomdDataAdapter(cmd);
DataTable dt = new DataTable();
da.Fill(dt);
gvResults.DataSource = dt;
gvResults.DataBind();
In this snippet (which is a simplified version of the code on the DMXAdHoc.aspx page in the Chapter20WebClient application), ddlCatalogs is an ASP.NET DropDownList control containing a list of the databases on the server whose name is specified in the TextBox control txtServer. This code is strikingly similar to its Windows Forms equivalent and is just as versatile at handling DMX queries.
Using the Data Mining Web Controls
It’s good to know that DMX
querying can be handled much the same way in an ASP.NET application and
in Windows Forms, but what about the visualization of model content? The
content viewer controls are designed for the Windows Forms environment,
so we can’t use them in an ASP.NET application. However, Microsoft
provides sample .NET code that implements dynamic Hypertext Markup
Language (DHTML) versions of the Cluster, Naïve Bayes, and Decision
Trees viewers as ASP.NET server controls. Specifically, the
Cluster/Attribute Characteristics and Cluster/Attribute Discrimination
tabs of the Cluster and Naïve Bayes viewers and the Decision Tree tab of
the Tree Viewer are implemented. This means that only certain models
can be browsed and only certain views of those models are possible.
Although
they provide only a subset of the functionality supplied by the Windows
Forms controls, these ASP.NET components are still extremely helpful,
especially given the reach of the Internet and the Web. However, these
controls are provided as sample code only and are therefore a little
harder to use in your applications than their Windows Forms equivalents.
You can obtain the
source code for the controls (in the form of a C# class library project)
by installing the Microsoft SQL Server 2005 Samples. (The setup package
for these samples is available in the Documentation and
Tutorials\Samples submenu of the SQL Server 2005 main menu group on the
Windows Start menu. The installer places the Visual Studio source code
for the controls, by default, in the folder C:\Program Files\Microsoft
SQL Server\90\Samples\Analysis Services\DataMining\Data Mining Web
Controls. If you open the WebControls.sln solution file in that folder
and build the project, you can reference the generated assembly in an
ASP.NET Web site in Visual Studio and add the three viewer controls to
the Visual Studio Toolbox window.
You can use the controls by placing them on a Web form and setting their Server, Database, and Model
properties, each of which is string-typed, and store or return the name
of an SSAS server, database, and mining model whose content you want to
display. As an alternative to assigning values to the Server and DatabaseOleDbConnection object, connect it to a specific SSAS database, and then assign it to the controls’ Connection property. properties, you can create an
The Tree Viewer has a ViewType property that you must set to one of two enumerated constants: Tree or NodeDetails. If you set it to Tree, you must also set the TargetTree property to the name of a predictable column contained in the mining model. If you set it to NodeDetails, you must set the TargetNode property to a specific node in the decision tree. The Naïve Bayes and Cluster viewer controls both feature a ViewerMode property that also must be set to one of two enumerated constants: Discrimination or Characteristics, depending on the functionality of the tab of the Windows Forms viewers you want to duplicate.
Developing Managed Stored Procedures
Applying your OLAP development skills to data mining doesn’t end with ADO MD.NET on the client; you can use the AdomdServer library to execute server-side DMX prediction or content queries and return the results. For example, Listing 1
provides all the code necessary to implement a server-side version of
the prediction query in the LeastLikelyCar.dmx Management Studio script
covered earlier.
Listing 1. Using AdomdServer for DMX queries
using System; using System.Data; using Microsoft.AnalysisServices.AdomdServer;
namespace Chapter20ServerAssembly { public class StoredProcedures { public DataTable GetData() { DataTable dt = new System.Data.DataTable(); dt.Columns.Add("FirstName", typeof(string)); dt.Columns.Add("LastName", typeof(string)); dt.Columns.Add("Phone", typeof(string)); dt.Columns.Add("ProbNotCarOwner", typeof(double));
object[] row = new object[4];
AdomdCommand comContext = new AdomdCommand("SELECT TOP 250 t.FirstName, t.LastName, t.Phone, PredictProbability([Is Car Owner],'N') AS ProbNotCarOwner FROM CarOwner_DT PREDICTION JOIN OPENQUERY([Adventure Works DW], 'SELECT c.FirstName,c.LastName,tc.* FROM vCustomerProfitability tc INNER JOIN dimCustomer c ON tc.CustomerKey = c.CustomerKey') AS t ON t.Gender = CarOwner_DT.Gender AND t.IncomeGroup = CarOwner_dt.[Income Group] AND t.MaritalStatus = CarOwner_dt.[Marital Status] AND t.Region = CarOwner_dt.Region ORDER BY PredictProbability([Is Car Owner],'N') DESC"); try { AdomdDataReader dr = comContext.ExecuteReader(); while (dr.Read()) { row[0] = dr[0]; row[1] = dr[1]; row[2] = dr[2]; row[3] = dr[3]; dt.Rows.Add(row); } dr.Close(); return dt; } catch { return null; } } } }
|
Although it is not possible to execute an MDX query using AdomdServer, it is permissible to execute a DMX query. In the case of this stored procedure, we simply create the AdomdServer.AdomdCommand object comContext, passing the DMX query to its constructor.
We make sure to type the function as returning a DataTable, and we populate a local DataTable variable with the contents of an AdomdDataReader. The latter is obtained by calling the comContext object’s ExecuteReader method.
The code in Listing 1
is taken from the Class Library project Chapter20ServerAssembly in the
sample code’s Visual Studio solution. When deploying the assembly, you
should assign it ExternalAccess permissions; assigning it Unrestricted
permissions is not necessary, and assigning it Safe permissions will
prevent the DMX query from executing and returning data properly.
Once deployed, the stored procedure can be called using the following DMX code:
CALL
Chapter20ServerAssembly.Chapter20ServerAssembly.StoredProcedures.GetData()
This query can be
executed in a SQL Server Management Studio DMX Query window (it is
contained in the Management Studio project script ManagedCode.dmx) or in
the Chapter20WinClient and Chapter20WebClient applications.
Keep in mind that your stored procedures can also execute DDL queries using DMX (by using the AdomdCommand object’s ExecuteNonQuery method) or perform administrative or DDL tasks using AMO. The AMO Database object contains a MiningStructures collection, each member of which contains its own MiningModels collection. The AMO MiningStructure and MiningModel
objects have an array of properties, methods, and child objects that
are useful for performing any number of administrative or DDL tasks
against your mining structures and models.