Cubes are great, and having
all that BI data in one place is convenient, but sometimes parts of your
cube will have different demands placed on them than other parts.
Analysis Services partitions allow you to slice your cube physically and
use specific storage and aggregation settings that are appropriate to
each subset of your cube’s data and the unique demands placed on it.
For
example, if users rarely need data older than a certain date, you might
store that data on a slower (cheaper) disk, process it much less
frequently, and perhaps precalculate many fewer aggregations than you
would for recent data. You might also or instead want to implement such
physical separation of your cube’s storage along product lines,
geography, or any combination of dimensional criteria. Partitions allow
you to do just that.
Analysis
Services 2005 partitions allow you to define their content (or
“source”) by specifying a fact table that the partition should be based
on or a SQL query that selects only a subset of the rows from a fact
table. You can then configure various aspects of the partition’s
processing and storage as well as its aggregation design.
Every cube consists of at
least one partition (or, to be more precise, at least one partition per
measure group), so you need to be familiar with partition configuration
even if you don’t want to “partition” your cube (in the ordinary sense
of the word). In fact, we will concentrate on a single partition
scenario using our Northwind Sales cube for the remainder of this section.
Editing and Creating Partitions
If you click the Partitions tab of the cube designer, you will find that a partition for the Sales
cube has already been defined. The existing partition is built on our
fact table, tblFact, without any filtering query defined for row
selection. Storage and processing locations are set to the Analysis
Services defaults; you can modify them in the Partition Type Selection
dialog box (see Figure 1) by selecting the partition’s row in the designer’s grid and then selecting the StorageLocation property in the Properties window and clicking the ellipsis button in its value column.
You can modify partition
content by clicking in the partition’s cell in the Source column of the
designer’s grid and clicking its ellipsis button or selecting the
partition’s Source property in the Properties window and clicking the ellipsis button in its Value column (Figure 2).
Either approach opens the Partition Source dialog box in table binding view (Figure 3).
Select the Query
Binding option from the Binding Type drop-down list to bring up the
query binding view. The table binding view lets you define your
partition based on the contents of an entire fact table, whereas the
query binding view allows you to specify a SQL query that might select
only a subset of rows. When you’re done examining these options, click
the dialog box’s Cancel button because we don’t want to modify our
partition’s source.
More Info
You
might also be interested in exploring the Partition Wizard, which
provides an alternative interface for setting many of the properties
mentioned here—but only for new partitions. You can do so by clicking
the New Partition toolbar button (third from the left), the New
Partition... hyperlink, the Cube/New Partition... main menu option, or
the New Partition... shortcut menu option. You
might not be able to fully explore each wizard page unless you delete
the existing partition, thus freeing up the fact table to be used in a
new partition. If you do this, be sure to close the cube designer after
you delete the existing partition and then discard your changes. This will ensure that the existing partition is not actually deleted from the cube. |
In addition to these partition properties, you can also modify storage settings and aggregation design, as we’ll discuss next.
Aggregation Design
Aggregation design lets you
specify a limit on the number of precalculated aggregations stored in
your cube as a function of disk space or percentage performance gain.
Alternatively, you can start building the aggregations manually and
manually end the process; in this case, the aggregations are limited by
processing time rather than by disk cost or performance gain.
You configure
aggregation settings using the Aggregation Design Wizard, which you can
invoke from the Design Aggregations toolbar button (third from the
right), the Design Aggregations... hyperlink, the Cube/Design
Aggregations... main menu option, or the Design Aggregations...shortcut
menu option. You specify all aggregation design settings on the wizard’s
Set Aggregation Options page, which is the second-to-last page in the
wizard.
Before you reach the Set
Aggregation Options page, you must define the medium for your cube’s
storage and the way it is cached on the Specify Storage And Caching
Options page, which appears immediately after the Welcome page. You can
also set these options in the Partition Storage Settings dialog box (Figure 4).
You
can open the Partition Storage Settings dialog box when a specific
partition is selected by selecting the Storage Settings toolbar button
(on the far right), the Storage Settings... hyperlink, the Cube/Storage
Settings... main menu option, or the Storage Settings... shortcut menu
option. You can also select the partition’s ProactiveCaching property in the Properties window and then click the ellipsis button in its value column.
More Info
You can also modify a partition’s storage settings by editing the individual child properties of the ProactiveCaching
property. However, the Partition Storage Settings dialog box and the
Storage And Caching Options page of the Aggregation Design Wizard
provide a much friendlier UI. |
The fact that
partition storage options are simply configured through a dialog box (or
wizard page) can be misleading. The dialog box and wizard page actually
offer an enormously rich set of options that can take a while to fully
appreciate. The dialog box is also, in effect, a dashboard for
controlling what is arguably the most important feature of Analysis
Services: proactive caching.
Partition Storage Options
In the SQL Server 7.0
OLAP Services and SQL Server 2000 Analysis Services days there were
three types of storage: Multidimensional OLAP (MOLAP), Relational OLAP
(ROLAP), and Hybrid OLAP (HOLAP). With MOLAP (the default), all fact and
aggregation data are stored in the Analysis Services multidimensional
store, which provides the fastest query response times. ROLAP, in which
facts stay in the relational database’s fact table and aggregations are
stored in special relational tables of their own, provides for faster
cube processing times (for obvious reasons) but slower query response
times. HOLAP, a sort of compromise option, keeps the facts data in the
relational database but stores aggregations multidimensionally.
Proactive Caching
MOLAP, ROLAP, and HOLAP
have not gone away in Analysis Services 2005, but their limitations are
mitigated by the proactive caching feature. For example, proactive
caching allows you to use ROLAP as your underlying storage medium (which
ensures faster processing time and thus accommodates real-time, or
near-real-time, OLAP) but instructs Analysis Services to cache your cube
in the MOLAP medium (for better query performance).
You choose how often the
cache is flushed. Depending on your cache settings, the MOLAP cube can
be immediately invalidated when certain changes to the underlying data
warehouse are made. In
this case, queries are fulfilled against the base ROLAP cube; the MOLAP
version is reprocessed in the background, and then it’s swapped back in
when processing is complete. You can also configure things such that
the outdated (“stale”) MOLAP data continues to be used while updated
MOLAP data is being rebuilt and processed. This enhances performance
(because MOLAP storage continues to be used) but decreases freshness of
data.
As you can see in the
Partition Storage Settings dialog box, proactive caching allows for a
“spectrum” of seven standard settings, ranging from Real-Time ROLAP on
the left to pure, non-cached MOLAP on the right. All but the last option
set a degree of caching and automatic processing of your partition. As
you move the slider between the settings, you see new text appear in the
dialog box, below the slider control, with a high-level description of
the current setting. For an even more detailed view of these settings
click the Options button, which brings up the Storage Options dialog box
(Figure 5).
On the General tab of
the Storage Options dialog box you can view the specific cache update
and latency intervals. The Notifications tab lets you specify particular
tables or SQL queries to monitor for data changes. You get a huge
amount of granular control in this dialog box. Take your time getting
acquainted with it. These tuning options let you strike the perfect
balance between performance, freshness of data, and real-time
availability of data. If you get things right here, you will be a hero
to your entire organization—no exaggeration.
If you
modify any of the preconfigured settings in the Storage Options dialog
box, you automatically shift from using one of the seven standard
settings to using a custom setting (unless you happen to configure the
individual settings to match exactly one of the other six standard
settings). Try it yourself and see: Change one of the settings slightly,
and then click OK. When you
return to the Partition Storage Settings dialog box, you will see that
the Custom Setting option button, rather than the Standard Setting
option button, is selected and the slider control is disabled.
Note
You
can also configure storage settings at the cube, database dimension,
and measure group level rather than at the partition level. You can edit
cube-level or measure-group-level settings in the cube designer: Just
select a cube or measure group node on a tab that has one (for example,
the two root nodes of the Measures tree view in the Cube Structure tab)
and click the ellipsis button in the Value column of the ProactiveCaching
property in the Properties window. You can similarly configure
dimension storage options through the Properties window, but in the
dimension designer rather than in the cube designer. |
Additional Features and Tips
Our coverage of
partitions, proactive caching, and aggregation design would not be
complete without mentioning a few miscellaneous points and features that
are worthy of your further exploration, as time permits.
First, a word on
writeback (write-enabled) measure groups and dimensions. Analysis
Services permits users to modify their view of the contents of measure
groups by storing those modifications in special writeback tables in the
relational database. Choose the Cube/Writeback Settings... main menu
option or the Writeback Settings... shortcut menu option to specify the
data source and table for writeback storage. Only measure groups that do
not contain special aggregates (that is, aggregates other than SUM) can be writeback-enabled. Our measure group, because it contains the non-sum-based Avg Total Sales measure, is not writeback-enabled; however, the Sales Targets measure group of the Analysis Services sample Adventure Works cube is a good example of such a partition.
You can also
write-enable dimensions, thus allowing client applications to update
their contents, by simply setting the dimension’s WriteEnabled property to True through the Properties window in the dimension designer.
And
one last tip: In Management Studio, you can create partitions and
change storage/writeback settings at the dimension, partition, and cube
level. Management Studio’s property sheets let you modify these settings
so you don’t have to use Visual Studio to change them after deploying
your cube.