When
your data is on-premises, you can manage your SQL Server database's
growth by adding more storage capacity. Typically, an on-premises
storage area network is shared across multiple databases and
applications, and it's only a matter of acquiring an extra block of
storage from the company's storage-management team. Even though a cost
is associated with the storage, you still have control over how you
distribute your database growth.
When your data is in SQL Azure,
there is a storage constraint of 10GB per database, and you don't have
control over how the data files are stored or distributed across the
storage area network. Microsoft's argument behind this constraint is
that according to the company's analysis, 90% of the SQL Server
databases in the world are less than 9GB in size.
With this constraint in mind,
how do you architect your database for growth beyond 10GB? The
following are a few strategies I have designed for SQL Azure customers:
Partition data by location, and distribute it across multiple SQL Azure data centers.
Partition data by date into multiple databases.
Partition data by business functions into bucket databases.
Partition data by tenant, with one configuration and one content database per tenant.
Partition data between on-premises and SQL Azure databases.
NOTE
Because of the SQL Azure
size restrictions, all these strategies revolve around creating
multiple SQL Server databases in SQL Azure and partitioning data across
these databases.
In all the
partitioning options, typically a centrally located or replicated
configuration database maintains the references and boundary parameters
of the content databases. The content databases contain the actual
content partitioned by the appropriate boundary condition. These
boundary conditions may be one of more of the following: location,
date, business function, tenant, and premises. Figure 1 illustrates some of these partitioning strategies.
In Figure 1,
the configuration database contains the partition information of the
content databases. The application queries the configuration database
with query parameters and retrieves the list of content databases that
fall within the specified parameters. For example, if your query is for
dates in 2007, then the configuration database sends references to the
2007 database. The application can then connect to the appropriate
database to execute the query.
Because of the two hops
between the application and the databases, there is a performance
impact on the data retrieval from the content databases. The
configuration database isn't expected to change frequently because it
depends on the partitioning parameters, which don't change often.
Therefore, you can cache the configuration data in the application and
synchronize only when it changes, bypassing an additional hop to the
configuration database.