By default, SharePoint stores all the uploaded
documents and files in its content databases. This has always led to
storage, performance, and manageability issues, especially for large
SharePoint deployments in SharePoint 2003 and the early days of
SharePoint 2007. In SharePoint 2007 SP1, Microsoft made its first
attempt to store documents and files out of SQL Server; however, the
implementation was difficult and came with many limitations. In
SharePoint 2010, Microsoft took this to a completely new level by making
use of the powerful storage externalization capabilities of SQL Server
2008 R2. This section explains what BLOBs are, how they are stored in
SharePoint, and how Microsoft made use of the Remote BLOB Storage
technology in SharePoint 2010 to move the storage of large documents and
files from SQL Server to remote stores. At the end of this section,
administrators learn how to install and configure RBS in SharePoint 2010
environments and how to migrate and move data between different stores.
Understanding BLOBs
Most of the values stored
in SQL Server consist of ASCII (American Standard Code for Information
Interchange) characters. A basic explanation of ASCII characters is that
they are the letters, numbers, and symbols found on the keyboard. A
text editor such as Notepad can alter a file holding only ASCII
characters without any consequences. However, data is not limited to
strings and numbers; it is always a common requirement to store a large
amount of binary data in a SQL Server table along with other ASCII data;
Word documents, XML documents, and images are some examples. Binary
files contain ASCII characters, special control characters, and byte
combinations not found on the keyboard. Opening a Microsoft Word
document inside Notepad and modifying it would result in the file being
corrupted and not readable, because Notepad cannot correctly interpret
or create binary bits. BLOBs, then, are binary files that are large, or
Binary Large Objects (BLOB).
SQL Server provides
special data types for dealing with such large volumes of binary data.
These various data types have changed over time.
In SQL 2000, there were
two different families of data type options for this type of data:
binary and image. The binary family included two different data types:
the binary data type and the VARBINARY data type. The VAR in VARBINARY
means that the size is variable rather than fixed, as in the case of the
standard binary data type. However, it still has a maximum length of
8,000 bytes.
The image data type family
was used to store binary large objects that are greater than 8,000
bytes. This data type is still present in newer versions, but it is
deprecated. Microsoft recommends avoiding using these data types in new
development work and recommends modifying applications that currently
use them.
Starting in SQL Server 2005,
Microsoft included the VARBINARY(MAX) data type to the binary data type
family. This variation extends the usual limit of around 8,000 bytes and
allows storage up to 2GB of data.
Later, in SQL Server
2008, Microsoft introduced the FILESTREAM option for a VARBINARY(MAX)
fields. This enables storage, efficient streaming, and integrated
management of large BLOBs in a SQL database by using the underlying NTFS
file system for BLOB storage/streaming, while managing and accessing it
directly within the context of the database.
Instead of being a
completely new data type, FILESTREAM is a storage attribute of the
existing VARBINARY(MAX) data type. FILESTREAM alters how the BLOB data
is stored—in the file system rather than in the SQL Server data files.
Because FILESTREAM is implemented as a VARBINARY(MAX) column and
integrated directly into the database engine, most SQL Server management
tools and functions work without modification for FILESTREAM data.
It is also worth
mentioning that the behavior of the regular VARBINARY(MAX) data type
remains entirely unchanged in SQL Server 2008, including the 2GB size
limit. The addition of the FILESTREAM attribute means a VARBINARY(MAX)
column can essentially be unlimited in size. (In reality, the size is
limited to that of the underlying NTFS volume.)
BLOB Storage in SharePoint
In SharePoint 2003 and the
early days of SharePoint 2007, Microsoft SQL Server stored BLOB data in
its databases as a rule, as illustrated in Figure 1.
As a database’s usage increased, the total size of its BLOB data could
quickly grow larger than the total size of the document metadata and the
other structured data stored in the database. There were no exceptions
for this; content metadata and BLOBs had to go into content databases.
This was not efficient because Microsoft estimates that as much as 80
percent of the data stored in SharePoint content databases is
nonrelational BLOB data, such as Microsoft Office Word documents,
Microsoft Office Excel spreadsheets, and Microsoft Office PowerPoint
presentations. Only 20 percent is relational metadata, and this caused
storage, performance, and manageability issues, especially for large
SharePoint deployments.
In May 2007, Microsoft
released a hotfix for Windows SharePoint Services 3.0 and Microsoft
Office SharePoint Server 2007 that was later rolled into Service Pack 1.
The hotfix exposed an External BLOB Storage API (EBS), which enabled
the storage of BLOBs outside content databases by implementing a set of
interfaces. This seemed revolutionary, but EBS was difficult to
implement because it exposed an unmanaged interface. Accordingly, this
part had to be handled by third parties, or independent software vendors
(ISVs). In addition, EBS had some limitations; for example, it could be
enabled only on the farm level and not on the content database level.
Microsoft heard its customers’
feedback loud and clear and decided to fix the BLOB storage pain
points. In SharePoint 2010, it is now possible to move the storage of
BLOBs from database servers to commodity storage solutions by using
Remote Blob Storage technology, which was introduced in SQL Server 2008.
Introducing Remote BLOB Storage
Remote BLOB Storage (RBS)
is a library API set designed to move storage of large binary data
(BLOBs) from Microsoft SQL Server to external storage solutions. RBS is
incorporated as an add-on feature pack for Microsoft SQL Server 2008 and
Microsoft SQL Server 2008 Express, and later SQL Server 2008 R2 and SQL
Server 2008 R2 Express.
Using RBS, applications
can store large amounts of unstructured data (such as Office documents,
PDF files, or videos) and enjoy both the relational capabilities of SQL
Server and the scalability of a dedicated blob store. Best of all,
developers do not have to write the code to handle the job of tying
together the SQL metadata and the blob data. RBS handles the
transactional consistency completely.
An application stores and
accesses blob data by calling into the RBS client library. ISVs and
storage solution vendors can create their own RBS Provider Library to
enable the use of custom stores with applications written against the
RBS API set. Microsoft has even created a provider named FILESTREAM RBS
provider, which comes with RBS 2008 R2 and can be used for storing BLOBs
on the underlying NTFS file system. The FILESTREAM RBS provider ties
the RBS technology with the FILESTREAM feature introduced in SQL Server
2008.
SharePoint 2007 did not
take advantage of the recent SQL Server features that Microsoft
introduced for unstructured data in SQL Server 2008, such as the
FILESTREAM attribute or RBS technology; instead, SharePoint 2007
provided its own options to enhance the storage efficiency and
manageability of huge data though External BLOB Storage (EBS).
SharePoint 2010 supports RBS
and can leverage the SQL Server FILESTREAM RBS provider, thus providing
cheaper storage and much better performance. Figure 2 illustrates how RBS works with SharePoint 2010.
In SharePoint 2010, the
FILESTREAM RBS provider can also be used to work around the 4GB database
limitation in SQL Server Express. Unlike Windows SharePoint Services
3.0, SharePoint Foundation 2010 does not use the Windows Internal
Database, which had no database size limit. Therefore, when Windows
SharePoint Services 3.0 users are trying to upgrade
their farms, they might fall into a scenario where they have an
internal database larger than 4GB, which would exceed the limitation of
SQL Server Express. At this point, they will be prompted to download the
FILESTREAM RBS provider to use the underlying file system to
externalize their BLOBs.
RBS Versus EBS
As mentioned, EBS was
an earlier attempt by Microsoft in SharePoint 2007 SP1 to help customers
externalize their BLOBs. However, EBS was hard to implement and had
some limitations. Microsoft introduced EBS as an immediate help, and it
was designed in a way that it is an evolutionary approach in that
administrators can move to RBS later. EBS will continue to be supported
for SharePoint 2010, but it is on the deprecation list, which means its
support will end in a future release of SharePoint. Microsoft recommends
using RBS in SharePoint 2010 not only because EBS is on the deprecation
list but also because RBS is more powerful and maintainable. Table 1 illustrates some of the advantages of RBS over EBS.
Table 1. Comparison of Legacy Versus RBS
Feature | RBS | EBS |
---|
BLOB store scope | Can be configured on the content database level. (Each content database can have its own BLOB store.) | Can be configured only on the farm level |
Number of providers | Multiple. | Only one |
Interface | Managed. | Unmanaged |
Migrating BLOBs from SQL Server stores to BLOB stores and vice versa | Windows PowerShell. | Custom |
SharePoint interface | SharePoint 2010 ships with many Windows PowerShell commandlets that can be used to manage RBS installation and configuration. | None |