Externalizing BLOB Storage in SharePoint 2010 (part 1)

2/18/2011 2:30:27 PM
By default, SharePoint stores all the uploaded documents and files in its content databases. This has always led to storage, performance, and manageability issues, especially for large SharePoint deployments in SharePoint 2003 and the early days of SharePoint 2007. In SharePoint 2007 SP1, Microsoft made its first attempt to store documents and files out of SQL Server; however, the implementation was difficult and came with many limitations. In SharePoint 2010, Microsoft took this to a completely new level by making use of the powerful storage externalization capabilities of SQL Server 2008 R2. This section explains what BLOBs are, how they are stored in SharePoint, and how Microsoft made use of the Remote BLOB Storage technology in SharePoint 2010 to move the storage of large documents and files from SQL Server to remote stores. At the end of this section, administrators learn how to install and configure RBS in SharePoint 2010 environments and how to migrate and move data between different stores.

Understanding BLOBs

Most of the values stored in SQL Server consist of ASCII (American Standard Code for Information Interchange) characters. A basic explanation of ASCII characters is that they are the letters, numbers, and symbols found on the keyboard. A text editor such as Notepad can alter a file holding only ASCII characters without any consequences. However, data is not limited to strings and numbers; it is always a common requirement to store a large amount of binary data in a SQL Server table along with other ASCII data; Word documents, XML documents, and images are some examples. Binary files contain ASCII characters, special control characters, and byte combinations not found on the keyboard. Opening a Microsoft Word document inside Notepad and modifying it would result in the file being corrupted and not readable, because Notepad cannot correctly interpret or create binary bits. BLOBs, then, are binary files that are large, or Binary Large Objects (BLOB).

SQL Server provides special data types for dealing with such large volumes of binary data. These various data types have changed over time.

In SQL 2000, there were two different families of data type options for this type of data: binary and image. The binary family included two different data types: the binary data type and the VARBINARY data type. The VAR in VARBINARY means that the size is variable rather than fixed, as in the case of the standard binary data type. However, it still has a maximum length of 8,000 bytes.

The image data type family was used to store binary large objects that are greater than 8,000 bytes. This data type is still present in newer versions, but it is deprecated. Microsoft recommends avoiding using these data types in new development work and recommends modifying applications that currently use them.

Starting in SQL Server 2005, Microsoft included the VARBINARY(MAX) data type to the binary data type family. This variation extends the usual limit of around 8,000 bytes and allows storage up to 2GB of data.

Later, in SQL Server 2008, Microsoft introduced the FILESTREAM option for a VARBINARY(MAX) fields. This enables storage, efficient streaming, and integrated management of large BLOBs in a SQL database by using the underlying NTFS file system for BLOB storage/streaming, while managing and accessing it directly within the context of the database.

Instead of being a completely new data type, FILESTREAM is a storage attribute of the existing VARBINARY(MAX) data type. FILESTREAM alters how the BLOB data is stored—in the file system rather than in the SQL Server data files. Because FILESTREAM is implemented as a VARBINARY(MAX) column and integrated directly into the database engine, most SQL Server management tools and functions work without modification for FILESTREAM data.

It is also worth mentioning that the behavior of the regular VARBINARY(MAX) data type remains entirely unchanged in SQL Server 2008, including the 2GB size limit. The addition of the FILESTREAM attribute means a VARBINARY(MAX) column can essentially be unlimited in size. (In reality, the size is limited to that of the underlying NTFS volume.)

BLOB Storage in SharePoint

In SharePoint 2003 and the early days of SharePoint 2007, Microsoft SQL Server stored BLOB data in its databases as a rule, as illustrated in Figure 1. As a database’s usage increased, the total size of its BLOB data could quickly grow larger than the total size of the document metadata and the other structured data stored in the database. There were no exceptions for this; content metadata and BLOBs had to go into content databases. This was not efficient because Microsoft estimates that as much as 80 percent of the data stored in SharePoint content databases is nonrelational BLOB data, such as Microsoft Office Word documents, Microsoft Office Excel spreadsheets, and Microsoft Office PowerPoint presentations. Only 20 percent is relational metadata, and this caused storage, performance, and manageability issues, especially for large SharePoint deployments.

Figure 1. BLOB storage in previous versions of SharePoint.

In May 2007, Microsoft released a hotfix for Windows SharePoint Services 3.0 and Microsoft Office SharePoint Server 2007 that was later rolled into Service Pack 1. The hotfix exposed an External BLOB Storage API (EBS), which enabled the storage of BLOBs outside content databases by implementing a set of interfaces. This seemed revolutionary, but EBS was difficult to implement because it exposed an unmanaged interface. Accordingly, this part had to be handled by third parties, or independent software vendors (ISVs). In addition, EBS had some limitations; for example, it could be enabled only on the farm level and not on the content database level.

Microsoft heard its customers’ feedback loud and clear and decided to fix the BLOB storage pain points. In SharePoint 2010, it is now possible to move the storage of BLOBs from database servers to commodity storage solutions by using Remote Blob Storage technology, which was introduced in SQL Server 2008.

Introducing Remote BLOB Storage

Remote BLOB Storage (RBS) is a library API set designed to move storage of large binary data (BLOBs) from Microsoft SQL Server to external storage solutions. RBS is incorporated as an add-on feature pack for Microsoft SQL Server 2008 and Microsoft SQL Server 2008 Express, and later SQL Server 2008 R2 and SQL Server 2008 R2 Express.

Using RBS, applications can store large amounts of unstructured data (such as Office documents, PDF files, or videos) and enjoy both the relational capabilities of SQL Server and the scalability of a dedicated blob store. Best of all, developers do not have to write the code to handle the job of tying together the SQL metadata and the blob data. RBS handles the transactional consistency completely.

An application stores and accesses blob data by calling into the RBS client library. ISVs and storage solution vendors can create their own RBS Provider Library to enable the use of custom stores with applications written against the RBS API set. Microsoft has even created a provider named FILESTREAM RBS provider, which comes with RBS 2008 R2 and can be used for storing BLOBs on the underlying NTFS file system. The FILESTREAM RBS provider ties the RBS technology with the FILESTREAM feature introduced in SQL Server 2008.

SharePoint 2007 did not take advantage of the recent SQL Server features that Microsoft introduced for unstructured data in SQL Server 2008, such as the FILESTREAM attribute or RBS technology; instead, SharePoint 2007 provided its own options to enhance the storage efficiency and manageability of huge data though External BLOB Storage (EBS).

SharePoint 2010 supports RBS and can leverage the SQL Server FILESTREAM RBS provider, thus providing cheaper storage and much better performance. Figure 2 illustrates how RBS works with SharePoint 2010.

Figure 2. Externalizing BLOB storage using RBS.

In SharePoint 2010, the FILESTREAM RBS provider can also be used to work around the 4GB database limitation in SQL Server Express. Unlike Windows SharePoint Services 3.0, SharePoint Foundation 2010 does not use the Windows Internal Database, which had no database size limit. Therefore, when Windows SharePoint Services 3.0 users are trying to upgrade their farms, they might fall into a scenario where they have an internal database larger than 4GB, which would exceed the limitation of SQL Server Express. At this point, they will be prompted to download the FILESTREAM RBS provider to use the underlying file system to externalize their BLOBs.

RBS Versus EBS

As mentioned, EBS was an earlier attempt by Microsoft in SharePoint 2007 SP1 to help customers externalize their BLOBs. However, EBS was hard to implement and had some limitations. Microsoft introduced EBS as an immediate help, and it was designed in a way that it is an evolutionary approach in that administrators can move to RBS later. EBS will continue to be supported for SharePoint 2010, but it is on the deprecation list, which means its support will end in a future release of SharePoint. Microsoft recommends using RBS in SharePoint 2010 not only because EBS is on the deprecation list but also because RBS is more powerful and maintainable. Table 1 illustrates some of the advantages of RBS over EBS.

Table 1. Comparison of Legacy Versus RBS
BLOB store scopeCan be configured on the content database level. (Each content database can have its own BLOB store.)Can be configured only on the farm level
Number of providersMultiple.Only one
Migrating BLOBs from SQL Server stores to BLOB stores and vice versaWindows PowerShell.Custom
SharePoint interfaceSharePoint 2010 ships with many Windows PowerShell commandlets that can be used to manage RBS installation and configuration.None
  •  Sharepoint 2010 : Managing SharePoint Content Databases
  •  Sharepoint 2010 : Maintaining SQL Server in a SharePoint Environment
  •  Sharepoint 2010 : Monitoring SQL Server in a SharePoint Environment
  •  Database Availability Group Replication in Exchange Server 2010 : Load Balancing in Exchange Server 2010
  •  Database Availability Group Replication in Exchange Server 2010 : Comparing and Contrasting DAG Versus CCR/SCR/SCC
  •  SQL Server 2008: Managing Query Performance - Forcing Index Seeks
  •  Exchange Server 2010 : Deploying a Database Availability Group (part 4)
  •  Exchange Server 2010 : Deploying a Database Availability Group (part 3)
  •  Exchange Server 2010 : Deploying a Database Availability Group (part 2) - Suspending and Reseeding a Database
  •  Exchange Server 2010 : Deploying a Database Availability Group (part 1) - Creating the File Share Witness
    Top 10 Video Game
    -   Minecraft Mods - MAD PACK #10 'NETHER DOOM!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Minecraft Mods - MAD PACK #9 'KING SLIME!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Minecraft Mods - MAD PACK #2 'LAVA LOBBERS!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Minecraft Mods - MAD PACK #3 'OBSIDIAN LONGSWORD!' with Vikkstar & Pete (Minecraft Mod - Mad Pack 2)
    -   Total War: Warhammer [PC] Demigryph Trailer
    -   Minecraft | MINIONS MOVIE MOD! (Despicable Me, Minions Movie)
    -   Minecraft | Crazy Craft 3.0 - Ep 3! "TITANS ATTACK"
    -   Minecraft | Crazy Craft 3.0 - Ep 2! "THIEVING FROM THE CRAZIES"
    -   Minecraft | MORPH HIDE AND SEEK - Minions Despicable Me Mod
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 92 "IS JOE DEAD?!"
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 93 "JEDI STRIKE BACK"
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 94 "TATOOINE PLANET DESTRUCTION"
    -   Minecraft | Dream Craft - Star Wars Modded Survival Ep 95 "TATOOINE CAPTIVES"
    -   Hitman [PS4/XOne/PC] Alpha Gameplay Trailer
    -   Satellite Reign [PC] Release Date Trailer