In versions of SQL Server prior to SQL Server 2008,
there were two ways of storing unstructured data: as a binary large
object (BLOB) in an image or varbinary(max)
column, or in files outside the database, separate from the structured
relational data, storing a reference or pathname to the file in a varchar
column. Neither of these methods is ideal for handling unstructured
data. Storing the data outside the database makes managing the
unstructured data and keeping it associated with structured data more
complex. This approach lacks transactional consistency, coordinating
backups and restores with the structured data in the database is
difficult, and implementing proper data security can be quite
cumbersome.
Storing the unstructured data in
the database solves the transactional consistency, backup/restore, and
security issues, but BLOBs have different usage patterns than relational
data. SQL Server’s storage engine is primarily concerned with doing I/O
on relational data stored in pages and extents, not streaming large
BLOBs. I/O performance typically degrades dramatically if the size of
the BLOB data increases beyond 1MB. Accessing BLOB data stored inside a
SQL Server database is generally slower than storing it externally in a
location such as the NTFS file system. In addition, BLOB storage is not
as efficient as the file system for storing large data values, so more
storage space is required.
FILESTREAM storage, introduced
in SQL Server 2008, helps to solve the issues with using unstructured
data by integrating the SQL Server Database Engine with the NTFS file
system for storing unstructured data such as documents and images on the
file system with a pointer to the data in the database. The file
pointer is implemented in SQL Server as a varbinary(max) column, and the actual data is stored in files in the file system.
In addition to enabling
client applications to leverage the rich NTFS streaming APIs and the
performance of the file system for storing and retrieving unstructured
data, other advantages of FILESTREAM storage include the following:
You are able to use
T-SQL statements to insert, update, query, and back up FILESTREAM data
even though the actual data resides outside the database in the NTFS
file system.
You are able to maintain transactional consistency between the unstructured data and corresponding structured data.
You
are able to enforce the same level of security on the unstructured data
as with your relational data using built-in SQL Server security
mechanisms.
FILESTREAM
uses the NT system cache for caching file data rather than caching the
data in the SQL Server buffer pool, leaving more memory available for
query processing.
FILESTREAM storage also eliminates the size limitation of BLOBS stored in the database. Whereas standard image and varbinary(max)
columns have a size limitation of 2GB, the sizes of the FILESTREAM
BLOBs are limited only by the available space of the file system.
Columns with the FILESTREAM
attribute set can be managed just like any other BLOB column in SQL
Server. Administrators can use the manageability and security
capabilities of SQL Server to integrate FILESTREAM data management with
the rest of the data in the relational database—without needing to
manage the file system data separately. This includes maintenance
operations such as backup and restore, complete integration with the SQL
Server security model, and full-transaction support to ensure
data-level consistency between the relational data in the database and
the unstructured data physically stored on the file system. The database
administrator does not need to manage the file system data separately
Whether you should use
database storage or file system storage for your BLOB data is
determined by the size and use of the unstructured data. If the
following conditions are true, you should consider using FILESTREAM:
The objects being stored as BLOBS are, on average, larger than 1MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
Enabling FILESTREAM Storage
If you decide to use FILESTREAM
storage, it first needs to be enabled at both the Windows level as well
as at the SQL Server Instance level. FILESTREAM storage can be enabled
automatically during SQL Server installation or manually after
installation.
If you are enabling
FILESTREAM during SQL Server installation, you need to provide the
Windows share location where the FILESTREAM data will be stored. You can
also choose whether to allow remote clients to access the FILESTREAM
data.
If you did not enable the
FILESTREAM option during installation, you can enable it for a running
instance of SQL Server 2008 at any time using SQL Server Configuration
Manager (SSCM). In SSCM, right-click on the SQL Server Service and
select Properties. Then select the FILESTREAM tab, which provides
similar options as those displayed during SQL Server installation (see Figure 1).
This enables SQL Server to work directly with the Windows file system
for storing FILESTREAM data. You have three options for how FILESTREAM
functionality will be enabled:
Allowing only T-SQL access (by checking only the Enable FILESTREAM for Transact-SQL Access option).
Allowing
both T-SQL and Win32 access to FILESTREAM data (by checking the Enable
FILESTREAM for File I/O Streaming Access option and providing a Windows
share name to be used to access the FILESTREAM data). This allows Win32
file system interfaces to provide streaming access to the data.
Allowing
remote clients to have access to the FILESTREAM data that is stored on
this share (by selecting the Allow Remote Clients to Have Streaming
Access to FILESTREAM Data option).
Note
You need to be Windows Administrator on a local system and have sysadmin rights to enable FILESTREAM for SQL Server.
After you enable
FILESTREAM in SQL Server Configuration Manager, a new share is created
on the host system with the name specified. This share is intended only
to allow very low-level streaming interaction between SQL Server and
authorized clients. It is recommended that only the service account used
by the SQL Server instance should have access to this share. Also,
because this change takes place at the OS level and not from within SQL
Server, you need to stop and restart the SQL Server instance for the
change to take effect.
After restarting the SQL
Server instance to enable FILESTREAM at the Windows OS level, you next
need to enable FILESTREAM for the SQL Server Instance. You can do this
either through SQL Server Management Studio or via T-SQL. To enable
FILESTREAM for the SQL Server instance using SQL Server Management
Studio, right-click on the SQL Server
instance in the Object Explorer, select Properties, select the Advanced
page, and set the Filestream Access Level property as shown in Figure 2. The available options are
Disabled (0)— FILESTREAM access is not permitted.
Transact SQL Access Enabled (1)— FILESTREAM data can be accessed only by T-SQL commands.
Full Access Enabled (2)— Both T-SQL and Win32 access to FILESTREAM data are permitted.
You can also optionally enable FILESTREAM for the SQL Server instance using the sp_Configure system procedure, specifying the 'filestream access level' as the setting and passing the option of 0 (disabled), 1 (T-SQL access), or 2 (Full access). The following example shows full access being enabled for the current SQL Server instance:
EXEC sp_configure 'filestream access level', 2
GO
RECONFIGURE
GO
After
you configure the SQL Server instance for FILESTREAM access, the next
step is to set up a database to store FILESTREAM data.