How you’ll use it
According to Amatruda, midsized and large
enterprises (with more than 500 employees) are deploying storage optimization
as part of dedicated devices (often specialized backup appliances). Smaller
firms, he says, often opt for a software solution. Nevertheless, he notes,
appliance adoption is on the rise in all but the smallest firms.
"Companies can deploy an appliance with built-in data deduplication and
not have to change any of their processes or people, but [they] still get that
dramatic increase in storage efficiency."
Regarding the approaches (target vs.
source; in-process vs. post-process; even file vs. block), each has its
drawbacks and benefits. For example, both source and in-process deduplication ensure data is deduped
before it is written to the backup device. This saves storage space but can make
the backup process take longer.
The
de-duplication process looks for redundant instances of backup data at a
sub-file or block level across all backup data so that only one copy is saved
on the storage. The main benefit of data de-duplication is cost saving.
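To make the block-level idea concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular vendor's product works: the fixed 4 KB block size, the use of SHA-256 fingerprints, and the names `BlockStore`, `write`, `read`, and `stored_bytes` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


class BlockStore:
    """Toy in-memory backup store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes (each stored once)
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        fingerprints = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            # setdefault leaves an existing block untouched, so a
            # redundant block costs no additional storage.
            self.blocks.setdefault(fingerprint, block)
            fingerprints.append(fingerprint)
        self.files[name] = fingerprints

    def read(self, name):
        """Rebuild a file from its list of block fingerprints."""
        return b"".join(self.blocks[f] for f in self.files[name])

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(block) for block in self.blocks.values())


# Two hypothetical nightly backups that differ only in their last block.
monday = b"A" * 8192 + b"B" * 4096
tuesday = b"A" * 8192 + b"C" * 4096

store = BlockStore()
store.write("monday.bak", monday)
store.write("tuesday.bak", tuesday)

print(len(monday) + len(tuesday))  # logical bytes written: 24576
print(store.stored_bytes())        # physical bytes kept: 12288
```

In this toy case the two backups share most of their blocks, so the store holds half the bytes that were written. Real products typically add variable-size blocks, persistent indexes, and safeguards that this sketch leaves out.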
Some experts we spoke with said that while
many businesses may never know the specifics of the deduping solutions they
have in place, they do need to do their homework and choose partners they
trust. "For SMBs especially, data deduplication is going to be something
that is built-in," says Hill.
Filks concurs, noting, "SMBs will use
deduplication when it becomes a standard and transparent feature that does not
require any or very little customization, implementation, or management."
Furthermore, he says companies can largely dismiss concerns about data integrity,
such as the possibility of incorrect tagging or over-deletion. "Fear of
the new and the lack of understanding of the algorithms will always raise
concerns, but the technical issues have been resolved," Filks says.
"Consider flying: most do not understand how lift is obtained, but we still
use airplanes and trust them."
Amatruda says the reasons to use data
deduplication and other storage management techniques are simply too compelling
for businesses to ignore. "Redundancies are controlled with the ability to
resurrect files without any unnecessary time, processing power, or effort. The
end result is less overhead, less expense, less power, and less physical
space."
A planning opportunity
Despite his enthusiasm, Hill urges business
owners and executives not to adopt data deduplication casually, especially if
they are moving from tape or server-based storage to a networked storage
environment. Hill says, "This is a good opportunity for SMBs to say, 'What
do we need to have done here? We are not going into data deduplication just to
save disk space. We have to see how it fits into our data retention policies;
the need for e-discovery; our disaster recovery plan.'
"Data deduplication providers will
talk about in-line vs. post-processing; my answer is, 'Who cares?' Business
owners need to consider all the [issues surrounding data deduplication]: what
kind of toolset they need; whether the business has unique requirements like
regulatory burdens that affect the solution; how it impacts their data
protection and how they will control access to it."
With
the rate of data growth doubling every year, storage needs will grow nearly 10X
between 2010 and 2015.
Hill also recommends that businesses
calculate how much the storage savings will be once they consider other
infrastructure investments such as the controller, the array, and the dual power
supplies. "Some business owners think, 'Data deduplication is great
because I don't have to keep adding disks,' but those are only a fraction of
the overall cost of the storage itself."
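The arithmetic behind that caution can be sketched in a few lines of Python. The figures below are hypothetical and chosen only to show the shape of the calculation: the function name `net_savings_fraction`, the 25 percent disk share, and the 5:1 deduplication ratio are assumptions, not numbers from Hill or any vendor.

```python
def net_savings_fraction(disk_share, dedup_ratio):
    """Estimate the fraction of total storage-system cost saved.

    disk_share:  portion of total system cost spent on raw disk (0 to 1);
                 the rest covers controller, array, power supplies, etc.
    dedup_ratio: logical-to-physical data ratio, e.g. 5.0 means 5:1.
    """
    if not 0.0 <= disk_share <= 1.0:
        raise ValueError("disk_share must be between 0 and 1")
    if dedup_ratio < 1.0:
        raise ValueError("dedup_ratio must be at least 1")
    # Only the disk portion shrinks; other infrastructure costs stay put.
    disk_saved = 1.0 - 1.0 / dedup_ratio
    return disk_share * disk_saved


# Hypothetical case: disks are 25% of system cost, dedup achieves 5:1.
savings = net_savings_fraction(disk_share=0.25, dedup_ratio=5.0)
print(f"{savings:.0%}")  # prints 20%
```

Under these assumed figures, a 5:1 ratio eliminates 80 percent of the disk purchases yet trims only about a fifth of the total system cost, which is the gap Hill is pointing to.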
Finally, the storage solution itself must
be of the highest caliber to ensure minimal chance of data loss or corruption.
With deduplication, where 1,000 files may rely on a single copy of a source
image, integrity of the source data becomes incredibly important. Storage
systems that incorporate data deduplication must be able to tolerate multiple drive
failures, drive rebuilds, power failures, and other likely causes of data loss
and corruption.
Although deduplicated storage raises the bar for data
integrity, deduplication also offers new opportunities for data
verification. Furthermore, although most companies use deduplication for
secondary (e.g., backup) drives, some are finding it useful in primary (production
or live-use) environments.
Bright outlook; Deep shadows
The potential benefits of data deduplication
are enormous and so tantalizing that some companies are even experimenting
with its use in live (production or real-time) environments. Nevertheless, the
issues surrounding its usage are not trivial.
Hill recommends companies develop strong
data governance policies before implementing storage solutions with data
deduplication. Furthermore, he says, "If a company doesn't have personnel
with the skill sets to address the important questions, management better
make sure their storage partner is more than a technology provider. They need
an IT strategist."
The
number of files is growing at 1.5 times the rate of data and 7.5 times the rate
of servers, increasing the challenge of storage management.
Key points
Data deduplication replaces many copies of
a file or group of bytes with a single original and placeholders that point to it, substantially
reducing storage requirements.
There are numerous methodologies and
processes for achieving data deduplication, but those differences are less
significant than other concerns.
Companies implementing data deduplication
should look at the feature as part of a total storage solution and implement
the necessary policies to manage it.
Because data deduplication deletes all but one instance of a file,
purchasing extremely reliable equipment is
paramount.