Dealing in data
Using responses from dozens of IT
personnel, Aberdeen Group identifies three storage tools to help manage data
growth, with each addressing a different challenge associated with storing big
data. These include storage virtualization (manages various data types), data
compression or deduplication (manages the size of data), and data tiering
(manages the speed at which data changes). "Using these three storage
features, organizations can reduce the financial impact of big data
management," according to Aberdeen's report.
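As a rough illustration of how deduplication and compression together shrink stored data, the following minimal Python sketch keeps one compressed copy of each distinct block and records references to it. The function names, the choice of SHA-256 for fingerprinting, and the sample blocks are assumptions made for illustration; they are not drawn from Aberdeen's report or from any vendor's product.

```python
import hashlib
import zlib


def dedup_and_compress(blocks):
    """Store each distinct block once (deduplication), compressed with zlib."""
    store = {}  # digest -> compressed bytes, one entry per distinct block
    refs = []   # one digest per input block, enough to rebuild the original order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(block)
        refs.append(digest)
    return store, refs


def restore(store, refs):
    """Rebuild the original sequence of blocks from the store and references."""
    return [zlib.decompress(store[digest]) for digest in refs]


# Hypothetical sample data: the first and third blocks are identical.
blocks = [b"quarterly report " * 64, b"sensor log " * 64, b"quarterly report " * 64]
store, refs = dedup_and_compress(blocks)
raw_bytes = sum(len(b) for b in blocks)
stored_bytes = sum(len(c) for c in store.values())
```

Here the duplicate block is stored only once, and each stored block is compressed, so stored_bytes ends up smaller than raw_bytes. Commercial storage systems typically work on fixed or variable-size chunks at much larger scale, but the principle is the same.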
Of the three, Csaplar says storage
virtualization is probably of little interest to SMEs because they likely have only
one storage device or multiple devices from the same vendor. Data deduplication
and storage tiering, however, "are capabilities available today from
multiple vendors that can be set and forgotten," he says.
In its "The Economic Benefit of
Storage Efficiency Technologies" study, IDC reported that storage end
users that currently deploy or plan to deploy storage and/or data efficiency
technologies in data centers typically adopt technologies in stages, beginning
with compression and deduplication, then storage virtualization, tiering, thin
provisioning, and replication. IDC identified data compression and storage
virtualization as the two most adopted storage efficiency technologies and data
deduplication as the most desirable for near-future implementation.
In preparing for and managing big
data-level storage requirements, Quinn says existing tools SMEs use to predict
storage requirements "typically suffice for big data applications."
If acquiring a big data appliance, "the levels are largely
predetermined," though he advises ensuring "that storage and CPU upgrade
paths are clear and pricing transparent; expect that the upgrades will be
required." ESG, Quinn says, believes tiered storage and thin provisioning
will prove key to ongoing big data operations.
Robosson says organizations of all sizes
are assessing methods and tools to improve real-time delivery of big data
analytics. "Traditional SANs may in some cases be insufficient, giving
rise to SSD [solid-state drive] and DAS [direct-attached storage] as higher-performance options purely from a device
perspective," he says. Once SMEs determine a big data strategy, he says,
new data storage appliances offer good options.
Where backup and DR (disaster recovery)
requirements related to big data are concerned, Robosson says, "as the
data model expands to include nontraditional data types and sources, so will
disaster recovery protocols undergo scope changes." Companies will need to
determine how critical new data types are to their operations, he says.
"In other words, there may be cases where conventional DR rules might not
apply to non-transactional data (images, video, social media, spatial, etc.)
vs. relational data that businesses rely on for accounting, customer service,
and routine operations," he says.
Is Virtualization Necessary for SMEs?
For initial big data projects that rate as
"experimental" or "first pass discovery," Quinn says
disaster recovery and data protection requirements may lag behind what
companies currently require at an existing operational data store/data
warehouse. As big data becomes essential to critical decision-making, however,
the "big data facility and/or apps will rank among the top tier of apps,
like transactional applications," he says. "That implies that
eventually DR and data protection will be extremely important. Our surveys ...
suggest that after availability, data archiving and data migration are the
next most important infrastructural functions in support of big data."
Woo explains that some companies will view
data related to big data as derivative, meaning if it's lost, it can be
regenerated. Others will view it as archival, "which means that backups
should have already been taken of the data," Woo says. "More progressive
and aggressive companies may need to update backup procedures to accommodate an
increased amount of data retention, and therefore upgrade backup
infrastructure. Tape is actually very good for this purpose."
Where cloud computing and the storage of big data volumes are concerned,
Csaplar sees the cloud as a final storage tier
rather than a primary storage site. "Clouds are remote, and therefore latency is
introduced if you try and use cloud for just storage. I would think of clouds
for archiving, backup, and recovery or long term storage options," he
says. Woo says much data now used for big data analysis is actually cloud-based
data that includes social media posts. "In these cases, since the data is
already in the cloud, it's best left in the cloud," he says.
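One way to picture the cloud as a final tier is a simple placement rule based on how recently data was accessed. The sketch below is only an assumption-laden illustration: the tier names, the 30-day and 180-day thresholds, and the choose_tier function are hypothetical and would need tuning to an organization's real access patterns.

```python
from datetime import date

# Hypothetical thresholds, in days since last access.
HOT_DAYS = 30
WARM_DAYS = 180


def choose_tier(last_access, today):
    """Pick a storage tier from the number of days since the data was last used."""
    age = (today - last_access).days
    if age <= HOT_DAYS:
        return "primary"        # low-latency local storage for active data
    if age <= WARM_DAYS:
        return "secondary"      # cheaper local capacity for cooling data
    return "cloud-archive"      # latency-tolerant archive, backup, long-term storage


today = date(2024, 1, 15)
recent = choose_tier(date(2024, 1, 1), today)
cooling = choose_tier(date(2023, 10, 1), today)
stale = choose_tier(date(2023, 1, 1), today)
```

With these assumed thresholds, data touched two weeks ago stays on primary storage, data idle for a few months moves to secondary storage, and data untouched for over a year goes to the cloud archive, where added latency matters least.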
Many vendor storage appliances are easily
deployable in a private cloud, Robosson says. SMEs must evaluate the investment
and associated benefits of cloud-based access, he says. Public cloud providers
are also a viable option for many SMEs that can't incur the upfront investment,
he says. Quinn, meanwhile, says companies can use the cloud "as the
overflow or temporary data bucket for peaks, projects, and special but
temporary or less important information-management needs associated with big
data."
How to proceed
The approach SMEs should take to prepare
for a big data initiative can vary. Robosson advises companies thoroughly
assess their data model, data types, volumes, and potential business
intelligence when developing a big data strategy and associated storage game
plan. The SME should then complete a proof of concept to validate a vendor
solution it's considering before proceeding with implementation.
Due to the complexity and cost involved,
Quinn says SMEs should consider smaller platform options for big data. He
suggests choosing one in which vendors offer "cloud and/or prepackaged
(with partners) infrastructures and a decent array of integration tools,
analytics functions, and rich visualization." He adds that the larger big
data suppliers are often "designed to better serve the larger data
volumes and broader complexity of requirements of the Global 2000."
On the plus side, Quinn says most best practices
that businesses have already learned apply to big data. "Understanding
the importance and periodicity of the data flows for big data are essential,
and remember the data flows aren't just for data ingest but also for feeding
data potentially to visualization tools. You may have new integration flows as
well, and those may require fresh storage and information management."
After a business attunes itself to data, Quinn says, it can use tiered storage
approaches.
Woo believes costs should be
secondary to the business purpose for which the company is deploying big data.
That said, Woo says, costs may not generally be that high, as companies can use
industry-standard servers along with open source software. "The real cost
really comes in people," he says.
Robosson suggests that SMEs budget for an
initial analysis to help map out an overall game plan and plan for IT training,
storage appliance acquisition, and implementation services. Quinn, meanwhile,
advises companies and IT managers work jointly to create a complete cost
estimate for big data, "not just storage, not just infrastructure, but
software, training, additional personnel, etc." Overall, he says, expect
costs similar to those of existing data warehousing and business intelligence
solutions, "but most importantly remember that you're starting new,
meaning you will have capital outlays and/or increases in subscription
outlays."
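A complete estimate of the kind Quinn describes can be kept honest with a simple checklist that refuses to produce a total until every category has a figure. The sketch below assumes five categories taken from his quote (storage, infrastructure, software, training, personnel); the function name and the numbers are placeholders in arbitrary units, not real cost data.

```python
# Categories drawn from Quinn's list; extend as needed (his list ends with "etc.").
COST_CATEGORIES = ("storage", "infrastructure", "software", "training", "personnel")


def estimate_total(costs):
    """Sum all cost lines, but only if every required category is present."""
    missing = [name for name in COST_CATEGORIES if name not in costs]
    if missing:
        raise ValueError("missing cost categories: " + ", ".join(missing))
    return sum(costs.values())


# Placeholder figures in arbitrary units, for illustration only.
example_costs = {
    "storage": 40,
    "infrastructure": 25,
    "software": 15,
    "training": 10,
    "personnel": 60,
}
example_total = estimate_total(example_costs)
```

Leaving out a category, such as personnel, raises an error instead of quietly producing a total that understates the project.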
Key points
Data is generated at unprecedented levels,
making processing and analyzing information difficult using traditional
databases and software.
Many big data vendors are including
infrastructure and software in their appliances to alleviate pressure and
guesswork facing SMEs considering big data initiatives.
When developing a big data strategy, an SME
should thoroughly assess its data model, data types, volumes, and potential
business intelligence.
Storage end users that currently deploy or
plan to deploy storage and/or data efficiency technologies in data centers
typically adopt technologies in stages.