Code Sport - Scale-Up vs. Scale-Out Storage

7/11/2013 9:09:55 AM

Over the next few columns, we will continue our discussion on data storage systems and look at how they are evolving to cater to the world of data-centric computing.

Last month, we began our discussion of storage systems by looking at concepts like SAN and NAS. In this column, we will look at scale-up vs. scale-out storage, and discuss their relative advantages and disadvantages.

Scale-up vs. scale-out storage

Most of us are familiar with the concepts of scale-up and scale-out computing as they apply to server computing. A scale-up server is a single system with high CPU power and a large memory, and hence is able to handle increasing workloads. As the load on the system increases, it can be scaled up by adding more CPUs, memory and storage to the single system, which is typically a shared-memory system. The work is usually done by multiple threads and processes running on the same system, which typically communicate through shared memory. This is a ‘shared everything’ system.

On the other hand, in scale-out computing, multiple nodes are part of the system, and they all act together to handle the workload, which is typically partitioned across the nodes. There is usually no shared memory across the nodes – they communicate through explicit messages over the network. Each node has both processing and storage elements. This is normally a ‘shared nothing’ approach, where the nodes interact in a loosely coupled manner. A scale-out system can be expanded to handle additional load by adding more nodes; the work can be distributed across the newly added nodes transparently, without the need to bring down the system.
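
To make the contrast concrete, here is a minimal sketch in Python, with the standard threading and multiprocessing modules standing in for a shared-memory system and a cluster of message-passing nodes respectively; the workload and all the names are purely illustrative.

import threading
import multiprocessing

# Scale-up / 'shared everything': threads on one machine update the same
# in-memory structure directly, coordinating through a lock.
counter = {"value": 0}
lock = threading.Lock()

def scale_up_worker(n):
    for _ in range(n):
        with lock:
            counter["value"] += 1

# Scale-out / 'shared nothing': each worker keeps private state and sends
# an explicit message to a coordinator; the Queue stands in for the network.
def scale_out_worker(n, results):
    local = 0              # private to this "node"
    for _ in range(n):
        local += 1
    results.put(local)     # explicit message, not shared memory

if __name__ == "__main__":
    threads = [threading.Thread(target=scale_up_worker, args=(1000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("shared everything total:", counter["value"])

    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=scale_out_worker, args=(1000, results)) for _ in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print("shared nothing total:", sum(results.get() for _ in range(4)))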

Now, scale-out and scale-up storage systems have the same conceptual difference. In a scale-up storage system, you add more and more storage capacity to the single storage node to meet increasing storage requirements. This can be achieved by adding many individual disk drives to a storage controller (or a pair of storage controllers so as to ensure failover and high availability). Since the storage capacity (i.e., number of individual disk drives) that a storage controller can support is limited, if storage requirements exceed that capacity, the only option is to move to the next bigger controller, with a higher capacity. With scale-out storage, this problem is avoided, since each node has its own storage controller, and as you add more nodes to meet increasing storage demands, it allows the controller architecture to grow as well. Scale-out storage is typically marketed under the slogan, “Pay as you grow”, since it allows you to add nodes as and when your storage requirements increase, instead of having to over-provision from the beginning itself.
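
As an illustration of how a scale-out system can spread data over a growing set of nodes, here is a minimal sketch assuming simple hash-based placement; production systems use more sophisticated schemes (consistent hashing, for example), and all the names here are hypothetical.

import hashlib

def node_for(key, num_nodes):
    # Map a block/file key to one of num_nodes storage nodes.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = ["blockA", "blockB", "blockC", "blockD"]
print("3 nodes:", {k: node_for(k, 3) for k in keys})
# "Pay as you grow": adding a fourth node changes the placement map,
# spreading the same keys over more controllers and spindles.
print("4 nodes:", {k: node_for(k, 4) for k in keys})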

The next question that arises is about performance and cost. How do scale-up and scale-out storage systems compare in terms of performance? Before answering that, we need to understand the common measures of performance for storage systems. For a computing system, performance can be specified in terms of clock frequency (whose reciprocal gives the clock cycle period) and Cycles Per Instruction (CPI); the execution time of a program can then be approximated as (clock period * CPI * instruction count). Equivalently, the number of operations or instructions completed per second is a performance measure, typically expressed as how many MIPS (million instructions per second) or FLOPS (floating point operations per second) the system can execute (a worked example follows the list below). For storage systems, the typical performance measures used are:

1. Throughput,

2. IOPS (the number of Input/Output Operations Per Second), and

3. Average latency of an operation.
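
As a worked example of the execution-time approximation above, with made-up numbers purely for illustration:

clock_hz = 2e9                      # 2 GHz clock
clock_period = 1 / clock_hz         # -> 0.5 ns cycle period
cpi = 1.5                           # average cycles per instruction
instruction_count = 1e9             # instructions in the program
exec_time = clock_period * cpi * instruction_count
print(f"execution time = {exec_time:.2f} s")                                # 0.75 s
print(f"effective rate = {instruction_count / exec_time / 1e6:.0f} MIPS")   # ~1333 MIPS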

The most common measure of a storage system – be it a simple SATA drive connected directly to your PC, or a complex storage array controller connected through SCSI – is the maximum throughput it can deliver. This is measured as data transferred per second (megabytes per second, or MBps). IOPS is the total number of read/write operations that can be performed in one second on the storage device in question. An IO operation request takes a certain time to complete, which can be specified as the average latency of an IO operation. Each IO operation is a request for a certain size of IO: in simple terms, a read of X bytes or a write of Y bytes. Hence, each IO request is characterized by its type and size. Now, we can approximate the throughput, i.e., the data transferred into and out of the storage system, as:

Throughput = IOPS * average request size
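
A quick worked example with illustrative numbers:

iops = 5000                    # IO operations completed per second
avg_request_size = 8 * 1024    # 8 KB per request, in bytes
throughput = iops * avg_request_size
print(f"throughput = {throughput / 1e6:.1f} MB/s")   # ~41.0 MB/s
# Note: IOPS and latency are tied together by the number of requests kept
# in flight (Little's law): IOPS ~= queue_depth / average_latency.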

The IO request latency is limited by a number of factors, including the physical characteristics of the storage system. Let us consider a simple example where the storage system is a traditional hard disk. Here, the IO latency has three components: the rotational delay of the disk, the seek time and the data transfer latency. The rotational delay is the time it takes for the right sector to come under the disk head, and is governed by the disk’s rotational speed, expressed in Revolutions Per Minute (RPM). The seek time is the time it takes for the disk head to position itself on the right track. Once the disk head is in the right position, it can start reading/writing the data; the delay incurred in actually reading/writing the data is the data transfer latency. All three components contribute to the IO request latency.
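
To see how the three components add up, here is a back-of-the-envelope calculation for a hypothetical 7200 RPM disk; the figures are typical but purely illustrative.

rpm = 7200
avg_rotational_delay = 0.5 * 60 / rpm    # half a revolution on average, ~4.17 ms
avg_seek_time = 9e-3                     # ~9 ms, a typical published figure
transfer_rate = 150e6                    # 150 MB/s sustained media rate
request_size = 8 * 1024                  # an 8 KB read
transfer_time = request_size / transfer_rate   # ~55 microseconds
latency = avg_seek_time + avg_rotational_delay + transfer_time
print(f"average IO latency ~= {latency * 1e3:.2f} ms")   # ~13.2 ms
# For small requests, seek time and rotational delay dominate; the actual
# data transfer is a tiny fraction of the total latency.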

Depending on the purpose of the storage system, any or all of these performance measures may be significant. For instance, if you are using the storage system for backup/archival purposes, you may be more concerned with overall IO throughput than with individual IO request latency. On the other hand, if the purpose is to serve data to a latency-sensitive, client-facing application such as a database, IO request latency becomes a significant measure of performance for the storage system under consideration. Hence, different performance measures are applicable for the different workloads deployed on the storage system.

Coming back to comparing scale-out and scale-up storage, each has its own performance benefits. Scale-up storage can offer good IO request latencies, while scale-out storage can offer good IO throughput. Since scale-out storage is a distributed system, it can incur greater communication costs between the nodes. For instance, consider a distributed file system supported on scale-out storage. Both the data and the metadata associated with each file can be distributed over different nodes of the storage system. Take a simple example where you are trying to access a file ‘/dir1/file1’. It is possible that dir1 is located on node1 and file1 is located on node2. Hence, the file access operation spans multiple nodes in the scale-out storage system, which leads to greater communication costs and a longer request latency.
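
A minimal sketch of this lookup, with an entirely hypothetical placement map (node1 holding the directory entry and node2 holding the file data), shows where the extra network round trips come from.

# Each top-level key stands in for a storage node; the nested dicts stand
# in for that node's local metadata and data.
metadata = {
    "node1": {"/dir1": {"file1": "node2"}},     # node1 holds dir1's directory entry
    "node2": {"/dir1/file1": b"...data..."},    # node2 holds file1's data
}

def lookup(path):
    # Resolving /dir1/file1 costs one hop to node1 (directory lookup)
    # and a second hop to node2 (data access): two network round trips.
    dirname, filename = path.rsplit("/", 1)
    data_node = metadata["node1"][dirname][filename]   # hop 1: directory entry
    return metadata[data_node][path]                   # hop 2: file data

print(lookup("/dir1/file1"))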

Here is a question for our readers to think about. Given that the file access operation can span multiple nodes as in the example mentioned earlier, let us assume that we are trying to delete the file located at ‘/dir1/file1’ in this example. This operation spans two nodes. How can we ensure that the operation is atomic, even if one of the nodes fails while the operation is in progress?
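
To see why the question is non-trivial, here is a sketch of a naive delete over the same hypothetical two-node layout; the comment marks the failure window that makes this approach non-atomic, and how to close that window is exactly what is left to the reader.

node1 = {"/dir1": {"file1": "node2"}}    # directory entry lives on node1
node2 = {"/dir1/file1": b"...data..."}   # file data lives on node2

def naive_delete(path):
    dirname, filename = path.rsplit("/", 1)
    del node1[dirname][filename]   # step 1: remove the directory entry
    # If node2 crashes here, the name is gone but the data is not:
    # the file is unreachable yet still consumes space -- a half-done delete.
    del node2[path]                # step 2: reclaim the data blocks

naive_delete("/dir1/file1")
print(node1, node2)   # both updates take effect only if neither step failed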
