A Look At Open Source NoSQL Databases And Cloud Computing (Part 1)

7/10/2013 11:47:34 AM

The paradigm shift to Web 2.0 has led to the enormous popularity of social networking, blogs, articles, and wikis, resulting in the demand for a huge knowledge base across enterprises.

We have two kinds of data:

1.    Structured data, which follows a predefined data model and fits into relational tables; e.g., databases, XML files, and enterprise systems such as CRM and ERP.

2.    Unstructured data, which does not have a predefined data model and does not fit into relational models; e.g., RSS feeds, audio files, video files, Word documents, emails, and spreadsheets.

‘Not only SQL’, or NoSQL, is a type of database management system that is not centered on the SQL-based relational database model. It is extremely effective when working with a huge volume of structured or unstructured data. NoSQL databases typically do not use SQL for data manipulation operations.

SQL, NoSQL and Cloud Computing

The focus of traditional databases is mainly on consistency, but the relational model is not necessary for some use cases, and can add avoidable overhead. NoSQL databases open up scope for enormous scalability, the ability to grow the capacity of your database on demand, low latency, and a relatively simpler programming model, which SQL databases do not provide in a cost-effective manner.

In traditional SQL databases, data is normalized so that queries return consistent results, and to prevent orphaned records and duplicate data. Normalized data is spread across multiple tables, so reading it back requires join statements, which in turn rely on more keys and indexes. A primary disadvantage of SQL databases is the high abstraction level. To execute a single statement, the database often has to process the data multiple times, which takes time and demands high-performance hardware; e.g., a join operation may be executed as several internal lookups. In addition, RDBMSs might not scale out easily, whereas the new breed of NoSQL databases is designed to expand transparently, with low-cost commodity hardware in mind.
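The trade-off can be illustrated with a minimal Python sketch. It uses the standard sqlite3 module for the normalized form and a plain dictionary as a stand-in for a denormalized NoSQL record; the table names, field names, and sample rows are hypothetical and are not taken from any particular product.

```python
import sqlite3

# Normalized relational form: authors and posts live in separate tables
# linked by a key, so reading one author's posts needs a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    INSERT INTO authors VALUES (1, 'asha');
    INSERT INTO posts VALUES (10, 1, 'Hello');
    INSERT INTO posts VALUES (11, 1, 'NoSQL notes');
""")


def titles_via_join(name):
    """Fetch an author's post titles by joining the two tables."""
    rows = conn.execute(
        "SELECT p.title FROM posts p "
        "JOIN authors a ON a.id = p.author_id "
        "WHERE a.name = ? ORDER BY p.id",
        (name,),
    )
    return [title for (title,) in rows]


# Denormalized form: the author and the posts are stored together as one
# aggregate, so a single key lookup returns everything (hypothetical data).
author_docs = {
    "asha": {
        "name": "asha",
        "posts": [
            {"id": 10, "title": "Hello"},
            {"id": 11, "title": "NoSQL notes"},
        ],
    }
}


def titles_via_lookup(name):
    """Fetch the same titles with one key lookup and no join."""
    return [post["title"] for post in author_docs[name]["posts"]]
```

Both functions return the same titles. The denormalized form avoids the join, at the cost of duplicating data if the same post has to appear under several keys.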

In SQL databases, there is always a schema involved. Requirements may change, and the database has to be modified to support the new requirements. For example, the application may need two extra fields to store data; with SQL databases, this may take some time and thinking, while in the case of NoSQL, it can be done easily, allowing the database to adapt to new business requirements. However, SQL databases do have the advantage of better support for Business Intelligence.
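As a rough sketch of the "two extra fields" scenario, the Python snippet below first alters a relational table (using the standard sqlite3 module) and then adds the same fields to a schemaless record held in a dictionary. The customers table and the twitter and loyalty_tier fields are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ravi')")

# Relational: the schema has to change before the new fields can be stored,
# and the change applies to every row in the table.
conn.execute("ALTER TABLE customers ADD COLUMN twitter TEXT")
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
conn.execute(
    "UPDATE customers SET twitter = ?, loyalty_tier = ? WHERE id = ?",
    ("@ravi", "gold", 1),
)
# PRAGMA table_info returns one row per column; index 1 is the column name.
sql_columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

# Schemaless: each record carries its own fields, so only the records that
# need the new fields are touched (a dict stands in for a document store).
customers = {1: {"name": "Ravi"}, 2: {"name": "Meera"}}
customers[1].update({"twitter": "@ravi", "loyalty_tier": "gold"})
```

In the schemaless case the second customer is simply left without the new fields, which is convenient for the writer but means that readers must cope with records of different shapes.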

The NoSQL world is one without relations, with pure scalability and no joins. NoSQL databases manage data that is not rigorously relational and tabular, so they typically do not use SQL for data manipulation. NoSQL databases are typically non-relational, horizontally scalable, open source, and distributed. A key advantage of NoSQL over SQL databases is the ability to scale an application to new levels. NoSQL databases emphasize horizontal scalability, achieved by partitioning data and leveraging the elastic provisioning capabilities of the cloud. NoSQL data services are based on scalable architectures and built for the cloud environment. They give you the freedom to select a data model that fits your needs and to use familiar tools.
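Horizontal scalability by partitioning can be sketched in a few lines of Python. The ShardedStore class below is hypothetical and keeps every "node" as an in-memory dictionary; it uses simple hash-modulo placement, which is only one of several possible partitioning schemes.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys across named nodes."""

    def __init__(self, node_names):
        self._names = sorted(node_names)
        # Each node is just an in-memory dict in this sketch.
        self.nodes = {name: {} for name in self._names}

    def _node_for(self, key):
        # Hash the key so placement is stable across runs, then pick a node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self._names[int(digest, 16) % len(self._names)]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)


store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(100):
    store.put("user:%d" % i, {"visits": i})
```

Because no record depends on a record held elsewhere, each node can serve its own keys independently. Note that with plain modulo placement, adding a node moves most keys; schemes such as consistent hashing are commonly used to limit that movement.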

With NoSQL databases, data replication can be done more easily than with SQL databases. As NoSQL databases are built without relations, data need not be on the same server, and can be processed independently, which allows better scaling than SQL databases. Don’t forget, scaling is one of the primary characteristics of cloud computing environments. The traditional RDBMS may not be a good fit for cloud-scale applications, due to its strict, upfront schema requirements.

Figure 1: NoSQL databases

Open source NoSQL databases

The NoSQL movement can be defined by a simple principle: To use the solution that best fits the problem and suits the objectives. Use a key-value-pair database if the data structure is more appropriately accessed through this. And if you have connected data such as social networking or financial transaction graphs, then graph databases are appropriate.

Many cloud applications demand availability, speed, and fault tolerance over consistency, and hence have expanded beyond RDBMS technologies, resulting in the growth of NoSQL databases.

Tabular/columnar data stores

Tabular/columnar data stores look similar to tabular databases. Their primary data retrieval model applies column filters, typically through hand-coded map-reduce algorithms.
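The retrieval model described above can be sketched in Python as follows. The nested dictionary is a hypothetical stand-in for a table organized as row key, column family, and column; the scan, map_phase, and reduce_phase functions are illustrative names, not the API of any real data store.

```python
from collections import defaultdict

# row key -> column family -> column -> value (sample data is invented)
table = {
    "user1": {"info": {"city": "Pune", "age": 31}, "stats": {"visits": 4}},
    "user2": {"info": {"city": "Delhi", "age": 27}, "stats": {"visits": 9}},
    "user3": {"info": {"city": "Pune", "age": 45}, "stats": {"visits": 2}},
}


def scan(table, family, columns):
    """Column filter: yield only the requested columns of one family."""
    for row_key in sorted(table):
        cells = table[row_key].get(family, {})
        yield row_key, {c: cells[c] for c in columns if c in cells}


def map_phase(rows):
    """Map step: emit a (city, 1) pair for every row that has a city."""
    for _row_key, cells in rows:
        if "city" in cells:
            yield cells["city"], 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


users_per_city = reduce_phase(map_phase(scan(table, "info", ["city"])))
```

Only the info:city column is read; the age and visits values never enter the map step, which is the point of filtering by column before processing.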


HBase is modeled on Google's BigTable and is column-oriented, open source, and distributed. It uses the Hadoop infrastructure (ZooKeeper as a lock service, and HDFS with its NameNode as the file system) and hence inherits fault tolerance and scalability, while adding random read-write capability. HBase tables are distributed as regions, and regions are automatically split and redistributed as data grows. It scales linearly and modularly through the addition of RegionServers, which can be hosted on the public cloud. Regions are vertically divided by column families into stores, which are stored as files on HDFS. Potential use cases and features include:

§  Consistent reads, supported by a single-write master

§  Ordered partitions that support efficient row scans

§  Range-based scans

§  Batch analysis

§  Large caches

HBase lacks many features that relational databases offer, such as triggers and secondary indexes.
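The idea of ordered partitions that split as data grows, and of range-based scans over them, can be sketched in Python. This is a toy in-memory model written for illustration only: the Region and RegionedTable classes and the max_rows_per_region parameter are hypothetical, and the code does not use or reproduce the HBase API.

```python
import bisect


class Region:
    """A contiguous, sorted slice of the row-key space."""

    def __init__(self, start_key):
        self.start_key = start_key
        self.keys = []   # row keys in sorted order
        self.rows = {}   # row key -> value


class RegionedTable:
    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region
        # The first region starts at "", which sorts before every other key.
        self.regions = [Region("")]

    def _region_index(self, row_key):
        starts = [region.start_key for region in self.regions]
        return bisect.bisect_right(starts, row_key) - 1

    def put(self, row_key, value):
        idx = self._region_index(row_key)
        region = self.regions[idx]
        if row_key not in region.rows:
            bisect.insort(region.keys, row_key)
        region.rows[row_key] = value
        if len(region.keys) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        # Move the upper half of an oversized region into a new region.
        region = self.regions[idx]
        mid = len(region.keys) // 2
        upper = Region(region.keys[mid])
        upper.keys = region.keys[mid:]
        upper.rows = {key: region.rows.pop(key) for key in upper.keys}
        region.keys = region.keys[:mid]
        self.regions.insert(idx + 1, upper)

    def scan(self, start_key, stop_key):
        """Yield (key, value) pairs with start_key <= key < stop_key."""
        for region in self.regions[self._region_index(start_key):]:
            if region.start_key >= stop_key:
                break
            lo = bisect.bisect_left(region.keys, start_key)
            hi = bisect.bisect_left(region.keys, stop_key)
            for key in region.keys[lo:hi]:
                yield key, region.rows[key]


table = RegionedTable(max_rows_per_region=4)
for i in range(20):
    table.put("row%02d" % i, i)
```

Because rows are kept in key order and each region covers one key range, a scan touches only the regions that overlap the requested range instead of the whole table.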

Figure 2: HBase cluster architecture


Hypertable is an open source database inspired by publications on the design of BigTable. It runs on top of a distributed file system such as the Apache Hadoop DFS, GlusterFS, or the Kosmos File System (KFS). It is written almost entirely in C++, for performance. Its features are:

§  Scalability: It is based on a design developed by Google to meet scalability requirements.

§  Performance: It offers a responsive user experience with low request latency.

§  Supports a wide range of applications: Data is sorted by a primary key

§  Cost saving: It offers high capacity using a fraction of the hardware.

§  Clean semantics: It is a consistent database.

Figure 3: Hyperspace is a highly available lock manager and provides a file system for storing small amounts of metadata. The master handles all meta operations such as creating and deleting tables. Range servers are responsible for managing ranges of table data, handling all reading and writing of data. The DFS broker provides a normalized file system interface and translates normalized file system requests into native file system requests, and vice versa, for the underlying distributed file system.

Figure 3: Hypertable model

Document stores

The main concept of a document store is the document. Document-oriented databases are designed to store, retrieve, and manage document-oriented structures (such as XML files, which can be queried with XQuery) or semi-structured data (such as text documents, which are indexed to support keyword search). Each document-oriented database encapsulates and encodes data in a standard format or encoding such as XML, YAML, or JSON, or in binary forms such as BSON, PDF, and Microsoft Office documents.
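A minimal Python sketch of these ideas follows. The DocumentStore class is hypothetical: it encodes each document as JSON with the standard json module and builds a simple keyword index over string values, which is only a rough approximation of how real document databases index their content.

```python
import json


class DocumentStore:
    """Toy document store: JSON-encoded documents plus a keyword index."""

    def __init__(self):
        self._docs = {}    # document id -> JSON-encoded string
        self._index = {}   # lowercase word -> set of document ids

    def put(self, doc_id, document):
        # Documents are self-describing, so no schema is declared up front.
        # (Re-putting an id does not purge old index entries in this sketch.)
        self._docs[doc_id] = json.dumps(document)
        for word in self._words(document):
            self._index.setdefault(word, set()).add(doc_id)

    def get(self, doc_id):
        return json.loads(self._docs[doc_id])

    def search(self, keyword):
        """Return the ids of documents containing the keyword."""
        return sorted(self._index.get(keyword.lower(), set()))

    @staticmethod
    def _words(value):
        # Walk nested dicts and lists, yielding words from string values.
        if isinstance(value, dict):
            for item in value.values():
                yield from DocumentStore._words(item)
        elif isinstance(value, list):
            for item in value:
                yield from DocumentStore._words(item)
        elif isinstance(value, str):
            for word in value.lower().split():
                yield word


docs = DocumentStore()
docs.put("a", {"title": "Cloud databases", "tags": ["nosql", "cloud"]})
docs.put("b", {"title": "Relational tuning", "author": {"name": "Cloud Nine"}})
```

The two documents have different fields and nesting, yet both are stored and searched the same way, because the structure travels with each document rather than with the store.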
