ENTERPRISE

A Look At Open Source Nosql Databases And Cloud Computing (Part 1)

7/10/2013 11:47:34 AM

The paradigm shift to Web 2.0 has led to the enormous popularity of social networking, blogs, articles, and wikis, resulting in the demand for a huge knowledge base across enterprises.

We have two kinds of data:

1. Structured data, which includes a pre-defined data model that fits into relational tables, e.g., databases, XML files, and enterprise systems such as CRM and ERP.

2. Unstructured data, which does not have a predefined data model, and does not fit into relational models; e.g., RSS feeds, audio files, video files, word documents, emails, and spreadsheets.

‘Not only SQL’ or NoSQL is a type of database management system that is not centered on the SQLbased relational database model. It is extremely effective when working with a huge volume of structured or unstructured data. NoSQL databases do not use SQL for data manipulation operations.

A look at open source Nosql databases and cloud computing

SQL, NoSQL and Cloud Computing

The focus of traditional databases is mainly on consistency, but being relational is not necessary for some specific use cases, and can add avoidable overhead. The use of NoSQL databases opens up the scope for enormous scalability, the ability to grow the capacity of your database on demand, low latency, and a relatively easier programming model – which SQL databases do not provide in a cost-effective manner.

In traditional SQL databases, data is normalized so that it can provide effective results, and prevent isolated records and duplicate data. Normalizing data requires multiple tables, which requires multiple join statements, thus requiring more keys and indexes. A primary disadvantage of SQL databases is the high abstraction level. To execute a single statement, SQL often requires the data to be processed multiple times, which takes time, and requires high performance, e.g., multiple queries are executed when there is a join operation. In addition, RDBMSs might not scale out easily – but the new breed of NoSQL databases are designed to expand transparently, and are designed with low-cost commodity hardware in mind.

In SQL databases, there is always a schema involved. Requirements may change, and the database has to be modified to support the new requirements. For example, the application may need two extra fields to store data; with SQL databases, this may take some time and thinking, while in case of NoSQL, it can be done easily, allowing the database to adopt new business requirements. However, SQL databases do have the advantage of better support for Business Intelligence.

The NoSQL world is one without relations, with pure scalability and no joins. NoSQL databases manage data that is not rigorously relational and tabular, so do not use SQL for data manipulation. NoSQL databases are typically non-relational, horizontally scalable, open source, and distributed. A key advantage of NoSQL over SQL databases is its ability to scale an application to new levels. NoSQL databases characteristically highlight horizontal scalability by partitioning and leveraging the elastic provisioning capabilities of the cloud. The NoSQL data services are based on scalable architectures and built for the cloud environment. They provide freedom to select a data model as per needs and use familiar tools.

With NoSQL databases, data replication can be done more easily than with SQL databases. As NoSQL databases are built without relations, data need not be on the same server, and can be processed independently, which allows better scaling than SQL databases. Don’t forget, scaling is one of the primary characteristics in Cloud Computing environments. The traditional RDBMS may not be a good fit for cloud-scale applications, due to the strict and upfront schema requirements.

Figure 1: NoSQL databases

Open source NoSQL databases

The NoSQL movement can be defined by a simple principle: To use the solution that best fits the problem and suits the objectives. Use a key-value-pair database if the data structure is more appropriately accessed through this. And if you have connected data such as social networking or financial transaction graphs, then graph databases are appropriate.

Many cloud applications demand availability, speed, and fault tolerance over consistency, and hence have expanded beyond RDBMS technologies, resulting in the growth of NoSQL databases.

Tabular/ columnar data stores

Tabular/ columnar data stores look similar to tabular databases. Their primary data retrieval model uses column filters by leveraging hand-coded map-reduce algorithms.

HBase

HBase is based on Google BigTable and is column-oriented, open source, and distributed. It uses the Hadoop infrastructure (Zookeeper as a lock service and NameNode, the HDFS file system) and hence supports fault tolerance and scalability inherently, and adds random read-write capability. HBase tables are distributed as regions, and regions are automatically split and redistributed as data grows. It supports linear and modular scaling, adding RegionServers that can be hosted on the public cloud. Regions are vertically divided by column families into stores, which are stored as files on HDFS. Potential use cases and features include:

§ Reads, supported by a single-write master

§ Ordered partitions that support efficient row scans

§ Range-based scans

§ Batch analysis

§ Large caches

HBase does not have many features such as triggers, secondary indexes, etc.

Figure 2: HBase cluster architecture

Hypertable

Hypertable is an open source database inspired by publications on the design of BigTable. It runs on top of a distributed file system such as the Apache Hadoop DFS, GlusterFS, or the Kosmos File System (KFS). It is written almost entirely in C++, for performance. Its features are:

§ Scalability: It is based on a design developed by Google to meet scalability requirements.

§ Performance: It offers a responsive user experience with low request latency.

§ Supports a wide range of applications: Data is sorted by a primary key

§ Cost saving: It has a high capacity on a tiny proportion of the hardware.

§ Clean semantics: It has a consistent database.

Figure 3: Hyperspace is a highly available lock manager and provides a file system for storing small amounts of metadata. The master handles all Meta operations such as creating and deleting tables; range servers are responsible for managing ranges of table data, handling all reading and writing of data; DFS broker provides normalized file system interface and translates normalized file system requires into native file system requests and vice-versa; distributed File System

Figure 3: Hypertable model

Document stores

The main concept of a document store is the document. Document-oriented databases are designed to store, retrieve, and manage document-oriented structures (like XML files – XML data sources leverage XQuery), or semi-structured data (like text files – text documents are indexed and facilitate keyword search-like retrieval). Each document-oriented database encapsulates and encodes data in some standard format or encodings such as XML, YAML, or JSON as well as binary forms such as BSON, PDF, and Microsoft Office documents.

Related

A Look At Open Source Nosql Databases And Cloud Computing (Part 2)

Other

Getting Started With An Open Source Circuit Simulator

Introducing NVIDIA’s Compute Unified Device Architecture (CUDA)

Open Source-Packed Innovations

API Design - Learning From Mistakes In The C Library

Ubuntu Phones To Hit Stores In October (Part 3)

Ubuntu Phones To Hit Stores In October (Part 2)

Ubuntu Phones To Hit Stores In October (Part 1)

Various DSL Technologies And How They Differ (Part 3)

Various DSL Technologies And How They Differ (Part 2)

Various DSL Technologies And How They Differ (Part 1)

Planning An Android-Based Device For Enterprise

New Products Recently Introduced – July 2013

Essential Extra - Got An Android Phone Or Tablet? (Part 2)

Essential Extra - Got An Android Phone Or Tablet? (Part 1)

For Start-Ups, The Cloud Is The Way To Go!

Hot Technologies From A Cloud Country (Part 2)

Hot Technologies From A Cloud Country (Part 1)

CodeSport - Given The Importance Of Data Storage In A ‘Big Data’ World

Wake Up Your Wi-Fi (Part 2)

Wake Up Your Wi-Fi (Part 1)

Most View

- Pioneer DDJ-WeGO - Entry-lever DJ Controller

- Samsung WB750 - Small Camera, Big Zoom

- Custom: Installation Nation (Part 2)

- Livescribe Sky 4GB Wi-Fi Smartpen

- HP Envy 23 TouchSmart All-in-One

- Smartphones That Well Surf On Website

- ASP.NET 4 : Data Source Controls (part 2) - Parameterized Commands

- Microsoft Wedge Keyboard - The First Keyboard To Be Built For Windows 8

- Lite-On iHes112-115 - 12x SATA Internal Blu-Ray Combo Drive

- The Roundup Of 120mm Fans: 1,350RPM Speed And More (Part 5)

Top 20

- Awesome Foursome Of July 2013

- Google Glass - It’s In Your Hands

- Opulence Re-defined - Bigger Ultra HD TV’s Is Better?

- The New 19" LED TV From AOC - Almost Life Like

- What's Hot in Tech – July 2013

- The Architecture Of Cloud Foundry

- Samsung Galaxy Tab 2 10.1 - A 10.1-Inch Android-Based Tablet Computer

- The Battle Of The Budget Tablets

- The Zync Z1000 - Yet Another Budget Tablet

- Videocon VT10 - A Promising Debut

- Test Stereo Amplifiers - Driving Your Tunes Forward (Part 3) : Rotel RA-10

- Test Stereo Amplifiers - Driving Your Tunes Forward (Part 2) : Marantz PM6004, Onkyo A-9050

- Test Stereo Amplifiers - Driving Your Tunes Forward (Part 1) : Denon PMA-720AE

- The Cowon EM1 Earphones - Musically Yours

- The JBL J22i - A Lot More Than Just Earphones

- Code Sport - Scale-Up vs. Scale-Out Storage

- Network Deployment And Alternative OSs

- Simplifying Deployment On The Cloud With Heroku

- Karbonn Smart Tab 10 - Cosmic - The Komplete Tablet

- Razer Edge - PC Gaming Has Never Been More Portable

site
stats