Relational database management systems (RDBMS; e.g., MySQL, PostgreSQL, SQL Server, Oracle) have been regarded for decades as the one-size-fits-all solution for data persistence and retrieval. They have matured through extensive research and development efforts and have very successfully created a large market and solutions in different business domains. However, the ever-increasing need for scalability and new application requirements have created new challenges for traditional RDBMS. As a result, there has recently been some dissatisfaction with this one-size-fits-all approach in some Web-scale applications.
A new generation of low-cost, high-performance database software has emerged to challenge the dominance of relational database management systems. A major reason for this movement, named NoSQL (Not Only SQL), is that different implementations of Web, enterprise, and cloud computing applications place different requirements on their databases (e.g., not every application requires rigid data consistency). For example, for high-volume Web sites (e.g., eBay, Amazon, Twitter, Facebook), scalability and high availability are essential requirements that cannot be compromised. For these applications, even the slightest outage can have significant financial consequences and can damage customer trust. In particular, these new NoSQL systems have a number of design features in common:
- The ability to horizontally scale out throughput over many servers.
- A simple call level interface or protocol (in contrast to a SQL binding).
- Support for weaker consistency models than the ACID transactions of most traditional RDBMS. These models are usually referred to as BASE models (Basically Available, Soft state, Eventually consistent).
- Efficient use of distributed indexes and RAM for data storage.
- The ability to dynamically define new attributes or data schema.
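Several of the features above can be illustrated together with a minimal sketch. The class below is hypothetical and is not taken from any particular NoSQL system: it assumes a plain `put`/`get` call interface, a fixed number of in-memory shards standing in for servers, and simple hash partitioning to spread keys across them. Records have no fixed schema, so new attributes can be attached at any time.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy in-memory store: records are hash-partitioned across shards."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        # Each dict stands in for the RAM of one server (an assumption).
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, **attributes) -> None:
        # No fixed schema: new attributes are merged in as they appear.
        record = self.shards[self._shard_for(key)].setdefault(key, {})
        record.update(attributes)

    def get(self, key: str):
        # Returns the stored record, or None if the key is unknown.
        return self.shards[self._shard_for(key)].get(key)


store = ShardedKeyValueStore(num_shards=4)
store.put("user:42", name="Ada")
store.put("user:42", email="ada@example.org")  # attribute added later
record = store.get("user:42")
```

Modulo-based partitioning keeps the sketch short, but it remaps most keys whenever the number of shards changes, so it should be read as an illustration of the interface and data model rather than of a production placement strategy.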
These design features mainly aim to achieve the following system goals:
- Availability: They must always be accessible, even in the event of a network failure or when a whole datacenter has gone offline.
- Scalability: They must be able to support very large databases with very high request rates at very low latency.
- Elasticity: They must be able to satisfy changing application requirements in both directions (scaling up or scaling down). Moreover, the system must be able to gracefully respond to these changing requirements and quickly recover its steady state.
- Load Balancing: They must be able to automatically move load between servers so that most of the hardware resources are effectively utilized and resource overloading is avoided.
- Fault Tolerance: They must be able to deal with the situation in which the rarest hardware problems go from being freak events to eventualities. While hardware failure is still a serious concern, this concern needs to be addressed at the architectural level of the database, rather than requiring developers, administrators and operations staff to build their own redundant solutions.
- Ability to run in a heterogeneous environment: In scale-out environments, there is a strong trend towards increasing the number of nodes that participate in query execution. It is nearly impossible to get homogeneous performance across hundreds or thousands of compute nodes. Part failures that do not cause complete node failure but result in degraded hardware performance become more common at scale. Hence, the system should be designed to run in a heterogeneous environment and must take appropriate measures to prevent performance degradation due to parallel processing on distributed nodes.
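The elasticity and load-balancing goals above do not prescribe a mechanism. As one illustration, the sketch below uses consistent hashing, a widely used technique that is swapped in here as an assumption rather than drawn from the text. The `HashRing` class, the node names, and the choice of 100 virtual nodes per server are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map a string to a stable 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, virtual_nodes: int = 100) -> None:
        self.virtual_nodes = virtual_nodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name

    def add_node(self, node: str) -> None:
        # Several positions per node smooth out the load distribution.
        for i in range(self.virtual_nodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        # The first position clockwise from the key's hash owns the key.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")  # scale out by one server
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, scaling out relocates only the keys that now belong to the new node, and removing that node again restores the earlier placement. This is the property that makes growing and shrinking a cluster comparatively cheap.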
In general, the NoSQL systems can be broadly classified into the following categories:
- Key-value stores: These systems use the simplest data model, which is a collection of objects where each object has a unique key and a set of attribute/value pairs.
- Extensible record stores: They provide variable-width tables (Column Families) that can be partitioned vertically and horizontally across multiple nodes.
- Document stores: The data model consists of objects with a variable number of attributes with a possibility of having nested