The RDBMS is widely praised for its ACID properties. That was sufficient until the advent of applications that demand speed and scalability.
In an effort to address the shortcomings of RDBMS technology in modern interactive software systems, developers have adopted a number of “band aid” tactics:
One tactic is sharding. In this approach, the application implements some form of data partitioning to manually spread data across servers. While this does spread the load, the approach has undesirable consequences:
- When you fill a shard, it is highly disruptive to re-shard.
- You lose some of the most important benefits of the relational model.
- You have to create and maintain a schema on every server.
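To make the re-sharding problem concrete, here is a minimal sketch of application-level sharding. It assumes a simple hash-modulo placement scheme, which the text above does not specify; the names `shard_for` and `moved_fraction` are hypothetical and chosen only for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Return the fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Illustrative data: 10,000 made-up user keys, growing from 4 shards to 5.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys must move to a different server")
```

Under this assumed scheme, a key stays put only when its hash gives the same remainder modulo 4 and modulo 5, so most of the data has to be migrated when a single server is added. That is one way to see why re-sharding is so disruptive.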
Another tactic is denormalizing the data. This approach lets the type of information stored in the database change without a schema update, makes sharding much easier, and allows rapid changes to the data model. Unfortunately, nearly all relational database functionality is lost in the process.
A further tactic is distributed caching. This approach employs distributed caching technologies, such as Memcached, which sit in front of an RDBMS, caching recently accessed data in memory and storing that data across any number of servers or virtual machines. Memcached and similar technologies are useful to a point, but they are not a panacea and can create problems:
- Accelerates only data reads – Memcached was designed to accelerate the reading of data by storing it in main memory, but it was not designed to permanently store data.
- Cold cache thrash – When the application seeks data in the caching tier but doesn’t find it, it is forced to read the data from the RDBMS. This delays both reads and writes, leading to application time-outs, unacceptably slow response times and user dissatisfaction.
- Another tier to manage – Inserting another tier of infrastructure into the architecture adds more capital costs, more operational expense, more points of failure and more complexity.
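The caching pattern described above can be sketched as follows. This is a simplified illustration, not Memcached's API: plain Python dictionaries stand in for both the cache tier and the RDBMS, and the class name `CacheAsideStore` and its methods are hypothetical.

```python
class CacheAsideStore:
    """Toy cache-aside layer: a dict plays the cache, another dict plays the RDBMS."""

    def __init__(self, database: dict):
        self.database = database  # stand-in for the RDBMS (assumption)
        self.cache = {}           # stand-in for the in-memory cache tier (assumption)
        self.db_reads = 0         # counts how often reads fall through to the database

    def get(self, key):
        # Fast path: serve the value from memory if it is cached.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the database, then populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Writes always go to the database; the cache only drops its stale copy.
        self.database[key] = value
        self.cache.pop(key, None)

    def restart_cache(self):
        # Simulates a cache node restarting with empty memory (a cold cache).
        self.cache.clear()


store = CacheAsideStore({"user:1": "Ada", "user:2": "Grace"})
store.get("user:1")    # miss: read from the database
store.get("user:1")    # hit: served from memory
store.restart_cache()  # cache is cold again
store.get("user:1")    # miss: the database absorbs the read once more
print(store.db_reads)
```

In this sketch the write path never gets faster, and every read after `restart_cache()` lands on the database again, which mirrors the first two problems in the list.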
NoSQL was born to address these shortcomings.
While implementations differ, NoSQL database management systems share a common set of characteristics:
- No schema required – Data can be inserted in a NoSQL database without first defining a rigid database schema. The format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.
- Auto-sharding (sometimes called “elasticity”) – A NoSQL database (also known as a scale-out database) automatically spreads data across servers, without requiring applications to participate. Servers can be added to or removed from the data layer without application downtime, with data (and I/O) automatically spread across the servers. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster, and even across data centers, to ensure high availability and support disaster recovery. A properly managed NoSQL database system should never need to be taken offline, for any reason, supporting 24x7x365 continuous operation of applications.
- Distributed query support – “Sharding” an RDBMS can reduce, or eliminate in certain cases, the ability to perform complex data queries. NoSQL database systems retain their full query expressive power even when distributed across hundreds or thousands of servers.
- Integrated caching – To reduce latency and increase sustained data throughput, advanced NoSQL database technologies cache data in system memory. This behavior is transparent to the application developer and the operations team, in contrast to RDBMS technology, where caching usually requires a separate infrastructure tier that must be developed against, deployed on separate servers and explicitly managed by the ops team.
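As an illustration of how data can be spread across servers so that adding a server moves only a small share of it, here is a minimal consistent-hashing sketch. Consistent hashing is one common technique for this; the text above does not say which technique any particular NoSQL product uses, so treat this as an assumption. The `HashRing` class, the server names and the virtual-node count are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable integer hash of a string."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, server) pairs
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server owns several positions on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        # Find the first ring position at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


# Illustrative data: 10,000 made-up keys on four servers, then a fifth is added.
keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["s1", "s2", "s3", "s4"])
before = {k: ring.server_for(k) for k in keys}
ring.add_server("s5")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to the new server")
```

In this sketch, the only keys that change owner are those taken over by the new server; every other key stays where it was. The application calls `server_for` and never has to know how many servers exist.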
SQL or NoSQL? Try NewSQL! (courtesy: HighScalability.com)
Matt Aslett of the 451 Group coined the term “NewSQL”. On its definition, Aslett writes:
“NewSQL” is our shorthand for the various new scalable/high performance SQL database vendors. We have previously referred to these products as ‘ScalableSQL’ to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all the products, we adopted the term ‘NewSQL’ in the new report.
And to clarify, like NoSQL, NewSQL is not to be taken too literally: the new thing about the NewSQL vendors is the vendor, not the SQL.
As with NoSQL, under the NewSQL umbrella you can see various providers, with various solutions.
I think these can be divided into several sub-types:
- New MySQL storage engines. These give MySQL users the same programming interface, but scale very well. You can find Xeround or Akiban in this field. The good part is that you still use MySQL. On the downside, this approach doesn’t support other databases (at least not easily), and even MySQL users need to migrate their data to these new databases.
- New databases. These completely new solutions can support your scalability requirements. Of course, some (hopefully minor) changes to the code will be required, and data migration is still needed. Some examples are VoltDB and NimbusDB.
- Transparent sharding. ScaleBase, which offers such a solution, lets you get the scalability you need from the database, but instead of rewriting the database, you can use your existing one. This allows you to reuse your existing skill set and ecosystem, and you don’t need to rewrite your code or perform any data migration – everything is simple and quick. dbShards is another solution in this field.
As in NoSQL, I believe each NewSQL solution has its own spot, answering specific needs.