A user recently asked:
I’m learning traditional Relational Databases (with PostgreSQL) and doing some research I’ve come across some new types of databases. CouchDB, Drizzle, and Scalaris to name a few, what is going to be the next database technologies to deal with?
SQL is a language for querying and manipulating relational databases, and it is defined by an international standard. Although the standard is revised periodically, it always seems to work within the relational database paradigm.
Here are a few new data storage technologies that are getting attention currently:
- CouchDB is a non-relational database. They call it a document-oriented database.
- Amazon SimpleDB is also a non-relational database, accessed in a distributed manner through a web service. Amazon also has a distributed key-value store called Dynamo, which backs some of its core internal services, such as the shopping cart.
- Dynomite and Kai are open source solutions inspired by Amazon Dynamo.
- BigTable is a proprietary data storage solution used by Google, built on their Google File System technology. BigTable can serve as an input source and an output target for jobs run on Google’s MapReduce framework.
- Hadoop is an open-source technology inspired by Google’s MapReduce and serving a similar need: distributing processing across very large data sets.
- Scalaris is a distributed transactional key/value store. Also not relational, and does not use SQL. It’s a research project from the Zuse Institute in Berlin, Germany.
- RDF is a standard for representing semantic data, in which data and metadata are interchangeable. It has its own query language, SPARQL, which resembles SQL superficially but is actually quite different.
- Vertica is a highly scalable column-oriented analytic database designed for distributed (grid) architecture. It does claim to be relational and SQL-compliant. It can be used through Amazon’s Elastic Compute Cloud.
- Greenplum is a high-scale data warehousing DBMS, which implements both MapReduce and SQL.
- XML isn’t a DBMS at all, it’s an interchange format. But some DBMS products work with data in XML format.
- ODBMS, or Object Databases, are for managing complex data. There don’t seem to be any dominant ODBMS products in the mainstream, perhaps because of lack of standardization. Standard SQL is gradually gaining some OO features (e.g. extensible data types and tables).
- Drizzle is a relational database, drawing a lot of its code from MySQL. It includes various architectural changes designed to manage data in a scalable “cloud computing” system architecture. Presumably it will continue to use standard SQL with some MySQL enhancements.
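To make the difference between these data models a little more concrete, here is a minimal Python sketch that stores the same kind of record three ways. It is only an illustration: the standard-library `sqlite3` module stands in for a relational database such as PostgreSQL, a plain dict stands in for a key-value store, and parsed JSON stands in for a document database. The table, keys, and field names are all made up for the example, and none of this reflects the actual API of any product listed above.

```python
import json
import sqlite3

# Relational: fixed columns, queried declaratively with SQL.
# (sqlite3 is a stand-in here; the table and data are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    [(1, "alice", "Berlin"), (2, "bob", "Seattle")],
)
relational_result = conn.execute(
    "SELECT name FROM users WHERE city = ? ORDER BY name", ("Berlin",)
).fetchall()

# Key-value: the store only understands keys; values are opaque bytes,
# so there is no way to ask "who lives in Berlin?" without scanning.
kv_store = {"user:1": b"alice|Berlin", "user:2": b"bob|Seattle"}
kv_result = kv_store["user:1"]

# Document: each record describes itself, and fields can differ per record.
documents = [
    json.loads('{"_id": "1", "name": "alice", "city": "Berlin"}'),
    json.loads('{"_id": "2", "name": "bob", "langs": ["en", "de"]}'),
]
doc_result = [d["name"] for d in documents if d.get("city") == "Berlin"]

print(relational_result, kv_result, doc_result)
```

The relational version pushes the filtering into the database through SQL, while the other two leave more of the work, and more of the schema knowledge, to the application.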
Relational databases have weaknesses, to be sure. People have been arguing that they don’t handle all data modeling requirements since the day the relational model was first introduced.
Year after year, researchers come up with new ways of managing data to satisfy special requirements: either requirements to handle data relationships that don’t fit into the relational model, or else requirements of high-scale volume or speed that demand data processing be done on distributed collections of servers, instead of central database servers.
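As a rough illustration of the distributed-processing style mentioned above, here is a toy MapReduce-shaped word count in Python. It runs in a single process, so it only shows the shape of the computation; a real framework such as Hadoop would run the map and reduce steps on many servers. The function names and the sample data are hypothetical and are not taken from any framework's API.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key: here, sum the counts."""
    return key, sum(values)


def run_job(partitions):
    # Each partition stands in for the data held on a separate server.
    mapped = [
        pair
        for partition in partitions
        for record in partition
        for pair in map_phase(record)
    ]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


partitions = [
    ["sql is relational", "couchdb is document oriented"],
    ["scalaris is key value"],
]
counts = run_job(partitions)
print(counts["is"])
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread across machines, which is what makes this style attractive for very large data sets.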
Even though these advanced technologies do great things to solve the specialized problems they were designed for, relational databases are still a good general-purpose solution for most business needs. SQL isn’t going away.
I’m posting to my blog the answers I’ve written on StackOverflow that earned the “Good Answer” badge. This was my answer to “The Next-Gen Databases.”