In this post, I briefly describe why I decided to use a document-based NoSQL database. After a short introduction, I give reasons for using NoSQL and compare the three most popular document-based databases. First of all, I would like to point out that I am by no means a NoSQL expert, and this post makes no claim to completeness. I use it to reflect on how I went about choosing a system.
I’m familiar with the basic concepts and the syntax
I’m able to write simple queries
I know roughly how to use Hibernate or Spring-Data-JPA
In addition, the advantage of the in-memory database is that I do not have to worry about the state of the data and the schema, because the data and structure are discarded and rebuilt on each restart.
So if I have experience with relational databases and know the basic commands, why should I choose a NoSQL database, especially given the variety of frameworks and tools available to model, modify and integrate relational data?
One argument for a NoSQL database is the more efficient storage of unstructured data. Unlike relational databases, NoSQL specifies no fixed structure. By dispensing with a fixed schema, the responsibility for how the data is stored shifts to the developer. NoSQL therefore does not stand for no structure or no schema, but rather for a flexible schema that can be changed at any time. Without a fixed schema, relations between entries are not strictly necessary. Typically, related information is attached to an existing entry, so a document resembles the joined result of several normalized tables in a relational database.
Figure 1 – Normalized Database Entries
One possible document-oriented equivalent for the example from Figure 1 would be:
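As the original figure and listing are not available here, the following is only a minimal sketch of the idea in Python. The tables `customers` and `orders`, their fields and the helper `to_document` are hypothetical and are not taken from Figure 1. The sketch embeds the related rows into one document, much like a pre-computed join.

```python
# Hypothetical normalized rows, as they might appear in two relational tables.
customers = [{"id": 1, "name": "Alice"}]
orders = [
    {"id": 10, "customer_id": 1, "item": "keyboard"},
    {"id": 11, "customer_id": 1, "item": "mouse"},
]


def to_document(customer, all_orders):
    """Embed all orders of one customer into a single document."""
    return {
        "_id": customer["id"],
        "name": customer["name"],
        # The foreign key disappears: related rows are nested directly.
        "orders": [
            {"id": order["id"], "item": order["item"]}
            for order in all_orders
            if order["customer_id"] == customer["id"]
        ],
    }


document = to_document(customers[0], orders)
print(document)
```

Reading `document` once yields the customer and all of its orders, without a second lookup.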
If you want to choose a NoSQL database, you need to consider the type of data to be stored and the purpose of the application. Depending on this, you can use different systems to store data.
In general, NoSQL databases can be divided into graph databases, key-value databases, wide-column databases, document-oriented databases and multi-model databases.
Key-value databases like Redis or Voldemort store key-value pairs. As with primary keys in relational databases, the key must be unique across all entries. The main focus is fast read and write speed. The data is indexed by the key and can thus be queried directly. Because key-value entries are not related to each other, each entry should contain all relevant information.
Key-Value databases are the foundation of many graph, wide-column and document databases, where an entry consists of an arbitrary number of key-value pairs.
In graph databases like Neo4j or Onyx DB, the information is stored as a directed graph. In contrast to key-value, wide-column and document databases, the relations must be stored, since a graph is represented by its nodes and edges. The advantage is the simple traversal of the graph, whereas a relational database (RDB) has to build such relationships through recursive join queries.
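To illustrate why traversal is simple once the edges are stored explicitly, here is a small sketch in plain Python. It does not use any graph database API; the graph `edges` and the function `reachable` are made up for illustration.

```python
from collections import deque

# Hypothetical directed graph: each node maps to the targets of its outgoing edges.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def reachable(graph, start):
    """Return all nodes reachable from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen


print(reachable(edges, "alice"))  # ['alice', 'bob', 'carol', 'dave']
```

In a relational schema, each hop along an edge would correspond to another join on an edge table.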
In wide-column databases like Cassandra or HBase, the data is stored column by column. Unlike an RDB, where a row of a table holds all the attributes, wide-column databases group the attributes into column-families. A column-family can have as many entries as you like.
As with key-value, the entries are arranged using a row-key. The corresponding value consists of a set of column-families. A column-family contains the columns, which contain the versioned values.
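The nesting described above can be pictured as nested maps. The following sketch models it with plain Python dictionaries; the row key, family names and the helper `latest` are hypothetical, and real systems such as Cassandra or HBase differ in many details.

```python
# Hypothetical layout: row key -> column-family -> column -> {timestamp: value}.
table = {
    "user:42": {
        "profile": {
            "name": {1: "Alice", 2: "Alice B."},
        },
        "contact": {
            "email": {1: "alice@example.org"},
        },
    },
}


def latest(data, row_key, family, column):
    """Return the value with the highest timestamp for one cell."""
    versions = data[row_key][family][column]
    return versions[max(versions)]


print(latest(table, "user:42", "profile", "name"))  # Alice B.
```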
In document databases such as Couchbase, CouchDB or MongoDB, the content is stored in JSON or JSON-like documents.
In particular, you use a document-oriented database for storing large amounts of interrelated data, such as blogs, message boards or wikis. Since all related data can be stored within a document, you don’t have to join documents to get all related information.
Multi-model databases are NoSQL databases that combine several concepts. Typically, the characteristics of graph and document databases are supported.
Which system should I choose?
As stated in the previous section, NoSQL stands for a variety of different databases. This raises the question: which system should I choose? In my use case, my marketplace prototype is going to store and load inspectIT configurations. A configuration can include additional details. At the same time, users can rate a configuration and leave a comment.
For this use case, a document-based database is appropriate. Each document will represent a configuration. The ratings, details, comments, etc. can be embedded in the document. As soon as a configuration is read, all relevant information is available and does not need to be queried via further selects.
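A configuration document of this kind might look roughly like the following Python dictionary. All field names and values are hypothetical; the actual schema of the marketplace prototype is not shown in this post.

```python
import json

# Hypothetical field names; the real marketplace schema may differ.
configuration = {
    "_id": "config-001",
    "name": "Example configuration",
    "details": {"description": "Illustrative entry", "version": "1.0"},
    "ratings": [5, 4],
    "comments": [{"author": "user1", "text": "Works well."}],
}


def average_rating(doc):
    """Compute the mean rating from the embedded list, or None if unrated."""
    ratings = doc.get("ratings", [])
    return sum(ratings) / len(ratings) if ratings else None


print(json.dumps(configuration, indent=2))
print(average_rating(configuration))  # 4.5
```

Everything needed to display a configuration, including its ratings and comments, arrives with a single read.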
Which systems are available? How do they differ? And which system is most suitable for my use case?
The website http://nosql-database.org lists a large number of NoSQL databases. Under the header “Document Store” you will find many different document-oriented databases. Since the list is too long to cover every database, and a large part of them, in my opinion, play little or no role, I restrict myself to the following three most popular open source document databases (based on https://db-engines.com/en/ranking/document+store).
MongoDB vs Couchbase vs CouchDB
There is a variety of articles and blog posts dealing with MongoDB, Couchbase and CouchDB. Unfortunately, many of these posts are out of date and therefore only partially valid. Over successive versions, the systems have evolved, and many functions have been added or adapted to the needs of the users. As a result, the systems have converged and the functional differences between them have decreased. Of course, there are still many differences, such as the UI, indexing or caching, but they are less relevant to me.
In general, a comparison can be based on any number of criteria, and depending on your requirements, different criteria may lead to a recommendation for a different system.
The most important criteria for me are:
Data organization: how can I organize the documents?
How complex is the query going to be and how do I get a document?
How popular is the product and can I find sufficient support online?
CouchDB shows the strongest differences. Couchbase and MongoDB both offer a way to organize the documents in different containers. In MongoDB these are called collections, and in Couchbase they are called data buckets. CouchDB, on the other hand, stores the documents directly in the database.
In general, all three systems make it possible to organize the documents. With CouchDB, however, you have to define an extra field in the document through which documents of the same type can be grouped.
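The extra-field approach can be sketched in plain Python as follows. The field name `type`, the sample documents and the helper `group_by_type` are assumptions for illustration; CouchDB itself does not prescribe a particular field name.

```python
from collections import defaultdict

# Hypothetical documents living side by side in one database.
documents = [
    {"_id": "1", "type": "configuration", "name": "A"},
    {"_id": "2", "type": "comment", "text": "Nice"},
    {"_id": "3", "type": "configuration", "name": "B"},
]


def group_by_type(docs, field="type"):
    """Group documents by the value of their type field."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc.get(field, "unknown")].append(doc)
    return dict(grouped)


groups = group_by_type(documents)
print(sorted(groups))  # ['comment', 'configuration']
```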
In NoSQL databases, the data is usually queried using MapReduce functions. With MapReduce, you can process and analyze large amounts of data. A request is divided into a map function and a reduce function. The map function iterates through a set of input data and applies a custom mapping function to each entry. In the subsequent reduce function, the intermediate results are reduced to an output value.
Figure 2 – Map-Reduce Example
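The two phases can be sketched in plain Python. This is not the API of any of the three databases; the sample documents and the function names `map_function`, `reduce_function` and `map_reduce` are made up for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input documents.
documents = [
    {"item": "apple", "quantity": 3},
    {"item": "pear", "quantity": 2},
    {"item": "apple", "quantity": 4},
]


def map_function(doc):
    """Emit one (key, value) pair per input document."""
    return doc["item"], doc["quantity"]


def reduce_function(values):
    """Reduce all values emitted for one key to a single output value."""
    return reduce(lambda a, b: a + b, values)


def map_reduce(docs):
    # Map phase: collect the emitted values per key.
    intermediate = defaultdict(list)
    for doc in docs:
        key, value = map_function(doc)
        intermediate[key].append(value)
    # Reduce phase: collapse each list of values.
    return {key: reduce_function(values) for key, values in intermediate.items()}


print(map_reduce(documents))  # {'apple': 7, 'pear': 2}
```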
In addition to MapReduce, all three systems offer a more convenient way to write queries.
With CouchDB you can use Mango queries to select documents by specific attributes, specify the sort order and return only specific fields. For aggregate functions or more complex queries, you must create your own views and write MapReduce functions.
In MongoDB, all CRUD operations can be performed with the functions insertOne() / insertMany(), find(), updateOne() / updateMany() / replaceOne() and deleteOne() / deleteMany(). Additionally, MongoDB offers cursor methods such as limit(), sort() or count() to restrict, order or count the results.
Unlike CouchDB and MongoDB, Couchbase uses its own query language, N1QL, which is based on SQL. With N1QL you can perform basically all query operations that are also possible with SQL.
The following examples show a Mango query, a MongoDB find() call and a Couchbase N1QL query. The queries aim to get all documents whose ID is not null and whose values for low and max in the field prices are greater than 1 and lower than 5.99, respectively. Only the fields item and prices should be returned.
Mango Query example:
MongoDB find() example:
Couchbase N1QL example
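The original listings were images and are not reproduced here. As a hedged reconstruction of what such queries could look like, the sketch below writes all three as Python data: the Mango query as the JSON body sent to CouchDB, the MongoDB query as the filter and projection documents passed to find(), and the N1QL query as a string. The bucket name `items` is hypothetical, and none of the queries has been run against a server.

```python
import json

# CouchDB Mango query: a selector plus the fields to return.
mango_query = {
    "selector": {
        "_id": {"$gt": None},  # common idiom for "ID is not null"
        "prices.low": {"$gt": 1},
        "prices.max": {"$lt": 5.99},
    },
    "fields": ["item", "prices"],
}

# MongoDB find(): a filter document and a projection document.
mongo_filter = {
    "_id": {"$ne": None},
    "prices.low": {"$gt": 1},
    "prices.max": {"$lt": 5.99},
}
mongo_projection = {"item": 1, "prices": 1}  # _id is returned by default as well

# Couchbase N1QL; the bucket name `items` is an assumption.
n1ql_query = (
    "SELECT item, prices FROM `items` "
    "WHERE META().id IS NOT NULL AND prices.low > 1 AND prices.max < 5.99"
)

print(json.dumps(mango_query, indent=2))
print(json.dumps({"filter": mongo_filter, "projection": mongo_projection}, indent=2))
print(n1ql_query)
```

In the MongoDB shell, the two documents would be passed as the first and second argument of find(). The N1QL version reads almost like plain SQL, which is the main point of the comparison.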
MongoDB is currently (May 2017) the most popular of the three databases. Both Google Trends (see Figure 3) and Stack Overflow (see Figure 4) show MongoDB leading in search requests and technical questions.
Figure 3 – Google-Trends; MongoDB, Couchbase and CouchDB
The advantage of high popularity is that many others have already worked with the product. Thus there are many tutorials and many solutions to different problems. Due to the larger user group, it’s more likely that even complex questions will be answered quickly.
Based on my criteria, I have decided to use MongoDB. The simple syntax for data queries, the possibility to organize documents in different collections and the higher popularity in search results all speak in favor of MongoDB. In particular, popularity is not a negligible factor, as it indicates how large the demand is and how much help is available online.