Stefan Ludwig
8 August 2014
Reading time: 4 min

Layered Data Models using Neo4J

NoSQL databases are becoming more and more relevant as new challenges like Big Data and Cloud Computing become part of our daily business. Among the most popular NoSQL databases are graph databases like Neo4J. One of the many advantages a graph database has over its relational relatives is that it doesn’t have a fixed schema. This means you don’t have to define tables and columns for your application, or change them when your application changes. Additionally, you can write easily extensible applications using layered data models.

For anybody with a background in graph theory or other graph databases, the terminology of Neo4J maps as follows: nodes are vertices and relationships are edges. I will use the Neo4J terminology throughout this blog post. Additionally, at least a rudimentary understanding of graphs is necessary to follow the concepts discussed in this article.

Graphs are a very simple but powerful data structure. They consist of only two parts:

  • Nodes (untyped) with properties
  • Relationships (typed by a label) with properties

The properties of nodes and relationships can be added at runtime and don’t need to be defined beforehand.

In addition to these basics, Neo4J adds the following features:

  • Optional multiple labels for nodes to achieve (polymorphic) typing.
  • Optional indices for label/property combinations of nodes.
  • Unique constraints for label/property combinations of nodes.
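
To make these building blocks concrete, here is a minimal in-memory sketch in Python. It is not the Neo4J API: the classes `Node`, `Relationship` and `Graph`, the method names, and the `ARCHIVED` label are assumptions made up for illustration. It only shows the ideas of multiple labels, properties added at runtime, and a unique constraint on a label/property combination.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A node: a set of labels plus free-form properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass(eq=False)
class Relationship:
    """A typed, directed edge with its own free-form properties."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


class Graph:
    """Hypothetical stand-in for a graph database (illustration only)."""

    def __init__(self):
        self.nodes = []
        self.relationships = []
        self.unique = set()  # (label, property key) pairs that must be unique

    def add_unique_constraint(self, label, key):
        self.unique.add((label, key))

    def create_node(self, *labels, **properties):
        # In this sketch, constraints are only checked when a node is created.
        for label, key in self.unique:
            if label in labels and key in properties:
                for other in self.nodes:
                    if (label in other.labels
                            and other.properties.get(key) == properties[key]):
                        raise ValueError(f"{label}.{key} must be unique")
        node = Node(set(labels), dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel


graph = Graph()
graph.add_unique_constraint("FEED", "url")
feed = graph.create_node("FEED", url="http://example.org/rss", title="Example")
feed.properties["language"] = "en"  # new property, no schema change needed
feed.labels.add("ARCHIVED")         # a second label on the same node
```

Creating a second `FEED` node with the same `url` raises a `ValueError` in this sketch, which mirrors what a unique constraint is meant to prevent.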

In this article I want to illustrate how Neo4J can be used to build applications (or, more specifically, data models) that can adapt to ever-changing requirements by being layered and easily extensible.

To learn more about graphs and Neo4J visit:

Base Model

Let’s say we are working on an application that processes and archives RSS feeds.

Our data model for that application has two types of nodes:

  • FEED: Holds the URL of the RSS feed, the URL of the feed’s homepage and a title
  • FEED_ITEM: Holds the URL to an article, the title and a publication timestamp

These nodes are linked by the following relationship types:

  • A feed CONTAINS feed items
  • A feed has a FIRST feed item
  • A feed has a LAST feed item
  • A feed item has a NEXT feed item
  • A feed item has a PREVIOUS feed item

This means that for each feed, the feed items are persisted as a doubly linked list, with the feed knowing the list’s beginning and end.
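
The doubly linked list described above can be sketched as follows. This is a plain Python illustration, not Neo4J code: the `FeedGraph` class and the helper functions are hypothetical, and nodes are represented by simple strings. Only the relationship type names are taken from the model.

```python
class FeedGraph:
    """Hypothetical in-memory stand-in for the graph (illustration only)."""

    def __init__(self):
        self.rels = []  # (start, type, end) triples

    def relate(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))

    def unrelate(self, start, rel_type):
        # Drop all relationships of the given type leaving the start node.
        self.rels = [r for r in self.rels
                     if not (r[0] == start and r[1] == rel_type)]

    def target(self, start, rel_type):
        """Return the node reached from start via rel_type, or None."""
        for s, t, e in self.rels:
            if s == start and t == rel_type:
                return e
        return None


def append_item(graph, feed, item):
    """Append a feed item to the end of the feed's linked list."""
    graph.relate(feed, "CONTAINS", item)
    last = graph.target(feed, "LAST")
    if last is None:
        graph.relate(feed, "FIRST", item)    # first item of an empty feed
    else:
        graph.relate(last, "NEXT", item)     # forward link
        graph.relate(item, "PREVIOUS", last)  # backward link
        graph.unrelate(feed, "LAST")
    graph.relate(feed, "LAST", item)


def items_in_order(graph, feed):
    """Walk the list from FIRST along the NEXT relationships."""
    item = graph.target(feed, "FIRST")
    while item is not None:
        yield item
        item = graph.target(item, "NEXT")


g = FeedGraph()
for name in ["item-a", "item-b", "item-c"]:
    append_item(g, "feed", name)
print(list(items_in_order(g, "feed")))  # ['item-a', 'item-b', 'item-c']
```

Keeping FIRST and LAST on the feed means the newest or oldest item can be reached in one hop, without scanning all CONTAINS relationships.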

blog_layered_datamodels_1

Scenario #1: Additional Nodes

Now let’s say we need the option to find feed items that share a common topic.

The model extension is quite simple:

  • New node type TOPIC for all topics.
  • A new relationship type BELONGS_TO that links a feed item to a topic.

Since our base model works fine on its own as a feed archive and topics are an optional feature, the base application doesn’t need to know about any of these new types.

To achieve this, we create a plugin that encapsulates all the knowledge about topics and how they can be linked to the base model.
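
A rough sketch of such a plugin, again in plain Python rather than against the Neo4J API. The `Graph` and `TopicPlugin` classes, their methods, and the node ids are assumptions for illustration. The point is that only the plugin mentions TOPIC and BELONGS_TO.

```python
class Graph:
    """Hypothetical in-memory graph used by the base application."""

    def __init__(self):
        self.nodes = {}  # node id -> {"labels": set, "props": dict}
        self.rels = []   # (start id, type, end id)

    def create_node(self, node_id, label, **props):
        self.nodes[node_id] = {"labels": {label}, "props": props}
        return node_id

    def relate(self, start, rel_type, end):
        self.rels.append((start, rel_type, end))


class TopicPlugin:
    """Owns everything about TOPIC nodes and BELONGS_TO relationships."""

    def __init__(self, graph):
        self.graph = graph

    def tag(self, item_id, topic_name):
        topic_id = f"topic:{topic_name}"
        if topic_id not in self.graph.nodes:
            # Create each topic only once and reuse it afterwards.
            self.graph.create_node(topic_id, "TOPIC", name=topic_name)
        self.graph.relate(item_id, "BELONGS_TO", topic_id)

    def items_for(self, topic_name):
        topic_id = f"topic:{topic_name}"
        return [start for start, rel_type, end in self.graph.rels
                if rel_type == "BELONGS_TO" and end == topic_id]


# Base application: creates feed items and knows nothing about topics.
graph = Graph()
graph.create_node("item:1", "FEED_ITEM", title="First article")
graph.create_node("item:2", "FEED_ITEM", title="Second article")
graph.create_node("item:3", "FEED_ITEM", title="Third article")

# Optional plugin: adds the topic layer on top of the existing nodes.
topics = TopicPlugin(graph)
topics.tag("item:1", "neo4j")
topics.tag("item:2", "neo4j")
print(topics.items_for("neo4j"))  # ['item:1', 'item:2']
```

Removing the plugin leaves the FEED_ITEM nodes untouched, so the base application keeps working as before.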

blog_layered_datamodels_2

Scenario #2: Polymorphic Nodes

Instead of adding completely new node and relationship types, we now want to use certain existing nodes in a different way and extend the model with additional information.

To illustrate this, let’s say we want the option to collect links that match a specific pattern from our feed items and archive them as nodes in our graph.

To make things even more interesting:

  1. Not every feed should be considered when collecting links.
  2. The pattern a link has to match should be configurable for each feed.
  3. The original model must not be changed.

The model extension itself is (again) very simple:

  • New node type LINK for all collected links.
  • The existing relationship type CONTAINS can be reused: a feed item CONTAINS a link.

In a relational database, requirement #1 would most likely be implemented with a boolean column in the feed table that switches link collection on or off. In Neo4J we have labels to declare a node type, and each node can have as many labels as we want. This means that we can simply add the label LINK_FEED to any existing FEED node and thereby mark it for link collection.

In a relational database, requirement #2 would require adding new columns to the feed table, which violates requirement #3. Since graph databases like Neo4J have no fixed database schema, a change like this is purely programmatic. The application simply adds additional properties to the node as needed.

Although we modify existing nodes for requirement #2, the base model is unchanged and the base application (again) doesn’t need to know about any of the new types or properties. Since link collection is an optional feature, all this knowledge and logic can again be encapsulated in a plugin.
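
The following Python sketch illustrates the three requirements under some assumptions. The `Node` and `LinkPlugin` classes are hypothetical, the property name `link_pattern` is invented for this example, and the item text is passed in directly rather than fetched from the article URL. The labels LINK_FEED and LINK and the relationship type CONTAINS are the ones described above.

```python
import re


class Node:
    """Minimal node with labels and properties (illustration only)."""

    def __init__(self, *labels, **props):
        self.labels = set(labels)
        self.props = dict(props)


class LinkPlugin:
    """Hypothetical plugin that owns all link collection logic."""

    def __init__(self):
        self.rels = []  # (feed item node, "CONTAINS", link node)

    def enable(self, feed, pattern):
        # Requirement 1: mark the feed with an additional label.
        feed.labels.add("LINK_FEED")
        # Requirement 2: per-feed pattern stored as an extra property
        # ("link_pattern" is an assumed name).
        feed.props["link_pattern"] = pattern

    def collect(self, feed, item, text):
        if "LINK_FEED" not in feed.labels:
            return []  # feed is not marked for link collection
        pattern = re.compile(feed.props["link_pattern"])
        links = []
        for url in re.findall(r"https?://\S+", text):
            if pattern.search(url):
                link = Node("LINK", url=url)
                self.rels.append((item, "CONTAINS", link))
                links.append(link)
        return links


feed = Node("FEED", url="http://example.org/rss", title="Example")
other = Node("FEED", url="http://example.com/rss", title="Other")
item = Node("FEED_ITEM", url="http://example.org/a1", title="Article")

plugin = LinkPlugin()
plugin.enable(feed, r"github\.com")
text = "See https://github.com/example/repo and http://example.org/about"
found = plugin.collect(feed, item, text)
print([link.props["url"] for link in found])
```

The `other` feed never receives the LINK_FEED label, so `plugin.collect(other, item, text)` returns an empty list. The FEED label and the original properties of both feeds stay exactly as the base application created them (requirement 3).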

blog_layered_datamodels_3

The Whole Picture

Combining both scenarios into one application with two plugins gives us the option to activate or deactivate features that have an impact on the data model simply by adding or removing plugins.

The data model itself is split into two different layers:

  1. base model providing data from feeds
  2. additions like topics or links

It should now be easy to imagine a third layer that adds information based on data from the second layer.
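
As a rough illustration of the layering, the sketch below registers a second-layer plugin with a base application and puts a third layer on top of it. All class names, the `on_item_added` hook, and the keyword matching are assumptions made up for this example. The article does not prescribe a plugin mechanism.

```python
class FeedArchive:
    """Layer 1: stores feed items and notifies registered plugins."""

    def __init__(self):
        self.items = []
        self.plugins = []

    def add_item(self, title):
        item = {"labels": {"FEED_ITEM"}, "title": title}
        self.items.append(item)
        for plugin in self.plugins:
            plugin.on_item_added(item)
        return item


class TopicLayer:
    """Layer 2: derives topics from layer 1 data."""

    def __init__(self, keywords):
        self.keywords = keywords
        self.belongs_to = []  # (item title, topic) pairs

    def on_item_added(self, item):
        for word in self.keywords:
            if word.lower() in item["title"].lower():
                self.belongs_to.append((item["title"], word))


class TopicStatsLayer:
    """Layer 3: reads only layer 2 data."""

    def __init__(self, topic_layer):
        self.topic_layer = topic_layer

    def counts(self):
        result = {}
        for _, topic in self.topic_layer.belongs_to:
            result[topic] = result.get(topic, 0) + 1
        return result


archive = FeedArchive()
topics = TopicLayer(["graph", "cloud"])
archive.plugins.append(topics)              # activate the feature
archive.add_item("Graph databases in practice")
archive.add_item("Cloud and graph workloads")
archive.plugins.remove(topics)              # deactivate the feature
archive.add_item("Graph theory basics")     # archived, but not tagged

stats = TopicStatsLayer(topics)
print(stats.counts())  # {'graph': 2, 'cloud': 1}
```

The base archive still holds all three items after the plugin is removed. Only the derived topic data stops growing.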

blog_layered_datamodels_4

Practical Applications @NovaTec

Here at NovaTec we are using Neo4J graphs in our testIT – Quality Solution Suite.
More specifically, we are using the power of graphs to homogenize test results from different sources in the testIt ResultRepository.
