Graph databases in the NoSQL space

In the previous section, we discussed NoSQL and the aggregate-oriented types of NoSQL databases. In this section, we’ll discuss another NoSQL database type: graph databases. We’ll see how they fit into the NoSQL space and what their uses are.

For simple queries, aggregate stores use indexing, basic document linking or a query language. However, for more complex queries, aggregate stores cannot generate deeper insights simply by examining individual data points. To compensate, an application typically has to identify and extract a subset of data and run it through an external processing infrastructure such as the MapReduce framework (often in the form of Apache Hadoop).

MapReduce is a parallel programming model that splits data and operates on it in parallel before gathering it back together and aggregating it to provide focused information.
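The split, process, and gather flow described above can be sketched in a few lines of Python. This is only an in-process illustration, not how Hadoop is implemented: the function names, the `chunk_size` parameter, and the sample records are assumptions made up for the example, and a thread pool stands in for a cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map step: emit a (key, 1) pair for every record in one chunk."""
    return [(record["city"], 1) for record in chunk]


def shuffle(mapped_chunks):
    """Shuffle step: gather the emitted values back together, grouped by key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce step: aggregate each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def map_reduce(records, chunk_size=2):
    # Split the data set into chunks that can be processed independently.
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    # Run the map step on the chunks in parallel (threads stand in for machines).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_groups(shuffle(mapped))


# Hypothetical sample data: one record per user.
users = [
    {"name": "Ann", "city": "Oslo"},
    {"name": "Bob", "city": "Pune"},
    {"name": "Cy", "city": "Oslo"},
    {"name": "Dee", "city": "Lima"},
    {"name": "Eve", "city": "Pune"},
]

print(map_reduce(users))  # {'Oslo': 2, 'Pune': 2, 'Lima': 1}
```

Note that every record passes through the map step even though the question is a simple count per city, which hints at why this approach gets slow on large data sets.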

But even with a lot of machines and a fast network infrastructure, MapReduce can have high latency. So high, in fact, that a development team often needs to introduce new indexes or ad hoc queries in order to focus (and trim) the dataset for better MapReduce speeds. In addition, many NoSQL aggregate databases do not provide full ACID guarantees.

On the other hand, relational databases (RDBMS) are ACID-compliant, but they have their own problems. As the database grows, queries become time-consuming: query execution times increase as the size of the tables and the number of JOINs grow. One way to address this is indexing, but too many indexes bring their own performance costs (see more details on indexing in row-oriented databases). This is where graph databases come in: they are designed for highly interconnected and schema-less data.

Graph databases

Graph databases were designed with the view that developers often build graph-like structures in their applications but still store the data in an unnatural way, either in the tables and columns of relational databases or in other NoSQL storage systems. As we mentioned before, problems like access control lists (ACLs), social networks, or indeed any kind of network are natural graph problems. The graph data model is at the core of graph databases: you’re finally able to store an object model that represents graph data as a persisted graph!

Graph databases also allow deep traversals to run faster than the equivalent queries in relational databases.

The rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people — Cisco estimates 50 billion connected devices by 2020 — one can imagine applications that depend on data stored in graphs with many more nodes and edges than the ones currently maintained by social media companies.

Common Graph databases:

Complete list of graph DBs.

Graph databases have several advantages:

  • Performance: query performance tends to stay roughly constant as the data grows, because a query touches only the part of the graph it traverses.
  • Data modelling: graph data modelling is easy for developers. A whiteboard sketch can be converted almost directly into a graph model.
  • Speedy queries in place of JOINs: queries that take hours in an RDBMS can take seconds in a graph database.
  • Many-to-many relationships: graph DBs excel at them, while relational databases are not so great at modelling many-to-many relationships, especially in large data sets.
    • E.g., a friend of a friend likes a comment on a specific post. A graph DB has to traverse only a few nodes, but a relational database has to run a complex query containing multiple JOINs.
  • Graph DBs reduce the impedance mismatch between the technical and business domains.
  • They reduce the dev overhead of translating back and forth between an object model and a tabular relational model.
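To make the modelling points above concrete, here is a minimal in-memory sketch of a whiteboard drawing turned into nodes and relationships, followed by the friend-of-a-friend example from the list. It is not Neo4j's API: the `Node` and `Relationship` classes, the helper functions, and the sample people are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: two nodes are equal only if they are the same object
class Node:
    label: str
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list, repr=False)  # relationships starting here


@dataclass(eq=False)
class Relationship:
    rel_type: str
    start: Node
    end: Node


def relate(start, rel_type, end):
    """Draw one arrow from the whiteboard: (start)-[rel_type]->(end)."""
    rel = Relationship(rel_type, start, end)
    start.outgoing.append(rel)
    return rel


def neighbours(node, rel_type):
    """Follow all outgoing relationships of one type from a node."""
    return [rel.end for rel in node.outgoing if rel.rel_type == rel_type]


def friends_of_friends_who_like_comment_on(person, post):
    """Walk FRIEND -> FRIEND -> LIKES -> ON, visiting only connected nodes."""
    result = []
    for friend in neighbours(person, "FRIEND"):
        for fof in neighbours(friend, "FRIEND"):
            for comment in neighbours(fof, "LIKES"):
                if post in neighbours(comment, "ON"):
                    result.append(fof)
    return result


# Hypothetical whiteboard sketch, transcribed arrow by arrow.
alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
carol = Node("Person", {"name": "Carol"})
comment = Node("Comment", {"text": "Nice!"})
post = Node("Post", {"title": "Graph databases"})

relate(alice, "FRIEND", bob)
relate(bob, "FRIEND", carol)
relate(carol, "LIKES", comment)
relate(comment, "ON", post)

matches = friends_of_friends_who_like_comment_on(alice, post)
print([node.properties["name"] for node in matches])  # ['Carol']
```

The object model and the stored model have the same shape here, which is the point about reduced translation overhead: there is no separate tabular layout to map to and from.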

Graph Database Use Cases:

According to Emil Eifrem (CEO and co-founder of Neo4j), the top three Neo4j use cases are:

  • Real-time query results
    • Real-time recommendation engines
    • Retail: coupons for users based on their recent shopping
  • Fraud Detection
  • Master Data Management
  • Neo4j Case Studies

Other use cases include:

  • Friends of my friends – social networks (in a relational model, you’d have a “friends” join table and would need a lot of self-joins, which would be extremely inefficient)
  • Shortest path finding
  • Interconnected data
  • Domain can be represented with nodes and relationships naturally
  • Access control lists (ACL)
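The “friends of my friends” case can be sketched both ways in plain Python. This is an illustration under stated assumptions, not a benchmark: the sample rows and function names are made up, and the self-join is written as a naive nested scan, whereas a real RDBMS would use indexes and a query planner.

```python
# Hypothetical sample data: a "friends" join table, one row per direction.
friend_rows = [
    ("alice", "bob"), ("bob", "alice"),
    ("alice", "carol"), ("carol", "alice"),
    ("bob", "dave"), ("dave", "bob"),
    ("carol", "dave"), ("dave", "carol"),
    ("carol", "erin"), ("erin", "carol"),
]


def fof_with_self_join(rows, person):
    """Relational style: join the table with itself on f1.friend = f2.person."""
    direct = {friend for who, friend in rows if who == person}
    joined = {
        f2_friend
        for f1_person, f1_friend in rows
        if f1_person == person
        for f2_person, f2_friend in rows  # inner scan over the whole table
        if f2_person == f1_friend
    }
    return joined - direct - {person}


def build_adjacency(rows):
    """Graph style: each person points directly at their friends."""
    adjacency = {}
    for who, friend in rows:
        adjacency.setdefault(who, set()).add(friend)
    return adjacency


def fof_with_traversal(adjacency, person):
    """Follow two hops from the start node, touching only its neighbourhood."""
    direct = adjacency.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= adjacency.get(friend, set())
    return two_hops - direct - {person}


adjacency = build_adjacency(friend_rows)
print(sorted(fof_with_self_join(friend_rows, "alice")))  # ['dave', 'erin']
print(sorted(fof_with_traversal(adjacency, "alice")))    # ['dave', 'erin']
```

Both functions return the same answer. The difference is in what they look at: the join version rescans the table for every direct friend, while the traversal version only reads the friend sets of the nodes it actually reaches.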

Graph traversal

There’s one key difference between relational databases and Neo4j: how data is queried. There are no tables and columns in Neo4j, nor are there any SQL-based select and join commands. So how do you query a graph database?

The answer is not “write a distributed MapReduce function.” Neo4j, like all graph databases, takes a powerful mathematical concept from graph theory and uses it as an efficient engine for querying data. This concept is graph traversal.

A traversal is the operation of visiting a set of nodes in the graph by moving between nodes connected by relationships. It’s a fundamental operation for data retrieval in a graph, and as such, it’s unique to the graph model.

The key concept of traversals is that they’re localized: querying the data using a traversal only takes into account the data that’s required, without needing to perform expensive grouping operations on the entire data set, as you do with join operations on relational data.
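A small breadth-first traversal shows what “localized” means in practice. This is a generic sketch rather than Neo4j's traversal engine: the `traverse` function, its `max_depth` parameter, and the sample graph are assumptions made for the example.

```python
from collections import deque


def traverse(graph, start, max_depth):
    """Breadth-first traversal: return {node: hops from start} up to max_depth hops."""
    visited = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_depth:
            continue  # do not expand nodes that are already at the depth limit
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                visited[neighbour] = depth + 1
                queue.append(neighbour)
    return visited


# Hypothetical sample graph: node -> nodes it has a relationship to.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
    "x": ["y"],  # not connected to "a", so a traversal from "a" never reads it
    "y": ["x"],
}

print(traverse(graph, "a", 2))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

The traversal never touches `x` or `y`, and adding any number of further nodes that are unreachable from `a` would not change the work done. The cost depends on the size of the neighbourhood that is explored, not on the size of the whole data set.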

There are several common graph databases, but we’ll focus on Neo4j. See “Why use Neo4j?”