Why use Neo4j?

In the previous sections, we explained NoSQL aggregate databases and graph databases, and how each type fits into the NoSQL space. Now we’ll look at a specific graph database: Neo4j.

What is Neo4j?

Neo4j is an internet-scale graph database built to execute connected (relationship-heavy) workloads faster than general-purpose DBMSs. It is a property graph database that uses a graph structure for storage as well as processing.

  • Development started in 2003
  • Publicly available since 2007
  • Written in Java and Scala
  • Enterprise ready
    • ACID compliant – ACID transactions ensure the resulting graph isn’t corrupted
    • High scalability
    • Horizontal read scalability
    • Storage of billions of entities
  • It isn’t an analytical DB; it’s a transactional DB for storing critical business data.
  • It is schema-less.
  • Relationships take first priority (they are first-class citizens).
  • No foreign keys for data connections, and no out-of-band processing like MapReduce
    • Foreign keys have another weak point: they only “point” in one direction, making reciprocal queries too time-consuming to run. Developers usually work around this problem by inserting backward-pointing relationships or by exporting the dataset to an external compute structure, like Hadoop. Either way, results arrive slowly and with added latency.
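
The one-direction limitation of foreign keys can be illustrated with a small sketch. Everything here is a hypothetical toy (the `employees` table, the `manager_id` column, and the helper names are assumptions for illustration, not part of any real schema): following the key forward is a single lookup, the reciprocal question needs a scan of every row, and the usual workaround is to maintain backward-pointing entries by hand.

```python
# Hypothetical toy table: each employee row holds one foreign key to a manager.
employees = {
    1: {"name": "Ann", "manager_id": None},
    2: {"name": "Bob", "manager_id": 1},
    3: {"name": "Cid", "manager_id": 1},
}


def manager_of(emp_id):
    """Forward direction: follow the foreign key with a single lookup."""
    manager_id = employees[emp_id]["manager_id"]
    return employees[manager_id]["name"] if manager_id is not None else None


def reports_of(manager_id):
    """Reverse direction: nothing points back, so every row is scanned."""
    return [row["name"] for row in employees.values()
            if row["manager_id"] == manager_id]


def build_reverse_pointers():
    """Workaround: maintain backward-pointing entries alongside the table."""
    reverse = {}
    for emp_id, row in employees.items():
        if row["manager_id"] is not None:
            reverse.setdefault(row["manager_id"], []).append(emp_id)
    return reverse
```

The reverse map answers the reciprocal query quickly, but it is a second structure that has to be kept in sync with every write, which is the extra burden the bullet above describes.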

Why Neo4j among other GraphDBs?

Different people will use different definitions of engines and APIs, but in the end it’s all about data structures. If a database relies on data structures that are not a natural fit for a graph, and does not have the right indexing in place, then queries may be easier to write using a graph API on top of it, but their performance can only be as good as the underlying database. Here are a few reasons to use Neo4j as your graph DB:

  • Neo4j is the market leader among graph databases.
  • It is highly performant, scalable, and flexible.
  • It uses the property graph data model.
  • Neo4j is referred to as a native graph DB, because it uses graphs for storage as well as processing. It implements the property graph model efficiently down to the storage level.
  • It is equipped with a rich UI, called the Neo4j browser, that is used for queries, visualization, and data interaction.
  • It is compatible with most programming platforms, such as Java, Node.js, PHP, Python, and .NET.
  • It supports the Cypher query language, whose syntax naturally depicts data and relationships.
  • It integrates well with Apache Spark, Elasticsearch, MongoDB, Cassandra, and Docker.
  • Real-time insights: Neo4j answers queries against live transactional data rather than a periodically exported copy.
  • Indexing: Neo4j supports indexes using Apache Lucene.
  • For deeply connected queries, it can reduce query time from hours (in relational databases) to seconds, thanks to index-free adjacency.
  • The community edition is free under the GPL v3 license.

“Graph Databases Everywhere by 2020,” says Neo4j CEO Emil Eifrem: “It’s become clear that by the end of this decade, every single Global 2000 company will have at least one if not several graph projects within their company.”

Two main elements in Graph technology

There are two main elements that distinguish native graph technology: storage and processing.

  • Graph Storage
    • Graph storage commonly refers to the underlying structure of the database that contains graph data. When built specifically for storing graph-like data, it is known as native graph storage.
    • Neo4j and some other graph databases use “native” graph storage that is specially designed to store and manage graphs. Other graph DBs use an object-oriented or relational DB under the hood. Non-native storage is generally slower for graph workloads than native storage.
  • Graph Processing Engine
    • Graph processing refers to how a graph database processes database operations, including both storage and queries.
    • Native graph processing uses “index-free adjacency”. Non-native graph engines use other means to process CRUD operations.
    • Index-free adjacency is the most efficient means of processing data in a graph because connected nodes physically “point” to each other in the database.
    • Graph databases that rely on global indexes (rather than index-free adjacency) to gather results are classified as having non-native processing.

At write time, index-free adjacency speeds up storage processing by ensuring that each node is stored with direct links to its adjacent nodes and relationships. Then, during query processing (i.e., read time), index-free adjacency allows fast retrieval, because moving from a node to its neighbors does not require an index lookup.
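
The idea behind index-free adjacency can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not Neo4j's actual storage format: the `Node` class and the `connect` and `within_hops` helpers are hypothetical names. Each node keeps direct references to its neighbors, so a traversal only follows those references and never consults a global edge index.

```python
from collections import deque


class Node:
    """Toy node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # object references, not keys into a lookup table


def connect(a, b):
    # Write time: each node records its adjacent node directly.
    a.neighbours.append(b)
    b.neighbours.append(a)


def within_hops(start, max_hops):
    """Read time: breadth-first traversal that only follows references."""
    seen = {start}  # visited set for this traversal only, not a data index
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in node.neighbours:
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt.name)
                queue.append((nxt, depth + 1))
    return found


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
connect(a, b)
connect(b, c)
connect(c, d)
```

With this chain, `within_hops(a, 2)` visits only `b` and `c`. The cost depends on the part of the graph that is actually touched, not on the total number of nodes stored.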

Neo4j uses a graph for both storage and processing.

Graph Data models

Graph databases adopt different data models. The most common graph data models include:

  • Property Graphs
    • The following elements make a graph database a property graph:
      • A property graph contains nodes and relationships.
      • Nodes can contain properties (key-value pairs).
      • Nodes can be labeled with one or more labels.
      • Relationships have both names and directions.
      • Like nodes, relationships can also contain properties.
  • Hypergraphs
    • A hypergraph is a graph model in which a relationship can connect any number of nodes. While a property graph permits a relationship to have only one start node and one end node, the hypergraph model allows any number of nodes at either end of a relationship. Hypergraphs can be useful when your data includes a large number of many-to-many relationships.
  • Triples
    • Triple stores are modelled around the Resource Description Framework (RDF) specification, using SPARQL as their query language.
    • Triple stores are not native graph databases because they don’t support index-free adjacency.

Neo4j is a property graph database.
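
To make the property graph elements concrete, here is a minimal sketch using Python dataclasses. The class and field names (`Node`, `Relationship`, `HyperEdge`, and the sample data) are hypothetical and chosen only for illustration; they are not Neo4j's API. The `HyperEdge` class is included purely for contrast with the hypergraph model described above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)        # one or more labels
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    start: Node  # exactly one start node
    type: str    # the relationship's name
    end: Node    # exactly one end node; start -> end gives the direction
    properties: dict = field(default_factory=dict)


@dataclass
class HyperEdge:
    type: str
    sources: list  # any number of nodes at one end
    targets: list  # any number of nodes at the other end


alice = Node({"Person"}, {"name": "Alice"})
acme = Node({"Company", "Employer"}, {"name": "Acme"})
works_at = Relationship(alice, "WORKS_AT", acme, {"since": 2015})
```

Here `works_at` carries its own `since` property, and `acme` has two labels, which matches the list of property graph elements above.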

According to 3PillarGlobal:

  • Apache Giraph and TitanDB are really promising projects and they may overshadow Neo4j some day.
  • Our take is any graph database that can work over Apache Mesos would be the ultimate winner – will it be Apache Spark?

Check this for popular graph databases comparison.

Graph basic constructs

  • Node (Entity):
    • Nodes have properties (also called attributes or key/value pairs).
    • Nodes have labels (roles in the domain).
      • Labels also serve to attach metadata, such as indexes and constraints, to certain nodes.
  • Relationship – Connection between nodes
    • Like nodes, relationships also have properties.
    • A node can have any number of relationships without sacrificing performance.
    • No broken links: since a relationship always has a start node and an end node, you can’t delete a node without also deleting its associated relationships.
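
The “no broken links” rule can be sketched as a small invariant check. The `Graph` class below is a hypothetical in-memory toy, not Neo4j's implementation; its `detach` flag is loosely modeled on the difference between Cypher's `DELETE` and `DETACH DELETE`.

```python
class Graph:
    """Toy graph that never leaves a relationship without both end nodes."""

    def __init__(self):
        self.nodes = set()
        self.relationships = []  # (start, type, end) tuples

    def add_node(self, name):
        self.nodes.add(name)

    def relate(self, start, rel_type, end):
        # A relationship always needs an existing start and end node.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both the start and end node must exist")
        self.relationships.append((start, rel_type, end))

    def delete_node(self, name, detach=False):
        attached = [r for r in self.relationships if name in (r[0], r[2])]
        if attached and not detach:
            # Refuse rather than leave dangling relationships behind.
            raise ValueError(f"{name!r} still has {len(attached)} relationship(s)")
        # Remove the node's relationships first, then the node itself.
        self.relationships = [r for r in self.relationships if r not in attached]
        self.nodes.discard(name)
```

Calling `delete_node("a")` on a connected node raises an error, while `delete_node("a", detach=True)` removes the node together with its relationships, so no relationship is ever left pointing at a missing node.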

 

Neo4j is a fully ACID-compliant transactional database. Check out our next section on what an ACID-compliant graph database means.

 
