Why not RDBMS for connected data?

In the previous section, we explained what a graph is and why to use graph databases. In this section, we’ll discuss NoSQL databases and why an RDBMS is a poor fit for connected data.

What is an RDBMS?

Initially, data was saved in flat files. E. F. Codd proposed the relational model in 1970. Relational databases like MySQL, PostgreSQL and SQLite3 were designed to store tabular, structured data. They’re based on relational algebra, which is rooted in set theory. In an RDBMS (Relational Database Management System), data is represented and stored in tabular form: a row represents a record, and a column represents a property of that record. Each table stores only one type of data, and relationships between different items are maintained by foreign keys. In the past decade, the NoSQL database movement has emerged, particularly in response to three data challenges: data volume, data velocity, and data variety.
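To make the tabular model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The author and book tables and their contents are made up for illustration. It shows one table per type of data, a foreign key linking the two, and the database rejecting a row that points at a missing record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE author (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(id)
    );
""")

# Hypothetical sample rows: each row is a record, each column a property.
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO book (title, author_id) VALUES ('Graphs 101', 1)")

# The relationship is not stored as such; it is rebuilt by a JOIN on the key.
rows = conn.execute(
    "SELECT book.title, author.name "
    "FROM book JOIN author ON author.id = book.author_id"
).fetchall()

# A book that references a non-existent author violates the foreign key.
try:
    conn.execute("INSERT INTO book (title, author_id) VALUES ('Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Note that the link between a book and its author exists only as a matching pair of values. Every query that needs the relationship has to rediscover it with a JOIN.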

NoSQL Databases

For a long time, most applications used a relational database (RDBMS). But in the past decade, the data landscape has changed significantly: more and more unstructured data is being produced, which traditional RDBMSs can’t manage well. NoSQL databases emerged to fill that gap.

Why shouldn’t we use an RDBMS for connected data?

Relational DBs are still good for storing structured data, but they do not fit all the types of unstructured data being produced today. There are a few cases where an RDBMS does NOT do well:

  • Large no. of JOINs: Relational databases slow down considerably when queries involve many JOINs. The more JOINs, the slower the query. So RDBMSs are not meant for storing highly connected data.
  • The more the data, the slower the query: In an RDBMS, we use indexes to speed up queries, but as the data grows, JOIN-heavy queries still become slower. That’s because every JOIN has to find its matching rows through an index lookup or a table scan, and the cost of that work grows with the size of the tables.
  • Self-joins (joining a table to itself, for example to follow friend-of-friend links) are inefficient in relational DBs.
  • Queries become long and complex.
  • Denormalization issues: To recover performance and speed up querying, we denormalize, which has its own associated issues, like
    • Increased DB size
    • Faster retrieval but slower updates
    • Reduced data quality, since redundant copies can drift out of sync
    • The need to reconsider the denormalization whenever the database is modified, etc.
  • Data readability suffers: Many-to-many relationships require extra join tables, which make the data in the database harder to read.
  • Maintenance is painful: Frequent schema changes force us to reconsider the whole design.
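The JOIN and self-join points can be seen in a small sketch, again with Python's sqlite3 module. The person and friendship tables, the sample names, and the helpers friends_at_depth_sql and friends_at_depth are all hypothetical. Finding people N hops away takes one more self-join of the friendship table for every extra hop, so the query grows with the depth of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Join table for the many-to-many "is a friend of" relationship.
    CREATE TABLE friendship (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id),
        PRIMARY KEY (person_id, friend_id)
    );
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")],
)
# A directed chain of friendships: Ann -> Bob -> Cid -> Dee
conn.executemany("INSERT INTO friendship VALUES (?, ?)", [(1, 2), (2, 3), (3, 4)])


def friends_at_depth_sql(depth):
    """Build the SQL for people `depth` hops away: one self-join per extra hop."""
    joins = "".join(
        f" JOIN friendship f{i} ON f{i}.person_id = f{i - 1}.friend_id"
        for i in range(2, depth + 1)
    )
    return (
        "SELECT DISTINCT p.name FROM friendship f1"
        f"{joins}"
        f" JOIN person p ON p.id = f{depth}.friend_id"
        " WHERE f1.person_id = ? ORDER BY p.name"
    )


def friends_at_depth(person_id, depth):
    """Run the generated query and return the matching names."""
    rows = conn.execute(friends_at_depth_sql(depth), (person_id,)).fetchall()
    return [name for (name,) in rows]


print(friends_at_depth(1, 2))   # Ann's friends of friends
print(friends_at_depth_sql(3))  # the same question one hop deeper: one more JOIN
```

On four rows this is instant. The point of the sketch is the shape of the query, not its timing: each additional hop adds another pass over the friendship table.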

Use the right tool for the right job

With these ever-changing and growing data requirements, we can’t just stick to one type of database, be it relational, an aggregate store (key-value, column-family, document database), or a graph database. We need to use the right tool for the right job.

Relational databases aren’t a good fit for storing highly connected data. In their book “Neo4j in Action”, the authors ran an experiment comparing an RDBMS with Neo4j, and here are the results:

RDBMS vs Neo4j (Source: Graph Databases for Beginners by Neo4j)

One workaround used in relational databases is to precompute results based on past data.
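As a rough illustration of that workaround, the sketch below (Python's sqlite3 module; the table names, the sample rows, and the cached_fof helper are assumptions) materializes friends of friends into a table ahead of time. Reads then need no JOIN, but the stored answer reflects only the data that existed when it was computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendship (person_id INTEGER, friend_id INTEGER);
    INSERT INTO friendship VALUES (1, 2), (2, 3);
    -- Precomputed answer table, built from the data as it is right now.
    CREATE TABLE friend_of_friend AS
        SELECT f1.person_id AS person_id, f2.friend_id AS fof_id
        FROM friendship f1
        JOIN friendship f2 ON f2.person_id = f1.friend_id;
""")


def cached_fof(person_id):
    """Read friends of friends from the precomputed table (no JOIN at read time)."""
    rows = conn.execute(
        "SELECT fof_id FROM friend_of_friend WHERE person_id = ? ORDER BY fof_id",
        (person_id,),
    ).fetchall()
    return [fof for (fof,) in rows]


before = cached_fof(1)
# A new friendship arrives: person 2 befriends person 4.
conn.execute("INSERT INTO friendship VALUES (2, 4)")
# The precomputed table was not rebuilt, so person 4 is missing from the answer.
after = cached_fof(1)
```

The trade-off is freshness: the table has to be rebuilt or incrementally maintained whenever the underlying relationships change.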

On the other hand, graph databases are designed for connectedness from the ground up. Unlike relational and other NoSQL databases, graph databases store relationships/connections as first-class citizens.
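A toy in-memory sketch in Python can convey the idea of relationships as first-class data. It is only an illustration and not how Neo4j or any other graph database is actually implemented; the friends mapping and the friends_at_depth helper are hypothetical. Each node holds its connections directly, so going one hop deeper means following the stored links again rather than adding another JOIN.

```python
# Each person keeps direct references to the people they are connected to.
friends = {
    "Ann": ["Bob"],
    "Bob": ["Cid"],
    "Cid": ["Dee"],
    "Dee": [],
}


def friends_at_depth(start, depth):
    """Return the people reached by following exactly `depth` links from `start`."""
    frontier = {start}
    for _ in range(depth):
        # One hop: follow the stored connections of everyone in the frontier.
        frontier = {nbr for person in frontier for nbr in friends[person]}
    return sorted(frontier)


print(friends_at_depth("Ann", 2))
print(friends_at_depth("Ann", 3))
```

The traversal code is the same for any depth. Only the number of loop iterations changes, and each step touches just the neighbours of the current frontier instead of the whole data set.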

With the growing use of social platforms like Twitter and Facebook, people are producing more and more unstructured data, including audio, video, etc. The RDBMS was not designed to store this unstructured data. So the need for other database models arose; hence NoSQL databases. See the next section on NoSQL aggregate databases.