A Look at the Graph Database Landscape
Graph databases are the fastest-growing category in all of data management, according to DB-Engines.com, a database ranking site. Since seeing early adoption by companies including Twitter, Facebook and Google, graphs have evolved into a mainstream technology used today by enterprises in every industry and sector.
So, what makes graph databases so popular? By storing data as a graph of nodes, edges and properties, graph databases overcome big and complex data challenges that other databases cannot. Graphs offer clear advantages over both traditional RDBMSs and newer big data products. Here's a look at a few in particular.
Key Benefits of a Graph Database
Better, Faster Queries and Analytics: Graph databases offer superior performance for querying related data, big or small. The graph model offers an inherent indexed data structure, so it never needs to load or touch unrelated data for a given query. This makes it an excellent solution for faster real-time analytical queries on big data. By contrast, Hadoop HDFS systems have architectures built for data lakes, sequential scans and appending new data (no random seek). The assumption in such systems is that every query touches the majority of a file. With graph databases, queries touch only the relevant data.
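To make the contrast concrete, here is a minimal Python sketch, not taken from any particular product: a dict of neighbor lists stands in for a graph store's native adjacency structure, and a flat list of (source, target) rows stands in for a scan-oriented edge table. The vertex names are hypothetical. Both functions answer the same 2-hop question while counting how much data each one touches.

```python
from collections import deque

# Hypothetical toy graph: each vertex keeps a list of its out-neighbors,
# standing in for a graph store's native adjacency structure.
adjacency: dict[str, list[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": [],
    "dave": [],
    "erin": ["frank"],  # a component unrelated to alice
    "frank": [],
}

# The same edges flattened into rows, as a scan-oriented table would hold them.
edge_table: list[tuple[str, str]] = [
    (src, dst) for src, dsts in adjacency.items() for dst in dsts
]


def k_hop_traverse(adj: dict[str, list[str]], start: str, k: int) -> tuple[set[str], int]:
    """Breadth-first search up to k hops; counts the edges actually followed."""
    seen = {start}
    queue = deque([(start, 0)])
    edges_followed = 0
    while queue:
        vertex, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adj[vertex]:
            edges_followed += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen, edges_followed


def k_hop_scan(rows: list[tuple[str, str]], start: str, k: int) -> tuple[set[str], int]:
    """Same question answered by reading every edge row once per hop."""
    seen = {start}
    frontier = {start}
    rows_read = 0
    for _ in range(k):
        next_frontier = set()
        for src, dst in rows:  # full scan, related or not
            rows_read += 1
            if src in frontier and dst not in seen:
                next_frontier.add(dst)
        seen |= next_frontier
        frontier = next_frontier
    return seen, rows_read


reached, followed = k_hop_traverse(adjacency, "alice", 2)
_, scanned = k_hop_scan(edge_table, "alice", 2)
print(sorted(reached), followed, scanned)  # ['alice', 'bob', 'carol', 'dave'] 3 8
```

On this toy graph the traversal follows 3 edges and never visits the unrelated erin and frank vertices, while the scan reads all 4 rows on each of the 2 hops, including the erin/frank row every time.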
Simpler and More Natural Data Modeling: Anyone who has studied relational database modeling understands the strict rules for satisfying database normalization and referential integrity. Some NoSQL architectures go to the other extreme, pulling all types of data into one massive table. In a graph database, on the other hand, you define whatever vertex types you need to represent your object types, and edge types to represent particular relationship types. A graph model carries exactly as much semantic meaning as you want, with no normalization and no waste. The graph model also supports object-oriented thinking, because every query must state its semantics clearly and explicitly. There are no hidden assumptions, unlike in relational SQL, where you need to know how the tables listed in the FROM clause relate, since tables listed without join conditions implicitly form a Cartesian product.
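The sketch below illustrates this modeling style under simple assumptions: the vertex types Person and Company and the edge types KNOWS and WORKS_AT are hypothetical examples, and an in-memory dict keyed by (vertex, edge type) stands in for a real graph store. The query names every edge type it follows, so its meaning is explicit rather than depending on how tables combine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    vtype: str  # vertex type, e.g. the hypothetical "Person" or "Company"
    key: str    # unique key within the type


@dataclass
class Graph:
    # Outgoing edges indexed by (source vertex, edge type).
    out_edges: dict[tuple[Vertex, str], list[Vertex]] = field(default_factory=dict)

    def add_edge(self, src: Vertex, etype: str, dst: Vertex) -> None:
        self.out_edges.setdefault((src, etype), []).append(dst)

    def neighbors(self, v: Vertex, etype: str) -> list[Vertex]:
        return self.out_edges.get((v, etype), [])


alice = Vertex("Person", "alice")
bob = Vertex("Person", "bob")
carol = Vertex("Person", "carol")
acme = Vertex("Company", "acme")
globex = Vertex("Company", "globex")

g = Graph()
g.add_edge(alice, "KNOWS", bob)
g.add_edge(alice, "KNOWS", carol)
g.add_edge(bob, "WORKS_AT", acme)
g.add_edge(carol, "WORKS_AT", globex)


def friends_employers(graph: Graph, person: Vertex) -> set[str]:
    """Companies where a person's acquaintances work: follow KNOWS, then WORKS_AT."""
    return {
        company.key
        for friend in graph.neighbors(person, "KNOWS")
        for company in graph.neighbors(friend, "WORKS_AT")
    }


print(sorted(friends_employers(g, alice)))  # ['acme', 'globex']
```

Adding another relationship type in this sketch only means new add_edge calls with a new edge-type label; existing vertices, edges and queries are left untouched.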
Simultaneous Support for Real-Time Updates and Queries: The graph model enables real-time updates on big graph data, while supporting queries at the same time.
Flexibility for Evolving Data Structures: Graph databases support flexible schema evolution. Users can continually add new vertex and edge types and attributes or drop existing ones, extending or shrinking the data model. This is particularly convenient for managing constantly changing object types. Most graph databases can change their schema online while continuing to serve queries. In comparison, relational databases can't easily support the frequent schema changes that are now so commonplace in modern data management.
An Evolving Landscape
With the popularity of graph databases driven by such benefits, we've seen new players emerge in the market, creating a bona fide landscape of tools and technologies. Let's take a deeper look at the graph database landscape, organized by category and leading solutions.
Operational Graph Databases
Gartner defines Operational Databases as “Relational and non-relational DBMS products suitable for a broad range of enterprise-level transactional applications, and DBMS products supporting interactions and observations as alternative types of transactions.” (Gartner Inc., Magic Quadrant for Operational Database Management Systems, published: October 2016, ID: G00293203)
Bloor Research states that these solutions tend to be native graph stores or built on top of a NoSQL platform. They are focused on transactions (ACID) and operational analytics, with no absolute requirement for indexes. (Bloor Research: Graph and RDF databases 2015 #2, published Sept. 2015)
Operational Graph Databases include: Titan, JanusGraph, OrientDB and Neo4j.
Knowledge Graph / RDF
The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model. It has come to be used as a general method for the conceptual description or modeling of information implemented in web resources, using a variety of syntax notations and data serialization formats. Databases that store RDF data are often called triple stores.
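As a rough illustration of the triple model, the following Python sketch stores statements as (subject, predicate, object) tuples and answers a query by matching patterns in which None acts as a wildcard. It is only a simplified stand-in for what a SPARQL basic graph pattern does. The ex: prefix and its resources are hypothetical, while foaf:knows, foaf:name, rdf:type and foaf:Person are real vocabulary terms used here as plain strings. A production triple store would index its triples rather than scan them as this sketch does.

```python
from typing import Optional

# A triple is (subject, predicate, object); prefixed names are plain strings here.
Triple = tuple[str, str, str]

# Hypothetical example data; "ex:" is a made-up namespace prefix.
store: set[Triple] = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "rdf:type", "foaf:Person"),
}


def match(triples: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return every triple matching the pattern; None matches anything."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Two chained patterns, roughly like the SPARQL query:
#   SELECT ?name WHERE { ex:alice foaf:knows ?x . ?x foaf:name ?name }
known = [obj for _, _, obj in match(store, s="ex:alice", p="foaf:knows")]
names = [obj for x in known for _, _, obj in match(store, s=x, p="foaf:name")]
print(names)  # ['"Bob"']
```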
According to Bloor Research, these graphs are often semantically focused and built on a variety of underpinnings (including relational databases). They are ideal for use in operational environments; in addition, they have inferencing capabilities and require indexes even in transactional environments. (Bloor Research: Graph and RDF databases 2015 #2, published Sept. 2015)
A number of graph database vendors have based their knowledge graph technology on RDF, including: AllegroGraph, Virtuoso, Blazegraph, Stardog, and GraphDB.
Multi-Modal Graphs
This category encompasses databases designed to support different model types. For example, a common possibility is a three-way option of document store, key-value store or RDF/graph store. (Bloor Research: Graph and RDF databases 2016, published Jan. 2017) The advantage of a multi-modal approach is that different types of queries, such as graph queries and key-value queries, can be run against the same data. The main disadvantage is that performance cannot match that of a dedicated, optimized database management system.
Examples of multi-modal graphs include: Microsoft Azure Cosmos DB, ArangoDB and Sqrrl.
Analytic Graphs
Bloor Research explains that analytic graphs focus on solving 'known knowns' problems (the majority), where both entities and relationships are known, or on 'known unknowns' and even 'unknown unknowns.' The research firm says, "Multiple approaches characterize this area with different architectures including both native and non-native stores, different approaches to parallelisation, and the use of advanced algebra." (Bloor Research: Graph and RDF databases 2015 #2, published Sept. 2015)
Examples of Analytic Graphs include: Apache Giraph and Turi (formerly GraphLab, now owned by Apple).
Real-Time Big Graphs
I propose a new category of graph databases, called the real-time big graph, that is designed to deal with massive data volumes and data creation rates and to provide real-time analytics. Real-time big graphs enable real-time large graph analysis with both 100M+ vertex or edge traversals/sec/server and 100K+ updates/sec/server. To handle big and growing datasets, real-time big graph databases are designed to scale up and scale out well.
Examples of Real-Time Big Graphs include: TigerGraph.
Making Sense of Offerings
In summary, there are many different kinds of graph database offerings available today, each with unique advantages. That is why it's important to understand the differences as enterprises across every vertical and use case continue to adopt graph databases. Per a recent Forrester Research survey, "51 percent of global data and analytics technology decision makers either are implementing, have already implemented, or are upgrading their graph databases." (Forrester Research, Forrester Vendor Landscape: Graph Databases, Yuhanna, 6 Oct. 2017)
As organizations embrace the power of the graph, understanding the available offerings and their advantages is important for determining the best option for a particular use case. As the Real-Time Big Graphs category demonstrates, graph technology is evolving into its next generation. These solutions are specifically designed to support real-time analytics for organizations with massive amounts of data. We expect them to have a considerable impact as the market continues to move in this direction.
About the author: Dr. Yu Xu is the founder and CEO of TigerGraph, the world's first native parallel graph database. Dr. Xu received his Ph.D. in Computer Science and Engineering from the University of California San Diego. He is an expert in big data, parallel database systems and graph databases. He has 26 patents in parallel data management and optimization. Prior to founding TigerGraph, Dr. Xu worked on Twitter's data infrastructure for massive data analytics. Before that, he worked as Teradata's Hadoop architect, where he led the company's big data initiatives.