KAIST Introduces T-GPS, a Tool for Processing a Trillion-Edge Graph on One Computer
Graphs are widely used to represent a wide variety of systems, ranging from the relationships between users of a social network to the payments among a network of bank accounts, and graph algorithms can process that data to do everything from making friend suggestions to detecting fraud. Now, researchers at KAIST – a national research university in Daejeon, South Korea – have created a new technology that allows a single computer to run large-scale graph algorithms without storing the graph in the computer’s main memory or storage.
While clients often need graph algorithms to process giant graphs, such graphs are difficult for algorithm developers to come by: they’re typically proprietary or prohibitively difficult to collect. As a result, graph algorithm developers use synthetic graphs, which are generated using parameter-based generation or graph upscaling and then stored prior to testing the algorithm.
The storage element here poses a problem: these giant graphs take up an enormous amount of space, which might be feasible for industrial or research users but is somewhat less feasible for developers or small-scale users, who might be unable to produce the large clusters needed to store the data, real or synthetic.
KAIST’s tool – which is named “Trillion-scale Graph Processing Simulation,” or T-GPS – bypasses the storage step. Instead, T-GPS loads the smaller, real graph into its main memory. Then, it runs the graph algorithm on that small graph, with the algorithm treating the small graph as a portion of a larger, synthetic graph that does not exist. This method, the KAIST researchers report, returns exactly the same result as conventional approaches.
Crucially, T-GPS enables users to process a trillion-edge graph on a single computer, while (KAIST researchers said) the conventional approach can handle just a billion, and even that requires a cluster of eleven computers, assuming equal hardware. Furthermore, the developers report that T-GPS runs more quickly – up to 43 times faster – due the absence of network communication latency among the cluster of computers.
“T-GPS can significantly increase both the scale and efficiency of developing a new graph algorithm,” concluded Min-Soo Kim, a professor in KAIST’s School of Computing, who helped to develop T-GPS.
About the research
The research discussed in this article was published as “Trillion-scale Graph Processing Simulation based on Top-Down Graph Upscaling” and was presented at IEEE ICDE 2021. The abstract for the paper is available at this link.