DEX scalability with high-performance (UPDATED)

In this post we present performance results that show why we describe DEX as a high-performance graph database with great scalability.

To go further into DEX scalability, let us look at the results of the R-MAT benchmark, which tests DEX performance on very large databases. In this benchmark we are interested in tracking the following indicators:

  • How many nodes and edges could be created?
  • What was the size of the resulting database?
  • How long did it take to load the database?
  • How many traversals could we make per unit of time?

To test scalability we automatically created graphs with 2^SF nodes (for a scale factor, SF, ranging from 25 to 28); the number of edges (shown in column 3 of the table) was generated automatically by the R-MAT synthetic graph generator. The remaining columns show the database size, the load time and the execution measures for a heavy traversal query.
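For readers unfamiliar with R-MAT, the following is a minimal sketch of the recursive quadrant selection it is based on, not the generator used for this benchmark. The probabilities a, b, c (and the implied d = 1 - a - b - c) are illustrative assumptions, and the function names are hypothetical.

```python
import random

def rmat_edge(scale, a=0.57, b=0.19, c=0.19, rng=random):
    """Pick one directed edge (src, dst) in a graph with 2**scale nodes.

    At each of the `scale` levels, one quadrant of the adjacency matrix
    is chosen with probabilities a, b, c, d (assumed values, d implied).
    """
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < a:
            pass                  # top-left quadrant: both bits stay 0
        elif r < a + b:
            dst |= 1              # top-right quadrant
        elif r < a + b + c:
            src |= 1              # bottom-left quadrant
        else:
            src |= 1              # bottom-right quadrant
            dst |= 1
    return src, dst

def rmat_edges(scale, num_edges, seed=0):
    """Generate a reproducible list of R-MAT edges (duplicates are kept)."""
    rng = random.Random(seed)
    return [rmat_edge(scale, rng=rng) for _ in range(num_edges)]

# Small example: 2**10 = 1024 possible node ids, 5000 edges.
edges = rmat_edges(scale=10, num_edges=5000, seed=42)
```

Because the top-left quadrant is favoured at every level, low node ids receive many more edges than high ones, which gives the skewed degree distribution typical of R-MAT graphs. Some node ids may never appear in any edge.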

Load time and size of a graph

For this benchmark we loaded up to 2.1 billion edges and 230 million nodes. For this very large graph, the answers to the questions above are quite impressive: the graph was loaded in only 15 hours, including the creation of all the indices for direct access to the nodes and edges. The edges were loaded at a rate of about 40K per second. This large database occupies 83 GB, which leads to an average of only 36 bytes per node or edge.
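The per-object size and the load rate can be checked from the figures quoted above. This is only a back-of-the-envelope calculation; it assumes 1 GB = 10^9 bytes and that the full 15 hours were spent loading edges.

```python
nodes = 230e6              # 230 million nodes
edges = 2.1e9              # 2.1 billion edges
size_bytes = 83e9          # 83 GB, assuming 1 GB = 10**9 bytes
load_seconds = 15 * 3600   # 15 hours

bytes_per_object = size_bytes / (nodes + edges)
edges_per_second = edges / load_seconds

print(round(bytes_per_object, 1))  # 35.6
print(round(edges_per_second))     # 38889
```

Both values round to the 36 bytes per object and roughly 40K edges per second reported in the post.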

Traversal query

In addition, we ran a query against the database to test its response time (Q1 in the table). Query 1 finds the node with the maximum out-degree and then performs a BFS traversal starting from that node. We can see that 4.2 billion traversals are made, with an average of 295K nodes traversed per second. More information about the BFS algorithm can be found in the post about DEX graph algorithms.
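The shape of Query 1 can be sketched on a plain in-memory adjacency list. This is not the DEX API and not the benchmark code; the function names are hypothetical, and counting one traversal per edge followed is an assumption about how traversals are measured.

```python
from collections import deque

def max_out_degree_node(adj):
    """Return the node with the most outgoing edges."""
    return max(adj, key=lambda n: len(adj[n]))

def bfs(adj, start):
    """Breadth-first search from `start`.

    Returns the set of reached nodes and the number of edges followed.
    """
    visited = {start}
    queue = deque([start])
    edges_traversed = 0
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            edges_traversed += 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited, edges_traversed

# Tiny example graph: node 0 has the highest out-degree (3).
adj = {0: [1, 2, 3], 1: [2], 2: [0], 3: [], 4: [0]}
start = max_out_degree_node(adj)
visited, edges_traversed = bfs(adj, start)
```

In this example the search starts at node 0, reaches nodes 0 to 3 and follows 5 edges; node 4 is never reached because no edge points to it.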

We obtained the remarkable results shown in the third rightmost column, with a minimum of 24 minutes for the SF=25 graph, and 4 hours for the SF=28 graph, with a degradation of less than 5% between scale factors.
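As a quick consistency check, the 4-hour figure for the largest graph follows from the traversal count and rate quoted above, assuming both refer to the SF=28 run:

```python
traversals = 4.2e9       # total traversals reported for Query 1
rate_per_second = 295e3  # average traversal rate reported

hours = traversals / rate_per_second / 3600
print(round(hours, 2))   # 3.95
```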

Conclusions

Considering its fast loading and querying, DEX is the go-to graph database for large datasets, because it performs well on graphs with billions of objects. Furthermore, it not only gives quick answers but is also compact, using an average of only 36 bytes per object stored in the database.

Stay tuned for more benchmarks to come, including an interesting analytical use case with Wikipedia.

Note: The experiments were performed on a computer with two quad-core Intel(R) Xeon(R) E5440 processors at 2.83 GHz. The memory hierarchy is organized as follows: a 6144 KB second-level cache, 64 GB of main memory and a 1.7 TB disk. The operating system is Debian GNU/Linux 4.0 (etch).

UPDATE:

We have uploaded the code of the R-MAT synthetic graph generator we implemented for this benchmark to the download section of DEX. With this code the same benchmark can be repeated for other graph databases. We also include the DEX code for Query 1 as tested in the benchmark. To run this query on another graph database, it must be adapted first. Please feel free to download it and share your results with us!


9 Responses to DEX scalability with high-performance (UPDATED)

  1. Pingback: Tweets that mention DEX scalability with high-performance | -- Topsy.com

  2. Armin Ofen says:

    Impressive results. Is the benchmark open source? Would be interested in comparing your results with other graph-dbs.

    • Stinktshier, Mira says:

      dito, /me too.

      • admin says:

        Dear Armin and Mira,

        Thank you for your comments. The benchmark code we have implemented in Java follows the guidelines proposed in this article: http://www.siam.org/proceedings/datamining/2004/dm04_043chakrabartid.pdf.

        Your comments made us realize that it would be interesting to run the same benchmark on other graph databases, so we will make our code available to download on the website. We’ll announce it as soon as it is ready to be published.

        Thanks,

        Damaris Coll
        - Sparsity Technologies -

        • admin says:

          Dear Mira and Armin,

          We have updated the post to add the link to the Benchmark code we implemented.

          Feel free to download it and report back your benchmarks!

          Damaris
          - Sparsity Technologies-

  3. Armin Ofen says:

    Thanks a lot for the fast response :)

    I hope that I can publish my results soon.

  4. parker says:

    Hey Damaris,
    Could you tell me how to use rmat-load.des and rmat-schema.des to create a DEX graph? Thanks

    • admin says:

      Hi Zhao,

      If you still need information about the rmat generator and DEX, do not hesitate to ask me by mail. Hope we can help!

      Damaris
