As announced in the previous post, we would like to share an analytical use case that shows DEX's high performance. This time we look at how DEX responds to a set of queries over a single dataset, taking as a reference the results of another well-known open source graph database.
With this benchmark we would also like to join the celebration of Wikipedia's 10th anniversary. Wikipedia was launched in 2001 by Jimmy Wales and Larry Sanger and has become the largest and most popular general reference work on the Internet, with 365 million readers. Our congratulations to everyone making Wikipedia possible!
For the benchmark we used all the Wikipedia articles written before January 2010. In particular, the loaded database contained 55M articles, 2.1M images and 321M references between articles.
With this benchmark we want to obtain the following information:
- Loading times, including the generation of full index structures for the graph.
- Graph database size.
- Response times for typical queries made to the loaded data, which include:
- Query 1 (Q1): Finds the node with the maximum outdegree, i.e. the one with the most relationships to other nodes, and then runs a BFS traversal of the graph starting from that node. More information about traversal algorithms can be found in the graph algorithms post.
- Query 2 (Q2): Finds the node with the maximum indegree, selects the nodes referencing that node, and with this new set finds again the nodes referencing each node in the set. In other words, it performs a 2-hop operation. Finally, the query ranks the nodes by number of references and returns the top 5.
- Query 3 (Q3): Finds a pattern in the graph. The pattern looks for articles written in Catalan (CA) that have been translated into English (EN) but are missing some of the images from the original article.
- Query 4 (Q4): Finds the number of articles and images for every available language.
- Query 5 (Q5): Materializes the number of images for each article.
- Query 6 (Q6): Deletes from the database all the articles that have no images.
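To make the intent of Q1 concrete, here is a minimal Python sketch of the underlying algorithm: pick the node with the maximum outdegree and run a breadth-first traversal from it. This is only an illustration on a toy in-memory graph, not DEX code; the graph contents and names are invented for the example.

```python
from collections import deque

# Toy directed graph as adjacency lists: article -> articles it references.
graph = {
    "A": ["B", "C", "D"],
    "B": ["C"],
    "C": ["A"],
    "D": [],
}

# Node with the maximum outdegree (most outgoing references).
start = max(graph, key=lambda n: len(graph[n]))  # "A", outdegree 3

def bfs(adjacency, source):
    """Standard BFS traversal, returning nodes in visit order."""
    visited, order = {source}, []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs(graph, start))  # ['A', 'B', 'C', 'D']
```

In a real graph database the adjacency lists would of course be read from the store rather than held in a dict, but the traversal logic is the same.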
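Q2 can be sketched the same way: find the node with the maximum indegree, collect its direct referrers (first hop) and their referrers (second hop), then rank by reference count. Again, this is a hypothetical toy example, not the DEX implementation.

```python
from collections import Counter

# Toy graph: article -> articles it references.
graph = {
    "A": ["E"],
    "B": ["E"],
    "C": ["E", "A"],
    "D": ["B"],
    "E": [],
}

# Indegree of every node (number of incoming references).
indegree = Counter(t for targets in graph.values() for t in targets)

# Node with the maximum indegree.
target = max(indegree, key=indegree.get)  # "E", referenced 3 times

# First hop: nodes referencing the target.
hop1 = {n for n, refs in graph.items() if target in refs}
# Second hop: nodes referencing any node of the first set.
hop2 = {n for n, refs in graph.items() if any(h in refs for h in hop1)}

# Rank the 2-hop neighborhood by number of references, keep the top 5
# (ties broken alphabetically to make the ordering deterministic).
top5 = sorted(hop1 | hop2, key=lambda n: (-indegree.get(n, 0), n))[:5]
print(top5)
```

The two set comprehensions scan every node for clarity; a graph database would instead follow the incoming edges of the target directly, which is what makes this query a natural fit for graph storage.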
See the results in the following table:
It is remarkable that loading all the Wikipedia articles into DEX took only 2.25 hours, with a resulting database size of 16.98 GB. This shows that huge amounts of information can be loaded into DEX in reasonable time, with no disk restrictions. The results for all six queries favor DEX, with speedups of more than two orders of magnitude for every query except Q3, where DEX is still one order of magnitude faster.
DEX delivers the best results in both loading time and query response time, making it an attractive option for solutions with big volumes of data that are cumbersome to analyze. Try DEX now and see this performance in action.