How & when to use the recovery functionality

In this new edition of Sparksee’s how-to series, we would like to highlight the recovery functionality, which keeps your database safe at all times and is especially recommended for first-time users.

Sparksee includes an automatic recovery manager that keeps the database safe against any eventuality. In case of an application or system failure, the recovery manager is able to bring the database back to a consistent state on the next restart.

By default the recovery manager is disabled, but we recommend enabling it, especially if you are new to Sparksee, before you start constructing your first graph database. The recovery is configured through SparkseeConfig, which should be your first line of code when creating your database(*):

 SparkseeConfig cfg = new SparkseeConfig();
 Sparksee sparksee = new Sparksee(cfg);

The recovery has the following variables to set:

  • Enabled flag (setRecoveryEnabled in the example below): set it to true to enable the recovery.
  • Log file (setLogFile): set the name and path of the recovery log file; otherwise it will be stored in the same path as your database. Remember that the extension for this file is .log
  • Checkpoint time (setRecoveryCheckpointTime): set the interval, in microseconds, at which the recovery copies the committed transactions to the recovery log. By default it’s 60 seconds (60000000).
  • Cache size: set the size of your recovery cache. We don’t recommend changing the default value.

Here is an example of a typical configuration for the recovery functionality:

SparkseeConfig cfg = new SparkseeConfig();
cfg.setRecoveryEnabled(true); // Enabling the recovery
cfg.setLogFile("recoverylogfile.log"); // it will be stored in the execution directory, same as your database
cfg.setRecoveryCheckpointTime(90000000); //we are setting it to 1.5 minutes
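
Because the checkpoint time is expressed in microseconds, it is easy to miscount zeros. The following is a minimal sketch, using only the standard java.util.concurrent.TimeUnit class, of how the value could be derived from seconds; the class and method names (CheckpointTime, checkpointMicros) are hypothetical and not part of Sparksee.

```java
import java.util.concurrent.TimeUnit;

class CheckpointTime {
    // Converts a checkpoint interval given in seconds to microseconds.
    static long checkpointMicros(long seconds) {
        return TimeUnit.SECONDS.toMicros(seconds);
    }

    public static void main(String[] args) {
        System.out.println(checkpointMicros(60)); // default interval: prints 60000000
        System.out.println(checkpointMicros(90)); // the 1.5 minutes used above: prints 90000000
    }
}
```

The result could then be passed to cfg.setRecoveryCheckpointTime(...), assuming the setter accepts a long value.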

And why isn’t the recovery enabled in the first place? The recovery introduces a small performance penalty that strongly depends on the checkpoint time. We therefore leave the decision to the user, who knows the characteristics of the application and its typical update patterns and can judge which compromise achieves the highest possible performance while keeping the database as safe as possible. A user who is aware of this functionality will be able to get the most out of it even when the default parameters are used.

Don’t forget to tell us if you are using the recovery and how; your feedback is key to making Sparksee grow!

(*) Examples are shown in Java; for other languages, please refer to the User Manual chapters “Configuration” and “Maintenance and Monitoring”.

Posted in Documentation, Sparksee

Graph Database Use Case: Fraud detection

Fraud and financial crimes are a form of theft or larceny that occur when a person or entity takes money or property for their own use, or uses them in an illicit manner for their personal benefit. These crimes typically involve some form of deceit, subterfuge or the abuse of a position of trust, which distinguishes them from common theft or robbery.

For most countries, money laundering is one of the financial crimes that is most difficult to prevent, detect and prosecute. Money laundering is the process in which the proceeds of crime are transformed into apparently legitimate money or other assets. These kinds of processes usually follow specific transaction patterns that can be simplified as follows (see Figure 1):

1) Collecting the money coming from illegal activities.
2) Placing it into a depository institution.
3) Adding a layer to the transaction (such as a payment of a false invoice or a loan to another company).
4) Integrating the money into the financial system by purchasing financial/industrial investments, luxury assets etc.


Figure 1 – Diagrammatic description of a money laundering scheme by ExplicitImplicity under CC-BY-SA-3 and GFDL.

All the information regarding these transactions is registered by the banks and financial entities that take part in the process, and it can be represented as a graph in which each entity involved (person, company, organization…) is a node and each transaction is an edge. A fraud detection application would then compare the known transaction patterns of previously prosecuted fraud cases with the patterns in our network to check whether they have points in common. Figure 2 is an example of a graph representing a money laundering fraud.


Figure 2 – Money laundering graph example.

In this case, Subject X transfers the illicit proceeds to the associate Company Y (placement), which pays a false invoice coming from Company Z. Company Z makes a loan to Company Y for the same amount as the false invoice, adding a layer to the process and making the fraud more difficult to spot. More layers can be added at this point, for instance by purchasing chips at a casino and exchanging them again for their value. Then, Company Y invests in a legitimate financial institution to integrate the money into the financial system, and finally it withdraws the capital, transferring the earnings back to Subject X, who receives the “clean” money. As you can glimpse from Figure 2, a graph representation of the information makes it easier to identify the loop that makes Subject X a suspect in a possible fraudulent transaction.

Although every connection necessarily happens at a specific point in time (e.g. Company Y cannot transfer the “clean” money to Subject X before making all the other transactions), note that we don’t need this information to compare one pattern to another.
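
To make the idea of spotting the loop more concrete, here is a minimal sketch in plain Java (it does not use the Sparksee API) that searches a directed transaction graph for a path leading from an entity back to itself. The class and method names are hypothetical, and the financial institution is called “Institution F” only for illustration. As noted above, no timestamps are needed for this comparison.

```java
import java.util.*;

class TransactionGraph {
    // Directed edges: payer -> list of payees.
    private final Map<String, List<String>> out = new HashMap<>();

    void addTransaction(String from, String to) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        out.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Returns a path that starts and ends at 'start', or an empty list if there is none.
    List<String> findLoop(String start) {
        Deque<String> path = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        if (dfs(start, start, path, visited)) {
            List<String> loop = new ArrayList<>(path);
            loop.add(start);
            return loop;
        }
        return Collections.emptyList();
    }

    // Depth-first search that stops as soon as an edge leads back to 'target'.
    private boolean dfs(String node, String target, Deque<String> path, Set<String> visited) {
        visited.add(node);
        path.addLast(node);
        for (String next : out.getOrDefault(node, Collections.emptyList())) {
            if (next.equals(target)) return true;
            if (!visited.contains(next) && dfs(next, target, path, visited)) return true;
        }
        path.removeLast();
        return false;
    }

    public static void main(String[] args) {
        TransactionGraph g = new TransactionGraph();
        g.addTransaction("Subject X", "Company Y");     // placement
        g.addTransaction("Company Y", "Company Z");     // false invoice
        g.addTransaction("Company Z", "Company Y");     // loan (layering)
        g.addTransaction("Company Y", "Institution F"); // investment (integration)
        g.addTransaction("Company Y", "Subject X");     // earnings sent back
        System.out.println(g.findLoop("Subject X"));
    }
}
```

A real fraud detection application would match richer patterns than a single loop, but the same traversal idea applies.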

Other similar use cases involving graph databases for fraud detection include tax evasion and illegal funding, where the key aspect also lies in searching for known irregular patterns in the transaction graph.

If you want to know more about graph database use cases, scenarios and success stories, you can search for the “use case” tag on the blog or visit the “scenarios” section of our website. Remember that you can download Sparksee 5.1 for free and use it for your project!

Posted in Sparksee, Use Case

Recap of the year and future outlook for 2015


As the end of 2014 approaches, we believe it’s a good time to look back and take stock of everything we have been working on and everything that has happened to Sparksee this year.

One of the most important milestones of 2014 has been the release of Sparksee 5.1. Key features like the new Objective-C API, enhanced compatibility with Blueprints, the dynamic size adapting cache, compatibility with Visual Studio 2013 and the rollback functionality have meant a great step forward for our high performance graph database.

During the year, Sparsity has also started a Tetracom Technology Transfer Project and joined the European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC) in order to further improve academia-industry interaction. We have also kept involved with the Linked Data Benchmark Council (LDBC) and the Coherent PaaS European Projects.

2014 has also been a year full of interesting events related to the graph and NoSQL world. Sparsity attended GraphLab in San Francisco, NoSQL Matters in Barcelona, the LDBC TUC meeting in Athens, the ICT Proposers’ Day in Florence, SIGMOD and Grades in Salt Lake City, and BizBarcelona and the MWC in Barcelona. We hope we were able to meet at one of these events! In case we haven’t yet, you will definitely find us in the 2015 graph database event arena.

Last but not least, we have been able to share and interact with all the Sparksee community through Twitter, Facebook, Google+ and our blog. During this year we have published 17 posts including tutorials, use cases, news & events, Sparksee technical details and research articles. Don’t hesitate to check them out using our archive on the sidebar.

A lot of positive things have already happened during the year, but there’s a lot more to come in 2015. Sparsity will keep moving forward thanks to your feedback and contributions, to bring our high performance solutions to the next level.

The Sparsity Technologies team wishes you the best for the holiday season and a happy new year 2015!

Posted in News, Sparksee

SNA: How to predict the most viral users with Sparksee

Social Network Analysis (SNA) is one of those use cases that everyone mentions when talking about the strengths of graph databases. It’s no secret that a network of people interacting with each other instantly brings the image of a graph to everyone’s mind. Once you have constructed the social graph, it opens plenty of possibilities to explore it wisely in order to effectively answer questions like the one we are going to deal with in today’s post: how to discover who is most likely to make my message go viral in the network.

To give more insight into how to construct a good algorithm that finds the most viral users and exploits the capabilities of the graph, we turn to the literature and refer to the “greedy algorithm”. For those not yet familiar with this algorithm, let us introduce its definition.

Starting from a solution tree, the greedy algorithm builds a solution that aims to maximise a defined function f(n). At each iteration the algorithm looks at the child nodes of the current source node, selects the one that maximises f(n) and moves forward.

Figure 1.0 shows an example of an execution of the greedy algorithm. Blue nodes are the ones already included in our solution, yellow nodes are the ones being evaluated with our function f(n) (we also call these nodes candidates) and white nodes are those never visited and thus never evaluated. It is vitally important to establish a good heuristic for f(n), because a poorly chosen one will not lead the algorithm to the optimal solution. The example in Figure 1.0, where we are looking for three consecutive nodes that maximise the sum of their values, shows why tuning matters: a simple greedy algorithm answers with the blue nodes (5, 7 and 5), while the better solution in our example would be nodes 5, 3 and 50.


Figure 1.0 – Example of a Greedy Algorithm 
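
The gap between the greedy answer and the better one can be reproduced with a small sketch. The tree below only mirrors the values mentioned for Figure 1.0 (5, 7, 5 and 5, 3, 50); the remaining node values (2 and 1) and all class and method names are assumptions made for illustration.

```java
import java.util.*;

class GreedyPathDemo {
    static final class Node {
        final int value;
        final List<Node> children = new ArrayList<>();
        Node(int value, Node... kids) {
            this.value = value;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Greedy: at each level, step to the child with the largest value.
    static int greedySum(Node root, int length) {
        int sum = 0;
        Node current = root;
        for (int i = 0; i < length && current != null; i++) {
            sum += current.value;
            Node best = null;
            for (Node c : current.children) {
                if (best == null || c.value > best.value) best = c;
            }
            current = best;
        }
        return sum;
    }

    // Exhaustive: best sum over every downward path of at most 'length' nodes
    // (assumes non-negative node values).
    static int bestSum(Node node, int length) {
        if (length == 0) return 0;
        int bestChild = 0;
        for (Node c : node.children) {
            bestChild = Math.max(bestChild, bestSum(c, length - 1));
        }
        return node.value + bestChild;
    }

    // Hypothetical tree: root 5 with children 7 and 3; 7 -> {5, 2}; 3 -> {50, 1}.
    static Node exampleTree() {
        return new Node(5,
                new Node(7, new Node(5), new Node(2)),
                new Node(3, new Node(50), new Node(1)));
    }

    public static void main(String[] args) {
        System.out.println(greedySum(exampleTree(), 3)); // 5 + 7 + 5 = 17
        System.out.println(bestSum(exampleTree(), 3));   // 5 + 3 + 50 = 58
    }
}
```

The greedy walk never looks below the node with value 3, which is why the choice of f(n) matters so much.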

Let’s see, then, which ideas you could use to construct a good function for a greedy algorithm that discovers the most viral users in our social network. For each node (user) you can compute a propagation weight, so that the greedy algorithm moves through the nodes that maximise it. The measure of propagation should take into account things like the previous propagations of that user, how those propagations compare with the propagations of the other users, or the number of documents ever propagated by that user. Another important point to consider is restricting the measure to previous propagations on a similar theme.

With all those ideas you should be able to tune your own unique propagation function, which could then be used in an algorithm such as the following:

 Require: A graph G and a node N
 Ensure: I is the set of nodes infected by N
 I ← empty set
 P ← queue of pending nodes, initialised with N
 V ← set of visited nodes
 while P is not empty do
   x ← P.dequeue()
   edges ← edges(x, source)
   for each edge ∈ edges do
     tail ← edge.tail()
     if V does not contain tail then
       V ← union(V, tail)
       P ← union(P, tail)
       if Math.random() > edge.weight() then
         if tail ∉ I then
           I ← union(I, tail)
         end if
       end if
     end if
   end for
 end while
 return I
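
As a rough illustration, the pseudocode above could be translated to plain Java as follows. This is only a sketch under stated assumptions: it uses an in-memory adjacency map instead of the Sparksee API, the names (ViralPropagation, Edge, infected, addEdge) are hypothetical, the infection test is copied from the pseudocode as written, and the seed node is marked as visited up front so that it is never reported as infected by itself.

```java
import java.util.*;

class ViralPropagation {
    static final class Edge {
        final String tail;   // node reached by this edge
        final double weight; // propagation weight, as used in the pseudocode
        Edge(String tail, double weight) { this.tail = tail; this.weight = weight; }
    }

    static void addEdge(Map<String, List<Edge>> graph, String from, String to, double weight) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, weight));
    }

    // Breadth-first walk from 'seed'; each node is evaluated once, on first visit.
    static Set<String> infected(Map<String, List<Edge>> graph, String seed, Random rnd) {
        Set<String> infected = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        pending.add(seed);
        visited.add(seed); // assumption: the seed itself is not a candidate
        while (!pending.isEmpty()) {
            String x = pending.poll();
            for (Edge edge : graph.getOrDefault(x, Collections.emptyList())) {
                if (visited.add(edge.tail)) {               // true only on first visit
                    pending.add(edge.tail);
                    if (rnd.nextDouble() > edge.weight) {   // same test as the pseudocode
                        infected.add(edge.tail);
                    }
                }
            }
        }
        return infected;
    }

    public static void main(String[] args) {
        Map<String, List<Edge>> graph = new HashMap<>();
        addEdge(graph, "ann", "bob", 0.3);
        addEdge(graph, "bob", "eve", 0.8);
        System.out.println(infected(graph, "ann", new Random(42)));
    }
}
```

Passing the random generator as a parameter keeps runs reproducible, which helps when comparing different propagation functions.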

We hope you find our success story of using Sparksee with this greedy algorithm to discover the most viral users inspiring for your Social Network Analysis projects.

Download Sparksee now for free and start building your social graph to search for propagation with the algorithms explained here!

Posted in SNA, Sparksee

Learning high-performance graph database management with Sparksee at the NoSQL matters Training Session

On Friday, 21st of November, from 9h to 13h, Sparsity will host a Training Session as part of the NoSQL matters event.

Skilled trainers from Sparksee will explain to the attendees how to take advantage of the graph, covering the most common queries that are best answered using a graph. The training will use a Twitter model and dataset to build the graph and will then cover queries on the resulting graph, such as discovering how two Twitter users are connected.


Attendees will be given a NetBeans project with Sparksee Java and a complete set of fill-in-the-blanks exercises. They will also receive a free development license to build graphs of up to 1B objects with unlimited sessions for 6 months.

Looking forward to meeting you at the NoSQL matters Training Session!

Posted in Events, News, Sparksee

Sparsity is attending NoSQL Matters 2014


We are glad to announce that Sparsity Technologies is attending NoSQL Matters Conference 2014 that takes place in Barcelona on November 21st and 22nd. The conference will cover a broad spectrum of topics, including new products, use-cases and field reports of day-to-day operations of NoSQL infrastructures.

On November 21st, Arnau Prat and Joan Guisado will be giving a training session called Introduction to high performance graph data management with Sparksee. Using the Sparksee graph database, attendees will learn how to model and load data as a graph, and discover the full potential of graph databases and what they can offer compared to traditional database paradigms. More details about this session will be given in a follow-up post next week.

On November 22nd, Sparsity’s CEO Josep Lluís Larriba-Pey will be giving the talk Graph databases go mobile, Sparksee 5 mobile use cases. He will present Sparksee mobile and explain a few use cases in the area of analytics for Social and Open Data where the use of graphs boosts job search, private recommendation, community search and personal tourist route planning.

You can still buy your tickets for both the Training Day and the Conference, but hurry up, there are just a few left!

We will also host a booth in the hall area on the 22nd of November. Don’t forget to stop by and say hello.

The conference will take place at Casa Convalescència, C/ Sant Antoni Maria Claret 171, 08041, Barcelona. Looking forward to meeting you there!

Posted in Events, News, Sparksee, Sparksee mobile

Scalable Community Detection on the Cloud: SCD made product

In our post Graph Databases research: Social community search we introduced the Scalable Community Detection (SCD) algorithm, born from the research work of the Sparksee team together with DAMA-UPC. The basic idea behind SCD is to search for transitive relations (triangles) and understand how they are structured to form cohesive communities. This leads to a faster and more accurate community finder: faster than the Louvain algorithm (the fastest so far) and more accurate than the Oslom algorithm (which had the highest quality so far).

Now we can proudly announce that we are taking the first steps towards the commercialization of SCD thanks to a Technology Transfer Project (TTP) provided by the TETRACOM FP7 project. The mission of TETRACOM is to boost European academia-to-industry technology transfer (TT), and its main tools are the TTPs, which provide partial funding for academia-industry collaborations that bring concrete R&D results into industrial use. In this case, the industrial partner Sparsity Technologies and the research partner Universitat Politècnica de Catalunya are working to make Scalable Community Detection on the Cloud (SCDC) a reality.

The general idea behind SCDC is to provide a scalable cloud service: a user uploads a network to our system and obtains, in the shortest possible time, the most accurate communities inherently present in it. For those curious about our technology choices, we are going to use a scalable architecture with Golang and Revel for the REST API, MongoDB to store the information and NSQ to process distributed queues.

Stay tuned for more information about the project. Remember that you can get the SCD algorithm on GitHub.

Posted in News, Research, SNA

How to install & configure Sparksee iOS (Objective-C)

Sparksee is the first graph database available for iOS applications; since version 5.1 it offers both Objective-C and C++ interfaces. In this article we will guide you through a typical installation and configuration for Objective-C, so you can start working with Sparksee in your mobile development environment in a few minutes. You can also take a look at our C++ tutorial published here.


Step 1) Downloading Sparksee

Download your Sparksee mobile library from our website here:

We will send you an email explaining how to download your own copy. Downloads for mobile (like license requests) are moderated, so please wait until a support member contacts you.

Once you receive your download, uncompress the “.dmg” file to get the Sparksee.framework directory. The documentation is available in the framework’s Resources/Documentation directory.

Step 2) Creating a new project linking Sparksee mobile library

  • Add the Sparksee.framework to the Link Binary With Libraries build phase of your application project. You can just drag it there.
  • The next step depends on whether you are using C++ in your project or not:
    • For developers not using C++, you must explicitly add the right C++ standard library because the Sparksee library core depends on it. Click on the “+” sign of the same “Link Binary With Libraries” build phase of your application project, then select the appropriate C++ library (“libc++.dylib” for the LLVM C++11 version or “libstdc++.6.dylib” for the GNU C++ version) and finally click the “Add” button.
    • For developers already using C++ in their project, choose the most appropriate library: libstdc++ (GNU C++ standard library) or libc++ (LLVM C++ standard library with C++11 support) in the C++ Standard Library option of the compiler build settings. This version must match the one you downloaded in the first place.
  • Now import the header in your source code by adding:
    #import <Sparksee/Sparksee.h>
  • Take into account that after all these changes you may need to Clean your Project.

Run the empty application! That’s it! You now have a new project using Sparksee.

Step 3) Other configuration considerations

Setting an explicit memory limit to the Sparksee cache is highly recommended. For more information about Sparksee configuration variables check the Configuration chapter in the User Manual.

Step 4) Initial steps

Now you should create a new Sparksee database. Follow this order of actions:

  • Some configuration is required before creating the database. First of all, create a new configuration object with STSSparkseeConfig *cfg = [[STSSparkseeConfig alloc] init];
  • Set the license code with [cfg setLicense: @"THE_LICENSE_KEY"]; Sparksee mobile only works with a valid key, which you will receive in the same email as your download.
  • Limit the Sparksee cache memory with [cfg setCacheMaxSize: smallsizeinMB]: by default Sparksee takes all the free memory available, which is something you will surely want to control on a mobile device.
  • Activate the recovery functionality with [cfg setRecoveryEnabled: TRUE]: the recovery is a helpful functionality that will allow you to recover all your data if any error occurs.
  • Set the log file with [cfg setLogFile: pathtothelogfile]: specify a full log file name and path where you have write permission.
  • Create the main Sparksee class with STSSparksee *sparksee = [[STSSparksee alloc] initWithConfig: cfg]: take into account that the argument of this method is the SparkseeConfig you created.
  • Now that Sparksee is configured, create your database with STSDatabase *db = [sparksee create: pathtothedbfile alias: @"nameofthedatabase"].
  • It’s time to create a new session with STSSession *sess = [db createSession].
  • Graph objects and operations are available at Graph class level; you need to get the graph from the session with STSGraph *g = [sess getGraph].


Posted in Documentation, Sparksee, Sparksee mobile

Sparksee 5.1 new release!


We are proud to announce today the official release of Sparksee 5.1.

During this half of the year we have been working on a group of exciting features for the new version that we hope you’ll find interesting:


  • A brand new Objective-C API for MacOS and iOS. Although we already offered a C++ API to work with Objective-C based projects, some of you pointed out that it would be much better to work directly with an Objective-C interface. In addition, this new API allows you to work with Swift projects.
  • Rollback functionality in our transactions. We have included the rollback functionality, which brings full ACID compliance to our database.
  • Enhanced compatibility with Blueprints. Blueprints is analogous to JDBC, but for graph databases, providing a common set of interfaces that allow developers to plug-and-play their graph database backend. We have implemented the following elements in order to be fully compatible with the complete Tinkerpop stack:
    • Implementation of the full TransactionalGraph. Thanks to our new rollback functionality, we now provide this implementation and thus support Blueprints transactions.
    • Attribute scope to all nodes or to all edges. We are providing a new scope for our attributes that allows the same attribute type to be used for all the nodes in the graph (or, analogously, for all the edges). Sparksee now has the following scopes for attributes:
        • Attribute for a certain node or edge type (the most common)
        • Attribute for all nodes or all edges
        • Global attributes
    • Transactions can now begin directly as write transactions. By default (using a simple begin) Sparksee starts transactions as read transactions and does not change their state until an update is detected; if you need to avoid a “lost update”, you can now use “begin update”.
  • Dynamic size adapting cache. Especially on mobile devices, it may be important to change the maximum size dedicated to the cache dynamically, because in certain situations the OS requires the application to release memory in order to give it to another application, and if your app fails to release it the process can be stopped. Sparksee now offers several configuration methods to handle this dynamically.
  • Compatibility with Visual Studio 2013 for .Net and C++ developers, so you can work with the latest programming environment.
  • Improved documentation for Sparksee mobile. The User Manual now includes specifics about the installation and configuration of Sparksee for the most common iOS and Android development frameworks.

You can start taking advantage of all the new features right now by downloading the latest version for free. Sparksee 5.1 is backward compatible, so if you are already developing with Sparksee 5.0 the only requirement is to switch one library for the other.

Don’t hesitate to contact us if you need more information about any of the new features, and please consider registering for one of our license programs: we offer free development programs throughout the whole process to selected companies!

Posted in News, Sparksee

Graph Databases research: Social community search

Today we would like to share an interesting article, recently published on DZone, about the research on social community search that the Sparksee team is carrying out together with DAMA-UPC.

Community search is a very important aspect of Social Network (SN) analysis. Communities are defined as tightly related groups of people, who, for instance, communicate or interact with the members of the community more intensely than with the rest of the population. SNs are complex representations of society and understanding their structure is key to be able to find those communities accurately.

Lately our research has focused on understanding the nature of social communities in order to use that knowledge to build a faster and more accurate community finder for very large graphs. As a result, in March 2014, at WWW’14, we presented the Scalable Community Detection (SCD) algorithm.

SCD exploits one of the basic properties of SNs: they have a larger number of transitive relations (triangles) than other types of networks. Triangles are also more densely concentrated inside a community than in the SN as a whole. The basic idea of SCD is to search for those triangles and understand how they are structured to form cohesive communities around them.
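
To illustrate what counting transitive relations means, here is a minimal brute-force sketch in plain Java that counts the triangles of a small undirected graph. It is not the SCD algorithm, only the basic building block the paragraph above refers to; the class and method names are hypothetical.

```java
import java.util.*;

class TriangleCount {
    // Adds an undirected edge between a and b.
    static void addEdge(Map<Integer, Set<Integer>> adj, int a, int b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Counts each triangle {u, v, w} exactly once by enforcing u < v < w.
    static int countTriangles(Map<Integer, Set<Integer>> adj) {
        int count = 0;
        for (int u : adj.keySet()) {
            for (int v : adj.get(u)) {
                if (v <= u) continue;
                for (int w : adj.get(v)) {
                    if (w > v && adj.get(u).contains(w)) count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        addEdge(adj, 1, 2);
        addEdge(adj, 2, 3);
        addEdge(adj, 1, 3); // closes the triangle {1, 2, 3}
        addEdge(adj, 3, 4);
        System.out.println(countTriangles(adj)); // prints 1
    }
}
```

Groups of users whose members close many triangles among themselves are natural community candidates, which is the intuition the algorithm builds on.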


Comparing SCD results to other state-of-the-art algorithms, we can claim that it is faster than the Louvain algorithm (the fastest so far) and more accurate than the Oslom algorithm (which had the highest quality so far). Check the DZone article to read more details about this claim, and check the DAMA-UPC website to download the code of this algorithm.

Are you interested in using Sparksee for your research? Go ahead and request your free license under our Research program.

Posted in Research, Sparksee