Management of mobile device data

** Note: This is a curated article first published on Quora by Sparsity’s CEO, Mr. Larriba-Pey **

The content stored on mobile devices (MDs) grows as users’ tastes evolve, application trends change and the needs of each work environment increase. As a result, MD users keep increasing the amount of data and metadata they generate, as well as the number of apps installed on their devices. Users also interact more and more with applications like Twitter, Facebook or LinkedIn, which increases the amount of their own data managed by third parties.

Time and practice will show that Mobile Graph Databases (M-GDBs) will be the perfect match to manage and query all those datasets, for two reasons: managing a single data repository will provide added-value linked data, and M-GDBs will greatly extend the querying capabilities.

Added-value linked data. With M-GDBs, a single data management system will allow all mobile apps to access a wide variety of data and turn it into added-value linked data for the user, including friends, topics, metadata for image and video content, the user’s own data stored by third-party applications, application usage, GPS location, weather forecasts, etc.

For instance, using the M-GDB to automatically disambiguate phone and e-mail contacts, based on the calls made and the mails sent and received, will provide a single source of higher-value social data. Going further, M-GDBs will also allow the MD contacts to be linked automatically with the data available through the social network APIs of Twitter, Facebook, LinkedIn and others.

Once linked, it will be possible to enrich that social data with metadata about the photos taken with the MD camera, GPS information about the current location or weather information provided by third-party public APIs.

Rocketing the querying capabilities. In addition to the capabilities of relational databases, M-GDBs will also allow graph-oriented queries that provide added-value features.

Queries like the following will be easy to implement and will provide significant added-value information: Among my closest contacts (friends or friends of a friend, FOAFs), who has tastes similar to mine, so that I can send them the last photo taken with my MD? Is there a friend or a FOAF who lives in the place I am visiting and whom I could call or e-mail? Can I get attractions recommended based on the social review information of my friends or FOAFs?
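To make the FOAF queries above more concrete, here is a minimal sketch in plain Java. It does not use the Sparksee API: the class `FoafSketch`, its method names and the sample contacts and cities are all made up for illustration, and the graph is just an in-memory adjacency map.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory contact graph for illustration; this is not the Sparksee API.
final class FoafSketch {

    // Made-up sample data: who is a direct contact of whom.
    static Map<String, Set<String>> sampleFriends() {
        return Map.of(
                "me", Set.of("ana", "bob"),
                "ana", Set.of("me", "bob", "carl"),
                "bob", Set.of("me", "ana", "dora"),
                "carl", Set.of("ana"),
                "dora", Set.of("bob"));
    }

    // Made-up sample data: the city where each contact lives.
    static Map<String, String> sampleCities() {
        return Map.of("ana", "Barcelona", "bob", "Madrid",
                "carl", "Lisbon", "dora", "Barcelona");
    }

    // Contacts exactly two hops away: friends of my friends who are
    // neither me nor already one of my direct friends.
    static Set<String> friendsOfFriends(Map<String, Set<String>> friends, String user) {
        Set<String> direct = friends.getOrDefault(user, Set.of());
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {
            for (String candidate : friends.getOrDefault(friend, Set.of())) {
                if (!candidate.equals(user) && !direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    // Friends and FOAFs of the user who live in the given place.
    static Set<String> contactsLivingIn(Map<String, Set<String>> friends,
                                        Map<String, String> cities,
                                        String user, String place) {
        Set<String> reachable = new TreeSet<>(friends.getOrDefault(user, Set.of()));
        reachable.addAll(friendsOfFriends(friends, user));
        reachable.removeIf(contact -> !place.equals(cities.get(contact)));
        return reachable;
    }

    public static void main(String[] args) {
        System.out.println(friendsOfFriends(sampleFriends(), "me"));
        System.out.println(contactsLivingIn(sampleFriends(), sampleCities(), "me", "Barcelona"));
    }
}
```

In a graph database the same question is a two-hop neighbourhood traversal from the user's node, so no join over a contacts table is needed.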

Sparksee 5 mobile is the only M-GDB on the market covering a full range of operating systems, such as Android, iOS and BB10. Request your download now!

If you are interested in mobile technology for big data and/or in-device analytics, stay tuned to our Twitter next week: we are going to be at the 2015 edition of the international Mobile World Congress (MWC), held in Barcelona, sharing the latest developments.

Posted in Sparksee, Sparksee mobile

Graph database use case: business intelligence applications of Indoor Positioning Systems

Estimote iBeacon and iPhone 6. Picture by Jonathan Nalder.


Since Real Time Locating Systems (RTLS) entered the market back in the 1990s, there have been several attempts to create a reliable, secure system to locate nearby objects and people in indoor environments. That is not surprising, given that people spend most of their time inside buildings, where satellite-based location systems like GPS suffer from signal attenuation. A large number of technology approaches have followed RTLS, most of them based on radio signals. It wasn’t until Bluetooth-enabled devices became more popular that the first beacon-based Indoor Positioning Systems (IPS), like Apple’s iBeacon and its Android-based counterpart Datzing, entered the market, allowing for indoor location, mapping, geofencing and proximity detection.

These technologies make it possible to acquire relevant in-store behaviour data from customers, which could become a key factor in improving the customer’s experience while, for instance, shopping, in developing new marketing strategies and in boosting the efficiency of the spatial organization of buildings and stores. Let’s see a couple of use cases where graph database technology could be key to developing high-performance solutions for indoor positioning analysis applications.

Product placement optimization

For both examples we will consider a clothing store with several collections, each placed in a certain spot. Sometimes customers find the collections they like close to each other, but more often they do not, which results in a loss of interest and probably in the customer walking away. A beacon-based product placement optimization application could be the solution to this problem. Given the nature of the data, using a graph database like Sparksee could make a difference to the performance of such an application. Consider the following example:

When a customer browses a collection (e.g. stands for more than 30 seconds at a collection spot), that collection becomes a node in the graph. Every time a customer goes from one collection to another, we create a weighted edge between them. If the same pattern of behavior is repeated (by the same customer or by another one), the relation between these collections becomes stronger, which increases the weight of the edge between the two nodes. Since we want to decide where to place each collection inside the building, we can then look for a path that visits all the nodes in the graph and optimizes the weight between consecutive spots. This is an example of an application for the optimal placement of products in a store, which could also be used to predict where to place further products.
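The edge-weighting step described above could be sketched as follows. This is plain Java over an in-memory map, not Sparksee code, and it assumes directed edges whose weight is simply the number of observed transitions; the class name `CollectionTransitions`, its methods and the sample collections are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the collection graph; not the Sparksee API.
final class CollectionTransitions {
    // weights.get(from).get(to) = number of times a customer walked from one collection to the other.
    private final Map<String, Map<String, Integer>> weights = new HashMap<>();

    // Records one customer's sequence of browsed collections (one entry per beacon dwell).
    void recordVisit(List<String> path) {
        for (int i = 0; i + 1 < path.size(); i++) {
            String from = path.get(i);
            String to = path.get(i + 1);
            if (!from.equals(to)) {
                // Create the edge on first use, otherwise strengthen it.
                weights.computeIfAbsent(from, k -> new HashMap<>()).merge(to, 1, Integer::sum);
            }
        }
    }

    // Weight of the edge from one collection to another, 0 if never observed.
    int weight(String from, String to) {
        return weights.getOrDefault(from, Map.of()).getOrDefault(to, 0);
    }

    // Collection most often visited right after the given one, or null if none was observed.
    String strongestNeighbour(String from) {
        return weights.getOrDefault(from, Map.of()).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    // Made-up sample data: three customer walks through the store.
    static CollectionTransitions sample() {
        CollectionTransitions graph = new CollectionTransitions();
        graph.recordVisit(List.of("jeans", "shirts", "shoes"));
        graph.recordVisit(List.of("jeans", "shirts"));
        graph.recordVisit(List.of("jeans", "shoes"));
        return graph;
    }

    public static void main(String[] args) {
        CollectionTransitions graph = sample();
        System.out.println(graph.weight("jeans", "shirts"));
        System.out.println(graph.strongestNeighbour("jeans"));
    }
}
```

A placement heuristic could then put each collection next to its strongest neighbour. Finding a good path through all the nodes is a harder optimization problem that this sketch does not attempt.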

In-store advertising

Another potential use case of graph databases and beacon-based Indoor Positioning Systems is presenting offers and ads based on prior customer behavior. From a marketing point of view, it is not efficient to advertise the same products to every customer, given that they have different tastes and needs. Using the patterns acquired through the process described in the first use case, we could optimize the ads and offers presented to each customer. This would result in a better experience for the customer and a greater probability of them purchasing the advertised item. These ads could be presented via a smartphone application or on monitors placed on the walls of the store. A mobile graph database like Sparksee would allow the application to be updated in real time, based on the customer’s current movements in the shop and on their own and similar customers’ previous behaviour, showing the customer ads that could draw their attention to a particular part of the shop.

You can find more Sparksee use cases, tutorials and other useful resources on the Sparsity Technologies website, our blog or Sparsity’s social media channels. Also remember that you can download Sparksee for free and start using it for your own project.

Stay tuned and follow us for more graph database use case inspiration!

Posted in Sparksee, Sparksee mobile, Use Case

Sparksee’s seminar at BarcelonaTech

Sparsity will teach an introductory course on Sparksee for students at BarcelonaTech. The course is part of the Seminari d’empresa 2015 initiative, which aims to be a hub between IT companies and university students so that they can learn about the latest industry advances.

Sparksee’s course will be divided into three parts of about 3 hours each, taught on three separate days:

– Part 1: Introduction to graph databases and to Sparksee, and why we claim high performance for large volumes of data. This first part of the seminar will include some interesting graph database use cases, such as root cause analysis or enterprise staff analysis.

– Part 2: Hands-on tutorial covering the basics of Sparksee and the first graph operations. Students will learn how to create their first Sparksee graph database, add some data and work with its basic low-level operations.

– Part 3: Second part of the hands-on tutorial, where students will tackle advanced queries such as PageRank or community finding, which reveal the strengths of graph databases and show how to take advantage of their characteristics to create higher-performance solutions.

Sparsity is glad to be part of this BarcelonaTech initiative again in its 2015 edition, to make graph databases better known among university students.

Posted in Events, Sparksee

How & when to use the recovery functionality

In this new edition of Sparksee’s how-to series, we would like to highlight the recovery functionality, which keeps your database safe at all times and is especially recommended for first-time users.

Sparksee includes an automatic recovery manager which keeps the database safe against any eventuality. In case of application or system failure, the recovery manager is able to bring the database back to a consistent state on the next restart.

By default the recovery manager is disabled, but we recommend, especially for new Sparksee users, enabling it before you start constructing your first graph database. The recovery is configured through SparkseeConfig, which should be the first line of code when creating your database(*):

 SparkseeConfig cfg = new SparkseeConfig();
 Sparksee sparksee = new Sparksee(cfg);

The recovery functionality has the following settings:

  • Set it to true to enable the recovery.
  • Set the name & path of the recovery log file; otherwise it will be stored in the same path as your database. Remember that the extension for this file is .log.
  • Set the checkpoint time (in microseconds), i.e. how often the recovery manager copies the committed transactions to the recovery log. By default it’s 60 seconds (60000000).
  • Set the size of your recovery cache. We don’t recommend changing the default option.

Here is an example of a typical configuration for the recovery functionality:

SparkseeConfig cfg = new SparkseeConfig();
cfg.setRecoveryEnabled(true); // Enabling the recovery
cfg.setLogFile("recoverylogfile.log"); // it will be stored in the execution directory, same as your database
cfg.setRecoveryCheckpointTime(90000000); //we are setting it to 1.5 minutes
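
Because the checkpoint time is expressed in microseconds, it is easy to drop or add a zero when writing the literal by hand. As a small convenience sketch (the class `CheckpointTime` is our own hypothetical helper, not part of Sparksee), the standard `java.util.concurrent.TimeUnit` can do the conversion:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; only java.util.concurrent.TimeUnit is a real library class here.
final class CheckpointTime {
    // Converts a checkpoint interval in seconds to the microseconds the setting expects.
    static long secondsToMicros(long seconds) {
        return TimeUnit.SECONDS.toMicros(seconds);
    }

    public static void main(String[] args) {
        System.out.println(secondsToMicros(60)); // the default of 60 seconds: 60000000
        System.out.println(secondsToMicros(90)); // the 1.5 minutes used above: 90000000
    }
}
```

The result can then be passed to the checkpoint setter shown in the configuration example, assuming it accepts a long value.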

And why isn’t the recovery enabled in the first place? The recovery introduces a small performance penalty that strongly depends on the checkpoint time. We therefore leave the decision to the user, who knows the characteristics of the application and its typical update patterns and can judge which compromise gives the highest possible performance while keeping the database as safe as possible. A user who is aware of this functionality will be able to get the most out of it even with the default parameters.

Don’t forget to tell us if you are using the recovery and how; your feedback is key to making Sparksee grow!

(*) Examples are shown in Java; for other languages, please refer to the User Manual chapters Configuration & Maintenance and Monitoring.

Posted in Documentation, Sparksee

Graph Database Use Case: Fraud detection

Fraud and financial crimes are a form of theft or larceny that occur when a person or entity takes money or property for their own use, or uses them in an illicit manner for their personal benefit. These crimes typically involve some form of deceit, subterfuge or the abuse of a position of trust, which distinguishes them from common theft or robbery.

For most countries, one of the financial crimes that is most difficult to prevent, detect and prosecute is money laundering. Money laundering is the process by which the proceeds of crime are transformed into apparently legitimate money or other assets. These kinds of processes usually follow specific transaction patterns that can be simplified as follows (see Figure 1):

1) Collecting the money coming from illegal activities.
2) Placing it into a depository institution.
3) Adding a layer to the transaction (such as a payment of a false invoice or a loan to another company).
4) Integrating the money into the financial system by purchasing financial/industrial investments, luxury assets, etc.


Figure 1 – Diagrammatic description of a money laundering scheme by ExplicitImplicity under CC-BY-SA-3 and GFDL.

All the information regarding these transactions is recorded by the banks and financial entities that take part in the process, and it can be represented as a graph in which each entity involved (person, company, organization…) is a node and each transaction is an edge. A fraud detection application would then compare the known transaction patterns of previously prosecuted fraud cases with the patterns in our network to check whether they have points in common. Figure 2 is an example of a graph representing a money laundering fraud.


Figure 2 – Money laundering graph example.

In this case, Subject X transfers the illicit proceeds to the associate Company Y (placement), which pays a false invoice issued by Company Z. Company Z makes a loan to Company Y for the same amount as the false invoice, adding a layer to the process and making the fraud more difficult to spot. More layers can be added at this point, for instance by purchasing chips at a casino and cashing them in again. Then, Company Y invests in a legitimate financial institution to integrate the money into the financial system, and finally it withdraws the capital and transfers the earnings back to Subject X, who receives the “clean” money. As you can see from Figure 2, a graph representation of the information helps us identify more easily the loop that makes Subject X a suspect in a possibly fraudulent transaction.
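
As an illustration of spotting such a loop, the sketch below runs a depth-first search over a small directed transfer graph and returns one chain of transfers that leaves an entity and comes back to it. It is plain Java rather than Sparksee code, the class `TransferLoopSketch` and its methods are hypothetical, and the sample edges are our own reading of the scenario just described, not real data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory transfer graph; not the Sparksee API.
final class TransferLoopSketch {

    // Edges modelled after the scenario in the text (an assumption, for illustration only).
    static Map<String, List<String>> sampleTransfers() {
        return Map.of(
                "Subject X", List.of("Company Y"),
                "Company Y", List.of("Company Z", "Financial institution", "Subject X"),
                "Company Z", List.of("Company Y"),
                "Financial institution", List.of("Company Y"));
    }

    // Returns one chain of transfers that starts and ends at `start`, or an empty list if there is none.
    static List<String> findLoop(Map<String, List<String>> transfers, String start) {
        Deque<String> path = new ArrayDeque<>();
        path.addLast(start);
        if (dfs(transfers, start, start, new HashSet<>(), path)) {
            return new ArrayList<>(path);
        }
        return List.of();
    }

    // Depth-first search; `path` always holds the chain from `start` to the current node.
    private static boolean dfs(Map<String, List<String>> transfers, String node, String start,
                               Set<String> visited, Deque<String> path) {
        for (String next : transfers.getOrDefault(node, List.of())) {
            if (next.equals(start)) {
                path.addLast(start); // the money came back: loop closed
                return true;
            }
            if (visited.add(next)) { // explore each entity at most once
                path.addLast(next);
                if (dfs(transfers, next, start, visited, path)) {
                    return true;
                }
                path.removeLast();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(findLoop(sampleTransfers(), "Subject X"));
    }
}
```

This only detects that a loop exists. Matching a full known laundering pattern (placement, layering, integration) would need richer edge attributes than this sketch carries.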

Although each connection necessarily happens at a specific point in time (e.g. Company Y cannot transfer the “clean” money to Subject X before all the other transactions have been made), note that we don’t need this information to compare one pattern to another.

Other similar use cases for graph databases in fraud detection include tax evasion and illegal funding, where the key aspect also lies in searching for known irregular patterns in the transaction graph.

If you want to know more about graph database use cases, scenarios and success stories, you can search for the “use case” tag on the blog or visit the “scenarios” section of our website. Remember that you can download Sparksee 5.1 for free and use it for your project!

Posted in Sparksee, Use Case

Recap of the year and future outlook for 2015


As the end of 2014 approaches, we believe it’s a good time to look back and take stock of everything we have been working on and everything that has happened to Sparksee this year.

One of the most important highlights of 2014 has been the release of Sparksee 5.1. Key features like the new Objective-C API, enhanced compatibility with Blueprints, the dynamic size-adapting cache, compatibility with Visual Studio 2013 and the rollback functionality have meant a great step forward for our high performance graph database.

During the year, Sparsity has also started a Tetracom Technology Transfer Project and joined the European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC) in order to further improve academia-industry interaction. We have also kept involved with the Linked Data Benchmark Council (LDBC) and the Coherent PaaS European Projects.

2014 has also been a year full of interesting events related to the graph and NoSQL world. Sparsity attended GraphLab in San Francisco, NoSQL Matters in Barcelona, the LDBC TUC meeting in Athens, the ICT Proposers’ Day in Florence, SIGMOD and Grades in Salt Lake City and BizBarcelona and the MWC in Barcelona. We hope we were able to meet at one of these events! In case we haven’t yet, you will definitely find us in the 2015 graph database event arena.

Last but not least, we have been able to share and interact with all the Sparksee community through Twitter, Facebook, Google+ and our blog. During this year we have published 17 posts including tutorials, use cases, news & events, Sparksee technical details and research articles. Don’t hesitate to check them out using our archive on the sidebar.

A lot of positive things have already happened during the year, but there’s a lot more to come in 2015. Sparsity will keep moving forward thanks to your feedback and contributions, to bring our high performance solutions to the next level.

The Sparsity Technologies team wishes you the best for the holiday season and a happy new year 2015!

Posted in News, Sparksee

SNA: How to predict the most viral users with Sparksee

Social Network Analysis (SNA) is one of those use cases that everyone mentions when talking about the strengths of graph databases. It’s no secret that a network of people interacting with each other instantly brings the image of a graph to everyone’s mind. Once you have constructed the social graph, there are plenty of ways to explore it in order to answer questions like the one we deal with in today’s post: how to discover who is most likely to make my message go viral in the network.

To give more insight into how to construct a good algorithm that finds the most viral users and exploits the capabilities of the graph, we draw on the literature and refer to the “greedy algorithm”. For those not yet familiar with this algorithm, let us introduce its definition.

Starting from a solution tree, a greedy algorithm builds a solution that tries to maximise a defined function f(n). At each iteration the algorithm looks at the child nodes of the current node, selects the one that maximises f(n) and moves forward from it.

Figure 1.0 shows an example of an execution of the greedy algorithm. Blue nodes are the ones already included in our solution, yellow nodes are the ones being evaluated with our function f(n) (we also call these nodes candidates), and white nodes are those never visited and thus not evaluated. It is vitally important to establish a good heuristic for f(n) so that the algorithm delivers a solution as close to optimal as possible. The example in Figure 1.0, where we are looking for three consecutive nodes that maximise the sum of their values, shows how important it is to tune the algorithm: a simple greedy algorithm answers with the blue nodes (5, 7 and 5), while the better solution in our example would be nodes 5, 3 and 50.


Figure 1.0 – Example of a Greedy Algorithm 
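
The gap between the greedy answer and the best answer can be reproduced with a few lines of Java. The tree below is an assumption: it uses the values quoted in the text (5, 7, 5 and 5, 3, 50), while the remaining sibling values (2 and 1) are invented because the figure is not reproduced here. The class `GreedyTreeSketch` is hypothetical, and the exhaustive search assumes non-negative node values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of greedy versus exhaustive search on a small tree.
final class GreedyTreeSketch {
    static final class Node {
        final int value;
        final List<Node> children = new ArrayList<>();

        Node(int value, Node... kids) {
            this.value = value;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Assumed tree: root 5 with children 7 (children 5 and 2) and 3 (children 50 and 1).
    static Node sampleTree() {
        return new Node(5,
                new Node(7, new Node(5), new Node(2)),
                new Node(3, new Node(50), new Node(1)));
    }

    // Greedy: at every step move to the child with the largest value, here f(n) = n.value.
    static int greedySum(Node root, int depth) {
        int sum = 0;
        Node current = root;
        for (int i = 0; i < depth && current != null; i++) {
            sum += current.value;
            Node best = null;
            for (Node child : current.children) {
                if (best == null || child.value > best.value) {
                    best = child;
                }
            }
            current = best;
        }
        return sum;
    }

    // Exhaustive: best sum over all downward paths of at most `depth` nodes starting at `node`.
    static int bestSum(Node node, int depth) {
        if (node == null || depth == 0) {
            return 0;
        }
        int bestChild = 0;
        for (Node child : node.children) {
            bestChild = Math.max(bestChild, bestSum(child, depth - 1));
        }
        return node.value + bestChild;
    }

    public static void main(String[] args) {
        System.out.println(greedySum(sampleTree(), 3)); // 5 + 7 + 5 = 17
        System.out.println(bestSum(sampleTree(), 3));   // 5 + 3 + 50 = 58
    }
}
```

The greedy walk commits to 7 over 3 at the first step and never sees the 50 below the 3, which is why the choice of f(n) matters so much.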

Let’s see, then, which ideas you could use to construct a good function for a greedy algorithm that discovers the most viral users in our social network. For each node (user) you can compute a propagation weight, so that the greedy algorithm moves through the nodes that maximise it. The measure of propagation should take into account things like the previous propagations of that user, possibly valued against the propagations of the other users, or the number of documents ever propagated by that user. Another option worth considering is to restrict the measure to previous propagations on a similar theme.

With all those ideas you should be able to tune your own and unique function of propagation that could then be used in an algorithm such as the following:

 Require: a graph G and a node N
 Ensure: I is the set of nodes infected by N
 I ← empty set
 P ← queue of pending nodes, initially containing N
 V ← empty set of visited nodes
 while P is not empty do
     x ← P.dequeue()
     edges ← edges(x, source)
     for each edge ∈ edges do
         tail ← edge.tail()
         if V does not contain tail then
             V ← union(V, tail)
             P ← union(P, tail)
             if Math.random() > edge.weight() then
                 if tail ∉ I then
                     I ← union(I, tail)
                 end if
             end if
         end if
     end for
 end while
 return I
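
For readers who prefer running code, here is one possible Java transcription of the pseudocode above. It is a sketch under assumptions, not the code used in the project: the graph is an in-memory map instead of a Sparksee graph, the names `PropagationSketch`, `Edge` and `infected` are hypothetical, and the random source is passed in so that runs can be made reproducible. The comparison `random > weight` is copied from the pseudocode as published.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.DoubleSupplier;

// Hypothetical in-memory transcription of the propagation pseudocode; not the Sparksee API.
final class PropagationSketch {
    static final class Edge {
        final String tail;   // node the edge points to
        final double weight; // propagation weight of the edge

        Edge(String tail, double weight) {
            this.tail = tail;
            this.weight = weight;
        }
    }

    // Made-up sample graph: outgoing edges per node.
    static Map<String, List<Edge>> sampleGraph() {
        return Map.of(
                "a", List.of(new Edge("b", 0.2), new Edge("c", 0.9)),
                "c", List.of(new Edge("d", 0.1)));
    }

    // Breadth-first walk from `start`; returns the nodes marked as infected.
    static Set<String> infected(Map<String, List<Edge>> outEdges, String start, DoubleSupplier random) {
        Set<String> infected = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        pending.add(start);
        while (!pending.isEmpty()) {
            String x = pending.poll();
            for (Edge edge : outEdges.getOrDefault(x, List.of())) {
                if (visited.add(edge.tail)) {      // first time we reach this node
                    pending.add(edge.tail);        // as in the pseudocode, it is explored either way
                    if (random.getAsDouble() > edge.weight) { // same comparison as the pseudocode
                        infected.add(edge.tail);
                    }
                }
            }
        }
        return infected;
    }

    public static void main(String[] args) {
        // Math::random gives a different result on each run.
        System.out.println(infected(sampleGraph(), "a", Math::random));
    }
}
```

Passing a fixed supplier such as `() -> 0.5` instead of `Math::random` makes the outcome deterministic, which is handy when tuning the edge weights.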

We hope you find our success story of using Sparksee with this greedy algorithm to discover the most viral users inspiring for your Social Network Analysis projects.

Download Sparksee now for free and start building your social graph to search for propagation using the algorithms explained here!

Posted in SNA, Sparksee

Learning high-performance graph database management with Sparksee at the NoSQL matters Training Session

On Friday 21st of November from 9h to 13h Sparsity will host a Training Session as part of the NoSQL matters events.

Skilled trainers from Sparksee will explain to the attendees how to take advantage of graphs by learning about the most common queries that are best answered using a graph. The training will use the Twitter model and dataset to build the graph, and will then cover queries on the resulting graph, such as discovering how two Twitter users are connected.


Attendees will be given a NetBeans project with Sparksee Java and a complete set of fill-in-the-blanks exercises. They will also receive a free development license to build graphs of up to 1B objects, with unlimited sessions, for 6 months.

Looking forward to meeting you at the NoSQL matters Training Session!

Posted in Events, News, Sparksee

Sparsity is attending NoSQL Matters 2014


We are glad to announce that Sparsity Technologies is attending NoSQL Matters Conference 2014 that takes place in Barcelona on November 21st and 22nd. The conference will cover a broad spectrum of topics, including new products, use-cases and field reports of day-to-day operations of NoSQL infrastructures.

On November 21st, Arnau Prat and Joan Guisado will be giving a training session called Introduction to high performance graph data management with Sparksee. Attendees will learn how to model and load data as a graph, and discover the full potential of graph databases and what they can offer compared to other traditional database paradigms, using the Sparksee graph database. More details about this session will be given in a follow-up post next week.

On November 22nd, Sparsity’s CEO Josep Lluís Larriba-Pey will be giving the talk Graph databases go mobile, Sparksee 5 mobile use cases. He will present Sparksee mobile and explain a few use cases in the area of analytics for Social and Open Data where the use of graphs boosts job search, private recommendation, community search and personal tourist route planning.

You can still buy your tickets for both the Training Day and the Conference, but hurry up, there are just a few left!

We will also host a booth in the hall area on the 22nd of November. Don’t forget to stop by and say hello.

The conference will take place at Casa Convalescència, C/ Sant Antoni Maria Claret 171, 08041, Barcelona. Looking forward to meeting you there!

Posted in Events, News, Sparksee, Sparksee mobile

Scalable Community Detection on the Cloud: SCD made product

In our post Graph Databases research: Social community search we introduced the Scalable Community Detection (SCD) algorithm, born from the research work of the Sparksee team together with DAMA-UPC. The basic idea behind SCD is to look at the number of transitive relations (triangles) and understand how they are arranged to form cohesive, structured communities. This leads to a faster and more accurate community finder: faster than the Louvain algorithm (the fastest so far) and more accurate than the Oslom algorithm (which had the highest quality so far).
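
To show what counting transitive relations means in practice, here is a minimal Java sketch that counts the triangles of a small undirected graph. It is only this building block, not the SCD algorithm itself, and the class `TriangleCountSketch`, its methods and the sample edges are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of triangle counting; this is not the SCD implementation.
final class TriangleCountSketch {

    // Builds symmetric adjacency sets from a list of undirected edges {u, v}.
    static Map<Integer, Set<Integer>> fromEdges(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    // Counts each triangle {u, v, w} exactly once by only considering u < v < w.
    static int countTriangles(Map<Integer, Set<Integer>> adj) {
        int count = 0;
        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            int u = entry.getKey();
            Set<Integer> neighboursOfU = entry.getValue();
            for (int v : neighboursOfU) {
                if (v <= u) {
                    continue;
                }
                for (int w : adj.getOrDefault(v, Set.of())) {
                    // u-v and v-w are edges; the triangle closes if u-w is an edge too.
                    if (w > v && neighboursOfU.contains(w)) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] edges = {{1, 2}, {2, 3}, {1, 3}, {3, 4}};
        System.out.println(countTriangles(fromEdges(edges))); // one triangle: 1-2-3
    }
}
```

Nodes that share many triangles are more likely to belong to the same community, which is the intuition the paragraph above refers to.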

Now we can proudly announce that we are taking the first steps towards the commercialization of SCD thanks to a Technology Transfer Project (TTP) provided by the TETRACOM FP7 project. The mission of TETRACOM is to boost European academia-to-industry technology transfer (TT), and its main tools are the TTPs, which provide partial funding for academia-industry collaborations that bring concrete R&D results into industrial use. In this case, the industrial partner Sparsity Technologies and the research partner Universitat Politècnica de Catalunya are working to make Scalable Community Detection on the Cloud (SCDC) a reality.

The general idea behind SCDC is to provide a scalable cloud service: a user uploads a network to our system and gets back, in the shortest possible time, the most accurate communities inherent in it. For those curious about our technology choices, we are going to use a scalable architecture with Golang and Revel for the REST API, MongoDB to store the information and NSQ to process distributed queues.

Stay tuned for more information about the project. Remember that you can get the SCD algorithm on GitHub.

Posted in News, Research, SNA