Sparsity at the Smart City Expo World Congress 2015 presenting CIGO!

Sparsity will be at the next edition of the Smart City Expo World Congress, which takes place in Barcelona from the 17th to the 19th of November.

The Smart City Expo World Congress (SCEWC) is a leading worldwide event for the smart city industry, with more than 350 renowned speakers, popular side events such as 4YFN and the BcnRail Congress, an exhibition area and networking activities. The SCEWC attracts thousands of exhibitors and visitors each year.

The SCEWC seeks to give everyone involved the chance to be inspired, share experience and knowledge, strive for innovation, network and do business.

Sparsity will present our latest project called CIGO!

CIGO! is a cloud platform to ease the deployment of mobility policies towards an efficient, balanced and citizen-oriented Smart City.

CIGO! is aimed at:

  • City Governments: To deploy their mobility policies through mobile apps while monitoring KPIs.
  • Points of interest (POIs), shops & restaurants: To increase their exposure and awareness among citizens and tourists.
  • Citizens & tourists: To discover POIs, shops, restaurants and new areas avoiding crowded & overpriced places while getting personalised routes around the city.
  • Mobile app developers: To enrich their apps with new content and recommendation functionalities.

Our coordinates are:

GV – P2 – Street D – Stand 44915 (Generalitat de Catalunya area)

Make sure to pay us a visit and learn more about what CIGO! can do.

SCEWC 2015

Posted in Events, Mobility, Smart Cities

Sparsity involved in IT2Rail project: a step towards a new seamless multimodal travel experience


The IT2Rail ("Information Technologies for Shift2Rail") project is a first step towards the long-term IP4 ("IT Solutions for Attractive Railway Services"), which aims to provide a new seamless travel experience, giving access to a complete multimodal travel offer that connects the first and last mile to long-distance journeys.

The project will achieve its objectives around one central concept: the traveller is placed at the heart of the solutions, accessing multimodal services such as shopping, ticketing and tracking through an open, published framework that provides full interoperability.

The following benefits will arise from IT2Rail:

  • Impact on the economics of the travel service providers' ecosystem: By supporting full semantic interoperability of interchangeable and loosely coupled tools, data and services within a distributed ‘web of transportation things’, multiple concurrent implementations can be developed independently by specialist suppliers and co-exist competitively.
  • Faster time to market for innovations: By allowing early discovery of potential technological, organisational or business process issues and reducing the risks of expensive redesign and rework.
  • Enriched passenger experience: By providing an integrated use case for travellers, service retailers and transport operators.

Sparsity will participate in the Interoperability Framework with its Semantic Query & Aggregation Engine, with particular emphasis on the Business Analytics Framework (BAF). The BAF focuses on leveraging social, mobile, structured and unstructured data to obtain valuable, actionable insights that allow rail operators, product/service providers and traveller/transport enterprises to make better decisions in order to increase quality of service and revenue, to better adapt their level of service to passenger demand and to optimise their operations so as to bring and retain more people on rail and urban mobility.

Keep informed about the project here and via the hashtag #it2rail on Twitter.

Stay tuned for our next post about IT2Rail with more details about the Business Analytics Framework and our work.

Posted in European Project, Research

Help Sparsity win the first EC Innovation Radar prize

To put the spotlight on some of the brightest new ideas in European technology, the European Commission is, for the first time, awarding innovators with the Innovation Radar prize. The award recognises the most promising innovators emerging from EU-funded projects in the area of information and communication technologies.

We need your support!

After being named by the EC as the European SME with the highest innovation capacity in ICT, Sparsity is now among the 14 companies shortlisted for the Innovation Radar prize. Taking into account the results of the public poll, the judges will announce the winner at the ICT 2015 event in Lisbon on 20-22 October 2015.

Voting is easy: click on the links below for each language and search for Sparsity Technologies on the list.









We truly appreciate your support in helping Sparsity grow. Thank you very much for your help!

Posted in Events, News

Understanding Graph Structure of Wikipedia for Query Expansion

Query Expansion

Knowledge bases are very good sources for knowledge extraction: the ability to create knowledge from structured and unstructured sources and use it to improve automatic processes such as query expansion. Wikipedia, in particular, can be analysed to see how articles and categories relate to each other and how these relationships can support a query expansion technique. In particular, the authors of this article show that structures in the form of dense cycles with a minimum proportion of categories tend to identify the most relevant information.

Let’s take a brief look at this approach. Read the complete article by Joan Guisado and Arnau Prat, presented at GRADES 2015 in Melbourne.

Query expansion is the process of expanding a query issued by a user by introducing new terms, called expansion features, in order to improve the quality of the retrieved results. Query expansion is motivated by the assumption that the query introduced by the user is not the best expression of their real intention. For example, vocabulary mismatch between queries and documents is one of the main causes of poor precision in information retrieval systems. Poor results also arise from the topic inexperience of users. The challenge is to properly select the best expansion features.

Wikipedia has been proven to be a good source for query expansion, but the innovation in this paper lies in considering the differences between a social network and a knowledge base by:

  • Creating a ground truth consisting of those articles in Wikipedia that provide good results for each of the queries that are the baseline in the experiments.
  • Analysing how the articles and categories of the ground truth are structured within the Wikipedia graph.
  • Identifying cycles of articles and categories as an important structure, and also trends within them: 30% of the dense cycles with a minimum ratio of categories are tagged as the best expansion features.
  • Identifying challenging and open problems for graph processing technologies when it comes to exploiting the structure of large graphs such as Wikipedia.

Query Expansion

A quick analysis of the query graphs reveals that they are, in general, disconnected graphs containing a moderately large connected component. This is an interesting observation, as it means that, in general, the terms users introduce in a search engine are semantically related either directly or by means of extra articles or categories. This suggests that Wikipedia contains this semantic relation encoded within its structure and that it can therefore be exploited. We also observe that the largest connected component is clearly dominated by categories.
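To make this kind of structural analysis more concrete, here is a minimal sketch (not the authors' code; the articles, categories and links below are invented for illustration) that builds a tiny query graph with the networkx Python library and measures the size of the largest connected component and how strongly categories dominate it:

    # Toy query graph of Wikipedia-like articles and categories (invented data).
    import networkx as nx

    G = nx.Graph()
    G.add_node("Jaguar", kind="article")
    G.add_node("Leopard", kind="article")
    G.add_node("Jaguar Cars", kind="article")
    G.add_node("Category:Felines", kind="category")
    G.add_node("Category:Car manufacturers", kind="category")
    G.add_edge("Jaguar", "Category:Felines")
    G.add_edge("Leopard", "Category:Felines")
    G.add_edge("Jaguar Cars", "Category:Car manufacturers")

    # Largest connected component and the share of category nodes inside it.
    components = sorted(nx.connected_components(G), key=len, reverse=True)
    largest = components[0]
    category_ratio = sum(1 for n in largest if G.nodes[n]["kind"] == "category") / len(largest)
    print(f"{len(components)} components; largest has {len(largest)} nodes, "
          f"{category_ratio:.0%} of which are categories")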

Read about the results in the experiments here.

If you are interested in query expansion, make sure to check our first post covering the basics of this ongoing research.

Also, if you are using graphs in your research, don't hesitate to request to join our Research program, where we grant free licenses of Sparksee.

Posted in Research

Sparsity involved in SOMATCH EU project: bringing BI to SMEs of the fashion industry

We are glad to announce that Sparsity will take part in the SOMATCH EU project under the Horizon 2020 programme. The main objective of the project is to improve the competitiveness of European SMEs in the fashion design and Textile & Clothing (T&C) sectors by providing an IT solution that gives creative designers detailed and reliable fashion trend estimations and forecasts of user acceptance for clothing designs. This will be achieved by creating an innovative tool for the mining and visualisation of large sets of unstructured data about the use and preferences of fashion products by consumers, supporting T&C companies' quick reaction to market dynamics and a better adaptation of designs to real consumer demand.

SOMATCH tackles this complex and challenging goal through the combined development and application of state-of-the-art (SoA) image analysis technology, so far unexploited in clothing and fashion, together with social network analysis.


The visualisation of the generated data will rely on off-line statistics, generated after data processing, and on new real-time instruments for image collection and design evaluation. These will also be supported by integrating the systems with new SoA mobile devices to collect information and to visualise trend interpretations. This approach opens a vast field of new possibilities for fashion designers, supporting end-user involvement in the whole trend evaluation and close interaction with them.

Sparsity’s role in the project is to provide the social network analysis used to detect influential trendsetters, powered by Sparksee, our high-performance graph database management system. Other project partners include research centres focused on image and content analysis (Technical University of Munich, UPC-Barcelonatech), software providers with expertise in the fashion industry and platform development (iDeal, Holonix), end users from the SME textile industry and retail (Dena Cashmere) and owners of fashion-related social networking and e-commerce sites (Weblogs, Not Just a Label).

Posted in Research, Sparksee

Sparsity is the SME with the highest innovation capacity in Europe according to the European Commission

Sparsity Technologies is the SME with the highest innovation capacity in Europe according to the European Commission's first Innovation Radar (IR) report, published last week by the Joint Research Centre. The IR is a support initiative that focuses on identifying high-potential innovations in EU-funded ICT FP7, CIP and H2020 projects and the key organisations delivering these innovations to the market. The report documents the details of the IR methodology and the results of its first application.

Between May 2014 and January 2015, the Commission reviewed 279 ICT projects, equivalent to 10.6% of the ICT projects funded by the EC, which had resulted in a total of 517 innovations delivered by 544 organisations in 291 European cities.

Sparsity has been identified as a key organisation in two innovations. The first is a new method to partition any given graph into its community structure of strongly connected components, developed within the Linked Data Benchmark Council (LDBC) project, whose goal was to create the first comprehensive suite of open, fair and vendor-neutral benchmarks for RDF/graph databases. The second is the development of the common query engine for the different “one size” data stores optimised for particular tasks, data and workloads, within the CoherentPaaS project, whose objective is to provide a rich Platform as a Service (PaaS).

Top 10 SMEs and their innovations

Barcelona, where Sparsity has its headquarters, has been identified as the largest ICT innovation hub in Europe, hosting a total of 19 organisations that are key players in innovation and beating London and Paris, which each host 17 such organisations. According to the report, Germany, Spain and the UK are the countries with the most organisations that are key players in delivering innovations.

Sparsity wants to thank the talented Sparsity team for all the hard work and dedication that over the years has made this great achievement possible!

Posted in News, Research

Microblogging queries on graph databases

At the last edition of the GRADES workshop, co-located with the SIGMOD/PODS conference, a group of researchers from RMIT University (Australia) presented the paper “Microblogging Queries on Graph Databases: An Introspection”. In this paper the authors shared their experience of executing a wide variety of microblogging queries on two popular graph databases: Neo4j and Sparksee(*). Their queries were designed to be relevant to popular applications that make use of microblogging data such as Twitter's, providing friend recommendations, analysing user influence or finding co-occurrences. The queries were executed on a large real graph dataset consisting of nearly 50 million nodes and 326 million edges. In this post we discuss the conclusions the authors draw from the execution of two of the most relevant advanced query types: recommendation queries and influence queries.

Recommendation queries

User recommendation on microblogging sites like Twitter usually involves looking at 1-step and 2-step followers and/or followees, given that you are more likely to know or share interests with the local community formed by the friends of your friends (or followers) than with an outsider. Taking this into account, the paper describes the following recommendation query:

  • Q4.1 finds all the 2-step followees of a given user A whom A is not already following. Such followees are recommended to A.

To implement Q4.1, Sparksee offers the neighbors operation, which returns all the followers of a given user, that is, all the nodes that are neighbours of that user through the follows edge type. This is a good example of the type of information to materialise at the creation phase of your database, so that an index is available to access it and retrieval queries run faster. In this case the authors decided to avoid materialisation during the import phase to make the import faster. It is always a trade-off between a faster import/creation and better query performance, which should be weighed for each particular scenario. The results of executing Q4.1 are shown in Figure 1.


Figure 1: recommendation query (Q4.1) execution results.

Finding 2-step followees results in an explosion of nodes when the 1-step followees have a high out-degree, forcing the systems to keep a large portion of the graph in memory. The authors explain the sudden spike in the Neo4j plot by the fact that the direct degree of the node in question is much higher even though the number of rows returned is lower, and they find it noteworthy that “Neo4j’s performance degrades with a large intermediate result in memory while Sparksee is able to take advantage of the graph already in memory observing less fluctuations with the output”.

When looking at Figure 1, one should also note the difference in scale on the Y axis between the two plots, which can be misleading. The plot on the right (Sparksee) shows average times almost two orders of magnitude lower than the plot on the left (Neo4j). For example, at 750k rows Sparksee's average time is around 1.7 seconds, while Neo4j's is around 47 seconds.
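For reference, the core of Q4.1 can be sketched independently of either database's API; the follows map below is invented toy data, not the paper's dataset:

    # Q4.1 sketch: recommend to A the 2-step followees that A does not follow yet.
    follows = {
        "A": {"B", "C"},
        "B": {"C", "D", "E"},
        "C": {"E", "F"},
        "D": set(),
        "E": {"A"},
        "F": set(),
    }

    def recommend(user, follows):
        one_step = follows.get(user, set())
        two_step = set()
        for followee in one_step:                     # expand each 1-step followee...
            two_step |= follows.get(followee, set())  # ...into its own followees (2-step)
        # keep only users that `user` is not already following (and not the user itself)
        return two_step - one_step - {user}

    print(recommend("A", follows))  # {'D', 'E', 'F'}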


Influence queries

Discovering the current or potential influence of a user on his or her community is useful in a wide range of situations, from affiliate marketing strategies to ad targeting. Although there are plenty of models of influence propagation, the authors of the paper take an intuitive route, defining it as:

  • Current influence: the users who most frequently mention someone and who are already followers of that user.
  • Potential influence: the users who most frequently mention someone without being direct followers of that user.

In both Neo4j and Sparksee this translates to finding the users who mentioned A and removing (or retaining) those who are already following A. The performance results of finding influencers are shown in Figure 2:


Figure 2: influence query (Q5.2) execution results.

As in Figure 1, the plot on the right (Sparksee) shows average times two orders of magnitude lower than the plot on the left (Neo4j), which means that similar plot profiles reflect significant performance differences. For instance, users with 60K mentions on Twitter are identified using Sparksee in only 0.3 seconds.
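For reference, the logic of these influence queries can be sketched in a few lines; the mentions and followers below are invented toy data, not the paper's dataset:

    # Current influence: frequent mentioners of A who already follow A.
    # Potential influence: frequent mentioners of A who do not follow A.
    from collections import Counter

    mentions_of_a = ["u1", "u2", "u1", "u3", "u1", "u2"]  # one entry per mention of A
    followers_of_a = {"u1"}                               # users already following A

    mention_counts = Counter(mentions_of_a)
    current = {u: c for u, c in mention_counts.items() if u in followers_of_a}
    potential = {u: c for u, c in mention_counts.items() if u not in followers_of_a}

    print("current influence:", current)      # {'u1': 3}
    print("potential influence:", potential)  # {'u2': 2, 'u3': 1}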

In the paper the authors also discuss other queries, such as those using the built-in shortest path algorithms of both databases; Sparksee has improved their performance in its latest version, 5.2.

In case you are interested in more details, please check the complete article here. If you are interested in benchmarking Sparksee or using it to leverage your research, do not hesitate to ask us for a free license under our Research program!


(*) Experiments in the paper were conducted on a standard Intel Core 2 Duo at 3.0 GHz with 8 GB RAM and a non-SSD HDD, using Neo4j v2.2M03 and Sparksee v5.1.

Posted in Research, Sparksee

Sparksee 5.2 new release!

Sparksee 5.2 new features

We are proud to announce the release of Sparksee 5.2 today! Download your trial on our website or request a license for free.

The new version includes the following features:

  • The fastest community detection algorithm: We have included the SCD algorithm, which has been shown in the state of the art to be the fastest and most precise community detection algorithm for unweighted graphs. It is now available as part of our graph algorithm package. Don't forget to check our documentation to start using it!
  • Concurrency performance boost in read transactions: We have changed our internal structures to allow more concurrency at the read level and reorganised locks to decrease their performance penalty.
  • Faster shortest path algorithm: For the BFS shortest path we have upgraded the mechanics of the algorithm to explore fewer nodes, improving its performance (a sketch of the general idea follows this list).
  • Added support for 64-bit processors on Android: We now support the 64-bit architecture for Android development. Sparksee is the only graph database available for Android; take advantage of this to leverage your mobile apps!
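The internal changes in Sparksee are not detailed here, but one well-known way for a BFS-based shortest path to explore fewer nodes is to search from both endpoints at once (bidirectional BFS). The sketch below illustrates that general idea only; it is not Sparksee's implementation:

    # Bidirectional BFS on an unweighted graph: expand whichever frontier is smaller,
    # one full level at a time, and stop as soon as the two searches meet.
    from collections import deque

    def bidirectional_bfs_distance(graph, src, dst):
        """Return the unweighted shortest-path length between src and dst, or None."""
        if src == dst:
            return 0
        dist_src, dist_dst = {src: 0}, {dst: 0}
        q_src, q_dst = deque([src]), deque([dst])
        while q_src and q_dst:
            if len(q_src) <= len(q_dst):
                frontier, dist_here, dist_other = q_src, dist_src, dist_dst
            else:
                frontier, dist_here, dist_other = q_dst, dist_dst, dist_src
            for _ in range(len(frontier)):            # expand one BFS level
                node = frontier.popleft()
                for nbr in graph.get(node, ()):
                    if nbr in dist_other:             # the two searches met
                        return dist_here[node] + 1 + dist_other[nbr]
                    if nbr not in dist_here:
                        dist_here[nbr] = dist_here[node] + 1
                        frontier.append(nbr)
        return None

    toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(bidirectional_bfs_distance(toy, "a", "d"))  # 3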

As always, the newest version also includes bug fixes and further improvements to the documentation. We would like to thank our users for their feedback!

Download Sparksee 5.2

Posted in News, Sparksee

Query Expansion by Exploiting Graph Properties for Image Retrieval

When searching images for a certain query, ambiguity problems may arise. For instance, when searching for the query “colored Volkswagen beetle”, the most common search engines would retrieve some images of the famous car, but also of the insect and even of a bird that goes by the same name. Query expansion avoids this problem by adding context to the query and related concepts that may enrich the results or increase their precision. Going back to the former example, it would probably be more useful for the results to contain images of other cars similar to the Volkswagen Beetle rather than images of fauna.


Figure 1: Image retrieval with query expansion process diagram

In Figure 1 the process for image retrieval including query expansion is shown. It starts with the original query Qo, which is the entry point for the query expansion module; the module searches for several complementing terms and phrases, which are combined to deliver an enriched query Qe. Qe is then introduced into the regular search engine. The big questions here are: which terms should enrich the query, and how do we find them? According to the research “Massive Query Expansion by Exploiting Graph Knowledge Bases for Image Retrieval” by Joan Guisado-Gámez, David Dominguez-Sal and Josep L. Larriba-Pey, the answer may be graph techniques.

Their method is based on the topological analysis of the graph built out of a knowledge base. It differs from previous works in that the links of each article are not considered individually; instead, the global link structure of the knowledge base is mined to find related terms using graph mining techniques. The technique consists of, first, locating the most relevant terms in the knowledge base and connecting them by a path using the knowledge base relations, and second, extracting communities of concepts around the detected path. With this technique they are able to identify: (i) concepts that match the user's need, and (ii) a set of semantically related concepts. Matching concepts provide equivalent reformulations of the query that reduce the vocabulary mismatch. Semantically related concepts introduce a significant set of different terms that are likely to appear in a relevant document.

The first phase of this innovative query expansion technique uses shortest paths to find the most direct way to connect a particular word to a related one; the second uses community search on the graph to enrich the previously computed paths with closely related articles.
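As a toy illustration of these two phases (this is not the authors' code, and the miniature knowledge-base graph below is invented), the first phase is a plain shortest-path computation between articles matching the query terms; the naive neighbourhood expansion at the end is only a starting point, which, as explained below, the paper replaces with community search:

    # Miniature knowledge-base graph of articles and categories (invented data).
    import networkx as nx

    kb = nx.Graph()
    kb.add_edges_from([
        ("volkswagen", "volkswagen beetle"),
        ("volkswagen", "volkswagen golf"),
        ("volkswagen beetle", "Category:Compact cars"),
        ("volkswagen beetle", "baja bug"),
        ("beetle (insect)", "Category:Insects"),
    ])

    # Phase 1: connect two articles matching the query terms by a shortest path.
    path = nx.shortest_path(kb, "volkswagen", "volkswagen beetle")
    print(path)  # ['volkswagen', 'volkswagen beetle']

    # Naive enrichment: take every neighbour of the path (the paper shows this is too
    # noisy and uses community search instead to keep only strongly related articles).
    expansion = set()
    for node in path:
        expansion |= set(kb.neighbors(node))
    print(expansion - set(path))  # contains 'volkswagen golf', 'baja bug', 'Category:Compact cars'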

For the example query, the system finds 182 shortest paths using the English Wikipedia. Among them, nine score 3/2, which is the top score:

volkswagen → volkswagen beetle
volkswagen fox → volkswagen beetle
volkswagen passat → volkswagen beetle
volkswagen type 2 → volkswagen beetle
volkswagen golf → volkswagen beetle
volkswagen jetta → volkswagen beetle
volkswagen touareg → volkswagen beetle
volkswagen golf mk4 → volkswagen beetle
volkswagen beetle → volkswagen transporter

The first path in the list is especially relevant because it connects the generic concept Volkswagen to the more specific concept Volkswagen Beetle. The rest of the paths are also valuable because they refer to the real intent of the user and discard any path that contains articles about other interpretations of the term beetle.

For the second part of the technique, the most direct solution would be to enrich the path with all the neighbours of the Wikipedia articles. However, when they tested this naive solution, it did not work, because it introduced articles that were only loosely related to the path. Wikipedia articles typically have many links, and many of them refer to topics that have some type of relation but are semantically very distant. They therefore implemented a community search algorithm to distinguish the semantically strong and weak links. A community in a graph is a set of closely linked nodes which are similar among themselves but different from the other nodes in the rest of the graph. In our example, in this second part of the query expansion process, related terms such as German cars, Volkswagen Group, VW bug, VW Type 1, Wolfsburg (the city where they were manufactured), Baja Bug (an original Volkswagen Beetle modified to operate off-road) or Cal Look (the name used for customised Volkswagen Beetle cars that follow a style coined in California in 1969) are added to enrich the results.

Check the complete article for the experimental results and precision measures!

Read more articles about research using graph databases on our blog. If you think your research would benefit from using a graph database, consider using Sparksee, available for Python, Java, .NET, C++ and Objective-C. Request one of our FREE licenses under our Research Program.

Posted in Research

Graph database use case: Insurance fraud detection

According to a fact sheet released by the Southwest Insurance Information Service (SIIS), approximately 10% of all insurance claims are fraudulent, and the Coalition Against Insurance Fraud estimates that nearly $80 billion in fraudulent claims are paid annually in the U.S. Insurance fraud is certainly an issue that must be addressed, given the benefits both the insurer and the insured obtain from its prevention: the insurance buyer is able to receive coverage at a lower price, which gives the insurance company a competitive advantage.

Insurance fraud can be perpetrated by the seller or the buyer. Seller fraud occurs when the seller of a policy hijacks the usual process in a way that maximises his or her profit; some examples are premium diversion, fee churning, ghost companies and workers' compensation fraud. Buyer fraud occurs when the buyer deliberately invents or exaggerates a loss in order to obtain more coverage or receive payment for damages; some examples are false medical history, murder for proceeds, post-dated life insurance and faked accidents.

Traditional methods to detect and prevent this form of fraud include duplicate testing, using date validation systems, calculating statistical parameters to identify outliers, using stratification or other types of analysis to identify unusual entries, and identifying gaps in sequential data. These methods are a great way to catch most casual, single fraudsters, but sophisticated fraud rings are usually well organised and informed enough to avoid being spotted by traditional means. They use layered “false” collusions in a similar way to money laundering rings.

In this scenario, where implementing alternative fraud detection methods is crucial, graph database management systems play a significant role. In the case of buyer fraud, the only way to catch the complex layered collusion performed by criminal rings is to analyse the relationships between the elements involved in the claim, which is a tedious task to perform on a relational database. While an RDBMS has to join a large number of tables (accidents, companies, drivers, lawyers, witnesses…) in complex schemes, a GDBMS only has to traverse the graph following the relationships between the nodes, which is significantly more efficient, especially in cases that require querying large datasets. An example of buyer insurance fraud represented as a graph would be as follows:


Example 1: Simple buyer insurance fraud case represented as a graph.

In this example, subjects 5 and 4 participate in both accidents in a direct way. Subject 1 is also related to both accidents in an indirect way, given that the car he drives is owned by the driver of Car 3, who is involved in Accident 2, which means subjects 1 and 5 must know each other. With the added value of social network analytics (another scenario where graph DBMSs usually outperform relational DBMSs) we have one more clue pointing to suspected fraud: subject 3 and subject 6 are friends on Facebook, meaning that 5 out of the 6 nodes of the graph are related to both Accident 1 and Accident 2 in some way, which should ring a bell about possible fraud. In addition, a graph database makes it easy to add data from different sources under a changing schema and, for instance, to move from subject 1 to subject 6 and see that they are in fact also related. To keep the example clear and easy to visualise we have used a fraud ring that claims only two false accidents, but real cases of large fraud rings usually involve a greater number of claims, where the relationships between the people involved can hardly be explained by coincidence.
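As a minimal sketch of how such a check looks as a graph traversal (invented toy data loosely mirroring the figure, not a real claims database), flagging subjects connected to more than one accident becomes a short BFS instead of a chain of joins:

    # Undirected edges: subject-drives/owns-car, car-involved_in-accident (toy data).
    from collections import deque

    edges = [
        ("Subject1", "Car1"), ("Car1", "Accident1"),
        ("Subject5", "Car2"), ("Car2", "Accident1"), ("Car2", "Accident2"),
        ("Subject4", "Car3"), ("Car3", "Accident2"), ("Subject5", "Car3"),
    ]
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    def reachable_accidents(node, max_hops=3):
        """Accidents reachable from `node` within `max_hops` edges (plain BFS)."""
        seen, frontier, hits = {node}, deque([(node, 0)]), set()
        while frontier:
            current, hops = frontier.popleft()
            if current.startswith("Accident"):
                hits.add(current)
            if hops < max_hops:
                for nbr in graph.get(current, ()):
                    if nbr not in seen:
                        seen.add(nbr)
                        frontier.append((nbr, hops + 1))
        return hits

    suspicious = {}
    for node in graph:
        if node.startswith("Subject"):
            accidents = reachable_accidents(node)
            if len(accidents) > 1:
                suspicious[node] = accidents
    print(suspicious)  # {'Subject5': {'Accident1', 'Accident2'}}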

A graph database could also be used to search for fraud across different insurance companies, finding similarities in patterns and behaviours that could add value to an analysis like the one shown in the example above.

As we have said, a high-performance graph database like Sparksee is a perfect match for dealing with large amounts of data in situations where deep relationship analysis is required. Remember that you can download it for free under an evaluation or research license and use it for your own project. You can find more graph database use cases, scenarios and success stories by searching for the “use case” tag on the blog or visiting the “scenarios” section of our website.



Posted in Sparksee, Use Case