Sparsity Technologies at GraphLab Conference 2014


We are glad to announce that Sparsity Technologies is attending the GraphLab Conference 2014, the 3rd edition of the largest trade show for graph-based systems, which takes place in San Francisco on July 21st.

Sparsity Technologies will be among the data scientists, software engineers and big data analytics thinkers from the companies, academic institutions and organizations leading the way in the graph arena.

Curated presenters at the GraphLab Conference cover several domains: graph analytic frameworks, graph databases (here you will find us among Neo4j, Titan, Franz and Objectivity), graph visualization solutions, Python data science tools and several interesting applications of graphs in different fields.

The conference will take place at Hotel Nikko, 222 Mason Street in San Francisco (US), starting at 8:00AM, and you will find our stand at table 22.

Looking forward to meeting you there!

Posted in Events

How to install & configure Sparksee mobile for Android

Sparksee is the first graph database available for Android applications, and it comes with two interfaces: Java and C++. In this article we will guide you through a typical installation and configuration for the Java interface, so you can start working with Sparksee in your mobile development environment in a few minutes.

Sparksee for Android requires platform (API) level 9 (Android 2.3) or higher and an Android SDK. It supports the armeabi, armeabi-v7a, x86 and mips processor architectures.

Although you are free to work with the development environment that best suits your needs, for this quick tutorial we assume that you are using one of the following:

a) Eclipse & plugin ADT + SDK
b) Android Studio + SDK

You will also need to have the API platforms and the emulator installed and up to date.

Step 1) Downloading Sparksee

Download your Sparksee mobile library from our website here: http://www.sparsity-technologies.com/#download

We will send you an email explaining how to download your own copy. Downloads for mobile (like license requests) are moderated, so please wait until a member of our support team contacts you.

Once you receive your download, unzip it & check the contents:

READ.ME: Basic Android sparkseejava instructions.
LICENSE.txt: License details.
ReleaseNotes.txt: Current version changes.
lib: Dynamic libraries.
doc: Sparksee documentation.

Step 2) Creating a new project linking Sparksee mobile library

Create an Android application project with your IDE (Eclipse or Android Studio) and call it, for instance, HelloSparksee.

Follow the READ.ME instructions:

Copy all the content of the “lib” directory to the “libs” directory of your Android project.

a) Eclipse option
Right click on the “libs/sparkseejava.jar” file and select “Build Path > Add to Build Path”.
Set in your “AndroidManifest.xml” a minimum sdk version >= 9

b) Android Studio option
In Android Studio, right click on the “libs/sparkseejava.jar” file and select “Add as library…”.
Add this exact text to the “build.gradle” file: compile files('libs/sparkseejava.jar')
Set a minimum sdk version >= 9 in the “build.gradle” file.

Run the empty application! That’s it! You now have a new project using Sparksee.

Step 3) Initial steps

Now create a new Sparksee database by following these steps in order:

  • Add the following import: import com.sparsity.sparksee.gdb.*;
  • Before creating the database, a few configuration steps are needed. First, create a new configuration object with SparkseeConfig cfg = new SparkseeConfig();
  • Set the license code with SparkseeConfig.setLicense(key). Sparksee mobile only works with a valid key; you will find the code in the same email as your download.
  • Limit the Sparksee cache memory with SparkseeConfig.setCacheMaxSize(MB). By default Sparksee takes all the free memory available, which you will almost certainly want to limit on a phone.
  • Activate the recovery functionality with SparkseeConfig.setRecoveryEnabled(true). Recovery allows you to recover all your data if an error occurs.
  • Set the log file with SparkseeConfig.setLogFile(filename). Specify the full path and name of a log file in a location where you have write permission.
  • Create the main Sparksee object with new Sparksee(cfg). Note that the constructor argument is the SparkseeConfig you have just created.
  • Sparksee is now configured, so create your database with Sparksee.create(filename). As with the log file, you will need a full path and name.
  • Create a session with Gdb.newSession().
  • Graph objects and operations are available at the Graph class level; get the graph from the session with Session.getGraph().

Done!

From here you can create nodes and edges and interact with them; check our documentation to keep going. Make sure to close everything properly and in the following order: close the session, close the database and finally close Sparksee.
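The steps in this section can be put together roughly as follows. This is only a minimal sketch, not the official sample: it needs sparkseejava.jar on the class path, the license key, cache size and file paths are placeholders, and the Database class name and the two-argument create(path, alias) call are assumptions based on our recollection of the Sparksee Java API rather than on the steps listed in this post, so check them against the documentation shipped in the “doc” directory.

```java
import com.sparsity.sparksee.gdb.*;

class HelloSparkseeSketch {

    // dbPath and logPath must point to locations where the app has write
    // permission; licenseKey is the code received by email (all placeholders).
    static void run(String dbPath, String logPath, String licenseKey) throws Exception {
        SparkseeConfig cfg = new SparkseeConfig();
        cfg.setLicense(licenseKey);
        cfg.setCacheMaxSize(32);          // in MB; 32 is an arbitrary example value
        cfg.setRecoveryEnabled(true);
        cfg.setLogFile(logPath);

        Sparksee sparksee = new Sparksee(cfg);
        Database db = null;               // assumed class name for the database handle
        Session session = null;
        try {
            // Assumption: create takes the file path plus an alias for the database.
            db = sparksee.create(dbPath, "HelloSparksee");
            session = db.newSession();
            Graph graph = session.getGraph();
            // ... create node types, nodes and edges through "graph" here ...
        } finally {
            // Close in the order given in the text: session, database, Sparksee.
            if (session != null) session.close();
            if (db != null) db.close();
            sparksee.close();
        }
    }
}
```

The try/finally block is one way to make sure the closing order is respected even if creating the database or the session fails.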

Posted in Documentation, Sparksee mobile

Sparsity fuelled by research

Graph management and algorithmics are two of the hottest topics at top international data management conferences. As a by-product of some of the findings in those research areas, graph management technologies have flourished in the last 5 years. Sparksee (formerly called DEX) was born in this way more than 8 years ago, making it one of the precursors of graph management technologies. At the time, the seminal research work was done by the DAMA-UPC research group at BarcelonaTech, which spun out Sparsity Technologies a few years later.

Sparksee was born as the product of intense research on the representation of graphs for high performance management. The seminal paper on Sparksee dates back to 2007 and includes an analysis and comparison of Sparksee (read the seminal paper here). In line with this, Sparksee was designed to have a small software footprint and high data compression, allowing the execution of complex graph queries on databases with billions of objects on off-the-shelf computing devices. Sparksee makes intensive use of bitmaps, which allows graph analytics to be solved with simple logic operations and high data locality. All the technology resulting from our research has been patented internationally.
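To give an intuition of why bitmaps lead to simple logic operations, here is a small sketch using the standard java.util.BitSet. It is not Sparksee's internal representation; the class and method names are made up for illustration. Each node keeps one bitmap of its neighbours, so a question like "which neighbours do two nodes have in common?" becomes a single AND.

```java
import java.util.BitSet;

class BitmapGraphSketch {
    // One bitmap per node: bit j is set when the node has an edge to node j.
    private final BitSet[] adjacency;

    BitmapGraphSketch(int nodeCount) {
        adjacency = new BitSet[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            adjacency[i] = new BitSet(nodeCount);
        }
    }

    void addUndirectedEdge(int a, int b) {
        adjacency[a].set(b);
        adjacency[b].set(a);
    }

    // Common neighbours of a and b: a bitwise AND of the two bitmaps.
    BitSet commonNeighbors(int a, int b) {
        BitSet result = (BitSet) adjacency[a].clone();
        result.and(adjacency[b]);
        return result;
    }

    // Nodes adjacent to a or b: a bitwise OR of the two bitmaps.
    BitSet neighborsOfEither(int a, int b) {
        BitSet result = (BitSet) adjacency[a].clone();
        result.or(adjacency[b]);
        return result;
    }

    public static void main(String[] args) {
        BitmapGraphSketch g = new BitmapGraphSketch(5);
        g.addUndirectedEdge(0, 1);
        g.addUndirectedEdge(0, 2);
        g.addUndirectedEdge(3, 1);
        g.addUndirectedEdge(3, 2);
        g.addUndirectedEdge(3, 4);
        System.out.println(g.commonNeighbors(0, 3));   // prints {1, 2}
        System.out.println(g.neighborsOfEither(0, 3)); // prints {1, 2, 4}
    }
}
```

Because each bitmap is stored contiguously, these operations work on whole machine words at a time, which is where the data locality mentioned above comes from in this toy version.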

As a consequence of the interesting field that the seminal paper opened up, Sparksee is currently supporting research in related areas that will help us improve and secure our product in future versions, such as community search algorithms, benchmarking, performance analysis, graph applications, query languages, graph algebras and query optimisation.

Stay tuned for the next chapters, where we will talk a little more about the research areas that Sparsity is working on right now, with very promising results.

 

Image courtesy of David Castillo Dominici / FreeDigitalPhotos.net

Posted in Research

Graph Databases power in-device analytical applications: Sparksee Mobile, the first graph database available for iOS and Android

Mobile device data analytics is going to be an important issue in the next few years. Hardware improvements like more efficient batteries, larger memories and more careful energy consumption will be crucial to allow for complex computations on such devices. In addition, analytical engines embedded in a mobile device will allow users to gather and manage their own private data for analytic purposes at their fingertips.

Graph databases will be important in this area, in situations where the mobile device has to solve problems such as the management of mobile data, social network analytics, mobile device security, geo-localized medical surveillance and real time geo-localized travel companion services.

Sparksee 5 mobile is an important player for mobile analytics applications, being the first graph database for Android and iOS. Sparksee’s small footprint of less than 50 Kbytes*, together with its high performance and the compact storage space it requires, makes it especially attractive for mobile devices. Sparksee is powered by a research-based technology that makes intensive use of bitmaps, which allows graph analytics to be solved with simple logic operations and remarkable data locality.

Do you want to be among the first applications making real use of what the device hardware can do? Have you considered running your analytical operations on the device, storing and querying the information in a graph database instead of relying on an external server? Let us know what you think about these new possibilities and which applications you think will benefit most from in-device real time processing.

Download Sparksee graph database mobile for free at: http://sparsity-technologies.com/#download

 

*Size of Sparksee compiled software

Posted in Sparksee

Sparksee, the high performance graph database

Graph technologies are being used in many situations where large amounts of highly linked data must be analyzed and high performance is required. In many of those cases, the processed data provides an important input for specific added-value business queries that must be answered at high performance levels.

Examples of such questions are: Which communities in a social network may be interested in my service? Is there a restaurant near my current location that I would like? Could you show me the shortest way to get there? Who are the most influential people in a scientific topic? Those queries come from very different environments such as social networks, fleet routing through map analysis, bibliographic networks, etc.

The datasets emerging from those environments can be represented as graphs, and graph management solutions provide advantageous solutions for them. In particular, Sparksee provides the best compromise between high performance, capacity and small footprint.

Sparksee’s high performance graph management engine is designed to have a small software footprint and high data compression, answering queries that deal with billions of objects in under a second on off-the-shelf computing devices. Sparksee is powered by a patented research-based technology that makes intensive use of bitmaps, which allows graph analytics to be solved with simple logic operations and high data locality.

Sparksee 5 high performance graph database is available for Windows, Linux, MacOS, iOS and Android, with native APIs in Java, .Net (C#), C++ and Python.

Do you want to try out Sparksee for your specific use case? Do you think that other database management solutions are not providing enough performance or capacity for your data? Contact us and we’ll help you make the most of your data.

 

Posted in Sparksee

Genezik is able to trace paths in music with DEX

Researchers at EPFL’s Signal Processing Laboratory have proposed Genezik, a piece of software that lets you explore and rediscover your own music library, which will be presented at the next Montreux Jazz Festival.

Unlike other approaches that recommend music based on tags or user reviews, this software uses the music signal itself to identify recommendations that may not be obvious. For instance, although Led Zeppelin is sometimes classified as hard rock, some of its songs are closer to Bob Marley; Genezik is able to identify these relationships.

It also includes a social component, where different users’ graphs are connected to each other so that the network can be used to recommend songs discovered by others.

Genezik successfully uses DEX as its graph database management system. Stay tuned for more news of this exciting new software by EPFL (École polytechnique fédérale de Lausanne)!

Check more information at: http://actu.epfl.ch/news/epfl-software-is-able-to-trace-paths-amidst-a-musi/

LTS2 is a team of researchers working at the Department of Electrical Engineering of the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. Their research focuses on modern challenges in data processing.

Posted in DEX, News, Use Case

Research with DEX: TreeBank browser, the syntactical structures specialized search engine

We are glad to share a new application powered by DEX for searching syntactical structures: the TreeBank browser, implemented by the Institut Universitari de Lingüística Aplicada (IULA) at the Universitat Pompeu Fabra (Barcelona).

The TreeBank browser is an interesting tool for linguists, which contains a Spanish treebank with more than 42,000 syntactically annotated sentences. Dependency grammar is the formalism used to represent the syntactic information. This formalism allows a sentence to be seen as a graph, so all the syntactic information in the corpus is represented as a DEX directed graph in which the nodes are the words in the corpus and the edges are the dependencies. A dependency is the annotation between related words; for instance, the fragment “Sr. Salvatori vendió” would be represented by the following subgraph:

[Figure: TreeBank browser subgraph for the fragment “Sr. Salvatori vendió”]

“Sr. Salvatori” and “vendió” are two nodes in a sentence of the corpus and the relationship between them is a dependency called “SUBJ” (Subject). It means that in the given sentence the main verb is “vendió” (sold) and “Sr. Salvatori” is the subject who sold something.

All the sentences in the corpus have been semi-automatically analysed using a grammar with a predefined set of dependency relations such as subject, direct object, specifier, modifier, punctuation, etc. Consider a more complete example, the sentence “Además, la memoria es la base para el aprendizaje”. It has been analyzed and is shown by the TreeBank browser as follows:

[Figure: TreeBank browser analysis of the sentence “Además, la memoria es la base para el aprendizaje”]

The TreeBank browser allows searching for sentences in the corpus that satisfy user-defined patterns. Such patterns take into account both dependencies and word information; the latter may include any combination of part of speech, word form and lemma. As an example, we may query for all the sentences in the corpus whose main verb is “establecer”, has a modifier and has a common noun as SUBJECT.

Thanks to the nature of the graph, there are no restrictions on the position of the elements in a search. Therefore, in the previous query, we will find a solution regardless of the relative position of each item of the query in the sentence (e.g. subjects or modifiers in preverbal or postverbal position). Each result can be downloaded in a standard tabulated form or as a graph, by exporting it to the standard GraphML format, which makes the information more attractive and readable.

Treebanks are a resource for developing a number of useful tools in the Natural Language Processing area, such as training parsers and taggers, and for work on machine translation and speech recognition, among many others.
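The position-independent search described above can be illustrated with a small plain Java sketch. It does not use DEX or the TreeBank browser's actual query engine. The example sentences are made up rather than taken from the IULA corpus, the “MOD”, “SPEC” and “DO” labels and the part-of-speech tags are hypothetical names, and for simplicity the sketch accepts any token with the requested lemma instead of checking that it is the main verb.

```java
import java.util.Arrays;
import java.util.List;

class TreebankQuerySketch {
    static final class Token {
        final String form, lemma, pos;
        Token(String form, String lemma, String pos) {
            this.form = form; this.lemma = lemma; this.pos = pos;
        }
    }

    // Directed edge from a head word to a dependent word, by token index.
    static final class Dependency {
        final int head, dependent;
        final String label;
        Dependency(int head, int dependent, String label) {
            this.head = head; this.dependent = dependent; this.label = label;
        }
    }

    static final class Sentence {
        final List<Token> tokens;
        final List<Dependency> dependencies;
        Sentence(List<Token> tokens, List<Dependency> dependencies) {
            this.tokens = tokens; this.dependencies = dependencies;
        }
    }

    // True when some token with the given lemma has a MOD dependent and a
    // SUBJ dependent with the given part of speech. Only edges are followed,
    // so word order plays no role.
    static boolean matches(Sentence s, String verbLemma, String subjectPos) {
        for (int head = 0; head < s.tokens.size(); head++) {
            if (!s.tokens.get(head).lemma.equals(verbLemma)) continue;
            boolean hasModifier = false, hasSubject = false;
            for (Dependency d : s.dependencies) {
                if (d.head != head) continue;
                if (d.label.equals("MOD")) hasModifier = true;
                if (d.label.equals("SUBJ") && s.tokens.get(d.dependent).pos.equals(subjectPos)) hasSubject = true;
            }
            if (hasModifier && hasSubject) return true;
        }
        return false;
    }

    private static Token t(String form, String lemma, String pos) { return new Token(form, lemma, pos); }
    private static Dependency d(int head, int dep, String label) { return new Dependency(head, dep, label); }

    // "El gobierno estableció ayer normas": subject before the verb.
    static Sentence preverbalExample() {
        return new Sentence(
            Arrays.asList(t("El", "el", "DET"), t("gobierno", "gobierno", "NOUN_COMMON"),
                t("estableci\u00f3", "establecer", "VERB"), t("ayer", "ayer", "ADV"), t("normas", "norma", "NOUN_COMMON")),
            Arrays.asList(d(2, 1, "SUBJ"), d(1, 0, "SPEC"), d(2, 3, "MOD"), d(2, 4, "DO")));
    }

    // "Ayer estableció el gobierno normas": subject after the verb.
    static Sentence postverbalExample() {
        return new Sentence(
            Arrays.asList(t("Ayer", "ayer", "ADV"), t("estableci\u00f3", "establecer", "VERB"),
                t("el", "el", "DET"), t("gobierno", "gobierno", "NOUN_COMMON"), t("normas", "norma", "NOUN_COMMON")),
            Arrays.asList(d(1, 3, "SUBJ"), d(3, 2, "SPEC"), d(1, 0, "MOD"), d(1, 4, "DO")));
    }

    // "El gobierno estableció normas": no modifier, so it should not match.
    static Sentence noModifierExample() {
        return new Sentence(
            Arrays.asList(t("El", "el", "DET"), t("gobierno", "gobierno", "NOUN_COMMON"),
                t("estableci\u00f3", "establecer", "VERB"), t("normas", "norma", "NOUN_COMMON")),
            Arrays.asList(d(2, 1, "SUBJ"), d(1, 0, "SPEC"), d(2, 3, "DO")));
    }

    public static void main(String[] args) {
        System.out.println(matches(preverbalExample(), "establecer", "NOUN_COMMON"));  // true
        System.out.println(matches(postverbalExample(), "establecer", "NOUN_COMMON")); // true
        System.out.println(matches(noModifierExample(), "establecer", "NOUN_COMMON")); // false
    }
}
```

Both word orders match because the pattern only looks at the edges leaving the verb node, never at token positions.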

For more information about the tool please take a look at the documentation available in the TreeBank browser website.

If you would also like to use DEX to power your research do not hesitate to contact us and request a free research license to join the research program!


—-

* This article appeared first on DZone

Posted in DEX, Research, Use Case

2nd GraphLab Workshop

On July 1st we will attend and participate in the 2nd workshop organized by GraphLab. The aim of the workshop is to bring together people from academia and scientists from industry who have a special focus on large scale machine learning on sparse graphs.

The preliminary agenda can be checked here: http://graphlab.org/graphlab-workshop-2013/preliminary-agenda/

Join us there!


Posted in DEX, Events, News

Research with DEX: Detecting social capitalists on Twitter using similarity measures

Today we would like to share a very interesting article published by Nicolas Dugué and Anthony Perez from the University of Orleans about the detection of social capitalists on Twitter.

Social capitalists are those users that try to gain visibility by following users regardless of their content. Social capitalists are not healthy for social networks as they help spammers to gain visibility and may mislead influence detection.

In the article, they show that social capitalists can be detected using similarity measures, and that there is no need to analyze the users’ tweets: the graph topology is enough.

Another aim of the research was to focus on efficient, high-level techniques to store and handle very large graphs. After unsuccessfully evaluating SQL and NoSQL technologies, such as Cassandra, they moved to graph databases, which are better suited to quickly answering questions such as retrieving the neighborhood of a node, which is essential in the computation of their algorithms. They chose DEX because, in their own words, it “meets their requirements of efficiency and storage of large graphs” and “appeared as viable for several reasons: high-performance and graph oriented, a high-level API, and well-documented”. Their research uses the Twitter graph, a spam graph and a list of 100,000 potential social capitalists: “using DEX we were able to store a graph containing about 15M vertices and 1B arcs”.

Some of the techniques used by social capitalists are “follow me and I follow you” or “I follow you, follow me”, so that most of the users they follow should follow them back (overlap). Spammers, on the other hand, wish to accumulate followers and then spread spam links. A previous paper on link farming on Twitter focused on spammers, by Ghosh et al., introduced social capitalists as the users who most often respond to requests from spammers. Nicolas Dugué and Anthony Perez contrast those previous results with their own, obtained using the proposed new and faster detection techniques, and obtain an even bigger list of social capitalists.

To learn more about the detection of social capitalists, we strongly encourage you to read their article here: http://link.springer.com/chapter/10.1007/978-3-642-36844-8_1
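As an illustration of the kind of topology-only similarity measure mentioned above, the following sketch computes an overlap between the set of accounts a user follows and the set of accounts following that user. It uses the common overlap coefficient, |A ∩ B| / min(|A|, |B|), as an assumption; the exact measures and thresholds used in the article may differ, and the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class OverlapSketch {
    // Overlap coefficient between a user's followers and followees:
    // the share of the smaller set that also appears in the larger one.
    static double overlapIndex(Set<Long> followers, Set<Long> followees) {
        if (followers.isEmpty() || followees.isEmpty()) {
            return 0.0;
        }
        Set<Long> smaller = followers.size() <= followees.size() ? followers : followees;
        Set<Long> larger = smaller == followers ? followees : followers;
        int common = 0;
        for (Long id : smaller) {
            if (larger.contains(id)) {
                common++;
            }
        }
        return (double) common / smaller.size();
    }

    public static void main(String[] args) {
        Set<Long> followers = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        Set<Long> followees = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L, 6L));
        // 3 accounts in common, smaller set has 4 members.
        System.out.println(overlapIndex(followers, followees)); // prints 0.75
    }
}
```

A value close to 1 means that almost everyone in the smaller set is also in the other one, which is the pattern that “follow me and I follow you” behaviour tends to produce. Only neighbourhood lookups are needed, which is why fast neighbourhood retrieval matters for this kind of analysis.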

If you would also like to use DEX to power your research do not hesitate to contact us and request a free research license to join the research program!

Posted in DEX, Research, Use Case

Research with DEX: Detection of threats of insiders

What?

We would like to share an interesting article from RMIT University, in collaboration with CA Labs from CA Technologies, about the detection of insider threats. The DEX graph database (now known as Sparksee) is used as the management system to power their analysis.

Insiders are people who work, or have previously worked, in a company and intentionally misuse their access to compromise some of the information available. A popular example is Wikileaks, which shows why the threat of insiders should be a concern for any company. Nowadays, with the outsourcing that comes with “cloud computing”, detecting insider attacks is more important than ever.

With this issue in mind, the researchers at RMIT and CA Labs propose an analysis to detect deviations of users from normal behavior while accessing the systems, using the DEX graph database to benefit from its capability to store the huge volumes of data to be analyzed.

How?

From 3 years of logs (2008 to 2011) extracted from the SVN access of a certain CA program, they obtained 700M lines of access logs and 282 unique users. To deal with such huge numbers they chose the DEX graph database management system, which allowed them to store the following databases:

  • Log database, with 700M nodes and 3500M edges, a really huge database with a total size of 305GB
  • Command database, storing the commands executed by the users accessing the SVN. This is a smaller database of 6GB total size

DEX graph databases were used in the cluster analysis to detect communities based on the accessed resources, projects and daily access patterns. They discovered that a deviation from the daily pattern can be an alert of a possible insider threat.
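The article does not spell out here how a deviation from the daily pattern is measured, so the following is only a toy sketch of the general idea, not the researchers' method (which relies on cluster analysis). It assumes each user has a baseline of access counts per hour of the day, and compares the shape of one day's accesses with that baseline using the total variation distance; the class name, method names and any threshold are hypothetical.

```java
class DailyPatternSketch {
    // Turns raw per-hour access counts into a distribution that sums to 1
    // (or all zeros when there were no accesses at all).
    static double[] normalize(int[] hourlyCounts) {
        long total = 0;
        for (int c : hourlyCounts) {
            total += c;
        }
        double[] distribution = new double[hourlyCounts.length];
        if (total == 0) {
            return distribution;
        }
        for (int i = 0; i < hourlyCounts.length; i++) {
            distribution[i] = hourlyCounts[i] / (double) total;
        }
        return distribution;
    }

    // Total variation distance between the two hourly distributions:
    // 0 means the same shape, 1 means the active hours do not overlap.
    // Both arrays are expected to have the same length (24 hours here).
    static double deviation(int[] baseline, int[] day) {
        double[] p = normalize(baseline);
        double[] q = normalize(day);
        double sum = 0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(p[i] - q[i]);
        }
        return sum / 2;
    }

    static boolean isSuspicious(int[] baseline, int[] day, double threshold) {
        return deviation(baseline, day) > threshold;
    }

    public static void main(String[] args) {
        int[] baseline = new int[24];
        int[] usualDay = new int[24];
        int[] oddDay = new int[24];
        for (int hour = 9; hour <= 17; hour++) {
            baseline[hour] = 10; // historical accesses during office hours
            usualDay[hour] = 1;  // fewer accesses, same hours
        }
        oddDay[2] = 5;           // accesses in the middle of the night
        oddDay[3] = 5;
        System.out.println(isSuspicious(baseline, usualDay, 0.5)); // false
        System.out.println(isSuspicious(baseline, oddDay, 0.5));   // true
    }
}
```

Normalizing the counts first means that a quiet day with the usual working hours is not flagged, while a day whose activity moves to unusual hours is.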

For more details about the analysis, conclusions and future work we recommend reading the complete article here.

Our congratulations to the researchers at the RMIT University & CA Labs for such an interesting investigation towards building more secure systems for companies.

If you are also interested in using DEX for your research, do not hesitate to join the research program!

—–
CA Labs was established in 2005 to strengthen relationships between research communities and CA Technologies. CA Labs works closely with universities, professional associations and government organizations on various projects that relate to our company’s products, technologies and methodologies. The results of these projects vary from research publications, to best practices, to new directions for products.

Posted in DEX