Sparksee, the high performance graph database

Graph technologies are used wherever large amounts of highly linked data must be analyzed at high performance. In many of those cases the processed data is a key input for specific, value-adding business queries.

Typical questions include: Which communities in a social network may be interested in my service? Is there a restaurant I would like near my current location? What is the shortest way to get there? Who are the most influential people in a scientific topic? These queries come from very different environments, such as social networks, fleet routing through map analysis and bibliographic networks.

The datasets emerging from those environments can be represented as graphs, and graph management solutions are well suited to them. In particular, Sparksee provides the best compromise between high performance, capacity and small footprint.

Sparksee’s high performance graph management engine is designed to have a small software footprint and high data compression, answering queries over billions of objects in under a second on off-the-shelf computing devices. Sparksee is powered by patented, research-based technology that makes intensive use of bitmaps, which allows graph analytics to be solved with simple logic operations and high data locality.
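To give a feel for why bitmaps help, here is a minimal sketch in Python. It is not Sparksee's actual data structure or API: the helper names and the toy graph are assumptions made for illustration only. Each node's neighbours are stored as a bitmap (a Python integer used as a bit set), so a question like "which neighbours do two nodes share?" becomes a single AND operation.

```python
# Hypothetical sketch: adjacency stored as bitmaps (Python ints used as bit sets).
# This does NOT reproduce Sparksee's internal structures or its API.

def to_bitmap(node_ids):
    """Set bit i for every node id i in the collection."""
    bitmap = 0
    for node_id in node_ids:
        bitmap |= 1 << node_id
    return bitmap


def from_bitmap(bitmap):
    """Return the sorted list of node ids whose bit is set."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


# Toy graph (made up): node id -> bitmap of its neighbours.
neighbours = {
    0: to_bitmap([1, 2, 3]),
    1: to_bitmap([0, 2, 4]),
}

# Neighbours shared by nodes 0 and 1: one bitwise AND.
common = from_bitmap(neighbours[0] & neighbours[1])

# Nodes adjacent to node 0 or node 1: one bitwise OR.
either = from_bitmap(neighbours[0] | neighbours[1])
```

In this toy setting, set intersections and unions over whole neighbourhoods cost one logic operation each, which is the general idea behind using bitmaps for graph queries.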

The Sparksee 5 high performance graph database is available for Windows, Linux, MacOS, iOS and Android, with native APIs in Java, .NET (C#), C++ and Python.

Do you want to try out Sparksee for your specific use case? Do you think that other database management solutions are not providing enough performance or capacity for your data? Contact us and we’ll help you make the most of your data.

Posted in Sparksee

Genezik is able to trace paths in music with DEX

Researchers at EPFL’s Signal Processing Laboratory have developed Genezik, software for exploring and rediscovering your own music library, which will be presented at the next Montreux Jazz Festival.

Unlike other approaches, which recommend music based on tags or user reviews, this software analyzes the music signal itself to find recommendations that may not be as obvious. For instance, although Led Zeppelin is sometimes classified as hard rock, some of its songs are closer to Bob Marley; Genezik is able to identify these relationships.

It also includes a social component: different users’ graphs are connected to each other, so that the network can be used to recommend songs discovered by others.

Genezik successfully uses DEX as its graph database management system. Stay tuned for more news of this exciting new software by EPFL (École polytechnique fédérale de Lausanne)!

More information is available at: http://actu.epfl.ch/news/epfl-software-is-able-to-trace-paths-amidst-a-musi/

LTS2 is a team of researchers working at the Department of Electrical Engineering of the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. Their research focuses on modern challenges in data processing.

Posted in DEX, News, Use Case

Research with DEX: TreeBank browser, the syntactical structures specialized search engine

We are glad to share a new DEX-powered application for searching syntactic structures: the TreeBank browser, implemented by the Institut Universitari de Lingüística Aplicada (IULA) at the Universitat Pompeu Fabra (Barcelona).

The TreeBank browser is a tool for linguists that contains a Spanish treebank with more than 42,000 syntactically annotated sentences. Dependency grammar is the formalism used to represent the syntactic information. This formalism allows a sentence to be seen as a graph, so all the syntactic information in the corpus is represented as a DEX directed graph in which the nodes are the words of the corpus and the edges are the dependencies. A dependency is the annotation between two related words; for instance, the fragment “Sr. Salvatori vendió” would be represented by the following subgraph:

[Figure: Treebank browser view of the subgraph for “Sr. Salvatori vendió”]

“Sr. Salvatori” and “vendió” are two nodes in a sentence of the corpus, and the relationship between them is a dependency called “SUBJ” (subject). It means that in the given sentence the main verb is “vendió” (sold) and “Sr. Salvatori” is the subject who sold something.

All the sentences in the corpus have been semi-automatically analysed using a grammar with a predefined set of dependency relations such as subject, direct object, specifier, modifier and punctuation. Consider a more complete example, the sentence “Además, la memoria es la base para el aprendizaje”. The Treebank browser analyzes and displays it as follows:
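The fragment described above can be sketched as a tiny labelled directed graph in Python. This is an illustration only, not the DEX API: the `DependencyGraph` class and its methods are hypothetical, and the choice to point the edge from the head (“vendió”) to the dependent (“Sr. Salvatori”) is an assumption about the edge direction.

```python
from collections import namedtuple

# One dependency: head word --label--> dependent word.
# The head-to-dependent direction is an assumption made for this sketch.
Edge = namedtuple("Edge", ["head", "label", "dependent"])


class DependencyGraph:
    """Hypothetical in-memory stand-in for a directed, labelled graph."""

    def __init__(self):
        self.nodes = set()
        self.edges = []

    def add_dependency(self, head, label, dependent):
        """Add both words as nodes and connect them with a labelled edge."""
        self.nodes.update([head, dependent])
        self.edges.append(Edge(head, label, dependent))

    def dependents(self, head, label):
        """Return the words attached to `head` through a `label` dependency."""
        return [e.dependent for e in self.edges
                if e.head == head and e.label == label]


graph = DependencyGraph()
graph.add_dependency("vendió", "SUBJ", "Sr. Salvatori")

# Who is the subject of "vendió"?
subjects = graph.dependents("vendió", "SUBJ")
```

A real graph database would store words and dependencies as typed nodes and edges with attributes; the sketch only shows the shape of the data.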

[Figure: Treebank browser analysis of “Además, la memoria es la base para el aprendizaje”]

The TreeBank browser allows searching for sentences in the corpus that satisfy user-defined patterns. Such patterns take into account both dependencies and word information; the latter may include any combination of part of speech, word form and lemma. As an example, we may query for all the sentences in the corpus whose main verb is “establecer”, with a modifier and with a common noun as its subject.

Thanks to the graph representation, there are no restrictions on the position of the elements in a search. In the previous query, therefore, a match is found regardless of the relative position of each queried item in the sentence (e.g. subjects or modifiers in preverbal or postverbal position). The user may download each result in a standard tabulated form or as a graph, exported to the standard GraphML format, which makes the information more attractive and readable.

Treebanks are a resource for developing many useful tools in Natural Language Processing, such as training parsers and taggers, machine translation and speech recognition, among many others.
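The position-independent search described above can be sketched in Python as follows. Everything here is an assumption made for illustration: the three toy sentences are made up (they are not taken from the IULA corpus), the labels `SUBJ`, `MOD` and `SPEC` and the part-of-speech tags are placeholders, and the `matches` function is a hypothetical stand-in for the browser's query engine.

```python
# Toy, made-up sentences. Each edge is (head index, dependency label,
# dependent index); labels and POS tags are illustrative placeholders.
SENTENCES = [
    {   # subject before the verb
        "tokens": [
            {"form": "El", "lemma": "el", "pos": "DET"},
            {"form": "gobierno", "lemma": "gobierno", "pos": "NOUN_COMMON"},
            {"form": "estableció", "lemma": "establecer", "pos": "VERB"},
            {"form": "ayer", "lemma": "ayer", "pos": "ADV"},
        ],
        "root": 2,
        "edges": [(1, "SPEC", 0), (2, "SUBJ", 1), (2, "MOD", 3)],
    },
    {   # subject after the verb
        "tokens": [
            {"form": "Ayer", "lemma": "ayer", "pos": "ADV"},
            {"form": "estableció", "lemma": "establecer", "pos": "VERB"},
            {"form": "el", "lemma": "el", "pos": "DET"},
            {"form": "gobierno", "lemma": "gobierno", "pos": "NOUN_COMMON"},
        ],
        "root": 1,
        "edges": [(1, "MOD", 0), (1, "SUBJ", 3), (3, "SPEC", 2)],
    },
    {   # proper-noun subject: should not match
        "tokens": [
            {"form": "Salvatori", "lemma": "Salvatori", "pos": "NOUN_PROPER"},
            {"form": "estableció", "lemma": "establecer", "pos": "VERB"},
            {"form": "ayer", "lemma": "ayer", "pos": "ADV"},
        ],
        "root": 1,
        "edges": [(1, "SUBJ", 0), (1, "MOD", 2)],
    },
]


def matches(sentence, verb_lemma="establecer"):
    """True if the main verb has the given lemma, a modifier,
    and a common noun as subject, wherever those words appear."""
    tokens = sentence["tokens"]
    root = sentence["root"]
    if tokens[root]["lemma"] != verb_lemma:
        return False
    # Only the edges leaving the main verb matter; word order is never checked.
    root_edges = [(label, dep) for head, label, dep in sentence["edges"]
                  if head == root]
    has_modifier = any(label == "MOD" for label, _ in root_edges)
    has_common_noun_subject = any(
        label == "SUBJ" and tokens[dep]["pos"] == "NOUN_COMMON"
        for label, dep in root_edges
    )
    return has_modifier and has_common_noun_subject


matching = [index for index, sentence in enumerate(SENTENCES)
            if matches(sentence)]
```

Because the pattern is checked against edges rather than word positions, the first two sentences match regardless of whether the subject comes before or after the verb, while the third is rejected because its subject is a proper noun.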

For more information about the tool please take a look at the documentation available in the TreeBank browser website.

If you would also like to use DEX to power your research do not hesitate to contact us and request a free research license to join the research program!


—-

* This article appeared first on DZone

Posted in DEX, Research, Use Case

2nd GraphLab Workshop

On July 1st we will attend and participate in the 2nd workshop organized by GraphLab. The aim of the workshop is to bring together people from academia and scientists from industry who focus on large scale machine learning on sparse graphs.

The preliminary agenda can be checked here: http://graphlab.org/graphlab-workshop-2013/preliminary-agenda/

Join us there!


Posted in DEX, Events, News

Research with DEX: Detecting social capitalists on Twitter using similarity measures

Today we would like to share a very interesting article by Nicolas Dugué and Anthony Perez, from the University of Orleans, about the detection of social capitalists on Twitter.

Social capitalists are those users that try to gain visibility by following users regardless of their content. Social capitalists are not healthy for social networks as they help spammers to gain visibility and may mislead influence detection.

In the article, they show that social capitalists can be detected using similarity measures, and that there is no need to analyze users’ tweets, only the graph topology.

Another aim of the research was to focus on efficient, high-level techniques to store and handle very large graphs. After unsuccessfully evaluating SQL and NoSQL technologies such as Cassandra, they moved to graph databases, which are better suited to quickly answering questions like retrieving the neighborhood of a node, an operation that is essential to their algorithms. They chose DEX because, in their own words, it “meets their requirements of efficiency and storage of large graphs” and “appeared as viable for several reasons: high-performance and graph oriented, a high-level API, and well-documented”. Their research uses the Twitter graph, a spam graph and a list of 100,000 potential social capitalists: “using DEX we were able to store a graph containing about 15M vertices and 1B arcs”.

Some of the techniques used by social capitalists are “follow me and I follow you” or “I follow you, follow me”, so that most of the users they follow should follow them back (overlap). Spammers, on the other hand, wish to accumulate followers and then spread spam links. A previous paper by Ghosh et al. on link farming on Twitter, focused on spammers, introduced social capitalists as the users who most often respond to requests by spammers. Nicolas Dugué and Anthony Perez contrast those earlier results with their own, obtained using the proposed new, faster detection techniques, and obtain an even bigger list of social capitalists.

To learn more about the detection of social capitalists, we strongly encourage you to read their article here: http://link.springer.com/chapter/10.1007/978-3-642-36844-8_1
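The overlap between the users someone follows and the users who follow them back can be measured with a set-based similarity. The Python sketch below uses the overlap index, |A ∩ B| / min(|A|, |B|), as one common choice; whether it matches the exact measures in the article is an assumption, and the user names and the 0.7 threshold are made up for illustration.

```python
def overlap_index(followers, followees):
    """Overlap index between two sets: |A & B| / min(|A|, |B|).

    Returns 0.0 when either set is empty, to avoid dividing by zero.
    """
    if not followers or not followees:
        return 0.0
    shared = len(followers & followees)
    return shared / min(len(followers), len(followees))


# Made-up accounts: only graph topology is used, never tweet content.
suspect_followers = {"a", "b", "c", "d"}
suspect_followees = {"b", "c", "d", "e"}   # most followees follow back

regular_followers = {"a", "b", "c", "d"}
regular_followees = {"x", "y", "z", "a"}   # few followees follow back

suspect_score = overlap_index(suspect_followers, suspect_followees)
regular_score = overlap_index(regular_followers, regular_followees)

# Hypothetical threshold, chosen only for this example.
THRESHOLD = 0.7
is_suspect = suspect_score >= THRESHOLD
```

Computing this for a user only needs the two neighbourhoods of a node (incoming and outgoing arcs), which is why fast neighbourhood retrieval matters for this kind of analysis.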

If you would also like to use DEX to power your research do not hesitate to contact us and request a free research license to join the research program!

Posted in DEX, Research, Use Case

Research with DEX: Detection of threats of insiders

What?

We would like to share an interesting article from RMIT University, in collaboration with CA Labs at CA Technologies, about detecting insider threats. The DEX graph database (now known as Sparksee) is the management system that powers their analysis.

Insiders are people who work, or have previously worked, in a company and intentionally misuse their access to compromise information. Wikileaks is a popular example of why insider threats should be a concern for any company. Nowadays, with the outsourcing that comes with “cloud computing”, detecting insider attacks is more important than ever.

With this issue in mind, the researchers at RMIT and CA Labs propose an analysis that detects deviations from users’ normal behavior while accessing systems, using the DEX graph database to benefit from its capacity to store the huge volumes of data to be analyzed.

How?

From 3 years of logs (2008 to 2011) extracted from the SVN access of a certain CA program, they obtained 700M lines of access logs and 282 unique users. To deal with such huge numbers they chose the DEX graph database management system, which allowed them to store the following databases:

  • Log database, with 700M nodes and 3500M edges, a really huge database with a total size of 305GB
  • Command database, storing the commands executed by the users accessing the SVN. This is a smaller database of 6GB total size

DEX graph databases were used in the cluster analysis to detect communities based on the accessed resources, projects and daily access patterns. They discovered that a deviation from the daily pattern can signal a possible insider threat.
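As a rough illustration of what a “deviation from the daily pattern” could look like in code, the Python sketch below compares one day's access counts against a user's usual profile. It is not the method used in the article: the four time slots, the distance measure (total variation distance), the sample counts and the 0.5 threshold are all assumptions made for this example.

```python
def normalise(counts):
    """Turn raw access counts per time slot into proportions."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [count / total for count in counts]


def deviation(baseline_counts, day_counts):
    """Total variation distance between two access profiles.

    0.0 means the day has the same shape as the baseline;
    1.0 means the accesses fall in completely different time slots.
    """
    baseline = normalise(baseline_counts)
    day = normalise(day_counts)
    return 0.5 * sum(abs(b - d) for b, d in zip(baseline, day))


# Made-up counts per slot: [night, morning, afternoon, evening].
baseline = [0, 60, 40, 0]    # the user's usual working-hours pattern
usual_day = [0, 6, 4, 0]     # same shape as the baseline
odd_day = [9, 1, 0, 0]       # mostly night-time accesses

usual_score = deviation(baseline, usual_day)
odd_score = deviation(baseline, odd_day)

# Hypothetical alert threshold, chosen only for this example.
ALERT_THRESHOLD = 0.5
raise_alert = odd_score > ALERT_THRESHOLD
```

A score like this would only flag a day for review; deciding whether it is a real threat requires the wider context of resources and projects described above.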

For more details about the analysis, conclusions and future work we recommend reading the complete article here.

Our congratulations to the researchers at RMIT University and CA Labs for such an interesting investigation towards building more secure systems for companies.

If you are also interested in using DEX for your research, do not hesitate to join the research program!

—–
CA Labs was established in 2005 to strengthen relationships between research communities and CA Technologies. CA Labs works closely with universities, professional associations and government organizations on various projects that relate to our company’s products, technologies and methodologies. The results of these projects vary from research publications, to best practices, to new directions for products.

Posted in DEX

BMAT uses DEX matching millions of music records

BMAT trusts Sparsity Technologies solutions to improve its matching engine so as to be able to identify, with the highest quality, all the music available worldwide.

BMAT listens to, analyzes and processes more than 2000 radio and television broadcasters from around the world to generate precise music playlist information for every monitored channel. For BMAT, a duplicate-free, high-performance and reliable matching engine able to identify all the music in the world is key to providing a meaningful and trustworthy monitoring service for its clients.

Sparsity Technologies enables this by offering BMAT its Extreme Data solutions. Sparsity’s Graph Databases can manage and analyze billions of data points in real-time, providing an unprecedented level of performance as demonstrated in the best graph benchmarks in the Industry*.

Song titles, artists, collaborations, composers and all the metadata related to the music industry can be conveniently represented as a graph. BMAT uses DEX as the core of its matching engine, where hundreds of millions of entities and relations are stored, indexed, processed and queried efficiently. BMAT adds intrinsic musical characteristics (such as mood, tempo, rhythm and style) to each song to further improve the quality of the identifications.

Sparsity’s solutions are the only ones capable of this level of performance on commodity hardware. Traditional solutions, and even modern Big Data solutions, simply could not cope with the complexity involved in supporting this level of analysis in near real-time, especially considering the combination of music metadata and music characteristics.

Sparsity Technologies is proud to count BMAT as a client of its Big Data solution DEX.

About BMAT

BMAT (Barcelona Music & Audio Technologies) was founded in February 2006 as a spin-off of MTG (Music Technology Group) of UPF (Universitat Pompeu Fabra). MTG is a world-renowned research centre dedicated to developing audio and digital music technologies. From its headquarters in Barcelona, BMAT collaborates with more than 50 clients worldwide; clients include Samsung, Yamaha, Intel, Telefónica, EMI and Endemol. BMAT is also the fastest-growing provider of music monitoring services for radio and television in the world, serving more than 30 performing rights organizations and collecting societies globally.
For more information, please visit www.bmat.com

Posted in DEX, News

SNTalent selects Sparsity Technologies as the engine for its Next Generation recruiting services

SNTalent leverages Sparsity Technologies solutions to scale recruiting to hundreds of millions of CVs and job offers.

SNTalent and Sparsity Technologies have entered into an agreement whereby SNTalent will use Sparsity’s solutions and development platform to provide a world-class recruiting marketplace capable of scaling to tens of millions of CVs and job offers.

SNTalent has always been a pioneer in using technology as a key differentiator in the recruiting space, with solutions capable of searching for candidates among middle management, technical and senior professionals, and matching them with open requisitions. SNTalent now wants to scale massively to tens of millions of CVs on a global basis and has been looking for the ideal Big Data solution to support its core business.

Sparsity Technologies offers Extreme Data solutions. Sparsity’s Graph Databases can manage and analyze billions of data points in real-time, providing an unprecedented level of performance as demonstrated in the best graph benchmarks in the Industry*.

SNTalent has defined a very strong methodology for managing the lifecycle of filling a job. Key to this methodology is the shortlisting process, whereby a list of millions of CVs is dynamically reduced to the best 10 or 20 candidates based on customer-specific criteria, SNTalent’s recruiting know-how and relationship data. Allowing thousands of concurrent shortlistings in near real-time on tens of millions of CVs is what positions SNTalent as the preferred marketplace and platform for recruiting globally.

Sparsity’s solutions are the only ones capable of this level of performance on commodity hardware. Traditional solutions, and even modern Big Data solutions, simply could not cope with the complexity involved in supporting this level of analysis in near real-time, especially considering feedback loops from previous customer engagements and relationships between candidates, sponsors and customers, which are typical requirements in the area of “social analysis”.

Sparsity Technologies is proud to have been selected by SNTalent as the provider of its Big Data solution.

* (Benchmarks are available at http://sparsity-technologies.com/blog/?p=196 and http://sparsity-technologies.com/blog/?p=228)

About SNTalent
SNTalent is a B2B company with headquarters in Spain, Germany and Brazil. We combine search technology in social media with a Marketplace of Recruitment 2.0 certified consultants, experts in attracting and selecting middle management, technical and senior professionals.

Posted in DEX, News

Graph Database Use Case: SNA (Social Network Analysis)

In this second installment of the use case series, we look at one of the most interesting scenarios for graph databases: Social Network Analysis (SNA).

DEX’s high performance on huge volumes of processed data, its flexibility and the nature of the graph make it the perfect solution for Social Network Analysis.

More info? We have created a new section on the DEX site called Scenarios, with a detailed explanation of the fields where graph databases are key and in which we have plenty of experience. We will be adding more, so stay tuned!

Do not forget to check the list of features we believe SNA must cover. We welcome your feedback! Please tell us what you think the requirements and achievements of SNA are, and why graph databases could be a good solution.

If you think SNA is your area, we encourage you to evaluate DEX here, and do not hesitate to contact us at info@sparsity-technologies.com for additional support. Use our knowledge in the SNA field!

Read also the first installment of the use case series: Bibliographic exploration

Posted in DEX, Documentation

Sparsity Technologies’ new headquarters at Parc UPC K2M

Sparsity Technologies announces that its new offices will open this April in the K2M (Knowledge to Market) building.

Parc UPC was conceived with the mission of acting as a socioeconomic driver between UPC, public administration and companies, in order to promote research, innovation and the transfer of technological progress and results.

You can now find us on floor 0 (hall level), office 001a.

APTE also shares the news here: http://www.apte.org/es/noticia-parque529.cfm

Posted in News