Research with DEX: Detecting social capitalists on Twitter using similarity measures

Today we would like to share a very interesting article by Nicolas Dugué and Anthony Perez, from the University of Orleans, about the detection of social capitalists on Twitter.

Social capitalists are users who try to gain visibility by following other users regardless of their content. They are not healthy for social networks, as they help spammers gain visibility and may mislead influence detection.

In the article, the authors show that social capitalists can be detected using similarity measures computed on the graph topology, with no need to analyze the users' tweets.

Another aim of the research was to find efficient, high-level techniques to store and handle very large graphs. After unsuccessfully evaluating SQL and NoSQL technologies such as Cassandra, they moved to graph databases, which are better suited to quickly answering questions such as retrieving the neighborhood of a node, an operation that is essential to their algorithms. They chose DEX because, in their own words, it “meets their requirements of efficiency and storage of large graphs” and “appeared as viable for several reasons: high-performance and graph oriented, a high-level API, and well-documented”. Their research uses the Twitter graph, a spam graph and a list of 100,000 potential social capitalists: “using DEX we were able to store a graph containing about 15M vertices and 1B arcs”.

Some of the techniques used by social capitalists are “follow me and I follow you” or “I follow you, follow me”, so most of the users they follow should follow them back (overlap). Spammers, on the other hand, wish to accumulate followers and then spread spam links. A previous paper on link farming on Twitter by Ghosh et al., focused on spammers, introduced social capitalists as the users who respond most often to requests from spammers. Nicolas Dugué and Anthony Perez contrast those earlier results with their own, obtained with the proposed new, faster detection techniques, and arrive at an even bigger list of social capitalists.
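To make the idea of overlap more concrete, here is a minimal sketch in plain Java. It assumes the overlap index |A ∩ B| / min(|A|, |B|) between a user's followers and followees; whether the article uses exactly this formula is an assumption on our side, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class OverlapIndex {
    /** Overlap index of two sets: |A ∩ B| / min(|A|, |B|); 0 when either set is empty. */
    static double overlap(Set<Long> followers, Set<Long> followees) {
        if (followers.isEmpty() || followees.isEmpty()) {
            return 0.0;
        }
        // Users that both follow this account and are followed by it
        Set<Long> common = new HashSet<>(followers);
        common.retainAll(followees);
        return (double) common.size() / Math.min(followers.size(), followees.size());
    }

    public static void main(String[] args) {
        Set<Long> followers = Set.of(1L, 2L, 3L, 4L);
        Set<Long> followees = Set.of(2L, 3L, 4L, 5L);
        // 3 shared users out of min(4, 4)
        System.out.println(overlap(followers, followees)); // prints 0.75
    }
}
```

A value close to 1 means that almost everyone the user follows also follows back, which is the pattern described above.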

To learn more about the detection of social capitalists, we strongly encourage you to read their article here: http://link.springer.com/chapter/10.1007/978-3-642-36844-8_1

If you would also like to use DEX to power your research, do not hesitate to contact us and request a free research license to join the research program!


Research with DEX: Detection of insider threats

What?

We would like to share an interesting article from RMIT University, in collaboration with CA Labs from CA Technologies, about the detection of insider threats. DEX graph database (now known as Sparksee) is used as the management system to power their analysis.

Insiders are people who work, or have previously worked, in a company and intentionally misuse their access to compromise its information. Wikileaks is a popular example of why insider threats should be a concern for any company. Nowadays, with so much being outsourced to “cloud computing”, detecting insider attacks is more important than ever.

With this issue in mind, the researchers at RMIT and CA Labs propose an analysis that detects when users deviate from their normal behavior while accessing the systems. They use DEX graph database to benefit from its capacity to store the huge volumes of data to be analyzed.

How?

From 3 years of logs (2008 to 2011) extracted from the SVN access of a certain CA program, they obtained 700M lines of access logs and 282 unique users. To deal with such huge numbers they chose the DEX graph database management system, which allowed them to store the following databases:

  • Log database, with 700M nodes and 3500M edges, for a total size of 305GB
  • Command database, storing the commands executed by the users accessing the SVN, a smaller database with a total size of 6GB

DEX graph databases were used in the cluster analysis to detect communities based on the accessed resources, the projects and the daily access patterns. The researchers discovered that a deviation from the daily pattern can be a warning sign of a possible insider threat.
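As a rough illustration of what a deviation from a daily pattern can look like in code, the sketch below flags a day whose access count lies far from a user's baseline. This is not the researchers' method: the z-score rule, the threshold and all names are assumptions made only for the example.

```java
class DailyPatternDeviation {
    /**
     * Returns true when today's access count lies more than `threshold`
     * standard deviations away from the mean of the baseline days.
     */
    static boolean deviates(int[] baselineDailyCounts, int todayCount, double threshold) {
        if (baselineDailyCounts.length == 0) {
            throw new IllegalArgumentException("baseline must not be empty");
        }
        double mean = 0.0;
        for (int count : baselineDailyCounts) {
            mean += count;
        }
        mean /= baselineDailyCounts.length;

        double variance = 0.0;
        for (int count : baselineDailyCounts) {
            variance += (count - mean) * (count - mean);
        }
        variance /= baselineDailyCounts.length;
        double std = Math.sqrt(variance);

        if (std == 0.0) {
            // A perfectly constant baseline: any different value is a deviation
            return todayCount != mean;
        }
        return Math.abs(todayCount - mean) / std > threshold;
    }

    public static void main(String[] args) {
        int[] baseline = {10, 12, 11, 9, 13}; // hypothetical accesses per day
        System.out.println(deviates(baseline, 40, 3.0)); // prints true
        System.out.println(deviates(baseline, 12, 3.0)); // prints false
    }
}
```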

For more details about the analysis, conclusions and future work we recommend reading the complete article here.

Our congratulations to the researchers at RMIT University and CA Labs for such an interesting investigation towards building more secure systems for companies.

If you are also interested in using DEX for your research, do not hesitate to join the research program!

About CA Labs
CA Labs was established in 2005 to strengthen relationships between research communities and CA Technologies. CA Labs works closely with universities, professional associations and government organizations on various projects that relate to our company’s products, technologies and methodologies. The results of these projects vary from research publications, to best practices, to new directions for products.


BMAT uses DEX to match millions of music records

BMAT trusts Sparsity Technologies solutions to improve its matching engine so as to be able to identify, with the highest quality, all the music available worldwide.

BMAT listens to, analyzes and processes more than 2000 radio and television broadcasters from around the world to generate precise music playlist information for every monitored channel. For BMAT, having a duplicate-free, high-performance and reliable matching engine that is able to identify all the music in the world is key to providing a meaningful and trustworthy monitoring service for its clients.

Sparsity Technologies enables this by offering BMAT its Extreme Data solutions. Sparsity’s Graph Databases can manage and analyze billions of data points in real-time, providing an unprecedented level of performance as demonstrated in the best graph benchmarks in the Industry*.

Song titles, artists, collaborations, composers and all the metadata related to the music industry can be conveniently represented as a graph. BMAT uses DEX as the core of its matching engine, where hundreds of millions of entities and relations are stored, indexed, processed and queried efficiently. BMAT adds intrinsic musical characteristics (such as mood, tempo, rhythm, style, etc.) to each song to further improve the quality of the identifications.

Sparsity’s solutions are the only ones capable of this level of performance on commodity hardware. Traditional solutions, and even modern Big Data solutions, could not cope with the complexity involved in supporting this level of analysis in near real-time, especially considering the combination of music metadata and music characteristics.

Sparsity Technologies is proud to count BMAT as a client of its Big Data solution DEX.

About BMAT

BMAT (Barcelona Music & Audio Technologies) was founded in February 2006 as a spin-off of the MTG (Music Technology Group) of the UPF (Universitat Pompeu Fabra). The MTG is a world-renowned research centre dedicated to developing audio and digital music technologies. From its headquarters in Barcelona, BMAT collaborates with more than 50 clients worldwide, including Samsung, Yamaha, Intel, Telefónica, EMI and Endemol. BMAT is also the fastest-growing provider of music monitoring services for radio and television in the world, serving more than 30 performing rights organizations and collecting societies globally.
For more information, please visit www.bmat.com


SNTalent selects Sparsity Technologies as the engine for its Next Generation recruiting services

SNTalent leverages Sparsity Technologies solutions to scale recruiting to hundreds of millions of CVs and job offers.

SNTalent and Sparsity Technologies have entered into an agreement whereby SNTalent will use Sparsity's solutions and development platform to provide a world-class recruiting marketplace capable of scaling to tens of millions of CVs and job offers.

SNTalent has always been a pioneer in using technology as a key differentiator in the recruiting space, with solutions capable of searching for candidates among middle management, technical and senior professionals, and matching them with open positions. SNTalent now wants to scale massively to tens of millions of CVs on a global basis and has been looking for the ideal Big Data solution to support its core business.

Sparsity Technologies offers Extreme Data solutions. Sparsity’s Graph Databases can manage and analyze billions of data points in real-time, providing an unprecedented level of performance as demonstrated in the best graph benchmarks in the Industry*.

SNTalent has defined a very strong methodology for managing the lifecycle of filling a job. Key to this methodology is the shortlisting process, whereby a list of millions of CVs is dynamically reduced to the best 10 or 20 candidates based on customer-specific criteria, SNTalent’s recruiting know-how, and relationship data. Allowing thousands of concurrent shortlistings in near real-time on tens of millions of CVs is clearly what positions SNTalent as the preferred marketplace and platform for recruiting globally.

Sparsity’s solutions are the only ones capable of this level of performance on commodity hardware. Traditional solutions, and even modern Big Data solutions, could not cope with the complexity involved in supporting this level of analysis in near real-time, especially considering feedback loops from previous customer engagements, and relationships between candidates, sponsors and customers, which are typical requirements in the area of “social analysis”.

Sparsity Technologies is proud to have been selected by SNTalent as the provider of its Big Data solution.

* (Benchmarks are available at http://sparsity-technologies.com/blog/?p=196 and http://sparsity-technologies.com/blog/?p=228)

About SNTalent
SNTalent is a B2B company with headquarters in Spain, Germany and Brazil. We combine search technology in social media with a Marketplace of Recruitment 2.0 certified consultants, experts in attracting and selecting middle management, technical and senior professionals.


Graph Database Use Case: SNA (Social Network Analysis)

In this second release of the use case series, we look at one of the most interesting scenarios for graph databases: Social Network Analysis (SNA).

DEX's high performance on huge volumes of data, its flexibility and its native graph model make it the perfect solution for Social Network Analysis.

More info? We have created a new section on the DEX site called Scenarios that contains a detailed explanation of the fields where Graph Databases are key, and in which we have plenty of experience. We will be adding more, stay tuned!

Do not forget to check the list of features we believe SNA must cover. We welcome your feedback! Please tell us what you think the requirements and achievements of SNA are, and why graph databases could be a good solution.

If you think SNA is your area, we encourage you to evaluate DEX here, and do not hesitate to contact us at info@sparsity-technologies.com for additional support. Use our knowledge in the SNA field!

Read also the first release of the Use Case series: Bibliographic exploration


Sparsity Technologies new headquarters at Parc UPC K2M

Sparsity Technologies announces its new offices opening this April at the K2M (Knowledge to Market) building.

UPC Park was conceived with the mission of fostering a socioeconomic dynamic between UPC, the administration and companies, in order to promote research, innovation and the transfer of technological progress and results.

You can now find us on floor 0 (hall level), office 001a.

APTE also shares the news here: http://www.apte.org/es/noticia-parque529.cfm


How to use DEX algorithm package

The latest version of DEX includes a helpful algorithm package that adds higher-level operations to the API.

Here you can find the list of algorithms, each explained with usage examples for the Java and .NET APIs:

Traversal algorithms
To traverse a graph is to visit the nodes included in the graph. You can choose between the DFS and BFS techniques.

DFS (depth-first search) is a technique where the nodes are visited starting at the root, selecting one of its neighbors and exploring as far as possible along each branch before backtracking.

BFS (breadth-first search) is a technique where the nodes are visited starting at the root, then all of its neighbors, then the neighbors of those neighbors, and so on.

For both techniques you can restrict the visit by a certain type of nodes or only navigating through a certain type of edges.

Java example:

System.out.println("Traversal BFS");
// Create a new BFS traversal from the node "startingNode"
TraversalBFS bfs = new TraversalBFS(sess, startingNode);
// Allow the use of all the node types
bfs.addAllNodeTypes();
// Allow the use of all the edge types but only in outgoing direction
bfs.addAllEdgeTypes(EdgesDirection.Outgoing);
// Limit the depth to 3 hops from the starting node
bfs.setMaximumHops(3);
// Get the nodes
while (bfs.hasNext())
{
    long nodeid = bfs.next();
    int depth = bfs.getCurrentDepth();
    System.out.println("Node "+nodeid+" at depth "+depth+".");
}
// Close the traversal
bfs.close();

TraversalDFS is used in the same way.

.NET example:

System.Console.WriteLine("Traversal BFS");
// Create a new BFS traversal from the node "startingNode"
TraversalBFS bfs = new TraversalBFS(sess, startingNode);
// Allow the use of all the node types
bfs.AddAllNodeTypes();
// Allow the use of all the edge types but only in outgoing direction
bfs.AddAllEdgeTypes(EdgesDirection.Outgoing);
// Limit the depth to 3 hops from the starting node
bfs.SetMaximumHops(3);
// Get the nodes
while (bfs.HasNext())
{
    long nodeid = bfs.Next();
    int depth = bfs.GetCurrentDepth();
    System.Console.WriteLine("Node "+nodeid+" at depth "+depth+".");
}
// Close the traversal
bfs.Close();

TraversalDFS is used in the same way.
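To see how the two visiting orders differ, here is a small self-contained sketch that uses only standard Java collections and no DEX classes. The adjacency map and the class and method names are hypothetical and exist only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TraversalSketch {
    /** Breadth-first order: the root, then its neighbors, then their neighbors, and so on. */
    static List<Long> bfs(Map<Long, List<Long>> adj, long root) {
        List<Long> order = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            long node = queue.poll();
            order.add(node);
            for (long next : adj.getOrDefault(node, List.of())) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return order;
    }

    /** Depth-first order: follow each branch as far as possible before backtracking. */
    static List<Long> dfs(Map<Long, List<Long>> adj, long root) {
        List<Long> order = new ArrayList<>();
        visit(adj, root, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<Long, List<Long>> adj, long node, Set<Long> seen, List<Long> order) {
        if (!seen.add(node)) {
            return; // already visited
        }
        order.add(node);
        for (long next : adj.getOrDefault(node, List.of())) {
            visit(adj, next, seen, order);
        }
    }

    public static void main(String[] args) {
        // Outgoing edges: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
        Map<Long, List<Long>> adj = Map.of(
                1L, List.of(2L, 3L),
                2L, List.of(4L),
                3L, List.of(4L),
                4L, List.<Long>of());
        System.out.println(bfs(adj, 1L)); // prints [1, 2, 3, 4]
        System.out.println(dfs(adj, 1L)); // prints [1, 2, 4, 3]
    }
}
```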

Find shortest path algorithms
Find the shortest way to travel from one node to another. The APIs offer two techniques: BFS and Dijkstra. Dijkstra is the one to use if the edges carry weights that matter in the path retrieval, whereas BFS is the one to use otherwise.

Java example:

System.out.println("SinglePairShortestPath BFS");
// Create a new unweighted shortest path from "startingNode" to "endingNode"
SinglePairShortestPathBFS spBFS = new SinglePairShortestPathBFS(sess, startingNode, endingNode);
// Allow the use of all the edge types in Any direction
spBFS.addAllEdgeTypes(EdgesDirection.Any);
// Allow the use of all the node types
spBFS.addAllNodeTypes();
// Calculate the shortest path
spBFS.run();
// Check the path if it exists
if (spBFS.exists())
{
    // Get the total path cost
    System.out.println("A shortest path exists with cost: "+spBFS.getCost()+".");
    // Get the path
    OIDList pathAsNodes = spBFS.getPathAsNodes();
    OIDListIterator pathIt = pathAsNodes.iterator();
    while (pathIt.hasNext())
    {
        long nodeid = pathIt.next();
        System.out.println("Node: "+nodeid);
    }
}
else
{
    System.out.println("No path found");
}
// Close the shortest path
spBFS.close();

The Dijkstra variant is used analogously.

.NET example:

System.Console.WriteLine("SinglePairShortestPath BFS");
// Create a new unweighted shortest path from "startingNode" to "endingNode"
SinglePairShortestPathBFS spBFS = new SinglePairShortestPathBFS(sess, startingNode, endingNode);
// Allow the use of all the edge types in Any direction
spBFS.AddAllEdgeTypes(EdgesDirection.Any);
// Allow the use of all the node types
spBFS.AddAllNodeTypes();
// Calculate the shortest path
spBFS.Run();
// Check the path if it exists
if (spBFS.Exists())
{
    // Get the total path cost
    System.Console.WriteLine("A shortest path exists with cost: "+spBFS.GetCost()+".");
    // Get the path
    OIDList pathAsNodes = spBFS.GetPathAsNodes();
    OIDListIterator pathIt = pathAsNodes.Iterator();
    while (pathIt.HasNext())
    {
        long nodeid = pathIt.Next();
        System.Console.WriteLine("Node: "+nodeid);
    }
}
else
{
    System.Console.WriteLine("No path found");
}
// Close the shortest path
spBFS.Close();

The Dijkstra variant is used analogously.
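To illustrate why weights change the answer, the following sketch implements Dijkstra's algorithm with standard Java collections only. It does not use the DEX API; the weighted adjacency map and every name in it are hypothetical, and edge weights are assumed to be non-negative.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class DijkstraSketch {
    /** Returns the minimum total weight from source to target, or -1 if target is unreachable. */
    static double shortestCost(Map<Long, Map<Long, Double>> adj, long source, long target) {
        Map<Long, Double> dist = new HashMap<>();
        // Queue entries are (node, tentative distance), ordered by distance
        PriorityQueue<Map.Entry<Long, Double>> queue =
                new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        dist.put(source, 0.0);
        queue.add(new SimpleEntry<>(source, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<Long, Double> head = queue.poll();
            long node = head.getKey();
            double d = head.getValue();
            if (d > dist.get(node)) {
                continue; // stale entry: a shorter distance was found later
            }
            if (node == target) {
                return d;
            }
            for (Map.Entry<Long, Double> edge : adj.getOrDefault(node, Map.of()).entrySet()) {
                double candidate = d + edge.getValue();
                if (candidate < dist.getOrDefault(edge.getKey(), Double.POSITIVE_INFINITY)) {
                    dist.put(edge.getKey(), candidate);
                    queue.add(new SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }
        return -1.0;
    }

    public static void main(String[] args) {
        // 1 -> 3 directly costs 5.0, while 1 -> 2 -> 3 costs 1.0 + 1.0
        Map<Long, Map<Long, Double>> adj = Map.of(
                1L, Map.of(2L, 1.0, 3L, 5.0),
                2L, Map.of(3L, 1.0));
        System.out.println(shortestCost(adj, 1L, 3L)); // prints 2.0
    }
}
```

In this graph an unweighted BFS would pick the direct edge from 1 to 3 because it has the fewest hops, while Dijkstra prefers the two-hop route because its total weight is lower.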

Connected components algorithms
Connectivity shows to what degree a group of nodes is connected to each other. With DEX you can find strongly connected components using the Gabow technique or weakly connected components using the DFS technique.

Java example:

System.out.println("Weak Connectivity DFS");
// Create a new WeakConnectivityDFS
WeakConnectivityDFS weakConnDFS = new WeakConnectivityDFS(sess);
// Allow the use of all the edge types
weakConnDFS.addAllEdgeTypes();
// Allow the use of all the node types
weakConnDFS.addAllNodeTypes();
// Don't set a materialized attribute
// Calculate the weakly connected components
weakConnDFS.run();
// Get the connected components
ConnectedComponents weakCC = weakConnDFS.getConnectedComponents();
long numWeakComponents = weakCC.getCount();
System.out.println("Weakly connected components: "+numWeakComponents);
for (long ii = 0; ii < weakCC.getCount(); ii++)
{
    Objects ccNodes = weakCC.getNodes(ii);
    long numNodes = ccNodes.count();
    System.out.println("Connected component "+ii+" has "+numNodes+" nodes.");
    ccNodes.close();
}
// Close the connected components
weakCC.close();
// Close the WeakConnectivityDFS
weakConnDFS.close();

StrongConnectivityGabow is used analogously.

.NET example:

System.Console.WriteLine("Weak Connectivity DFS");
// Create a new WeakConnectivityDFS
WeakConnectivityDFS weakConnDFS = new WeakConnectivityDFS(sess);
// Allow the use of all the edge types
weakConnDFS.AddAllEdgeTypes();
// Allow the use of all the node types
weakConnDFS.AddAllNodeTypes();
// Don't set a materialized attribute
// Calculate the weakly connected components
weakConnDFS.Run();
// Get the connected components
ConnectedComponents weakCC = weakConnDFS.GetConnectedComponents();
long numWeakComponents = weakCC.GetCount();
System.Console.WriteLine("Weakly connected components: "+numWeakComponents);
for (long ii = 0; ii < weakCC.GetCount(); ii++)
{
    Objects ccNodes = weakCC.GetNodes(ii);
    long numNodes = ccNodes.Count();
    System.Console.WriteLine("Connected component "+ii+" has "+numNodes+" nodes.");
    ccNodes.Close();
}
// Close the connected components
weakCC.Close();
// Close the WeakConnectivityDFS
weakConnDFS.Close();

StrongConnectivityGabow is used analogously.
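For readers who want to see the idea behind weak connectivity without the DEX classes, here is a minimal sketch in plain Java. It labels components with a depth-first search that ignores edge direction; the input format and all names are hypothetical, and every edge endpoint is assumed to appear in the node list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WeakComponentsSketch {
    /** Maps every node to a component id (0, 1, 2, ...), ignoring edge direction. */
    static Map<Long, Integer> weakComponents(List<Long> nodes, long[][] edges) {
        // Build an undirected adjacency list: weak connectivity ignores direction
        Map<Long, List<Long>> adj = new HashMap<>();
        for (long node : nodes) {
            adj.put(node, new ArrayList<>());
        }
        for (long[] edge : edges) {
            adj.get(edge[0]).add(edge[1]);
            adj.get(edge[1]).add(edge[0]);
        }

        Map<Long, Integer> component = new HashMap<>();
        int nextId = 0;
        for (long start : nodes) {
            if (component.containsKey(start)) {
                continue; // already labeled by an earlier search
            }
            // Depth-first search with an explicit stack labels one whole component
            Deque<Long> stack = new ArrayDeque<>();
            stack.push(start);
            component.put(start, nextId);
            while (!stack.isEmpty()) {
                long node = stack.pop();
                for (long next : adj.get(node)) {
                    if (!component.containsKey(next)) {
                        component.put(next, nextId);
                        stack.push(next);
                    }
                }
            }
            nextId++;
        }
        return component;
    }

    public static void main(String[] args) {
        List<Long> nodes = List.of(1L, 2L, 3L, 4L, 5L);
        long[][] edges = { {1, 2}, {3, 2}, {4, 5} }; // directed edges: from, to
        // Nodes 1, 2 and 3 share one component id; nodes 4 and 5 share another
        System.out.println(weakComponents(nodes, edges));
    }
}
```

A strongly connected component additionally requires a directed path in both directions between every pair of its nodes, which is why it needs a different algorithm such as Gabow's.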

Finally, do not forget to include the package when using the methods above by adding:

Java:
import com.sparsity.dex.algorithms.*;

.NET:
using com.sparsity.dex.algorithms;


Graph Database Use Case: Bibliographic exploration

Bibliographic exploration is an interesting use case for Graph Databases. It arises from the need to query huge bibliographic resources to obtain relevant information for researchers.

There are many questions that researchers try to ask of bibliographic resources, but the vast amount of heterogeneous information stored in them makes it difficult to obtain good and fast answers.

Articles, their authors and the keywords that best describe those articles are stored in bibliographic resources. This type of information is naturally linked: for instance, authors are linked with other authors by the articles they have written together, and at the same time articles may be connected with the keywords that are most relevant to them.

Graph Databases are a good solution for storing huge amounts of strongly connected information. They store information the same way it is naturally connected; therefore answers can be retrieved directly, without having to join all the data as happens in traditional SQL databases.

The following figure is an example of how a Bibliographic resource may be stored in a Graph Database.


Capture from Bibex online demo

We can see in the illustration that authors are nodes in the graph, and they are connected by their collaboration in papers (edge). With a click on the edge of the graph you can obtain all the articles written together by both authors.


Capture from Bibex online demo

This type of query returns a result within seconds in a Graph Database and could be relevant to new researchers, like PhD students, or to researchers entering a new area, who want to investigate authors, the papers they have written, who they have collaborated with and on which topics.

If you want to play with this type of query in a graph, visit the free Bibex social network demo. Bibex is able to process large quantities of data because it is powered by DEX Graph Database. The online demo stores the information from DBLP, a well-known bibliographic resource, but it could use any other bibliographic source, or even combine several.

Another interesting aspect of storing bibliographic information in Graph Databases is the use of the citation metric. An article or an author can be considered to be of quality depending on both the number and the acknowledgment (quality) of the citations. Again, Graph Databases are the most suitable technology to work with this metric, since computing it only requires retrieving the neighbors of a certain “article” node through edges of type “cite”:

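// Once the database is open and "graph" has been obtained from the session
// (the classes used here are assumed to live in com.sparsity.dex.gdb)

int article = graph.findType("article");
int title = graph.findAttribute(article, "title");
// Look up the article node by its title
Value titleValue = new Value();
titleValue.setString("The World-Wide Web.");
long www = graph.findObject(title, titleValue);
int cite = graph.findType("cite");
// Articles citing "www" are its neighbors through ingoing "cite" edges
Objects citations = graph.neighbors(www, cite, EdgesDirection.Ingoing);
long articleQualityValue = citations.count();
// Release the result set once it is no longer needed
citations.close();

// Close the session and the database here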
Example of code using DEX Graph Database Java API

Using citations we could answer questions like “Who is an authority in a certain topic?” or “Who is the most suitable reviewer for a certain paper?” The possibility to answer those new complex queries is what makes graph databases an excellent use case for bibliographic exploration.
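As a simple illustration of the “authority” question, the sketch below ranks articles by how often they are cited. It uses plain Java collections rather than the DEX API, treats the raw citation count as the only quality signal (a deliberate simplification), and all names and sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CitationRanking {
    /** Counts, for every cited article, how many citation pairs point at it. */
    static Map<String, Integer> citationCounts(List<String[]> citations) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] pair : citations) {
            String cited = pair[1]; // pair = {citing article, cited article}
            counts.merge(cited, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the most cited article, or null when there are no citations. */
    static String mostCited(List<String[]> citations) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : citationCounts(citations).entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String[]> citations = List.of(
                new String[] {"Paper B", "Paper A"},
                new String[] {"Paper C", "Paper A"},
                new String[] {"Paper C", "Paper B"});
        // Paper A is cited twice, Paper B once
        System.out.println(mostCited(citations)); // prints Paper A
    }
}
```

In a graph database the same count comes from the number of ingoing “cite” edges of each article node, so no separate pass over a citation table is needed.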

Let’s conclude with the big pros of using Graph Databases for Bibliographic exploration:

  • Data sources with bibliographic information are huge and strongly connected. Graph Databases can store billions of objects and are specially created to store linked information.
  • Bibliographic exploration is more interesting if it merges as many sources as possible. Graph Databases can store data with heterogeneous schemas, like bibliographic repositories, publishers, patents, or any other source of information.
  • Researchers need answers as quickly as possible, so that they can focus their efforts on their main research topic. Graph Databases can query connected data in a few seconds, even for complex queries.
  • New complex questions can be easily answered thanks to the ease with which graph databases navigate through linked information.

NEW Bibex demo

We would like to announce the release of a complete free demo for the Bibex social network query.

Click here to launch Bibex demo

Bibex resolves multiple queries that help researchers retrieve highly relevant information with very short response times. Bibex uses DEX graph database to resolve questions like “who is the most important authority in a certain subject?” in a few seconds. Moreover, as Bibex results are shown as a network, it is easy to jump between articles, keywords and authors while navigating the answer. Read more information about Bibex here.

Bibex demo is able to resolve the “Social network” query available in Bibex.

You will be able to search for the social network of any author*, retrieve their curriculum, relationships & statistics. In addition you can jump to the publication source of each article with just one click. Check how Bibex resolves this query in our Bibex demo here.

Bibex social network has an intuitive web interface. Once you click on this link, a search box will be shown. There you can search for any author name you can think of*. For instance, in the following example we search for Tim Berners-Lee. The search box has an autocomplete facility to help you find the exact author you are looking for faster.

Once the search is performed, if there’s only one possible answer to your query, the author’s curriculum and social network are shown right away. If there are multiple answers, a list of possible authors appears on the left in alphabetical order, with the first author’s social network already loaded on the right side of the screen.

In that second case you must click on the name you were actually searching for to load the social network of the author.

The author’s curriculum on the left contains a list of all their publications. They can be sorted by date or alphabetically, and you can also search inside the list. Selecting a publication and clicking on the “Go to URL” icon shown in the following picture takes you to the original source of the publication.

The author’s collaborations on the left contain a list of all the authors with whom they have co-written some publication. It is interesting to sort this list by number of collaborations, which reveals the strongest relationships and therefore the authors who may also be of interest to you.

Finally, at the bottom of this left side there’s a statistic of the author’s productivity over time.

Another important part of the results is shown on the right side of the screen. The social network of the author can be navigated, discovering all the relationships.

Double-clicking on another author’s name takes you to that author’s information and social network. In addition, clicking the edge between two authors reveals a list of the publications written collaboratively by both authors.

We hope you enjoy Bibex. Feel free to play with it and experience how smooth and quick the answers are.

*Bibex demo searches in the DBLP bibliographic database. The DBLP version used for Bibex contains 999,053 authors and 2,740,244 articles.


DEX Graph Database version 4.2 goes .NET

Native .NET programming is now possible with the highest performance graph database.

Now you can have all the scalability and performance of DEX graph database with a dedicated Microsoft .NET API in your secure and robust professional MS environment.

The new DEX release comes with a completely renovated Java API and a brand new .NET API for programmers of .NET languages.

DEX does not forget compatibility with applications that use previous versions of its API. That is why we not only provide an easy migration guide but also offer an API for DEX 4.2 that is compatible with previous versions. Nevertheless, we recommend that all DEX programmers start migrating to the new Java API: the process is quite painless and quick, and it will guarantee the continuity of all their applications.
