TinkerPop is a software developer community providing open source software products in the area of graph computing.
Software from this community includes:
Blueprints: Blueprints is a property graph model interface. It provides implementations, test suites, and supporting extensions. Graph databases and frameworks that implement the Blueprints interfaces automatically support Blueprints-enabled applications. Likewise, Blueprints-enabled applications can plug-and-play different Blueprints-enabled graph back-ends.
Pipes: Pipes is a dataflow framework that enables the splitting, merging, filtering, and transformation of data from input to output. Computations are evaluated in a memory-efficient, lazy fashion.
Gremlin: Gremlin is a domain specific language for traversing property graphs. This language has application in the areas of graph query, analysis, and manipulation.
Frames: Frames exposes the elements of a Blueprints graph as Java objects. Instead of writing software in terms of vertices and edges, with Frames, software is written in terms of domain objects and their relationships to each other.
Furnace: Furnace is a property graph algorithms package. It provides implementations for standard graph analysis algorithms that can be applied to property graphs in meaningful ways.
Rexster: Rexster is a multi-faceted graph server that exposes any Blueprints graph through several mechanisms with a general focus on REST.
As this is a software stack with Blueprints at the bottom, any vendor providing an implementation of the Blueprints API automatically enables the rest of the stack for its users. Together with Sparksee, other graph vendors such as Neo4j, OrientDB, InfiniteGraph, Titan, MongoDB or Oracle NoSQL also provide an implementation of Blueprints.
The following sections describe the particularities of using some of the previously described TinkerPop software with Sparksee.
Blueprints is a collection of interfaces, implementations, and test suites for the property graph data model. Blueprints is to graph databases what JDBC is to relational databases: it provides a common set of interfaces that allows developers to plug-and-play their graph database back-end. Moreover, software written on top of Blueprints works over all Blueprints-enabled graph databases.
SparkseeGraph is the Sparksee-based implementation of the Blueprints Graph base interface. Specifically, it implements the following interfaces from Blueprints:
KeyIndexableGraph
TransactionalGraph
MetaGraph<com.sparsity.sparksee.gdb.Graph>
To use Sparksee and Blueprints in a Maven-based application the user only has to add the following dependency to the Maven configuration file (pom.xml
):
<dependency>
<groupId>com.tinkerpop.blueprints</groupId>
<artifactId>blueprints-sparksee-graph</artifactId>
<version>X.X.X (*) </version>
</dependency>
(*) You should check Blueprints' current version. Sparksee's implementation is always kept up to date with the latest release.
When using Sparksee's implementation, some aspects regarding type management, sessions, and collections should be taken into account in order to obtain maximum performance or to use Sparksee-exclusive functionality.
There are some differences between the Blueprints property graph model and Sparksee's graph model.
SparkseeGraph resolves these differences and allows any Blueprints application to work without being aware of this particularity. Moreover, it provides a way to access all Sparksee functionality from Blueprints as well. To work with types, SparkseeGraph has a public ThreadLocal<String> label field which specifies the node type and updates the default behavior of some Blueprints APIs. It also includes a public ThreadLocal<Boolean> typeScope field which specifies whether attributes follow the Blueprints property graph model or are restricted to a specific Vertex/Edge type.
Take into account that the attribute mode is mutually exclusive: for example, you cannot see type-restricted attributes when working with the pure Blueprints property graph model. Thus, a type-restricted attribute and a plain Blueprints attribute can share the same name but contain different values.
Here is an example of the use of the Blueprints implementation for the creation of nodes of the type “people”:
KeyIndexableGraph graph = new SparkseeGraph("blueprints_test.gdb");
Vertex v1 = graph.addVertex(null);
assert v1.getProperty(StringFactory.LABEL).equals(SparkseeGraph.DEFAULT_SPARKSEE_VERTEX_LABEL);
((SparkseeGraph) graph).label.set("people");
Vertex v2 = graph.addVertex(null);
assert v2.getProperty(StringFactory.LABEL).equals("people");
Vertex v3 = graph.addVertex(null);
assert v3.getProperty(StringFactory.LABEL).equals("people");
// v2 and v3 are two new vertices of the 'people' node type
((SparkseeGraph) graph).label.set("thing");
Vertex v4 = graph.addVertex(null);
assert v4.getProperty(StringFactory.LABEL).equals("thing");
// v4 is a new vertex of the 'thing' node type
((SparkseeGraph) graph).label.set("people");
graph.createKeyIndex("name", Vertex.class);
// 'name' is defined for the 'people' node type
((SparkseeGraph) graph).label.set("thing");
graph.createKeyIndex("name", Vertex.class);
// 'name' is defined for the 'thing' node type
v2.setProperty("name", "foo");
v3.setProperty("name", "boo");
// v2 and v3 are 'people' nodes, so 'people/name' is set
v4.setProperty("name", "foo");
// v4 is a 'thing' node, so 'thing/name' is set
((SparkseeGraph) graph).label.set("people");
int i = 0;
for (Vertex v : graph.getVertices("name", "foo")) {
    assert v.equals(v2);
    i++;
}
assert i == 1;
((SparkseeGraph) graph).label.set("thing");
i = 0;
for (Vertex v : graph.getVertices("name", "foo")) {
    assert v.equals(v4);
    i++;
}
assert i == 1;
((SparkseeGraph) graph).label.set("people");
i = 0;
for (Vertex v : graph.getVertices()) {
    assert v.equals(v2) || v.equals(v3);
    i++;
}
assert i == 2;
((SparkseeGraph) graph).label.set(null);
i = 0;
for (Vertex v : graph.getVertices()) {
    assert v.equals(v1) || v.equals(v2) || v.equals(v3) || v.equals(v4);
    i++;
}
assert i == 4;
// Create a type-specific attribute
((SparkseeGraph) graph).typeScope.set(true);
// This creates the attribute 'name' restricted to the type 'thing'.
// It does not overwrite the value 'foo' of the plain Vertex attribute also called 'name'.
v4.setProperty("name", "boo");
// Restore the normal property graph behaviour
((SparkseeGraph) graph).typeScope.set(false);
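The label and typeScope fields are ThreadLocal values, so each thread sets and reads its own copy without affecting other threads. The following self-contained sketch, using only the Java standard library and illustrative names (LabelScopeSketch is not a Sparksee class), shows the per-thread behavior this relies on:

```java
public class LabelScopeSketch {
    // Illustrative stand-in for SparkseeGraph's public ThreadLocal<String> label field:
    // each thread sees and mutates only its own value.
    static final ThreadLocal<String> label = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        label.set("people");                  // this thread works with the 'people' type
        Thread other = new Thread(() -> {
            label.set("thing");               // an independent, per-thread value
            System.out.println("worker sees " + label.get());
        });
        other.start();
        other.join();
        // The worker's set("thing") did not leak into this thread.
        System.out.println("main sees " + label.get());
    }
}
```

This is why two threads can safely operate on different node types at the same time through the same SparkseeGraph instance.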
SparkseeGraph
implements TransactionalGraph
in order to manage sessions efficiently.
Any graph operation executed by a thread occurs in the context of a transaction (created automatically if there is not a transaction in progress), and each transaction manages its own session.
The TransactionalGraph interface enables multiple threads to operate concurrently on Sparksee's implementation, since each thread has its own private transaction and therefore a different Sparksee session.
So, when a transaction begins, it starts a Sparksee session for the calling thread. This Sparksee session is used exclusively by that thread until the transaction stops.
Graph graph = new SparkseeGraph(...);
Vertex v = graph.addVertex(null); // <-- Automatically creates a Sparksee session and starts a transaction
//
// (...) More operations inside the transaction
//
graph.commit(); // <-- Closes Sparksee session and the transaction
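The transaction-per-thread mapping can be pictured with a small standard-library sketch. The names here (SessionPerThreadSketch, session ids) are illustrative, not Sparksee's API; the point is that the first "graph operation" on each thread lazily acquires a session private to that thread:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SessionPerThreadSketch {
    static final AtomicInteger nextId = new AtomicInteger();
    // Hypothetical session id allocated lazily per thread, mirroring how a
    // transaction implicitly opens a private Sparksee session on first use.
    static final ThreadLocal<Integer> session =
        ThreadLocal.withInitial(nextId::incrementAndGet);

    static void doGraphWork(String who) {
        // The first operation on a thread "begins" its transaction/session.
        System.out.println(who + " uses session " + session.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> doGraphWork("thread-a"));
        Thread b = new Thread(() -> doGraphWork("thread-b"));
        a.start(); a.join();
        b.start(); b.join();
        doGraphWork("main"); // main gets its own, distinct session
    }
}
```

Each thread ends up with a different session id, just as each thread's transaction is backed by its own Sparksee session.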
All Sparksee collections are wrapped in the SparkseeIterable
class which implements the CloseableIterable
interface from the Blueprints API.
Since Sparksee is a native C++ implementation, its resources are not managed by the JVM heap. Thus, Sparksee-based Blueprints applications should take into account that, in order to obtain maximum performance, collections must be explicitly closed. If the user does not explicitly close the collections, they will be automatically closed when the transaction is stopped (in fact, the Sparksee session will be closed as well).
This is an example of collections being closed only at the end of a transaction, which incurs a performance penalty:
for (final Vertex vertex : graph.getVertices()) {
    for (final Edge edge : vertex.getEdges(Direction.OUT)) {
        final Vertex vertex2 = edge.getVertex(Direction.IN);
        for (final Edge edge2 : vertex2.getEdges(Direction.OUT)) {
            ...
        }
    }
}
graph.commit();
To avoid this performance degradation, all retrieved collections from methods in the SparkseeGraph
implementation should be closed as shown below:
CloseableIterable<Vertex> vv = (CloseableIterable<Vertex>) graph.getVertices();
for (final Vertex vertex : vv) {
    CloseableIterable<Edge> ee = (CloseableIterable<Edge>) vertex.getEdges(Direction.OUT);
    for (final Edge edge : ee) {
        final Vertex vertex2 = edge.getVertex(Direction.IN);
        CloseableIterable<Edge> ee2 = (CloseableIterable<Edge>) vertex2.getEdges(Direction.OUT);
        for (final Edge edge2 : ee2) {
            ...
        }
        ee2.close();
    }
    ee.close();
}
vv.close();
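The idea behind this wrapper can be sketched with the standard library alone. The following is an illustrative model, not Sparksee's actual SparkseeIterable class: an Iterable that holds a (simulated) native resource and frees it when close() is called, rather than waiting for the transaction to end:

```java
import java.util.Arrays;
import java.util.Iterator;

public class CloseableIterableSketch {
    // Same shape as the Blueprints CloseableIterable interface.
    interface CloseableIterable<T> extends Iterable<T> {
        void close();
    }

    // Wrap a backing collection; close() stands in for releasing native memory.
    static <T> CloseableIterable<T> wrap(Iterable<T> backing) {
        return new CloseableIterable<T>() {
            boolean closed = false;
            public Iterator<T> iterator() { return backing.iterator(); }
            public void close() {
                if (!closed) {
                    closed = true;
                    System.out.println("native collection released");
                }
            }
        };
    }

    public static void main(String[] args) {
        CloseableIterable<String> names = wrap(Arrays.asList("foo", "boo"));
        for (String n : names) System.out.println(n);
        names.close(); // release the resource as soon as iteration is done
    }
}
```

Closing eagerly, as in the second Blueprints example above, keeps native memory usage bounded during deep traversals.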
Gremlin is a graph traversal language which works over those graph databases/frameworks that implement the Blueprints property graph data model. It is a style of graph traversal that can be natively used in various JVM languages.
Installation of Gremlin is very easy: just download a release from the Gremlin GitHub repository, unzip it, and start the Gremlin console from the script in the bin
directory.
Once the console has been started, the user can instantiate a SparkseeGraph
instance to use a Sparksee graph database through the Gremlin DSL like this:
$ wget http://tinkerpop.com/downloads/gremlin/gremlin-groovy-2.2.0.zip
$ unzip gremlin-groovy-2.2.0.zip
$ ./gremlin-groovy-2.2.0/bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----
gremlin> g = new SparkseeGraph("./graph.gdb")
==>sparkseegraph[./graph.gdb]
gremlin> g.loadGraphML('gremlin-groovy-2.2.0/data/graph-example-2.xml')
==>null
gremlin> g.V.count()
==>809
gremlin> g.E.count()
==>8049
gremlin> g.V('name', 'HERE COMES SUNSHINE').map
==>{name=HERE COMES SUNSHINE, song_type=original, performances=65, type=song}
gremlin> g.V('name', 'HERE COMES SUNSHINE').outE('written_by', 'sung_by')
==>e[3079][1053-written_by->1117]
==>e[4103][1053-sung_by->1032]
gremlin> g.V('name', 'HERE COMES SUNSHINE').outE('written_by', 'sung_by').inV.each{println it.map()}
[name:Hunter, type:artist]
[name:Garcia, type:artist]
gremlin> g.shutdown()
==>null
gremlin> quit
An alternative is to clone the source project from the original repository.
Rexster is a multi-faceted graph server that exposes any Blueprints graph through several mechanisms, with a general focus on REST. This HTTP web service provides standard low-level GET, POST, PUT, and DELETE methods; a flexible extensions model which allows plug-in-like development for external services (such as ad hoc graph queries through Gremlin); server-side “stored procedures” written in Gremlin; and a browser-based interface called The Dog House. The Rexster Console makes it possible to perform remote script evaluation against configured graphs inside a Rexster Server.
Configuration is done in an XML file on the server side. Specifically for Sparksee, it may be important to specify the location of the Sparksee configuration file (the sparksee.cfg file, here shown as sparksee.properties) to set configuration parameters such as the license settings. Use the <config-file>
property as shown in the example below:
<graph>
    <graph-name>sparkseesample</graph-name>
    <graph-type>sparkseegraph</graph-type>
    <graph-location>/tmp/mygraph.gdb</graph-location>
    <properties>
        <config-file>sparksee-config/sparksee.properties</config-file>
    </properties>
    <extensions>...</extensions>
</graph>
Pacer is a JRuby library that enables expressive graph traversals. It currently supports two major graph databases, Sparksee and Neo4j, through the TinkerPop graph stack. There is also a convenient in-memory graph called TinkerGraph, which is part of Blueprints.
Pacer allows the user to create, modify and traverse graphs using very fast and memory-efficient stream processing.
The following example shows how to install Pacer with Sparksee using Homebrew on a Mac OS X platform. Users on other platforms should adapt the example accordingly. Also note that Pacer requires JRuby v1.7.0 or later to be installed.
$ brew update
...
$ brew install jruby
==> Downloading http://jruby.org.s3.amazonaws.com/downloads/1.7.1/jruby-bin-1.7.1.tar.gz
######################################################################## 100.0%
/usr/local/Cellar/jruby/1.7.1: 1619 files, 29M, built in 16 seconds
$ jruby -v
jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-10M3909 [darwin-x86_64]
$ jruby -S gem install pacer-sparksee
Fetching: pacer-1.1.1-java.gem (100%)
Fetching: pacer-sparksee-2.0.0-java.gem (100%)
Successfully installed pacer-1.1.1-java
Successfully installed pacer-sparksee-2.0.0-java
2 gems installed
$ jruby -S gem list --local
*** LOCAL GEMS ***
pacer (1.1.1 java)
pacer-sparksee (2.0.0 java)
rake (0.9.2.2)
Once it is installed, we can work directly with the JRuby interpreter as follows:
$ jirb
irb(main):001:0> require 'pacer-sparksee'
=> true
irb(main):002:0> sparksee = Pacer.sparksee '/tmp/sparksee_demo'
=> #<PacerGraph sparkseegraph[/tmp/sparksee_demo]
irb(main):003:0> pangloss = sparksee.create_vertex :name => 'pangloss', :type => 'user'
=> #<V[1024]>
irb(main):004:0> okram = sparksee.create_vertex :name => 'okram', :type => 'user'
=> #<V[1025]>
irb(main):005:0> group = sparksee.create_vertex :name => 'Tinkerpop', :type => 'group'
=> #<V[1026]>
irb(main):006:0> sparksee.v
#<V[1024]> #<V[1025]> #<V[1026]>
Total: 3
=> #<GraphV>
irb(main):007:0> sparksee.v.properties
{"name"=>"pangloss", "type"=>"user"} {"name"=>"okram", "type"=>"user"} {"name"=>"Tinkerpop", "type"=>"group"}
Total: 3
=> #<GraphV -> Obj-Map>
irb(main):008:0> sparksee.create_edge nil, okram, pangloss, :inspired
=> #<E[2048]:1025-inspired-1024>
irb(main):009:0> sparksee.e
#<E[2048]:1025-inspired-1024>
Total: 1
=> #<GraphE>
irb(main):010:0> group.add_edges_to :member, sparksee.v(:type => 'user')
#<E[3072]:1026-member-1024> #<E[3073]:1026-member-1025>
Total: 2
=> #<Obj 2 ids -> lookup -> is_not(nil)>
irb(main):011:0> sparksee.e
#<E[2048]:1025-inspired-1024> #<E[3072]:1026-member-1024> #<E[3073]:1026-member-1025>
Total: 3
=> #<GraphE>
irb(main):012:0> quit
Check the documentation for a detailed explanation of the use of Pacer.