TinkerPop is a software developer community providing open source software products in the area of graph computing.
Software from this community includes:
Blueprints: Blueprints is a property graph model interface. It provides implementations, test suites, and supporting extensions. Graph databases and frameworks that implement the Blueprints interfaces automatically support Blueprints-enabled applications. Likewise, Blueprints-enabled applications can plug-and-play different Blueprints-enabled graph back-ends.
Pipes: Pipes is a dataflow framework that enables the splitting, merging, filtering, and transformation of data from input to output. Computations are evaluated in a memory-efficient, lazy fashion.
Gremlin: Gremlin is a domain specific language for traversing property graphs. This language has application in the areas of graph query, analysis, and manipulation.
Frames: Frames exposes the elements of a Blueprints graph as Java objects. Instead of writing software in terms of vertices and edges, with Frames, software is written in terms of domain objects and their relationships to each other.
Furnace: Furnace is a property graph algorithms package. It provides implementations for standard graph analysis algorithms that can be applied to property graphs in meaningful ways.
Rexster: Rexster is a multi-faceted graph server that exposes any Blueprints graph through several mechanisms with a general focus on REST.
As this is a software stack with Blueprints at the bottom, any vendor providing an implementation of the Blueprints API automatically enables the rest of the stack for its users. Together with Sparksee, other graph vendors such as Neo4j, OrientDB, InfiniteGraph, Titan, MongoDB or Oracle NoSQL also provide an implementation of Blueprints.
The following sections describe the particularities of using some of the previously described TinkerPop software with Sparksee.
Blueprints is a collection of interfaces, implementations, and test suites for the property graph data model. Blueprints is to graph databases what JDBC is to relational databases: it provides a common set of interfaces that allows developers to plug-and-play their graph database back-end. Moreover, software written on top of Blueprints works over all Blueprints-enabled graph databases.
SparkseeGraph is the Sparksee-based implementation of the Blueprints Graph base interface. Specifically, it implements the following interfaces from Blueprints:
KeyIndexableGraph
TransactionalGraph
MetaGraph<com.sparsity.sparksee.gdb.Graph>
To use Sparksee and Blueprints in a Maven-based application the user only has to add the following dependency to the Maven configuration file (pom.xml
):
<dependency>
<groupId>com.tinkerpop.blueprints</groupId>
<artifactId>blueprints-sparksee-graph</artifactId>
<version>X.X.X (*) </version>
</dependency>
(*) You should check Blueprints' current version. Sparksee's implementation is always kept up to date with the latest release.
When using Sparksee's implementation, some aspects regarding type management, sessions, and collections should be taken into account in order to obtain maximum performance or to use Sparksee-exclusive functionality.
There are some differences between the Blueprints property graph model and Sparksee's graph model.
SparkseeGraph resolves these differences and allows any Blueprints application to work without being aware of this particularity. Moreover, it provides a way to access all Sparksee functionality from Blueprints as well. To work with types, SparkseeGraph has a public ThreadLocal<String> label field which specifies the node type and updates the default behavior of some Blueprints APIs. It also includes a public ThreadLocal<Boolean> typeScope field which specifies whether attributes follow the Blueprints property graph model or are restricted to a specific Vertex/Edge type.
Take into account that the attribute mode is mutually exclusive: for example, you cannot see type-restricted attributes when working with the pure Blueprints property graph model. Thus, a type-restricted attribute and a plain Blueprints attribute can share the same name but contain different values.
Here is an example of the use of the Blueprints implementation for the creation of nodes of the type “people”:
KeyIndexableGraph graph = new SparkseeGraph("blueprints_test.gdb");
Vertex v1 = graph.addVertex(null);
assert v1.getProperty(StringFactory.LABEL).equals(SparkseeGraph.DEFAULT_SPARKSEE_VERTEX_LABEL);
((SparkseeGraph) graph).label.set("people");
Vertex v2 = graph.addVertex(null);
assert v2.getProperty(StringFactory.LABEL).equals("people");
Vertex v3 = graph.addVertex(null);
assert v3.getProperty(StringFactory.LABEL).equals("people");
// v2 and v3 are two new vertices of the 'people' node type
((SparkseeGraph) graph).label.set("thing");
Vertex v4 = graph.addVertex(null);
assert v4.getProperty(StringFactory.LABEL).equals("thing");
// v4 is a new vertex of the 'thing' node type
((SparkseeGraph) graph).label.set("people");
graph.createKeyIndex("name", Vertex.class);
// 'name' is defined for the 'people' node type
((SparkseeGraph) graph).label.set("thing");
graph.createKeyIndex("name", Vertex.class);
// 'name' is defined for the 'thing' node type
v2.setProperty("name", "foo");
v3.setProperty("name", "boo");
// v2 and v3 are 'people' nodes, so 'people/name' is set
v4.setProperty("name", "foo");
// v4 is a 'thing' node, so 'thing/name' is set
((SparkseeGraph) graph).label.set("people");
int i = 0;
for (Vertex v : graph.getVertices("name", "foo")) {
    assert v.equals(v2);
    i++;
}
assert i == 1;
((SparkseeGraph) graph).label.set("thing");
i = 0;
for (Vertex v : graph.getVertices("name", "foo")) {
    assert v.equals(v4);
    i++;
}
assert i == 1;
((SparkseeGraph) graph).label.set("people");
i = 0;
for (Vertex v : graph.getVertices()) {
    assert v.equals(v2) || v.equals(v3);
    i++;
}
assert i == 2;
((SparkseeGraph) graph).label.set(null);
i = 0;
for (Vertex v : graph.getVertices()) {
    assert v.equals(v1) || v.equals(v2) || v.equals(v3) || v.equals(v4);
    i++;
}
assert i == 4;
// Create a type-specific attribute
((SparkseeGraph) graph).typeScope.set(true);
// This creates the attribute 'name' restricted to the type 'thing'.
// It does not overwrite the value 'foo' of the plain Vertex attribute also called 'name'.
v4.setProperty("name", "boo");
// Restore the normal property graph behaviour
((SparkseeGraph) graph).typeScope.set(false);
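The label and typeScope fields are ThreadLocal values, so each thread sets and reads its own copy without affecting other threads. The following self-contained sketch, using only the Java standard library and illustrative names (LabelScopeSketch is not a Sparksee class), shows the per-thread behavior this relies on:

```java
public class LabelScopeSketch {
    // Illustrative stand-in for SparkseeGraph's public ThreadLocal<String> label field:
    // each thread sees and mutates only its own value.
    static final ThreadLocal<String> label = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        label.set("people");                  // this thread works with the 'people' type
        Thread other = new Thread(() -> {
            label.set("thing");               // an independent, per-thread value
            System.out.println("worker sees " + label.get());
        });
        other.start();
        other.join();
        // The worker's set("thing") did not leak into this thread.
        System.out.println("main sees " + label.get());
    }
}
```

This is why two threads can safely operate on different node types at the same time through the same SparkseeGraph instance.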
SparkseeGraph
implements TransactionalGraph
in order to manage sessions efficiently.
Any graph operation executed by a thread occurs in the context of a transaction (created automatically if there is not a transaction in progress), and each transaction manages its own session.
The TransactionalGraph interface enables multiple threads to operate concurrently on Sparksee's implementation, since each thread has its own private transaction and therefore a different Sparksee session.
So, when a transaction begins, it starts a Sparksee session for the calling thread. This Sparksee session is used exclusively by that thread until the transaction stops.
Graph graph = new SparkseeGraph(...);
Vertex v = graph.addVertex(null); // <-- Automatically creates a Sparksee session and starts a transaction
//
// (...) More operations inside the transaction
//
graph.commit(); // <-- Closes Sparksee session and the transaction
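The transaction-per-thread mapping can be pictured with a small standard-library sketch. The names here (SessionPerThreadSketch, session ids) are illustrative, not Sparksee's API; the point is that the first "graph operation" on each thread lazily acquires a session private to that thread:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SessionPerThreadSketch {
    static final AtomicInteger nextId = new AtomicInteger();
    // Hypothetical session id allocated lazily per thread, mirroring how a
    // transaction implicitly opens a private Sparksee session on first use.
    static final ThreadLocal<Integer> session =
        ThreadLocal.withInitial(nextId::incrementAndGet);

    static void doGraphWork(String who) {
        // The first operation on a thread "begins" its transaction/session.
        System.out.println(who + " uses session " + session.get());
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> doGraphWork("thread-a"));
        Thread b = new Thread(() -> doGraphWork("thread-b"));
        a.start(); a.join();
        b.start(); b.join();
        doGraphWork("main"); // main gets its own, distinct session
    }
}
```

Each thread ends up with a different session id, just as each thread's transaction is backed by its own Sparksee session.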
All Sparksee collections are wrapped in the SparkseeIterable
class which implements the CloseableIterable
interface from the Blueprints API.
Since Sparksee is a native C++ implementation, its resources are not managed by the JVM heap. Thus, Sparksee-based Blueprints applications should take into account that, in order to obtain maximum performance, collections must be explicitly closed. If the user does not explicitly close the collections, they will be automatically closed when the transaction is stopped (in fact, the Sparksee session will be closed as well).
This is an example of collections being closed only at the end of a transaction, which incurs a performance penalty:
for (final Vertex vertex : graph.getVertices()) {
    for (final Edge edge : vertex.getEdges(Direction.OUT)) {
        final Vertex vertex2 = edge.getVertex(Direction.IN);
        for (final Edge edge2 : vertex2.getEdges(Direction.OUT)) {
            ...
        }
    }
}
graph.commit();
To avoid this performance degradation, all retrieved collections from methods in the SparkseeGraph
implementation should be closed as shown below:
CloseableIterable<Vertex> vv = (CloseableIterable<Vertex>) graph.getVertices();
for (final Vertex vertex : vv) {
    CloseableIterable<Edge> ee = (CloseableIterable<Edge>) vertex.getEdges(Direction.OUT);
    for (final Edge edge : ee) {
        final Vertex vertex2 = edge.getVertex(Direction.IN);
        CloseableIterable<Edge> ee2 = (CloseableIterable<Edge>) vertex2.getEdges(Direction.OUT);
        for (final Edge edge2 : ee2) {
            ...
        }
        ee2.close();
    }
    ee.close();
}
vv.close();
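The idea behind this wrapper can be sketched with the standard library alone. The following is an illustrative model, not Sparksee's actual SparkseeIterable class: an Iterable that holds a (simulated) native resource and frees it when close() is called, rather than waiting for the transaction to end:

```java
import java.util.Arrays;
import java.util.Iterator;

public class CloseableIterableSketch {
    // Same shape as the Blueprints CloseableIterable interface.
    interface CloseableIterable<T> extends Iterable<T> {
        void close();
    }

    // Wrap a backing collection; close() stands in for releasing native memory.
    static <T> CloseableIterable<T> wrap(Iterable<T> backing) {
        return new CloseableIterable<T>() {
            boolean closed = false;
            public Iterator<T> iterator() { return backing.iterator(); }
            public void close() {
                if (!closed) {
                    closed = true;
                    System.out.println("native collection released");
                }
            }
        };
    }

    public static void main(String[] args) {
        CloseableIterable<String> names = wrap(Arrays.asList("foo", "boo"));
        for (String n : names) System.out.println(n);
        names.close(); // release the resource as soon as iteration is done
    }
}
```

Closing eagerly, as in the second Blueprints example above, keeps native memory usage bounded during deep traversals.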
Gremlin is a graph traversal language which works over those graph databases/frameworks that implement the Blueprints property graph data model. It is a style of graph traversal that can be natively used in various JVM languages.
Installation of Gremlin is very easy: just download a release from the Gremlin GitHub repository, unzip it, and start the Gremlin console from the script in the bin
directory.
Once the console has been started, the user can instantiate a SparkseeGraph
instance to use a Sparksee graph database through the Gremlin DSL like this:
$ wget http://tinkerpop.com/downloads/gremlin/gremlin-groovy-2.2.0.zip
$ unzip gremlin-groovy-2.2.0.zip
$ ./gremlin-groovy-2.2.0/bin/gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----
gremlin> g = new SparkseeGraph("./graph.gdb")
==>sparkseegraph[./graph.gdb]
gremlin> g.loadGraphML('gremlin-groovy-2.2.0/data/graph-example-2.xml')
==>null
gremlin> g.V.count()
==>809
gremlin> g.E.count()
==>8049
gremlin> g.V('name', 'HERE COMES SUNSHINE').map
==>{name=HERE COMES SUNSHINE, song_type=original, performances=65, type=song}
gremlin> g.V('name', 'HERE COMES SUNSHINE').outE('written_by', 'sung_by')
==>e[3079][1053-written_by->1117]
==>e[4103][1053-sung_by->1032]
gremlin> g.V('name', 'HERE COMES SUNSHINE').outE('written_by', 'sung_by').inV.each{println it.map()}
[name:Hunter, type:artist]
[name:Garcia, type:artist]
gremlin> g.shutdown()
==>null
gremlin> quit
An alternative is to clone the source project from the original repository.
Rexster is a multi-faceted graph server that exposes any Blueprints graph through several mechanisms, with a general focus on REST. This HTTP web service provides standard low-level GET, POST, PUT, and DELETE methods; a flexible extensions model which allows plug-in-like development for external services (such as ad hoc graph queries through Gremlin); server-side “stored procedures” written in Gremlin; and a browser-based interface called The Dog House. The Rexster Console makes it possible to perform remote script evaluation against configured graphs inside a Rexster Server.
Configuration is done in an XML file on the server side. Specifically for Sparksee, it may be important to specify the location of the Sparksee configuration file (the sparksee.cfg file, here shown as sparksee.properties) to set configuration parameters such as the license settings. Use the <config-file>
property as shown in the example below:
<graph>
    <graph-name>sparkseesample</graph-name>
    <graph-type>sparkseegraph</graph-type>
    <graph-location>/tmp/mygraph.gdb</graph-location>
    <properties>
        <config-file>sparksee-config/sparksee.properties</config-file>
    </properties>
    <extensions>...</extensions>
</graph>
Pacer is a JRuby library that enables expressive graph traversals. It currently supports two major graph databases, Sparksee and Neo4j, through the TinkerPop graph stack. There is also a convenient in-memory graph called TinkerGraph, which is part of Blueprints.
Pacer allows the user to create, modify and traverse graphs using very fast and memory-efficient stream processing.
The following example shows how to install Pacer with Sparksee using Homebrew on a Mac OS X platform. Users on other platforms should adapt the example accordingly. Also note that Pacer requires JRuby v1.7.0 or later to be installed.
$ brew update
...
$ brew install jruby
==> Downloading http://jruby.org.s3.amazonaws.com/downloads/1.7.1/jruby-bin-1.7.1.tar.gz
######################################################################## 100.0%
/usr/local/Cellar/jruby/1.7.1: 1619 files, 29M, built in 16 seconds
$ jruby -v
jruby 1.7.1 (1.9.3p327) 2012-12-03 30a153b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-10M3909 [darwin-x86_64]
$ jruby -S gem install pacer-sparksee
Fetching: pacer-1.1.1-java.gem (100%)
Fetching: pacer-sparksee-2.0.0-java.gem (100%)
Successfully installed pacer-1.1.1-java
Successfully installed pacer-sparksee-2.0.0-java
2 gems installed
$ jruby -S gem list --local
*** LOCAL GEMS ***
pacer (1.1.1 java)
pacer-sparksee (2.0.0 java)
rake (0.9.2.2)
Once it is installed, we can work directly with the JRuby interpreter as follows:
$ jirb
irb(main):001:0> require 'pacer-sparksee'
=> true
irb(main):002:0> sparksee = Pacer.sparksee '/tmp/sparksee_demo'
=> #<PacerGraph sparkseegraph[/tmp/sparksee_demo]
irb(main):003:0> pangloss = sparksee.create_vertex :name => 'pangloss', :type => 'user'
=> #<V[1024]>
irb(main):004:0> okram = sparksee.create_vertex :name => 'okram', :type => 'user'
=> #<V[1025]>
irb(main):005:0> group = sparksee.create_vertex :name => 'Tinkerpop', :type => 'group'
=> #<V[1026]>
irb(main):006:0> sparksee.v
#<V[1024]> #<V[1025]> #<V[1026]>
Total: 3
=> #<GraphV>
irb(main):007:0> sparksee.v.properties
{"name"=>"pangloss", "type"=>"user"} {"name"=>"okram", "type"=>"user"} {"name"=>"Tinkerpop", "type"=>"group"}
Total: 3
=> #<GraphV -> Obj-Map>
irb(main):008:0> sparksee.create_edge nil, okram, pangloss, :inspired
=> #<E[2048]:1025-inspired-1024>
irb(main):009:0> sparksee.e
#<E[2048]:1025-inspired-1024>
Total: 1
=> #<GraphE>
irb(main):010:0> group.add_edges_to :member, sparksee.v(:type => 'user')
#<E[3072]:1026-member-1024> #<E[3073]:1026-member-1025>
Total: 2
=> #<Obj 2 ids -> lookup -> is_not(nil)>
irb(main):011:0> sparksee.e
#<E[2048]:1025-inspired-1024> #<E[3072]:1026-member-1024> #<E[3073]:1026-member-1025>
Total: 3
=> #<GraphE>
irb(main):012:0> quit
Check the documentation for a detailed explanation of the use of Pacer.