Graph database

General concepts

In mathematics, a graph is an abstract representation of a set of objects where some pairs of objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges. Typically, a graph is represented in diagrammatic form as a set of dots for the vertices, joined by lines or curves for the edges. The figure below is an example of this concept.

Figure 2.0: Graph

Vertices are also referred to as nodes and their properties are often called attributes. For the remainder of the document, graphs will be composed of nodes, edges and attributes.
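As a minimal sketch of these three terms, the snippet below stores a tiny graph in plain Python containers. The node names, the `color` attribute and the `neighbors` helper are all made up for illustration; nothing here is part of Sparksee.

```python
# Toy illustration only; none of these names come from Sparksee.
# Each node maps to its attributes (attribute name -> value).
nodes = {
    "a": {"color": "red"},
    "b": {"color": "blue"},
    "c": {"color": "red"},
    "d": {},  # a node may have no attributes and no edges
}

# Each edge links a pair of nodes; frozenset makes the pair unordered.
edges = {
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
    frozenset({"c", "a"}),
}


def neighbors(node):
    """Return the nodes that share an edge with `node`."""
    return {other for edge in edges if node in edge for other in edge if other != node}


print(sorted(neighbors("a")))  # ['b', 'c']
print(neighbors("d"))          # set()
```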

Sparksee graph database

Sparksee is an embedded graph database management system, tightly integrated with the application at code level. As a graph database, it models stored data as a graph.

Unlike relational databases, where the data model is standard, graph database vendors propose different versions of the graph data model, each built on the general description of a graph given in the previous section.

Graph data model

The Sparksee graph model is based on a generalization of the graph concept which can be defined as a labeled attributed multigraph. In Sparksee we refer to the label as the type.

These are its main features:

This data model is better suited to modeling complex scenarios such as the one in Figure 2.1, which could hardly be represented using the simplest graph model. In Figure 2.1 there are two types of nodes (PEOPLE, represented by a star icon, and MOVIE, shown as a clapperboard icon), each of which has an attribute (called Name and Title respectively) with a value. For instance, the node whose Name is Scarlett Johansson belongs to the PEOPLE type (star icon). There are also two types of edges (DIRECTS, shown in blue, and CAST, shown in orange). CAST, which links PEOPLE and MOVIE nodes, has an attribute called Character. Moreover, whereas DIRECTS is a directed edge type, drawn with an arrow pointing to its head node, CAST is an undirected edge type. More attributes could be added to both node and edge objects. The multigraph property is illustrated by the Woody Allen node and the Manhattan node, which are linked by two different edges.

Figure 2.1: Sparksee multigraph
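The multigraph property of Figure 2.1 can be sketched with plain Python data classes. This is a toy model, not the Sparksee API: the `Node` and `Edge` classes, their fields, the `edges_between` helper and the Character value are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type_name: str                       # the label, e.g. "PEOPLE"
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    type_name: str                       # the label, e.g. "CAST"
    tail: int                            # position of one endpoint in `nodes`
    head: int                            # position of the other endpoint
    directed: bool
    attrs: dict = field(default_factory=dict)


nodes = [
    Node("PEOPLE", {"Name": "Woody Allen"}),
    Node("MOVIE", {"Title": "Manhattan"}),
]

# Keeping edges in a list (rather than a set of node pairs) is what allows
# two different edges to link the same pair of nodes.
edges = [
    Edge("DIRECTS", tail=0, head=1, directed=True),
    Edge("CAST", tail=0, head=1, directed=False,
         attrs={"Character": "Isaac"}),  # illustrative value
]


def edges_between(a, b):
    """Return every edge linking nodes a and b, ignoring direction."""
    return [e for e in edges if {e.tail, e.head} == {a, b}]


print([e.type_name for e in edges_between(0, 1)])  # ['DIRECTS', 'CAST']
```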

Types

Nodes and edges in Sparksee must be of a certain type.

All Sparksee types are identified by a public, user-provided unique name, the type name, and an immutable, Sparksee-generated unique identifier, the type identifier. The type identifier is used to refer to the type when using Sparksee APIs, as explained in the 'Nodes and edges' section of the 'API' chapter.

In Figure 2.1 the types created are PEOPLE and MOVIES (node types) and CAST and DIRECTS (edge types). Note that we refer to the types by their type names.
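The relationship between a user-provided type name and a generated type identifier can be sketched as a small registry. This is a toy model: the `TypeRegistry` class, its method names and the use of sequential integers as identifiers are assumptions, not the Sparksee API.

```python
import itertools


class TypeRegistry:
    """Toy registry: unique user-chosen names mapped to generated identifiers."""

    def __init__(self):
        self._next_id = itertools.count(1)   # assumed identifier scheme
        self._id_by_name = {}

    def new_type(self, name):
        """Register a type name and return its generated identifier."""
        if name in self._id_by_name:
            raise ValueError(f"type name already in use: {name}")
        type_id = next(self._next_id)
        self._id_by_name[name] = type_id
        return type_id

    def find_type(self, name):
        """Return the identifier for a type name, or None if unknown."""
        return self._id_by_name.get(name)


registry = TypeRegistry()
people = registry.new_type("PEOPLE")
movies = registry.new_type("MOVIES")

# Later calls refer to the type through its identifier, not its name.
print(registry.find_type("PEOPLE") == people)  # True
print(registry.find_type("CAST"))              # None
```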

Nodes and edges

Sparksee objects are node or edge instances of a certain type. When they are created they are given an immutable, Sparksee-generated unique identifier, the object identifier (OID). The OID is used to refer to the object when using Sparksee APIs, as explained in the 'Nodes and edges' section of the 'API' chapter.

Nodes and edges must belong to a certain type and may have attributes.

In Figure 2.1, 18 objects (9 nodes and 9 edges) are displayed.

Attributes

Sparksee attributes are identified by a public, user-provided unique name, the attribute name, and an immutable, Sparksee-generated unique identifier, the attribute identifier. As with type identifiers, an attribute identifier is used to refer to the attribute when using Sparksee APIs, as explained in the 'Attributes and values' section of the 'API' chapter.

Sparksee considers the following kinds of attributes:

Sparksee attributes are defined for a domain or data type; all values of an attribute belong to a specified data type with the exception of the null value, which does not belong to any data type. Valid Sparksee data types are:

Moreover, Sparksee attributes are univalued, which means that an object (node or edge) can only have one value for an attribute. Note that null may also be that value.
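The two rules above (one data type per attribute, one value per object, with null always allowed) can be sketched as follows. This is a toy model rather than the Sparksee API: the `Attribute` class is hypothetical, Python's `str` stands in for a Sparksee data type, and `None` stands in for the null value.

```python
class Attribute:
    """Toy univalued attribute: at most one value per object identifier."""

    def __init__(self, name, data_type):
        self.name = name
        self.data_type = data_type   # e.g. str; stand-in for a Sparksee data type
        self._values = {}            # OID -> value; a missing OID means null

    def set(self, oid, value):
        """Set the single value of this attribute for the given object."""
        if value is None:
            self._values.pop(oid, None)      # null belongs to no data type
            return
        if not isinstance(value, self.data_type):
            raise TypeError(f"{self.name} expects {self.data_type.__name__}")
        self._values[oid] = value            # replaces any previous value

    def get(self, oid):
        """Return the value for the object, or None when it is null."""
        return self._values.get(oid)


name = Attribute("Name", str)
name.set(1, "Scarlett Johansson")
name.set(1, "S. Johansson")   # overwrites: an object never holds two values

print(name.get(1))  # S. Johansson
print(name.get(2))  # None
```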

Figure 2.2 shows the attributes extracted from Figure 2.1. PEOPLE nodes have an Id and a Name, MOVIES nodes have an Id and a Title, and edges of type CAST have a Character attribute holding the name of the character the actor plays in the movie. Note that we refer to the attributes by their attribute names.

Figure 2.2: Sparksee attributes

Indexing

Attributes

Different index capabilities can be set for each Sparksee attribute. Depending on these capabilities there are three types of attributes:

Sparksee operations that access the graph through an attribute automatically use the index defined for it, which significantly improves the performance of the operation. Note that only a single index can be associated with an attribute.
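To illustrate why an attribute index helps, the sketch below compares a full scan with a lookup in a value-to-objects map. It is a generic illustration, not a description of how Sparksee implements its indexes; the function names and the sample OIDs and titles are hypothetical.

```python
from collections import defaultdict

# Hypothetical attribute values: OID -> Title
titles = {
    10: "Manhattan",
    11: "Lost in Translation",
    12: "Manhattan",
}


def select_by_scan(values, wanted):
    """Without an index: inspect the value of every object."""
    return {oid for oid, value in values.items() if value == wanted}


def build_index(values):
    """Build a value -> set of OIDs map once, ahead of the queries."""
    index = defaultdict(set)
    for oid, value in values.items():
        index[value].add(oid)
    return index


def select_by_index(index, wanted):
    """With an index: a single dictionary lookup."""
    return set(index.get(wanted, ()))


title_index = build_index(titles)
print(sorted(select_by_index(title_index, "Manhattan")))  # [10, 12]
```

Both functions return the same objects; the indexed version only avoids visiting objects whose value does not match.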

Edges

A specific index can also be defined to improve certain navigational operations. The neighbor index can be set for a specific edge type; the neighbor API then uses it automatically (see the 'Navigation operations' section of the 'API' chapter), which significantly improves the performance of that operation.
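The idea behind a neighbor index can be sketched as a precomputed adjacency map for one edge type. This is a generic illustration under assumed names and sample data, not Sparksee's actual index structure or API.

```python
from collections import defaultdict

# Hypothetical CAST edges as (person OID, movie OID) pairs.
cast_edges = [(1, 10), (2, 10), (2, 11)]


def build_neighbor_index(edge_list):
    """Map each node to the set of nodes it is linked to by these edges."""
    index = defaultdict(set)
    for a, b in edge_list:
        index[a].add(b)
        index[b].add(a)   # CAST is undirected, so record both directions
    return index


cast_neighbors = build_neighbor_index(cast_edges)

# A neighbor query is now a lookup instead of a pass over every edge.
print(sorted(cast_neighbors[2]))   # [10, 11]
print(sorted(cast_neighbors[10]))  # [1, 2]
```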

Processing

A Sparksee-based application can manage more than one database, each working independently. Keep in mind that a database can be accessed by only one application or process at a time, and that the connection to the database (the open operation) can only be made once.

Access to the database must be enclosed within a session, and multiple sessions can concurrently access the same database.

Sessions

A session is a stateful period of a user's activity with a database; it can also be described as an instance of database usage.

Whereas a database can be shared among multiple threads, a session cannot, because it is not thread-safe. All manipulation of a database must be enclosed within a session; in particular, a graph can only be operated on inside a session.

Session responsibilities include management of transactions and temporary data.
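The usual way to respect these rules is to share the database object among threads while giving each thread its own session. The sketch below shows that pattern with toy `Database` and `Session` classes; both classes, their methods and the ownership check are assumptions for illustration and are not the Sparksee API.

```python
import threading


class Session:
    """Toy session bound to the thread that created it."""

    def __init__(self, database):
        self._database = database
        self._owner = threading.get_ident()
        self._closed = False

    def count_nodes(self):
        if self._closed:
            raise RuntimeError("session is closed")
        if threading.get_ident() != self._owner:
            raise RuntimeError("session used outside the thread that created it")
        return len(self._database.nodes)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._closed = True   # a real session would also drop its temporary data


class Database:
    """Toy database that may be shared by several threads."""

    def __init__(self):
        self.nodes = {1: "Woody Allen", 2: "Manhattan"}

    def new_session(self):
        return Session(self)


results = []
results_lock = threading.Lock()


def worker(database):
    # Each thread opens, uses and closes its own session.
    with database.new_session() as session:
        count = session.count_nodes()
    with results_lock:
        results.append(count)


db = Database()
threads = [threading.Thread(target=worker, args=(db,)) for _ in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)  # [2, 2, 2]
```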

Figure 2.3: Sparksee application architecture

Figure 2.3 shows a basic Sparksee-based application architecture in which the application manages multiple databases, each accessed by multiple threads, with each thread handling its own session.

Transactions

A Sparksee transaction encloses a set of operations and defines the granularity level for the concurrent execution of sessions.

There are two types of transactions: Read or Shared, and Write or Exclusive. Sparksee's concurrency model is based on the N-readers 1-writer model, meaning that multiple read transactions can be executed concurrently whereas write transactions are executed exclusively.

A transaction started with the 'begin' instruction derives its type from the operations it contains. It starts as a read transaction, and if a method updates the persistent graph database it automatically becomes a write transaction. Before it can become a write transaction, all other read transactions must have finished. You can also start a write transaction directly by using the 'beginUpdate' instruction instead. This avoids any possible lost update problem, but keep Sparksee's concurrency model in mind when creating this type of transaction, because write transactions are executed exclusively.
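The N-readers 1-writer rule can be sketched with a small lock built on `threading.Condition`. This is a generic illustration of the concurrency model, not Sparksee's implementation: the `ReadersWriterLock` class is hypothetical, and the automatic upgrade from a read to a write transaction is deliberately not modeled.

```python
import threading


class ReadersWriterLock:
    """Toy lock: many concurrent readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:                       # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:  # a writer waits for everyone
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


lock = ReadersWriterLock()
lock.acquire_read()
lock.acquire_read()            # a second reader is admitted immediately

write_done = threading.Event()


def writer():
    lock.acquire_write()       # blocks until both readers have finished
    write_done.set()
    lock.release_write()


writer_thread = threading.Thread(target=writer)
writer_thread.start()
blocked = not write_done.wait(0.1)   # the writer cannot proceed yet

lock.release_read()
lock.release_read()
writer_thread.join()

print(blocked, write_done.is_set())  # True True
```

The example also shows the cost the text warns about: while a write transaction is pending or running, it cannot overlap with any reader.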

Users can manage transactions in two different ways:

Explicit use of transactions may improve the performance of concurrently executed sessions, so it is highly recommended.

Temporary data

Some operations may require the use of temporary data. The session manages this data automatically and removes it when the session is closed. For this reason, temporary data may also be referred to as session data.

Large collections of object identifiers, their iterators and session attributes are examples of temporary data.

Whereas attributes are persistent in the graph database, session attributes are temporary and exclusive to a session:
