Scripting

Very common operations such as creating the schema or loading data from a CSV file can alternatively be performed by using Sparksee scripts.

A Sparksee script is a plain-text file which contains a set of commands. Each of these commands is interpreted and executed by a parser. Different alternatives for processing a Sparksee script are explained in the 'Scripting' section of the 'API' chapter.

A Sparksee script has the following features:

This chapter explains valid Sparksee script commands and gives examples for each of them.

Schema definition

To define the schema of a Sparksee database the scripts include commands in order to create every element on the schema, such as the database itself and the types and attributes.

Create and open

These are the commands to create and open a Sparksee database:

CREATE GDB alias INTO filename

OPEN GDB alias INTO filename

Note that CREATE also opens the database after its creation. Also, both commands close any other previously opened database.

Examples of use, take into account that everything after the '#' character is a description of the command:

open gdb "My graph database" into "database.gdb" # opens 'database.gdb'
create gdb MyGDB into '/data/mygdb.dbg' # closes 'database.gdb' and opens 'mygdb.dbg'

The rest of commands require having a database open so this should be the first step in the scripts.

Types and attributes

The user can create a new node type with the following command. Attributes may also be created in this step, or added later with the specific commands for them:

CREATE NODE type_name "(" 
    [attribute_name 
        (INTEGER|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) 
        [BASIC|INDEXED|UNIQUE] 
        [DEFAULT value], 
    ...]
")"

This is an example of a valid node type creation:

create node PEOPLE (
    ID string unique,
    NAME string indexed,
    BIRTH timestamp,
    AGE integer basic default 0
)

This is the command to create an edge type and, optionally, its attributes:

CREATE [UNDIRECTED] EDGE type_name 
[FROM tail_node_type_name TO head_node_type_name] "("
    [attribute_name 
        (INTEGER|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) 
        [BASIC|INDEXED|UNIQUE] 
        [DEFAULT value], 
    ...] 
") [MATERIALIZE NEIGHBORS]"

Edge type creation has the same features as in the case of node types but in addition it includes the following parameters:

This is an example of a valid edge type creation:

create edge KNOWS 
from PEOPLE to PEOPLE (
    SINCE timestamp indexed,
    WHERE string default "Barcelona, ES"
) materialize neighbors

An alternative to specifying attributes would be creating them afterwards, using the specific command for that:

CREATE ATTRIBUTE [GLOBAL|NODES|EDGES] [type_name.]attribute_name 
    (INTEGER|LONG|DOUBLE|STRING|BOOLEAN|TIMESTAMP|TEXT) 
    [BASIC|INDEXED|UNIQUE] 
    [DEFAULT value]

In this case, type_name references the parent node or edge type for the new attribute. When it is omitted and the scope is not specified the attribute is created as global.

The same attributes from the previous examples could have also been created as follows:

create node PEOPLE 
create attribute PEOPLE.ID string unique
create attribute PEOPLE.NAME string indexed
create attribute PEOPLE.BIRTH timestamp
create attribute PEOPLE.AGE  integer basic default 0

create edge KNOWS from PEOPLE to PEOPLE materialize neighbors
create edge KNOWS.SINCE timestamp indexed
create edge KNWOS.WHERE string default "Barcelona, ES"

And with the following commands, they could be created as attributes defined for a node (for all the nodes of the graph), edge (for all the edges of the graph) or global scope (for all nodes and edges of the graph):

create attribute IDENT string unique
create attribute global DESCRIPTION string unique
create attribute nodes NICKNAME string unique
create attribute edges WEIGHT double

In addition, the default value or the index specification can be updated after the attribute definition with the following commands:

SET ATTRIBUTE [GLOBAL|NODES|EDGES] [type_name.]attribute_name DEFAULT value

INDEX [type_name.]attribute_name [INDEXED|UNIQUE|BASIC]

For example, the default value for the previous IDENT global attribute and the index specification of the previous WHERE attribute could be updated as follows:

set attribute IDENT default ""
set attribute global DESCRIPTION default ""
set attribute nodes NICKNAME default ""
set attribute edges WEIGHT default 0.0

index KNOWS.WHERE basic

Moreover, existing node or edge types and attributes can be removed from the database with the following commands respectively:

DROP (NODE|EDGE) type_name

DROP ATTRIBUTE [GLOBAL|NODES|EDGES] [type_name.]attribute_name

Thus, we could remove all previously added types and attributes as follows:

drop attribute IDENT
drop attribute global DESCRIPTION
drop attribute nodes NICKNAME
drop attribute edges WEIGHT

drop attribute PEOPLE.ID
drop attribute PEOPLE.NAME
drop attribute PEOPLE.BIRTH
drop attribute PEOPLE.AGE
drop node PEOPLE

drop attribute KNOWS.SINCE
drop attribute KNWOS.WHERE
drop edge KNOWS

Load

Sparksee provides a couple of commands to easily load data from a CSV file into node or edge types that have been previously created in the graph.

The processing of the CSV file assumes that:

Load nodes

This is the command to load data from a CSV file into a node type:

LOAD NODES file_name 
    [LOCALE locale_name] 
    COLUMNS column_name [alias_name], ... 
    INTO node_type_name 
    [IGNORE (column_name|alias_name), ....] 
    [FIELDS 
        [TERMINATED char] 
        [ENCLOSED char] 
        [ALLOW_MULTILINE [num]]]
    [FROM num] 
    [MAX num] 
    [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])] 
    [LOG (OFF|ABORT|logfile)]

With this CSV file:

ID; NAME; SALARY; AGE
1; name1; 1800; 18
2; name2; 1600; 16
3; name3; 2000; 20
4; name4; 2200; 22

The loading of "PEOPLE" nodes using Sparksee scripts would look like the example below:

create gdb FRIENDS into "friends.dbg"

create node PEOPLE 
create attribute PEOPLE.ID string unique
create attribute PEOPLE.NAME string indexed
create attribute PEOPLE.AGE  integer basic default 0

load nodes "people.csv"
    columns ID, NAME, *, AGE
    into PEOPLE
    fields terminated ;
    from 1
    log "people.csv.log"

Load edges

This is the command to load data from a CSV file into an edge type:

LOAD EDGES file_name 
    [LOCALE locale_name]
    COLUMNS column_name [alias_name], ... 
    INTO node_type_name 
    [IGNORE (column_name|alias_name), ....]
    WHERE
        TAIL (column_name|alias_name) = node_type_name.attribute_name
        HEAD (column_name|alias_name) = node_type_name.attribute_name
    [FIELDS
        [TERMINATED char]
        [ENCLOSED char]
        [ALLOW_MULTILINE [num]]]
    [FROM num]
    [MAX num]
    [MODE (ROWS|COLUMNS [SPLIT [PARTITIONS num]])]
    [LOG (OFF|ABORT|logfile)]

Most of the features are the same as in the case of loading nodes, with the exception of the following WHERE elements, which are required to specify the source (tail) and destination (head) of the edges:

This is an example of a Sparksee script to load edges from a CSV file with Sparksee scripts. It assumes that "PEOPLE" nodes have already been created:

1;2;2012-01-01;'Barcelona'
2;3;2011-05-01;'NYC,USA'
3;4;2012-01-01;'Paris,France'
2;1;2012-01-01;'Barcelona'
3;2;2011-05-01;'NYC,USA'

In the example, 'tail tailId = PEOPLE.ID' sets the value referenced by the 'TailId' column as the "PEOPLE" "ID" attribute. The value is used to find a "PEOPLE" node which is the tail of the edge (because the attribute "ID" matches). For instance, the first row created would be an edge that goes from a "PEOPLE" node with ID=2 to the "PEOPLE" node with ID=1.

open gdb FRIENDS into "friends.dbg"

create edge KNOWS from PEOPLE to PEOPLE materialize neighbors
create edge KNOWS.SINCE timestamp indexed
create edge KNWOS.WHERE string default "Barcelona, ES"

load edges "knows.csv"
    columns headId, tailId, SINCE, WHERE
    into KNOWS
    ignore headId, tailId
    where
        tail tailId=PEOPLE.ID
        head headId=PEOPLE.ID
    fields terminated ; enclosed '
    log off

Other

In order to be able to create attributes with a timestamp the following command must be specified in the scripts:

SET TIMESTAMP FORMAT timestamp_format_string

Valid format fields are:

So, valid timestamp formats may be: MM/dd/yy or MM/dd/yyyy-hh.mm. If not specified, the parser automatically tries to match any of the following timestamp formats:

Valid commands to set a timestamp would be:

set timestamp format "MM/dd/yy"
set timestamp format "MM/dd/yyyy-hh.mm"
Back to Index