SCOMA Definition Language

Early Draft - 2009/07/12

Author:
Giovanni Pirrotta (Department of Mathematics, University of Messina, Italy)

The TripliSty tool was created to automate the mapping between the SCOVO ontology and statistical data. To work correctly, TripliSty needs a mapping file written in the SCOMA (SCOma MApping) ontology definition language, as described in the following sections.

In our examples, imagine we want to map the following table, which represents the number of tourists in some cities in 2008 (sample data).

ScomaConfig

A scm:ScomaConfig defines all the datasets to map and the information about the URI scheme.

Properties

scm:hasDataset The dataset(s) to map. (Mandatory)
scm:uriHome Defines the URI Home. (Mandatory) Example: http://example.org/myproject
scm:uriSchemaHome Defines the URI Schema Home. (Mandatory) Example: http://example.org/myproject/schema
scm:uriDimensionHome Defines the URI Dimension Home. (Mandatory) Example: http://example.org/myproject/dimension
scm:uriDatasetHome Defines the URI Dataset Home. (Mandatory) Example: http://example.org/myproject/dataset

Example:
# Definition of the class "ScomaConfig"
map:scomacfg   a                    scm:ScomaConfig;               
               scm:hasDataset       map:mydataset;
               scm:uriHome          "http://localhost/triplisty_example/";
               scm:uriSchemaHome    "http://localhost/triplisty_example/schema/";
               scm:uriDimensionHome "http://localhost/triplisty_example/dimension/";
               scm:uriDatasetHome   "http://localhost/triplisty_example/dataset/"; 
               .

Dataset

A scm:Dataset defines the schema of the dataset to map.

Properties

scm:sourceType Defines the input source type. Available values: db, xls, csv. (Mandatory)
scm:hasSource Defines the input source. (Mandatory)
scm:hasKey Defines the key column(s) of the schema. (Mandatory)
scm:hasField Defines the field column(s) of the schema. (Mandatory)
scm:uriDataset Defines the Dataset URI. (Mandatory)
scm:title Defines the Dataset title. (Optional)
scm:label Defines the Dataset label. (Optional)
scm:hasDimension Defines the dataset dimension(s). This dimension will be applied to each dataset item.
scm:sizeDataset Defines a limit for the dataset size. Useful for very large datasets. (Not yet implemented)
scm:externalTriples Defines the name of the function to invoke in the item generation procedure. The function must be present in the triplisty.functions module; use it when you want to generate triples that are added on the fly to the dataset (e.g. links to external datasets such as geonames, dbpedia, etc.). See the DimensionInstance example below.

Example:
# Definition of the class "Dataset"

map:mydataset  a                    scm:Dataset;			      
               scm:sourceType       "db";               
               scm:hasSource        map:mysource;
               scm:hasKey           map:key;               
               scm:hasField         map:field1;               
               scm:hasField         map:field2;
               scm:uriDataset       "http://localhost/triplisty_example/dataset/tourists";
               scm:title            "Dataset: tourists";
               scm:label            "Dataset: tourists";
               scm:hasDimension     map:dim_2008;
               .

Database

A scm:Database defines the database input source

Properties

scm:dbType Defines the type of database. Available values: mysql, postgres. (Mandatory)
scm:dbName Defines the database name. (Mandatory)
scm:hostname Defines the hostname (Mandatory)
scm:username Defines the username. (Mandatory)
scm:password Defines the password (Mandatory)
scm:table Defines the table(s) (Mandatory)
scm:condition Defines the condition in the WHERE clause (Optional)
scm:join Defines the join between tables (Optional)
scm:extra Defines an extra string to add in the query. (Example: LIMIT...ORDER BY....DESC...)

Example:
# Definition of the class "Database"

map:mysource   a                    scm:Database;
               scm:dbType           "mysql";
               scm:dbName           "scoma";
               scm:hostname         "localhost";
               scm:table            "tourists";
               scm:username         "root";
               scm:password         "mypassword";
               .
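
As a rough illustration of how the Database properties could fit together, the sketch below assembles a SELECT statement from them. The helper name build_query, the clause order and the sample condition are assumptions made for this example; they are not taken from the TripliSty source.

```python
def build_query(columns, table, join=None, condition=None, extra=None):
    """Assemble a SELECT statement from Database properties (hypothetical helper)."""
    query = "SELECT " + ", ".join(columns) + " FROM " + table
    if join:
        # scm:join, assumed here to hold a complete JOIN clause
        query += " " + join
    if condition:
        # scm:condition, placed in the WHERE clause
        query += " WHERE " + condition
    if extra:
        # scm:extra, appended verbatim (for example ORDER BY or LIMIT)
        query += " " + extra
    return query


# Column and table names come from the examples in this document;
# the condition and the LIMIT value are made up for illustration.
print(build_query(["city", "male"], "tourists", condition="male > 0", extra="LIMIT 10"))
```

Keeping scm:extra as the last fragment matches its description as a string added to the query, since clauses such as ORDER BY and LIMIT must follow WHERE.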

XLSFile

A scm:XLSFile defines the XLS file input source

Properties

scm:pathfile Defines the path of the file to load. (Mandatory)
scm:indexSheet Defines the index of the sheet to select in the xls file. (Optional - default: 0)
scm:nameSheet Defines the name of the sheet to select in the xls file. (Optional)
scm:startRow Defines the first row. (Optional - default: 1)
scm:endRow Defines the last row (Optional)

Example:
# Definition of the class "XLSFile"
map:mysource   a                scm:XLSFile;
               scm:pathfile         "C://scoma/table.xls";
               scm:indexSheet       "0"; # default 0
               # or scm:nameSheet   "sheet1"; (alternative to indexSheet)
               scm:startRow         "1"; #default 1
               scm:endRow           "10";
               .	
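
The Key and Field examples address XLS columns by letter (A, B, C), and startRow and endRow select a range of rows. The sketch below shows one way such values could be turned into list indices. The helper names are hypothetical, and treating startRow and endRow as 1-based and inclusive is an assumption suggested by the default value of 1.

```python
def column_index(letter):
    """Convert a spreadsheet column letter (A, B, ..., Z, AA, ...) to a 0-based index."""
    index = 0
    for char in letter.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def select_rows(rows, start_row=1, end_row=None):
    """Keep rows start_row..end_row, assumed 1-based and inclusive."""
    return rows[start_row - 1:end_row]


print(column_index("A"), column_index("C"))
print(select_rows(["row1", "row2", "row3", "row4"], start_row=2, end_row=3))
```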

CSVFile

A scm:CSVFile defines the CSV file input source

Properties

scm:pathfile Defines the path of the file to load. (Mandatory)
scm:fieldsEnclosedBy Defines the fields delimiter (Optional - default: ";")
scm:linesTerminatedBy Defines the line terminator (Optional - default: "\n")

Example:
# Definition of the class "CSVFile"
map:mysource   a                       scm:CSVFile;
               scm:pathfile            "C://scoma/table.csv";
               scm:fieldsEnclosedBy    "\t";   #default ";"
               scm:linesTerminatedBy   "\n";   #default "\n" 
               .	
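
To make the two CSV options concrete, here is a minimal sketch of reading such a source with Python's csv module. It treats fieldsEnclosedBy as the field separator, as the property description does. The helper names and the sample values are invented for illustration, and the 1-based column position follows the "1,2,3 for CSV files" convention used in the Key and Field examples.

```python
import csv


def read_csv_rows(text, delimiter=";", line_terminator="\n"):
    """Split CSV text into rows of cells; defaults follow the property list."""
    # Split on the configured terminator ourselves, then let csv.reader
    # split each line on the configured delimiter.
    lines = [line for line in text.split(line_terminator) if line]
    return list(csv.reader(lines, delimiter=delimiter))


def cell(row, position):
    """Return the cell at a 1-based column position such as "1" or "2"."""
    return row[int(position) - 1]


sample = "City of Rome\t100\nCity of Milan\t200\n"  # made-up values
rows = read_csv_rows(sample, delimiter="\t")
print(cell(rows[0], "1"))
```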

Key

A scm:Key defines the key column in the schema table

Properties

scm:column Defines the column to map in the table (Mandatory)
scm:hasDimension Defines the key dimension. This dimension will be applied to each item belonging to the same row. (Mandatory - exactly one)
scm:order Defines the table column position. (Mandatory)
scm:transformFunction Defines the name of the function to apply in the item generation procedure. The function must be present in the triplisty.functions module. This property is useful when you want to transform the key value (e.g. if the value in the table is "City of Rome", you could specify a function that replaces the spaces with underscores in order to export the value "City_of_Rome"). See the example below.

Example:
# Definition of the class "Key"

map:key        a                      scm:Key;               
               scm:column             "city";    # A,B,C for XLS files, 1,2,3 for CSV files
               scm:hasDimension       map:dim_city;                        
               scm:order              "1";  
               scm:transformFunction  "replace_space"   
               .

For each key, TripliSty generates the item by invoking on the fly the replace_space Python function defined in the triplisty.functions module:

def replace_space(self, value):
	return value.strip().replace(" ", "_")  

The value to transform is passed as an argument, so you can convert it however you want.
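
The following sketch illustrates how a function named in scm:transformFunction could be looked up and applied by name. The Functions class stands in for the triplisty.functions module, and both apply_transform and the lowercase transform are hypothetical names introduced only for this example.

```python
class Functions(object):
    """Stand-in for the triplisty.functions module (hypothetical container)."""

    def replace_space(self, value):
        # Same body as the replace_space function shown above.
        return value.strip().replace(" ", "_")

    def lowercase(self, value):
        # Hypothetical second transform, to show how you could add your own.
        return value.strip().lower()


def apply_transform(functions, name, value):
    """Look up the function named by scm:transformFunction and apply it."""
    transform = getattr(functions, name, None)
    if transform is None:
        raise ValueError("unknown transform function: %s" % name)
    return transform(value)


print(apply_transform(Functions(), "replace_space", "City of Rome"))  # City_of_Rome
```

Failing loudly on an unknown name is a design choice of this sketch: a misspelled function name in the mapping file is reported immediately instead of silently exporting untransformed keys.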

Field

A scm:Field defines the field column in the schema table

Properties

scm:column Defines the column to map in the table (Mandatory)
scm:hasDimension Defines the column dimension(s). This dimension will be applied to the items belonging to the same column. (Mandatory - one or more)
scm:order Defines the table column position. (Mandatory)
scm:datatype Defines the datatype of the cell value. (Optional - Available values: integer, float, string, etc. See XML Schema)

Example:
# Definition of the class "Field"

map:field1     a                    scm:Field;
               scm:column           "male";  # A,B,C for XLS files, 1,2,3 for CSV files
               scm:hasDimension     map:dim_male;
               scm:datatype         "integer";
               scm:order            "2";
               .
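
As a sketch of what scm:datatype could imply for a cell value, the snippet below converts the raw text and pairs it with the corresponding XML Schema datatype URI. The helper name typed_value, the CASTS table and the fallback to string are assumptions of this example, not documented TripliSty behaviour.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Python conversions for the datatypes named in the property list.
CASTS = {"integer": int, "float": float, "string": str}


def typed_value(raw, datatype="string"):
    """Return (converted_value, xsd_datatype_uri) for a raw cell value."""
    cast = CASTS.get(datatype, str)  # unknown datatypes are kept as text
    return cast(raw), XSD + datatype


print(typed_value("42", "integer"))
```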

DimensionInstance

A scm:DimensionInstance defines the SCOVO dimension instance properties

Properties

scm:hasDimensionType Defines the dimension instance type (a subclass of SCOVO Dimension). (Mandatory)
scm:uriDimensionInstance Defines the dimension instance URI (static or dynamic URIs) (Mandatory)
scm:uriPiece Defines the URI piece for the item generating procedure. (Mandatory) See example below.
scm:titlePiece Defines the title piece for the item generating procedure. (Optional) See example below.
scm:labelPiece Defines the label piece for the item generating procedure. (Optional) See example below.
scm:index Defines the piece index in the URI. Useful when we have multiple keys or multiple dimensions in the same field. (Mandatory)
scm:externalTriples Defines the name of the function to invoke in the RDF generation procedure. The function must be present in the triplisty.functions module; use it when you want to generate triples on the fly and add them to the dataset (e.g. links to external datasets such as geonames, dbpedia, etc.). See the example below.

In the dimension instance resources we must define how TripliSty builds each Item URI. A dimension can be static or dynamic.


Example of static dimension

map:dim_male   a                         scm:DimensionInstance;
               scm:hasDimensionType      :Genre;
               scm:uriDimensionInstance  "http://localhost/triplisty_example/dimension/genre/male";
               scm:uriPiece              "-m";
               scm:index                 "1";
               scm:titlePiece            ", genre: male";
               scm:labelPiece            ", genre: male";
               .

Dataset and field dimension instances are examples of static dimensions: their uriDimensionInstance, uriPiece, titlePiece and labelPiece values are fixed. Each dataset dimension instance URI is applied to every Item in the table, and each field dimension instance URI is applied to every Item in the same column. The same holds for uriPiece, titlePiece and labelPiece.


Example of dynamic dimension

map:dim_city   a                          scm:DimensionInstance;
               scm:hasDimensionType       :City;
               scm:uriDimensionInstance   "http://localhost/triplisty_example/dimension/city/@@city@@";
               scm:uriPiece               "-@@city@@";
               scm:index                  "1";
               scm:titlePiece             ", city: @@city@@"; 
               scm:labelPiece             ", city: @@city@@"; 
               scm:externalTriples        "sameas_geonames";
               .

In this case, the key dimension instance URI, the uriPiece, the titlePiece and the labelPiece are applied to each item belonging to the same row. Since we do not want to write the dimension instance of each row by hand (there could be thousands of rows), we can specify a pattern in the URI. The part between the @@ markers identifies the key column name, which TripliSty replaces with the actual key value during item generation. The same holds for uriPiece, titlePiece and labelPiece.
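
The @@ substitution described above can be sketched with a regular expression. The function name expand and the dictionary representation of a row are assumptions of this example; the patterns are the ones used in map:dim_city.

```python
import re

# Matches a column name enclosed in @@ markers, e.g. @@city@@.
PLACEHOLDER = re.compile(r"@@(\w+)@@")


def expand(pattern, row):
    """Replace every @@column@@ in pattern with that column's value in row."""
    return PLACEHOLDER.sub(lambda match: row[match.group(1)], pattern)


row = {"city": "City_of_Rome"}  # key value after the replace_space transform
print(expand("http://localhost/triplisty_example/dimension/city/@@city@@", row))
print(expand("-@@city@@", row))
print(expand(", city: @@city@@", row))
```

Because the column name is captured generically, the same helper would also cover patterns that mention more than one key column.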

For each dimension instance you can also specify the externalTriples property to generate external RDF triples on the fly and add them to the dimension instance resource. Its value is the name of a function that must be present in the triplisty.functions module. For example, if we want to link each city dimension instance to its geonames RDF resource, we can define the following Python function:

 
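def sameas_geonames(self, subject, dimension, column, value):
	# Python 2 code; ConjunctiveGraph and URIRef come from rdflib.
	# Utils.RDF is assumed to be the RDF namespace helper available in triplisty.functions.
	import urllib
	from rdflib import ConjunctiveGraph, URIRef

	value = urllib.quote(value)  # URL-encode the key value for the query string
	client = ConjunctiveGraph()  # holds the geonames search result
	graph = ConjunctiveGraph()   # holds the triples to return
	client.parse("http://ws.geonames.org/searchRDF?q=" + value + "&maxRows=1")
	# Take the subject of the first resource typed as a geonames Feature
	city_uri = list(client.triples((None, Utils.RDF['type'], URIRef('http://www.geonames.org/ontology#Feature'))))[0][0]
	graph.add((URIRef(subject), URIRef("http://www.w3.org/2002/07/owl#sameAs"), city_uri))
	return graph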

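The parameters passed to the function are subject, dimension, column and value. In this example only subject (the URI of the resource the new triple is attached to) and value (the key value used in the geonames query) are used.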

In this way, while processing the city dimension instances, TripliSty adds the new owl:sameAs triple on the fly, making it possible to navigate from one dataset to another.

The dimension types

Finally, we must define the dimension types, as in a normal SCOVO ontology:

# Domain schema definitions
:Year   rdfs:subClassOf scv:Dimension ;
        dc:title "Year" .

:City   rdfs:subClassOf scv:Dimension ;
        dc:title "A city" .

:Genre  rdfs:subClassOf scv:Dimension ;
        dc:title "Genre" .
		   

 

Feedback and suggestions are welcome. Write to gpirrotta@unime.it