The TripliSty tool was created to automate the mapping between statistical data and the SCOVO ontology. To work correctly, TripliSty needs a mapping file written in the SCOMA (SCOvo MApping) ontology definition language, which is described below.
I will try to explain the TripliSty tool with an example.
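In the following table, the values represent the number of tourists in some cities in 2008 (sample data):
city | male | female |
Rome | 1324 | 1432 |
Paris | 2432 | 2654 |
London | 4532 | 4943 |
Berlin | 2354 | 2534 |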
In a statistical context, the data to focus on is not the row but the single cell value together with its dimensions.
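In each statistical table we can have three types of dimensions: the dataset dimension, the column dimensions and the row dimensions.
In our example, the dataset dimension is the year (2008), the column dimensions are the genres (male and female), and the row dimension is the city.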
The row dimension is the key that identifies each single row in the table. The cells of different rows carry different row dimensions.
What I want to emphasize is that, in statistical data, the main concept is not the single row but the single cell object.
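To make the cell-centric view concrete, here is a minimal Python sketch (not part of TripliSty) that unpivots the sample table into one record per cell. The `TABLE` layout and the `cells` helper are illustrative assumptions; the figures are the sample values used in this tutorial.

```python
# Illustrative only: not TripliSty code. Sample values from this tutorial.
TABLE = {
    "Rome": {"male": 1324, "female": 1432},
    "Paris": {"male": 2432, "female": 2654},
    "London": {"male": 4532, "female": 4943},
    "Berlin": {"male": 2354, "female": 2534},
}


def cells(table, year):
    """Yield one record per cell, each carrying its value and its dimensions."""
    for city, columns in table.items():
        for genre, value in columns.items():
            yield {"value": value, "year": year, "city": city, "genre": genre}


items = list(cells(TABLE, 2008))
print(len(items))  # 4 rows x 2 columns = 8 cells
```

Each record now stands on its own: the value plus the dataset, row and column dimensions that identify it.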
A natural choice of ontology for modelling statistical data is SCOVO, since all of its properties and concepts revolve around the cell object. The following figure shows the cell-centric (or item-centric) view of the SCOVO ontology.
In statistical data each single cell becomes an instance of Item and our task is to assign to each single cell the correct value, using the "value" property, and all dimensions that identify the single cell in the table.
Expressed in the SCOVO ontology, our example above becomes:
@prefix ex: <http://example.org/tourists/> .
@prefix scv: <http://purl.org/NET/scovo#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# domain schema definitions
ex:Year rdfs:subClassOf scv:Dimension ;
    dc:title "year" ; .
ex:City rdfs:subClassOf scv:Dimension ;
    dc:title "A city" ; .
ex:Genre rdfs:subClassOf scv:Dimension ;
    dc:title "genre" ; .

ex:2008 rdf:type ex:Year ;
    dc:title "The year 2008" ;
    scv:min "2008-01-01"^^xsd:date ;
    scv:max "2008-12-31"^^xsd:date ; .

ex:rome rdf:type ex:City ; dc:title "City of Rome" .
ex:paris rdf:type ex:City ; dc:title "City of Paris" .
ex:london rdf:type ex:City ; dc:title "City of London" .
ex:berlin rdf:type ex:City ; dc:title "City of Berlin" .

ex:male rdf:type ex:Genre ; dc:title "male" .
ex:female rdf:type ex:Genre ; dc:title "female" .

ex:tourists rdf:type scv:Dataset ;
    dc:title "Tourists in the 2008" ;
    scv:datasetOf ex:2008-rome-m ;
    scv:datasetOf ex:2008-rome-f ;
    scv:datasetOf ex:2008-paris-m ;
    scv:datasetOf ex:2008-paris-f ;
    scv:datasetOf ex:2008-london-m ;
    scv:datasetOf ex:2008-london-f ;
    scv:datasetOf ex:2008-berlin-m ;
    scv:datasetOf ex:2008-berlin-f .

ex:2008-rome-m rdf:type scv:Item ;
    rdf:value "1324" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:rome ;
    scv:dimension ex:male ;
    scv:dimension ex:2008 ; .

ex:2008-rome-f rdf:type scv:Item ;
    rdf:value "1432" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:rome ;
    scv:dimension ex:female ;
    scv:dimension ex:2008 ; .

ex:2008-paris-m rdf:type scv:Item ;
    rdf:value "2432" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:paris ;
    scv:dimension ex:male ;
    scv:dimension ex:2008 ; .

ex:2008-paris-f rdf:type scv:Item ;
    rdf:value "2654" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:paris ;
    scv:dimension ex:female ;
    scv:dimension ex:2008 ; .

ex:2008-london-m rdf:type scv:Item ;
    rdf:value "4532" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:london ;
    scv:dimension ex:male ;
    scv:dimension ex:2008 ; .

ex:2008-london-f rdf:type scv:Item ;
    rdf:value "4943" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:london ;
    scv:dimension ex:female ;
    scv:dimension ex:2008 ; .

ex:2008-berlin-m rdf:type scv:Item ;
    rdf:value "2354" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:berlin ;
    scv:dimension ex:male ;
    scv:dimension ex:2008 ; .

ex:2008-berlin-f rdf:type scv:Item ;
    rdf:value "2534" ;
    scv:dataset ex:tourists ;
    scv:dimension ex:berlin ;
    scv:dimension ex:female ;
    scv:dimension ex:2008 ; .
As we can see, each item has a unique URI so that it can be identified inside and outside the table. We can build the URI however we want, but a good way is to assemble it from the dimension pieces.
Let us focus for a moment on the female tourists of Berlin. The URI of this item is composed of pieces obtained from the URI home, the dataset and the dimensions.
We can see that in the following figure:
As we can see, each item can be generated simply by assembling the respective information pieces. The goal of this tutorial is precisely to automate the whole procedure with the TripliSty tool.
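The assembly step can be sketched in a few lines of Python. This is only an illustration of the idea, not TripliSty code: the `item_uri` function and its parameters are hypothetical, and the pieces are those of the Berlin female item from the SCOVO example.

```python
# Illustrative only: the function and parameter names are hypothetical.
def item_uri(uri_home, dataset, pieces):
    """Join the URI home, the dataset name and the dimension pieces."""
    return uri_home.rstrip("/") + "/" + dataset + "/" + "-".join(pieces)


uri = item_uri("http://example.org/", "tourists", ["2008", "berlin", "f"])
print(uri)  # http://example.org/tourists/2008-berlin-f
```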
The TripliSty tool automates the semantic mapping of statistics and provides a set of features to publish the results on the fly following the LOD (Linked Open Data) rules. The SCOMA (SCOvo MApping) ontology was defined to describe how an input source (database, xls file, csv file) maps to the SCOVO ontology, that is, which dimensions apply to each item.
TripliSty takes a SCOMA mapping file as input and returns the statistics converted to RDF as specified in the mapping file.
To explain how TripliSty works, the first step is to understand how to write the SCOMA mapping file.
Imagine the above data located in a database table called "tourists".
First of all we have to decide the structure of our URIs. So we have:
Obviously the "localhost" domain must be replaced with your real domain. Here we have chosen "localhost" in order to publish the final results on our local PC.
Each SCOMA mapping file must have a ScomaConfig object.
# Definition of the class "ScomaConfig"
map:scomacfg a scm:ScomaConfig;
scm:hasDataset map:mydataset;
scm:uriHome "http://localhost/triplisty_example/";
scm:uriSchemaHome "http://localhost/triplisty_example/schema/";
scm:uriDimensionHome "http://localhost/triplisty_example/dimension/";
scm:uriDatasetHome "http://localhost/triplisty_example/dataset/"; .
A mapping file can define one or more datasets. For simplicity, our example considers only one dataset, referenced by the "mydataset" resource.
# Definition of the class "Dataset"
map:mydataset a scm:Dataset;
    scm:sourceType "db";
    scm:hasSource map:mysource;
    scm:hasKey map:key;
    scm:hasField map:field1;
    scm:hasField map:field2;
    scm:uriDataset "http://localhost/triplisty_example/dataset/tourists";
    scm:title "Dataset: tourists";
    scm:label "Dataset: tourists";
    scm:hasDimension map:dim_2008;
    .
For each dataset we must define a source type, in our case "db" (the other values are "xls" and "csv"). The Dataset object defines the structure of our dataset, which is composed of a key and two fields: in our example the key is the city column and the fields are the male and female columns. After defining the dataset URI, title and label, we specify the single dataset dimension.
Now we are going to explore how the datasource resource is composed.
# Definition of the class "Source"
map:mysource a scm:Database;
    scm:dbType "mysql";
    scm:dbName "scoma";
    scm:hostname "localhost";
    scm:table "tourists";
    scm:username "root";
    scm:password "mypassword";
    .
We must define the database type (default "mysql"; "postgres" is also available), the database name, the hostname, the username, the password and, of course, the table name. Other parameters give more flexibility, for example joins between tables, a size limit, and so forth. See the SCOMA Definition Language.
# Definition of the class "Key"
map:key a scm:Key;
    scm:column "city";
    scm:hasDimension map:dim_city;
    scm:order "1";
    .

map:dim_city a scm:DimensionInstance;
    scm:hasDimensionType :City;
    scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/city/@@city@@";
    scm:uriPiece "-@@city@@";
    scm:index "1";
    scm:titlePiece ", city: @@city@@";
    scm:labelPiece ", city: @@city@@";
    .
The Key class is perhaps the most interesting class. Each dataset can have one or more keys, and each key must have an associated column. In our example this is the "city" column of our table. As we said, the key represents our row dimension. Unlike the dataset and column dimensions, which can simply be listed in the SCOMA mapping file, the row dimensions can be very numerous (one row dimension for each row). Since we do not want to define every row dimension of the table by hand (there could be thousands), we define a URI pattern that tells TripliSty how to compose each row dimension for us (see the uriDimensionInstance property of the dim_city resource). The same principle applies to the item URI: every item URI contains a piece taken from the row dimension, so we must also define the URI piece that changes from row to row (the uriPiece property). It is defined with the same kind of pattern, as you can see in the code.
In the URI pattern we must enclose the changing part between "@@" markers and set an index property to ensure that the pieces are composed correctly when there are multiple keys. The title and label must obviously change from row to row too, so their text is defined with the same pattern strategy. The order property of the Key resource defines the position of the column in the table.
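The "@@" pattern strategy can be illustrated with a short Python sketch. It is not TripliSty's implementation, only one possible way to expand the placeholders; the `expand` helper and the `row` dictionary are assumptions made for the example.

```python
import re

# Illustrative only: a possible way to expand "@@column@@" placeholders.
PLACEHOLDER = re.compile(r"@@(\w+)@@")


def expand(pattern, row):
    """Replace every @@column@@ marker with the row's value for that column."""
    return PLACEHOLDER.sub(lambda match: str(row[match.group(1)]), pattern)


row = {"city": "Berlin", "male": 2354, "female": 2534}
print(expand("http://localhost/triplisty_example/dimension/city/@@city@@", row))
print(expand("-@@city@@", row))
print(expand(", city: @@city@@", row))
```

A string without markers passes through unchanged, which is why the same helper would also cover static pieces.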
# Definition of the class "Field"
map:field1 a scm:Field;
    scm:hasDimension map:dim_male;
    scm:datatype "integer";
    scm:order "2";
    scm:column "male";
    .

map:field2 a scm:Field;
    scm:hasDimension map:dim_female;
    scm:datatype "integer";
    scm:order "3";
    scm:column "female";
    .

map:dim_male a scm:DimensionInstance;
    scm:hasDimensionType :Genre;
    scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/genre/male";
    scm:uriPiece "-m";
    scm:index "1";
    scm:titlePiece ", genre: male";
    scm:labelPiece ", genre: male";
    .

map:dim_female a scm:DimensionInstance;
    scm:hasDimensionType :Genre;
    scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/genre/female";
    scm:uriPiece "-f";
    scm:index "1";
    scm:titlePiece ", genre: female";
    scm:labelPiece ", genre: female";
    .

map:dim_2008 a scm:DimensionInstance;
    scm:hasDimensionType :Year;
    scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/year/2008";
    scm:uriPiece "/2008";
    scm:index "1";
    scm:titlePiece ", year: 2008";
    scm:labelPiece ", year: 2008";
    .
This code shows how to map a column name to one or more dimension resources. For each field you can define a datatype for the values in the column (not mandatory) and the order, which gives the position of the field in the table (useful when you have more than one dimension in the same column).
For each dimension associated with a column we must define a DimensionInstance resource, following the same rules seen above for the Key class. This time the URI, title and label are static.
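As a rough illustration of what the order and datatype properties describe, the following Python sketch picks a cell by its column position and casts it. It rests on assumptions: that order is a 1-based position, and that the datatype names map to the Python types shown. The `read_cell` helper is hypothetical and is not TripliSty code.

```python
# Illustrative only: a hypothetical reading of scm:order and scm:datatype.
CASTS = {"integer": int, "float": float, "string": str}


def read_cell(row, order, datatype=None):
    """Pick the value at the 1-based column position and cast it if asked."""
    raw = row[order - 1]
    return CASTS[datatype](raw) if datatype else raw


row = ("Berlin", "2354", "2534")  # city, male, female
print(read_cell(row, 1))             # key column, left as text
print(read_cell(row, 3, "integer"))  # female column as an integer
```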
# Domain schema definitions
:Year rdfs:subClassOf scv:Dimension ;
    dc:title "Year" .
:City rdfs:subClassOf scv:Dimension ;
    dc:title "A city" .
:Genre rdfs:subClassOf scv:Dimension ;
    dc:title "Genre" .
Finally, we must define the dimension types, simply by extending the SCOVO ontology as we have seen above.
# By command line
python -m triplisty_console -m mapping-file [-r request] [-D] [-f file_to_save]

# To generate the whole table
python -m triplisty_console -m c://triplisty_example/tourists.n3 -r http://localhost/triplisty_example/dataset/tourists

# To generate a single item
python -m triplisty_console -m c://triplisty_example/tourists.n3 -r http://localhost/triplisty_example/dataset/tourists/2008-Berlin-m

# To dump all datasets in the mapping file
python -m triplisty_console -m c://triplisty_example/tourists.n3 -D

# To save the output in a specified file
python -m triplisty_console -m c://triplisty_example/tourists.n3 -r http://localhost/triplisty_example/dataset/tourists -f myfile.rdf
To publish statistics on the Web with TripliSty, a lod.py python script has been created as follows:
#!/usr/bin/python
# CGI script (Python 2 syntax): serves the RDF for the requested URI
import cgi
from triplisty.engine import Engine

# HTTP header followed by the mandatory blank line
print "Content-type: application/rdf+xml"
print

# The "uri" parameter is filled in by the rewrite rules
query = cgi.FieldStorage()
uri = query.getvalue("uri")

engine = Engine()
engine.set_mapping_file("c://triplisty_example/tourists.n3")
engine.set_request(uri)
engine.run()

Inside the triplisty_example directory an .htaccess file has been prepared as follows:
RewriteEngine On
RewriteBase /triplisty_example
RewriteRule ^dataset/(.+) lod.py?uri=http://localhost/triplisty_example/dataset/$1
RewriteRule ^dimension/(.+) lod.py?uri=http://localhost/triplisty_example/dimension/$1
RewriteRule ^schema(.*) lod.py?uri=http://localhost/triplisty_example/schema$1
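To see what these rules do to a request, the following Python sketch mimics the three RewriteRule lines with regular expressions. It is only an approximation for illustration (Apache's mod_rewrite has more behaviour than this), and the `rewrite` function is hypothetical.

```python
import re

BASE = "http://localhost/triplisty_example/"

# (pattern, substitution) pairs mirroring the three RewriteRule lines.
RULES = [
    (r"^dataset/(.+)", "lod.py?uri=" + BASE + r"dataset/\1"),
    (r"^dimension/(.+)", "lod.py?uri=" + BASE + r"dimension/\1"),
    (r"^schema(.*)", "lod.py?uri=" + BASE + r"schema\1"),
]


def rewrite(path):
    """Return the internal target for a path relative to RewriteBase, or None."""
    for pattern, target in RULES:
        match = re.match(pattern, path)
        if match:
            return match.expand(target)
    return None


print(rewrite("dataset/tourists/2008-Berlin-m"))
```

In other words, every public dataset, dimension or schema URI is handed to lod.py as its "uri" parameter, and the script asks the engine to generate the matching RDF.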
That's all !!!
Now you can navigate through your statistical data, for example with the Tabulator Firefox plugin.
The whole table:
The single Item:
A static dimension
A dynamic dimension
A dimension type (dimension class)
All triples are generated on the fly, so the data are stored only in the database (or in the xls/csv files).
There are many other aspects and features you can specify in the SCOMA mapping file. For example:
A scm:ScomaConfig defines all the datasets to map and the information about the URI scheme.
# Definition of the class "ScomaConfig"
map:scomacfg a scm:ScomaConfig;
    scm:hasDataset map:mydataset;
    scm:uriHome "http://localhost/triplisty_example/";
    scm:uriSchemaHome "http://localhost/triplisty_example/schema/";
    scm:uriDimensionHome "http://localhost/triplisty_example/dimension/";
    scm:uriDatasetHome "http://localhost/triplisty_example/dataset/";
    .
A scm:Dataset defines the schema of the dataset to map.
scm:sourceType | Defines the input source type. Available values: db, xls, csv. (Mandatory) |
scm:hasSource | Defines the input source. (Mandatory) |
scm:hasKey | Defines the key column(s) of the schema. (Mandatory) |
scm:hasField | Defines the field column(s) of the schema. (Mandatory) |
scm:uriDataset | Defines the dataset URI. (Mandatory) |
scm:title | Defines the dataset title. (Optional) |
scm:label | Defines the dataset label. (Optional) |
scm:hasDimension | Defines the dataset dimension(s). These dimensions are applied to every item of the dataset. |
scm:sizeDataset | Defines a limit for the dataset size. Useful for very large datasets. (Not yet implemented) |
scm:externalTriples | Defines the name of the function to invoke in the item generation procedure. The function must be present in the triplisty.functions module; use it when you want to generate triples to add on the fly to the dataset (e.g. links to external datasets such as geonames or dbpedia). See the DimensionInstance example below. |
# Definition of the class "Dataset"
map:mydataset a scm:Dataset;
    scm:sourceType "db";
    scm:hasSource map:mysource;
    scm:hasKey map:key;
    scm:hasField map:field1;
    scm:hasField map:field2;
    scm:uriDataset "http://localhost/triplisty_example/dataset/tourists";
    scm:title "Dataset: tourists";
    scm:label "Dataset: tourists";
    scm:hasDimension map:dim_2008;
    .
A scm:Database defines the database input source
scm:dbType | Defines the type of database. Available values: mysql, postgres. (Mandatory) |
scm:dbName | Defines the database name. (Mandatory) |
scm:hostname | Defines the hostname. (Mandatory) |
scm:username | Defines the username. (Mandatory) |
scm:password | Defines the password. (Mandatory) |
scm:table | Defines the table(s). (Mandatory) |
scm:condition | Defines the condition in the WHERE clause. (Optional) |
scm:join | Defines the join between tables. (Optional) |
scm:extra | Defines an extra string to add to the query (for example LIMIT, ORDER BY ... DESC). |
# Definition of the class "Database"
map:mysource a scm:Database;
    scm:dbType "mysql";
    scm:dbName "scoma";
    scm:hostname "localhost";
    scm:table "tourists";
    scm:username "root";
    scm:password "mypassword";
    .
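How the table, condition and extra settings might combine into a query can be sketched as follows. This is a guess at the general shape, not TripliSty's actual query builder: the `build_query` helper is hypothetical, scm:join is left out because its format is not described here, and an in-memory SQLite database stands in for MySQL so that the example runs anywhere.

```python
import sqlite3


def build_query(columns, table, condition=None, extra=None):
    """Assemble a SELECT statement from mapping-style settings."""
    query = "SELECT " + ", ".join(columns) + " FROM " + table
    if condition:
        query += " WHERE " + condition
    if extra:
        query += " " + extra
    return query


# In-memory SQLite stands in for the MySQL "scoma" database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tourists (city TEXT, male INTEGER, female INTEGER)")
connection.executemany(
    "INSERT INTO tourists VALUES (?, ?, ?)",
    [("Rome", 1324, 1432), ("Paris", 2432, 2654),
     ("London", 4532, 4943), ("Berlin", 2354, 2534)],
)

sql = build_query(["city", "male", "female"], "tourists",
                  condition="male > 2000", extra="ORDER BY city")
rows = connection.execute(sql).fetchall()
print(sql)
print(rows)
```

The strings are concatenated directly because they come from a mapping file written by the publisher; values coming from untrusted users would need parameter binding instead.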
A scm:XLSFile defines the XLS file input source
scm:pathfile | Defines the path of the file to load. (Mandatory) |
scm:indexSheet | Defines the index of the sheet to select in the xls file. (Optional - default: 0) |
scm:nameSheet | Defines the name of the sheet to select in the xls file. (Optional) |
scm:startRow | Defines the first row. (Optional - default: 1) |
scm:endRow | Defines the last row. (Optional) |
# Definition of the class "XLSFile"
map:mysource a scm:XLSFile;
    scm:pathfile "C://scoma/table.xls";
    scm:indexSheet "0"; # default 0
    # or scm:nameSheet "sheet1"; (alternative to indexSheet)
    scm:startRow "1"; # default 1
    scm:endRow "10";
    .
A scm:CSVFile defines the CSV file input source
scm:pathfile | Defines the path of the file to load. (Mandatory) |
scm:fieldsEnclosedBy | Defines the field delimiter. (Optional - default: ";") |
scm:linesTerminatedBy | Defines the line terminator. (Optional - default: "\n") |
# Definition of the class "CSVFile"
map:mysource a scm:CSVFile;
    scm:pathfile "C://scoma/table.csv";
    scm:fieldsEnclosedBy "\t"; # default ";"
    scm:linesTerminatedBy "\n"; # default "\n"
    .
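For a CSV source, the delimiter setting and the numeric column positions can be pictured with Python's standard csv module. The sketch below is illustrative and makes no claim about how TripliSty reads files: the `read_rows` and `column` helpers and the inline sample data are assumptions.

```python
import csv
import io

# Illustrative only: stand-in data, tab-separated as in the mapping above.
DATA = "Rome\t1324\t1432\nParis\t2432\t2654\n"


def read_rows(text, delimiter=";"):
    """Parse delimited text into rows; callers address columns as 1, 2, 3..."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def column(row, position):
    """Return the value at a 1-based column position."""
    return row[position - 1]


rows = read_rows(DATA, delimiter="\t")
print(column(rows[0], 1), column(rows[0], 2))
```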
A scm:Key defines a key column of the dataset schema
scm:column | Defines the column to map in the table. (Mandatory) |
scm:hasDimension | Defines the key dimension. This dimension is applied to every item belonging to the same row. (Mandatory - exactly one) |
scm:order | Defines the position of the column in the table. (Mandatory) |
scm:transformFunction | Defines the name of the function to apply in the item generation procedure. The function must be present in the triplisty.functions module. This property is useful when you want to transform the key value (e.g. if the value in the table is "City of Rome", you could specify a function that replaces the spaces with underscores in order to export the value "City_of_Rome"). See the example below. |
# Definition of the class "Key"
map:key a scm:Key;
    scm:column "city"; # A,B,C for XLS files, 1,2,3 for CSV files
    scm:hasDimension map:dim_city;
    scm:order "1";
    scm:transformFunction "replace_space" .
For each key value, TripliSty generates the item by invoking on the fly the replace_space Python function defined in the triplisty.functions module:
def replace_space(self, value):
    return value.strip().replace(" ", "_")
The value to transform is passed as an argument, so you can convert it however you want.
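One way to picture the lookup by name is the following Python sketch. It is an assumption about the mechanism, not TripliSty's code: the `Functions` class stands in for the triplisty.functions module, and `apply_transform` is a hypothetical helper.

```python
# Illustrative only: "Functions" stands in for the triplisty.functions module.
class Functions:
    def replace_space(self, value):
        return value.strip().replace(" ", "_")


def apply_transform(functions, name, value):
    """Look up the function named in scm:transformFunction and apply it."""
    if not name:
        return value  # no transformFunction declared: keep the value as is
    return getattr(functions, name)(value)


print(apply_transform(Functions(), "replace_space", "City of Rome"))  # City_of_Rome
```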
A scm:Field defines a field column of the dataset schema
scm:column | Defines the column to map in the table. (Mandatory) |
scm:hasDimension | Defines the column dimension(s). These dimensions are applied to the items belonging to the same column. (Mandatory - one or more) |
scm:order | Defines the position of the column in the table. (Mandatory) |
scm:datatype | Defines the datatype of the cell value. (Optional - available values: integer, float, string, etc. See XML Schema) |
# Definition of the class "Field"
map:field1 a scm:Field;
    scm:column "male"; # A,B,C for XLS files, 1,2,3 for CSV files
    scm:hasDimension map:dim_male;
    scm:datatype "integer";
    scm:order "2";
    .
A scm:DimensionInstance defines a dimension instance applied to the items
scm:hasDimensionType | Defines the type of the dimension instance (one of the SCOVO Dimension subclasses). (Mandatory) |
scm:uriDimensionInstance | Defines the dimension instance URI (static or dynamic). (Mandatory) |
scm:uriPiece | Defines the URI piece for the item generation procedure. (Mandatory) See the example below. |
scm:titlePiece | Defines the title piece for the item generation procedure. (Optional) See the example below. |
scm:labelPiece | Defines the label piece for the item generation procedure. (Optional) See the example below. |
scm:index | Defines the index of the piece in the URI. Useful when we have multiple keys or multiple dimensions in the same field. (Mandatory) |
scm:externalTriples | Defines the name of the function to invoke in the RDF generation procedure. The function must be present in the triplisty.functions module; use it when you want to generate triples to add on the fly to the dataset (e.g. links to external datasets such as geonames or dbpedia). See the example below. |
In the dimension instance resources we define how TripliSty must build each single Item URI. A dimension can be static or dynamic.
Example of static dimension
map:dim_male a scm:DimensionInstance;
    scm:hasDimensionType :Genre;
    scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/genre/male";
    scm:uriPiece "-m";
    scm:index "1";
    scm:titlePiece ", genre: male";
    scm:labelPiece ", genre: male";
    .
The dataset and field dimension instances are examples of static dimensions: their uriDimensionInstance, uriPiece, titlePiece and labelPiece values are static. Each dataset dimension instance URI is applied to every Item of the table, and each field dimension instance URI is applied to every Item of the same column. The same holds for uriPiece, titlePiece and labelPiece.
Example of dynamic dimension
map:dim_city a scm:DimensionInstance;
    scm:hasDimensionType :City;
    scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/city/@@city@@";
    scm:uriPiece "-@@city@@";
    scm:index "1";
    scm:titlePiece ", city: @@city@@";
    scm:labelPiece ", city: @@city@@";
    scm:externalTriples "sameas_geonames";
    .
In this case the key dimension instance URI, the uriPiece, the titlePiece and the labelPiece are applied to every item of the same row. Since we do not want to write each row dimension instance by hand (there could be thousands of rows), we specify a pattern in the URI. The part between the @@ markers names the key column, which TripliSty replaces with the actual key value during item generation. The same holds for uriPiece, titlePiece and labelPiece.
For each dimension instance you can also specify the externalTriples property in order to generate external RDF triples on the fly and add them to the dimension instance resource. Its value is the name of a function that must be present in the triplisty.functions module. If, for example, we want to link each city dimension instance to its geonames RDF resource, we can define the following Python function:
def sameas_geonames(self, subject, dimension, value):
    import urllib
    from rdflib import ConjunctiveGraph
    from rdflib import URIRef

    # URL-encode the key value before using it as the search term
    value = urllib.quote(value)
    client = ConjunctiveGraph()
    graph = ConjunctiveGraph()
    # Ask geonames for the best match and parse the RDF answer
    client.parse("http://ws.geonames.org/searchRDF?q=" + value + "&maxRows=1")
    # Take the subject of the first geonames Feature found
    city_uri = list(client.triples((None, Utils.RDF['type'], URIRef('http://www.geonames.org/ontology#Feature'))))[0][0]
    # Link the dimension instance to the geonames resource
    graph.add((URIRef(subject), URIRef("http://www.w3.org/2002/07/owl#sameAs"), city_uri))
    return graph
The parameters passed to the function are:
In this way, when the city dimension instances are queried, TripliSty adds the new sameAs triple on the fly, making it possible to navigate from one dataset to another.
Finally, we must define the dimension types as in a normal SCOVO ontology.
# Domain schema definitions
:Year rdfs:subClassOf scv:Dimension ;
    dc:title "Year" .
:City rdfs:subClassOf scv:Dimension ;
    dc:title "A city" .
:Genre rdfs:subClassOf scv:Dimension ;
    dc:title "Genre" .
A (very early, refactoring in progress) beta version of TripliSty is available here.
Bug reports, feedback and suggestions are welcome. Write to gpirrotta@unime.it