The TripliSty tool was created to automate the mapping between statistical data and the SCOVO ontology. To work correctly, it needs a mapping file written in the SCOMA (SCOma MApping) ontology definition language, as described in the next sections.
In our examples, imagine we want to map the following table representing the number of tourists in some cities in 2008 (sample data).
A scm:ScomaConfig defines all the datasets to map and the information about the URI schema.
    # Definition of the class "ScomaConfig"
    map:scomacfg a scm:ScomaConfig;
        scm:hasDataset map:mydataset;
        scm:uriHome "http://localhost/triplisty_example/";
        scm:uriSchemaHome "http://localhost/triplisty_example/schema/";
        scm:uriDimensionHome "http://localhost/triplisty_example/dimension/";
        scm:uriDatasetHome "http://localhost/triplisty_example/dataset/";
        .
A scm:Dataset defines the schema of the dataset to map.
scm:sourceType | Defines the input source type. Available values: db, xls, csv. (Mandatory) |
scm:hasSource | Defines the input source. (Mandatory) |
scm:hasKey | Defines the schema key(s) column (Mandatory) |
scm:hasField | Defines the schema field(s) column in the schema. (Mandatory) |
scm:uriDataset | Defines the Dataset URI (Mandatory) |
scm:title | Defines the Dataset title (optional) |
scm:label | Defines the Dataset label (optional) |
scm:hasDimension | Defines the dataset dimension(s). These dimensions are applied to each dataset item. |
scm:sizeDataset | Defines a limit for the dataset size. Useful for very large datasets. (Not yet implemented) |
scm:externalTriples | Defines the name of the function to invoke in the Item generating procedure. The function must be present in the triplisty.functions module. Use it when you want to generate triples on the fly and add them to the dataset (e.g. links to external datasets such as GeoNames or DBpedia). See the DimensionInstance example below. |
    # Definition of the class "Dataset"
    map:mydataset a scm:Dataset;
        scm:sourceType "db";
        scm:hasSource map:mysource;
        scm:hasKey map:key;
        scm:hasField map:field1;
        scm:hasField map:field2;
        scm:uriDataset "http://localhost/triplisty_example/dataset/tourists";
        scm:title "Dataset: tourists";
        scm:label "Dataset: tourists";
        scm:hasDimension map:dim_2008;
        .
A scm:Database defines the database input source.
scm:dbType | Defines the type of database. Available values: mysql, postgres. (Mandatory) |
scm:dbName | Defines the database name. (Mandatory) |
scm:hostname | Defines the hostname (Mandatory) |
scm:username | Defines the username. (Mandatory) |
scm:password | Defines the password (Mandatory) |
scm:table | Defines the table(s) (Mandatory) |
scm:condition | Defines the condition in the WHERE clause (Optional) |
scm:join | Defines the join between tables (Optional) |
scm:extra | Defines an extra string to append to the query (for example LIMIT ..., ORDER BY ... DESC). |
    # Definition of the class "Database"
    map:mysource a scm:Database;
        scm:dbType "mysql";
        scm:dbName "scoma";
        scm:hostname "localhost";
        scm:table "tourists";
        scm:username "root";
        scm:password "mypassword";
        .
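To make the role of scm:table, scm:condition and scm:extra more concrete, the sketch below shows one way such values could be combined into a SELECT statement. It is only an illustration: the function name build_query is hypothetical, the SELECT * shape is an assumption, and scm:join is left out because its expected format is not specified here. The query TripliSty actually builds may differ.

```python
def build_query(table, condition=None, extra=None):
    """Assemble a SELECT statement from mapping values (hypothetical helper)."""
    query = "SELECT * FROM " + table
    if condition:
        # scm:condition is described as the condition of the WHERE clause
        query += " WHERE " + condition
    if extra:
        # scm:extra is described as an extra string added to the query
        query += " " + extra
    return query


print(build_query("tourists"))
print(build_query("tourists", condition="male > 100", extra="ORDER BY city LIMIT 10"))
```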
A scm:XLSFile defines the XLS file input source
scm:pathfile | Defines the path of the file to load. (Mandatory) |
scm:indexSheet | Defines the index of the sheet to select in the XLS file. (Optional - default: 0) |
scm:nameSheet | Defines the name of the sheet to select in the XLS file. (Optional) |
scm:startRow | Defines the first row. (Optional - default: 1) |
scm:endRow | Defines the last row (Optional) |
    # Definition of the class "XLSFile"
    map:mysource a scm:XLSFile;
        scm:pathfile "C://scoma/table.xls";
        scm:indexSheet "0";    # default 0
        # or scm:nameSheet "sheet1"; (alternative to indexSheet)
        scm:startRow "1";      # default 1
        scm:endRow "10";
        .
A scm:CSVFile defines the CSV file input source.
scm:pathfile | Defines the path of the file to load. (Mandatory) |
scm:fieldsEnclosedBy | Defines the fields delimiter (Optional - default: ";") |
scm:linesTerminatedBy | Defines the line terminator (Optional - default: "\n") |
    # Definition of the class "CSVFile"
    map:mysource a scm:CSVFile;
        scm:pathfile "C://scoma/table.csv";
        scm:fieldsEnclosedBy "\t";      # default ";"
        scm:linesTerminatedBy "\n";     # default "\n"
        .
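As an illustration of what the field delimiter and the line terminator mean, the sketch below splits a small tab-separated text into rows with Python's standard csv module. The function name read_rows and the sample values are assumptions made for this example, and this is not necessarily how TripliSty reads CSV files internally.

```python
import csv


def read_rows(text, fields_enclosed_by=";", lines_terminated_by="\n"):
    """Split CSV text into rows of cells (hypothetical helper)."""
    # Split on the line terminator first, then let csv split each line into cells.
    lines = [line for line in text.split(lines_terminated_by) if line]
    return list(csv.reader(lines, delimiter=fields_enclosed_by))


sample = "Rome\t120\t130\nMilan\t80\t95\n"  # invented sample values
for row in read_rows(sample, fields_enclosed_by="\t"):
    print(row)
```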
A scm:Key defines a key column of the dataset schema.
scm:column | Defines the column to map in the table (Mandatory) |
scm:hasDimension | Defines the key dimension. This dimension will be applied to each item belonging to the same row. (Mandatory - exactly one) |
scm:order | Defines the table column position. (Mandatory) |
scm:transformFunction | Defines the name of the function to apply in the Item generating procedure. The function must be present in the triplisty.functions module. This property is useful when you want to transform the key value (e.g. if the value in the table is "City of Rome", you could specify a function that replaces the spaces with underscores in order to export the value "City_of_Rome"). See the example below. |
    # Definition of the class "Key"
    map:key a scm:Key;
        scm:column "city";    # A,B,C for XLS files, 1,2,3 for CSV files
        scm:hasDimension map:dim_city;
        scm:order "1";
        scm:transformFunction "replace_space" .
For each key, the TripliSty tool generates the item by invoking, on the fly, the replace_space Python function defined in the triplisty.functions module:

    def replace_space(self, value):
        return value.strip().replace(" ", "_")

The value to transform is passed as an argument, so you can convert it as you want.
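Other transform functions can follow the same shape. The sketch below places replace_space next to a second, hypothetical function, to_lowercase_id, inside a stand-in class. The class name Functions and the new function are assumptions for illustration only, since the real layout of triplisty.functions is not shown here.

```python
class Functions(object):
    """Stand-in for the container of transform functions (assumed name)."""

    def replace_space(self, value):
        # Same behaviour as the replace_space function shown above.
        return value.strip().replace(" ", "_")

    def to_lowercase_id(self, value):
        # Hypothetical variant: also lower-cases the key value.
        return value.strip().lower().replace(" ", "_")


functions = Functions()
print(functions.replace_space("City of Rome"))
print(functions.to_lowercase_id("City of Rome"))
```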
A scm:Field defines a field column of the dataset schema.
scm:column | Defines the column to map in the table (Mandatory) |
scm:hasDimension | Defines the column dimension(s). This dimension will be applied to the items belonging to the same column. (Mandatory - one or more) |
scm:order | Defines the table column position. (Mandatory) |
scm:datatype | Defines the datatype of the cell value. (Optional - available values: integer, float, string, etc. See XML Schema) |
    # Definition of the class "Field"
    map:field1 a scm:Field;
        scm:column "male";    # A,B,C for XLS files, 1,2,3 for CSV files
        scm:hasDimension map:dim_male;
        scm:datatype "integer";
        scm:order "2";
        .
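The datatype names refer to XML Schema datatypes. As a rough sketch of what a typed value looks like, the helper below renders a cell value as an N-Triples style typed literal. The function name typed_literal and the accepted names are assumptions for this example; the set of datatypes TripliSty accepts, and the way it emits literals, may differ.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed subset of datatype names, taken from the examples in the text.
KNOWN_DATATYPES = ("integer", "float", "string")


def typed_literal(value, datatype):
    """Render a cell value as a typed literal (hypothetical helper)."""
    if datatype not in KNOWN_DATATYPES:
        raise ValueError("unknown datatype: " + datatype)
    # No escaping of quotes inside the value is attempted in this sketch.
    return '"%s"^^<%s%s>' % (value, XSD, datatype)


print(typed_literal(120, "integer"))
```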
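A scm:DimensionInstance defines a dimension instance and its contribution to the URI, title and label of each item.
scm:hasDimensionType | Defines the dimension instance type (one of the SCOVO Dimension subclasses) (Mandatory) |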
scm:uriDimensionInstance | Defines the dimension instance URI (static or dynamic URIs) (Mandatory) |
scm:uriPiece | Defines the URI piece for the item generating procedure. (Mandatory) See example below. |
scm:titlePiece | Defines the title piece for the item generating procedure. (Optional) See example below. |
scm:labelPiece | Defines the label piece for the item generating procedure. (Optional) See example below. |
scm:index | Defines the piece index in the URI. Useful when we have multiple keys or multiple dimensions in the same field. (Mandatory) |
scm:externalTriples | Defines the name of the function to invoke in the RDF generating procedure. The function must be present in the triplisty.functions module. Use it when you want to generate triples on the fly and add them to the dataset (e.g. links to external datasets such as GeoNames or DBpedia). See the example below. |
In the dimension instance resources we must define how the TripliSty tool builds each single Item URI. Dimensions can be static or dynamic.
Example of static dimension
    map:dim_male a scm:DimensionInstance;
        scm:hasDimensionType :Genre;
        scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/genre/male";
        scm:uriPiece "-m";
        scm:index "1";
        scm:titlePiece ", genre: male";
        scm:labelPiece ", genre: male";
        .
The dataset and field dimension instances are examples of static dimensions: their uriDimensionInstance, uriPiece, titlePiece and labelPiece values are fixed. The dataset dimension instance URI is applied to every Item in the table, and each field dimension instance URI is applied to every Item in the same column. The same holds for uriPiece, titlePiece and labelPiece.
Example of dynamic dimension
    map:dim_city a scm:DimensionInstance;
        scm:hasDimensionType :City;
        scm:uriDimensionInstance "http://localhost/triplisty_example/dimension/city/@@city@@";
        scm:uriPiece "-@@city@@";
        scm:index "1";
        scm:titlePiece ", city: @@city@@";
        scm:labelPiece ", city: @@city@@";
        scm:externalTriples "sameas_geonames";
        .
In this case the key dimension instance URI, the uriPiece, the titlePiece and the labelPiece are applied to each item belonging to the same row. Since we do not want to write a dimension instance by hand for every row (there could be thousands), we can specify a pattern in the URI. The name between the @@ markers identifies the key column, and the TripliSty tool replaces it with the actual key value during item generation. The same holds for uriPiece, titlePiece and labelPiece.
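The substitution of the @@...@@ placeholders can be pictured with the following sketch. It only illustrates the idea and is not TripliSty's own code: the function name fill_pattern and the dictionary of key values are assumptions.

```python
import re

# Matches a column name wrapped in @@ markers, e.g. @@city@@
PLACEHOLDER = re.compile(r"@@(\w+)@@")


def fill_pattern(pattern, key_values):
    """Replace each @@column@@ placeholder with the key value of that column."""
    # A column missing from key_values raises KeyError in this sketch.
    return PLACEHOLDER.sub(lambda match: key_values[match.group(1)], pattern)


row_keys = {"city": "City_of_Rome"}  # key value after a transform such as replace_space
print(fill_pattern("http://localhost/triplisty_example/dimension/city/@@city@@", row_keys))
print(fill_pattern(", city: @@city@@", row_keys))
```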
For each dimension instance you can also specify the externalTriples property, which generates external RDF triples on the fly and adds them to the dimension instance resource. Its value is the name of a function that must be present in the triplisty.functions module. For example, to link each city dimension instance to its GeoNames RDF resource, we can define the following Python function:
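    # ConjunctiveGraph and URIRef (from rdflib) and Utils are assumed to be
    # available at module level in triplisty.functions.
    def sameas_geonames(self, subject, dimension, column, value):
        import urllib
        value = urllib.quote(value)  # urllib.quote is the Python 2 API
        client = ConjunctiveGraph()
        graph = ConjunctiveGraph()
        # Ask GeoNames for the best match and parse the RDF answer
        client.parse("http://ws.geonames.org/searchRDF?q=" + value + "&maxRows=1")
        # Take the subject of the first GeoNames Feature found
        city_uri = list(client.triples((None, Utils.RDF['type'], URIRef('http://www.geonames.org/ontology#Feature'))))[0][0]
        # Link the dimension instance to the GeoNames resource
        graph.add((URIRef(subject), URIRef("http://www.w3.org/2002/07/owl#sameAs"), city_uri))
        return graph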
The parameters passed to the function are:
In this way, when the city dimension instances are queried, the TripliSty tool adds the new owl:sameAs triple on the fly, making it possible to navigate from one dataset to another.
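The sameas_geonames function above calls urllib.quote, which is the Python 2 location of that helper. For readers on Python 3, the request URL could be built as in the sketch below. The helper name geonames_search_url is an assumption, and only the URL construction is shown, not the RDF parsing.

```python
from urllib.parse import quote


def geonames_search_url(value, max_rows=1):
    """Build the search URL used by sameas_geonames (hypothetical helper)."""
    # quote() percent-encodes characters such as spaces in the city name.
    return "http://ws.geonames.org/searchRDF?q=%s&maxRows=%d" % (quote(value), max_rows)


print(geonames_search_url("New York"))
```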
Finally, we must define the dimension types, as in a normal SCOVO ontology:
    # Domain schema definitions
    :Year rdfs:subClassOf scv:Dimension ;
        dc:title "Year" .
    :City rdfs:subClassOf scv:Dimension ;
        dc:title "A city" .
    :Genre rdfs:subClassOf scv:Dimension ;
        dc:title "Genre" .
Feedback and suggestions are welcome. Write to gpirrotta@unime.it