Theatre Migrants
This project generates a knowledge graph about migrants in the theatre in Europe.
Running the scripts
The mapping scripts have been reimplemented in Rust for faster execution. Both
scripts must be run from this directory (mapping/).
Prerequisites: Start the MariaDB container before running step 1:
docker compose up -d
Step 1 — Direct Mapping from MariaDB to RDF (data/graph-01.ttl):
cargo run --release --bin step-01
Step 2 — Apply SPARQL UPDATE queries (data/graph-02.ttl):
cargo run --release --bin step-02
Alternatively, after installing with cargo install --path .:
step-01
step-02
Generating the ontology
The following steps describe how to generate the migrants RDF graph.
Step 1 - Loading the input data into a relational database
Task
The file teatre-migrants.sql contains the dump of a MariaDB database. The tables involved in this schema are described in the file db_schema.md. We will load this data in MariaDB to access the data with SQL. To this end:
- Create a Dockerfile to create a docker container for MariaDB.
- Upload the dump into a database in the container.
- Create a Ruby script map/step-01.rb that uses the gem sequel to connect to the database. This Ruby script should return a file called graph-01.ttl containing all the data from the tables loaded in the database using the direct mapping from relational databases to RDF.
Summary
The Dockerfile creates a MariaDB 10.11 container that automatically loads teatre-migrants.sql on first start. The docker-compose.yml exposes the database on port 3306 with a healthcheck.
The script map/step-01.rb connects to the database via sequel and implements the W3C Direct Mapping for all 9 tables (location, migration_table, organisation, person, person_profession, personnames, relationship, religions, work). Each table row becomes an RDF resource identified by its primary key, each column becomes a datatype property, and each foreign key becomes an object property linking to the referenced row. The output file graph-01.ttl contains 162,029 triples.
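The row-to-triples rule described above can be sketched without a database. This is a minimal illustration, not the code in map/step-01.rb: the base IRI, the helper names, the `ref-` prefix for foreign-key properties (borrowed from the W3C Direct Mapping), and the choice to keep the literal triple next to the reference triple are all assumptions. The subject and property IRIs follow the shape of the location example shown in Step 2.

```ruby
require "erb"

BASE = "http://example.org/base/" # hypothetical base IRI, not the project's
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
ESCAPES = { "\\" => "\\\\", "\"" => "\\\"", "\n" => "\\n", "\r" => "\\r" }.freeze

# Percent-encode a table name, column name, or key so it is safe inside an IRI.
def iri_part(value)
  ERB::Util.url_encode(value.to_s)
end

# Serialise a column value as an N-Triples literal.
def literal(value)
  case value
  when Integer then %("#{value}"^^<#{XSD}integer>)
  when Float   then %("#{value}"^^<#{XSD}double>)
  else
    escaped = value.to_s.gsub(/[\\"\n\r]/) { |c| ESCAPES.fetch(c) }
    %("#{escaped}")
  end
end

# Map one row (column => value hash) to N-Triples lines.
# foreign_keys maps a column name to the table it references (assumed layout).
def direct_map(table, row, primary_key, foreign_keys = {})
  table_iri = "#{BASE}#{iri_part(table)}"
  subject = "<#{table_iri}/#{iri_part(row.fetch(primary_key))}>"
  triples = ["#{subject} <#{RDF_TYPE}> <#{table_iri}> ."]
  row.each do |column, value|
    next if value.nil? # NULL cells produce no triple
    triples << "#{subject} <#{table_iri}##{iri_part(column)}> #{literal(value)} ."
    target = foreign_keys[column]
    next unless target
    # Foreign key: also link to the referenced row (assumed "ref-" naming).
    triples << "#{subject} <#{table_iri}#ref-#{iri_part(column)}> " \
               "<#{BASE}#{iri_part(target)}/#{iri_part(value)}> ."
  end
  triples
end

# Usage with a few values from the location example.
row = { "IDLocation" => "ARG-BahBlanca-00", "City" => "Bahia Blanca", "latitude" => -38.7253 }
puts direct_map("location", row, "IDLocation")
```

In the real script the rows would come from the database connection rather than from a hand-written hash.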
To run:
docker compose up -d
bundle exec ruby map/step-01.rb
Step 2 - Generate Objects
Values such as continents, countries, and cities should be objects (IRIs) instead of literals. To this end, we can transform the following data:
base:location\/ARG-BahBlanca-00 a base:location;
  base:location\#City "Bahia Blanca";
  base:location\#Continent "South America";
  base:location\#Country "Argentina";
  base:location\#GeoNamesID "3865086";
  base:location\#IDLocation "ARG-BahBlanca-00";
  base:location\#latitude -3.87253e1;
  base:location\#longitude -6.22742e1;
  base:location\#wikidata "Q54108";
  base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .
Into the following data:
base:location\/ARG-BahBlanca-00 a base:location;
  base:location\#City base:City-BahiaBlanca;
  base:location\#Continent base:Continent-SouthAmerica;
  base:location\#Country base:Country-Argentina;
  base:location\#GeoNamesID "3865086";
  base:location\#IDLocation "ARG-BahBlanca-00";
  base:location\#latitude -3.87253e1;
  base:location\#longitude -6.22742e1;
  base:location\#wikidata "Q54108";
  base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .

base:City-BahiaBlanca a base:City;
  rdfs:label "Bahia Blanca"@en .

base:Continent-SouthAmerica a base:Continent;
  rdfs:label "South America"@en .

base:Country-Argentina a base:Country;
  rdfs:label "Argentina"@en .
Notice that every rdfs:label value is tagged as English (@en).
Generate a SPARQL UPDATE query that performs this transformation for all elements of the table and save it in a new folder called updates. Do the same with the other tables, proposing which columns should be defined as objects. For every table, define a separate SPARQL UPDATE query and save it in the updates folder. Number the generated queries with a prefix such as 001, 002, 003, and so on.
After generating the update queries, write a Ruby script that applies the updates to the RDF graph generated in the previous step and saves the resulting graph as data/graph-02.ttl.
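One way to produce such queries is to generate their text from a (table, column, object type) triple. The sketch below is an illustration under assumptions: the base IRI, the function names, the `.rq` file naming, and the rule that an object IRI is the type name plus the value with all non-alphanumeric characters removed (inferred from "Bahia Blanca" becoming City-BahiaBlanca) are not taken from the repository.

```ruby
BASE = "http://example.org/base/" # hypothetical base IRI, not the project's

# Build the text of a SPARQL UPDATE that replaces the literal values of one
# column with typed objects carrying an English rdfs:label.
def update_query(table:, column:, type:)
  property = "<#{BASE}#{table}##{column}>"
  <<~SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    DELETE { ?row #{property} ?value }
    INSERT {
      ?row #{property} ?object .
      ?object a <#{BASE}#{type}> ;
              rdfs:label ?label .
    }
    WHERE {
      ?row #{property} ?value .
      FILTER(isLiteral(?value) && STR(?value) != "")
      # Assumed naming rule: drop every character that is not an ASCII letter or digit.
      BIND(IRI(CONCAT("#{BASE}#{type}-", REPLACE(STR(?value), "[^A-Za-z0-9]", ""))) AS ?object)
      BIND(STRLANG(STR(?value), "en") AS ?label)
    }
  SPARQL
end

# Zero-padded file name such as 001-location-Continent.rq (assumed convention).
def query_filename(index, table, column)
  format("%03d-%s-%s.rq", index, table, column)
end

# Usage: print the query for the Continent column of the location table.
puts query_filename(1, "location", "Continent")
puts update_query(table: "location", column: "Continent", type: "Continent")
```

Because the naming rule strips non-ASCII characters, two different labels could collapse to the same IRI; a real query may need a more careful normalisation.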
Summary
19 SPARQL UPDATE queries in updates/ transform literal values into typed objects across all tables:
| Query | Table | Column | Object type |
|---|---|---|---|
| 001 | location | Continent | Continent |
| 002 | location | Country | Country |
| 003 | location | State | State |
| 004 | location | City | City |
| 005 | migration_table | reason | MigrationReason |
| 006 | migration_table | reason2 | MigrationReason |
| 007 | organisation | InstType | InstitutionType |
| 008 | person | gender | Gender |
| 009 | person | Nametype | Nametype |
| 010 | person | Importsource | ImportSource |
| 011 | person_profession | Eprofession | Profession |
| 012 | personnames | Nametype | Nametype |
| 013 | relationship | Relationshiptype | RelationshipType |
| 014 | relationship | relationshiptype_precise | RelationshipTypePrecise |
| 015 | religions | religion | Religion |
| 016 | work | Profession | Profession |
| 017 | work | Profession2 | Profession |
| 018 | work | Profession3 | Profession |
| 019 | work | EmploymentType | EmploymentType |
Each query replaces a literal value with an object reference and creates the object with rdf:type and rdfs:label (in English). The script map/step-02.rb loads data/graph-01.ttl, applies all queries in order, and writes data/graph-02.ttl (164,632 triples).
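Applying the queries "in order" relies on the numeric prefixes. A minimal sketch of that ordering step follows, assuming the files use a `.rq` extension and that the caller supplies the code that actually runs each update. The function names are hypothetical and no SPARQL engine is involved here.

```ruby
require "tmpdir"

# Return the update files in dir, ordered by their numeric prefix (001, 002, ...).
def ordered_updates(dir)
  Dir.glob(File.join(dir, "*.rq")).sort_by { |path| File.basename(path)[/\A\d+/].to_i }
end

# Yield each update's file name and text in execution order.
# The block is where a real script would hand the text to its SPARQL engine.
def apply_updates(dir)
  ordered_updates(dir).each do |path|
    yield File.basename(path), File.read(path)
  end
end

# Usage with throwaway files standing in for the updates folder.
Dir.mktmpdir do |dir|
  %w[010-person-Importsource.rq 002-location-Country.rq 001-location-Continent.rq].each do |name|
    File.write(File.join(dir, name), "# placeholder for #{name}\n")
  end
  apply_updates(dir) { |name, text| puts "#{name}: #{text.bytesize} bytes" }
end
```

Sorting on the parsed integer rather than on the raw file name keeps the order correct even if the prefixes are not all padded to the same width.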
To run:
bundle exec ruby map/step-02.rb