Theatre Migrants
This project generates a knowledge graph about migrants in the theatre in Europe.
Generating the ontology
The following steps describe how to generate the migrants RDF graph.
Step 1 - Loading the input data into a relational database
Task
The file teatre-migrants.sql contains a dump of a MariaDB database. The tables in this schema are described in the file db_schema.md. We will load the dump into MariaDB so the data can be queried with SQL. To this end:
- Create a Dockerfile to build a Docker container for MariaDB.
- Load the dump into a database in the container.
- Create a Ruby script map/step-01.rb that uses the gem sequel to connect to the database. This script should produce a file called graph-01.ttl containing all the data from the tables loaded in the database, using the direct mapping from relational databases to RDF.
Summary
The Dockerfile creates a MariaDB 10.11 container that automatically loads teatre-migrants.sql on first start. The docker-compose.yml exposes the database on port 3306 with a healthcheck.
The script map/step-01.rb connects to the database via sequel and implements the W3C Direct Mapping for all 9 tables (location, migration_table, organisation, person, person_profession, personnames, relationship, religions, work). Each table row becomes an RDF resource identified by its primary key, each column becomes a datatype property, and each foreign key becomes an object property linking to the referenced row. The output file graph-01.ttl contains 162,029 triples.
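The mapping described above can be sketched in a few lines of Ruby. This is a simplified illustration, not the actual map/step-01.rb: rows arrive here as plain Hashes rather than via sequel, and the foreign_keys argument is a hypothetical stand-in for real foreign-key detection.

```ruby
# Sketch of the Direct Mapping: one row becomes one resource, each
# column a property, each foreign key an object property.
def direct_map(table, pk, row, foreign_keys = {})
  subject = "base:#{table}\\/#{row[pk]}"
  lines = ["#{subject} a base:#{table};"]
  row.each do |col, val|
    predicate = "base:#{table}\\##{col}"
    object =
      if (ref = foreign_keys[col])   # foreign key -> link to referenced row
        "base:#{ref}\\/#{val}"
      elsif val.is_a?(Numeric)       # numeric literal, emitted as-is
        val.to_s
      else                           # everything else as a string literal
        "\"#{val}\""
      end
    lines << "  #{predicate} #{object};"
  end
  lines.join("\n").sub(/;\z/, " .") # close the final statement
end

puts direct_map("location", "IDLocation",
                { "IDLocation" => "ARG-BahBlanca-00", "Country" => "Argentina" })
# base:location\/ARG-BahBlanca-00 a base:location;
#   base:location\#IDLocation "ARG-BahBlanca-00";
#   base:location\#Country "Argentina" .
```

The escaped `\/` and `\#` in the local names match the Turtle serialization shown in Step 2 below.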
To run:
docker compose up -d
bundle exec ruby map/step-01.rb
Step 2 - Generate Objects
Continents and countries should be objects instead of literals. To this end, we can transform the following data:
base:location\/ARG-BahBlanca-00 a base:location;
base:location\#City "Bahia Blanca";
base:location\#Continent "South America";
base:location\#Country "Argentina";
base:location\#GeoNamesID "3865086";
base:location\#IDLocation "ARG-BahBlanca-00";
base:location\#latitude -3.87253e1;
base:location\#longitude -6.22742e1;
base:location\#wikidata "Q54108";
base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .
Into the following data:
base:location\/ARG-BahBlanca-00 a base:location;
base:location\#City base:City-BahiaBlanca;
base:location\#Continent base:Continent-SouthAmerica;
base:location\#Country base:Country-Argentina;
base:location\#GeoNamesID "3865086";
base:location\#IDLocation "ARG-BahBlanca-00";
base:location\#latitude -3.87253e1;
base:location\#longitude -6.22742e1;
base:location\#wikidata "Q54108";
base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .
base:City-BahiaBlanca a base:City;
rdfs:label "Bahia Blanca"@en .
base:Continent-SouthAmerica a base:Continent;
rdfs:label "South America"@en .
base:Country-Argentina a base:Country;
rdfs:label "Argentina"@en .
Notice that all rdfs:label values are stated to be in English (tagged @en).
Generate a SPARQL UPDATE query that performs this transformation for all elements of the table, and save it in a new folder called updates. Do the same with the other tables, proposing which columns should be defined as objects. Define a separate SPARQL UPDATE query for every table, to be saved in the updates folder. Number the generated queries with a prefix such as 001, 002, 003, and so on.
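For the location table's Country column, the update could look like the following sketch. The base namespace IRI and the exact IRI-minting rule (stripping non-alphanumeric characters after a "Country-" prefix) are assumptions; the actual queries in updates/ may declare different prefixes.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX base: <http://example.org/teatre-migrants/>

DELETE { ?loc base:location\#Country ?name }
INSERT {
  ?loc base:location\#Country ?country .
  ?country a base:Country ;
           rdfs:label ?label .
}
WHERE {
  ?loc a base:location ;
       base:location\#Country ?name .
  FILTER(isLiteral(?name))
  BIND(IRI(CONCAT(STR(base:), "Country-",
                  REPLACE(STR(?name), "[^A-Za-z0-9]", ""))) AS ?country)
  BIND(STRLANG(STR(?name), "en") AS ?label)
}
```

With this minting rule, "Argentina" becomes base:Country-Argentina and "South America" would become base:Continent-SouthAmerica in the analogous Continent query, matching the example above.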
After generating the update queries, write a Ruby script that executes the updates on the RDF graph produced in the previous step and saves the resulting graph as data/graph-02.ttl.
Summary
19 SPARQL UPDATE queries in updates/ transform literal values into typed objects across all tables:
| Query | Table | Column | Object type |
|---|---|---|---|
| 001 | location | Continent | Continent |
| 002 | location | Country | Country |
| 003 | location | State | State |
| 004 | location | City | City |
| 005 | migration_table | reason | MigrationReason |
| 006 | migration_table | reason2 | MigrationReason |
| 007 | organisation | InstType | InstitutionType |
| 008 | person | gender | Gender |
| 009 | person | Nametype | Nametype |
| 010 | person | Importsource | ImportSource |
| 011 | person_profession | Eprofession | Profession |
| 012 | personnames | Nametype | Nametype |
| 013 | relationship | Relationshiptype | RelationshipType |
| 014 | relationship | relationshiptype_precise | RelationshipTypePrecise |
| 015 | religions | religion | Religion |
| 016 | work | Profession | Profession |
| 017 | work | Profession2 | Profession |
| 018 | work | Profession3 | Profession |
| 019 | work | EmploymentType | EmploymentType |
Each query replaces a literal value with an object reference and creates the object with rdf:type and rdfs:label (in English). The script map/step-02.rb loads data/graph-01.ttl, applies all queries in order, and writes data/graph-02.ttl (164,632 triples).
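The per-table rewrite these queries perform can be sketched in plain Ruby. This is illustrative only: the real map/step-02.rb applies the SPARQL files with an RDF library, whereas here triples are plain [subject, predicate, object] arrays and the "Country-<Name>" IRI pattern is the assumed minting scheme from the example above.

```ruby
# Replace literal objects of one predicate with typed object resources,
# adding rdf:type and an English rdfs:label for each minted object.
def objectify(triples, predicate, klass)
  extra = []
  rewritten = triples.map do |s, p, o|
    next [s, p, o] unless p == predicate && o.start_with?('"')
    label = o.delete('"')                        # strip the quote marks
    iri = "base:#{klass}-#{label.gsub(/[^A-Za-z0-9]/, "")}"
    extra << [iri, "a", "base:#{klass}"]         # type the new object
    extra << [iri, "rdfs:label", "#{o}@en"]      # English label
    [s, p, iri]                                  # link instead of literal
  end
  (rewritten + extra).uniq                       # labels created only once
end

triples = [["base:location\\/ARG-BahBlanca-00",
            "base:location\\#Country", '"Argentina"']]
result = objectify(triples, "base:location\\#Country", "Country")
# The location now points at base:Country-Argentina, and the new
# object carries a base:Country type and an @en label.
```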
To run:
bundle exec ruby map/step-02.rb