Daniel Hernandez d2481d6e80 Add Step 2: SPARQL UPDATE queries to transform literals into objects.
19 queries in updates/ convert categorical columns (continent, country,
city, gender, profession, etc.) from literals to typed RDF objects with
rdfs:label. map/step-02.rb applies them to produce data/graph-02.ttl.
Also fix step-01.rb to sanitize column names with spaces and avoid
prefix serialization issues with fragment IRIs.
2026-02-26 19:45:08 +01:00

Theatre Migrants

This repository generates a knowledge graph about migrants in European theatre.

Generating the ontology

The following steps describe how to generate the migrants RDF graph.

Step 1 - Loading the input data into a relational database

Task

The file teatre-migrants.sql contains a dump of a MariaDB database; its tables are described in db_schema.md. We load this dump into MariaDB so that the data can be queried with SQL. To this end:

  1. Write a Dockerfile that builds a Docker container for MariaDB.

  2. Load the dump into a database in the container.

  3. Create a Ruby script map/step-01.rb that uses the sequel gem to connect to the database. The script should write a file called graph-01.ttl containing all the data from the loaded tables, using the direct mapping from relational databases to RDF.

Summary

The Dockerfile creates a MariaDB 10.11 container that automatically loads teatre-migrants.sql on first start. The docker-compose.yml exposes the database on port 3306 with a healthcheck.

The script map/step-01.rb connects to the database via sequel and implements the W3C Direct Mapping for all 9 tables (location, migration_table, organisation, person, person_profession, personnames, relationship, religions, work). Each table row becomes an RDF resource identified by its primary key, each column becomes a datatype property, and each foreign key becomes an object property linking to the referenced row. The output file graph-01.ttl contains 162,029 triples.
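The row-to-triples rule described above can be sketched in plain Ruby. This is a minimal illustration, not the code of map/step-01.rb: the base IRI, the helper names, and the `ref-` naming for foreign-key properties (borrowed from the W3C Direct Mapping) are assumptions, and every value is rendered as a plain string literal for brevity. Rows are plain hashes with symbol keys, which is the shape Sequel datasets yield.

```ruby
require "erb"

# Hypothetical base IRI; the one used by map/step-01.rb is not shown in this README.
BASE = "http://example.org/"

# Percent-encode a value so it can be embedded in an IRI.
def iri_safe(value)
  ERB::Util.url_encode(value.to_s)
end

# Render a value as a plain N-Triples string literal (typed literals omitted).
def literal(value)
  escaped = value.to_s
                 .gsub("\\") { "\\\\" }
                 .gsub('"') { '\\"' }
                 .gsub("\n") { "\\n" }
  %("#{escaped}")
end

# Map one row to N-Triples lines.
#   table: table name
#   pk:    primary-key column (Symbol)
#   row:   Hash of column => value
#   fks:   Hash of foreign-key column => referenced table (illustrative)
def row_to_triples(table, pk, row, fks = {})
  subject = "<#{BASE}#{table}/#{iri_safe(row.fetch(pk))}>"
  rdf_type = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
  triples = ["#{subject} #{rdf_type} <#{BASE}#{table}> ."]

  row.each do |column, value|
    next if value.nil? # NULL cells produce no triple

    if fks.key?(column)
      # Foreign key: object property pointing at the referenced row.
      predicate = "<#{BASE}#{table}#ref-#{column}>"
      object = "<#{BASE}#{fks[column]}/#{iri_safe(value)}>"
    else
      # Ordinary column: datatype property with a literal value.
      predicate = "<#{BASE}#{table}##{column}>"
      object = literal(value)
    end
    triples << "#{subject} #{predicate} #{object} ."
  end
  triples
end

row = { IDLocation: "ARG-BahBlanca-00", City: "Bahia Blanca", State: nil }
puts row_to_triples("location", :IDLocation, row)
```

Unlike the strict W3C Direct Mapping, this sketch emits only the reference triple for a foreign-key column and does not also emit the literal triple.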

To run:

docker compose up -d
bundle exec ruby map/step-01.rb

Step 2 - Generate Objects

Continents and countries should be objects instead of literals. To this end, we can transform the following data:

base:location\/ARG-BahBlanca-00 a base:location;
  base:location\#City "Bahia Blanca";
  base:location\#Continent "South America";
  base:location\#Country "Argentina";
  base:location\#GeoNamesID "3865086";
  base:location\#IDLocation "ARG-BahBlanca-00";
  base:location\#latitude -3.87253e1;
  base:location\#longitude -6.22742e1;
  base:location\#wikidata "Q54108";
  base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .

Into the following data:

base:location\/ARG-BahBlanca-00 a base:location;
  base:location\#City base:City-BahiaBlanca;
  base:location\#Continent base:Continent-SouthAmerica;
  base:location\#Country base:Country-Argentina;
  base:location\#GeoNamesID "3865086";
  base:location\#IDLocation "ARG-BahBlanca-00";
  base:location\#latitude -3.87253e1;
  base:location\#longitude -6.22742e1;
  base:location\#wikidata "Q54108";
  base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .

base:City-BahiaBlanca a base:City;
  rdfs:label "Bahia Blanca"@en .

base:Continent-SouthAmerica a base:Continent;
  rdfs:label "South America"@en .

base:Country-Argentina a base:Country;
  rdfs:label "Argentina"@en .

Note that every rdfs:label value is tagged as English (@en).
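The naming pattern visible in the example (object type, a hyphen, then the label with its spaces removed) can be written down as a small Ruby helper. This is a sketch inferred from the three objects shown above; the helper names are hypothetical, and labels containing quotes or characters that are not valid in a Turtle prefixed name would need extra escaping that is not handled here.

```ruby
# Build the local name used in the example: "<Type>-<label without whitespace>".
def object_local_name(type, label)
  "#{type}-#{label.split.join}"
end

# Render the Turtle description of one object, with an English label.
def object_turtle(type, label)
  name = object_local_name(type, label)
  <<~TTL
    base:#{name} a base:#{type};
      rdfs:label "#{label}"@en .
  TTL
end

puts object_turtle("City", "Bahia Blanca")
puts object_turtle("Continent", "South America")
```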

Generate a SPARQL UPDATE query that performs this transformation for all rows of the table, and save it in a new folder called updates. Do the same for the other tables, proposing which columns should be defined as objects. Define a separate SPARQL UPDATE query for every table and save each one in the updates folder. Number the generated queries with a prefix such as 001, 002, 003, and so on.
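One possible shape for such a query, generated from Ruby so that the same template serves every column, is sketched below. The base IRI, the function names, and the file-name pattern (beyond the three-digit prefix) are assumptions; the queries actually stored in updates/ may be written differently. The template uses only standard SPARQL 1.1 functions (REPLACE, ENCODE_FOR_URI, IRI, STRLANG).

```ruby
# Build a SPARQL UPDATE that turns the literal values of one column into
# typed objects with an English rdfs:label. `base` is a placeholder IRI.
def literal_to_object_update(table, column, type, base: "http://example.org/")
  property = "<#{base}#{table}##{column}>"
  <<~SPARQL
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    DELETE { ?row #{property} ?value }
    INSERT {
      ?row #{property} ?object .
      ?object a <#{base}#{type}> ;
              rdfs:label ?label .
    }
    WHERE {
      ?row #{property} ?value .
      FILTER(isLiteral(?value))
      BIND(REPLACE(STR(?value), " ", "") AS ?name)
      BIND(IRI(CONCAT("#{base}#{type}-", ENCODE_FOR_URI(?name))) AS ?object)
      BIND(STRLANG(STR(?value), "en") AS ?label)
    }
  SPARQL
end

# File name with a zero-padded numeric prefix (the rest of the pattern is hypothetical).
def update_filename(index, table, column)
  format("%03d-%s-%s.rq", index, table, column)
end

puts update_filename(1, "location", "Continent")
puts literal_to_object_update("location", "Continent", "Continent")
```

The FILTER keeps the update idempotent: once a value has been replaced by an IRI, running the same query again leaves it untouched.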

After generating the update queries, write a Ruby script that applies them to the RDF graph produced in the previous step and saves the resulting graph as data/graph-02.ttl.
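The ordering part of such a script can be sketched with the Ruby standard library alone. The `.rq` extension and the function names are assumptions, and the step that actually executes each query against the graph is left to a block, since it depends on the RDF library the script uses.

```ruby
# Return the update files sorted by name, so that the zero-padded
# prefixes (001, 002, ...) determine the execution order.
def ordered_updates(dir, pattern: "*.rq")
  Dir.glob(File.join(dir, pattern)).sort_by { |path| File.basename(path) }
end

# Read each update in order and hand its text to the given block.
# Returns the base names of the files that were applied.
def apply_updates(dir)
  ordered_updates(dir).map do |path|
    yield File.read(path)
    File.basename(path)
  end
end

# Usage sketch: print the size of each query instead of executing it.
if Dir.exist?("updates")
  apply_updates("updates") { |query| puts "#{query.bytesize} bytes" }
end
```

Because the prefixes are zero-padded to the same width, plain lexicographic sorting gives the intended numeric order.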

Summary

19 SPARQL UPDATE queries in updates/ transform literal values into typed objects across all tables:

| Query | Table | Column | Object type |
|-------|-------|--------|-------------|
| 001 | location | Continent | Continent |
| 002 | location | Country | Country |
| 003 | location | State | State |
| 004 | location | City | City |
| 005 | migration_table | reason | MigrationReason |
| 006 | migration_table | reason2 | MigrationReason |
| 007 | organisation | InstType | InstitutionType |
| 008 | person | gender | Gender |
| 009 | person | Nametype | Nametype |
| 010 | person | Importsource | ImportSource |
| 011 | person_profession | Eprofession | Profession |
| 012 | personnames | Nametype | Nametype |
| 013 | relationship | Relationshiptype | RelationshipType |
| 014 | relationship | relationshiptype_precise | RelationshipTypePrecise |
| 015 | religions | religion | Religion |
| 016 | work | Profession | Profession |
| 017 | work | Profession2 | Profession |
| 018 | work | Profession3 | Profession |
| 019 | work | EmploymentType | EmploymentType |

Each query replaces a literal value with an object reference and creates the object with rdf:type and rdfs:label (in English). The script map/step-02.rb loads data/graph-01.ttl, applies all queries in order, and writes data/graph-02.ttl (164,632 triples).

To run:

bundle exec ruby map/step-02.rb