Theatre Migrants
Scripts to generate a knowledge graph about migrants in European theatre.
Running the scripts
The mapping scripts have been reimplemented in Rust for faster execution. Both
scripts must be run from this directory (mapping/).
Prerequisites: Start the MariaDB container before running step 1:
docker compose up -d
Step 1 — Direct Mapping from MariaDB to RDF (data/graph-01.ttl):
cargo run --release --bin step-01
Step 2 — Apply SPARQL UPDATE queries (data/graph-02.ttl):
cargo run --release --bin step-02
Alternatively, after installing with cargo install --path .:
step-01
step-02
Generating the ontology
The following steps describe how to generate the migrants RDF graph.
Step 1 - Loading the input data into a relational database
Task
The file teatre-migrants.sql contains the dump of a MariaDB database; the tables in this schema are described in the file db_schema.md. We will load this data into MariaDB so it can be queried with SQL. To this end:
- Create a Dockerfile for a MariaDB container.
- Load the dump into a database in the container.
- Create a Rust program src/map/step_01.rs that connects to the database. This program should produce a file called graph-01.ttl containing all the data from the tables loaded in the database, using the direct mapping from relational databases to RDF.
Summary
The Dockerfile creates a MariaDB 10.11 container that automatically loads teatre-migrants.sql on first start. The docker-compose.yml exposes the database on port 3306 with a healthcheck.
The program src/map/step_01.rs connects to the database and implements the W3C Direct Mapping for all 9 tables (location, migration_table, organisation, person, person_profession, personnames, relationship, religions, work). Each table row becomes an RDF resource identified by its primary key, each column becomes a datatype property, and each foreign key becomes an object property linking to the referenced row. The output file graph-01.ttl contains 162,029 triples.
To run:
docker compose up -d
cargo run --release --bin step-01
Step 2 - Generate Objects
Continents and countries should be objects instead of literals. To this end, we can transform the following data:
base:location\/ARG-BahBlanca-00 a base:location;
base:location\#City "Bahia Blanca";
base:location\#Continent "South America";
base:location\#Country "Argentina";
base:location\#GeoNamesID "3865086";
base:location\#IDLocation "ARG-BahBlanca-00";
base:location\#latitude -3.87253e1;
base:location\#longitude -6.22742e1;
base:location\#wikidata "Q54108";
base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .
Into the following data:
base:location\/ARG-BahBlanca-00 a base:location;
base:location\#City base:City-BahiaBlanca;
base:location\#Continent base:Continent-SouthAmerica;
base:location\#Country base:Country-Argentina;
base:location\#GeoNamesID "3865086";
base:location\#IDLocation "ARG-BahBlanca-00";
base:location\#latitude -3.87253e1;
base:location\#longitude -6.22742e1;
base:location\#wikidata "Q54108";
base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .
base:City-BahiaBlanca a base:City;
rdfs:label "Bahia Blanca"@en .
base:Continent-SouthAmerica a base:Continent;
rdfs:label "South America"@en .
base:Country-Argentina a base:Country;
rdfs:label "Argentina"@en .
Notice that all values of the property rdfs:label are language-tagged as English.
Generate a SPARQL UPDATE query that does this transformation for all rows of the table and save it in a new folder called updates. Do the same for the other tables, proposing which columns should be turned into objects. Define a separate SPARQL UPDATE query for each table and save it in the updates folder. Number the generated queries with a prefix like 001, 002, 003, and so on.
After generating the update queries, write a Rust program that applies them to the RDF graph produced in the previous step and saves the resulting graph as data/graph-02.ttl.
Summary
19 SPARQL UPDATE queries in updates/ transform literal values into typed objects across all tables:
| Query | Table | Column | Object type |
|---|---|---|---|
| 001 | location | Continent | Continent |
| 002 | location | Country | Country |
| 003 | location | State | State |
| 004 | location | City | City |
| 005 | migration_table | reason | MigrationReason |
| 006 | migration_table | reason2 | MigrationReason |
| 007 | organisation | InstType | InstitutionType |
| 008 | person | gender | Gender |
| 009 | person | Nametype | Nametype |
| 010 | person | Importsource | ImportSource |
| 011 | person_profession | Eprofession | Profession |
| 012 | personnames | Nametype | Nametype |
| 013 | relationship | Relationshiptype | RelationshipType |
| 014 | relationship | relationshiptype_precise | RelationshipTypePrecise |
| 015 | religions | religion | Religion |
| 016 | work | Profession | Profession |
| 017 | work | Profession2 | Profession |
| 018 | work | Profession3 | Profession |
| 019 | work | EmploymentType | EmploymentType |
Each query replaces a literal value with an object reference and creates the object with rdf:type and rdfs:label (in English). The program src/map/step_02.rs loads data/graph-01.ttl, applies all queries in order, and writes data/graph-02.ttl (164,632 triples).
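For illustration, a minimal sketch of what query 001 (location/Continent) could look like. The base namespace IRI and the IRI-minting rule (stripping non-alphanumeric characters from the label) are assumptions, not necessarily the project's actual conventions:
PREFIX base: <http://example.org/theatre-migrants/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
DELETE { ?loc base:location\#Continent ?name }
INSERT {
  ?loc base:location\#Continent ?continent .
  ?continent a base:Continent ;
             rdfs:label ?label .
}
WHERE {
  ?loc base:location\#Continent ?name .
  FILTER(isLiteral(?name) && STR(?name) != "")
  # Mint the object IRI from the literal, e.g. "South America" -> base:Continent-SouthAmerica
  BIND(IRI(CONCAT(STR(base:), "Continent-", REPLACE(STR(?name), "[^A-Za-z0-9]", ""))) AS ?continent)
  BIND(STRLANG(STR(?name), "en") AS ?label)
}
The remaining queries follow the same pattern, changing only the property, the object class, and the IRI prefix.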
To run:
cargo run --release --bin step-02
Step 3 - Annotate datatypes
In the previous graph there are dates like "1894-12-31" that are represented with the xsd:string datatype. Please infer the datatypes of these literals and create a new SPARQL query that generates a new RDF graph where the literals use the inferred datatypes.
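As a sketch, one possible rule for date-shaped strings; the regular expression is an assumption, and similar rules would be needed for numeric and other literal types:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
DELETE { ?s ?p ?old }
INSERT { ?s ?p ?new }
WHERE {
  ?s ?p ?old .
  # Only rewrite plain strings that look like ISO dates, e.g. "1894-12-31"
  FILTER(isLiteral(?old) && DATATYPE(?old) = xsd:string)
  FILTER(REGEX(STR(?old), "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"))
  BIND(STRDT(STR(?old), xsd:date) AS ?new)
}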
Step 4 - Replace empty strings with unbound values
Intuitively, the triple
work:4 workp:comment "" .
is not intended to mean an empty comment "", but the absence of a comment. So write a query that excludes these comments from the next generated graph.
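A minimal sketch of such a query; as written it drops every triple whose object is an empty string, which assumes no property legitimately carries an empty value:
DELETE WHERE { ?s ?p "" }
If only comments should be affected, replace ?p with the comment property.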
Step 5 - Use Schema.org
Some classes, properties, and individuals can be represented with Schema.org terms. For example, the class migrants:person can be represented with the class schema:Person. Please propose which of these elements could use the Schema.org vocabulary and generate a SPARQL query that produces the next graph.
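A sketch of what that alignment could look like, here adding schema.org types alongside the existing ones. The base: namespace follows the earlier examples, and the class pairs beyond person/schema:Person are suggestions:
PREFIX schema: <https://schema.org/>
PREFIX base:   <http://example.org/theatre-migrants/>
# One INSERT per aligned class, as a single update request
INSERT { ?p a schema:Person }       WHERE { ?p a base:person } ;
INSERT { ?o a schema:Organization } WHERE { ?o a base:organisation } ;
INSERT { ?l a schema:Place }        WHERE { ?l a base:location }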