Theatre Migrants
This project generates a knowledge graph about migrants in the theatre in Europe.
Generating the ontology
The following steps describe how to generate the migrants RDF graph.
Step 1 - Loading the input data into a relational database
Task
The file teatre-migrants.sql contains a dump of a MariaDB database. The tables in this schema are described in the file db_schema.md. We will load the dump into MariaDB so the data can be queried with SQL. To this end:
- Create a Dockerfile to build a Docker container for MariaDB.
- Load the dump into a database in the container.
- Create a Ruby script map/step-01.rb that uses the gem sequel to connect to the database. This script should produce a file called graph-01.ttl containing all the data from the tables loaded in the database, using the direct mapping from relational databases to RDF.
Summary
The Dockerfile creates a MariaDB 10.11 container that automatically loads teatre-migrants.sql on first start. The docker-compose.yml exposes the database on port 3306 with a healthcheck.
The script map/step-01.rb connects to the database via sequel and implements the W3C Direct Mapping for all 9 tables (location, migration_table, organisation, person, person_profession, personnames, relationship, religions, work). Each table row becomes an RDF resource identified by its primary key, each column becomes a datatype property, and each foreign key becomes an object property linking to the referenced row. The output file graph-01.ttl contains 162,029 triples.
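The mapping described above can be sketched in a few lines of Ruby. This is a simplified illustration, not the actual map/step-01.rb: rows arrive here as plain Hashes rather than via sequel, and the foreign_keys argument is a hypothetical stand-in for real foreign-key detection.

```ruby
# Sketch of the Direct Mapping: one row becomes one resource, each
# column a property, each foreign key an object property.
def direct_map(table, pk, row, foreign_keys = {})
  subject = "base:#{table}\\/#{row[pk]}"
  lines = ["#{subject} a base:#{table};"]
  row.each do |col, val|
    predicate = "base:#{table}\\##{col}"
    object =
      if (ref = foreign_keys[col])   # foreign key -> link to referenced row
        "base:#{ref}\\/#{val}"
      elsif val.is_a?(Numeric)       # numeric literal, emitted as-is
        val.to_s
      else                           # everything else as a string literal
        "\"#{val}\""
      end
    lines << "  #{predicate} #{object};"
  end
  lines.join("\n").sub(/;\z/, " .") # close the final statement
end

puts direct_map("location", "IDLocation",
                { "IDLocation" => "ARG-BahBlanca-00", "Country" => "Argentina" })
# base:location\/ARG-BahBlanca-00 a base:location;
#   base:location\#IDLocation "ARG-BahBlanca-00";
#   base:location\#Country "Argentina" .
```

The escaped `\/` and `\#` in the local names match the Turtle serialization shown in Step 2 below.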
To run:
docker compose up -d
bundle exec ruby map/step-01.rb
Step 2 - Generate Objects
Continents and countries should be objects instead of literals. To this end, we can transform the following data:
base:location\/ARG-BahBlanca-00 a base:location;
base:location\#City "Bahia Blanca";
base:location\#Continent "South America";
base:location\#Country "Argentina";
base:location\#GeoNamesID "3865086";
base:location\#IDLocation "ARG-BahBlanca-00";
base:location\#latitude -3.87253e1;
base:location\#longitude -6.22742e1;
base:location\#wikidata "Q54108";
base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .
Into the following data:
base:location\/ARG-BahBlanca-00 a base:location;
base:location\#City base:City-BahiaBlanca;
base:location\#Continent base:Continent-SouthAmerica;
base:location\#Country base:Country-Argentina;
base:location\#GeoNamesID "3865086";
base:location\#IDLocation "ARG-BahBlanca-00";
base:location\#latitude -3.87253e1;
base:location\#longitude -6.22742e1;
base:location\#wikidata "Q54108";
base:location\#wikipedia "https://en.wikipedia.org/wiki/Bah%C3%ADa_Blanca" .
base:City-BahiaBlanca a base:City;
rdfs:label "Bahia Blanca"@en .
base:Continent-SouthAmerica a base:Continent;
rdfs:label "South America"@en .
base:Country-Argentina a base:Country;
rdfs:label "Argentina"@en .
Notice that all rdfs:label values are stated to be in English (tagged @en).
Generate a SPARQL UPDATE query that performs this transformation for all elements of the table, and save it in a new folder called updates. Do the same with the other tables, proposing which columns should be defined as objects. Define a separate SPARQL UPDATE query for every table, to be saved in the updates folder. Number the generated queries with a prefix such as 001, 002, 003, and so on.
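For the location table's Country column, the update could look like the following sketch. The base namespace IRI and the exact IRI-minting rule (stripping non-alphanumeric characters after a "Country-" prefix) are assumptions; the actual queries in updates/ may declare different prefixes.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX base: <http://example.org/teatre-migrants/>

DELETE { ?loc base:location\#Country ?name }
INSERT {
  ?loc base:location\#Country ?country .
  ?country a base:Country ;
           rdfs:label ?label .
}
WHERE {
  ?loc a base:location ;
       base:location\#Country ?name .
  FILTER(isLiteral(?name))
  BIND(IRI(CONCAT(STR(base:), "Country-",
                  REPLACE(STR(?name), "[^A-Za-z0-9]", ""))) AS ?country)
  BIND(STRLANG(STR(?name), "en") AS ?label)
}
```

With this minting rule, "Argentina" becomes base:Country-Argentina and "South America" would become base:Continent-SouthAmerica in the analogous Continent query, matching the example above.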
After generating the update queries, write a Ruby script that executes the updates on the RDF graph produced in the previous step and saves the resulting graph as data/graph-02.ttl.
Summary
19 SPARQL UPDATE queries in updates/ transform literal values into typed objects across all tables:
| Query | Table | Column | Object type |
|---|---|---|---|
| 001 | location | Continent | Continent |
| 002 | location | Country | Country |
| 003 | location | State | State |
| 004 | location | City | City |
| 005 | migration_table | reason | MigrationReason |
| 006 | migration_table | reason2 | MigrationReason |
| 007 | organisation | InstType | InstitutionType |
| 008 | person | gender | Gender |
| 009 | person | Nametype | Nametype |
| 010 | person | Importsource | ImportSource |
| 011 | person_profession | Eprofession | Profession |
| 012 | personnames | Nametype | Nametype |
| 013 | relationship | Relationshiptype | RelationshipType |
| 014 | relationship | relationshiptype_precise | RelationshipTypePrecise |
| 015 | religions | religion | Religion |
| 016 | work | Profession | Profession |
| 017 | work | Profession2 | Profession |
| 018 | work | Profession3 | Profession |
| 019 | work | EmploymentType | EmploymentType |
Each query replaces a literal value with an object reference and creates the object with rdf:type and rdfs:label (in English). The script map/step-02.rb loads data/graph-01.ttl, applies all queries in order, and writes data/graph-02.ttl (164,632 triples).
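The per-table rewrite these queries perform can be sketched in plain Ruby. This is illustrative only: the real map/step-02.rb applies the SPARQL files with an RDF library, whereas here triples are plain [subject, predicate, object] arrays and the "Country-<Name>" IRI pattern is the assumed minting scheme from the example above.

```ruby
# Replace literal objects of one predicate with typed object resources,
# adding rdf:type and an English rdfs:label for each minted object.
def objectify(triples, predicate, klass)
  extra = []
  rewritten = triples.map do |s, p, o|
    next [s, p, o] unless p == predicate && o.start_with?('"')
    label = o.delete('"')                        # strip the quote marks
    iri = "base:#{klass}-#{label.gsub(/[^A-Za-z0-9]/, "")}"
    extra << [iri, "a", "base:#{klass}"]         # type the new object
    extra << [iri, "rdfs:label", "#{o}@en"]      # English label
    [s, p, iri]                                  # link instead of literal
  end
  (rewritten + extra).uniq                       # labels created only once
end

triples = [["base:location\\/ARG-BahBlanca-00",
            "base:location\\#Country", '"Argentina"']]
result = objectify(triples, "base:location\\#Country", "Country")
# The location now points at base:Country-Argentina, and the new
# object carries a base:Country type and an @en label.
```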
To run:
bundle exec ruby map/step-02.rb