Update README to reference Rust programs instead of Ruby scripts.
Also add task descriptions for steps 3, 4, and 5.
parent 0813192990
commit fadf063d3a
1 changed files with 24 additions and 6 deletions
30
README.md
@@ -46,19 +46,19 @@ The file `teatre-migrants.sql` contains the dump of a MariaDB database. The tabl

2. Upload the dump into a database in the container.

3. Create a Ruby script `map/step-01.rb` that uses the gem `sequel` to connect to the database. This Ruby script should return a file called `graph-01.ttl` containing all the data from the tables loaded in the database using the direct mapping from relational databases to RDF.

3. Create a Rust program `src/map/step_01.rs` that connects to the database. This program should return a file called `graph-01.ttl` containing all the data from the tables loaded in the database using the direct mapping from relational databases to RDF.

#### Summary

The `Dockerfile` creates a MariaDB 10.11 container that automatically loads `teatre-migrants.sql` on first start. The `docker-compose.yml` exposes the database on port 3306 with a healthcheck.

The script `map/step-01.rb` connects to the database via `sequel` and implements the [W3C Direct Mapping](https://www.w3.org/TR/rdb-direct-mapping/) for all 9 tables (`location`, `migration_table`, `organisation`, `person`, `person_profession`, `personnames`, `relationship`, `religions`, `work`). Each table row becomes an RDF resource identified by its primary key, each column becomes a datatype property, and each foreign key becomes an object property linking to the referenced row. The output file `graph-01.ttl` contains 162,029 triples.

The program `src/map/step_01.rs` connects to the database and implements the [W3C Direct Mapping](https://www.w3.org/TR/rdb-direct-mapping/) for all 9 tables (`location`, `migration_table`, `organisation`, `person`, `person_profession`, `personnames`, `relationship`, `religions`, `work`). Each table row becomes an RDF resource identified by its primary key, each column becomes a datatype property, and each foreign key becomes an object property linking to the referenced row. The output file `graph-01.ttl` contains 162,029 triples.
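
The row-to-triples rule described above can be sketched in a few lines of Rust. This is only an illustration, not the code in `src/map/step_01.rs`: the base IRI, the `Row` and `ForeignKey` types, and the column names `id` and `person_id` are assumptions made for the example, and percent-encoding of IRI parts is left out. The IRI shapes (`table/pk=value`, `table#column`, `table#ref-column`) follow the W3C Direct Mapping.

```rust
// Assumed base IRI; the README does not state the one used by `step_01.rs`.
const BASE: &str = "http://example.org/";
const RDF_TYPE: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

/// A foreign key value found in a row (hypothetical helper type).
struct ForeignKey<'a> {
    column: &'a str,
    ref_table: &'a str,
    ref_column: &'a str,
    value: &'a str,
}

/// One table row, reduced to what the sketch needs (hypothetical helper type).
struct Row<'a> {
    table: &'a str,
    pk_column: &'a str,
    pk_value: &'a str,
    /// (column name, lexical value) pairs.
    columns: Vec<(&'a str, &'a str)>,
    foreign_keys: Vec<ForeignKey<'a>>,
}

/// Row IRI in the Direct Mapping shape `<base/table/pk=value>`.
fn row_iri(table: &str, pk_column: &str, pk_value: &str) -> String {
    format!("<{BASE}{table}/{pk_column}={pk_value}>")
}

/// Escape backslashes and double quotes for a string literal.
fn escape(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Emit one type triple, one literal triple per column,
/// and one reference triple per foreign key.
fn direct_map(row: &Row) -> Vec<String> {
    let subject = row_iri(row.table, row.pk_column, row.pk_value);
    let table = row.table;
    let mut triples = vec![format!("{subject} <{RDF_TYPE}> <{BASE}{table}> .")];
    for (column, value) in &row.columns {
        let literal = escape(value);
        triples.push(format!("{subject} <{BASE}{table}#{column}> \"{literal}\" ."));
    }
    for fk in &row.foreign_keys {
        let column = fk.column;
        let target = row_iri(fk.ref_table, fk.ref_column, fk.value);
        triples.push(format!("{subject} <{BASE}{table}#ref-{column}> {target} ."));
    }
    triples
}

fn main() {
    // Hypothetical row: the column names are illustrative only.
    let row = Row {
        table: "work",
        pk_column: "id",
        pk_value: "4",
        columns: vec![("comment", "")],
        foreign_keys: vec![ForeignKey {
            column: "person_id",
            ref_table: "person",
            ref_column: "id",
            value: "7",
        }],
    };
    for triple in direct_map(&row) {
        println!("{triple}");
    }
}
```

All literals are written as plain strings here, which matches the README's later remark that dates in the first graph are still `xsd:string`.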

To run:

```sh
docker compose up -d
bundle exec ruby map/step-01.rb
cargo run --release --bin step-01
```

### Step 2 - Generate Objects

@@ -106,7 +106,7 @@ Notice that all ranges of property `rdfs:label` are stated to be in English.

Generate a SPARQL UPDATE query that performs this transformation for all elements of the table and save it in a new folder called `updates`. Do the same with the other tables, proposing which columns should be defined as objects. For every table, define a different SPARQL UPDATE query and save it in the `updates` folder. Number the generated queries with a prefix such as 001, 002, 003, and so on.

After generating the update queries, generate a Ruby script that executes the updates on the RDF graph generated in the previous step and generates a new RDF graph to be saved: `data/graph-02.ttl`.

After generating the update queries, generate a Rust program that executes the updates on the RDF graph generated in the previous step and generates a new RDF graph to be saved: `data/graph-02.ttl`.

#### Summary

@@ -134,10 +134,28 @@ After generating the update queries, generate a Ruby script that executes the up
| 018 | work | Profession3 | Profession |
| 019 | work | EmploymentType | EmploymentType |

Each query replaces a literal value with an object reference and creates the object with `rdf:type` and `rdfs:label` (in English). The script `map/step-02.rb` loads `data/graph-01.ttl`, applies all queries in order, and writes `data/graph-02.ttl` (164,632 triples).

Each query replaces a literal value with an object reference and creates the object with `rdf:type` and `rdfs:label` (in English). The program `src/map/step_02.rs` loads `data/graph-01.ttl`, applies all queries in order, and writes `data/graph-02.ttl` (164,632 triples).

To run:

```sh
bundle exec ruby map/step-02.rb
cargo run --release --bin step-02
```

### Step 3 - Annotate datatypes

In the previous example we have dates like "1894-12-31", which are represented with the `xsd:string` datatype. Please infer the datatypes of these literals and create a new SPARQL query to generate a new RDF graph where literals use these datatypes.
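
One way to approach the inference is to classify each literal by its lexical form before writing the query. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, only `xsd:date` and `xsd:integer` are recognized, and the date check is deliberately loose (it accepts any day from 01 to 31 regardless of the month). The README only gives the date example; treating digit-only values as integers is an extra assumption.

```rust
/// True for strings shaped like `YYYY-MM-DD` with a plausible month and day.
/// Loose check: it does not verify the day against the month length.
fn is_iso_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() != 10 || bytes[4] != b'-' || bytes[7] != b'-' {
        return false;
    }
    let all_digits =
        |range: std::ops::Range<usize>| bytes[range].iter().all(u8::is_ascii_digit);
    if !(all_digits(0..4) && all_digits(5..7) && all_digits(8..10)) {
        return false;
    }
    // The string is pure ASCII at this point, so slicing by byte index is safe.
    let month: u32 = s[5..7].parse().unwrap_or(0);
    let day: u32 = s[8..10].parse().unwrap_or(0);
    (1..=12).contains(&month) && (1..=31).contains(&day)
}

/// True for an optional minus sign followed by one or more ASCII digits.
fn is_integer(s: &str) -> bool {
    let digits = s.strip_prefix('-').unwrap_or(s);
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}

/// Pick a datatype from the lexical form; anything unrecognized stays a string.
fn infer_datatype(lexical: &str) -> &'static str {
    if is_iso_date(lexical) {
        "xsd:date"
    } else if is_integer(lexical) {
        "xsd:integer"
    } else {
        "xsd:string"
    }
}

/// Render the literal in Turtle syntax, adding `^^datatype` when it is not a string.
fn typed_literal(lexical: &str) -> String {
    let escaped = lexical.replace('\\', "\\\\").replace('"', "\\\"");
    match infer_datatype(lexical) {
        "xsd:string" => format!("\"{escaped}\""),
        datatype => format!("\"{escaped}\"^^{datatype}"),
    }
}

fn main() {
    for value in ["1894-12-31", "42", "Wien"] {
        println!("{}", typed_literal(value));
    }
}
```

A per-column decision (for example, typing a column only when every non-empty value matches) may be safer than a per-literal one, since identifiers made of digits would otherwise become integers.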

### Step 4 - Replace empty strings with unbound values

Intuitively, the triple

```
work:4 workp:EmploymentType workp:comment "" .
```

is not intended to mean a comment "", but the lack of a comment. So, write a query that excludes these comments from the next generated graph.
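
The intended effect of such a query can be illustrated outside SPARQL as a simple filter over triples. The sketch below assumes a hypothetical in-memory `Triple` type and reads the example as subject `work:4`, predicate `workp:comment`, and an empty literal object; the second triple and its value are made up for the example.

```rust
/// Object of a triple: either an IRI or a plain literal (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Object {
    Iri(String),
    Literal(String),
}

/// A triple with prefixed-name subject and predicate (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Triple {
    subject: String,
    predicate: String,
    object: Object,
}

/// Keep every triple except those whose object is the empty string literal.
fn drop_empty_literals(triples: Vec<Triple>) -> Vec<Triple> {
    triples
        .into_iter()
        .filter(|t| !matches!(&t.object, Object::Literal(value) if value.is_empty()))
        .collect()
}

fn main() {
    let graph = vec![
        Triple {
            subject: "work:4".into(),
            predicate: "workp:comment".into(),
            object: Object::Literal(String::new()),
        },
        Triple {
            subject: "work:4".into(),
            // Illustrative object IRI, not taken from the dataset.
            predicate: "workp:EmploymentType".into(),
            object: Object::Iri("employmenttype:1".into()),
        },
    ];
    for triple in drop_empty_literals(graph) {
        println!("{triple:?}");
    }
}
```

Only exactly empty literals are removed here; whether whitespace-only values should also count as missing is a separate decision.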

### Step 5 - Use Schema.org

Some classes, properties, and individuals can be represented with Schema.org. For example, the class `migrants:person` can be represented with the class `schema:Person`. Please propose which of these elements could use the Schema.org vocabulary and generate a SPARQL query to generate the next graph.
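
A proposal of this kind boils down to a term-to-term lookup table applied to every position of every triple. The sketch below shows that idea with hypothetical function names; the only pair it contains is the one given above (`migrants:person` to `schema:Person`), and the subject `person:1` is illustrative. Any further pairs would have to come from the actual proposal.

```rust
use std::collections::HashMap;

/// Lookup table from dataset terms to Schema.org terms.
/// Only the pair stated in the README is listed.
fn schema_org_mapping() -> HashMap<&'static str, &'static str> {
    HashMap::from([("migrants:person", "schema:Person")])
}

/// Return the Schema.org term when one is mapped, otherwise the term unchanged.
fn rewrite_term<'a>(term: &'a str, mapping: &HashMap<&'static str, &'static str>) -> &'a str {
    match mapping.get(term) {
        Some(mapped) => *mapped,
        None => term,
    }
}

/// Rewrite subject, predicate, and object of one triple.
fn rewrite_triple<'a>(
    triple: [&'a str; 3],
    mapping: &HashMap<&'static str, &'static str>,
) -> [&'a str; 3] {
    triple.map(|term| rewrite_term(term, mapping))
}

fn main() {
    let mapping = schema_org_mapping();
    // `person:1` is an illustrative subject, not taken from the dataset.
    let [s, p, o] = rewrite_triple(["person:1", "rdf:type", "migrants:person"], &mapping);
    println!("{s} {p} {o} .");
}
```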