#008: Dataset history via file dumps #8

Open
opened 2026-04-05 12:58:42 +00:00 by daniel · 0 comments

Blocked by

  • #005 SHACL migration detection (005-shacl-migration-detection.org): history is triggered by migrations

Summary

Save dataset history as serialized RDF dump files, stored in the file storage system (Hetzner storage box) alongside PDF uploads. Each dump is a Turtle or N-Quads file capturing the state of the dataset at a point in time.

Design sketch

When a migration is performed:

  1. Export the affected graph as a Turtle/N-Quads file.
  2. Store it in the team's file storage: data/files/{team}/dumps/{dataset}-{timestamp}.ttl
  3. Record metadata in the meta-dataset (file path, timestamp, SHACL version, reason).
  4. Apply the migration to the dataset.

The dumps can also be registered in the file catalog for discoverability via SPARQL.

Pros

  • Minimal impact on dataset and RDF store size
  • Dumps are portable — can be loaded into any RDF tool
  • Leverages existing file storage infrastructure (storage box)
  • Can handle very large datasets without bloating the store

Cons

  • Not directly queryable — must be loaded into a store first to query historical data
  • File management overhead (cleanup, retention policies)
  • Serialization/deserialization time for large datasets

Configuration

Enabled per-dataset via meta-dataset:

<urn:config:history> concon:historyStrategy concon:FileDumps ;
    concon:dumpFormat "text/turtle" .
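The per-dump metadata from step 3 might then look like the following in the meta-dataset. Every term below beyond the two shown above is an assumption for illustration, not an existing vocabulary:

```turtle
<urn:dump:papers-1712319522> a concon:HistoryDump ;
    concon:dumpFile "data/files/team-a/dumps/papers-1712319522.ttl" ;
    concon:dumpedAt "2026-04-05T12:58:42Z"^^xsd:dateTime ;
    concon:shaclVersion "3" ;
    concon:migrationReason "datatype changed on an existing property" .
```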

Tests

Unit tests

#[test]
fn test_dump_file_created_on_migration() {
    // Configure FileDumps strategy
    // Run migration
    // Verify: Turtle file exists at expected path
}

#[test]
fn test_dump_file_contains_pre_migration_data() {
    // Run migration
    // Parse the dump file
    // Verify: contains the old data, not the migrated data
}

#[test]
fn test_dump_metadata_in_meta_dataset() {
    // Run migration
    // Query meta-dataset for dump metadata
    // Verify: file path, timestamp, format recorded
}

#[test]
fn test_dump_registered_in_file_catalog() {
    // Run migration with catalog registration enabled
    // Query file catalog
    // Verify: dump appears as a file entry
}

Manual tests

  1. Enable FileDumps strategy for a dataset
  2. Run migration, verify dump file appears in storage
  3. Download dump file — verify it's valid Turtle/N-Quads
  4. Load dump into a local store — verify historical data queryable
  5. Check file catalog — verify dump is discoverable
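Manual tests 3 and 4 could be run with Apache Jena's command-line tools, assuming they are installed; the dump path is hypothetical:

```shell
DUMP=data/files/team-a/dumps/papers-1712319522.ttl

if command -v riot >/dev/null 2>&1; then
  # Manual test 3: the dump parses as valid Turtle.
  riot --validate "$DUMP"

  # Manual test 4: load into a throwaway TDB2 store and query the history.
  tdb2.tdbloader --loc /tmp/history-db "$DUMP"
  tdb2.tdbquery --loc /tmp/history-db \
    'SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'
else
  echo "jena tools not installed; skipping"
fi
```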