The fscontext R Package

fscontext provides a provenance-aware contextual reconstruction framework for file systems and related digital resource collections.

The package creates reproducible observational snapshots of files, repository structures, and related operational resources, and supports their contextual abstraction, semantic stabilisation, and reconstruction-oriented analysis.

Installation

# CRAN release
install.packages("fscontext")

# Latest development version
pak::pak("dataobservatory-eu/fscontext")

Getting started

The package includes four introductory vignettes that follow the typical fscontext workflow.

Together these vignettes introduce the observational, contextual, documentary, and semantic layers of the package.

Context before semantics

Many digital collections contain valuable contextual information but little documentation explaining how files, datasets, reports, source code, inventories, or digital surrogates relate to one another.

Examples include research projects spread across multiple repositories, digitised archival collections with evolving inventories, audiovisual production environments, long-running analytical projects, and shared drives that have accumulated over many years.

Before semantic integration, archival description, provenance modelling, or knowledge graph construction can begin, it is often necessary to reconstruct the context in which digital resources were created and used.

fscontext approaches filesystems as observational environments. Files, folders, timestamps, repository structures, and other digital traces are treated as evidence from which contextual structures can be reconstructed.

 Filesystem observations
            ↓
        Snapshots
            ↓
Contextual reconstruction
            ↓
       Record Sets
            ↓
 Semantic stabilisation
            ↓
    Knowledge systems
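The first step of this flow, turning a directory into an observation table, can be sketched in base R. This is only an illustration under stated assumptions: `observe_dir()` is a hypothetical helper rather than an fscontext function, and the column names merely echo those that appear in the package's example snapshots.

```r
# Hypothetical helper, NOT part of fscontext: observe a directory as a data frame.
observe_dir <- function(root, storage_id) {
  # Relative paths of all files below root, including hidden files.
  rel_path <- list.files(root, recursive = TRUE, all.files = TRUE)
  info <- file.info(file.path(root, rel_path))
  data.frame(
    storage_id  = rep(storage_id, length(rel_path)),
    rel_path    = rel_path,
    filename    = basename(rel_path),
    extension   = tools::file_ext(rel_path),
    size        = info$size,
    modified    = info$mtime,
    observed_at = rep(Sys.time(), length(rel_path)),
    stringsAsFactors = FALSE
  )
}

# Build a tiny throwaway directory so the example is reproducible.
root <- file.path(tempdir(), "fs_demo")
dir.create(file.path(root, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines("x,y\n1,2", file.path(root, "data", "table.csv"))
writeLines("# notes", file.path(root, "README.md"))

obs <- observe_dir(root, storage_id = "demo")
obs[, c("storage_id", "rel_path", "filename", "extension")]
```

Each row is one observed file at one moment. Nothing in the table interprets the files yet, which is the point of keeping observation separate from the later layers.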

Rather than replacing archival description or provenance models, fscontext focuses on the earlier task of contextual reconstruction.

The package is inspired by the archival conceptual model Records in Contexts (RiC), developed by the International Council on Archives. It does not implement RiC-CM or RiC-O directly. Instead, it derives contextual relationships and candidate Record Sets from filesystem observations, repository structures, inventories, and other digital traces.

A reproducible example

The package includes two example filesystem snapshots derived from the companion repository fscontextdemo.

The demonstration repository is available at: https://github.com/dataobservatory-eu/fscontextdemo

It contains a small but realistic digital work environment including source code, datasets, generated artefacts, documentation, tests, package metadata, and semantic enrichment examples.

The snapshots, fscontextdemo_snapshot_01 and fscontextdemo_snapshot_02, capture the repository at different points in time, allowing reconstruction and longitudinal analysis workflows to be demonstrated reproducibly.

library(fscontext)
data("fscontextdemo_snapshot_02")
fscontextdemo_snapshot_02 |>
  subset(
    select = c(storage_id, rel_path, filename, quick_sig)
  ) |>
  head()
#>      storage_id                           rel_path
#> 1 fscontextdemo                 .github/.gitignore
#> 2 fscontextdemo     .github/workflows/pkgdown.yaml
#> 3 fscontextdemo                         .gitignore
#> 4 fscontextdemo                      .Rbuildignore
#> 5 fscontextdemo data/fscontextdemo_snapshot_01.rda
#> 6 fscontextdemo       data/fsdemo_country_data.rda
#>                        filename                  quick_sig
#> 1                    .gitignore                   db6ad734
#> 2                  pkgdown.yaml          5eb4aaba_6cbfbdf4
#> 3                    .gitignore                   e73cf12f
#> 4                 .Rbuildignore                   09ab8617
#> 5 fscontextdemo_snapshot_01.rda 03dd3533_36abd309_cb1736f4
#> 6       fsdemo_country_data.rda                   f7e65210

The snapshot records observed filesystem resources together with contextual information such as relative paths, timestamps, extensions, and storage identifiers.

Contextual identifiers can then be added:

data("fscontextdemo_snapshot_02")
snapshot <- add_snapshot_context(fscontextdemo_snapshot_02)

snapshot |>
  subset(
    select = c(storage_path_id, observation_id, rel_path)
  ) |>
  head()
#>                                     storage_path_id
#> 1                 fscontextdemo::.github/.gitignore
#> 2     fscontextdemo::.github/workflows/pkgdown.yaml
#> 3                         fscontextdemo::.gitignore
#> 4                      fscontextdemo::.Rbuildignore
#> 5 fscontextdemo::data/fscontextdemo_snapshot_01.rda
#> 6       fscontextdemo::data/fsdemo_country_data.rda
#>                                                       observation_id
#> 1                 fscontextdemo::.github/.gitignore::20260525-174640
#> 2     fscontextdemo::.github/workflows/pkgdown.yaml::20260525-174640
#> 3                         fscontextdemo::.gitignore::20260525-174640
#> 4                      fscontextdemo::.Rbuildignore::20260525-174640
#> 5 fscontextdemo::data/fscontextdemo_snapshot_01.rda::20260525-174640
#> 6       fscontextdemo::data/fsdemo_country_data.rda::20260525-174640
#>                             rel_path
#> 1                 .github/.gitignore
#> 2     .github/workflows/pkgdown.yaml
#> 3                         .gitignore
#> 4                      .Rbuildignore
#> 5 data/fscontextdemo_snapshot_01.rda
#> 6       data/fsdemo_country_data.rda
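
Judging only from the printed values above, the identifiers appear to follow the patterns `storage_id::rel_path` and `storage_id::rel_path::timestamp`. The base R sketch below reproduces that pattern on a toy table. It is an assumption drawn from the output, not a description of how `add_snapshot_context()` is implemented, and the observation time is filled in by hand.

```r
# Toy table using two paths from the output above.
snapshot_toy <- data.frame(
  storage_id = "fscontextdemo",
  rel_path   = c(".gitignore", "data/fsdemo_country_data.rda"),
  stringsAsFactors = FALSE
)

# Assumed observation time, formatted like the suffix seen in the output above.
observed_at <- as.POSIXct("2026-05-25 17:46:40", tz = "UTC")
stamp <- format(observed_at, "%Y%m%d-%H%M%S", tz = "UTC")

# Path-level identifier: does not depend on when the file was observed.
snapshot_toy$storage_path_id <- paste(
  snapshot_toy$storage_id, snapshot_toy$rel_path, sep = "::"
)
# Observation-level identifier: the same path observed at one point in time.
snapshot_toy$observation_id <- paste(
  snapshot_toy$storage_path_id, stamp, sep = "::"
)

snapshot_toy$observation_id
```

Under this reading, `storage_path_id` would stay the same across repeated snapshots while `observation_id` would differ, which is what makes comparisons over time possible.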

The examples above demonstrate only the observational layer. Subsequent workflows can derive contextual Record Sets, compare repeated observations over time, identify duplicate resources, analyse activity patterns, and support semantic stabilisation.
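
To give a feel for two of these follow-on tasks, the base R sketch below compares two toy snapshots and flags duplicate candidates. It does not use fscontext functions. The data are invented, and treating a signature column such as `quick_sig` as a content fingerprint is an assumption made for illustration.

```r
# Toy snapshots; all values are invented for illustration.
snap_01 <- data.frame(
  rel_path  = c("R/clean.R", "data/raw.csv", "README.md"),
  quick_sig = c("a1", "b2", "c3"),
  stringsAsFactors = FALSE
)
snap_02 <- data.frame(
  rel_path  = c("R/clean.R", "data/raw.csv", "data/raw_copy.csv", "NEWS.md"),
  quick_sig = c("a9", "b2", "b2", "d4"),
  stringsAsFactors = FALSE
)

# Full outer join on the path, keeping one signature column per snapshot.
both <- merge(snap_01, snap_02, by = "rel_path", all = TRUE,
              suffixes = c("_01", "_02"))

# Classify each path by comparing its presence and signature across snapshots.
both$status <- ifelse(
  is.na(both$quick_sig_01), "added",
  ifelse(is.na(both$quick_sig_02), "removed",
    ifelse(both$quick_sig_01 == both$quick_sig_02, "unchanged", "modified"))
)
both[, c("rel_path", "status")]

# Duplicate candidates: paths in the later snapshot that share a signature.
dup_sigs   <- snap_02$quick_sig[duplicated(snap_02$quick_sig)]
duplicates <- snap_02[snap_02$quick_sig %in% dup_sigs, ]
duplicates
```

A shared signature only suggests identical content. A careful workflow would confirm candidates with a full checksum or manual review.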

See the package vignettes for complete end-to-end examples.

Core concepts

The package separates five complementary analytical layers:

| Layer | Purpose |
|---|---|
| Observation | Observe filesystems and related digital environments as reproducible snapshots. |
| Context | Derive contextual identifiers, structural aggregations, and candidate Record Sets from observations. |
| Record Sets | Create lightweight documentary objects using recordset_df, inspired by RiC. |
| Semantic stabilisation | Support progressive semantic enrichment through prelabelled values, rulebooks, and human review. |
| Analysis | Compare snapshots, detect duplicates, reconstruct activity, and analyse evolving digital work environments. |
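
The idea of a rulebook combined with human review can be illustrated with a minimal base R sketch. Everything here is hypothetical: the rulebook layout, the `apply_rulebook()` helper, and the labels are assumptions made for illustration and do not reflect the package's own rulebook format.

```r
files <- data.frame(
  rel_path = c("R/clean.R", "data/raw.csv", "docs/report.pdf", "misc/scan_0042"),
  stringsAsFactors = FALSE
)

# Hypothetical rulebook: regular expressions mapped to labels; first match wins.
rulebook <- data.frame(
  pattern = c("\\.R$", "\\.(csv|rda)$", "\\.pdf$"),
  label   = c("source code", "dataset", "report"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply rules in order, never overwriting an earlier match.
apply_rulebook <- function(paths, rulebook) {
  labels <- rep(NA_character_, length(paths))
  for (i in seq_len(nrow(rulebook))) {
    hit <- is.na(labels) & grepl(rulebook$pattern[i], paths)
    labels[hit] <- rulebook$label[i]
  }
  labels
}

files$label <- apply_rulebook(files$rel_path, rulebook)
# Anything the rulebook cannot label is queued for a person to look at.
files$needs_review <- is.na(files$label)
files
```

Keeping unmatched items as `NA` rather than guessing a label leaves an explicit queue for review, so enrichment can proceed progressively.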

In RiC-inspired terms, filesystem observations represent observed digital resources and their associated instantiations at a particular point in time. These observations may later be aggregated into contextual Record Sets while preserving the distinction between the observed resources themselves and the contextual structures derived from them.

The framework intentionally separates observation, contextual organisation, semantic stabilisation, and domain-specific interpretation. This allows the same observational evidence to support different analytical perspectives (including archival description, business process reconstruction, software development, digital forensics, historical research, and other forms of contextual analysis) without conflating the evidence with its interpretation.
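
The separation between observed resources and derived structures can be sketched in base R. The example groups observations into candidate Record Sets by top-level folder and stores the result in a separate table, leaving the observations untouched. The grouping rule and all names are assumptions for illustration. This is not the package's `recordset_df` class.

```r
# Observation table: one row per observed resource (toy values).
observations <- data.frame(
  storage_path_id = c("demo::R/clean.R", "demo::R/plot.R",
                      "demo::data/raw.csv", "demo::README.md"),
  rel_path = c("R/clean.R", "R/plot.R", "data/raw.csv", "README.md"),
  stringsAsFactors = FALSE
)

# Assumed grouping rule: top-level folder; root-level files go to "(root)".
top_level <- ifelse(
  grepl("/", observations$rel_path),
  sub("/.*$", "", observations$rel_path),
  "(root)"
)

# Derived structure: each candidate Record Set lists its member identifiers.
members <- split(observations$storage_path_id, top_level)
record_sets <- data.frame(
  record_set = names(members),
  n_members  = unname(lengths(members)),
  stringsAsFactors = FALSE
)

record_sets
members[["R"]]
```

Because `record_sets` only refers to observations by identifier, a different grouping rule can be applied later to the same evidence without modifying the observation table.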

What this package does not do

fscontext does not attempt to replace archival description, provenance ontologies, or knowledge graph platforms. Instead, it provides a reproducible observational and contextual layer that can support those systems by making digital working environments easier to understand, review, and reconstruct.

Notes

  • Large scans may require substantial time on slower or networked storage systems.
  • Some files may be inaccessible due to permissions or synchronisation state.
  • Observational snapshots are intended for reproducible local or institutional analysis workflows.
  • Contextual and analytical layers may evolve independently from the original observational corpus.
