Abstract

This section is non-normative.

The Allotrope Data Format Audit Trail and Electronic Signatures Specification [ADF-A] specifies how to use audit trails in the Allotrope Data Format [ADF] in a standardized way. It covers both the Allotrope Audit Trail Ontology and the Audit Trail API that is part of the [ADF] APIs.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current Allotrope publications and the latest revision of this technical report can be found in the Allotrope technical reports index at http://purl.allotrope.org/TR/.

This document is part of a set of specifications on the Allotrope Framework [AF].

This document was published by the Allotrope Foundation as a Working Draft. This document is intended to become an Allotrope Recommendation. If you wish to make comments regarding this document, please send them to more.info@allotrope.org. All comments are welcome.

Publication as a Working Draft does not imply endorsement by the Allotrope Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1. Disclaimer

THESE MATERIALS ARE PROVIDED "AS IS" AND ALLOTROPE EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE WARRANTIES OF NON-INFRINGEMENT, TITLE, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

2. Document Conventions

2.1 Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].

2.1.1 Namespaces

Within this specification, the following standard namespace prefix bindings are used:

Prefix       Namespace
owl:         http://www.w3.org/2002/07/owl#
rdf:         http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs:        http://www.w3.org/2000/01/rdf-schema#
xsd:         http://www.w3.org/2001/XMLSchema#
dct:         http://purl.org/dc/terms/
skos:        http://www.w3.org/2004/02/skos/core#
foaf:        http://xmlns.com/foaf/0.1/
org:         http://www.w3.org/ns/org#
prov:        http://www.w3.org/ns/prov#
pav:         http://purl.org/pav/
time:        http://www.w3.org/2006/time#
ore:         http://www.openarchives.org/ore/terms/
sh:          http://www.w3.org/ns/shacl#
hdf:         http://purl.allotrope.org/ontologies/hdf/1.8#
qb:          http://purl.org/linked-data/cube#
af-c:        http://purl.allotrope.org/ontologies/common#
af-cq:       http://purl.allotrope.org/ontologies/common/qualifier#
af-m:        http://purl.allotrope.org/ontologies/material#
af-e:        http://purl.allotrope.org/ontologies/equipment#
af-p:        http://purl.allotrope.org/ontologies/process#
af-r:        http://purl.allotrope.org/ontologies/result#
adf-a:       http://purl.allotrope.org/ontologies/audit#
adf-g:       http://purl.allotrope.org/ontologies/graph#
adf-dc:      http://purl.allotrope.org/ontologies/datacube#
adf-dp:      http://purl.allotrope.org/ontologies/datapackage#
adf-dc-hdf:  http://purl.allotrope.org/ontologies/datacube-hdf-map

Within the examples, the following namespace prefix bindings are used:

Prefix  Namespace            Description
ex:     http://example.org/  an example namespace
ex1:    http://example.org/  an example namespace
ex2:    http://example.org/  an example namespace

2.1.2 Identifiers

The Allotrope Foundation Taxonomies use identifiers (IRIs) that are not human-readable. To make the examples more understandable, we use the following notation: we prefix the SKOS [skos-reference] preferred label of the concept with the namespace prefix and surround it with French quotes (guillemets):

«{ns-prefix}:{pref label}», where {ns-prefix} is the namespace prefix defined above and {pref label} is the preferred label of the concept defined in the taxonomy.

For example, the notation «af-x:device manufacturer» represents the real IRI af-x:AFX_0000333 of the device manufacturer property.

2.1.3 Diagrams

This document uses the Unified Modeling Language to illustrate some concepts and to visualize RDF graphs. These diagrams are non-normative and should not be read in the strict sense defined by the UML specification [UML].

Colors are used to make the domain of the entities more transparent. The color scheme is illustrated in the following figure:

Fig. 1 Color scheme

2.1.4 Number Formatting

Within this document, decimal numbers will use a dot "." as the decimal mark.

2.2 Requirements

The following sections describe the use cases for audit trail in the scope of Allotrope:

2.2.1 Audit Trail embedded in ADF

The audit trail ensures traceability of changes to the contents of the file, independent of any IT application or system. This makes the file a standalone representation of both the content and its audit trail (user, timestamp, entity changed, old value, new value, reason) and standardizes this aspect of compliance across Allotrope-enabled software. Here, the Allotrope class libraries are responsible for the audit trail. For each ADF API, we MUST track who did what, when, why, and how: "what" describes the changed values, either directly or via references; "who" describes the primary agent (person or organization) that made the change; "when" describes the time interval in which the changes were made; "why" describes the motivation for the change; and "how" describes the software that applied the ADF API.

2.2.2 Audit Trail added to ADF

An audit trail created by an external software application can be added to the ADF; in that case, the external software application is responsible for the audit trail. It must be possible for external applications to add audit trail information to the ADF in order to document the complete audit history in one location.

2.3 Specification

2.3.1 Key concepts

The following defines the important concepts of the audit trail and related ontologies:
audit trail
A chronological record of system activities that is sufficient to enable the reconstruction, reviews, and examination of the sequence of environments and activities surrounding or leading to each event in the path of a transaction from its inception to output of final results. [FDA-Glossary]
digital signature
Digital signature means an electronic signature based upon cryptographic methods of originator authentication, computed by using a set of rules and a set of parameters such that the identity of the signer and the integrity of the data can be verified. [FDA-21CFR11]
electronic record
Electronic record means any combination of text, graphics, data, audio, pictorial, or other information representation in digital form that is created, modified, maintained, archived, retrieved, or distributed by a computer system. [FDA-21CFR11]
electronic signature
Electronic signature means a computer data compilation of any symbol or series of symbols executed, adopted, or authorized by an individual to be the legally binding equivalent of the individual's handwritten signature. [FDA-21CFR11]
This specification does not cover digital signatures.

2.3.2 Ontologies and namespaces

The Allotrope audit trail and electronic signature concepts are all part of the Allotrope Audit ontology with the public IRI <http://purl.allotrope.org/ontologies/audit>. The ontology depends on the following external ontologies:

Allotrope Data Cube Ontology (ADF-DCO)
The Allotrope Data Cube ontology [ADF-DCO] is an OWL ontology based on the RDF Data Cube Vocabulary for describing the data cube part of the Allotrope data format [ADF].
Allotrope Data Package Ontology (ADF-DPO)
The Allotrope Data Package ontology [ADF-DPO] is an OWL ontology for describing the data package (file system like) part of the Allotrope data format [ADF].
Allotrope HDF Ontology (ADF-HDF)
The Allotrope HDF Ontology [ADF-HDF] is an OWL ontology for describing the HDF 1.8 data model. It follows closely the DDL for HDF5 [HDF5-DDL].
Friend of a Friend Ontology (FOAF)
Friend of a Friend Ontology [FOAF] provides the vocabulary for describing persons.
Object Reuse and Exchange (ORE)
Open Archives Initiative Object Reuse and Exchange [OAI-ORE] vocabulary defines standards for the description and exchange of aggregations of Web resources.
Organization Ontology (ORG)
The Organization Ontology [vocab-org] provides the vocabulary for describing organizational structures.
Provenance, Authoring and Versioning Ontology (PAV)
The Provenance, Authoring and Versioning Ontology [PAV] specializes the W3C Provenance Ontology [PROV-O] in order to describe authorship, curation and digital creation of resources.
Provenance Ontology (PROV-O)
The Provenance Ontology [PROV-O] defines the concepts for describing provenance. Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.
RDF Data Cube Vocabulary (QB)
The RDF Data Cube Vocabulary [vocab-data-cube] is an ontology for publishing multi-dimensional data. It is the foundation for the Allotrope Data Cube ontology.
Time Ontology (OWL-Time)
OWL-Time [owl-time] has been developed for describing the temporal content of Web pages, or the temporal properties of any resource denoted using a web identifier (URI), including real-world things if desired.
VoID vocabulary (VoID)
VoID [void] is an RDF Schema vocabulary for expressing metadata about RDF datasets. An audit record is such a dataset.

2.3.3 Audit Trail

An audit trail is a chronological record of system activities. A single record of a system activity is called an audit record dataset. It is represented in ADF by a dataset, which is a single named graph (this graph may further define or reference other named graphs). The content of the audit record dataset varies depending on the kind of activity and the electronic record that is affected by the activity.

audit record dataset
An audit record dataset is a dataset that contains the metadata and data that is needed to record the changes to an electronic record in an activity.
In our case, the electronic record is the ADF file, and the audit record dataset will contain a record of changes in the three parts of the ADF file:
  • the data description, represented by its RDF graph,
  • the data cubes, represented by one or more HDF datasets, and
  • the data package, represented by HDF groups and datasets.

An audit trail is an ordered aggregation (list) of audit record datasets. 1-n relations between instances in an RDF graph are not ordered, so an intermediate proxy object is needed that gives the audit record dataset an explicit order. We use the ORE model of aggregations:

aggregation
A set of related aggregated resources, grouped together such that the set can be treated as a single resource. [OAI-ORE]
aggregated resource
A resource which is included in an aggregation. [OAI-ORE]
proxy
A proxy represents an aggregated resource as it exists in a specific aggregation. All assertions made about an entity are globally true, not only within the context of the aggregation. As such, in order to make assertions which are only true of a resource as it exists in an aggregation, a proxy object is required. [OAI-ORE]
resource map
A description of an aggregation. [OAI-ORE]
Fig. 2 Audit trail overview
The audit trail is an ORE resource map that describes the aggregation of audit record datasets. Because aggregations are unordered in RDF, the chronological order of the audit record datasets is represented by a proxy, the audit record. Each audit record keeps references to the previous and the next item in the list.
Fig. 3 Audit trail overview
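
The proxy-based ordering described above can be sketched in Python as a doubly linked list over plain dictionaries. This is an illustration only: the keys `proxy_for`, `previous` and `next`, as well as the record and dataset identifiers, are assumptions made for this sketch and are not terms taken from the audit trail ontology or the ADF API.

```python
from typing import Dict, List, Optional

# Each audit record (proxy) points at its audit record dataset and at its
# neighbours in the trail. All keys and identifiers here are hypothetical.
AuditRecord = Dict[str, Optional[str]]


def ordered_datasets(records: Dict[str, AuditRecord]) -> List[Optional[str]]:
    """Return the audit record datasets in chronological order."""
    # The first record is the one without a predecessor.
    heads = [rid for rid, rec in records.items() if rec["previous"] is None]
    if len(heads) != 1:
        raise ValueError("expected exactly one first audit record")
    ordered: List[Optional[str]] = []
    seen = set()
    current: Optional[str] = heads[0]
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in audit trail")
        seen.add(current)
        ordered.append(records[current]["proxy_for"])
        current = records[current]["next"]
    return ordered


# The aggregation itself is unordered; only the proxies carry the order.
trail = {
    "record/2": {"proxy_for": "dataset/2", "previous": "record/1", "next": "record/3"},
    "record/1": {"proxy_for": "dataset/1", "previous": None, "next": "record/2"},
    "record/3": {"proxy_for": "dataset/3", "previous": "record/2", "next": None},
}
print(ordered_datasets(trail))
```

The dictionary deliberately lists the records out of order to show that the sequence is recovered from the proxy links alone.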

2.3.4 Audit Record

The amount and kind of metadata and data for describing the changes vary with the type of activity that has been done on the ADF file. All of this is stored together in an RDF dataset consisting of a primary named graph (audit record graph) for the audit record and, optionally, other named graphs that describe additions and removals from the default RDF model under audit – the data description. The audit record graph contains copies of, or references to, the data at the time before the change and after, as well as other metadata about the change, such as who?, why?, when?, etc.

Note

Strictly speaking, the 'audit record' is only the proxy for the 'audit record dataset' (see above), but the two terms are otherwise used synonymously.

The audit trail ontology uses the PROV-O ontology as the foundation for the metadata. The PROV-O core model is built around three classes:

entity
An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary. [PROV-O]
agent
An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity. [PROV-O]
activity
An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities. [PROV-O]
Fig. 4 PROV-O core classes

An entity (in our case, the ADF file) was generated by some activity (e.g., a measurement); later, new versions are derived from previous ones. The activity is associated with some agent, which can be a person, an organization, or software, and to which the entity is attributed. Other activities may use the entity. The audit trail of the ADF file is currently concerned only with the derivation of the ADF file; the initial generation and possible invalidation of the ADF file are typically not traced by an audit trail, but it is possible to do so.

The Provenance Ontology distinguishes between three kinds of derivations:

revision
A revision is a derivation for which the resulting entity is a revised version of some original. [PROV-O]
quotation
A quotation is the repeat of (some or all of) an entity, such as text or image, by someone who may or may not be its original author. [PROV-O]
primary source
A primary source for a topic refers to something produced by some agent with direct experience and knowledge about the topic, at the time of the topic's study, without benefit from hindsight. [PROV-O]

Of these derivations, only the revision is used in the audit trail. Together with generation and invalidation, this results in three types of audit record datasets:

  • a revision record dataset, for the ADF changes
  • a generation record dataset, for the ADF generation (not supported by the API, see above)
  • an invalidation record dataset, for the ADF invalidation (not supported by the API, see above)
Fig. 5 Audit record dataset
The attribution to an agent is further qualified by an attribution class (a qualified association) that also relates to the role the agent has in the revision. This is important if more than one agent contributes to the change, e.g., a software agent and an operator in a measurement, or two persons who both need to sign some decision.

2.3.5 The revision dataset

This section describes the content of the revision dataset for an ADF file in detail.

Fig. 6 Revision record dataset
A revision creates a new version of an entity. In our case, the entity is an electronic record, the ADF file. Each transaction generates a new version IRI for the electronic record (see Versioning). Within one audit record, there MUST be only one revision, but that revision may contain multiple changesets. The new version of the ADF file is generated by a single activity that uses the existing version. The new version is a revision of the old version. The activity metadata includes the time and, optionally, the physical location. The start and end times can be stated explicitly with a date/time literal, or another entity can be stated that acts as a trigger to start or end the activity.

Example 1

		# version 1 is a revision of version 0
		<adf://self/version/1> prov:wasRevisionOf <adf://self/version/0> .
		# the activity generated v1 and used v0; Dublin Core title and description can be used to describe the activity
		<adf://audit/auditrecord/1/activity> a prov:Activity ;
			dct:title "Revision 1" ;
			dct:description "A new version of the ADF file is created that ..." ;
			prov:generated <adf://self/version/1> ;
			prov:used <adf://self/version/0> ;
			prov:startedAtTime "2016-11-30T16:45:00Z"^^xsd:dateTime ;
			prov:endedAtTime "2016-11-30T17:15:00Z"^^xsd:dateTime ;
			prov:wasStartedBy <http://example.org/process/...> ; # example of an external trigger reference
			prov:atLocation <http://example.org/site1/room101> .
	
Note
When using the embedded audit trail API, the IRIs for the metadata objects such as the activity are generated in a structured way from a unique transaction id. In the example here, a simpler scheme is used for better readability.
Note
The full expressivity of the audit trail metadata model is not supported when the embedded audit trail API is used. The API uses a simplified audit trail model in order to stay easy to use for the typical use cases. The limitations are described in the section on the audit trail API. Future versions of the embedded audit trail API may allow use of the full audit record metadata model. Audit trails that are defined externally are not limited in this way.

The revision is made by an agent, usually a person, but it could also be a software system. The agent is associated with the revised entity via the attribution, which also captures an optional role. There are several standard role classes defined in the audit trail vocabulary, and they SHOULD be used if applicable, but any role defined in the standard Allotrope ontologies [AFO] can be used as well. What metadata must be stated about the agent to comply with regulatory and privacy protection rules is out of scope of this specification, but any detailed information MUST use the FOAF and ORG vocabularies where applicable. The PROV-O vocabulary also allows a delegation to be stated. The ORG ontology allows a fine-grained description of the organizational structure and any posts and roles therein.

Example 2

		<adf://audit/auditrecord/1/attribution> a prov:Attribution ;
			prov:hadRole adf-a:Approver ;
			prov:agent <mailto:jerry@example.org> ;
			prov:actedOnBehalfOf <mailto:tom@example.org> .

		<mailto:jerry@example.org> a prov:Person, foaf:Person ;
			foaf:mbox <mailto:jerry@example.org> ;
			foaf:familyName "Mouse" ;
			foaf:givenName "Jerry" ;
			org:memberOf <http://example.org/unit1> .

		<mailto:tom@example.org> a prov:Person, foaf:Person ;
			foaf:mbox <mailto:tom@example.org> ;
			foaf:familyName "Cat" ;
			foaf:givenName "Tom" ;
			org:memberOf <http://example.org/unit1> ;
			org:holds <http://example.org/unit1/qa> .

		<http://example.org/unit1/qa> a org:Post ;
			dct:title "Quality Assurance Unit 1" .

		<http://example.org/unit1> a org:OrganizationalUnit ;
			dct:title "Unit 1" ;
			org:hasPost <http://example.org/unit1/qa> ;
			org:unitOf <http://example.org> .

		<http://example.org> a org:FormalOrganization ;
			dct:title "Example Organization" .
	
Note
If the agent is already fully defined in the data description of the ADF file, it is not necessary to repeat this information in the audit record; a reference by IRI is sufficient. Any change to the data description is itself subject to the audit trail, so the agent metadata is retained in the ADF file.

The details of the changes made to the electronic record in the revision, and their motivations, are tracked in changesets. An electronic record is often composed of parts that can be added to, removed from, or updated in the electronic record. These parts are often composed of other parts, and so on. An ADF file is composed of the data description, the data cube, and the data package parts. A changeset can be applied to the whole electronic record under revision or to only one of these parts, the subject of change.

In many cases, the change can be described by the set of removed parts combined with the set of added parts. The new version of the electronic record is created from the original one and the changeset by first removing all parts listed in the removal set of the changeset from the original electronic record, and then adding the parts in the addition set.

If the change cannot be decomposed into separate parts of the electronic record, but affects data segments of the electronic record that have no identity of their own, then a data update is needed in the changeset. The data update either lists the old and new data or describes the position of the old and new data. For example, a file in the data package cannot be further decomposed into smaller parts. So if the file content, a binary stream of data, is changed, either the whole file must be exchanged or the data update must be described in detail. For files, the changed data can be described by its size and position within the binary stream. For data updates in data cubes, the parts that have been changed within the cube are described by data selections (slabs or point selections).

changeset
A changeset captures what was added, deleted and updated in an electronic record, and the motivation why it was done.
data update
A data update is a description of individual data changes within an electronic record that cannot be otherwise described by a remove / add combination.
Fig. 7 Changeset

An ADF file is composed of three major parts:

  • the data description – locally referenced by adf://dd
  • the data cubes – locally referenced by adf://dc
  • the data package – locally referenced by adf://dp

2.3.5.1 Data Description Changes

A change in the data description is always described by a set of removed statements and a set of added statements. Assuming a resource ex:example has its title changed from "old title" to "new title", then within the data description the old statement


	  ex:example dct:title "old title" .
	  
will be replaced with the new statement

	  ex:example dct:title "new title" .
	  

In order to describe this change with a changeset, two named graphs are created: one that contains all removed statements (<removal-graph>), and another containing the added statements (<addition-graph>). The changed data description is created from the original one by the following set operation:


	  <dd-new> = (<dd-old> − <removal-graph>) ∪ <addition-graph>

This set operation can be applied in reverse to recreate the old data description from the current one:

	  <dd-old> = (<dd-new> − <addition-graph>) ∪ <removal-graph>
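
The two set operations can be tried out with Python sets of triples. The following is a minimal sketch, not the ADF implementation: graphs are modeled as frozen sets of string tuples, the function names are hypothetical, and the `rdf:type` statement with `ex:Document` is made up to show that unrelated statements are left untouched.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def apply_changeset(dd_old: Graph, removal: Graph, addition: Graph) -> Graph:
    """<dd-new> = (<dd-old> - <removal-graph>) union <addition-graph>."""
    return (dd_old - removal) | addition


def revert_changeset(dd_new: Graph, removal: Graph, addition: Graph) -> Graph:
    """<dd-old> = (<dd-new> - <addition-graph>) union <removal-graph>."""
    return (dd_new - addition) | removal


dd_old = frozenset({
    ("ex:example", "rdf:type", "ex:Document"),  # hypothetical extra statement
    ("ex:example", "dct:title", "old title"),
})
removal_graph = frozenset({("ex:example", "dct:title", "old title")})
addition_graph = frozenset({("ex:example", "dct:title", "new title")})

dd_new = apply_changeset(dd_old, removal_graph, addition_graph)
restored = revert_changeset(dd_new, removal_graph, addition_graph)
print(restored == dd_old)
```

Note that the round trip reproduces the old graph exactly only if the removal graph contains nothing but statements that were present in the old graph, and the addition graph contains no statement that was already present and kept.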

2.3.5.1.1 Handling of blank nodes in audit record

In general, the above method works only if the resources are not anonymous. Anonymous resources (blank nodes) have only a local identity within a graph; only a URI gives a resource a public identity. Serializations of the RDF graph, such as N3, typically generate new blank node identifiers on each output, like _:b1. The consequences for the audit trail are best explained with an example.

What happens if the following statements are added to the data description in Turtle notation?


ex:example1 ex:hasTopic [ dct:title "about As"] .
ex:example2 ex:hasTopic [ dct:title "about As"] .
Internally, this will be represented in triple form:

ex:example1 ex:hasTopic _:b1 .
_:b1 dct:title "about As" .
ex:example2 ex:hasTopic _:b2 .
_:b2 dct:title "about As" .
with _:b1 and _:b2 being representations of the internal identifiers for the two blank nodes. Both graphs are fully isomorphic to each other, and so would be

ex:example1 ex:hasTopic _:b10 .
_:b10 dct:title "about As" .
ex:example2 ex:hasTopic _:b20 .
_:b20 dct:title "about As" .
Now, if we change the dct:title of the topic of ex:example2 to "about Bs", we get the removal graph

<removal-graph> {
    _:b2 dct:title "about As" .
}
and the addition graph is

<addition-graph> {
    _:b2 dct:title "about Bs" .
}
However, _:b2 is only known in the original data description graph. The _:b2 in the removal or addition graph can have any other internal blank node identifier, for example _:b1:

<removal-graph> {
    _:b1 dct:title "about As" .
}
In any case, the information in the audit record named graphs about the original blank node identifiers is lost. The equivalent Turtle representation

<removal-graph> {
    [ dct:title "about As" ] .
}
better reflects this. In ADF, we ensure that the internal identifiers of blank nodes are kept the same as in the original data description when the audit trail addition/removal graphs are created, so the change location can be identified. However, any external representation of the audit record will not have this information available. A fully repeatable audit trail must make the internal identifiers of all blank nodes explicit to an external representation of the graph. External changes must keep track of the naming (skolemization) when making changes:

ex:example1 ex:hasTopic _:b10 .
_:b10 dct:title "about As" .
ex:example2 ex:hasTopic _:b20 .
_:b20 dct:title "about As" .
# following is explicit blank node labeling
_:b10 adf-a:blankNodeId 'a8735c8d-cb51-4a76-813e-da85abb23827' . # giving the blank node _:b10 an external unique id
_:b20 adf-a:blankNodeId '4cd2d444-8333-42d2-8bd1-02b8f006a246' . # giving the blank node _:b20 an external unique id

In this case, the portable removal graph would look as follows (note the different blank node identifier _:b1 instead of _:b10):

<removal-graph> {
    _:b1 dct:title "about As" .
    _:b1 adf-a:blankNodeId 'a8735c8d-cb51-4a76-813e-da85abb23827' .
}
The labeling statements are metadata rather than a real part of the graph's content, and could be put into a separate named graph. However, in that case they can easily get lost.
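
How an external tool might use the explicit labels to re-align a portable removal graph with the data description can be sketched as follows. This is a hedged illustration under simplifying assumptions: graphs are plain Python sets of string triples, any term starting with `_:` is treated as a blank node, and the helper names are hypothetical. Only the property name `adf-a:blankNodeId` and the identifiers are taken from the example above.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
BLANK_NODE_ID = "adf-a:blankNodeId"


def ids_by_label(graph: Set[Triple]) -> Dict[str, str]:
    """Map each labeled blank node to its external unique id."""
    return {s: o for (s, p, o) in graph
            if p == BLANK_NODE_ID and s.startswith("_:")}


def align_blank_nodes(change_graph: Set[Triple],
                      target_graph: Set[Triple]) -> Set[Triple]:
    """Rewrite blank node labels of change_graph to those used in target_graph."""
    label_by_id = {ext_id: label
                   for label, ext_id in ids_by_label(target_graph).items()}
    mapping = {}
    for label, ext_id in ids_by_label(change_graph).items():
        if ext_id not in label_by_id:
            raise KeyError(f"unknown blank node id: {ext_id}")
        mapping[label] = label_by_id[ext_id]
    return {(mapping.get(s, s), p, mapping.get(o, o))
            for (s, p, o) in change_graph}


data_description = {
    ("ex:example1", "ex:hasTopic", "_:b10"),
    ("_:b10", "dct:title", "about As"),
    ("ex:example2", "ex:hasTopic", "_:b20"),
    ("_:b20", "dct:title", "about As"),
    ("_:b10", BLANK_NODE_ID, "a8735c8d-cb51-4a76-813e-da85abb23827"),
    ("_:b20", BLANK_NODE_ID, "4cd2d444-8333-42d2-8bd1-02b8f006a246"),
}
portable_removal = {
    ("_:b1", "dct:title", "about As"),
    ("_:b1", BLANK_NODE_ID, "a8735c8d-cb51-4a76-813e-da85abb23827"),
}

aligned = align_blank_nodes(portable_removal, data_description)
print(aligned <= data_description)
```

After alignment, the removal graph refers to `_:b10` and is a subset of the data description, so the set operations for applying or reverting a change can be used again.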
Note
In the ADF 1.3 version, the blank node labeling (skolemization) using adf-a:blankNodeId is not made explicit.
Note
The explicit labeling of blank nodes is one way of solving the graph isomorphism problem, which also occurs if a reproducible canonical representation of the graph is required, e.g., for building digital signatures. The explicit labeling, however, must be kept and maintained by each intermediate processing step, which is difficult to ensure in general. In many use cases, no explicit labeling is really necessary: blank nodes get their identity from their immediate neighborhood in the graph, and the graph is fully known. In the example, the ex:example1 ex:hasTopic _:b10 statement provides the identity for the blank node _:b10. This works well except in some obscure, artificial border cases.
2.3.5.1.2 Named graphs in the data description

Since the data description in newer ADF files can also make use of named graphs, the addition, removal, and update of named graphs need to be tracked in the audit trail as well. Because of the multiple named graphs, we are no longer adding or removing statements (triples) but effectively quadruples, which besides the subject, predicate, and object also contain the owning graph. The audit trail model, however, does not describe changes as the addition and removal of quadruples directly. Instead, it tracks the addition and removal of graphs in the data description, and the updates to them. In this model, the addition or removal of a statement (triple) is a DataUpdate of a named graph. This is similar to how updates on data cubes or data package nodes are handled. The target of the data update is the named graph, and the sets of added and removed statements describe the new and old data, respectively.

Fig. 8 data description changeset
Example 3

<changeset-dd> a adf-a:ChangeSet;
    adf-a:addition <added-named-graph>;
    adf-a:removal  <removed-named-graph>;
    adf-a:update  <named-graph-update>;
    adf-a:subjectOfChange <adf://dd> .

<added-named-graph> a void:Dataset, adf-g:Graph .
<removed-named-graph> a void:Dataset, adf-g:Graph .

<named-graph-update> a adf-a:DataUpdate;
    adf-a:target <updated-named-graph>;  # update on named graph
    adf-a:newData <added-to-named-graph>; # new data = added triples stored in a named graph
    adf-a:oldData <removed-from-named-graph> . # old data = removed triples stored in a named graph

<added-to-named-graph> a void:Dataset, adf-g:Graph .
<removed-from-named-graph> a void:Dataset, adf-g:Graph .

# named graph with the added statements
<added-to-named-graph> {
    ex:example dct:title "new title" .
}

# named graph with the removed statements
<removed-from-named-graph> {
    ex:example dct:title "old title" .
}
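
The effect of such a changeset on a data description with named graphs can be sketched in Python. The sketch models the dataset as a dictionary from graph name to a set of triples. The function name, the graph names, and the order in which removals, additions, and updates are processed are assumptions of this illustration, not behavior specified by the ADF API.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]
# (target graph, old data = removed triples, new data = added triples)
Update = Tuple[str, Set[Triple], Set[Triple]]


def apply_dd_changeset(dataset: Dataset,
                       additions: Dataset,
                       removals: Iterable[str],
                       updates: Iterable[Update]) -> Dataset:
    """Return a new dataset with the changeset applied; the input is not modified."""
    result = {name: set(triples) for name, triples in dataset.items()}
    for name in removals:                      # removed named graphs
        del result[name]
    for name, triples in additions.items():    # added named graphs
        result[name] = set(triples)
    for target, old_data, new_data in updates:  # DataUpdate on a named graph
        result[target] = (result[target] - old_data) | new_data
    return result


dataset = {
    "updated-named-graph": {("ex:example", "dct:title", "old title")},
    "removed-named-graph": {("ex:a", "ex:b", "ex:c")},
}
changed = apply_dd_changeset(
    dataset,
    additions={"added-named-graph": {("ex:x", "ex:y", "ex:z")}},
    removals=["removed-named-graph"],
    updates=[("updated-named-graph",
              {("ex:example", "dct:title", "old title")},
              {("ex:example", "dct:title", "new title")})],
)
print(sorted(changed))
```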

2.3.5.2 Data Cube Changes

A changeset in the data cube part of the ADF file contains the added and removed data cubes, and any data updates in the cube.

Note
In this version of the API, audit trails on data cubes have the following limitation: the DataSelection that describes what has been updated in a data cube is stored in the audit record, but the old cube data is not archived at the logical data cube level. Archiving happens instead in the lower-level HDF structures (HDF datasets and dictionaries). This means that all data is saved in the audit trail and can potentially be recovered, but there is no direct API support for reading the old data.
See section HDF changes
Fig. 9 datacube changeset
Fig. 10 archived and changed datacube
Example 4

	<changeset-dc> a adf-a:ChangeSet;
		adf-a:addition <adf://dc/added-cube>;
		adf-a:removal  <adf://dc/removed-cube>;
		adf-a:update   <adf://dc/updated-cube-data>;
		adf-a:subjectOfChange <adf://dc> .

	<adf://dc/added-cube> a qb:DataSet ;
		qb:structure ... .

	# the removed data cube gets a link to the backup archive (not implemented)
	<adf://dc/removed-cube> a qb:DataSet ;
		qb:structure ... ;
		adf-a:archivedTo <adf://dc/archive/removed-cube> .

	# the cube will be moved to an archive section of the ADF file (not implemented)
	<adf://audit/changes/dc/removed-cube> a qb:DataSet ;
		qb:structure ... ;
		adf-a:isArchiveOf <adf://dc/removed-cube> .


	<adf://dc/updated-cube> a qb:DataSet ;
		qb:structure <update-cube-structure> .

	# the archived data cube containing the overwritten data will be put in an archive section (not implemented)
	<adf://audit/changes/dc/update-cube> a qb:DataSet ;
		qb:structure <archive-cube-structure> .

	<updated-cube-data> a adf-a:DataUpdate ;
		adf-a:target           <updated-cube> ;
        adf-a:oldDataReference <selection-old> ;
        adf-a:newDataReference <selection-new> .

	<selection-old> a adf-dc:DataSelection ;
		adf-dc:selectionOn  <archived-cube> ;
		...

	<selection-new> a adf-dc:DataSelection ;
		adf-dc:selectionOn  <updated-cube> ;
		...

	
Appended data does not need to be stored as an archive, but can be referenced directly on the current version of the cube; however, any data that is deleted or overwritten must be stored in an archive data cube. A data selection creates a stream of observation data in the order of the cube dimensions. Thus, it is not necessary for the archive data cube to have the same structure as the updated/deleted data cube. A single-dimension data cube is sufficient to store the changed data. It is even possible and efficient to use a single archive data cube to store multiple data updates:
Fig. 11 Archiving data of two update steps
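
The idea of a single one-dimensional archive that collects the overwritten data of several update steps can be sketched in Python. This is a simplified illustration, not the ADF implementation: the cube is a two-dimensional nested list, a selection is a rectangular slab given by hypothetical `start` and `count` tuples, and the stream order is assumed to be row-major.

```python
from itertools import product
from typing import List, Sequence, Tuple


def overwrite_with_archive(cube: List[List[float]],
                           start: Tuple[int, int],
                           count: Tuple[int, int],
                           new_values: Sequence[float],
                           archive: List[float]) -> Tuple[int, int]:
    """Overwrite a slab of the cube and append the old values to the archive.

    Returns (offset, length) of the archived segment within the 1-D archive.
    """
    rows = range(start[0], start[0] + count[0])
    cols = range(start[1], start[1] + count[1])
    cells = list(product(rows, cols))  # row-major stream of selected cells
    if len(new_values) != len(cells):
        raise ValueError("new data does not match the size of the selection")
    offset = len(archive)
    for (r, c), value in zip(cells, new_values):
        archive.append(cube[r][c])  # keep the overwritten value
        cube[r][c] = value
    return offset, len(cells)


cube = [[1, 2, 3],
        [4, 5, 6]]
archive: List[float] = []

# First update: overwrite the middle column.
first = overwrite_with_archive(cube, (0, 1), (2, 1), [20, 50], archive)
# Second update: overwrite the whole second row.
second = overwrite_with_archive(cube, (1, 0), (1, 3), [7, 8, 9], archive)
print(first, second, archive)
```

Each update only needs to remember the offset and length of its segment in the archive, regardless of the shape of the updated cube.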

2.3.6 Data Package Changes

A changeset in the data package part of the ADF file contains the added and removed files and folders, and any data updates in the files.
Note
This version of the API supports only a limited audit trail on the data package. Addition and removal of files and folders are tracked, but removed files are not archived. Data updates in files are limited to appends, which means that no existing data can be changed. Also, the actual data change is currently not tracked at the logical level of the data package, but only at the underlying layer of the HDF dataset.
See section HDF changes
The change tracking is similar to the data cube change tracking. Instead of data selections, the updated data is referenced using segments. Segments are parts of a binary stream of data with a known starting position and a length.
Fig. 12 Data package changeset
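A segment, as described above, can be modeled as a start position plus a length. The following Python sketch is illustrative only: the Segment class and the append_with_segment helper are hypothetical names and are not part of the Audit Trail API. It shows how an append-only update yields a segment that references exactly the newly written bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A part of a binary stream: starting position and length in bytes."""
    start: int
    length: int

    def read(self, data: bytes) -> bytes:
        """Return the bytes of the stream that this segment references."""
        return data[self.start:self.start + self.length]


def append_with_segment(content: bytearray, new_data: bytes) -> Segment:
    """Append new_data to content and return the segment that references it."""
    segment = Segment(start=len(content), length=len(new_data))
    content.extend(new_data)
    return segment


# Two appends to a file; each one is recorded as its own segment.
file_content = bytearray(b"time,value\n")
first = append_with_segment(file_content, b"0,1.5\n")
second = append_with_segment(file_content, b"1,1.7\n")
```

Because updates are limited to appends, no old data needs to be archived; the segments alone identify what each change added.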

2.3.7 HDF changes

Data cubes and data package nodes are implemented in ADF as HDF datasets and groups. Folders are implemented as HDF groups, files as HDF datasets, and data cubes as a set of HDF datasets within an HDF group. If the audit trail is activated on an ADF file, changes to these implementation artifacts also fall under the audit trail. The changesets of the HDF artifacts are again similar to those of the data cubes. A changeset consists of the additions, removals, and data updates of HDF datasets or HDF groups. References from a data update into the HDF datasets are described using HDF hyperslabs.
Fig. 13 HDF changeset
Note
Only removed datasets are archived by moving them into an archive group. Groups and named types can be fully restored using metadata only.
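An HDF5 hyperslab selects a regular pattern of elements through four values per dimension: start, stride, count, and block. As a rough illustration of what such a reference addresses, the sketch below enumerates the element indices of a hyperslab in plain Python. The function name hyperslab_indices is an assumption of this example; it does not use the HDF5 library and is not part of the ADF API.

```python
from itertools import product


def hyperslab_indices(start, stride, count, block):
    """List the element indices selected by a hyperslab, in dimension order.

    Per dimension, `count` blocks of `block` consecutive elements are taken,
    the first block beginning at `start` and later ones every `stride` elements.
    """
    per_dimension = []
    for s, st, c, b in zip(start, stride, count, block):
        per_dimension.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dimension))


# Rows 1 and 2, every second column starting at column 0.
cells = hyperslab_indices(start=(1, 0), stride=(1, 2), count=(2, 2), block=(1, 1))

# One dimension: two blocks of two elements, four elements apart.
runs = hyperslab_indices(start=(0,), stride=(4,), count=(2,), block=(2,))
```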

2.4 Electronic Signatures

The ADF audit trail ontology provides vocabulary to store electronic signatures in the data description.
Fig. 14 Electronic signature
An electronic signature signs some resource. The properties of the signature are:
signed on
time of the signature
motivated by
motivation as generic text
had motivation
a relation to a detailed motivation. There are two kinds of motivation: purpose and reason. Neither is further specified; both act only as a plug-in point for an Allotrope domain ontology.
signed by
the person who signed; this agent must be further identified using the FOAF and ORG vocabularies, but the minimum shape for a person is not part of this specification
signed with
the software that has been used to create the electronic signature. The minimum shape of the software agent is not part of this specification

2.5 Versioning

This audit trail specification currently defines no policy on what the version IRI for the ADF file should look like; it is simply a reference point for tracking the state. However, the audit trail API uses the org.allotrope.audit.service.VersioningService service for generating version IRIs. Its implementation applies the following canonical versioning scheme, following a RESTful, linked data approach:

Any resource IRI becomes a versioned resource IRI by appending /version/ to it, followed by either a version string (if defined) or, otherwise, a timestamp-based version string. The timestamp-based version string is the ISO 8601 date-time string, in UTC, of the time the version is created.

For example, the version IRI for a resource created on November 30th, 2016 at 11:45 a.m. UTC using timestamp-based versioning would be http://example.org/resource/version/2016-11-30T11:45:00, where http://example.org/resource is the base URL of the resource.

The current versioning service implementation uses a version string that is an incremented integer. The provenance and the version strings are described using the PAV vocabulary.
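The versioning scheme above can be sketched in a few lines of Python. This is not the implementation of org.allotrope.audit.service.VersioningService; the function names version_iri and next_version are hypothetical, and the sketch assumes that a timezone-aware datetime is passed for timestamp-based versions.

```python
from datetime import datetime, timezone
from typing import Optional


def version_iri(resource_iri: str, version: Optional[str] = None,
                created: Optional[datetime] = None) -> str:
    """Append /version/ and a version string to a resource IRI.

    If no version string is given, an ISO 8601 UTC timestamp is used instead.
    `created` is assumed to be timezone-aware; it defaults to the current time.
    """
    if version is None:
        moment = created if created is not None else datetime.now(timezone.utc)
        version = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return resource_iri.rstrip("/") + "/version/" + version


def next_version(current: str) -> str:
    """Increment an integer version string, as the current service is described to do."""
    return str(int(current) + 1)


timestamped = version_iri("http://example.org/resource",
                          created=datetime(2016, 11, 30, 11, 45, tzinfo=timezone.utc))
initial = version_iri("adf://self", "0")
following = version_iri("adf://self", next_version("0"))
```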

The initial version of the ADF file has the (locally resolvable) IRI adf://self/version/0. adf://self/ is the local ADF URL of the ADF file, which is an alias for its public UUID URN. The version metadata is

Example 5

		<adf://self> pav:hasVersion <adf://self/version/0> ;
		             pav:currentVersion <adf://self/version/0> .

		<adf://self/version/0> pav:hasVersion "0" .
	
The next version IRI becomes adf://self/version/1. Its version metadata is
Example 6

		<adf://self> pav:hasVersion <adf://self/version/0>, <adf://self/version/1> ;
		             pav:currentVersion <adf://self/version/1> .

		<adf://self/version/1> pav:hasVersion "1" ;
		                       pav:previousVersion <adf://self/version/0> .
	

3. Additional specifications

This section specifies additional conventions on ADF files that are not audit trail specific and can also be applied outside of an audit trail context.

3.1 Locally resolved ADF URLs

In most cases, resources within an ADF file get a UUID URN. UUID URNs have the advantage that UUIDs can be generated by a specified algorithm without access to a central authority that ensures the uniqueness of the URN. However, UUIDs are not ideal as IRIs if we want to follow the linked data principle that IRIs should be resolvable. UUIDs are not resolvable, so to access a resource such as a file in the data package by its UUID URN, a mapping of UUIDs to the files in the data package must be kept somewhere. A much more natural URI for the file is its path in the data package, which is unique (within the ADF file) and allows the file to be found directly, without any additional mapping, simply by traversing the path to its location. This is how file:// and http:// URLs work. The linked data approach recommends using HTTP URLs as identifiers, but we cannot follow that approach in an ADF file because there is no authoritative HTTP server that could resolve an HTTP URL addressing something within an ADF file.

To make resources in an ADF file resolvable, we define a locally scoped custom URL scheme, adf://, that can be resolved only within an ADF file. Such a URL is a local alias for a resource and is valid only in the context of the ADF file. To identify the resource globally, the local URL must be mapped to a public URI, either by giving the resource an explicit public URI such as a UUID URN or a public http: URL, or by combining the public URL of the ADF file with the locally resolved ADF URL.

Note

Accessing parts of an HTML or XML document with a file or HTTP URL can be done through anchor fragments or by using the XPointer specification [xptr-xpointer], [RFC3986]. A similar approach can be used to make ADF URLs public:
For example, take http://example.org/example.adf#adf://dd. The local URL would be the fragment accessor into the data description part of the ADF file, accessible under http://example.org/example.adf.

The following ADF URLs are defined and reserved:

adf://self
A self reference that is an alias for the ADF file (the container) itself.
adf://dd
The URL of the data description. This URL is a named graph IRI for the triples stored in the data description.
adf://dc
This URL is the base URL for all data cubes defined in the ADF file.
adf://dp
This URL is the base URL for the files and folders stored in the data package of the ADF file. Files and folders in the data package have local ADF URLs by appending their path to this base URL.
adf://audit
This URL is the base URL for audit trail URLs, such as the audit trail of the ADF file, or any audit records that are part of it.

Example 7
adf://dp/folder1/folder2/file1 is a URL for a file with the path folder1/folder2 and the name file1 in the data package of the ADF file.

Using these locally resolved ADF URLs, the resulting RDF triples are much easier to read and, if RESTful principles of URL design are followed, they can in many places be self-documenting. UUID URNs hide any details of the type of resource, and are very difficult to validate in larger graphs, so use of local ADF URLs is recommended.
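To illustrate how a local ADF URL can be resolved without a lookup table, and how it can be combined with the public URL of the ADF file as a fragment, consider the following Python sketch. The function names split_adf_url and to_public_url are hypothetical and not part of the ADF API; the fragment form follows the http://example.org/example.adf#adf://dd example from the note above.

```python
ADF_PREFIX = "adf://"

# The reserved roots listed in this section.
RESERVED_ROOTS = {"self", "dd", "dc", "dp", "audit"}


def split_adf_url(url: str):
    """Split a local ADF URL into its root (for example 'dp') and the remaining path."""
    if not url.startswith(ADF_PREFIX):
        raise ValueError("not a local ADF URL: " + url)
    root, _, path = url[len(ADF_PREFIX):].partition("/")
    return root, path


def to_public_url(container_url: str, adf_url: str) -> str:
    """Combine the public URL of the ADF file with a local ADF URL as a fragment."""
    split_adf_url(adf_url)  # raises ValueError if adf_url is not a local ADF URL
    return container_url + "#" + adf_url


root, path = split_adf_url("adf://dp/folder1/folder2/file1")
public = to_public_url("http://example.org/example.adf", "adf://dd")
```

Here the path of the file inside the data package is read directly from the URL, which is the property that UUID URNs lack.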

Note

Care has to be taken when resource IRIs are exported from or imported into an ADF file. An ADF URL MUST NOT be used outside of an ADF file. If a resource also has a public IRI, an explicit equivalence mapping using owl:sameAs between the public and the local IRI SHOULD be added.

To express that an ADF file consists of a data description, a data cube, and a data package part, we can use local ADF URLs.
Example 8

			<adf://self> dct:hasPart <adf://dd>, <adf://dc>, <adf://dp> .
		

3.2 Locally resolved HDF URLs

Like the logical resources in an ADF file, the underlying implementations (representations) in HDF can also be locally resolved. For accessing HDF resources, we use the custom URL scheme hdf://. These HDF URLs can be used to address all HDF named objects, which include HDF groups, datasets, and named datatypes. An HDF named object is identified by its path.

Example 9
hdf:// is the root group of the HDF file.
hdf://a/b/c is a named object in the nested group /a/b. This URL does not tell whether c is a group, dataset or datatype.

Note

This URL scheme could be extended to typed HDF URLs, which also indicate the type of the resource, by adding a query [RFC3986] with a type parameter, for example ?@type=${typeIRI}, to the URL. The parameter name @type follows the JSON-LD convention.
<hdf://a/b/c?@type=hdf:Dataset> would be the typed URL for an HDF dataset with the HDF path /a/b/c.
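A possible way to take such HDF URLs apart is sketched below in Python. The function name parse_hdf_url is an assumption of this example, and the handling of the @type query follows the extension suggested in the note, which is itself only a proposal.

```python
from urllib.parse import parse_qs

HDF_PREFIX = "hdf://"


def parse_hdf_url(url: str):
    """Return the HDF path and the optional @type value of a local HDF URL."""
    if not url.startswith(HDF_PREFIX):
        raise ValueError("not a local HDF URL: " + url)
    rest, _, query = url[len(HDF_PREFIX):].partition("?")
    path = "/" + rest.strip("/")            # hdf:// alone maps to the root group /
    types = parse_qs(query).get("@type")    # None if no type parameter is present
    return path, (types[0] if types else None)


root_group = parse_hdf_url("hdf://")
untyped = parse_hdf_url("hdf://a/b/c")
typed = parse_hdf_url("hdf://a/b/c?@type=hdf:Dataset")
```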

4. Change History

Version   Release Date  Remarks
0.1.0     2016-12-07    Initial Working Draft Version
0.2.0     2017-03-31    ADF 1.3 extensions to audit trail
1.3.0 RF  2017-06-30
  • Updated version and date
  • ADF 1.3 extensions to audit trail
  • adaptations to new business model
  • minor edits

A. References

    A.1 Normative references

    [ADF-DCO]
    Allotrope Foundation. ADF Data Cube Ontology. URL: http://purl.allotrope.org/TR/adf-dco/ADF Data Cube Ontology.html
    [ADF-DPO]
    Allotrope Foundation. ADF Data Package Ontology. URL: http://purl.allotrope.org/TR/adf-dpo/ADF Data Package Ontology.html
    [ADF-HDF]
    Allotrope Foundation. The Allotrope HDF5 ontology.
    [AF]
    Allotrope. Allotrope Framework Overview. URL: http://purl.allotrope.org/TR/af/
    [FOAF]
    Dan Brickley; Libby Miller. FOAF project. FOAF Vocabulary Specification 0.99 (Paddington Edition). 14 January 2014. URL: http://xmlns.com/foaf/spec
    [PROV-O]
    Timothy Lebo; Satya Sahoo; Deborah McGuinness. W3C. PROV-O: The PROV Ontology. 30 April 2013. W3C Recommendation. URL: https://www.w3.org/TR/prov-o/
    [RFC2119]
    S. Bradner. IETF. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
    [vocab-data-cube]
    Richard Cyganiak; Dave Reynolds. W3C. The RDF Data Cube Vocabulary. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/vocab-data-cube/
    [vocab-org]
    Dave Reynolds. W3C. The Organization Ontology. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/vocab-org/

    A.2 Informative references

    [ADF]
    Allotrope Foundation. Allotrope Data Format Overview. URL: http://purl.allotrope.org/TR/adf/
    [AFO]
    Allotrope Foundation. Allotrope Foundation Taxonomies. URL: http://purl.allotrope.org/TR/afo/
    [FDA-21CFR11]
    FDA. Code of Federal Regulations Title 21: Food and Drugs PART 11 — ELECTRONIC RECORDS; ELECTRONIC SIGNATURES.
    [FDA-Glossary]
    FDA. Glossary of Computer System Software Development Terminology. URL: http://www.fda.gov/iceci/inspections/inspectionguides/ucm074875.htm
    [HDF5-DDL]
    The HDF Group. DDL in BNF for HDF5. URL: https://support.hdfgroup.org/HDF5/doc/ddl.html
    [OAI-ORE]
    Open Archives Initiative. Open Archives Initiative Object Reuse and Exchange. URL: https://www.openarchives.org/ore/1.0/vocabulary
    [PAV]
    Paolo Ciccarese; Stian Soiland-Reyes; Khalid Belhajjame; Alasdair JG Gray; Carole Goble; Tim Clark. Journal of biomedical semantics. PAV ontology: provenance, authoring and versioning. 2013. URL: http://pav-ontology.github.io/pav/
    [RFC3986]
    T. Berners-Lee; R. Fielding; L. Masinter. IETF. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
    [UML]
    Object Management Group. The Unified Modeling Language (UML). URL: http://www.omg.org/spec/UML/2.5/PDF
    [owl-time]
    Simon Cox, Ed.; Chris Little, Ed. W3C. Time Ontology in OWL. URL: https://www.w3.org/TR/owl-time/
    [skos-reference]
    Alistair Miles; Sean Bechhofer. W3C. SKOS Simple Knowledge Organization System Reference. 18 August 2009. W3C Recommendation. URL: https://www.w3.org/TR/skos-reference
    [void]
    Keith Alexander; Richard Cyganiak; Michael Hausenblas; Jun Zhao. W3C. Describing Linked Datasets with the VoID Vocabulary. 3 March 2011. W3C Note. URL: https://www.w3.org/TR/void/
    [xptr-xpointer]
    Steven DeRose; Eve Maler; Ron Daniel. W3C. XPointer xpointer() Scheme. 19 December 2002. W3C Working Draft. URL: https://www.w3.org/TR/xptr-xpointer/