Guardian analytics: bottom-up data traceability #3336

@anvabr

Problem description

Participants in the Guardian ecosystem, especially data originators such as project developers or emissions reporters, need to be able to easily identify every usage of or reference to specific 'original' data, in the broadest sense and at unlimited depth, across the entire ecosystem. Such data could be an arbitrary value of interest from an MRV report, or a new data point originated by an entity other than the Standard Registry, such as a Project Developer. The actual data points are packaged in VC/VP documents, while new data points are defined as (mini-policies with restricted functionality? - TBD) containing mathematical expressions and published in IPFS (with the corresponding 'indexing records' in Hedera).

The occurrences that need to be traceable are not always simple, immediate 'first order' usages of or references to the data. Guardian data can be transformed in substance, for example by mathematical expressions or sequences thereof, or in format, when it is packaged into different documents which are then merged, split, cross-referenced, etc. These types of links need to be recognised and displayed by the system. Users need to be able to see the entire reference graph for a specified data point, and to navigate unlimited levels of dependencies with the possibility of examining the 'nodes' in the graph, i.e. the documents and relevant policy elements/schemas which guided the transformations of (or references to) the 'original' data.
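
As a rough illustration of the reference graph described above, the following minimal sketch treats each data point as a node and each transformation, repackaging or reference as a directed edge, then walks the graph breadth-first to any depth. The `document#field` identifiers, the edge kinds and the function names are assumptions made for this example only; they are not Guardian conventions or APIs.

```python
from collections import defaultdict, deque

# Hypothetical edge list: (source data point, dependent data point, link kind).
# The "document#field" identifiers are illustrative only.
EDGES = [
    ("vc:mrv-1#emissions", "vc:calc-1#total", "transformation"),
    ("vc:calc-1#total", "vp:report-1#total", "repackaging"),
    ("vp:report-1#total", "token:mint-1#amount", "reference"),
    ("vc:mrv-1#emissions", "vc:audit-1#checked_value", "reference"),
    ("vc:other-1#area", "vc:calc-2#density", "transformation"),
]


def build_graph(edges):
    """Index edges by source so dependents can be looked up directly."""
    graph = defaultdict(list)
    for source, dependent, kind in edges:
        graph[source].append((dependent, kind))
    return graph


def trace_usages(graph, origin):
    """Return {data point: depth} for everything that depends on origin.

    Breadth-first, so depth is the shortest chain of links from origin.
    The `seen` set keeps the walk finite even if the graph has cycles.
    """
    depths = {}
    seen = {origin}
    queue = deque([(origin, 0)])
    while queue:
        node, depth = queue.popleft()
        for dependent, _kind in graph.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                depths[dependent] = depth + 1
                queue.append((dependent, depth + 1))
    return depths


graph = build_graph(EDGES)
usages = trace_usages(graph, "vc:mrv-1#emissions")
for node, depth in sorted(usages.items(), key=lambda item: (item[1], item[0])):
    print(depth, node)
```

The edge kind is stored but not used by the walk; a UI could use it to label links when a user inspects a node.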

Additionally, Project Developers and other designated Policy Users should be able to define new data points which are derived from other existing data points (which may be the original data at the lowest level of this hierarchy). This definition uniquely identifies the 'source' data points and deterministically defines how to calculate the new data point from them.
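
One possible shape for such a definition, sketched under assumptions: a record that maps variable names to source data point IDs and carries an arithmetic expression, evaluated by a small whitelist-based interpreter so that the same inputs always give the same result. The `DerivedDatapoint` class, its field names and the example IDs are hypothetical; the actual definition format is still TBD in this ticket.

```python
import ast
import operator
from dataclasses import dataclass

# Only these binary operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


@dataclass(frozen=True)
class DerivedDatapoint:
    """Hypothetical definition record for a derived data point."""
    name: str
    sources: dict      # variable name -> source data point ID
    expression: str    # arithmetic over the variable names


def _evaluate(node, values):
    """Evaluate a parsed expression using only whitelisted node types."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, values)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return values[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, values)
        right = _evaluate(node.right, values)
        return _OPS[type(node.op)](left, right)
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def compute(definition, data):
    """Resolve each source ID to its value, then evaluate the expression."""
    values = {var: data[source_id] for var, source_id in definition.sources.items()}
    tree = ast.parse(definition.expression, mode="eval")
    return _evaluate(tree, values)


net_removals = DerivedDatapoint(
    name="net_removals",
    sources={"gross": "vc:mrv-1#gross_removals", "leakage": "vc:mrv-1#leakage_rate"},
    expression="gross - gross * leakage",
)
data = {"vc:mrv-1#gross_removals": 1000, "vc:mrv-1#leakage_rate": 0.125}
result = compute(net_removals, data)
print(result)  # 875.0
```

Because `sources` names the input data points explicitly, the same record also provides the dependency links that the traceability graph needs.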

Requirements

  1. Design and implement a specialised analytics engine which would enable Guardian to identify, trace and display mathematical relations between data in different artifacts (VCs/VPs/tokens), including events (transactions/messages) on Hedera Hashgraph, with unlimited traceability depth.
  2. Intelligent 'understanding' of the nature of the transformations (e.g. of formulas in calculation blocks) is out of scope of this ticket; the analytics engine can view transformations as black boxes. If the 'original' data are used as 'input' into such a black box, then for the purposes of this analytics reporting it can be assumed that the 'output' data depend on that 'original' data.
  3. The system should correctly identify and display references to the 'original' data, for example when fields in one VC document reference fields in other VCs.
  4. Users should be able to perform complex data searches with the scope limited to the dependencies graph.
  5. Guardian UI should enable Project Developers to define (and name) new datapoints and publish these definitions in a standardised manner, linking them to the originator's author DID, project ID and policy ID.
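
Requirements 2 to 4 can be illustrated together with a small sketch: every output of a black-box transformation is assumed to depend on every one of its inputs, direct field references add further edges, and a search is then restricted to the data points reachable from the chosen origin. The record layout, identifiers and function names below are assumptions for illustration, not an existing Guardian format.

```python
from itertools import product

# Hypothetical execution records: each black box lists what it read and wrote.
TRANSFORMATIONS = [
    {"block": "calc-block-1", "inputs": ["mrv#a", "mrv#b"], "outputs": ["calc#sum"]},
    {"block": "calc-block-2", "inputs": ["calc#sum"], "outputs": ["report#total"]},
    {"block": "calc-block-3", "inputs": ["mrv#c"], "outputs": ["report#other"]},
]
# Hypothetical direct references: (referenced field, referencing field).
REFERENCES = [("report#total", "summary#headline")]
# Hypothetical field values used by the search.
VALUES = {
    "mrv#a": 10,
    "calc#sum": 30,
    "report#total": 120,
    "summary#headline": 120,
    "report#other": 500,
}


def dependency_edges(transformations, references):
    """Black-box rule: every output depends on every input of the same block."""
    edges = set(references)
    for record in transformations:
        edges.update(product(record["inputs"], record["outputs"]))
    return edges


def dependents(edges, origin):
    """All data points reachable from origin, at any depth."""
    found, frontier = set(), {origin}
    while frontier:
        frontier = {dst for src, dst in edges if src in frontier} - found - {origin}
        found |= frontier
    return found


def search_in_scope(edges, origin, values, predicate):
    """Apply predicate only to data points that depend on origin."""
    scope = dependents(edges, origin)
    return sorted(key for key in scope if key in values and predicate(values[key]))


edges = dependency_edges(TRANSFORMATIONS, REFERENCES)
matches = search_in_scope(edges, "mrv#a", VALUES, lambda value: value > 100)
print(matches)  # ['report#total', 'summary#headline']
```

Note that `report#other` also exceeds the threshold but is not returned, because it does not depend on `mrv#a` and so falls outside the search scope.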

Definition of done

  • Analytics functionality is implemented as per requirements above.
  • Analytics module is optional for deployment, and can be operated independently.
  • Guardian UI for defining and publishing new datapoints is available.
  • Documentation is available.

Acceptance criteria

The functionality fulfilling the requirements above is usable by non-technical users, and can be operated at the scale of the entire Guardian ecosystem.

Metadata

Labels

Epic, Next Phase (Will be worked in Next Phase)
