
Commit bf47044

Add OSS NA '25 demo to examples (#44)
* Add OSS NA '25 demo
  Signed-off-by: Marcela Melara <[email protected]>
* Add support for MockAttestations
  Signed-off-by: Marcela Melara <[email protected]>
* Disable trivy scan
  Signed-off-by: Marcela Melara <[email protected]>
* Debug mock attestation tests
  Signed-off-by: Marcela Melara <[email protected]>
* Add demo README
  Signed-off-by: Marcela Melara <[email protected]>
* Remove duplicate example files
  Signed-off-by: Marcela Melara <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Marcin Spoczynski <[email protected]>

---------

Signed-off-by: Marcela Melara <[email protected]>
Co-authored-by: Marcin Spoczynski <[email protected]>
1 parent fd26f3e commit bf47044

File tree: 8 files changed (+351 −15 lines)


.gitignore

Lines changed: 3 additions & 1 deletion
```diff
@@ -1,2 +1,4 @@
 /target
-*.*~
+*.*~
+*.parquet
+*.pem
```

docs/EXAMPLES.md

Lines changed: 6 additions & 1 deletion
````diff
@@ -2,6 +2,11 @@
 
 This document provides examples and workflow patterns for using the Atlas CLI tool.
 
+## End-to-End Examples
+
+We provide a number of end-to-end Atlas workflow examples in the
+[/examples](../examples) directory.
+
 ## Basic Usage Examples
 
 ### Creating a Single Model Manifest
@@ -336,4 +341,4 @@ make example-full-workflow
 
 # Run example for filesystem storage
 make example-filesystem-storage
-```
+```
````

examples/oss-na-25-demo/README.md

Lines changed: 80 additions & 0 deletions
# Open Source Summit North America (OSS NA) '25 Provenance Demo

## Introduction

This example demonstrates how to generate and verify provenance data for a
two-stage machine learning lifecycle using the Atlas CLI tool. Although the demo
contains scripts for training and evaluation pipelines, it does not run them; it
still tracks these software components as part of the lifecycle, from dataset
download to evaluation, showing how to establish an end-to-end audit trail of
all artifacts and their relationships.

To show the added integrity properties gained from running the Atlas CLI inside
a confidential computing environment such as Intel TDX, the demo collects a
hardware-based platform attestation and includes it in each artifact manifest
generated by the CLI. On platforms that do not support Intel TDX, the demo
generates a mock hardware attestation instead.

This example creates C2PA-compliant manifests for:
- Datasets (raw MNIST training and test datasets downloaded from HuggingFace)
- Software components (training and evaluation scripts)
- Models (dummy trained classifier model)
- Evaluation results (dummy results)

All components are linked to their direct parents during creation to form a
complete provenance graph that can be exported and audited.
27+
For a more comprehensive example that does run the training and evaluation
28+
pipelines, see the
29+
[MNIST training provenance collection example](../mnist/README.md).
30+
31+
## Prerequisites
32+
33+
### System Requirements
34+
- Rust toolchain (1.85 or above)
35+
- Docker and Docker-compose
36+
37+
### Setting up Atlas CLI
38+
39+
Ensure Atlas CLI is built and available in your PATH:
40+
41+
```bash
42+
# Build Atlas CLI (from the root directory)
43+
cargo build --release
44+
# Add to PATH or use full path
45+
export PATH=$PATH:./target/release
46+
```
47+
48+
### Setting up the Database Backend
49+
50+
Start the database backend (if not already running):
51+
52+
```bash
53+
# Start the database service
54+
cd storage_service && docker-compose build && docker-compose up -d && cd ..
55+
```
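
Before launching the demo, you may want to confirm that the storage service is
reachable at the URL the demo script uses (`http://localhost:8080`). This check
is an optional convenience, not part of the example; the exact HTTP status
returned depends on the storage service's API.

```bash
# Quick reachability check for the database backend
curl -s -o /dev/null -w "storage service responded with HTTP %{http_code}\n" http://localhost:8080
```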

## Running the Example

Launch the demo using the provided bash script, and step through it by pressing
any key at each prompt:

```bash
./collect_mnist_provenance.sh
```

The demo script also displays the generated and linked manifests after select
operations.

### Demo Steps

1. Generate the provenance signing key pair (deleted at the end of the demo).
2. Download the MNIST dataset (training and test data) from HuggingFace.
3. Generate and link the C2PA manifests for the training script, training data,
   and dummy model. Each manifest includes a (mock) hardware-based attestation.
4. Generate and link the C2PA manifests for the evaluation script, dummy model,
   test data, and evaluation results. Each manifest includes a (mock)
   hardware-based attestation.
5. Export the collected provenance graph.
6. Verify the provenance graph: for each manifest, the manifest format, the
   hashes of every tracked artifact, and the expected manifest links are
   validated (a condensed sketch of these commands follows this list).
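
A minimal sketch of the inspection and verification commands behind the last
two steps, assuming the manifest IDs captured by the script and the exported
`mnist_provenance.json`:

```bash
# Validate individual manifests and one of the links created earlier
atlas-cli manifest validate --id="$MODEL_ID" \
    --storage-type=database --storage-url="$STORAGE_URL"
atlas-cli manifest verify-link --source="$MODEL_ID" --target="$TRAIN_DATASET_ID" \
    --storage-type=database --storage-url="$STORAGE_URL"

# Pretty-print the exported provenance graph
jq '.' mnist_provenance.json
```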
examples/oss-na-25-demo/collect_mnist_provenance.sh

Lines changed: 230 additions & 0 deletions
```bash
#!/bin/bash
# MNIST Demo Provenance Collection Script
# This script runs a demo MNIST workflow (no data prep or training) and collects
# provenance data

# Configuration
STORAGE_URL="http://localhost:8080"

# Helper function to extract ID from output
extract_id() {
    grep -o "ID: [^ ]*" "$1" | cut -d' ' -f2
}

TRAIN_DATASET="train-00000-of-00001.parquet"
TEST_DATASET="test-00000-of-00001.parquet"

if [ ! -e "$TRAIN_DATASET" ]; then
    echo "Warning: Training dataset not found. Downloading..."
    wget -q https://huggingface.co/datasets/ylecun/mnist/resolve/main/mnist/$TRAIN_DATASET
fi

if [ ! -e "$TEST_DATASET" ]; then
    echo "Warning: Test dataset not found. Downloading..."
    wget -q https://huggingface.co/datasets/ylecun/mnist/resolve/main/mnist/$TEST_DATASET
fi

echo -e "=== STEP 0: Setup Provenance Signing/Verification Key Pair ==="
openssl genpkey -algorithm RSA -out private.pem -pkeyopt rsa_keygen_bits:4096 2>/dev/null
openssl rsa -pubout -in private.pem -out public.pem 2>/dev/null

read -s -r -p "Press any key to continue"

echo -e "\n=== STEP 1: Generate Provenance for MNIST Training Data ==="

read -s -r -p "Create training dataset manifest..."
atlas-cli dataset create \
    --paths="$TRAIN_DATASET" \
    --ingredient-names="MNIST Training Dataset" \
    --name="MNIST Training Data" \
    --author-org="https://huggingface.co/datasets/ylecun/mnist/tree/main/mnist/blob/main/mnist/$TRAIN_DATASET" \
    --author-name="ylecun" \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    --key=private.pem \
    > train_dataset_output.txt
TRAIN_DATASET_ID=$(extract_id train_dataset_output.txt)
echo "Dataset ID: $TRAIN_DATASET_ID"

read -s -r -p "Display training data manifest"
atlas-cli manifest export \
    --id=$TRAIN_DATASET_ID \
    --format=json \
    | jq '.'

read -s -r -p "Press any key to continue"

echo -e "\n=== STEP 2: Generate Provenance for Model Training Artifacts ==="

read -s -r -p "Create training script manifest..."
atlas-cli software create \
    --paths=../mnist/train.py \
    --ingredient-names="MNIST Training Script" \
    --name="MNIST CNN Training Implementation" \
    --software-type="script" \
    --version="1.0.0" \
    --author-org="Your Organization" \
    --author-name="Your Name" \
    --description="PyTorch training script for MNIST CNN model" \
    --with-tdx \
    --key=private.pem \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    > training_script_output.txt
TRAINING_SCRIPT_ID=$(extract_id training_script_output.txt)
echo "Training Script ID: $TRAINING_SCRIPT_ID"

touch classifier.onnx
read -s -r -p "Create model manifest..."
atlas-cli model create \
    --paths=classifier.onnx \
    --ingredient-names="MNIST CNN Model" \
    --name="Trained MNIST Classifier" \
    --author-org="Your Organization" \
    --author-name="Your Name" \
    --key=private.pem \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    > model_output.txt
MODEL_ID=$(extract_id model_output.txt)
echo "Model ID: $MODEL_ID"

read -s -r -p "Display model's manifest"
atlas-cli manifest export \
    --id=$MODEL_ID \
    --format=json \
    | jq '.'

read -s -r -p "Press any key to continue"

echo -e "\n=== STEP 3: Link Model Training Manifests ==="

read -s -r -p "Link MNIST training dataset to model..."
atlas-cli manifest link \
    --source=$MODEL_ID \
    --target=$TRAIN_DATASET_ID \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    > model_train_dataset_link_output.txt
MODEL_ID=$(extract_id model_train_dataset_link_output.txt)
echo "Updated Model ID: $MODEL_ID"

read -s -r -p "Link training script to model..."
atlas-cli manifest link \
    --source=$MODEL_ID \
    --target=$TRAINING_SCRIPT_ID \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    > model_train_script_link_output.txt
MODEL_ID=$(extract_id model_train_script_link_output.txt)
echo "Updated Model ID: $MODEL_ID"

read -s -r -p "Display model's manifest"
atlas-cli manifest export \
    --id=$MODEL_ID \
    --format=json \
    | jq '.'

read -s -r -p "Press any key to continue"

echo -e "\n=== STEP 4: Generate & Link Provenance for Model Evaluation Artifacts ==="

atlas-cli dataset create \
    --paths="$TEST_DATASET" \
    --ingredient-names="MNIST Test Dataset" \
    --name="MNIST Test Data" \
    --author-org="https://huggingface.co/datasets/ylecun/mnist/tree/main/mnist/blob/main/mnist/$TEST_DATASET" \
    --author-name="ylecun" \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    --key=private.pem \
    > test_dataset_output.txt
TEST_DATASET_ID=$(extract_id test_dataset_output.txt)
echo "Test Dataset ID: $TEST_DATASET_ID"

atlas-cli software create \
    --paths=../mnist/eval.py \
    --ingredient-names="MNIST Evaluation Script" \
    --name="MNIST Model Evaluation Implementation" \
    --software-type="script" \
    --version="1.0.0" \
    --author-org="Your Organization" \
    --author-name="Your Name" \
    --description="PyTorch evaluation script for MNIST CNN model" \
    --with-tdx \
    --key=private.pem \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    > eval_script_output.txt
EVAL_SCRIPT_ID=$(extract_id eval_script_output.txt)
echo "Evaluation Script ID: $EVAL_SCRIPT_ID"

touch eval_results.json
echo "Creating evaluation results manifest linked to model..."
atlas-cli evaluation create \
    --path=eval_results.json \
    --name="MNIST Model Evaluation Results" \
    --author-org="Your Organization" \
    --author-name="Your Name" \
    --model-id=$MODEL_ID \
    --dataset-id=$TEST_DATASET_ID \
    --hash-alg=sha384 \
    --key=private.pem \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    > eval_results_output.txt
EVAL_RESULTS_ID=$(extract_id eval_results_output.txt)
echo "Evaluation Results ID: $EVAL_RESULTS_ID"

atlas-cli manifest link \
    --source=$EVAL_RESULTS_ID \
    --target=$EVAL_SCRIPT_ID \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    > eval_script_link_output.txt
EVAL_RESULTS_ID=$(extract_id eval_script_link_output.txt)
echo "Updated Eval Results ID: $EVAL_RESULTS_ID"

read -s -r -p "Press any key to continue"

echo -e "\n=== STEP 5: Export Provenance Graph ==="
atlas-cli manifest export \
    --id=$EVAL_RESULTS_ID \
    --storage-type=database \
    --storage-url=$STORAGE_URL \
    --format=json \
    --max-depth=10 \
    --output=mnist_provenance.json

read -s -r -p "Press any key to continue"

echo -e "\n=== STEP 6: Validate Provenance ==="

read -s -r -p "Validate model manifest..."
atlas-cli manifest validate \
    --id=$MODEL_ID \
    --storage-type=database \
    --storage-url=$STORAGE_URL

read -s -r -p "Validate evaluation results manifest..."
atlas-cli manifest validate \
    --id=$EVAL_RESULTS_ID \
    --storage-type=database \
    --storage-url=$STORAGE_URL

INVALID_LINKED_MANIFEST_ID="urn:c2pa:123e4567-e89b-12d3-a456-426614174000"

read -s -r -p "Validate bad manifest link (should fail)..."
atlas-cli manifest verify-link \
    --source=$MODEL_ID \
    --target=$INVALID_LINKED_MANIFEST_ID \
    --storage-type=database \
    --storage-url=$STORAGE_URL

read -s -r -p "Display exported evaluation results provenance"
echo -e "\n"
jq '.' mnist_provenance.json

read -s -r -p "Finish demo"
echo -e "\n"
rm -f *_output.txt *.pem classifier.onnx eval_results.json mnist_provenance.json
```
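
The script cleans up its own temporary files but leaves the database backend
running. A minimal teardown sketch, assuming the docker-compose setup from the
demo README:

```bash
# Stop and remove the storage backend containers started for the demo
cd storage_service && docker-compose down && cd ..
```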

src/cc_attestation/mock.rs

Lines changed: 15 additions & 6 deletions
```diff
@@ -1,3 +1,4 @@
+use serde::{Deserialize, Serialize};
 use serde_json::json;
 
 use tdx_workload_attestation::error::Result;
@@ -7,6 +8,16 @@ pub struct MockAttestationProvider {
     platform: String,
 }
 
+#[derive(Serialize, Deserialize, Debug)]
+pub struct MockReport {
+    report_type: String,
+    platform: String,
+    timestamp: String,
+    status: String,
+    version: String,
+    message: String,
+}
+
 impl MockAttestationProvider {
     pub fn new(platform: &str) -> Self {
         Self {
@@ -19,14 +30,12 @@ impl AttestationProvider for MockAttestationProvider {
     fn get_attestation_report(&self) -> Result<String> {
         // Create a mock attestation report with platform info
         let mock_report = json!({
-            "type": "mock_attestation",
+            "report_type": "mock_attestation",
             "platform": self.platform,
             "timestamp": chrono::Utc::now().to_rfc3339(),
-            "mock_data": {
-                "version": "1.0",
-                "status": "simulated",
-                "message": "This is a mock attestation report for non-Linux or unsupported platforms"
-            }
+            "version": "1.0",
+            "status": "simulated",
+            "message": "This is a mock attestation report for non-Linux or unsupported platforms"
         });
 
         // Serialize to JSON string
```

src/manifest/mod.rs

Lines changed: 9 additions & 0 deletions
```diff
@@ -1,3 +1,4 @@
+use crate::cc_attestation::mock::MockReport;
 use crate::error::{Error, Result};
 use crate::hash;
 use crate::storage::traits::StorageBackend;
@@ -991,6 +992,14 @@ fn extract_assertion_details(
                 "enforced": do_not_train.enforced,
             })
         }
+        atlas_c2pa_lib::assertion::Assertion::CustomAssertion(custom) => {
+            let r_str = custom.data.as_str().unwrap();
+            let r: MockReport = serde_json::from_str(r_str).unwrap();
+            serde_json::json!({
+                "label": custom.label,
+                "data": r,
+            })
+        }
         _ => serde_json::json!({"type": "Unknown"}),
     }
 }
```
