A high-performance data ingestion and processing engine designed for heterogeneous storage systems and scientific workflows.
Ubuntu/Debian:

```bash
spack install iowarp
git clone https://github.com/iowarp/content-assimilation-engine.git
cd content-assimilation-engine
mkdir build && cd build
cmake ..
make
make install
```

This builds and installs two executables:

- `wrp` - Main YAML job orchestrator
- `wrp_binary_format_mpi` - MPI binary format processor
The repository includes a simple example configuration at `omni/config/example_simple.yaml`:

```yaml
# Simple OMNI example using repository data files
name: example_data_ingestion
max_scale: 4  # Maximum number of processes
data:
  - path: data/A46_xx.parquet
    offset: 0
    size: 31744
    description:
      - parquet
      - structured_data
  - path: data/datahub.csv
    range: [0, 671]
    description:
      - csv
      - tabular
```

Run it from the repository root:

```bash
mpirun -np 4 wrp omni/config/example_simple.yaml
```

The OMNI format uses YAML to describe data ingestion jobs:
```yaml
name: job_name          # Job identifier (required)
max_scale: 100          # Max number of MPI processes (required)
data:                   # List of data sources
  - path: /file/path    # File path (required)
    offset: 0           # Starting byte offset (optional)
    size: 1024          # Bytes to read from offset (optional)
    range: [0, 1024]    # Alternative: [start, end] byte range (optional)
    description:        # Tags describing the data (optional)
      - binary
      - structured
    hash: sha256_value  # Integrity verification (optional)
```

Key fields:

- `path`: Absolute or relative file path
- `offset` + `size`: Read `size` bytes starting at `offset`
- `range`: Alternative to `offset`/`size`; specifies a `[start, end]` byte range
- `description`: List of tags for metadata/categorization
- `hash`: SHA-256 hash for data verification
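Filling in `size` and `hash` by hand is error-prone. The bash sketch below is a hypothetical helper (`omni_entry` is not part of the repository) that prints a `data` entry covering a whole file. It assumes the `hash` field expects the lowercase hex digest that GNU `sha256sum` prints, and that the entry hashes the entire file rather than just the selected byte range; neither is confirmed by the format description. On macOS, `shasum -a 256` is the usual substitute for `sha256sum`.

```bash
# Hypothetical helper: print an OMNI `data` entry for an entire file,
# with offset 0, its byte size, optional description tags, and a SHA-256 hash.
# Assumption: `hash` takes the hex digest printed by GNU sha256sum.
omni_entry() {
  local path="$1"
  shift                                   # remaining args are description tags
  local size digest tag
  size=$(wc -c < "$path" | tr -d ' ')     # file size in bytes
  digest=$(sha256sum < "$path" | cut -d ' ' -f 1)  # hex SHA-256 of the whole file
  printf '  - path: %s\n' "$path"
  printf '    offset: 0\n'
  printf '    size: %s\n' "$size"
  if [ "$#" -gt 0 ]; then
    printf '    description:\n'
    for tag in "$@"; do
      printf '      - %s\n' "$tag"
    done
  fi
  printf '    hash: %s\n' "$digest"
}

# Usage: append the printed entry under `data:` in a job file, e.g.
#   omni_entry data/datahub.csv csv tabular
```

Reading the file through stdin (`sha256sum < "$path"`) keeps the file name out of the digest line, so paths containing unusual characters do not alter the `cut` output.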
The repository includes several example configurations in omni/config/:
- `quick_test.yaml` - Simple test case
- `demo_job.yaml` - Demonstration job
- `example_job.yaml` - Annotated example with all options
- `wildcard_test.yaml` - Pattern matching examples
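Because `max_scale` is described as the maximum number of MPI processes, it can be convenient to run several example jobs without asking `mpirun` for more ranks than a job allows. The bash sketch below is a hypothetical convenience wrapper, not a repository script. It assumes `max_scale` appears as a top-level `max_scale: N` line, as it does in `example_simple.yaml`, and falls back to the requested count when the value cannot be parsed. Whether `wrp` itself rejects runs that exceed `max_scale` is not stated here.

```bash
# Hypothetical wrapper: run example jobs, capping -np at each job's max_scale.
# Assumption: max_scale is a top-level "max_scale: N" line in the YAML file.
max_scale_of() {
  # $2 is the value after "max_scale:"; a trailing "# comment" lands in $3+.
  awk '/^max_scale:/ { print $2; exit }' "$1"
}

capped_np() {
  local requested="$1" cfg="$2" cap
  cap=$(max_scale_of "$cfg")
  # Use max_scale only if it parsed as an integer and is below the request.
  if [[ "$cap" =~ ^[0-9]+$ ]] && (( cap < requested )); then
    echo "$cap"
  else
    echo "$requested"
  fi
}

run_examples() {
  local np="$1" cfg
  shift
  for cfg in "$@"; do
    mpirun -np "$(capped_np "$np" "$cfg")" ../../bin/wrp "$cfg"
  done
}

# Usage, from build/omni/config:
#   run_examples 4 quick_test.yaml demo_job.yaml example_job.yaml wildcard_test.yaml
```

Parsing with `awk` keeps the sketch dependency-free. A real YAML parser would be more robust if `max_scale` were ever nested or quoted.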
Run examples:

```bash
cd build/omni/config
mpirun -np 2 ../../bin/wrp quick_test.yaml
```

Repository layout:

- `omni/` - OMNI module (job orchestration and format processing)
  - `format/` - Binary format handlers
  - `repo/` - Repository and storage backends
  - `config/` - Example job configurations
- `data/` - Sample datasets for testing
This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.
Copyright (c) 2024, Gnosis Research Center, Illinois Institute of Technology
- IOWarp Organization: https://github.com/iowarp
- Issues: GitHub Issues