/ˈar.ti.feks/, [ˈärt̪ɪfɛks̠] 1. artist, actor 2. author, maker 3. craftsman 4. master of an art 5. mastermind
Data Artifex is a Python-based, open-source ecosystem that elevates data into API-powered, machine-actionable digital knowledge.
This project is in an early incubation phase.
Our vision is to:
- Foster the creation of comprehensive data documentation that is equally accessible to humans and machines
- Facilitate the rapid publication of data and associated metadata through APIs
- Unleash data-driven machine intelligence
- Reduce time spent on data wrangling
- Support the adoption of standards and best practices
- Enable natural language-driven data management
This will have a broad impact and modernize how we publish, discover, access, and utilize data.
To achieve this, we are collaborating with leading organizations, data custodians, research communities, developers, and other stakeholders to build a collection of open-source packages powered by metadata standards, knowledge graphs, intelligent agents, and APIs.
The all-too-common practice of publishing data as downloadable files or in traditional databases, with little documentation and no APIs, is a flawed approach. It leaves users spending the majority of their time wrangling data and prevents machines from understanding the data or taking intelligent action on it.
We aim to address this by building open-source tools that promote metadata/API-first data management practices.
In such an environment:
- Metadata (digital documentation) always exists and surrounds the data, unlocking machine intelligence
- Users interact with the data through intuitive interfaces or by using natural language
- Applications, developers, and data scientists interact with APIs
- The data and metadata are managed in the back end by agents (see the sketch below)
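As a minimal sketch of what this metadata/API-first pattern looks like in code, the example below serves a dataset's documentation and its records from the same small service. It assumes a FastAPI-based implementation; the endpoint paths, the in-memory catalog, and the example dataset are purely illustrative and are not part of the Data Artifex packages.

```python
# Minimal illustration of a metadata/API-first pattern: the metadata is a
# first-class resource served next to the data. Endpoint names and the
# in-memory catalog below are illustrative only, not Data Artifex APIs.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Example metadata/API-first data service")

# schema.org-style description of a dataset (the "digital documentation")
CATALOG = {
    "quality-of-life-2024": {
        "metadata": {
            "@context": "https://schema.org/",
            "@type": "Dataset",
            "name": "Quality of Life Indicators 2024",
            "description": "Illustrative example of a documented dataset.",
            "license": "https://creativecommons.org/licenses/by/4.0/",
            "variableMeasured": ["country", "year", "life_expectancy"],
        },
        "data": [
            {"country": "FR", "year": 2024, "life_expectancy": 83.1},
            {"country": "JP", "year": 2024, "life_expectancy": 84.9},
        ],
    }
}

@app.get("/datasets/{dataset_id}/metadata")
def get_metadata(dataset_id: str):
    """Humans and machines discover the documentation first."""
    entry = CATALOG.get(dataset_id)
    if entry is None:
        raise HTTPException(status_code=404, detail="Unknown dataset")
    return entry["metadata"]

@app.get("/datasets/{dataset_id}/data")
def get_data(dataset_id: str):
    """The data itself is reached through the same API surface."""
    entry = CATALOG.get(dataset_id)
    if entry is None:
        raise HTTPException(status_code=404, detail="Unknown dataset")
    return entry["data"]
```

Such a service can be run locally with, for example, `uvicorn service:app --reload`, after which the metadata endpoint becomes the machine-readable entry point to the data.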
We're looking towards a future where managing data is as easy as talking to a computer in everyday language. This means data custodians and non-technical users won't have to worry about the complexities of implementing APIs or managing metadata.
Note that our focus is on High-Value Datasets (HVDs), which have substantial potential to benefit society, contribute to humanitarian efforts, and address global challenges (socio-economic, health, environment, AI, etc.). The complexities surrounding such data make user- and machine-friendly data and APIs all the more important. This approach does not preclude using the tools with other kinds of datasets.
Our technical approach is not to reinvent the wheel but to fill gaps and provide new ways to work, essentially enabling best practices advocated by data custodians and research communities and empowering computer systems with data intelligence.
We envision our open-source ecosystem as a collection of small, specialized tools that can be used in isolation but, most importantly, can come together in a well-orchestrated manner to automate the data-to-API workflow and facilitate the creation and maintenance of metadata.
These will work hand in hand with existing data technologies, such as databases and API frameworks, and will harness recent developments in artificial intelligence.
Standards and best practices are central to our strategy. We are actively involved in the CODATA Cross-Domain Interoperability Framework, focusing on creating guidelines for domain-agnostic standards that support the implementation of interoperability and the reusability of FAIR data.
Guided by the FAIR principles, the CODATA Cross-Domain Interoperability Framework, and the W3C Data on the Web Best Practices, our tools utilize specifications such as the Data Documentation Initiative (DDI), MLCommons Croissant, schema.org, the Data Catalog Vocabulary (DCAT), Research Object Crate (RO-Crate), and the Open Digital Rights Language (ODRL).
On the information technology side, we build upon JSON Schema and semantic web standards.
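To illustrate, here is a hedged sketch of what such a standards-based description can look like in practice: a schema.org Dataset expressed as JSON-LD and checked against a small JSON Schema using the third-party `jsonschema` Python package. The schema and URLs shown are simplified for this example and do not represent an official DDI, Croissant, or DCAT profile.

```python
# Sketch: a schema.org "Dataset" description (JSON-LD) validated against a
# small JSON Schema. The schema below is a simplified illustration, not an
# official profile from DDI, Croissant, or DCAT.
import json
import jsonschema

dataset_description = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "World Population Estimates",
    "description": "Annual population estimates by country.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/data/population.csv",
        }
    ],
}

# Minimal JSON Schema expressing what we expect a description to contain
dataset_schema = {
    "type": "object",
    "required": ["@context", "@type", "name", "description", "license"],
    "properties": {
        "@type": {"const": "Dataset"},
        "name": {"type": "string"},
        "description": {"type": "string"},
        "license": {"type": "string", "format": "uri"},
    },
}

# Raises jsonschema.ValidationError if the description is incomplete
jsonschema.validate(instance=dataset_description, schema=dataset_schema)
print(json.dumps(dataset_description, indent=2))
```

Because both the vocabulary (schema.org/DCAT) and the validation language (JSON Schema) are open standards, the same description remains useful to catalogs, search engines, and downstream agents.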
For more information, contact Pascal Heus ([email protected]).

