/ˈar.ti.feks/, [ˈärt̪ɪfɛks̠] 1. artist, actor 2. author, maker 3. craftsman 4. master of an art 5. mastermind
Data Artifex is a Python-based, open-source ecosystem that elevates data into API-powered, machine-actionable digital knowledge.
This project is in an early incubation phase.
Our vision is to:
- Foster the creation of comprehensive data documentation that is equally accessible to humans and machines
- Facilitate the rapid publication of data and associated metadata through APIs
- Unleash data-driven machine intelligence
- Reduce time spent on data wrangling
- Support the adoption of standards and best practices
- Enable natural language-driven data management
This will have a broad impact and modernize how we publish, discover, access, and utilize data.
To achieve this, we are collaborating with leading organizations, data custodians, research communities, developers, and other stakeholders to build a collection of open-source packages powered by metadata standards, knowledge graphs, intelligent agents, and APIs.
The all-too-common practice of publishing data as downloadable files or in traditional databases, with little documentation and no APIs, is a flawed approach. It leaves users spending the majority of their time wrangling data and prevents machines from understanding the data or taking intelligent action on it.
We aim to address this by building open-source tools that promote metadata/API-first data management practices.
In such an environment:
- Metadata (digital documentation) always exists and surrounds the data, unlocking machine intelligence
- Users interact with the data through intuitive interfaces or by using natural language
- Applications, developers, and data scientists interact with APIs
- The data and metadata are managed in the back end by agents (see the sketch below)
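As a minimal sketch of what this metadata/API-first pattern looks like in code, the example below serves a dataset's documentation and its records from the same small service. It assumes a FastAPI-based implementation; the endpoint paths, the in-memory catalog, and the example dataset are purely illustrative and are not part of the Data Artifex packages.

```python
# Minimal illustration of a metadata/API-first pattern: the metadata is a
# first-class resource served next to the data. Endpoint names and the
# in-memory catalog below are illustrative only, not Data Artifex APIs.
from fastapi import FastAPI, HTTPException

app = FastAPI(title="Example metadata/API-first data service")

# schema.org-style description of a dataset (the "digital documentation")
CATALOG = {
    "quality-of-life-2024": {
        "metadata": {
            "@context": "https://schema.org/",
            "@type": "Dataset",
            "name": "Quality of Life Indicators 2024",
            "description": "Illustrative example of a documented dataset.",
            "license": "https://creativecommons.org/licenses/by/4.0/",
            "variableMeasured": ["country", "year", "life_expectancy"],
        },
        "data": [
            {"country": "FR", "year": 2024, "life_expectancy": 83.1},
            {"country": "JP", "year": 2024, "life_expectancy": 84.9},
        ],
    }
}

@app.get("/datasets/{dataset_id}/metadata")
def get_metadata(dataset_id: str):
    """Humans and machines discover the documentation first."""
    entry = CATALOG.get(dataset_id)
    if entry is None:
        raise HTTPException(status_code=404, detail="Unknown dataset")
    return entry["metadata"]

@app.get("/datasets/{dataset_id}/data")
def get_data(dataset_id: str):
    """The data itself is reached through the same API surface."""
    entry = CATALOG.get(dataset_id)
    if entry is None:
        raise HTTPException(status_code=404, detail="Unknown dataset")
    return entry["data"]
```

Such a service can be run locally with, for example, `uvicorn service:app --reload`, after which the metadata endpoint becomes the machine-readable entry point to the data.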
We're looking towards a future where managing data is as easy as talking to a computer in everyday language. This means data custodians and non-technical users won't have to worry about the complexities of implementing APIs or managing metadata.
Note that our focus is on High-Value Datasets (HVDs), which have substantial potential to benefit society, contribute to humanitarian efforts, and address global challenges (socio-economic, health, environment, AI, etc.). The complexities surrounding such data make user- and machine-friendly data and APIs all the more important. This approach does not preclude using the tools with other kinds of datasets.
Our technical approach is not to reinvent the wheel but to fill gaps and provide new ways to work, essentially enabling best practices advocated by data custodians and research communities and empowering computer systems with data intelligence.
We envision our open-source ecosystem as a collection of small, specialized tools that can be used in isolation but, most importantly, can come together in a well-orchestrated manner to automate the data-to-API workflow and facilitate the creation and maintenance of metadata.
These will work hand in hand with existing data technologies, such as databases and API frameworks, and will harness recent developments in artificial intelligence.
Standards and best practices are central to our strategy. We are actively involved in the CODATA Cross-Domain Interoperability Framework, focusing on creating guidelines for domain-agnostic standards that support the implementation of interoperability and the reusability of FAIR data.
Guided by the FAIR principles, the CODATA Cross-Domain Interoperability Framework, and the W3C Data on the Web Best Practices, our tools utilize specifications such as the Data Documentation Initiative (DDI), MLCommons Croissant, schema.org, the Data Catalog Vocabulary (DCAT), Research Object Crate (RO-Crate), and the Open Digital Rights Language (ODRL).
On the information technology side, we build upon JSON Schema and semantic web standards.
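To illustrate, here is a hedged sketch of what such a standards-based description can look like in practice: a schema.org Dataset expressed as JSON-LD and checked against a small JSON Schema using the third-party `jsonschema` Python package. The schema and URLs shown are simplified for this example and do not represent an official DDI, Croissant, or DCAT profile.

```python
# Sketch: a schema.org "Dataset" description (JSON-LD) validated against a
# small JSON Schema. The schema below is a simplified illustration, not an
# official profile from DDI, Croissant, or DCAT.
import json
import jsonschema

dataset_description = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "World Population Estimates",
    "description": "Annual population estimates by country.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/data/population.csv",
        }
    ],
}

# Minimal JSON Schema expressing what we expect a description to contain
dataset_schema = {
    "type": "object",
    "required": ["@context", "@type", "name", "description", "license"],
    "properties": {
        "@type": {"const": "Dataset"},
        "name": {"type": "string"},
        "description": {"type": "string"},
        "license": {"type": "string", "format": "uri"},
    },
}

# Raises jsonschema.ValidationError if the description is incomplete
jsonschema.validate(instance=dataset_description, schema=dataset_schema)
print(json.dumps(dataset_description, indent=2))
```

Because both the vocabulary (schema.org/DCAT) and the validation language (JSON Schema) are open standards, the same description remains useful to catalogs, search engines, and downstream agents.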
For more information, contact Pascal Heus ([email protected]).

