
KUDOS and the question(s) #2

@yarikoptic


Hi Lion,

I was thrilled to find ASDF, and SEIS-PROV used by it, while looking around at how provenance is captured in standardized (for the field) data storage formats and types.
In our case it is the NWB file format for neural data, which, like ASDF, is HDF5 based. ATM it has no provision for provenance capture, so I was intrigued by ASDF storing full PROV records inside the .hdf5.
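
To make it concrete for myself, here is a minimal sketch of what "full PROV records inside the .hdf5" could look like on our side -- assuming the Python `prov` and `h5py` packages; the group/dataset names and the namespace are made up for illustration, not the actual ASDF/SEIS-PROV layout:

```python
# A sketch only: assumes the Python `prov` and `h5py` packages; the group and
# dataset names below are hypothetical, not the actual ASDF/SEIS-PROV layout.
import h5py
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/nwb-prov/")  # placeholder namespace

raw = doc.entity("ex:raw_recording")
filtered = doc.entity("ex:filtered_recording")
bandpass = doc.activity("ex:bandpass_filter")
doc.used(bandpass, raw)
doc.wasGeneratedBy(filtered, bandpass)

with h5py.File("example.nwb", "a") as f:
    # keep the whole serialized PROV-JSON document as a single string dataset
    grp = f.require_group("provenance")
    grp.create_dataset("prov_json", data=doc.serialize(format="json"),
                       dtype=h5py.string_dtype())
```

(Reading it back would then just be `ProvDocument.deserialize(content=..., format="json")` on that dataset.)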

Based on your experience establishing SEIS-PROV (well done! I really loved going through http://seismicdata.github.io/SEIS-PROV/), the fact that it has been out there for years now, and somewhat reflecting on the fact that you seem to be the sole developer/contributor and that there has been no recent activity to further its development, I wondered:

  • PROV (like the rest of semantic-web markup) is not easily digestible by humans, with all the random IDs etc. JSON-LD and other serializations made things better, but still not really easy. That is why there is often the seductive pull of "let's just come up with some schema which is compatible with PROV, i.e. which we could convert to a PROV representation if needed, or which would simply be more useful to humans than to computers". Do you still feel that "native" PROV in ASDF was the way to go?

  • do higher-level user tools in the field use PROV information, e.g. for visualization or querying by mere mortals for pragmatic benefit (e.g. just listing the types of filtering done on the data with the parameters used -- see the small sketch after this list for what I mean)?

  • have you seen, or could you point to, specific pragmatic (goal-driven, not just demos of "what could be done") use cases / studies / benefits from having PROV in ASDF?
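
On "querying by mere mortals" in the second question above: something as trivial as the sketch below would already be a pragmatic win for us (again just the Python `prov` package; the activity and attribute names are made up):

```python
# A sketch only: Python `prov` package; activity and attribute names are made up.
from prov.model import ProvActivity, ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/nwb-prov/")
doc.activity("ex:bandpass_filter",
             other_attributes={"ex:lower_corner_hz": 300, "ex:upper_corner_hz": 6000})
doc.activity("ex:common_median_reference")

# "what filtering was done to this data, and with which parameters?"
for act in doc.get_records(ProvActivity):
    params = {str(name): value for name, value in act.attributes}
    print(act.identifier, params or "(no recorded parameters)")
```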

In our case with NWB, our immediate "prototypical" use-case is the brand new https://github.com/spikodrome/ project, where one of the goals would be to compare results between different spike sorting algorithms and human curators. So we are thinking now about how to capture provenance information on preprocessing + spike sorting in NWB files. (ref: https://github.com/SpikeInterface/spikeextractors/issues/290), hence I decided to ask this question(s) ;)
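
For example (a rough sketch with the Python `prov` package; every identifier below is hypothetical, not an agreed NWB/SpikeInterface schema), a sorter run followed by a human curation pass could look something like:

```python
# A sketch only: Python `prov` package; every identifier below is hypothetical.
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/spikodrome/")

recording = doc.entity("ex:preprocessed_recording")

# an automatic sorter run
sorter = doc.agent("ex:spike_sorter_A")      # the sorting software
sorting = doc.activity("ex:sorting_run_001",
                       other_attributes={"ex:detect_threshold": 6.0})
doc.used(sorting, recording)
doc.wasAssociatedWith(sorting, sorter)
auto_units = doc.entity("ex:sorting_output_001")
doc.wasGeneratedBy(auto_units, sorting)

# a human curation pass on top of that output
curator = doc.agent("ex:curator_A")          # the human curator
curation = doc.activity("ex:curation_001")
doc.used(curation, auto_units)
doc.wasAssociatedWith(curation, curator)
curated_units = doc.entity("ex:curated_sorting_001")
doc.wasGeneratedBy(curated_units, curation)
doc.wasDerivedFrom(curated_units, auto_units)
```

Something along those lines would let us say programmatically which results came from which sorter vs. which curator, which is the kind of comparison spikodrome is after.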

If you would prefer to reply in private -- debian at onerussian.com should work ;)

Thanks in advance, and kudos on all the great work!
