This repository was archived by the owner on Mar 30, 2020. It is now read-only.

Continuing our discussion for simplifying pipelines (and their examples) #13

@pgrosu

Hi Matt (@mbookman),

So to continue our discussion from #10 (comment), I understand the REST interface here:

https://www.googleapis.com/discovery/v1/apis/genomics/v1alpha2/rest

But this is too cumbersome for bioinformaticians who just want a turn-key solution for running their analyses. The examples are great, but we should have secondary, simplified versions of them, which would widen the audience. This includes support for multiple input files, which can be done now even if the backend does not support it directly. We should also include examples of connected pipelines as workflows and examples of nested pipelines - and yes, there are several ways :)

So each example should come with a pipeline definition like the one below, kept in a file that a program (Python, R, Java, etc.) picks up and adapts to the REST interface. Here one provides only the necessary information, and the parser transforms the generalized names and fills in the required fields on its own:

Pipeline:

    name: 'fastqc'
    CPU: 1
    RAM: 3.75 GB

    disks:
      name: 'datadisk'
      mountPoint: '/mnt/data'
      size: 500 GB
      persistent: true

    docker:
      image: 'gcr.io/PROJECT_ID_ARGUMENT/fastqc'

      cmd: ( 'mkdir /mnt/data/output && '
             'fastqc /mnt/data/input/* --outdir=/mnt/data/output/' )

    inputParameters:

      name: inputFile + [idx : 1...len(INPUT)]

      location:
        path: 'input/'
        disk: 'datadisk'

    outputParameters:

      name: 'outputPath'

      location:
        path: 'output/*'
        disk: 'datadisk'

pipelineArgs:

    RAM: 1 GB

    disks:
      name: 'datadisk'
      size: DISK_SIZE_ARGUMENT
      persistent: true

    inputs:
      inputFile + [idx : 1...len(INPUT)]

    outputs:
      path: OUTPUT_ARGUMENT

    logging:
      path: LOGGING_ARGUMENT
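
To make the idea concrete, here is a minimal sketch in Python of what such a parser could do with the definition above: expand the generalized `inputFile + [idx : 1...len(INPUT)]` name into one parameter per file, and turn sizes like `3.75 GB` into plain numbers. This is only an illustration under assumptions. The function names (`parse_size_gb`, `expand_inputs`, `build_request`) and the `gs://my-bucket/...` paths are hypothetical, and the request field names (`ephemeralPipeline`, `localCopy`, `imageName`, `minimumCpuCores`, `minimumRamGb`, `sizeGb`, `gcsPath`) are my recollection of the v1alpha2 interface and should be checked against the discovery document linked above. The simplified definition is passed in as an already-loaded dict, and the `persistent` flag is left out of the mapping.

```python
import re


def parse_size_gb(value):
    """Turn '3.75 GB', '500 GB' or a bare number into a float of gigabytes."""
    if isinstance(value, (int, float)):
        return float(value)
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*GB\s*", str(value), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    return float(match.group(1))


def expand_inputs(base_name, location, input_files):
    """Expand one generalized input into one named parameter per file (1-based)."""
    params, values = [], {}
    for idx, uri in enumerate(input_files, start=1):
        name = f"{base_name}{idx}"
        params.append({"name": name, "localCopy": dict(location)})
        values[name] = uri
    return params, values


def build_request(spec, input_files, output_path, logging_path):
    """Map the simplified definition onto a REST-style body (field names assumed)."""
    disk = spec["disks"]
    inp, out = spec["inputParameters"], spec["outputParameters"]
    params, values = expand_inputs(inp["name"], inp["location"], input_files)
    return {
        "ephemeralPipeline": {
            "name": spec["name"],
            "docker": {"imageName": spec["docker"]["image"],
                       "cmd": spec["docker"]["cmd"]},
            "inputParameters": params,
            "outputParameters": [{"name": out["name"],
                                  "localCopy": dict(out["location"])}],
            "resources": {
                "minimumCpuCores": spec["CPU"],
                "minimumRamGb": parse_size_gb(spec["RAM"]),
                "disks": [{"name": disk["name"],
                           "mountPoint": disk["mountPoint"],
                           "sizeGb": parse_size_gb(disk["size"])}],
            },
        },
        "pipelineArgs": {
            "inputs": values,
            "outputs": {out["name"]: output_path},
            "logging": {"gcsPath": logging_path},
        },
    }


# The simplified 'fastqc' definition from above, as a dict.
SPEC = {
    "name": "fastqc",
    "CPU": 1,
    "RAM": "3.75 GB",
    "disks": {"name": "datadisk", "mountPoint": "/mnt/data", "size": "500 GB"},
    "docker": {
        "image": "gcr.io/PROJECT_ID_ARGUMENT/fastqc",
        "cmd": ("mkdir /mnt/data/output && "
                "fastqc /mnt/data/input/* --outdir=/mnt/data/output/"),
    },
    "inputParameters": {"name": "inputFile",
                        "location": {"path": "input/", "disk": "datadisk"}},
    "outputParameters": {"name": "outputPath",
                         "location": {"path": "output/*", "disk": "datadisk"}},
}

# Hypothetical bucket paths, for illustration only.
request = build_request(
    SPEC,
    ["gs://my-bucket/a.fastq", "gs://my-bucket/b.fastq"],
    "gs://my-bucket/output",
    "gs://my-bucket/logs",
)
```

With two input files the user never writes `inputFile1` or `inputFile2`; the parser generates both the parameter definitions and the matching `pipelineArgs` entries from the single generalized name.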

Let me know what you think.

Thanks,
Paul
