Below is a description of the project's general file and folder structure, how conda environments can be used, and how each element is typically utilized. All executable stage scripts should be named `app.py` for consistency.
Project root:

- `manifest.json` - the main project manifest (if used). It may contain common settings or link individual pipeline stages.
- Other files not related to a specific stage.
Stage folders (for example, `load_anndata`, `clustering`, `dimensionality_reduction`):

- `manifest.json` (inside the stage folder)
  - Describes the key parameters required by this stage.
  - Contains the stage name, its description, execution order (`stage`), the types and requirements of input parameters (`params`), and information about returned data (`return`).
  - May specify dependencies (e.g., `depends_and_script` or `depends_or_script`) and the environments used (`conda`, `libs`, `conda_pip`).
  - Conda usage: if `conda` is specified, the system creates or reuses a conda environment with the requested Python version and installs the listed libraries (`libs` via conda, `conda_pip` via pip in that conda environment). This isolation helps avoid library conflicts.
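As an illustration only, a stage `manifest.json` combining the fields listed above might look like the following. The exact schema (field spellings, value formats, numbering of `stage`) is an assumption for the sake of the example, not a specification:

```json
{
  "name": "dimensionality_reduction",
  "description": "Reduces the dimensionality of an AnnData object",
  "stage": 3,
  "params": {
    "adata": {"type": "anndata", "required": true},
    "method": {"type": "string", "required": false, "default": "pca"}
  },
  "return": {
    "adata": {"type": "anndata"}
  },
  "conda": "python=3.11",
  "libs": ["scanpy", "numpy"],
  "conda_pip": ["scvi-tools"]
}
```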
- `app.py` (the executable script for this stage)
  - This file name should always be `app.py` to maintain a consistent structure.
  - It contains the core business logic: reading data, transforming it, analyzing it, and producing output.
  - Typically, it defines a function (often `run(**kwargs)`) that:
    - Imports the necessary dependencies (e.g., `scanpy`, `scvi`, `numpy`).
    - Reads parameters from `kwargs`, which are provided from `manifest.json` (e.g., file path, analysis method, metrics).
    - Calls a helper function or a series of functions that perform the main logic (e.g., data loading, clustering, dimensionality reduction).
    - Returns the result in the format described in the manifest (usually a dictionary whose keys match the fields in `return`).
Typical structure of `app.py`:

- Import libraries:

```python
import scanpy as sc
import numpy as np
import pandas as pd
# ...
```

- Define helper functions (e.g., `reduce_dimensionality`, `cluster`, `load_data`):

```python
def reduce_dimensionality(adata, method='pca', **kwargs):
    # Dimensionality reduction logic
    return adata
```

- The `run(**kwargs)` function:

```python
def run(**kwargs):
    # Read arguments
    adata = kwargs.get('adata')
    method = kwargs.get('method', 'pca')
    # ...
    # Call a helper function
    out = reduce_dimensionality(adata, method=method)
    # Return the result
    return {'adata': out}
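The pipeline runner itself is not part of this description. Purely as a sketch of how the pieces fit together, a runner might load a stage folder's `app.py` and call its `run(**kwargs)` entry point like this (the function name `run_stage` and the idea of passing the manifest's `default` values are assumptions for illustration):

```python
import importlib.util
import json
from pathlib import Path

def run_stage(stage_dir):
    """Load a stage folder's manifest and execute the run() entry point of its app.py."""
    stage_dir = Path(stage_dir)
    manifest = json.loads((stage_dir / "manifest.json").read_text())
    # Load app.py from the stage folder as a module
    spec = importlib.util.spec_from_file_location("stage_app", stage_dir / "app.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Collect default parameter values declared in the manifest (hypothetical schema)
    params = {name: p.get("default") for name, p in manifest.get("params", {}).items()}
    # Call the stage's entry point with those parameters
    return module.run(**params)
```

This keeps each stage self-contained: the runner only needs the folder path, and everything else (parameters, entry point) comes from the stage's own files.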
`manifest.json` in each folder:

- Defines which parameters the stage requires and what data it returns.
- Specifies the execution order in the pipeline.
- Determines which libraries (conda or pip) are needed for the stage.
- May include version constraints for packages.
- Conda environments: if `conda` is specified, the system will create or reuse the indicated environment (for example, `python=3.11`) and install the specified libraries.
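Since the manifest declares each stage's parameters, a runner could also validate the `kwargs` it is about to pass against them. The sketch below assumes a hypothetical `params`/`required` schema (consistent with the fields described above, but not a specification):

```python
def check_params(manifest, kwargs):
    """Verify that every parameter the manifest marks as required is present in kwargs.

    The 'params' and 'required' field names are assumed conventions,
    mirroring the manifest description in this document.
    """
    missing = [
        name for name, spec in manifest.get("params", {}).items()
        if spec.get("required", False) and name not in kwargs
    ]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    return True
```

Failing fast like this surfaces configuration mistakes before a stage starts doing expensive work.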
`app.py` in each folder:

- Performs the main work: processes data using parameters obtained from `manifest.json`.
- Produces output that subsequent stages can access.
- Has a structure consisting of several steps:
  - Imports
  - Helper functions
  - The `run(**kwargs)` function, which is the entry point.
```
project_root/
├── manifest.json                 # Main (root) manifest, if present
├── load_anndata/
│   ├── manifest.json             # Manifest for the loading stage
│   └── app.py                    # Script performing data loading
├── clustering/
│   ├── manifest.json             # Manifest for the clustering stage
│   └── app.py                    # Script for clustering data
├── dimensionality_reduction/
│   ├── manifest.json             # Manifest for the dimensionality reduction stage
│   └── app.py                    # Script performing the analysis
└── other_folders_or_files        # Other files/folders in the project
```
- Store at most one stage per folder (with its own `manifest.json` and `app.py`).
- The main manifest can set the overall pipeline logic or serve as the entry point for the entire system.
- Each `app.py` should be as focused as possible, making the stage easier to test, modify, and reuse.
- Parameters in `manifest.json` should be described in as much detail as possible so that users understand what is required as input and what will be returned as output.
- Conda environments: when `conda` is specified, each stage can be isolated in its own environment to avoid library version conflicts across different scripts.