Holistic QC Pipeline

A modular rewrite of the BOOST behavioral quality-control (QC) pipeline. The repo pulls raw JATOS exports, normalizes them into tidy data frames, applies construct-specific QC, and persists both participant-level artifacts and aggregate dashboards for downstream analysts.

Highlights

  • Single entrypoint (code/main_handler.py) that coordinates pulling raw studies, CSV conversion, QC, persistence, and plotting.
  • Domain-specific QC modules for the core cognitive constructs: cognitive control (CC), psychomotor speed (PS), memory (MEM), and word learning (WL).
  • Automatic artifact management: raw outputs land under data/, aggregated summaries in meta/, and generated plots in per-subject folders (with exemplar group views retained in group/plots/).
  • Ready to automate: python code/main_handler.py all mirrors the GitHub Action and is safe to schedule.

Repository Layout

code/
  main_handler.py        # Orchestrates end-to-end QC for a task or the full battery
  data_processing/
    pull_handler.py      # Pulls fresh JATOS exports by study IDs
    utils.py             # Shared helpers (CSV normalization, accuracy/RT math, WL fuzzy matching)
    save_utils.py        # Writes subject artifacts (CSV + plots) into the data lake structure
    cc_qc.py             # CC task QC rules (AF/NF/NTS/ATS/NNB/VNB)
    ps_qc.py             # PS task QC rules (PC/LC/DSST)
    mem_qc.py            # Working memory QC rules (FN/SM)
    wl_qc.py             # Word learning QC rules (WL/DWL + delay reconciliation)
    plot_utils.py        # Matplotlib/seaborn helpers for construct-specific visualizations
  transfer/
    path_logic.py        # Optional helper to mirror generated outputs onto the BOOST file server

data/                   # Subject-level caches (obs/int sites, then subject/task/data|plot)
meta/                   # Auto-saved aggregate CSVs (master_acc, cc_master, ps_master, mem_master, wl_master)
group/plots/            # Example construct plots for quick reference
requirements.txt        # Python dependencies for QC + plotting
run.py                  # Flask placeholder (not yet active)

Data & QC Flow

  1. Pull: Pull in pull_handler.py requests study metadata and data blobs from JATOS for the study IDs defined in Handler.IDs. days_ago defaults to 127 but can be overridden when calling load().
  2. Normalize: CONVERT_TO_CSV flattens newline-delimited JSON into tidy pandas frames ready for QC.
  3. QC & Metrics: Handler.choose_construct() routes each task to its construct-specific QC class:
    • CC_QC enforces max-RT checks, per-condition accuracy thresholds, and task-switching rules.
    • PS_QC scores psychomotor speed blocks and tallies correct counts.
    • MEM_QC inspects FN/SM performance with RT and accuracy rollups.
    • WL_QC runs fuzzy matching against version-specific keys, handling WL and DWL together.
  4. Visualize: plot_utils generates construct-appropriate figures (per-condition counts, RT distributions, WL learning curves, etc.).
  5. Persist: SAVE_EVERYTHING stores per-participant CSVs and plots under data/<study>/<site>/<subject>/<task>/. Handler._persist_all_masters() writes aggregate CSVs into meta/ after every successful task run to keep analytics in sync. A minimal driver sketch follows this list.
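
A hypothetical sketch of the five stages, using the names in this README (Handler, CONVERT_TO_CSV, SAVE_EVERYTHING); the import paths and signatures are assumptions, not the repo's confirmed API:

# Hypothetical driver -- treat as annotated pseudocode for the flow above.
from main_handler import Handler                        # assumed import path
from data_processing.utils import CONVERT_TO_CSV        # assumed location
from data_processing.save_utils import SAVE_EVERYTHING  # assumed location

handler = Handler()
raw = handler.pull()                      # 1. Pull JATOS exports (days_ago window)
frames = CONVERT_TO_CSV(raw)              # 2. Flatten NDJSON into tidy frames
results = handler.choose_construct("WL")  # 3. Route the task to its QC class
#                                           4. plot_utils renders figures in-branch
SAVE_EVERYTHING(results)                  # 5. Persist subject CSVs + plots...
handler._persist_all_masters()            #    ...and refresh the meta/ masters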

Supported Tasks

Construct                    | Tasks                      | Notes
---------------------------- | -------------------------- | ---------------------------------------------
CC (Cognitive Control)       | AF, NF, ATS, NTS, NNB, VNB | Shared 50% accuracy threshold; optional task-switching logic for ATS/NTS
PS (Psychomotor Speed)       | PC, LC, DSST               | Separate RT limits for LC/PC vs. DSST; exports accuracy and correct-count masters
MEM (Face/Scene Memory)      | FN, SM                     | Captures per-condition accuracy, mean RT, and counts into mem_master.csv
WL (Word Learning + Delayed) | WL, DWL                    | Combines learning/distraction/immediate blocks with delayed recall; masters upsert rows per subject/session

To target a single task, run python code/main_handler.py WL. To mirror the nightly sweep, use python code/main_handler.py all.

Setup

  1. Create a virtual environment and install dependencies:
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
  2. (Optional) If you are on Nix, nix develop provisions the toolchain.
  3. Configure secrets:
    • Handler.pull() currently references a token inline. Replace it with an environment variable (e.g., JATOS_TOKEN) and export it before running; a sketch follows this list.
    • Proxy credentials (tease) should also come from the environment or a .env file that is not committed.
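
One way the inline token could be replaced, assuming the environment-variable name suggested above (JATOS_TOKEN is this README's suggestion, not an existing repo variable):

import os

# Fail fast if the token is missing instead of embedding it in source.
token = os.environ.get("JATOS_TOKEN")
if not token:
    raise RuntimeError("Set JATOS_TOKEN before running the QC pipeline.")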

Running QC Locally

# QC the full battery (mirrors CI)
python code/main_handler.py all

# QC a single task
python code/main_handler.py AF

Outputs land under data/ using the subject -> task folder pattern enforced by SAVE_EVERYTHING. Every run also refreshes the aggregated CSVs in meta/ (a loading example follows the list):

  • master_acc.csv: high-level accuracy summaries for PS/MEM tasks.
  • cc_master.csv: condition-level accuracy + mean RT for CC tasks.
  • ps_master.csv: per-block correct counts for PS tasks.
  • mem_master.csv: joined counts/RT/accuracy for FN/SM.
  • wl_master_wide.csv & wl_master.csv: wide vs flattened WL summaries combining WL + DWL submissions.
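
Since these are plain CSVs, they can be inspected directly with pandas; the paths come from this README, while the column contents are whatever each QC module writes:

import pandas as pd

# Load two of the aggregate masters refreshed by every successful run.
cc = pd.read_csv("meta/cc_master.csv")   # condition-level accuracy + mean RT
wl = pd.read_csv("meta/wl_master.csv")   # flattened WL + DWL summaries
print(cc.head())
print(wl.head())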

Visual Artifacts

  • Participant plots are co-located with their data under data/.../plot/.
  • Shared reference visuals live in group/plots/ (e.g., flanker.png, task_switching.png) for quick distribution in slide decks.

Transferring Data to the Server

code/transfer/path_logic.py discovers local subject folders and mirrors them to /mnt/lss/Projects/BOOST (observational vs intervention sites routed automatically). Use PathLogic.copy_subjects_to_server(max_workers=?, dry_run=True) inside a Python shell to preview the copy plan before executing.
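
For example, a preview run from a Python shell might look like this (the import path is inferred from the repository layout above; max_workers=4 is an arbitrary illustrative value):

# Dry run first: builds the copy plan without touching the server.
from transfer.path_logic import PathLogic   # assumed import path

logic = PathLogic()                         # assumes a no-argument constructor
logic.copy_subjects_to_server(max_workers=4, dry_run=True)
# Re-run with dry_run=False once the plan looks correct.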

Development Workflow

  • Lint with python -m flake8 code before committing.
  • Run pytest (tests live under tests/) to cover threshold logic, expected artifact names, and any new utilities; a minimal threshold-test sketch follows this list.
  • Keep notebooks or ad-hoc experiments outside tracked directories, or convert them into reproducible scripts.
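
As an illustration of the threshold-style tests mentioned above, a self-contained pytest sketch (the helper is hypothetical, not the repo's API):

# tests/test_thresholds.py -- illustrative only.
def passes_accuracy(acc: float, threshold: float = 0.50) -> bool:
    """Stand-in for the 50% per-condition accuracy rule used by CC QC."""
    return acc >= threshold

def test_accuracy_boundary_is_inclusive():
    assert passes_accuracy(0.50)
    assert not passes_accuracy(0.49)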

Extending the Pipeline

  1. Add the new task code and study IDs to Handler.IDs.
  2. Implement construct logic under code/data_processing/ (reuse helpers in utils.py when possible).
  3. Register the new branch in Handler.choose_construct() and add persistence hooks for master CSVs (see the skeleton after this list).
  4. Document the task behavior and update tests/fixtures to reflect the new data expectations.
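
A skeleton for steps 2-3, following the existing PS_QC/MEM_QC naming pattern; the interface here is an assumption, so match whatever the real QC classes expose:

import pandas as pd

class XX_QC:
    """Hypothetical QC class for a new task code 'XX'."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def run(self) -> pd.DataFrame:
        # Apply construct-specific thresholds here; reuse utils.py helpers.
        return self.df

# Then register the branch inside Handler.choose_construct(), e.g.:
#   if task == "XX":
#       return XX_QC(df).run()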

Troubleshooting

  • No data returned from JATOS: confirm the study IDs in Handler.IDs and that your token has access; adjust the days_ago window if you are backfilling.
  • Missing session folders: ensure input CSVs include session or session_number. SAVE_EVERYTHING uses those columns to label artifacts.
  • WL metrics look stale: WL and DWL write to the same wl_master rows via _upsert_wl_master; make sure both tasks are run for each session to populate delay scores.

License & Data Privacy

This repository processes sensitive participant responses. Keep tokens, raw exports, and downstream artifacts off public machines. Add new temp/output folders to .gitignore as needed to avoid leaking data.
