A modular rewrite of the BOOST behavioral quality-control (QC) pipeline. The repo pulls raw JATOS exports, normalizes them into tidy data frames, applies construct-specific QC, and persists both participant-level artifacts and aggregate dashboards for downstream analysts.
- Single entrypoint (`code/main_handler.py`) that coordinates pulling raw studies, CSV conversion, QC, persistence, and plotting.
- Domain-specific QC modules for the core cognitive constructs: cognitive control (CC), psychomotor speed (PS), memory (MEM), and word learning (WL).
- Automatic artifact management: raw outputs land under `data/`, aggregated summaries in `meta/`, and generated plots in per-subject folders (with exemplar group views retained in `group/plots/`).
- Ready to automate: `python code/main_handler.py all` mirrors the GitHub Action and is safe to schedule.
```
code/
  main_handler.py        # Orchestrates end-to-end QC for a task or the full battery
  data_processing/
    pull_handler.py      # Pulls fresh JATOS exports by study IDs
    utils.py             # Shared helpers (CSV normalization, accuracy/RT math, WL fuzzy matching)
    save_utils.py        # Writes subject artifacts (CSV + plots) into the data lake structure
    cc_qc.py             # CC task QC rules (AF/NF/NTS/ATS/NNB/VNB)
    ps_qc.py             # PS task QC rules (PC/LC/DSST)
    mem_qc.py            # Working memory QC rules (FN/SM)
    wl_qc.py             # Word learning QC rules (WL/DWL + delay reconciliation)
    plot_utils.py        # Matplotlib/seaborn helpers for construct-specific visualizations
  transfer/
    path_logic.py        # Optional helper to mirror generated outputs onto the BOOST file server
data/                    # Subject-level caches (obs/int sites, then subject/task/data|plot)
meta/                    # Auto-saved aggregate CSVs (master_acc, cc_master, ps_master, mem_master, wl_master)
group/plots/             # Example construct plots for quick reference
requirements.txt         # Python dependencies for QC + plotting
run.py                   # Flask placeholder (not yet active)
```
- **Pull** – `Pull` in `pull_handler.py` requests study metadata + data blobs from JATOS for the study IDs defined in `Handler.IDs`. `days_ago` defaults to 127 but can be overridden when calling `load()`.
- **Normalize** – `CONVERT_TO_CSV` flattens newline-delimited JSON into tidy Pandas frames ready for QC (see the sketch after this list).
- **QC & Metrics** – `Handler.choose_construct()` routes each task to its construct-specific QC class: `CCqC` enforces max RT checks, per-condition accuracy thresholds, and task-switching rules; `PS_QC` scores psychomotor speed blocks and tallies correct counts; `MEM_QC` inspects FN/SM performance with RT + accuracy rollups; `WL_QC` orchestrates fuzzy matching against version-specific keys, handling WL and DWL simultaneously.
- **Visualize** – `plot_utils` generates construct-appropriate figures (per-condition counts, RT distributions, WL learning curves, etc.).
- **Persist** – `SAVE_EVERYTHING` stores per-participant CSVs and plots under `data/<study>/<site>/<subject>/<task>/`. `Handler._persist_all_masters()` writes aggregate CSVs into `meta/` on every successful task run to keep analytics in sync.
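As a concrete picture of the Normalize step, here is a minimal sketch of flattening newline-delimited JSON into a tidy frame. It is not the actual `CONVERT_TO_CSV` implementation, and the field names in the fake records are assumptions:

```python
import json

import pandas as pd

def flatten_ndjson(raw_text: str) -> pd.DataFrame:
    """Flatten newline-delimited JSON (one record per line) into a tidy frame.

    Illustrative only: real JATOS exports nest component results, which
    CONVERT_TO_CSV unpacks; json_normalize handles the simple nesting here.
    """
    records = [json.loads(line) for line in raw_text.splitlines() if line.strip()]
    return pd.json_normalize(records)

# Two fake trials in NDJSON form (field names are made up for the example).
raw = (
    '{"subject": "s001", "task": "AF", "rt": 512, "correct": true}\n'
    '{"subject": "s001", "task": "AF", "rt": 498, "correct": false}'
)
print(flatten_ndjson(raw))
```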
| Construct | Tasks | Notes |
|---|---|---|
| CC (Cognitive Control) | AF, NF, ATS, NTS, NNB, VNB | Shared QC thresholds at 50% accuracy; optional task-switching logic for ATS/NTS |
| PS (Psychomotor Speed) | PC, LC, DSST | Separate RT limits for LC/PC vs DSST; exports accuracy and correct-count masters |
| MEM (Face/Scene Memory) | FN, SM | Captures per-condition accuracy, mean RT, and counts into `mem_master.csv` |
| WL (Word Learning + Delayed) | WL, DWL | Combines learning/distraction/immediate blocks with delayed recall; masters upsert rows per subject/session |
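To make the CC row concrete, the shared 50% gate could be sketched as below. The real logic lives in `cc_qc.py`; the column names and the RT ceiling value here are assumptions:

```python
import pandas as pd

ACC_THRESHOLD = 0.50  # shared CC accuracy floor from the table above
MAX_RT_MS = 10_000    # placeholder RT ceiling; see cc_qc.py for the real cutoffs

def cc_condition_flags(trials: pd.DataFrame) -> pd.DataFrame:
    """Flag CC conditions whose accuracy falls below the shared threshold.

    Expects one row per trial with assumed `condition`, `correct`, and `rt`
    columns; trials beyond the RT ceiling are dropped before scoring.
    """
    valid = trials[trials["rt"] <= MAX_RT_MS]
    acc = valid.groupby("condition")["correct"].mean().rename("accuracy").reset_index()
    acc["pass_qc"] = acc["accuracy"] >= ACC_THRESHOLD
    return acc
```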
To target a single task, run `python code/main_handler.py WL`. To mirror the nightly sweep, use `python code/main_handler.py all`.
- Create a virtual environment and install dependencies:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- (Optional) If you are on Nix, `nix develop` provisions the toolchain.
- Configure secrets:
  - `Handler.pull()` currently references a token inline. Replace it with an environment variable (e.g., `JATOS_TOKEN`) and export it before running; a sketch follows this list.
  - Proxy credentials (`tease`) should also come from the environment or an `.env` file that is not committed.
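A minimal version of that environment-variable swap could look like this (the helper name is hypothetical, the real request lives in `Handler.pull()`, and the Bearer scheme is an assumption):

```python
import os

def get_jatos_token() -> str:
    """Read the JATOS API token from the environment instead of hardcoding it."""
    token = os.environ.get("JATOS_TOKEN")
    if not token:
        raise RuntimeError("JATOS_TOKEN is not set; export it before running.")
    return token

# e.g. inside Handler.pull():
# headers = {"Authorization": f"Bearer {get_jatos_token()}"}
```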
```bash
# QC the full battery (mirrors CI)
python code/main_handler.py all

# QC a single construct
python code/main_handler.py AF
```

Outputs land under `data/` using the subject -> task folder pattern enforced by `SAVE_EVERYTHING`. Every run also refreshes the aggregated CSVs in `meta/`:

- `master_acc.csv`: high-level accuracy summaries for PS/MEM tasks.
- `cc_master.csv`: condition-level accuracy + mean RT for CC tasks.
- `ps_master.csv`: per-block correct counts for PS tasks.
- `mem_master.csv`: joined counts/RT/accuracy for FN/SM.
- `wl_master_wide.csv` & `wl_master.csv`: wide vs flattened WL summaries combining WL + DWL submissions.
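The per-subject/session upsert those WL masters rely on can be pictured with this sketch (the key column names and the real `_upsert_wl_master` signature are assumptions):

```python
from pathlib import Path

import pandas as pd

KEYS = ["subject", "session"]  # assumed upsert keys

def upsert_master(master_path: Path, new_row: dict) -> None:
    """Insert or update one subject/session row in an aggregate master CSV.

    Sketch of the pattern behind _upsert_wl_master: match on the key columns,
    overwrite fields the new run provides (e.g. DWL filling delay scores),
    append otherwise, then rewrite the CSV.
    """
    new = pd.DataFrame([new_row])
    if master_path.exists():
        master = pd.read_csv(master_path)
        mask = (master[KEYS] == new[KEYS].iloc[0]).all(axis=1)
        if mask.any():
            for col in new.columns:
                master.loc[mask, col] = new_row[col]
        else:
            master = pd.concat([master, new], ignore_index=True)
    else:
        master = new
    master.to_csv(master_path, index=False)
```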
- Participant plots are co-located with their data under `data/.../plot/`.
- Shared reference visuals live in `group/plots/` (e.g., `flanker.png`, `task_switching.png`) for quick distribution in slide decks.
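For a flavor of what `plot_utils` produces, an RT-distribution figure of the kind referenced above might be sketched like this (column names are assumed; the real helpers choose construct-specific layouts):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as in scheduled runs

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_rt_distributions(trials: pd.DataFrame, out_path: str) -> None:
    """Save a per-condition RT histogram (sketch of a plot_utils-style helper)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.histplot(data=trials, x="rt", hue="condition", kde=True, ax=ax)
    ax.set_xlabel("Reaction time (ms)")
    ax.set_title("Per-condition RT distribution")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```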
`code/transfer/path_logic.py` discovers local subject folders and mirrors them to `/mnt/lss/Projects/BOOST` (observational vs intervention sites routed automatically). Use `PathLogic.copy_subjects_to_server(max_workers=?, dry_run=True)` inside a Python shell to preview the copy plan before executing.
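The underlying pattern is roughly the following sketch (the paths and folder depth are assumptions; `PathLogic` additionally routes observational vs intervention sites):

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

LOCAL_ROOT = Path("data")                      # local subject caches
SERVER_ROOT = Path("/mnt/lss/Projects/BOOST")  # BOOST file server mount

def mirror_subjects(max_workers: int = 4, dry_run: bool = True) -> None:
    """Copy each local subject folder to the server, or just print the plan.

    Sketch only: the glob depth below assumes data/<study>/<site>/<subject>.
    """
    subjects = [p for p in LOCAL_ROOT.glob("*/*/*") if p.is_dir()]
    if dry_run:
        for src in subjects:
            print(f"[dry run] {src} -> {SERVER_ROOT / src.relative_to(LOCAL_ROOT)}")
        return
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for src in subjects:
            dest = SERVER_ROOT / src.relative_to(LOCAL_ROOT)
            pool.submit(shutil.copytree, src, dest, dirs_exist_ok=True)
```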
- Lint with `python -m flake8 code` before committing.
- Run `pytest` (tests live under `tests/`) to cover threshold logic, expected artifact names, and any new utilities; a sample test sketch follows this list.
- Keep notebooks or ad-hoc experiments outside tracked directories, or convert them into reproducible scripts.
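A threshold-style test under `tests/` might look like this sketch (swap the inline computation for the real scoring function in `cc_qc.py`):

```python
import pandas as pd

def test_condition_at_threshold_passes():
    """A condition at exactly 50% accuracy should clear the shared CC gate."""
    trials = pd.DataFrame({
        "condition": ["congruent"] * 4,
        "correct": [True, True, False, False],  # exactly 50% accurate
        "rt": [400, 450, 500, 550],
    })
    accuracy = trials["correct"].mean()  # stand-in for the real QC call
    assert accuracy >= 0.50
```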
- Add the new task code and study IDs to `Handler.IDs`.
- Implement construct logic under `code/data_processing/` (reuse helpers in `utils.py` when possible).
- Register the new branch in `Handler.choose_construct()` and add persistence hooks for master CSVs (see the dispatch sketch below).
- Document the task behavior and update tests/fixtures to reflect the new data expectations.
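The dispatch presumably boils down to a task-code to QC-class mapping; registering a new task might then look like this sketch (the mapping shape is an assumption drawn from the modules listed earlier):

```python
# Sketch of the branching in Handler.choose_construct(); QC class names
# mirror the modules under code/data_processing/.
TASK_TO_QC = {
    "AF": "CCqC", "NF": "CCqC", "ATS": "CCqC", "NTS": "CCqC",
    "NNB": "CCqC", "VNB": "CCqC",
    "PC": "PS_QC", "LC": "PS_QC", "DSST": "PS_QC",
    "FN": "MEM_QC", "SM": "MEM_QC",
    "WL": "WL_QC", "DWL": "WL_QC",
    # New task: add its code here and point it at the new QC class,
    # e.g. "XYZ": "XYZ_QC".
}
```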
- No data returned from JATOS: confirm the study IDs in `Handler.IDs` and that your token has access; adjust the `days_ago` window if you are backfilling.
- Missing session folders: ensure input CSVs include `session` or `session_number`; `SAVE_EVERYTHING` uses those columns to label artifacts (see the sketch after this list).
- WL metrics look stale: WL and DWL write to the same `wl_master` rows via `_upsert_wl_master`; make sure both tasks are run for each session to populate delay scores.
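The session-folder fix amounts to guaranteeing one of those two columns before saving; a defensive helper could look like this (the fallback order and error behavior are assumptions):

```python
import pandas as pd

def resolve_session_label(df: pd.DataFrame) -> str:
    """Return the session label that artifact folders are named after.

    Sketch: prefer `session`, fall back to `session_number`, and fail
    loudly when neither column is present.
    """
    for col in ("session", "session_number"):
        if col in df.columns:
            return str(df[col].iloc[0])
    raise KeyError("Input CSV lacks both 'session' and 'session_number'.")
```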
This repository processes sensitive participant responses. Keep tokens, raw exports, and downstream artifacts off public machines. Add new temp/output folders to `.gitignore` as needed to avoid leaking data.