
Liu-Hy/GenoMAS


A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis


Official implementation of the paper:

"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis"
Haoyang Liu, Yijiang Li, Haohan Wang
UIUC, UC San Diego

🌐 Website | 📄 Paper | 💻 Code


πŸ”¬ Overview

GenoMAS System Diagram

This repository contains two main components:

πŸ€– 1. Multi-Agent Framework

A minimal yet powerful framework for robust automation of scientific workflows:

  • Generic Communication Protocol: Typed messaging mechanism for code-driven analysis
  • Notebook-Style Workflow: Agents can plan, write code, execute, debug, and backtrack through multi-step tasks
  • Customizable Agents: Define specific roles, guidelines, tools, and action units for your domain
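To make the idea of typed messaging concrete, here is a minimal sketch of how messages with explicit types might be routed between agent roles. The message kinds, field names and the `route` helper are hypothetical illustrations, not the framework's actual classes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class MessageType(Enum):
    # Hypothetical message kinds; the real protocol may define different ones.
    CODE_WRITING_REQUEST = auto()
    CODE_WRITING_RESPONSE = auto()
    CODE_REVIEW_REQUEST = auto()
    CODE_REVIEW_RESPONSE = auto()


@dataclass
class Message:
    role: str                       # sender role, e.g. "data_engineer"
    type: MessageType               # tells the receiver how to interpret content
    content: str                    # code, review feedback, plan, etc.
    target_roles: set[str] = field(default_factory=set)


def route(message: Message, inboxes: dict[str, list[Message]]) -> None:
    """Append the message to the inbox of every targeted role."""
    for role in message.target_roles:
        inboxes.setdefault(role, []).append(message)


inboxes: dict[str, list[Message]] = {}
route(
    Message("data_engineer", MessageType.CODE_REVIEW_REQUEST,
            "df = df.dropna()", {"code_reviewer"}),
    inboxes,
)
print(inboxes["code_reviewer"][0].type.name)  # CODE_REVIEW_REQUEST
```

Because the type travels with each message, a receiving agent can dispatch on `MessageType` instead of parsing free text to work out what is being asked of it.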

Design Philosophy:

  • Balance controllability of traditional workflows with flexibility of autonomous agents
  • Provide just enough encapsulation to make agent experiments easier; simplicity matters
  • Build a reliable foundation for production-level agent systems

🧬 2. GenoMAS Implementation

A specialized system for automated gene expression data analysis:

  • Data Sources: Analyzes transcriptomic datasets from GEO and TCGA
  • Goal: Identify significant genes related to traits while accounting for confounders
  • State-of-the-Art Performance: 60.38% F1 score on the GenoTEX benchmark, substantially outperforming both open-domain agents and generic biomedical agents
  • Scientific Discovery: Identifies biologically meaningful gene-trait associations, with high-confidence associations corroborated by literature and novel findings worthy of further investigation

πŸ“š Table of Contents


πŸ“¦ Usage

1. Data Preparation

Download Input Data

Our experiments use the publicly available GenoTEX benchmark. Download the input data (~42 GB) from the Google Drive folder and save all downloaded files under the same parent folder. That folder is the data root used in the following steps (see --data-root below).

Verify Data Integrity

cd download
python validator.py --data-dir /path/to/data --validate
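The internals of validator.py are not described in this README. To illustrate what an integrity check of this kind typically does, the sketch below compares files against a manifest of SHA-256 hashes. The manifest format and the function names are assumptions, not the validator's actual interface.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries that are missing or whose hash does not match."""
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        file_path = data_dir / rel_path
        if not file_path.is_file() or sha256_of(file_path) != expected:
            bad.append(rel_path)
    return bad


# Demo on a throwaway directory standing in for the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cohort.csv").write_text("gene,value\nTP53,1.2\n")
    manifest = {"cohort.csv": sha256_of(root / "cohort.csv"), "missing.csv": "0" * 64}
    problems = find_mismatches(root, manifest)

print(problems)  # ['missing.csv']
```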

2. Environment Setup

Create Python Environment

Create a conda environment with Python 3.10 and install the required packages:

conda create -n genomas python=3.10
conda activate genomas
pip install -r requirements.txt

Configure API Keys

Create a .env file in the project root directory with at least one API key from a provider you choose. The code supports multiple providers and can use different API keys for load balancing.

Copy the template file and fill in your API keys:

cp env.example .env
# Then edit .env with your actual API keys

πŸ’‘ Tip: For OpenAI models, the organization ID is also required. See env.example for the full template with all available configuration options.

3. Run Experiments

Understanding Key Arguments

  • --version: Experiment version identifier (used for output file naming)
  • --model: LLM model name (e.g., gpt-5-mini-2025-08-07, claude-sonnet-4-5-20250929, gemini-2.5-pro)
  • --api: API key index to use (1, 2, 3, etc.), corresponding to _1, _2, _3 suffixes in .env
  • --thinking: Enable extended thinking mode for Claude models
  • --parallel-mode: Parallelization strategy (none or cohorts for parallel cohort preprocessing)
  • --max-workers: Number of concurrent workers when using --parallel-mode cohorts
  • --data-root: Root directory containing input data. Defaults to ../data (configurable in utils/config.py)
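The --api index selects among keys that differ only by their _1, _2, _3 suffix. A minimal sketch of that lookup follows. The `{PROVIDER}_API_KEY_{N}` naming pattern and the `resolve_api_key` helper are assumptions for illustration, not necessarily the names used in env.example or the code.

```python
def resolve_api_key(env: dict[str, str], provider: str, index: int) -> str:
    """Map --api N to the N-th key for a provider, e.g. OPENAI_API_KEY_2 for index 2."""
    prefix = f"{provider.upper()}_API_KEY_"
    name = f"{prefix}{index}"
    try:
        return env[name]
    except KeyError:
        # Report which indices are configured so a wrong --api value is easy to fix.
        available = sorted(k for k in env if k.startswith(prefix))
        raise KeyError(f"{name} not set; available: {available}") from None


env = {"OPENAI_API_KEY_1": "key-one", "OPENAI_API_KEY_2": "key-two"}
print(resolve_api_key(env, "openai", 2))  # key-two
```

Keeping several indexed keys per provider is what allows different runs, or different agent roles, to spread their requests across keys.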

Role-Specific Model Configuration: You can assign different models and/or API indices to different agent roles; these override the global --model and --api:

  • --code-reviewer-model, --code-reviewer-api: Model for Code Review agent
  • --domain-expert-model, --domain-expert-api: Model for Domain Expert agent
  • --data-engineer-model, --data-engineer-api: Model for Data Engineer agents
  • --statistician-model, --statistician-api: Model for Statistician agent
  • --planning-model, --planning-api: Model for the planning mechanism
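The override rule (use the role-specific value if given, otherwise fall back to the global one) can be sketched with argparse. This is not main.py's actual parser: only the flag names come from the list above, while the defaults and the `role_config` helper are assumptions.

```python
import argparse

ROLES = ("code_reviewer", "domain_expert", "data_engineer", "statistician", "planning")


def role_config(args: argparse.Namespace, role: str) -> tuple[str, int]:
    """Return (model, api_index) for a role, falling back to the global --model/--api."""
    model = getattr(args, f"{role}_model")
    api = getattr(args, f"{role}_api")
    return (model if model is not None else args.model,
            api if api is not None else args.api)


parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True)
parser.add_argument("--api", type=int, default=1)
for role in ROLES:
    flag = role.replace("_", "-")  # argparse maps --code-reviewer-model to code_reviewer_model
    parser.add_argument(f"--{flag}-model", default=None)
    parser.add_argument(f"--{flag}-api", type=int, default=None)

args = parser.parse_args([
    "--model", "claude-sonnet-4-20250514", "--api", "1",
    "--code-reviewer-model", "o3-2025-04-16",
])
print(role_config(args, "code_reviewer"))  # ('o3-2025-04-16', 1)
print(role_config(args, "statistician"))   # ('claude-sonnet-4-20250514', 1)
```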

Example 1: Basic Run with Single Model

python main.py --version exp1 --model gpt-5-mini-2025-08-07 --api 1

Example 2: Heterogeneous LLM Configuration

The command below replicates the heterogeneous configuration used in our paper, but any combination is allowed:

python main.py \
  --version exp2 \
  --model claude-sonnet-4-20250514 \
  --thinking \
  --api 1 \
  --planning-model o3-2025-04-16 \
  --planning-api 1 \
  --code-reviewer-model o3-2025-04-16 \
  --code-reviewer-api 1 \
  --domain-expert-model gemini-2.5-pro \
  --domain-expert-api 1

Example 3: Using Open-Source Models

Local Deployment (using the Ollama library):

# DeepSeek-R1 671B (requires substantial GPU resources)
# For higher performance, you may adapt the code to try the latest SOTA open-source models
python main.py --version exp3 --model deepseek-r1:671b

# Llama 3.1 8B (suitable for testing on consumer GPUs)
python main.py --version exp4 --model llama3.1

Via API (using Novita by default):

# To use DeepSeek, we recommend third-party providers like Novita for reduced latency
python main.py --version exp5 --model deepseek-r1:671b --use-api --api 1

Example 4: Parallel Mode for Faster Execution

python main.py \
  --version exp6 \
  --model gpt-5-2025-08-07 \
  --api 1 \
  --parallel-mode cohorts \
  --max-workers 2

This processes up to 2 cohorts concurrently, which can significantly reduce wall-clock time. Note, however, that setting --max-workers too high may exceed your API rate limits.
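The effect of a worker cap can be illustrated with a thread pool. Whether GenoMAS uses threads, processes or something else is an assumption here; the sketch only shows that at most `max_workers` cohorts, and therefore at most that many concurrent streams of API requests, are in flight at any moment. The cohort IDs are made up.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

active = 0   # cohorts currently being processed
peak = 0     # highest concurrency observed
lock = threading.Lock()


def preprocess_cohort(cohort_id: str) -> str:
    """Stand-in for one cohort's preprocessing; real work would call LLM agents."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulate I/O-bound LLM calls
    with lock:
        active -= 1
    return f"{cohort_id}: done"


cohorts = [f"GSE{1000 + i}" for i in range(6)]  # hypothetical cohort IDs
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(preprocess_cohort, c) for c in cohorts]
    results = sorted(f.result() for f in as_completed(futures))

print(f"processed {len(results)} cohorts, peak concurrency {peak}")
```

Raising `max_workers` shortens wall-clock time roughly in proportion until the provider's rate limit becomes the bottleneck, which is why the cap is exposed as a flag.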

Example 5: Generate Action Units with Human Refinement

Generate Action Unit (AU) prompts from guidelines, allowing manual editing before use:

python main.py \
  --version exp7 \
  --model claude-sonnet-4-5-20250929 \
  --thinking \
  --api 1 \
  --generate-action-units

This will:

  1. Generate AU prompts from agent guidelines
  2. Pause for manual editing of the generated files
  3. Ask for confirmation before proceeding
  4. Use the edited AUs for the experiment

Example 6: Generate Action Units in Non-Interactive Mode

Generate and use Action Units without manual intervention:

python main.py \
  --version exp8 \
  --model claude-sonnet-4-5-20250929 \
  --thinking \
  --api 1 \
  --generate-action-units \
  --non-interactive

This automatically generates and uses AU prompts without pausing for editing, suitable for automated pipelines.
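The difference between Examples 5 and 6 comes down to whether the run pauses between generating the AU files and using them. A minimal sketch of that control flow follows. The generator, the file layout and the confirmation prompt are all hypothetical; only the generate, optionally edit and confirm, then use sequence comes from the steps above.

```python
import tempfile
from pathlib import Path
from typing import Callable


def generate_action_units(guidelines: dict[str, str], out_dir: Path) -> list[Path]:
    """Hypothetical generator: write one prompt file per guideline section."""
    paths = []
    for name, text in guidelines.items():
        path = out_dir / f"{name}.txt"
        path.write_text(f"Action unit: {name}\nInstructions: {text}\n")
        paths.append(path)
    return paths


def prepare_action_units(guidelines: dict[str, str], out_dir: Path, non_interactive: bool,
                         ask: Callable[[str], str] = input) -> list[str]:
    paths = generate_action_units(guidelines, out_dir)
    if not non_interactive:
        # Pause so a human can edit the files, then require explicit confirmation.
        reply = ask(f"Edit the files in {out_dir}, then type 'y' to continue: ")
        if reply.strip().lower() != "y":
            raise SystemExit("Aborted before running the experiment.")
    # Re-read from disk so any manual edits are what the agents actually use.
    return [p.read_text() for p in paths]


with tempfile.TemporaryDirectory() as tmp:
    prompts = prepare_action_units({"filter_samples": "Drop samples lacking trait labels."},
                                   Path(tmp), non_interactive=True)
print(prompts[0].splitlines()[0])  # Action unit: filter_samples
```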

πŸ’° Cost and Time Estimates

Full Benchmark Run

  • Time: 3-5 days of continuous execution
  • Cost: $300+ (varies by model choice and API pricing)
  • Scope: All 1,384 (trait, condition) pairs in GenoTEX
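As a rough sanity check on these figures (not a measured breakdown), dividing the totals by the number of pairs gives per-pair averages. This assumes cost and time are spread evenly across pairs, which real runs will not do exactly, but it helps when extrapolating the cost of a partial run.

```python
PAIRS = 1384      # (trait, condition) pairs in GenoTEX, per the estimate above
COST_USD = 300    # lower bound of the quoted cost
DAYS = (3, 5)     # quoted wall-clock range

cost_per_pair = COST_USD / PAIRS
minutes_per_pair = [d * 24 * 60 / PAIRS for d in DAYS]

print(f"~${cost_per_pair:.2f} per pair")  # ~$0.22 per pair
print(f"~{minutes_per_pair[0]:.1f}-{minutes_per_pair[1]:.1f} min per pair")  # ~3.1-5.2 min per pair
```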

Small-Scale Testing

For functionality verification without full replication:

  1. Download only a few cohort datasets from GenoTEX
  2. Use the --quick-test flag to skip statistical analysis and focus on preprocessing only:

     python main.py \
       --version test_preprocess \
       --model claude-sonnet-4-5-20250929 \
       --api 1 \
       --quick-test

This allows you to evaluate preprocessing quality (the more challenging task for agents) without waiting for regression analysis. Note that full regression analysis requires all related datasets to be preprocessed, which can be time-consuming.

⚠️ Important Notes

  • Logs are saved to ./output/log_{version}.txt, which is a human-readable source for observing agent behaviors and diagnosing the system.
  • If a model name is incorrect, the error message will list all supported models.
    • If you want to add new models, feel free to submit a pull request.

πŸ“‚ Output Structure

The system generates outputs following the GenoTEX structure convention:

output/
├── preprocess/
│   └── {trait_name}/          # Preprocessed cohort datasets
├── regress/
│   └── {trait_name}/          # Regression analysis results
└── log_{version}.txt          # Detailed execution logs

For detailed output format specifications, please refer to the GenoTEX documentation.


πŸ”§ Troubleshooting

Memory Issues

For local model deployment (Ollama), ensure sufficient GPU memory to prevent CUDA OOM issues or extended latency. Models like DeepSeek-R1 671B require multiple high-end GPUs. For testing, use smaller models like Llama 3.1 8B or Llama 3.3 70B.

⚠️ Note: Due to the complexity of genomic data, our experiments require a max input length of 20K tokens. Consider this when estimating GPU memory requirements for local deployment.
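A standard back-of-the-envelope estimate shows why context length matters for GPU memory: the attention key/value cache grows linearly with the number of tokens. The sketch uses the published Llama 3.1 8B attention shape (32 layers, 8 KV heads, head dimension 128) and assumes an unquantized fp16 cache. It ignores model weights, activations and runtime overhead, so treat it as a lower bound on the extra memory a 20K-token input needs.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Keys + values (factor 2) for every layer, KV head, and token position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Llama 3.1 8B: 32 layers, 8 KV heads (grouped-query attention), head_dim 128, fp16 cache.
per_seq = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=20_000)
print(f"{per_seq / 2**30:.2f} GiB")  # 2.44 GiB
```

Larger models scale this up through more layers and heads, which is part of why DeepSeek-R1 671B needs multiple high-end GPUs.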

Timeout Issues

Some models or API endpoints may experience extended latency. The system automatically adjusts timeout values based on model names, but it cannot handle all models and circumstances. If you encounter more than occasional timeout errors:

  • Adjust the timeout scaler for specific models in utils/llm.py (lines 240-256)
  • Modify the task-wise timeout argument --max-time accordingly
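The actual scaling logic lives in utils/llm.py. The sketch below only illustrates the general idea of a per-model timeout scaler; the base value, the multipliers and the prefix-matching rule are all invented for illustration.

```python
BASE_TIMEOUT_S = 60.0

# Hypothetical multipliers keyed by model-name prefix; slower reasoning
# models and large local deployments get more headroom.
TIMEOUT_SCALERS = {
    "deepseek-r1": 4.0,
    "o3": 3.0,
    "gemini-2.5-pro": 2.0,
}


def request_timeout(model: str, base: float = BASE_TIMEOUT_S) -> float:
    """Scale the base timeout by the first matching prefix, else leave it unchanged."""
    name = model.lower()
    for prefix, scale in TIMEOUT_SCALERS.items():
        if name.startswith(prefix):
            return base * scale
    return base


print(request_timeout("o3-2025-04-16"))          # 180.0
print(request_timeout("gpt-5-mini-2025-08-07"))  # 60.0
```

Matching on prefixes rather than arbitrary substrings avoids accidental hits, for example a short key like "o3" appearing inside an unrelated model name.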

Interruption Recovery

We have implemented checkpoint resume logic. In case an experiment run is interrupted, you can rerun the same command to continue from where it left off. Outputs of the half-finished task will be automatically cleared.
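How the repository records progress is not documented here. The sketch below assumes a per-task output directory plus a completion marker file (both hypothetical) to illustrate the "skip finished tasks, clear half-finished ones" behavior described above.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable

DONE_MARKER = ".done"  # hypothetical completion marker


def run_with_resume(tasks: list[str], out_root: Path,
                    run_task: Callable[[str, Path], None]) -> list[str]:
    """Run tasks in order, skipping finished ones and clearing half-finished outputs."""
    executed = []
    for task in tasks:
        task_dir = out_root / task
        if (task_dir / DONE_MARKER).exists():
            continue  # finished in a previous run
        if task_dir.exists():
            shutil.rmtree(task_dir)  # partial output from an interrupted run
        task_dir.mkdir(parents=True)
        run_task(task, task_dir)
        (task_dir / DONE_MARKER).touch()  # mark complete only after success
        executed.append(task)
    return executed


def fake_task(task: str, task_dir: Path) -> None:
    (task_dir / "result.csv").write_text(f"{task} ok\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "A").mkdir()
    (root / "A" / DONE_MARKER).touch()            # finished earlier
    (root / "B").mkdir()
    (root / "B" / "partial.csv").write_text("")   # interrupted mid-task
    executed = run_with_resume(["A", "B", "C"], root, fake_task)
    leftover = (root / "B" / "partial.csv").exists()

print(executed, leftover)  # ['B', 'C'] False
```

Writing the marker only after a task succeeds means an interruption at any point leaves the task looking unfinished, so a rerun redoes it from a clean directory.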


πŸ’¬ Discussion

We welcome your feedback and contributions! Feel free to open an issue or submit a pull request.


πŸ“ Citation

If you find our work useful for your research, please consider citing our paper and starring ⭐ our repository:

@misc{liu2025genomas,
      title={GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis}, 
      author={Haoyang Liu and Yijiang Li and Haohan Wang},
      year={2025},
      eprint={2507.21035},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.21035}, 
}

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2025 Haoyang Liu, Yijiang Li, Haohan Wang


πŸ™ Acknowledgements

We thank Ye Zhang, Senior Expert Data Scientist at Novartis, for his valuable suggestions based on his biomedical expertise.


Built with ❀️ by the GenoMAS Team

🌐 Website β€’ πŸ“„ Paper β€’ πŸ’» GitHub β€’ πŸ› Issues

About

A minimalist multi-agent framework for robust automation of scientific analysis workflows, such as gene expression analysis.

Contributors 4