Official implementation of the paper:
"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis"
Haoyang Liu, Yijiang Li, Haohan Wang
UIUC, UC San Diego
Website | Paper | Code
This repository contains two main components:
A minimal yet powerful framework for robust automation of scientific workflows:
- Generic Communication Protocol: Typed messaging mechanism for code-driven analysis
- Notebook-Style Workflow: Agents can plan, write code, execute, debug, and backtrack through multi-step tasks
- Customizable Agents: Define specific roles, guidelines, tools, and action units for your domain
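As a sketch of what a typed message in such a protocol could look like — the class, field, and role names here are illustrative, not the actual GenoMAS API:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any

# Illustrative message types; the real protocol defines its own set.
class MessageType(Enum):
    TASK_REQUEST = auto()
    CODE_SUBMISSION = auto()
    REVIEW_FEEDBACK = auto()

@dataclass
class Message:
    sender: str                # role name of the sending agent
    receiver: str              # role name of the receiving agent
    type: MessageType          # typed tag that receivers dispatch on
    content: Any = None        # payload: code, feedback, data, ...
    metadata: dict = field(default_factory=dict)

def dispatch(msg: Message) -> str:
    # Receivers branch on the typed tag rather than parsing free text.
    if msg.type is MessageType.CODE_SUBMISSION:
        return f"{msg.receiver} reviews code from {msg.sender}"
    return f"{msg.receiver} handles {msg.type.name} from {msg.sender}"

print(dispatch(Message("data_engineer", "code_reviewer",
                       MessageType.CODE_SUBMISSION, content="df = ...")))
```

Typed tags let agents route and validate messages mechanically, which is what makes multi-step code-driven workflows debuggable.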
Design Philosophy:
- Balance controllability of traditional workflows with flexibility of autonomous agents
- Provide just enough encapsulation to make agent experiments easier: simplicity matters
- Build a reliable foundation for production-level agent systems
A specialized system for automated gene expression data analysis:
- Data Sources: Analyzes transcriptomic datasets from GEO and TCGA
- Goal: Identify significant genes related to traits while accounting for confounders
- State-of-the-Art Performance: 60.38% F1 score on the GenoTEX benchmark, substantially outperforming both open-domain agents and generic biomedical agents
- Scientific Discovery: Identifies biologically meaningful gene-trait associations, with high-confidence associations corroborated by literature and novel findings worthy of further investigation
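As a toy illustration of the statistical goal — estimating a gene's association with a trait while adjusting for a confounder — here is a minimal ordinary-least-squares sketch on synthetic data. This is not the system's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.normal(50, 10, n)                   # confounder
trait = (rng.random(n) < 0.5).astype(float)   # binary trait label
# Synthetic expression driven by trait, confounder, and noise.
expr = 1.5 * trait + 0.05 * age + rng.normal(0, 1, n)

# Design matrix: intercept, trait, confounder.
X = np.column_stack([np.ones(n), trait, age])
beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
print(f"trait effect adjusted for age: {beta[1]:.2f}")
```

Including the confounder column in the design matrix is what separates a genuine trait association from one explained away by, say, age differences between groups.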
Download Input Data
Our experiments use the publicly available GenoTEX benchmark. Download the input data (~42 GB) from the Google Drive folder, and save them under the same parent folder.
Verify Data Integrity
```bash
cd download
python validator.py --data-dir /path/to/data --validate
```

Create Python Environment
Create a conda environment with Python 3.10 and install the required packages:
```bash
conda create -n genomas python=3.10
conda activate genomas
pip install -r requirements.txt
```

Configure API Keys
Create a .env file in the project root directory with at least one API key from a provider you choose. The code supports multiple providers and can use different API keys for load balancing.
Copy the template file and fill in your API keys:
```bash
cp env.example .env
# Then edit .env with your actual API keys
```

Tip: For OpenAI models, the organization ID is also required. See `env.example` for the full template with all available configuration options.
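The exact variable names are defined in `env.example`; a hypothetical layout, assuming provider keys with numeric suffixes used for load balancing, might look like:

```shell
# Hypothetical .env layout — consult env.example for the real variable names.
OPENAI_API_KEY_1=sk-...
OPENAI_ORG_ID=org-...
ANTHROPIC_API_KEY_1=...
ANTHROPIC_API_KEY_2=...
```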
- `--version`: Experiment version identifier (used for output file naming)
- `--model`: LLM model name (e.g., `gpt-5-mini-2025-08-07`, `claude-sonnet-4-5-20250929`, `gemini-2.5-pro`)
- `--api`: API key index to use (1, 2, 3, etc.), corresponding to `_1`, `_2`, `_3` suffixes in `.env`
- `--thinking`: Enable extended thinking mode for Claude models
- `--parallel-mode`: Parallelization strategy (`none`, or `cohorts` for parallel cohort preprocessing)
- `--max-workers`: Number of concurrent workers when using `--parallel-mode cohorts`
- `--data-root`: Root directory containing input data. Defaults to `../data` (configurable in `utils/config.py`)
Role-Specific Model Configuration: You can assign different models and/or API index to different agent roles, which will override the global --model and --api:
- `--code-reviewer-model`, `--code-reviewer-api`: Model for the Code Review agent
- `--domain-expert-model`, `--domain-expert-api`: Model for the Domain Expert agent
- `--data-engineer-model`, `--data-engineer-api`: Model for the Data Engineer agents
- `--statistician-model`, `--statistician-api`: Model for the Statistician agent
- `--planning-model`, `--planning-api`: Model for the planning mechanism
```bash
python main.py --version exp1 --model gpt-5-mini-2025-08-07 --api 1
```

The command below replicates the heterogeneous configuration used in our paper, but any combination is allowed:
```bash
python main.py \
  --version exp2 \
  --model claude-sonnet-4-20250514 \
  --thinking \
  --api 1 \
  --planning-model o3-2025-04-16 \
  --planning-api 1 \
  --code-reviewer-model o3-2025-04-16 \
  --code-reviewer-api 1 \
  --domain-expert-model gemini-2.5-pro \
  --domain-expert-api 1
```

Local Deployment (using the Ollama library):
```bash
# DeepSeek-R1 671B (requires substantial GPU resources)
# For higher performance, you may adapt the code to try the latest SOTA open-source models
python main.py --version exp3 --model deepseek-r1:671b

# Llama 3.1 8B (suitable for testing on consumer GPUs)
python main.py --version exp4 --model llama3.1
```

Via API (using Novita by default):
```bash
# To use DeepSeek, we recommend third-party providers like Novita for reduced latency
python main.py --version exp5 --model deepseek-r1:671b --use-api --api 1
```

```bash
python main.py \
  --version exp6 \
  --model gpt-5-2025-08-07 \
  --api 1 \
  --parallel-mode cohorts \
  --max-workers 2
```

This processes up to 2 cohorts concurrently, significantly reducing wall-clock time. Note, however, that setting `--max-workers` too high may stress API rate limits.
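The cohort-level parallelism can be pictured as a bounded worker pool; a generic sketch (cohort IDs are made up, and this is not the project's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def preprocess_cohort(cohort: str) -> str:
    # Stand-in for the real per-cohort preprocessing work.
    return f"{cohort}: done"

# Illustrative cohort IDs, not taken from the benchmark.
cohorts = ["GSE0001", "GSE0002", "GSE0003"]

# max_workers=2 bounds how many cohorts run at once, which also caps
# concurrent LLM API calls and thus rate-limit pressure.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {pool.submit(preprocess_cohort, c): c for c in cohorts}
    results = [fut.result() for fut in as_completed(futures)]

print(sorted(results))
```

Raising the worker count trades rate-limit headroom for wall-clock time, which is why the paper's configuration stays conservative.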
Generate Action Unit (AU) prompts from guidelines, allowing manual editing before use:
```bash
python main.py \
  --version exp7 \
  --model claude-sonnet-4-5-20250929 \
  --thinking \
  --api 1 \
  --generate-action-units
```

This will:
- Generate AU prompts from agent guidelines
- Pause for manual editing of the generated files
- Ask for confirmation before proceeding
- Use the edited AUs for the experiment
Generate and use Action Units without manual intervention:
```bash
python main.py \
  --version exp8 \
  --model claude-sonnet-4-5-20250929 \
  --thinking \
  --api 1 \
  --generate-action-units \
  --non-interactive
```

This automatically generates and uses AU prompts without pausing for editing, which is suitable for automated pipelines.
Full Benchmark Run
- Time: 3-5 days of continuous execution
- Cost: $300+ (varies by model choice and API pricing)
- Scope: All 1,384 (trait, condition) pairs in GenoTEX
Small-Scale Testing
For functionality verification without full replication:
- Download only a few cohort datasets from GenoTEX
- Use the `--quick-test` flag to skip statistical analysis and focus on preprocessing only:
```bash
python main.py \
  --version test_preprocess \
  --model claude-sonnet-4-5-20250929 \
  --api 1 \
  --quick-test
```

- This allows you to evaluate preprocessing quality (the more challenging task for agents) without waiting for regression analysis
- Note: Full regression analysis requires all related datasets to be preprocessed, which can be time-consuming
- Logs are saved to `./output/log_{version}.txt`, a human-readable record for observing agent behaviors and diagnosing the system.
- If a model name is incorrect, the error message will list all supported models.
- If you want to add new models, feel free to submit a pull request.
The system generates outputs following the GenoTEX structure convention:
```
output/
├── preprocess/
│   └── {trait_name}/        # Preprocessed cohort datasets
├── regress/
│   └── {trait_name}/        # Regression analysis results
└── log_{version}.txt        # Detailed execution logs
```
For detailed output format specifications, please refer to the GenoTEX documentation.
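Assuming the layout above, a small helper to collect result directories per trait might look like this (illustrative, not part of the codebase):

```python
from pathlib import Path

def list_trait_outputs(output_dir: str = "output") -> dict:
    """Map each trait to its preprocess/regress result directories."""
    root = Path(output_dir)
    traits = {}
    for stage in ("preprocess", "regress"):
        stage_dir = root / stage
        if not stage_dir.is_dir():
            continue
        for trait_dir in sorted(stage_dir.iterdir()):
            if trait_dir.is_dir():
                traits.setdefault(trait_dir.name, {})[stage] = trait_dir
    return traits
```

A trait appearing under `preprocess` but not `regress` indicates that its regression step has not yet completed.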
For local model deployment (Ollama), ensure sufficient GPU memory to prevent CUDA OOM issues or extended latency. Models like DeepSeek-R1 671B require multiple high-end GPUs. For testing, use smaller models like Llama 3.1 8B or Llama 3.3 70B.
Note: Due to the complexity of genomic data, our experiments require a maximum input length of 20K tokens. Consider this when estimating GPU memory requirements for local deployment.
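As a rough illustration of why a 20K-token context matters for memory, here is a back-of-envelope KV-cache estimate; the dimensions are assumed values for a generic ~8B transformer, not measurements of any particular model:

```python
# Back-of-envelope KV-cache size for a 20K-token context.
# Assumed dimensions for a generic ~8B transformer (illustrative only).
layers = 32
kv_heads = 8
head_dim = 128
seq_len = 20_000
bytes_per_value = 2          # fp16/bf16

# Keys and values are both cached per layer, hence the factor of 2.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
print(f"KV cache ≈ {kv_bytes / 1024**3:.1f} GiB per sequence")  # ≈ 2.4 GiB
```

This is on top of the model weights themselves, so long contexts can push an otherwise comfortable deployment into CUDA OOM territory.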
Some models or API endpoints may experience extended latency. The system automatically adjusts timeout values based on model names, but it cannot handle all models and circumstances. If you encounter more than occasional timeout errors:
- Adjust the timeout scaler for specific models in `utils/llm.py` (lines 240-256)
- Modify the task-wise timeout argument `--max-time` accordingly
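The adjustment in `utils/llm.py` is project-specific; conceptually, a name-based timeout scaler looks something like this (the prefixes and factors here are illustrative, not the real table):

```python
# Illustrative name-based timeout scaling; the real table lives in utils/llm.py.
TIMEOUT_SCALERS = {
    "deepseek-r1": 3.0,   # long-reasoning models need generous timeouts
    "o3": 2.0,
    "llama": 1.0,
}

def scaled_timeout(model: str, base_timeout: float = 120.0) -> float:
    # Match the model name against known prefixes; fall back to the base value.
    for prefix, factor in TIMEOUT_SCALERS.items():
        if model.startswith(prefix):
            return base_timeout * factor
    return base_timeout

print(scaled_timeout("deepseek-r1:671b"))  # 360.0
```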
We have implemented checkpoint resume logic. In case an experiment run is interrupted, you can rerun the same command to continue from where it left off. Outputs of the half-finished task will be automatically cleared.
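The resume behavior described above can be sketched as follows; the marker filename and helper are hypothetical, not the system's actual checkpoint code:

```python
from pathlib import Path

def resume_point(task_dirs: list, done_marker: str = "result.json") -> list:
    """Return tasks still to run; clear partial outputs of unfinished tasks.

    A task directory counts as finished only if its final artifact exists;
    anything else is treated as half-finished and wiped before rerunning.
    (The `result.json` marker name is illustrative.)
    """
    pending = []
    for d in map(Path, task_dirs):
        if (d / done_marker).exists():
            continue                       # finished: skip on rerun
        for leftover in d.glob("*"):       # half-finished: clear outputs
            leftover.unlink()
        pending.append(d)
    return pending
```

Keying completion on the final artifact (rather than, say, a log line) makes the resume decision robust to crashes at any intermediate step.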
We welcome your feedback and contributions! Feel free to:
- Open an issue for bug reports or feature requests
- Submit a pull request for improvements
- Contact the authors via email for research collaborations
If you find our work useful for your research, please consider citing our paper and starring our repository:
```bibtex
@misc{liu2025genomas,
  title={GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis},
  author={Haoyang Liu and Yijiang Li and Haohan Wang},
  year={2025},
  eprint={2507.21035},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.21035},
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2025 Haoyang Liu, Yijiang Li, Haohan Wang
We thank Ye Zhang, Senior Expert Data Scientist at Novartis, for his valuable suggestions based on his biomedical expertise.
Built with ❤️ by the GenoMAS Team
Website • Paper • GitHub • Issues

