Comparing PushGP and GPT-4o on Program Synthesis with only Input-Output Examples

This repository contains the code and materials for "Comparing PushGP and GPT-4o on Program Synthesis with only Input-Output Examples". This README describes the structure and usage of the project.

Introduction

This study analyzed the program synthesis capabilities of genetic programming and large language models when both are provided with user intent consisting of input-output examples. We evaluated the program synthesis performance of PushGP and OpenAI’s GPT-4o on tasks from the PSB2 suite.

Installation and Running

To install the required dependencies for running the experiments:

Clone this repository.
Install Python 3.9 or higher.
Create and activate a virtual environment:

python3 -m venv myenv
source myenv/bin/activate

Install required packages:

pip install -r requirements.txt

Add your Azure OpenAI API key and endpoint as environment variables. On a Unix-like operating system:

export AZURE_OPENAI_API_KEY=<your_api_key_here>
export AZURE_OPENAI_ENDPOINT=<your_endpoint_here>

Confirm your API is working by running:

python3 replication.py

Generate the data splits:

python3 datautils.py
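
If a script fails with an authentication error, it can help to first check that both environment variables from the export commands above are visible to Python. The following is a minimal sketch, not part of the repository; the helper name `missing_azure_vars` is hypothetical, and only the two variable names are taken from this README.

```python
import os

# Variable names taken from the export commands in this README.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the helper easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_azure_vars()` in the activated virtual environment returns an empty list when both variables are set, and otherwise lists the names that still need to be exported.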

Experiments

A preliminary experiment identified suitable prompts from a set of five gathered from existing literature on LLM-based program synthesis.

After selecting the best prompt, we used the 25 PSB2 tasks to evaluate three program synthesizers: PushGP, GPT-4o with data-only prompts, and GPT-4o with text-only prompts.
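
To illustrate what a data-only prompt could look like, here is a rough sketch that turns input-output examples into prompt text. The function name `build_data_only_prompt`, the instruction wording, and the example cap are all assumptions made for illustration; they are not the prompts used in the study, which are defined in the repository scripts.

```python
def build_data_only_prompt(examples, max_examples=5):
    """Format (inputs, output) pairs as a plain-text prompt.

    The wording below is hypothetical and only shows the general idea:
    the model sees examples of the desired behavior and no task text.
    """
    lines = ["Write a Python function that is consistent with these examples:"]
    for inputs, output in examples[:max_examples]:
        # repr() keeps strings quoted so types stay unambiguous.
        lines.append(f"Input: {inputs!r} -> Output: {output!r}")
    return "\n".join(lines)
```

For example, `build_data_only_prompt([((1, 2), 3), ((2, 2), 4)])` produces one instruction line followed by one line per example.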

How to Run Experiments

Running GP

local_script.sh

Identifying the best LLM query

python3 bestquery.py

Comparing the best input-output query, the normal LLM query, and both

python3 finalquery.py

Results and Interpretation

PushGP solved 10 tasks, GPT-4o with data-only prompts solved 8 tasks, and GPT-4o with text-only prompts solved 7 tasks. PushGP and GPT-4o solved some of the same tasks and some different ones, suggesting that neither consistently outperforms the other. GPT-4o solved 7 of the 25 tasks irrespective of the type of prompt used, which indicates that it can retrieve task-relevant information from its training corpus using both text-based and data-based prompts. The preference for one synthesizer over the other may depend on the information available for a given task.
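
The solved-task counts above can be expressed as solve rates over the 25 PSB2 tasks. The short sketch below only restates the counts reported in this README; the dictionary keys and the helper name `solve_rates` are illustrative.

```python
TOTAL_TASKS = 25

# Solved-task counts as reported in this README.
solved = {
    "PushGP": 10,
    "GPT-4o (data-only)": 8,
    "GPT-4o (text-only)": 7,
}


def solve_rates(solved_counts, total):
    """Return the fraction of tasks solved by each synthesizer."""
    return {name: count / total for name, count in solved_counts.items()}


rates = solve_rates(solved, TOTAL_TASKS)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")  # 40%, 32%, and 28% respectively
```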

Contributions

Contributions are welcome! Please send suggestions through the GitHub issue tracker, open a pull request, or contact us via email.

Citation

@conference{theaksainigp_v_llm,
    title = {Comparing PushGP and GPT-4o on Program Synthesis with only 
    Input-Output Examples},
    author = {Jose G Hernandez and Anil K Saini and Gabriel L Ketron 
    and Jason H Moore },
    year = 2025,
    month = {July},
    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference 
    (GECCO 2025)},
    address = {Malaga, Spain},
    publisher = {ACM},
    doi = {}
}
