3 changes: 3 additions & 0 deletions .gitignore
@@ -145,3 +145,6 @@ nbs/wandb/
wandb/

OUT/

#examples related
examples/experiments/grounded_program_synthesis/dataset
41 changes: 41 additions & 0 deletions examples/experiments/grounded_program_synthesis/README.md
@@ -0,0 +1,41 @@
# Interpreter Grounded Program Synthesis
*Program synthesis* is the task of automatically generating programs that solve a given task by satisfying an IO condition. In Neural Program Synthesis the synthesizer is a neural network which is a Language Model that takes in an input/output pair and tries to generate the program in the defined toy DSL's Grammar.

## Toy List Manipulation DSL Grammar
The DSL has the following grammar:
Collaborator: In addition to the DSL grammar, add some snippets.

Contributor Author: Added example snippets to showcase the atomic functions.

```
list_expr := list[int]
integer := -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5
statement :=
| take(list_expr,integer)
| drop(list_expr,integer)
| reverse(list_expr)
| sort_asc(list_expr)
| sort_des(list_expr)
| add_n(list_expr,integer)
| sub_n(list_expr,integer)
| mul_n(list_expr,integer)
| expand_copy(list_expr)


```
For example, the program `add_n(reverse([-2, -5, -4]),1)` reverses the list and then adds one to each element, giving `[-3, -4, -1]`.
More examples of the atomic functions are shown below:
```
take([1,2,3],2) -> [1,2]
drop([1,2,3],2) -> [1]
reverse([1,2,3]) -> [3,2,1]
sort_asc([10,5,6]) -> [5,6,10]
sort_des([10,5,6]) -> [10,6,5]

```
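To make these semantics concrete, here is a minimal sketch of how the atomic functions could be written in plain Python. The reference implementations live in `lang.py`; the bodies below (in particular `drop` and `expand_copy`, whose behaviour is only partially shown above) are assumptions chosen to match the listed examples.
```python
# Illustrative sketch of the DSL's atomic functions (not the reference
# implementation from lang.py). `drop` and `expand_copy` are assumptions
# made to match / extrapolate from the examples shown above.

def take(lst, n):
    return lst[:n]                    # take([1,2,3],2) -> [1,2]

def drop(lst, n):
    return lst[:len(lst) - n]         # drop([1,2,3],2) -> [1]

def reverse(lst):
    return lst[::-1]                  # reverse([1,2,3]) -> [3,2,1]

def sort_asc(lst):
    return sorted(lst)                # sort_asc([10,5,6]) -> [5,6,10]

def sort_des(lst):
    return sorted(lst, reverse=True)  # sort_des([10,5,6]) -> [10,6,5]

def add_n(lst, n):
    return [x + n for x in lst]

def sub_n(lst, n):
    return [x - n for x in lst]

def mul_n(lst, n):
    return [x * n for x in lst]

def expand_copy(lst):
    return lst + lst                  # assumed semantics: repeat the list once

print(add_n(reverse([-2, -5, -4]), 1))  # [-3, -4, -1]
```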
To generate training/testing data, run `python3 -m lang`. The dataset will be saved in `./dataset/train.json` and `./dataset/test.json`. Alternatively, a pre-processed dataset is available at this [google drive link](https://drive.google.com/drive/folders/1093FlJA0MF7gh25yi4-__yU6Fj-onK1v?usp=share_link).
Each datapoint in the dataset looks like:
```json
{"input": "Input: [4, -2, 0, 0, 5, 5] Output: [25, 25, 20, 0, 0, -10] Function:",
"output": "sort_des(reverse(mul_n(sort_asc(sort_asc([4, -2, 0, 0, 5, 5])),5)))"}
```
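The generated splits can be inspected with the standard `json` module. The snippet below assumes each file is a single JSON array of objects with the `input`/`output` keys shown above; adjust the loading code if the files are stored as JSON lines instead.
```python
import json

# Load the training split produced by `python3 -m lang`.
with open("dataset/train.json") as f:
    train_data = json.load(f)

sample = train_data[0]
print(sample["input"])   # "Input: [...] Output: [...] Function:"
print(sample["output"])  # e.g. "sort_des(reverse(mul_n(sort_asc(...),5)))"
```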
## Caveat on DSL design
The DSL designed here is a very simple toy example in which every function returns type `list`. In a real-world scenario even list manipulation DSLs are more complex, with multiple types such as strings, integers, etc.
## Training with TRLX
Run `python3 train_trlx.py` to start training with the grounded interpreter. The `reward_fn` returns `-1` if a generated sample has invalid syntax, and `0.5` if the syntax is valid but the program does not satisfy the IO condition.
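A rough sketch of that reward logic is shown below; the actual `reward_fn` is defined in `train_trlx.py`. The helper `run_dsl`, the use of `eval` over a whitelist of DSL functions, and the `1.0` reward for a program that satisfies the IO condition are all assumptions made for illustration.
```python
import ast

def run_dsl(program: str, dsl_namespace: dict):
    """Evaluate a DSL program string with only the whitelisted DSL functions visible."""
    return eval(program, {"__builtins__": {}}, dsl_namespace)

def reward_fn(samples: list[str], dsl_namespace: dict) -> list[float]:
    """-1 for invalid programs, 0.5 for valid-but-wrong, 1.0 (assumed) for a correct program."""
    rewards = []
    for sample in samples:
        # Expected sample format: "Input: [...] Output: [...] Function: <program>"
        prompt, _, program = sample.partition("Function:")
        expected = ast.literal_eval(prompt.split("Output:")[1].strip())
        try:
            result = run_dsl(program.strip(), dsl_namespace)
        except Exception:
            rewards.append(-1.0)  # program failed to parse or execute
            continue
        rewards.append(1.0 if result == expected else 0.5)
    return rewards
```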
Empty file.
@@ -0,0 +1,45 @@
model:
  model_path: "moyix/codegen-350M-mono-gptj" # Name of hf model to load
  tokenizer_path: "Salesforce/codegen-350M-mono" # Name of hf tokenizer to load
  model_type: "AcceleratePPOModel" # Name of accelerate model type to load
  num_layers_unfrozen: 2 # Number of topmost layers to leave unfrozen (trainable) during training

train:
  seq_length: 256 # Size of LM context
  epochs: 10 # Train for max(epochs, total_steps)
  total_steps: 80000 # Train for max(epochs, total_steps)
  batch_size: 8 # batch size

  lr_ramp_steps: 100 # learning rate warm up
  lr_decay_steps: 79000 # learning rate decay
  weight_decay: 1.0e-6 # weight decay param
  learning_rate_init: 1.412e-4 # init learning rate
  learning_rate_target: 1.412e-4 # target final learning rate
  opt_betas: [0.9, 0.95] # adam betas

  checkpoint_interval: 1000000 # checkpoint interval
  eval_interval: 16 # eval interval

  pipeline: "PPOPipeline" # prompt pipeline to load
  orchestrator: "PPOOrchestrator" # orchestrator to load

method:
  name: 'ppoconfig' # Name of RL method config
  num_rollouts: 8 # Number of rollouts to collect per epoch
  chunk_size: 8 # Number of rollouts to collect in one loop of orchestrator
  ppo_epochs: 4 # Number of ppo epochs
  init_kl_coef: 0.2 # init kl coefficient
  target: 6 # target KL for the adaptive KL controller
  horizon: 10000 # PPO horizon
  gamma: 1 # PPO discount
  lam: 0.95 # PPO lambda
  cliprange: 0.2 # policy clip range
  cliprange_value: 0.2 # value function clip range
  vf_coef: 0.2 # value term weight
  gen_kwargs:
    max_length: 256 # LM max sample gen length
    min_length: 48 # LM min sample gen length
    top_k: 0.0 # top k
    top_p: 0.7 # top p
    do_sample: True # sample
    temperature: 0.5
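To sanity-check edits to this configuration before launching a run, the file can be parsed with PyYAML as sketched below; the filename is a placeholder for whatever this config file is named in the example, and how the values are actually consumed is determined by `train_trlx.py` and trlx's config loading.
```python
import yaml

# Placeholder filename: substitute the actual config file used by this example.
with open("configs/ppo_config.yml") as f:
    config = yaml.safe_load(f)

print(config["model"]["model_path"])   # moyix/codegen-350M-mono-gptj
print(config["method"]["gen_kwargs"])  # sampling settings used during rollouts
```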