3 changes: 3 additions & 0 deletions .gitignore
@@ -145,3 +145,6 @@ nbs/wandb/
wandb/

OUT/

#examples related
examples/experiments/grounded_program_synthesis/dataset
41 changes: 41 additions & 0 deletions examples/experiments/grounded_program_synthesis/README.md
@@ -0,0 +1,41 @@
# Interpreter Grounded Program Synthesis
*Program synthesis* is the task of automatically generating programs that solve a given task by satisfying an IO condition. In Neural Program Synthesis the synthesizer is a neural network which is a Language Model that takes in an input/output pair and tries to generate the program in the defined toy DSL's Grammar.

## Toy List Manipulation DSL Grammar
The DSL has the following grammar:
Collaborator: In addition to the DSL grammar, add some snippets.

Contributor Author: Added example snippets to showcase the atomic functions.

```
list_expr := list[int]
integer := -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5
statement :=
| take(list_expr,integer)
| drop(list_expr,integer)
| reverse(list_expr)
| sort_asc(list_expr)
| sort_des(list_expr)
| add_n(list_expr,integer)
| sub_n(list_expr,integer)
| mul_n(list_expr,integer)
| expand_copy(list_expr)


```
For example, the program `add_n(reverse([-2, -5, -4]),1)` reverses the list and then adds one to each element, giving `[-3, -4, -1]`.
More examples of the atomic functions are shown below:
```
take([1,2,3],2) -> [1,2]
drop([1,2,3],2) -> [1]
reverse([1,2,3]) -> [3,2,1]
sort_asc([10,5,6]) -> [5,6,10]
sort_des([10,5,6]) -> [10,6,5]

```
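To make these semantics concrete, here is a minimal sketch of how the atomic functions could be written in plain Python. The reference implementations live in `lang.py`; the bodies below (in particular `drop` and `expand_copy`, whose behaviour is only partially shown above) are assumptions chosen to match the listed examples.
```python
# Illustrative sketch of the DSL's atomic functions (not the reference
# implementation from lang.py). `drop` and `expand_copy` are assumptions
# made to match / extrapolate from the examples shown above.

def take(lst, n):
    return lst[:n]                    # take([1,2,3],2) -> [1,2]

def drop(lst, n):
    return lst[:len(lst) - n]         # drop([1,2,3],2) -> [1]

def reverse(lst):
    return lst[::-1]                  # reverse([1,2,3]) -> [3,2,1]

def sort_asc(lst):
    return sorted(lst)                # sort_asc([10,5,6]) -> [5,6,10]

def sort_des(lst):
    return sorted(lst, reverse=True)  # sort_des([10,5,6]) -> [10,6,5]

def add_n(lst, n):
    return [x + n for x in lst]

def sub_n(lst, n):
    return [x - n for x in lst]

def mul_n(lst, n):
    return [x * n for x in lst]

def expand_copy(lst):
    return lst + lst                  # assumed semantics: repeat the list once

print(add_n(reverse([-2, -5, -4]), 1))  # [-3, -4, -1]
```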
To generate training/testing data, run `python3 -m lang`. The dataset will be saved in `./dataset/train.json` and `./dataset/test.json`. Alternatively, a pre-processed dataset is available at this [google drive link](https://drive.google.com/drive/folders/1093FlJA0MF7gh25yi4-__yU6Fj-onK1v?usp=share_link).
Each datapoint in the dataset looks like:
```json
{"input": "Input: [4, -2, 0, 0, 5, 5] Output: [25, 25, 20, 0, 0, -10] Function:",
"output": "sort_des(reverse(mul_n(sort_asc(sort_asc([4, -2, 0, 0, 5, 5])),5)))"}
```
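The generated splits can be inspected with the standard `json` module. The snippet below assumes each file is a single JSON array of objects with the `input`/`output` keys shown above; adjust the loading code if the files are stored as JSON lines instead.
```python
import json

# Load the training split produced by `python3 -m lang`.
with open("dataset/train.json") as f:
    train_data = json.load(f)

sample = train_data[0]
print(sample["input"])   # "Input: [...] Output: [...] Function:"
print(sample["output"])  # e.g. "sort_des(reverse(mul_n(sort_asc(...),5)))"
```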
## Caveat on DSL design
The DSL designed here is a very simple toy example in which every function returns type `list`. In a real-world scenario even list manipulation DSLs are more complex, with multiple types such as strings, integers, etc.
## Training with TRLX
Run `python3 train_trlx.py` to start training with the grounded interpreter. The `reward_fn` returns `-1` if a generated sample has invalid syntax, and `0.5` if the syntax is valid but the program does not satisfy the IO condition.
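A rough sketch of that reward logic is shown below; the actual `reward_fn` is defined in `train_trlx.py`. The helper `run_dsl`, the use of `eval` over a whitelist of DSL functions, and the `1.0` reward for a program that satisfies the IO condition are all assumptions made for illustration.
```python
import ast

def run_dsl(program: str, dsl_namespace: dict):
    """Evaluate a DSL program string with only the whitelisted DSL functions visible."""
    return eval(program, {"__builtins__": {}}, dsl_namespace)

def reward_fn(samples: list[str], dsl_namespace: dict) -> list[float]:
    """-1 for invalid programs, 0.5 for valid-but-wrong, 1.0 (assumed) for a correct program."""
    rewards = []
    for sample in samples:
        # Expected sample format: "Input: [...] Output: [...] Function: <program>"
        prompt, _, program = sample.partition("Function:")
        expected = ast.literal_eval(prompt.split("Output:")[1].strip())
        try:
            result = run_dsl(program.strip(), dsl_namespace)
        except Exception:
            rewards.append(-1.0)  # program failed to parse or execute
            continue
        rewards.append(1.0 if result == expected else 0.5)
    return rewards
```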
Empty file.
@@ -0,0 +1,45 @@
model:
  model_path: "moyix/codegen-350M-mono-gptj" # Name of hf model to load
  tokenizer_path: "Salesforce/codegen-350M-mono" # Name of hf tokenizer to load
  model_type: "AcceleratePPOModel" # Name of accelerate model type to load
  num_layers_unfrozen: 2 # Number of topmost layers to leave unfrozen (trainable) during training

train:
  seq_length: 256 # Size of LM context
  epochs: 10 # Train for max(epochs, total_steps)
  total_steps: 80000 # Train for max(epochs, total_steps)
  batch_size: 8 # batch size

  lr_ramp_steps: 100 # learning rate warm up
  lr_decay_steps: 79000 # learning rate decay
  weight_decay: 1.0e-6 # weight decay param
  learning_rate_init: 1.412e-4 # init learning rate
  learning_rate_target: 1.412e-4 # target final learning rate
  opt_betas: [0.9, 0.95] # adam betas

  checkpoint_interval: 1000000 # checkpoint interval
  eval_interval: 16 # eval interval

  pipeline: "PPOPipeline" # prompt pipeline to load
  orchestrator: "PPOOrchestrator" # orchestrator to load

method:
  name: 'ppoconfig' # Name of RL method config
  num_rollouts: 8 # Number of rollouts to collect per epoch
  chunk_size: 8 # Number of rollouts to collect in one loop of orchestrator
  ppo_epochs: 4 # Number of ppo epochs
  init_kl_coef: 0.2 # init kl coefficient
  target: 6 # target KL for the adaptive KL controller
  horizon: 10000 # PPO horizon
  gamma: 1 # PPO discount
  lam: 0.95 # PPO lambda
  cliprange: 0.2 # policy clip range
  cliprange_value: 0.2 # value function clip range
  vf_coef: 0.2 # value term weight
  gen_kwargs:
    max_length: 256 # LM max sample gen length
    min_length: 48 # LM min sample gen length
    top_k: 0.0 # top k
    top_p: 0.7 # top p
    do_sample: True # sample
    temperature: 0.5
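To sanity-check edits to this configuration before launching a run, the file can be parsed with PyYAML as sketched below; the filename is a placeholder for whatever this config file is named in the example, and how the values are actually consumed is determined by `train_trlx.py` and trlx's config loading.
```python
import yaml

# Placeholder filename: substitute the actual config file used by this example.
with open("configs/ppo_config.yml") as f:
    config = yaml.safe_load(f)

print(config["model"]["model_path"])   # moyix/codegen-350M-mono-gptj
print(config["method"]["gen_kwargs"])  # sampling settings used during rollouts
```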