Update generation utilities #172

Conversation
@PhungVanDuy Are we adding best-of-n here?

Yes, we can add it here.

Are we planning to merge tonight?

No, I still have to update every other example (currently grappling with T5's) and make regression plots. I will also add clarifying comments shortly, so don't rush with reviewing until then.
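(Re: the best-of-n question above, a minimal sketch of best-of-n sampling, not code from this PR; generate_fn and reward_fn here are hypothetical stand-ins, with reward_fn following the new (samples, prompts, outputs) signature:)

# Hypothetical best-of-n sketch: sample n completions per prompt and keep the
# one with the highest reward. generate_fn/reward_fn are stand-ins, not this PR's API.
def best_of_n(generate_fn, reward_fn, prompts, n=4):
    best = []
    for prompt in prompts:
        outputs = [generate_fn(prompt) for _ in range(n)]
        samples = [prompt + output for output in outputs]
        rewards = reward_fn(samples, [prompt] * n, outputs)
        best.append(outputs[max(range(n), key=lambda i: rewards[i])])
    return best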
Very nice update. I've left some comments for feedback. Could you also resolve the merge conflicts when you get a chance? Thanks 🙏
# Log and display evaluation metrics
if self.accelerator.is_main_process:
    rows = sum(list(map(list, zip(*table))), [])
    rich_table = Table(*columns, title=f"Evaluation #{self.nth_evaluation}")
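(For context on the hunk above, the sum/zip idiom transposes the column-wise table into row order and flattens it; a standalone illustration with made-up data:)

# table holds one list per column; zip(*table) transposes columns into rows,
# and sum(..., []) concatenates those rows into a single flat list
table = [
    ["prompt A", "prompt B"],   # prompts column
    ["output A", "output B"],   # outputs column
    ["0.1", "0.9"],             # rewards column
]
rows = sum(list(map(list, zip(*table))), [])
print(rows)  # ['prompt A', 'output A', '0.1', 'prompt B', 'output B', '0.9']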
self, prompts: List[torch.IntTensor], samples, prompt_sizes=None
) -> List[str]:
    """
    Decode samples into (samples: List[str], outputs: List[str], samples: List[str])
Should the return be documented instead as:
(samples: List[str], prompts: List[str], outputs: List[str])
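(A sketch of the docstring with the suggested ordering; the Tuple return annotation is an assumption on my part, since the quoted hunk still declares List[str]:)

def decode(
    self, prompts: List[torch.IntTensor], samples, prompt_sizes=None
) -> Tuple[List[str], List[str], List[str]]:
    """
    Decode samples into (samples: List[str], prompts: List[str], outputs: List[str])
    """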
trlx/trainer/__init__.py (outdated)
config: TRLConfig,
reward_fn=None,
metric_fn=None,
stop_word=None,
In general, I don't believe that stop sequences will only be words. For clarity, either document that this handles a stop sequence (string) or update the arg name (e.g. OpenAI's Completion API calls it stop).
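(A hypothetical sketch of the suggested rename; stop_sequence is one possible name, not the PR's final choice, and the free function here only stands in for the trainer's real constructor, with the import path assumed:)

from typing import Callable, Optional
from trlx.data.configs import TRLConfig  # import path assumed

# hypothetical signature sketch, not the PR's actual API
def make_trainer(
    config: TRLConfig,
    reward_fn: Optional[Callable] = None,
    metric_fn: Optional[Callable] = None,
    stop_sequence: Optional[str] = None,  # any string, e.g. "Human:", not only one word
):
    ...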
stop_word_ix = str_output.find(self.stop_word)
if stop_word_ix == -1:
    stop_word_ix = None
str_output = str_output[:stop_word_ix]
Maybe we want to right-strip this to avoid extra whitespace? (Otherwise, assume users expect to include any leading whitespace in their stop_word.)
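(A sketch of the suggested change applied to the hunk above:)

# trim at the stop word, then right-strip so whitespace preceding the stop
# word does not end up in the final output
stop_word_ix = str_output.find(self.stop_word)
if stop_word_ix == -1:
    stop_word_ix = None
str_output = str_output[:stop_word_ix].rstrip()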
for k, v in self.config.method.gen_kwargs.items():
    if isinstance(v, list):
        if self.generate_sweep_kwarg is not None:
            print(
Let's use utils.print_rank_0 or check for main process here to avoid the annoying prints from all processes.
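(A sketch of the suggestion applied to the hunk above; the import path and the message text are assumptions:)

from trlx.utils import print_rank_0  # module path assumed from the comment

for k, v in self.config.method.gen_kwargs.items():
    if isinstance(v, list):
        if self.generate_sweep_kwarg is not None:
            # printed once from the main process instead of from every worker
            print_rank_0(f"Only one sweep argument is allowed; ignoring {k}")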
This PR will:

- add an option to sweep over a single gen_kwargs value (e.g. temperature, top_k, beta) during periodic evaluations
- change reward_fn's signature from reward_fn(samples) to reward_fn(samples, prompts, responses)
- add a stop_word option under gen_kwargs (e.g. "Models indicate they have completed a response by generating a stop sequence, which is literally the string 'Human:'" – from Anthropic's 2021 "A General Language Assistant as a Laboratory for Alignment")

Among HF's generate arguments, StoppingCriteria can only stop all generations per batch, and eos_token_ids accepts only List[int]. The other option is to customize the generate method for every architecture, or to trim samples after generate, as was done here (see the sketch below).

https://wandb.ai/sorry/trlx/reports/-Update-generation-utilities-172---VmlldzozMzIxMjE1
https://wandb.ai/sorry/trlx/reports/Update-generation-utilities-172--VmlldzozMzIxMjE5
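(A standalone sketch of the trim-after-generate approach described above; the function name and usage line are illustrative, not this PR's actual code:)

from typing import List

def trim_at_stop_sequence(texts: List[str], stop_sequence: str) -> List[str]:
    """Cut each decoded sample at the first occurrence of the stop sequence."""
    trimmed = []
    for text in texts:
        ix = text.find(stop_sequence)
        trimmed.append(text if ix == -1 else text[:ix])
    return trimmed

# usage, assuming a HF tokenizer and generate outputs:
# samples = trim_at_stop_sequence(tokenizer.batch_decode(output_ids), "Human:")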