add reference model logps to chunkedloss interface and fix dpo loss fn #405

shivam15s · 2024-11-21T23:10:34Z

Accommodate reference model logps in the chunked loss interface, and make the DPO loss use reference model logps in its loss function.
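For readers unfamiliar with why the reference logps matter, here is a minimal, framework-free sketch of the standard DPO objective for a single preference pair. It is not the code from this PR: the function names, the scalar inputs, and the default `beta=0.1` are assumptions made for illustration, and the PR itself works on chunked torch tensors.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All four inputs are sequence-level log-probabilities. `beta` is a
    hypothetical default chosen for this sketch.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2. Dropping the reference terms would instead reward any gap between chosen and rejected logps, including one the reference model already has.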

Summary

as title

Testing Done

Hardware Type:

- run `make test` to ensure correctness
- run `make checkstyle` to ensure code style
- run `make test-convergence` to ensure convergence

pramodith

Great work! Wrapping DPO up.

pramodith · 2024-11-21T23:20:06Z

src/liger_kernel/chunked_loss/fused_linear_preference.py

+        input_chunk, ref_weight, target_chunk, ref_bias=None, ignore_index=-100
+    ):
+        with torch.no_grad():
+            ref_logits_chunk = input_chunk @ ref_weight.t()


The logic to get the log_probs is common to both the reference and active policy models. I'm wondering if we can create a util function that both can call instead of repeating the code.

Should be possible; we just need the util fn to compute the NLL loss for the active policy forward pass.
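The shared-helper idea discussed above can be sketched roughly as follows. This is a plain-Python illustration, not the refactor that landed: `target_log_prob` and `policy_and_reference_log_probs` are hypothetical names, the helper sums (rather than averages) token log-probs as an assumption, and it leaves out both the NLL term mentioned in the reply and the `torch.no_grad()` context the diff uses for the reference pass.

```python
import math


def target_log_prob(hidden_states, weight, targets, bias=None, ignore_index=-100):
    """Summed log-prob of the target tokens for one sequence.

    hidden_states: list of token vectors, each of length H
    weight: V rows of length H, so logits = hidden @ weight.T
    targets: one vocabulary index per token, or ignore_index to skip it
    """
    total = 0.0
    for hidden, target in zip(hidden_states, targets):
        if target == ignore_index:
            continue  # masked positions contribute nothing
        logits = [sum(h * w for h, w in zip(hidden, row)) for row in weight]
        if bias is not None:
            logits = [logit + b for logit, b in zip(logits, bias)]
        # Stable log-softmax: subtract the max before exponentiating.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
        total += logits[target] - log_z
    return total


def policy_and_reference_log_probs(hidden_states, targets, weight, ref_weight,
                                   bias=None, ref_bias=None):
    # The same helper serves both models; only the projection weights differ.
    policy = target_log_prob(hidden_states, weight, targets, bias)
    reference = target_log_prob(hidden_states, ref_weight, targets, ref_bias)
    return policy, reference
```

Both calls share the same hidden states, mirroring how the diff projects `input_chunk` through `ref_weight`.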

pramodith

Thanks for the refactor!

shivam15s added 2 commits November 21, 2024 23:12

change interface to accomodate reference model logps and make dpo los…

b07af8f

…s use reference model

checkstyle

15cb383

shivam15s force-pushed the shisahni/dpo_ref branch from 116ee1a to 15cb383 Compare November 21, 2024 23:12

shivam15s changed the title ~~change interface to accomodate reference model logps~~ [chunked loss] accomodate reference model logps and dpo loss w ref Nov 21, 2024

shivam15s changed the title ~~[chunked loss] accomodate reference model logps and dpo loss w ref~~ [chunkedloss] add reference model logps and add ref logps to dpo Nov 21, 2024

pramodith reviewed Nov 21, 2024

shivam15s changed the title ~~[chunkedloss] add reference model logps and add ref logps to dpo~~ add reference model logps to chunkedloss interface and fix dpo loss fn Nov 21, 2024

shivam15s added 2 commits November 21, 2024 23:56

refactor code

f84f16a

checkstyle

0fda3b8

shivam15s requested a review from pramodith November 21, 2024 23:57

pramodith approved these changes Nov 22, 2024

austin362667 mentioned this pull request Nov 22, 2024

Fix DPO with Reference Model #387

Closed

3 tasks

shivam15s merged commit d907ec0 into main Nov 22, 2024
3 checks passed

shivam15s deleted the shisahni/dpo_ref branch November 22, 2024 05:25
