Contributor

@Tcc0403 Tcc0403 commented Oct 12, 2024

Summary

Resolve #277.

Testing Done

  • Hardware Type: gpu-ci
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@Tcc0403 Tcc0403 marked this pull request as ready for review October 12, 2024 17:31
@lancerts lancerts requested a review from yundai424 October 13, 2024 16:05
tl.store(dX_ptr + offsets, dX, mask=mask)


MAX_FUSED_SIZE = 65536


-def jsd_forward(_input, target, beta):
+def jsd_forward(_input, target, label, beta, ignore_index, has_label):
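For readers following the signature change, here is a minimal pure-Python sketch of what ignore-index support could mean for a generalized JSD loss. It is not the Triton kernel from this PR: `jsd_reference` is a hypothetical helper, the formula (beta-weighted KL terms against the mixture M = beta * P + (1 - beta) * Q, with 0 < beta < 1) and the averaging over non-ignored rows are assumptions made for illustration, and both inputs are taken to be per-row log-probabilities.

```python
import math


def jsd_reference(student_logprobs, teacher_logprobs, labels=None,
                  beta=0.5, ignore_index=-100):
    """Reference generalized JSD over rows of log-probabilities.

    Hypothetical helper for illustration only. Rows whose label equals
    ignore_index contribute nothing, and the sum is divided by the number
    of remaining rows (an assumed normalization). Assumes 0 < beta < 1.
    """
    n_rows = len(student_logprobs)
    if labels is None:
        keep = [True] * n_rows
    else:
        keep = [lab != ignore_index for lab in labels]

    n_non_ignore = sum(keep)
    if n_non_ignore == 0:
        return 0.0  # nothing to average over

    total = 0.0
    for log_q, log_p, use_row in zip(student_logprobs, teacher_logprobs, keep):
        if not use_row:
            continue  # ignored row: no contribution to the loss
        for lq, lp in zip(log_q, log_p):
            q, p = math.exp(lq), math.exp(lp)
            m = beta * p + (1.0 - beta) * q  # mixture distribution
            if p > 0.0:
                total += beta * p * math.log(p / m)
            if q > 0.0:
                total += (1.0 - beta) * q * math.log(q / m)
    return total / n_non_ignore
```

Passing `labels=None` plays the role of `has_label=False` in this sketch: every row is kept and the result is a plain mean over rows.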
Collaborator

I might be wrong, but if I understand correctly, we currently have an implicit assumption that the label is already shifted. It would be helpful to state this requirement and provide an example of the kind of input we expect in this case 🤔

Contributor Author

@Tcc0403 Tcc0403 Oct 15, 2024

Makes sense. I added some examples in the transformers files and renamed the argument to shift_labels.
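To make the "already shifted" requirement concrete, the sketch below shows one common way to build such labels for next-token prediction. It is an illustration rather than the example added in the PR: `make_shift_labels`, `prompt_len`, and the `-100` sentinel are assumptions, and other conventions (for instance, dropping the last logit row instead of masking it) are equally valid.

```python
IGNORE_INDEX = -100  # assumed sentinel value, not taken from the PR


def make_shift_labels(input_ids, prompt_len=0, ignore_index=IGNORE_INDEX):
    """Build next-token labels aligned with per-position logits.

    Hypothetical helper. Position t is labelled with token t + 1. The last
    position has no next token, and positions whose target token lies
    inside the prompt are masked with ignore_index.
    """
    labels = list(input_ids[1:]) + [ignore_index]
    for t in range(len(labels)):
        # The label at position t is token t + 1; mask it if it is a prompt token.
        if t + 1 < prompt_len:
            labels[t] = ignore_index
    return labels


input_ids = [11, 12, 13, 21, 22]  # 3 prompt tokens followed by 2 response tokens
shift_labels = make_shift_labels(input_ids, prompt_len=3)
print(shift_labels)  # [-100, -100, 21, 22, -100]
```

With labels in this form, row t of the student and teacher logits is compared only where `shift_labels[t]` is not the ignore value.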

@Tcc0403 Tcc0403 requested a review from yundai424 October 15, 2024 22:18
Collaborator

@yundai424 yundai424 left a comment

Overall LGTM, just a very minor suggestion!

beta,
n_rows,
n_non_ignore,
ignore_index,
Collaborator

nit: ignore_index could be a constexpr

@Tcc0403 Tcc0403 requested a review from yundai424 October 16, 2024 01:15
@yundai424 yundai424 merged commit 24a7efc into linkedin:main Oct 16, 2024
2 checks passed
@Tcc0403 Tcc0403 deleted the jsd-ignore-index branch December 1, 2024 03:12
Development

Successfully merging this pull request may close these issues.

Adding ignore index support for divergence losses
3 participants