Conversation

@lkhphuc (Contributor) commented Mar 6, 2025

Simply log all the learning rates for all parameter groups of all schedulers.
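
A rough sketch of what "log all the learning rates" could look like in practice (names here are illustrative, not the exact torchtitan API):

```python
def collect_lrs(lr_schedulers) -> dict[str, float]:
    """Collect one entry per (scheduler, param group) for the metrics logger."""
    lrs: dict[str, float] = {}
    for s_idx, scheduler in enumerate(lr_schedulers):
        # torch's LRScheduler.get_last_lr() returns the last computed LR per param group.
        for g_idx, lr in enumerate(scheduler.get_last_lr()):
            lrs[f"lr/scheduler_{s_idx}/group_{g_idx}"] = lr
    return lrs
```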

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 6, 2025
Co-authored-by: Chien-Chin Huang <[email protected]>
@tianyu-l (Contributor) left a comment

Sounds good to me.

Could you please include a screenshot of the results? Either TB or WandB is fine.

@tianyu-l (Contributor) left a comment

It seems similar things (with a bigger change) are being done in #938.

How about we collaborate over there?

@lkhphuc (Contributor, Author) commented Mar 7, 2025

Here's a screenshot from WandB (taken before the logged metric was renamed): [Screenshot 2025-03-07 at 10.16.03]

> It seems similar things (with a bigger change) are being done in #938.

Currently that PR does not include changes to the Exp Tracker, only to the logger, so I think it's orthogonal to this PR.
I'm not sure how best to proceed, but feel free to include the code in that PR directly before merging.

@tianyu-l (Contributor) commented Mar 7, 2025

Sorry, what is "Exp Tracker"?
It looks to me like this line effectively achieves the same thing, since all optimizer groups across all LR schedulers would have the same LR in the torchtitan setting:
https://github.com/pytorch/torchtitan/pull/938/files#diff-ea620cebba782ef8545fcfc700627348c15bb4cbb8ef5c5b4f417ddff955668bR396
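
For reference, a minimal sketch of that single-value approach under the shared-schedule assumption above (attribute names are illustrative, not torchtitan's exact API):

```python
def current_lr(lr_schedulers) -> float:
    """Read one representative LR, assuming every scheduler/param group shares the same schedule."""
    first_scheduler = lr_schedulers[0]
    # torch's LRScheduler.get_last_lr() returns one value per param group;
    # under the shared-schedule assumption any single value is representative.
    return first_scheduler.get_last_lr()[0]

# usage (in the training loop, alongside the other metrics):
# metrics["lr"] = current_lr(lr_schedulers)
```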

@lkhphuc (Contributor, Author) commented Mar 7, 2025

Ah yes, sorry, I was confused and missed that part. Those are the same thing. I will close this PR in favor of that one.

@lkhphuc lkhphuc closed this Mar 7, 2025
tianyu-l pushed a commit that referenced this pull request Jul 31, 2025
This PR adds learning rate logging. There was a previous attempt to
implement this in an [earlier
PR](#937), but that one was
ultimately **closed**. This version ensures that LR logging works
properly; I verified it using the WSD scheduler that was recently added
in [another PR](#938).
![image](https://github.com/user-attachments/assets/8f23674a-d689-4cc2-9d9b-30bff4e63f3b)

One design consideration here is that torchtitan supports multiple
optimizers and learning rate schedules, each potentially having its own
LR. However, in practice, I believe that 99.9999% of use cases will use
a single LR.

Given that, the logging works as follows:
- If there is only one learning rate, it gets logged directly under the
main charts as `lr`.
- If there are multiple learning rates, they are logged under a separate
section, each with its corresponding label.

Alternatively, we could have ignored the multi-LR case and always logged
a single LR, but I prefer this approach since it handles both scenarios
robustly with minimal extra code.

Happy to adjust if others have a strong preference for simplicity over
robustness.
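
A minimal sketch of the branching described above, assuming the collected LRs arrive as a label-to-value mapping (names are illustrative, not the exact torchtitan code):

```python
def lr_metrics(lrs: dict[str, float]) -> dict[str, float]:
    """Map collected learning rates to metric names for the experiment tracker."""
    if len(lrs) == 1:
        # Single LR: log it directly under the main charts as "lr".
        return {"lr": next(iter(lrs.values()))}
    # Multiple LRs: log each under a dedicated "lr/" section with its own label.
    return {f"lr/{label}": lr for label, lr in lrs.items()}
```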
bentherien pushed a commit to bentherien/torchtitan_ that referenced this pull request Aug 5, 2025
joellidin pushed a commit to one-covenant/torchtitan that referenced this pull request Aug 8, 2025