
Conversation


@glenn-jocher glenn-jocher commented Jun 9, 2022

@kalenmike this is a simple improvement to AutoBatch to verify that returned solutions have not already failed, i.e. to avoid returning batch size 8 when 8 has already produced a CUDA out-of-memory error.

This is an interim fix until I can implement a 'final solution' that will actively verify the solved-for batch size rather than passively assume it works.
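The check described above can be sketched as follows. This is an illustrative sketch, not the actual autobatch.py code: the function name, the use of `None` to mark an out-of-memory result, and the fallback rule are all assumptions for the example.

```python
# Hypothetical sketch of the failed-solution check: if the solved-for
# batch size is at or above a size that already failed profiling,
# fall back to the largest size that succeeded before the failure.
def check_against_failures(b, batch_sizes, results):
    """Return b, or a safe fallback if b is >= an already-failed batch size."""
    failed = [bs for bs, r in zip(batch_sizes, results) if r is None]  # None marks CUDA OOM
    if failed and b >= min(failed):
        i = batch_sizes.index(min(failed))          # first failed size
        b = batch_sizes[max(0, i - 1)]              # previous (safe) size
    return b

# Batch size 8 already failed during profiling, so 16 is rejected in favor of 4
print(check_against_failures(16, [1, 2, 4, 8, 16], [0.5, 1.0, 2.1, None, None]))  # → 4
```

This is the "passive" half of the fix: it only rules out sizes already observed to fail, rather than actively test-running the solved-for size.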

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved GPU memory management for batch size determination in YOLOv5.

📊 Key Changes

  • Added the `emojis` import to autobatch.py for enriched logging.
  • Added a device check before profiling to handle setups without CUDA.
  • Enhanced the display of CUDA memory statistics, including device name, total, reserved, allocated, and free memory.
  • Revised batch-size profiling to store results more effectively.
  • Implemented a polynomial fit to determine the optimal batch size from available memory.
  • Added a check for profiling failures to ensure the chosen batch size has actually passed the memory requirement tests.

🎯 Purpose & Impact

  • 🔍 Clearer insights into memory usage: Users gain better visibility about their GPU memory, aiding in troubleshooting and performance optimization.
  • 💡 Better batch size predictions: The system more accurately predicts batch sizes for training, helping avoid memory-related crashes and improving utilization.
  • 🛡 Increased robustness: Falling back when certain batch sizes fail during profiling leads to more stable and reliable model training sessions.

@glenn-jocher glenn-jocher self-assigned this Jun 9, 2022
@glenn-jocher glenn-jocher merged commit 6e46617 into master Jun 9, 2022
@glenn-jocher glenn-jocher deleted the update/autobatch branch June 9, 2022 15:15
tdhooghe pushed a commit to tdhooghe/yolov5 that referenced this pull request Jun 10, 2022
* AutoBatch checks against failed solutions

* Update autobatch.py

* Update autobatch.py
ctjanuhowski pushed a commit to ctjanuhowski/yolov5 that referenced this pull request Sep 8, 2022
* AutoBatch checks against failed solutions

* Update autobatch.py

* Update autobatch.py