GPU not utilizing 100% memory  #9949

@KxFxN

Question

My GPUs are not using 100% of their memory.

I tried training on multiple GPUs, but GPU memory usage stays far below 100%.

My devices:
GTX 1080 Ti 11GB
GTX 1070 8GB

python 3.9.13
torch 1.12.1+cu113
torchaudio 0.12.1+cu113
torchvision 0.13.1+cu113

python train.py --img 480 --batch 2 --epochs 100 --data ./data/coco_CBZ.yaml --weights ./models/yolov5s.pt --device 0,1 --name CBZ

What is the problem? Please help.

AutoAnchor: 6.40 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset
Plotting labels to runs\train\CBZ4\labels.jpg... 
Image sizes 480 train, 480 val
Using 2 dataloader workers
Logging results to runs\train\CBZ4
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
  0%|          | 0/239 [00:00<?, ?it/s]
C:\Users\kuy50\AppData\Roaming\Python\Python39\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
       0/99     0.312G     0.1153    0.09853    0.06285         14        480: 100%|██████████| 239/239 [01:54<00:00,  2.09it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 120/120 [00:29<00:00,  4.02it/s]
                   all        478       4773     0.0232      0.115     0.0149    0.00374

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       1/99     0.776G    0.09581    0.09177    0.05947         31        480: 100%|██████████| 239/239 [01:41<00:00,  2.36it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 120/120 [00:21<00:00,  5.48it/s]
                   all        478       4773     0.0467      0.209     0.0442     0.0132

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       2/99     0.776G    0.08696    0.08272    0.05724         32        480: 100%|██████████| 239/239 [01:40<00:00,  2.38it/s]
                 Class     Images  Instances          P          R      mAP50   mAP50-95: 100%|██████████| 120/120 [00:17<00:00,  6.92it/s]
                   all        478       4773     0.0387      0.153     0.0539      0.024

      Epoch    GPU_mem   box_loss   obj_loss   cls_loss  Instances       Size
       3/99     0.776G    0.09386    0.06784    0.05689         23        480:   4%|         | 9/239 [00:04<01:50,  2.08it/s]
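
To put the numbers in the log above in proportion, here is a rough sketch of the arithmetic. It assumes that `--batch 2` is divided evenly between the two GPUs and that the `GPU_mem` column describes memory on a single card; neither is confirmed in this post. The helper names `per_gpu_batch` and `memory_percent` are made up for illustration and are not part of YOLOv5.

```python
# Figures are copied from the post above. The even split across GPUs and the
# reading of GPU_mem as memory on one card are assumptions, not confirmed facts.


def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    """Images each GPU receives per step if the batch is split evenly."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if total_batch % num_gpus != 0:
        raise ValueError("total_batch must be divisible by num_gpus")
    return total_batch // num_gpus


def memory_percent(used_gb: float, capacity_gb: float) -> float:
    """Share of a card's memory in use, as a percentage."""
    return 100.0 * used_gb / capacity_gb


if __name__ == "__main__":
    reported_gb = 0.776  # GPU_mem column, epochs 1 to 3
    cards = {"GTX 1080 Ti": 11.0, "GTX 1070": 8.0}  # capacities in GB

    print(f"images per GPU per step: {per_gpu_batch(2, 2)}")
    for name, capacity_gb in cards.items():
        share = memory_percent(reported_gb, capacity_gb)
        print(f"{name}: {share:.1f}% of {capacity_gb:.0f} GB")
```

Under those assumptions each card handles a single 480-pixel image per step, so the reported usage is only a small share of either card's capacity.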

Additional

No response


Labels: Stale, question
