Open
Description
In _run_single_test we have the following code:
model, tokenizer = load_model(
    self.model_path,
    num_gpus=self.num_gpus,
    device=self.device,
    debug=self.debug,
)
This happens in a loop in generation_results:
for attempt in range(max_retries):
    try:
        state = self._run_single_test()
        if state:
            print(f"Test function successful on attempt {attempt + 1}")
            return state
    except Exception as e:
        print(f"Test function failed on attempt {attempt + 1}")
        import traceback; traceback.print_exc();
        print(f"Retrying in {retry_interval} seconds...")
        time.sleep(retry_interval)
So if the model has already been loaded and an error then occurs, the next retry loads the model again without freeing the memory held by the first copy, which often causes out-of-memory (OOM) errors. The model should be stored in a class property so that a retry can reuse it when it is already loaded on the GPU.
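A minimal sketch of what I mean, not a patch against the actual code: the class name CachedModelRunner, the _get_model helper, and the _fail_first flag are all hypothetical, and load_model here is a stand-in that only counts calls so the behaviour can be checked without a GPU. The idea is just that the retry loop reuses the cached model instead of calling load_model again.

```python
import time


def load_model(model_path, num_gpus=1, device="cuda", debug=False):
    """Hypothetical stand-in for the real load_model; it only counts calls."""
    load_model.calls += 1
    return object(), object()  # placeholders for (model, tokenizer)


load_model.calls = 0


class CachedModelRunner:  # hypothetical name, for illustration only
    def __init__(self, model_path, num_gpus=1, device="cuda", debug=False):
        self.model_path = model_path
        self.num_gpus = num_gpus
        self.device = device
        self.debug = debug
        self._model = None
        self._tokenizer = None
        self._fail_first = True  # demo only: simulate one failure after loading

    def _get_model(self):
        # Load once; later calls (including retries) reuse the same objects.
        if self._model is None:
            self._model, self._tokenizer = load_model(
                self.model_path,
                num_gpus=self.num_gpus,
                device=self.device,
                debug=self.debug,
            )
        return self._model, self._tokenizer

    def _run_single_test(self):
        model, tokenizer = self._get_model()
        if self._fail_first:
            self._fail_first = False
            raise RuntimeError("simulated failure after the model was loaded")
        return {"status": "ok"}

    def generation_results(self, max_retries=3, retry_interval=0):
        for attempt in range(max_retries):
            try:
                state = self._run_single_test()
                if state:
                    return state
            except Exception:
                time.sleep(retry_interval)
        return None


runner = CachedModelRunner("dummy-path")
state = runner.generation_results()
print(state, "loads:", load_model.calls)
```

Here the first attempt fails after the model is loaded, and the second attempt succeeds without a second load. If the failure can leave the model itself in a bad state, the except branch would probably also need to drop the cached reference and free GPU memory before reloading, but that is a separate question.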