Description
Hi Josko,
I tried to install jobstats, but I am getting an error:
./jobstats -d 874
DEBUG: jobidraw=874, start=1728671046, end=1728671263, cluster=ganesha, tres=cpu=2,gres/gpu=1,mem=4000M,node=1, data=, user=ss6478, account=sysops, state=COMPLETED, timelimit=90, nodes=1, ncpus=2, reqmem=4000M, qos=normal, partition=gpu, jobname=test
DEBUG: jobid=874, jobidraw=874, start=1728671046, end=1728671263, gpus=1, diff=217, cluster=ganesha, data=, timelimitraw=90
DEBUG: query=max_over_time(cgroup_memory_total_bytes{cluster='ganesha',jobid='874',step='',task=''}[217s]), time=1728671263
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time(cgroup_memory_rss_bytes{cluster='ganesha',jobid='874',step='',task=''}[217s]), time=1728671263
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time(cgroup_cpu_total_seconds{cluster='ganesha',jobid='874',step='',task=''}[217s]), time=1728671263
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time(cgroup_cpus{cluster='ganesha',jobid='874',step='',task=''}[217s]), time=1728671263
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time((nvidia_gpu_memory_total_bytes{cluster='ganesha'} and nvidia_gpu_jobId == 874)[217s:]), time=1728671263
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=max_over_time((nvidia_gpu_memory_used_bytes{cluster='ganesha'} and nvidia_gpu_jobId == 874)[217s:]), time=1728671263
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
DEBUG: query=avg_over_time((nvidia_gpu_duty_cycle{cluster='ganesha'} and nvidia_gpu_jobId == 874)[217s:]), time=1728671263
DEBUG: query result={'status': 'success', 'data': {'resultType': 'vector', 'result': []}}
Traceback (most recent call last):
  File "./jobstats", line 58, in <module>
    stats.report_job()
  File "/tmp/jobstats/jobstats.py", line 582, in report_job
    +f'If the run time was very short then try running "seff {self.jobid}".')
  File "/tmp/jobstats/jobstats.py", line 115, in error
    raise Exception(msg)
Exception: No stats found for job 874, either because it is too old or because
it expired from jobstats database. If you are not running this command on the
cluster where the job was run then use the -c option to specify the cluster.
If the run time was very short then try running "seff 874".
Could you please help?
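
Every query in the debug output returns 'status': 'success' with an empty 'result' list, so Prometheus answered but had no matching series for this job. A minimal sketch for checking this outside of jobstats is below. It rebuilds the cgroup query string shown in the DEBUG lines and sends it to the standard Prometheus instant-query endpoint (/api/v1/query). The helper names (build_query, has_samples, run_query) and the base URL are hypothetical and not part of jobstats; adjust them to the local setup.

```python
import json
import urllib.parse
import urllib.request


def build_query(metric, cluster, jobid, seconds):
    """Rebuild the PromQL string in the same shape as the DEBUG lines above."""
    selector = f"{metric}{{cluster='{cluster}',jobid='{jobid}',step='',task=''}}"
    return f"max_over_time({selector}[{seconds}s])"


def has_samples(response):
    """Return True if an instant-query response contains at least one series."""
    return response.get("status") == "success" and bool(response["data"]["result"])


def run_query(base_url, query, time=None):
    """Send an instant query to the Prometheus HTTP API.

    base_url is an assumption, e.g. "http://localhost:9090"; use the address
    of the Prometheus server that jobstats is configured to talk to.
    """
    params = {"query": query}
    if time is not None:
        params["time"] = str(time)  # Unix timestamp, as in the DEBUG lines
    url = f"{base_url}/api/v1/query?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, has_samples(run_query(base_url, build_query("cgroup_memory_total_bytes", "ganesha", 874, 217), time=1728671263)) should come back False for the job above. If a bare selector such as cgroup_memory_total_bytes (no labels, no max_over_time) also returns nothing, the exporter data is probably not reaching Prometheus at all, or the labels it attaches differ from the ones used in these queries.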