Check duplicate issues.
- Checked for duplicates
Description
Hello,
Since we moved to TCMalloc (instead of JeMalloc) for cmsRun, we have seen some failures in TROOT::EndOfProcessCleanups upon destruction of TString (and some other destructors) affecting CMSSW ROOT6 (version 6.29.01) and ROOT628 (version 6.28.05) IBs (el8_amd64_gcc11).
We reported the issue in the cmssw repository as cms-sw/cmssw#42468.
Reproducer
The errors can be reproduced on lxplus as follows:
lxplus816:~> cmssw-el8
Singularity> cd /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/
Singularity> cmsenv
Singularity> cd /tmp/avalenzu/
Singularity> cmsRun /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/Alignment/OfflineValidation/test/inspectData_cfg.py unitTest=True trackCollection=ALCARECOTkAlCosmicsCTF0T
To run the same job under gdb:
Singularity> gdb cmsRun
(gdb) run /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/Alignment/OfflineValidation/test/inspectData_cfg.py unitTest=True trackCollection=ALCARECOTkAlCosmicsCTF0T
A sample stacktrace upon destruction of the TString:
Thread 1 (Thread 0x7ffff413fc80 (LWP 3407688) "cmsRun"):
#0 tcmalloc::SLL_PopRange (end=<synthetic pointer>, start=<synthetic pointer>, N=32, head=0x45a0a0) at src/linked_list.h:88
#1 tcmalloc::SLL_PopRange (end=<synthetic pointer>, start=<synthetic pointer>, N=32, head=0x45a0a0) at src/linked_list.h:79
#2 tcmalloc::ThreadCache::FreeList::PopRange (end=<synthetic pointer>, start=<synthetic pointer>, N=32, this=0x45a0a0) at src/thread_cache.h:238
#3 tcmalloc::ThreadCache::ReleaseToCentralCache (this=this@entry=0x45a040, src=src@entry=0x45a0a0, cl=<optimized out>, N=N@entry=32) at src/thread_cache.cc:206
#4 0x00007ffff57dff2c in tcmalloc::ThreadCache::ListTooLong (this=0x45a040, list=0x45a0a0, cl=<optimized out>) at src/thread_cache.cc:164
#5 0x00007ffff6dc2465 in TString::UnLink (this=0xcbb820) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/inc/TString.h:265
#6 0x00007ffff68a2266 in TString::~TString (this=0xcbb820, __in_chrg=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TString.cxx:249
#7 0x00007ffff68ac37f in TStyle::~TStyle (this=0xcbb500, __in_chrg=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TStyle.cxx:478
#8 0x00007ffff68ac46c in TStyle::~TStyle (this=0xcbb500, __in_chrg=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TStyle.cxx:483
#9 0x00007ffff68fa74e in TCollection::GarbageCollect (obj=0xcbb500) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/cont/src/TCollection.cxx:736
#10 0x00007ffff6902c6f in TList::Delete (this=0xcb8070, option=0x7ffff69efd24 "") at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/cont/src/TList.cxx:537
#11 0x00007ffff689635c in TROOT::EndOfProcessCleanups (this=0x7ffff6b34040 <ROOT::Internal::GetROOT1()::alloc>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TROOT.cxx:1235
#12 0x00007ffff682ef46 in CallEndOfProcessCleanups () at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TApplication.cxx:90
#13 0x00007ffff4c7026c in __run_exit_handlers () from /lib64/libc.so.6
#14 0x00007ffff4c703a0 in exit () from /lib64/libc.so.6
#15 0x00007ffff4c59d8c in __libc_start_main () from /lib64/libc.so.6
#16 0x000000000040803e in _start ()
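A backtrace like the one above can also be captured without an interactive gdb session, which avoids the pager prompt and makes the output easier to attach to a report. The following is only a sketch: capture_backtrace is a hypothetical helper name, not part of CMSSW or ROOT, and it assumes gdb and cmsRun are on the PATH (for example after cmsenv).

```shell
# Sketch only: capture_backtrace is a hypothetical helper, not a CMSSW or ROOT tool.
# Usage: capture_backtrace <executable> [arguments...]
capture_backtrace() {
    # -batch                : process the commands below, then exit gdb
    # set pagination off    : stated explicitly so no pager prompt interrupts the output
    # run                   : start the program; gdb regains control if it stops on a signal
    # thread apply all bt   : print a backtrace for every thread
    # --args                : everything after it is the program and its arguments
    gdb -batch \
        -ex "set pagination off" \
        -ex "run" \
        -ex "thread apply all bt" \
        --args "$@"
}
```

It would be called with the same arguments as the gdb reproducer, for instance capture_backtrace cmsRun followed by the configuration path and the unitTest and trackCollection options, with the output redirected to a file if desired.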
See cms-sw/cmssw#42468 (comment) for other destructors failing on cleanup.
It runs fine when using cmsRunJE instead (for JeMalloc allocation type):
cmsRunJE /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/AlignmentunitTest=True trackCollection=ALCARECOTkAlCosmicsCTF0T
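To make the TCMalloc versus JeMalloc comparison repeatable, both executables can be run on the same configuration and their exit statuses recorded side by side. This is a minimal sketch under stated assumptions: compare_allocators is a hypothetical helper name, and it assumes cmsRun and cmsRunJE are both available in the current environment.

```shell
# Sketch only: compare_allocators is a hypothetical helper, not a CMSSW tool.
# Usage: compare_allocators <config.py> [config arguments...]
compare_allocators() {
    # Keep each run's output in a fresh temporary directory.
    log_dir=$(mktemp -d)
    for exe in cmsRun cmsRunJE; do
        status=0
        # Run the job; remember the exit status instead of aborting on failure.
        "$exe" "$@" >"$log_dir/$exe.log" 2>&1 || status=$?
        echo "$exe exit status: $status"
    done
    echo "logs written to $log_dir"
}
```

Passing the same configuration file and options as in the cmsRun reproducer runs the job once per allocator. A nonzero status for cmsRun together with a zero status for cmsRunJE would be consistent with the behaviour described here.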
ROOT version
CMSSW ROOT6 (version 6.29.01) and ROOT628 (version 6.28.05) IBs.
Installation method
from source
Operating system
el8
Additional context
FYI, @smuzaffar @makortel
Reported to ROOT as discussed in the Core Software meeting.