TROOT::EndOfProcessCleanups fails when using TCMalloc on different destructors #13429

@aandvalenzuela

Check duplicate issues.

  • Checked for duplicates

Description

Hello,

Since we switched cmsRun from JeMalloc to TCMalloc, we have seen failures in TROOT::EndOfProcessCleanups during the destruction of TString objects (and in some other destructors). The failures affect the CMSSW ROOT6 (version 6.29.01) and ROOT628 (version 6.28.05) IBs on el8_amd64_gcc11.

We also reported the issue in the cmssw repository as cms-sw/cmssw#42468.

Reproducer

The errors can be reproduced on lxplus as follows:

lxplus816:~> cmssw-el8
Singularity> cd /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/
Singularity> cmsenv
Singularity> cd /tmp/avalenzu/
Singularity> cmsRun /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/Alignment/OfflineValidation/test/inspectData_cfg.py unitTest=True trackCollection=ALCARECOTkAlCosmicsCTF0T

To get a backtrace, run the same command under gdb:

Singularity> gdb cmsRun
(gdb) run /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/Alignment/OfflineValidation/test/inspectData_cfg.py unitTest=True trackCollection=ALCARECOTkAlCosmicsCTF0T

A sample stack trace from the destruction of a TString:

Thread 1 (Thread 0x7ffff413fc80 (LWP 3407688) "cmsRun"):
#0  tcmalloc::SLL_PopRange (end=<synthetic pointer>, start=<synthetic pointer>, N=32, head=0x45a0a0) at src/linked_list.h:88
#1  tcmalloc::SLL_PopRange (end=<synthetic pointer>, start=<synthetic pointer>, N=32, head=0x45a0a0) at src/linked_list.h:79
#2  tcmalloc::ThreadCache::FreeList::PopRange (end=<synthetic pointer>, start=<synthetic pointer>, N=32, this=0x45a0a0) at src/thread_cache.h:238
#3  tcmalloc::ThreadCache::ReleaseToCentralCache (this=this@entry=0x45a040, src=src@entry=0x45a0a0, cl=<optimized out>, N=N@entry=32) at src/thread_cache.cc:206
#4  0x00007ffff57dff2c in tcmalloc::ThreadCache::ListTooLong (this=0x45a040, list=0x45a0a0, cl=<optimized out>) at src/thread_cache.cc:164
#5  0x00007ffff6dc2465 in TString::UnLink (this=0xcbb820) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/inc/TString.h:265
#6  0x00007ffff68a2266 in TString::~TString (this=0xcbb820, __in_chrg=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TString.cxx:249
#7  0x00007ffff68ac37f in TStyle::~TStyle (this=0xcbb500, __in_chrg=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TStyle.cxx:478
#8  0x00007ffff68ac46c in TStyle::~TStyle (this=0xcbb500, __in_chrg=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TStyle.cxx:483
#9  0x00007ffff68fa74e in TCollection::GarbageCollect (obj=0xcbb500) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/cont/src/TCollection.cxx:736
#10 0x00007ffff6902c6f in TList::Delete (this=0xcb8070, option=0x7ffff69efd24 "") at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/cont/src/TList.cxx:537
#11 0x00007ffff689635c in TROOT::EndOfProcessCleanups (this=0x7ffff6b34040 <ROOT::Internal::GetROOT1()::alloc>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TROOT.cxx:1235
#12 0x00007ffff682ef46 in CallEndOfProcessCleanups () at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el8_amd64_gcc11/lcg/root/6.29.01-192f857c68d2f0a1a8cb821f03f5a854/root-6.29.01/core/base/src/TApplication.cxx:90
#13 0x00007ffff4c7026c in __run_exit_handlers () from /lib64/libc.so.6
#14 0x00007ffff4c703a0 in exit () from /lib64/libc.so.6
#15 0x00007ffff4c59d8c in __libc_start_main () from /lib64/libc.so.6
#16 0x000000000040803e in _start ()

See cms-sw/cmssw#42468 (comment) for other destructors failing on cleanup.
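
For readers less familiar with the call path in frames #9 to #14, the following is a minimal sketch of the general pattern the trace shows: a cleanup function registered as an exit handler destroys objects held in a global list, so their memory is returned to the allocator only after main has finished. This is not ROOT's code. The names Style, registry, end_of_process_cleanups, install_cleanup and add_style are hypothetical stand-ins chosen for illustration, and the sketch makes no claim about the cause of the failure.

```cpp
#include <cstdlib>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that owns heap-allocated strings
// (the role TStyle and its TString members play in the trace).
struct Style {
    std::string name;
    explicit Style(std::string n) : name(std::move(n)) {}
};

// Hypothetical process-wide registry (the role of the TList in frame #10).
// It is allocated once and never torn down by static destruction, so the
// exit handler below is the only place its contents are freed.
std::vector<std::unique_ptr<Style>>& registry() {
    static auto* items = new std::vector<std::unique_ptr<Style>>();
    return *items;
}

// Number of objects destroyed by the cleanup so far; used only for checking.
int& destroyed_count() {
    static int count = 0;
    return count;
}

// Analogue of the cleanup step: destroy every registered object. Each
// destructor hands its string buffer back to whichever allocator is active.
void end_of_process_cleanups() {
    destroyed_count() += static_cast<int>(registry().size());
    registry().clear();
}

// Register the cleanup exactly once. std::exit (and a normal return from
// main) then calls it from the C library's exit-handler machinery.
void install_cleanup() {
    static const bool installed = (std::atexit(end_of_process_cleanups) == 0);
    (void)installed;
}

// Add an object to the registry and make sure the cleanup is installed.
void add_style(std::string name) {
    install_cleanup();
    registry().push_back(std::make_unique<Style>(std::move(name)));
}
```

A program that calls add_style a few times and then returns from main will run end_of_process_cleanups during exit, which is the same position in the process lifetime as CallEndOfProcessCleanups in the trace above. Calling the cleanup a second time is harmless here because the registry is already empty.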

It runs fine when using cmsRunJE instead, which uses JeMalloc:

cmsRunJE /cvmfs/cms-ib.cern.ch/week1/el8_amd64_gcc11/cms/cmssw/CMSSW_13_3_ROOT6_X_2023-08-07-2300/src/AlignmentunitTest=True trackCollection=ALCARECOTkAlCosmicsCTF0T

ROOT version

CMSSW ROOT6 (version 6.29.01) and ROOT628 (version 6.28.05) IBs.

Installation method

from source

Operating system

el8

Additional context

FYI, @smuzaffar @makortel
Reported to ROOT as discussed in the Core Software meeting.
