Performance impact of opening a new XRootD File in every cat operation (TBasket in Uproot) #55

@jpivarski

Reported by @chrisburr in scikit-hep/uproot5#1157 (comment):

The rest of the difference is coming from the fsspec source opening the file twice:

[2024-03-05 23:00:45.735325 +0100][Debug  ][ExDbgMsg          ] [ccsrm.ihep.ac.cn:1094] MsgHandler created: 0x15e5ffa0 (message: kXR_open (file: /dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst, mode: 00, flags: kXR_open_read kXR_async kXR_retstat ) ).
[2024-03-05 23:00:50.783995 +0100][Debug  ][ExDbgMsg          ] [ccsrm.ihep.ac.cn:1094] MsgHandler created: 0x15f25490 (message: kXR_stat (path: /dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst, flags: none) ).
FSSpecSource.chunk(start=0, stop=403)
[2024-03-05 23:00:51.071420 +0100][Debug  ][ExDbgMsg          ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x1781f720 (message: kXR_read (handle: 0x00000000, offset: 0, size: 403) ).
FSSpecSource.chunk(start=3872048242, stop=3872048555)
[2024-03-05 23:00:51.376154 +0100][Debug  ][ExDbgMsg          ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x1781f510 (message: kXR_read (handle: 0x00000000, offset: 3872048242, size: 313) ).
FSSpecSource.chunk(start=3867679172, stop=3867679853)
[2024-03-05 23:00:51.682223 +0100][Debug  ][ExDbgMsg          ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x15f25740 (message: kXR_read (handle: 0x00000000, offset: 3867679172, size: 681) ).
FSSpecSource.chunks(ranges=[(3867678939, 3867679107)])
[2024-03-05 23:00:51.995422 +0100][Debug  ][ExDbgMsg          ] [ccsrm.ihep.ac.cn:1094] MsgHandler created: 0x23f04350 (message: kXR_open (file: /dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst, mode: 074, flags: kXR_open_read kXR_async kXR_retstat ) ).
[2024-03-05 23:00:52.588776 +0100][Debug  ][ExDbgMsg          ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x15f25490 (message: kXR_read (handle: 0x01000000, offset: 3867678939, size: 168) ).
[2024-03-05 23:00:52.890174 +0100][Debug  ][ExDbgMsg          ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x17826850 (message: kXR_close (handle: 0x01000000) ).
Took 7.651638916009688 seconds
[2024-03-05 23:00:53.264742 +0100][Debug  ][ExDbgMsg          ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x27b2c1c0 (message: kXR_close (handle: 0x00000000) ).

The second one is caused by FSSpecSource.chunks calling _cat_file:

https://github.com/scikit-hep/uproot5/blob/e47934f32bd16439a2ca9e92428d2a9a4610a144/src/uproot/source/fsspec.py#L164

which opens a new file behind the scenes for every call:

https://github.com/CoffeaTeam/fsspec-xrootd/blob/b12503eb852f82f0b6bf85e1df02b2b683c1f819/src/fsspec_xrootd/xrootd.py#L392-L399

In my experience, these File objects are heavy and slow to open. fsspec's cat interface is stateless, so it seems that a new File has to be created for every call, and in Uproot that means one for every TBasket.
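To make the access pattern concrete, here is a minimal sketch of a stateless cat, assuming a local temporary file as a stand-in for the remote XRootD file. It is not the fsspec-xrootd code; `counted_open`, `stateless_cat`, and `open_count` are hypothetical names used only to count how many opens a series of byte-range requests triggers.

```python
import os
import tempfile

open_count = 0  # hypothetical counter: number of file opens performed


def counted_open(path):
    """Open a file for reading and record that an open happened."""
    global open_count
    open_count += 1
    return open(path, "rb")


def stateless_cat(path, start, stop):
    """Read bytes [start, stop) with a fresh open/close on every call."""
    with counted_open(path) as f:
        f.seek(start)
        return f.read(stop - start)


# Local stand-in for the remote file (assumption: 1024 bytes of test data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4)
    path = tmp.name

# Three byte ranges, analogous to three separate chunk requests.
ranges = [(0, 16), (100, 132), (512, 520)]
chunks = [stateless_cat(path, start, stop) for start, stop in ranges]
os.remove(path)

print(f"{len(ranges)} range requests caused {open_count} opens")
```

With a local file the extra opens are cheap; the concern in this issue is that each remote open costs a network round trip, as the two `kXR_open` messages in the log above suggest.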

Is there an alternative that we can use, some multi_cat or a context that holds the File object so that we don't need so many? Is there a way to use XRootD in a lightweight, stateless way (like HTTP connections)?
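One shape such a context could take is sketched below, again assuming a local temporary file in place of a real XRootD file. `HeldFile` and `read_ranges` are hypothetical names, not part of fsspec or fsspec-xrootd; the point is only that the handle is opened once and reused for every byte range.

```python
import os
import tempfile


class HeldFile:
    """Hypothetical context that opens a file once and serves many ranges."""

    opens = 0  # class-level counter of how many opens were performed

    def __init__(self, path):
        self.path = path
        self._f = None

    def __enter__(self):
        self._f = open(self.path, "rb")
        HeldFile.opens += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._f.close()
        self._f = None
        return False  # do not swallow exceptions

    def read_ranges(self, ranges):
        """Return the bytes for each (start, stop) pair using the held handle."""
        out = []
        for start, stop in ranges:
            self._f.seek(start)
            out.append(self._f.read(stop - start))
        return out


# Local stand-in for the remote file (assumption: 1024 bytes of test data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4)
    path = tmp.name

with HeldFile(path) as held:
    first = held.read_ranges([(0, 16), (100, 132)])
    second = held.read_ranges([(512, 520)])
os.remove(path)

print(f"2 read_ranges calls, {HeldFile.opens} open")
```

Whether the XRootD client allows a handle to be shared this way across concurrent requests is not addressed by this sketch and would need to be checked.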
