shangm2 (Contributor) commented Aug 18, 2025

Description

  1. Add support for using a byte buffer pool in Thrift serde to avoid intermediate byte array allocations and reduce GC overhead.
  2. Each event loop has its own pool, so there is no contention with other threads.
  3. Requires airlift#123 (Add support to thrift serde with pooled bytebuffer) to compile.
  4. We observed about a 12% reduction in GC time compared to non-pooled Thrift serde.
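A minimal sketch of items 1 and 2, assuming a pool of fixed-size heap ByteBuffers that is confined to a single event-loop thread, so it needs no locking. EventLoopBufferPool, acquire, release, and serialize are hypothetical names, not the airlift#123 API; serialize copies a payload into pooled chunks as a stand-in for a Thrift writer that would otherwise grow one intermediate byte array.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class EventLoopBufferPoolSketch
{
    // Hypothetical pool owned by one event-loop thread; only that thread touches it,
    // so a plain ArrayDeque is enough and acquire/release never contend.
    static final class EventLoopBufferPool
    {
        private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
        private final int bufferSize;
        private int allocations; // number of buffers ever allocated by this pool

        EventLoopBufferPool(int bufferSize)
        {
            this.bufferSize = bufferSize;
        }

        ByteBuffer acquire()
        {
            ByteBuffer buffer = free.poll();
            if (buffer == null) {
                allocations++;
                buffer = ByteBuffer.allocate(bufferSize);
            }
            buffer.clear(); // reset position/limit before reuse
            return buffer;
        }

        void release(ByteBuffer buffer)
        {
            free.push(buffer);
        }

        int allocations()
        {
            return allocations;
        }
    }

    // Writes the payload into as many pooled chunks as needed (stand-in for a Thrift writer).
    static List<ByteBuffer> serialize(byte[] payload, EventLoopBufferPool pool)
    {
        List<ByteBuffer> chunks = new ArrayList<>();
        int offset = 0;
        while (offset < payload.length) {
            ByteBuffer chunk = pool.acquire();
            int length = Math.min(chunk.remaining(), payload.length - offset);
            chunk.put(payload, offset, length);
            chunk.flip(); // make the written bytes readable
            chunks.add(chunk);
            offset += length;
        }
        return chunks;
    }
}
```

Serializing a 3000-byte payload with 1024-byte buffers yields three chunks. Releasing them and serializing again reuses the same buffers, so allocations() stays at 3. A counter like this is one simple way to check that "the pool is being used".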

Motivation and Context

Impact

Test Plan

  1. Verifier passed.
  2. Confirmed that the pool is being used.
[Screenshot 2025-08-17 at 20 28 29]

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add support for using a byte buffer pool in Thrift serde, which reduces intermediate allocations and thus GC overhead.

shangm2 requested review from elharo and a team as code owners on August 18, 2025 03:26
prestodb-ci added the from:Meta (PR from Meta) label on Aug 18, 2025
return connectorCodecManager.getConnectorSplitCodec(connectorId).map(codec -> codec.deserialize(bytes)).orElse(null);
Optional<ConnectorCodec<ConnectorSplit>> codec = connectorCodecManager.getConnectorSplitCodec(connectorId);
if (!codec.isPresent()) {
    return null;
Member

Should it throw instead?
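One way to act on this suggestion, sketched with hypothetical stand-ins: a Map of deserializers keyed by connector ID in place of ConnectorCodecManager, and String in place of ConnectorSplit. Optional.orElseThrow turns a missing codec into an explicit error instead of a null the caller has to interpret.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class MissingCodecSketch
{
    // Hypothetical registry: connector ID -> deserializer for that connector's splits.
    private final Map<String, Function<byte[], String>> splitCodecs;

    public MissingCodecSketch(Map<String, Function<byte[], String>> splitCodecs)
    {
        this.splitCodecs = Map.copyOf(splitCodecs);
    }

    Optional<Function<byte[], String>> getConnectorSplitCodec(String connectorId)
    {
        return Optional.ofNullable(splitCodecs.get(connectorId));
    }

    // Fails loudly when no codec is registered, rather than returning null.
    public String deserializeSplit(String connectorId, byte[] bytes)
    {
        Function<byte[], String> codec = getConnectorSplitCodec(connectorId)
                .orElseThrow(() -> new IllegalStateException("No split codec registered for connector: " + connectorId));
        return codec.apply(bytes);
    }
}
```

Whether an unchecked exception is the right signal depends on whether a missing codec can legitimately happen at this call site, which the diff context alone does not show.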

ByteBufferPool byteBufferPool = byteBufferPoolManager.getPool();
List<ByteBuffer> byteBuffers = reader.readBinaryToBuffers(byteBufferPool);
if (byteBuffers.isEmpty()) {
    return null;
Member

We generally try to avoid returning nulls. The semantics of null are not always clear. For example, here it is unclear whether it signals a failure or the object is legitimately null.

Is it appropriate to throw in here?

}
finally {
    for (ByteBuffer byteBuffer : byteBuffers) {
        byteBufferPoolManager.getPool().release(byteBuffer);
Member

Should the release be done in the same code that does the acquire (codec.serialize?)?
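A sketch of the pairing the reviewer suggests, assuming a hypothetical SimplePool and a BufferWriter callback standing in for the Thrift codec. The method that acquires the buffers also releases them in a finally block, and callers only see the buffers inside the Consumer, which mirrors the Consumer<List<ByteBuffer>> parameter of serializeConcreteValue. All names here are illustrative, not the PR's code.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LoanPatternSketch
{
    // Hypothetical single-threaded pool that counts buffers currently on loan.
    static final class SimplePool
    {
        private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
        private int outstanding;

        ByteBuffer acquire()
        {
            ByteBuffer buffer = free.poll();
            if (buffer == null) {
                buffer = ByteBuffer.allocate(1024);
            }
            buffer.clear();
            outstanding++;
            return buffer;
        }

        void release(ByteBuffer buffer)
        {
            outstanding--;
            free.push(buffer);
        }

        int outstanding()
        {
            return outstanding;
        }
    }

    // Hypothetical stand-in for codec.write(value, transport). Implementations must add
    // each acquired buffer to `out` immediately so it is released even if writing fails.
    interface BufferWriter<T>
    {
        void write(T value, SimplePool pool, List<ByteBuffer> out);
    }

    // Acquire, lend to the consumer, release: all in one method.
    static <T> void serializeWithPooledBuffers(T value, BufferWriter<T> writer, SimplePool pool, Consumer<List<ByteBuffer>> consumer)
    {
        List<ByteBuffer> buffers = new ArrayList<>();
        try {
            writer.write(value, pool, buffers);
            consumer.accept(buffers);
        }
        finally {
            // Runs even if the writer or the consumer throws; the consumer must not
            // keep references to the buffers after it returns.
            for (ByteBuffer buffer : buffers) {
                pool.release(buffer);
            }
        }
    }
}
```

With this shape the caller cannot forget to release, and a failing consumer still returns every buffer to the pool.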

public static <T> void serializeConcreteValue(T value, ThriftCodec<T> codec, ByteBufferPool pool, Consumer<List<ByteBuffer>> consumer)
        throws Exception
{
    List<ByteBuffer> byteBuffers = new ArrayList<>();
Member

nit: Prefer ImmutableList. Consider creating it in ByteBufferOutputTransport (e.g., when calling getBytes()).
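A sketch of this nit, assuming a hypothetical PooledOutputTransport rather than the actual ByteBufferOutputTransport: the transport owns the list of pooled buffers it fills, and its accessor (named getBuffers here; the review refers to getBytes() on the real class) hands back an immutable snapshot. To stay dependency-free it uses the JDK's List.copyOf; Guava's ImmutableList.copyOf would be the direct equivalent of what the reviewer asks for.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PooledOutputTransport
{
    private final Supplier<ByteBuffer> acquire; // hypothetical hook into a buffer pool
    private final List<ByteBuffer> buffers = new ArrayList<>();
    private ByteBuffer current;

    public PooledOutputTransport(Supplier<ByteBuffer> acquire)
    {
        this.acquire = acquire;
    }

    // Appends bytes, taking a new pooled buffer whenever the current one is full.
    public void write(byte[] bytes, int offset, int length)
    {
        while (length > 0) {
            if (current == null || !current.hasRemaining()) {
                current = acquire.get();
                buffers.add(current);
            }
            int chunk = Math.min(length, current.remaining());
            current.put(bytes, offset, chunk);
            offset += chunk;
            length -= chunk;
        }
    }

    // Returns read-only, flipped views of the written buffers in an immutable list,
    // so callers cannot modify either the list or the underlying positions.
    public List<ByteBuffer> getBuffers()
    {
        List<ByteBuffer> views = new ArrayList<>(buffers.size());
        for (ByteBuffer buffer : buffers) {
            ByteBuffer view = buffer.duplicate();
            view.flip();
            views.add(view.asReadOnlyBuffer());
        }
        return List.copyOf(views);
    }
}
```

Building the list inside the transport keeps callers such as serializeConcreteValue from allocating and mutating their own ArrayList.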


public class ByteBufferPoolManager
{
    private final ConcurrentHashMap<Thread, ByteBufferPool> threadPools = new ConcurrentHashMap<>();
Member

Have you considered a single ByteBufferPool for all threads?

Doing it this way is dangerous. You never know which thread this code will run on; different frameworks manage threads differently.

In the worst case, consider getPool being called by a framework that runs a certain task on a new thread every time.
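For comparison, a minimal sketch of the single shared pool the reviewer asks about, assuming a bounded free list built from java.util.concurrent.ConcurrentLinkedQueue plus an atomic counter. SharedByteBufferPool and its methods are hypothetical names. Because nothing is keyed by Thread, a framework that runs each task on a fresh thread cannot make the number of pools grow; the trade-off is some contention on the shared queue, which the per-event-loop design avoids.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedByteBufferPool
{
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
    private final AtomicInteger freeCount = new AtomicInteger(); // idle buffers, reserved before offer
    private final int bufferSize;
    private final int maxPooled;

    public SharedByteBufferPool(int bufferSize, int maxPooled)
    {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Safe to call from any thread; allocates only when no idle buffer is available.
    public ByteBuffer acquire()
    {
        ByteBuffer buffer = free.poll();
        if (buffer == null) {
            return ByteBuffer.allocate(bufferSize);
        }
        freeCount.decrementAndGet();
        buffer.clear();
        return buffer;
    }

    // Keeps at most maxPooled idle buffers; extras are dropped and left to the GC.
    public void release(ByteBuffer buffer)
    {
        if (freeCount.incrementAndGet() <= maxPooled) {
            free.offer(buffer);
        }
        else {
            freeCount.decrementAndGet();
        }
    }

    public int idleCount()
    {
        return freeCount.get();
    }
}
```

The bound on idle buffers is what keeps memory predictable regardless of how many threads call into the pool.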

}

@Min(1024)
public int getByteBufferPoolBufferSize()
Member

Maybe name these methods getThriftByteBufferPool...; otherwise it is not very clear what the byte buffer pool is used for.
