Replies: 3 comments 5 replies
- When you have time, I’d appreciate your feedback 🙂
- I don't believe that leveraging other components to solve this issue is a reasonable solution. Here's why:
- Does anyone know of a tool that can monitor the number of Redis keys in real time and visualize it in a graph?
Dear Seata Community 👋
We are observing performance bottlenecks related to the delayed deletion of `global_table` entries, particularly in high-concurrency environments, as highlighted in issues #6615 and #7334. The accumulation of completed transaction records impacts TPS and disk space utilization. I propose two key areas for optimization, focusing on enhancing the efficiency of `global_table` cleanup within the Transaction Coordinator (TC).

1. Enhance TC Internal Multi-threaded Cleanup
Currently, the `global_table` cleanup process within a single TC instance, managed by scheduled thread pools, may not fully leverage available CPU resources. While `queryLimit` allows adjusting the batch size for queries, overall cleanup throughput can be limited if the deletion logic itself is not sufficiently parallelized across multiple threads or if the batching is not optimally tuned. This leads to a backlog of completed transactions in `global_table` (e.g. transactions in `Committed` or `Rollbacked` states).

Proposed Solution:
I suggest enhancing the internal multi-threaded cleanup mechanism within the `DefaultCoordinator` to achieve more granular task distribution and dynamic tuning of `queryLimit` in conjunction with the number of cleanup threads (see the code sketch after this list).

- Partition the `global_table` into logical segments, each assigned to a dedicated thread within an expanded cleanup thread pool. This segmentation could be based on a suitable partitioning criterion.
- `queryLimit` (defaulting to 1000) determines the number of records fetched in a single database query. When increasing the number of cleanup threads, it's crucial to re-evaluate this parameter. A larger `queryLimit` reduces database round trips but can lead to longer-held database locks and increased transaction log growth. With multiple threads, a moderately smaller `queryLimit` per thread might be more effective, as it allows for more frequent, smaller transactions, reducing contention and improving overall database concurrency.
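To make the first proposal concrete, here is a minimal, self-contained sketch of segment-based, multi-threaded batch deletion. It is not Seata's actual `DefaultCoordinator` code: the thread count, the per-thread `queryLimit`, the segmentation by `MOD(transaction_id, N)`, and the status codes in the SQL are illustrative assumptions that would need to be aligned with the real `global_table` schema and `GlobalStatus` values.

```java
// Minimal sketch (not Seata's DefaultCoordinator): N cleanup workers, each owning one
// logical segment of global_table (segmented here by MOD(transaction_id, N)) and
// deleting one small batch per run. Thread count, batch size and status codes are
// illustrative assumptions.
import javax.sql.DataSource;
import java.sql.*;
import java.util.*;
import java.util.concurrent.*;

public class ParallelGlobalTableCleanup {

    private static final int CLEANUP_THREADS = 4;          // expanded cleanup pool
    private static final int QUERY_LIMIT_PER_THREAD = 200; // smaller batch per thread

    // Finished states only (numeric values must match io.seata.core.model.GlobalStatus).
    private static final String SELECT_FINISHED =
        "SELECT xid FROM global_table WHERE status IN (9, 11) "
        + "AND MOD(transaction_id, ?) = ? LIMIT ?";
    private static final String DELETE_BY_XID =
        "DELETE FROM global_table WHERE xid = ?";

    public static void runOnce(DataSource ds) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(CLEANUP_THREADS);
        for (int segment = 0; segment < CLEANUP_THREADS; segment++) {
            final int seg = segment;
            pool.submit(() -> cleanupSegment(ds, seg));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
    }

    private static void cleanupSegment(DataSource ds, int segment) {
        try (Connection conn = ds.getConnection()) {
            // 1) Fetch one small batch belonging to this segment only.
            List<String> xids = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(SELECT_FINISHED)) {
                ps.setInt(1, CLEANUP_THREADS);
                ps.setInt(2, segment);
                ps.setInt(3, QUERY_LIMIT_PER_THREAD);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        xids.add(rs.getString("xid"));
                    }
                }
            }
            // 2) Delete the batch in one short transaction to keep lock times low.
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(DELETE_BY_XID)) {
                for (String xid : xids) {
                    ps.setString(1, xid);
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();
            }
        } catch (SQLException e) {
            throw new RuntimeException("Cleanup of segment " + segment + " failed", e);
        }
    }
}
```

Each worker commits its own short delete transaction, which is the main lever for keeping lock times and contention low as the thread count grows.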
Diagram (Enhance TC Internal Multi-threaded Cleanup):

```mermaid
graph TD
    subgraph "Seata TC Instance"
        A --> B{Scheduled Thread Pool}
        B -- Thread 1 --> C1("Cleanup Task: Query & Delete Batch A")
        B -- Thread 2 --> C2("Cleanup Task: Query & Delete Batch B")
        B -- Thread N --> CN("Cleanup Task: Query & Delete Batch N")
    end
    C1 --> D[global_table]
    C2 --> D
    CN --> D
```

Real-world Example: Elasticsearch Index Merging
Elasticsearch, a distributed search and analytics engine, stores data in indices, which are further divided into shards. Each shard consists of multiple segments (immutable Lucene indices). As new data is indexed, new segments are created. To optimize storage and search performance, Elasticsearch performs background "merge" operations. These merges combine smaller segments into larger ones, reclaiming disk space and improving query efficiency. This process is highly parallelized; multiple threads or processes can concurrently merge different sets of segments within a shard or across multiple shards.
2. Database-level Sharding for `global_table`

Even with optimized internal multi-threading, a single underlying `global_table` in a monolithic database can become a bottleneck under extreme transaction volumes. The sheer volume of data can lead to I/O contention, increased query latency, and prolonged lock times during cleanup operations, regardless of how efficiently the TC requests deletions.

Proposed Solution:
Implement database-level sharding for the `global_table`. This involves distributing the `global_table` data across multiple physical database instances or logical partitions, fundamentally scaling out the storage and processing capacity for transaction logs.

Instead of all Seata TC instances writing to and reading from a single `global_table`, a sharding middleware would intercept database operations and route them to the appropriate shard. For the `global_table`, suitable sharding-key candidates include (a routing sketch follows this list):

- `xid` (transactionId): using the transactionId of the `xid` can distribute transactions evenly across shards, preventing hotspots.
- … `global_table` management.
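To illustrate the routing idea independently of any particular middleware, here is a small, hypothetical sketch. The `GlobalTableShardRouter` class, the list of shard `DataSource`s, and the assumption that an xid ends with its transactionId (the common `ip:port:transactionId` layout) are all illustrative; in practice this routing would be delegated to the sharding middleware.

```java
// Hypothetical illustration of xid-based routing (not any specific middleware's API):
// the transactionId portion of an xid ("ip:port:transactionId") is hashed to pick
// one of N physical data sources holding a slice of global_table.
import javax.sql.DataSource;
import java.util.List;

public class GlobalTableShardRouter {

    private final List<DataSource> shards; // one DataSource per physical shard

    public GlobalTableShardRouter(List<DataSource> shards) {
        this.shards = shards;
    }

    /** Routes an xid such as "192.168.0.1:8091:2612341" to its shard. */
    public DataSource route(String xid) {
        long transactionId = extractTransactionId(xid);
        int index = (int) Math.floorMod(transactionId, (long) shards.size());
        return shards.get(index);
    }

    private long extractTransactionId(String xid) {
        // Assumes the common "ip:port:transactionId" xid layout.
        int lastColon = xid.lastIndexOf(':');
        return Long.parseLong(xid.substring(lastColon + 1));
    }
}
```

The point is only that a deterministic function of the xid decides which physical `global_table` shard a row lives in, so cleanup can also run per shard in parallel.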
Diagram (Database-level Sharding for `global_table`):

```mermaid
graph TD
    subgraph Seata_Client_Application
        TM
    end
    subgraph Seata_TC_Cluster
        TC1
        TC2
        TCN
    end
    subgraph Sharding_Middleware
        SM
    end
    subgraph Sharded_Database
        DB1
        DB2
        DBN
    end
    TM --> TC1
    TM --> TC2
    TM --> TCN
    TC1 --> SM
    TC2 --> SM
    TCN --> SM
    SM --> DB1
    SM --> DB2
    SM --> DBN
    DB1 --> DB1
    DB2 --> DB2
    DBN --> DBN
```

How we use it: Apache ShardingSphere
Apache ShardingSphere is a widely used open-source ecosystem that provides data sharding, distributed transactions, and database governance capabilities, and it can be integrated seamlessly with Seata. For `global_table` sharding, ShardingSphere-JDBC (a lightweight Java JDBC driver) can be configured to:

- shard the `global_table` by `xid`;
- intercept every statement issued against the `global_table` (e.g., `INSERT`, `SELECT`, `DELETE`), parse the SQL, and route it to the correct physical database instance(s) based on the sharding key in the SQL.

By leveraging ShardingSphere, the `global_table` can be scaled horizontally, allowing multiple TC instances to operate on different data subsets in parallel, thereby alleviating the cleanup bottleneck and improving overall system performance and scalability. A rough configuration sketch is shown below.
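Here is a rough sketch of what such a configuration could look like with the ShardingSphere-JDBC 5.x Java API. This is an outline under stated assumptions, not a verified configuration: the factory and configuration class names (`ShardingSphereDataSourceFactory`, `ShardingRuleConfiguration`, `AlgorithmConfiguration`, etc.) and their packages differ across ShardingSphere versions, and the shard names, the `ds_${0..1}` data-node expression, and the `HASH_MOD` algorithm are illustrative choices.

```java
// Rough sketch of sharding global_table by xid with the ShardingSphere-JDBC 5.x Java API.
// Treat this as an outline, not a drop-in configuration: exact class names, package paths
// and factory signatures vary between ShardingSphere versions; the data source names,
// the "ds_${0..1}" node expression and the HASH_MOD algorithm choice are assumptions.
import org.apache.shardingsphere.driver.api.ShardingSphereDataSourceFactory;
import org.apache.shardingsphere.infra.config.algorithm.AlgorithmConfiguration;
import org.apache.shardingsphere.infra.config.rule.RuleConfiguration;
import org.apache.shardingsphere.sharding.api.config.ShardingRuleConfiguration;
import org.apache.shardingsphere.sharding.api.config.rule.ShardingTableRuleConfiguration;
import org.apache.shardingsphere.sharding.api.config.strategy.sharding.StandardShardingStrategyConfiguration;

import javax.sql.DataSource;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class GlobalTableShardingConfig {

    public static DataSource buildShardedGlobalTableDataSource(
            Map<String, DataSource> physicalDataSources) throws Exception {
        // global_table exists in every shard: ds_0.global_table, ds_1.global_table, ...
        ShardingTableRuleConfiguration globalTable =
                new ShardingTableRuleConfiguration("global_table", "ds_${0..1}.global_table");
        // Route by the xid column using a hash-modulo sharding algorithm.
        globalTable.setDatabaseShardingStrategy(
                new StandardShardingStrategyConfiguration("xid", "xid_hash_mod"));

        ShardingRuleConfiguration ruleConfig = new ShardingRuleConfiguration();
        ruleConfig.getTables().add(globalTable);
        Properties algorithmProps = new Properties();
        algorithmProps.setProperty("sharding-count", "2");
        ruleConfig.getShardingAlgorithms()
                .put("xid_hash_mod", new AlgorithmConfiguration("HASH_MOD", algorithmProps));

        // The result is a standard javax.sql.DataSource, so the TC's existing JDBC access
        // to global_table (including the cleanup logic) would not need to change.
        return ShardingSphereDataSourceFactory.createDataSource(
                physicalDataSources, Collections.<RuleConfiguration>singleton(ruleConfig),
                new Properties());
    }
}
```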