
Conversation

the-other-tim-brown
Contributor

Change Logs

Restore planning:

  • the planning now computes the instants to roll back based on the completion time of the commits instead of the requested time (a hedged sketch of this planning filter follows this list)
  • there is one exception for tables with version less than 8: a compaction that started before the instant we are restoring to must not be cleaned up, since the log files are directly associated with that compaction instant
  • computing the files to delete now relies on the commit metadata when the table version is 8 or greater or when the commit is a clustering commit
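
A hedged sketch of the planning filter described above (the PR's actual helper is named constructInstantFilter; the method name, parameters, and exact comparison boundaries here are assumptions, and only calls that appear in the diff hunks later in this conversation are used):

```java
// Sketch only, not the merged code. Assumes: java.util.function.Predicate,
// GREATER_THAN.test(a, b) meaning a > b (as used elsewhere in the planner), and that the caller
// passes the restore target's requested and completion times.
private Predicate<HoodieInstant> restorePlanFilter(HoodieTableConfig tableConfig,
                                                   String restoreTargetRequestedTime,
                                                   String restoreTargetCompletionTime) {
  boolean preVersionEight = !tableConfig.getTableVersion().greaterThanOrEquals(HoodieTableVersion.EIGHT);
  return instant -> {
    // Pre-v8 exception: a compaction (surfaced as COMMIT_ACTION once completed) that started before
    // the restore target is kept, because the log file names embed that compaction's instant time.
    if (preVersionEight && instant.getAction().equals(HoodieTimeline.COMMIT_ACTION)
        && !GREATER_THAN.test(instant.requestedTime(), restoreTargetRequestedTime)) {
      return false; // keep this compaction out of the restore plan
    }
    // Default: roll back anything that completed after the restore target completed.
    return GREATER_THAN.test(instant.getCompletionTime(), restoreTargetCompletionTime);
  };
}
```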

Test Updates:

  • parameterized tests were added to existing flows
  • assertions are now made on the record content instead of just counts, since counts alone can mask correctness issues
  • a check was added to ensure the metadata and file system are consistent
  • cases were added for interleaved compaction and delta commits
  • a case was added for clustering followed by a delta commit to ensure all files are properly cleaned up
  • RLI testing uses restore instead of rollback since the tests were attempting to roll back a clean commit, which is a no-op

Impact

  • Ensure restore behavior matches expectations and add testing to prevent possible regressions

Risk level (write none, low, medium or high below)

Low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of a config is changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Jul 30, 2025
@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review July 30, 2025 23:54
return instantComparator.completionTimeOrderedComparator().compare(o1, o2);
} else {
// Due to the special handling of compaction instants, we need to use a requested-time based comparator for compaction instants but a completion-time based comparator for others
if (o1.getAction().equals(HoodieTimeline.COMMIT_ACTION) || o2.getAction().equals(HoodieTimeline.COMMIT_ACTION)) {
Contributor

@danny0405 danny0405 Aug 1, 2025


this is true only for MOR tables. I didn't quite get this part: the timeline filterCompletedAndCompactionInstants can actually include pending compactions, so the action could be COMPACTION for a MOR table; for completed compactions, the action is COMMIT_ACTION.

Contributor Author


The issue here is around the special handling for compactions on v6 tables in the planner. The planner will retain a compaction that completed after the instant we are restoring to if the compaction started before that instant. In that case the "last" instant on the timeline would be the compaction if we use completion time ordering, and the assertion would fail since the restore was targeting a delta commit that started after the compaction but finished before the compaction completed.

Contributor


yes, this is not intuitive to understand. can we add some simple illustration?
t1.dc, .... t2.dc, t11.compaction.req, t12.dc, t11.commit, t13.dc ... dc15.
If we are looking to restore to t12, and we are ordering the commits to roll back based on completion time, we would roll back the t11 compaction as well (since t11 completed after t12 completed). but we can't do that, hence the special handling.

but trying to understand why this special handling is not required for table version 8 and above.
how are we handling this case for v8 tables w/o the special handling?
From https://github.com/apache/hudi/pull/13653/files#r2246930654, I only see we account for completed instant time or requested instant time.

Contributor Author


In v8, the delta commit is not directly tied to the base file commit time so that is why we don't require this. In v8 if we remove the compaction in the timeline described above, we can still safely query the table.

Contributor

@danny0405 danny0405 Aug 2, 2025


In v8, the delta commit is not directly tied to the base file commit time so that is why we don't require this. In v8 if we remove the compaction in the timeline described above, we can still safely query the table.

This is not true. Our assumption for file slices is that the newer file slice will cover all of the dataset history. If we restore the compaction base files, the log files in this file slice will just be kept in the file slice and there is no base file to merge for the read, so we would get data loss (unless you keep the requested compaction metadata file on the timeline, but that does not seem to be the case).

For example we have
t1.dc.req, t1.dc, t2.dc.req, t2.dc, t3.compaction.req, t4.dc.req, t4.dc, t5.dc.req, t5.dc, t3.commit.

Now we want to restore to t5. If we also restore t3.commit for a V8 table, the file slice that includes the t4 logs will only have logs from t4, and the history dataset in the compaction would be lost.

So we should always use requested time comparison for compactions regardless of the table version.

Contributor Author


Isn't the file slice computed at runtime for v8? The slice would just use t1 as the base and the log files would all be present preventing data loss.

Even if the above is accurate, if we want to just keep it consistent between versions for ease of operation, I am fine with that as well.

Contributor

@danny0405 danny0405 Aug 2, 2025


The slice would just use t1 as the base and the log files would all be present preventing data loss.

This is true, but the data loss comes from the deleted base file which contains all of the history dataset; it is not really related to file slicing. Tables before V8 also have correct file slicing because the file name contains the base instant time.
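
To make the ordering discussed in this thread concrete, here is a minimal sketch of the comparator's intent (not the merged implementation; the instantComparator field is the one from the excerpt at the top of this thread, and java.util.Comparator is assumed):

```java
// Hedged sketch: completed compactions surface as COMMIT_ACTION on a MOR table, and on pre-v8 tables
// their log files are tied to the compaction's requested instant time, so those instants fall back to
// requested-time ordering; everything else is ordered by completion time.
Comparator<HoodieInstant> restoreOrdering = (o1, o2) -> {
  boolean involvesCompaction = o1.getAction().equals(HoodieTimeline.COMMIT_ACTION)
      || o2.getAction().equals(HoodieTimeline.COMMIT_ACTION);
  if (involvesCompaction) {
    // requested times are string-encoded timestamps, so lexicographic comparison preserves order
    return o1.requestedTime().compareTo(o2.requestedTime());
  }
  return instantComparator.completionTimeOrderedComparator().compare(o1, o2);
};
```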

HoodieTableType tableType,
HoodieTableVersion tableVersion) throws IOException {
// for MOR tables with version < 8, listing is required to fetch the log files associated with base files added by this commit.
if (isCommitMetadataCompleted && (tableType == HoodieTableType.COPY_ON_WRITE || tableVersion.greaterThanOrEquals(HoodieTableVersion.EIGHT))) {
Contributor


nice catch~

}).collect(Collectors.toList());

return storage.listDirectEntries(filePaths, pathFilter);
return getFilesFromCommitMetadata(basePath, commitMetadata, partitionPath)
Contributor


The file existence check seems unnecessary because we can delete a file that does not exist, cc @nsivabalan for the background of this check.

Contributor Author


I added a test for this as well and confirmed that it handles the missing file

Contributor


yeah. our rollback execution should be fine even if file does not exist. we are good to remove this.

@@ -75,10 +80,19 @@ public Option<HoodieRestorePlan> execute() {
.filter(instant -> GREATER_THAN.test(instant.requestedTime(), savepointToRestoreTimestamp))
.collect(Collectors.toList());

// Get all the commits on the timeline after the provided commit time
// Get all the commits on the timeline after the provided commit's completion time unless it is the SOLO_COMMIT_TIMESTAMP which indicates there are no commits for the table
String completionTime = savepointToRestoreTimestamp.equals(SOLO_COMMIT_TIMESTAMP) ? savepointToRestoreTimestamp : completionTimeQueryView.getCompletionTime(savepointToRestoreTimestamp)
Contributor


in which case do we restore to SOLO_COMMIT_TIMESTAMP? is it valid in production?

Contributor Author


This was only occurring in the upgrade/downgrade testing on tables with no commits
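
For reference, a hedged sketch of the completion-time resolution being discussed (the excerpt above is truncated, so the fallback and the exact Option handling are assumptions):

```java
// Sketch only: SOLO_COMMIT_TIMESTAMP marks a table with no commits and is used as-is; otherwise the
// restore target's requested time is resolved to its completion time through the CompletionTimeQueryView,
// which returns Option<String> per the interface change later in this PR. Falling back to the requested
// time when no completion time is found is an assumption.
String completionTime = savepointToRestoreTimestamp.equals(SOLO_COMMIT_TIMESTAMP)
    ? savepointToRestoreTimestamp
    : completionTimeQueryView.getCompletionTime(savepointToRestoreTimestamp)
        .orElse(savepointToRestoreTimestamp);
```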

.getReverseOrderedInstantsByCompletionTime()
.filter(instant -> {
// For compaction on tables with version less than 8, if the compaction started before the target of the restore, it must not be removed since the log files will reference this commit
if (instant.getAction().equals(HoodieTimeline.COMMIT_ACTION) && !metaClient.getTableConfig().getTableVersion().greaterThanOrEquals(HoodieTableVersion.EIGHT)
Contributor


can we define a filter func outside the stream loop, because the table version and table type are kind of deterministic.

Contributor


!greaterThanOrEquals -> lessThan
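
A small sketch of what the suggestion could look like: build the predicate once, outside the stream (variable names and the timeline accessor are assumptions; constructInstantFilter is the helper added elsewhere in this diff):

```java
// Build the filter once, since the table version and type do not change per instant (sketch, not the patch).
Predicate<HoodieInstant> instantFilter = constructInstantFilter(metaClient.getTableConfig(), completionTime);
List<HoodieInstant> instantsToRollback = table.getActiveTimeline()
    .getReverseOrderedInstantsByCompletionTime()
    .filter(instantFilter)
    .collect(Collectors.toList());
```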

@@ -60,7 +60,7 @@ public interface CompletionTimeQueryView extends AutoCloseable {
*
* @return The completion time if the instant finished or empty if it is still pending.
*/
Option<String> getCompletionTime(String beginTime);
Option<String> getCompletionTime(String instantTime);
Contributor


or requestedTime if you like

HoodieTableType tableType,
HoodieTableVersion tableVersion) throws IOException {
// for MOR tables with version < 8, listing is required to fetch the log files associated with base files added by this commit.
if (isCommitMetadataCompleted && (tableType == HoodieTableType.COPY_ON_WRITE || tableVersion.greaterThanOrEquals(HoodieTableVersion.EIGHT))) {
Contributor


good catch. this can only work in table version 8 and above if we order the commits based on completion time.
guess that fix simplified this.

@@ -49,13 +53,15 @@ public static void deleteSavepoint(HoodieTable table, String savepointTime) {
public static void validateSavepointRestore(HoodieTable table, String savepointTime) {
// Make sure the restore was successful
table.getMetaClient().reloadActiveTimeline();
Option<HoodieInstant> lastInstant = table.getActiveTimeline()
Option<HoodieInstant> lastInstant = Option.fromJavaOptional(table.getActiveTimeline()
Contributor


minor. can we add javadocs to L52 to call out what we are looking to validate
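
One possible wording for the requested javadoc, offered purely as a suggestion (the exact validation semantics should be confirmed against the merged code):

```java
/**
 * Suggested javadoc only, not the merged text.
 *
 * Validates that a restore to {@code savepointTime} actually took effect: after reloading the active
 * timeline, no instant should remain whose completion time is later than the restore target's, i.e. the
 * last completed instant (ordered by completion time, with the pre-v8 compaction caveat discussed in the
 * other threads of this PR) should be the instant we restored to.
 */
public static void validateSavepointRestore(HoodieTable table, String savepointTime) {
```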

@@ -79,6 +79,15 @@ public CompletionTimeBasedComparator(Map<String, String> comparableActions) {

@Override
public int compare(HoodieInstant instant1, HoodieInstant instant2) {
if (instant1.getCompletionTime() == null && instant2.getCompletionTime() != null) {
Contributor


did we add UTs for this?

Contributor Author


Just added one
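
For completeness, one hedged shape such a test could take (this is not the test that was added; it assumes Mockito and JUnit 5, stubs only methods visible in this PR's hunks, and deliberately avoids assuming which side sorts first, checking only that the ordering is consistent when one instant is still pending):

```java
// Sketch of a possible unit test for the null-completion-time branch above.
// Assumed imports: org.junit.jupiter.api.Test, java.util.Collections,
// static org.mockito.Mockito.mock/when, static org.junit.jupiter.api.Assertions.assertEquals/assertNotEquals.
@Test
void pendingInstantOrderingIsConsistent() {
  HoodieInstant pending = mock(HoodieInstant.class);
  HoodieInstant completed = mock(HoodieInstant.class);
  when(pending.getCompletionTime()).thenReturn(null);      // still in flight
  when(pending.requestedTime()).thenReturn("002");
  when(completed.getCompletionTime()).thenReturn("003");
  when(completed.requestedTime()).thenReturn("001");

  CompletionTimeBasedComparator comparator = new CompletionTimeBasedComparator(Collections.emptyMap());
  int forward = comparator.compare(pending, completed);
  int backward = comparator.compare(completed, pending);

  // The pending instant must sort deterministically relative to the completed one, in both directions.
  assertNotEquals(0, forward);
  assertEquals(Integer.signum(forward), -Integer.signum(backward));
}
```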

}
}

private void upsertBatch(SparkRDDWriteClient client, List<HoodieRecord> baseRecordsToUpdate) throws IOException {
@Test
void rollbackWithAsyncServices_compactionCompletesDuringCommit() {
Contributor


should we parametrize this for versions 6 and 8?
w/ all the special handling we are doing in restore, it's worth adding tests for both table versions.

Contributor Author


In v6 you cannot schedule compaction while there is an in-flight delta commit so it is not a valid case

}

@Test
void rollbackWithAsyncServices_commitCompletesDuringCompaction() {
Contributor


same here. parametrize w/ both 6 and 8

Contributor Author


Similarly here, the compaction cannot be scheduled for v6

}
}

private void validateFilesMetadata(HoodieWriteConfig writeConfig) {
Contributor


this validation def gives us good confidence now.

.getReverseOrderedInstants()
.filter(instant -> GREATER_THAN.test(instant.requestedTime(), savepointToRestoreTimestamp))
.getReverseOrderedInstantsByCompletionTime()
.filter(constructInstantFilter(metaClient.getTableConfig(), completionTime))
Contributor


the filter creation can be moved to line 88.

}
}

private Predicate<HoodieInstant> constructInstantFilter(HoodieTableConfig tableConfig, String completionTime) {
Contributor

@danny0405 danny0405 Aug 2, 2025


I think we can reuse this comparator part with SavepointHelpers.validateSavepointRestore so that we can maintain this subtle logic in one place and doc it well with some illustrations.

@hudi-bot

hudi-bot commented Aug 4, 2025

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 merged commit 31dbcfe into apache:master Aug 4, 2025
59 of 60 checks passed
@the-other-tim-brown the-other-tim-brown deleted the restore-rollback-testing branch August 4, 2025 14:00
rahil-c pushed a commit to rahil-c/hudi that referenced this pull request Aug 6, 2025
)

* fix restore sequence to be in completion reverse order, still requested time comparison for compaction
* add a custom comparator for the restore instant sort

---------

Co-authored-by: danny0405 <[email protected]>
alexr17 pushed a commit to alexr17/hudi that referenced this pull request Aug 25, 2025
)

* fix restore sequence to be in completion reverse order, still requested time comparison for compaction
* add a custom comparator for the restore instant sort

---------

Co-authored-by: danny0405 <[email protected]>