Skip to content

Fix O(n^2) algorithm in the GC cleanup logic #429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 20, 2022
Merged

Fix O(n^2) algorithm in the GC cleanup logic #429

merged 1 commit into from
May 20, 2022

Conversation

JohnSully
Copy link
Collaborator

Because deletion from a vector is a linear operation the loop inside endEpoch() is O(n^2). It was written this way to be safe while deleting the vector but can cause serious consequences during large GC epochs where large numbers of epochs need to be cleared.

The fix is to move to a list datastructure with O(1) deletion. This is the more appropriate datastructure for this operation.

@@ -100,13 +103,13 @@ class GarbageCollector
serverAssert(m_setepochOutstanding.find(epoch) != m_setepochOutstanding.end());
serverAssert(sp->FWillFreeChildDebug() == false);

auto itr = std::find(m_vecepochs.begin(), m_vecepochs.end(), m_epochNext+1);
if (itr == m_vecepochs.end())
auto itr = std::find(m_listepochs.begin(), m_listepochs.end(), m_epochNext+1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to use an unordered set? You could still iterate in O(N) and insert/delete in O(1), but this find operation would be O(1) instead of O(N)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We rely on sorted order though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about an ordered set? insertion/deletion would be logarithmic instead of constant, but so would this find.

Copy link
Collaborator Author

@JohnSully JohnSully May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deletion is O(log n) though, this would regress us to O(n log n) performance instead of the O(n) that we are currently so its not a net benefit.

@paulmchen
Copy link
Contributor

I verified this fix on my cloud environment. It is now much better, The '0 qps' issue cannot be reproduced during the fullsync replication cycle. Thanks John.

btw. the following changes will also help on minimizing the CPU usage during the GC cycle, i suggest we add the change to KEYDB core as well:

(gc.h)

class GarbageCollector
{
struct EpochHolder
{
uint64_t tstamp;
std::unique_ptr<std::vector<std::unique_ptr>> m_spvecObjs;

    EpochHolder() {
        m_spvecObjs = std::make_unique<std::vector<std::unique_ptr<T>>>();
    }

    // Support move operators
    EpochHolder(EpochHolder &&other) = default;
    EpochHolder &operator=(EpochHolder &&) = default;

@JohnSully JohnSully merged commit 5dbf1f6 into main May 20, 2022
@JohnSully JohnSully deleted the gc_perf_fix branch May 20, 2022 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants