Skip to content

Add custom comparators #1182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

matthew-mcallister
Copy link

@matthew-mcallister matthew-mcallister commented Feb 24, 2025

Closes #1180.

The diff is large due to having to add a type parameter C and update constraints in numerous places, but the logic changes are not large.

This trait adds the Comparator and Equivalator traits, which generalize the Comparable and Equivalent traits by taking a self parameter in addition to the two keys to compare, making it possible to change the comparison function used without changing the key type.

Summary

  • Adds trait Comparator<L, R> and trait Equivalator<L, R>.
  • Adds struct BasicComparator, which is a zero-sized type that implements Comparator using the Comparable trait, which itself falls back on Borrow and Ord.
  • Adds a type parameter C to SkipList, SkipMap, and SkipSet.
    • C is defaulted to OrdComparator. This preserves the existing behavior of functions like get(), insert(), etc.
  • Adds SkipList::with_comparator(), SkipSet::with_comparator(), and SkipMap::with_comparator() constructors.
    • SkipList::new(), SkipMap::new(), and SkipSet::new() always use C = OrdComparator. It would be nice if it worked for any C that implements Default, but that would break type inference. This is exactly the reason std::collections::HashMap::new() always uses S = RandomState for its defaulted parameter even if S implements Default.
  • Adds a test using a wrapper around Box<dyn Fn(&[u8], &[u8]) -> Ordering> as the comparator.

Hashing

Because the goal of these traits is to support totally different comparison operations from the default for the underlying key type, Equivalator explicitly does not require two equivalent keys to have the same hash. If someone, someday wants to support Equivalator on their hash map, they will have to generalize the Hash trait as well.

Safety

My one minor question is whether unsafe impl<Q, R, K, V, C> Send for RefRange<'_, Q, R, K, V, C> should require C: Sync. I don't think so because it doesn't require K or V to be sync, but it was enough to make me second-guess myself.

@taiki-e
Copy link
Member

taiki-e commented Feb 25, 2025

cc @al8n

@matthew-mcallister
Copy link
Author

Restored the Equivalent and Comparable traits. Now there is no change in behavior compared to master unless you explicitly define your own comparator type.

@matthew-mcallister matthew-mcallister force-pushed the comparators branch 2 times, most recently from 4e63330 to b19ede6 Compare February 27, 2025 18:59
Renamed OrdComparator to BasicComparator and made compatible with the
Equivalent/Comparable traits
@matthew-mcallister
Copy link
Author

It's been a while since I circled back to this, but is there a sentiment as to whether to merge?

Copy link

@agavra agavra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, though I admit I'm a little sketched out by the concept of breaking the hash/eq dependency definition with Equivalator trait...

Would be good to get some committers reviewing this as well! Excited for this feature.

Comment on lines +16 to +19
/// Because you can use `Equivalator` to implement very different comparison
/// logic from the `Eq` trait for a given type, there is no expectation that
/// two keys should have the same hash for them to compare equal. For example,
/// this is a valid implementation that treats two inputs of any type as equal:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to spell out the motivations for this. It is always possible for the user to implement some funky transform of their keys before they insert into a set/map and when retrieving data as a workaround for not having this. (e.g. in your test, I think a much better implementation if the user wanted that is to just set.insert(key.length()) instead of doing something convoluted with the eq function). That being said, you could still have a comparator that sorts based on key.length and considers two keys equivalent for a sort order if they have the same length.

On the other hand, if we introduce this, it would be impossible to leverage traditional hash structures as optimizations in the future (e.g. add a bloom filter on top of a skiplist) because two elements could be equivalent under this equivilator but have different hashes.

It generally seems dangerous to me to break the standard hash/eq dependency that is implicitly used by so many data structures.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to spell out the motivations for this.

Agreed. My example was pretty contrived; I can provide more realistic motivation.

Though I will remark that set.insert(key.length()) would produce a totally different program as you'd lose access to the first key inserted. Even skip_map.insert(key.length(), key) is not equivalent since that would give you the last key inserted instead of the first.

It generally seems dangerous to me to break the standard hash/eq dependency that is implicitly used by so many data structures.

Well, this doesn't "break" the Hash/Eq dependency since nothing changes if you stick to the standard library traits. But I think the documentation here may be confusing because hashing was out of scope for this PR, so there is no indication of how you would handle hashing when you need it. In reality, hashing is very easy to accommodate; it just requires another "Hash"-ator trait to go alongside Comparator and Equivalator. This is another concept that has been in the C++ standard library for decades.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! If there's a good reference implementation in the standard C++ library that gives me a lot more confidence in the approach. Thanks for the explanation.

}

/// Returns an iterator over all entries in the map,
/// sorted by key.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// sorted by key.
/// sorted by the comparator's ordering of the key.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also update SkipSet and change all method docs to use the term "equivalent" (instead of equal).

@@ -716,3 +716,59 @@ fn concurrent_insert_get_same_key() {
}
handle.join().unwrap()
}

#[test]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably makes sense to add tests for SkipMap as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Custom comparators for SkipList
4 participants