-
Notifications
You must be signed in to change notification settings - Fork 502
Add custom comparators #1182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add custom comparators #1182
Conversation
cc @al8n |
Restored the |
4e63330
to
b19ede6
Compare
Renamed OrdComparator to BasicComparator and made compatible with the Equivalent/Comparable traits
b19ede6
to
fbfe7fd
Compare
It's been a while since I circled back to this, but is there a sentiment as to whether to merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, though I admit I'm a little sketched out by the concept of breaking the hash/eq dependency definition with Equivalator
trait...
Would be good to get some committers reviewing this as well! Excited for this feature.
/// Because you can use `Equivalator` to implement very different comparison | ||
/// logic from the `Eq` trait for a given type, there is no expectation that | ||
/// two keys should have the same hash for them to compare equal. For example, | ||
/// this is a valid implementation that treats two inputs of any type as equal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to spell out the motivations for this. It is always possible for the user to implement some funky transform of their keys before they insert into a set/map and when retrieving data as a workaround for not having this. (e.g. in your test, I think a much better implementation if the user wanted that is to just set.insert(key.length())
instead of doing something convoluted with the eq function). That being said, you could still have a comparator that sorts based on key.length
and considers two keys equivalent for a sort order if they have the same length.
On the other hand, if we introduce this, it would be impossible to leverage traditional hash structures as optimizations in the future (e.g. add a bloom filter on top of a skiplist) because two elements could be equivalent under this equivilator but have different hashes.
It generally seems dangerous to me to break the standard hash/eq dependency that is implicitly used by so many data structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to spell out the motivations for this.
Agreed. My example was pretty contrived; I can provide more realistic motivation.
Though I will remark that set.insert(key.length())
would produce a totally different program as you'd lose access to the first key inserted. Even skip_map.insert(key.length(), key)
is not equivalent since that would give you the last key inserted instead of the first.
It generally seems dangerous to me to break the standard hash/eq dependency that is implicitly used by so many data structures.
Well, this doesn't "break" the Hash/Eq dependency since nothing changes if you stick to the standard library traits. But I think the documentation here may be confusing because hashing was out of scope for this PR, so there is no indication of how you would handle hashing when you need it. In reality, hashing is very easy to accommodate; it just requires another "Hash"-ator trait to go alongside Comparator and Equivalator. This is another concept that has been in the C++ standard library for decades.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! If there's a good reference implementation in the standard C++ library that gives me a lot more confidence in the approach. Thanks for the explanation.
} | ||
|
||
/// Returns an iterator over all entries in the map, | ||
/// sorted by key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// sorted by key. | |
/// sorted by the comparator's ordering of the key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also update SkipSet
and change all method docs to use the term "equivalent" (instead of equal).
@@ -716,3 +716,59 @@ fn concurrent_insert_get_same_key() { | |||
} | |||
handle.join().unwrap() | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably makes sense to add tests for SkipMap
as well
Closes #1180.
The diff is large due to having to add a type parameter
C
and update constraints in numerous places, but the logic changes are not large.This trait adds the
Comparator
andEquivalator
traits, which generalize theComparable
andEquivalent
traits by taking aself
parameter in addition to the two keys to compare, making it possible to change the comparison function used without changing the key type.Summary
trait Comparator<L, R>
andtrait Equivalator<L, R>
.struct BasicComparator
, which is a zero-sized type that implementsComparator
using theComparable
trait, which itself falls back onBorrow
andOrd
.C
toSkipList
,SkipMap
, andSkipSet
.C
is defaulted toOrdComparator
. This preserves the existing behavior of functions likeget()
,insert()
, etc.SkipList::with_comparator()
,SkipSet::with_comparator()
, andSkipMap::with_comparator()
constructors.SkipList::new()
,SkipMap::new()
, andSkipSet::new()
always useC = OrdComparator
. It would be nice if it worked for anyC
that implementsDefault
, but that would break type inference. This is exactly the reasonstd::collections::HashMap::new()
always usesS = RandomState
for its defaulted parameter even ifS
implementsDefault
.Box<dyn Fn(&[u8], &[u8]) -> Ordering>
as the comparator.Hashing
Because the goal of these traits is to support totally different comparison operations from the default for the underlying key type,
Equivalator
explicitly does not require two equivalent keys to have the same hash. If someone, someday wants to supportEquivalator
on their hash map, they will have to generalize theHash
trait as well.Safety
My one minor question is whether
unsafe impl<Q, R, K, V, C> Send for RefRange<'_, Q, R, K, V, C>
should requireC: Sync
. I don't think so because it doesn't requireK
orV
to be sync, but it was enough to make me second-guess myself.