-
-
Couldn't load subscription status.
- Fork 3.3k
crypto: fast fe_batch_invert using Montgomery's trick
#10111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a change to avoid allocations:
diff --git a/src/crypto/crypto-ops.c b/src/crypto/crypto-ops.c
index d5db2481e..6205d19e8 100644
--- a/src/crypto/crypto-ops.c
+++ b/src/crypto/crypto-ops.c
@@ -30,7 +30,6 @@
#include <assert.h>
#include <stdint.h>
-#include <stdlib.h>
#include "warnings.h"
#include "crypto-ops.h"
@@ -322,28 +321,22 @@ int fe_batch_invert(fe *out, const fe *in, const int n) {
}
// Step 1: collect initial muls
- fe *init_muls = (fe *) malloc(n * sizeof(fe));
- if (!init_muls) {
- return 1;
- }
- fe_copy(init_muls[0], in[0]);
+ fe_copy(out[0], in[0]);
for (int i = 1; i < n; ++i) {
- fe_mul(init_muls[i], init_muls[i-1], in[i]);
+ fe_mul(out[i], out[i-1], in[i]);
}
// Step 2: get the inverse of all elems multiplied together
fe a;
- fe_invert(a, init_muls[n-1]);
+ fe_invert(a, out[n-1]);
// Step 3: get each inverse
for (int i = n; i > 1; --i) {
- fe_mul(out[i-1], a, init_muls[i-2]);
+ fe_mul(out[i-1], a, out[i-2]);
fe_mul(a, a, in[i-1]);
}
fe_copy(out[0], a);
- free(init_muls);
-
return 0;
}
b75df59 to
be4c802
Compare
be4c802 to
4e9adb7
Compare
| // Montgomery's trick | ||
| // https://iacr.org/archive/pkc2004/29470042/29470042.pdf 2.2 | ||
| int fe_batch_invert(fe *out, const fe *in, const int n) { | ||
| if (n == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the function be labeled vartime due to this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time variable to the width, which generally isn't a secret, shouldn't be an issue. You'd have to argue that the amount of inputs/amounts is a secret and should be constant-time to them accordingly (or whatever defines the width here).
|
|
||
| bool init() | ||
| { | ||
| m_fes = (fe *) malloc(n_elems * sizeof(fe)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for nullptr? Could easily use std::vector ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used a vector instead of a pointer in the latest. I think I did that originally and then changed it, I don't remember why. Good spot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failure reminded me why I changed it from using a vector...
/Applications/Xcode_16.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/allocator.h:168:84: error: object expression of non-scalar type 'int[10]' cannot be used in a pseudo-destructor expression
168 | _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_HIDE_FROM_ABI void destroy(pointer __p) { __p->~_Tp(); }
I figured using a raw pointer was the simplest way to deal with this. Do you see a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this pattern in FCMP++ integration code, not just in tests here, so it would definitely be good to improve if you see a better way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted to using a raw pointer in the latest. Open to a better approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops.
I think the only way is to wrap in another object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::unique_ptr mostly works here, but you have to match it with the correct allocation call. (new [] array).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use this pattern in FCMP++ integration code, not just in tests here, so it would definitely be good to improve if you see a better way.
Hopefully the code doesn't leak on exceptions (like these tests kind of do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the latest look? I'd update the integration code with that pattern too (it does currently leak on exceptions)
|
|
||
| bool test() | ||
| { | ||
| fe *inv_fes = (fe *) malloc(n_elems * sizeof(fe)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
tests/unit_tests/crypto.cpp
Outdated
| { | ||
| fe *ptr = (fe *) malloc(n * sizeof(fe)); | ||
| if (!ptr) | ||
| throw std::runtime_error("failed to malloc fe *"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw std::bad_alloc is probably more appropriate, as this has to allocate memory for the std::string after just failing to malloc.
Again, not sure why std::vector isn't used instead.
tests/unit_tests/crypto.cpp
Outdated
| { | ||
| fe *batch_inverted = alloc(n_elems); | ||
| ASSERT_EQ(fe_batch_invert(batch_inverted, init_elems, n_elems), 0); | ||
| ASSERT_EQ(memcmp(batch_inverted, norm_inverted, n_elems * sizeof(fe)), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth noting that in-memory representations of fe might not be equal even if the fe_tobyte() canonical representations would be the same. They happen to be in this case, and it might be worth keeping this test as-is? But it's conceivable that some correct change to e.g. fe_mul() down the line might cause this line to fail. I would recommend keeping this line, but documenting it in case it fails later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref10 in general doesn't guarantee they're reduced. That caused annoyances with the FCMP++ code...
(Adding on to what you said, as you identified and raised)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a warning in the latest, also linking to an fe_reduce implementation that could be used to update the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented an fe_equals over here: j-berman@2a56788
Will make a follow-up PR to this one
| { | ||
| std::vector<fe> inv_fes(n_elems); | ||
|
|
||
| if (batched) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (batched) | |
| if constexpr (batched) |
For good measure, to help compiler optimize the branch out, and to signal to reader that that is what it's doing
|
Test failure is unrelated |
Idea by @kayabaNerve to speed up field element inversions when we need to do multiple inversions at one time. This is a significant perf optimization used in FCMP++ tree building.
The perf test shows that batch inverting 1000 elements is ~98% faster than inverting each one individually.