Conversation

@j-berman (Collaborator)

Idea by @kayabaNerve to speed up field element inversions when multiple inversions are needed at once. This is a significant perf optimization used in FCMP++ tree building.

The perf test shows that batch inverting 1000 elements is ~98% faster than inverting each one individually.

@jeffro256 (Contributor) left a comment:

Here's a change to avoid allocations:

diff --git a/src/crypto/crypto-ops.c b/src/crypto/crypto-ops.c
index d5db2481e..6205d19e8 100644
--- a/src/crypto/crypto-ops.c
+++ b/src/crypto/crypto-ops.c
@@ -30,7 +30,6 @@
 
 #include <assert.h>
 #include <stdint.h>
-#include <stdlib.h>
 
 #include "warnings.h"
 #include "crypto-ops.h"
@@ -322,28 +321,22 @@ int fe_batch_invert(fe *out, const fe *in, const int n) {
   }
 
   // Step 1: collect initial muls
-  fe *init_muls = (fe *) malloc(n * sizeof(fe));
-  if (!init_muls) {
-    return 1;
-  }
-  fe_copy(init_muls[0], in[0]);
+  fe_copy(out[0], in[0]);
   for (int i = 1; i < n; ++i) {
-    fe_mul(init_muls[i], init_muls[i-1], in[i]);
+    fe_mul(out[i], out[i-1], in[i]);
   }
 
   // Step 2: get the inverse of all elems multiplied together
   fe a;
-  fe_invert(a, init_muls[n-1]);
+  fe_invert(a, out[n-1]);
 
   // Step 3: get each inverse
   for (int i = n; i > 1; --i) {
-    fe_mul(out[i-1], a, init_muls[i-2]);
+    fe_mul(out[i-1], a, out[i-2]);
     fe_mul(a, a, in[i-1]);
   }
   fe_copy(out[0], a);
 
-  free(init_muls);
-
   return 0;
 }
 

@j-berman j-berman force-pushed the fcmp++-fe-batch-invert branch from b75df59 to be4c802 Compare September 27, 2025 02:04
@j-berman j-berman force-pushed the fcmp++-fe-batch-invert branch from be4c802 to 4e9adb7 Compare September 27, 2025 02:06
// Montgomery's trick
// https://iacr.org/archive/pkc2004/29470042/29470042.pdf 2.2
int fe_batch_invert(fe *out, const fe *in, const int n) {
if (n == 0) {
Contributor:

Should the function be labeled vartime due to this branch?

Contributor:

Being variable-time in the width, which generally isn't a secret, shouldn't be an issue. You'd have to argue that the number of inputs/amounts is a secret and should be handled in constant time accordingly (or whatever defines the width here).


bool init()
{
m_fes = (fe *) malloc(n_elems * sizeof(fe));
Contributor:

Check for nullptr? Could easily use std::vector ...?

Collaborator Author:

Used a vector instead of a pointer in the latest. I think I did that originally and then changed it, I don't remember why. Good spot

Collaborator Author:

CI failure reminded me why I changed it from using a vector...

/Applications/Xcode_16.4.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/v1/__memory/allocator.h:168:84: error: object expression of non-scalar type 'int[10]' cannot be used in a pseudo-destructor expression
  168 |   _LIBCPP_DEPRECATED_IN_CXX17 _LIBCPP_HIDE_FROM_ABI void destroy(pointer __p) { __p->~_Tp(); }

I figured using a raw pointer was the simplest way to deal with this. Do you see a better way?

Collaborator Author:

I use this pattern in FCMP++ integration code, not just in tests here, so it would definitely be good to improve if you see a better way

Collaborator Author:

Reverted to using a raw pointer in the latest. Open to a better approach

Contributor:

Oops.

I think the only way is to wrap in another object.

Contributor:

std::unique_ptr mostly works here, but you have to match it with the correct allocation call (a new[] array expression).

Contributor:

> I use this pattern in FCMP++ integration code, not just in tests here, so it would definitely be good to improve if you see a better way.

Hopefully the code doesn't leak on exceptions (like these tests kind of do).

Collaborator Author:

How does the latest look? I'd update the integration code with that pattern too (it does currently leak on exceptions)


bool test()
{
fe *inv_fes = (fe *) malloc(n_elems * sizeof(fe));
Contributor:

Same here.

{
fe *ptr = (fe *) malloc(n * sizeof(fe));
if (!ptr)
throw std::runtime_error("failed to malloc fe *");
Contributor:

throw std::bad_alloc is probably more appropriate, as this has to allocate memory for the std::string after just failing to malloc.

Again, not sure why std::vector isn't used instead.

{
fe *batch_inverted = alloc(n_elems);
ASSERT_EQ(fe_batch_invert(batch_inverted, init_elems, n_elems), 0);
ASSERT_EQ(memcmp(batch_inverted, norm_inverted, n_elems * sizeof(fe)), 0);
Contributor:

It's worth noting that in-memory representations of fe might not be equal even when the fe_tobytes() canonical representations are the same. They happen to be equal in this case, so it might be worth keeping this test as-is. But it's conceivable that some correct change to e.g. fe_mul() down the line could cause this line to fail. I would recommend keeping this line, but documenting it in case it fails later.

Contributor:

ref10 in general doesn't guarantee they're reduced. That caused annoyances with the FCMP++ code...

(Adding on to what you said, as you identified and raised)

Collaborator Author:

Added a warning in the latest, also linking to an fe_reduce implementation that could be used to update the test

Collaborator Author:

I implemented an fe_equals over here: j-berman@2a56788

Will make a follow-up PR to this one

{
std::vector<fe> inv_fes(n_elems);

if (batched)
@jeffro256 (Contributor) Sep 27, 2025:

Suggested change:
-if (batched)
+if constexpr (batched)

For good measure: it helps the compiler optimize the branch out, and it signals to the reader that that is what's happening.

@j-berman (Collaborator Author)

Test failure is unrelated

5 participants