Skip to content

Conversation

hopeyen
Copy link
Contributor

@hopeyen hopeyen commented Jun 4, 2025

Why are these changes needed?

[context on circular wrapping]

the calculation for relative period index versus absolute period index was off by a factor of reservationWindow.

At the time of accountant metering implementation, reservationPeriodIndex was timestamp/reservationWindow. We received an audit issue concerning that the records can become corrupted when reservationWindow changes (configurable by Payment Vault owner on-chain). Our solution was to update reservationPeriodIndex to the nearest lower multiple of reservationWindow (timestamp/reservationWindow * reservationWindow) so that we can keep reservationWindow configurable without storing reservationWindow along each reservation period records or storing timestamp.

This introduced a bug into the accountant, which kept using the original calculation reservationPeriodIndex = timestamp/reservationWindow, relativeBinIndex = reservationPeriodIndex % numBins, and overflowBinIndex = (reservationPeriodIndex + 2) % numBins. This means that if reservationWindow % numBins = 0, the bin index is consistently 0 for current period, and 2 for overflow period. The unit tests were failing because all the cases had reservationWindow % numBins != 0

This PR updates relative bin index calculation: relativeBinIndex = (reservationPeriodIndex / reservationWindow) % numBins, and overflowBinIndex = (reservationPeriodIndex/reservationWindow + 2) % numBins; added test to cover this edge case

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@hopeyen hopeyen marked this pull request as ready for review June 4, 2025 19:58
@hopeyen hopeyen requested a review from a team as a code owner June 4, 2025 19:58
@hopeyen hopeyen requested review from anupsv and dmanc June 4, 2025 19:58
Copy link
Collaborator

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to bed so if you manage to make all the required changes and want a re-review, please ping @litt3. Otherwise I can take a look over the weekend potentially.

@@ -116,8 +116,8 @@ func (a *Accountant) BlobPaymentInfo(
return a.cumulativePayment, nil
}
return big.NewInt(0), fmt.Errorf(
"no bandwidth reservation found for account %s, and current cumulativePayment balance insufficient "+
"to make an on-demand dispersal. Consider depositing more eth to the PaymentVault contract.", a.accountID.Hex())
"invalid payments: no available bandwidth reservation found for account %s, and current cumulativePayment balance insufficient "+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really feel like this function should be broken into two: a function that only checks (and updates) reservation, and IF that fails (returns nil), then its caller's job to get a cumulativePayment). This function is very hard to review because its doing too many things, and BlobPaymentInfo doesn't reflect the fact that it updates state, and possibly returns a cumulativePayment. Can it update reservation state when returning a nonzero cumulativePayment?

These are all questions (and many more) that are running through my brain as I'm reviewing this. We could write more documentation on the function to describe all these possible scenarios, but I feel like it would be much simpler (and self-documenting) to split into 2 functions. Can prob be done in a separate PR at a future date, but still wanted to leave this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good refactor, only called in one spot func (a *Accountant) AccountBlob.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with the reasoning. This is something I have planned in this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is just a refactor, could you make it a separate in-between PR, to not bloat the PR with actual new features?
In integration team, we try (but tbf its hard) to follow the RII pattern: new feature introductions should be broken into 3 separate PRs:

  1. refactor (any code refactors that are needed to prepare for the feature addition)
  2. interface
  3. implementation

You already broke your PR into interface and implementation, so that's awesome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can break it up. Given this is meant to be a fix for the current accounting, I would do refactor/interface/implementation for the upcoming quorum specific accounting

@@ -116,8 +116,8 @@ func (a *Accountant) BlobPaymentInfo(
return a.cumulativePayment, nil
}
return big.NewInt(0), fmt.Errorf(
"no bandwidth reservation found for account %s, and current cumulativePayment balance insufficient "+
"to make an on-demand dispersal. Consider depositing more eth to the PaymentVault contract.", a.accountID.Hex())
"invalid payments: no available bandwidth reservation found for account %s, and current cumulativePayment balance insufficient "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good refactor, only called in one spot func (a *Accountant) AccountBlob.

@@ -157,9 +157,24 @@ func (a *Accountant) SymbolsCharged(numSymbols uint64) uint64 {
return core.RoundUpDivide(numSymbols, a.minNumSymbols) * a.minNumSymbols
}

// GetRelativePeriodRecord returns the period record for the given index
// return empty record if there is no record for the relative index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment about the edge case where relativeIndex >= uint32(a.numBins)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. this should never happen, but just in case, it will return an empty one

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer true with the panic; did you remove the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed in commit

// relativeIndex := uint32((index / a.reservationWindow) % uint64(a.numBins))
relativeIndex := uint32((index / a.reservationWindow) % uint64(a.numBins))
// Return empty record if the index is greater than the number of bins (should never happen by accountant initialization)
if relativeIndex >= uint32(a.numBins) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ever possible that a.numBins can change at runtime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no we don't have mut on it, it will be fixed to the configured value

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym by "have mut"? A new developer could certainly add some code that changes it (by mistake). Would that be a problem? If yes, then it should be documented in the accountant struct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of which.. how should one choose numBins? There is no comment as to this effect, neither on the struct nor on the constructor.

  1. can it be anything?
  2. does it have to align with the disperser's meterer? (I see on our docs that "The disperser will track at least 4 periods per reservation, starting from the previous period to the period after next period. "). are these period per reservation the same thing?
  3. what is a good choice for this number? why? does it even matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some documentation.

  1. Anything. the accountant would use that configured number or 3, whichever is bigger
  2. the disperser tracks the same thing
  3. 3 is the minimum and the default to account for overflow usage. In reality client could even just have one bin and doesn't provide overflow functionality within the accountant. Using a bigger number doesn't provide functional difference to the accountant, but the dispersal client could track more reservation history

Copy link
Collaborator

@samlaf samlaf Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. how do you make sure the client is using the same numBins as the disperser? EDIT: I see you overwrite PeriodRecords in SetPaymentState... could we do this in constructor instead? SetPaymentState is mandatory to be called to function properly right? This should prob be documented, and also AccountBlob should just panic if SetPaymentState has not been called I think?
  2. whats the point of allowing >3 then if there's no functional difference? whats the point of tracking reservation history?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised a similar question here #1599 (comment). Let's move the discussion to that PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. numBins doesn't have to use the same numBins as the disperser. SetPaymentState are only called by the first dispersal request if the accountant isn't filled in. The idea was to allow clients to implement accountants that manage state differently such as persisting them instead of relying on the disperser to provide the payment state, or call eth client directly for the onchain states. These are thing we won't implement, but allowing the possibility for modifications. Perhaps we should make it more of a requirement. Can keep this thought for later

  2. I figured some clients might want to track more history for analyzing and adjusting their usage. The circular wrapping invariants stay the same with >3, and not a requirement to configure, so I figured it won't hurt. We can iterate on whether clients should have this flexibility

Copy link
Collaborator

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second round of comments

@hopeyen hopeyen requested a review from samlaf June 9, 2025 20:04
Copy link
Collaborator

@samlaf samlaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big improvements already, thanks for the changes. Mostly LGTM at this point, still have 2 questions to be addressed #1641 (comment)

@hopeyen
Copy link
Contributor Author

hopeyen commented Jun 9, 2025

Big improvements already, thanks for the changes. Mostly LGTM at this point, still have 2 questions to be addressed #1641 (comment)

thanks! I answered the questions, but can keep the discussions open for later changes. will go ahead and merge this one 🤞

@hopeyen hopeyen merged commit c229748 into master Jun 9, 2025
13 of 14 checks passed
@hopeyen hopeyen deleted the hope/accountant-circular-period-records branch June 9, 2025 20:52
@hopeyen hopeyen mentioned this pull request Jun 11, 2025
5 tasks
samlaf added a commit that referenced this pull request Jul 11, 2025
samlaf added a commit that referenced this pull request Jul 15, 2025
samlaf added a commit that referenced this pull request Jul 16, 2025
* Revert "refactor: account ledger interface incorporates debit slip (#1732)"

This reverts commit b4cf3d9.

* Revert "fix: period record circular wrapping refresh (#1731)"

This reverts commit 1091f46.

* Revert "fix: disperser_client v2 nil ptr bug (#1710)"

This reverts commit a5acd93.

* Revert "fix: disperser_client v2 should use default onDemandQuorums (#1700)"

This reverts commit 005e0a6.

* Revert "refactor: unify accountant debit slip (#1695)"

This reverts commit a9c0868.

* Revert "feat: AccountantLedger interface for next step unifications (#1694)"

This reverts commit 354e42f.

* Revert "feat: extract generic payment logic (#1693)"

This reverts commit 1003b98.

* Revert "fix: make disperser client backwards compatible (#1686)"

This reverts commit 26c612f.

* Revert "fix: payment state api doesn't fail at zero valued reservations/ondemand (#1682)"

This reverts commit 2031c1d.

* Revert "refactor: quorum specific metrics for metered bytes (#1668)"

This reverts commit 7ac688d.

* Revert "feat: GetPaymentStateForAllQuorums api impl (#1664)"

This reverts commit ed89588.

* Revert "refactor: core meterer consolidation (#1663)"

This reverts commit 319a865.

* Revert "refactor: onchain state interface (#1662)"

This reverts commit d795dfa.

* Revert "refactor: core meterer period record module (#1661)"

This reverts commit afbef90.

* Revert "refactor: payment common functions (#1653)"

This reverts commit 36a3d33.

* Revert "fix: period index calculation (#1641)"

This reverts commit c229748.

* Revert "refactor: use generic column name (#1626)"

This reverts commit 4d1994e.

* Revert "feat: payment onchain state interface (#1625)"

This reverts commit 61e46ac.

* Revert "feat: offchain quorum period record getter (#1620)"

This reverts commit a309139.

* Revert "feat: protobuf payment state api quorum specific variant (#1613)"

This reverts commit 58acdfd.

* Revert "fix: nil assignment to new account (#1612)"

This reverts commit bcbf9f5.

* Revert "feat: offchain batch writes (#1580)"

This reverts commit ce4d2f5.

* Revert "feat: validator authenticate on-demand request by disperser key (#1539)"

This reverts commit f545b16.

* fix: issues after series of reverts (scary)

* chore: add back ErrZeroSymbols error

This error was likely removed by mistake during a revert conflict fix from one of the previous set of reverted PRs.

* style: fix lint

* test: fix accountant_test hardcoded error string

* Revert "Revert "fix: period index calculation (#1641)""

This reverts commit b8767ae.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants