
Revert l3vni optimization #19344


Open

raja-rajasekar wants to merge 2 commits into master from rajasekarr/revert_l3vni_optimization

Conversation

raja-rajasekar (Contributor)

There is an issue in this L3VNI optimization design. With the current
optimization, we have two problems:
 1) Overwriting the states (ADD/Withdraw), leading to inconsistencies
    between Zebra and BGP.
 2) Post-processing handling in the delete path.

- When BGP receives an L3VNI add:
       * Pre-processing (update RDs, tunnel IP, etc.)
       * Schedules the remote-route processing for the global walk
       * No post-processing
- However, in case of an L3VNI del:
       * No pre-processing
       * Schedules the remote-route processing for the global walk
       * Post-processing: handles the delete/memset portion of the code
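
A minimal C sketch of this asymmetry, purely illustrative (the function names are hypothetical and are not the actual FRR symbols):

```c
/* Illustrative sketch only: hypothetical names, not the real FRR API. */
#include <stdio.h>

typedef unsigned int vni_t;

static void update_rd_and_tunnel_ip(vni_t vni)        { printf("pre-process VNI %u\n", vni); }
static void schedule_global_route_walk(vni_t vni)     { printf("defer route walk for VNI %u\n", vni); }
static void schedule_post_delete_teardown(vni_t vni)  { printf("defer delete/memset teardown for VNI %u\n", vni); }

/* ADD path: pre-processing runs immediately, the global walk is deferred,
 * and nothing is left to do afterwards. */
static void l3vni_add(vni_t vni)
{
	update_rd_and_tunnel_ip(vni);
	schedule_global_route_walk(vni);
}

/* DEL path: nothing runs up front, the walk is deferred, and the
 * delete/memset teardown happens only as deferred post-processing. */
static void l3vni_del(vni_t vni)
{
	schedule_global_route_walk(vni);
	schedule_post_delete_teardown(vni);
}

int main(void)
{
	l3vni_add(4001);
	l3vni_del(4001);
	return 0;
}
```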

Imagine the following situation:
T0: L3VNI comes up in Zebra
T1: L3VNI is added in BGP via Zebra
T2: BGP imports type-5 routes into the tenant VRF for this L3VNI based on
    the import RTs
T3: The VRF routes run best path and are programmed to Zebra
T4: Before Zebra gets a chance to process these routes, the L3VNI goes
    oper down (due to SVI down or some other reason)
T5: When the routes do get processed in Zebra, nexthop_active marks
    them as inactive based on the DVNI/SVD check
T6: The L3VNI oper-down is sent down to BGP
T7: The L3VNI comes back up; this trigger masks the L3VNI down in BGP
T8: BGP does not unimport/re-import the routes and Zebra is left in a stale state
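
The masking at T7 can be seen with a minimal, self-contained sketch (hypothetical names, not FRR code) in which a single per-VNI pending slot lets a later event overwrite an earlier one:

```c
/* Illustrative sketch only: a single "latest event wins" slot models
 * the overwrite problem; none of these names exist in FRR. */
#include <stdio.h>

enum l3vni_event { L3VNI_NONE, L3VNI_ADD, L3VNI_DEL };

static enum l3vni_event pending = L3VNI_NONE; /* one pending slot per VNI */

static void queue_event(enum l3vni_event ev)
{
	pending = ev; /* a later event simply overwrites the earlier one */
}

static void process_pending(void)
{
	if (pending == L3VNI_ADD)
		printf("process ADD (import routes)\n");
	else if (pending == L3VNI_DEL)
		printf("process DEL (unimport routes, teardown)\n");
	pending = L3VNI_NONE;
}

int main(void)
{
	queue_event(L3VNI_DEL); /* T6: oper-down reaches BGP             */
	queue_event(L3VNI_ADD); /* T7: L3VNI back up overwrites the DEL  */
	process_pending();      /* only ADD is seen; the intermediate down
				 * never triggers an unimport, so the routes
				 * are never re-imported (T8)            */
	return 0;
}
```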

The right fix for this is to:
 1) NOT overwrite the updates (ADD/DELETE)
 2) Queue the entire handling, i.e. pre-processing + remote-route +
    post-processing, for later and not chunks of it.
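
A minimal sketch of the intended shape of the fix, assuming a simple FIFO of complete per-event work items (again, hypothetical names, not the actual FRR change):

```c
/* Illustrative sketch only: every ADD/DEL becomes its own FIFO entry and
 * carries its complete handling; names are hypothetical, not FRR symbols. */
#include <stdio.h>
#include <stdlib.h>

enum l3vni_event { L3VNI_ADD, L3VNI_DEL };

struct l3vni_work {
	enum l3vni_event ev;
	struct l3vni_work *next;
};

static struct l3vni_work *head, *tail;

/* 1) Never overwrite: each event is appended, so ADD-DEL-ADD stays ADD-DEL-ADD. */
static void queue_event(enum l3vni_event ev)
{
	struct l3vni_work *w = calloc(1, sizeof(*w));

	w->ev = ev;
	if (tail)
		tail->next = w;
	else
		head = w;
	tail = w;
}

/* 2) Pre-processing, the remote-route walk and post-processing are handled
 *    together when the entry is dequeued, not in separate chunks. */
static void drain_queue(void)
{
	struct l3vni_work *w;

	while ((w = head) != NULL) {
		head = w->next;
		if (!head)
			tail = NULL;
		if (w->ev == L3VNI_ADD)
			printf("pre-process, then import via global walk\n");
		else
			printf("unimport via global walk, then delete/memset\n");
		free(w);
	}
}

int main(void)
{
	queue_event(L3VNI_ADD);
	queue_event(L3VNI_DEL); /* the down is preserved, not masked */
	queue_event(L3VNI_ADD);
	drain_queue();
	return 0;
}
```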

Note: the L2VNI optimization still remains intact (Zebra cleans up the
state + no post-processing). This issue is specific to the L3VNI
optimization only because of the post-processing in the DELETE handling.

…sing"

This reverts commit 0f2cb27.

Signed-off-by: Rajasekar Raja <[email protected]>
@raja-rajasekar force-pushed the rajasekarr/revert_l3vni_optimization branch from d72e275 to 04ecfcb on August 5, 2025 18:26
ton31337 (Member) commented Aug 5, 2025

@raja-rajasekar how is this critical, because I see it's already in 10.4, and 10.3. So we need urgent bugfix releases?

ton31337 (Member) commented Aug 5, 2025

@Mergifyio backport stable/10.4 stable/10.3

mergify bot commented Aug 5, 2025

backport stable/10.4 stable/10.3

🟠 Waiting for conditions to match

  • merged [📌 backport requirement]

raja-rajasekar (Contributor, Author) commented Aug 5, 2025

@raja-rajasekar how is this critical, because I see it's already in 10.4, and 10.3. So we need urgent bugfix releases?

In our internal testing, we did a quick flap of an L3VNI where this was exposed (meaning the ADD-DEL-ADD is in the queue); it is a kind of timing/race condition.

The fact that it has been upstream for almost a year hints that no one has tried such quick flaps.

The right way is to queue the entire pre/remote/post processing (altogether) for later, and not chunks of it in the add and delete paths.
