Skip to content

Conversation

NickCraver
Copy link
Collaborator

Note: this is based on #1912, so let's wait until that's in.

Changeset overall:

  • Add MakePrimaryAsync (deprecate MakeMaster)
    • This yanks the code and does an evil .Wait() - better ideas?
  • Add ReplicaOfAsync (deprecate ReplicaOf)
  • Remove the last usages of CommandFlags.HighPriority
  • Remove ServerEndPoint.WriteDirectFireAndForgetSync<T> path

We got a little heavy with stats over the years and it's cumbersome to add anything here (e.g. another `out` parameter). Instead of adding yet more stats in upcoming bits as-is, I decided to take a stab at simplifying this with `readonly struct` passes. We're not _super_ concerned with efficiency in the exception path but hey, why not.

This should simplify maintenance/additions and clarify what each property is. Note that we have some static defaults here because `default` does _not_ run property initializers where a `new()` does.
This isn't working as we want yet in several regards but putting a progress commit in here.
Overall:
- Needs offload to a sibling connection (and option governing this)
- Needs to fail much faster in the initial connection scenario
- Lots of test love around the new functionality
default subtly differs here in that the reader/writer states will default to int/byte 0 which is not the same as NA. Perhaps we should just make NA be the 0 state though, which would simplify all use cases...

@mgravell @philon-msft  thoughts there? Talking PhysicalConnection Reader/Writer status enums.
The failure tests need not retry loop n times.
Also:
- Removes the handshake queue, relying on the pipe directly instead.
- Formats TestBase for easier maintenance
This gets us going to queue commands while disconnected. We still need handoff to sibling connections but this gets us going.
We observe a race here between the connection state saying we're connected and the connection actually being selectable, this tightens that window tremendously.
This fixes the immediate issue but we need to be really sure we never queue anything awaited in the middle of nowhere into a backlog we never start. Now that I understand the issue, we can probably harden this a bit more.
Currently the way we handshake is to get everything configured, wait for a tracer to complete, and then issue the tiebreakers to all servers if they are in play. This complicates a few things with respect to timings, duplication, and write paths being a one-off for tie breakers, which I tripped on hard in #1912.

In this, we instead move the tie breaker fetch as part of AutoConfigure as a fire-and-forget-process-the-result-later setup with a dedicated processor. This all happens before the tracer fires moving us to the next connection phase (added comments) so we should be safe. It should reduce both complexity and overall connection times proportional to endpoint latency (since we wait for completion right now).

What needs adding here is tests with us disabling commands like INFO, GET, etc. and ensuring things still behave as we want. In the overall, the tie breaker is slightly less isolated but _should_ be happening in the same order and with the same exception if any - no net result change is intended there with respect to how we do or don't error along the way. But we never want a connection to fail _because of a tiebreaker_ and I think that warrants a few tests:

- [ ] Disable `INFO` and see if we can connect
- [ ] Disable `GET` and see if we can connect
- [ ] Store some invalid TieBreaker and see if we can connect (e.g. make it a hash instead of a string)
...and maybe others?
Good catch by @TimLovellSmith, this indeed should have a more constrained set of code being checked - we can still safely exit on the null and FireAndForget cases, only needing to wrap around the throws.
This is in seconds (I now loathe all non-timespan time APIs!), the whole point was for the keepalive to fire and detect failure.
These could race between test runs - let's not do that.
Changeset overall:
- Add `MakePrimaryAsync` (deprecate `MakeMaster`)
  - This yanks the code and does an evil .Wait() - better ideas?
- Add `ReplicaOfAsync` (deprecate `ReplicaOf`)
- Remove the last usages of `CommandFlags.HighPriority`
- Remove `ServerEndPoint.WriteDirectFireAndForgetSync<T>` path
Based on #1912 (otherwise tests fail those PRs are working on, but this is good against `main` too)
- Shortens the names in the check list
- Removes .NET 5.0 SDK we don't need the runtime for anymore
- Removes `netcoreapp3.1` from StackExchange.Redis.Tests (running `net472` and `net6.0` now)
@NickCraver NickCraver mentioned this pull request Feb 6, 2022
@NickCraver NickCraver marked this pull request as ready for review February 6, 2022 20:28
@philon-msft
Copy link
Collaborator

A more complicated scenario might involve a master/replica setup; for this usage, simply specify all the desired nodes that make up that logical redis tier (it will automatically identify the master):

This may be a good time to convert to 'primary' terminology in docs too


Refers to: docs/Basics.md:15 in 9661ec4. [](commit_id = 9661ec4, deletion_comment = False)

@NickCraver
Copy link
Collaborator Author

NickCraver commented Feb 7, 2022

@philon-msft was going to do another pass there since it's so massive, this was mostly "get them in shape...but change if it I'm changing the line anyway". I figure another pass or 2 at the terminology change is best since it's quite wide overall - thoughts?

For context: some of that's happening in #1976 and didn't want to create a pile of conflicts, especially with method/code renames in play as well.

@philon-msft
Copy link
Collaborator

Also: Testing.md contains a link to TestConfig.cs on the 'master' branch, should be on 'main'

@philon-msft
Copy link
Collaborator

Suggested changes in Timeouts.md:

Replace this current guidance:

...with:

@NickCraver
Copy link
Collaborator Author

@philon-msft checking here - were comments intended for #1976?

@philon-msft
Copy link
Collaborator

@philon-msft checking here - were comments intended for #1976?

There appears to be some overlap - feel free to apply comments to either PR

Base automatically changed from craver/backlog-v2.5 to main February 7, 2022 16:27
@NickCraver
Copy link
Collaborator Author

NickCraver commented Feb 7, 2022

@philon-msft oh weird, I have no idea what happened with the diff here - merged main in after the other PRs and now should be much cleaner and more scoped to actual changes!

NickCraver added a commit that referenced this pull request Feb 7, 2022
Good suggestions from @philon-msft - see #1969 (comment)
@NickCraver NickCraver merged commit 953824f into main Feb 8, 2022
@NickCraver NickCraver deleted the craver/primary-async branch February 8, 2022 15:39
NickCraver added a commit that referenced this pull request Feb 8, 2022
This one's a big docs cleanup across the board. There's a few very minor code tweaks in here but mostly: docs. We had some typos, some wrong words, and a lot of inconsistency so this is a pass at tightening all that up.

It's based on top of #1969 (for minimal merge conflicts), so currently last in the merge list. But: ready for review. Highly recommend using the "Viewed" checkbox on files in case there are merge conflicts later...this one's got a few in it.

Co-authored-by: mgravell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants