socket: implement Linux flavor of SO_REUSEPORT option
This patch is a manual back-port of the original FreeBSD
patch https://reviews.freebsd.org/rS334719. The FreeBSD patch
adds support for the SO_REUSEPORT_LB socket option, whereas the one
below implements the Linux flavor of SO_REUSEPORT, which in effect
borrows a good chunk of the FreeBSD implementation.
Please note the FreeBSD committers decided to retain support for the
original SO_REUSEPORT option and add a new one, SO_REUSEPORT_LB. The new
option exhibits the same behavior as the older one but adds an important new
feature: load balancing across listener sockets sharing the same port. The
FreeBSD manual states this:
"SO_REUSEPORT_LB allows completely duplicate bindings by multiple
processes if they all set SO_REUSEPORT_LB before binding the port. Incoming
TCP and UDP connections are distributed among the sharing processes based
on a hash function of local port number, foreign IP address and port
number. A maximum of 256 processes can share one socket."
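The distribution described in the quote can be pictured with a minimal sketch.
The names mix_hash() and pick_listener() and the mixing constants are
hypothetical and chosen only for illustration; the kernel uses its own PCB
hash. The point is only that the same (local port, foreign address, foreign
port) tuple always maps to the same listener in the group:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mixing function (illustration only, not the kernel's hash).
inline uint32_t mix_hash(uint32_t faddr, uint16_t lport, uint16_t fport)
{
    uint32_t h = faddr;
    h ^= (static_cast<uint32_t>(lport) << 16) | fport;
    h *= 0x9e3779b1u;   // multiplicative scramble
    h ^= h >> 15;
    return h;
}

// Pick which of `count` listeners sharing a port gets the connection.
// `count` must be greater than zero.
inline size_t pick_listener(uint32_t faddr, uint16_t lport,
                            uint16_t fport, size_t count)
{
    return mix_hash(faddr, lport, fport) % count;
}
```

Because the choice depends only on the tuple, packets of one connection keep
landing on the same listener as long as the group membership does not change.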
So most of the original patch was back-ported as-is, except for the
conditional logic that accounts for both SO_REUSEPORT and SO_REUSEPORT_LB,
which is unnecessary for OSv as it follows Linux, which only supports
the SO_REUSEPORT option. In addition, in some places I had to change
C code to use C++ constructs, just like in other places in in_pcb.cc.
The bulk of the patch below adds the definition of struct inpcblbgroup and
functions to allocate, deallocate and manipulate it in order to manage load
balancing groups, including adding and removing member sockets or, more
specifically, their PCBs (Protocol Control Blocks):
(Internal API)
- struct inpcblbgroup *in_pcblbgroup_alloc() - allocates new LB group
- void in_pcblbgroup_free(struct inpcblbgroup *grp) - frees existing LB group
- struct inpcblbgroup *in_pcblbgroup_resize(struct inpcblbgrouphead *hdr, struct inpcblbgroup *old_grp, int size) - creates new LB group that is a resized version of the old one
- in_pcblbgroup_reorder(struct inpcblbgrouphead *hdr, struct inpcblbgroup **grpp, int i) - removes the PCB at index 'i' from the group, pulls up the ones below il_inp[i] and shrinks the group if possible
(Public API)
- int in_pcbinslbgrouphash(struct inpcb *inp) - adds PCB member to the LB group for SO_REUSEPORT option (allocate new LB group if necessary)
- void in_pcbremlbgrouphash(struct inpcb *inp) - removes PCB from load balance group (free existing LB group if last member)
- struct inpcb *in_pcblookup_lbgroup(const struct inpcbinfo *pcbinfo, const struct in_addr *laddr, uint16_t lport, const struct in_addr *faddr, uint16_t fport, int lookupflags) - looks up inpcb in a load balancing group
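The membership handling listed above can be modelled with a simplified sketch.
It is not the code from the patch: the types pcb and lb_group and the helpers
lb_group_add() and lb_group_remove() are hypothetical stand-ins, and a
std::vector replaces the manually resized il_inp[] array. Only the limit of
256 members comes from the FreeBSD manual quoted earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct pcb { int id; };              // stand-in for struct inpcb

struct lb_group {
    std::vector<pcb*> members;       // plays the role of il_inp[]
};

constexpr size_t LB_GROUP_MAX = 256; // limit quoted from the FreeBSD manual

// Add a PCB to the group; returns false if the group is already full.
inline bool lb_group_add(lb_group& g, pcb* p)
{
    if (g.members.size() >= LB_GROUP_MAX) {
        return false;
    }
    g.members.push_back(p);
    return true;
}

// Remove a PCB; later entries are pulled up to close the gap.
// Returns true if the group became empty, so the caller would free it.
inline bool lb_group_remove(lb_group& g, pcb* p)
{
    for (size_t i = 0; i < g.members.size(); i++) {
        if (g.members[i] == p) {
            g.members.erase(g.members.begin() + i);
            break;
        }
    }
    return g.members.empty();
}
```

Keeping the members densely packed is what lets a lookup pick one with a
simple index computed from a hash.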
The remaining part of the patch modifies the relevant parts of in_pcb.cc to:
1) add logic to add and remove inpcb members to/from LB groups by
delegating to in_pcbinslbgrouphash() and in_pcbremlbgrouphash() during
setup and teardown of sockets and their PCBs
2) add logic to look up PCBs (and relevant sockets) by delegating to
in_pcblookup_lbgroup()
This patch does not add any new locking, apart from some places
that verify certain locks are held when functions are called.
Please note that at some point during the review process the original
version of the FreeBSD patch contained logic originating from
DragonFlyBSD (DragonFlyBSD/DragonFlyBSD@02ad2f0)
to handle a drawback where a crash of a process/thread using SO_REUSEPORT
causes some pending sockets in its completed and incomplete connection queues
to be dropped. But due to concerns about the locking logic, this part
was removed from the patch (https://reviews.freebsd.org/D11003#326149)
and is therefore also absent from the patch below. I believe Linux
also does not handle this drawback correctly as of now.
From a practical standpoint, this patch greatly improves the throughput
of applications using SO_REUSEPORT. More specifically, this HTTP
server example implemented in Rust -
https://gist.github.com/alexcrichton/7b97beda66d5e9b10321207cd69afbbc -
yields much better performance in SMP mode (about 50% more requests per
second with 4 CPUs, while the 2 CPU result is essentially unchanged):
Req/sec BEFORE this patch:
2 CPU - 82199.52
4 CPU - 97982.16
AFTER this patch:
2 CPU - 82361.77
4 CPU - 147389.79
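For context, an application opts in to this behavior by setting SO_REUSEPORT
on every listener before bind(). The sketch below is not taken from the
benchmark above; the helper name open_reuseport_listener() is hypothetical,
and it binds to the loopback address only to keep the example self-contained:

```cpp
#include <cassert>
#include <cstdint>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Open a TCP listener on 127.0.0.1:port with SO_REUSEPORT set.
// Returns the listening descriptor, or -1 on failure.
inline int open_reuseport_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    int one = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    // The option must be set before bind() on every socket sharing the port.
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 128) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Typically each worker thread or process calls such a helper with the same port
and then runs its own accept() loop on the descriptor it got back.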
Finally, note that this patch does not change any non-load-balancing
aspects of the SO_REUSEPORT option that were already in place
before this patch, but inactive. More specifically, these
relate to how the SO_REUSEADDR and/or SO_REUSEPORT flags drive
the same-address and/or same-port collision logic.
Some articles about SO_REUSEPORT:
- https://lwn.net/Articles/542629/
- https://linuxjedi.co.uk/2020/04/25/socket-so_reuseport-and-kernel-implementations/
V2: Compared to V1, this patch slightly changes the expression
that calculates the size of the allocated memory in in_pcblbgroup_alloc() in
order to make it compile with GCC 11 (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95942).
So I changed this:
bytes = __offsetof(struct inpcblbgroup, il_inp[size]);
to:
bytes = __offsetof(struct inpcblbgroup, il_inp) + sizeof(inpcblbgroup::il_inp[0]) * size;
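The size computation can be tried out in isolation with a small sketch. The
struct demo_lbgroup below is a hypothetical, cut-down stand-in for struct
inpcblbgroup (only the il_inp, il_inpsiz and il_inpcnt field names mirror the
FreeBSD structure), and standard offsetof is used instead of __offsetof:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct demo_pcb;                  // opaque stand-in for struct inpcb

struct demo_lbgroup {
    int il_inpsiz;                // capacity of il_inp
    int il_inpcnt;                // number of slots in use
    demo_pcb* il_inp[1];          // trailing array, sized at allocation time
};

// Header bytes up to the array plus `size` array slots. The array index is
// kept out of offsetof(), so the expression is a plain constant offset plus
// a run-time multiplication.
inline size_t lbgroup_bytes(size_t size)
{
    return offsetof(demo_lbgroup, il_inp) +
           sizeof(demo_lbgroup::il_inp[0]) * size;
}

// Allocate a zeroed group with room for `size` members.
inline demo_lbgroup* lbgroup_alloc(int size)
{
    void* mem = std::calloc(1, lbgroup_bytes(static_cast<size_t>(size)));
    if (mem == nullptr) {
        return nullptr;
    }
    demo_lbgroup* grp = static_cast<demo_lbgroup*>(mem);
    grp->il_inpsiz = size;
    return grp;
}
```

Both forms are meant to yield the same byte count; the second one simply
avoids a non-constant index inside the offsetof expression.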
Fixes #1170
Signed-off-by: Waldemar Kozaczuk <[email protected]>
Message-Id: <[email protected]>