Implement TCP offloading via TSO and GRO for the Linux tun.Device, which
is made possible by virtio extensions in the kernel's TUN driver.
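The virtio extensions mentioned above work by having the TUN driver put a struct virtio_net_hdr in front of each packet once IFF_VNET_HDR is enabled. With TSO, that header describes a large "super-packet" that must be split into gso_size chunks of TCP payload. The Go sketch below decodes that 10-byte header and counts the resulting segments. It is a simplified illustration, not wireguard-go's implementation. The helper names are hypothetical, and it assumes little-endian header fields (a little-endian host with the device in its default native byte order).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// virtioNetHdrLen is the size of the legacy struct virtio_net_hdr that the
// TUN driver places in front of each packet when IFF_VNET_HDR is enabled.
const virtioNetHdrLen = 10

// Values from linux/virtio_net.h.
const (
	flagNeedsCsum = 1    // VIRTIO_NET_HDR_F_NEEDS_CSUM
	gsoNone       = 0    // VIRTIO_NET_HDR_GSO_NONE
	gsoTCPv4      = 1    // VIRTIO_NET_HDR_GSO_TCPV4
	gsoTCPv6      = 4    // VIRTIO_NET_HDR_GSO_TCPV6
	gsoECN        = 0x80 // VIRTIO_NET_HDR_GSO_ECN, OR'd into the GSO type
)

// virtioNetHdr mirrors the six fields of struct virtio_net_hdr.
type virtioNetHdr struct {
	flags      uint8
	gsoType    uint8
	hdrLen     uint16
	gsoSize    uint16
	csumStart  uint16
	csumOffset uint16
}

// decodeVirtioNetHdr parses the header from the front of b. Assumption: the
// 16-bit fields are little-endian (little-endian host, default TUN mode).
func decodeVirtioNetHdr(b []byte) (virtioNetHdr, error) {
	if len(b) < virtioNetHdrLen {
		return virtioNetHdr{}, errors.New("short virtio_net_hdr")
	}
	return virtioNetHdr{
		flags:      b[0],
		gsoType:    b[1],
		hdrLen:     binary.LittleEndian.Uint16(b[2:4]),
		gsoSize:    binary.LittleEndian.Uint16(b[4:6]),
		csumStart:  binary.LittleEndian.Uint16(b[6:8]),
		csumOffset: binary.LittleEndian.Uint16(b[8:10]),
	}, nil
}

// tsoSegments reports how many wire-sized TCP segments a TSO super-packet
// carrying payloadLen bytes of TCP payload must be split into.
func tsoSegments(h virtioNetHdr, payloadLen int) int {
	if h.gsoType&^gsoECN == gsoNone || h.gsoSize == 0 || payloadLen <= 0 {
		return 1
	}
	return (payloadLen + int(h.gsoSize) - 1) / int(h.gsoSize)
}

func main() {
	// NEEDS_CSUM, GSO_TCPV4, hdr_len 40, gso_size 1460, csum_start 20
	// (IPv4 header, no Ethernet on a TUN device), csum_offset 16 (TCP checksum).
	raw := []byte{flagNeedsCsum, gsoTCPv4, 40, 0, 0xb4, 0x05, 20, 0, 16, 0}
	h, err := decodeVirtioNetHdr(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.gsoSize, tsoSegments(h, 4000)) // prints "1460 3"
}
```

GRO is the reverse direction: consecutive TCP segments of one flow are coalesced into a single super-packet whose header carries the same kind of GSO metadata, so the kernel receives fewer, larger writes.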
Delete conn.LinuxSocketEndpoint in favor of a collapsed conn.StdNetBind.
conn.StdNetBind uses recvmmsg() and sendmmsg() on Linux. All
platforms now fall under conn.StdNetBind except Windows, which remains
on conn.WinRingBind; that bind still needs to be adjusted to handle
multiple packets.
Also refactor sticky sockets support so that it can eventually apply to
platforms other than Linux. However, Linux remains the only platform
that fully implements it for now.
Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Accept packet vectors for reading and writing in the tun.Device and
conn.Bind interfaces, so that the internal plumbing between these
interfaces now passes a vector of packets. Vectors move untouched
between these interfaces, i.e. if 128 packets are received from
conn.Bind.Read(), 128 packets are passed to tun.Device.Write(). There is
no internal buffering.
For now, existing implementations are only adjusted to handle vectors
of length one. Subsequent patches will improve that.
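The vector-passing model can be illustrated with two small interfaces, one per side, where each call moves a whole batch. The sketch below is hypothetical: packetReader, packetWriter and their methods are simplified stand-ins, not the real conn.Bind or tun.Device signatures, and the cryptographic processing that WireGuard performs between the two sides is omitted. It shows the property stated above: n packets read become exactly n packets written, with no queue in between.

```go
package main

import "fmt"

// Hypothetical stand-ins for the read and write sides of conn.Bind and
// tun.Device; each call handles a whole vector of packets.
type packetReader interface {
	// readBatch fills up to len(bufs) buffers, stores each packet's length in
	// sizes, and returns how many packets it read.
	readBatch(bufs [][]byte, sizes []int) (int, error)
}

type packetWriter interface {
	// writeBatch writes every element of bufs as one packet.
	writeBatch(bufs [][]byte) (int, error)
}

// forward passes one vector from r to w as-is: n packets in, n packets out,
// with no intermediate buffering.
func forward(r packetReader, w packetWriter, bufs [][]byte, sizes []int) (int, error) {
	n, err := r.readBatch(bufs, sizes)
	if err != nil {
		return 0, err
	}
	out := make([][]byte, n)
	for i := 0; i < n; i++ {
		out[i] = bufs[i][:sizes[i]]
	}
	return w.writeBatch(out)
}

// sliceReader serves packets from memory.
type sliceReader struct{ pkts [][]byte }

func (s *sliceReader) readBatch(bufs [][]byte, sizes []int) (int, error) {
	n := 0
	for n < len(bufs) && n < len(s.pkts) {
		sizes[n] = copy(bufs[n], s.pkts[n])
		n++
	}
	s.pkts = s.pkts[n:]
	return n, nil
}

// recordingWriter keeps copies of everything written to it.
type recordingWriter struct{ got [][]byte }

func (w *recordingWriter) writeBatch(bufs [][]byte) (int, error) {
	for _, b := range bufs {
		w.got = append(w.got, append([]byte(nil), b...))
	}
	return len(bufs), nil
}

func main() {
	const batch = 128
	bufs := make([][]byte, batch)
	for i := range bufs {
		bufs[i] = make([]byte, 1500)
	}
	sizes := make([]int, batch)
	r := &sliceReader{pkts: [][]byte{[]byte("a"), []byte("bb"), []byte("ccc")}}
	w := &recordingWriter{}
	n, err := forward(r, w, bufs, sizes)
	fmt.Println(n, err, len(w.got)) // prints "3 <nil> 3"
}
```

With implementations that only support vectors of length one, the same plumbing works unchanged: readBatch simply never returns more than one packet.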
Also, as a related fixup, use the unix and windows packages rather than
the syscall package when possible.
Co-authored-by: James Tucker <james@tailscale.com>
Signed-off-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This covers bind_std only; the other platforms remain to be done.
The remaining allocation per iteration in the Throughput benchmark
comes from the tuntest package and should not occur in regular use.
name           old time/op      new time/op      delta
Latency-10     25.2µs ± 1%      25.0µs ± 0%       -0.58%  (p=0.006 n=10+10)
Throughput-10  2.44µs ± 3%      2.41µs ± 2%          ~    (p=0.140 n=10+8)

name           old alloc/op     new alloc/op     delta
Latency-10       854B ± 5%        741B ± 3%      -13.22%  (p=0.000 n=10+10)
Throughput-10    265B ±34%        267B ±39%          ~    (p=0.670 n=10+10)

name           old allocs/op    new allocs/op    delta
Latency-10       16.0 ± 0%        14.0 ± 0%      -12.50%  (p=0.000 n=10+10)
Throughput-10    2.00 ± 0%        1.00 ± 0%      -50.00%  (p=0.000 n=10+10)

name           old packet-loss  new packet-loss  delta
Throughput-10    0.01 ±82%        0.01 ±282%         ~    (p=0.321 n=9+8)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
There are more places where we'll need to add it later, when Go 1.18
comes out with support for it in the "net" package. Also, allowedips
still uses slices internally, which might be suboptimal.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Instead of hard-coding exactly two sources from which
to receive packets (an IPv4 source and an IPv6 source),
allow the conn.Bind to specify a set of sources.
Beneficial consequences:
* If there's no IPv6 support on a system,
conn.Bind.Open can choose not to return a receive function for it,
which is simpler than tracking that state in the bind.
This simplification removes existing data races from both
conn.StdNetBind and bindtest.ChannelBind.
* If there are more than two sources on a system,
the conn.Bind no longer needs to add a separate muxing layer.
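The change above can be pictured as Open returning a slice of receive functions, one per usable source, with the caller starting one receiving goroutine per function. The sketch below is hypothetical: receiveFunc, openSources and receiveAll are simplified stand-ins (fake in-memory sources), not the real conn.Bind API. It shows both consequences: an IPv4-only host just returns fewer functions, and any number of sources is handled without a separate muxing layer.

```go
package main

import (
	"fmt"
	"io"
	"sort"
	"sync"
)

// receiveFunc is a hypothetical stand-in for a per-source receive function:
// each call returns the next packet from one source.
type receiveFunc func() ([]byte, error)

// fakeSource returns a receiveFunc that yields count packets, then io.EOF.
func fakeSource(name string, count int) receiveFunc {
	i := 0
	return func() ([]byte, error) {
		if i == count {
			return nil, io.EOF
		}
		i++
		return []byte(fmt.Sprintf("%s-%d", name, i)), nil
	}
}

// openSources is a hypothetical Open: it returns one receiveFunc per source
// that is actually usable, so a host without IPv6 simply gets fewer of them
// and no "IPv6 disabled" state has to be tracked.
func openSources(haveIPv4, haveIPv6 bool) []receiveFunc {
	var fns []receiveFunc
	if haveIPv4 {
		fns = append(fns, fakeSource("v4", 2))
	}
	if haveIPv6 {
		fns = append(fns, fakeSource("v6", 2))
	}
	return fns
}

// receiveAll runs one goroutine per receive function, however many there
// are, and collects packets until every source reports an error.
func receiveAll(fns []receiveFunc) []string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		got []string
	)
	for _, fn := range fns {
		wg.Add(1)
		go func(fn receiveFunc) {
			defer wg.Done()
			for {
				pkt, err := fn()
				if err != nil {
					return
				}
				mu.Lock()
				got = append(got, string(pkt))
				mu.Unlock()
			}
		}(fn)
	}
	wg.Wait()
	sort.Strings(got)
	return got
}

func main() {
	fmt.Println(receiveAll(openSources(true, false))) // prints "[v4-1 v4-2]"
	fmt.Println(receiveAll(openSources(true, true)))  // prints "[v4-1 v4-2 v6-1 v6-2]"
}
```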
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
This makes it clearer that they are fresh on each attempt, and avoids
the bookkeeping required to clear them on failure.
Also, remove an unnecessary err != nil check.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>