wireguard-go

mirror of https://git.zx2c4.com/wireguard-go synced 2024-11-15 01:05:15 +01:00

Author	SHA1	Message	Date
Jordan Whited	1ec454f253	device: move Queue{In,Out}boundElement Mutex to container type Queue{In,Out}boundElement locking can contribute to significant overhead via sync.Mutex.lockSlow() in some environments. These types are passed throughout the device package as elements in a slice, so move the per-element Mutex to a container around the slice. Reviewed-by: Maisem Ali <maisem@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2023-10-10 15:07:36 +02:00
Jordan Whited	4201e08f1d	device: distribute crypto work as slice of elements After reducing UDP stack traversal overhead via GSO and GRO, runtime.chanrecv() began to account for a high percentage (20% in one environment) of perf samples during a throughput benchmark. The individual packet channel ops with the crypto goroutines was the primary contributor to this overhead. Updating these channels to pass vectors, which the device package already handles at its ends, reduced this overhead substantially, and improved throughput. The iperf3 results below demonstrate the effect of this commit between two Linux computers with i5-12400 CPUs. There is roughly ~13us of round trip latency between them. The first result is with UDP GSO and GRO, and with single element channels. Starting Test: protocol: TCP, 1 streams, 131072 byte blocks [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-10.00 sec 12.3 GBytes 10.6 Gbits/sec 232 3.15 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - Test Complete. Summary Results: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 12.3 GBytes 10.6 Gbits/sec 232 sender [ 5] 0.00-10.04 sec 12.3 GBytes 10.6 Gbits/sec receiver The second result is with channels updated to pass a slice of elements. Starting Test: protocol: TCP, 1 streams, 131072 byte blocks [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-10.00 sec 13.2 GBytes 11.3 Gbits/sec 182 3.15 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - Test Complete. Summary Results: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 13.2 GBytes 11.3 Gbits/sec 182 sender [ 5] 0.00-10.04 sec 13.2 GBytes 11.3 Gbits/sec receiver Reviewed-by: Adrian Dewhurst <adrian@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2023-10-10 15:07:36 +02:00
Jordan Whited	3bb8fec7e4	conn, device, tun: implement vectorized I/O plumbing Accept packet vectors for reading and writing in the tun.Device and conn.Bind interfaces, so that the internal plumbing between these interfaces now passes a vector of packets. Vectors move untouched between these interfaces, i.e. if 128 packets are received from conn.Bind.Read(), 128 packets are passed to tun.Device.Write(). There is no internal buffering. Currently, existing implementations are only adjusted to have vectors of length one. Subsequent patches will improve that. Also, as a related fixup, use the unix and windows packages rather than the syscall package when possible. Co-authored-by: James Tucker <james@tailscale.com> Signed-off-by: James Tucker <james@tailscale.com> Signed-off-by: Jordan Whited <jordan@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2023-03-10 14:52:13 +01:00
Jason A. Donenfeld	ebbd4a4330	global: bump copyright year Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2023-02-07 20:39:29 -03:00
Jason A. Donenfeld	bb719d3a6e	global: bump copyright year Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2022-09-20 17:21:32 +02:00
Jason A. Donenfeld	484a9fd324	device: flush peer queues before starting device In case some old packets snuck in there before, this flushes before starting afresh. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-10 00:39:28 +01:00
Josh Bleecher Snyder	ecceaadd16	device: remove nil elem check in finalizers This is not necessary, and removing it speeds up detection of UAF bugs. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-09 18:28:55 +01:00
Jason A. Donenfeld	6ac1240821	device: do not attach finalizer to non-returned object Before, the code attached a finalizer to an object that wasn't returned, resulting in immediate garbage collection. Instead return the actual pointer. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	4b5d15ec2b	device: lock elem in autodraining queue before freeing Without this, we wind up freeing packets that the encryption/decryption queues still have, resulting in a UaF. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Josh Bleecher Snyder	d8dd1f254f	device: remove mutex from Peer send/receive The immediate motivation for this change is an observed deadlock. 1. A goroutine calls peer.Stop. That calls peer.queue.Lock(). 2. Another goroutine is in RoutineSequentialReceiver. It receives an elem from peer.queue.inbound. 3. The peer.Stop goroutine calls close(peer.queue.inbound), close(peer.queue.outbound), and peer.stopping.Wait(). It blocks waiting for RoutineSequentialReceiver and RoutineSequentialSender to exit. 4. The RoutineSequentialReceiver goroutine calls peer.SendStagedPackets(). SendStagedPackets attempts peer.queue.RLock(). That blocks forever because the peer.Stop goroutine holds a write lock on that mutex. A background motivation for this change is that it can be expensive to have a mutex in the hot code path of RoutineSequential*. The mutex was necessary to avoid attempting to send elems on a closed channel. This commit removes that danger by never closing the channel. Instead, we send a sentinel nil value on the channel to indicate to the receiver that it should exit. The only problem with this is that if the receiver exits, we could write an elem into the channel which would never get received. If it never gets received, it cannot get returned to the device pools. To work around this, we use a finalizer. When the channel can be GC'd, the finalizer drains any remaining elements from the channel and restores them to the device pool. After that change, peer.queue.RWMutex no longer makes sense where it is. It is only used to prevent concurrent calls to Start and Stop. Move it to a more sensible location and make it a plain sync.Mutex. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 13:02:52 -08:00
Josh Bleecher Snyder	57aadfcb14	device: create channels.go We have a bunch of stupid channel tricks, and I'm about to add more. Give them their own file. This commit is 100% code movement. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 12:38:19 -08:00

11 Commits