wireguard-go

mirror of https://git.zx2c4.com/wireguard-go synced 2025-09-18 20:57:50 +02:00

Author	SHA1	Message	Date
Jason A. Donenfeld	54dbe2471f	conn: reconstruct v4 vs v6 receive function based on symtab This is kind of gross but it's better than the alternatives. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-04-12 15:35:32 -06:00
Kristupas Antanavičius	d2fd0c0cc0	device: allocate new buffer in receive death spiral Note: this bug is "hidden" by avoiding "death spiral" code path by `6228659` ("device: handle broader range of errors in RoutineReceiveIncoming"). If the code reached "death spiral" mechanism, there would be multiple double frees happening. This results in a deadlock on iOS, because the pools are fixed size and goroutine might stop until somebody makes space in the pool. This was almost 100% repro on the new ARM Macbooks: - Build with 'ios' tag for Mac. This will enable bounded pools. - Somehow call device.IpcSet at least couple of times (update config) - device.BindUpdate() would be triggered - RoutineReceiveIncoming would enter "death spiral". - RoutineReceiveIncoming would stall on double free (pool is already full) - The stuck routine would deadlock 'device.closeBindLocked()' function on line 'netc.stopping.Wait()' Signed-off-by: Kristupas Antanavičius <kristupas.antanavicius@nordsec.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-04-12 11:14:53 -06:00
Josh Bleecher Snyder	10533c3e73	all: make conn.Bind.Open return a slice of receive functions Instead of hard-coding exactly two sources from which to receive packets (an IPv4 source and an IPv6 source), allow the conn.Bind to specify a set of sources. Beneficial consequences: * If there's no IPv6 support on a system, conn.Bind.Open can choose not to return a receive function for it, which is simpler than tracking that state in the bind. This simplification removes existing data races from both conn.StdNetBind and bindtest.ChannelBind. * If there are more than two sources on a system, the conn.Bind no longer needs to add a separate muxing layer. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-04-02 11:07:08 -06:00
Josh Bleecher Snyder	6228659a91	device: handle broader range of errors in RoutineReceiveIncoming RoutineReceiveIncoming exits immediately on net.ErrClosed, but not on other errors. However, for errors that are known to be permanent, such as syscall.EAFNOSUPPORT, we may as well exit immediately instead of retrying. This considerably speeds up the package device tests right now, because the Bind sometimes (incorrectly) returns syscall.EAFNOSUPPORT instead of net.ErrClosed. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-03-30 12:41:43 -07:00
Josh Bleecher Snyder	02e419ed8a	device: rename unsafeCloseBind to closeBindLocked And document a bit. This name is more idiomatic. Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-03-30 12:07:12 -07:00
Jason A. Donenfeld	5f0c8b942d	device: signal to close device in separate routine Otherwise we wind up deadlocking. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-03-11 09:29:10 -07:00
Jason A. Donenfeld	593658d975	device: get rid of peers.empty boolean in timersActive There's no way for len(peers)==0 when a current peer has isRunning==false. This requires some struct reshuffling so that the uint64 pointer is aligned. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-03-06 08:44:38 -07:00
Jason A. Donenfeld	3c11c0308e	conn: implement RIO for fast Windows UDP sockets Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-25 15:08:08 +01:00
Jason A. Donenfeld	f9dac7099e	global: remove TODO name graffiti Googlers have a habit of graffiting their name in TODO items that then are never addressed, and other people won't go near those because they're marked territory of another animal. I've been gradually cleaning these up as I see them, but this commit just goes all the way and removes the remaining stragglers. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-23 20:00:57 +01:00
Jason A. Donenfeld	9a29ae267c	device: test up/down using virtual conn This prevents port clashing bugs. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-23 20:00:57 +01:00
Jason A. Donenfeld	6603c05a4a	device: cleanup unused test components Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-23 20:00:57 +01:00
Jason A. Donenfeld	a4f8e83d5d	conn: make binds replaceable Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-23 20:00:57 +01:00
Jason A. Donenfeld	c69481f1b3	device: disable waitpool tests This code is stable, and the test is finicky, especially on high core count systems, so just disable it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-22 15:26:47 +01:00
Jason A. Donenfeld	8bf4204d2e	global: stop using ioutil Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-17 22:19:27 +01:00
Jason A. Donenfeld	4e439ea10e	conn: bump to 1.16 and get rid of NetErrClosed hack Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-16 21:05:25 +01:00
Jason A. Donenfeld	c7b7998619	device: remove old version file Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-12 17:59:50 +01:00
Jason A. Donenfeld	75e6d810ed	device: use container/list instead of open coding it This linked list implementation is awful, but maybe Go 2 will help eventually, and at least we're not open coding the hlist any more. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-10 18:19:11 +01:00
Jason A. Donenfeld	747f5440bc	device: retry Up() in up/down test We're losing our ownership of the port when bringing the device down, which means another test process could reclaim it. Avoid this by retrying for 4 seconds. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-10 01:01:37 +01:00
Jason A. Donenfeld	484a9fd324	device: flush peer queues before starting device In case some old packets snuck in there before, this flushes before starting afresh. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-10 00:39:28 +01:00
Jason A. Donenfeld	5bf8d73127	device: create peer queues at peer creation time Rather than racing with Start(), since we're never destroying these queues, we just set the variables at creation time. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-10 00:21:12 +01:00
Jason A. Donenfeld	587a2b2a20	device: return error from Up() and Down() Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-10 00:12:23 +01:00
Jason A. Donenfeld	6f08a10041	rwcancel: add an explicit close call This lets us collect FDs even if the GC doesn't do it for us. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 20:19:14 +01:00
Jason A. Donenfeld	da32fe328b	device: handshake routine writes into encryption queue Since RoutineHandshake calls peer.SendKeepalive(), it potentially is a writer into the encryption queue, so we need to bump the wg count. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 19:26:45 +01:00
Josh Bleecher Snyder	4eab21a7b7	device: make RoutineReadFromTUN keep encryption queue alive RoutineReadFromTUN can trigger a call to SendStagedPackets. SendStagedPackets attempts to protect against sending on the encryption queue by checking peer.isRunning and device.isClosed. However, those are subject to TOCTOU bugs. If that happens, we get this: goroutine 1254 [running]: golang.zx2c4.com/wireguard/device.(*Peer).SendStagedPackets(0xc000798300) .../wireguard-go/device/send.go:321 +0x125 golang.zx2c4.com/wireguard/device.(*Device).RoutineReadFromTUN(0xc000014780) .../wireguard-go/device/send.go:271 +0x21c created by golang.zx2c4.com/wireguard/device.NewDevice .../wireguard-go/device/device.go:315 +0x298 Fix this with a simple, big hammer: Keep the encryption queue alive as long as it might be written to. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-09 09:53:00 -08:00
Josh Bleecher Snyder	78ebce6932	device: only allocate peer queues once This serves two purposes. First, it makes repeatedly stopping then starting a peer cheaper. Second, it prevents a data race observed accessing the queues. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-09 18:33:48 +01:00
Josh Bleecher Snyder	cae090d116	device: clarify device.state.state docs (again) Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-09 18:29:01 +01:00
Josh Bleecher Snyder	465261310b	device: run fewer iterations in TestUpDown The high iteration count was useful when TestUpDown was the nexus of new bugs to investigate. Now that it has stabilized, that's less valuable. And it slows down running the tests and crowds out other tests. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-09 18:28:59 +01:00
Josh Bleecher Snyder	d117d42ae7	device: run fewer trials in TestWaitPool when race detector enabled On a many-core machine with the race detector enabled, this test can take several minutes to complete. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-09 18:28:58 +01:00
Josh Bleecher Snyder	ecceaadd16	device: remove nil elem check in finalizers This is not necessary, and removing it speeds up detection of UAF bugs. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-09 18:28:55 +01:00
Jason A. Donenfeld	9e728c2eb0	device: rename unsafeRemovePeer to removePeerLocked This matches the new naming scheme of upLocked and downLocked. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 16:11:33 +01:00
Jason A. Donenfeld	eaf664e4e9	device: remove deviceStateNew It's never used and we won't have a use for it. Also, move to go-running stringer, for those without GOPATHs. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:39:19 +01:00
Jason A. Donenfeld	a816e8511e	device: fix comment typo and shorten state.mu.Lock to state.Lock Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	02138f1f81	device: fix typo in comment Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	d7bc7508e5	device: fix alignment on 32-bit machines and test for it The test previously checked the offset within a substruct, not the offset within the allocated struct, so this adds the two together. It then fixes an alignment crash on 32-bit machines. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	d6e76fdbd6	device: do not log on idempotent device state change Part of being actually idempotent is that we shouldn't penalize code that takes advantage of this property with a log splat. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	6ac1240821	device: do not attach finalizer to non-returned object Before, the code attached a finalizer to an object that wasn't returned, resulting in immediate garbage collection. Instead return the actual pointer. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	4b5d15ec2b	device: lock elem in autodraining queue before freeing Without this, we wind up freeing packets that the encryption/decryption queues still have, resulting in a UaF. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	6548a682a9	device: remove listen port race in tests Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 15:37:04 +01:00
Jason A. Donenfeld	a60e6dab76	device: generate test keys on the fly Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2021-02-09 00:42:39 +01:00
Josh Bleecher Snyder	d8dd1f254f	device: remove mutex from Peer send/receive The immediate motivation for this change is an observed deadlock. 1. A goroutine calls peer.Stop. That calls peer.queue.Lock(). 2. Another goroutine is in RoutineSequentialReceiver. It receives an elem from peer.queue.inbound. 3. The peer.Stop goroutine calls close(peer.queue.inbound), close(peer.queue.outbound), and peer.stopping.Wait(). It blocks waiting for RoutineSequentialReceiver and RoutineSequentialSender to exit. 4. The RoutineSequentialReceiver goroutine calls peer.SendStagedPackets(). SendStagedPackets attempts peer.queue.RLock(). That blocks forever because the peer.Stop goroutine holds a write lock on that mutex. A background motivation for this change is that it can be expensive to have a mutex in the hot code path of RoutineSequential*. The mutex was necessary to avoid attempting to send elems on a closed channel. This commit removes that danger by never closing the channel. Instead, we send a sentinel nil value on the channel to indicate to the receiver that it should exit. The only problem with this is that if the receiver exits, we could write an elem into the channel which would never get received. If it never gets received, it cannot get returned to the device pools. To work around this, we use a finalizer. When the channel can be GC'd, the finalizer drains any remaining elements from the channel and restores them to the device pool. After that change, peer.queue.RWMutex no longer makes sense where it is. It is only used to prevent concurrent calls to Start and Stop. Move it to a more sensible location and make it a plain sync.Mutex. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 13:02:52 -08:00
Josh Bleecher Snyder	57aadfcb14	device: create channels.go We have a bunch of stupid channel tricks, and I'm about to add more. Give them their own file. This commit is 100% code movement. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 12:38:19 -08:00
Josh Bleecher Snyder	af408eb940	device: print direction when ping transit fails Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 12:01:08 -08:00
Josh Bleecher Snyder	15810daa22	device: separate timersInit from timersStart timersInit sets up the timers. It need only be done once per peer. timersStart does the work to prepare the timers for a newly running peer. It needs to be done every time a peer starts. Separate the two and call them in the appropriate places. This prevents data races on the peer's timers fields when starting and stopping peers. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 10:32:07 -08:00
Josh Bleecher Snyder	d840445e9b	device: don't track device interface state in RoutineTUNEventReader We already track this state elsewhere. No need to duplicate. The cost of calling changeState is negligible. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 10:32:07 -08:00
Josh Bleecher Snyder	675ff32e6c	device: improve MTU change handling The old code silently accepted negative MTUs. It also set MTUs above the maximum. It also had hard to follow deeply nested conditionals. Add more paranoid handling, and make the code more straight-line. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 10:32:07 -08:00
Josh Bleecher Snyder	3516ccc1e2	device: remove device.state.stopping from RoutineTUNEventReader The TUN event reader does three things: Change MTU, device up, and device down. Changing the MTU after the device is closed does no harm. Device up and device down don't make sense after the device is closed, but we can check that condition before proceeding with changeState. There's thus no reason to block device.Close on RoutineTUNEventReader exiting. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 10:32:07 -08:00
Josh Bleecher Snyder	0bcb822e5b	device: overhaul device state management This commit simplifies device state management. It creates a single unified state variable and documents its semantics. It also makes state changes more atomic. As an example of the sort of bug that occurred due to non-atomic state changes, the following sequence of events used to occur approximately every 2.5 million test runs: * RoutineTUNEventReader received an EventDown event. * It called device.Down, which called device.setUpDown. * That set device.state.changing, but did not yet attempt to lock device.state.Mutex. * Test completion called device.Close. * device.Close locked device.state.Mutex. * device.Close blocked on a call to device.state.stopping.Wait. * device.setUpDown then attempted to lock device.state.Mutex and blocked. Deadlock results. setUpDown cannot progress because device.state.Mutex is locked. Until setUpDown returns, RoutineTUNEventReader cannot call device.state.stopping.Done. Until device.state.stopping.Done gets called, device.state.stopping.Wait is blocked. As long as device.state.stopping.Wait is blocked, device.state.Mutex cannot be unlocked. This commit fixes that deadlock by holding device.state.mu when checking that the device is not closed. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 10:32:07 -08:00
Josh Bleecher Snyder	da95677203	device: remove unnecessary zeroing in peer.SendKeepalive elem.packet is always already nil. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 10:14:17 -08:00
Josh Bleecher Snyder	9c75f58f3d	device: remove device.state.stopping from RoutineHandshake It is no longer necessary. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 08:18:32 -08:00
Josh Bleecher Snyder	84a42aed63	device: remove device.state.stopping from RoutineDecryption It is no longer necessary, as of 454de6f3e64abd2a7bf9201579cd92eea5280996 (device: use channel close to shut down and drain decryption channel). Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>	2021-02-08 08:18:32 -08:00


204 Commits