Mesh VPNs (like Tailscale/Netmaker/NetBird/Nebula/...) are awesome. Being able to just reach my devices no matter where they are is like magic! They automatically figure out routing, do NAT hole punching and even provide a TURN server as a last-resort fallback.
All that said, every single one of those services has a common flaw: they are designed for corporations.
Polycule scale software
Software designed for corporations tends to be able to make some convenient assumptions around ownership. In a corporate setting the corporation owns every device, and gets to dictate their configurations. A device is only ever going to be a member of a single organization and have one main instance of software they're using.
This is not how individuals work. As an individual I want to own my device, and still use my friends' services. Ideally I should be able to do this without having to juggle several different accounts and multiple browser profiles/domains.
Tailscale is not polycule scale
Everyone has a single network. You can technically add other devices to your network, but it kinda sucks: there's only a single network per account, and firewall rules are managed by the network admin.
Yggdrasil is not a mesh VPN
Yggdrasil is quite cool. It creates an overlay network over the existing internet infrastructure, automatically assigning every peer a unique IPv6 address derived from their private key, and creating secure encrypted connections between them.
Despite this, Yggdrasil is just not designed to be used as a Tailscale-like mesh VPN. It's a PoC routing protocol designed to ultimately be able to replace the existing network architecture, which means it runs over TCP, doesn't set up NAT hole punching and in general isn't very focused on performance.
Introducing polynet
The pitch of polynet is simple: tailscale for polycules.
Imagine if you could connect to my laptop by just using "toothbrush.badat.dev" as the name. Imagine writing firewall rules based on domain names (e.g. *.badat.dev can talk to port 80 on my machine). A fast network that allows you to connect to your own and your friends' devices using friendly names, built on existing internet infrastructure (DNS).
That's the networking dream I'd like to fulfill with polynet.
Architecture
Polynet is divided into two protocols:
- The signaling protocol
Before peers can establish a full WireGuard connection they need some sort of channel to communicate with each other. This part of the protocol is responsible for discovering peers and validating their identities.
- The connection management protocol
This is the simpler of the two. Once two peers have established the signaling channel they need to establish and manage a full WireGuard connection. This part of the protocol is responsible for allocating IP addresses, finding the best network path (including potentially NAT hole punching) and handling reconnections.
Signaling protocol
In order to establish a connection the connecting peer ("top") needs some way to reach the peer being connected to ("bottom").
To do this, a communication server on the public internet ("bartender") is used.
Every peer on the network needs to have a DNS TXT record containing their public keys and the address where their bartender is reachable. A bartender can also function as a DNS server, automatically setting up a domain for a bottom on connection. A similar dynamic approach can be used to do other stupid stuff, like making every Kubernetes pod reachable via polynet.
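The exact layout of that TXT record isn't decided anywhere in this post. As a rough sketch, assume a made-up `name=value` format separated by semicolons, with hypothetical fields `v`, `key` (repeatable) and `bartender`:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PeerRecord:
    public_keys: Tuple[str, ...]  # keys used to verify the peer's packets
    bartender: str                # host:port where the peer's bartender is reachable


def parse_peer_record(txt: str) -> PeerRecord:
    """Parse a hypothetical 'v=polynet1; key=...; bartender=host:port' TXT value."""
    version = None
    bartender = None
    keys = []
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only, so base64 padding in keys survives.
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {part!r}")
        name, value = name.strip(), value.strip()
        if name == "v":
            version = value
        elif name == "key":
            keys.append(value)
        elif name == "bartender":
            bartender = value
        # Unknown fields are ignored so the format can grow later.
    if version != "polynet1":
        raise ValueError("not a polynet record")
    if not keys or bartender is None:
        raise ValueError("record needs at least one key and a bartender")
    return PeerRecord(tuple(keys), bartender)
```

A record for my laptop could then look like `v=polynet1; key=<base64 key>; bartender=bar.badat.dev:4433`, where the bartender hostname and port are invented for the example.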
Due to the untrustworthiness of DNS records, the polynet daemon should ideally require the records to be signed with DNSCrypt. This is still a less than ideal solution, so it might be worth considering replacing DNSCrypt and reusing the standard TLS CA ecosystem instead (although that comes with a lot of usability issues).
In order to connect, the top connects to the bottom's bartender using a TLS socket and sends an intro packet. This packet is signed with the top's private key (so it can be checked against their published public key), and contains:
- the top's name
- the bottom's name
- the top's WireGuard public key
The bartender validates the signature on the intro packet and performs additional checks, such as checking the top's name against an ACL.
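Here is a minimal sketch of what the bartender's side of this could look like. The wire encoding (compact JSON with sorted keys) and the exact-match ACL are my assumptions, and the signature scheme isn't chosen in this post, so the `verify` function is passed in rather than tied to a specific algorithm:

```python
import json
from dataclasses import dataclass
from typing import Callable, Set

# verify(public_key, message, signature) -> bool
# Hypothetical hook: plug in whatever signature scheme polynet ends up using.
Verifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class IntroPacket:
    top_name: str
    bottom_name: str
    top_wireguard_key: str

    def signed_bytes(self) -> bytes:
        """Canonical bytes the top signs (sorted-key JSON, an assumption)."""
        body = {
            "top": self.top_name,
            "bottom": self.bottom_name,
            "wg": self.top_wireguard_key,
        }
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def bartender_accepts(
    packet: IntroPacket,
    signature: bytes,
    top_public_key: bytes,
    allowed_tops: Set[str],
    verify: Verifier,
) -> bool:
    """Check the signature first, then the top's name against the ACL."""
    if not verify(top_public_key, packet.signed_bytes(), signature):
        return False
    return packet.top_name in allowed_tops
```

The `top_public_key` here would come from the top's DNS record. Because the signed bytes are deterministic, the bottom can run the same signature check again when the packet is forwarded to it.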
Note that this means the bartender server is able to know who connects to the bottom. This is unfortunately necessary to prevent malicious actors from violating the bottom's privacy (discovering whether the bottom is online, or getting their IP by getting them to perform DNS lookups against potentially malicious DNS servers) as well as draining the battery of mobile devices.
After the bottom receives the intro packet they need to reverify the packet's validity (to avoid impersonation by malicious bartenders). When resolving the DNS record the bottom should consider some form of TOFU authentication, as well as allowing keys to be specified statically for known peers.
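TOFU plus static keys could be as small as the sketch below. The class and method names are hypothetical, and a real daemon would persist the pinned keys to disk instead of keeping them in memory:

```python
from typing import Dict, Optional


class KeyStore:
    """Trust-on-first-use key pinning with optional statically configured keys."""

    def __init__(self, static_keys: Optional[Dict[str, str]] = None):
        self.static = dict(static_keys or {})  # name -> key, set by the user
        self.seen: Dict[str, str] = {}         # name -> key, learned on first use

    def check(self, name: str, key: str) -> bool:
        """Return True if `key` is acceptable for the peer called `name`."""
        if name in self.static:
            # Static configuration always wins over anything seen in DNS.
            return self.static[name] == key
        # First use: remember the key. Later uses: it has to match.
        pinned = self.seen.setdefault(name, key)
        return pinned == key
```

With this, a key that changes after the first contact gets rejected until the user explicitly updates it, which is the same trade-off SSH makes with its known hosts file.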
If the bottom accepts the connection they can reply with a signed accept packet, containing their WireGuard public key as well as a public ip:port pair the top can use to reach them over WireGuard. Mobile devices can use a TURN server (which might be provided by the bartender) to obtain an ip:port pair their peers can use.
The connection management daemon should now be notified about the new peer.
Connection management
After an ip:port pair is obtained the two peers can establish a WireGuard connection. First an IP address needs to be assigned. This is a perfect use case for the IPv6 ORCHID prefix. There is a consideration to be had here about whether IP assignments should be the same across all peers (e.g. by hashing the pubkey) or if they should differ between peers.
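For the "same across all peers" option, hashing the pubkey into the prefix could look roughly like this. It is a simplified sketch: it uses the ORCHIDv2 prefix 2001:20::/28 from RFC 7343, but it fills all 100 remaining bits with a plain SHA-256 hash instead of following the RFC's full construction (which also involves a context ID and an algorithm field). The function name is made up:

```python
import hashlib
import ipaddress

# ORCHIDv2 prefix as allocated in RFC 7343.
ORCHID_PREFIX = ipaddress.IPv6Network("2001:20::/28")


def address_for_key(public_key: bytes) -> ipaddress.IPv6Address:
    """Derive a stable IPv6 address from a public key (simplified, not RFC-exact)."""
    digest = hashlib.sha256(public_key).digest()
    host_bits = 128 - ORCHID_PREFIX.prefixlen  # 100 bits left after the prefix
    # Keep the top `host_bits` bits of the 256-bit hash.
    suffix = int.from_bytes(digest, "big") >> (256 - host_bits)
    return ipaddress.IPv6Address(int(ORCHID_PREFIX.network_address) | suffix)
```

Every peer computing `address_for_key` on the same key gets the same address, so firewall rules and logs line up across machines. The cost is that an address reveals which key it belongs to.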
With an IPv6 address assigned, firewall rules should be applied based on the peer's hostname. Additionally, a form of NAT64 can be set up now if IPv4 connectivity is needed.
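Hostname-based rules such as "*.badat.dev can talk to port 80" could be evaluated along these lines. The rule shape and the default-deny policy are assumptions, and a real implementation would translate the result into actual firewall rules for the peer's assigned address:

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    peer_pattern: str  # glob over the peer's verified name, e.g. "*.badat.dev"
    port: int          # local port the matching peers may reach


def is_allowed(rules: Iterable[Rule], peer_name: str, port: int) -> bool:
    """Default deny: allow only if some rule matches both the name and the port."""
    name = peer_name.lower()  # DNS names are case-insensitive
    return any(
        rule.port == port and fnmatchcase(name, rule.peer_pattern.lower())
        for rule in rules
    )
```

Note that the glob `*` also matches dots, so `*.badat.dev` covers `a.b.badat.dev` but not the bare `badat.dev`.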
When the firewall rules are in place the peers can finally establish a WireGuard connection.
Once a connection is established, each of the two peers should run a daemon that is accessible by other peers over the VPN connection and can be used to negotiate a better connection.
At its most basic the protocol should just allow peers to inform each other of IPs they can be found at. This should be enough for simple NAT hole punching with a STUN server, with extensions allowing for more advanced techniques in the future.
When multiple connections are available the daemon should monitor them and automatically attempt to select the best one. Exact techniques here are going to depend on the implementation. When implementing this it should be possible to reuse the logic from other mesh VPNs.
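To give a feel for the selection step, here is one possible policy, sketched under my own assumptions: prefer direct paths over relayed ones, then pick the lowest measured round-trip time. The `Candidate` type and its fields are hypothetical, and how the RTT gets measured is left out:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    endpoint: str             # "ip:port" the peer can be reached at
    relayed: bool             # True if the path goes through a relay (e.g. TURN)
    rtt_ms: Optional[float]   # latest measured round trip, None if unreachable


def pick_best(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Pick a direct path if any is reachable, otherwise the fastest relayed one."""
    usable = [c for c in candidates if c.rtt_ms is not None]
    if not usable:
        return None
    # False sorts before True, so direct paths win; RTT breaks ties.
    return min(usable, key=lambda c: (c.relayed, c.rtt_ms))
```

A real daemon would probably add some hysteresis so that it doesn't flap between two paths with similar latency.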
Sounds neat! What now?
To be honest I don't know. Thanks for reading my infodump though.
I wrote this blog post because I've had this design rummaging around in my brain for months and I needed to get it out there. Whenever I attempted to build the thing I kept bikeshedding small details over and over again, meaning I struggled to make meaningful progress, on top of simply not having enough engineering skill and time to build this by myself.
I love the idea of having a network that works like this. I really hope someone can pick this up and make something that's at least similar to this. If you do start work on this let me know. I'd love to contribute.