
# WireGuard and the Linux Networking Subsystem

At the end of this post you’ll find a crude list of the tasks I accomplished over the summer, plus a conclusion about GSoC et cetera. Everything before that is about WireGuard, the Linux kernel itself, and my GRO research overall, larger in scope and with more explanation, so that this blog post stays interesting even to people without in-depth knowledge of networking or WireGuard.

This Preface looked way too serious, nah? Let’s get back to my old silly style of writing for what’s next!

# Intro to WireGuard

Yeah, I know, boring stuff, but we gotta go through it for those who don’t know.

So WireGuard, for those who can’t read its website, is a simple layer 3 VPN protocol which aims to be secure, sneaky and simple. In a nutshell: you exchange public keys à la SSH, each peer’s key gets an Allowed IPs list, and together these entries form a Cryptokey Routing Table, which decides which peer each packet is encrypted to and lets you connect to your servers.
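To make that concrete, here is a minimal wg-quick style configuration sketch for a client; the keys, addresses and endpoint are placeholders, not real values:

```ini
[Interface]
# Our own private key (placeholder)
PrivateKey = <client-private-key>
Address = 10.0.0.2/32

[Peer]
# The server's public key (placeholder)
PublicKey = <server-public-key>
Endpoint = vpn.example.com:51820
# Cryptokey routing: packets destined to these ranges
# are encrypted to this peer
AllowedIPs = 0.0.0.0/0
```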

What’s great about this, however, is that on top of being more efficient than older protocols, it is actually much smaller too. Take for example this Lines of Code (LoC) count from the latest WireGuard git at the time of writing, with tests, compatibility hacks and crypto removed:

```
➜  src git:(grt/dql) ✗ cloc *
34 text files.
33 unique files.
21 files ignored.

github.com/AlDanial/cloc v 1.76  T=1.01 s (27.8 files/s, 5040.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                               14            623            298           3294
C/C++ Header                    13            133             77            574
make                             1             20              3             63
-------------------------------------------------------------------------------
SUM:                            28            776            378           3931
-------------------------------------------------------------------------------
```


If that isn’t directly obvious to you: having an in-kernel VPN implementation that small gives it a much smaller attack surface, which contributes to security and, in our case, to performance too, though you can’t really derive performance from a LoC count alone.

So yay, got through my propaganda? Now the real talk: while the project is fancy and all, I was contracted to fix several TODOs to make WireGuard’s acceptance into mainline Linux faster. As a matter of fact, it is currently on the Linux Kernel Mailing List and Linus Torvalds himself is pressing for its adoption, so I’d say we’re pretty good so far!

Since this was contract work and I spent most of my summer on ONE task, I’m going to talk about that one instead of going through the codebase explaining what could or couldn’t be better, because:

1. it’s easier for me
2. I’m going to get yelled at for ranting like an old man about stuff I think could be improved while missing the critical knowledge of Linux needed to understand why that would be dumb.

So this big thing that I’ve spent almost all my summer on is called GRO, short for Generic Receive Offload. To understand what it does, let’s talk about Internet protocols (bear with me through this crazy story, or just skip the next paragraphs if you already know everything):

A long time ago, there lived a messenger, and angry people wanting to talk across the country. After a bit of inner thinking and pondering about the feasibility of teleportation, he settled that before anything he needed to identify everyone and find them, and so he gave them addresses, which he called IPs. Those were arranged so that he could easily find a house without knowing any map of the country. Now that the messenger had given everyone an IP, he could finally help the villagers send their packages! So he got a bag into which he threw everyone’s packages, recruited several other messengers to deliver packages with him, and split the big packages into packets. Then the messengers ran as fast as they could to the specified addresses to deliver those packets, a scheme he called UDP (the guy had a knack for weird names).
Unfortunately, UDP had issues: the messengers were running so fast that sometimes they lost packets, and there was no real delivery order, so packets could be received out of order, which was inadmissible for our needy citizens. Our messenger then had a brilliant idea, which he called TCP: he numbered each package, so everyone knew which one came first, and he would make sure the recipient received each package, otherwise he would replicate it himself and deliver it again.
But, alas, there was yet another issue: what happened to your expensive huge Chinese vase that was broken into pieces for easier delivery? Well, the deliveryman had yet another trick up his sleeve: a glue that pieces every packet back together, lowering the work needed from the recipient; this is GRO.

Understood that weird story? No? Fine, then let’s continue!
To drop the eccentric stories: GRO is very useful in our case since every small packet carries a checksum to verify its integrity, i.e. that the data is indeed correct. But that verification is expensive, computation-wise, and so GRO was born: we go from a theoretical cost of $$O(n \times c)$$ to $$O(n + c)$$, where $$n$$ is the number of packets and $$c$$ is the fixed overhead of each call to the checksum verification function.

To go back to the original topic: for WireGuard this would let us take a bunch of packets encrypted with the same peer key and handle them together, which is where the performance improvement would come from. Unfortunately, GRO in its original TCP implementation actually concatenates packets, getting rid of their headers, which would leave us unable to identify the separate payloads to decrypt. Hence we need to dig deeper into…

# The Linux Networking Subsystem

Before continuing further, I’d like you to acknowledge that WireGuard transmits everything through UDP, so the TCP comment I made above doesn’t really hold for WireGuard, and as such more research on how GRO actually behaves inside the Linux kernel was needed. Which also meant dealing with a bunch of dragons, which is always fun. So we begin looking at the offload definitions called during interrupts, starting with udp4_gro_receive: hmmm, the most it does is call udp_gro_receive, so let’s go down the rabbit hole. This one sets the same_flow variable, aight aight, but what does this call_gro_receive_sk do and return? Oh, basically nothing, it just sets a value or returns a list, doesn’t seem to be helpful…