Sun, 02 Oct 2022 UTC

# UDP tee

## The problem

I had to migrate an old logging system to a new logging system, where one logging source was emitting syslog over UDP. During the migration we needed to keep the old system in place, and in parallel set up the new system for validation and UAT before finally cutting over. The simplest solution I could think of was to duplicate the UDP syslog packets and send them to both logging systems simultaneously. In other words, “UDP tee”.

It was a simple concept but the implementation was trickier. This being 2020 the whole system (both logs source and sinks) was running on Kubernetes. We did not have access to dedicated network hardware, or a service mesh, to do any kind of port or packet mirroring.

Ideally we would be able to run a regular Kubernetes service which would take a single source of UDP packets and duplicate it to multiple sinks. So we needed to run this in a container, which means userspace only - no kernel-level netfilter tricks.

## A poor solution

Google doesn’t have great results for “udp tee”, and the top few userspace solutions seem to use combinations of tee and socat similar to this:

socat - udp-recvfrom:$port,fork | tee >(socat - udp-sendto:$dest1) >(socat - udp-sendto:$dest2)

I created a logs-tee container with this solution for testing; however, it had significant problems.

### Stream conversion

By piping the UDP data from one socat process to another, the incoming UDP datagrams are converted to a stream (the pipe) before being converted back into datagrams to be emitted to the two destinations. The problem is that datagrams have boundaries, and those boundaries are lost in the conversion to a stream: the sending socat processes simply re-packetise whatever chunks they happen to read from the pipe. socat transfers blocks of up to 8192 bytes by default, so that is also the maximum size of the emitted UDP datagrams, as per the man page:

… At most <size> bytes are transferred per step. Default is 8192 bytes.
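
For illustration, raising socat’s block size (the -b option) only changes the chunk size; because the pipe is still a byte stream, the original datagram boundaries are not restored:

# bigger blocks, but the pipe still has no message boundaries
socat -b 65535 - udp-recvfrom:$port,fork | tee >(socat -b 65535 - udp-sendto:$dest1) >(socat -b 65535 - udp-sendto:$dest2)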

This conflicts with RFC 5426 § 3.1, which states:

Each syslog UDP datagram MUST contain only one syslog message, which MAY be complete or truncated. … Additional data MUST NOT be present in the datagram payload.

In this pcap of packets egressing logs-tee, the first packet to each destination is the goben control packet, and the following packets are throughput testing data. By default goben uses 64000-byte datagrams, so you can see how badly chopped up (and irregular) these are.

No.  Time      Source      Destination  Protocol  Length  Info            
1    0.000000  172.23.0.2  172.23.0.3   UDP       287     56565  →  8080  Len=245
2    0.000003  172.23.0.2  172.23.0.4   UDP       287     56565  →  8080  Len=245
3    0.000224  172.23.0.2  172.23.0.3   UDP       4138    56565  →  8080  Len=4096
4    0.000252  172.23.0.2  172.23.0.4   UDP       8234    56565  →  8080  Len=8192
5    0.000271  172.23.0.2  172.23.0.3   UDP       4138    56565  →  8080  Len=4096
6    0.005674  172.23.0.2  172.23.0.4   UDP       7210    56565  →  8080  Len=7168
7    0.005676  172.23.0.2  172.23.0.3   UDP       7210    56565  →  8080  Len=7168
8    0.005697  172.23.0.2  172.23.0.4   UDP       1066    56565  →  8080  Len=1024
9    0.005704  172.23.0.2  172.23.0.3   UDP       1066    56565  →  8080  Len=1024
10   0.011586  172.23.0.2  172.23.0.4   UDP       8234    56565  →  8080  Len=8192
11   0.011602  172.23.0.2  172.23.0.3   UDP       8234    56565  →  8080  Len=8192
12   0.016175  172.23.0.2  172.23.0.3   UDP       5162    56565  →  8080  Len=5120
13   0.016199  172.23.0.2  172.23.0.4   UDP       8234    56565  →  8080  Len=8192
14   0.016214  172.23.0.2  172.23.0.3   UDP       3114    56565  →  8080  Len=3072
15   0.021850  172.23.0.2  172.23.0.3   UDP       8234    56565  →  8080  Len=8192
16   0.021856  172.23.0.2  172.23.0.4   UDP       8234    56565  →  8080  Len=8192
17   0.026482  172.23.0.2  172.23.0.3   UDP       8234    56565  →  8080  Len=8192
18   0.026502  172.23.0.2  172.23.0.4   UDP       8234    56565  →  8080  Len=8192
19   0.032363  172.23.0.2  172.23.0.4   UDP       8234    56565  →  8080  Len=8192
20   0.032377  172.23.0.2  172.23.0.3   UDP       8234    56565  →  8080  Len=8192
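
For reference, a capture like the one above can be produced with tcpdump; something along these lines would work (the interface name and port here are assumptions for this sketch):

# capture logs-tee egress for inspection in Wireshark
tcpdump -i eth0 -w logs-tee.pcap udp port 8080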

Indeed the logs sink did not correctly parse logs using this solution.

### Buffers and packet loss

By piping to tee there were now multiple buffers that could fill:

- the UDP receive buffer of the socket the first socat reads from,
- the pipe from that socat into tee,
- the pipes from tee into each sending socat, and
- the UDP send buffers of the sockets the sending socats write to.

If any of these buffers filled, it would stall the pipeline and cause incoming packets to be dropped. And indeed we saw significant packet loss using this solution.
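
When diagnosing this kind of loss, the kernel’s UDP counters are a useful signal: once the pipeline stalls, the first socat stops draining its socket and the receive buffer overflows. Inside the container, something like the following shows those overflows (assuming net-tools or procfs is available):

# “packet receive errors” / RcvbufErrors grow when the UDP socket buffer overflows
netstat -su
# the raw counters, without net-tools
cat /proc/net/snmp | grep '^Udp:'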

### Performance

Trying to benchmark this system using goben didn’t even work - the server just hung without printing any statistics.

## A better solution

It’s clear we need to keep datagram boundaries. One way to do that would be to avoid transforming the flow of logs from datagrams to a stream. If we write to the localhost broadcast address we can then have multiple listeners, and keep the datagram boundaries! I created a container with two socat processes:

Process 1 reads datagrams off a regular UDP port, and sends them to 127.255.255.255 (localhost broadcast).

# 1111 listen port, and 9999 broadcast port are both arbitrary
socat -u "udp-recvfrom:1111,fork" udp-sendto:127.255.255.255:9999,broadcast

Process 2 reads datagrams from the broadcast, and sends them to a UDP unicast address.

# 192.168.77.66:514 is an arbitrary destination address and port
socat -u udp-recvfrom:9999,reuseaddr,fork udp-sendto:192.168.77.66:514

Any number of processes can receive the broadcast and forward packets to a different address. So process 2 can be duplicated any number of times for an arbitrary number of endpoints.
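
For instance, a third process forwarding the same broadcast to a second (equally arbitrary) destination is just another copy of process 2 with a different target:

# Process 3: another forwarder reading the same broadcast port
socat -u udp-recvfrom:9999,reuseaddr,fork udp-sendto:192.168.88.77:514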

### Datagram retention

The only things which could overflow here are UDP socket buffers. And the datagram boundaries are retained as expected:

No.  Time      Source      Destination  Protocol  Length  Info            
1    0.000000  172.23.0.2  172.23.0.3   UDP       287     56565  →  8080  Len=245
2    0.000195  172.23.0.2  172.23.0.4   UDP       287     56565  →  8080  Len=245
3    0.000860  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
4    0.000913  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
5    0.005504  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
6    0.005515  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
7    0.014920  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
8    0.015292  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
9    0.021584  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
10   0.021698  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
11   0.025073  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
12   0.025485  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
13   0.031647  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
14   0.031682  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
15   0.034086  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
16   0.034146  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
17   0.038558  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
18   0.038708  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000
19   0.043166  172.23.0.2  172.23.0.3   UDP       64042   56565  →  8080  Len=64000
20   0.043350  172.23.0.2  172.23.0.4   UDP       64042   56565  →  8080  Len=64000

### Performance

I wrapped this up in a container and ran goben over this system. With one client and two servers I was able to get 100 Mbit/s to both servers without packet loss. 100 Mbit/s is plenty of performance for my use-case, plus this service is perfectly horizontally scalable. I can just replicate the pod a few times if packet loss becomes a problem!
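
As a sketch, assuming logs-tee runs as a Kubernetes Deployment behind a UDP Service (the deployment name here is assumed), scaling it out is a single command:

# add replicas if a single pod starts dropping packets
kubectl scale deployment/logs-tee --replicas=3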

goben client:

$ docker run  -it --rm --name=goben-client --network iperf goben -tls=false -passiveServer -udp -hosts logs-tee -maxSpeed 100
2022/09/23 15:11:32 goben version 0.6 runtime go1.18beta1 GOMAXPROCS=8 OS=linux arch=amd64
2022/09/23 15:11:32 connections=1 defaultPort=:8080 listeners=[":8080"] hosts=["logs-tee"]
2022/09/23 15:11:32 reportInterval=2s totalDuration=10s
2022/09/23 15:11:32 client mode, udp protocol
2022/09/23 15:11:32 open: opening TLS=false udp 0/1: logs-tee:8080
2022/09/23 15:11:32 handleConnectionClient: starting TCP 0/1 172.23.0.4:8080
2022/09/23 15:11:32 handleConnectionClient: options sent: {2s 10s 1000000 1000000 64000 64000 true 100 map[]}
2022/09/23 15:11:32 clientWriter: starting: 0/1 172.23.0.4:8080
2022/09/23 15:11:32 clientReader: starting: 0/1 172.23.0.4:8080
2022/09/23 15:11:34 0/1  report   clientWriter rate:    100 Mbps    195 snd/s
2022/09/23 15:11:36 0/1  report   clientWriter rate:    100 Mbps    195 snd/s
2022/09/23 15:11:38 0/1  report   clientWriter rate:    100 Mbps    195 snd/s
2022/09/23 15:11:40 0/1  report   clientWriter rate:    100 Mbps    195 snd/s
2022/09/23 15:11:42 handleConnectionClient: 10s timer
2022/09/23 15:11:42 workLoop: 0/1 clientReader: read udp 172.23.0.5:50372->172.23.0.4:8080: use of closed network connection
2022/09/23 15:11:42 0/1 average   clientReader rate:      0 Mbps      0 rcv/s
2022/09/23 15:11:42 clientReader: exiting: 0/1 172.23.0.4:8080
2022/09/23 15:11:42 workLoop: 0/1 clientWriter: write udp 172.23.0.5:50372->172.23.0.4:8080: use of closed network connection
2022/09/23 15:11:42 0/1 average   clientWriter rate:    100 Mbps    195 snd/s
2022/09/23 15:11:42 clientWriter: exiting: 0/1 172.23.0.4:8080
2022/09/23 15:11:42 172.23.0.4:8080 output:
 100 ┤                                                                 ╭─── 
 100 ┤                                                             ╭───╯    
 100 ┤                                                        ╭────╯        
 100 ┤                                                   ╭────╯             
 100 ┤                                              ╭────╯                  
 100 ┤                                          ╭───╯                       
 100 ┤                                     ╭────╯                           
 100 ┤                                 ╭───╯                                
 100 ┤                            ╭────╯                                    
 100 ┤                       ╭────╯                                         
 100 ┼───────────────────────╯                                              
        Output Mbps: 172.23.0.4:8080 Connection 0
2022/09/23 15:11:42 handleConnectionClient: closing: 0/1 172.23.0.4:8080
2022/09/23 15:11:42 aggregate reading: 0 Mbps 0 recv/s
2022/09/23 15:11:42 aggregate writing: 100 Mbps 195 send/s

goben server:

$ docker run  -it --rm --name=goben-server-0 --network iperf goben -passiveServer -tls=false
2022/09/23 15:14:33 goben version 0.6 runtime go1.18beta1 GOMAXPROCS=8 OS=linux arch=amd64
2022/09/23 15:14:33 connections=1 defaultPort=:8080 listeners=[":8080"] hosts=[]
2022/09/23 15:14:33 reportInterval=2s totalDuration=10s
2022/09/23 15:14:33 server mode (use -hosts to switch to client mode)
2022/09/23 15:14:33 listenTCP: TLS=false spawning TCP listener: :8080
2022/09/23 15:14:33 serve: spawning UDP listener: :8080
2022/09/23 15:14:38 handleUDP: incoming: 172.23.0.4:56565
2022/09/23 15:14:38 handleUDP: options received: {2s 10s 1000000 1000000 64000 64000 true 100 map[]}
2022/09/23 15:14:40 0/0  report      handleUDP rate:    100 Mbps    195 rcv/s
2022/09/23 15:14:42 0/0  report      handleUDP rate:    100 Mbps    195 rcv/s
2022/09/23 15:14:44 0/0  report      handleUDP rate:    100 Mbps    195 rcv/s
2022/09/23 15:14:46 0/0  report      handleUDP rate:    100 Mbps    195 rcv/s
2022/09/23 15:14:48 handleUDP: total duration 10s timer: 172.23.0.4:56565
2022/09/23 15:14:48 0/0 average      handleUDP rate:     99 Mbps    195 rcv/s

Here’s a trivial Dockerfile for goben, and a Dockerfile and entrypoint.sh for a logs-tee service.

Note that goben relies on the 5-tuple for identifying a client. For that reason, when benchmarking, the entrypoint.sh binds to a socket using sourceport=56565,reuseaddr for logs-tee egress so that goben recognises logs-tee as a single client. For the actual logging use-case those options are not required and we can rely on ephemeral ports.
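
As a rough illustration only (not the exact files referenced above), an entrypoint.sh for the benchmarking setup could look something like the sketch below; the listen port matches the goben runs, but the destination hostnames are assumptions:

#!/bin/sh
# sketch of a logs-tee entrypoint for the benchmarking setup
# DEST1/DEST2 are assumed placeholders for the two goben servers
DEST1="goben-server-0:8080"
DEST2="goben-server-1:8080"

# fan-in: receive on the public port and rebroadcast on loopback
socat -u "udp-recvfrom:8080,fork" udp-sendto:127.255.255.255:9999,broadcast &

# fan-out: one forwarder per destination; sourceport/reuseaddr fix the
# 5-tuple so goben sees logs-tee as a single client (benchmarking only)
socat -u "udp-recvfrom:9999,reuseaddr,fork" "udp-sendto:$DEST1,sourceport=56565,reuseaddr" &
socat -u "udp-recvfrom:9999,reuseaddr,fork" "udp-sendto:$DEST2,sourceport=56565,reuseaddr" &

wait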

tags : network bash k8s container