Relay Protocol (UDP Audio)

The relay protocol is a minimal, connection-less UDP protocol that runs on a
single port (default 5005). It carries two independent traffic directions:

  • RX direction — mixer fans out post-mix audio to relay clients
    (shiloh-relay, shiloh-web-relay, Pi relays)
  • TX direction — a broadcaster (shiloh-broadcaster) sends captured audio
    upstream to the mixer for insertion into the mix bus

Both directions use the same UDP socket on the mixer. Distinct packet tag
ranges prevent collisions.

Packet type table

Tag   Name         Direction            Total size    Description
0x01  REGISTER     client → mixer       3 + name_len  RX: relay client registers for audio
0x02  ACCEPT       mixer → client       13            RX: session accepted, parameters follow
0x03  REJECT       mixer → client       2             RX: registration refused
0x04  AUDIO        mixer → client       9 + payload   RX: audio data packet
0x05  PING         client → mixer       5             Heartbeat (both directions)
0x06  PONG         mixer → client       5             Heartbeat reply
0x07  BYE          client → mixer       5             Clean session teardown (both directions)
0x10  REGISTER_TX  broadcaster → mixer  4 + name_len  TX: broadcaster registers for ingest
0x11  ACCEPT_TX    mixer → broadcaster  15            TX: ingest accepted, slot assignment follows
0x12  REJECT_TX    mixer → broadcaster  2             TX: ingest registration refused
0x13  AUDIO_TX     broadcaster → mixer  10 + payload  TX: audio data from broadcaster

All integers are little-endian. All audio samples are S16LE.


RX direction — relay client packets

0x01 REGISTER (client → mixer)

Sent by a relay client to announce itself and request audio.

offset  size  field
     0     1  type = 0x01
     1     1  version  (u8; current = 2, min accepted = 1)
     2     1  name_len (u8; 0..32)
     3  name_len  name (UTF-8, no NUL; informational label for logs/UI)

Minimum size: 3 bytes. Maximum size: 35 bytes.

The name is used by the mixer to look up a persisted feed assignment. On
first contact the mixer records the name with a default assignment of main.

Example (name = “pi-kitchen”, version = 2):

01 02 0a 70 69 2d 6b 69 74 63 68 65 6e
^^                                        type = REGISTER
   ^^                                     version = 2
      ^^                                  name_len = 10
         ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^  "pi-kitchen"

Response: ACCEPT (0x02) or REJECT (0x03).
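
As a minimal sketch of the layout above, the following builds a REGISTER packet. The function name encode_register is hypothetical and not taken from the protocol crate; the 32-byte limit and the no-NUL rule come from the field table.

```rust
/// Hypothetical helper: build a REGISTER (0x01) packet.
/// Returns None if the name exceeds the 32-byte wire limit or contains NUL.
fn encode_register(version: u8, name: &str) -> Option<Vec<u8>> {
    let bytes = name.as_bytes();
    if bytes.len() > 32 || bytes.contains(&0) {
        return None;
    }
    let mut pkt = Vec::with_capacity(3 + bytes.len());
    pkt.push(0x01);               // type = REGISTER
    pkt.push(version);            // protocol version
    pkt.push(bytes.len() as u8);  // name_len (fits in u8 after the check above)
    pkt.extend_from_slice(bytes); // name, UTF-8, no terminator
    Some(pkt)
}
```

Calling encode_register(2, "pi-kitchen") should reproduce the 13-byte example shown above.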


0x02 ACCEPT (mixer → client)

Server response to a successful REGISTER. The client must store session_id
and echo it in every subsequent PING and BYE.

offset  size  field
     0     1  type = 0x02
     1     1  version      (u8; echoes the client's requested version)
     2     4  session_id   (u32 LE; server-assigned, random within [1, 2^31))
     6     4  sample_rate  (u32 LE; e.g. 48000)
    10     1  channels     (u8; always 2 for stereo)
    11     2  frames       (u16 LE; frames per AUDIO packet, e.g. 160 code default, 128 deployed)

Total: 13 bytes.

The client must validate incoming AUDIO packets against session_id and
silently drop any mismatches. This guards against stale packets from a prior
session arriving after reconnect.

Example (session=0x1EADBEEF, inside the RX range [1, 2^31); 48000 Hz, stereo, 160 frames code default; deployed is 128 = 0x0080):

02 02 EF BE AD 1E 80 BB 00 00 02 A0 00
^^                                        type = ACCEPT
   ^^                                     version = 2
      ^^ ^^ ^^ ^^                         session_id = 0x1EADBEEF (LE)
                  ^^ ^^ ^^ ^^             sample_rate = 48000 (LE)
                              ^^          channels = 2
                                 ^^ ^^   frames = 160 (LE; deployed: 0x0080 for 128)
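
A client-side decoder for this layout might look like the sketch below. The Accept struct and parse_accept function are illustrative names, not the crate's actual API; the offsets follow the field table for ACCEPT.

```rust
/// Hypothetical in-memory form of an ACCEPT (0x02) packet.
#[derive(Debug, PartialEq)]
struct Accept {
    version: u8,
    session_id: u32,
    sample_rate: u32,
    channels: u8,
    frames: u16,
}

/// Parse a 13-byte ACCEPT datagram; returns None on wrong size or tag.
fn parse_accept(buf: &[u8]) -> Option<Accept> {
    if buf.len() != 13 || buf[0] != 0x02 {
        return None;
    }
    Some(Accept {
        version: buf[1],
        session_id: u32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]),
        sample_rate: u32::from_le_bytes([buf[6], buf[7], buf[8], buf[9]]),
        channels: buf[10],
        frames: u16::from_le_bytes([buf[11], buf[12]]),
    })
}
```

The caller would keep session_id for later PING, BYE, and AUDIO validation.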

0x03 REJECT (mixer → client)

offset  size  field
     0     1  type = 0x03
     1     1  reason_code  (u8; see Rejection codes below)

Total: 2 bytes.


0x04 AUDIO (mixer → client)

Carries one packet’s worth of interleaved S16LE PCM audio.

offset  size  field
     0     1  type = 0x04
     1     4  session_id  (u32 LE; must match ACCEPT)
     5     4  seq         (u32 LE; monotonic packet counter, wraps at 2^32)
     9  varies  payload   (interleaved S16LE; frames × channels × 2 bytes)

Header: 9 bytes. With the code default of 160 frames, the payload is
160 frames × 2 channels × 2 bytes = 640 bytes, for a total of 649 bytes. With
the deployed config of 128 frames, the payload is 512 bytes, for a total of
521 bytes. Both are well under the 1500-byte Ethernet MTU.

Packet rate: 48000 / 160 = 300 pps (code default); 48000 / 128 = 375 pps (deployed).
Bandwidth per client: ~1.6 Mbit/s at 160 frames (300 pps × 677 bytes × 8 bits,
where 677 = 649-byte packet + 28 bytes of UDP/IP headers); ~1.65 Mbit/s at
128 frames (375 pps × 549 bytes × 8 bits).

Client receive rules:

  1. Drop if session_id does not match the current session.
  2. Drop if seq <= last_seq and last_seq - seq < 1_000_000 (out-of-order or duplicate). Do not attempt reordering — latency cost exceeds gain on LAN.
  3. Decode payload as interleaved S16LE: frame 0 = [L0_lo, L0_hi, R0_lo, R0_hi], frame 1 = [L1_lo, L1_hi, R1_lo, R1_hi], …
  4. Convert S16 → f32: sample_f32 = (s16 as f32) / 32768.0.
  5. Push to audio ring. On ring-full, drop oldest block and log an overrun.
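
Rules 2 to 4 above can be sketched as two small functions. The names is_stale and decode_s16le are hypothetical; the 1_000_000 window and the 32768.0 divisor are taken from the rules.

```rust
/// Rule 2: true when `seq` should be dropped as a duplicate or late packet.
/// A backwards jump of 1_000_000 or more is treated as a counter wrap
/// (or a restarted stream) and is accepted.
fn is_stale(seq: u32, last_seq: u32) -> bool {
    seq <= last_seq && last_seq - seq < 1_000_000
}

/// Rules 3 and 4: decode interleaved S16LE bytes into f32 samples.
/// The output stays interleaved; a trailing odd byte, if any, is ignored.
fn decode_s16le(payload: &[u8]) -> Vec<f32> {
    payload
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect()
}
```

Note that is_stale(0, u32::MAX) is false, so the first packet after the sequence counter wraps is played rather than dropped.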

0x05 PING (client → mixer)

offset  size  field
     0     1  type = 0x05
     1     4  session_id  (u32 LE)

Total: 5 bytes.

Cadence:

  • shiloh-relay (Pi relay client): every 2 seconds
  • shiloh-web-relay: every 1 second

The mixer updates last_seen on receipt. If a client’s PING is absent for
more than 5 seconds the session is evicted. The AUDIO stream itself does not
reset last_seen — only PING does.
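
The eviction rule can be illustrated with a retain pass over the session list. This is a sketch only: RelaySession, RX_TIMEOUT, and reap are assumed names, and the real mixer's data structures may differ.

```rust
use std::time::{Duration, Instant};

/// Timeout from the text: evict after more than 5 s without a PING.
const RX_TIMEOUT: Duration = Duration::from_secs(5);

/// Hypothetical per-client record; last_seen is updated only on PING.
struct RelaySession {
    session_id: u32,
    last_seen: Instant,
}

/// Remove sessions whose last PING is older than RX_TIMEOUT.
/// Returns the evicted session ids so the caller can log them.
fn reap(sessions: &mut Vec<RelaySession>, now: Instant) -> Vec<u32> {
    let mut evicted = Vec::new();
    sessions.retain(|s| {
        let alive = now.duration_since(s.last_seen) <= RX_TIMEOUT;
        if !alive {
            evicted.push(s.session_id);
        }
        alive
    });
    evicted
}
```

Passing `now` as a parameter, rather than calling Instant::now() inside, keeps the function easy to test.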


0x06 PONG (mixer → client)

offset  size  field
     0     1  type = 0x06
     1     4  session_id  (u32 LE)

Total: 5 bytes.

The mixer sends PONG in reply to every valid PING. On the relay client side,
PONG bumps the watchdog liveness timer so a session parked on the off feed
(which receives no AUDIO) does not falsely trigger a reconnect.


0x07 BYE (client → mixer)

offset  size  field
     0     1  type = 0x07
     1     4  session_id  (u32 LE)

Total: 5 bytes.

Clients MUST send BYE on clean shutdown. The mixer removes the session
immediately without waiting for the eviction timeout. Also accepted in the
TX direction — the BYE parser tries the ingest session map first, then the
relay session map.
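
The lookup order described above (ingest map first, then relay map) could be expressed as follows. The handle_bye function and the use of HashMap<u32, String> for both maps are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical BYE handler. Removes the session named in the packet and
/// reports which map held it, or None if the packet is malformed or the
/// session id is unknown.
fn handle_bye(
    buf: &[u8],
    ingest: &mut HashMap<u32, String>,
    relay: &mut HashMap<u32, String>,
) -> Option<&'static str> {
    if buf.len() != 5 || buf[0] != 0x07 {
        return None;
    }
    let id = u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]);
    // Ingest sessions are checked first, then relay sessions.
    if ingest.remove(&id).is_some() {
        Some("ingest")
    } else if relay.remove(&id).is_some() {
        Some("relay")
    } else {
        None
    }
}
```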


TX direction — broadcaster packets

The TX direction uses a separate tag range (0x10–0x13) so the mixer’s
dispatch table can distinguish broadcaster packets from relay-client packets on
the same socket without inspecting payload.

Ingest session IDs are assigned from 0x80000000 upward, keeping them
distinct from relay session IDs (assigned from 1 upward). This prevents a
stray BYE from one namespace from evicting a session in the other.
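
A sketch of how the tag ranges and the session id split might be checked in code. TagClass, classify_tag, and is_ingest_session are hypothetical names; INGEST_SESSION_BASE is the constant named in the Capacity table.

```rust
/// Which side of the protocol a packet tag belongs to.
#[derive(Debug, PartialEq)]
enum TagClass {
    Rx,      // 0x01..=0x04: relay-client registration and audio
    Shared,  // 0x05..=0x07: PING, PONG, BYE (used by both directions)
    Tx,      // 0x10..=0x13: broadcaster registration and audio
    Unknown, // anything else
}

/// Classify a datagram by its first byte only, without touching the payload.
fn classify_tag(tag: u8) -> TagClass {
    match tag {
        0x01..=0x04 => TagClass::Rx,
        0x05..=0x07 => TagClass::Shared,
        0x10..=0x13 => TagClass::Tx,
        _ => TagClass::Unknown,
    }
}

/// First session id handed out to ingest (TX) sessions.
const INGEST_SESSION_BASE: u32 = 0x8000_0000;

/// True when a session id belongs to the ingest namespace (high bit set).
fn is_ingest_session(id: u32) -> bool {
    id >= INGEST_SESSION_BASE
}
```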

0x10 REGISTER_TX (broadcaster → mixer)

offset  size  field
     0     1  type = 0x10
     1     1  version   (u8; current = 2)
     2     1  channels  (u8; number of channels to send, e.g. 2 or 4)
     3     1  name_len  (u8; 0..32)
     4  name_len  name  (UTF-8, no NUL; must match an entry in the mixer allow-list)

Minimum size: 4 bytes.

The name must match an entry in the mixer’s ingest allow-list
([[ingest.sender]] in the mixer TOML). If the name is unknown, the mixer
responds with REJECT_TX 0x04. If the channels count does not match the
configured channel count for that sender, the mixer responds with
REJECT_TX 0x05.

If a session already exists for this name, the mixer evicts it and drains its
ring buffers before accepting the new registration.

Response: ACCEPT_TX (0x11) or REJECT_TX (0x12).
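
The allow-list check described above might be sketched like this. The allow-list is modelled as a HashMap from sender name to configured channel count, which is an assumption about how [[ingest.sender]] entries are held in memory; check_register_tx is a hypothetical name.

```rust
use std::collections::HashMap;

const REJECT_UNKNOWN_SENDER: u8 = 0x04;
const REJECT_CHANNEL_COUNT: u8 = 0x05;

/// Validate a REGISTER_TX request against the allow-list.
/// Ok(()) means the mixer may proceed to ACCEPT_TX; Err carries the
/// reason_code for REJECT_TX.
fn check_register_tx(
    allow_list: &HashMap<String, u8>,
    name: &str,
    channels: u8,
) -> Result<(), u8> {
    match allow_list.get(name) {
        None => Err(REJECT_UNKNOWN_SENDER),
        Some(&expected) if expected != channels => Err(REJECT_CHANNEL_COUNT),
        Some(_) => Ok(()),
    }
}
```

Version and capacity checks are omitted here to keep the sketch focused on the two TX-only codes.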


0x11 ACCEPT_TX (mixer → broadcaster)

offset  size  field
     0     1  type = 0x11
     1     1  version      (u8; echoes client version)
     2     4  session_id   (u32 LE; high-bit set, i.e. >= 0x80000000)
     6     4  sample_rate  (u32 LE; e.g. 48000)
    10     1  channels     (u8; confirmed channel count)
    11     2  frames       (u16 LE; frames per AUDIO_TX packet the broadcaster should send)
    13     2  start_slot   (u16 LE; first ring-buffer slot index for this sender)

Total: 15 bytes.

start_slot tells the broadcaster which JACK input slot its audio will be
routed to. A sender with channels = 2 and start_slot = 4 occupies slots
4 and 5. The broadcaster must include channels in every AUDIO_TX header.
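
To make the slot arithmetic concrete, the sketch below extracts the occupied slot range from an ACCEPT_TX packet. The function name accept_tx_slots is hypothetical; the offsets follow the field table above.

```rust
use std::ops::Range;

/// Return the half-open range of slots assigned by an ACCEPT_TX packet,
/// i.e. start_slot .. start_slot + channels.
/// Returns None on wrong size, wrong tag, or u16 overflow.
fn accept_tx_slots(buf: &[u8]) -> Option<Range<u16>> {
    if buf.len() != 15 || buf[0] != 0x11 {
        return None;
    }
    let channels = buf[10] as u16;
    let start = u16::from_le_bytes([buf[13], buf[14]]);
    Some(start..start.checked_add(channels)?)
}
```

For channels = 2 and start_slot = 4 this yields 4..6, which covers slots 4 and 5 as in the text.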


0x12 REJECT_TX (mixer → broadcaster)

offset  size  field
     0     1  type = 0x12
     1     1  reason_code  (u8; same code table as REJECT)

Total: 2 bytes.


0x13 AUDIO_TX (broadcaster → mixer)

offset  size  field
     0     1  type = 0x13
     1     4  session_id  (u32 LE; must match ACCEPT_TX)
     5     4  seq         (u32 LE; monotonic, wraps at 2^32)
     9     1  channels    (u8; must match ACCEPT_TX channels)
    10  varies  payload   (interleaved S16LE; frames × channels × 2 bytes)

Header: 10 bytes. Code default payload (160 frames stereo): 640 bytes → total 650 bytes. Deployed (128 frames): 512 bytes → total 522 bytes.

Mixer receive rules:

  1. Drop if packet is < 10 bytes or tag is not 0x13.
  2. Drop if session_id has no active ingest session entry.
  3. Drop if the peer address does not match the registered peer (prevents session hijacking after NAT rebind; real reconnects go through REGISTER_TX).
  4. Drop if channels != session.channels.
  5. Drop if seq <= last_seq and last_seq - seq < 1_000_000.
  6. Decode interleaved S16LE into f32 and push into per-slot ring buffers starting at start_slot.
  7. On ring-full for a slot, drop the frame and increment the per-slot drop counter.

The ingest timeout is 3 seconds of no AUDIO_TX. The eviction reaper runs
inside the broadcast thread’s main loop (~1 Hz effective cadence).
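
The header checks in the receive rules (1 and 3 to 5) could be sketched as a single validation function. IngestSession, DropReason, and check_audio_tx are illustrative names; rule 2 is assumed to have been applied already by looking the session up by session_id.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
enum DropReason {
    TooShort,
    BadTag,
    PeerMismatch,
    ChannelMismatch,
    Stale,
}

/// Hypothetical per-sender state held by the mixer.
struct IngestSession {
    peer: SocketAddr,
    channels: u8,
    last_seq: Option<u32>, // None until the first packet is accepted
}

/// Validate one AUDIO_TX datagram. Returns the sequence number on success
/// so the caller can update last_seq and decode buf[10..].
fn check_audio_tx(
    buf: &[u8],
    from: SocketAddr,
    session: &IngestSession,
) -> Result<u32, DropReason> {
    if buf.len() < 10 {
        return Err(DropReason::TooShort);
    }
    if buf[0] != 0x13 {
        return Err(DropReason::BadTag);
    }
    if from != session.peer {
        return Err(DropReason::PeerMismatch);
    }
    if buf[9] != session.channels {
        return Err(DropReason::ChannelMismatch);
    }
    let seq = u32::from_le_bytes([buf[5], buf[6], buf[7], buf[8]]);
    if let Some(last) = session.last_seq {
        if seq <= last && last - seq < 1_000_000 {
            return Err(DropReason::Stale);
        }
    }
    Ok(seq)
}
```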


Rejection codes

Used in both REJECT (0x03) and REJECT_TX (0x12).

Code  Constant               Meaning                                                             Client action
0x01  REJECT_CAPACITY        Server at maximum client count                                      Retry with exponential backoff
0x02  REJECT_VERSION         Protocol version not in [PROTO_MIN, PROTO_VERSION]                  Check client version; do not retry with the same version
0x03  REJECT_INTERNAL        Mixer internal error                                                Retry with backoff
0x04  REJECT_UNKNOWN_SENDER  Name not in the mixer allow-list (TX only)                          Configuration error; do not retry
0x05  REJECT_CHANNEL_COUNT   channels in REGISTER_TX does not match allow-list config (TX only)  Configuration error; do not retry

Audio sample encoding

All audio on the wire is interleaved S16LE. For a stereo packet with
frames = 160 (code default):

payload byte layout (320 samples, 640 bytes):

  bytes 0–1:     frame 0, channel 0 (L) as signed 16-bit little-endian
  bytes 2–3:     frame 0, channel 1 (R) as signed 16-bit little-endian
  bytes 4–5:     frame 1, channel 0 (L)
  bytes 6–7:     frame 1, channel 1 (R)
  ...
  bytes 636–637: frame 159, channel 0 (L)
  bytes 638–639: frame 159, channel 1 (R)

For frames = 128 (deployed config):

payload byte layout (256 samples, 512 bytes):

  bytes 0–1:     frame 0, channel 0 (L)
  bytes 2–3:     frame 0, channel 1 (R)
  ...
  bytes 508–509: frame 127, channel 0 (L)
  bytes 510–511: frame 127, channel 1 (R)

For a 4-channel TX sender (channels = 4):

  bytes 0–1:   frame 0, channel 0
  bytes 2–3:   frame 0, channel 1
  bytes 4–5:   frame 0, channel 2
  bytes 6–7:   frame 0, channel 3
  bytes 8–9:   frame 1, channel 0
  ...
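
The byte layouts above generalise to any channel count. As an illustration, the sketch below splits an interleaved payload into one sample vector per channel; deinterleave is a hypothetical helper, not a function from the protocol crate.

```rust
/// Split an interleaved S16LE payload into per-channel sample vectors.
/// Bytes that do not fill a complete frame at the end are ignored.
fn deinterleave(payload: &[u8], channels: usize) -> Vec<Vec<i16>> {
    let mut out: Vec<Vec<i16>> = vec![Vec::new(); channels];
    if channels == 0 {
        return out; // avoid a zero-sized chunk, which would panic
    }
    // One frame holds `channels` samples of 2 bytes each.
    for frame in payload.chunks_exact(channels * 2) {
        for (ch, s) in frame.chunks_exact(2).enumerate() {
            out[ch].push(i16::from_le_bytes([s[0], s[1]]));
        }
    }
    out
}
```

With channels = 4, bytes 0–1 land in out[0], bytes 2–3 in out[1], and bytes 8–9 become the second sample of out[0].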

Conversion functions (from protocol/src/lib.rs):

// f32 [-1.0, 1.0] → S16 (clamps, symmetric at ±32767)
fn f32_to_s16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * 32767.0) as i16
}

// S16 → f32
fn s16_to_f32(sample: i16) -> f32 {
    (sample as f32) * (1.0 / 32768.0)
}

Handshake flow diagrams

RX direction (relay client)

Relay Client            Mixer (:5005)
      |   REGISTER        |
      |------------------>|  (announce name, version)
      |   ACCEPT          |
      |<------------------|  (session_id, sample_rate, channels, frames)
      |                   |
      |  [control plane]  |
      | set_relay_assignment_default  -->  Mixer (:19997)
      |                   |
      |   AUDIO (loop)    |
      |<------------------|  (seq, session_id, S16LE payload)
      |   AUDIO           |
      |<------------------|
      |   ...             |
      |   PING (every 2s) |
      |------------------>|
      |   PONG            |
      |<------------------|
      |   BYE             |
      |------------------>|  (on clean shutdown)

The control-plane set_relay_assignment_default is sent by shiloh-web-relay
immediately after ACCEPT to pin the session to a feed. shiloh-relay (Pi
relay) sends this as well with its configured default feed.

TX direction (broadcaster)

Broadcaster             Mixer (:5005)
      |   REGISTER_TX     |
      |------------------>|  (name, channels, version)
      |   ACCEPT_TX       |
      |<------------------|  (session_id, sample_rate, channels, frames, start_slot)
      |                   |
      |   AUDIO_TX (loop) |
      |------------------>|  (seq, session_id, channels, S16LE payload)
      |   AUDIO_TX        |
      |------------------>|
      |   ...             |
      |   PING (every 1s) |
      |------------------>|
      |   PONG            |
      |<------------------|  (mixer replies)
      |   BYE             |
      |------------------>|  (on clean shutdown)

The broadcaster sends PING every 1 second on a dedicated heartbeat thread.
The mixer’s ingest watchdog evicts sessions after 3 seconds of no AUDIO_TX.

RX handshake failure (REJECT → retry)

Relay Client            Mixer
      |   REGISTER        |
      |------------------>|
      |   REJECT (0x01)   |  (e.g. at capacity)
      |<------------------|
      |  [wait 1s]        |
      |   REGISTER        |
      |------------------>|
      |  [wait 2s on fail]|
      |   ...             |
      |   REGISTER        |  (up to 5 attempts)
      |------------------>|

Heartbeat and eviction

Role               Cadence                       Eviction timeout         Reaper
RX relay client    PING 1–2 s                    5 s since last PING      ~1 Hz (broadcast thread)
TX ingest session  N/A (evicted on no AUDIO_TX)  3 s since last AUDIO_TX  same loop

The broadcast thread runs a non-blocking retain over clients every loop
iteration (approximately every 1 ms when there is audio to send). Eviction is
logged with the session id and name. After eviction, the mixer stops sending
AUDIO to that address. A client that is evicted must start a fresh REGISTER
— the old session id is gone.


Reconnect behavior

RX relay client (shiloh-relay, shiloh-web-relay)

Reconnect is triggered by:

  • No AUDIO or PONG received for > 3 seconds (watchdog thread)
  • Socket error during recv or send
  • Voluntary BYE on clean shutdown

Backoff:

  1. Session error → sleep 1 s, retry
  2. Each consecutive failure within 10 s of session start: double backoff, capped at 30 s
  3. If a session runs for ≥ 10 s before failing: reset backoff to 1 s
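
The three backoff rules can be folded into one function, shown here as a sketch. The constant and function names are hypothetical; the 1 s, 10 s, and 30 s values are the ones listed above.

```rust
use std::time::Duration;

const INITIAL_BACKOFF: Duration = Duration::from_secs(1);
const MAX_BACKOFF: Duration = Duration::from_secs(30);
const STABLE_AFTER: Duration = Duration::from_secs(10);

/// Delay to sleep before the next reconnect attempt.
/// `prev` is the delay used after the previous failure (None on the first
/// failure); `session_ran_for` is how long the failed session lasted.
fn next_delay(prev: Option<Duration>, session_ran_for: Duration) -> Duration {
    match prev {
        // Rule 2: another quick failure doubles the delay, capped at 30 s.
        Some(d) if session_ran_for < STABLE_AFTER => (d * 2).min(MAX_BACKOFF),
        // Rules 1 and 3: first failure, or a session that ran long enough.
        _ => INITIAL_BACKOFF,
    }
}
```

Repeated quick failures would then sleep 1, 2, 4, 8, 16, 30, 30, ... seconds.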

TX broadcaster (shiloh-broadcaster)

Reconnect is triggered by:

  • REGISTER_TX fails 5 times (outer loop retries)
  • send of AUDIO_TX succeeds after a period of failures (indicates mixer
    restarted; session_id is stale; tx_loop returns an error to force re-register)

Backoff inside REGISTER_TX attempts: 1 s → 2 s → 4 s, capped at 5 s.
The outer reconnect loop in main handles repeated registration failure.


Capacity

Limit                    Value                                      Source
Max relay clients (RX)   16 (default)                               Compile-time / config
Max ingest senders (TX)  ingest.slot_count / sender channel counts  Config
Max client name length   32 bytes                                   Wire format (name_len u8, enforced at 32)
Session id range (RX)    1 … 2^31 - 1                               next_session wraps at 1
Session id range (TX)    0x80000000 … 2^32 - 1                      INGEST_SESSION_BASE

Bandwidth reference

At 48 kHz / stereo / S16:

Code default (160 frames)

Metric                            Value
Packet rate                       300 pps
RX AUDIO packet (UDP payload)     649 bytes
TX AUDIO_TX packet (UDP payload)  650 bytes
UDP/IP header overhead            28 bytes
Per-client bandwidth (RX)         ~1.6 Mbit/s
8 RX clients                      ~13 Mbit/s (trivial on LAN)

Deployed config (128 frames)

Metric                            Value
Packet rate                       375 pps
RX AUDIO packet (UDP payload)     521 bytes
TX AUDIO_TX packet (UDP payload)  522 bytes
UDP/IP header overhead            28 bytes
Per-client bandwidth (RX)         ~1.65 Mbit/s
8 RX clients                      ~13.2 Mbit/s (trivial on LAN)
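
The figures in both tables follow from the same arithmetic, sketched below. The function names are hypothetical. The 28-byte overhead is a 20-byte IPv4 header plus an 8-byte UDP header; Ethernet framing is not counted, matching the tables.

```rust
const AUDIO_HEADER: u32 = 9;     // type + session_id + seq
const UDP_IP_OVERHEAD: u32 = 28; // 20-byte IPv4 header + 8-byte UDP header

/// AUDIO packets per second for a given sample rate and frames per packet.
fn packets_per_second(sample_rate: u32, frames: u32) -> u32 {
    sample_rate / frames
}

/// RX bandwidth for one client in bits per second, including UDP/IP headers.
fn rx_bits_per_second(sample_rate: u32, channels: u32, frames: u32) -> u32 {
    let payload = frames * channels * 2; // S16 = 2 bytes per sample
    let on_wire = AUDIO_HEADER + payload + UDP_IP_OVERHEAD;
    packets_per_second(sample_rate, frames) * on_wire * 8
}
```

For 48 kHz stereo this gives 1,624,800 bit/s at 160 frames and 1,647,000 bit/s at 128 frames, which round to the ~1.6 and ~1.65 Mbit/s entries.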