Relay Protocol (UDP Audio)
The relay protocol is a minimal, connectionless UDP protocol that runs on a
single port (default 5005). It carries two independent traffic directions:

- RX direction: the mixer fans out post-mix audio to relay clients
  (shiloh-relay, shiloh-web-relay, Pi relays).
- TX direction: a broadcaster (shiloh-broadcaster) sends captured audio
  upstream to the mixer for insertion into the mix bus.

Both directions use the same UDP socket on the mixer. Distinct packet tag
ranges prevent collisions.
Packet type table
| Tag | Name | Direction | Total size | Description |
|---|---|---|---|---|
| 0x01 | REGISTER | client → mixer | 3 + name_len | RX: relay client registers for audio |
| 0x02 | ACCEPT | mixer → client | 13 | RX: session accepted, parameters follow |
| 0x03 | REJECT | mixer → client | 2 | RX: registration refused |
| 0x04 | AUDIO | mixer → client | 9 + payload | RX: audio data packet |
| 0x05 | PING | client/broadcaster → mixer | 5 | Heartbeat (RX and TX) |
| 0x06 | PONG | mixer → client/broadcaster | 5 | Heartbeat reply |
| 0x07 | BYE | client/broadcaster → mixer | 5 | Clean session teardown (RX and TX) |
| 0x10 | REGISTER_TX | broadcaster → mixer | 4 + name_len | TX: broadcaster registers for ingest |
| 0x11 | ACCEPT_TX | mixer → broadcaster | 15 | TX: ingest accepted, slot assignment follows |
| 0x12 | REJECT_TX | mixer → broadcaster | 2 | TX: ingest registration refused |
| 0x13 | AUDIO_TX | broadcaster → mixer | 10 + payload | TX: audio data from broadcaster |

All integers are little-endian. All audio samples are S16LE.
RX direction — relay client packets
0x01 REGISTER (client → mixer)
Sent by a relay client to announce itself and request audio.
```
offset  size      field
0       1         type = 0x01
1       1         version (u8; current = 2, min accepted = 1)
2       1         name_len (u8; 0..32)
3       name_len  name (UTF-8, no NUL; client label used for logs/UI and feed-assignment lookup)
```
Minimum size: 3 bytes. Maximum size: 35 bytes.
The name is used by the mixer to look up a persisted feed assignment. On
first contact the mixer records the name with a default assignment of main.
Example (name = “pi-kitchen”, version = 2):
```
01 02 0a 70 69 2d 6b 69 74 63 68 65 6e
^^ type = REGISTER
   ^^ version = 2
      ^^ name_len = 10
         ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ ^^ "pi-kitchen"
```
Response: ACCEPT (0x02) or REJECT (0x03).
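As a minimal sketch, assuming only the REGISTER layout above, a client could build the datagram as follows. `encode_register` is a hypothetical name, not the function in `protocol/src/lib.rs`. For "pi-kitchen" with version 2, it produces the bytes shown in the example.

```rust
/// Builds a REGISTER (0x01) datagram: type, version, name_len, name bytes.
/// Returns None if the UTF-8 name exceeds the 32-byte wire limit.
fn encode_register(version: u8, name: &str) -> Option<Vec<u8>> {
    const TYPE_REGISTER: u8 = 0x01;
    const MAX_NAME_LEN: usize = 32;

    let name_bytes = name.as_bytes(); // UTF-8, no NUL terminator
    if name_bytes.len() > MAX_NAME_LEN {
        return None;
    }
    let mut pkt = Vec::with_capacity(3 + name_bytes.len());
    pkt.push(TYPE_REGISTER);
    pkt.push(version);
    pkt.push(name_bytes.len() as u8);
    pkt.extend_from_slice(name_bytes);
    Some(pkt)
}
```

The resulting buffer would go out with `std::net::UdpSocket::send_to` to the mixer's port (5005 by default).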
0x02 ACCEPT (mixer → client)
Server response to a successful REGISTER. The client must store session_id
and echo it in every subsequent PING and BYE.
```
offset  size  field
0       1     type = 0x02
1       1     version (u8; echoes the client's requested version)
2       4     session_id (u32 LE; server-assigned, in [1, 2^31))
6       4     sample_rate (u32 LE; e.g. 48000)
10      1     channels (u8; always 2 for stereo)
11      2     frames (u16 LE; frames per AUDIO packet: 160 code default, 128 deployed)
```
Total: 13 bytes.
The client must validate incoming AUDIO packets against session_id and
silently drop any mismatches. This guards against stale packets from a prior
session arriving after reconnect.
Example (session=0xDEADBEEF, 48000 Hz, stereo, 160 frames code default; deployed is 128 = 0x0080):
```
02 02 EF BE AD DE 80 BB 00 00 02 A0 00
^^ type = ACCEPT
   ^^ version = 2
      ^^ ^^ ^^ ^^ session_id = 0xDEADBEEF (LE)
                  ^^ ^^ ^^ ^^ sample_rate = 48000 (LE)
                              ^^ channels = 2
                                 ^^ ^^ frames = 160 (LE; deployed: 0x0080 for 128)
```
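The following sketch shows a client-side ACCEPT parser and assumes the 13-byte layout above. The `Accept` struct and `parse_accept` are illustrative names, and the real crate may structure this differently. All multi-byte fields are decoded as little-endian.

```rust
/// Session parameters carried by ACCEPT (0x02).
#[derive(Debug, PartialEq)]
struct Accept {
    version: u8,
    session_id: u32,
    sample_rate: u32,
    channels: u8,
    frames: u16,
}

/// Parses an ACCEPT datagram; returns None on a wrong tag or length.
fn parse_accept(buf: &[u8]) -> Option<Accept> {
    if buf.len() != 13 || buf[0] != 0x02 {
        return None;
    }
    Some(Accept {
        version: buf[1],
        session_id: u32::from_le_bytes([buf[2], buf[3], buf[4], buf[5]]),
        sample_rate: u32::from_le_bytes([buf[6], buf[7], buf[8], buf[9]]),
        channels: buf[10],
        frames: u16::from_le_bytes([buf[11], buf[12]]),
    })
}
```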
0x03 REJECT (mixer → client)
```
offset  size  field
0       1     type = 0x03
1       1     reason_code (u8; see Rejection codes below)
```
Total: 2 bytes.
0x04 AUDIO (mixer → client)
Carries one packet’s worth of interleaved S16LE PCM audio.
```
offset  size    field
0       1       type = 0x04
1       4       session_id (u32 LE; must match ACCEPT)
5       4       seq (u32 LE; monotonic packet counter, wraps at 2^32)
9       varies  payload (interleaved S16LE; frames × channels × 2 bytes)
```
Header: 9 bytes. Code default payload: 160 frames × 2 channels × 2 bytes = 640 bytes → total 649 bytes. Deployed config: 128 frames = 512 bytes payload → total 521 bytes — well under Ethernet MTU.
Packet rate: 48000 / 160 = 300 pps (code default); deployed: 48000 / 128 = 375 pps.
Bandwidth per client: ~1.6 Mbit/s at 160 frames (300 pps × 677 bytes × 8 bits, where 677 = 649-byte UDP payload + 28 bytes of UDP/IP headers); ~1.65 Mbit/s at 128 frames (375 pps × 549 bytes × 8 bits).
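The bandwidth arithmetic can be checked with a few lines of Rust. This sketch assumes IPv4, so the overhead is a 20-byte IP header plus an 8-byte UDP header (28 bytes), and it ignores Ethernet framing. `audio_budget` and `StreamBudget` are hypothetical names. Pass `header = 9` for AUDIO and `header = 10` for AUDIO_TX.

```rust
/// Derived wire figures for one audio stream.
struct StreamBudget {
    packets_per_sec: u32,
    udp_payload: u32, // bytes: header + PCM payload
    bits_per_sec: u32, // including 28 bytes of UDP/IPv4 headers per packet
}

/// Assumes `frames` divides `sample_rate` evenly (true for 160 and 128 at 48 kHz).
fn audio_budget(sample_rate: u32, frames: u32, channels: u32, header: u32) -> StreamBudget {
    const UDP_IP_OVERHEAD: u32 = 28; // 8-byte UDP + 20-byte IPv4 header
    let packets_per_sec = sample_rate / frames;
    let udp_payload = header + frames * channels * 2; // S16 = 2 bytes per sample
    let bits_per_sec = packets_per_sec * (udp_payload + UDP_IP_OVERHEAD) * 8;
    StreamBudget { packets_per_sec, udp_payload, bits_per_sec }
}
```

For example, `audio_budget(48000, 160, 2, 9)` gives 300 pps, a 649-byte UDP payload and 1,624,800 bit/s, which matches the ~1.6 Mbit/s figure.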
Client receive rules:

- Drop if session_id does not match the current session.
- Drop if seq <= last_seq and last_seq - seq < 1_000_000 (out-of-order or duplicate). A larger gap is not dropped; it is assumed to be a wrap of the 32-bit counter. Do not attempt reordering: on a LAN the latency cost exceeds the gain.
- Decode the payload as interleaved S16LE: frame 0 = [L0_lo, L0_hi, R0_lo, R0_hi], frame 1 = [L1_lo, L1_hi, R1_lo, R1_hi], …
- Convert S16 → f32: sample_f32 = (s16 as f32) / 32768.0.
- Push to the audio ring. On ring-full, drop the oldest block and log an overrun.

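One way to express these receive rules is sketched below. It assumes a single active session, and `RxState` and `accept_audio` are hypothetical names. Pushing into the ring and handling overruns are left to the caller.

```rust
/// Per-session receive state on the relay client.
struct RxState {
    session_id: u32,       // from ACCEPT
    last_seq: Option<u32>, // None until the first AUDIO packet arrives
}

/// Applies the receive rules to one datagram and, if it is accepted,
/// returns the decoded f32 samples (still interleaved).
fn accept_audio(state: &mut RxState, pkt: &[u8]) -> Option<Vec<f32>> {
    const HEADER: usize = 9;
    if pkt.len() < HEADER || pkt[0] != 0x04 {
        return None;
    }
    let session_id = u32::from_le_bytes([pkt[1], pkt[2], pkt[3], pkt[4]]);
    if session_id != state.session_id {
        return None; // stale packet from a previous session
    }
    let seq = u32::from_le_bytes([pkt[5], pkt[6], pkt[7], pkt[8]]);
    if let Some(last) = state.last_seq {
        // Duplicate or late, unless the gap is large enough to be a counter wrap.
        if seq <= last && last - seq < 1_000_000 {
            return None;
        }
    }
    state.last_seq = Some(seq);
    // Interleaved S16LE → f32; chunks_exact ignores a trailing odd byte.
    let samples: Vec<f32> = pkt[HEADER..]
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0)
        .collect();
    Some(samples)
}
```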
0x05 PING (client → mixer)
```
offset  size  field
0       1     type = 0x05
1       4     session_id (u32 LE)
```
Total: 5 bytes.
Cadence:

- shiloh-relay (Pi relay client): every 2 seconds
- shiloh-web-relay: every 1 second

The mixer updates last_seen on receipt. If a client’s PING is absent for
more than 5 seconds the session is evicted. The AUDIO stream itself does not
reset last_seen — only PING does.
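The following sketch shows the eviction pass. It assumes the mixer keeps a last-seen `Instant` per session_id, which PING updates. The map layout and `evict_stale` are hypothetical. `HashMap::retain` removes stale entries in place, and the function returns the evicted IDs so they can be logged.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Removes sessions whose last PING is more than `timeout` before `now`
/// and returns their session IDs.
fn evict_stale(last_seen: &mut HashMap<u32, Instant>, now: Instant, timeout: Duration) -> Vec<u32> {
    let mut evicted = Vec::new();
    last_seen.retain(|&id, &mut seen| {
        // saturating_duration_since never panics if `seen` is later than `now`.
        let alive = now.saturating_duration_since(seen) <= timeout;
        if !alive {
            evicted.push(id);
        }
        alive
    });
    evicted
}
```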
0x06 PONG (mixer → client)
```
offset  size  field
0       1     type = 0x06
1       4     session_id (u32 LE)
```
Total: 5 bytes.
The mixer sends PONG in reply to every valid PING. On the relay client side,
PONG bumps the watchdog liveness timer so a session parked on the off feed
(which receives no AUDIO) does not falsely trigger a reconnect.
0x07 BYE (client → mixer)
```
offset  size  field
0       1     type = 0x07
1       4     session_id (u32 LE)
```
Total: 5 bytes.
Clients MUST send BYE on clean shutdown. The mixer removes the session
immediately without waiting for the eviction timeout. Also accepted in the
TX direction — the BYE parser tries the ingest session map first, then the
relay session map.
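This sketch covers the three 5-byte control packets and the BYE lookup order described above, which checks the ingest map first and the relay map second. The `Control` enum, `handle_bye` and the generic map value types are illustrative assumptions.

```rust
use std::collections::HashMap;

/// PING, PONG and BYE share one layout: type byte + session_id (u32 LE).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Control {
    Ping(u32),
    Pong(u32),
    Bye(u32),
}

fn encode_control(c: Control) -> [u8; 5] {
    let (tag, id) = match c {
        Control::Ping(id) => (0x05, id),
        Control::Pong(id) => (0x06, id),
        Control::Bye(id) => (0x07, id),
    };
    let b = id.to_le_bytes();
    [tag, b[0], b[1], b[2], b[3]]
}

fn decode_control(buf: &[u8]) -> Option<Control> {
    if buf.len() != 5 {
        return None;
    }
    let id = u32::from_le_bytes([buf[1], buf[2], buf[3], buf[4]]);
    match buf[0] {
        0x05 => Some(Control::Ping(id)),
        0x06 => Some(Control::Pong(id)),
        0x07 => Some(Control::Bye(id)),
        _ => None,
    }
}

/// Removes the session named by a BYE, trying the ingest map before the relay map.
/// Returns which map held it, or None if neither did.
fn handle_bye<A, B>(
    session_id: u32,
    ingest: &mut HashMap<u32, A>,
    relay: &mut HashMap<u32, B>,
) -> Option<&'static str> {
    if ingest.remove(&session_id).is_some() {
        Some("ingest")
    } else if relay.remove(&session_id).is_some() {
        Some("relay")
    } else {
        None
    }
}
```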
TX direction — broadcaster packets
The TX direction uses a separate tag range (0x10–0x13) so the mixer’s
dispatch table can distinguish broadcaster packets from relay-client packets on
the same socket without inspecting payload.
Ingest session IDs are assigned from 0x80000000 upward, keeping them
distinct from relay session IDs (assigned from 1 upward). This prevents a
stray BYE from one namespace from evicting a session in the other.
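The two ID namespaces can be sketched as a pair of counters. `INGEST_SESSION_BASE` appears in the Capacity table. `SessionIds`, `is_ingest_session` and the ingest counter's wrap behavior are assumptions. This sketch also does not skip IDs still held by live sessions.

```rust
/// First ingest session ID; relay IDs stay strictly below it.
const INGEST_SESSION_BASE: u32 = 0x8000_0000;

struct SessionIds {
    next_relay: u32,
    next_ingest: u32,
}

impl SessionIds {
    fn new() -> Self {
        SessionIds { next_relay: 1, next_ingest: INGEST_SESSION_BASE }
    }

    /// Relay IDs run 1 ..= 2^31 - 1, then wrap back to 1 (never 0, never the ingest range).
    fn alloc_relay(&mut self) -> u32 {
        let id = self.next_relay;
        self.next_relay = if id >= INGEST_SESSION_BASE - 1 { 1 } else { id + 1 };
        id
    }

    /// Ingest IDs run 0x80000000 ..= u32::MAX, then wrap back to the base (assumption).
    fn alloc_ingest(&mut self) -> u32 {
        let id = self.next_ingest;
        self.next_ingest = if id == u32::MAX { INGEST_SESSION_BASE } else { id + 1 };
        id
    }
}

/// The high bit alone tells the namespaces apart.
fn is_ingest_session(id: u32) -> bool {
    (id & INGEST_SESSION_BASE) != 0
}
```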
0x10 REGISTER_TX (broadcaster → mixer)
```
offset  size      field
0       1         type = 0x10
1       1         version (u8; current = 2)
2       1         channels (u8; number of channels to send, e.g. 2 or 4)
3       1         name_len (u8; 0..32)
4       name_len  name (UTF-8, no NUL; must match an entry in the mixer allow-list)
```
Minimum size: 4 bytes. Maximum size: 36 bytes.
The name must match an entry in the mixer’s ingest allow-list
([[ingest.sender]] in the mixer TOML). If the name is unknown, the mixer
responds with REJECT_TX 0x04. If the channels count does not match the
configured channel count for that sender, the mixer responds with
REJECT_TX 0x05.
If a session already exists for this name, the mixer evicts it and drains its
ring buffers before accepting the new registration.
Response: ACCEPT_TX (0x11) or REJECT_TX (0x12).
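The following sketch shows the REGISTER_TX checks. It models the `[[ingest.sender]]` allow-list as a map from sender name to configured channel count. That map, `RegisterTx` and both function names are hypothetical. The version range check and the evict-and-drain step for an existing session are omitted.

```rust
use std::collections::HashMap;

const REJECT_UNKNOWN_SENDER: u8 = 0x04;
const REJECT_CHANNEL_COUNT: u8 = 0x05;

#[derive(Debug, PartialEq)]
struct RegisterTx<'a> {
    version: u8,
    channels: u8,
    name: &'a str,
}

/// Parses REGISTER_TX: type, version, channels, name_len, name.
fn parse_register_tx(buf: &[u8]) -> Option<RegisterTx<'_>> {
    if buf.len() < 4 || buf[0] != 0x10 {
        return None;
    }
    let name_len = buf[3] as usize;
    if name_len > 32 || buf.len() != 4 + name_len {
        return None;
    }
    let name = std::str::from_utf8(&buf[4..]).ok()?;
    Some(RegisterTx { version: buf[1], channels: buf[2], name })
}

/// Ok(()) if the sender is known and its channel count matches,
/// otherwise the REJECT_TX reason code.
fn check_allow_list(reg: &RegisterTx<'_>, allow: &HashMap<String, u8>) -> Result<(), u8> {
    match allow.get(reg.name) {
        None => Err(REJECT_UNKNOWN_SENDER),
        Some(&configured) if configured != reg.channels => Err(REJECT_CHANNEL_COUNT),
        Some(_) => Ok(()),
    }
}
```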
0x11 ACCEPT_TX (mixer → broadcaster)
```
offset  size  field
0       1     type = 0x11
1       1     version (u8; echoes client version)
2       4     session_id (u32 LE; high bit set, i.e. >= 0x80000000)
6       4     sample_rate (u32 LE; e.g. 48000)
10      1     channels (u8; confirmed channel count)
11      2     frames (u16 LE; frames per AUDIO_TX packet the broadcaster should send)
13      2     start_slot (u16 LE; first ring-buffer slot index for this sender)
```
Total: 15 bytes.
start_slot tells the broadcaster which JACK input slot its audio will be
routed to. A sender with channels = 2 and start_slot = 4 occupies slots
4 and 5. The broadcaster must include channels in every AUDIO_TX header.
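The slot arithmetic is small enough to show directly. `occupied_slots` follows the start_slot rule above. `assign_start_slots` is a hypothetical contiguous allocator, because this document does not say how the mixer picks start_slot.

```rust
use std::ops::Range;

/// Input slots a sender occupies: start_slot .. start_slot + channels.
fn occupied_slots(start_slot: u16, channels: u8) -> Range<u16> {
    start_slot..start_slot + channels as u16
}

/// Hypothetical allocator: packs senders contiguously from slot 0 and
/// returns None if they do not fit in `slot_count` slots.
fn assign_start_slots(slot_count: u16, sender_channels: &[u8]) -> Option<Vec<u16>> {
    let mut next = 0u16;
    let mut starts = Vec::with_capacity(sender_channels.len());
    for &ch in sender_channels {
        let end = next.checked_add(ch as u16)?;
        if end > slot_count {
            return None;
        }
        starts.push(next);
        next = end;
    }
    Some(starts)
}
```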
0x12 REJECT_TX (mixer → broadcaster)
```
offset  size  field
0       1     type = 0x12
1       1     reason_code (u8; same code table as REJECT)
```
Total: 2 bytes.
0x13 AUDIO_TX (broadcaster → mixer)
```
offset  size    field
0       1       type = 0x13
1       4       session_id (u32 LE; must match ACCEPT_TX)
5       4       seq (u32 LE; monotonic, wraps at 2^32)
9       1       channels (u8; must match ACCEPT_TX channels)
10      varies  payload (interleaved S16LE; frames × channels × 2 bytes)
```
Header: 10 bytes. Code default payload (160 frames stereo): 640 bytes → total 650 bytes. Deployed (128 frames): 512 bytes → total 522 bytes.
Mixer receive rules:

- Drop if the packet is shorter than 10 bytes or the tag is not 0x13.
- Drop if session_id has no active ingest session entry.
- Drop if the peer address does not match the registered peer (prevents session hijacking after a NAT rebind; real reconnects go through REGISTER_TX).
- Drop if channels != session.channels.
- Drop if seq <= last_seq and last_seq - seq < 1_000_000.
- Decode interleaved S16LE into f32 and push into per-slot ring buffers starting at start_slot.
- On ring-full for a slot, drop the frame and increment the per-slot drop counter.

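The following sketch applies the mixer receive checks up to the point of decoding. It assumes ingest sessions are stored in a `HashMap` keyed by session_id. `IngestSession` and `check_audio_tx` are hypothetical names. The ring-buffer push and the per-slot drop counters are left out.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Mixer-side state for one ingest session (hypothetical field names).
struct IngestSession {
    peer: SocketAddr, // address that sent the accepted REGISTER_TX
    channels: u8,     // channel count confirmed in ACCEPT_TX
    last_seq: Option<u32>,
}

/// Runs the header checks in order and returns the payload slice if the
/// packet should go on to S16LE decoding and the per-slot rings.
fn check_audio_tx<'a>(
    sessions: &mut HashMap<u32, IngestSession>,
    from: SocketAddr,
    pkt: &'a [u8],
) -> Option<&'a [u8]> {
    if pkt.len() < 10 || pkt[0] != 0x13 {
        return None; // too short or not AUDIO_TX
    }
    let session_id = u32::from_le_bytes([pkt[1], pkt[2], pkt[3], pkt[4]]);
    let s = sessions.get_mut(&session_id)?; // no active ingest session
    if from != s.peer {
        return None; // source changed: the sender must REGISTER_TX again
    }
    if pkt[9] != s.channels {
        return None;
    }
    let seq = u32::from_le_bytes([pkt[5], pkt[6], pkt[7], pkt[8]]);
    if let Some(last) = s.last_seq {
        if seq <= last && last - seq < 1_000_000 {
            return None; // duplicate or out of order; larger gaps count as a wrap
        }
    }
    s.last_seq = Some(seq);
    Some(&pkt[10..])
}
```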
The ingest timeout is 3 seconds of no AUDIO_TX. The eviction reaper runs
inside the broadcast thread’s main loop (~1 Hz effective cadence).
Rejection codes
Used in both REJECT (0x03) and REJECT_TX (0x12).
| Code | Constant | Meaning | Client action |
|---|---|---|---|
| 0x01 | REJECT_CAPACITY | Server at maximum client count | Retry with exponential backoff |
| 0x02 | REJECT_VERSION | Protocol version not in [PROTO_MIN, PROTO_VERSION] | Check client version; do not retry with the same version |
| 0x03 | REJECT_INTERNAL | Mixer internal error | Retry with backoff |
| 0x04 | REJECT_UNKNOWN_SENDER | Name not in the mixer allow-list (TX only) | Configuration error; do not retry |
| 0x05 | REJECT_CHANNEL_COUNT | channels in REGISTER_TX does not match allow-list config (TX only) | Configuration error; do not retry |

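The code table maps naturally onto an enum. `RejectCode` and `parse_reject` are illustrative names. Only the numeric values and the retry guidance come from the table.

```rust
/// Rejection codes shared by REJECT (0x03) and REJECT_TX (0x12).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RejectCode {
    Capacity = 0x01,
    Version = 0x02,
    Internal = 0x03,
    UnknownSender = 0x04,
    ChannelCount = 0x05,
}

impl RejectCode {
    fn from_u8(b: u8) -> Option<Self> {
        match b {
            0x01 => Some(Self::Capacity),
            0x02 => Some(Self::Version),
            0x03 => Some(Self::Internal),
            0x04 => Some(Self::UnknownSender),
            0x05 => Some(Self::ChannelCount),
            _ => None,
        }
    }

    /// Per the "Client action" column, only capacity and internal errors are retried.
    fn is_retryable(self) -> bool {
        matches!(self, Self::Capacity | Self::Internal)
    }
}

/// Parses a 2-byte REJECT or REJECT_TX datagram.
fn parse_reject(buf: &[u8]) -> Option<RejectCode> {
    match buf {
        [0x03 | 0x12, code] => RejectCode::from_u8(*code),
        _ => None,
    }
}
```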
Audio sample encoding
All audio on the wire is interleaved S16LE. For a stereo packet with
frames = 160 (code default):
```
payload byte layout (320 samples, 640 bytes):
byte 0–1:      frame 0, channel 0 (L) as signed 16-bit little-endian
byte 2–3:      frame 0, channel 1 (R) as signed 16-bit little-endian
byte 4–5:      frame 1, channel 0 (L)
byte 6–7:      frame 1, channel 1 (R)
...
byte 636–637:  frame 159, channel 0 (L)
byte 638–639:  frame 159, channel 1 (R)
```
For frames = 128 (deployed config):
```
payload byte layout (256 samples, 512 bytes):
byte 0–1:      frame 0, channel 0 (L)
byte 2–3:      frame 0, channel 1 (R)
...
byte 508–509:  frame 127, channel 0 (L)
byte 510–511:  frame 127, channel 1 (R)
```
For a 4-channel TX sender (channels = 4):
```
byte 0–1:  frame 0, channel 0
byte 2–3:  frame 0, channel 1
byte 4–5:  frame 0, channel 2
byte 6–7:  frame 0, channel 3
byte 8–9:  frame 1, channel 0
...
```
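For the 4-channel case, a receiver that wants one buffer per channel can deinterleave the payload as shown here. `deinterleave` is a hypothetical helper. It uses the same 1/32768 scale as `s16_to_f32` in `protocol/src/lib.rs` and rejects payloads that end mid-frame.

```rust
/// Splits an interleaved S16LE payload into one f32 vector per channel.
fn deinterleave(payload: &[u8], channels: usize) -> Option<Vec<Vec<f32>>> {
    if channels == 0 {
        return None;
    }
    let frame_bytes = channels * 2; // 2 bytes per S16 sample
    if payload.len() % frame_bytes != 0 {
        return None; // partial frame: reject rather than guess
    }
    let frames = payload.len() / frame_bytes;
    let mut out = vec![Vec::with_capacity(frames); channels];
    for frame in payload.chunks_exact(frame_bytes) {
        // Within a frame, channel k's sample sits at bytes 2k..2k+2.
        for (ch, b) in frame.chunks_exact(2).enumerate() {
            out[ch].push(i16::from_le_bytes([b[0], b[1]]) as f32 / 32768.0);
        }
    }
    Some(out)
}
```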
Conversion functions (from protocol/src/lib.rs):
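```rust
// f32 [-1.0, 1.0] → S16 (clamps, symmetric at ±32767).
// `as` truncates toward zero; a NaN input maps to 0.
fn f32_to_s16(sample: f32) -> i16 {
    (sample.clamp(-1.0, 1.0) * 32767.0) as i16
}

// S16 → f32 (scale 1/32768, so -32768 maps to exactly -1.0)
fn s16_to_f32(sample: i16) -> f32 {
    (sample as f32) * (1.0 / 32768.0)
}
```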
Handshake flow diagrams
RX direction (relay client)
```
Relay Client          Mixer (:5005)
  |    REGISTER       |
  |------------------>|  (announce name, version)
  |    ACCEPT         |
  |<------------------|  (session_id, sample_rate, channels, frames)
  |                   |
  |  [control plane]  |
  |  set_relay_assignment_default --> Mixer (:19997)
  |                   |
  |  AUDIO (loop)     |
  |<------------------|  (seq, session_id, S16LE payload)
  |  AUDIO            |
  |<------------------|
  |  ...              |
  |  PING (every 2s)  |
  |------------------>|
  |  PONG             |
  |<------------------|
  |  BYE              |
  |------------------>|  (on clean shutdown)
```
The control-plane call set_relay_assignment_default is sent by shiloh-web-relay
immediately after ACCEPT to pin the session to a feed. shiloh-relay (Pi relay)
sends it as well, with its configured default feed.
TX direction (broadcaster)
```
Broadcaster           Mixer (:5005)
  |    REGISTER_TX    |
  |------------------>|  (name, channels, version)
  |    ACCEPT_TX      |
  |<------------------|  (session_id, sample_rate, channels, frames, start_slot)
  |                   |
  |  AUDIO_TX (loop)  |
  |------------------>|  (seq, session_id, channels, S16LE payload)
  |  AUDIO_TX         |
  |------------------>|
  |  ...              |
  |  PING (every 1s)  |
  |------------------>|
  |  PONG             |
  |<------------------|  (mixer replies)
  |  BYE              |
  |------------------>|  (on clean shutdown)
```
The broadcaster sends PING every 1 second on a dedicated heartbeat thread.
The mixer’s ingest watchdog evicts sessions after 3 seconds of no AUDIO_TX.
RX handshake failure (REJECT → retry)
```
Relay Client          Mixer
  |    REGISTER       |
  |------------------>|
  |  REJECT (0x01)    |  (e.g. at capacity)
  |<------------------|
  |  [wait 1s]        |
  |    REGISTER       |
  |------------------>|
  |  [wait 2s on fail]|
  |  ...              |
  |    REGISTER       |  (up to 5 attempts)
  |------------------>|
```
Heartbeat and eviction
| Role | Cadence | Eviction timeout | Reaper |
|---|---|---|---|
| RX relay client PING | 1–2 s | 5 s since last PING | ~1 Hz (broadcast thread) |
| TX ingest session | PING every 1 s (not used for eviction) | 3 s since last AUDIO_TX | same loop |

The broadcast thread runs a non-blocking retain over clients every loop
iteration (approximately every 1 ms when there is audio to send). Eviction is
logged with the session id and name. After eviction, the mixer stops sending
AUDIO to that address. A client that is evicted must start a fresh REGISTER
— the old session id is gone.
Reconnect behavior
RX relay client (shiloh-relay, shiloh-web-relay)
Reconnect is triggered by:
- No AUDIO or PONG received for > 3 seconds (watchdog thread; PONG keeps a session parked on the off feed alive)
- Socket error during recv or send
- Voluntary BYE on clean shutdown
Backoff:
- Session error → sleep 1 s, retry
- Each consecutive failure within 10 s of session start: double backoff, capped at 30 s
- If a session runs for ≥ 10 s before failing: reset backoff to 1 s
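The three backoff rules can be captured in a small state machine. `Backoff` and `on_failure` are hypothetical names. This sketch assumes that every failure sleeps for the current value, which starts at 1 s, and then doubles it.

```rust
use std::time::Duration;

/// Reconnect backoff for relay clients.
struct Backoff {
    current: Duration,
}

impl Backoff {
    const INITIAL: Duration = Duration::from_secs(1);
    const MAX: Duration = Duration::from_secs(30);
    const STABLE_AFTER: Duration = Duration::from_secs(10);

    fn new() -> Self {
        Backoff { current: Self::INITIAL }
    }

    /// Call when a session ends with an error; `ran_for` is how long it lasted.
    /// Returns how long to sleep before the next REGISTER.
    fn on_failure(&mut self, ran_for: Duration) -> Duration {
        if ran_for >= Self::STABLE_AFTER {
            self.current = Self::INITIAL; // a healthy session resets the schedule
        }
        let sleep = self.current;
        self.current = (self.current * 2).min(Self::MAX); // double, capped at 30 s
        sleep
    }
}
```

With repeated quick failures the sleeps run 1, 2, 4, 8, 16, 30, 30 seconds. A session that lasted 10 s or more brings the next sleep back to 1 s.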
TX broadcaster (shiloh-broadcaster)
Reconnect is triggered by:
- REGISTER_TX fails 5 times (outer loop retries)
- A send of AUDIO_TX succeeds after a period of failures (this indicates the mixer
  restarted and the session_id is stale; tx_loop returns an error to force re-registration)

Backoff inside REGISTER_TX attempts: 1 s → 2 s → 4 s, capped at 5 s.
The outer reconnect loop in main handles repeated registration failure.
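This sketch shows the inner REGISTER_TX retry loop. It assumes 5 attempts with a wait between consecutive attempts and no wait after the last one. `register_with_retry` and `register_tx_backoff` are hypothetical names. Passing the attempt and the sleep as closures makes the schedule testable without a socket or real sleeps.

```rust
use std::time::Duration;

/// Wait before retry number `n` (0-based): 1 s, 2 s, 4 s, then capped at 5 s.
fn register_tx_backoff(n: u32) -> Duration {
    Duration::from_secs((1u64 << n.min(3)).min(5))
}

/// Tries `register` up to `max_attempts` times (at least once), sleeping between
/// failures. The last error is returned to the outer reconnect loop in main.
fn register_with_retry<T, E>(
    max_attempts: u32,
    mut register: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut n = 0;
    loop {
        match register() {
            Ok(session) => return Ok(session),
            Err(e) if n + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(register_tx_backoff(n));
                n += 1;
            }
        }
    }
}
```

In production, `sleep` would simply be `std::thread::sleep`.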
Capacity
| Limit | Value | Source |
|---|---|---|
| Max relay clients (RX) | 16 (default) | Compile-time / config |
| Max ingest senders (TX) | ingest.slot_count / sender channel counts | Config |
| Max client name length | 32 bytes | Wire format (name_len u8, enforced at 32) |
| Session id range (RX) | 1 … 2^31 - 1 | next_session wraps at 1 |
| Session id range (TX) | 0x80000000 … 2^32 - 1 | INGEST_SESSION_BASE |
Bandwidth reference
At 48 kHz / stereo / S16:
Code default (160 frames)
| Metric | Value |
|---|---|
| Packet rate | 300 pps |
| RX AUDIO packet (UDP payload) | 649 bytes |
| TX AUDIO_TX packet (UDP payload) | 650 bytes |
| UDP/IP header overhead | 28 bytes |
| Per-client bandwidth (RX) | ~1.6 Mbit/s |
| 8 RX clients | ~13 Mbit/s (trivial on LAN) |
Deployed config (128 frames)
| Metric | Value |
|---|---|
| Packet rate | 375 pps |
| RX AUDIO packet (UDP payload) | 521 bytes |
| TX AUDIO_TX packet (UDP payload) | 522 bytes |
| UDP/IP header overhead | 28 bytes |
| Per-client bandwidth (RX) | ~1.65 Mbit/s |
| 8 RX clients | ~13.2 Mbit/s (trivial on LAN) |