Signal Flow

Everything RFWhisper does, in order, from antenna (or rig audio) to your decoder.

Stage-by-stage detail

1. Audio capture

  • Backends: PortAudio (Linux/macOS), WASAPI (Windows), CoreAudio (macOS native), ALSA (Linux native).
  • Frame sizes: 10–30 ms native (160–480 samples @ 16 kHz; 480–1440 @ 48 kHz). Default 10 ms at 48 kHz for DFN3.
  • Sample format: float32 internally. Device-side conversion via the backend.
  • Thread: realtime priority where OS allows.
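The frame sizes quoted above follow directly from duration times sample rate. A minimal sketch of that arithmetic; the helper name `frame_samples` is hypothetical and not part of RFWhisper:

```python
def frame_samples(frame_ms: float, sample_rate_hz: int) -> int:
    """Number of samples in one frame of the given duration."""
    samples = frame_ms * sample_rate_hz / 1000
    if samples != int(samples):
        # A fractional frame cannot be handed to the audio backend as-is.
        raise ValueError("frame duration is not a whole number of samples")
    return int(samples)


# Reproduce the ranges listed above for both native rates.
for rate in (16_000, 48_000):
    print(rate, frame_samples(10, rate), frame_samples(30, rate))
```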

2. SPSC rings

Single-producer single-consumer lock-free ring between capture and processing, and between processing and output. Tuned to ~4 frames of headroom; larger buffers trade latency for safety against device jitter.
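To make the ring structure concrete, here is a small illustrative sketch in plain Python. It is not the RFWhisper implementation, and the class name `SpscRing` is hypothetical. It shows the index discipline a lock-free SPSC ring relies on: the producer only ever writes the tail index and the consumer only ever writes the head index. A real realtime ring would live in native code with atomic loads and stores.

```python
class SpscRing:
    """Fixed-capacity ring of frames for one producer and one consumer."""

    def __init__(self, capacity_frames: int = 4):
        # One slot is kept empty so that "full" and "empty" are distinguishable.
        self._slots = [None] * (capacity_frames + 1)
        self._head = 0  # next slot to read (written only by the consumer)
        self._tail = 0  # next slot to write (written only by the producer)

    def push(self, frame) -> bool:
        """Producer side. Returns False when the ring is full (overrun)."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = frame
        self._tail = nxt
        return True

    def pop(self):
        """Consumer side. Returns None when the ring is empty (underrun)."""
        if self._head == self._tail:
            return None
        frame = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return frame
```

With the default of four frames, a fifth `push` before any `pop` reports an overrun instead of blocking, which is the behaviour a realtime audio thread generally wants. Raising the capacity buys tolerance to jitter at the cost of up to one extra frame of latency per slot.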

3. Pre-DSP (rfwhisper/dsp/)

In order:

  1. De-emphasis (if input is NBFM): 50 µs / 75 µs curves.
  2. Polyphase resample to model native rate (48 kHz for DFN3 / RNNoise-ham). Uses liquid_msresamp. Never scipy.signal.resample in the hot path.
  3. Windowing (Hann by default; sqrt-Hann when the output chain also windows). VOLK-accelerated.
  4. STFT via liquid-dsp FFT plans, plans reused across frames.
  5. Feature prep: model-specific. DFN3 ingests spectrogram mag+phase; RNNoise uses its custom 42-dim bark features (handled inside the ONNX graph).
  6. (v0.3) Adaptive narrowband notch: classical LMS / gated IIR that removes stable carriers before the NN. Off by default for CW/FT8 profiles.

All buffers are preallocated. There is zero Python in the per-frame critical path (we use C extensions / numpy views with pre-cast buffers).
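As an illustration of step 1, a de-emphasis curve with time constant τ is a first-order low-pass with its corner at 1/(2πτ): about 2.1 kHz for 75 µs and about 3.2 kHz for 50 µs. The sketch below uses one common discretization, a one-pole IIR with a = exp(-1/(fs·τ)). This is an assumption; the filter RFWhisper actually uses may differ. The function names are hypothetical, and the Python loop is for readability only, since the real per-frame path avoids Python entirely.

```python
import math


def deemphasis_coeff(tau_s: float, sample_rate_hz: int) -> float:
    """Feedback coefficient of a one-pole low-pass with time constant tau_s."""
    return math.exp(-1.0 / (sample_rate_hz * tau_s))


def deemphasize(samples, tau_s: float = 75e-6, sample_rate_hz: int = 48_000):
    """Apply y[n] = (1 - a) * x[n] + a * y[n - 1], which has unity gain at DC."""
    a = deemphasis_coeff(tau_s, sample_rate_hz)
    y = 0.0
    out = []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


# Corner frequencies implied by the two standard time constants.
for tau in (50e-6, 75e-6):
    print(f"{tau * 1e6:.0f} us -> {1.0 / (2.0 * math.pi * tau):.0f} Hz")
```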

4. ONNX inference (rfwhisper/models/)

  • Execution provider chosen by rank: CoreML → DirectML → CUDA → XNNPACK → CPU. Override with RFWHISPER_PROVIDER=....
  • I/O tensors allocated once; bound via ONNX Runtime's IO binding API.
  • intra_op_num_threads tuned per-target (see Performance / Embedded if you're reading in the repo).
  • GIL released around OrtRun.
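The ranked provider choice can be sketched as a small pure function. The provider strings are ONNX Runtime's own execution provider names. The function name `pick_provider` is hypothetical, and so is the assumption that `RFWHISPER_PROVIDER` holds one of those names verbatim, since the accepted values are not spelled out here.

```python
import os

# Rank order from the list above, using ONNX Runtime's provider identifiers.
PROVIDER_RANK = [
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",
    "CUDAExecutionProvider",
    "XnnpackExecutionProvider",
    "CPUExecutionProvider",
]


def pick_provider(available, env=None):
    """Return the override if set, else the highest-ranked available provider."""
    env = os.environ if env is None else env
    override = env.get("RFWHISPER_PROVIDER")
    if override:
        # Assumption: fail loudly rather than silently falling back.
        if override not in available:
            raise RuntimeError(f"requested provider {override!r} is not available")
        return override
    for name in PROVIDER_RANK:
        if name in available:
            return name
    raise RuntimeError("no usable execution provider")
```

In practice `available` would come from `onnxruntime.get_available_providers()`, and the result would be passed as the `providers` argument when creating the `InferenceSession`.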

5. Post-DSP

  1. iSTFT + overlap-add with matched window. Overlap-add windows must sum to 1.0 (verified in unit tests).
  2. Gain match: output RMS matched to input RMS on a slow AGC to avoid perceived level jumps when toggling bypass.
  3. Soft limiter: catches occasional NN overshoots, never touches normal levels.
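The sum-to-1.0 requirement in step 1 can be checked numerically. The sketch below assumes a 480-sample frame with 50% overlap and a periodic sqrt-Hann window on both the analysis and synthesis sides, so the effective window is a plain Hann. These parameters and the helper names are illustrative, not taken from the RFWhisper test suite.

```python
import math


def periodic_hann(n: int):
    """Periodic (DFT-even) Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add_sum(window, hop: int):
    """Steady-state sum of hop-shifted copies of `window`, over one hop."""
    n = len(window)
    return [sum(window[i + k * hop] for k in range(n // hop)) for i in range(hop)]


n, hop = 480, 240  # assumed: 10 ms at 48 kHz, 50% overlap
sqrt_hann = [math.sqrt(w) for w in periodic_hann(n)]
# Analysis window times synthesis window is what actually gets overlap-added.
effective = [w * w for w in sqrt_hann]
cola = overlap_add_sum(effective, hop)
assert all(abs(v - 1.0) < 1e-9 for v in cola)
```

The periodic form matters: a symmetric Hann window, such as the one returned by `numpy.hanning`, does not sum to exactly 1.0 at a hop of half the window length.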

6. Output ring + device

Symmetric to the input side. Drains to the virtual cable or speaker.

Per-profile variations

Mode profiles (YAML in rfwhisper/profiles/) tweak:

  • NN aggressiveness (blend factor 0.0–1.0)
  • Notch enable + bandwidth (v0.3)
  • De-emphasis on/off
  • Attack / release envelopes for the soft limiter
  • Frame size (longer windows for FT8, shorter for CW transient preservation)
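
| Profile   | NN aggression | Notch    | Frame | Rationale                            |
|-----------|---------------|----------|-------|--------------------------------------|
| ssb       | 0.9           | off      | 10 ms | Speech intelligibility               |
| cw        | 0.6           | off      | 5 ms  | Transient preservation is paramount  |
| ft8       | 0.7           | off      | 20 ms | Decoder tolerates latency; be gentler |
| ft4       | 0.8           | off      | 10 ms | -                                    |
| fm-narrow | 0.9           | optional | 10 ms | De-emphasis first                    |
| vhf-ssb   | 0.85          | optional | 10 ms | Weak-signal-friendly                 |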
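One way to read the blend factor is as a linear dry/wet mix between the unprocessed frame and the NN output. The sketch below takes that reading as an assumption. The mix could equally be applied to spectral gains rather than time-domain samples, and the names `NN_AGGRESSION` and `blend` are hypothetical. The values are the ones from the table.

```python
# NN aggression per profile, copied from the table above.
NN_AGGRESSION = {
    "ssb": 0.9,
    "cw": 0.6,
    "ft8": 0.7,
    "ft4": 0.8,
    "fm-narrow": 0.9,
    "vhf-ssb": 0.85,
}


def blend(dry, wet, profile: str):
    """Mix the NN output (wet) with the original frame (dry).

    A factor of 1.0 returns the NN output unchanged; 0.0 bypasses it.
    """
    a = NN_AGGRESSION[profile]
    return [a * w + (1.0 - a) * d for d, w in zip(dry, wet)]
```

Under this reading, the cw profile keeps 40% of the original signal in the output, which is consistent with its stated goal of preserving keying transients.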

Latency measurements per stage live in Latency Budget.