Signal Flow
Everything RFWhisper does, in order, from antenna (or rig audio) to your decoder.
Stage-by-stage detail
1. Audio capture
- Backends: PortAudio (Linux/mac), WASAPI (Windows), CoreAudio (mac native), ALSA (Linux native).
- Frame sizes: 10–30 ms native (160–480 samples @ 16 kHz; 480–1440 @ 48 kHz). Default 10 ms at 48 kHz for DFN3.
- Sample format: `float32` internally. Device-side conversion via the backend.
- Thread: realtime priority where the OS allows.
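The frame sizes above follow directly from sample rate times frame duration. A minimal sketch of that arithmetic (the helper name `frame_samples` is hypothetical, not part of RFWhisper):

```python
def frame_samples(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples in one frame lasting `frame_ms` milliseconds."""
    total = sample_rate_hz * frame_ms
    if total % 1000:
        # A frame must contain a whole number of samples.
        raise ValueError("frame does not contain a whole number of samples")
    return total // 1000


for rate in (16_000, 48_000):
    # Reproduces the 10 ms and 30 ms bounds quoted in the list above.
    print(rate, [frame_samples(rate, ms) for ms in (10, 30)])
```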
2. SPSC rings
Single-producer single-consumer lock-free ring between capture and processing, and between processing and output. Tuned to ~4 frames of headroom; larger buffers trade latency for safety against device jitter.
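To illustrate the head/tail index arithmetic behind such a ring, here is a minimal sketch. It is not RFWhisper's implementation: the class name `SpscRing` is hypothetical, and a production ring would use atomic loads and stores in native code, whereas this version only shows the ownership rule (the producer alone moves the head, the consumer alone moves the tail).

```python
import numpy as np


class SpscRing:
    """Fixed-capacity ring of audio frames: one writer, one reader."""

    def __init__(self, n_frames: int, frame_len: int) -> None:
        # One slot stays empty so that head == tail always means "empty".
        self._slots = np.zeros((n_frames + 1, frame_len), dtype=np.float32)
        self._head = 0  # next slot to write; modified only by the producer
        self._tail = 0  # next slot to read; modified only by the consumer

    def push(self, frame: np.ndarray) -> bool:
        nxt = (self._head + 1) % len(self._slots)
        if nxt == self._tail:
            return False  # full: the caller decides whether to drop or retry
        self._slots[self._head] = frame  # copy into preallocated storage
        self._head = nxt  # publish only after the data is in place
        return True

    def pop(self, out: np.ndarray) -> bool:
        if self._tail == self._head:
            return False  # empty: the consumer has underrun
        out[:] = self._slots[self._tail]
        self._tail = (self._tail + 1) % len(self._slots)
        return True


ring = SpscRing(n_frames=4, frame_len=480)  # ~4 frames of headroom at 10 ms / 48 kHz
```

Raising `n_frames` buys more tolerance for device jitter at the cost of up to one extra frame duration of latency per slot.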
3. Pre-DSP (rfwhisper/dsp/)
In order:
- De-emphasis (if input is NBFM): 50 µs / 75 µs curves.
- Polyphase resample to model native rate (48 kHz for DFN3 / RNNoise-ham). Uses `liquid_msresamp`. Never `scipy.signal.resample` in the hot path.
- Windowing (Hann by default; sqrt-Hann when the output chain also windows). VOLK-accelerated.
- STFT via liquid-dsp FFT plans, plans reused across frames.
- Feature prep: model-specific. DFN3 ingests spectrogram mag+phase; RNNoise uses its custom 42-dim bark features (handled inside the ONNX graph).
- (v0.3) Adaptive narrowband notch: classical LMS / gated IIR that removes stable carriers before the NN. Off by default for CW/FT8 profiles.
All buffers are preallocated. There is zero Python in the per-frame critical path (we use C extensions / numpy views with pre-cast buffers).
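As an illustration of the de-emphasis step, a 50 µs or 75 µs curve can be approximated by a one-pole low-pass whose coefficient comes from the time constant. The sketch below is an assumption about the filter form, not the project's code: the function names are hypothetical, and the Python loop is for readability only (the real path runs in native code, as noted above).

```python
import math

import numpy as np


def deemphasis_coeff(tau_s: float, sample_rate_hz: float) -> float:
    """Feedback coefficient of a one-pole low-pass with time constant tau_s."""
    return math.exp(-1.0 / (tau_s * sample_rate_hz))


def deemphasize(x: np.ndarray, tau_s: float, sample_rate_hz: float,
                state: float = 0.0) -> tuple[np.ndarray, float]:
    """Apply y[n] = (1 - a) * x[n] + a * y[n - 1]; return (y, final state)."""
    a = deemphasis_coeff(tau_s, sample_rate_hz)
    y = np.empty_like(x)
    prev = state
    for i, sample in enumerate(x):
        prev = (1.0 - a) * float(sample) + a * prev
        y[i] = prev
    # The returned state lets the next frame continue without a discontinuity.
    return y, prev


frame = np.ones(480, dtype=np.float32)  # one 10 ms frame at 48 kHz
out, carry = deemphasize(frame, tau_s=75e-6, sample_rate_hz=48_000)
```

The filter has unity gain at DC and attenuates progressively toward Nyquist; passing `carry` back in as `state` keeps the response continuous across frame boundaries.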
4. ONNX inference (rfwhisper/models/)
- Execution provider chosen by rank: CoreML → DirectML → CUDA → XNNPACK → CPU. Override with `RFWHISPER_PROVIDER=...`.
- I/O tensors allocated once; bound via ONNX Runtime's IO binding API.
- `intra_op_num_threads` tuned per-target (see Performance / Embedded if you're reading in the repo).
- GIL released around `OrtRun`.
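The rank-based provider choice can be sketched as a simple first-match search. This is an illustration under assumptions, not RFWhisper's code: `pick_provider` is a hypothetical helper, the ranking is spelled with ONNX Runtime's provider identifiers, and the accepted values of `RFWHISPER_PROVIDER` are not documented here, so the sketch assumes the override holds a full provider identifier.

```python
import os

# Ranking from the list above, using ONNX Runtime's provider identifiers.
PROVIDER_RANK = [
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",
    "CUDAExecutionProvider",
    "XnnpackExecutionProvider",
    "CPUExecutionProvider",
]


def pick_provider(available, env=None):
    """Return the override if set, else the highest-ranked available provider."""
    env = os.environ if env is None else env
    # Assumption: the override value is a full ONNX Runtime provider identifier.
    override = env.get("RFWHISPER_PROVIDER")
    if override:
        if override not in available:
            raise ValueError(f"requested provider {override!r} is not available")
        return override
    for name in PROVIDER_RANK:
        if name in available:
            return name
    raise RuntimeError("no known execution provider is available")


print(pick_provider(["CPUExecutionProvider", "CUDAExecutionProvider"], env={}))
```

With ONNX Runtime installed, `available` could be taken from `onnxruntime.get_available_providers()`.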
5. Post-DSP
- iSTFT + overlap-add with matched window. The overlapped windows must sum to 1.0 at every sample (the constant-overlap-add condition); this is verified in unit tests.
- Gain match: output RMS is matched to input RMS on a slow AGC to avoid perceived level jumps when toggling bypass.
- Soft limiter: catches occasional NN overshoots and never touches normal levels.
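The window-sum requirement can be checked numerically. The sketch below assumes a periodic sqrt-Hann window on both the analysis and synthesis side with 50% overlap; the window length, hop, and function names are illustrative assumptions, not values taken from RFWhisper.

```python
import numpy as np


def sqrt_hann(n: int) -> np.ndarray:
    """Square root of a periodic Hann window (denominator n, not n - 1)."""
    k = np.arange(n)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * k / n))


def ola_gain(analysis: np.ndarray, synthesis: np.ndarray, hop: int) -> np.ndarray:
    """Steady-state overlap-add gain over one hop for the given window pair."""
    product = analysis * synthesis
    if len(product) % hop:
        raise ValueError("window length must be a multiple of the hop")
    # Each row is the part of the window that lands on the same output samples.
    return product.reshape(-1, hop).sum(axis=0)


w = sqrt_hann(960)                 # 20 ms window at 48 kHz (assumed)
gain = ola_gain(w, w, hop=480)     # 50% overlap (assumed)
assert np.allclose(gain, 1.0)      # overlapped windows sum to 1.0
```

Using the symmetric form of the window (denominator `n - 1`) would leave a small ripple in the sum, which is why the periodic form is used here.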
6. Output ring + device
Symmetric to the input side. Drains to the virtual cable or speaker.
Per-profile variations
Mode profiles (YAML in rfwhisper/profiles/) tweak:
- NN aggressiveness (blend factor 0.0–1.0)
- Notch enable + bandwidth (v0.3)
- De-emphasis on/off
- Attack / release envelopes for the soft limiter
- Frame size (longer windows for FT8, shorter for CW transient preservation)
| Profile | NN aggression | Notch | Frame | Rationale |
|---|---|---|---|---|
| `ssb` | 0.9 | off | 10 ms | Speech intelligibility |
| `cw` | 0.6 | off | 5 ms | Transient preservation is paramount |
| `ft8` | 0.7 | off | 20 ms | Decoder tolerates latency; be gentler |
| `ft4` | 0.8 | off | 10 ms | - |
| `fm-narrow` | 0.9 | optional | 10 ms | De-emphasis first |
| `vhf-ssb` | 0.85 | optional | 10 ms | Weak-signal-friendly |
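One plausible reading of the blend factor is a linear crossfade between the unprocessed and the denoised signal. The sketch below encodes the table values and applies that crossfade; the `Profile` fields and the `blend` function are hypothetical names, and the crossfade itself is an assumption about what "blend factor" means rather than a description of the actual code.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Profile:
    nn_blend: float  # "NN aggression" column, 0.0 (bypass) to 1.0 (full NN)
    frame_ms: int    # "Frame" column


# Values copied from the table above.
PROFILES = {
    "ssb": Profile(0.9, 10),
    "cw": Profile(0.6, 5),
    "ft8": Profile(0.7, 20),
    "ft4": Profile(0.8, 10),
    "fm-narrow": Profile(0.9, 10),
    "vhf-ssb": Profile(0.85, 10),
}


def blend(dry: np.ndarray, denoised: np.ndarray, nn_blend: float) -> np.ndarray:
    """Assumed linear crossfade: nn_blend of the NN output, the rest dry."""
    if not 0.0 <= nn_blend <= 1.0:
        raise ValueError("nn_blend must lie in [0.0, 1.0]")
    return nn_blend * denoised + (1.0 - nn_blend) * dry


dry = np.ones(4, dtype=np.float32)
denoised = np.zeros(4, dtype=np.float32)
mixed = blend(dry, denoised, PROFILES["cw"].nn_blend)  # keeps 40% of the dry signal
```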
Latency measurements per stage live in Latency Budget.