Speech Agents Learn to Listen
Today’s digest spans full-duplex spoken dialogue, emotional voice synthesis, and ASR that catches hesitations and other disfluencies. Together, these papers push voice agents toward more natural, responsive, and expressive interaction.
Figure: BayLing-Duplex model architecture, showing the multi-channel autoregressive design that integrates speech and text dialogue management. From BayLing-Duplex.