Akapulu Labs Research

Voice Gets More Real-Time

Today’s digest spans holistic video dubbing, streaming speech LLMs, expressive and waveform-native TTS, and low-latency voice conversion. The common thread: better control, faster inference, and more natural-sounding speech across generation and recognition.

HoliDubber overall framework diagram. From HoliDubber.

Video Dubbing & Visual Speech Alignment

SpeechLLMs & Streaming Recognition

TTS & Expressive Voice Synthesis

Voice Conversion & Realtime Speech