Akapulu Labs Research

Smarter Voices, Fewer Errors

Today’s digest spotlights more expressive speech synthesis, emotion conversion, and better ASR reliability for voice agents. From image-based TTS and continuous latent speech models to hallucination steering in Whisper, the focus is on making spoken AI sound better and fail less.

Overview of the proposed method. From Pixel-TTS.

TTS & Voice Synthesis

Expressive Voice Conversion

ASR Reliability for Voice Agents