Akapulu Labs Research

Reliable voice AI for multi-speaker dialogue and long-form synthesis

Today’s digest spans speech LLMs, voice agents, and TTS: from diarization-aware multi-speaker grounding and persona-driven speech role-play to more robust, faster, and longer-form speech generation. The common thread is making voice systems more controllable, consistent, and reliable.

Proposed Decoupled Speech Role-Playing Agent (DeSRPA) framework. A Frozen LLM Controller (left) steers a Frozen StyleTTS 2 (right) via Inference-Time Control Vectors, injecting personality and acoustic styles directly without parameter updates. From DeSRPA.

SpeechLLMs & Voice Agents

TTS & Voice Synthesis