Akapulu Labs Research

Streaming Speech, Lip Sync, Turn-Taking

Today’s digest spans talking avatars, 4D human reconstruction, and speech agents that anticipate endpoints and manage multi-party turns. It also includes expressive TTS with finer emotion control for more natural spoken output.

Qualitative comparison: same-scene condition. The reference video (cyan border, top row) provides identity context. All five methods generate from the same driving audio and scene image. Avatar V produces the most faithful identity and natural motion. From Avatar V.

Talking Avatars & Lip Sync

Digital Humans & 4D Reconstruction

SpeechLLMs & Voice Agents

TTS & Expressive Voice Synthesis