Akapulu Labs Research

Controlling speech, audio, and avatars

Today’s digest spans new ways to shape how machines speak, listen, and move: from universal audio generation and controlled TTS to stronger ASR alignment and adaptive speech encoders. It also includes a fresh step toward smoother digital-human motion from sparse sensor input.

AudioCALM architecture overview illustrating the continuous autoregressive model with flow-matching head and asymmetric experts for universal audio generation. From AudioCALM.

Digital Humans & Avatar Motion

SpeechLLMs & Spoken Audio Generation

TTS & Voice Synthesis

ASR Architectures & Adaptation