Akapulu Labs Research

Speech systems tighten context, accents, and avatars

Today’s digest spans context-aware spoken dialogue, multilingual and accent-aware speech models, faster ASR decoding, and dynamic 3D head editing. Together, these papers push conversational systems toward more faithful, natural, and controllable interaction.

Overview of the proposed Edit3DGS framework. The core components are multi-view batch editing and mask-based refinement, which together ensure both spatial consistency across views and preservation of facial expressions. At each timestep, selected camera views are rendered and edited using instruction-guided diffusion. The edited results across multiple timesteps are then aggregated to update the original Gaussian model through 3D fitting. After several iterations, the refined 3D avatar can be reprocessed through 2D editing to propagate changes to previously unseen timesteps. From Edit3DGS.

Digital Humans & 3D Faces

SpeechLLMs & Spoken Dialogue

TTS & Voice Synthesis

ASR Architectures