Akapulu Labs Research

Real-Time Voices, Agents, and Avatars

Today’s digest spans end-to-end interactive models, improvements in speech and TTS, and instant 3D avatar generation. The common thread: more natural, controllable, low-latency conversational AI across voice and visual embodiment.


Qualitative Results. We show the animated results of our generated 3D Gaussian avatars for test IDs and novel expressions. Our method generates authentic, ID-preserving avatars across diverse attributes, such as races, genders, ages, hairstyles, and expressions, from only a single image. The input image's visual details, such as tattoos or accessories, are also faithfully reflected in the 3D Gaussian avatars. Note that our method can generate regions not observed in the input image, such as the mouth interior and eye pupils, aided by our diffusion model. Please refer to the supplementary video for the dynamic avatar animation results. From FiCA.

Digital Humans & 3D Avatars

SpeechLLMs & Voice Agents

TTS & Voice Synthesis