Akapulu Labs Research

Speech synthesis, voice conversion, and interactive avatars push toward real-time realism

Today’s digest spans transcript-free text-to-speech, diffusion-guided speech generation, streaming voice conversion, and speech codecs built for cleaner identity control. It also features physically interactive 3D avatars that deform realistically under contact and motion.

An illustration of our framework. (a) To faithfully reflect the user-defined motion, we decouple the kinematic velocity from the deformation gradient update. (b) By computing the velocity from the transformations of the embedded skeletal structure, our method preserves pose consistency throughout the simulation. From PIAvatar.

Digital Humans & 3D Avatars

TTS & Voice Synthesis

Voice Conversion & Streaming Speech