Voice, Avatars, and Spoken Reasoning
Today’s digest spans real-time talking avatars, relightable digital humans, universal speech synthesis, and a new look at how speech-language models reason internally. Together they point to more natural, controllable, and interactive conversational AI systems.
We propose InteractiveAvatar, a real-time, streaming, audio-driven avatar generation framework that enables intent-aware interaction. InteractiveAvatar interprets user intent to generate contextually relevant actions throughout the dialogue while maintaining long-range visual consistency. From InteractiveAvatar.