Jun 27, 2026

Talkers Move and Speak Better

Today’s digest spans expressive talking heads, stylized co-speech gesture generation, and code-mixing-aware speech synthesis for better downstream ASR. The common thread: tighter control over how synthetic voices, faces, and motions carry meaning.

SpatialAvatar-0 method overview illustration. From SpatialAvatar-0.

Talking Avatars & Digital Humans

EmoZone-Talker

EmoZone-Talker: Regional Semantic Control of Audio-Driven 3DGS Talking Heads via Facial Action Units

EmoZone-Talker enables fine-grained, anatomically interpretable control of facial expressions in audio-driven 3D Gaussian Splatting (3DGS) talking heads. It disentangles the spatial and temporal interactions between speech and facial Action Units, which improves expression realism, controllability, and lip-sync accuracy.

SpatialAvatar-0

SpatialAvatar-0: High-Quality 4D Head Avatar with Multi-Stage Reconstruction

SpatialAvatar-0 creates high-quality 4D head avatars from a few portraits by unifying feed-forward prediction and per-subject refinement on a shared Gaussian-splat FLAME-mesh representation. Because the spatial layout and Gaussian count are preserved throughout, it achieves strong zero-shot generalization and faster refinement.

Gesture & Embodied Animation

SiGnature

SiGnature: Explicit Motion Diffusion for Stylized Semantic Gesture

SiGnature generates co-speech gestures that are semantically meaningful and stylistically faithful to the speaker by explicitly modeling motion in joint-rotation space. It injects semantic gestures during inference without retraining, preserving personalized style while smoothly integrating precise semantic motions.

Speech & ASR

Code-Mixing Guided TTS Alignment

Improving Code-Switching ASR with Code-Mixing Guided Synthetic Speech

This work introduces a preference-learning framework for synthetic speech generation that preserves code-switching patterns in conversational Mandarin-English speech. By incorporating a speech-based code-mixing metric into TTS training, it improves language-boundary consistency and, in turn, downstream code-switching ASR performance.
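
For readers unfamiliar with code-mixing metrics, here is a minimal sketch of one common text-level variant, the Code-Mixing Index (CMI), which scores how evenly an utterance mixes languages. This is an illustration only: the summary describes a speech-based metric whose exact form is not given here, and the function name, the tag labels ("zh", "en", "other"), and the assumption that per-token language tags are already available are all hypothetical.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral="other"):
    """Text-level Code-Mixing Index for one utterance.

    lang_tags: one language label per token (hypothetical labels).
    neutral:   label for language-independent tokens (numbers, names, fillers).
    Returns 0.0 for monolingual input; larger values mean heavier mixing.
    """
    # Count only tokens that carry a language label.
    counts = Counter(tag for tag in lang_tags if tag != neutral)
    tagged = sum(counts.values())
    if tagged == 0:
        return 0.0
    # Fraction of tagged tokens that are NOT in the dominant language.
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / tagged)


if __name__ == "__main__":
    mixed = ["zh", "zh", "en", "en", "other"]
    mono = ["zh", "zh", "zh", "other"]
    print(code_mixing_index(mixed))
    print(code_mixing_index(mono))
```

A score like this could, in principle, be compared between a reference transcript and synthesized speech to check whether the mixing pattern survived synthesis; how the paper actually derives its score from audio and feeds it into preference learning is not specified in this summary.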