Open Source · Rust/WASM · 30-day Free Trial

Voice to avatar,
entirely in the browser

AnimaSync extracts emotion from speech and generates lip sync, facial expressions, and body motion in real time — no server required.

Start Building View on GitHub
Full Animation Pipeline
🎤
Audio Input
File, Mic, or TTS
16kHz PCM
🧠
ONNX Inference
Rust/WASM engine
Phoneme → Viseme
👄
Lip Sync
52 ARKit blendshapes
jaw, mouth, tongue
😊
Expressions
Brow, cheek, eyes
Emotion from voice
👁
Eye Blink
Stochastic injection
2.5–4.5s interval
🎭
VRM Body
VRMA bone animation
Idle ↔ Speaking
52
ARKit Blendshapes
30 fps
Animation Output
<300ms
Mic-to-Render
0
Server Required

Everything from audio alone

One engine handles the full animation pipeline — from raw audio to animated avatar.

👄

Lip Sync

ONNX neural inference maps speech phonemes to 52 ARKit blendshapes at 30fps. Crisp mouth movements with natural co-articulation.
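At a fixed 30 fps output rate, mapping audio playback time to a frame index is simple arithmetic. A minimal sketch (`frameIndexAt` is an illustrative helper, not part of the library API):

```typescript
// Map playback time to an animation frame index at a fixed frame rate.
// frameIndexAt is an illustrative helper, not a library API.
const FPS = 30;

function frameIndexAt(playbackSeconds: number, totalFrames: number): number {
  // Clamp so we never read before the first or past the last frame.
  const raw = Math.floor(playbackSeconds * FPS);
  return Math.min(Math.max(raw, 0), totalFrames - 1);
}
```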

😊

Facial Expressions

Voice energy and pitch automatically drive brows, cheeks, eyes, and smile. Emotion follows the speaker naturally.

👁

Eye Animation

Stochastic blink injection at 2.5–4.5s intervals with 15% double-blink probability. No dead-eyed avatars.
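The numbers above translate directly into a tiny scheduler. A sketch of the idea (`nextBlink` is an illustrative helper, not the library's internal code; the RNG is injectable so the behavior is deterministic in tests):

```typescript
// Sketch of stochastic blink scheduling: a uniform 2.5–4.5 s delay,
// with a 15% chance of an immediate second blink.
// nextBlink is an illustrative helper, not a library API.
interface BlinkEvent {
  delaySeconds: number; // time until the next blink fires
  double: boolean;      // whether a second blink follows immediately
}

function nextBlink(rng: () => number = Math.random): BlinkEvent {
  const MIN_INTERVAL = 2.5;
  const MAX_INTERVAL = 4.5;
  const DOUBLE_BLINK_PROB = 0.15;
  return {
    delaySeconds: MIN_INTERVAL + rng() * (MAX_INTERVAL - MIN_INTERVAL),
    double: rng() < DOUBLE_BLINK_PROB,
  };
}
```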

🎭

Body Motion

Embedded VRMA bone animation clips with smooth idle-to-speaking crossfade. Breathing, gestures, and posture shifts.

🎙

Real-time Streaming

AudioWorklet captures microphone at 16kHz. Process chunks as they arrive — no need to wait for complete audio.
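Chunked processing boils down to buffering incoming samples and emitting fixed-size frames as soon as they fill. A sketch, assuming the samples arrive from an AudioWorklet in the browser (`ChunkAccumulator` is an illustrative helper, not part of the library API):

```typescript
// Buffer incoming 16 kHz samples and emit fixed-size chunks as they
// complete. In the browser the sample blocks would come from an
// AudioWorklet; ChunkAccumulator itself is an illustrative helper,
// not a library API.
class ChunkAccumulator {
  private buffer: number[] = [];

  constructor(private readonly chunkSize: number) {}

  // Push a block of samples; returns every complete chunk now available.
  push(samples: Float32Array | number[]): number[][] {
    this.buffer.push(...samples);
    const chunks: number[][] = [];
    while (this.buffer.length >= this.chunkSize) {
      chunks.push(this.buffer.splice(0, this.chunkSize));
    }
    return chunks;
  }
}
```

Each emitted chunk can then be handed to the engine as it becomes available, so animation starts before the speaker finishes.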

Client-side Only

Rust/WASM + ONNX Runtime Web. No server, no API calls, no data leaves the browser. Works offline after first load.

Animate an avatar in 4 lines

Install from npm, initialize the engine, and start generating animation frames. Works with any Three.js + VRM setup.

$ npm install @goodganglabs/lipsync-wasm-v1
import { LipSyncWasmWrapper } from '@goodganglabs/lipsync-wasm-v1';

const lipsync = new LipSyncWasmWrapper();
await lipsync.init(); // 30-day free trial
const result = await lipsync.processFile(audioFile);
const frame = lipsync.getFrame(result, 0); // number[111]

Install from npm

Two engines, one API surface. Pick the engine that fits your project.

@goodganglabs/lipsync-wasm-v1
Recommended

Phoneme classification engine — 111-dim output with full expression control. Built-in IdleExpressionGenerator, VoiceActivityDetector, and VRM 18-dim mode.

Output: 111-dim ARKit
Post: OneEuroFilter + constraints
VRM: 18-dim preset mode
$ npm install @goodganglabs/lipsync-wasm-v1
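The One Euro filter named in the post-processing step is a published smoothing technique (Casiez et al.): a low-pass filter whose cutoff adapts to signal speed, so blendshape values stay stable at rest but track fast mouth movements with little lag. A generic textbook sketch, not the library's internal code:

```typescript
// Minimal One Euro filter sketch: an adaptive low-pass filter whose
// cutoff rises with signal speed (smooth when still, responsive when
// moving). Generic implementation, not the library's internal code.
class OneEuroFilter {
  private prevX: number | null = null;
  private prevDx = 0;

  constructor(
    private readonly minCutoff = 1.0, // Hz; lower = smoother at rest
    private readonly beta = 0.05,     // speed coefficient; higher = less lag
    private readonly dCutoff = 1.0,   // Hz; cutoff for the derivative
  ) {}

  private alpha(cutoff: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  filter(x: number, dt: number): number {
    if (this.prevX === null) {
      this.prevX = x;
      return x;
    }
    // Smooth the derivative, then adapt the cutoff to the speed.
    const dx = (x - this.prevX) / dt;
    const aD = this.alpha(this.dCutoff, dt);
    this.prevDx = aD * dx + (1 - aD) * this.prevDx;
    const cutoff = this.minCutoff + this.beta * Math.abs(this.prevDx);
    const a = this.alpha(cutoff, dt);
    this.prevX = a * x + (1 - a) * this.prevX;
    return this.prevX;
  }
}
```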
@goodganglabs/lipsync-wasm-v2
Lightweight

Emotion model — 52-dim ARKit blendshape prediction with 5-dim FiLM conditioning (neutral, joy, anger, sadness, surprise).

Output: 52-dim ARKit
Emotion: 5-dim FiLM conditioning
Peer: onnxruntime-web
$ npm install @goodganglabs/lipsync-wasm-v2
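FiLM conditioning takes a 5-dim emotion vector as input, one weight per emotion in the order listed above. A sketch of building such a vector (`emotionVector` is an illustrative helper, not part of the V2 API; the normalization choice is an assumption):

```typescript
// Build a 5-dim emotion conditioning vector in the order the V2 model
// is described as using: neutral, joy, anger, sadness, surprise.
// emotionVector is an illustrative helper, not a library API, and
// sum-to-one normalization is an assumption for the sketch.
type Emotion = 'neutral' | 'joy' | 'anger' | 'sadness' | 'surprise';
const EMOTIONS: Emotion[] = ['neutral', 'joy', 'anger', 'sadness', 'surprise'];

function emotionVector(weights: Partial<Record<Emotion, number>>): number[] {
  // Clamp each weight to [0, 1], then normalize so the vector sums to 1.
  const raw = EMOTIONS.map((e) => Math.min(Math.max(weights[e] ?? 0, 0), 1));
  const sum = raw.reduce((acc, w) => acc + w, 0);
  // Fall back to pure neutral if everything is zero.
  return sum === 0 ? [1, 0, 0, 0, 0] : raw.map((w) => w / sum);
}
```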

See it in action

Interactive demos you can try right now — no install needed.

Interactive Guide

Build Your Own AI Talking Avatar

6-step interactive tutorial. Choose V1 or V2 engine, adjust emotion in real time (V2), load a VRM avatar, apply lip sync — with live demos at each step.

Start Guide →
V1 Engine

Phoneme Visualization

V1 phoneme engine — 111-dim output mapped to 52 ARKit blendshapes. ONNX inference with real-time visualization.

Try it →
V2 Engine

Emotion Model Demo

V2 emotion model — 52 ARKit blendshapes with 5-dim FiLM conditioning. Emotion-aware lip sync, real-time rendering.

Try it →
Comparison

V1 vs V2 Side-by-Side

Same voice input, two animation engines, two avatars. See the difference live in a dual-panel view.

Try it →

Choose your engine

Two engines for different needs. Both produce ARKit-compatible output at 30fps.

Feature          | V1 (Recommended)                          | V2
Output           | 111-dim ARKit blendshapes                 | 52-dim ARKit blendshapes
Architecture     | Phoneme classification + viseme mapping   | Emotion model + FiLM conditioning
Post-processing  | OneEuroFilter + anatomical constraints    | crisp_mouth + fade + auto-blink
Idle expressions | Built-in IdleExpressionGenerator          | Blink injection in post-process
Voice activity   | Built-in VoiceActivityDetector            | N/A
Emotion control  | N/A                                       | 5-dim FiLM conditioning (neutral, joy, anger, sadness, surprise)
Best for         | Full expression control, custom avatars   | Emotion-aware lip sync, quick integration