Create Realistic Avatars from Audio Using Meta's Audio2Photoreal

In a leap forward for generative AI, Meta AI has unveiled Audio2Photoreal, an open-source project that generates full-body, lifelike 3D avatars from audio input. The avatars not only display realistic facial expressions but also produce body and gesture movements that match the spoken words in multi-person conversations. Let’s delve into the intricacies of this technology.

How Audio2Photoreal Works

Audio2Photoreal combines the sample diversity of vector quantization with the high-frequency detail of diffusion models, producing more dynamic and expressive motion. The process involves several key steps:

  1. Dataset Capture: The model is trained on rich datasets of two-person conversations, captured to support realistic reconstruction.
  2. Motion Model Construction: From this data, it builds a composite motion model consisting of face, pose, and body motion models.
  3. Facial Motion Generation: The audio is processed by a pre-trained lip regressor to extract facial motion features, and a conditional diffusion model then generates facial expressions conditioned on those features.
  4. Body Motion Generation: In parallel, the audio is used to autoregressively generate vector-quantized (VQ) guide poses at 1 frame per second. These guide poses, together with the audio, are fed into a diffusion model that fills in high-frequency detail, producing body motion at 30 frames per second (a sketch of this flow appears below the list).
  5. Virtual Character Rendering: The generated facial and body movements are finally passed to a trained virtual character renderer to produce the realistic avatar.
  6. Result Display: The final output shows full-body, photorealistic virtual characters expressing the subtle nuances of conversation.
[Image: How Audio2Photoreal Works | Meta AI]
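
To make the two-branch design concrete, here is a minimal PyTorch-style sketch of the data flow described above. All module names, method signatures, and tensor shapes are illustrative assumptions, not the actual Audio2Photoreal API.

```python
import torch
from torch import nn

def generate_motion(
    audio: torch.Tensor,        # audio features for the conversation
    lip_regressor: nn.Module,   # pre-trained audio -> facial motion features
    face_diffusion,             # conditional diffusion model over expressions
    guide_pose_vq,              # autoregressive VQ generator for coarse poses
    body_diffusion,             # diffusion model that adds high-frequency detail
    seconds: int,
):
    """Illustrative two-branch flow; all names and shapes are assumptions."""
    # Facial branch: extract lip/face features from the audio, then sample
    # a facial expression sequence conditioned on those features.
    lip_features = lip_regressor(audio)
    face_motion = face_diffusion.sample(cond=lip_features)

    # Body branch: autoregressively sample coarse guide poses at 1 fps
    # from the vector-quantized codebook, conditioned on the audio.
    guide_poses = guide_pose_vq.sample(cond=audio, num_frames=seconds)

    # The body diffusion model upsamples the 1 fps guide poses to 30 fps
    # (a 30x temporal expansion), conditioned on both audio and guide poses.
    body_motion = body_diffusion.sample(
        cond=(audio, guide_poses), num_frames=seconds * 30
    )

    # Both motion streams are handed downstream to the neural renderer
    # to produce the final photoreal avatar video.
    return face_motion, body_motion
```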

Example Usage Scenarios

Audio2Photoreal can be applied in a range of scenarios, such as training models on collected voice data to generate custom character avatars, synthesizing realistic virtual likenesses from the voice recordings of historical figures, and bringing voice-acted characters to life in 3D games and virtual spaces.

Features of the Product

  • Generates realistic human avatars from audio.
  • Provides pre-trained models and datasets.
  • Includes face and body models.
  • Achieves high-quality avatar rendering.
  • Offers open-source PyTorch code implementation.

How to Use Audio2Photoreal

To use Audio2Photoreal, users simply provide audio data as input. The pre-trained models then generate realistic human avatars from that audio, making the project a valuable resource for developers and creators working in digital media, game development, and virtual reality.
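
As a rough illustration of that workflow, the sketch below loads a conversation recording with torchaudio and hands it to a pipeline object standing in for the released pre-trained models. The `pipeline` argument and its call signature are hypothetical placeholders, so consult the official repository for the actual entry points.

```python
import torch
import torchaudio

def run_audio2photoreal(wav_path: str, pipeline) -> torch.Tensor:
    """Hypothetical driver: `pipeline` stands in for the pre-trained
    Audio2Photoreal face/body models and renderer."""
    # Load the recorded conversation; torchaudio returns (channels, samples).
    waveform, sample_rate = torchaudio.load(wav_path)

    # Collapse to mono (an assumption here, not a documented requirement).
    mono = waveform.mean(dim=0, keepdim=True)

    # The pipeline generates facial and body motion from the audio and
    # renders the photoreal avatar frames.
    return pipeline(mono, sample_rate=sample_rate)
```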

[Image: How to Use Audio2Photoreal | Meta AI]

Our Say

The unveiling of Meta AI’s Audio2Photoreal marks a significant stride in the realm of avatar generation. Its ability to capture the nuances of human gestures and expressions from audio showcases its potential to revolutionize virtual interactions. The open-source nature of the project encourages collaboration and innovation among researchers and developers, paving the way for the creation of high-quality, lifelike avatars. As we witness the continual evolution of technology, Audio2Photoreal stands as a testament to the limitless possibilities at the intersection of audio and visual synthesis.
