Microsoft's new AI tool is a deepfake nightmare machine
VASA-1 can create videos from a single image.
It almost seems quaint to remember when all AI could do was generate images from a text prompt. Over the last couple of years, generative AI has become more and more powerful, making the jump from still images to video with the advent of tools like Sora. And now Microsoft has introduced a powerful tool that might be the most impressive (and terrifying) we've seen yet.
VASA-1 is an AI image-to-video model that can generate videos from just one photo and a speech audio clip. Videos feature synchronised facial and lip movements, as well as "a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness."
On its research website, Microsoft explains how the tech works. "The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively. Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512x512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviours."
Microsoft just dropped VASA-1. This AI can make single image sing and talk from audio reference expressively. Similar to EMO from Alibaba. 10 wild examples: 1. Mona Lisa rapping Paparazzi pic.twitter.com/LSGF3mMVnD (April 18, 2024)
In other words, it's capable of creating deepfake videos based on a single image. It's notable that Microsoft insists the tool is a "research demonstration and there's no product or API release plan." Seemingly in an attempt to allay fears, the company is suggesting that VASA-1 won't be making its way into users' hands any time soon.
From Sora AI to Will Smith eating spaghetti, we've seen all manner of weird and wonderful (but mostly weird) AI-generated video content, and it's only going to get more realistic. Just look at how much generative AI has improved in one year.
Daniel John is Design Editor at Creative Bloq. He reports on the worlds of design, branding and lifestyle tech, and has covered several industry events including Milan Design Week, OFFF Barcelona and Adobe Max in Los Angeles.