Meta has just announced its own media-centric AI model, called Movie Gen, that can be used to generate realistic video and audio clips.
The company released multiple 10-second clips generated with Movie Gen, including a Moo Deng-style baby hippopotamus swimming around, to demonstrate its abilities. While the tool is not yet available for use, Movie Gen's announcement comes shortly after the Meta Connect event, which showcased new and refreshed hardware and the latest version of the company's large language model, Llama 3.2.
Going beyond simple text-to-video generation, the Movie Gen model can make targeted edits to an existing clip, such as adding an object to someone's hands or changing the appearance of a surface. In one of Meta's sample videos, a woman wearing a VR headset was transformed to look like she was wearing steampunk binoculars.
Movie Gen can also generate audio snippets to accompany its videos. In the sample clips, an AI-generated person stands near a waterfall with audible splashes and the hopeful sounds of a symphony; a sports car's engine purrs and its tires screech as it zips around a track; and a snake slithers through the jungle to the accompaniment of suspenseful horns.
Meta shared further details about Movie Gen in a research paper published on Friday. Movie Gen Video consists of 30 billion parameters, while Movie Gen Audio consists of 13 billion parameters. (A model's parameter count roughly corresponds to how capable it is; for comparison, the largest variant of Llama 3.1 has 405 billion parameters.) Movie Gen can produce high-definition videos up to 16 seconds long, and Meta claims it outperforms competing models in overall video quality.
Earlier this year, CEO Mark Zuckerberg demonstrated Meta AI's "Imagine Me" feature, which lets users upload a photo of themselves and role-play in various scenarios, by posting an AI image of himself drowning in gold chains on Threads. A video version of a similar feature seems possible with the Movie Gen model; think of it as a kind of ElfYourself on steroids.
What data was Movie Gen trained on? Meta's announcement post doesn't go into specifics: "We've trained these models on a combination of licensed and publicly available data sets." The sources of training data, and what is fair to scrape from the web, remain a contentious issue for generative AI tools, and it is rarely public knowledge which text, video, or audio clips were used to create any of the major models.
It will be interesting to see how long it takes Meta to make Movie Gen broadly available. The announcement blog only vaguely gestures at a "potential future release." For comparison, OpenAI announced its AI video model, Sora, earlier this year and has not yet made it publicly available or shared a release date (though WIRED did receive some exclusive Sora clips from the company for an investigation into bias).
Given Meta's legacy as a social media company, it's possible that tools powered by Movie Gen will eventually start appearing inside Facebook, Instagram, and WhatsApp. In September, rival Google shared plans to make aspects of its Veo video model available to creators in YouTube Shorts sometime next year.
While larger tech companies are still holding off on fully releasing their video models to the public, you can already experiment with AI video tools from smaller, up-and-coming startups like Runway and Pika. If you've ever wondered what it would look like to see yourself cartoonishly crushed by a hydraulic press or suddenly melt into a puddle, give Pikaffects a try.
