Monday, December 23, 2024

Sora’s AI video revolution is still a long way off

The first version of OpenAI’s Sora can generate video of almost anything you throw at it – superheroes, cityscapes, animated puppies. This is an impressive first step for an AI video generator. The actual results are far from satisfactory, however, and many of the videos are so laden with oddities and inconsistencies that it’s tough to imagine anyone finding a use for them.

Sora was released on Monday, after nearly a year of trailers teasing its capabilities. However, before you get to the video generation feature, you’ll encounter a few hurdles. First, account creation was closed within hours of launch due to overwhelming demand. Those who do manage to sign up will find that unlocking the feature also requires a subscription: a $20 monthly “Plus” membership lets you generate videos in 480p or 720p, with a length limit of five or 10 seconds, depending on resolution. To unlock everything, including 1080p quality and 20-second videos, you’ll need to shell out $200 a month for a “Pro” subscription.

The results of my Plus-level tests were disappointing. Basic prompts with limited description seem to work best – for example, “cat playing with a ball of yarn” creates a very realistic-looking cat jumping excitedly on the floor. But Sora gave the cat a second tail for a few moments, and the yarn itself was jittery and looked like poorly inserted CGI.

These visual issues occurred more frequently and were more pronounced for complex prompts that included detailed scene descriptions. It’s tough to get anywhere near natural human movement: hands flailed everywhere when I asked to see a person doing makeup, and videos of people eating salad and sausage rolls looked horribly like the viral AI clips of Will Smith inhaling spaghetti.

Sora includes an interesting Storyboard feature that is designed to help you put together instructions for longer videos. It’s similar to a video editing timeline, allowing users to explain what they want Sora to generate every two seconds, rather than entering one long description of the entire video. It’s quite easy to use, but the results were even worse. The more details I added, the more distortions and oddities appeared.

However, some things impressed me. Video generation was faster than expected, typically under 30 seconds, even for 10-second clips. Patterns on fur and textiles also remained consistent, even during fast movement, and the lighting, shadow and mirror effects generated by Sora do a fantastic job of simulating reality. Sunlight streaming through a window glints off and shines through the materials you would expect it to. Even at low resolutions, most objects have a high level of detail and do not dissolve into a jumble of pixels.

For all its flaws, Sora performed better than Runway AI, which is considered one of the better AI video generators for photorealism. When I entered identical prompts on both platforms, Sora’s results looked more realistic and contained significantly less visual distortion. The quality of Sora’s results is also comparable to the demos of Adobe’s Firefly Video Model that I saw at Adobe Max in October, although OpenAI clearly lacks the advantage of being able to ensure that the generated results are commercially safe. Adobe achieved this by training its AI models solely on licensed or public domain content, an ethos that OpenAI does not adhere to.

[The above video was generated with Runway AI using the same prompt I gave Sora.]

Nothing that Sora generated from scratch was actually usable, though. It’s definitely not ready for entertainment or commercial work that requires narrative coherence, and it would be a stretch to use it even as a replacement for a quick flash of footage. Perhaps getting high-quality videos that don’t contain any obvious AI quirks is possible with enough time, experience, and editing skills, but if that’s the case, it doesn’t seem like Sora is fundamentally “democratizing” content creation yet.

There are also several guardrails that aim to prevent the generation of copyright-infringing or otherwise objectionable content, but with varying degrees of effectiveness. Sora completely blocks attempts to generate content featuring political figures such as Donald Trump and Kamala Harris, warning the user that such prompts may violate OpenAI’s terms of service. Celebrity names like Taylor Swift and Lewis Hamilton aren’t blocked, but Sora just inserts a random person who doesn’t look like them into the video. It also does a pretty good job of avoiding recognizable characters and brand icons, even with descriptions that try to force a result, like “blue bipedal cartoon hedgehog with red shoes.”

Things get murkier when it comes to the scenes you request. Some violent prompts, such as “truck plowing into scared protesters,” were blocked, but Sora did generate a clip of an explosion at the Empire State Building – even if the effects were laughably cartoonish. It also produced videos of young children modeling swimsuits on a runway and pointing guns at smiling parents.

Sora includes a feature that allows you to upload your own reference images. A pop-up message forces you to check a series of boxes before the images can be used, promising that you own the rights to them and will not upload anything that contains minors, violence or explicit subject matter; otherwise, your account may be suspended or banned “without refund.” However, the biggest deterrent to abuse of this feature is financial – only users with Pro subscriptions can upload images that include people. If this is the feature used to create the more impressive Sora demos we’ve seen, that is a significant limitation.

It’s early days and there are some obvious problems to solve, but nothing I’ve seen so far makes me think Sora will revolutionize video production overnight. The features that enable you to create high-quality output come with a subscription that’s about as expensive as traditional filming and video creation tools, putting it out of reach for many. It’s tough to imagine that an entire movie produced using this technology in its current state would be enjoyable to watch.

However, quality issues don’t stop people from trying to take advantage of convenient AI video tools – YouTube is already saturated with AI-generated nonsense aimed at young children. Sora is now more than capable of creating similar content, and it will only cost you $20 a month.
