Thursday, March 19, 2026

OpenAI researchers are developing a new model that speeds up media generation by 50 times



A pair of OpenAI researchers have published a paper describing a new type of model, a continuous-time consistency model (sCM), that speeds up AI generation of media, including images, video and audio, by 50 times compared with traditional diffusion models, producing an image in roughly a tenth of a second versus more than five seconds for standard diffusion.

With the introduction of sCM, OpenAI has achieved sample quality comparable to diffusion models in just two sampling steps, speeding up the generation process without sacrificing quality.

Described in a preprint published on arXiv.org and a blog post published today, both authored by Cheng Lu and Yang Song, the innovation enables these models to generate high-quality samples in just two steps, much faster than previous diffusion-based models that required hundreds of steps.

Song was also a lead author of a 2023 paper from OpenAI researchers, including former chief scientist Ilya Sutskever, that coined the concept of "consistency models" as having "points on the same trajectory map to the same starting point."

Although diffusion models have delivered excellent results in creating realistic images, 3D models, audio and video, their sampling inefficiency, often requiring tens to hundreds of consecutive steps, has made them less useful for real-time applications.

Theoretically, this technology could form the basis for a near-real-time AI image generation model from OpenAI. As fellow VentureBeat reporter Sean Michael Kerner mused on our internal Slack channels: "Could DALL-E 4 be far behind?"

Faster sampling while maintaining high quality

In traditional diffusion models, a large number of denoising steps are needed to produce a sample, which is what makes them slow.

In contrast, sCM transforms noise into high-quality samples directly in one or two steps, reducing computational costs and time.
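The two-step loop described above can be sketched in miniature. Note that `consistency_fn`, the noise levels, and the re-noising schedule below are illustrative stand-ins, not the actual network or schedule from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def consistency_fn(x, sigma):
    """Hypothetical trained consistency function f(x, sigma): maps a noisy
    sample at noise level sigma directly to an estimate of the clean sample.
    A toy stand-in (shrink toward the origin) replaces the real network."""
    return x / (1.0 + sigma)

def two_step_sample(shape, sigma_max=80.0, sigma_mid=0.8):
    """Two-step consistency sampling: one direct jump from pure noise to a
    data estimate, then partial re-noising and a second refining jump."""
    x = rng.standard_normal(shape) * sigma_max      # start from pure noise
    x = consistency_fn(x, sigma_max)                # step 1: jump to data estimate
    x = x + rng.standard_normal(shape) * sigma_mid  # re-noise to a lower level
    x = consistency_fn(x, sigma_mid)                # step 2: refine
    return x

sample = two_step_sample((4, 4))
```

A diffusion sampler would replace the two `consistency_fn` calls with tens to hundreds of small denoising steps, which is where the reported 50x wall-clock gap comes from.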

OpenAI’s largest sCM model, which boasts 1.5 billion parameters, can generate a sample in just 0.11 seconds on a single A100 GPU.

This amounts to a 50x speedup in wall-clock time compared with diffusion models, greatly increasing the feasibility of real-time generative AI applications.

Achieving diffusion model quality with far fewer computational resources

The sCM team trained a continuous-time consistency model on ImageNet 512×512, scaling to 1.5 billion parameters.

Even at this scale, the model maintains sample quality comparable to the best diffusion models, achieving a Fréchet Inception Distance (FID) score of 1.88 on ImageNet 512×512.

This brings sample quality within 10% of diffusion models, which require much greater computational effort to achieve similar results.
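The FID score cited above is the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images (lower is better). Real FID uses InceptionV3 activations; in this sketch, random arrays stand in for those features:

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim)
    feature arrays: ||mu_a - mu_b||^2 + Tr(Ca + Cb - 2*(Ca*Cb)^(1/2))."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((Ca*Cb)^(1/2)) computed via the symmetric form Ca^(1/2)*Cb*Ca^(1/2)
    sqrt_a = _sqrtm_psd(cov_a)
    cov_mean = _sqrtm_psd(sqrt_a @ cov_b @ sqrt_a)
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a) + np.trace(cov_b) - 2 * np.trace(cov_mean))

rng = np.random.default_rng(0)
# Features from the "same" distribution score near zero...
same = fid(rng.standard_normal((500, 8)), rng.standard_normal((500, 8)))
# ...while a mean-shifted distribution scores much higher.
shifted = fid(rng.standard_normal((500, 8)), rng.standard_normal((500, 8)) + 1.0)
```

A score of 1.88 on ImageNet 512x512 thus indicates that the generated and real image feature distributions are statistically very close.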

Benchmarks show strong performance

OpenAI’s new approach has been benchmarked extensively against other state-of-the-art generative models.

By measuring both sample quality via FID scores and effective sampling compute, the study shows that sCM delivers the highest-quality results with significantly less computational overhead.

While previous fast-sampling methods struggled with degraded sample quality or complex training setups, sCM overcomes these challenges, offering both speed and high fidelity.

The success of sCM is also attributed to its ability to scale in proportion to the teacher diffusion model from which it is distilled.

As both the sCM and its teacher diffusion model grow in size, the sample-quality gap narrows, and increasing the number of sampling steps in sCM reduces it further.

Applications and future directions

The fast sampling and scalability of sCM models open up new opportunities for real-time, multi-domain generative AI.

From image generation to audio and video synthesis, sCM provides a practical solution for applications requiring quick, high-quality output.

Additionally, OpenAI’s research points to potential system optimizations that could accelerate performance further by adapting these models to the specific needs of various industries.
