Tuesday, March 10, 2026

OpenCV Founders Launch Video AI Startup to Take on OpenAI and Google


A new artificial intelligence startup founded by the creators of the world’s most widely used computer vision library has emerged from stealth with technology that generates realistic, human-centric videos lasting up to five minutes, a dramatic leap beyond the capabilities of rivals including OpenAI’s Sora and Google’s Veo.

CraftStory, which launched on Tuesday with $2 million in funding, is introducing Model 2.0, a video generation system that addresses one of the most crucial limitations plaguing the nascent AI video industry: duration. While OpenAI’s Sora 2 tops out at 25 seconds and most competing models produce clips of 10 seconds or less, CraftStory’s system can create continuous, consistent video presentations that last as long as a typical YouTube tutorial or product demo.

The breakthrough could unlock significant commercial value for enterprises seeking to scale video production for training, marketing and customer education – markets where short AI-generated clips have proven insufficient despite their visual polish.

“If you actually try to create a video using one of these video generation systems, you will find that often you want to implement a specific creative vision, and no matter how detailed the instructions are, the systems basically ignore some of your instructions,” said Victor Erukhimov, founder and CEO of CraftStory, in an exclusive interview with VentureBeat. “We have developed a system that can generate videos for as long as you need them.”

How parallel processing solves the problem of long video formats

CraftStory’s advance rests on what the company describes as a parallel diffusion architecture – a fundamentally different approach to how AI models generate video compared to the sequential methods used by most competitors.

Conventional video generation models work by running diffusion algorithms on increasingly large three-dimensional volumes, with time as the third axis. To generate a longer video, these models require proportionately larger networks, more training data and significantly more computational resources.

CraftStory instead runs multiple smaller diffusion processes simultaneously across the duration of the video, coupling them bidirectionally. “The second part of the video can also influence the earlier part,” Erukhimov explained. “And this is quite important, because if you do it one by one, an artifact that appears in the first part will carry into the second part, and then it will accumulate.”

Instead of generating eight seconds and then stitching on additional segments, CraftStory’s system processes all five minutes at once through coupled diffusion processes.
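The bidirectional coupling described above can be illustrated with a toy sketch. This is not CraftStory’s implementation: the function name, segment sizes, and the placeholder “denoiser” (a simple scaling step standing in for a real diffusion model) are all assumptions for illustration. The key idea it demonstrates is that overlapping segments are denoised jointly at every step and their overlapping frames are averaged, so information flows both forward and backward across the video rather than accumulating one segment at a time.

```python
import numpy as np

def parallel_denoise(noisy_video, num_steps=10, seg_len=8, overlap=2):
    """Toy sketch of parallel, bidirectionally coupled segment denoising.

    noisy_video: array of shape (T, D) — T frames, D features per frame.
    A placeholder denoiser (multiply by 0.5) stands in for a diffusion step.
    """
    video = noisy_video.copy()
    T = video.shape[0]
    # Overlapping segment start indices covering the whole timeline.
    starts = list(range(0, max(T - seg_len, 0) + 1, seg_len - overlap))
    for _ in range(num_steps):
        # Denoise every segment "in parallel" (placeholder step).
        segs = [video[s:s + seg_len] * 0.5 for s in starts]
        # Blend segments back: overlapping frames are averaged, so a later
        # segment influences an earlier one and vice versa.
        acc = np.zeros_like(video)
        cnt = np.zeros((T, 1))
        for s, seg in zip(starts, segs):
            acc[s:s + seg.shape[0]] += seg
            cnt[s:s + seg.shape[0]] += 1
        video = acc / cnt
    return video
```

The averaging in the overlap regions is what prevents the error accumulation Erukhimov describes: no segment is finalized before its neighbors, so an artifact in one segment is continually reconciled against the others instead of being inherited downstream.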

Crucially, CraftStory trained its model on proprietary footage rather than relying solely on videos scraped from the internet. The company rented studios to film actors with high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers, avoiding the motion blur typical of standard 30-frames-per-second YouTube clips.

“We showed that you don’t need a lot of data or a big training budget to create high-quality videos,” Erukhimov said. “You just need high-quality data.”

Model 2.0 currently works as a video-to-video system: users submit a still image to animate and a “driving video” of a person whose movements the AI will replicate. CraftStory provides pre-made driving videos recorded with professional actors, who receive a share of the revenue when their motion data is used. Users can also upload their own recordings.
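Driving-video systems of this kind are commonly built on motion transfer: track keypoints across the driving footage, then re-apply their frame-to-frame motion to keypoints detected on the target still image. The sketch below is a hypothetical, minimal version of that general idea, not CraftStory’s actual model; `retarget_motion` and its inputs are invented for illustration, and rendering photorealistic frames from the retargeted keypoints is the part the generative model itself would handle.

```python
import numpy as np

def retarget_motion(driving_kp, target_kp0):
    """Transfer driving-video motion to a target still image (toy sketch).

    driving_kp: (T, K, 2) — K 2D keypoints tracked over T driving frames.
    target_kp0: (K, 2)    — the same K keypoints detected on the still image.
    Returns (T, K, 2): per-frame keypoints for the animated target.
    """
    # Motion of each keypoint relative to the first driving frame.
    deltas = driving_kp - driving_kp[0]
    # Apply that motion to the target's initial pose.
    return target_kp0[None, :, :] + deltas
```

In a real pipeline the deltas would typically be normalized for differences in scale and pose between the driver and the target before being applied.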

The system generates 30-second low-resolution clips in approximately 15 minutes. An advanced lip-syncing system synchronizes mouth movements with the script or soundtrack, and gesture-matching algorithms ensure that body language matches speech rhythm and emotional tone.

Fighting billions with a $2 million war chest

CraftStory’s funding comes almost entirely from Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. The modest raise contrasts sharply with the billions flowing into competing efforts: OpenAI alone raised over $6 billion in its most recent funding round.

Erukhimov rejected the idea that huge capital is a prerequisite for success. “I don’t necessarily buy the idea that compute is the path to success,” he said. “It definitely helps if you have the compute. But if you raise a billion dollars on a PowerPoint, in the end no one will be happy, neither the founders nor the investors.”

Filev defended the David-versus-Goliath approach. “When you invest in startups, you’re basically betting on people,” he said in an interview with VentureBeat. “To paraphrase Margaret Mead: never underestimate what a small group of thoughtful, committed engineers and scientists can build.”

He argued that CraftStory benefits from a focused strategy. “Large labs are in an arms race to build general-purpose foundation video models,” Filev said. “CraftStory rides this wave and dives deep into a specific format: long-form, immersive, human-centric video.”

Why computer vision expertise matters in generative video AI

Erukhimov’s credibility comes from his deep roots in computer vision, not the transformer architectures that have dominated recent AI development. He was one of the early co-authors of OpenCV, an open-source computer vision library that has become the de facto standard for computer vision applications, garnering over 84,000 stars on GitHub.

When Intel reduced its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the express goal of maintaining and developing the library. The company greatly expanded OpenCV and moved into automotive safety systems before Intel acquired it in 2016.

Filev said this background makes Erukhimov well-suited to video generation. “Sometimes people overlook the fact that AI-powered generative video is not just about the generative part. It’s about understanding movement, facial dynamics, temporal coherence and how people actually move,” Filev said. “Victor has spent his career solving exactly these problems.”

Enterprise strategy centers on training videos and product demos

While most public interest in AI video generation has focused on consumer-facing creative tools, CraftStory is pursuing a decidedly enterprise-focused strategy.

“We are definitely thinking more about B2B than about consumers,” Erukhimov said. “We’re thinking about companies, especially software companies, being able to create cool training videos and product videos and launch videos.”

The logic is straightforward: corporate training, product tutorials and customer education videos often run several minutes and require consistent quality throughout. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain complex product features.

“If you need a longer video, you should come to us,” Erukhimov said. “We can create up to five minutes of consistent, high-quality video.”

Filev echoed this assessment. “A huge gap in this market is the lack of models that can generate consistent videos over longer sequences, and this is extremely important for real-world applications,” he said. “If you’re creating an ad for your business, a 10-second video, no matter how good it looks, is simply not enough. You need 30 seconds, you need two minutes – you need more.”

The company anticipates substantial savings for customers. Filev suggested that “a small business owner could create content in minutes that would previously cost $20,000 and take two months to produce.”

CraftStory is also courting creative agencies that produce video content for enterprise clients, with a value proposition focused on cost and speed: agencies can capture an actor on camera and turn that footage into a finished AI-generated video, rather than managing costly, multi-day shoots.

The next major item on CraftStory’s roadmap is a text-to-video model, which will enable users to generate long-form content directly from scripts. The team is also developing support for moving-camera scenarios, including the popular walk-and-talk format common in high-end advertising.

Where CraftStory fits into a fragmented competitive landscape

CraftStory is entering a crowded and rapidly growing market. OpenAI’s Sora 2, while not yet publicly available, has generated considerable buzz. Google’s Veo models are progressing quickly. Runway, Pika and Stability AI all offer video generation tools with varying capabilities.

Erukhimov acknowledged the competitive pressures but emphasized that CraftStory serves a distinct niche focused on human-centric video. He made rapid innovation and market capture the company’s core strategy, rather than relying on technical moats.

Filev sees the market splitting into distinct layers, with major tech companies acting as “API providers of powerful general-purpose generation models,” while specialized players like CraftStory focus on specific use cases. “If the big players are building engines, CraftStory is building the production studio and assembly line on top,” he said.

Model 2.0 is available now at app.craftstory.com/model-2.0, and the company is offering early access to users and enterprises interested in testing the technology. Whether a modestly funded startup can win significant market share against far wealthier incumbents remains to be seen, but Erukhimov is characteristically confident about the opportunity ahead.

“AI-generated video will soon become the primary way companies tell their stories,” he said.
