New Gen-3 Alpha AI video generator can create detailed humans and surreal situations.

Screen capture of a Runway Gen-3 Alpha video generated with the prompt “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha that’s still under development, but it appears to create video of similar quality to OpenAI’s Sora, which debuted earlier this year (and has also not yet been released). It can generate novel, high-definition video from text prompts that range from realistic humans to surrealistic monsters stomping the countryside.

Unlike Runway’s previous best model from June 2023, which could only create two-second-long clips, Gen-3 Alpha can reportedly create 10-second-long video segments of people, places, and things with a consistency and coherency that easily surpass Gen-2. If 10 seconds sounds short compared to Sora’s full minute of video, consider that the company is working with a shoestring budget of compute compared to the more lavishly funded OpenAI, and that it actually has a history of shipping video generation capability to commercial users.

Gen-3 Alpha does not generate audio to accompany the video clips, and it’s highly likely that temporally coherent generations (those that keep a character consistent over time) are dependent on similar high-quality training material. But Runway’s improvement in visual fidelity over the past year is difficult to ignore.

AI video heats up

It’s been a busy couple of weeks for AI video synthesis in the AI research community, including the launch of the Chinese model Kling, created by Beijing-based Kuaishou Technology (sometimes called “Kwai”). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherency that reportedly matches Sora.

Gen-3 Alpha prompt: “Subtle reflections of a woman on the window of a train moving at hyper-speed in a Japanese city.”

Not long after Kling debuted, people on social media began creating surreal AI videos using Luma AI’s Luma Dream Machine. These videos were novel and weird but generally lacked coherency; we tested out Dream Machine and were not impressed by anything we saw.

Meanwhile, one of the original text-to-video pioneers, New York City-based Runway—founded in 2018—recently found itself the butt of memes that showed its Gen-2 tech falling out of favor compared to newer video synthesis models. That may have spurred the announcement of Gen-3 Alpha.

Gen-3 Alpha prompt: “An astronaut running through an alley in Rio de Janeiro.”

Generating realistic humans has always been tricky for video synthesis models, so Runway specifically shows off Gen-3 Alpha’s ability to create what its developers call “expressive” human characters with a range of actions, gestures, and emotions. The company’s provided examples weren’t particularly expressive (mostly people just slowly staring and blinking), but they do look realistic.

Provided human examples include generated videos of a woman on a train, an astronaut running through a street, a man with his face lit by the glow of a TV set, a woman driving a car, and a woman running, among others.

Gen-3 Alpha prompt: “A close-up shot of a young woman driving a car, looking thoughtful, blurred green forest visible through the rainy car window.”

The generated demo videos also include more surreal video synthesis examples, including a giant creature walking in a rundown city, a man made of rocks walking in a forest, and the giant cotton candy monster seen below, which is probably the best video on the entire page.

Gen-3 Alpha prompt: “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

Gen-3 will power various Runway AI editing tools (one of the company’s most notable claims to fame), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. It can create videos from text or image prompts.

Runway says that Gen-3 Alpha is the first in a series of models trained on a new infrastructure designed for large-scale multimodal training, taking a step toward the development of what it calls “General World Models,” which are hypothetical AI systems that build internal representations of environments and use them to simulate future events within those environments.

A few limitations

While these demos look fun at first glance, it’s worth mentioning a few drawbacks of an announcement like this. Since Gen-3 is not yet public and we do not have access yet, we have not had the chance to evaluate it. That means that even if you take Runway’s stated claim (“All of the videos on this page were generated with Gen-3 Alpha with no modifications”) at face value, the videos were very likely cherry-picked as having especially optimal results.

Also, all image and video synthesis models require large datasets of existing images or video, usually either culled from sources found online without permission or licensed from rights holders. Runway has not said where it obtained the training data to train Gen-3, but it says the model was trained both on videos and still images.

That said, going by face value, the demo videos appear impressive and state-of-the-art (an ever-moving target) for video synthesis. If the tech keeps getting better over the next few years, it’s likely that video synthesis clips will eventually find their way into professional video projects somehow.

Gen-3 Alpha prompt: “A man made of rocks walking in the forest, full-body shot.”

While media has never accurately captured reality, photorealistic video was, for a long time, largely anchored to real objects and situations (barring expensive special effects and CGI departments). If a fine enough measure of generational control is achieved, AI video tech stands poised to bring that big-budget capability to low-budget video productions, which may dramatically lower the cost of filmmaking in the future. But with some entertainment industry jobs potentially at stake, including visual effects teams, actors, and set designers, we expect to see struggle and backlash along the way.

As mentioned, Gen-3 Alpha is not yet available to the public, but the company offers an inquiry sign-up for commercial entities who might want to fine-tune the model for future commercial use. Runway says that Gen-3’s release, whenever it comes, will be accompanied by content safeguards, such as an in-house visual moderation system and C2PA provenance standards.

A recap of AI video synthesis on Ars Technica

Since 2022, we’ve covered a number of AI video synthesis models. We’ve also missed a few notable projects, such as Phenaki (mentioned briefly in one piece), Runway’s Gen-1, Pika (mentioned in a roundup syndicated from FT), Luma Dream Machine, and Kling (both mentioned above). To provide a brief rundown of where the technology has been so far, here’s a list of related Ars Technica articles. This is as much for our benefit as it is for yours because it’s sometimes difficult to keep all of these AI video models straight.

Even a cursory look at the progress from the earliest models above shows that AI video synthesis technology is steadily on the move, and the increased capability is likely only limited by available compute and enough high-quality training data. We’ll keep you posted.
