Disney and OpenAI Signal the Arrival of AI Video Streaming

NEWS | 23 December 2025

Disney and OpenAI’s agreement hints at a future in which viewers don’t just choose what to watch but generate it on demand
Recently I looked up the earliest surviving motion picture, Roundhay Garden Scene, which dates back to 1888. Four figures, two men and two women, walk around a yard with quick, jerky steps. It lasts about two seconds.
I also recently watched some clips made in 2016 by researchers at the Massachusetts Institute of Technology and the University of Maryland that are among the first fully artificial-intelligence-generated videos. Each is about a second long. In one, a blurry figure stands on a golf green, bent at the waist to putt. No one would confuse these videos or Roundhay Garden Scene for the slick realism of contemporary cinema. And just as skeptics often deride AI video as wasteful, 19th-century critics dismissed early cinema as a “foolish curiosity.”
Yet a recent agreement between Disney and OpenAI offers a glimpse of a different future. Starting in early 2026, the tech company’s video generator Sora will be able to create videos featuring more than 200 characters from Disney, Marvel, Pixar and the Star Wars franchise. And Disney+ will stream a selection of user-made clips.
Disney will also invest $1 billion in OpenAI and use its tools to build “new experiences for Disney+ subscribers,” according to a Disney and OpenAI joint press release. In announcing the partnership, Disney CEO Robert Iger said that the company would “thoughtfully and responsibly extend the reach of our storytelling through generative AI.” He also said in a recent earnings conference call that he intends for subscribers to create content within Disney+ itself. If you want to watch Elsa and Cinderella take down Maleficent, you’ll be able to ask for the scene—though it may last only 20 seconds.
If this is the start of AI TV on demand, I wonder how long it will be until these clips reach 20 minutes or an hour, given the environmental burden and the computing costs. Plenty of people believe it’s impossible, but I imagine that few of those who watched Roundhay Garden Scene foresaw The Great Train Robbery, a 12-minute milestone of silent cinematography from 1903, much less Gone with the Wind—or streaming.
The cost of image generation lies in how today’s systems work. They are built on diffusion, a technique that begins with visual “noise” that is gradually refined into an image. Picture a person standing in mist: with each pass, the AI clears some of the mist and fills in new pixels until a coherent figure appears. Every refinement pass adds to the cost.
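The refinement loop described above can be sketched in a few lines. This is a toy illustration of the idea, not OpenAI’s actual method: the `target` list stands in for the coherent image a trained model would predict, and the blending schedule is a deliberate simplification.

```python
import random

def toy_denoise(target, steps=10, seed=0):
    """Toy sketch of diffusion-style refinement: start from pure noise
    and blend toward a coherent 'image' over repeated passes.
    `target` is a hypothetical list of pixel intensities standing in
    for what a trained model would predict."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start as pure noise
    for step in range(steps):
        alpha = (step + 1) / steps  # fraction of structure restored this pass
        x = [(1 - alpha) * n + alpha * t for n, t in zip(x, target)]
    return x

target = [0.1, 0.5, 0.9, 0.5, 0.1]  # a tiny hypothetical "image"
result = toy_denoise(target)
```

Note that every pass touches every pixel, which is why the cost scales with the number of refinement passes times the image size.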
Video is even more challenging. The series of images must be coordinated so that facial features don’t change and coffee mugs don’t vanish. In one second of high-definition video, millions of pixels are changing. During a keynote speech at a hackathon hosted by AI community hub AGI House, Bill Peebles, an OpenAI researcher who helped develop Sora, said, “We discovered how painful it is to work with video data. It’s a lot of pixels in these videos.”
To manage the pixels, OpenAI’s system compresses video into a simplified version that keeps crucial information. It then treats the compressed video like a loaf of bread, slicing it into frames and dividing each frame into cubes. This allows the model to coordinate all the cubes with one another, much as the models that power ChatGPT relate all the words in a response.
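The bread-slicing idea can be made concrete with a toy sketch. This is a simplified illustration of slicing a video into spacetime “cubes”; the 2×2×2 cube size and the nested-list video are placeholders, not Sora’s actual patch dimensions or data format.

```python
def patchify(video, t=2, h=2, w=2):
    """Toy sketch of slicing a video into spacetime 'cubes' (patches).
    `video` is frames x rows x cols of pixel values; cube sizes here
    are illustrative, not those of any real system."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    patches = []
    for t0 in range(0, T, t):          # step through time
        for h0 in range(0, H, h):      # step down the frame
            for w0 in range(0, W, w):  # step across the frame
                cube = [[row[w0:w0 + w] for row in video[f][h0:h0 + h]]
                        for f in range(t0, min(t0 + t, T))]
                patches.append(cube)
    return patches

# a hypothetical 4-frame, 4x4-pixel video with distinct values per pixel
video = [[[f * 100 + r * 10 + c for c in range(4)] for r in range(4)]
         for f in range(4)]
patches = patchify(video)  # 4x4x4 video cut into 2x2x2 cubes -> 8 patches
```

Each cube then plays the role a word plays for a chatbot: a unit the model can relate to every other unit.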
The leap from seconds to minutes is so punishing because the more frames you add, the more information the model has to keep in view. As videos get longer, inconsistencies accumulate. True “on-demand” AI TV would also require cuts between scenes. If every Disney+ user were requesting it with near-term technology, the costs would be staggering.
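One reason the costs climb so fast is that transformer-style attention, the mechanism by which a model relates every cube to every other cube, does work that grows roughly with the square of the number of cubes. The function below is a back-of-the-envelope sketch of that scaling, ignoring the many optimizations real systems use.

```python
def attention_cost(num_tokens):
    """Toy estimate: relating every token (cube) to every other token
    costs roughly num_tokens squared operations. Real systems use many
    optimizations, so treat this as an upper-bound intuition."""
    return num_tokens * num_tokens

# doubling the video length roughly quadruples the pairwise-relation work
cost_short = attention_cost(1000)
cost_long = attention_cost(2000)
```

This is why a clip twice as long can cost far more than twice as much to generate.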
Researchers have been hunting for more efficient approaches. One is for the model to break the job into stages. “Instead of denoising or generating the whole video all at once, you generate frame by frame,” says Tianwei Yin, a research scientist at AI image editing start-up Reve, who co-developed the CausVid video-generation software. “At each step, your compute is limited to a much smaller portion instead of the full thing, and this enables you to go much longer.”
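Yin’s frame-by-frame idea can be sketched as follows. This is a toy illustration in the spirit of causal, frame-at-a-time generation, not CausVid’s actual algorithm: the “denoiser” here is a stand-in arithmetic update, and the step counts are arbitrary.

```python
def generate_framewise(num_frames, frame_pixels, denoise_steps=4):
    """Toy sketch of causal, frame-by-frame generation: each new frame
    is produced conditioned only on the previous frame, so the compute
    per step stays constant instead of growing with video length."""
    frames = [[0.0] * frame_pixels]  # a hypothetical starting frame
    work = 0
    for _ in range(num_frames - 1):
        prev = frames[-1]
        # stand-in for a denoiser conditioned on the previous frame
        new_frame = [p + 0.1 for p in prev]
        work += frame_pixels * denoise_steps  # compute touches one frame
        frames.append(new_frame)
    return frames, work

frames, work = generate_framewise(num_frames=5, frame_pixels=10)
```

Because each step only ever handles one frame’s worth of pixels, total work grows linearly with video length rather than ballooning the way whole-video denoising does.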
Yin believes that by next year systems will be able to generate five minutes of video efficiently and that, by combining different existing AI technologies, they could reach an hour not long after. Others have echoed this optimism. In a recent BBC interview, Google CEO Sundar Pichai described the possibility of high school students making feature-length AI films in coming years. Cristóbal Valenzuela, CEO of the AI-video-generation company Runway, told El País earlier this month, “Having 60 or 90 minutes with consistent characters and story still isn’t possible. But it will be soon.” He went on to say that watching AI videos as they are generated in real time is also on the horizon.
The road from curated fan clips to feature-length films will pass through some unglamorous innovations, not to mention negotiations over how to pay the creatives whose work feeds it. And though the financial burden of AI videos seems prohibitive, millions of people globally are involved in producing and training AI models, and the costs of technologies usually decrease. For instance, bandwidth was prohibitively expensive in 1998—it cost about $1,200 per megabit per second (Mbps) monthly for large networks—but by 2025 the lowest reported cost was $0.05 per Mbps monthly, a 99.996 percent decrease. This change made streaming on Disney+ or Netflix possible.
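The bandwidth figures above work out as a simple percent-decrease calculation, using the article’s own numbers:

```python
cost_1998 = 1200.00  # dollars per Mbps per month, large networks, 1998
cost_2025 = 0.05     # lowest reported cost in 2025
decrease = (cost_1998 - cost_2025) / cost_1998 * 100
# decrease is approximately 99.996 percent
```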
The cultural path of new mediums is far harder to imagine, and resistance is often intense. Poet Charles Baudelaire railed against photography in 1859 for its lazy realism that dragged art away from the imagination. In past centuries, “sceptics and partisans both compared photography to painting, and moving pictures to theatre,” wrote present-day scholar Reuben de Lautour. We appear to be in an even more complicated moment. What seems certain is that, as in the past, technology will rapidly evolve, allowing millions of creators to test possibilities we can’t yet predict.

Author: Eric Sullivan. Deni Ellis Béchard.