AI Text to Video in 2024: Watershed or Washout? -- AIBoardroom

AI Boardroom Insights

Last week OpenAI announced Sora, its text-to-video generator -- and the results are, for the most part, pretty darn impressive. However, like the other AI text-to-video products from Google (VideoPoet), Meta (Make-A-Video) and Microsoft (GODIVA), Sora is not available to the public yet. Rather, these services are in "research mode," which means that apart from the lucky few chosen to participate in the pre-release trials, none of us really knows how well they work beyond the examples given by the companies that created them.

In previous technology waves, when products were demonstrated but not made available -- especially those with a major wow factor -- there was always a hint of "vaporware" in the mix. And although that could still be true to some degree with text-to-video AI, there are a number of legitimate reasons these companies could be holding off on expanding access.

To start, the U.S. presidential election is coming in November 2024. Just five days ago, OpenAI, Microsoft, Adobe, Meta and other AI and social media companies signed a pledge in Munich to "collaborate on developing tools for detecting misleading AI-generated images, video and audio" and to develop watermarking and metadata technology. Considering that most technical experts agree watermarking won't work, and that metadata can be easily stripped, some analysts speculate that this next level of AI -- one that would put the ability to create deepfake videos in the hands of the general public -- won't go live until after the election.

There may also be issues of cost and infrastructure. Right now, without video generation, it's estimated that ChatGPT costs OpenAI $700,000 a day to operate. Video will be far more expensive to generate; there are reports that a single clip -- which may range from a few seconds to a minute -- can take hours to produce. Some providers may not yet have worked out a cost structure that makes it feasible to release these tools to a wider audience (if they ever can).

Finally, even when AI tools work well, things can still go wrong. Just this week, Google had to temporarily pull Gemini's image generator because of an issue it likely never expected. So by rolling out slowly, vendors are giving themselves time to work out as many quirks as possible (although some may argue against excess caution, pointing to the theory that Meta's BlenderBot failed a few days before ChatGPT took off simply because it was made too safe).

So for right now, expect AI text-to-video to remain in "curated demo" mode for a while, unless you're lucky enough to be in the hands-on preview group for Sora, VideoPoet, Make-A-Video or Stability AI's Stable Video. We'll keep you posted as each service (and any new competitor) rolls out to wider markets.

Posted by Becky Nagel on 02/23/2024