It sure looks like OpenAI trained Sora on game content—and legal experts say that could be a problem

Jan 6

From TechCrunch, December 11, 2024

OpenAI has never revealed exactly which data it used to train Sora, its video-generating AI. But from the looks of it, at least some of the data might’ve come from Twitch streams and walkthroughs of games.

Sora launched on Monday, and I’ve been playing around with it for a bit (to the extent the capacity issues will allow). From a text prompt or image, Sora can generate up to 20-second-long videos in a range of aspect ratios and resolutions.

When OpenAI first revealed Sora in February, it alluded to the fact that it trained the model on Minecraft videos. So, I wondered, what other video game playthroughs might be lurking in the training set?

Rebecca Staffel

The AI Forum | Exploring the legal and social challenges of AI | Est. 2023