It sure looks like OpenAI trained Sora on game content—and legal experts say that could be a problem
From TechCrunch, December 11, 2024
OpenAI has never revealed exactly which data it used to train Sora, its video-generating AI. But from the looks of it, at least some of the data might’ve come from Twitch streams and walkthroughs of games.
Sora launched on Monday, and I’ve been playing around with it for a bit (to the extent the capacity issues will allow). From a text prompt or image, Sora can generate up to 20-second-long videos in a range of aspect ratios and resolutions.
When OpenAI first revealed Sora in February, it alluded to the fact that it trained the model on Minecraft videos. So, I wondered, what other video game playthroughs might be lurking in the training set?