OpenAI’s Sora can create realistic video clips from text prompts

OpenAI, the company behind ChatGPT and DALL-E, has unveiled Sora, its latest video-generation model. Sora takes text prompts and turns them into ‘realistic and imaginative scenes.’ It can currently create clips up to a minute long based purely on the prompts users write.

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024

OpenAI’s blog post says the model can “generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.” More scarily, OpenAI says that the model not only understands what the user is asking with the prompt but also how the things in the prompt exist in the physical world.

One of the sample video clips created with Sora using text prompts. Video credit: OpenAI

The result is truly amazing and scary. Because the model has a deep understanding of language, it can accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately portray characters and visual style.

The model can also accept an image as input and generate a video based on it. It can fill in missing frames in an existing video, or extend the video when needed.

Prompt used: Tour of an art gallery with many beautiful works of art in different styles. Video credit: OpenAI

According to OpenAI, Sora is a diffusion model: it generates a video by starting with one that looks like static noise and gradually transforming it by removing the noise over many steps. Like the GPT models, Sora uses a transformer architecture, which OpenAI says unlocks superior scaling performance.
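To make the "start from noise, remove it over many steps" idea concrete, here is a toy sketch in Python. It is not OpenAI's code and says nothing about how Sora is actually implemented. The function names `toy_denoise` and `max_error` are made up for illustration, and the "predicted noise" is computed from a known target, where a real diffusion model would use a trained neural network instead.

```python
import random


def max_error(sample, target):
    """Largest absolute difference between a sample and the target."""
    return max(abs(s - t) for s, t in zip(sample, target))


def toy_denoise(target, steps=50, seed=0):
    """Illustrative only: walk from random noise toward `target` in small steps.

    Returns the list of intermediate samples, from pure noise to the result.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per "pixel".
    x = [rng.gauss(0.0, 1.0) for _ in target]
    history = [x]
    for step in range(steps):
        # Assumption: stand-in for a trained network. Here the "predicted
        # noise" is simply the gap between the current sample and the target.
        predicted_noise = [xi - ti for xi, ti in zip(x, target)]
        # Remove a growing fraction of that noise; the final step removes
        # whatever is left.
        fraction = 1.0 / (steps - step)
        x = [xi - fraction * ni for xi, ni in zip(x, predicted_noise)]
        history.append(x)
    return history


if __name__ == "__main__":
    target = [0.0, 0.5, 1.0]  # a tiny made-up "image"
    history = toy_denoise(target, steps=10)
    for i, sample in enumerate(history):
        print(i, round(max_error(sample, target), 4))
```

Running the script prints the remaining error after each step, which shrinks until the sample matches the target. A real model works on far larger data and has to learn what noise to remove, but the loop has the same overall shape.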

The quality of the video is pretty good, but some clips still have visual glitches. Sora struggles to render fast movement correctly, including fast-moving backgrounds, and some clips even show the extra-limbs glitch that is so often associated with AI-generated content.

The green foliage looks grainy and blotchy

Currently, Sora is only available to “red teamers” assessing the model for potential harms and risks. OpenAI did say that it is using the same safety methods built into DALL-E 3 to keep bad actors from creating content that violates its usage policies. That means the text and image classifiers will not allow violent, explicit, hateful, deepfake or other similar content.

OpenAI did not share when Sora will be available to the public, saying only that it is currently working with stakeholders (policymakers, educators and artists) around the world to understand their concerns and to identify positive use cases for this new technology.



Sharil Abdul Rahman
