Meta has unveiled new technology allowing people to create videos using nothing but text.
Facebook parent company Meta has introduced Make-A-Video, a state-of-the-art artificial intelligence system capable of generating videos from text entered by the user.
Ars Technica noted that the technology builds on existing text-to-image systems such as OpenAI's DALL-E, and comes months after Meta announced an earlier text-to-image model it called "Make-A-Scene."
Unlike text-to-image synthesizers, which rely on labeled data to create images, Make-A-Video combines unlabeled video training data with captioned still images.
From these, the system learns what should appear in a scene, where it should appear, and when. It then renders those images in motion as a short clip.
"Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage," Meta's researchers wrote.
The technology works, but the results vary: some generated videos look realistic, while others come across as "nightmarish" and "both dreamlike and terrible," TechCrunch said.
Meta's sample clips show how capable Make-A-Video is at generating videos from plain text, with results ranging from surreal to realistic to stylized.
One sample video, generated from the prompt "a teddy bear painting a portrait," looks just like that: a teddy bear holding a paintbrush and painting a self-portrait. The details are surprisingly accurate, including the stuffed toy's "fur" and the texture of the painting.
Another sample video, "A young couple walking in a heavy rain," shows a couple sharing an umbrella while walking down the street in the rain. Their faces cannot be seen because their backs are turned to the camera, or at least that's how the AI generated the video.
Some sample videos contain weird-looking details, however. Some could elicit chuckles, while others could make it hard for the fainthearted to sleep.
A "cat watching TV with a remote in hand," for example, shows a striped cat intently looking for shows on the TV while holding the remote. The paw holding the remote, however, looks like a child's hand.
Make-A-Video is also able to generate a variety of videos using just a single photo, and more.
Meta is sharing the Make-A-Video research and results openly in order to elicit feedback from the community. The people behind the text-to-video synthesizer also worked on removing certain datasets to ensure safe use.