Meta’s ‘Make-A-Video’ Lets Anyone Create Videos Using Nothing But Text

LOSING? — Zuckerberg said Meta is willing to invest in technical experts in order to compete with the other big tech companies. (Photo: https://www.reutersconnect.com/all?id=tag%3Areuters.com%2C2021%3Anewsml_RC2ZGQ9YAIHK%3A1111249561&search=all%3AZuckerberg)

Meta has unveiled new technology allowing people to create videos using nothing but text.

Facebook parent Meta has introduced Make-A-Video, a state-of-the-art artificial intelligence system capable of generating videos from text entered by the user.

Ars Technica noted that the technology builds on existing technologies such as OpenAI's DALL-E, and comes months after Meta announced an earlier text-to-image model it called "Make-A-Scene."

Unlike text-to-image synthesizers, which rely on labeled data to create images, Make-A-Video combines unlabeled video training data with captioned still images.

Using these, the system was able to learn what images need to be shown, where to show them, and when. It then generates a short clip showing those images in motion.

According to Engadget, the technology simply outputs content in a different format. In the developers' own words, from their white paper:

"Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage."

"Nightmarish"

While the technology works, the videos it generates from text can look realistic or, as TechCrunch put it, "nightmarish" and "both dreamlike and terrible."

Meta's sample clips show how capable Make-A-Video is at generating videos from plain text. They show it can generate surreal, realistic and stylized videos.

One sample video generated from a text that reads "a teddy bear painting a portrait" looks just like that: a teddy bear holding a paintbrush painting a self-portrait. The details are surprisingly accurate, including the stuffed toy's "fur" and the texture of the painting.

Another sample video, "A young couple walking in a heavy rain," shows a couple sharing an umbrella while walking down the street in the rain. Their faces cannot be seen as their backs are turned to the camera, or at least that's how the AI generated the video.

There are some weird-looking details in some sample videos, however. Some could elicit chuckles, while others could make it hard for the fainthearted to sleep.

A "cat watching TV with a remote in hand," for example, shows a striped cat seriously looking for shows on the TV while holding the remote. The paw holding the remote, however, looks like a child's hand.

Make-A-Video is also able to generate a variety of videos from just a single photo, among other capabilities.

Meta said it is sharing the Make-A-Video research and results openly in order to elicit feedback from the community. The people behind the text-to-video synthesizer also worked on removing certain datasets to ensure safe use.
