OpenAI's ChatGPT is evolving: it is now equipped to speak and "see."
On Monday, OpenAI announced on its official website that within the next two weeks, it will roll out voice and image capabilities for ChatGPT to its Plus and enterprise users. These features will allow users to engage in voice dialogues or present images to ChatGPT.
In terms of voice capabilities, ChatGPT can now respond to questions and commands verbally, positioning it as a direct competitor to personal assistants like Apple's Siri. Additionally, ChatGPT will offer five distinct voice options and will support features like converting voice audio to text and translating podcast voiceovers into other languages.
For its image capabilities, users can submit a picture and ask related questions, and ChatGPT will provide answers or suggestions based on the image. The voice feature will be available on both iOS and Android platforms, while the image feature will be accessible across all platforms.
Speaking Out with Five Voice Options
OpenAI has enhanced the way users interact with ChatGPT. Instead of just typing sentences into a text box, users can now verbally prompt the chatbot.
This isn't a novel feature; it is reminiscent of conversing with Google Assistant. However, OpenAI hopes that advances in the underlying technology will make its responses superior. Most virtual assistants are currently being revamped using large language models, with OpenAI leading the charge.
OpenAI launched the ChatGPT app in May this year, and it already had a voice-to-text feature. Adding voice responses aims to offer users a more human-like conversation experience. The company hopes this new feature will encourage users to use its mobile app on the go, directly competing with personal assistant products like Google Assistant, Apple's Siri, and Amazon's Alexa.
OpenAI is introducing a new text-to-speech model, which it describes as "capable of generating human-like audio from just text and a few seconds of sample speech." Users can choose from five voice options for ChatGPT. OpenAI believes the model's potential goes beyond these options. For instance, OpenAI is collaborating with Spotify to translate podcasts into other languages while retaining the original speaker's voice.
Gaining "Eyes" to Understand Images
The company also shared that Plus and enterprise users will soon have access to the image feature, which operates similarly to Google Lens. When a user snaps a photo of something of interest, ChatGPT will identify the subject and respond accordingly.
For example, users can upload a picture of pink sunglasses and ask the chatbot for outfit recommendations to match, or submit an image of a math problem seeking a solution.
Analysts point out that since the launch of ChatGPT in late 2022, OpenAI has been diligently adding more features and capabilities to its bot, while cautiously avoiding the introduction of new issues. With this update, the company is trying to strike a balance, consciously limiting what its new models can do.
However, this approach might not be sustainable in the long run. As more people use voice controls and image searches, and as ChatGPT gradually becomes a truly multimodal, practical virtual assistant, maintaining safe and reasonable boundaries will become increasingly challenging.
ChatGPT's Aspiration to be the "Super Assistant"
This upgrade brings ChatGPT a step closer to becoming the "super assistant," intensifying competition with the downstream applications built on top of OpenAI's models.
Previous reports mentioned that OpenAI CEO Sam Altman privately told developers that the company aims to transform ChatGPT into a "super-intelligent personal work assistant." This would enable it to execute a variety of tasks based on individual and work needs, such as drafting emails or documents in the user's style or providing the latest information on relevant business topics.
Analysts highlight that both Microsoft and OpenAI can offer technical services to B2B clients looking to build AI capabilities, creating a direct business conflict between the two. In the long run, if OpenAI accelerates its software offerings for individuals and businesses, ChatGPT might reshape the consumer application ecosystem, potentially leading to a rift between the two companies.