ChatGPT now offers voice chats: here's how to use them
When OpenAI released GPT-4 last March, one of its headline features was multimodality, which would allow ChatGPT to accept image inputs. However, that multimodal capability was not ready for deployment – until today.
On Monday, OpenAI announced that ChatGPT can now “see, hear and talk,” referring to the chatbot’s new abilities to accept both image and voice input and to respond in spoken conversations.
The image input function can be useful for getting help with things you can see, such as solving a math problem written on a sheet of paper, identifying the name of a plant, or photographing the items in your pantry and asking for recipes based on them.
Take a photo and add the question
In all these cases, users simply take a photo of what they are looking at and add the question they want answered. OpenAI says the ability to understand images is powered by GPT-3.5 and GPT-4.
The voice input and output function gives ChatGPT the same functionality as a voice assistant. To request a task, users simply speak their request and, once it has been processed, ChatGPT answers aloud.
In the demonstration shared by OpenAI, a user verbally asks ChatGPT for a bedtime story about a hedgehog. ChatGPT responds by telling one, much as voice assistants such as Amazon’s Alexa do.
The race for AI assistants is on
The race for AI assistants is on: just last week, Amazon announced that Alexa would be equipped with a new LLM giving it capabilities similar to ChatGPT’s and turning it into a hands-free AI assistant. By integrating voice into ChatGPT, OpenAI achieves the same result on its own platform.
To support the voice function, OpenAI uses Whisper, its speech recognition system, to transcribe a user’s spoken words into text, along with a new text-to-speech model capable of generating human-like audio from text and just a few seconds of sample speech.
To create the five ChatGPT voices users can choose from, the company collaborated with professional voice actors.
Only for ChatGPT Plus and Enterprise
Voice and image functions will roll out to ChatGPT Plus and Enterprise users over the next two weeks. OpenAI says it will expand access to other groups, such as developers, soon after.
If you are a Plus or Enterprise user, to use image input, all you have to do is tap the photo button in the chat interface and upload an image. To enable the voice function, go to Settings > New Features and opt in to voice conversations.
Bing Chat, which is powered by GPT-4, already supports image and voice input and is completely free. If you want to try these features but don’t have access to them yet, Bing Chat is a good alternative.
Source: ZDNet.com