Tuesday, May 14, 2024

Google replied to GPT-4o with a Gemini demo

 Google replied to GPT-4o with a Gemini demo that is conversational and uses video

GPT-4o is a new multimodal model that will power ChatGPT Free and Plus. OpenAI’s big ChatGPT event is over, and we can safely say the company severely downplayed it when it said on Twitter that it would “demo some ChatGPT and GPT-4 updates.” Sam Altman’s teaser that it would be new stuff “we think people will love,” along with his remark that it “feels like magic to me,” better describes what OpenAI managed to pull off with the GPT-4o update for ChatGPT. Now Google is looking to snag some of that AI spotlight from OpenAI. After OpenAI impressed us with its GPT-4o model, Google followed up with a conversational Gemini demo that also made the future of AI look bright.

Google released a new video that shows off a prototype of Gemini on a Pixel using live video and spoken prompts to provide information. The video was apparently filmed while the stage was being set up for Google I/O, which starts on 14 May 2024. In the demo, the presenters point the Pixel phone running Gemini at the stage being put together for Google I/O and ask what it thinks is happening. Gemini correctly answers that it looks like a stage being set up for a large event. When letters are shown on a screen, Gemini says they are for Google I/O and offers a brief description of Google's upcoming event.

As rumored, GPT-4o is a faster multimodal model that handles voice, images, and live video. It also lets you interrupt it while it’s speaking, and it can detect the tone of the user’s voice. The key detail in OpenAI’s tweet was accurate, however: this was going to be a live demo of ChatGPT’s new powers. And that’s really the big detail here. GPT-4o appears to genuinely do what Google could only simulate with Gemini in early December, when it tried to show off similar features. Much like OpenAI's latest ChatGPT demonstration, Google's Gemini video is impressive because of how natural the conversation feels. It's easy to forget that there isn't a person behind the voice coming out of the Pixel device. The exchange flows as naturally as a conversation between two friends.

Google staged the early Gemini demos to make it seem that Gemini could listen to human voices in real time while also analysing the contents of pictures or live video. It was mind-blowing technology that Google was proposing. In the days that followed, however, we learned that Gemini could not actually do any of that: the demos were sped up for the sake of presenting the results, and the prompts were typed rather than spoken. Without testing the two AI models side by side, it's hard to say which one works better, but both are nothing short of impressive. You have to be excited (and perhaps a little nervous) about the future of AI based on these two demos. And according to OpenAI, it will only get better from here.

Gemini was successful at delivering the expected results in the new video, but the demo Google showed us back then was fake. That matters because one of the main issues with generative AI products is the risk of incorrect answers, or hallucinations. As exciting as Google's demonstration is, the company has come under fire in the past for making an AI demo look much more impressive than it actually was. We'll have to go hands-on with these Gemini changes before we know for sure how much better it is, but if the video is an accurate representation, the AI battle really is just getting started.

Now, in mid-May, OpenAI has the technology ready to offer that kind of interaction with AI, and we just saw it demonstrated live on stage. ChatGPT, powered by the new GPT-4o model, was able to interact with multiple speakers simultaneously and adapt to their voice prompts in real time. GPT-4o looked at images and live video to answer questions based on what it had just seen. It helped with math problems and coding, and it translated a conversation between two people speaking different languages in real time. These features were probably rehearsed and optimized over and over before the event, but OpenAI also took prompts from X for GPT-4o to try live.

There might be issues with GPT-4o once it rolls out to users; nothing is perfect. It might struggle with voice, picture, and video requests, or it might not be as fast as it was in the live demos at OpenAI’s event. But things will get better. The point is that OpenAI felt confident enough in the technology to demo it live. Gemini 1.5 (or later versions) will likely manage to match GPT-4o, and Google’s I/O event might even feature demos similar to OpenAI’s. Google's keynote is scheduled to kick off Google I/O, so we'll learn more about its Gemini plans and plenty of other projects from the company soon. Still, there is a big difference between the two companies here: OpenAI went forward with a live demo once it had the technology ready, while Google had to fake a presentation to make Gemini seem more powerful than it was.



