Google claims Gemini surpasses ChatGPT in most performance tests
Gemini is being introduced as an upgrade to Google’s chatbot Bard, but it’s not available in the UK or EU at the moment
Google has introduced a new artificial intelligence model named Gemini, claiming it outperforms ChatGPT in most tests and displays “advanced reasoning” across multiple formats, including marking and annotating a student’s physics homework. The launch follows the recent global AI safety summit, where tech companies agreed to collaborate with governments on testing advanced systems before and after their release. Google is in talks with the newly established UK AI Safety Institute about testing the most powerful version of Gemini, which is due to launch next year.
Google said Gemini Ultra outperformed “state-of-the-art” AI models, including OpenAI’s GPT-4, in 30 of 32 benchmark tests, among them tests of reasoning and image understanding. The Pro model outperformed GPT-3.5, which underpins the free version of ChatGPT, in six of eight tests.
The model comes in three versions and is “multimodal,” meaning it can understand text, audio, images, video and computer code. Gemini will be integrated into a range of Google products and will initially launch in more than 170 countries, but not in the UK or Europe, where the Bard chatbot upgrade is being held back while Google works with regulators.
Demis Hassabis, the chief executive of DeepMind, the Google unit behind Gemini, described it as the “most complicated project” and a massive undertaking. The Pro and Nano versions of Gemini will be released at the same time, with Pro available through Google’s Bard chatbot and Nano on mobile phones running Google’s Android operating system.
Ultra, the most powerful version, is slated for public release in early 2024, when it will be integrated into Bard Advanced. Google said Ultra is the first AI model to outperform human experts on MMLU (massive multitask language understanding), a test covering 57 subjects, on which it scored 90%. Ultra will also power AlphaCode 2, a new code-writing tool that Google claims outperforms 85% of competition-level human programmers. Before release, Ultra will undergo external “red team” testing for security and safety, with the results shared with the US government in line with Joe Biden’s executive order from October.
Asked whether Gemini would be tested in collaboration with the US or UK governments, as agreed at the AI safety summit, Hassabis said Google was in discussions with the UK government about the AI Safety Institute conducting tests. The Pro and Nano models will not be part of these tests, which focus on the most advanced, or “frontier,” models. Sissie Hsiao, Bard’s general manager at Google, said the Pro-powered version of Bard would not be released in the UK for now, and would also be withheld from the European Economic Area, which includes the EU, and from Switzerland, while Google works with local regulators. Google did not say which regulatory issues were causing the delay.
Google acknowledged, however, that “hallucinations,” or incorrect answers, remain a challenge for the model. Eli Collins, head of product at Google DeepMind, called it an “unresolved research problem.”
Although all versions of Gemini can understand prompts in multiple formats, the Pro and Nano versions being released this month can currently respond only in text or code.
Google released promotional videos showing the Ultra model reading a student’s handwritten physics homework and giving detailed solutions, complete with equations. Other videos showed the Pro version analyzing a drawing of a duck and correctly identifying films being re-enacted in smartphone videos, such as the iconic “bullet time” scene from The Matrix.
Collins said the most powerful version of Gemini showed “advanced reasoning” and “novel capabilities,” meaning it could perform tasks not previously demonstrated by AI models.
Concerns about AI, the term for computer systems that perform tasks normally requiring human intelligence, range from the spread of disinformation to the creation of “superintelligent” systems that evade human control. Some experts are uneasy about the development of artificial general intelligence (AGI), meaning an AI that can perform a wide range of tasks at a human or superhuman level of intelligence.
Asked whether Gemini marked a significant step toward AGI, Hassabis said: “I think these multimodal foundational models are going to be a key component of AGI, whatever that final system turns out to be. But there are still things that are missing, which we’re still researching and innovating on now.”
Hassabis said the data used to train Gemini had been drawn from a range of sources, including the open web. The publishing and creative industries have objected to AI companies using copyrighted content available online to develop their models.