Beyond Bard: Google Launches Gemini, a Multimodal AI to Challenge ChatGPT

The post Beyond Bard: Google Launches Gemini, a Multimodal AI to Challenge ChatGPT appeared on BitcoinEthereumNews.com.

Google stunned the tech world on Wednesday with the debut of Gemini, its consumer- and business-facing suite of multimodal artificial intelligence tools. Among the tech giants pushing aggressively into AI, search titan Google had seemed stuck in the middle of the pack as Microsoft-backed OpenAI upgraded ChatGPT with GPT-4 Turbo and GPT-4 Vision and Anthropic upgraded Claude.

As of today, Google bolts out of the gate with three versions of Gemini (Nano, Pro, and Ultra), which seamlessly understand and integrate text, images, audio and video. Gemini appears poised to outperform top-of-the-line AI models from OpenAI, which recently released a long list of new capabilities but was soon after engulfed in corporate intrigue.

The most advanced version, Gemini Ultra, delivered strong results across several popular benchmarks, matching or exceeding human performance in some cases. For example, Google reports that it set new state-of-the-art results on 30 of the 32 benchmarks it was tested on and outperformed human experts on MMLU, an exam that spans a variety of academic subjects.

A key feature of Gemini is its “natively multimodal” training, which allows it to process multiple data types, such as text, images, and audio, as inputs and outputs. This approach means that the model was built and trained from scratch to understand different inputs, rather than being assembled later from discrete modes and modules.

The most popular multimodal AIs of today follow the latter approach. For example, ChatGPT combines GPT-4 Turbo with DALL-E 3 to generate images from text, GPT-4 Vision to process images, and a special coding module for calculations. As a result, the LLM is relegated to the role of coordinator between different AI models that cannot independently understand the full nature of a specific problem.

This limitation can also lead to vulnerabilities like prompt injection. For example, techniques to circumvent safety controls in place for text prompts by writing or printing it on a…

Crypto Pulpit
