Google is gearing up to unveil its magnum opus, Gemini, which is poised to be a formidable competitor to GPT-4.

On September 14, sources familiar with the matter revealed that Google has provided an early version of Gemini to a select group of companies. This move suggests that Google is contemplating integrating it into consumer services. Additionally, the tech giant plans to offer Gemini to businesses through its cloud computing services, indicating that its official release is imminent.

These insiders also shared that Google intends to release various sizes of the Gemini model. This would allow developers to purchase a streamlined version for simpler tasks and a compact version suitable for personal devices.

In a strategic move to rival OpenAI and expedite Gemini's development, Google's CEO, Sundar Pichai, took a pivotal step in April. He merged two teams with distinct cultures and coding practices: Google Brain and DeepMind. The combined team, Google DeepMind, is now led by Demis Hassabis, a co-founder of DeepMind.

Hassabis expressed great confidence in the combined team, emphasizing that it brings together two pivotal forces essential for recent AI advancements. Google co-founder Sergey Brin has also re-entered the AI arena, actively participating in Gemini's training.

Over the subsequent months, the veil of mystery surrounding Gemini has gradually lifted, revealing its capabilities and potential.

Gemini's Multimodal Capabilities

The next leap for language models might involve executing a broader range of tasks on computers. One of Gemini's standout features is its multimodal capabilities. Unlike ChatGPT, which is solely a text model, Gemini can understand and generate both text and code, as well as interpret and produce images.

Furthermore, a crucial step in developing a language model with capabilities akin to ChatGPT involves using human feedback to enhance its performance. DeepMind's extensive experience in reinforcement learning could endow Gemini with novel abilities.

At its I/O developer conference in May, Google said that Gemini was built from the ground up to be multimodal and highly efficient at tool and API integrations. Google teased its audience, stating that even at this early stage, Gemini was showing impressive multimodal capabilities not seen in prior models.

Gemini Meets AlphaGo

DeepMind's CEO, Hassabis, revealed that the new Gemini model would incorporate elements from both AlphaGo and large language models like GPT-4. This integration is expected to significantly enhance the system's problem-solving and planning capabilities.

Some AI experts believe that the primary limitation of language models is their indirect learning through text. AlphaGo's strengths could address this limitation. In 2016, DeepMind's AI system, AlphaGo, made history by defeating world Go champion Lee Sedol with a score of 4-1.

AlphaGo, built on DeepMind's pioneering reinforcement learning technology, learned to tackle complex decision-making tasks by repeatedly attempting them and receiving feedback on its performance. Additionally, AlphaGo utilized the Monte Carlo tree search method to explore and remember potential moves on the Go board.
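
To make the tree search idea concrete, here is a minimal sketch of Monte Carlo tree search in Python. It is not AlphaGo's implementation: AlphaGo paired its search with trained neural networks, whereas this sketch uses plain UCT selection with random playouts. The game is also an assumption chosen for brevity, a toy version of Nim in which two players alternately take 1 to 3 stones and whoever takes the last stone wins. All names (Node, legal_moves, mcts_best_move) are hypothetical.

```python
import math
import random


def legal_moves(pile):
    """Toy Nim rules (an assumption): take 1, 2, or 3 stones per turn."""
    return [m for m in (1, 2, 3) if m <= pile]


class Node:
    """One position in the search tree, with the statistics the search remembers."""

    def __init__(self, pile, player_just_moved, parent=None, move=None):
        self.pile = pile                            # stones left after `move`
        self.player_just_moved = player_just_moved  # 0 or 1
        self.parent = parent
        self.move = move                            # move that led to this node
        self.children = []
        self.untried = legal_moves(pile)            # moves not yet expanded
        self.visits = 0
        self.wins = 0.0                             # wins for player_just_moved


def uct_child(node, c=1.4):
    """Pick the child that best balances win rate against exploration."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts_best_move(pile, iterations=2000, seed=0):
    """Return the most-visited first move for the player to act (pile >= 1)."""
    rng = random.Random(seed)
    root = Node(pile, player_just_moved=1)  # player 0 moves first at the root
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one previously untried move to the tree.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.pile - move, 1 - node.player_just_moved,
                         parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves until the pile is empty.
        pile_left, last_mover = node.pile, node.player_just_moved
        while pile_left > 0:
            pile_left -= rng.choice(legal_moves(pile_left))
            last_mover = 1 - last_mover
        winner = last_mover  # whoever took the last stone
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            if node.player_just_moved == winner:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

Calling mcts_best_move(5) runs the four phases 2,000 times and returns the first move the search visited most often. The repeated trial-and-feedback loop is the part that loosely mirrors the description above; the per-node visit and win counts are how the search "remembers" which moves looked promising.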

Versatility in Size and Function

Google highlighted that Gemini is currently undergoing training. Once fine-tuned, it will be available in "various sizes and functions," similar to PaLM 2. Google envisions deploying it across different products to benefit a broad user base.

Beyond its applications in enterprise services, Gemini holds immense potential in medical use cases. Google has been testing an AI tool named Med-PaLM 2, which could be enhanced by Gemini's capabilities. This model could be employed in medical chatbots or robotic technologies to assist surgeries and medical procedures.

Moreover, insights from DeepMind's Gato, a "generalist" system, and from the recently launched RT-2, a robotic Transformer model, could be integrated into Gemini. The collaboration between Google Brain and DeepMind presents a significant challenge to OpenAI and other competitors in the AI domain.

Integrating Gemini Across Google Applications

In a September interview, Pichai shared insights about integrating Gemini into Google products. He mentioned that conversational AIs like Bard are "not the end state" but rather a stepping stone towards more advanced chatbots.

Pichai envisioned the final fusion of Gemini and Bard as an "astonishing universal personal assistant" that would seamlessly integrate into various aspects of daily life, including travel, work, and entertainment. He reiterated that Gemini, with its ability to combine text and images, would make current AI chatbots seem "insignificant" in a few years.

Compared to existing models, Gemini aims to offer stronger code generation for software developers. Google hopes to leverage it to surpass Microsoft's GitHub Copilot code assistant.

Google Cloud's Pursuit of Microsoft Cloud with Enterprise Sales

Google aims to leverage Gemini to attract more users to its products, especially its cloud computing business. The company plans to offer the Gemini model to businesses through its Vertex AI service on Google Cloud. By releasing versions of different sizes, Google indirectly promotes its cloud services.

In May, Google announced that it would offer a set of PaLM 2 models to Google Cloud customers through Vertex AI. Recently, Google also provided customers with a one-month free trial of its large model through the coding platform startup Replit.