Saturday, February 17, 2024

Gemini 1.5 with greatly expanded context window

Google announces its Gemini 1.5 model with dramatically enhanced performance

Following the 1.0 launch in December, Google today announced Gemini 1.5 as its next-generation model with “dramatically enhanced performance.” One of the main advancements in Gemini 1.5 is a significantly larger context window. 

Gemini 1.5

Hot on the heels of releasing the Gemini Ultra model last week, Google announced the launch of its newest model, Gemini 1.5. Google says Gemini 1.5 delivers dramatic improvements across a number of dimensions, and claims it achieves quality comparable to 1.0 Ultra while using much less compute.

An AI model’s “context window” is made up of tokens, the building blocks used for processing information. Tokens can be whole words or parts of words, images, videos, audio or code. The bigger a model’s context window, the more information it can take in and process in a given prompt, making its output more consistent, relevant and useful.

Gemini 1.5 Pro, Google’s middle tier, has a standard context window of 128,000 tokens, up from 32,000 tokens for Gemini 1.0. For comparison, GPT-4 Turbo is also at 128,000 tokens and Claude 2.1 offers 200,000. Beyond that, Gemini 1.5 Pro can process up to 1 million tokens consistently, which Google notes is the longest context window of any large-scale foundation model yet. A 1 million token window translates to over 700,000 words, codebases with over 30,000 lines of code, 11 hours of audio, or 1 hour of video.

“1.5 Pro can seamlessly analyse, classify and summarize large amounts of content within a given prompt. For example, when given the 402-page transcripts from Apollo 11’s mission to the moon, it can reason about conversations, events and details found across the document.”
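To get a feel for what those token counts mean in practice, here is a minimal sketch of the arithmetic behind “over 700,000 words.” It assumes the common rule of thumb of roughly 0.75 English words per token; real tokenizers (including Gemini’s) vary by language and content, so treat the ratio as an illustration only.

```python
# Rough heuristic, not Gemini's actual tokenizer: ~0.75 English words per token.
WORDS_PER_TOKEN = 0.75

def approx_words(num_tokens: int) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(num_tokens * WORDS_PER_TOKEN)

print(approx_words(32_000))     # Gemini 1.0 Pro window -> 24000
print(approx_words(128_000))    # Gemini 1.5 Pro standard window -> 96000
print(approx_words(1_000_000))  # Gemini 1.5 Pro long-context preview -> 750000
```

Under this assumption, the 1 million token window comes out to roughly 750,000 words, consistent with Google’s “over 700,000 words” figure.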

Important Features

Gemini 1.5 is based on a new Mixture-of-Experts (MoE) architecture, in which the model is divided into smaller “expert” neural networks. Depending on the type of input given, an MoE model learns to selectively activate only the most relevant expert pathways in its network. This specialization makes Gemini 1.5 more efficient to both train and serve.

Gemini 1.5 Pro is a mid-size multimodal model, meaning it can handle different types of data such as text, images, videos, audio and code. It can perform at a similar level to Gemini 1.0 Ultra, the company’s largest model to date, while being more scalable and cost-effective. It also features a breakthrough experimental capability in long-context understanding, which enables it to process and reason about vast amounts of information in one go.
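The routing idea behind MoE can be sketched in a few lines. This is an illustrative toy, not Gemini’s architecture: a gating network scores every expert for a given input, and only the top-k experts are actually run, so most of the network’s weights stay idle for any single input. All sizes and weights below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, DIM, TOP_K = 8, 16, 2

gate_weights = rng.normal(size=(DIM, NUM_EXPERTS))            # gating network
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route input x through only its TOP_K most relevant experts."""
    scores = x @ gate_weights                     # one score per expert
    top = np.argsort(scores)[-TOP_K:]             # indices of the best experts
    w = np.exp(scores[top])
    w /= w.sum()                                  # softmax over the chosen experts
    # Only TOP_K of NUM_EXPERTS weight matrices are touched: the efficiency win.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

out = moe_forward(rng.normal(size=DIM))
print(out.shape)  # (16,)
```

The point of the sketch is the conditional computation: a dense model of the same total size would multiply the input through all eight expert matrices, while the MoE forward pass touches only two.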

Multi-modal improvements

1.5 Pro can perform highly sophisticated understanding and reasoning tasks across different modalities, including video.

Performance

When tested on a comprehensive panel of text, code, image, audio and video evaluations, 1.5 Pro outperforms 1.0 Pro on 87% of the benchmarks Google uses for developing its large language models (LLMs). Compared to 1.0 Ultra on the same benchmarks, it performs at a broadly similar level. Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning it can learn a new skill from information given in a long prompt, without needing additional fine-tuning. For instance, when given a 44-minute silent Buster Keaton movie, the model can accurately analyse various plot points and events, and even reason about small details in the movie that could easily be missed.

Availability

Google has released a limited preview of Gemini 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Google will soon reveal pricing tiers that start at the standard 128,000-token context window and scale up to 1 million tokens. Early testers can try the 1 million token context window at no cost during the testing period, during which the capability is described as experimental.

Longer blocks of code

1.5 Pro can perform more relevant problem-solving tasks across longer blocks of code. When given a prompt with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications and give explanations about how different parts of the code work.




