Gemini 2.5 Flash
Gemini 2.5 Flash is a cutting-edge reasoning model now available in preview through the Gemini API via Google AI Studio and Vertex AI. Building on the foundation of its predecessor, Gemini 2.0 Flash, this version introduces significant advancements in reasoning capabilities while maintaining a focus on speed and cost efficiency. It is the first fully hybrid reasoning model, allowing developers to toggle "thinking" on or off and set customizable thinking budgets to balance quality, cost, and latency. Even with thinking disabled, Gemini 2.5 Flash surpasses the performance of 2.0 Flash while preserving its rapid response times.
The model’s innovative "thinking" process enables it to reason through its thoughts before generating responses, making it particularly adept at handling complex tasks that require multi-step reasoning, such as solving advanced math problems or analyzing intricate research questions. This process leads to more accurate and comprehensive outputs, as evidenced by its strong performance on Hard Prompts in LMArena, second only to Gemini 2.5 Pro. Developers can fine-tune the reasoning phase by setting a thinking budget, which dictates the maximum number of tokens the model can generate during the thinking process. The budget can range from 0 to 24,576 tokens, allowing for flexibility based on task complexity and resource constraints.
Gemini 2.5 Flash provides a cost-efficient solution with comparable metrics to other leading models at a fraction of the size and price, making it an optimal choice for developers seeking high-quality performance without excessive costs. The model is trained to automatically adjust its reasoning duration based on the complexity of the prompt, streamlining the decision-making process. For simpler tasks, developers can set the thinking budget to zero to achieve minimal latency while still benefiting from the enhanced capabilities of 2.5 Flash.
Available in preview, Gemini 2.5 Flash can be accessed through Google AI Studio, Vertex AI, and the Gemini app, with tools like the thinking_budget parameter enabling developers to experiment with controllable reasoning. This model represents a significant step forward in AI reasoning technology, offering unparalleled flexibility and performance for solving a wide range of challenges.