Google I/O 2024: Paving the Way for a New Generation of Technology

Editor's Note: The following is an edited transcript of Sundar Pichai's I/O 2024 remarks, adapted to reflect the on-stage announcements. All announcements can be found in the I/O collection.

Gemini is now integrated into the entire Google experience.

Before we get to today's announcements, I want to reflect on the moment we're in. We've spent more than a decade investing in AI and innovating at every layer: research, product, and infrastructure. Today, we're excited to share the fruits of that long-term effort.

Still, we recognize that we are in the early days of the AI transformation. We see enormous opportunity ahead for creators, developers, startups, and everyone, and helping to drive those opportunities is what Gemini is all about. So let's get started.

The Gemini era

A year ago at I/O, we shared our plans for Gemini: a powerful model built from the ground up to be natively multimodal. Gemini can reason across text, images, video, code, and more. It marks a major step toward turning any input into any output: an I/O for a new generation.

Since announcing those plans, we've released the first Gemini models, our most capable yet. They demonstrated state-of-the-art performance across multimodal benchmarks. Two months later, we introduced Gemini 1.5 Pro, a breakthrough in capability and long-context understanding. It can consistently handle one million tokens, more than any other large-scale foundation model.

We want everyone to benefit from what Gemini can do, so we've worked quickly to roll out new features and improvements to all users. Today, more than 1.5 million developers use Gemini models across our tools. You're using them to debug code, gain new insights, and build the next generation of AI-powered applications.

We've also been bringing Gemini's breakthrough capabilities across our products in powerful ways. Today, we'll show examples from Google Search, Google Photos, Workspace, Android, and more.

The progress so far

Today, all of our products with two billion users use Gemini.

We've also introduced new experiences, including on mobile, where people can interact with Gemini directly through the app, now available on both Android and iOS, and through Gemini Advanced, which provides access to our most capable models. More than one million people signed up to try Gemini Advanced in just three months, and it continues to grow.

Expanding AI Overviews in Google Search

One of the most significant and exciting transformations with Gemini has been in Google Search.

Over the past year, we've answered billions of queries as part of our Search Generative Experience. People are using it to search in entirely new ways: asking longer, more complex questions, even searching with photos, and getting back the best the web has to offer.

We've also been testing this experience outside of Search Labs, and we're encouraged to see not only an increase in Google Search usage, but also an increase in user satisfaction. I'm excited to announce that we'll begin rolling out this revamped experience, AI Overviews, to all users in the US this week, with more countries coming soon. There's so much innovation happening in Search, and with Gemini we can deliver even more powerful search experiences, including within our other products.

Introducing Ask Photos

Take Google Photos, which we launched nine years ago. Since then, people have used it to organize their most important memories, and today more than six billion photos and videos are uploaded every day. People love using Google Photos to search across their lives, and Gemini makes that much easier. Say you're paying for parking but can't remember your license plate number. Before, you could search Photos for keywords and then scroll through years of accumulated photos looking for the plate. Now, you can simply ask Google Photos for what you need. It analyzes the cars that appear most often in your photos, works out which one is yours, and gives you the license plate number.

With Ask Photos, you can also search your memories in a deeper way. Say you've been reminiscing about when your children were little. You can ask Google Photos, "When did my daughter learn to swim?" Then you can follow up with something more complex: "Show me how my daughter's swimming has progressed." Here, Gemini does more than a simple search: it recognizes different contexts, from laps in the pool to snorkeling in the ocean, and even reads the text and dates on her swim certificates. Google Photos then packages the results into a summary so you can relive those memories. We'll be rolling out Ask Photos this summer and will keep expanding its capabilities.

Unlocking more with multimodality and long context

We built Gemini to be natively multimodal, so it can take in and work across different types of information. It's a single model, yet it handles many modalities: it not only understands each type of input, it also finds connections between them. Multimodality radically expands the questions we can ask and the answers we get back. Long context takes this a step further, letting us bring in even more information: hundreds of pages of text, hours of audio, a full hour of video, entire code repositories, or, if you like, roughly 96 Cheesecake Factory menus. That many menus works out to about one million tokens of context, which is now possible with Gemini 1.5 Pro. Developers are already using these capabilities in fascinating ways.

Over the past few months, we've been rolling out Gemini 1.5 Pro with long context in preview. We've made a series of quality improvements across translation, coding, and reasoning, and you'll see these updates reflected in the model starting today. I'm excited to announce that this improved version of Gemini 1.5 Pro is now available to all developers globally. It's also coming directly to consumers in Gemini Advanced, with a context window of up to one million tokens, and can be used in 35 languages.

Expanding the context window to 2 million tokens in private preview

One million tokens is already opening up unprecedented possibilities, but I believe we can go further. So today, we're expanding the context window to two million tokens and making it available to developers in private preview. Looking back, I'm struck by how much progress we've made in just a few months. This is the next step on our journey toward the ultimate goal of infinite context.

Gemini 1.5 Pro is available in Google Workspace

So far, we've talked about two technical advances: multimodality and long context. Each is powerful on its own, but together they unlock deeper capabilities and smarter experiences. We know people search Gmail for emails all the time, and we're working to make that much more efficient with Gemini. For example, if you're a parent who wants to stay on top of everything happening at your child's school, Gemini can help.

You can now ask Gemini to summarize all the recent emails from the school. In the background, it identifies the relevant messages and analyzes attachments, such as PDFs. It then gives you a summary of the key points and action items. Say you were traveling and missed the parent meeting, but the hour-long Google Meet recording was uploaded: you can ask Gemini for the highlights. And if a parents' group is looking for volunteers and you happen to have some free time, Gemini can draft a reply offering to help. There are many more examples of how this can make your tasks easier. Gemini 1.5 Pro is available today in Workspace Labs.

Expanding horizons with AI agents

We see AI agents as an opportunity to get much more done. Agents are intelligent systems that can reason, plan, and remember, that can "think" multiple steps ahead, and that can work across software and systems, all to help you complete tasks according to your instructions and under your supervision. We're still in the early days of development, but let me show you the kinds of use cases we're working hard to solve. Let's start with shopping: buying new shoes is fun, but returning them when they don't fit is tedious.

Imagine Gemini handling all the steps for you:

Search your inbox for the receipt.

Locate the order number in the email.

Fill out the return form.

Schedule a UPS pickup.

Much easier, right?
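As an illustration, the chain of steps above can be sketched as simple orchestration code, where the agent performs one step at a time and passes each result along. All function names and data here are invented for the sketch; none of this reflects an actual Gemini agent API.

```python
# Toy sketch of the shoe-return workflow: one function per step,
# chained in order. Names and data are illustrative assumptions only.

def search_inbox_for_receipt(inbox):
    """Step 1: find the purchase receipt among the emails."""
    return next(msg for msg in inbox if "receipt" in msg["subject"].lower())

def locate_order_number(receipt):
    """Step 2: pull the order number out of the email body."""
    return receipt["body"].split("Order #")[1].split()[0]

def fill_out_return_form(order_number):
    """Step 3: pre-fill the retailer's return form."""
    return {"order": order_number, "reason": "wrong size"}

def schedule_ups_pickup(form):
    """Step 4: book a pickup for the return package."""
    return f"UPS pickup scheduled for order {form['order']}"

inbox = [
    {"subject": "Weekly newsletter", "body": "Hello!"},
    {"subject": "Your receipt", "body": "Thanks! Order #A1234 shipped."},
]

receipt = search_inbox_for_receipt(inbox)
order = locate_order_number(receipt)
confirmation = schedule_ups_pickup(fill_out_return_form(order))
print(confirmation)  # UPS pickup scheduled for order A1234
```

The point of the sketch is the shape of the workflow, not the individual steps: an agent chains tool calls, carrying intermediate results (the receipt, the order number) from one step to the next.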

Let's now consider a more complex example.

Say you've just moved to Chicago. Gemini and Chrome can work together to help you do a number of things to settle in. With its organizing, analyzing, and reasoning capabilities, it can help you explore the city and find nearby services, such as laundromats and dog walkers. You'll also need to update your address across dozens of websites and accounts. Gemini can perform these tasks on your behalf, asking for more information when it needs it, so you stay in full control. That part is really important to us: as we prototype these experiences, we're thinking carefully about how to keep them private, secure, and useful for everyone. These are simple examples, but they give a good sense of the problems we want to solve by building intelligent systems that can reason, plan, and think ahead on your behalf.

Gemini's role in achieving our mission

Multimodality, long context, and agents are how Gemini advances our ultimate goal: making AI helpful for everyone. That, in turn, advances our mission: to organize the world's information across every type of input, make it accessible via every type of output, and combine it with each user's context in ways that are truly useful.

Breaking New Ground

To realize the full potential of AI, we know new ground must be broken, and the Google DeepMind team is hard at work on it. Gemini 1.5 Pro has been very popular, especially for its strong long-context understanding. But we heard from developers that they also wanted something faster and more cost-effective. So today we're introducing Gemini 1.5 Flash, a lighter-weight model built for tasks where low latency and cost matter most. Gemini 1.5 Flash will be available in AI Studio and Vertex AI starting today. Looking further ahead, we've always wanted to build a universal agent that's helpful in everyday life. That's why we built Project Astra, which can process multimodal information and hold a conversation in real time.

Unprecedented improvements in Google Search

One of our greatest areas of investment and innovation is in our founding product, Google Search. We created Search 25 years ago to help people make sense of the vast amounts of information moving online. With each platform shift, we've delivered breakthroughs that help answer questions better. On mobile, we unlocked new types of questions and answers using richer context, location awareness, and real-time information. With advances in natural language understanding and computer vision, we introduced new ways to search: by voice, by humming a tune to find a song, or with a photo of something around you. More recently, with Circle to Search, you can simply circle what you're curious about. Gemini takes Search to a whole new level, combining the power of our infrastructure, the latest AI advances, our high bar for information quality, and our decades of experience connecting people to the richness of the web. The result is a product that does the work for you: Google Search is generative AI at the scale of human curiosity. This is our most exciting chapter of Search yet. Liz Reid shares more about Google Search in the Gemini era.

Gemini on Android devices

With billions of Android users worldwide, we're excited to integrate Gemini more deeply into the operating system. As your new AI-powered assistant, Gemini is there to help you anytime, anywhere. We've built Gemini models directly into Android, including our latest on-device model, Gemini Nano with Multimodality, which processes text, images, audio, and speech, unlocking new experiences while keeping information private on your device. You can find all the latest Android news here.

A responsible approach to artificial intelligence

We continue to approach the opportunities of AI boldly and with excitement, but always responsibly. We're developing a new AI-assisted red teaming technique that builds on Google DeepMind's gaming breakthroughs, such as AlphaGo, to adversarially test our models. We've also expanded our watermarking innovations, like SynthID, to text and video, making AI-generated content easier to identify. James Manyika shares more.

Collaborative work to build the future

All of this shows the important progress we're making as we take a bold and responsible approach to making AI helpful for everyone.

Before we wrap up, let's count how many times we've mentioned AI today. And that number is sure to go up before I finish.

I mention this not just for a laugh, but because it reflects something deeper. We've been AI-first in our approach for years. Decades of pioneering research have put us at the cutting edge, driving the frontier of AI for us and for our industry. On top of that, we have:

  • World-leading infrastructure built for the AI era
  • Cutting-edge innovation in Google Search, now powered by Gemini
  • Widely used products, including 15 products with half a billion users
  • Platforms that enable everyone, including partners, customers, and creators, to invent the future

None of this progress would be possible without our amazing developer community. You make ideas real through the experiences and applications you build every day. So, to everyone here and everyone watching around the world: here's to the possibilities ahead, and to creating the future together.
