Google Doubles Down on AI: Veo 3, Imagen 4 and Gemini Diffusion Push Creative Boundaries

by admin May 21, 2025



Google I/O 2025 was never about subtlety. This year, the company abandoned incrementalism, delivering a cascade of generative AI upgrades that aim to redraw the map for search, video, and digital creativity.

The linchpin: Gemini, Google’s next-gen model family, is now powering everything from search results to video synthesis and high-resolution image creation—staking out new territory in a race increasingly defined by how fast, and how natively, AI can generate.

The showstopper is Veo 3, Google’s first AI video generator that creates not just visuals but complete soundtracks—ambient noise, effects, even dialogue—synchronized directly with the footage. Text and image prompts go in, and fully produced 4K video comes out.

This marks the first large-scale video model capable of generating audio and visuals simultaneously. The approach first surfaced in Showrunner Alpha, an unreleased model, but Veo 3 offers far more versatility, generating a range of styles beyond simple 2D cartoon animation.

“We’re entering a new era of creation with combined audio and video generation,” Google Labs VP Josh Woodward said during the launch. It’s a direct challenge to the current video-generation leaders—Kling, Hunyuan, Luma, Wan, and OpenAI’s Sora—positioning Veo as an all-in-one solution rather than one tool among many.

Alongside Veo 3, Imagen 4—the latest iteration of Google’s image-generation model—arrives with enhanced photorealism, 2K resolution, and, perhaps most importantly, text rendering that actually works for signage, products, and digital mockups.

For anyone who’s suffered through the gibberish text created by previous AI image models, Imagen 4 represents a significant improvement.

These tools don’t exist in isolation. Flow AI, a new subscription feature for professional users, combines Veo, Imagen, and Gemini’s language capabilities into a unified filmmaking and scene-editing environment. But this integration comes at a price: $125 per month for the complete toolkit during a promotional period, after which the full $250 rate kicks in.

Image: Google

Gemini: Powering search and “text diffusion”

Generative AI isn’t just for content creators. Gemini 2.5 now forms the backbone of the company’s redesigned search engine, which Google wants to evolve from a link aggregator into a dynamic, conversational interface that handles complex queries and delivers synthesized, multi-source answers.

AI Overviews—where Gemini attempts to provide comprehensive answers to queries without requiring users to click through to other sites—now sit at the top of search pages, with Google reporting over 1.5 billion monthly users.

Image: Google via YouTube

Another interesting development is “Gemini Diffusion,” which builds on a technique pioneered by Inception Labs months ago. Until recently, the AI community generally agreed that autoregressive models worked best for text generation while diffusion models excelled at images.

Autoregressive models generate text one token at a time, choosing each new token based on the prompt and everything generated so far. That constant rereading of prior output makes them ideal for crafting coherent text responses.

Diffusion technology operates differently: it starts by filling the entire output with random noise, then refines (denoises) it step by step until the final product matches the prompt. That makes it a natural fit for images, which have a fixed canvas and aesthetic.

OpenAI was the first to successfully apply autoregressive generation to image models; now Google has become the first major company to apply diffusion generation to text. The model begins with nonsense and refines the entire output with each iteration, producing thousands of tokens per second while maintaining accuracy. For context, Groq (the inference provider, not xAI’s Grok), one of the fastest inference providers in the world, generates around 275 tokens per second, and traditional providers like OpenAI and Anthropic cannot come close to those speeds.
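To make the contrast concrete, here is a minimal toy sketch of the two decoding loops. This is not Gemini Diffusion’s actual algorithm: `next_token` and `denoise` are hypothetical stand-ins for a trained model’s predictions, and the vocabulary is a placeholder.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]

def next_token(context):
    # Hypothetical stand-in for a trained model's next-token prediction.
    return random.choice(VOCAB)

def autoregressive_decode(prompt, length=8):
    """One model call per token: each token is conditioned on all prior text."""
    tokens = list(prompt)
    for _ in range(length):
        tokens.append(next_token(tokens))
    return tokens[len(prompt):]

def denoise(draft, prompt):
    # Hypothetical stand-in for one diffusion step: every position is
    # re-predicted in parallel, conditioned on the prompt and current draft.
    return [next_token(list(prompt) + draft) for _ in draft]

def diffusion_decode(prompt, length=8, steps=4):
    """Start from pure noise over the whole output and refine it iteratively."""
    draft = [random.choice(VOCAB) for _ in range(length)]  # all noise at once
    for _ in range(steps):
        draft = denoise(draft, prompt)
    return draft

print(autoregressive_decode(["the"]))  # 8 sequential model calls
print(diffusion_decode(["the"]))       # 4 whole-sequence refinement passes
```

The claimed speed advantage falls out of this structure: a diffusion decoder updates the entire sequence on every pass, so the number of model calls scales with the number of refinement steps rather than with the output length.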

The model, however, isn’t publicly available yet—interested users must join a waiting list—but early adopters have shared impressive results showing the model’s speed and precision.

Hands-on with Google’s AI tools

We got our hands on several of Google’s new AI features, with mixed results depending on the tier.

Deep Research is particularly powerful—even beating ChatGPT’s alternative. This comprehensive research agent evaluates hundreds of sources and delivers reliable information with minimal errors.

What gives it an edge over OpenAI’s research agent is the ability to generate infographics. After producing a complete research text, it can condense that information into visually appealing slides. We fed the model everything about Google’s latest announcements, and it presented accurate information through charts, diagrams, graphs, and mind maps.

Veo 3 remains exclusive to Gemini Ultra users, though some third-party providers like Freepik and Fal.ai already offer access via API, as sketched below. Flow isn’t available to try unless you spring for the Ultra plan.

Flow proves to be an intuitive video editor with Veo’s models at its core, allowing users to edit, cut, extend, and modify AI scenes using simple text prompts.
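For developers who skip the Ultra plan, the third-party route mentioned above is programmatic. A minimal sketch of such a request follows; the endpoint, payload fields, and response shape are illustrative placeholders, not documented parameters of Freepik’s or Fal.ai’s actual APIs.

```python
import os
import requests

# Hypothetical endpoint for a third-party Veo 3 host; check the provider's
# real documentation for the actual URL, field names, and auth scheme.
ENDPOINT = "https://api.example-provider.com/v1/veo-3/generate"

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {os.environ['PROVIDER_API_KEY']}"},
    json={
        "prompt": "A hyena laughing in a neon-lit arcade, cinematic lighting, "
                  "with ambient arcade noise and distant dialogue",
        "duration_seconds": 8,  # assumed parameter; real clip limits may differ
    },
    timeout=300,  # video generation is slow; allow a long wait
)
response.raise_for_status()
print(response.json())  # providers typically return a job ID or a video URL
```

Because generation can take minutes, real provider APIs usually run asynchronously, returning a job ID to poll rather than the finished file.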



Meanwhile, even Veo 2 got a little love, making life easier for Pro users. Generations with the now-accessible Veo 2 are significantly faster: we created eight seconds of video in about 30 seconds. While Veo 2 lacks sound and currently supports only text-to-video (image-to-video is coming soon), it understood our prompts and even generated coherent on-screen text.

Veo 2 already performs comparably to Kling 2.0—widely considered the quality benchmark in generative video. Veo 3’s generations look even more realistic and coherent, with convincing background sound and lifelike dialogue and voices.

For Imagen, it’s difficult to tell at first glance whether the Gemini chatbot interface is running Imagen 4 or still Imagen 3, though users can confirm through Whisk. Our initial tests suggest Imagen 4 prioritizes realism unless told otherwise, with better prompt adherence and visuals that surpass its predecessor.

We generated an image with different elements that don’t usually fit together in the same scene. Our prompt was “Photo of a woman with a skin made of glass, surrounded by thousands of glitter and ethereal pieces in a baroque room with the word ‘Decrypt’ written in neon, realistic.”

Even though both Imagen 3 and Imagen 4 understood the concept and the elements, Imagen 3 failed to capture the realistic style—which Imagen 4 easily did. Overall, Imagen 4 is comparable to state-of-the-art image generators, especially considering how easy it is to prompt.

Audio overviews have also improved, with Gemini now easily producing full discussions of more than 20 minutes instead of forcing users to switch to NotebookLM. This makes Gemini a more complete interface, reducing the fragmentation that previously required users to jump between different sites for various services.

The quality is comparable to that of NotebookLM, with slightly longer outputs on average. However, the key feature is not that the model is better, but that it is now embedded into Gemini’s chatbot UI.

Premium AI at a premium price

Google didn’t hide its monetization strategy. The company’s “Ultra” plan costs $250 monthly, bundling priority access to the most powerful models, the Flow AI tools, and 30 terabytes of storage—clearly targeting filmmakers, serious creators, and businesses. The $20 “AI Pro” tier unlocks the previous-generation Veo 2 model, along with image and productivity features for a broader user base. Basic generative tools—like simple Gemini Live and image creation—remain free, but with limits such as a token cap and only 10 Deep Research queries per month.

This tiered approach mirrors the broader AI market trend: drive mass adoption with freebies, and then lock in the professionals with features too useful to pass up. Google’s bet is that the real action (and margin) is in high-end creative work and automated enterprise workflows—not just casual prompts and meme generation.

Edited by Andrew Hayward


Google Unveils Android XR Glasses with Gemini AI Integration

by admin May 21, 2025



In brief

  • At Google I/O, Google demos Android XR glasses with Gemini AI for translation, navigation, media, and real-time help.
  • The glasses are in beta testing, and Google plans to release them with eyewear brands Gentle Monster and Warby Parker.
  • Google is positioning Android XR to rival Meta’s AI glasses.

Google unveiled Android XR, a new extended reality platform designed to integrate its Gemini AI into wearable devices such as smart glasses and headsets.

During its 2025 I/O developer conference on Tuesday, the tech giant showcased the Android XR glasses, the company’s first eyewear since it discontinued the ill-fated Google Glass in 2023.

During the presentation, Shahram Izadi, Vice President and General Manager at Android XR, highlighted the need for portability and quick access to information without relying on a phone.

“When you’re on the go, you’ll want lightweight glasses that can give you timely information without reaching for your phone,” he said. “We built Android XR together as one team with Samsung and optimized it for Snapdragon with Qualcomm.”

Google first announced Android XR in December 2024. The reveal arrived eight months after Meta released the latest version of its Ray-Ban Meta AI glasses—a sign of growing competition in the wearable AI space.

Glasses with Android XR are lightweight and designed for all-day wear. They work with your phone so you can be hands-free, stay in the moment with friends and complete your to-do list. pic.twitter.com/CLXGxeQPzs

— Google (@Google) May 20, 2025

Like Meta’s AI glasses, the Android XR glasses include a camera, microphones, and speakers and can connect to an Android device.

Google’s flagship AI, Gemini, provides real-time information, language translation, and an optional in-lens display that shows information when needed.

During the presentation, Google also showed off the Android XR glasses’ live-streaming capabilities, as well as their ability to take photos, receive text messages, and display Google Maps.

Google also demonstrated how Gemini can complement exploration and navigation through immersive experiences.

“With Google Maps in XR, you can teleport anywhere in the world simply by asking Gemini to take you there,” Izadi said. “You can talk with your AI assistant about anything you see and have it pull up videos and websites about what you’re exploring.”

While Google did not announce a release date or price, Izadi said the glasses would be available through partnerships with South Korean eyewear brand Gentle Monster and U.S. brand Warby Parker, adding that a developer platform for Android XR is in development.

“We’re creating the software and reference hardware platform to enable the ecosystem to build great glasses alongside us,” Izadi said. “Our glasses prototypes are already being used by trusted testers, and you’ll be able to start developing for glasses later this year.”

Edited by Sebastian Sinclair

Google I/O 2025 live keynote: all the latest on Gemini AI, Android 16 and more

by admin May 20, 2025



Welcome to our Google I/O 2025 live blog, where we’re bringing you all the latest from the search giant’s opening keynote at the Shoreline Amphitheater in Mountain View, California.

Google is expected to speak about a whole host of products and services. Gemini AI is likely to be a major focus, with appearances from Android 16, Wear OS 6, and Android XR all tipped to happen.

Google I/O 2025 keynote live blog

Last updated May 20, 2025, 10:32 AM

The liveblog has ended.








Live updates on Gemini, Android XR, Android 16 updates and more

by admin May 20, 2025


Ready to see Google’s next big slate of AI announcements? That’s precisely what we expect to be unveiled at Google I/O 2025, the search giant’s developer conference that kicks off today at 1PM ET / 10AM PT. Engadget will be covering it in real time right here, via a liveblog and on-the-ground reporting from our very own Karissa Bell.

Ahead of I/O, Google already gave us some substantive details on the updated look and feel of its mobile operating system at The Android Show last week. Google included some Gemini news there as well: Its AI platform is coming to Wear OS, Android Auto and Google TV, too. But with that Android news out of the way, Google can use today’s keynote to stay laser-focused on sharing its advances on the artificial intelligence front. Expect news about how Google is using AI in search to be featured prominently, along with some other surprises, like the possible debut of an AI-powered Pinterest alternative.

The company made it clear during its Android showcase that Android XR, its mixed reality platform, will also be featured during I/O. That could include the mixed reality headset Google and Samsung are collaborating on, or, as teased at the end of The Android Show, smart glasses with Google’s Project Astra built-in.

As usual, there will be a developer-centric keynote following the main presentation (4:30PM ET / 1:30PM PT), and while we’ll be paying attention to make sure we don’t miss any news there, our liveblog will predominantly focus on the headliner.

You can watch Google’s keynote in the embedded livestream above or on the company’s YouTube channel, and follow our liveblog embedded below starting at 1PM ET today. Note that the company plans to hold breakout sessions through May 21 on a variety of different topics relevant to developers.

Live: 4 updates

  • Tue, May 20, 2025 at 7:25 AM PDT

    Glad to see Karissa made it. Traffic on I/O day is always dicey. But I’d recognize that dusty parking lot anywhere.

  • Tue, May 20, 2025 at 7:23 AM PDT

    Image: A woman holding up a badge that says “Karissa Bell, Engadget” with a red “Press” label above the name; behind her is a tent in a large parking lot. (Karissa Bell for Engadget)

    Karissa has not only arrived safely at Shoreline Amphitheater, but has also acquired her badge! Looks like it’s going to be lovely weather for the show, and probably a good idea to lather on sunscreen if you’re there!

  • Tue, May 20, 2025 at 7:22 AM PDT

    Our senior reporter Karissa Bell will be reporting live from Google I/O, while senior reviewer Sam Rutherford will be leading this liveblog, backed up by AI reporter Igor Bonifacic. I’ll be around for support, logistics, vibes and snacks. The show kicks off at 1pm ET, but as you can see, we couldn’t wait to start. There’s been a lot, honestly.

  • Tue, May 20, 2025 at 7:00 AM PDT

    Hello everyone! Welcome to our liveblog of Google’s annual I/O developer conference. I feel as if our liveblog tool has gotten more than its fair share of use these last two weeks. If it all feels very familiar to you too, that’s likely because we had two liveblogged events just last week, one of which was the company’s Android showcase.

Update, May 20 2025, 9:45AM ET: This story has been updated to include a liveblog of the event.

Update, May 19 2025, 1:01PM ET: This story has been updated to include details on the developer keynote taking place later in the day, as well as tweak wording throughout for accuracy with the new timestamp.



