Ascend.vc

Image generated using ChatGPT’s new unified model.

Token Talk 11: Do Omni models bring us closer to AGI?

April 1, 2025

By: Thomas Stahura

Sam Altman’s manifest destiny is clear: achieve AGI.

There is little consensus on what AGI actually means. Altman defines it as “the equivalent of a median human that you could hire as a coworker and they could do anything that you’d be happy with a remote coworker doing.”

Dario Amodei, Anthropic founder and CEO, says AGI happens “when we are at the point where we have an AI model that can do everything a human can do at the level of a Nobel laureate across many fields.”

Demis Hassabis, CEO of Google DeepMind, puts it more succinctly. AGI, he says, is “a system that can exhibit all the cognitive capabilities humans can.”

If AGI is inevitable, the next debate is over timing. Altman thinks this year. Amodei says within two. Hassabis sees it arriving sometime this decade.

As I mentioned last week, AI researchers are working to unify multiple modalities (text, audio, and images) into a single model. These so-called "omni" models can natively generate and understand all three. GPT-4o is one of them; the "o" stands for omni. It has handled both text and speech for nearly a year, but image generation was still ruled by diffusion models, until last week.

It began with a research paper published a year ago by researchers at Peking University and ByteDance. The paper introduced Visual AutoRegressive modeling, or VAR, which generates images more efficiently through coarse-to-fine "next-scale prediction": rather than predicting one token at a time in raster order, the model starts from a low-resolution token map and predicts progressively higher-resolution maps, each conditioned on the coarser maps before it. This improves both speed and quality over conventional GPT-style raster-scan and diffusion denoising methods.

Put simply, VAR enables GPT-style models to overtake diffusion for image generation at large scales.
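To make the coarse-to-fine idea concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the nearest-neighbor `upsample` and the random residual below stand in for VAR's actual tokenizer and transformer, which are far more sophisticated. The point is only the control flow: each step produces a full map at the next resolution, conditioned on everything generated at coarser scales.

```python
import random

def upsample(grid, size):
    """Nearest-neighbor upsample a square grid to size x size
    (each scale must divide the next for this toy version)."""
    reps = size // len(grid)
    return [[grid[i // reps][j // reps] for j in range(size)] for i in range(size)]

def next_scale_generate(scales=(1, 2, 4, 8), seed=0):
    """Toy coarse-to-fine generation in the spirit of VAR.
    At each scale, the coarser result is upsampled and a residual
    of detail is added. In the real model, a transformer predicts
    that residual; here random noise stands in for it."""
    rng = random.Random(seed)
    grid = [[0.0]]  # 1x1 base "image"
    for s in scales:
        base = upsample(grid, s) if len(grid) != s else grid
        # the transformer's job: predict the detail to add at this scale
        grid = [[base[i][j] + rng.gauss(0, 1.0 / s) for j in range(s)]
                for i in range(s)]
    return grid

final = next_scale_generate()  # an 8x8 map built in 4 coarse-to-fine steps
```

Note that the loop runs once per scale, not once per token, which is where the speedup over raster-scan autoregression comes from: an 8x8 map takes 4 model calls here instead of 64.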

Qwen-2.5 Omni, the open-source omni model from China I referenced last week, may be an early sign of where things are heading. In the accompanying research paper, the authors wrote, "We believe Qwen2.5-Omni represents a significant advancement toward artificial general intelligence (AGI)."

Is omni a leap toward AGI? That’s the bet labs are making.

And startups native to a single generative modality will need to respond. Companies like Midjourney and Stability, still rooted in diffusion, will likely have to build their own GPT-style image generators to compete, not just for images but potentially across all modalities. The same pressure may extend to music and video, pushing startups like Suno, Udio, Runway, and Pika to expand beyond their core businesses. This shift will play out over years, not months, especially for video. Regardless, I'm certain researchers at OpenAI, Anthropic, Google, and Microsoft are actively training their next-gen omni models.

OpenAI has a lot riding on AGI. If it gets there first, Microsoft loses access to OpenAI’s most advanced models.

Tensions between the two have been building for months. The strain began last fall, when Mustafa Suleyman, Microsoft’s head of AI, was reportedly “peeved that OpenAI wasn’t providing Microsoft with documentation about how it had programmed o1 to think about users’ queries before answering them.” 

The frustration deepened when Microsoft found more value in the free DeepSeek model than in its $14 billion investment in OpenAI.

Microsoft is already developing its own foundation model, MAI, which is rumored to match OpenAI’s performance. OpenAI, meanwhile, just closed a $40 billion tender offer on the strength of GPT-4o and its new image generator, an update more significant than most realize.

From the outside, it appears AGI is near, though I suspect it will be the 2030s before we feel the impacts. My own working definition: a model capable of performing all economically valuable work on a computer, across all domains.

What that means for the labor market is another story. Stay tuned!

Tags Token Talk, Omni Models
