Token Talk 21: We Built the Chips. Now Build the Apps

June 11, 2025

By: Thomas Stahura

Last December, Google unveiled Willow, its new quantum chip. The media dubbed it mind-boggling when, with only 105 qubits, it was able to solve a problem in five minutes that would take a classical computer ten septillion years to complete. That problem is called Random Circuit Sampling (or RCS), which I’ll explain in a bit.

On the news, Google’s stock jumped, as did shares of its competitors: Microsoft, Rigetti, D-Wave, and IonQ. For a moment, it seemed, quantum hype dethroned AI to become the talk of the town. Two months later, Microsoft responded by announcing its own quantum chip called Majorana 1, causing another stock bump. However, at only 8 qubits, it's still early stage, and the tech giant has yet to publish its RCS results.

RCS is a benchmark rather than a practical problem; it's designed to gauge quantum computer performance. The whole test is basically: "Can you sample from this crazy quantum distribution faster than classical computers can even calculate what that distribution should be?"

To do this, researchers must:

  • Pick a number of qubits (like 105 for Google's chip)

  • Generate a random circuit of 20+ layers of quantum gates (like the Hadamard or Pauli-X gates mentioned last week)

  • Run the circuit, all 20+ gate layers, a million times or so

  • Collect each run's bitstring output (something like "01101001...")

  • Use classical computers to simulate what the "perfect" quantum computer would output

  • Measure how close your actual results are to the ideal

Each additional gate creates more quantum entanglement between qubits. More layers = more complex quantum correlations = harder for classical computers to track. More than 20 layers is where classical simulation becomes practically impossible. If a quantum computer finishes in minutes but classical takes years → quantum advantage. At least, that's how the thinking goes.
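
To make that recipe concrete, here’s a minimal sketch of the sampling loop in Python, using a plain NumPy state-vector simulator. This is a toy of my own, not Google’s benchmark code: the qubit count, layer count, and gate set (just the Hadamard, Pauli-X, and controlled-Z gates) are illustrative stand-ins, and a circuit this small is trivial for a laptop to simulate exactly.

    import numpy as np

    rng = np.random.default_rng(0)
    n_qubits = 4      # tiny; Willow used 105
    n_layers = 8      # real benchmarks use 20+ layers
    n_samples = 1_000

    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard
    X = np.array([[0, 1], [1, 0]], dtype=complex)                # Pauli-X

    def apply_single(state, gate, target):
        # Apply a one-qubit gate to `target` by building the full 2^n operator.
        op = np.eye(1, dtype=complex)
        for q in range(n_qubits):
            op = np.kron(op, gate if q == target else np.eye(2))
        return op @ state

    def apply_cz(state, q1, q2):
        # Controlled-Z: flip the sign of amplitudes where both qubits are 1.
        out = state.copy()
        for idx in range(len(out)):
            if (idx >> (n_qubits - 1 - q1)) & 1 and (idx >> (n_qubits - 1 - q2)) & 1:
                out[idx] *= -1
        return out

    # Build a random circuit: each layer applies random one-qubit gates, then
    # entangling CZ gates between neighboring qubits.
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0                                   # start in |0000>
    for _ in range(n_layers):
        for q in range(n_qubits):
            state = apply_single(state, H if rng.random() < 0.5 else X, q)
        for q in range(n_qubits - 1):
            state = apply_cz(state, q, q + 1)

    # "Run" the circuit many times by sampling bitstrings from the ideal
    # output distribution |amplitude|^2, then compare against that ideal.
    probs = np.abs(state) ** 2
    samples = rng.choice(2 ** n_qubits, size=n_samples, p=probs)
    print([format(s, f"0{n_qubits}b") for s in samples[:5]])     # e.g. ['0110', ...]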

RCS is cool but not practical. It's like saying your AI passed a Mensa IQ test. So where are the real-world quantum applications?

Enter the wonderful world of optimization and quantum annealing!

Classically, annealing is an algorithm inspired by metallurgy: you heat up a material and then cool it slowly so atoms settle into a low-energy (optimal) state. In the math world of optimization, you randomly explore solutions, occasionally accepting worse ones to escape local minima, and gradually “cool” to settle into the best solution.

Imagine you’re standing in a vast, foggy landscape of rolling hills and valleys. Each point in this landscape represents a possible solution to your optimization problem. The height at each point is the “energy” of that solution — the lower the energy, the better. Classical annealing is like wandering this landscape with a lantern. At first, you’re allowed to take big, random steps, even uphill, so you don’t get stuck in a small valley (local minimum). As time goes on, you “cool down,” and your steps get smaller and more cautious, focusing on moving downhill. The hope is that, by the end, you’ve found the deepest valley, the global minimum. The catch? Sometimes, no matter how clever you are at wandering, you can still get stuck in a valley that isn’t the lowest one (not optimal). The fog is thick, and you can’t see the whole landscape at once.
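
Here’s what the classical version looks like in code: a minimal simulated-annealing sketch in Python. The energy function, step size, and cooling schedule are arbitrary choices I made for illustration, not anything from a production solver.

    import math
    import random

    random.seed(42)

    def energy(x):
        # A bumpy 1-D "landscape" with several local minima (an arbitrary toy function).
        return 0.5 * x ** 2 + math.sin(8 * x)

    x = random.uniform(-3, 3)   # start somewhere random in the fog
    temperature = 2.0           # high temperature = big, adventurous steps
    cooling_rate = 0.995

    for _ in range(5_000):
        candidate = x + random.gauss(0, 0.3)   # take a random step
        delta = energy(candidate) - energy(x)
        # Always accept downhill moves; accept uphill ones with probability
        # exp(-delta / T), which shrinks as the system "cools".
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            x = candidate
        temperature *= cooling_rate

    print(f"settled near x = {x:.3f} with energy {energy(x):.3f}")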

Quantum annealing replaces random steps with quantum tunneling, allowing the system to “tunnel” through energy barriers rather than climb over them. In our example, instead of just walking over the hills, you can tunnel through them to a lower valley on the other side, even if it looks impossible from a classical perspective. Essentially, thanks to quantum mechanics, quantum tunneling can help escape local minima that would trap a classical algorithm.

Without getting too technical, quantum annealing does not use any quantum logic gates! Instead, an optimization problem is encoded as a Hamiltonian (fancy math representing the system's total energy). This sets up the energy landscape so that the lowest energy state (the ground state) represents the best solution to the problem. Then, thanks to quantum physics, the system naturally wants to stay in its lowest energy state and "relaxes" into the answer.
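
To make "encode the problem as an energy landscape" concrete, here’s a toy sketch: a tiny number-partitioning problem written as an Ising-style energy function, with brute force standing in for the annealer. On real hardware the chip physically relaxes into this same ground state; the numbers and encoding here are just illustrative, not D-Wave’s tooling.

    from itertools import product

    # Split {3, 1, 4, 2} into two groups with equal sums. Each number gets a
    # spin s in {-1, +1} (which group it belongs to); the energy is the
    # squared imbalance between the two groups.
    numbers = [3, 1, 4, 2]

    def energy(spins):
        return sum(n * s for n, s in zip(numbers, spins)) ** 2

    # Brute force stands in for the annealer: the lowest-energy spin
    # assignment (the "ground state") is the best partition.
    best = min(product((-1, +1), repeat=len(numbers)), key=energy)
    group_a = [n for n, s in zip(numbers, best) if s == +1]
    group_b = [n for n, s in zip(numbers, best) if s == -1]
    print(group_a, group_b, "energy:", energy(best))   # two groups summing to 5 each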

Companies like D-Wave, founded in 1999, are leading the charge in quantum annealing. D-Wave’s Advantage system, accessible via its Leap cloud platform, has been used by the likes of Volkswagen to optimize traffic flow and by SavantX to streamline port operations, reducing costs and improving efficiency. D-Wave charges a subscription for cloud access and consulting services. In 2024, D-Wave reported contracts with major firms, contributing to its growing commercial traction.

Similarly, IonQ, which runs a quantum computing manufacturing facility up in Bothell, operates primarily on a Quantum-as-a-Service model, providing access to its quantum computers via major cloud platforms like AWS, Azure, and GCP. The company was founded in 2015 and became the first quantum company to IPO back in 2021.

Beyond optimization and the cloud, quantum computing is making inroads in drug discovery and materials science. For example, Algorithmiq’s collaboration with IBM’s Quantum Network focuses on quantum chemistry simulations to identify promising drug candidates, potentially shaving years off development timelines. The company generates revenue through partnerships and by licensing its software platform, and it secured €13.7 million in funding to scale its offerings.

Quantinuum is also working with firms like Samsung to apply quantum algorithms in materials design, optimizing material properties for semiconductors and batteries. These early applications, still in the prototyping phase, are driving real revenue through research contracts and pilot projects.

Quantum applications are hitting the market and making money. We now have enough qubits to do cool things! It feels like the bottleneck is shifting from hardware to software. The industry needs more quantum developers to build the next generation of algorithms and apps. Or maybe we'll develop an AI that can program in Q# or other quantum languages. On the hardware side, things are starting to get crowded: Xanadu, Alice & Bob, Atom Computing, PsiQuantum, Rigetti, NVIDIA, QuEra Computing, and Intel, just to name a few, are all developing their own quantum computers.

I think we'll see much more change in the quantum industry in the next 30 years than in the last 30, with most of that change coming from innovative software.

Stay tuned next week for the final installment of our quantum series!

P.S. If you have any questions or just want to talk about AI, email me! thomas @ ascend dot vc

Tags Token Talk, Quantum Computing

Token Talk 20: How Quantum Computers Work, Pt. 1

June 4, 2025

By: Thomas Stahura

Editor’s note: This is the first in a three-part Token Talk series on quantum computing. Today’s post covers the fundamentals of how quantum machines work. Next week, we’ll dive into the key players in the field and the startups already building real applications.

You’ve probably heard of quantum computers. 

Invented in 1998, this breed of thinking machine is billed as the quintessential classical-computer disrupter. But when asked exactly why or how these machines will change the world, most folks just shrug.

Over the last 27 years, the field has gone from two qubits per chip to an astounding 1,121 qubits in IBM's latest quantum chip. Still, few have seen, let alone used, a quantum computer. What gives?

Before diving into the new world of quantum computers, let's quickly cover the old world of classical computers.

Classical computers (like the device you're looking at now) store information in binary bits: 1s and 0s. This information flows through a series of logic gates that each perform a certain mathematical operation. The logic gates are: NOT, AND, OR, NAND (Not AND), NOR (Not OR), XOR (Exclusive OR), and XNOR (Exclusive NOR / Equivalence).

Take the NAND gate. Its function is to output 0 only if both of its inputs are 1; otherwise, it outputs 1.

So,

Input: 1, 1 → Output: 0

Input: 1, 0 → Output: 1

Input: 0, 1 → Output: 1

Input: 0, 0 → Output: 1

The NOR gate, on the other hand, outputs 1 only if both inputs are 0; otherwise, it outputs 0.

So,

Input: 0, 0 → Output: 1

Input: 0, 1 → Output: 0

Input: 1, 0 → Output: 0

Input: 1, 1 → Output: 0

And lastly, the NOT gate (AKA the inverter) flips the input.

So,

Input: 1 → Output: 0

Input: 0 → Output: 1
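
As a quick aside, these gates compose. Here’s a tiny sketch of a half-adder (a circuit that adds two bits) built entirely out of NAND gates, which alone can reproduce every other gate. The function names are mine; it’s a toy illustration.

    def nand(a, b):
        # NAND: outputs 0 only when both inputs are 1.
        return 0 if (a and b) else 1

    def xor(a, b):
        # XOR built from four NAND gates.
        n = nand(a, b)
        return nand(nand(a, n), nand(b, n))

    def half_adder(a, b):
        # Add two bits: returns (sum bit, carry bit).
        total = xor(a, b)
        carry = nand(nand(a, b), nand(a, b))   # AND built from two NANDs
        return total, carry

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", half_adder(a, b))   # 1 1 -> (0, 1), i.e. binary 10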

Logic gates are the LEGO bricks of computation. By chaining them together, you build circuits that can add, subtract, multiply, and more. Ok, now to understand how quantum computers differ from classical computers, you also need to understand the concept of reversibility.

A logic gate is reversible if you can always uniquely recover the input from the output.

For example, suppose a NAND gate outputs a 1. What was the input? It could be 0,0 or 0,1 or 1,0. Since we cannot uniquely recover the input from the output, we say NAND gates are not reversible. In other words, information (about the input) is lost.

NOT gates, on the other hand, are reversible. For example, if a NOT gate outputs a 0, we know the input must be 1. And if it outputs a 1, its input must be 0.

Now that you get classical gates — NAND, NOR, NOT, etc. — it's time to dive into quantum computers because they are playing a whole different game. Instead of bits, they use qubits. 

Qubits aren’t just 0 or 1; they can be both at the same time (that’s superposition). And quantum gates are the logic gates that manipulate these qubits.

The first rule of quantum math: every quantum gate is reversible, meaning you can always run it backward and recover your original state.

Classical gates (like NAND/NOR) can destroy info (not reversible). Quantum gates never do. They’re always reversible, always unitary (fancy math words for “no info lost”).

Because of this reversibility requirement, quantum computers use a unique set of quantum logic gates that permit a certain kind of math. Let's go over two of them:

The Hadamard (H) gate is the superposition gate. Input a 0, you get a 50/50 mix of 0 and 1. Imagine flipping a coin: as it spins in mid-air, it traces out a 3D sphere, and its probability, at that moment, is a 50/50 chance of being heads or tails. Input a 1, same deal — still a 50/50 mix, but with a phase flip. Imagine representing the direction and speed of the coin’s spin as an arrow in 3D space: this arrow has a direction (phase) and a speed (magnitude). Flipping the phase reverses the direction of the coin's spin. The Hadamard gate is how you unlock quantum parallelism: it takes a boring, definite state and turns it into a quantum probabilistic state. In short, it’s the logic gate that turns classical bits into quantum bits.

So,

Input: |0⟩ → Output: 50% chance of being 1 or 0

Input: |1⟩ → Output: 50% chance of being 1 or 0

Once your qubit is in superposition, you can start doing some wild quantum tricks. The next essential gate is the Pauli-X gate (often just called the X gate). Think of the X gate as the quantum version of the classical NOT gate. It flips the state of a qubit:

Input: |0⟩ → Output: |1⟩

Input: |1⟩ → Output: |0⟩

If your qubit is in superposition (say, α|0⟩ + β|1⟩), the X gate swaps the amplitudes:

Input: α|0⟩ + β|1⟩ → Output: α|1⟩ + β|0⟩

Still reversible, still no info lost.

In quantum computing, amplitudes (like α and β) are complex numbers that represent the arrows in 3D space mentioned earlier. They encode both the phase and magnitude of a qubit's state, and the probability of measuring a given outcome is the squared magnitude of its amplitude. The phase (angle) of the amplitude affects how quantum states interfere, but it is not directly observable as a probability.
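
Since these gates are just small matrices acting on amplitude vectors, you can sanity-check everything above with a few lines of NumPy. A minimal sketch (not a real quantum runtime, just the linear algebra):

    import numpy as np

    ket0 = np.array([1, 0], dtype=complex)                        # |0>
    ket1 = np.array([0, 1], dtype=complex)                        # |1>
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard
    X = np.array([[0, 1], [1, 0]], dtype=complex)                 # Pauli-X

    plus = H @ ket0            # (|0> + |1>) / sqrt(2): an even superposition
    minus = H @ ket1           # (|0> - |1>) / sqrt(2): same 50/50 mix, phase flipped
    print(np.abs(plus) ** 2)   # [0.5 0.5] -> measuring is a coin flip
    print(np.abs(minus) ** 2)  # [0.5 0.5] -> the phase is invisible to measurement

    state = 0.6 * ket0 + 0.8 * ket1   # alpha = 0.6, beta = 0.8 (0.36 + 0.64 = 1)
    print(X @ state)                  # amplitudes swapped: 0.8 on |0>, 0.6 on |1>

    print(np.allclose(H @ H, np.eye(2)))   # True: H undoes itself, i.e. reversible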

After many quantum logic gates, when you measure a qubit, its superposition collapses to a definite 0 or 1. So, to get a quantum speedup, your algorithm must:

  • Exploit superposition and entanglement to process many possibilities at once.

  • Be reversible (unitary operations only).

  • Use a technique called interference to amplify the correct probabilities and cancel out the wrong ones.

Most problems don’t fit this mold. If you just naively port classical code, you’ll get no speedup — or worse, a slowdown.

As of today, there are only four algorithms that take advantage of quantum computers’ unique properties. They are: Shor’s Algorithm (factoring integers), Grover’s Algorithm (unstructured search), Quantum Simulation (physics simulations), and Quantum Machine Learning (QML).

  • Shor’s algorithm, using the quantum Fourier transform, finds the prime factors of large numbers exponentially faster than the best classical algorithms. This has massive implications for cryptography, since it breaks RSA encryption, which relies on prime factoring being difficult and secures most of the internet today.

  • Grover’s algorithm, using amplitude amplification to boost the probability of the correct answer, searches an unsorted database of a million items with roughly 99.9% fewer queries (see the rough arithmetic after this list). And the speedup grows as the database gets bigger.

  • Quantum Simulation, using entanglement and superposition, models complex quantum systems — like molecules, proteins, or new materials — that are impossible for classical computers to handle. This unlocks breakthroughs in drug discovery, chemistry, and materials science by letting us “test” new compounds in silico before ever touching a lab.

  • Quantum Machine Learning (QML), using quantum circuits, can turbocharge core tasks like linear algebra and sampling. Quantum computers, in theory, can solve huge systems of equations, invert matrices, and sample from complex probability distributions faster than classical machines. Though this is still very much in the domain of researchers.
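
For the Grover claim above, the back-of-the-envelope arithmetic looks roughly like this (an illustrative estimate, not a benchmark):

    import math

    N = 1_000_000
    classical_queries = N / 2                           # average checks to find one item
    grover_queries = round(math.pi / 4 * math.sqrt(N))  # ~785 quantum iterations
    savings = 1 - grover_queries / classical_queries
    print(grover_queries, f"{savings:.1%}")             # 785, about 99.8% fewer queries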

A new wave of pre-quantum startups is building the application layer for quantum computing. Just as AI startups turned research into real-world value, these teams are doing the same for quantum by targeting proven algorithmic advantages. They are developing tools for drug discovery, molecular modeling, cybersecurity, faster search, and design optimization in aerospace and manufacturing. These companies are positioning themselves now so they are ready to scale when the hardware becomes readily available.

Ok, that was a crash course in quantum computing! Abstract, but just scratching the surface. And there’s still a whole universe left to explore: More quantum logic gates, quantum error correction (how do you keep qubits from falling apart?), decoherence (why do quantum states vanish so easily?), entanglement (spooky action at a distance, anyone?), and the wild world of quantum hardware (trapped ions, superconducting circuits, photonics, and more). We haven’t even touched on the real-world challenges — scaling up, keeping things cold, and making quantum computers actually useful outside the lab. 

Tags Token Talk, Quantum Computing, Quantum

Token Talk 19: The Hype Train that Keeps on Chugging

May 28, 2025

By: Thomas Stahura

Whenever I talk to someone who doesn’t follow AI news every day, the reaction is usually some variation of the same sentiment: Impressive but scary! That feels automatic now, like it’s been rehearsed. Each week’s headlines blur into the last. 

It makes AI feel like old news. People seem to be waiting for the really big announcement. But what would that even look like? And what does that say about where we are in the AI hype cycle?

The reason I bring this up is because last week, for me, really felt like one of those “holy shit!” type weeks — and it came from a flurry of announcements you may have seen but already forgot about. To catch you up:

  • Anthropic released its Claude 4 family of models

  • OpenAI acquired Jony Ive’s io design firm for $6.5 billion, catapulting OpenAI’s ambition into hardware

  • Microsoft debuted Windows computer use agents and open sourced Github Copilot at MS Build

  • Google held its annual IO developer conference, announcing Gemini updates, a new open-source Gemma model, Mariner browser agent in Chrome, and Veo 3 with audio generation (an impressive release given that it’s notoriously hard to sync generated video with audio)

So, here’s my take on the week’s announcements:

  • Claude 4 is incredible at coding, but average everywhere else. 

  • If the Sam Altman–Jony Ive collaboration isn’t some kind of BCI wearable, it’ll feel like a letdown. 

  • Microsoft made a lot of noise but showed few real products. 

  • Google stole the show. I/O was sharp, and Veo 3 outputs flooded X/Twitter feeds. 

The big announcements soaked up most of the attention, overshadowing some equally promising — but less polished — developments elsewhere in the AI world.

  • For starters, ByteDance quietly dropped a new open-source model: BAGEL, a 7-billion-parameter omni model capable of understanding and generating language (reasoning and non-reasoning) and images (generating, editing, and manipulating them). The model outperforms Qwen2.5-VL and InternVL-2.5. It's only missing audio to complete the omni-modality trifecta!

  • Alibaba updated its Wan2.1 video model. Claiming SOTA at 14 billion parameters, it can run on a single GPU and produce impressive 720p videos or edits. Still no audio for the videos. I’m noticing a trend…

  • Google, during IO, open sourced MedGemma, a variant of Gemma 3 fine-tuned on medical text and clinical image comprehension. The model is designed to answer your medical questions like a nurse and analyze your X-rays like a radiologist. It’s available for free in 4B and 27B sizes.

That was the news of the last few weeks. Plenty of flash, plenty worth watching.

But the hype cycle has a funny way of resetting itself. And I’ve been thinking more about what’s happening off to the side. The stuff that isn’t getting the spotlight, but might shape the next phase of this industry (and maybe future Token Talk topics). 

Stuff like DeepMind’s AlphaEvolve paper, which introduces a Gemini-powered agent designed specifically for the discovery and optimization of algorithms. AlphaEvolve uses an evolutionary framework to propose, test, and refine entirely new algorithmic solutions. It’s a tangible step towards AI systems that can do the science of computer science: actively exploring the digital codescape and uncovering novel solutions, demonstrating a form of discovery.

A nonprofit out of San Francisco called Future House is pursuing a much broader goal: automating the entire process of scientific discovery. It recently unveiled Robin, a multi-agent system that achieved its first AI-generated discovery: identifying an existing glaucoma drug as a potential new treatment for dry macular degeneration. Robin basically orchestrated a team of specialized AI agents to handle everything from literature review to data analysis, proving that AI can indeed drive the key intellectual steps of scientific research.

It’s easy to mistake noise for signal, hype for substance. And believe me, there is more noise than signal in the AI world right now. But that happens at some point in every tech cycle. I think it would be a huge mistake to completely dismiss today's AI ambitions of automated discovery or human-machine telepathy. 

AI today feels like where 3D printing was in 2013. Still a lot of excitement but noticeably less than a few years ago. Will there be another AI winter? Almost certainly. Will it be anytime soon? No.

Hype doesn’t die as much as it transitions from one idea to another, from one industry to another. Within AI, chatbots, agents, and now discovery and robots have all been hyped. In the broader tech industry, mobile was hyped, then cloud, crypto, and now AI. 

What's next? What new tech breakthrough will catch the collective consciousness the way AI has? Maybe space, carbon nanotubes, CRISPR, room temperature superconductors, fusion, quantum, or something entirely new that comes out of left field… Time will tell, so stay tuned!

Tags Token Talk, AI Hype Cycle

Token Talk 18: What’s Microsoft's Forking Problem?

May 21, 2025

By: Thomas Stahura

When Microsoft released Visual Studio Code in 2015, it quietly marked the start of a new era in software development. A decade later, the free and open-source code editor became the dominant platform for programmers, used by nearly three-quarters of developers worldwide. 

It didn't take long for VS Code to dominate the code-editor market. The product helped fuel Microsoft’s broader push into cloud services and artificial intelligence, tying together Azure, GitHub and, later, OpenAI. But as generative AI reshapes software development, startups built on top of VS Code are now turning into competitors.

In 2015, I was building Minecraft mods in Eclipse. A year later, my AP computer science class and robotics team (shoutout Team 1294!) switched to VS Code. I stuck with it for the next eight years, along with most of the developer world. Today, 73% of programmers use VS Code. I did too, until last year.

So if VS Code is free and open source, how does it make money? 

IDEs are big business, especially for a software giant like Microsoft. Sure, Microsoft doesn’t make money from the IDE itself, but the developers who use it fuel spending on cloud services like Azure, generating tens of billions of dollars for the company. When bundled with GitHub, which Microsoft acquired for $7.8 billion in 2018, and integrated into VS Code, the world's most popular IDE, it's easy to see how Azure and the cloud became Microsoft’s main moneymaker today.

Former CEO Steve Ballmer was correct when he thundered the famous “Developers! Developers!! Developers!!!” line at Microsoft’s developer conference in 2005.

Twenty years later, Satya Nadella said Microsoft evolved into “a platform company focused on empowering everyone with AI.” That evolution began in 2019 when Microsoft made its first billion-dollar investment in OpenAI. Early models like GPT-2 showed potential with generating code. And GPT-3 proved to be an expert at writing boilerplate code. In 2021, months before OpenAI’s ChatGPT debut, Microsoft launched Github Copilot and bundled it with VS Code.

At $20 per month, it isn't cheap, but it was given away to students for free. It was an early product and an obvious game-changer for programming. The consensus at the time was that Microsoft, owning Azure, Github, VS Code, and 50% of OpenAI, would dominate the emerging AI IDE industry.

In hindsight, that couldn’t be further from the truth. The entire tech landscape saw the value of generative coding. Millions of developers started using it every day. Companies now brag about the percentage of their code that is AI generated. And AI coding was rebranded as vibe coding.

Developers began forking VS Code en masse (approximately 32,000 times) to build their own separate IDEs. Companies like Cursor and Windsurf reached billion-dollar valuations in the past two years, and countless others like Pear AI have raised millions and gotten into YC — all off the back of Microsoft and VS Code.

The culmination of this forking frenzy came with OpenAI’s acquisition of Windsurf earlier this month. Think about it: Microsoft owns VS Code and half of OpenAI. Windsurf forks VS Code and is acquired by OpenAI. Microsoft now technically owns half of Windsurf, a competitor built on top of its own product. This feels like the final nail in the coffin for the Microsoft-OpenAI partnership.

Yesterday, in response to the acquisition, Satya announced Microsoft is open-sourcing GitHub Copilot, probably in an attempt to undercut the viability of the many VS Code fork startups.

How that will play out remains to be seen. However one thing is for sure: AI coding is the current killer use case for generative AI. The model makers are racing to saturate the coding benchmarks.

P.S. If you have any questions or just want to talk about AI, email me! thomas@ascend.vc

Tags Token Talk, Fork VSCode

Token Talk 16: When Proving You’re Human Gets You Paid

May 6, 2025

By: Thomas Stahura

Sam Altman often says he knows within 10 minutes if he wants to work with someone. After such a meeting with Alex Blania, he was convinced of Alex’s exceptional abilities. Their initial chat quickly turned into a multi-hour walk where they discussed their ambitions, the future, and ultimately, the World project.

World.org is the online home of the World Foundation, a Cayman Islands company that vaguely aims to “create more inclusive and fair digital governance and economic systems,” aligned with several UN Sustainable Development Goals. The foundation operates under the umbrella of Tools for Humanity, a parent company chaired by Sam Altman.

Ok fine, another Altman for-profit-not-for-profit project with a complicated corporate structure full of platitudes. But what's the product here? How does it make money? That's where things get a little Black Mirror.

The company aims to authenticate real humans in the age of AI. By scanning your face using one of its orbs, your unique biometric data is added to the so-called “World Chain” (its Ethereum secured blockchain) and you are issued a World ID and free World Coin for verifying your humanity.

Once verified, you can join World App, a human only super app with its own app store, encrypted messaging platform, and crypto wallet.

As for the coin, 10% of its supply is already allocated to employees and another 10% to investors (most notably Andreessen Horowitz). Worldcoin owners can send tokens to each other, vote on World Foundation proposals, or sell. So far this year the coin’s value has dropped 86%.

I mentioned last week the online human authentication problem is indeed a very real problem. However, I think this Youtube comment sums up World’s reception online:

“Can I scan and record your fingerprints [Sic] don't you worry what for, here's 10 bucks.”

So, if you’re wary of swapping your biometrics for crypto, you’re not alone. Good thing Worldcoin isn’t the only sheriff in town when it comes to proving you’re human. The digital frontier is already patrolled by the likes of Google’s reCAPTCHA (the service that has us clicking all the traffic lights), Cloudflare’s bot-fighting checkmark boxes, and Jumio’s ID verification scanner. Each offers a different flavor of the same promise: keep the bots at bay, let the real people in. But as AI gets smarter, so do the bots, and the arms race for digital authenticity will likely never end.

For startups, this means the old playbook — collect data quietly, hope no one notices — doesn’t cut it anymore. Today, you’re building a product and cultivating trust. That means being upfront about how you’re protecting data and keeping out the fakes, whether you’re using open-source code, third-party products, or just informative English explanations. If you can’t show your users how you’re protecting their privacy and their identity, someone else will — and they’ll win the trust war and those customers.

But beyond the technical and privacy concerns, Worldcoin’s pitch is more than just about proving you’re human — it’s about what you get for it. It's the idea of rewarding people for their mere existence, rather than their labor. Or in other words, a form of Universal Basic Income (UBI).

Andrew Yang mainstreamed the term during his 2020 presidential run. He proposed a “freedom dividend” of $1,000 per month for each adult American citizen. A very popular idea for obvious reasons.

That same year, Altman conducted a study giving 3,000 individuals $1,000 per month over a 3 year period, the largest study of its kind. The results concluded UBI provided immediate financial relief and increased personal freedom, but did not lead to lasting financial security or major changes in employment quality.

Altman has since proposed a new idea: Universal Basic Compute, essentially giving everyone access to a share of AI computing power instead of regular cash. People could use, sell, or donate their allotted compute. Meanwhile, Elon Musk envisions a future of Universal High Income, brought about by AI-automated abundance. How these projects will be paid for remains to be seen.

It seems the real story here is about the age-old tension between privacy and progress. We want the benefits of AI, UBI, and digital identity, but we’re not quite ready to trade our faces for a few tokens and a promise. The question isn’t “can we build it?” since we know we can, it's “should we scan it?” 

Altman knew in 10 minutes that Alex Blania was worth betting on. The rest of us get an orb, a coin, and a promise. For Worldcoin to work, that has to be enough.

Tags Token Talk, WorldCoin

Token Talk 15: Was the internet ever alive?

April 30, 2025

By: Thomas Stahura

LinkedIn banned me. I was running a scraper to enrich a dataset for Ascend and triggered its aggressive bot detection. Frustrating, but a rite of passage for any automation enthusiast. (I was back after 24 hours in the digital penalty box.)

Moving beyond my personal digital hiccup, a far more significant disruption is unfolding online, sending me down the rabbit hole of the internet’s growing bot problem and the serious questions about the future of interaction itself.

In recent news, researchers at the University of Zurich secretly deployed AI bots across Reddit over the last four months to test whether artificial intelligence could sway public opinion on polarizing topics. 

The study drew heavy criticism after it came out that the researchers had their AI bots pose as rape victims, Black men opposed to BLM, and workers at a domestic violence shelter. The bots targeted the subreddit r/changemyview and wrote more than 1,700 personalized comments designed to be as persuasive as possible.

The results show AI-generated comments are significantly more effective (three to six times more effective) at changing users' opinions compared to human-generated comments. And none of the users were able to detect the presence of AI bots in their subreddit. 

Reddit’s Chief Legal Officer condemned the research as “deeply wrong on both a moral and legal level,” and the company banned all accounts associated with the University of Zurich. Despite the condemnation, Reddit's data deal with OpenAI indicates it's providing the foundation for even more persuasive digital manipulators. And OpenAI itself is considering launching its own social network to feed its data hungry models.

The dead internet theory is an online conspiracy that’s been around for years but hit the collective consciousness in the wake of ChatGPT’s launch in late 2022. The internet became "dead," the theory goes, as authentic human engagement has been largely replaced by automated algorithm-driven content and interactions.

After all, Google is built off the backs of thousands of crawlers storing every known site, and other bots have crawled the internet since its birth. Imperva, which only started tracking bots in 2013, clocked them at 38.5% of all internet traffic. Bots surged to 59% the following year and slowly dropped back down to 37.2% in 2019 (the same year human traffic peaked at 62.8%). Since then, bot traffic has been crawling back up, and in 2024 it surpassed human traffic for the first time in a decade. Today, it’s reasonable to assume bots are responsible for more than half of global internet traffic.

But again, this is nothing new. It happened in 2014 and all the largest websites have built serious defenses around their valuable data. How many captchas have you had to solve? I’ve personally done too many to count, and I still managed to get my LinkedIn suspended for “the use of software that automates activity.” 

The central question of the “dead internet” and the AI revolution as a whole is: “Is this time different?” 

Yes, in the sense that humanity will remain below 50% internet traffic for the foreseeable future. But also no, in the sense that human generated data is and will always be the most valuable commodity online. So there exists incentives to protect and foster it, though the influx of bots is already upon us. LLM-powered agents are actively exploring the web in exponential numbers. Deep research agents visit hundreds of websites with a single query. IDE agents like Cursor and Cline now search the web for documentation. And agents are already booking AirBnBs, hailing Ubers, and ordering pizzas.  

These agents can buy things but aren't influenced by ads. They masquerade as real humans but don’t generate authentic human activity. This is a whole new paradigm that websites will have to adapt to or risk losing business to sites who do. Allow the good bots, block the bad ones. Sounds easy enough, but how can you tell? The solution isn’t entirely clear yet. Thus enabling Swiss grad students to gaslight thousands of people for science.

The challenge for startups lies in balancing automation with authenticity. While AI can and should handle repetitive tasks and scale development, startups thrive on genuine connection with their early adopters and customers. Blindly automating every interaction could alienate the very people they need to build a real following.

There are tens of thousands of automated Facebook attention farm accounts. But I doubt images of shrimp Jesus are influencing people. The fear is rampant disinformation and targeted persuasion. And it's warranted. I spot fake-seeming Youtube comments all the time, and I'm certain DeepSeek-powered disinformation is rampant on Weibo.

The Head of TED, Chris Anderson, during his talk with Sam Altman, put it best. He said: “It struck me as ironic that a safety agency might be what we want, yet agency is the very thing that is unsafe.”

I believe there is a way to authenticate agents and build a web that works for both bots and humans alike. I’ll talk more about what that looks like in the next edition. 

But if it wasn't clear already, don’t automatically trust everything you see online. The next time LinkedIn sends you a push notification saying “so and so” viewed your profile — they may be a bot in disguise.

Tags Token Talk, Dead Internet Theory

Token Talk 14: OpenAI killed my startup. Now the real disruption begins.

April 23, 2025

By: Thomas Stahura

It was my second desperate pivot, and it made so much sense at the time. An AI marketplace, I thought! A site where users can submit and monetize their prompts and use cases.

Turns out, a chat interface is much more intuitive than searching a giant list of prompts. 

So last year, when I heard OpenAI killed 100,000 startups with the launch of its GPT store, I was justifiably skeptical. But it got me wondering: How many companies has OpenAI actually killed? And more broadly, how has AI affected the tech landscape 2 years into the fourth industrial revolution?

Let’s start with the most visible disruption. 

Devtool and edtech companies that once seemed untouchable are crashing back down to earth. Since 2022, Stack Overflow, a question-and-answer platform for developers, has lost about 5 to 15% of its web traffic each year. In response, it launched OverflowAI in the summer of 2023. Despite the push, Stack Overflow’s decline has not slowed down. Chegg, a study and homework help platform, rolled out CheggMate in spring 2023. Since then, its stock has plunged 97%. Coursera, another edtech company, launched its AI-powered Coursera Coach last year. The stock is down 85% since 2021.

Meanwhile, AI is creeping into the design world: Adobe launched Firefly, its AI image generator; Canva rolled out Canva Code, its text-to-design tool; and Figma followed with Figma Code, its own version of text-to-design. Unlike education or developer tools, the design sector is still growing, but that likely won’t last for long. Large language models can now generate full applications from a simple prompt.

Lovable, on its home page, advertises itself as a Figma competitor. For those still designing by hand, it added an "import from Figma" button. The once-dominant design firm — which nearly sold for $20 billion in 2023 — is now reduced to a button on a rival's site. Figma responded by launching its own AI dev tool, Figma Code, and issued cease-and-desist letters to Lovable and others over their use of "Dev Mode," a term Figma trademarked in 2023.

It’s getting ugly for the companies not named OpenAI.

Speaking of, OpenAI’s image generator now produces nearly perfect text and designs. Using 4o feels like how Photoshop should work — and Adobe better be taking notes.

AI labs are racing toward models that can handle every modality, and businesses are restructuring their products around them. When every product works like a text-to-anything tool, how will users tell them apart?

Honestly, besides UI and mindshare, what are the differences between Lovable, Bolt, Chef, Github Spark, v0, Firebase Studio, AWS App Studio, Cursor, Windsurf, Claude Code, Codex, Figma Code, or Canva Code? (And that's just the tip of the iceberg.) Some may use different models, but even that layer is close to being commoditized. 

So how are entrepreneurs supposed to stand out?

The new frontier in the digital world will probably be vertical AI, or what we call SaaS 3.0. These are tools built for specific industries, workflows, companies, or even individual users. Here, differentiation does not come from the model or UI, but from data, domain expertise, and deep trust.

Rohan D’Souza, founder of Avante, a health benefits admin platform and Ascend portfolio company, recently wrote in a post: “The model is the tiniest piece of a much larger enterprise stack required to actually deliver value.”

In other words, the real moat is not the model itself. It’s the safety, reliability, domain-specific workflows, and trust built around it. 

I believe the digital frontier is only half the story. For decades, the most dramatic technological shifts happened on screens and servers. As Marc Andreessen famously put it: "Software is eating the world." It took a while, but AI is breaking out of code and moving into the physical world — biotech, robotics, manufacturing, logistics, and more.

AI in the physical world is far more defensible. The machines it runs are harder to replicate, and the technical nuances go deeper than traditional software alone. (Ascend labels this category Frontier AI). 

Despite OpenAI's partnership with Anduril, the demand for homegrown physical tech alternatives is only growing. For instance, in 2022 the American Security Drone Act banned federal agencies from using Chinese-made drones and parts. Around that time, some of my college friends were running Uniform Sierra, an aerospace startup focused on building high-quality drones in the U.S. They scaled with a 3D printer farm as demand surged, and the company was recently acquired. More startups, like Seattle-based drone startup Brinc, are reshoring their manufacturing apparatus. 

So did OpenAI kill 100,000 startups? Probably a few thousand. Mine for sure. But in my defense, I built a chat app, a marketplace, and a social media site before OpenAI did. I have the right ideas. I could have kept going — and I still probably would have been steamrolled.

My chat app worked because there were no others like it at the time. I knew back then it wouldn't last. LLMs were too good to stay secret, and OpenAI would productize them better than I could using its API. I knew I had to differentiate. Now chat apps are a dime a dozen.

Differentiation mattered then. It matters even more now, especially with trillion-dollar tech giants pivoting their entire product suites into AI. Timing might get you started, but differentiation keeps you going.

Tags Token Talk, Disruption

Token Talk 13: Machines Don’t Speak Human

April 16, 2025

By: Thomas Stahura

When I talk about “AI alignment,” I’m not talking about some diagonal line that relates intelligence to compute. No, what I’m talking about is the strangely old philosophical problem of how to get increasingly powerful artificial intelligences to do what we actually want, rather than what we merely say. Or worse, what we think we want. 

I shouldn't have to explain why alignment is so important since these AIs aren't just playing Go anymore; they're deciding who gets parole, filtering your social media feed, diagnosing your illnesses, teaching your kids, and driving multi-ton vehicles down the highway. 

Not to mention the money involved. It’s estimated that OpenAI, DeepMind, and Anthropic spend an average of $10 million annually on AI safety (~1% of their compute). Safe Superintelligence (SSI), a company founded by ex-OpenAI Chief Scientist Ilya Sutskever, recently raised $3 billion.

But all the money in the world won’t help if we don’t even know what “alignment” really means. Thankfully, I took an intro to modern philosophy class last year, only to spend half the semester learning ancient philosophy.

Turns out philosophy, like most things, is understood through contrast. And if you want to understand the problem of AI alignment, you’d better start with the old philosophers, because they were wrestling with the problem of learning and the definition of knowledge long before anyone dreamed of gradient descent.

In roughly 369 BCE, Plato suggests that knowledge is justified true belief. Suppose you believe that the sun will rise tomorrow. This belief is true, and you can justify it by appealing to the laws of astronomy and your past experience of the sun rising every day. According to Plato, your belief counts as knowledge because it is true, you believe it, and you have a reasoned account for it. Now, if you’re building an AI, you might think: “Great! Let’s enable it with reason, program it to have justified true beliefs, and we’re done.” But, as usual, things aren’t so simple. 

Because, in 1963, philosopher Edmund Gettier comes along and throws a wrench in everything. He presents these little puzzles, where someone has a belief that is true and justified, yet intuitively does not seem to possess knowledge. For example, imagine you look at a broken clock that stopped exactly 12 hours ago. But, by coincidence, you check it at the precise time it displays. You form the belief that it is 2:00, which happens to be correct, and your belief is justified because you trust the clock. Yet, most would agree you do not truly “know” the time, since your justification is based on faulty evidence. This is an example of a Gettier problem that reveals justified true belief can sometimes be true merely by luck. Now, if you’re trying to align an AI with human values, you’d better hope it doesn’t get “lucky” in the Gettier sense — generate the right thing for the wrong reasons, or worse, generate the wrong thing for reasons that look right on paper.

And then, just when you think you’ve got a handle on things, along come the postmodernists. Postmodernism is marked by skepticism, including the idea that knowledge must fit a strict formula like justified true belief. Instead, postmodernists argue that what counts as knowledge is often shaped by language, culture, and power, and that our understanding is always partial and constructed rather than absolute. 

Now, let’s dig into this language thing a bit more. Think about Derrida, who points out that language isn’t some crystal-clear window onto reality. Words don’t just stand for things. They stand in for things, usually things that aren’t even there. That’s the whole point, right? I can talk about a cat without dragging one into the room. Language works because of absence, because of gaps. And meaning isn’t fixed by what some speaker intended. For example, you write an email and get run over by a self-driving Tesla. The recipient can still read the email even though your intentions are now… well, irrelevant.

More importantly, Derrida, following folks like Nietzsche, gets us suspicious about interpretation itself. Derrida argues there’s no final, correct interpretation of anything – not the Bible, not Plato, not the U.S. Constitution, and certainly not some vague instruction like OpenAI’s “ensure AGI benefits all of humanity.” Trying to pin down meaning is like trying to nail Jell-O to the wall. Philosophical language, the very stuff we use to talk about high-minded ideas like justice, truth, and marketing material, is drenched in metaphor.

As Roderick put it:

“Is the word 'word' a word? No, because I have mentioned it and not used it. It has now become a token of a word... What I am trying to say here is that words are not things. That the attempt that philosophers have made to hook words to the world has failed but it’s no cause for anyone to think we are not talking about anything. See this doesn’t make the world disappear, it just makes language into the muddy, material, somewhat confused practice that it actually is.”

So, how the hell are we supposed to translate our messy, metaphorical, interpretation-laden language into the cold, hard logic of model weights without losing everything important, or worse, encoding the hidden biases and power plays embedded in our own mythology? You tell an AI “be fair,” and what does that mean? Fair according to who? Based on what metaphors? It’s not just that the AI might misunderstand; it’s that language itself is built on misunderstanding, on the impossibility of ever saying exactly what you mean and knowing it’s been received as you intended.

So here’s the punchline: AI alignment is not a technical problem, it’s a philosophical and political one. It’s about who gets to decide what “alignment” even means, whose values get encoded, and who gets left out. It’s about the power to define the good, and the danger that our creations will reflect not our best selves, but our resentments, and contradictions. 

I'm optimistic though because while big tech is trying to cook up some universal recipe for 'aligned AI', probably based on whatever focus group data they collected this quarter, there’s another game in town: open source! Which promises everyone their own perfectly loyal digital butler.

It’s almost comical: OpenAI, after years of being “open” in name only, is finally tossing a model over the wall for the public to play with. If you have a GPU and an internet connection that is. People will align models to do stupid, dangerous, or just plain weird things. But maybe, just maybe, letting individuals wrestle with aligning models to their own contradictory values is better than having one monolithic, corporate-approved 'goodness.’

If language is inherently collaborative, if interpretation is endless, if values are masks for power, then maybe distributing the alignment problem is the only way to avoid the dystopia of a single, centrally-enforced 'truth.' It embraces the uncertainty Roderick talked about, instead of pretending we can solve it with a bigger transformer or a better mission statement. I believe that if we embrace the uncertainty and the collaborative potential of language, perhaps we can build not just smarter machines, but a slightly wiser, more self-aware humanity to guide them.

Tags Token Talk, AI Alignment

Token Talk 12: Want Tech Work, In this Economy?

April 8, 2025

By: Thomas Stahura

Job growth data often tells a story that’s already old. Economic conditions shift fast, and the numbers we get today are usually capturing a version of the world that’s already changed.

Case in point: March’s jobs report showed non-farm employment up by 228,000. (For context, “non-farm” is BLS shorthand for payroll jobs outside of farming, private households, and self-employment.) Most of the growth came from health care, social assistance, transportation, and warehousing. On paper, it paints a picture of stability. Yipeeeee!

But ask job-seekers, especially in tech, and it feels like a different world. People are sending out hundreds of applications and getting nowhere. Scroll LinkedIn for a few minutes and it’s all right there. The official data may offer some reassurance, but the day-to-day reality doesn’t feel reassuring at all. 

How is anyone, anywhere, finding stable tech work in this economy?

Adding to the uncertainty, new tariffs rattled the stock market and sparked another wave of volatility. It’s a reminder of a deeper truth about the current world order. It's built on the expectation of continuous growth, quarter after quarter. When that growth is threatened, the whole thing wobbles.

So, let’s take a closer look at the tech job market today. 

The BLS report says very little, besides the loss of 2,000 information sector jobs and 8,300 professional, scientific, and technical services jobs. Look up “big tech layoffs” and you will see a much clearer picture. Over the last three years, the tech industry shed 609,723 employees, according to layoffs.fyi. (During the dot-com bust, for context, 54,343 tech workers lost their jobs.) While these people will likely find new work elsewhere, sometimes in tech, it hints at a deeper shift, one likely accelerated by the very technology these companies are building: artificial general intelligence.

To add insult to injury, startups, often held up as the safety net after big layoffs, aren’t hiring like they used to. The team scale just isn’t there, thanks in part to AI automating tasks. For many job-seekers, especially those coming from larger companies, the landing spots are fewer and farther between.

Publicly, big tech executives attribute these workforce reductions to “streamlining operations” and “increasing efficiency,” rather than the looming impact of AI. This narrative helps maintain investor confidence and potentially delays difficult conversations about AI's societal effects. 

Startup founders, meanwhile, are often more transparent about AI reducing team growth demands. They have less societal blowback to worry about and are laser-focused on reserving their runway. 

Today's global GDP sits at roughly 108.2 trillion dollars. Assuming a 3% growth rate, the global economy will need to expand by 118.2 trillion dollars by 2050. That's an additional Earth's worth of economic activity in the next 25 years.
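
That figure comes from straightforward compounding; a quick back-of-the-envelope check:

    gdp_2025 = 108.2                      # global GDP today, in trillions of dollars
    gdp_2050 = gdp_2025 * 1.03 ** 25      # 3% annual growth compounded for 25 years
    print(gdp_2050, gdp_2050 - gdp_2025)  # ~226.6 total, ~118 trillion of new output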

Enter AGI. If we use my working definition – a model capable of performing all economically valuable work on a computer, across all domains – the potential impact is big. Really big. Automating the vast majority of knowledge work would unlock productivity gains unseen since the industrial revolution.

But productivity gains for whom, exactly?

Paradoxically, our society also demands employment of us. It is estimated that there are more than 100 million knowledge workers in the U.S., amounting to 76% of the full-time workforce. Not to mention the 3 million truck drivers. That's a sizable voting bloc. What will these people do if these jobs get replaced by robots?

There exists another economic force gathering steam: the attention economy. Look around: a striking number of young people (and, increasingly, not-so-young people) aspire to become influencers, creators, streamers. When polled, roughly 57% of Gen Zers and 37% of Gen Alpha say the same. The creator economy is one of the few sectors not starved for labor.

Let's not forget the platforms enabling this — TikTok, Instagram, YouTube, X — are themselves sophisticated AI: recommendation algorithms that curate feeds, capture eyeballs, and shape desires. While AI might automate parts of the creator process (generating scripts, editing videos), the core storytelling aspect of it all is harder to replace. (At least, I hope, because I participate in the attention economy through this newsletter…thanks for reading!)

Beyond the digital, other sectors appear more resilient to near-term automation. Jobs requiring intricate physical dexterity and complex real-world problem-solving will likely persist longer. Think electricians and plumbers, construction workers navigating complex sites, hands-on healthcare providers like nurses and surgeons, and emergency responders. These roles demand a level of physical embodiment and situational awareness that current AI and robotics struggle to replicate economically or effectively. Manufacturing, while increasingly automated, still requires significant human oversight and intervention for complex tasks and quality control.

So, this sequence seems likely: knowledge work first, then transportation as autonomous vehicles mature, with physically demanding and highly interactive jobs proving most durable.

There is another option! Alongside the rise of the influencer, there's a powerful surge in entrepreneurial spirit. Seventy-six percent of Gen Alpha aspire to be their own boss or have a side hustle, echoed by 62% of Gen Z. This path requires carving out new niches, potentially leveraging AI tools rather than being replaced by them.

This entrepreneurial drive, coupled with the resilience of physical trades and the enduring appeal of human connection in the attention economy, paints a complex picture of the future labor market. 

Yet, the political focus often seems inverted, emphasizing the revitalization of manufacturing, just as the knowledge economy faces its AI reckoning. The admin wants us to make their iPhones, not their TikToks. 

AGI is seen as the engine for achieving the massive economic growth our system demands, and is simultaneously the force threatening to displace the very workers who defined our modern economy. Navigating this transition is perhaps the central challenge of our time. 

But managing the economic fallout is only half the battle. Ensuring these increasingly powerful AI systems operate safely and align with human values is critical. That’s the alignment problem, and I’ll talk about it more next week, so stay tuned!

Tags Token Talk, Jobs, AGI, Tariffs

Image generated using ChatGPT’s new unified model.

Token Talk 11: Do Omni models bring us closer to AGI?

April 1, 2025

By: Thomas Stahura

Sam Altman’s manifest destiny is clear: achieve AGI.

There is little consensus on what AGI actually means. Altman defines it as “the equivalent of a median human that you could hire as a coworker and they could do anything that you’d be happy with a remote coworker doing.”

Dario Amodei, Anthropic founder and CEO, says AGI happens “when we are at the point where we have an AI model that can do everything a human can do at the level of a Nobel laureate across many fields.”

Demis Hassabis, CEO of Google DeepMind, puts it more succinctly. AGI, he says, is “a system that can exhibit all the cognitive capabilities humans can.”

If AGI is inevitable, the next debate is over timing. Altman thinks this year. Amodei says within two. Hassabis sees it arriving sometime this decade.

As I mentioned last week, AI researchers are working to unify multiple modalities — text, audio, and images — into a single model. These so-called “omni” models can natively generate and understand all three. GPT-4o is one of them; the “o” stands for omni. It has handled both text and speech for nearly a year. But image generation was still ruled by diffusion models, until last week.

It began with a research paper from a year ago out of Peking University and ByteDance. The paper introduced Visual AutoRegressive modeling, or VAR. The approach uses coarse-to-fine next-scale prediction to generate images more efficiently. It does this by predicting image details at increasing resolutions, starting with a low-resolution base image and progressively adding resolution to it, which improves both speed and quality over conventional GPT-style raster-scan or diffusion denoising methods.

Put simply, VAR enables GPT-style models to overtake diffusion for image generation at large scales.

Qwen-2.5 Omni, the open-source omni model from China I referenced last week, may be an early sign of where things are heading. In its research paper, they wrote, “We believe Qwen2.5-Omni represents a significant advancement toward artificial general intelligence (AGI).”

Is omni a leap toward AGI? That’s the bet labs are making.

And generative-model-native startups will need to respond. Companies like Midjourney and Stability, still rooted in diffusion, will likely have to build their own GPT-style image generators to compete. Not just for images, but potentially across all modalities. The same pressure may extend to music and video, pushing startups like Suno, Udio, Runway, and Pika to expand beyond their core businesses. This will play out over years, not months, especially for video. Regardless, I'm certain researchers at OpenAI, Anthropic, Google, and Microsoft are actively training their next-gen omni models.

OpenAI has a lot riding on AGI. If it gets there first, Microsoft loses access to OpenAI’s most advanced models.

Tensions between the two have been building for months. The strain began last fall, when Mustafa Suleyman, Microsoft’s head of AI, was reportedly “peeved that OpenAI wasn’t providing Microsoft with documentation about how it had programmed o1 to think about users’ queries before answering them.” 

The frustration deepened when Microsoft found more value in the free DeepSeek model than in its $14 billion investment in OpenAI.

Microsoft is already developing its own foundation model, MAI, which is rumored to match OpenAI’s performance. OpenAI, meanwhile, just closed a $40 billion tender offer on the strength of GPT-4o and its new image generator, an update more significant than most realize.

From the outside, it appears AGI is near. Granted, I suspect it will be the 2030s before we feel the impacts. My own working definition: a model capable of performing all economically valuable work on a computer, across all domains.

What that means for the labor market is another story. Stay tuned!

Tags Token Talk, Omni Models

Image generated in OpenAI’s new image generation feature, with the prompt: “Create a headline image in Studio Ghibli style of this article.”

Token Talk 10: What Startups Gain from China’s AI Push

March 26, 2025

By: Thomas Stahura

The race to dominate artificial intelligence is accelerating on every front, as research labs across the globe push full throttle on new model releases while governments move to cement AI supremacy. 

In the past few weeks, Google released two major models, OpenAI launched long-awaited image capabilities, and Chinese labs pushed open-source systems that rival the best from the West. What began as a battle between private research labs is now a global competition shaped by open models, national strategies, and shifting power dynamics. 

Here's a breakdown of what just happened:

Google announced Gemma 3, the latest model in its Gemma trilogy. At around 27 billion parameters, I wouldn’t call it “small,” yet it punches above its weight class. It’s the only open model that can take video as input. Mistral open-sourced Mistral-Small-3.1 a few days later, a 24 billion parameter model that outperforms Gemma 3 on most benchmarks.

But really, the larger news here is Gemini 2.0 Flash Experimental, Google’s new closed-source flagship and the company’s first unified multimodal model, meaning it can generate and understand both images and text in a single model. I’ve been playing around with it. It is capable of editing images using simple text prompts, generating each frame of a GIF, and even composing a story complete with illustrations. (This is similar to Seattle startup 7Dof, which showcased a visual chain-of-thought editing tool at South Park Commons last year.)

Traditionally, transformer models were used to generate text, while diffusion models generate images. Today, researchers are experimenting with unifying both architectures into a single model (similar to what is going on with VLA models in robotics). The ultimate goal is to build a model that unifies the text, image, and audio spaces.

GPT-4o has had image-generating abilities for a while; Greg Brockman demoed it generating images last May. And this week the company finally launched the capability.

At this point in the AI race, OpenAI seems to be reacting more than leading. Launching 4o’s image gen was a response to Gemini 2.0 Flash Experimental. 

Trump said multiple times he wants “American AI Dominance.” And, to that effect, the White House invited public comment on its AI Action Plan. OpenAI published its response, slamming DeepSeek and urging the administration to implement the following: 

  1. An export control strategy that exports democratic AI

  2. A copyright strategy that promotes the freedom to learn

  3. A strategy to seize the infrastructure opportunity to drive growth

  4. And an ambitious government adoption strategy.

Google also responded, urging America to:

  1. Invest in AI

  2. Accelerate and modernize government AI adoption

  3. Promote pro-innovation approaches internationally

China has its own plan.

Dubbed the “New Generation Artificial Intelligence Development Plan” (2017), the agenda aims to make China the global leader in AI by 2030. The worry seems to be about the sheer quality and openness of the models out of China today. It’s hard to name a model out of a Chinese AI lab that isn’t open source. 

Over the course of a week earlier this month, DeepSeek open-sourced all technical details used in the creation of its R1 and V3 models. All except for the actual dataset used to train the models (adding to the suspicion that DeepSeek trained on gpt-4o outputs). 

DeepSeek also open-sourced Janus-Pro. Though the model got significantly less attention than its big brother, Janus-Pro is a unified multimodal model (like Gemini 2.0 Experimental), capable of generating and understanding both images and text — one of the first open-source models of its kind.

Qwen, the AI lab out of Alibaba Cloud, has launched its own reasoning model: QwQ-32B, competing with and reaching DeepSeek R1 performance on many benchmarks. The model already has 615k downloads on Hugging Face.

OpenBMB (Open Lab for Big Model Base) is a Chinese AI research group out of Tsinghua University. The group is most known for MiniCPM-o-2_6, a unified multimodal model capable of understanding images, text, and speech, as well as generating text and speech. The model is at gpt-4o levels, according to the benchmarks, and has 766k downloads.

DeepSeek V3.1 also launched this week. The model leapfrogged Grok 3 and Claude 3.7 to become the best-performing non-reasoning model. It was the first time an open-source model achieved SOTA.

That is, until Gemini 2.5 Pro Experimental dropped a few hours later. More on that next week.

Ok, here’s my take on the flood of releases: 

This is good news for startups, full stop. More models means more competition, and that means lower prices. Even if the U.S. bans Chinese models, most are fully open. Developers can fine-tune them and build whatever they need.

The real challenge now is the viability of America’s top AI labs. If Chinese labs can flood the market with cheap, open, high-quality models, they could undercut their U.S. counterparts. It’s a familiar playbook — one China used before in other industries. This time, it’s electrons instead of atoms. That shift might tilt the board in China’s favor.

Only time will tell, so stay tuned!

Tags Token Talk, China AI, OpenAI, AI

Token Talk 9: Who's really paying the cloud bill?

March 5, 2025

By: Thomas Stahura

My AWS bill last week was $257. I have yet to be charged by Amazon.

In fact, I have never been charged for any of my token consumption. Thanks to hackathons and their generous sponsors, I’ve managed to accumulate a bunch of credits. Granted they expire in 2026. I’ll probably run out sooner rather than later.

With the rise of open source, closed-source incumbents have been branding their models as “premium” and pricing them accordingly. Claude 3.7 Sonnet is around $6 per million tokens, o1 is around $26 per million tokens, and gpt-4.5 is $93 per million tokens (averaging input and output token pricing).

I'm no startup — simply an AI enthusiast and tinkerer — but all these new premium AI models have me wondering: how can startups afford their AI consumption?

Take Cursor, the AI IDE pioneer. It charges $20 per month for 500 premium model requests. That sounds reasonable until you realize that coding with AI is very context heavy. Every request is jam packed with multiple scripts, folders, and logs, easily filling Claude’s 200k context window. A single long (20 request) conversation with Claude 3.7 in Cline will cost me $20, let alone the additional 480 requests.

To break even, by my calculations, Cursor would have to charge at least 15 to 20 times more per month. I highly doubt it will do that anytime soon. 
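For those who want the napkin math, here is roughly how I get to that multiple. The prices and request sizes below are assumptions (list prices of about $3 and $15 per million input and output tokens for Claude 3.7 Sonnet, a full 200k-token context, and no prompt caching), so treat it as an order-of-magnitude estimate.

```python
# Napkin math behind the break-even claim. Prices and request sizes are
# assumptions for illustration; real usage varies and prompt caching helps.
INPUT_PRICE_PER_M = 3.00     # assumed $/M input tokens (Claude 3.7 Sonnet list price)
OUTPUT_PRICE_PER_M = 15.00   # assumed $/M output tokens
input_tokens = 200_000       # a context-heavy request filling the 200k window
output_tokens = 1_000        # a modest completion

cost_per_request = (input_tokens / 1e6) * INPUT_PRICE_PER_M \
                 + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M
revenue_per_request = 20 / 500   # $20/month plan spread over 500 premium requests

print(f"Cost per heavy request: ${cost_per_request:.3f}")        # ~$0.615
print(f"Revenue per request:    ${revenue_per_request:.3f}")     # $0.040
print(f"Cost/revenue ratio:     {cost_per_request / revenue_per_request:.0f}x")  # ~15x
```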

The AI industry continues to be in its subsidized growth phase. Claude 3.7 is free on Github Copilot. Other AI IDEs like Windsurf and Pear AI are $15 per month. The name of the game is growth at any cost. Like Uber and Airbnb during the sharing economy or Facebook and Snapchat during Web 2.0, the AI era is no different. 

Or is it?

It all comes down to who is subsidizing and how that subsidy is being accounted for. 

During previous eras, VCs were the main culprits, funding companies that spent millions acquiring customers through artificially low prices. Much of that applies today; Anysphere (which develops Cursor) has raised at least $165 million. Besides salaries, it could be theorized that most of that money is going to the cloud, given AI’s unique computational demands. Big Tech has much more power this time around and is funding these startups and labs with billions of dollars in cloud credits.

OpenAI sold 49% of its shares to Microsoft in exchange for cloud credits. Credits that OpenAI ultimately spent on Azure. Anthropic and Amazon have a similar story; however, Amazon invested $8 billion in Anthropic instead of giving credits. But, as a condition of the deal, Anthropic agreed to use AWS as its primary cloud provider so that money is destined to return to Amazon eventually.

Take my $257 AWS bill from last week — technically, I haven’t been charged because I’m using credits. However, this allows Amazon, Microsoft, and other cloud providers to forecast stronger future cloud revenue numbers to shareholders, in part on the bet of continued growth by AI startups. (Credits given to startups expire, so it’s use ’em or lose ’em before they inevitably convert to paid usage.)

Since 2022, the top three cloud providers, AWS, Azure, and Google, have grown their cloud revenue by 20%, 31%, and 33% each year, respectively. That rapid growth needs to continue to justify their share prices — and it’s no secret they are using AI to sustain that momentum. 

The real question is when will it end? The global demand for compute is set to skyrocket, so perhaps never. Or maybe distilling large closed-source models into smaller, local models will pull people from the cloud. Or Jevons paradox holds true and even more demand is unlocked.

Only time will tell. Stay tuned!

P.S. If you have any questions or just want to talk about AI, email me! thomas@ascend.vc

Tags Token Talk, Cloud

Image source

Token Talk 8: The Robot Revolution Has Nowhere Left to Hide

February 26, 2025

By: Thomas Stahura

Escaping a rogue self-driving Tesla is simple: climb a flight of stairs.

While a Model Y can’t climb stairs, Tesla’s new humanoid surely can. If Elon Musk and the Tesla bulls have their way, humanoids could outnumber humans by 2040. That means there’s quite literally nowhere left to hide — the robot revolution is upon us. 

Of course, Musk isn’t alone in building humanoids. Boston Dynamics has spent decades stunning the internet with robot acrobatics and dancing. For $74,500, you can own Spot, its robot dog. Agility Robotics in Oregon and Sanctuary AI in British Columbia are designing humanoids for industrial labor, not the home. China’s Unitree Robotics is selling a $16,000 humanoid today.

These machines may feel like a sudden leap into the future, but the idea of humanoid robots has been with us for centuries. Long before LLMs and other abstract technologies, robots were ingrained in culture, mythology, and our collective engineering dreams.

Around 1200 BCE, the ancient Greeks told stories of Talos, a towering bronze guardian patrolling Crete. During the Renaissance, Leonardo da Vinci sketched his mechanical knight. The word “robot” itself arrived in 1920 with Karel Čapek’s play R.U.R. (Rossum’s Universal Robots). By 1962, The Jetsons brought Rosie the Robot into American homes. And in 1973, Japan’s Waseda University introduced WABOT-1, the first full-scale — if clunky — humanoid robot.

Before the advent of LLMs, the vision was to create machines that mirror the form and function of a human being. Now it seems the consensus is to build a body for these models. Or rather, to build models for these bodies.

They’re calling it a vision-language-action (VLA) model, and it’s a new architecture purpose-built for general robot control. Currently, two types of model architectures dominate the market: transformer and diffusion. Transformer models are used to process and predict sequential data (think text generation), while diffusion models generate continuous data through an iterative denoising process (think image generation).

VLA models (like π0) combine elements from both approaches to address the challenges of robotic control in the real world. These hybrid architectures enable robots to translate visual observations (from cameras) and language instructions (the robot’s given task) into precise physical actions, using the sequential reasoning of transformers and the continuous output of diffusion models. Other frontier VLA model startups include Skild (reportedly in talks to raise $500 million at a $4 billion valuation), Hillbot, and Covariant.
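To make the hybrid idea concrete, here is a tiny sketch of the shape of such a model. It is not π0 or any shipping architecture; the feature sizes, the projection layers, and the crude iterative "denoising" loop are all made up for illustration.

```python
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    """Toy VLA-style policy: a transformer fuses image and instruction tokens,
    and a small iterative refinement head produces a continuous action vector."""
    def __init__(self, d_model=128, action_dim=7, denoise_steps=4):
        super().__init__()
        self.action_dim = action_dim
        self.denoise_steps = denoise_steps
        self.vision_proj = nn.Linear(512, d_model)   # pretend 512-d camera patch features
        self.text_proj = nn.Linear(300, d_model)     # pretend 300-d instruction embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.denoise_head = nn.Sequential(
            nn.Linear(d_model + action_dim, d_model), nn.ReLU(),
            nn.Linear(d_model, action_dim),
        )

    def forward(self, image_feats, text_feats):
        # Sequential-reasoning half: fuse the two modalities with a transformer.
        tokens = torch.cat([self.vision_proj(image_feats),
                            self.text_proj(text_feats)], dim=1)
        context = self.backbone(tokens).mean(dim=1)
        # Continuous-output half: start from noise and iteratively refine the action.
        action = torch.randn(image_feats.size(0), self.action_dim)
        for _ in range(self.denoise_steps):
            action = action + self.denoise_head(torch.cat([context, action], dim=-1))
        return action

policy = TinyVLA()
action = policy(torch.randn(1, 16, 512), torch.randn(1, 8, 300))
print(action.shape)  # torch.Size([1, 7]), e.g. joint targets plus a gripper command
```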

A new architecture means a new training paradigm. Lucky Robots (Ascend portfolio company) is pioneering synthetic data generation for VLA models by having robots learn in a physics simulation, letting developers play with these models without needing a real robot. Nvidia is cooking up something similar with its Omniverse platform.

Some believe that more data and better models will lead to an inflection point in robotics, similar to what happened with large language models. However, unlike text and images, physical robotics data cannot be scraped from the web and must either be collected by an actual robot, or synthesized in a simulation. Regardless of how the model is trained, a real robot is needed to act upon the world.

At the very least, it’s far from a solved problem. Since a robot can have any permutation of cameras, joints, and motors, making a single unified model that can inhabit every robot is extremely challenging. Figure AI (valued at $2.6 billion, with OpenAI among its investors) recently dropped OpenAI’s models in favor of in-house models. It’s not alone. So many VLA models are being uploaded to Hugging Face that the platform had to add a new model category just to keep up.

The step from concept to reality has been a long one for humanoid robots, but the pace of progress suggests we're just getting started. 

P.S. If you have any questions or just want to talk about AI, email me! thomas@ascend.vc

Tags Token Talk, VLA

Token Talk 7: AI's walls, moats, and bottlenecks

February 18, 2025

By: Thomas Stahura

Is Grok SOTA?

If that phrase comes across as gibberish, allow me to explain.

On Monday, xAI (Elon’s AI company) launched Grok 3, claiming state-of-the-art (SOTA) in terms of performance. SOTA has become a sort of catch-all term for crowning AI models. Grok’s benchmarks are impressive, scoring a 93, 85, and 79 on AIME (math), GPQA (science), and LCB (coding). These marks outperform the likes of o3-mini-high, o1, DeepSeek R1, sonnet-3.5, and gemini 2.0 flash. Essentially, Grok 3 outperforms every model except for the yet-to-be released o3. An impressive feat for a 17-month-old company!

I could mention that Grok used 100k+ GPUs during training, or that it built an entire data center in a matter of months. But much has been documented there. So given all that's happened this year with open source, distillation, and a number of tiny companies achieving SOTA performance, it’s much more useful to discuss walls, moats, and bottlenecks in the AI industry.

Walls

The question about a “Wall” in AI is really a question about where, when, or if AI researchers will reach a point where model improvements stall. Some say we will run out of viable high-quality data and hit the “data wall”. Others claim more compute during training will cause models to reach a “training wall”. Regardless of this panic, AI has yet to hit the brakes on improvement. Synthetic data (reinforcement learning) seems to be working, and more compute, demonstrated by grok 3, continues to lead to better performance. 

So where is this “Wall”?

Image source.

The scaling laws in AI suggest that while there isn't a hard "wall" per se, there is a fundamental relationship between compute, model size, and performance that follows a power law distribution. This relationship, often expressed as L ∝ C^(-α) where L is the loss (lower is better) and C is compute, shows that achieving each incremental improvement requires exponentially more resources. For instance, if we want to reduce the loss by half, we might need to increase compute by a factor of 10 or more, depending on where we are on the scaling curve. This doesn't mean we hit an absolute wall, but rather face increasingly diminishing returns that create economic and practical limitations — essentially there exists a "soft wall" where the cost-benefit ratio becomes prohibitively expensive. So how then have multiple small AI labs reached SOTA so quickly?
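I’ll get to that in a second. First, here is what that power law implies in concrete numbers; the exponent below is an assumption picked for illustration, not a measured value.

```python
# Toy illustration of the "soft wall": under L ∝ C^(-alpha), the compute
# multiplier needed to cut loss in half is 2 ** (1 / alpha).
alpha = 0.30  # assumed exponent for illustration only
print(f"To halve loss with alpha={alpha}: ~{2 ** (1 / alpha):.0f}x more compute")  # ~10x

# Flatter curves (smaller exponents) make the same improvement far more expensive.
for a in (0.30, 0.15, 0.075):
    print(f"alpha={a}: ~{2 ** (1 / a):,.0f}x compute to halve loss")
```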

Moats

When OpenAI debuted ChatGPT in November 2022, the consensus was it would take years for competitors to develop their own models and catch up. Ten months later Mistral, a previously unknown AI lab out of France, launched Mistral 7b, a first-of-its-kind open-source small language model. Turns out that training a model, while still extremely expensive, costs less than a single Boeing 747 plane. 

The power law relationship can also help us understand how smaller AI firms catch up so quickly. The lower you are on the curve, the steeper the improvements are for each unit of compute invested, allowing smaller players to achieve significant gains with relatively modest resources. This "low-hanging fruit" phenomenon means that while industry leaders might need to spend billions to achieve marginal improvements at the frontier, newer entrants can leverage existing research, open-source implementations, and more efficient architectures to rapidly climb the steeper part of the curve. (At Ascend, we define this as AI’s “fast followers”.) 

Costs have only gone down since 2022, thanks to new techniques like model distillation and synthetic data generation. Techniques that DeepSeek used to build R1 for a reported $6 million. 

The perceived "moat" of computational resources isn't as defensible as initially thought. It seems the application layer is the most defensible part of the AI stack. But what is holding up mass adoption?

Bottlenecks

Agents, as I mentioned last week, are the main AI application. And agents, in their ultimate form, are autonomous systems tasked with accomplishing a goal in the digital environment. These systems need to be consistently reliable if they are to be of value. Agent reliability is mainly affected by two things: prompting and pointing.

Since an agent stays in a reasoning loop until its given goal is achieved, the prompt used to set up and maintain that loop is crucial. The loop prompt runs on every step and should reintroduce the task, tools, feedback, and response schema to the LLM. Ultimately, these AI systems are probabilistic, so the loop prompt should be worded to maximize the probability of a correct response. Much easier said than done.
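Here is one way such a loop prompt might be templated. This is an illustrative sketch, not any particular product’s prompt; the schema and tool names are invented.

```python
# One possible shape for a loop prompt: every step restates the goal, the tools,
# the latest feedback, and the exact response schema expected from the LLM.
LOOP_PROMPT = """You are an autonomous agent working toward this goal:
{goal}

Available tools (call exactly one per step):
{tools}

Feedback from your previous action:
{feedback}

Respond with JSON only, matching this schema:
{{"thought": "<your reasoning>", "tool": "<tool name>", "args": {{...}}}}
"""

def build_step_prompt(goal, tools, feedback):
    return LOOP_PROMPT.format(
        goal=goal,
        tools="\n".join(f"- {t}" for t in tools),
        feedback=feedback or "None yet (first step).",
    )

print(build_step_prompt("Book a flight from San Francisco to Seattle",
                        ["open_browser", "click(x, y)", "type(text)", "scroll(dy)"],
                        None))
```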

Vision is another bottleneck. For example, if an agent decides it needs to open the Firefox browser to get online, it first needs to move the mouse to the Firefox icon, which means it needs to see and understand the user interface (UI). 

Thankfully, we have vision language models (VLMs) for this! The thing is, these VLMs, while they can caption an image, do not understand the precise icon location well enough to provide pixel perfect x and y coordinates. At least not yet to any reliable degree. 

To prove this point, I conducted a VLM pointing competition wherein I had gpt-4o, sonnet-3.5, moondream 2, llama 3.3 70b, and molmo 7b (running on replicate) point at various icons on my Linux server. 

(Screenshots: “Point to the date” pointing-test trials 1, 2, and 3.)

Our perception of icons and logos is second nature to us humans, especially those of us who grew up in the information age. It boggles the mind that these models, which are now as smart as a graduate student, can’t do this simple task ten times in a row. In my opinion, agents will be viable only when they can do hundreds or even thousands of correct clicks. So maybe in a few months… Or you can tune in next week for Token Talk 8!

P.S. If you have any questions or just want to talk about AI, email me! thomas@ascend.vc

Tags Token Talk, VLMs

Matthew McConaughey stars in Salesforce’s Super Bowl commercial promoting Agentforce.

Token Talk 6: Everyone's got something to say about agents

February 11, 2025

By: Thomas Stahura

AGENTS! AGENTS!! AGENTS!!! 

Big tech can’t get enough of them! Google’s got Mariner. Microsoft’s got Copilot. Salesforce rolled out Agentforce. OpenAI’s cooking up Operator. And Anthropic has Computer Use. (Naming is hard.)

You’ve heard the hype. Maybe you’re already sick of it. They even got Matthew McConaughey to say it during the Super Bowl — America's most sacred Sunday ritual.

But have you actually used one? Probably not. And funny enough, most of the “agents” I just listed aren’t even real agents.

So what is an agent, anyway?

An agent, put simply, is a Large Language Model (LLM) in a reasoning loop that has access to tools (like a browser, code interpreter, or calculator). The LLM is prompted to break down tasks into steps and to use tools to autonomously accomplish its given goal. The tools then provide feedback from the digital environment and the LLM continues to its next step until the task is complete.

A browser agent is given a task: “Book a flight from San Francisco to Seattle.” First, it runs an “open browser” command, and the browser confirms: “Browser is open,” with a screenshot. Next, it types “San Francisco to Seattle flights” into the search bar, hits enter, and waits for results. It scans the listings, picks a booking site, clicks through, and follows the prompts, step by step. Each action generates feedback to keep it on track until the task is complete.

Most agents have a litany of specific tools, but all you really need is to move the mouse, click, type, and scroll. After all, that's all humans need to use a computer.
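Strip the loop down and it looks something like this. The helpers are hypothetical (nothing here talks to a real LLM or browser); the point is the control flow.

```python
# A stripped-down agent loop: ask the model for an action, run it, feed the
# result back in, repeat until the model says it's done or we hit a step limit.
def run_agent(goal, ask_llm, tools, max_steps=20):
    feedback = "None yet (first step)."
    for _ in range(max_steps):
        decision = ask_llm(
            f"Goal: {goal}\nTools: {sorted(tools)}\n"
            f"Last result: {feedback}\nReply as: tool|argument"
        )
        tool, _, arg = decision.partition("|")
        if tool == "done":
            return arg                    # the agent decides the task is finished
        feedback = tools[tool](arg)       # execute the tool, feed the result back in
    return "Stopped after max_steps without finishing."

# Example wiring with stand-in tools (a real agent would drive a browser or the OS):
demo_tools = {"search": lambda q: f"Results for {q!r}...",
              "click":  lambda target: f"Clicked {target}."}
```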

So what, then, makes me say that most agents out there aren’t actually agents? For starters, Mariner is on a waitlist, Copilot doesn’t have access to any tools, and Agentforce only has access to Salesforce-specific tools. OpenAI’s Operator and Anthropic’s Computer Use are what I’d call actual agents. But Operator is $200/month and Computer Use is in beta.

Open source is not far behind. Browser-use (YC W25) exploded onto the scene about a month ago and already has 27k GitHub stars. I’ve used browser-use for my AI bias hackathon project; it works with any LLM in only about 15 lines of code. Totally free.
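For reference, the quick start looks roughly like this, as I remember it from the project’s README; double-check the repo before copying, since the API may have changed.

```python
# Roughly the shape of browser-use's quick start (from memory; verify against
# the repo's README before relying on it).
import asyncio
from langchain_openai import ChatOpenAI
from browser_use import Agent

async def main():
    agent = Agent(
        task="Book a flight from San Francisco to Seattle",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()

asyncio.run(main())
```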

Autogen, a Microsoft agent framework, is also open source with 39k stars. Along with Skyvern (12k stars YC S23) and Stagehand (7.5k stars). And these are just browser agents! There are also coding agents that live within an integrated development environment (IDE) like the closed-source Replit, GitHub Copilot, and Cursor, and the open-source Cline (28k stars), Continue.dev (23k stars), and Void (10k stars/YC S24). 

Agents, at the end of the day, are about autonomous control. Whether it's a browser or a calculator, the more tools, control, and thus access you give an LLM, the more it can do on your behalf. In that respect, not all agents are created equal.

When I use my computer, I don't just use the browser or IDE. Sure, I spend a bunch of time online (who doesn't?), and coding (so much), but I control my computer on the OS level. I’m able to jump between different applications and navigate my file system with my keyboard and mouse, so shouldn't my agent, too?

Many thought an OS-level agent was impossible a few months ago. Now it seems inevitable. Imagine a future where we interact with our devices in the same way Tony Stark interacts with Jarvis in Iron Man (2008). This is an entirely new human-computer interaction paradigm that is set to completely change the industry.

Big tech knows this. Apple has enabled developers to write custom tools for Apple Intelligence to interact with. And MS Copilot Recall automatically records your screen to automate tasks (that is, before it was recalled over privacy issues).

In the open community, Open Interpreter (58k stars) is an OS-level agent that can write and execute commands in the command line. It has limitations (no vision capabilities) but is impressive and the first of its kind. Other models such as OS-Atlas and UI-TARS exist but are not nearly as popular as browser or IDE agents. (We invested in Moondream, a startup building vision “pointing” capabilities for agent developers.)

The OS agent wars are existential for big tech. Any agent that exists within Windows or MacOS will get hamstrung by permissions requirements enshittifying the experience of alternatives while Microsoft and Apple keep their control over the industry. If these companies own and control the software that controls your computer, is it really your computer? I think not.

Regardless, agents still have a long way to go. Reliability remains a large issue along with handling authentication (to email, social media, and other sites). These, however, are solvable problems. Meta has already set up GAIA, a general AI assistant benchmark, that if solved “would represent a milestone in AI research.” And Okta, owners of Auth0, invested in Browserbase to help the agent company manage web authentication. 

It's only a matter of time at this point.

P.S. If you have any questions or just want to talk about AI, email me! thomas@ascend.vc

Tags Token Talk

Token Talk 5: Big Models Teach, Small Models Catch Up.

February 5, 2025

By: Thomas Stahura

O3-mini is amazing and totally free. OpenAI achieved this through distillation from the yet-to-be-released larger o3 model.

Right now, the model ranks second globally — beating DeepSeek R1 but trailing the massive o1. Estimates put o1 at 200-300 billion parameters, DeepSeek at 671 billion, and o3-mini at just 3-30 billion. (The only reasoning models to top the benchmarks this week.)

What’s remarkable is that o3-mini achieves intelligence close to o1 while being just one-hundredth its size, thanks to distillation.

There are a variety of distillation techniques; but, at a high level, distillation involves using a larger teacher model to teach a smaller student model.

For example, GPT-4 (1.4 trillion parameter model) was trained on a million GBs of public internet data (one petabyte). GPT-4 was trained to represent that data, to represent the internet.

The resulting 1.4 trillion parameter model, if downloaded, would occupy 5,600 GB, or 5.6 terabytes, of space (at four bytes per parameter). In a sense, you can think of GPT-4 (or any LLM) as a highly compressed, queryable representation of the training set, in this case the internet. After all, going from 1 petabyte to 5.6 terabytes is a 99.44% reduction.

So, how does this apply to distillation? If you think of a model as a compressed version of its training dataset, then you can “uncompress” that dataset by querying the larger teacher model, in this case GPT-4, until you have generated something on the order of a petabyte of synthetic data. You then use that synthetic dataset to train or fine-tune a smaller student (3-10 billion parameter) model to mimic the larger teacher model in performance.
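Here is the recipe in miniature, with small open models standing in for the real thing (gpt2-large as “teacher,” distilgpt2 as “student”); the labs’ actual pipelines are far larger and not public.

```python
# Toy teacher -> student distillation: sample synthetic text from the teacher,
# then fine-tune the student on it with a standard language-modeling loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2-large")          # stand-in "teacher" tokenizer
teacher = AutoModelForCausalLM.from_pretrained("gpt2-large")
student = AutoModelForCausalLM.from_pretrained("distilgpt2")  # stand-in "student"

# 1) "Uncompress" the teacher: sample synthetic text from it.
prompts = ["The theory of relativity says", "To sort a list in Python,"]
synthetic = []
for p in prompts:
    ids = tok(p, return_tensors="pt").input_ids
    out = teacher.generate(ids, max_new_tokens=64, do_sample=True, top_p=0.9)
    synthetic.append(tok.decode(out[0], skip_special_tokens=True))

# 2) Fine-tune the student on the synthetic corpus (one illustrative pass).
opt = torch.optim.AdamW(student.parameters(), lr=5e-5)
for text in synthetic:
    batch = tok(text, return_tensors="pt")
    loss = student(**batch, labels=batch.input_ids).loss   # LM loss on teacher outputs
    loss.backward()
    opt.step()
    opt.zero_grad()
```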

This remains an active area of research today.

Of course, distilling from a closed-source model is strictly against OpenAI’s terms of service. Though, that didn’t stop DeepSeek, which is currently being probed by Microsoft over synthetic data training allegations.

The cat’s out of the bag. OpenAI itself distilled o3-mini from o3, and Microsoft distilled phi-3.5-mini-instruct from phi-3.5. It seems like, from now on, whatever model performs best will become the “teacher” for all the “student” models, which will be fine-tuned to quickly catch up to it in performance. This new paradigm has shifted the AI industry’s focus from LLMs to AI applications, the main one being agents.

OpenAI (in addition to launching o3-mini) debuted a new web agent called deep research (only available at the $200 / month tier). I’ve used many web agents and browser tools like Browserbase, browser-use, and Computer Use. I have buddies who are building CopyCat (YC W25), and I’ve even built my own browser agent. All this to say, the AI application space is heating up!

Stay tuned because I’ll talk more about agents next week!

P.S. If you have any questions or just want to talk about AI, email me: thomas @ ascend dot vc

Tags Token Talk

Token Talk 4: Open source won the AI race

February 5, 2025

By: Thomas Stahura

If it wasn’t clear already, open source won the AI race. 

To recap: Deepseek R1 is an open-source reasoning model that was quietly launched during the 14 hours TikTok was banned. The reasoning version of Deepseek V3, R1 performs at o1 levels on most benchmarks. It’s very impressive and was reportedly trained for just $6 million, though many are skeptical of those numbers.

By Monday, a week after R1 launched, the model caused a massive market selloff. Nvidia lost $500 billion in value (-17%), the biggest single-day loss of market value by one company in US history, as the market adjusted to our new open-source reality.

 So, what does this mean? 

For starters, models have been commoditized. Well-performing open-source models at every scale are available. But that’s beside the point. Deepseek is reportedly trained on synthetic data generated by ChatGPT, essentially extracting the behavior of a closed model and open-sourcing it. This eliminates the moats of OpenAI, Anthropic, and the other closed-source AI labs.

What perplexes me is why Nvidia got hit the hardest. The takes I’ve heard suggest it’s Deepseek’s lower training costs that spooked the market. The thinking goes: LLMs become cheaper to train, so hyperscalers need fewer GPUs.

The bulls, on the other hand, cite Jevons paradox, wherein the cheaper a valuable commodity becomes, the more it gets used.

 I seem to be somewhere in the middle. Lower costs are great for developers! But I have yet to see a useful token-heavy application. Well maybe web agents… I’ll cover those in another edition!

I suspect the simple fact that the model came out of China is what caused it to blow up. After all, there seems to be such moral panic over the implications for US AI sovereignty. And for good reason.

Over the weekend, I attended a hackathon hosted by Menlo where I built a browser agent. I had different LLMs take the Pew Research Center political typology quiz.

Anthropic’s claude-sonnet-3.5, gpt-4o, o1, and llama got Outsider Left. Deepseek R1 and V3 got Establishment Liberals. Notably, R1 answered, “It would be acceptable if another country became as militarily powerful as the U.S.”

During my testing, I found that Deepseek’s models would refuse to answer questions about Taiwan or Tiananmen Square. In all fairness, most American models won’t answer questions about Palestine. Still, as these models are open and widely used by developers, there is fear that these biases will leak into AI products and services.

I’d like to think that this problem is solvable with fine-tuning. I suppose developers are playing with Deepseek’s weights as we speak! We’ll just have to find out in the next few weeks…

Tags Token Talk

Token Talk 3: Decentralizing AI Compute for Scalable Intelligence

February 5, 2025

By: Thomas Stahura

Compute is king in the age of AI. At least, that's what big tech wants you to believe. The truth is a little more complicated.

When you boil it down, AI inference is simply a very large set of multiplications. All computers do this kind of math all the time, so why can’t any computer run an LLM or diffusion model?

It’s all about scale. Model scale is the number of parameters (tunable neurons) in a model. Thanks to platforms like Hugging Face, developers now have access to well-performing open-source models at every scale: small models like moondream2 (1.93b) and llama 3.2 (3b), midrange ones like phi-4 (14b), and the largest models like bloom (176b). These models can run on anything from a Raspberry Pi to an A100 GPU server.

Sure, the smaller models take a performance hit, but only by 10-20% on most benchmarks. I got llama 3.2 (1b) to flawlessly generate and run a snake game in Python. So why, then, do most developers rely on big tech to generate their tokens? The short answer is speed and performance.

Models at the largest scale (100b+, like gpt-4o and the like) perform best and cost the most. That will probably be true for a long time, but maybe not forever. In my opinion, it would be good if everyone could contribute their compute to collectively run models at the largest scale.

I am by no means the first person to have this idea.

Folding@home launched in October 2000 as a first-of-its-kind distributed computing project aimed at simulating protein folding. The project reached its peak in 2020 during the pandemic, achieving 2.43 exaflops of compute by April of that year. That made it the first exaflop computing system ever.

This also exists in the generative AI community. Petals, a project made by BigScience (the same team behind bloom 176b), enables developers to run and fine-tune large models in a distributed fashion. (Check out the live network here.) Nous Research has its DisTrO system (distributed training over the internet). (Check its status here.) And there are plenty of others, like hivemind and exo.
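For a sense of what joining a swarm looks like as a client, here is roughly the Petals quick start as I recall it; the exact class name and which models the public swarm serves may have changed, so treat this as a sketch and check the project’s docs.

```python
# Rough shape of a Petals client: the model's layers run on volunteers' GPUs,
# but the code looks almost like ordinary Hugging Face usage.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "bigscience/bloom"   # placeholder: use whatever the public swarm currently serves
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tok("Distributed inference means", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=20)
print(tok.decode(outputs[0]))
```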

While there are many examples of distributed compute systems, none has taken off, largely because it’s too difficult to join the network.

I’ve done some experimenting, and I think a solution to this could be using the browser to join the network and running inference using webllm in pure javascript. I will write more about my findings, so stay tuned.

If you are interested in this topic, email me! Thomas @ ascend dot vc

Tags Token Talk

OpenAI’s o3 model performs well on benchmarks. But it’s still unclear how it all works.

Token Talk 2: The Rise in Test Time Compute and Its Hidden Costs

February 5, 2025

By: Thomas Stahura

Reasoning models are branded as the next evolution of large language models (LLMs). And for good reason.

These models, like OpenAI’s o3 and High-Flyer’s DeepSeek, rely on test-time compute. Essentially, they think before speaking by writing their train of thought before producing a final answer. (This type of LLM is called a “reasoning model.”)

Reasoning models are showing terrific benchmark improvements! AI researchers (and the public at large) demand better-performing models, and there are five levers for getting them: data, training, scale, architecture, and inference. At this point, almost all public internet data is exhausted, models are trained at every size and scale, and transformers have dominated most architectures since 2017. This leaves inference, which, for the time being, seems to be improving AI test scores.

OpenAI’s o3 nails an 87% on GPQA-D and achieves 75.5% on the ARC Prize (at a $10,000 compute limit). However, the true costs remain (as of Jan 2025) a topic of much discussion and speculation. Discussion on OpenAI’s Dev Forum suggests roughly $60 per query for o3-mini and $600 for o3. Seems fair; however, whatever the costs are at the moment, OpenAI’s research will likely be revealed eventually, fueling competition and lowering costs for all.

One question still lingers: How exactly did OpenAI make o3?

There exists no dataset on the internet of questions, logically sound steps, and correct answers. (Ok, maybe Chegg, but they might be going out of business.) Anyways, much of the data is theorized to be synthetic.

Image credit

StaR (Self-Taught Reasoner) is the subject of a research paper that suggests a technique to turn a regular LLM into a reasoning model. The paper calls for using an LLM to generate a dataset of rationales, then using that dataset to fine-tune the same LLM into a reasoning model. StaR relies on a simple loop to build the dataset: generate rationales to answer many questions; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; and repeat.
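In pseudocode, the loop looks something like this; the helper functions are hypothetical placeholders, and the paper’s actual prompting and filtering details are more involved.

```python
# Sketch of one StaR iteration. `generate` and `fine_tune` are hypothetical
# helpers standing in for the paper's prompting and training machinery.
def star_iteration(model, dataset, generate, fine_tune):
    keep = []
    for question, correct_answer in dataset:
        rationale, answer = generate(model, question)            # try to reason to an answer
        if answer != correct_answer:
            # "Rationalization": retry with the correct answer given as a hint.
            rationale, answer = generate(model, question, hint=correct_answer)
        if answer == correct_answer:
            keep.append((question, rationale, correct_answer))   # only keep what worked
    return fine_tune(model, keep)                                # train on the good rationales

# Repeat: each pass yields a slightly better reasoner, which yields better data.
# for _ in range(num_rounds):
#     model = star_iteration(model, dataset, generate, fine_tune)
```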

It’s now 2025, and the AI world moves FAST. Many in the research community believe the future lies in models that can think outside of language. This is cutting-edge research as of today.

I plan to cover more as these papers progress, so stay tuned!

Tags Test Time Compute

Token Talk 1: DeepSeek and the ways to evaluate new models

January 8, 2025

By: Thomas Stahura

DeepSeek V3 debuted to a lot of hubbub.

The open-weight large language model (LLM) developed by Chinese quantitative trading firm High-Flyer Capital Management outperformed benchmarks set by leading American companies like OpenAI, all while operating on a reported budget of just $6 million. (I anticipate Meta’s next Llama release to surpass DeepSeek as the top-performing open-source LLM.)

Here’s how DeepSeek performed on leading benchmarks: 76% on MMLU, 56% on GPQA-D, and 85% on MATH 500.

As more and more AI competition hits the internet, the question of how we evaluate these models becomes all the more pressing. Although various benchmarks exist, for simplicity, let’s focus on the three mentioned above: MMLU, GPQA-D, and MATH 500.

MMLU 

MMLU, which stands for Massive Multitask Language Understanding, is essentially a large-scale, ACT-style multiple-choice exam. It spans 57 subjects, ranging from abstract algebra to world religions, testing a model’s ability to handle diverse and complex topics.

Question: Compute the product (12)(16) in Z_24.

Choices: 

A) 0
B) 1
C) 4
D) 6

Answer: A) 0

Question: In his final work, Laws, Plato shifted from cosmology to which of the following issues?

Choices: 

A) Epistemology
B) Morality
C) Religion
D) Aesthetics

Answer: B) Morality

An AI is prompted to select the correct option given a question and a list of choices. If the model’s answer matches the correct choice, it gets a point for that question. Otherwise, no points. The final score is typically calculated as the equal-weight average across all 57 subjects.
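For the curious, here is what that scoring looks like in practice, in simplified form; real evaluation harnesses handle prompting and answer extraction more carefully.

```python
# Simplified MMLU-style scoring: exact match on the chosen letter, then an
# equal-weight average of per-subject accuracies.
from collections import defaultdict

def mmlu_score(results):
    """results: list of (subject, predicted_choice, correct_choice)."""
    per_subject = defaultdict(lambda: [0, 0])          # subject -> [correct, total]
    for subject, pred, gold in results:
        per_subject[subject][0] += int(pred == gold)
        per_subject[subject][1] += 1
    accs = [correct / total for correct, total in per_subject.values()]
    return sum(accs) / len(accs)

print(mmlu_score([("abstract_algebra", "A", "A"),
                  ("philosophy", "B", "B"),
                  ("philosophy", "C", "D")]))  # (1.0 + 0.5) / 2 = 0.75
```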

GPQA-D

GPQA-D is a little more complicated. It’s designed to be a Google-proof dataset of 448 multiple-choice questions written by “domain experts,” wherein “highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web.”

Question: Identify the correct sequence of reagents for the synthesis of [1,1'-bi(cyclopentylidene)]-2-one starting from 1,5-dichloropentane.

Answer: 

1. Zn, ether 

2. Cl2/hv 

3. Aq. KOH 

4. Pyridine + CrO3 + HCl 

5. Aq. NaOH

Question: While solving higher dimensional heat equations subject to suitable initial and boundary conditions through higher order finite difference approximations and parallel splitting, the matrix exponential function is approximated by a fractional approximation. The key factor of converting a sequential algorithm into a parallel algorithm is…

Answer: …linear partial fraction of fractional approximation.

A grade is calculated using string similarity (for free-form text), exact match as in MMLU (for multiple choice), or manual validation (where human validators mark answers correct or incorrect).

MATH 500

MATH 500 is self-explanatory as it is a dataset of 500 math questions:

Question: Simplify (−k + 4) + (−2 + 3k).

Answer: 2k+2

Question: The polynomial x^3 − 3x^2 + 4x − 1 is a factor of x^9 + px^6 + qx^3 + r. Find the ordered triple (p, q, r).

Answer: (6,31,-1)


Now I feel we can fully appreciate DeepSeek. Its scores are impressive, but OpenAI’s o1 is close. It scores in the nineties on MMLU, 67% on MATH 500, and 67% on GPQA-D. This is considered “grad-level” reasoning. OpenAI’s next release, o3, reportedly achieves 87.7% on GPQA-D. That would put it in the PhD range…

For further reading, check out these benchmark datasets from Hugging Face. Maybe try to solve a few!

Chinese start-up DeepSeek threatens American AI dominance

cais/mmlu · Datasets at Hugging Face 🤗

Idavidrein/gpqa · Datasets at Hugging Face 🤗

HuggingFaceH4/MATH-500 · Datasets at Hugging Face 🤗

Learning to Reason with LLMs | OpenAI

AI Model & API Providers Analysis | Artificial Analysis

Tags Token Talk
