Hot takes from the most wired-in analyst in AI. Hits your inbox every Wednesday at 11:30AM.
Token Talk: Open source won the AI race
By: Thomas Stahura
If it wasn’t clear already, open source won the AI race.
To recap: Deepseek R1 is an open-source reasoning model that was quietly launched during the 14 hours TikTok was banned. A reasoning version of Deepseek V3, R1 performs at o1 levels on most benchmarks. Very impressive, and it was reportedly trained for just $6 million, though many are skeptical of that figure.
By Monday, a week after R1 launched, the model had caused a massive market selloff. Nvidia lost $500 billion in value (-17%), the biggest single-day loss of market value in US history, as the market adjusted to our new open-source reality.
So, what does this mean?
For starters, models have been commoditized. Well-performing open-source models are available at every scale. But that's beside the point. Deepseek was trained on synthetic data generated by ChatGPT, effectively distilling a closed model's capabilities and open sourcing them. This erodes the moats of OpenAI, Anthropic, and the other closed-source AI labs.
What perplexes me is why Nvidia got hit the hardest. The takes I've heard suggest Deepseek's low training cost is what spooked the market. The thinking goes: if LLMs become cheaper to train, hyperscalers need fewer GPUs.
The bulls, on the other hand, cite Jevons paradox: the cheaper a valuable commodity becomes, the more of it gets used.
I seem to be somewhere in the middle. Lower costs are great for developers! But I have yet to see a useful token-heavy application. Well maybe web agents… I’ll cover those in another edition!
I suspect the simple fact that the model came out of China is what caused it to blow up. After all, there seems to be genuine moral panic over the implications for US AI sovereignty. And for good reason.
Over the weekend, I attended a hackathon hosted by Menlo, where I built a browser agent that had different LLMs take the Pew Research Center political typology quiz.
Anthropic’s claude-sonnet-3.5, gpt-4o, o1, and llama all landed on Outsider Left. Deepseek R1 and V3 landed on Establishment Liberals. Notably, R1 answered, “It would be acceptable if another country became as militarily powerful as the U.S.”
During my testing, I found that Deepseek’s models would refuse to answer questions about Taiwan or Tiananmen Square. In all fairness, most American models won’t answer questions about Palestine. Still, since these models are open and widely used by developers, there is fear that these biases will leak into AI products and services.
I’d like to think that this problem is solvable with fine-tuning. I suppose developers are playing with Deepseek’s weights as we speak! We’ll just have to find out in the next few weeks…
Token Talk: Decentralizing AI Compute for Scalable Intelligence
By: Thomas Stahura
Compute is king in the age of AI. At least, that's what big tech wants you to believe. The truth is a little more complicated.
When you boil it down, AI inference is just a very large number of matrix multiplications. All computers do this kind of math all the time, so why can't any computer run an LLM or diffusion model?
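To make that concrete, here's a toy sketch of what "inference is multiplications" means: a neural-network layer is a matrix multiply followed by a simple nonlinearity, and a model is many such layers stacked. The sizes below are made up for illustration; real LLMs do this at hidden sizes in the thousands, for every layer, for every token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: inference is just repeated matrix multiplies.
x = rng.standard_normal(8)          # input vector (think: a token embedding)
W1 = rng.standard_normal((16, 8))   # first layer's weights (16 "neurons")
W2 = rng.standard_normal((4, 16))   # second layer's weights

h = np.maximum(W1 @ x, 0)  # matrix multiply, then a ReLU nonlinearity
y = W2 @ h                 # matrix multiply again: the network's output

print(y.shape)  # a 4-dimensional output vector
```

Any CPU can do this math; the question is only how fast it can do it at real model sizes.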
It's all about scale. Model scale is the number of parameters (tunable weights) in a model. Thanks to platforms like Hugging Face, developers now have access to well-performing open-source models at every scale: small models like moondream2 (1.93b) and llama 3.2 (3b), midrange ones like phi-4 (14b), and the largest models like bloom (176b). These models can run on anything from a Raspberry Pi to an A100 GPU server.
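Parameter count translates directly into memory, which is what decides the hardware. A back-of-the-envelope sketch, assuming fp16 weights at two bytes per parameter (quantization shrinks this further):

```python
# Rough memory footprint of model weights: parameters x bytes per parameter.
# Parameter counts are the ones quoted above; 2 bytes/param assumes fp16.
models = {
    "moondream2": 1.93e9,
    "llama 3.2": 3e9,
    "phi-4": 14e9,
    "bloom": 176e9,
}

BYTES_PER_PARAM = 2  # fp16; int4 quantization would cut this to ~0.5

for name, params in models.items():
    gb = params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{gb:.1f} GB of weights")
```

On these numbers, the small models fit in a Raspberry Pi-class device's RAM once quantized, while bloom needs hundreds of gigabytes, i.e., a multi-GPU server. That gap is exactly what distributed inference tries to close.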
Sure, the smaller models take a performance hit, but only by 10-20% on most benchmarks. I got llama 3.2 (1b) to flawlessly generate and run a snake game in Python. So why, then, do most developers rely on big tech to generate their tokens? The short answer is speed and performance.
Models at the largest scale (100b+, like gpt-4o) perform best and cost the most. That will probably be true for a long time, but maybe not forever. In my opinion, it would be good if everyone could contribute their compute to collectively run models at the largest scale.
I am by no means the first person to have this idea.
Folding@home, launched in October 2000, was a first-of-its-kind distributed computing project aimed at simulating protein folding. The project peaked in 2020 during the pandemic, reaching 2.43 exaflops of compute by April of that year, making it the first exaflop computing system ever.
This also exists in the generative AI community. Petals, a project by BigScience (the same team behind bloom 176b), lets developers run and fine-tune large models in a distributed fashion. (Check out the live network here.) Nous Research has its DisTrO system (distributed training over the internet). (Check its status here.) And there are plenty of others, like hivemind and exo.
While there are many examples of distributed compute systems, none has taken off, largely because joining the network is too difficult.
I’ve done some experimenting, and I think a solution could be using the browser to join the network and running inference with WebLLM in pure JavaScript. I will write more about my findings, so stay tuned.
If you are interested in this topic, email me! Thomas @ ascend dot vc