
Explaining DeepSeek R1 (and How to Use It)
- By Bruce Nielson
- ML & AI Specialist
If you follow AI at all, you've probably been hearing across nearly every media outlet about the sudden and unexpected rise of a new Large Language Model (LLM) from DeepSeek (based in China, no less!) that's making waves and worrying American companies. So, what's going on, and how can you take advantage of it?
Who is DeepSeek?
DeepSeek is a China-based company that has actually been in the LLM space for a while. Prior to R1, their premier model was DeepSeek-V3-Base. But they were really just one player amongst many until they recently made a big splash with the highly visible release of their R1 model.
What is the DeepSeek R1 Model?
DeepSeek's R1 came close to, or even exceeded, the benchmark scores of the latest and greatest LLMs out there, including OpenAI's o1 and Anthropic's Claude.
“R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks.” (p. 1 of “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”. All quotes from this paper.)
The above graph is from DeepSeek's excellent paper on their R1 model.
Now, I can hear you objecting: “Sure, that’s impressive and all, but so what? It’s barely better than the other models. And how reliable are LLM benchmarks anyway? Do they even mean anything?” And you’re right. Benchmarks don’t mean much, and the improvement isn’t that significant. So why is this considered such a major disruption, sending American companies into a panic?
Why Does R1 Scare Everyone?
DeepSeek claims they trained R1 on a budget of $5-$6 million. There are some claims this is an exaggeration. However, even if the true cost is some multiple of that, it is still far less than the hundreds of millions (or possibly billions) of dollars often spent on training LLMs. Rumor has it that the revelation that you can train LLMs with far less money and compute caused the sudden drop in the price of Nvidia's stock. But this doesn't really make sense, because making LLMs cheaper to use will only increase demand for them, leading to increased demand for GPUs.
On top of the lower training costs, R1 has 671 billion parameters, which is smaller than, say, GPT-4's rumored 1.8 trillion parameters. But its architecture activates only 37 billion of them at a time, which makes it quite computationally efficient.
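To see why only a slice of the parameters is used at once, here's a toy sketch of the mixture-of-experts routing idea behind this kind of architecture. Everything in it (expert count, sizes, the router) is made up for illustration and isn't DeepSeek's actual design:

```python
# Toy mixture-of-experts (MoE) routing sketch. Only the top-k scored experts
# run for a given token, so most parameters sit idle on any single forward
# pass. All sizes and numbers here are illustrative, not DeepSeek's design.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # real MoE models use far more experts
TOP_K = 2         # experts actually executed per token
D_MODEL = 16

experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS)) * 0.1

def moe_forward(token_vec: np.ndarray) -> np.ndarray:
    scores = token_vec @ router                  # the router scores every expert...
    top = np.argsort(scores)[-TOP_K:]            # ...but only the top-k get used
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen experts
    return sum(w * (token_vec @ experts[i]) for w, i in zip(gate, top))

print(moe_forward(rng.standard_normal(D_MODEL)).shape)  # (16,), computed with 2 of 8 experts
```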
How R1 Was Created
The DeepSeek-R1 project teaches AI models to "think better" using a special kind of training where the model learns on its own through trial and error—no textbooks or examples needed (p. 2, 5). Instead of giving it answers, the team set up rules that make the model show its work, like solving math problems step-by-step before giving a final answer (p. 6). Over time, the model (called DeepSeek-R1-Zero) started doing clever things naturally, like double-checking its own answers, trying different approaches, or even having "aha moments" where it suddenly fixes its mistakes after rethinking (p. 8). It’s like the model figured out how to think, not just what to think.
The catch? DeepSeek-R1-Zero's answers can be messy, sometimes mixing languages or missing clear formatting, which makes them hard to read (p. 10). But overall, it shows how letting AI learn by trying (and failing) can unlock surprising problem-solving skills. This led to training a second model, seeded with some "cold start" data, to help the model express its thinking in a way comprehensible to humans. That second model became DeepSeek-R1, which is distinct from the original DeepSeek-R1-Zero model first developed using pure reinforcement learning.
But the end results are impressive: on tough math tests (AIME 2024), it jumped from scoring 15.6% to 71% correct—and even 86.7% when voting on multiple answers, matching top AI models like OpenAI’s (p. 3). It’s also great at coding, beating 96% of humans in programming contests (p. 4).
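That "voting on multiple answers" trick (often called majority voting or self-consistency) is simple enough to sketch. The sample_answer function below is a hypothetical stand-in for a real model call:

```python
# Majority voting (self-consistency) sketch: sample several answers and keep
# the most common one. `sample_answer` is a hypothetical stand-in for a real
# model call; here it is simply right ~70% of the time.
from collections import Counter
import random

def sample_answer(question: str) -> str:
    return "42" if random.random() < 0.7 else str(random.randint(0, 100))

def majority_vote(question: str, n_samples: int = 16) -> str:
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 6 * 7?"))  # almost always "42", even though single samples sometimes miss
```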
Reinforcement Learning and LLM Development
Famously, LLMs use reinforcement learning from human feedback to train the model to act in ways that are helpful (and safe) for humans. But the R1 team instead used regular reinforcement learning to train the model to be excellent at what is known as 'chain-of-thought' reasoning.
I will have to cover chain-of-thought in detail in a future post, but it's a remarkably simple (and effective!) concept. You basically prompt the large language model to think step-by-step before giving an answer. The model then doesn't jump straight to a (possibly wrong) answer but instead writes out some text about the problem and the steps to solve it. This proved to be a powerful way to get LLMs to score higher on reasoning benchmarks, at the expense of the extra tokens used to let the LLM 'think' about the problem.
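To make that concrete, here's a rough sketch of the same question asked two ways; the only difference is telling the model to show its steps. The wording is my own illustration, not an official template:

```python
# Two prompts for the same question: a direct one and a chain-of-thought one.
# The phrasing here is illustrative, not a prescribed template.
def build_prompts(question: str) -> tuple[str, str]:
    direct = f"{question}\nAnswer with just the final result."
    chain_of_thought = (
        f"{question}\n"
        "Think through the problem step by step, showing your work, "
        "then give the final answer on its own line."
    )
    return direct, chain_of_thought

direct, cot = build_prompts(
    "A train leaves at 2:15 pm and arrives at 5:40 pm. How long is the trip?"
)
print(cot)  # the CoT version costs more output tokens but tends to be more reliable
```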
Chain-of-thought generally improves reasoning benchmark scores for models over 100B parameters. (Link). OpenAI incorporated chain-of-thought into their premium models and even fine-tuned their models to improve speed performance.
What R1 did was use rewards via Reinforcement Learning to reward good chain-of-thought reasoning in a certain format (for example, placing the thinking inside <think> tags before giving the final answer).
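Here's a toy sketch of that kind of rule-based reward: one signal for following the expected <think> format, and one for getting a verifiable answer right. The actual rewards used in training are more involved than this:

```python
# Toy rule-based rewards in the spirit of the paper: one for using the expected
# <think>...</think> format and one for getting a verifiable answer right.
# The real reward values and checks used in training are more involved.
import re

def format_reward(completion: str) -> float:
    # Reward completions that wrap their reasoning inside <think> tags.
    return 1.0 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference_answer: str) -> float:
    # For math or code, the final answer can be checked mechanically.
    final = completion.split("</think>")[-1].strip()
    return 1.0 if final == reference_answer else 0.0

completion = "<think>9 * 7 = 63, and 63 - 5 = 58.</think>58"
print(format_reward(completion), accuracy_reward(completion, "58"))  # 1.0 1.0
```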
If you are interested in learning more about Reinforcement Learning, I've created two videos on the subject that jump into the mathematics and how it works. Reinforcement Learning continues to amaze with its almost 'general' ability to learn. Learn Q-Learning (one of the best kinds of Reinforcement Learning) in this video. Then I show how to turn that into Deep Reinforcement Learning in this video.
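If you'd like a quick taste of the core idea before watching the videos, here's a minimal tabular Q-learning sketch on a made-up five-state corridor (the environment and numbers are purely illustrative):

```python
# Minimal tabular Q-learning on a made-up 5-state corridor: start at state 0,
# reward 1 for reaching state 4. Purely illustrative environment and numbers.
import random

N_STATES, ACTIONS = 5, [0, 1]            # action 0 = left, 1 = right
q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.3    # learning rate, discount, exploration rate

def step(state: int, action: int) -> tuple[int, float, bool]:
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

for _ in range(500):
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < epsilon else max(ACTIONS, key=lambda x: q[s][x])
        s2, r, done = step(s, a)
        # Core Q-learning update: move Q(s, a) toward reward + discounted best future value.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2

print([round(max(row), 2) for row in q])  # values grow for states closer to the goal; the terminal state stays 0
```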
Distilling R1
Once the R1 model was completed, DeepSeek took a curated sample of 800k examples created by R1 and used it to teach smaller, simpler AI models (like Qwen and Llama) DeepSeek-R1's "thinking patterns." Surprisingly, this worked better than training those smaller models directly, and some of these smaller models now outperform much bigger rivals (p. 3). All these models are free for anyone to use (p. 3).
You can find them here on Hugging Face because they are open-source.
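If you want to try one of the distilled models yourself, a minimal sketch using the Hugging Face transformers library looks something like this. Double-check the repo name and hardware requirements on the model card before running it:

```python
# Running one of the distilled models with Hugging Face transformers.
# The repo id is the small Qwen-based distillation; verify the exact name
# and hardware requirements on the model card before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```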
The Deadly Combo
It was these three things together that made such an impact: a Large Language Model smaller than its competitors, performing as well as the state-of-the-art reasoning models, but built for a fraction of the cost using a method the 'big boys' didn't even know existed. And then the buggers made it all available for free! The nerve!
It's rumored that Meta started looking over their AI team, whose members each get paid as much as it took to train R1, and started to wonder what was going on.
How To Get Started with R1
The easiest way to get started with R1 is their free interface found at chat.deepseek.com. Be sure to click the “DeepThink (R1)” button and you’ll immediately see the chain-of-thought reasoning in action.
I’ve been very impressed with the results. In fact, I asked it a difficult philosophical question that I don’t think even has an answer on the internet and R1 surprisingly came up with a pretty convincing answer I’ve never heard from a human being.
If you want to run R1 locally, another great way to get started is to download LM Studio. They have the various R1 models (including the distilled models and quantized versions of the models) available.
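Once you've loaded a model, LM Studio's built-in local server speaks the OpenAI-compatible API (by default at http://localhost:1234/v1), so a few lines of Python are enough to talk to it. The model identifier below is a placeholder; use whatever name LM Studio reports for the model you loaded:

```python
# Talking to a model served by LM Studio's local server, which speaks the
# OpenAI-compatible API (http://localhost:1234/v1 by default). The model
# name below is a placeholder: use whatever identifier LM Studio shows.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # the key is ignored locally

response = client.chat.completions.create(
    model="deepseek-r1-distill-qwen-7b",  # placeholder identifier
    messages=[{"role": "user", "content": "Explain chain-of-thought prompting in two sentences."}],
)
print(response.choices[0].message.content)
```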
In future posts I'll go over better ways to run R1 locally, such as llama.cpp and Ollama.
Other Links of Interest:
- DeepSeek-R1: Best Open-Source Reasoning LLM Outperforms OpenAI-o1
- What are DeepSeek-R1 distilled models?
- How Good is DeepSeek-R1-Lite Preview at Reasoning?
- DeepSeek-R1: pure reinforcement learning (RL), no supervised fine-tuning (SFT), no chain-of-thought (CoT)
- Can AI Really Think? DeepSeek’s R1 Says “Yes” (and Shows You How)
- Evolution of DeepSeek: How it Became a Global AI Game-Changer!