DeepSeek-R1 is based on DeepSeek-V3, a mixture of [experts](http://clinicanevrozov.ru) (MoE) model recently open-sourced by DeepSeek. This base model is fine-tuned using Group Relative Policy Optimization (GRPO), a [reasoning-oriented variant](https://www.styledating.fun) of reinforcement learning (RL). The research team also performed knowledge distillation from DeepSeek-R1 to the open-source Qwen and Llama models and released several versions of each.
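
To make the "group relative" part of GRPO more concrete, here is a minimal sketch of the advantage computation only, not DeepSeek's training code. For one prompt, several answers are sampled and scored, and each score is normalized against the mean and standard deviation of its own group, which stands in for a separate value model. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the sample rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds one scalar score per answer sampled for the same prompt.
    `eps` is a hypothetical guard against division by zero when every
    answer in the group receives the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    # Assumption: population standard deviation; implementations may differ.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four sampled answers (1.0 = correct, 0.0 = wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that score above their group's average get a positive advantage and are reinforced, while those below get a negative one. The full method also involves a clipped policy-ratio objective and a KL penalty against a reference model, which this sketch leaves out.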