Exploring DeepSeek-R1's Agentic Capabilities Through Code Actions


I ran a quick experiment investigating how DeepSeek-R1 performs on agentic tasks, despite not supporting tool use natively, and I was rather impressed by the initial results. The experiment runs DeepSeek-R1 in a single-agent setup, where the model not only plans the actions but also writes them out as executable Python code. On a subset¹ of the GAIA validation split, DeepSeek-R1 outperforms Claude 3.5 Sonnet by 12.5% absolute, from 53.1% to 65.6% correct, and other models by an even larger margin.
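For context on what "correct" means here: GAIA scores an answer as correct only when it matches the reference answer (quasi-)exactly, so the percentages above are plain accuracy over the subset. A minimal scoring sketch follows; the normalization is a simplified assumption, not the official GAIA scorer, which additionally handles numeric answers and comma-separated lists.

```python
def normalize(answer: str) -> str:
    # Simplified assumption: lowercase, strip whitespace and a
    # trailing period; the official GAIA scorer is stricter.
    return answer.strip().lower().rstrip(".")

def accuracy(predictions: list[str], references: list[str]) -> float:
    # Fraction of answers that match the reference exactly
    # after normalization.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```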

The experiment followed the model usage recommendations from the DeepSeek-R1 paper and the model card: don't use few-shot examples, avoid adding a system prompt, and set the temperature to 0.5 - 0.7 (0.6 was used). You can find further evaluation details here.
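To make those settings concrete, here is a minimal sketch of a single query under these constraints. It assumes an OpenAI-compatible endpoint serving DeepSeek-R1 (for example via vLLM); the base URL, API key, and model name are placeholders, not details taken from the experiment.

```python
from openai import OpenAI

# Assumes an OpenAI-compatible server hosting DeepSeek-R1
# (endpoint, key, and model name below are placeholders).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    # Per the usage guidelines: no system prompt and no few-shot
    # examples -- the task goes straight into a single user message.
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    temperature=0.6,  # within the recommended 0.5 - 0.7 range
)
print(response.choices[0].message.content)
```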

Approach

DeepSeek-R1's strong coding capabilities enable it to act as an agent without being explicitly trained for tool use. By allowing the model to express its actions as Python code, a single generation step covers both the plan and the executable action; the snippet is then run in the environment and its output is returned to the model as the observation for the next step.
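Below is a minimal sketch of such a code-action agent, assuming Hugging Face's smolagents library (this excerpt does not name the exact harness used, so treat the snippet as an illustration of the pattern rather than the experiment's actual code; the endpoint and model id are placeholders):

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

# Wrap a hosted DeepSeek-R1 endpoint (placeholders); the temperature
# follows the usage guidelines above.
model = LiteLLMModel(
    model_id="openai/deepseek-ai/DeepSeek-R1",
    api_base="http://localhost:8000/v1",
    api_key="EMPTY",
    temperature=0.6,
)

# CodeAgent prompts the model to write its actions as Python
# snippets, executes them, and feeds the printed output back to
# the model as the next observation.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

answer = agent.run("In which year did the Airbus A380 make its first flight?")
print(answer)
```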