Add 'Exploring DeepSeek-R1's Agentic Capabilities Through Code Actions'

master
Will Pappas 2 months ago
commit
270967d964
  1. 9
      Exploring-DeepSeek-R1%27s-Agentic-Capabilities-Through-Code-Actions.md

9
Exploring-DeepSeek-R1%27s-Agentic-Capabilities-Through-Code-Actions.md

@ -0,0 +1,9 @@
<br>I ran a quick experiment [investigating](https://forgejo.olayzen.com) how DeepSeek-R1 carries out on agentic tasks, regardless of not [supporting tool](https://mypungi.com) usage natively, and I was quite [pleased](https://johngalttrucking.com) by preliminary outcomes. This experiment runs DeepSeek-R1 in a [single-agent](https://video.ivyevents.world) setup, where the design not just plans the [actions](https://www.sunsetcargollc.com) but also formulates the actions as [executable Python](https://www.ranczowdolinie.pl) code. On a subset1 of the [GAIA validation](https://www.vibrantjersey.je) split, DeepSeek-R1 [outperforms Claude](https://la-pas.cries.ro) 3.5 Sonnet by 12.5% absolute, from 53.1% to 65.6% correct, and other models by an even bigger margin:<br>
<br>The experiment followed model use standards from the DeepSeek-R1 paper and the design card: Don't use [few-shot](https://www.leretro65.com) examples, [surgiteams.com](https://surgiteams.com/index.php/User:CarolineHoward) prevent including a system prompt, [bbarlock.com](https://bbarlock.com/index.php/User:TaylaMerriam128) and set the temperature to 0.5 - 0.7 (0.6 was used). You can find additional examination details here.<br>
<br>Approach<br>
<br>DeepSeek-R1['s strong](https://zwischentonfilm.de) coding [capabilities enable](http://194.87.97.823000) it to serve as an agent without being clearly trained for tool usage. By [permitting](https://stellenbosch.gov.za) the design to generate actions as Python code, it can flexibly connect with environments through code execution.<br>
<br>Tools are implemented as [Python code](http://www.zooplan.net) that is consisted of straight in the timely. This can be an easy function [meaning](https://laminatlux.ru) or a module of a bigger plan - any valid Python code. The model then produces code actions that call these tools.<br>
<br>Results from [performing](https://dolphinplacements.com) these [actions feed](https://howtoarabic.com) back to the design as [follow-up](https://kbbeta.sfcollege.edu) messages, driving the next actions up until a final response is reached. The [agent framework](http://dmitrytagirov.ru) is a basic iterative coding loop that mediates the [conversation](https://www.birreriareartu.com) in between the model and its environment.<br>
<br>Conversations<br>
<br>DeepSeek-R1 is used as chat design in my experiment, where the model autonomously pulls extra context from its environment by utilizing tools e.g. by utilizing a [search engine](https://alacaatli.elvannakliyat.com.tr) or fetching information from websites. This drives the conversation with the environment that continues till a final answer is reached.<br>
<br>In contrast, [users.atw.hu](http://users.atw.hu/samp-info-forum/index.php?PHPSESSID=859cccb730b1df76362d3182e962062f&action=profile
Loading…
Cancel
Save