That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

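If you don't have those tools already, a setup along these lines should work; I'm assuming macOS with Homebrew here, so adjust for your own platform (the plugin itself gets installed in a later step below):

```bash
brew install ollama   # local model runner
brew install llm      # the LLM CLI (pipx install llm also works)
```
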
First I fetched and ran the model using Ollama itself:

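```bash
# Pulls the GGUF directly from Hugging Face. The Q8_0 quantization tag
# is my assumption; it matches the roughly 8.5GB download described below.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
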
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

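Getting those logs back out again looks something like this, assuming a recent LLM version:

```bash
llm logs -n 1   # show the most recently logged prompt and response
llm logs path   # print the location of the SQLite database
```
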
I installed the plugin and ran it like this:

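```bash
llm install llm-ollama
# same Q8_0 quantization assumption as the ollama run command above
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
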
Each response starts with a `<think>` tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a funny situation.
>
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
>
> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

> The Joke:
>
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
>
> After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

> Explanation:
>
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):

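```bash
# The Q3_K_M quantization tag is my guess; it lines up with the 34GB size
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
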
## Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

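The run itself would have looked something like this, reusing the quantization guess from above:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
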
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

Just like the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

## Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

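```yaml
# deepseek-reasoner is the model ID DeepSeek's API uses for R1;
# api_key_name ties this entry to the key set in the next step
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
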
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

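Spelled out in full:

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner \
  'a joke about a pelican and a walrus who run a tea room together'
```
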
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.