That model was trained in part using their unreleased R1 "reasoning" model. Today they have released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
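The code block itself didn't survive this copy of the post. Based on the model name mentioned above, the command was presumably something along these lines; the `:Q8_0` quantization tag is my assumption, other quantizations of that repo exist:

```shell
# Fetch and run the Unsloth GGUF build via Ollama's Hugging Face integration.
# The :Q8_0 suffix picks a specific quantization and is a guess on my part.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```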
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
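Again the original code block is missing. A plausible reconstruction, assuming the same model reference as the Ollama fetch above (the exact model ID string is my guess):

```shell
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```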
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
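Because the reasoning is wrapped in that pseudo-XML tag, it's easy to separate from the final answer programmatically. A minimal sketch (my own helper, not part of LLM or Ollama):

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a DeepSeek-R1 style response into (chain_of_thought, answer).

    Assumes the reasoning is wrapped in a <think>...</think> block;
    if no such block is found, the whole response is treated as the answer.
    """
    start = response.find("<think>")
    end = response.find("</think>")
    if start == -1 or end == -1:
        return "", response.strip()
    thought = response[start + len("<think>"):end].strip()
    answer = response[end + len("</think>"):].strip()
    return thought, answer


# Illustrative input, not the actual transcript:
example = "<think>Okay, so I need a pelican/walrus joke...</think>The Joke: ..."
thought, answer = split_reasoning(example)
```

Anything before the closing tag is the model thinking out loud; anything after it is the part you'd actually show a user.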
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
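The command block was lost from this copy. It would presumably use the same Hugging Face reference trick as before; both the repo name and the quantization tag below are my guesses for a roughly 34GB build:

```shell
# Hypothetical reconstruction: repo name and Q3_K_M tag are assumptions.
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'a joke about a pelican and a walrus who run a tea room together'
```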
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access it through LLM by dropping this into your extra-openai-models.yaml configuration file:
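The YAML snippet didn't make it into this copy. Based on the extra-openai-models.yaml format in LLM's documentation, it would look roughly like this (model IDs and base URL as I understand DeepSeek's API docs):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```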
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
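The raw API response does carry the reasoning separately though: as I understand DeepSeek's documentation, the deepseek-reasoner model returns a reasoning_content field alongside the usual content on the assistant message, so you can dig it out of the JSON yourself. A sketch using a hard-coded sample payload rather than a live call (the payload text is illustrative, not a real capture):

```python
import json

# Shape of a chat completion from deepseek-reasoner, per my reading of
# DeepSeek's API docs: the message carries a separate "reasoning_content".
sample = json.loads("""
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "Okay, so I need to compare 9.11 and 9.8...",
        "content": "9.8 is larger than 9.11."
      }
    }
  ]
}
""")

message = sample["choices"][0]["message"]
reasoning = message.get("reasoning_content", "")  # empty for non-reasoner models
answer = message["content"]
```

Until LLM grows native support, a small wrapper like this is enough to log or display the reasoning yourself.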