That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
## Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF, released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
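The original command didn't survive the copy, but the fetch-and-run step is a single Ollama invocation. This is a sketch of it; the `hf.co/...` model reference is an assumption, built from the Unsloth repository named above using Ollama's Hugging Face pull syntax:

```shell
# Pull the Unsloth GGUF build from Hugging Face and start an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
```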
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
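The exact commands were also lost in the copy. As a sketch, the setup boils down to two steps; the model identifier is an assumption, matching the tag Ollama assigns to the Hugging Face pull, and the prompt is the one discussed below:

```shell
# Install the plugin that exposes Ollama models to LLM,
# then send a prompt to the local model (LLM logs it to SQLite automatically)
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF' \
  'a joke about a pelican and a walrus who run a tea room together'
```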
Each response starts with a `<think>`...`</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
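The command itself didn't survive the copy. As a sketch, it follows the same pattern as the 8B pull; the `Q3_K_M` quantization tag here is my assumption for a roughly 34GB build of a 70B model, so check the Unsloth repository for the tag that actually exists:

```shell
# The 70B distill, quantized; the :Q3_K_M tag is an assumption for the ~34GB variant
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```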
## Can it draw a pelican?
I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
Just like the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
## Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
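The configuration fragment itself was lost in the copy. A minimal entry might look like this; the field names follow LLM's extra-openai-models.yaml conventions, and the exact `api_base` value is an assumption to check against DeepSeek's API documentation:

```yaml
# extra-openai-models.yaml: register DeepSeek's OpenAI-compatible endpoint with LLM
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```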
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
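Spelled out as shell commands (the prompt text is just an example):

```shell
# Store the DeepSeek API key once, then prompt the hosted reasoner model
llm keys set deepseek
llm -m deepseek-reasoner 'a joke about a pelican and a walrus'
```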
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.