That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a little more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
## Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
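Ollama can pull GGUF checkpoints straight from Hugging Face, so the fetch-and-run step looks something like this (a minimal sketch; Ollama picks a default quantization for the repo unless you append a specific tag):

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
```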
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
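Those logs can be pulled back up later with commands along these lines (a quick sketch using LLM's built-in logs subcommands):

```bash
# Print the location of the SQLite database that stores every prompt/response
llm logs path

# Show the most recent logged exchange
llm logs -n 1
```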
I installed the plugin and ran it like this:
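Roughly like this, at least (a sketch; the model ID passed to -m is the same Hugging Face path Ollama registered during the pull above):

```bash
llm install llm-ollama

llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF' \
  'a joke about a pelican and a walrus who run a tea room together'
```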
Each response starts with a `<think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a funny situation.
>
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
>
> [...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
> The Joke:
>
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
>
> After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
>
> Explanation:
>
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
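Presumably via the matching Unsloth 70B repo; a sketch (the specific quantization tag that produces a 34GB file is an assumption, something in the Q3 range for a 70B model):

```bash
# The :Q3_K_M tag here is illustrative; pick whichever quantization fits in RAM
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```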
## Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too.
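As an invocation that looks something like this (a sketch; I'm assuming the same 8B distill as above, which local model actually served this prompt is a guess on my part):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF' \
  'Generate an SVG of a pelican riding a bicycle'
```

It did not do very well: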
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
## Other ways to try out DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
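A sketch of that entry, using the field names from LLM's documentation for OpenAI-compatible models (the api_key_name ties it to the key registered below):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```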
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
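Concretely, the same steps as shell commands:

```bash
# Store the DeepSeek API key under the name the YAML entry references
llm keys set deepseek

# Run a prompt through the hosted reasoner model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```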
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.
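You can see them by calling the API directly though; a minimal curl sketch (the DEEPSEEK_API_KEY environment variable is my own placeholder, and my understanding is the thinking comes back in a separate reasoning_content field):

```bash
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
      }'

# The response JSON carries the chain of thought in
# choices[0].message.reasoning_content, separate from the final
# answer in choices[0].message.content.
```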