That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

## Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself.

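The exact command depends on which quantization you pick; assuming Unsloth's Q8_0 build, it looks something like this:

```bash
# Fetch the GGUF straight from Hugging Face and start an interactive chat
# (the Q8_0 tag is an assumption; Unsloth publish several quantizations)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
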
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and then ran the prompt through the same model.

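A minimal sketch of those two steps, assuming the same model tag as the Ollama pull above:

```bash
# Install the plugin that teaches LLM to talk to local Ollama models
llm install llm-ollama

# Run a prompt against the fetched model; LLM logs the full
# prompt/response pair to its SQLite database automatically
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
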
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that began like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a funny situation.
>
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
>
> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

> The Joke:
>
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
>
> After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
>
> Explanation:
>
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got a quantized version of that Llama 3.3 70B R1 distilled model working, a 34GB download.

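Presumably via the same route as the 8B model; a sketch, where the Q3_K_M tag is my guess at a quantization that matches the 34GB size:

```bash
# Pull the 70B distill (roughly 34GB at this quantization) and prompt it
ollama pull hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'a joke about a pelican and a walrus who run a tea room together'
```
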
## Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well.

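For reference, that run looks like this through LLM (the model ID is an assumption; substitute whichever distilled build you pulled):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
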
It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

## Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping a stanza into your extra-openai-models.yaml configuration file.

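Something like this, following the format LLM's documentation describes for that file (the api_base is DeepSeek's published endpoint; model_id is the alias you'll type on the command line):

```yaml
# Register DeepSeek's OpenAI-compatible endpoint with LLM
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
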
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.

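That full sequence looks like this:

```bash
# Store the API key once; paste it when prompted
llm keys set deepseek

# Then any prompt can be routed through DeepSeek's hosted R1
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
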
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.

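If you call the API directly you can see them; a minimal sketch using curl, assuming DeepSeek's documented reasoning_content field:

```bash
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
      }'
# The JSON response should include choices[0].message.reasoning_content
# (the chain of thought) alongside the regular message.content
```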