Simon Willison's Weblog


That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
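
Here's a quick sketch of that back-of-the-envelope arithmetic in Python. It only counts the weights themselves (ignoring KV cache and runtime overhead), and the parameter counts and bit widths are illustrative assumptions rather than official figures:

```python
# Back-of-the-envelope sketch: weights-only memory for a model, ignoring
# KV cache and runtime overhead. Parameter counts and bit widths below
# are illustrative assumptions, not official figures.

def weights_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# The full R1 models are far too big for a 64GB machine, even quantized:
print(f"~670B params at 8-bit: {weights_size_gb(670, 8):,.0f}GB")

# A distilled 8B model at 4-bit fits with plenty of room to spare:
print(f"8B params at 4-bit: {weights_size_gb(8, 4):,.0f}GB")
```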

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
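
If you want to try one of those distilled models yourself, here's a minimal sketch using Hugging Face transformers. The model ID matches DeepSeek's published repo names, but the prompt and generation settings are just placeholders:

```python
# Minimal sketch of running one of the distilled models locally with
# Hugging Face transformers. The model ID matches DeepSeek's published
# repos; the prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

messages = [{"role": "user", "content": "Why is the sky blue?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=512)

# Strip the prompt tokens and print just the reply, which should include
# the model's <think>...</think> reasoning trace.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```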