DeepSeek v3 beats Claude Sonnet 3.5 and way cheaper

41 points
18 hours ago
by helloericsf

Comments


patrickhogan1

It does not beat Claude Sonnet 3.5 on SWE-bench (42 to Claude's 50). It cherry-picks 4 of the hundreds of available benchmarks and then declares that it "beats" Claude Sonnet 3.5.

16 hours ago

fragmede

What are the 100 coding benchmarks? I'm only aware of 7, and it beats Claude on 5 of them.

13 hours ago

patrickhogan1

I'm not aware of 100 coding benchmarks, but there are over 100 LLM benchmarks. This makes sense, as there will eventually be at least one benchmark for each human task.

In addition to automated benchmarks, there are also human-rated evaluations, such as Chatbot Arena.

I manually tested DeepSeek v3 against Claude 3.5 Sonnet. In my human evaluation, Claude 3.5 Sonnet outperformed DeepSeek v3, and it also outperforms DeepSeek v3 on SWE-bench. So the title of the post, claiming "DeepSeek v3 beats Claude 3.5 Sonnet and is way cheaper," is wrong.

That said, I was surprised by how well it performed. It's fast. Ironically, I have a paid Claude Team plan, and while I was running the evaluations Claude was experiencing performance issues (https://status.anthropic.com) while DeepSeek v3 was not. That says something about the state of chip sale restrictions.

11 hours ago

sam_goody

What are the minimum and recommended amounts of RAM, disk space, and CPU/GPU needed to run this locally?

As someone who just follows this stuff from afar, it's hard for me to tell whether this is a SaaS-only model, or whether we're getting to the point where you can run an AI model like this on a local machine.

13 hours ago

Mithriil

The whole model is 671B parameters. It's downloadable from Hugging Face as 163 LFS files of around 4.3GB each, ~700GB in total.

Recommended RAM: more than most PCs have.
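
For anyone sanity-checking those numbers, a minimal sketch, assuming the published 671B parameter count and roughly one byte per parameter for an FP8 checkpoint (the repo id deepseek-ai/DeepSeek-V3 and the huggingface_hub call are assumptions, not from this thread):

    # Rough size check: 671B params at ~1 byte each (FP8) lands near
    # the ~700GB download figure quoted above (index files, configs,
    # and tokenizer account for the rest).
    params = 671e9
    bytes_per_param = 1  # assumed FP8 weights
    print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~671 GB

    # Hypothetical download (pip install huggingface_hub; repo id assumed):
    # from huggingface_hub import snapshot_download
    # snapshot_download(repo_id="deepseek-ai/DeepSeek-V3", local_dir="DeepSeek-V3")

Serving all of that from RAM, rather than just storing it on disk, is where "more than most PCs" comes in.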

9 hours ago