DeepSeek v3 beats Claude sonnet 3.5 and way cheaper
Comments
helloericsf
patrickhogan1
It does not beat Claude Sonnet 3.5 on SWE Bench (42 to Claude's 50). It picks 4 of the hundreds of available benchmarks and then decides it "beats" Claude Sonnet 3.5.
helloericsf
True. More benchmark metrics here: https://x.com/deepseek_ai/status/1872242657348710721/photo/2
fragmede
What are the 100 coding benchmarks? I'm only aware of 7, and it beats Claude on 5 of them.
patrickhogan1
I'm not aware of 100 coding benchmarks, but there are over 100 LLM benchmarks. This makes sense, as there will eventually be at least one benchmark for each human task.
In addition to automated benchmarks, there are also human-rated evaluations, such as Chatbot Arena.
I manually tested DeepSeek v3 against Claude 3.5 Sonnet. In my human evaluation, Claude 3.5 Sonnet outperformed DeepSeek v3, and it also outperforms DeepSeek v3 on SWE Bench. Therefore, the title of the post claiming "DeepSeek v3 beats Claude 3.5 Sonnet and is way cheaper" is wrong.
That said, I was surprised by how well it performed. It's fast. Ironically, I have a paid Claude Team Plan, and while I was conducting the evaluations Claude was experiencing performance issues (https://status.anthropic.com) and DeepSeek v3 was not. This is telling for the state of chip sale restrictions.
sam_goody
What are the minimum and recommended amounts of RAM, hard disk space, CPU or GPU to run this locally?
As someone who just follows this stuff from afar, it is hard for me to tell whether this is a SaaS-only model, or whether we are getting to the point where you can have an A1 model on a local machine.
Mithriil
The whole model is 671B parameters. It is downloadable from Huggingface as 163 LFS files of around 4.3GB each, roughly 700GB in total.
Recommended RAM: more than most PCs have.
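For a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. It uses only the figures quoted above (163 files, about 4.3GB each, 671B parameters) and assumes 1GB = 10^9 bytes; the variable names are just for illustration.

```python
# Figures quoted in the comment above; all are approximate.
n_files = 163
gb_per_file = 4.3        # approximate size of each LFS file
n_params = 671e9         # 671B parameters

# Assumption: 1 GB = 1e9 bytes.
total_gb = n_files * gb_per_file
bytes_per_param = total_gb * 1e9 / n_params

print(f"total download: ~{total_gb:.0f} GB")
print(f"implied storage: ~{bytes_per_param:.2f} bytes per parameter")
```

So the download works out to about one byte per parameter, and holding the full weights in memory would need on the order of the same ~700GB before counting anything else.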
HF link: https://huggingface.co/deepseek-ai/DeepSeek-V3
Aider link: https://aider.chat/docs/leaderboards/
Pricing ($0.14/$0.28 per 1M tokens) reference: https://x.com/xingyaow_/status/1872145835699691675?ref_src=t...
LiveBench via reddit: https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
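To put the quoted pricing in concrete terms, here is a minimal sketch of a cost estimate. It assumes the $0.14/$0.28 pair means input/output price per 1M tokens (the comment does not say which is which), and the function name and token counts are hypothetical.

```python
# Prices quoted above, in USD per 1M tokens.
# Assumption: the first figure is for input tokens, the second for output tokens.
INPUT_USD_PER_M = 0.14
OUTPUT_USD_PER_M = 0.28

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at the quoted per-token prices."""
    return (input_tokens / 1e6) * INPUT_USD_PER_M + (output_tokens / 1e6) * OUTPUT_USD_PER_M

# Hypothetical workload: 2M input tokens and 500k output tokens.
cost = estimate_cost_usd(2_000_000, 500_000)
print(f"estimated cost: ${cost:.2f}")
```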