DeepSeek-V4 Technical Report [pdf]

24 points
14 hours ago
by tianyicui

Comments


creamyhorror

Two key quotes:

Reasoning: Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months. Furthermore, DeepSeek-V4-Flash-Max achieves comparable performance to GPT-5.2 and Gemini-3.0-Pro, establishing itself as a highly cost-effective architecture for complex reasoning tasks.

Agent: On public benchmarks, DeepSeek-V4-Pro-Max is on par with leading open-source models, such as Kimi-K2.6 and GLM-5.1, but slightly worse than frontier closed models. In our internal evaluation, DeepSeek-V4-Pro-Max outperforms Claude Sonnet 4.5 and approaches the level of Opus 4.5.

While they're some months behind closed SOTA (though benchmarks put them close), I wonder if DeepSeek-V4's longer-context capabilities and KV-cache advantage will make up for this.

14 hours ago

daemonologist

$1.47/M input, $3.48/M output, open weights (MIT license), and competitive with the frontier on their selected benchmarks. Big if it holds up on real-world tasks.
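To put those rates in perspective, here's a minimal sketch of the per-request cost at the quoted prices. The token counts in the example are illustrative assumptions, not figures from the report:

```python
# Rough per-request cost at the quoted DeepSeek-V4 rates:
# $1.47 per 1M input tokens, $3.48 per 1M output tokens.
INPUT_RATE = 1.47 / 1_000_000   # USD per input token
OUTPUT_RATE = 3.48 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical long-context agent step: 50k tokens in, 2k tokens out.
# 50_000 * 1.47e-6 + 2_000 * 3.48e-6 = 0.0735 + 0.00696 ≈ $0.0805
print(f"${request_cost(50_000, 2_000):.4f}")
```

So a fairly heavy agentic request costs on the order of eight cents, which is why people are comparing it so favorably against frontier closed-model pricing.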

14 hours ago

nthypes

Insane! The price is amazing for Opus 4.6 frontier-level performance.

14 hours ago

nthypes

Actually better than Opus 4.6 on Terminal-Bench 2.0.

14 hours ago