Mistral Small 4
Comments
zacksiri
Reubend
Seems like it does quite well on that particular benchmark?
zacksiri
It's OK, but not the best. There are models that do better; I'd use it for some basic tasks, but not for actually complex tasks like query generation and retrieval.
kristianp
Interesting that they target around 120 billion parameters: just enough to fit onto a single H100 with a 4-bit quant, or onto a 128 GB unified-memory machine like Apple silicon, AMD's AI CPUs, or the GB Spark.
Copying GPT-OSS-120b?
Available to try at https://build.nvidia.com/mistralai/mistral-small-4-119b-2603
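The "fits on a single H100 with a 4-bit quant" claim is just back-of-the-envelope arithmetic on the weight footprint. A minimal sketch (the 4.5 bits/weight figure is an assumption meant to account for the per-block scales that real quant formats carry; actual sizes vary by format and runtime):

```python
# Rough memory-footprint estimate for a ~120B-parameter model.
# Illustrative only: real quant formats (GGUF k-quants, AWQ, etc.)
# differ slightly in effective bits per weight.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """GiB needed for the weights alone (no KV cache, no activations)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4.5):  # fp16, int8, ~4-bit quant incl. scales
    print(f"{bits:>4} bits/weight: {weight_gib(120, bits):6.1f} GiB")
```

At ~4.5 bits/weight the weights alone come to roughly 63 GiB, which leaves headroom on an 80 GiB H100 (or a 128 GB unified-memory box) for the KV cache and activations, while fp16 (~224 GiB) clearly does not fit.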
revolvingthrow
I really wish the benchmarks were even slightly trustworthy for AI models. ~120B is the largest size I can run locally. Naturally I grabbed the 122B Qwen3.5, which had great benchmarks, and... frankly, the model is garbage; worse than GLM 4.5 Air, IMO. But then, Qwen famously benchmaxxes.
And here we have another release. The benchmarks are just a tiny bit worse than Qwen3.5's (for far fewer tokens). Am I to take it that the model is worse? Or does Qwen's benchmaxxing mean that a slightly worse result from a non-Qwen model actually indicates a better model? I'd rather not spend hours testing things myself for every noteworthy release.
Ah well. Mistral has been fairly decent, so it's worth a look. Obviously they're behind the big 3, but in my experience their small models are probably the best you can get for several months after each release. I'm not sure how this works as a sales funnel for their paid models (same as with Chinese models, people likely just go for Google/OpenAI/Anthropic in that case), but I'm thankful for their existence.
2001zhaozhao
Which Haiku model are they comparing to? Is it 4.5? If so, it's absolutely wild that Qwen3.5 122B is shredding it in those graphs.
I tested the model in an agentic workflow. Here is the report:
https://upmaru.com/llm-tests/simple-tama-agentic-workflow-q1...