Show HN: Mdarena – Benchmark your Claude.md against your own PRs
22 points
a day ago
by hudsongr
Comments
hudsongr
Hey! I built this because everyone's writing CLAUDE.md files now, but nobody knows whether theirs actually works. The research is contradictory too: one paper says they hurt performance, another says they help. So I made a tool that just measures it on your own repo, using your own PRs and your own test suite.
It's rare to be able to point at a single markdown file and say "this made the agent 27% better at resolving real tasks," but that's what we saw on our production monorepo. I see this as a way for teams to actually make their agents write better code instead of guessing.
a day ago
Powdering7082
So by default it pulls recent PRs, grabs only the tests that were committed, and then checks how well an agent with different CLAUDE.md files can get that test suite passing?
16 hours ago
aszen
This is quite interesting, I'll try it. I'd kind of expect this to be run continuously as the codebase changes.
a day ago
geiser
Could be a neat addition to LynxPrompt. Quite interesting, thanks!
a day ago
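The workflow Powdering7082 asks about above boils down to a simple loop: mine tasks from merged PRs, hide the tests that shipped with each PR, let an agent attempt the change under each CLAUDE.md variant, and count how many held-out test suites pass. Here is a minimal sketch of that loop under our own assumptions. It is not Mdarena's code or API. `Task`, `run_agent`, `run_tests`, and the fake hooks are hypothetical stand-ins for checking out the base commit, invoking an agent, and running the PR's tests. It also keeps relative and absolute lift separate, since a figure like "27% better" could mean either.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    """One benchmark task mined from a merged PR (hypothetical structure)."""
    pr_id: str                   # identifier of the source PR
    base_commit: str             # repo state the agent starts from (pre-PR)
    test_files: Tuple[str, ...]  # tests committed in the PR, hidden from the agent


# Hypothetical hooks: run_agent applies an agent guided by a given CLAUDE.md
# to the task and returns a patch; run_tests applies the patch plus the PR's
# held-out tests and reports whether they all pass.
AgentFn = Callable[[Task, str], str]
TestFn = Callable[[Task, str], bool]


def resolve_rate(tasks: List[Task], claude_md: str,
                 run_agent: AgentFn, run_tests: TestFn) -> float:
    """Fraction of tasks whose held-out tests pass after the agent's patch."""
    if not tasks:
        raise ValueError("need at least one task")
    resolved = sum(1 for t in tasks if run_tests(t, run_agent(t, claude_md)))
    return resolved / len(tasks)


def compare_variants(tasks: List[Task], variants: Dict[str, str],
                     run_agent: AgentFn, run_tests: TestFn) -> Dict[str, float]:
    """Resolve rate for each named CLAUDE.md variant on the same task set."""
    return {name: resolve_rate(tasks, text, run_agent, run_tests)
            for name, text in variants.items()}


def relative_lift(baseline: float, candidate: float) -> float:
    """Relative change in resolve rate: (candidate - baseline) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline resolve rate must be positive")
    return (candidate - baseline) / baseline


# --- usage with fake hooks, so the sketch runs without git or an LLM ---
tasks = [Task(f"pr-{i}", "base", (f"tests/test_{i}.py",)) for i in range(10)]


def fake_agent(task: Task, claude_md: str) -> str:
    idx = int(task.pr_id.split("-")[1])
    # Pretend the agent solves tasks 0-3 unaided, and task 4 only with the hint.
    limit = 5 if "make test" in claude_md else 4
    return "fix" if idx < limit else "noop"


def fake_tests(task: Task, patch: str) -> bool:
    # Pretend the held-out tests pass exactly when the agent produced a fix.
    return patch == "fix"


rates = compare_variants(
    tasks, {"none": "", "with_hint": "Run `make test` before finishing."},
    fake_agent, fake_tests)
lift = relative_lift(rates["none"], rates["with_hint"])
print(rates, f"relative lift: {lift:.0%}")
```

With these fakes the resolve rate goes from 0.4 to 0.5: a 25% relative lift, but only 10 percentage points in absolute terms. Running every variant against the same task set is what makes the comparison meaningful, since any difference then comes from the CLAUDE.md alone (modulo agent nondeterminism, which a real harness would need to average over).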