Claude Managed Agents

168 points
2 days ago
by adocomplete

Comments


jameslk

We're in the early days of agentic frameworks, like the pre-PHP web: CGI scripts and webmasters. Eventually the state of the art will slow down and something elegant like Rails will come out.

Until then, every agent framework is completely reinvented every week as new patterns and models arrive: evals, ReACT, DSPy, RLM, memory patterns, claws, dynamic context, sandbox strategies. Locking into a framework seems like a losing proposition for anyone trying to stay competitive. See also: LangChain trying to be the Next.js/Vercel of agents while everyone recommends building your own.

That said, Anthropic pulls a lot of weight by owning the models themselves, and an easier-to-use solution will probably get some adoption from those better served by going from nothing to something agentic, despite the lock-in and the constant churn of model tech.

2 days ago

dmix

Completely agree re: AI chatbot/RAG being just like the pre-PHP web. There are a hundred half-baked solutions floating around on blogs and GitHub, but no coherent dominant framework that puts it all together properly. LangChain is close but still feels a bit abstract and DIY.

That, plus everyone is using five different vector DBs and reranking models from different vendors than the answer models, etc.

2 days ago

gck1

I believe a framework is simply never, ever going to work for LLM-based agentic workflows.

A framework is simply way too rigid for a non-deterministic technology.

We may see libraries that provide tools for managing agents, but then again, there's nothing that tmux can't do already.

a day ago

harlequinetcie

I'm a bit at odds with this.

I agree that a framework sounds outdated.

I also believe an orchestrator is needed: something that abstracts you from a specific provider, like hardware, drivers, and operating systems.

Right now, my thoughts are along that line: who will build that operating system? Who will host it in the cloud?

It needs to be robust to operate for large organizations, open source, and sit on top of any provider.

Right now we are seeing BSD vs GNU/Linux vs DOS kind of battles.

a day ago

suncemoje

I've been using the OpenAI Agents SDK for a while now and am largely happy with the abstractions - handoffs/sub-agents, tools, guardrails, structured output, etc. Building the infra and observability, and having it scale reliably, was a bigger pain for me. So I do get Anthropic's move into managed agents.

a day ago

cedws

I saw this coming. Anthropic wants to shift developers onto their platform, where they're in control. The fight for harness control has been terribly inconvenient for them.

To score a big IPO they need to be a platform, not just a token pipeline. Everything they’re doing signals they’re moving in this direction.

2 days ago

spwa4

Well, that sucks. Replacing the harness with something task-specific has proven very powerful in my use cases.

One correction, though: Claude is very happy to let you use whatever you want for a harness ... as long as you're on a pay-as-you-go plan. So it's not blocked; it's just not allowed on the $20-per-month plan.

First, harnesses can give access to company-internal tools (like the ticket queue). You could do this with MCP, but it's much harder and slower, and MCP kind of resists it (if you want a bot to solve a ticket, why not start with a full overview of the ticket in the first request to your model? That can't easily be done with MCP).

Second, harnesses can direct the whole process. A trivial example: you can improve performance in a very simple way by asking "are you sure?", showing the model what it intends to do, BEFORE doing it. That improves performance by 10%, right there. Give a model the chance to look at what it's doing and change its mind before committing. Then ask a human the same question, with a nice yes/no button. Try that with MCP.

Of course, you quickly find a million places to change the process, and then you can go and meta-change the process: ask an AI what steps should be followed first, then do those steps, most of which are AI invocations with parts of the ticket (say, examine the customer database, extract what's relevant to this problem, ...). Limiting context is very powerful, and not just because it gets you cheaper requests. Get an AI to build the relevant context for a particular step before actually doing that step.

2 days ago

gck1

> A trivial example is that you can improve performance in a very simple way: ask "are you sure?" showing the model what it intends to do, BEFORE doing it. Improves performance by 10%

Put it into an "are you sure" loop and you'll see the model just keep oscillating for eternity. If you ask a model to question its output, it will take that as an instruction that it must, even when the output is correct.

a day ago

spwa4

Not in my experience. I mean, it happens. But models can check whether their own function calls are reasonable. And that doesn't require dropping the context cache, so it's a lot less expensive than you might initially think.

a day ago

mdrachuk

It's all good until your production agent deployment has a single nine of uptime. I use Claude Code as my main coding harness daily, but making customers reliant on Anthropic software is a big no-no. Quality engineering is just not their thing.

2 days ago

0o_MrPatrick_o0

I’ve been building my own version of this. It’s a bit shocking to see parallel ideation.

FWIW, IMO being locked into a single model provider is a deal-breaker.

This solution will distract a lot of folks and doom-lock them into Anthropic. That'll probably be fine for small offices, but it's suicidal to get hooked into Anthropic's way of doing things for anything complex. IME you want to be able to compare different models, and you end up managing them to your style. It's a bit like cooking, where you may have a greater affinity for certain flavors. You make selection tradeoffs on when to use a frontier model for design and planning vs. something self-hosted for simpler operations tasks.

2 days ago

TIPSIO

FWIW, everyone is also building a version of this themselves. There are only so many directions to go.

2 days ago

rtuin

Most definitely. Although I haven't found an (F)OSS project that lets one easily ship [favorite harness SDK] to a self-hosted platform yet.

Which projects are standing out in this space right now?

2 days ago

jawiggins

Shameless self-promo, but I've been working on Optio, specifically for coding. It takes any harness you want and tasks it with opening GitHub/GitLab PRs based on Notion/Jira/Linear tickets; see: https://news.ycombinator.com/item?id=47520220

It works on top of k8s, so you can deploy it and run in your own compute cluster. Right now it's focused only on coding tasks, but I'm currently working on abstractions so you can similarly orchestrate large runs of any agentic workflow.

2 days ago

Jayakumark

@jawiggins, saw your repo. It looks like OpenAI Symphony but better, since it works across multiple agents and issue trackers, and the feedback loop is great. One feature request, though: can you add a plan mode? Your issues are so detailed that they become the plan to implement (though I guess your plan mode currently happens outside GitHub issues). But say the issue is "implement support for plan mode": there should be back-and-forth with the agent, with issue tags pointing to Opus max and/or plan mode, so we can correct the agent's plan back and forth, and once the tag is removed it can start implementing. Or something similar?

a day ago

jawiggins

Thanks for the feedback. Earlier I expected I'd need to do more back-and-forth with the agents before accepting their work, but in general I've found it isn't needed.

I do have some features coming up that will improve the ability to converse with the agent as it's running. I'll make a note to add a plan setting, so you can have it plan and converse before it gets going.

20 hours ago

Tarcroi

I've been building exactly this. It's open source and multi-model (5 providers with fallback). For now it runs locally, but the architecture is designed for self-hosted deployment.

2 days ago


deet

Do you think it's unwise for companies to lock in because they would be better served and get better results by picking and choosing models? Or because by running your business on a single closed provider like Anthropic, you're giving them telemetry they can use to optimize their models and systems to then compete with you later?

2 days ago

0o_MrPatrick_o0

I think it's unwise because model reliability is transient.

When the models have an off day, the workflows you've grown to depend on fail. When you're completely dependent on Anthropic not only for execution but for troubleshooting, you're doomed. You lose a whole day troubleshooting model-performance variability when you should have just logged off and waited. These are very cognitively disruptive days.

Build in multi-model support, so your agents can modify routing when an observer detects variability.
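A hedged sketch of that routing idea. The provider functions below are hypothetical stand-ins, not real SDK calls; a real router would wrap each vendor's client and narrow the caught exceptions:

```python
# Sketch of multi-model routing with fallback.
# Both providers are hypothetical stubs, not real vendor SDKs.

def call_primary(prompt: str) -> str:
    # Simulate the preferred provider having an off day.
    raise TimeoutError("primary provider degraded")


def call_backup(prompt: str) -> str:
    return f"backup handled: {prompt}"


ROUTE = [call_primary, call_backup]  # preference order; an observer could reorder this


def complete(prompt: str) -> str:
    errors: list[Exception] = []
    for provider in ROUTE:
        try:
            return provider(prompt)
        except Exception as exc:  # production code would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

When the primary times out, the call silently falls through to the backup instead of burning a day on one vendor's bad afternoon.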

2 days ago

dakolli

It's unwise because they're going to have a $5-10k/month bill on enterprise pricing, whereas for $6-10k a month you can rent and run your own hardware, get a solid 3-4 concurrent sessions for your engineers with a 1T-param open-source model, and save thousands per developer a month.

2 days ago

dakolli

I'm the same, and it's relatively trivial to build these types of systems on top of aggregators like OpenRouter.
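Part of why aggregators make this trivial: they expose one OpenAI-style request shape for every model behind them, so swapping vendors is a string change. A sketch (the model IDs are illustrative, not a real catalog):

```python
# Build an OpenAI-style chat payload; one shape serves any model
# behind an aggregator. Model IDs below are made up for illustration.
import json


def chat_request(model: str, prompt: str) -> str:
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })


# Same request body, different vendor: only the model string changes.
req_a = chat_request("vendor-a/model-x", "review this diff")
req_b = chat_request("vendor-b/model-y", "review this diff")
```

The actual HTTP call and auth headers are left out; the point is that the payload, not the integration, is the stable interface.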

2 days ago

mccoyb

I'm skeptical that this is going to lead to optimal orchestration ... or rather, skeptical that open source won't produce a far better alternative in time.

The best performance I've gotten is by mixing agents from different companies. Unless there is a "winner take all" agent (I seriously doubt it, based on the dynamics and cost of collecting high quality RL data), I think the best orchestration systems are going to involve mixing agents.

Here, it's not about the planner, it's about the workers. Some agents are just better at certain things than others.

For instance, Opus 4.6 on max does not hold a candle to GPT 5.4 xhigh in terms of bug finding. It's just not even a comparison, iykyk.

Almost analogous to how diversity of thought can improve the robustness of the outcomes in real world teams. The same thing seems to be true in mixture-of-agent-distributions space.

2 days ago

mccoyb

Another way to think about it:

For Anthropic to have the best version of this software, they'd have to simultaneously ... well, have the best version of the software, but also beat every other AI company at all subtasks (like: technical writing, diagramming, bug finding -- they'd need to have the unequivocal "best model" in all categories).

Surely their version is not going to allow you to e.g. invoke Codex or what have you as part of their stack.

2 days ago

cyanydeez

They would also have to prevent access to the model from being used to beat the model.

2 days ago

gck1

I think Opus does, in fact, find the bugs the same way GPT xhigh (or even high) does. It just discards them before presenting them to the user.

Opus is designed to be a lazy, corner-cutting model. Reviews are just one place where this shows. In my orchestration loop, Opus discards many findings from GPT 5.4 xhigh, justifying this as pragmatism. Opus YAGNIs everything; GPT wants you to consider seismic events in your todo-list app. Sadly, there's nothing in between.

a day ago

sjdv1982

My fear is that this is going to lead to an optimal orchestration language: for example, that Claude switches to Sumerian for all communication between agents. It's one thing if they try to silo like that, but my real fear is that it may actually perform well.

(Not sure if it would be Sumerian, Esperanto or something more artificial. As long as it is esoteric enough for one company to hoard all the expertise in it.)

2 days ago

mrbungie

I've seen Antigravity outputting Chinese characters in its thinking traces from time to time.

I also remember Chinese being discussed as a potential orchestration language, but I don't remember the sources, so this is 100% anecdotal.

2 days ago

intothemild

Yeah, this has been my experience too: mixing agents/models from different companies.

Having Opus write a spec, then sending it to Gemini to revise, back to Opus to fix, then to me to read and approve.

Sending it to a local model like Qwen3.5 to build, then off to Opus to review...

This was such an amazing flow, until Anthropic decided to change their minds.
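That relay could be sketched roughly like this. The `ask` dispatcher is a hypothetical stand-in for per-provider API calls; each stage just threads the previous stage's output into the next model:

```python
# Sketch of a mixed-model relay: spec, revise, fix, build, review.
# `ask` is a hypothetical stub; a real version would dispatch to the
# matching provider SDK based on the model name.

def ask(model: str, instruction: str, doc: str) -> str:
    return f"[{model}] {instruction}: {doc}"


def pipeline(task: str) -> str:
    spec = ask("opus", "write a spec for", task)
    notes = ask("gemini", "revise this spec", spec)
    spec = ask("opus", "address this review", notes)
    # A human read-and-approve step would sit here.
    build = ask("qwen-local", "implement", spec)
    return ask("opus", "review this implementation", build)
```

The value is in the handoffs: each model only sees the artifact it is asked to act on, not the whole conversation.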

2 days ago

lbreakjai

This is still very much doable; it's exactly how I'm working. I'm using opencode with a mixture-of-agents setup I built (https://github.com/tessellate-digital/notion-agent-hive), where the model behind each agent is configurable.

2 days ago

gck1

You can still do all of this with tmux. Nothing Anthropic can do about that.

Gemini CLI is horrible, though.

a day ago

cyanydeez

I'm fairly certain these AI companies are lobsters in a bucket. Every time one of them produces a private model, the others will use access to that model to generate improvements, and then publish those improvements, as a way to hamstring any cornering of the market.

So that'll go on until they form a cartel and become the Wizard of Oz.

2 days ago

ziml77

Those agents did such a wonderful job making and deploying this page that the testimonials are unreadable because each spot has two of them overlapping.

2 days ago

adrian_b

That is also true for me with Firefox on Linux.

However, Vivaldi on Linux renders it correctly, so I assume that it probably works right only in Chrome and completely compatible browsers.

Unfortunately, there are still a lot of Web sites that work right in a single browser, usually Chrome, so I always must keep around at least 2 or 3 different browsers.

2 days ago

Revisional_Sin

I just have a black page.

2 days ago

jguetzkow

Interesting that the entire discussion here is about orchestration, vendor lock-in, and model selection, but nobody is asking about the output. These agents run for hours, write code across multiple repos, and open PRs autonomously.

Anthropic built solid infrastructure governance: sandboxing, scoped permissions, execution tracing. That's the "can the agent access this system safely?" question, answered. But there's a different question that nobody seems to be answering: is the generated code actually correct? Not syntactically (it almost always is), but semantically. Does it reference database fields that actually exist in the schema? Does it call API routes that are actually defined? Does it handle env variables that are actually set? Does it meet compliance requirements that apply to the system it's modifying?

45% of AI-generated code contains security vulnerabilities (Veracode). Code duplication has quadrupled (GitClear). And we're now scaling this with autonomous agents that run unsupervised for hours. The orchestration problem is getting solved. The governance-of-output problem is wide open. That's the layer that's missing.
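To make one such gate concrete, here's a toy sketch: flag generated code that references columns missing from the schema. The schema and the regex check are illustrative only; a real gate would parse the SQL/ORM layer properly rather than pattern-match:

```python
# Toy "governance of output" check: does generated code reference
# columns that exist in the schema? Hypothetical schema, naive matching.
import re

SCHEMA = {"users": {"id", "email", "created_at"}}


def referenced_columns(code: str, table: str) -> set[str]:
    # Naive: collect `table.column` style references.
    return set(re.findall(rf"\b{table}\.(\w+)", code))


def unknown_columns(code: str) -> dict[str, set[str]]:
    findings = {}
    for table, cols in SCHEMA.items():
        extra = referenced_columns(code, table) - cols
        if extra:
            findings[table] = extra
    return findings
```

Run against `"select users.email, users.last_login from users"`, this flags `last_login` as a column the schema doesn't have; the same idea extends to API routes, env variables, and compliance rules.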

a day ago

patrickkidger

I'm not sure if I'm about to be the old man yelling at clouds, but Anthropic seems to be 'AWS-ifying': an increasing suite of products which (at least to me) seem undifferentiated amongst themselves, and all drawn from the same roulette wheel of words.

We've got Claude Managed Agents, Claude Agent SDK, Claude API, Claude Code, Claude Platform, Claude Cowork, Claude Enterprise, and plain old 'Claude'. And honourable mention to Claude Haiku/Sonnet/Opus 4.{whatever} as yet another thing with the same prefix. I feel like it's about once a week I see a new announcement here on HN about some new agentic Claude whatever-it-is.

I have pretty much retreated in the face of this to 'just the API + `pi` + Claude Opus 4.{most recent minor release}', as a surface area I can understand.

2 days ago

esaym

The website is solid black on Firefox mobile for Android. Maybe they should get an agent on that.

2 days ago

jorl17

Anthropic's website is always completely broken for me on Zen (a Firefox derivative). I used to think it was an extension, but even without extensions it often just shows blank pages.

2 days ago

raggi

Same for me in Firefox and Chrome. I'm sure it's one of the DNS blocklists I have and some really crappy marketing-tracking code.

Edit: confirmed; it loads with a public DNS provider that has no blocklists.

2 days ago

ed_mercer

Works fine here, also on Firefox mobile (149.0.1) on Android 16.

2 days ago

JLO64

As someone who spins up docker containers where I use the Anthropic Agentic SDK to build Jekyll websites for customers, I don’t see much of an appeal. I didn’t find it that difficult to set up the infrastructure, the hard part was getting the agents to do exactly what I wanted. Besides, eventually I might want to transition away to another provider (or even self hosting) so I’d prefer having that freedom.

2 days ago

rick1290

Not quite sold on this. I'm going to stick with Pydantic AI and DBOS/Temporal/Celery. I do not want to be vendor-locked into one of these players; I want to work with absolutely any LLM I want. I think we need to keep pushing for best-in-class open-source orchestration and not get sucked into these platforms.

2 days ago

tailsdog

Looks great; I can't wait to use it. I imagine it could become very expensive for certain workflows. It will probably be like AWS: if you're not careful with the setup and watching what you're doing, it will spin up thousands of agents and rack up huge bills! It's going to be a massive money spinner!

2 days ago

dangoodmanUT

This was inevitable; I called it a few weeks ago [1]. It's an easy way to increase revenue without making the models smarter, and to lock you in harder.

https://danthegoodman.substack.com/p/where-agents-converge

2 days ago

codinhood

I wonder how long until Claude/OpenAI eat a lot of the current AI/agent SaaSes' lunch.

Originally I thought they would mainly stick to being model providers, but with all the recent releases it seems they do want to provide more "services."

I wonder what part of the market third-party apps will build a moat around.

2 days ago

spiderfarmer

I cloned a product today, covering the 20% of it my client actually needed. It took 8 hours and will save my client $2k a month in licensing fees. Plus, I can now add the features they were missing in the original product.

There's a lot of money to be made in small business automation right now.

2 days ago

ergocoder

Probably never. There are a couple of reasons:

1. We pay for SaaS so we don't have to manage it. If you vibe-code or use these AI things, then you are managing it yourself.

2. Most SaaS is $20-$100/month/person. For a software engineer, that's maybe <1h of pay.

3. Most SaaS requires some sort of human in the loop to check for quality (at least sampling). No users would want to do that.

Number 2 is the biggest reason. It's $20 a month... I'm not gonna replace that with anything.

Writing this message already costs more than $20 of my time.

I predict the market will get bigger, because people are more prone to automate the long-tail/last-mile stuff now that they are able to.

2 days ago

alwillis

> 1. We pay for SaaS so we don't have to manage it. If you vibe-code or use these AI things, then you are managing it yourself.

> 2. Most SaaS is $20-$100/month/person. For a software engineer, that's maybe <1h of pay.

    |Segment                   |Median Enterprise Price                   |
    |--------------------------|------------------------------------------|
    |Mid-market                |~$175/user/month                          |
    |Enterprise (<100 seats)   |~$470/seat/month implied (~$47K ACV)      |
    |Enterprise (100-500 seats)|~$312–$1,560/seat/month range (~$156K ACV)|

Enterprise contracts almost always include a platform fee on top of per-seat costs (67% of contracts), plus professional services that add 12–18% of first-year revenue.

So for a lot of companies, it's worth using AI to create a replacement.

2 days ago

mrbungie

> So for a lot of companies, it's worth using AI to create a replacement.

I'll add the nuance that those might be big companies with slack capacity, or at least firms already at a point on their effort/performance curve where marginal effort spent on their core business isn't worth enough (a point that would actually be weird to reach without being a big company). Even with AI, and even as processes become more efficient, effort is at a premium, and depending on your firm's situation, a man-hour spent on your core business might be a better use of effort and time than spending it on non-core services.

2 days ago


codinhood

Interesting. So you're saying Anthropic/OpenAI/etc. will offer a general solution that won't be hands-off, and the moat for other companies will be creating the specific, managed solution.

I can see that, assuming models don't make some giant leap forward.

2 days ago

spiderfarmer

Your vision on the market for this is skewed by the fact that you're probably overpaid.

2 days ago


suncemoje

I am wondering if this is the last piece of software + infra needed to "automate it all" and let non-technical people build, run, and maintain it (and if not, what else will be?). To me, all this agentic workflow automation is headed that way. Am I missing something?

a day ago


lambdanodecore

The next $100B business model in 2026 is AaaS (Agents as a Service).

2 days ago

dennisy

Let’s just shorten it to AaS?

2 days ago

woah

agentic software services

2 days ago

marknutter

Agent Solution Services

2 days ago

_pdp_

I suspect this is effectively programmatic access to the same infrastructure used by Claude Desktop when it needs to run jobs in the cloud on Anthropic's servers, with added configurability and observability.

In other words, it is designed for companies to build on top of the Anthropic platform. For example, you're a SaaS and you want a way to run agents programmatically for your customers: they basically offer a solution. It is not for personal use, although you can certainly use it that way if you are prepared to pay the API price.

The downside, obviously, is that this is locked to Anthropic models.

The other downside is that the authentication story at the moment is underwhelming, hacky, and, dare I say, insecure. I have a few reservations.

We already have this platform, and I am putting together an open-source example of how to create your own version of this.

Anthropic models are great, but there are plenty of open-source models too, and frankly agents do not need to run like Claude Code in order to be successful at whatever they need to do. The agent architecture depends entirely on the problem domain, in my experience.

2 days ago

siva7

> With Managed Agents, you define outcomes and success criteria, and Claude self-evaluates and iterates until it gets there (available in research preview, request access here). It also supports traditional prompt-and-response workflows when you want tighter control.

Call me stupid, but this doesn't sound like they want software developers to be around in a year or two.

2 days ago

baal80spam

But that's exactly what Dario Amodei (Anthropic CEO) wants.

2 days ago

Sol-

In addition to the managed interface for agent configuration and so on, is the novelty that all the agents run on Anthropic's infra, sort of like Claude Code on the Web? If so, it's interesting that they're moving up the stack, from a provider of an intelligence API to more complex deployed products.

2 days ago

llmslave

This is going to grow into a sophisticated platform, and it's what will eventually compete head-on with SaaS. I don't think companies will build their own agents, aside from looping in tools. As the models improve, there will be less hand-holding. This could end up competing with AWS/GCP.

2 days ago

mrbungie

They need to offer more nines of availability before this happens, though.

2 days ago

lurker919

Exactly my thoughts. AWS is due for a ground-up rewrite from first principles to be able to fully utilize LLMs/agentic capabilities.

2 days ago

llmslave

Yeah, a lot of the services don't make as much sense.

2 days ago

siva7

What exactly makes you think that AWS & co. don't already have two competing Agents-as-a-Service platforms at any given time?

2 days ago

llmslave

Anthropic is very far ahead on agentic engineering. There is more to getting it to work than it looks, and their models might be directly trained to use the Claude Code harness.

But beyond that, AWS is a very complex platform. Agents simplify SaaS: the agent itself manages the API calls, maybe the database queries, more of the logic. As software moves into the agent, you need less cloud capability and a better agent harness/hosting. Essentially, this makes most of the AWS platform obsolete; most services make much less sense.

2 days ago

aoliveira

They keep calling this the first solution of its kind. Obviously Anthropic is a much larger company, but https://smith.langchain.com/ has had this for a while. Or am I missing something?

2 days ago

yalogin

This is actually really nice from Anthropic. They are aggressively taking ownership of the entire development stack for every SWE, becoming the default development platform. Automatic recurring revenue too, and I am sure they will come up with more subscription categories.

2 days ago

baq

I assume Mythos, if ever released to the wide public at all, will only be available in the Claude cloud harness. (Not counting special enterprise and government contracts naturally.)

2 days ago

schappim

Folks suspect Mythos is an unaligned Opus 4.7.

2 days ago

2001zhaozhao

It's probably not; they're hiking the price to 5x for companies with access to it (or 1.67x of Opus 4.1).

2 days ago

Kim_Bruning

API tokens only. It does allow MCP, so you're not as tied in as you might think. But I don't think mere mortals can really run many sorts of agents on API tokens.

2 days ago

azmz

MCP helps, but you still need someone to set up the servers and manage credentials. I've been building Atmita (atmita.com) to close that gap: it handles all the OAuth and app connections in the cloud, so users just describe what they want automated. It works well for things like daily briefings, email management, and social media scheduling.

a day ago


bnchrch

Happy to see this launched, particularly today.

I own a stake in a small brewery in Canada, and this feature just saved me from setting up infrastructure to "productionize" an agent we created to assist with ordering, invoicing, and government document creation.

I get paid in beer and vibes for projects like these, so the more I can ship these projects in the same place I prototype them the better.

(Also don't worry all, still have SF income to buy food for my family with)

2 days ago

SpaceManNabs

I get paid in vibes and chilling as well for some similar agent stuff I do for content creators.

Quick question: how do you manage these side projects that kinda need to be production-ready but aren't your actual SF job, lol?

Some of these people think they are my actual customer/client, but I do it for fun and to help them out.

2 days ago

emvideo

As a video content creator, I'm curious: would you mind sharing the agentic stuff you're doing for others?

2 days ago

thewhitelynx

What's the open source alternative?

2 days ago

Tarcroi

As I mentioned in another comment here, I've been working on an open-source alternative: multi-model, 5 providers with fallback. Happy to share the repo if you're interested.

a day ago

woah

Are they entering their OpenAI throw-shit-at-the-wall phase?

2 days ago

datadrivenangel

And now OpenClaw is dead because serious people have a less janky option!

2 days ago

htrp

Reminder that Anthropic's goal is to sell you more tokens...

2 days ago

xij

Yes, exactly. That's why their harness has no incentive to help you save tokens. Just a conflict of interest.

2 days ago


lifecodes

Managed Agents sounds like progress, but also like we're standardizing around the current limitations instead of solving them.

2 days ago