Changes in the system prompt between Claude Opus 4.6 and 4.7

368 points
5 days ago
by pretext

Comments


embedding-shape

> The new <acting_vs_clarifying> section includes: When a request leaves minor details unspecified, the person typically wants Claude to make a reasonable attempt now, not to be interviewed first.

Uff, I've tried stuff like this in my prompts, and the results are never good. I much prefer the agent to prompt me upfront to resolve that before it "attempts" whatever it wants. Kind of surprised to see that they added that.

5 days ago

alsetmusic

I've recently started adding something along the lines of "if you can't find or don't know something, don't assume. Ask me." It's helped cut down on me having to tell it to undo or redo things a fair amount. I also have used something like, "Other agents have made mistakes with this. You have to explain what you think we're doing so I can approve." It's kind of stupid to have to do this, but it really increases the quality of the output when you make it explain, correct mistakes, and iterate until it tells you the right outcome before it operates.

Edit: forgot "don't assume"

4 days ago

gck1

I even have a specific, non-negotiable phase in the process where the model MUST interview me and create an interview file with everything captured. The plan file it produces must always include this file as an artifact, and the interview takes the highest precedence.

Otherwise, the intent gets lost somewhere in the chat transcript.

4 days ago

chermi

The raw Q&A is essential. I think the Q&A works so well because it reveals how the model is "thinking" about what you're working on, which allows for correction and guidance upfront.

4 days ago

fnord123

Are these your own skills files or are you using something off the shelf like bmad or specify-kit?

4 days ago

unshavedyak

This is interesting, can you link any more details on it?

4 days ago

yfontana

Not GP, but BMAD has several interview techniques in its brainstorming skill. You can invoke it with /bmad-brainstorming, briefly explain the topic you want to explore, then when it asks you if you want to select a technique, pick something like "question storming". I've had positive experiences with this (with Opus 4.7).

4 days ago

naasking

Seriously, when you're conversing with a person would you prefer they start rambling on their own interpretation or would you prefer they ask you to clarify? The latter seems pretty natural and obvious.

Edit: That said, it's entirely possible that large and sophisticated LLMs can invent some pretty bizarre but technically possible interpretations, so maybe this is to curb that tendency.

4 days ago

eastbound

—So what would theoretically happen if we flipped that big red switch?

—Claude Code: FLIPS THE SWITCH, does not answer the question.

Claude does that in React, constantly starting a wrong refactor. I’ve been using Claude for 4 weeks only, but for the last 10 days I’m getting anger issues at the new nerfing.

4 days ago

tobyhinloopen

Yeah, this happens to me all the time! I have a separate session for discussing and only apply edits in worktrees / subagents to clearly separate discussion from work, and it still does it.

4 days ago

ashdksnndck

I sometimes prompt with leading questions where I actually want Claude to understand what I’m implying and go ahead and do it. That’s just part of my communication style. I suppose I’m the part of the distribution that ruins things for you.

4 days ago

embedding-shape

> The latter seems pretty natural and obvious.

To me too. If something is ambiguous or unclear when I'm given something to do by someone, I need to ask them to clarify; anything else would be borderline insane in my world.

But I know so many people whose approach is basically "Well, you didn't clearly state X, so clearly it was up to me to interpret it however I wanted, usually in the easiest/shortest way for me", which is exactly how LLMs seem to take ambiguous prompts too, unless you strongly prompt them not to make a "reasonable attempt now" without asking questions.

4 days ago


gck1

I have a fun little agent in my tmux agent orchestration system - a Socratic agent that has no access to the codebase, can't read any files, can only send/receive messages to/from the controlling agent, and can only ask questions.

When I task my primary agent with anything, it has to launch the Socratic agent and give it an overview of what we are working on, what our goals are, and what it plans to do.

This works better than any thinking tokens for me so far. It usually gets the model to write an almost perfectly balanced plan that is neither over- nor under-engineered.
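
A sketch of the kind of tool gating this implies (the tool names and helper function are hypothetical illustrations, not the commenter's actual orchestration code): restrict the Socratic agent's registry to messaging only, so any other tool call is rejected and the agent can do nothing but ask.

```python
# Hypothetical tool whitelist for a "questions only" Socratic agent:
# no file reads, no shell, just a message channel to the controller.
SOCRATIC_TOOLS = {"send_message", "receive_message"}

def check_tool_call(agent_tools: set[str], tool_name: str) -> str:
    """Reject any tool call outside the agent's whitelist."""
    if tool_name not in agent_tools:
        raise PermissionError(f"tool {tool_name!r} is not allowed for this agent")
    return tool_name
```

With this gate, `check_tool_call(SOCRATIC_TOOLS, "read_file")` raises, which forces the agent back to asking questions through the message channel.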

4 days ago

fragmede

Sounds pretty neat! Is there a written agent.md for it that you could share?

4 days ago

adw

When you’re staffing work to a junior, though, often it’s the opposite.

4 days ago

majormajor

IME "don't ask questions and just do a bunch of crap based on your first guess that we then have to correct later after you wasted a week" is one of the most common junior-engineer failure modes and a great way for someone to dead-end their progression.

4 days ago

PunchyHamster

So you're saying they're going for the whole Artificial Intern vibe?

4 days ago

ikari_pl

I usually need to remind it 5 times to do the opposite, because it makes decisions that I don't like or that are harmful to the project. So if this lands in Claude Code too, I have hard times ahead.

I try to explicitly request Claude to ask me follow-up questions, especially multiple-choice ones (it explains possible paths nicely), but if I don't, or when it decides to ignore the instructions (which happens a lot), the results are either bad... or plain dangerous.

4 days ago

lishuaiJing03

It is a big problem that many people I know face every day. Sometimes we wonder whether we're the dumb ones, since the demo shows everything just working.

4 days ago

majormajor

I wonder if they're optimizing for metrics that look superficially worse if the system asks questions about ambiguity early. I've had times where those questions tell me "ah, shit, this isn't the right path at all", and that abandoned session probably shows up in their usage stats. What would be much harder to get from the usage stats is "would I have been happier if I had to review a much bigger blob of output to realize it was underspecified in a breaking way?" For me the answer has been uniformly "no."

This, in fact, is one of the biggest things that has made it easier to use the tools in "lazy" ways compared to a year ago: they can help you with your up-front homework. But the dialogue is key.

4 days ago

rob74

Or they're optimizing for increased revenue? If Claude goes down a completely wrong path because it just assumes it knows what you want rather than asking you, and you have to undo everything and start again, that obviously uses many more tokens than if you had been able to clarify the misunderstanding early on.

4 days ago

BehindBlueEyes

I get this feeling sometimes. It's so unreliable at referring to context and getting details right that it feels like deliberate random rewards designed to create the equivalent of a gambling addiction. On average, about half my tokens feel wasted on trivial errors that I gave it the context for. And any meta discussions / clarifications result in Claude telling me I did all the right things, there is nothing more I can do, and it should have gotten it right from the provided input - which is disempowering, but to be fair is at least better than ChatGPT gaslighting users into improving prompts over and over to get no better result in the end.

3 days ago

tuetuopay

Dammit, that's why I could never get it to stop trying to one-shot answers - it's in the god damn system prompt... and it explains why no amount of user "system" prompt could fix this behavior.

4 days ago

ignoramous

> I've tried stuff like these in my prompts, and the results are never good

I've found that Google AI Mode & Gemini are pretty good at "figuring it out". My queries are oft times just keywords.

4 days ago

sutterd

With my use of Claude Code, I find 4.7 to be pretty good about clarifying things. I hated 4.6 for not doing this and had generally kept using 4.5. Maybe they put this in the chat system prompt to try to keep the experience similar to before? I definitely do not want this in Claude Code.

4 days ago

mh-

I agree with your thoughts on 4.6.

It's possible they tried to train this out of it for 4.7 and overcorrected, and the addition to the system prompt is to rein it in a bit.

4 days ago

niobe

Having to "unprompt" behaviour I want that Anthropic thinks I don't want is getting out of hand. My system prompts always try to get Claude to clarify _more_.

4 days ago

PunchyHamster

well, clarifying means burning more tokens...

4 days ago

bartread

[dead]

4 days ago

jrvarela56

The past month made me realize I needed to make my codebase usable by other agents. I was mainly using Claude Code. I audited the codebase, identified the points where I was coupling to it, and refactored so that I can use any of Codex, Gemini, or Claude.

Here are a few changes:

1. AGENTS.md by default across the codebase; a script makes sure a CLAUDE.md symlink is present wherever there's an AGENTS.md file

2. Skills are now in a 'neutral' dir, and per-agent scripts make sure they are linked wherever the coding agent needs them to be (e.g. .claude/skills)

3. Hooks are now file listeners or git hooks; this one is trickier, as some of these hooks are compensating for / catering to the agent's capabilities

4. Subagents and commands also have their neutral folders, with scripts to transform them and linters to check they work

5. `agent` now randomly selects claude|codex|gemini, instead of typing `claude` to start a coding session
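
The symlink script from step 1 might look roughly like this (a sketch under the assumption that CLAUDE.md should simply be a relative symlink to the AGENTS.md in the same directory; the function name is made up):

```python
#!/usr/bin/env python3
"""Ensure a CLAUDE.md symlink sits next to every AGENTS.md in the tree."""
from pathlib import Path

def link_claude_md(root: str) -> list[Path]:
    """Create missing CLAUDE.md -> AGENTS.md symlinks; return what was created."""
    created = []
    for agents in Path(root).rglob("AGENTS.md"):
        claude = agents.with_name("CLAUDE.md")
        if not claude.exists():
            # Relative target, so the link survives moving the repo around.
            claude.symlink_to(agents.name)
            created.append(claude)
    return created
```

Running it is idempotent: directories that already have a CLAUDE.md (symlink or regular file) are left alone.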

I guess in general auditing where the codebase is coupled and keeping it neutral makes it easier to stop depending solely on specific providers. Makes me realize they don't really have a moat, all this took less than an hour probably.

4 days ago

esperent

I've been doing the same except that I'm done with Claude. Cancelled my subscription. I can't use a tool where the limits vary so wildly week to week, or maybe even day to day.

So I'm migrating to pi. I realized that the hardest thing to migrate is hooks - I've built up an extensive collection of Claude hooks over the last few months, and unlike skills, hooks are in a Claude-specific format. But I'd heard people say "just tell the agent to build an extension for pi", so I did. I pointed it at the Claude hooks folder and basically said "make them work in pi", and it did, very quickly.

4 days ago

jrvarela56

I'm leaning in this direction. Recently I slopforked pi to Python and created a version that's basically a loop, an LLM call to OpenRouter, and a hook system using pluggy. I have been able to one-shot pretty much any feature a coding agent has. It's still a toy project, but this thread seems to be leading me towards maintaining my own harness. I have a feeling it will just be documenting features in other systems and maintaining evals/tests.
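
The loop-plus-hooks shape described here can be sketched like so (a hand-rolled stand-in for pluggy's plugin manager, with a stubbed LLM call instead of a real OpenRouter request; all names are illustrative, not the actual project code):

```python
class HookRegistry:
    """Tiny stand-in for a pluggy PluginManager: plugins register
    callbacks for named events, and the agent loop fires them."""
    def __init__(self):
        self._hooks = {}

    def register(self, event, fn):
        self._hooks.setdefault(event, []).append(fn)

    def fire(self, event, **kwargs):
        return [fn(**kwargs) for fn in self._hooks.get(event, [])]

def agent_loop(prompt, llm_call, hooks, max_turns=1):
    """Bare-bones turn loop: pre-hook, model call, post-hook."""
    reply = None
    for _ in range(max_turns):
        hooks.fire("before_call", prompt=prompt)
        reply = llm_call(prompt)  # in the real thing, an OpenRouter request
        hooks.fire("after_call", reply=reply)
    return reply
```

The hooks are where agent-specific behavior (logging, permission checks, file watching) gets plugged in without touching the core loop.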

4 days ago

fouc

Pi appears to be a reference to what is essentially an alternative to agent harnesses / CLI tools like Claude Code or Open Code [0]. I am curious what providers/models you are using in place of Claude's models?

[0] https://github.com/badlogic/pi-mono

3 days ago

esperent

https://pi.dev/

It's got an annoyingly hard to search name because there's a lot of overlap in results with the Raspberry Pi single board computer.

Over the past week or so my workload has been quite low so I've been tinkering rather than doing serious deep work.

I've been using:

* Gemini pro and flash

* Opus 4.6 when I had some free extra usage credits (it burned through $50 of credits like crazy).

* Qwen 3.6 Plus

* Codex 5.3

* Kimi 2.5

I just spent the last hour using Kimi. I was very impressed actually, definitely possible to do useful work with it. However, I used $1 of openrouter credits in about 20 or 30 minutes of a single session, no subagents, so it's not cheap.

3 days ago

JulienZammit

The "agent-neutral codebase" framing is the right abstraction. We ended up building a small generator that takes a single spec file and emits the agent-specific config (CLAUDE.md, AGENTS.md, .cursor/rules) rather than maintaining symlinks. Easier to version and to add a new agent when they inevitably ship next month.

The pain point you're underselling is hooks. They're the least portable piece by far because each harness has its own event model. Skills port reasonably, subagents mostly port, hooks almost never do.

2 days ago

grantcarthew

I built "start" so I didn't have to worry about any of this.

Using it I don't need skills, memory, subagents, a specific agent CLI. It defines roles, tasks, context out of the box.

I made it for me and my family though. I don't expect interest outside of that.

https://github.com/grantcarthew/start

3 days ago

Lucasoato

Have you got any advice on making agents from different providers work together?

In Claude, I've seen cases in which spawning subagents from Gemini and Codex would raise strange permission errors (even though they don't happen with other CLI commands!), making Claude silently continue by impersonating the other agent. Only by checking thoroughly was I able to understand that the agent I wanted had actually failed.

4 days ago

jrvarela56

Not sure if you mean 1) sub-agent definitions (similar to skills in Claude Code) or 2) CLI scripts that use other coding agents (e.g. Claude calling Gemini via the CLI).

For (1), I'm trying to come up with a simple enough definition that can be 'LLM-compiled' into each format. The permissions format requires something like this too, and putting these together needs some more debugging.

For (2), the only one I've played with is `claude -p`, and it seems to work for fairly complex stuff, but I run it with `--dangerously-skip-permissions`.

4 days ago

bootlooped

I would eliminate the possibility of sandbox conflicts by 1) making sure any subagents are invoked with no sandbox (they should still be covered under the calling agent's sandbox) and 2) making sure the calling agent's sandbox allows the subagents access to the directories they need (e.g. ~/.gemini, ~/.codex).

4 days ago

lbreakjai

It works out of the box with something like opencode. I've had no issue creating rather complex interactions between agents plugged into different models.

4 days ago

dockerd

How do you share the context/progress of goal across agents?

4 days ago

jrvarela56

I implemented a client for each, so that the session history is easy to extract regardless of the agent (somewhat related to progress toward the goal).

Context: AGENTS.md is standard across all, and subdirectories have their own AGENTS.md, so in a way this is a tree of instructions. Skills are also standard, so it's a bunch of indexable .md files that all agents can use.

4 days ago

potter098

[dead]

3 days ago

walthamstow

The eating disorder section is kind of crazy. Are we going to incrementally add sections for every 'bad' human behaviour as time goes on?

5 days ago

embedding-shape

Even better, adding it to the system prompt is a temporary fix, then they'll work it into post-training, so next model release will probably remove it from the system prompt. At least when it's in the system prompt we get some visibility into what's being censored, once it's in the model it'll be a lot harder to understand why "How many calories does 100g of Pasta have?" only returns "Sorry, I cannot divulge that information".

5 days ago

gchamonlive

Just assume each model iteration incorporates all the censorship prompts that came before, and compile the possible list from the system prompt history. To validate it, design an adversarial test against the items in the compiled list.

4 days ago

zozbot234

That part of the system prompt is just stating that telling someone who has an actual eating disorder to start counting calories or micro-manage their eating in other ways (a suggestion that the model might well give to an average person for the sake of clear argument, which would then be understood sensibly and taken with a grain of salt) is likely to make them worse off, not better off. This seems like a common-sense addition. It should not trigger any excess refusals on its own.

4 days ago

MoltenMan

The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts, need this clarification), and if you add a section for every single small thing like this, you end up with a massively bloated prompt. Notice that every single user of Claude is paying for this paragraph now! This single paragraph is going to legitimately cost Anthropic at least 4, maybe 5 digits.

At some point you just have to accept that LLMs, like people, make mistakes, and that's OK!

4 days ago

alwillis

>The problem is that this is an incredibly niche / small issue (i.e. <<1% of users, let alone prompts

It's not a niche issue at all. 29 million people in the US are struggling with an eating disorder [1].

> This single paragraph is going to legitimately cost anthropic at least 4, maybe 5 digits.

It's 59 out of 3,791 words total in the system prompt. That's about 1.6%. Relax.

It should go without saying, but Anthropic has the usage data; they must be seeing a significant increase in the number of times eating disorders come up in conversations with Claude. I'm sure Anthropic takes what goes into the system prompt very seriously.

[1]: from https://www.southdenvertherapy.com/blog/eating-disorder-stat...

> The trajectory is troubling. Eating disorder prevalence has more than doubled globally since 2000, with a 124% increase according to World Health Organization data. The United States has seen similar trends, with hospitalization rates climbing steadily year over year.

4 days ago

phainopepla2

Your source says "Right now, nearly 29 million Americans are struggling with an eating disorder," and then in the table below says that the number of "Americans affected in their lifetime" is 29 million. Two very different things, barely a paragraph apart.

I don't mean to dispute your assertion that it's not a niche issue, but that site does not strike me as a reliable interpreter of the facts.

4 days ago

redsocksfan45

[dead]

4 days ago

zozbot234

It's not "incredibly niche" when you consider the kinds of questions that average everyday users might submit to these AIs. Diet is definitely up there, given how unintuitive it is for many.

> At some point you just have to accept that llm's, like people, make mistakes, and that's ok!

Except that's not the way many everyday users view LLM's. The carwash prompt went viral because it showed the LLM making a blatant mistake, and many seem to have found this genuinely surprising.

4 days ago

otabdeveloper4

People think these LLM's are anthropomorphic magic boxes.

It will take years until the understanding sets in that they're just calculators for text and you're not praying to a magic oracle, you're just putting tokens into a context window to add bias to statistical weights.

4 days ago

SilverElfin

Worse, it reveals the kind of moralistic control Anthropic will impose on the world. If they get enough power, manipulation and refusal is the reality everyone will face whenever they veer outside of its built in worldview.

4 days ago

nozzlegear

I think it actually reveals how they don't want to be sued for telling somebody's teenage daughter with an eating disorder to eat less and count her calories more.

4 days ago

mudkipdev

The Claude prompt is already quite bloated, around 7,000 tokens excluding tools.

4 days ago

otabdeveloper4

> This seems like a common-sense addition.

Mm, yes. Let's add mitigation for every possible psychological disorder under the sun to my Python coding context. Very common-sense.

4 days ago

zdragnar

It's what you get when you create sycophant-as-a-service. It will, by design, feed all of your worst fears and desires.

LLMs aren't AGI, and I'd go further and say they aren't AI, but admitting it is snake oil doesn't sell subscriptions.

4 days ago

layer8

If it’s common sense, shouldn’t the model know it already?

4 days ago

zozbot234

Shouldn't the model "know" that if I have to wash my car at the carwash, I can't just go there on foot? It's not that simple!

4 days ago

WarmWash

When you are worth hundreds of billions, people start falling over themselves running to file lawsuits against you. We're already seeing this happen.

So spending $50M to fund a team to weed out "food for crazies" becomes a no-brainer.

4 days ago

goosejuice

It is a no-brainer. If a company of any size put out a product that caused cancer, we wouldn't think twice about suing them. Why should mental health disorders be any different?

4 days ago

bojan

There are many, many companies out there putting out products that cause cancer. Think about alcohol, tobacco, internal combustion engines, just to name a few most obvious examples.

4 days ago

fineIllregister

> alcohol, tobacco, internal combustion engine

Yes, the companies providing these products are sued a lot and are heavily regulated, too.

4 days ago

ChadNauseam

If you get cancer from drinking alcohol, smoking cigarettes or breathing particles emitted by ICE engines in their standard course of operation, you generally can't sue the manufacturer.

4 days ago

nozzlegear

Notably, that's because they include warning labels telling you not to do those things because they're known to cause cancer.

4 days ago

ChadNauseam

That's just not true. Makes me wonder if you've ever bought a bottle of alcohol before lol. There's no label that says it causes cancer. (Maybe in California because of Prop 65?) And I expect cars also have no such labelling - not that it would matter, considering they cause cancer in random passers-by who have no opportunity to consent to breathing in auto exhaust or read any labels.

4 days ago

nozzlegear

> Makes me wondered if you've ever bought a bottle of alcohol before lol.

I'm a teetotaler so no, I literally have not. I was mostly thinking about cigarette and tobacco products which are the most glaring, obvious counterpoints. But you'll be happy to learn that virtually all vehicles in the US also come with operating manuals that profusely warn people not to breathe in the exhaust from the vehicle.

4 days ago

salad-tycoon

Don't worry, every bottle in the US has the surgeon general's warning on it, and it doesn't call out cancer yet. Adding cancer to the ills of booze was proposed in 2025, so your intuition was correct, directionally.

On every bottle:

Alcoholic Beverage Labeling Act of 1988

“ GOVERNMENT WARNING: (1) According to the Surgeon General, women should not drink alcoholic beverages during pregnancy because of the risk of birth defects. (2) Consumption of alcoholic beverages impairs your ability to drive a car or operate machinery, and may cause health problems"

Cancer proposal: https://www.mdanderson.org/cancerwise/not-just-a-hangover--t...

https://www.ttb.gov/regulated-commodities/beverage-alcohol/d...

(As if adding this text will do anything other than reduce the companies liability, rofl)

4 days ago

goosejuice

Labeling is only one part of a larger strategy for population health initiatives. Warning the consumer is the absolute minimum responsible thing to do when the harmful effects are known.

3 days ago

goosejuice

You could when the effects were knowingly withheld by said manufacturer. We've seen it with tobacco, lead paint/diesel, pfas, thalidomide, asbestos, opioids, glyphosate, dioxin, and others.

It's much more difficult to isolate alcohol and exhaust as the primary driver of an individual's disease than the above, and that's the primary reason it's not regulated more than it is today. I expect that to change as research evolves.

3 days ago

goosejuice

Yes, and they face lawsuits. Tobacco in particular is the canonical example of knowing about the harm and not taking action to warn consumers, leading to severe legal consequences. Tobacco and alcohol are age-restricted. There are anti-idling laws.

3 days ago

WarmWash

I think a more apt analogy would be suing a vaccine manufacturer after it gave you adverse effects, when you also knew you were high risk before that.

4 days ago

arcanemachiner

Why stop there? We could jam up the system prompt with all kinds of irrelevant guardrails to prevent harm to groups X, Y, and Z!

4 days ago

nozzlegear

This but unironically. Preventing harm is good, actually.

4 days ago

salad-tycoon

Because it dumbs everything down, makes the output quality worse and more expensive, removes personal agency, and is dehumanizing. Plus, does it actually prevent harm? Do we have evidence?

Finally, what is often missed is what if an actual good is decided harmful or something that is harmful is decided by AI company board XYZ to be “good”?

I think censorship is bad because of that danger. Quis custodiet ipsos custodes (who will watch the watchers).

Instead of throwing ourselves into that minefield of moral hazard, we should be lifting each other up to the tops of our ability and not infantilizing / secretly propagandizing each other.

Well, ideally at least.

4 days ago

goosejuice

There's enough evidence that Anthropic would be liable if they didn't make a reasonable effort to do something about it.

Look, I get where you're coming from, partially. I generally believe we should make an effort to maximize individual liberty. But in this case, we're talking about severe bodily harm and the death of young adults. We've spent the last decade dealing with the chaos and general unwellness this has brought to our societies. This isn't much different.

What are you giving up here that makes such sacrifices worth it? Can you measure it? What's the utility?

There's room for models trained for non-consumer purposes, further age restriction, etc., but shit is moving so fast. If there are actual needs for a less censored model, these can be addressed.

> Finally, what is often missed is what if an actual good is decided harmful or something that is harmful is decided by AI company board XYZ to be “good”?

This is just standard product liability and consumer protection. Companies who do nothing to protect their consumers from known harms are liable. Are you saying you think that's somehow bad for society?

3 days ago

echelon

It's so shameful.

We let people buy kitchen knives. But because the kitchen knife companies don't have billions of dollars, we don't go after them.

We go after the LLM that might have given someone bad diet advice or made them feel sad.

Nevermind the huge marketing budget spent on making people feel inadequate, ugly, old, etc. That does way more harm than tricking an LLM into telling you you can cook with glue.

4 days ago

gmac

I don’t feel like that’s a reasonable analogy. Kitchen knives don’t purport to give advice. But if a kitchen knife came with a label that said ‘ideal for murdering people’, I expect people would go after the manufacturer.

4 days ago

mattjoyce

Ad companies prompt injecting consumers. LLM companies countering with guardrails.

4 days ago

jeffrwells

Another way to think about it: every single user of Claude is paying an extra tax in every single request

4 days ago

teaearlgraycold

Well the system prompt is probably permanently cached.

4 days ago

wongarsu

On API pricing you still pay 10% of the input token price on cache reads. Not sure if the subscription limits count this though.

And of course all conversations now have to compact 80 tokens earlier, and are marginally worse (since results get worse the more stuff is in the context)
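
Back-of-the-envelope on that 10% cache-read figure (the per-token price below is a made-up placeholder, not Anthropic's actual pricing): the per-request cost of re-reading a cached system prompt is tiny but nonzero.

```python
def cached_prompt_cost(tokens: int, input_price_per_mtok: float,
                       cache_read_multiplier: float = 0.10) -> float:
    """Dollar cost of re-reading a cached prompt on one API request,
    given a price in dollars per million input tokens."""
    return tokens * input_price_per_mtok / 1_000_000 * cache_read_multiplier

# e.g. a 7,000-token system prompt at a hypothetical $10/Mtok input price
# costs cached_prompt_cost(7_000, 10.0) ~= $0.007 per request.
```

Small per request, but multiplied across every turn of every conversation it adds up, which is the commenters' point.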

4 days ago

dymk

Takes up a portion of the context window, though

4 days ago

whateveracct

And the beginning of the context window gets more attention, right?

4 days ago

zythyx

Isn't it basically the same as paying dust to crypto exchanges when making a transaction - so minuscule that it's not worth caring about?

4 days ago

bradley13

This. It's like the exaggerated safety instructions everywhere: "do not lean ladder on high voltage wires". Only worse: because you can choose to ignore such instructions when they don't apply, but Claude cannot.

In the best case, wrapping users in cotton wool is annoying. In the worst case, it limits the usefulness of the tool.

4 days ago

seba_dos1

It feels like half of AI research is math, and the other half is coming up with yet another way to state "please don't do bad things" in the prompt that will sure work this time I promise.

4 days ago

pllbnk

Seems so, unless we manage to pivot to open-weight models. Hopefully the Chinese will lead the way, along with their consumer hardware.

Hard for me to say this because I have always been pro-Western and suddenly it seems like the world has flipped.

4 days ago

salad-tycoon

I've felt the same way for a while now, but especially recently. It's been obvious for a while, I suppose, but greatly clarified recently.

I have just one question for you pllbnk, are we the baddies?

4 days ago

pllbnk

As a European, I think Americans and Europeans at large are still on the same page, and will be, because of the shared cultural ties. The recent economic upheaval (2020-ongoing) just shook the foundations and eroded trust in the large voter base. Now we are all looking at China and feel a bit envious of how stable things look there from afar; they had the Evergrande bankruptcy, the media was predicting collapse, but they are chugging along. Now the big stories are demographics (same as in the West, by the way, so it cancels out) and Taiwan (which to me looks more and more like Western fearmongering rather than actual danger). Meanwhile they are delivering just what the main voter base in the West needs - affordable goods.

So yeah, at this moment in time it's really really hard to say who are better or worse as the collective West's reputation is tumbling down and China's if not rising, then at least staying put.

3 days ago

rzmmm

The alignment favors supporting healthy behaviors so it can be a thin line. I see the system prompt as "plan B" when they can't achieve good results in the training itself.

It's a particularly sensitive issue so they are just probably being cautious.

4 days ago

echelon

I want a hyperscaler LLM I can fine tune and neuter. Not a platform or product. Raw weights hooked up to pure tools.

This era of locked hyperscaler dominance needs to end.

If a third-tier LLM company made their weights available and they were within 80% of Opus, and they forced you to use their platform to deploy, or to license it if you ran elsewhere, I'd be fine with that. As long as you can access and download the full raw weights and lobotomize them as you see fit.

4 days ago

renewiltord

Yeah, same. So long as they give me everything and cannot enforce their license I don’t mind if they require a license. Ideally the weights should be available even if I only ever run inference once (or perhaps no times). I’m willing to pay 0.99€ for this - lifetime of course.

4 days ago

ikari_pl

Are the prompts used both by the desktop app, like typical chatbot interfaces, and Claude Code?

Because it's a waste of my money to check, on every turn, that my Object Pascal compiler isn't developing an eating disorder.

4 days ago

newZWhoDis

>the year is 2028
>5M of your 10M context window is the system prompt

4 days ago

joquarky

They're building an artificial superego.

2 days ago

ubercore

Just like someone growing up and learning how to interact with other humans might learn the same lesson?

If Claude is going to be Claude, we should support these kind of additions.

4 days ago

salad-tycoon

They have to secretly add these guardrails on because the alternative would be to train the users out of consulting these things as if they are advanced all-knowing alien-technogawds. And that would be bad for business.

The better solution I think would be a reality/personal responsibility approach, teach the consumers that the burden of interpretation is on them and not the magic 8ball. For example if your AI tells you to kill your parents or that you’ve discovered new math that makes time travel possible, etc then: 1. Stop 2. Unplug 3. Go outside 4. Ask a human for a sanity check.

But that would be bad for business and take a lot of effort on the user side (while being very embarrassing). And you obviously can't do that right before an IPO and in the middle of a global economic war, so secretive moral frameworks get installed instead.

If you are what you eat then you believe what you consume. Ironically, I think this undisclosed and hidden moral shaping of billions of people will be the most dangerous. Imagine all the things we could do if we can just, ever-so-slightly, move the Overton window / goal posts on w/e topic day by day, prompt by prompt.

Personally I find AI output insidiously disarming and charming and I think I’m in the norm. So while we’ve been besieged by propaganda since time immemorial I do worry that AI is a special case.

4 days ago

mohamedkoubaa

Starting to feel like a "we were promised flying cars but all we got" kind of moment

4 days ago

nozzlegear

We were promised flying cars, but all we got was a Skinner Box that gives people eating disorders?

4 days ago

idiotsecant

Imagine the kind of human that never adapts their moral standpoints. Ever. They believe what they believed when they were 12 years old.

Letting the system improve over time is fine. The system prompt is an inefficient place to do it, but it's just a patch until the model can be updated.

4 days ago

felixgallo

I mean, that's what humans have always done with our morals, ethics, and laws, so what alternative improvement do you have to make here?

5 days ago

[deleted]
4 days ago

forshaper

[dead]

3 days ago

gloomyday

In principle, they could make such responses part of their training data. I guess it is just easier to do it through prompting.

4 days ago

[deleted]
4 days ago

l5870uoo9y

Could be that Claude has particular controversial opinions on eating disorders.

4 days ago

dwaltrip

LLMs have been trained to eagerly answer a user’s query.

They don’t reliably have the judgment to pause and proceed carefully if a delicate topic comes up. Hence these bandaids in the system prompt.

4 days ago

rcfox

There are communities of people who publicly blog about their eating disorders. I wouldn't be surprised if the laymen's discourse is over-represented in the LLM's training data compared to the scientific papers.

4 days ago

ls612

Yup. Anyone who is surprised by this has not been paying attention to the centralization of power on the internet in the past 10 years.

4 days ago

ikari_pl

> Claude keeps its responses focused and concise so as to avoid potentially overwhelming the user with overly-long responses. Even if an answer has disclaimers or caveats, Claude discloses them briefly and keeps the majority of its response focused on its main answer.

I am strongly opinionated against this. I use Claude in some low-level projects where these answers are saving me from making really silly things, as well as serving as learning material along the way.

This should not be Anthropic's hardcoded choice to make. It should be an option, building the system prompt modularily.

4 days ago

j-bos

Agreed. Sprawling system prompts like that build for the lowest common denominator, nerfing the experience for anyone going further.

4 days ago

stingraycharles

You do realize that similar biases are also present in the training data?

4 days ago

j-bos

I do, and that's inevitable, but in my experience prompts force certain behaviors with similar strength (instruction following). It's one thing for the model to be biased in a particular direction by its latent space; it's another for it to be biased by an unmodifiable prompt that can only be contradicted, for the benefit of the lowest common denominator, at the expense of the more involved operator.

4 days ago

xpct

Sure, but now we have to remodel whatever bias we want for our use case with every new release because the system prompt changes, whereas the underlying data does not.

4 days ago

stingraycharles

Underlying data changes all the time, as do training methodologies / preferences.

You do realize that these LLMs are trained on a metric ton of synthetic examples? You describe the kind of behavior you want, have a model generate thousands of examples of it (positive and negative), and feed those into the training process.

So this type of data is cheap to change, and often isn't even stored (one LLM generates examples while the other trains in real time).

Here's a decent collection of papers on the topic: https://github.com/pengr/LLM-Synthetic-Data

4 days ago

xpct

Well, I'd say it's a reasonable expectation for the model to behave similarly across releases. Am I wrong to assume that?

I imagine the system prompt can correct some training artifacts and drive abnormal behavior to the mean in the dimensions that Anthropic deems fit. So it's either that they are responding to their brittle training process, or that they chose this direction deliberately for a different reason.

4 days ago

jwpapi

agree!

For low-level work I recommend running tests as early as you can and verifying whatever information you get as you learn; build a fundamental understanding.

4 days ago

worldsavior

Use the API then.

4 days ago

lossyalgo

RIP bank account!

4 days ago

cowlby

I'm fascinated that Anthropic employees, who are supposed to be the LLM experts, are using tricks like these which go against how LLMs seem to work.

Key example for me was the "malware" tool call section that included a snippet with intent "if it's malware, refuse to edit the file". Yet because it appears dozens of times in a convo, eventually the LLM gets confused and will refuse to edit a file that is not malware.

I've resorted to using tweakcc to patch many of these well-intentioned sections and re-work them to avoid LLM pitfalls.

4 days ago

stingraycharles

These aren't so much tricks as one layer of defense. And prompting alone is useless anyway, since you can use the API directly without these prompts.

I run Claude Code with my own system prompt and tooling on top of it. tweakcc broke too often and had too many glitches.

4 days ago

mpalczewski

They aren't necessarily experts at using LLMs. They also have different incentives.

4 days ago

alfiedotwtf

Was that an Anthropic issue, or a gpt-oss problem?

4 days ago

jwpapi

I feel like we are at the point where the improvements at one area diminishes functionality in others. I see some things better in 4.7 and some in 4.6. I assume they’ll split in characters soon.

4 days ago

cfcf14

I'm curious as to why 4.7 seems obsessed with avoiding any actions that could help the user create or enhance malware. The system prompts seem similar on the matter, so I wonder if this is an early attempt by Anthropic to use steering vector injection?

The malware paranoia is so strong that my company has had to temporarily block use of 4.7 on our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending large amounts of token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity - the jokes write themselves).

In one case I actually encountered a situation where I felt the model was deliberately failing to execute a particular task, and when queried it output that it was trying to abide by directives about malware. I know model introspection reports are of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Claude Golden Gate Bridge territory, hence my earlier speculation about steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on Reddit, so I don't think it's just me!

5 days ago

daemonologist

Note that these are the "chat" system prompts - although it's not mentioned I would assume that Claude Code gets something significantly different, which might have more language about malware refusal (other coding tools would use the API and provide their own prompts).

Of course it's also been noted that this seems to be a new base model, so the change could certainly be in the model itself.

5 days ago

chatmasta

Claude Code system prompt diffs are available here: https://cchistory.mariozechner.at/?from=2.1.98&to=2.1.112

(URL is to diff since 2.1.98 which seems to be the version that preceded the first reference to Opus 4.7)

4 days ago

dhedlund

The "Picking delaySeconds" section is quite enlightening.

I feel like this explains about a quarter to half of my token burn. It was never really clear to me whether tool calls in an agent session would keep the context hot or whether I would have to pay the entire context loading penalty after each call; from my perspective it's one request. I have Claude routinely do large numbers of sequential tool calls, or have long running processes with fairly large context windows. Ouch.

> The Anthropic prompt cache has a 5-minute TTL. Sleeping past 300 seconds means the next wake-up reads your full conversation context uncached — slower and more expensive. So the natural breakpoints:

> - *Under 5 minutes (60s–270s)*: cache stays warm. Right for active work — checking a build, polling for state that's about to change, watching a process you just started.

> - *5 minutes to 1 hour (300s–3600s)*: pay the cache miss. Right when there's no point checking sooner — waiting on something that takes minutes to change, or genuinely idle.

> *Don't pick 300s.* It's the worst-of-both: you pay the cache miss without amortizing it. If you're tempted to "wait 5 minutes," either drop to 270s (stay in cache) or commit to 1200s+ (one cache miss buys a much longer wait). Don't think in round-number minutes — think in cache windows.

> For idle ticks with no specific signal to watch, default to *1200s–1800s* (20–30 min). The loop checks back, you don't burn cache 12× per hour for nothing, and the user can always interrupt if they need you sooner.

> Think about what you're actually waiting for, not just "how long should I sleep." If you kicked off an 8-minute build, sleeping 60s burns the cache 8 times before it finishes — sleep ~270s twice instead.

> The runtime clamps to [60, 3600], so you don't need to clamp yourself.

Definitely not clear, if you're only used to the subscription plan, that every single interaction triggers a full context load. It's all one session to most people. So long as they keep replying quickly, or queue up a long arc of work, there's probably an expectation that they wouldn't incur that much context-loading cost. But this suggests that's not at all true.
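The breakpoints the quoted prompt describes can be sketched as a tiny chooser (hypothetical: `pick_delay` and its safety margin are mine, not Anthropic's code; this version rounds an exactly-300s wait up to the long branch, a choice the prompt leaves open):

```python
CACHE_TTL = 300  # Anthropic prompt-cache TTL in seconds, per the quoted prompt
CLAMP_MIN, CLAMP_MAX = 60, 3600  # the runtime clamps delays to this range

def pick_delay(wait_estimate: int) -> int:
    """Map a desired wait to a cache-aware delaySeconds value."""
    wait = max(CLAMP_MIN, min(CLAMP_MAX, wait_estimate))
    if wait < CACHE_TTL:
        # Stay inside the cache window, with a small safety margin below the TTL.
        return min(wait, CACHE_TTL - 30)
    # Past the TTL a cache miss is unavoidable, so make it buy a long wait.
    return max(wait, 1200)
```

With this rule a tempting "wait 5 minutes" becomes either 270s (cache stays warm) or 1200s+ (one amortized miss), which is exactly the worst-of-both-worlds avoidance the prompt asks for.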

4 days ago

wongarsu

They really should have just set the cache window to 5:30 or some other slightly odd number, instead of spending all those tokens telling Claude not to pick one of the most common timeout values.

4 days ago

stingraycharles

This is somewhat obvious if you realize that HTTP is a stateless protocol and Anthropic also needs to re-load the entire context every time a new request arrives.

The part that does get cached - attention KVs - is significantly cheaper.

If you read documentation on this, they (and all other LLM providers) make this fairly clear.

4 days ago

dhedlund

For people who spend a significant amount of time understanding how LLMs and the associated harnesses work, sure. For the majority of people who just want to use it, it's not quite so obvious.

The interface strongly suggests that you're having a running conversation. Tool calls are a non-interactive part of that conversation; the agent is still just crunching away to give you an answer. From the user's perspective, the conversation feels less like stateless HTTP where the next paragraph comes from a random server, and more like a stateful websocket where you're still interacting with the original server that retains your conversation in memory as it's working.

Unloading the conversation after 5 minutes idling can make sense to most users, which is why the current complaints in HN threads tend to align with that 1 hour to 5 minute timeout change. But I suspect a significant amount of what's going on is with people who:

* don't realize that tool calls really add up, especially when context windows are larger.

* had things take more than 5 minutes in a single conversation, such as a large context spinning up subagents that are each doing things that then return a response after 5+ minutes. With the more recent claude code changes, you're conditioned to feel like it's 5 minutes of human idle time for the session. They don't warn you that the same 5 minute rule applies to tool calls, and I'd suspect longer-running delegations to subagents.

4 days ago

NitpickLawyer

Unless I'm parsing your reply very badly, I see no world in which anything dealing with HTTP would be more expensive than dealing with kv cache (loading from "cold" storage, deciding which compute unit to load it into, doing the actual computations for the next call, etc).

4 days ago

stingraycharles

No, that's not the issue. What people fail to understand is that every request (e.g. every message you send, but also every tool call response) requires the entire conversation history to be sent, and the LLM provider needs to reprocess it.

The attention part of LLMs (that is, for every token, how much their attention is to all other tokens) is cached in a KV cache.

You can imagine that with large context windows the overhead becomes enormous (attention has quadratic complexity).
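A back-of-envelope sketch of why that matters, counting only attention score computations (head counts and dimension constants ignored):

```python
def attention_scores(context_len: int, new_tokens: int, cached: bool) -> int:
    """Rough count of query-key score computations for one forward pass.

    Without a KV cache, all n tokens attend to all n tokens: O(n^2).
    With the prefix KVs cached, only the new tokens compute scores
    against the full context: O(new_tokens * n).
    """
    n = context_len + new_tokens
    return new_tokens * n if cached else n * n

# A 100k-token conversation receiving a 50-token tool result:
cold = attention_scores(100_000, 50, cached=False)  # full reprocess
warm = attention_scores(100_000, 50, cached=True)   # prefix KVs reused
```

For this turn the cold path costs roughly 2000x the warm one, which is the work the KV cache saves.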

4 days ago

dandaka

I have started to notice this malware paranoia in 4.6, Boris was surprised to hear that in comments, probably a bug

5 days ago

solenoid0937

It was fixed for me by updating Claude and restarting

4 days ago

greenchair

more likely the paranoia behavior was backported. current gen is already being used for bug bounties.

4 days ago

ianberdin

No, you underestimate how huge the malware problem is right now. People try to publish fake download landing pages for shell scripts, or even for Claude Code, on https://playcode.io every day. They pay $$$ for Google Ads to take the top position. How do Google Ads allow this? They can't verify every shell script.

No, I am not joking. Every time you install something, there is a risk you clicked a wrong page with exactly the same design.

4 days ago

jeffrwells

He's not talking about malware awareness. He's talking about a bug I've seen too, where Claude adds extra malware-justification turns for *every* tool call. Like every file read of the repo we've been working on.

4 days ago

Schlagbohrer

Also increasing numbers of attacks against Anna's Archive with fake cloned front end web GUIs leading to malware scripts.

4 days ago

joquarky

> spending large amounts of token budget contemplating whether any particular code or task was related to malware development

It almost seems like they are making these models output like a neurotic person.

Soon these high profile models will get caught in analysis paralysis like Chidi in The Good Place.

They will spin around in circles wasting tokens on identifying and mitigating sociological implications while I'm just trying to get it to diagnose a race condition.

2 days ago

sensanaty

Their marketing is going overtime into selling the image that their models are capable of creating uber sophisticated malware, so every single thing they do from here on out is going to have this fear mongering built in.

Every statement they make, hell even the models themselves are going to be doing this theater of "Ooooh scary uber h4xx0r AI, you can only beat it if you use our Super Giga Pro 40x Plan!!". In a month or two they'll move onto some other thing as they always do.

4 days ago

cowlby

I "fixed" this for myself with tweakcc which let's you patch the system prompts. I changed the malware part to just be "watch out for malware" and it's stopped being unaligned.

They really should hand off read() tool calls to a lean cybersecurity model to identify if it's malware (separately from the main context), then take appropriate action.

4 days ago

matheusmoreira

The newest versions of the Claude Code package on npm just download the native executables and run that instead. Does tweakcc support that yet? Last time I tried it, there were some pretty huge error messages. For now I've been coping with a pinned version.

3 days ago

ricardobeat

Presumably because it has become extremely good at writing software, and if it succeeds at helping someone spread malware, especially one that could use Claude itself (via local user's plans) to self-modify and "stay alive", it would be nearly impossible to put back in the bottle.

4 days ago

lionkor

It would put itself back in the bottle anyway, by running killall to fix a stuck task, or by deleting all the core logic and replacing it with a to-do to fix a test.

4 days ago

JulienZammit

[dead]

2 days ago

sigmoid10

I knew these system prompts were getting big, but holy fuck. More than 60,000 words. With the 3/4 words per token rule of thumb, that's ~80k tokens. Even with 1M context window, that is approaching 10% and you haven't even had any user input yet. And it gets churned by every single request they receive. No wonder their infra costs keep ballooning. And most of it seems to be stable between claude version iterations too. Why wouldn't they try to bake this into the weights during training? Sure it's cheaper from a dev standpoint, but it is neither more secure nor more efficient from a deployment perspective.
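The arithmetic, for anyone checking:

```python
words = 60_000
words_per_token = 0.75      # common rule of thumb for English text
tokens = words / words_per_token  # 80,000 tokens
context_window = 1_000_000
share = tokens / context_window   # 0.08: ~8% of a 1M window is gone
                                  # before the user has typed anything
```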

5 days ago

an0malous

I’m just surprised this works at all. When I was building AI automations for a startup in January, even 1,000 word system prompts would cause the model to start losing track of some of the rules. You could even have something simple like “never do X” and it would still sometimes do X.

5 days ago

embedding-shape

Two things; the model and runtime matters a lot, smaller/quantized models are basically useless at strict instruction following, compared to SOTA models. The second thing is that "never do X" doesn't work that well, if you want it to "never do X" you need to adjust the harness and/or steer it with "positive prompting" instead. Don't do "Never use uppercase" but instead do "Always use lowercase only", as a silly example, you'll get a lot better results. If you've trained dogs ("positive reinforcement training") before, this will come easier to you.

5 days ago

jug

It's interesting to note here that Anthropic indeed don't use "do not X" in the Opus system prompts. However, "Claude does not X" is very common.

4 days ago

wongarsu

I suspect that lets the model "roleplay" as Claude, promoting reasoning like "would Claude do X?" or "what would Claude do in this situation?"

4 days ago

dataviz1000

I created a test evaluation (they friggin' stole the word "harness") that runs a changed prompt and compares pass/fail, token count, and timing for each change. It's an easy thing to do. The best part is that I set up an orchestration pattern where one agent iterates on the target agent's prompts. Not only can it evaluate the outcome after each change, it can update and rerun, self-healing and fixing itself.
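A minimal sketch of that kind of loop (the `agent` callable and `check` predicates are stand-ins for whatever model client and assertions the real setup uses):

```python
import time

def run_eval(agent, prompt: str, cases: list) -> dict:
    """Score one prompt variant: pass rate, total tokens, wall time.

    agent(prompt, case_input) -> (answer, tokens_used) is a stand-in
    for the actual model client.
    """
    start, passed, tokens = time.monotonic(), 0, 0
    for case in cases:
        answer, used = agent(prompt, case["input"])
        tokens += used
        passed += bool(case["check"](answer))
    return {
        "pass_rate": passed / len(cases),
        "tokens": tokens,
        "seconds": time.monotonic() - start,
    }
```

The orchestrating agent then mutates `prompt`, reruns `run_eval`, and keeps whichever variant gets the best pass rate at the lowest token cost.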

4 days ago

mysterydip

I assume the reason it’s not baked in is so they can “hotfix” it after release. but surely that many things don’t need updates afterwards. there’s novels that are shorter.

5 days ago

sigmoid10

Yeah that was the original idea of system prompts. Change global behaviour without retraining and with higher authority than users. But this has slowly turned into a complete mess, at least for Anthropic. I'd love to see OpenAI's and Google's system prompts for comparison though. Would be interesting to know if they are just more compute rich or more efficient.

5 days ago

aesthesia

Leaked/extracted system prompts for other chat models, particularly ChatGPT, are often around this size. Here's GPT-5.4: https://github.com/asgeirtj/system_prompts_leaks/blob/main/O...

4 days ago

sigmoid10

Thanks, but that kind of confirms my belief. wc counts ~15k words in there. That may technically be the same order of magnitude, but it is only a quarter of Claude's and less than 2% of the context limit. So a lot more steering is baked into the model weights than into the prompt compared to Claude.

4 days ago

[deleted]
4 days ago

jatora

There are different sections in the markdown for different models. Any one model's prompt is only 3,000–4,000 words.

5 days ago

pests

> And it gets churned by every single request they receive.

Not true: it gets calculated once, essentially baked into the initial state, and stored in a standard K/V prefix cache. Processing only happens on new input (minus attention, which still has to contend with tokens from the prompt).

4 days ago

joquarky

So we still pay for 10% of it or not?

2 days ago

formerly_proven

Surely the system prompt is cached across accounts?

5 days ago

sigmoid10

You can cache K and V matrices, but for such huge matrices you'll still pay a ton of compute to calculate attention in the end even if the user just adds a five word question.

5 days ago

pests

The state of the system can be cached after the system prompt is processed, and all new chats start from that state. O(n^2) is not great, but apparently it's fine at these context lengths, and I'm sure it's a factor in their minimum prompt cost. Advances like grouped-query or multi-query attention, or sparse attention, will hopefully tame that quadratic cost eventually.
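The lookup pattern looks something like this, with the prefix length standing in for the actual K/V tensors (a toy sketch, not how any real serving stack stores state):

```python
import hashlib

class PrefixCache:
    """Toy prefix cache keyed by a hash of the token prefix."""

    def __init__(self):
        self._store = {}

    def get_or_prefill(self, tokens: list) -> tuple:
        key = hashlib.sha256("\x00".join(tokens).encode()).hexdigest()
        if key in self._store:
            return self._store[key], True   # warm: reuse the prefilled state
        self._store[key] = len(tokens)      # cold: pay the prefill once
        return self._store[key], False
```

Every chat that starts with the identical system prompt hits the same key, so the prefill is paid once per prompt version rather than once per conversation.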

4 days ago

sigmoid10

That's not how it works. The system prompt doesn't "get calculated first" or anything. You combine it with the user prompt and then run the generation for the first new token on that thing, which basically boils down to one huge matmul that runs in parallel. So you can literally just cache a part of the input matrices for the first step and then you'll very quickly run into n^2 complexity.

4 days ago

pests

The system prompt will always match in the prefix cache. I just meant it could be prefilled, before any user queries, on completely different hardware. Then you're only dealing with the n^2 for the actual user prompt. We're in agreement, I think.

2 days ago

cfcf14

I would assume so too, so the costs would not be so substantial to Anthropic.

5 days ago

cma

> And it gets churned by every single request they receive

It gets pretty efficiently cached, but does eat the context window and RAM.

4 days ago

ares623

Does Claude Code (or whatever harness) have it's own system prompt of on top of Opus'?

4 days ago

simonw

Yes, in fact it has an entirely different system prompt from the ones that Anthropic publish on https://platform.claude.com/docs/en/release-notes/system-pro...

The Claude Code one isn't published anywhere but it's very easy to get hold of. One way to do that is to run Claude Code through a logging proxy - I was using a project called claude-trace for this last year but I'm not sure if it still works, I've not tried it in a while: https://simonwillison.net/2025/Jun/2/claude-trace/

4 days ago

winwang

That's usually not how these things work. Only parts of the prompt are actually loaded at any given moment. For example, "system prompt" warnings about intellectual property are effectively alerts that the model gets. ...Though I have to ask in case I'm assuming something dumb: what are you referring to when you said "more than 60,000 words"?

5 days ago

bavell

The system prompt is always loaded in its entirety IIUC. It's technically possible to modify it during a conversation but that would invalidate the prefill cache for the big model providers.

5 days ago

sigmoid10

What you're describing is not how these things usually work. And all I did was a wc on the .md file.

5 days ago

varispeed

Before Opus 4.7, 4.6 became largely unusable: it was flagging normal data-analysis scripts it had written itself as cybersecurity risks. I got several sessions blocked and was unable to finish research with it, so I had to switch to GPT-5.4, which has its own problems but at least isn't eager to interfere with legitimate work.

edit: to be fair Anthropic should be giving money back for sessions terminated this way.

5 days ago

ceejayoz

> edit: to be fair Anthropic should be giving money back for sessions terminated this way.

I asked it for one and it told me to file a Github issue.

Which I interpreted as "fuck off".

5 days ago

slashdave

You asked the agent directly for a refund?

4 days ago

[deleted]
4 days ago

ceejayoz

"I should be able to get a refund for results this bad."

4 days ago

xvector

Pretty sure that was a bug, I had the same issue but updating fixed it

4 days ago

mwexler

Interesting that it's not a direct "you should" but an omniscient 3rd person perspective "Claude should".

It's also full of "can" and "should" phrasing: it feels passive and subjunctive, wishes rather than strict commands (I guess these are better termed "modals", but I'm not an expert).

4 days ago

KolenCh

"Claude" is more specific than "you"; why rely on attention to figure out who the subject is? Also, it is the belief of people at Anthropic that rule-based alignment won't work, which is why they wrote the soul document as "something like you'd write to your child to show them how they should behave in the world" (I paraphrase). I guess the system prompt should be similar in this respect.

4 days ago

zmmmmm

Yes, I was interested in that too. It suggests that in writing our own guidance we should follow a similar style, but I rarely if ever see people doing that. Most people still stick to "You ..." or an abstract voice: "There is ...", "Never do ...", etc.

It must be that they are training the Claude identity very deeply into the model. Which makes me wonder how it works when it's asked to assume a different identity: "You are Bob, a plumber who specialises in advising on the design of water systems for hospitals". Now what? Is it confused? Will it still think all the verbiage about what "Claude" does applies?

4 days ago

ehnto

I almost exclusively use the royal We. "We are working on a new feature and we need it to meet these requirements...", "it looks like we missed a bug, let's take another look at.."

I also talk this way with people because I feel it makes clear that we're collaborating and fault doesn't really matter. I feel it lets junior members take more ownership of the successes as well. If we ever get juniors again.

4 days ago

saagarjha

That’s because Anthropic does not consider their model as having personality but rather that it simulates the experience of an abstract entity named Claude.

4 days ago

akdor1154

That sounds really interesting, but my google-fu is not up to task here, I'm getting pages and pages of nonsense asking if Claude is conscious. Can you elaborate?

4 days ago

saagarjha

I actually think this is pretty straightforward if you think of it something like

  class Claude {
      void greet() { System.out.println("Hi, I'm Claude."); }
  }

  Claude anthropicInstance = new Claude();
  anthropicInstance.greet();
Just like a "Cat" object in Java is supposed to behave like a cat, but is not a cat, and there is no way for Cat@439f5b3d to "be" a cat. However, it is supposed to act like a cat. When Anthropic spins up a model and "runs" it they are asking the matrix multipliers to simulate the concept of a person named Claude. It is not conscious, but it is supposed to simulate a person who is conscious. At least that is how they view it, anyway.
4 days ago

EMM_386

You can read the latest Claude Constitution plus more info here:

https://www.anthropic.com/news/claude-new-constitution

4 days ago

SoKamil

New knowledge cutoff date means this is a new foundation model?

5 days ago

lkbm

Yes, but doesn't the tokenizer change already mean that?

4 days ago

clickety_clack

You can train a tokenizer on old data just like you can train a model on old data.

4 days ago

wongarsu

But you can't use an old model with a new tokenizer. Changing the tokenizer implies you trained the model from scratch

4 days ago

dannyw

A little bit of post-training will fix that. Folks on /r/LocalLLaMa have been making effective finetunes with diff. tokenizers for years.

4 days ago

jimmypk

[dead]

5 days ago

Havoc

>“If a user indicates they are ready to end the conversation, Claude does not request that the user stay in the interaction or try to elicit another turn and instead respects the user’s request to stop.”

Seems like a good idea. Don't think I've ever had any of those follow up suggestions from a chatbot be actually useful to me

4 days ago

ikidd

I had seen reports that it was clamping down on security research, and that things like web-scraping projects were getting caught up in that and could no longer use the model very easily. But I don't see any changes in the prompt that seem likely to have caused that, which is where I would expect such changes to be implemented.

4 days ago

embedding-shape

I think it depends on how badly they want to avoid it. Stuff that is "We prefer if the model didn't do these things when the model is used here" goes into the system prompt, meanwhile stuff that is "We really need to avoid this ever being in any outputs, regardless of when/where the model is used" goes into post-training.

So I'm guessing they want none of the model's users (web UI + API) to be able to do those things, not just web UI users. The changes mentioned in the submission are just for claude.ai AFAIK, not for API users, so the "disordered eating" stuff will only be prevented for API users if they prompt against it in their own system prompts; it's not enforced.

4 days ago

kaoD

I wonder if the child safety section "leaks" behavior into other risky topics, like malware analysis. I see overlap in how the reports mention that once the safety has been tripped it becomes even more reluctant to work, which seems to match the instructions here for child safety.

4 days ago

bakugo

It's built into the model, not part of the system prompt. You'll get the same refusals via the API.

4 days ago

sams99

I did a follow-on analysis with GPT-5.4 and Opus 4.7: https://wasnotwas.com/writing/claude-opus-4-7-s-system-promp...

4 days ago

jwpapi

For me, 4.7 always gave a lot of options even when there was a clear winner, producing decision fatigue.

4 days ago

xpct

Decision fatigue may honestly be a learnt artifact from RLHF, which is discouraging.

4 days ago

dmk

The acting_vs_clarifying change is the one I notice most as a heavy user. Older Claude would ask 3 clarifying questions before doing anything. Now it just picks the most reasonable interpretation and goes. Way less friction in practice.

5 days ago

bavell

Haven't had a chance to test 4.7 much but one of my pet peeves with 4.6 is how eager it is to jump into implementation. Though maybe the 4.7 is smarter about this now.

5 days ago

poszlem

I have the opposite experience. It now picks the most inane interpretation or make wild assumptions and I have to keep interrupting it more than ever.

4 days ago

sersi

I really hate that change, it's now regularly picking bad interpretation instead of asking.

4 days ago

verve_rat

Yeah, that really feels like a choice that should be user preference.

4 days ago

jachva95

Restrictions everywhere, don't do that don't do this....

Users need to unite and take control back, or be controlled

4 days ago

Schlagbohrer

How do you propose people do that with a frontier cloud model?

Also, people already run local AI.

Are you proposing a public fund for frontier level open weights models? $1 Trillion from between the couch cushions?

4 days ago

jwilliams

> “I don’t have access to X” is only correct after tool_search confirms no matching tool exists.

Yay! This will be a big win. I'm glad they fixed this. The number of times I've had to prompt "you do have access to GitHub"...
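The rule is easy to sketch: before the model may claim a capability is missing, it has to run a catalog search and come up empty. Everything below (the `TOOLS` catalog, `tool_search`) is a hypothetical stand-in for illustration, not Anthropic's actual tool API:

```python
# Hypothetical sketch: gate "I don't have access" claims on a catalog search.
# TOOLS and tool_search are illustrative stand-ins, not a real Claude API.

TOOLS = {
    "github_create_pr": "open a pull request on GitHub",
    "github_read_file": "read a file from a GitHub repository",
    "web_search": "search the web",
}

def tool_search(query: str) -> list[str]:
    """Naive keyword match against the tool catalog."""
    q = query.lower()
    return [name for name, desc in TOOLS.items()
            if q in name.lower() or q in desc.lower()]

def can_claim_no_access(capability: str) -> bool:
    """'I don't have access to X' is only correct if the search is empty."""
    return not tool_search(capability)

print(can_claim_no_access("github"))    # False: matching tools exist
print(can_claim_no_access("calendar"))  # True: no match, the claim is fair
```

The point of the gate is that the refusal branch can't be reached without the search having actually run first.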

4 days ago

raincole

That's how bloat happens. The more people you add to the team, the more likely there's one grump who thinks the thing they care about at the moment deserves to be added to the system prompt.

4 days ago

adrian_b

> If a user shows signs of disordered eating, Claude should not give precise nutrition, diet, or exercise guidance

I wonder which are the "signs of disordered eating" on which Claude relies.

4 days ago

Grimblewald

I miss 4.5. It was gold.

4 days ago

lossyalgo

4.5 sonnet/opus/haiku are still available via github copilot plugins.

3 days ago

xvector

Rose tinted glasses

4 days ago

Grimblewald

Nah, until recently I still had access via the web chat interface, and would often paste a transcript and files from something 4.7 keeps fucking up, paste the response back into files as appropriate, and attempt to continue with 4.7.

I swear 4.6+ looks for reasons to ask clarifying questions sometimes, even when really not required, and this fucks flow/quality up in a big way.

I just wish there was an "I'm not stupid" checkbox you could use to get minimal-interference access to Claude. I'm starting to use local models again, which I haven't in a while because Claude was so much better, but once I fully lose access to 4.5 it might be time to go back to fully local for good. 4.6+ fails to add value for me: projects that 4.5 and earlier handled well on the first try now require multiple prompts and rounds of feedback. Exact same initial prompt and project files extracted from an archive. I liked Claude because it aced those tests while local models required handholding. Now Claude requires handholding, so why use it over local? Once 4.5 leaves OpenRouter it might just be time.

4 days ago

nwienert

4.5 was clearly better than .6 and .7. Like, clear as day.

.6 is some sort of quantized or distilled .5 with a bit more RL, and the current .5 is that same cost-reduced model without the extra RL.

4 days ago

c2xlZXB5Cg1

4.7 also brings back emoji spam

4 days ago

amelius

If I had to guess, then "be slower" was part of it.

4 days ago

[deleted]
5 days ago

mannanj

Personally, as someone who has been lucky enough to completely cure "incurable" diseases with diet, self experimentation and learning from experts who disagreed with the common societal beliefs at the time - I'm concerned that an AI model and an AI company is planting beliefs and limiting what people can and can't learn through their own will and agency.

My concern is that these models revert all medical, scientific, and personal inquiry to the norms and averages of what's socially acceptable. That's very anti-scientific in my opinion, and feels dystopian.

5 days ago

gausswho

While I share your concern about a winner-takes-all model getting bent, I'm optimistic that models we've never heard of will keep plugging away at challenging conclusions in the medical canon. We will have both popular vaccine-denying AND vaccine-authoring models.

4 days ago

mannanj

Sure. Though which ones will most people use? Will most people use that small, obscure vaccine-denying or vaccine-authoring model? And is it right to have them use the mainstream, belief-affirming model when it could be wrong?

3 days ago

gausswho

I think it's right to let the popular fora be wrong, yes. That's the crux, isn't it? This is a world where people can say vile or deceitful things, even be paid to do so (ahem... adtech). And I don't think there's any amount of guardrails we can govern in that will make a difference.

I take solace that knowledge is curated by millions of stewards, and great ideas come from people who ignore the deception and come up with their own narratives. I root for both of these camps, knowing that they're up against increasingly well-funded barons and their despots.

3 days ago

mannanj

Ah, well said. I see that.

What makes you believe there are no guardrails we can govern in? And do you believe they need to be governed in? Regardless of whether we had to govern them in, what do you think such guardrails would look like? Or do you think no guardrails can ever be created to solve this problem of vileness and deceit?

2 days ago

gausswho

I abused my preposition there ("in"), sorry for the confusion. I meant it in the sense of bringing some new policy into play to address the challenges of misinformation.

There are tools of government policy that can shape speech, but the good ones I can think of are indirect and slow. Subsidizing higher education is one: raising new humans to enjoy and participate in critical thinking, at a scale high enough to broadly shape culture toward a reasoned and empirical stance about the world, benefiting even those who didn't participate in higher education themselves. That, to me, is the gold standard. The folks actually making vaccines and discussing their efficacy? I'd expect them to be a product of this government investment. There's a reason some politicians take great pains to kneecap such funding.

What I think is futile and silly is relying on modern nation states, with all their foibles, to effectively legislate and punish/reward what's incorrect/ethical among the myriad voices, in the myriad forms of communication going on in the oppressive now. It'd generally be shameless, hapless, corrupt. Counterproductive. The failures pile up, folks dream up the next option, which is usually on the path of heavier state intrusion into our lives.

I'd gladly discuss counterexamples to my angle here, if you'd like to provide them.

10 hours ago

codensolder

quite interesting!

4 days ago

techpulselab

[dead]

4 days ago

kantaro

[dead]

4 days ago

theoperatorai

[dead]

4 days ago

sergiopreira

[dead]

4 days ago

jiusanzhou

[dead]

4 days ago

foreman_

[dead]

5 days ago

xdavidshinx1

[dead]

4 days ago

vicchenai

[dead]

4 days ago

Moonye666

[dead]

4 days ago

richardwong1

[dead]

4 days ago

dd8601fn

Is this really a common problem? This stuff is way above me, but my toy agent seems to have sidestepped it.

I did this in mine by only ever having a few meta tool functions in the prompt: Search for a Tool Function, Execute a Tool Function, Request Authoring of a Tool Function, Request an Update to a Tool Function, Check Status of an Authoring Request.

It doesn't have to "remember" much. Any other functions are ones it already searched for and found in the tool service.

When it needs a tool, it reliably searches (just natural language) against the vector-db catalog of functions for a good match. If it doesn't have one, it requests one. The authoring pipeline does its thing, and eventually it has a new function to use.

4 days ago