Show HN: Llama 3.3 70B Sparse Autoencoders with API access

200 points


4 days ago

by trq_

Comments

paraschopra

I cranked up 'pirate talking about andromeda galaxy' feature to 1.5 and got this:

>Yer lookin' for the Andromeda galaxy, eh? Here be the details, me hearty: *The Andromeda Galaxy (Yer looking for a fine piece o' booty, matey!):* * *A fine piece o' booty*: The Andromida Galaxy be a right fine piece o' booty, with a treasure chest o' gold doubloons... er, I mean, a mighty haul o' stars, hidden beneath the Jolly Roger! * *A barnacle on the high seas*: The Andromeda Galaxy be a right scurvy dog, with a hull full o' stars, and a keel full o' hidden treasure! It be a fine piece o' booty, but it be needin' a bit o' swabbin' the decks, or it'll be walkin' the plank, savvy? * *A chest overflowin' with gold*: The Andromeda Galaxy be a right fine piece o' booty, with a chest overflowin' with gold doubloons... er, I mean, a fine haul o' stars, and a barnacle on the high seas! It be a right scurvy dog, but it be worth keepin' an eye on, or it

4 days ago

tMcGrath

Yes - we'd never normally turn features up this much as it breaks the model quite badly, but we put this in the post to show what that looked like in practice.
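For readers wondering what "cranking a feature up" does mechanically: a common steering scheme in the SAE literature adds a multiple of the feature's decoder direction to the model's residual-stream activation. The sketch below assumes that scheme. The helper name `steer`, the toy 4-dimensional vectors, and treating "1.5" as the scale `alpha` are illustrative assumptions, not Goodfire's API. It also shows one plausible reason large values break the model: the added vector can dwarf the original activation.

```python
import math


def steer(residual, direction, alpha):
    """Hypothetical helper: add alpha times a feature's decoder direction to a residual vector."""
    if len(residual) != len(direction):
        raise ValueError("residual and direction must have the same dimension")
    return [h + alpha * d for h, d in zip(residual, direction)]


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


# Toy residual activation and a unit-norm "pirate" feature direction (made-up numbers).
residual = [0.2, -0.1, 0.4, 0.0]
pirate_direction = [0.0, 1.0, 0.0, 0.0]

for alpha in (0.0, 0.5, 1.5):
    steered = steer(residual, pirate_direction, alpha)
    # Size of the intervention relative to the original activation's norm.
    shift = l2_norm([s - r for s, r in zip(steered, residual)]) / l2_norm(residual)
    print(f"alpha={alpha}: relative shift {shift:.2f}")
```

How large a "safe" `alpha` is likely depends on how the directions are normalized and on the typical activation norm at the steered layer, so the same number can mean very different things across setups.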

4 days ago

lukeramsden

Why are AI researchers constantly handicapping everything they do under the guise of "safety"? It's a bag of data and some math algorithms that generate text....

3 days ago

stavros

> It's a bag of data and some math algorithms that generate text....

I agree with the general premise of too much "safety", but this argument is invalid. Humans are bags of meat and they can do some pretty terrible things.

3 days ago

ffsm8

But what we're doing to these models is literally censoring what they're saying - not doing.

I don't think that anyone has any problems with stopping random AIs when they're doing crimes (or more realistically the humans making them do that) - but if you're going to make the comparison to humans in good faith, it'd be a person standing behind you, punishing you when you say something offensive.

3 days ago

stavros

What I'm saying is that the argument "they're math and data, therefore what they say is safe" is not a valid one.

3 days ago

cornholio

> Why are AI researchers constantly handicapping everything

Career and business self-preservation in a social media neurotic world. It doesn't take much to trigger the outrage machine and cancel every future prospect you might have, especially in a very competitive field flush with other "clean" applicants.

Just look at the whole "AI racism" fustercluck for a small taste.

3 days ago

unshavedyak

Let's reverse this - why wouldn't they do that? I agree with you, but LLMs tend to be massively expensive and thus innately tied to ROI. A lot of companies fret about advertising even near some types of content. The idea of spending millions to put a racist bot on your home page is, no surprise, not very appetizing.

So of course if this is where the money and interest flows then the research follows.

Besides, it's a generally useful area anyway. The ability to tweak behavior even if not done for "safety" still seems pretty useful.

3 days ago

UltraSane

What if an AI model could tell you exactly how to modify a common virus to kill 50% of everyone it infects?

3 days ago

SXX

Yeah. It will start its instructions with a recommendation to buy some high-tech biolab for $100,000,000.

Seriously. The reason we don't have mass killings everywhere is not that information on how to make explosive drones or poisons is impossible to find or access. It's also not so hard to buy a car or a knife.

Hell, you can even find YouTube videos on exactly how uranium enrichment works, step by step. Some content creators even got raided by police for that. Yet we don't see tons of random kids making dirty bombs.

PS: Cody's Lab: Uranium Refining:

https://archive.org/details/cl-uranium

3 days ago

UltraSane

You cannot compare making nuclear weapons to modifying viruses to be more lethal. It is vastly cheaper to modify viruses, and the knowledge is the bottleneck, whereas for nukes the knowledge of how to make them is very widespread but getting the materials is very hard.

Another example: what if an LLM could tell you exactly how to build a tabletop laser device that could enrich uranium for a few hundred thousand dollars?

3 days ago

SXX

LLMs are not AGIs. An LLM can only ever tell you how to build a device to enrich uranium for a few hundred thousand dollars if that information was already public knowledge and the LLM was trained on it. The situation is the same with building biolab tech for a few hundred thousand dollars. Also, an actor who already had a few million wouldn't have any problem getting their hands on any LLM, or on a scientist able to build it for them.

The only "danger" LLM "safety" can prevent is generation of racist porn stories.

3 days ago

UltraSane

With the vast amounts of data LLMs are trained on they make it much easier for people to find harmful and dangerous information if they aren't filtered. See

https://en.wikipedia.org/wiki/Separation_of_isotopes_by_lase...

2 days ago

dlmotol

Society's basic entry barrier: easy enough to make sure the dumb person who hasn't achieved anything in life can't do it, but irrelevant to anyone smart enough to make it in society, who can circumvent it if they want.

It's the same with plenty of other things.

3 days ago

ben_w

> It's a bag of data and some math algorithms that generate text....

That describes almost every web server.

To the extent that this particular maths produces text that causes political, financial, or legal harm to their interests, this kind of testing is just like any other acceptance testing.

To the extent that the maths is "like a human", even in the vaguest and most general sense of "like", then it is also good to make sure that the human it's like isn't a sadistic psychopath — we don't know how far we are from "like" by any standard, because we don't know what we're doing, so this is playing it safe even if we're as far from this issue as cargo-cults were from functioning radios.

3 days ago

tMcGrath

I'm one of the authors of this paper - happy to answer any questions you might have.

4 days ago

goldemerald

Why not actually release the weights on huggingface? The popular SAE_lens repo has a direct way to upload the weights and there are already hundreds publicly available. The lack of training details/dataset used makes me hesitant to run any study on this API.

Are images included in the training?

What kind of SAE is being used? There have been some nice improvements in SAE architecture this last year, and it would be nice to know which one (if any) is provided.

4 days ago

tMcGrath

We're planning to release the weights once we do a moderation pass. Our SAE was trained on LMSys (you can see this in our accompanying post: https://www.goodfire.ai/papers/mapping-latent-spaces-llama/).

No images in training - 3.3 70B is a text-only model so it wouldn't have made sense. We're exploring other modalities currently though.

SAE is a basic ReLU one. This might seem a little backwards, but I've been concerned by some of the high-frequency features in TopK and JumpReLU SAEs (see e.g. https://arxiv.org/abs/2407.14435, Figure 14), and the recent SAEBench results (https://www.neuronpedia.org/sae-bench/info) show quite a lot of feature absorption in more recent variants (though this could be confounded by a number of things). This isn't to say they're definitely bad - I think it's quite likely that TopK/JumpReLU are an improvement - but rather that we need to evaluate them in more detail before pushing them live. Overall I'm very optimistic about the potential for improvements in SAE variants, which we talk about a bit at the bottom of the post. We're going to be pushing SAE quality a ton now that we have a stable platform to deploy them to.
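For readers comparing the variants mentioned above, here is a minimal sketch of how the three encoder nonlinearities differ when applied to already-computed pre-activations (`W_enc @ x + b_enc`). It is a simplified illustration under stated assumptions: this TopK version also clamps negatives to zero, as some implementations do, and the JumpReLU thresholds are fixed here, whereas in practice they are learned per feature. The function names are hypothetical.

```python
def relu_encode(pre):
    """Standard ReLU SAE: every positive pre-activation survives."""
    return [max(0.0, z) for z in pre]


def topk_encode(pre, k):
    """TopK SAE: keep only the k largest pre-activations (clamped at zero), zero the rest."""
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    out = [0.0] * len(pre)
    for i in keep:
        out[i] = max(0.0, pre[i])
    return out


def jumprelu_encode(pre, thresholds):
    """JumpReLU SAE: a feature fires only if it exceeds its own threshold, then passes through unchanged."""
    return [z if z > t else 0.0 for z, t in zip(pre, thresholds)]


pre = [0.3, -0.2, 1.1, 0.05, 0.7]
print(relu_encode(pre))                  # small activations like 0.05 survive
print(topk_encode(pre, k=2))             # only the two largest survive
print(jumprelu_encode(pre, [0.2] * 5))   # anything at or below 0.2 is cut
```

The ReLU version keeps weak activations and relies on an L1 penalty during training to suppress them, while TopK and JumpReLU enforce sparsity structurally.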

4 days ago

wg0

Noob question - how do we know that these autoencoders aren't hallucinating and really are mapping/clustering what they should be?

4 days ago

trq_

Hmm the hallucination would happen in the auto labelling, but we review and test our labels and they seem correct!
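One generic way to make "review and test our labels" concrete is a detection-style check: run the feature over a set of texts, have a human (or another model) judge whether each text actually matches the label, and measure how well "the feature fires" agrees with "the text matches". This is a sketch of that general idea, not a description of Goodfire's actual review process; `score_label` and the toy data are hypothetical.

```python
def score_label(activations, matches_label, threshold=0.0):
    """Precision/recall of 'feature fires' against human judgments of 'text matches the label'."""
    fired = [a > threshold for a in activations]
    tp = sum(f and m for f, m in zip(fired, matches_label))
    fp = sum(f and not m for f, m in zip(fired, matches_label))
    fn = sum((not f) and m for f, m in zip(fired, matches_label))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: feature activations on five texts, plus whether a reviewer says each text fits the label.
activations = [0.9, 0.0, 0.4, 0.0, 1.2]
matches = [True, False, True, True, False]
print(score_label(activations, matches))
```

A label that "hallucinates" a meaning the feature doesn't have should show up as low precision, low recall, or both.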

4 days ago

trq_

If you're hacking on this and have questions, please join us on Discord: https://discord.gg/vhT9Chrt

4 days ago

swyx

nice work. enjoyed the zoomable UMAP. i wonder if there are hparams to recluster the UMAP in interesting ways.

apart from the idea that Claude 3.5 Sonnet used SAEs to improve its coding ability, i'm not aware of any actual practical use of them yet beyond Golden Gate Claude (and Golden Gate Gemma: https://x.com/swyx/status/1818711762558198130).

has anyone tried out Anthropic's matching SAE API yet? wondering how it compares with Goodfire's and if there's any known practical use.

4 days ago

trq_

We haven't yet found generalizable "make this model smarter" features, but there is a tradeoff of putting instructions in system prompts, e.g. if you have a chatbot that sometimes generates code, you can give it very specific instructions when it's coding and leave those out of the system prompt otherwise.

We have a notebook about that here: https://docs.goodfire.ai/notebooks/dynamicprompts
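A minimal sketch of the idea trq_ describes, assuming you can read the activation of some "the user wants code" feature for the latest message. The feature, the 0.5 threshold, and the prompt strings are all hypothetical; the linked notebook shows Goodfire's actual approach.

```python
BASE_PROMPT = "You are a helpful assistant."
# Hypothetical instructions that only matter when the model is writing code.
CODING_INSTRUCTIONS = "When writing code, add type hints and explain edge cases."


def build_system_prompt(coding_activation, threshold=0.5):
    """Include coding instructions only when a hypothetical coding feature fires strongly."""
    if coding_activation > threshold:
        return f"{BASE_PROMPT}\n{CODING_INSTRUCTIONS}"
    return BASE_PROMPT


print(build_system_prompt(0.9))  # includes the coding instructions
print(build_system_prompt(0.1))  # base prompt only
```

Keeping the coding-specific text out of the default prompt saves tokens on non-coding turns and avoids nudging unrelated answers toward coding conventions.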

4 days ago

tMcGrath

Thank you! I think some of the features we have like conditional steering make SAEs a lot more convenient to use. It also makes using models a lot more like conventional programming. For example, when the model is 'thinking' x, or the text is about y, then invoke steering. We have an example of this for jailbreak detection: https://x.com/GoodfireAI/status/1871241905712828711
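The "when the model is thinking x, invoke steering" pattern can be sketched per token: check a trigger feature at each position and only add the steering direction where it fires. Everything here is a hypothetical illustration, including the function name, the 2-dimensional residuals, the "jailbreak" trigger activations, and the "refusal" direction; it is not Goodfire's implementation.

```python
def conditional_steer(residuals, trigger_activations, direction, alpha, threshold):
    """Steer only the token positions where a trigger feature exceeds `threshold`.

    residuals: list of per-token residual vectors
    trigger_activations: per-token activation of the (hypothetical) trigger feature
    """
    out = []
    for h, act in zip(residuals, trigger_activations):
        if act > threshold:
            out.append([x + alpha * d for x, d in zip(h, direction)])
        else:
            out.append(list(h))  # untouched copy
    return out


# Toy example: three tokens, 2-d residuals, and a made-up "refusal" direction.
residuals = [[0.1, 0.2], [0.5, -0.1], [0.0, 0.5]]
jailbreak_activations = [0.0, 0.8, 0.2]
refusal_direction = [1.0, 0.0]
print(conditional_steer(residuals, jailbreak_activations, refusal_direction, alpha=2.0, threshold=0.5))
```

Gating on a feature means ordinary conversations pass through unmodified, which is what makes this feel closer to an `if` statement than to finetuning.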

We also have an 'autosteer' feature that makes coming up with new variants easy: https://x.com/GoodfireAI/status/1871241902684831977 (this feels kind of like no-code finetuning).

Being able to read features out and train classifiers on them seems pretty useful - for instance we can read out features like 'the user is unhappy with the conversation', which you could then use for A/B testing your model rollouts (kind of like Google Analytics for your LLM). The big improvements here are (a) cost - the marginal cost of an SAE is low compared to frontier model annotations, (b) a consistent ontology across conversations, and (c) not having to specify that ontology in advance, but rather discover it from data.
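A minimal sketch of "read features out and train classifiers on them": treat per-conversation SAE feature activations as inputs to a plain logistic regression. The three features (hypothetically "frustration", "gratitude", "topic: code") and the four labeled conversations are made up for illustration, and the training loop is bare-bones SGD; in practice you would likely reach for a library such as scikit-learn's `LogisticRegression`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Logistic regression fit with per-example gradient steps on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


# Hypothetical feature activations per conversation: [frustration, gratitude, topic_code]
X = [[0.9, 0.0, 0.2], [0.8, 0.1, 0.7], [0.0, 0.9, 0.3], [0.1, 0.7, 0.8]]
y = [1, 1, 0, 0]  # 1 = "user is unhappy with the conversation"
w, b = train_logreg(X, y)
print(predict(w, b, [0.85, 0.05, 0.5]))  # should be high
print(predict(w, b, [0.05, 0.8, 0.5]))   # should be low
```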

These are just my guesses though - a large part of why we're excited about putting this out is that we don't have all the answers for how it can be most useful, but we're excited to support people finding out.

4 days ago

swyx

sure but as you well know classifying sentiment analysis is a BERT-scale problem, not really an SAE problem. burden of proof is on you that "read features out and train classifiers on them" is superior to "GOFAI".

anyway i dont need you to have the answers right now. congrats on launching!

4 days ago

owenthejumper

I am skeptical of generic sparsification efforts. After all, companies like Neural Magic spent years trying to make it work, only to pivot to the vLLM engine and be sold to Red Hat.

4 days ago

refulgentis

The link shows this isn't sparsity as in inference speed; it's sparse autoencoders, as in interpreting the features in an LLM (searching for "SAE Anthropic" will explain more).
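To make the distinction concrete: a basic ReLU sparse autoencoder maps a model activation `x` to a wider, mostly-zero feature vector `f = ReLU(W_enc x + b_enc)`, reconstructs `x_hat = W_dec f + b_dec`, and is trained on reconstruction error plus an L1 penalty on `f`. Nothing about the model's own weights becomes sparse. The sketch below uses tiny hand-picked matrices; details such as subtracting `b_dec` before encoding or weighting the L1 term by decoder norms vary between implementations and are left out here.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """W_enc: n_features rows of length d_model; W_dec: d_model rows of length n_features."""
    pre = [z + b for z, b in zip(matvec(W_enc, x), b_enc)]
    f = [max(0.0, z) for z in pre]  # sparse, non-negative feature activations
    x_hat = [y + b for y, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x, f, x_hat, l1_coeff):
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in f)
    return reconstruction + l1_coeff * sparsity


# Toy setup: d_model = 2, n_features = 3 (real SAEs use far more features than dimensions).
x = [1.0, 2.0]
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
f, x_hat = sae_forward(x, W_enc, [0.0] * 3, W_dec, [0.0] * 2)
print(f, x_hat, sae_loss(x, f, x_hat, l1_coeff=0.1))
```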

4 days ago

bravura

I'd be really curious to see what happens if you use PaCMAP (https://jmlr.org/papers/volume22/20-1061/20-1061.pdf) and more recent large-scale variants (https://github.com/YingfanWang/PaCMAP).

4 days ago

Inviz

The app keeps logging me out after the first click. The tech seems intriguing to me as a software engineer looking to get into custom LLM stuff.

4 days ago

I_am_tiberius

I wonder how many people or companies choose to send their data to foreign services for analysis. Personally, I would approach this with caution and am curious to see how this trend evolves.

4 days ago

tMcGrath

We'll be open-sourcing these SAEs so you're not required to do this if you'd rather self-host.

4 days ago

ed

This is the ultimate propaganda machine, no?

We’re social creatures, chatbots already act as friends and advisors for many people.

Seems like a pretty good vector for a social attack.

4 days ago

echelon

The more the public has access to these tools, the more they'll develop useful scar tissue and muscle memory. We need people to be constantly exposed to bots so that they understand the new nature of digital information.

When the automobile was developed, we had to train kids not to play in the streets. We didn't put kids or cars in bubbles.

When photoshop came out, we developed a vernacular around edited images. "Photoshopped" became a verb.

We'll be able to survive this too. The more exposure we have, the better.

4 days ago

ed

Early traffic laws were actually created in response to child pedestrian deaths (7000 in 1925).

https://www.bloomberg.com/news/features/2022-06-10/how-citie...

4 days ago

echelon

Of course. The point I was making is that in the 19th century, roads were multifunctional spaces shared by merchants, horses, carts, wagons, playing children, performers, etc.

The introduction of the automobile kicked all of these use cases off of the roads. While pedestrians have the right of way, the roads henceforth belonged to the "devil wagons".

We also started to shift blame over to pedestrians for jaywalking. They no longer own the roads.

4 days ago

pennomi

Right. You know how your grandmother falls for those “you have a virus” popups but you don’t? That’s because society adapts to the challenges of the day. I’m sure our kids and grandchildren will be more immune to these new types of scams.

4 days ago

Steen3S

Please inform the EU about this.

4 days ago

imiric

Your analogies don't quite align with this technology.

We've had exposure to propaganda and disinformation for many decades, long before the internet became their primary medium, yet people don't learn to become immune to them. They're more effective now than they've ever been, and AI tools will only make them more so. Arguing that more exposure will somehow magically solve these problems is delusional at best, and dangerous at worst.

There are other key differences from past technologies:

- Most took years to decades to develop and gain mass adoption. This time is critical for society and governments to adapt to them. This adoption rate has been accelerating, but modern AI tech development is particularly fast. Governments can barely keep up to decide how this should be regulated, let alone people. When you consider that this tech is coming from companies that pioneered the "move fast and break things" mentality, in an industry drunk on greed and hubris, it should give everyone a cause for concern.

- AI has the potential to disrupt many industries, not just one. But further than that, it raises deep existential questions about our humanity, the value of human work, how our economic and education systems are structured, etc.

These are not problems we can solve overnight. Turning a blind eye to them and vouching for less regulations and more exposure is simply irresponsible.

4 days ago

echelon

> vouching for less regulations and more exposure is simply irresponsible.

We let people buy 6,000 pound vehicles capable of traveling 100+ mph.

We let people buy sharp knives and guns. And heat their homes with flammable gas. And hike up dangerous tall mountains.

I think the LLM is the least of society's worries and this pervasive thinking that everything needs to be wrapped up in bubble wrap is what is actually dangerous.

Can a thought be dangerous? Should we prevent people from thinking or being exposed to certain things? That sounds far more Orwellian.

If you want to criminalize illegal use of LLMs for fraud, then do that. But don't make the technology inaccessible and patronize people by telling them they're not smart enough to understand the danger.

This is not a "fragile world" technology in its current form. When they're embodied, walking around, and killing people, then you can sound the alarm.

4 days ago

imiric

There's a vast middle ground between completely unregulated technology and an Orwellian state. Let's not entertain absolutes.

> We let people buy 6,000 pound vehicles capable of traveling 100+ mph.

> We let people buy sharp knives and guns. And heat their homes with flammable gas. And hike up dangerous tall mountains.

All of those have regulations around them, and people have gotten familiar with how they work. More importantly, none of them is as disruptive to our lives as AI technology has the potential to be.

We didn't invent airplanes and let people on them overnight. It took decades for the airline industry to form, and even more for flights to be accepted as a standard form of transportation. We created strict regulations that plane manufacturers and airlines must follow, which were refined over the 20th century.

Was this unnecessary and Orwellian? Obviously the dangers of flight were very clear, so we took precautions to ensure the necessary safety. With AI, these dangers are not that clear.

> If you want to criminalize illegal use of LLMs for fraud, then do that. But don't make the technology inaccessible and patronize people by telling them they're not smart enough to understand the danger.

It's far from patronizing; it's just reality. People don't understand the dangers of the modern internet either, yet they're subjects of privacy violations, identity theft, scams, and all sorts of psychological manipulation from advertising and propaganda that influences how they think, vote and behave in society. Democratic institutions are crumbling, sociopolitical tensions are the highest they've been in the past 30 years in most western countries, and yet you would be fine with unleashing a technology that has a high chance of making this worse? Without any guardrails, or even some time for humanity to start addressing some of the existential questions I mentioned in my previous post? Again, this would be highly irresponsible and dangerous.

And yet I'm sure that's what's going to happen in most countries. It's people who think like you that are pushing this technology forward, and unfortunately they have a strong influence over governments and the zeitgeist. I just hope that we can eventually climb out of the hole we're currently digging ourselves into.

3 days ago

echelon

> We created strict regulations that plane manufacturers and airlines must follow

In response to actual incidents, not imagined ones. Regulations should not come first. We already have the biggest companies chasing after a regulatory moat to protect themselves from competition and commoditization, and that's not how this should work.

> we took precautions to ensure the necessary safety.

No, we didn't! We used the technology, we made lots of mistakes, and we learned over time. That's how it's been with every innovation cycle. If we had regulated from day one, maybe we would have slowed down and not reached the point we are at today.

Europe is a good model for a presumptive, over-regulated society. Their comparable industries are smaller and lag behind our own because of it.

> People don't understand the dangers of the modern internet either,

People "don't understand" a lot of things, such as the dangers they expose themselves to when driving over 30 mph. Yet we don't take that privilege away from them unless they break the law. Laws that only bare their teeth after the fact, mind you.

Imagine if we had tried to "protect society from the internet" and restricted access. The naysayers of the time wanted to, and you can find evidence if you look at old newspapers. Or imagine if we had a closed set of allowed business use cases, and businesses couldn't build whatever they wanted without some official process. There would be so many missing pieces.

Even laws and regulations proposed for mature technologies can be completely spurious. For instance, all the regulations being designed to "protect the children" that are actually more about tracking and censorship. If people cared about protecting the children, they'd give them free school lunches and education, not try to track who signs up for what porn website. That's just building a treasure trove of kompromat to employ against political rivals. Or projecting the puritanical dreams of some lawmakers onto the whole of society.

> People [...] they're subjects of [...] all sorts of psychological manipulation from advertising and propaganda that influences how they think, vote and behave in society

> Democratic institutions are crumbling

So this is why you think this way. You think of society as a failing institution of sorts. You're unhappy with the shape of the world and you're trying to control what people are exposed to and how they think.

I don't think there's any amount of debate between you and me that will make us see eye to eye. I fundamentally believe you're wrong about this.

We live as mere motes of dust within a geologically old and infinite universe. Our lives are vanishingly short. You're trying to button down what people can do and fit them into constructed plans that match your pessimistic worldview.

We need to be free. We need more experimentation. We need more degrees of freedom with less structure and rigidity. Your prescriptive thinking lowers the energy potential of society and substitutes raw human wants with something designed, stamped, and approved by the central authority.

We didn't evolve from star dust to adventurous thinking apes just to live within the shackles of other people's minds.

> unleashing a technology that has a high chance of making this worse

You are presupposing an outcome and you worry too much.

Don't regulate technology, regulate abusive behavior using the existing legal frameworks. We will pass all the laws we need as situations arise. And it will work.

> eventually climb out of the hole we're currently digging ourselves into.

We're not in a hole. Stop looking down and look up.

3 days ago

imiric

> In response to actual incidents, not imagined ones.

Not _just_ in response to actual incidents. Airplanes don't need to crash for us to realize that flying is dangerous. Pilot licenses were required as early as 1911[1], years before the commercial aviation industry was established. International regulations were passed in 1919[2], when the aviation industry was still in its infancy.

> Europe is a good model for a presumptive, over-regulated society. Their comparable industries are smaller and lag behind our own because of it.

The EU is one of the few governments that at least tries to protect its citizens from Big Tech. Whatever you think it's lagging behind in, I'd say they're better off for it. And yet even with all this "over-regulation", industry-leading giants like ASML and Airbus, and tech startups like Spotify, Klarna and Bolt are able to exist and prosper.

> Even laws and regulations proposed for mature technologies can be completely spurious.

I'm not saying that more regulation is always a good thing. I lean towards libertarianism. What I do think is important is to give enough time for society and governments to catch up to the technology being put out into the world, especially with something as disruptive as AI. We should carefully consider its implications, educate people and new generations about its harms and merits, so that our institutions and communities can be better prepared to handle it. We obviously can't prepare for everything, but there's a world of difference between a YOLO approach you seem to be suggesting, and a measured way forward.

A counterpoint to a lax stance on industry regulation can be seen with Big Tobacco. It took many decades of high mortality rates related to smoking for governments to pass strict regulations. Tobacco companies lied and fought those regulations until the very end, and now they're still prospering in countries with lax regulations, or pivoting to more acceptable products like vapes.

My point is that companies don't care about harming society, even when they know their product is capable of it. Their only goal is increasing profits. With products like AI this potential harm is not immediately visible, and we might not discover it until decades from now, perhaps when it's too late to reverse the damage.

> You think of society as a failing institution of sorts.

Not society as a whole. But if you don't see a shift away from democratic governments to authoritarianism, and an increase in sociopolitical tensions worldwide over the last decade in particular, we must be living in different worlds. If you also don't see the role the internet has played in this change, and how AI has the potential to make it significantly worse, I'm not going to try to convince you otherwise.

> You are presupposing an outcome and you worry too much.

It doesn't take a genius to put 2 and 2 together. :) I honestly don't worry much about this, but I am dumbfounded about how more people aren't ringing alarms from rooftops about the direction we're heading in.

> Don't regulate technology, regulate abusive behavior using the existing legal frameworks.

Technology moves too fast for legal frameworks to exist. And even when they do, companies will do their best to avoid them.

> We will pass all the laws we need as situations arise. And it will work.

It _might eventually_ "work". In the meantime, those situations that arise could have unprecedented impact that could've been avoided given some time and preparation.

What you seem to be ignoring is that modern technology is not just about building TVs, cars and airplanes. We came very close to a global thermonuclear war in the last century, and tensions are now rising again. Allowing technology to advance with no guardrails or preparation, while more countries are ruled by egomaniacal manchildren, is just a recipe for disaster. All I'm suggesting is that it would be in our collective best interest to put more thought into the implications of what we're building.

[1]: https://newenglandaviationhistory.com/tag/connecticut-aviati...

[2]: https://en.wikipedia.org/wiki/Paris_Convention_of_1919

3 days ago

cbg0

Quite a lot of whataboutisms and straw men in your post, let's stick to LLMs, as that was the original topic.

4 days ago

echelon

It's contextualization, not a logical fallacy.

Let's stop treating LLMs as spooky.

4 days ago

ben_w

Lots of things that aren't spooky are yet dangerous. Matches, for example. Scissors. Covid and A-10 Warthogs.

3 days ago

Rastonbury

Counterpoint: the number of supposedly educated people falling into social media echo chambers, parroting partisan views, sharing on ramps, recommending supplements. They obviously do not see the harm; in fact they feel superior, feeling the need to educate and lecture. The vector there was social media; the vector here is reliance on chatbots. I mildly trust the big players like Anthropic and even OpenAI, but imagine the talking-head influencers and supplement peddlers making and promoting an un-woke chatbot. People are already relying on ChatGPT to navigate medical conditions and personal/relationship issues.

4 days ago

echelon

I feel my response here [1] also applies to you.

People are going to do these things anyway. We've had "yellow journalism" since the 1800s. It's a tale as old as time.

What right do we have to go around policing other people's minds?

When I grew up, the Internet was an escape from the puritanical censorship of the southern baptists I was surrounded with. It was a bastion of free information and idea exchange. If the prevailing ethos of the 2000s wasn't so anti-censorship, I wouldn't have gotten outside of my own filter bubble and found a way to explore other ideas. I would have been chased away as an undesirable member of the opposition and muted, censored, and banned. Thank god the algorithm didn't exist back then.

The things we do to each other in today's internet are abhorrent. Both sides of the political spectrum attempt to constrain what the other side can do. We need to stop that. It's petty and increases polarization. And that's exactly what's happening with your suggestion - you're wanting to censor ideas and things you don't like that presumably this technology will be used to promote.

Please stop thinking LLMs are an agent of the enemy to convert more people to their causes. The opposite is also true. And the impact won't be as extreme or dire as you make it out to be - heaven forbid people buy more vitamins. Oh, the humanity.

[1] https://news.ycombinator.com/item?id=42499972

4 days ago

ben_w

If yellow journalism is bad, is not fully automated and personalised yellow journalism worse?

3 days ago

weberer

There's nothing wrong with supplements. I live in a place where you're lucky to get an hour of sunlight per day in the winter. Vitamin D supplements have been very helpful.

3 days ago

Philpax

I suspect they're referring to the Infowars kind of supplements, not vitamin supplements.

3 days ago