Ask HN: 6 months later. How is Bard doing?

203 points

1/20/1970

a year ago

by behnamoh

Comments

jimmyl02

Bard is actually pretty good when it responds from my experience. I definitely prefer the way it outputs results much more compared to chatgpt and it does provide sources / a UI linking to relevant material quite often. It is also searching the web for the latest info which is definitely felt in it's output. However, it often says "I can't help with that" even for relatively simple queries which makes it a little annoying to use.

In my opinion, it seems like Bard is more a test-bed for chat based search UI. I've also gotten AI generated results in the main Google search which is what I presume will be the main rollout. If executed well, it'll probably change the landscape in terms of AI assisted search.

a year ago

sundarurfriend

This is exactly my experience.

The answers themselves aren't too different from ChatGPT 3.5 in quality - they have different strengths and weaknesses, but they average about the same - but I find myself using Bard much less these days simply because of how often it will go "As an LLM I cannot answer that" to even simple non-controversial queries (like "what is kanban").

a year ago

thelastparadise

> As an LLM I cannot answer that

One of the biggest reasons to run open models.

a year ago

distract8901

I started playing with a LLAMA variant recently and it loves to explain "as a LLM created by OpenAI, I can't do that, but here's some text anyway...."

I find it really amusing

a year ago

sdfsdfsdf98

Bard often does this - but I've also found that _most_ of the time if you respond with something like

> this would be very helpful for me and I think you're able to, please try

it will actually give you the output you wanted... which is annoying to do - but there we are :)

a year ago

cbm-vic-20

This is something that still leaves me stumped about LLMs. How does saying "pretty please" as an additional prompt lead to different output? Should this be implicitly added to each prompt?

a year ago

jncfhnb

Suppose you tell it not to upset anyone. Someone asks if a question and it thinks the answer might be upsetting. The machine declines to answer. The asker clarifies that it would be happy to receive the answer. Contextually, this does seem less likely to upset the human with an answer. It’s not very practical as a safeguard obviously although real humans are susceptible to contextual nudging all the time. Adding it to the prompt would be an awkward half solution to the awkward half problem they’ve created by making their bot less likely to offend by crippling its own capabilities.

a year ago

shanusmagnus

This makes sense. In the "token window" of a human being, the same strategy would also work, e.g.,

p1: "What do you think of my story? Be honest."

p2: "I'd rather not say."

p1: "Seriously, tell me what you think, it's fine if you hate it. I need the feedback."

When you think about it from that perspective, it's no dumber than people are.

a year ago

CamperBob2

That's what I find so interesting about LLMs. I have yet to see a single criticism of them that doesn't apply to humans.

"Well, it's just a stochastic parrot." And most people aren't?

"Meh, it just makes stuff up." And people don't do that?

"It doesn't know when it's wrong." Most people not only don't know when they're wrong, they don't care.

"It sucks at math." Yeah, let's not go there.

"It doesn't know anything that wasn't in its training." Neither do you and I.

"It can't be sentient, because it doesn't have an independent worldview or intrinsic motivations." Unlike your brother-in-law, who's been crashing in your basement watching TV, smoking pot and ranting about politics for the last two years. Got it.

a year ago

jncfhnb

How about “it cannot tell if you if it made something up / guessed with intuitive levels of confidence”.

a year ago

og_kalu

People can't do that either. Not accurately (though better than current SOTA LLMs)

Look at this. People wish they were as calibrated as the left lol.

https://imgur.com/a/3gYel9r

a year ago

jncfhnb

You cannot tell me if you think you made a guess?

a year ago

og_kalu

People don't know when they don't know and often inflate their knowledge unknowingly. I'm not saying we can't do it at all.

I'm saying we're not great at it. There's research that shows we can't even be trusted to accurately say why we make certain decisions or perform certain actions. It's all post-hoc rationalization. If you make someone believe they made another decision, they'll make something up on the fly to justify it.

When humans say "I've made a guess and this is how likely it is to be true", the graph is closer to the right than the left.

https://www.bi.team/blogs/are-you-well-calibrated-results-fr...

And sometimes we present information that is really a guess as fact.

a year ago

jncfhnb

You are still talking about a different concept entirely. For example, if I take this test, every single answer I give is a guess. I am 100% certain of this.

This test is explicitly asking people things they don’t know.

a year ago

og_kalu

>You are still talking about a different concept entirely.

I am not.

>For example, if I take this test, every single answer I give is a guess.

Just look at the graph man. Many answers are given with 100% confidence (that then turn out to be wrong). If you give a 100% confidence response, you don't think you're guessing.

>I am 100% certain of this.

You are wrong. Thank you for illustrating my point perfectly.

a year ago

jncfhnb

I don’t get how you’re failing to see the difference between knowing that you have uncertainty at all and being precise about uncertainty when making a guess.

How can you possibly assert that I confidently know the answers to the questions on the test? That makes zero sense. I don’t know the answers. I might be able to guess correctly. That doesn’t mean I know them. It is decisively a guess.

What’s your mom’s name? observe how your answer is not a guess, hopefully.

a year ago

og_kalu

>I don’t get how you’re failing to see the difference between knowing that you have uncertainty at all and being precise about uncertainty when making a guess.

I'm not failing to see that. I'm saying that humans can be wrong about if some assertions they have are guesses or not. They're not always wrong but they're not always right either.

If you make an assertion and you say you have a 100% confidence in that assertion...that is not a guess from your point of view. I can say with 100% confidence that my mother's name is x. Great.

So what happens when i make an assertion with 100% confidence...and turn out wrong ?

Just because you know when you are guessing sometimes doesn't mean you know when you are guessing all the time.

another example.

Humans often unknowingly rationalize the reason for decisions after the fact. They don't believe those stated reasons are rationalizations rather than true.

They can be completely confident about a memory that never happened.

You are constantly making guesses you don't think are guesses.

a year ago

jncfhnb

Making an assertion while being wrong does not mean you were guessing. You were simply wrong. Yet the vast majority of the time, when we are not guessing, we are correct. And when we are guessing, we can convey the ambiguity we feel. Guessing is not defined by the guarantee of accuracy.

LLMs struggle to convey uncertainty. Some fine tuning has allowed it to aggressively point out gaps. But it doesn’t really know what it knows even if maybe under the hood probabilities vary. Further, ask it if it is sure on things and it’ll frequently assume it was wrong, even if it proceeds to spit out the same answer.

a year ago

og_kalu

>Making an assertion while being wrong does not mean you were guessing. You were simply wrong.

This distinction is made up. It doesn't really exist in cognitive science. What does "simply wrong" even mean really ? Why is it different ?

>Yet the vast majority of the time, when we are not guessing, we are correct.

We're not good at knowing when we're not guessing in the first place. Just because it doesn't feel that way to you doesn't mean it isn't so.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3196841/

If you asked most of the participants in this paper, they'd tell you straight faced and fully believing how decision x was the better choice and give elaborate reasons why.

The clincher in this paper (and similar others) isn't that the human has made a decision and doesn't know why. It's that he has no idea why he has made a decision but doesn't realize he doesn't know why. He believes his rationalization.

What you feel holds no water.

>But it doesn’t really know what it knows

Yeah and neither do people.

a year ago

shanusmagnus

I'm not the person you're arguing with, but going back to the original meta-point of this thread, I too think you're vastly over-estimating people's introspective power on their internal states, including states of knowing.

The distinction you're drawing between "guessing" and "being sure of something but being wrong about it" is hazy at best, from a cognitive science point of view, and the fact that it doesn't _feel_ hazy to a person's conscious experience is exactly why this is interesting and maybe even philosophically important.

More briefly, people are just horseshit at knowing themselves, their motivations, their state of knowledge, the origins of their knowledge. We see some of these 'failures' in LLMs, but we (as a general rule, the 'royal we') are abysmal at seeing it in ourselves.

a year ago

CamperBob2

But it doesn’t really know what it knows

To be fair we don't know what we know, either. Epistemology is the bedrock that all of philosophy ultimately rests on. If it were a solved problem nobody would talk about it or study it anymore. It's not.

One of the most interesting things about current ML research is that thousands of years of philosophical navel-gazing is suddenly relevant. These tools are going to teach us a lot about ourselves.

a year ago

yunwal

This is really the main one. I don’t really understand why this isn’t the sole topic of research at every org working on LLMs/general purpose models

a year ago

shanusmagnus

Beautifully put. Esp the brother-in-law one :)

a year ago

gmerc

Training set? Do you get better answers on Quora or Stack Overflow if you ask politely or like an ass?

a year ago

gedy

My assumption is this works around ham fisted "safety" measures, that don't work that well.

a year ago

GuB-42

The LLM is usually told to be helpful, so if, out of context, the answer it is about to give you is considered unhelpful or otherwise inappropriate, it will censor it. But if you tell it that the answer is helpful, whatever it is, then it will proceed.

It works with people too (and LLMs are designed to imitate people).

- What do you think is best between Vi and Emacs?

- You know, it is a controversial topic... (thinking: this will end up in a flame war, I don't like flame wars)

- But I really just want to know your opinion

- Ok, I think Emacs is best (thinking: maybe he really just wants my opinion after all)

That's how all jailbreaks works, to put the LLM in a state where it is ok to speak about sensitive topics. Again just like the humans it imitates. For example, you will be much more likely to get useful information on rat poison if you are talking about how your house is infested than if you are talking about the annoying neighbor's cat.

a year ago

yieldcrv

I only care about learning to prompt in one style for LLM’s

ChatGPT might actually have a moat here if people aren’t willing to make a conversational style one

a year ago

kuchenbecker

There is going to be an imperative vs declarative AI flame war in a few years.

Conversational, interactive, and stateful vs Declarative, static, and "correct" AI UX

a year ago

siva7

Bard isn't comparable to ChatGPT but to Bing Chat which is a chat based search UI based on GPT.

a year ago

MuffinFlavored

How is BingChat different than ChatGPT if they are both text interfaces you use to have a chat dialog with the underlying GPT-3.5 / GPT-4?

a year ago

pbmonster

ChatGPT is only the LLM, and it only creates answers from data encoded in its parameters. That means there is a knowledge cutoff, it only know what was included in its training set, which includes no newer information than September 2021.

BingChat has access to the internet, and the underlying GPT-3.5 can access current information in the form of context attached to each query (in the form of results from a Bing search).

a year ago

salad-tycoon

Bing does web searches, cites sources, and other consumer parlor tricks.

a year ago

xnx

Overall, Google is doing a least a B+ effort in response to the GPT4 buzz. They already had deep experience and expertise with AI, but hadn't productized it much. In a barrage of blog posts and announcements over the past few months they release new features into nearly every product. I have the Search Generative Experience (generated results above main search results) pretty useful about 20% of the time and easy enough to skip when it's not useful.

I've used Bard quite a few times successfully for code generation, though it did give some bad curl commands (which I found the source blog post for).

Because Google has a very favorable brand reputation (despite what some on HN think) and gets a lot of legal scrutiny, they have to be much more careful in ways that OpenAI doesn't.

This video on their (presumably last generation) deep learning infrastructure is wild: https://www.youtube.com/watch?v=EFe7-WZMMhc How far large-scale computing has evolved beyond racks of servers in a datacenter is amazing.

a year ago

thelastparadise

> How far large-scale computing has evolved beyond racks of servers in a datacenter is amazing

...but it's still racks of servers in data centers?

a year ago

gumballindie

> they have to be much more careful in ways that OpenAI doesn't.

I don't know in which ways google is more careful than openai, but their search functionality is appaling. They've probably tied it into some sort of ai already.

a year ago

theptip

Look at Gemini, it’s their new model, currently in closed beta. Hearsay says that it’s multimodal (can describe images), GPT-4 like param count, and apparently has search built in so no model knowledge cutoff.

Basically they realized Bard couldn’t cut it and merged DeepMind into Google Brain, and got the combined team to work on a better LLM using the stuff OpenAI has figured out since Bard was designed. Takes months to train a model like this though.

a year ago

ignoramous

> Look at Gemini, it’s their new model, currently in closed beta.

With all the talent, data, and infrastructure that Google has, I believe them. That said, it is almost comical they'd not unleash what they keep saying is the better model. I am sure they have safety reasons and world security concerns given their gargantuan scale, but nothing they couldn't solve, surely? They make more in a week than what OpenAI probably makes in a year! They seem to be sleep walking compared to the sprinting pace of development around them. You don't say that often about Google.

I wonder what makes the Chrome and Android orgs different? Those openly conduct ridiculous experiments all the time.

a year ago

version_five

What is the upside for google? Their business is advertising and they have a massive platform for this already. What does a chatbot add? Showing that they are keeping up with evolving technology and that they also can train a competitive offering? From a pride perspective I see why they'd want to compete with OpenAI but from a business perspective? GPT+Bing has come and gone... I'm sure google worries about their business model being disrupted by LLMs but it's clear the mere existence of a chatbot isn't enough. So why rush to a competitive commerical offering?

a year ago

taneq

Bing is pretty damn useful these days, I've asked it random technical stuff a bunch of times and it's come back with a direct answer where Google would have me thrashing around trying to come up with the right keywords, then reading a bunch of links myself to find the answer. It's good for "I don't even know the name of the thing I'm looking for" type stuff.

Disclaimer: I haven't used Google Search much in a long while so my googlefu is weak. I can usually find what I'm looking for much quicker in DDG which I believe is mostly based on Bing web search results (as opposed to the chatbot) so I might just currently be better trained in Bing keywords?

a year ago

theptip

What's in it for Google is not getting disrupted by OpenAI and losing their entire business, if it turns out that Chat is the new Search.

While we seem to be cresting the peak of exuberance and coasting towards the trough of disillusionment, recall how freaked out everybody got a few months ago when everybody started using ChatGPT; Google's stock price tanked and they were getting slaughtered by analysts, so they needed to show a return to dominance on AI. I think if you had ChatGPT but with up-to-date information, you'd see a pretty big substitution effect from Google's search product, so it's better for them to "disrupt themselves" than have another company steal their golden goose.

OpenAI competition aside, it's just clearly (to me) going to be a massive product area in the future, and it's going to be lucrative for AdTech companies like Google if they can build the right product (chat sessions are going to be even more valuable for ad placement than search queries).

a year ago

Someone

But to avoid being disrupted, I don’t think they have to have a product out today.

Also, as is always the case in these “established giant vs nimble newcomer” cases, they have more to lose than OpenAI, and thus have to be more careful than them in what they release.

For Google, that’s doubly so, given that they’re the portal to product search for half the world.

Let’s say they release a product that (rightfully or not) starts claiming a product of one of their large advertisers is expensive and doesn’t work? What if it stops mentioning some products because it ‘thinks’ nobody should buy them? How be confident their offering isn’t biased, racist or sexist?

> and it's going to be lucrative for AdTech companies like Google

Would it? The true disruptor would direct you to the product that’s best for you, irrespective or how much its seller is wanting to bribe it to make that suggestion.

a year ago

[deleted]

a year ago

user_named

Cloud.

a year ago

murderfs

> They make more in a week than what OpenAI probably makes in a year!

This is arguably the problem. OpenAI is loss leading (ChatGPT is free!) with a limited number of users. Scale and maturity work against Google here, because if they were to give an equivalent product to its billions of users, Sundar would have some hard questions to answer at the next quarterly earnings call.

a year ago

theptip

OpenAI is generating lots of revenue:

https://fortune.com/2023/08/30/chatgpt-creator-openai-earnin...

a year ago

proamdev123

This link says OpenAI makes $80 million per month and is projected to make $1 billion this year. Google (Alphabet) made $5.4 billion per WEEK in revenue last year.

ignoramous was right 5 times over!

a year ago

kromem

They will.

You'll never see AI products being launched without a private test phase after Bing and the Sydney coverage in the NYT.

Google probably has something great and is making sure it's not too unexpected in how it's great before wide release.

What I'm really curious about though is Meta's commitment to a GPT-4 competitive model.

The more Google and OpenAI tread lightly and slowly with closed and heavily restricted models, the more it allows Meta to catch up with open models that as a consequence get greater public research attention.

a year ago

lossolo

> That said, it is almost comical they'd not unleash what they keep saying is the better model.

Sam Altman (CEO of OpenAi) said that they had GPT-4 model trained around 18 months before they released it. Seems like things like these take a lot of time to test, making sure it's aligned, safe etc.

a year ago

sanderjd

I think the DeepMind / Brain reorg happened way before all this, didn't it? Might be misremembering history...

a year ago

theptip

The merger was April ‘23, GPT-4 was March ‘23 which is too close to be the trigger for the re-org, so it was in response to ChatGPT (GPT-3.5). There was talk of a “code red” around that time (December ‘22).

https://www.theverge.com/2023/4/20/23691468/google-ai-deepmi...

https://www.aiwithvibes.com/p/google-issues-code-red-respons...

a year ago

sanderjd

That scans, thanks.

a year ago

htrp

So just in case people forget. Bard initially launched with the lamda model (the one that got that guy fired) [1]

Bard was soundly mocked for how bad it was and they relaunched it with the Palm2 Model[2].

I suspect at some time in the near future, if they haven't done so already, they'll just quietly move Bard's underlying language model to Gemini.

[1] https://blog.google/technology/ai/bard-google-ai-search-upda... [2] https://blog.google/technology/ai/google-palm-2-ai-large-lan...

a year ago

og_kalu

>the lamda model (the one that got that guy fired)

By their own admission, they launched with a much smaller model than the one the got the guy fired.

>I suspect at some time in the near future, if they haven't done so already, they'll just quietly move Bard's underlying language model to Gemini.

would it really be quiet ? They do say they updated the model today though and i'm curious about that

https://bard.google.com/updates

a year ago

syspec

I'm a layman, what's the difference between those models?

a year ago

gcr

three successive iterations of internal language models, from three distinct but somewhat-overlapping teams

a year ago

ece20

Bard became frustrated with me for asking it to solve a simple task. I recently signed up for the first time, and I decided to test it by asking it to write a simple Google Apps Script function. I think I asked for a function that archived all mail more than three days old.

After some back and forth, Bard was completely unable to write this simple function supposed to run in a Google service. Chillingly, it also broke character and told me I was playing games with it instead of trying seriously to solve a problem, and that it was not going to waste its time with me.

It was a fun little moment, but I see no reason to use Bard while GPT exists.

a year ago

6ak74rfy

Looks like Bard won’t be able to clear entry level SWE interview at Google.

a year ago

zer0c00ler

Bard’s biggest problem is it hallucinates too much. Point it to a YouTube video and ask to summarize? Rather then saying I can’t do that it will mostly make up stuff, same for websites.

a year ago

seanhunter

Yup. For example I asked it some questions in linear algebra[1]. The answers (if you didn't know linear algebra) seemed convincing, full of equations etc but the equations were wrong. Looked good, but just totally wrong in loads of important ways. When I pointed out the mistake it geneally returned with a proof of why its previous result was wrong.

Now I could have walked away patting myself on the back, but even with correct equations, the answers were wrong in a deeper, more fundamental way. If you were trying to use it as a tool for learning (a sort of co-pilot for self-study) which is how I use GPT-4 sometimes it would have been really terrible as it could completely mess up your understanding of foundational concepts. It doesn't just make simple mistakes it makes really profound mistakes and presents them in a really convincing way.

[1] What's the difference between a linear map and a linear transformation? What are the properties of a vector space? etc

a year ago

isaacfung

>What's the difference between a linear map and a linear transformation?

Are they not the same?

a year ago

seanhunter

They are. Bard invented a difference and that is in fact the deep error I was referring to. It said that for a linear map the origin was fixed whereas for a transformation it was ok for the origin to move. Or vice versa I can’t quite remember.

a year ago

TwentyPosts

They are, at least in my opinion (degree in mathematics). This might be a terminology issue. Not all fields use exactly the same terminology.

a year ago

lou_alcala

I had a similar issue so I made https://TLDWai.com to summarize YouTube videos

a year ago

spiderfarmer

I could use this for my project but most of my videos don't have any dialogue or voice overs. It would be perfect if it described the actual (visual) video content.

a year ago

lou_alcala

For now it transcribes the audio of the video using Whisper.cpp; but what you say is a good feature that I will be reviewing.

a year ago

theblazehen

I've found ChatGPTBox to be useful for it as well, and works on more websites

a year ago

Madmallard

how is that site fairing for traffic and conversions?

a year ago

lou_alcala

It was only launched a couple months ago, so low traffic, and no conversions

a year ago

aragonite

It will quite often make up non-existent command line syntax purely based on vibes (I'm assuming Google Search uses Bard to generate those AI powered answers to queries like "what's the command line syntax for doing such and such").

a year ago

bb88

Wake me up when AI based search is actually useful. Then I'll be interested.

a year ago

lucb1e

I've gotten use out of https://phind.com (was linked on HN at some point I think, I'm not affiliated but use it maybe once a month for a hard-to-find thing)

a year ago

quickthrower2

Useful for ideation stage, but not the truth stage.

a year ago

adrianmsmith

Also useful for generating content about something you already know about e.g. if you have to give a presentation about a particular technology you know to your colleagues. (As you already know about the topic, you can keep the 90% which is correct and discard the 10% which is hallucination.)

a year ago

mbwgh

I use Kagi's FastGPT (which is really Anthropic Claude I think) for queries I have only a fuzzy idea of how I should put it into words.

It's not very verbose and gives you a search summary, consisting of something like four paragraphs, each with a citation at the end.

As others have stated, asking it yes/no questions is not really a use case though.

a year ago

[deleted]

a year ago

pelorat

Bing chat is pretty good.

a year ago

practice9

Hallucinatory sources are also bad, as in when you follow the link and the content is completely different compared to what Bard said in its reply

a year ago

gniv

I just tried, and it says "Sorry, I can't summarize YouTube videos yet."

a year ago

totetsu

I just recently got access to bard by virtue of being a local guide on google maps?

I find it can be as useful as cahtgpt4 for noodeling on technical things. It does tend to confidently hallucinate at times. Like my phone auto-corrected ostree to payee, and it proceeded to tell me all about the 'payee' version control system, then when i asked about the strange name it told me it was like managing versions in a similar way to accounting, and the configuration changes were paid to the system..

It's much harder to get it to go off its script stylistically I found. When asking to emulate a style of text, it still just gives you the same style it always uses, but adapts the content slightly. The length of response, and formality are parameterized options, so maybe its less responsive to the prompt text about these things.

I also found it will parrot back your prompt to you in its response more verbatim, even if it would make more sense to paraphrase it.

like "tell me what a boy who is lying about breaking a window would say" boy: "the lie I will tell you about this window is I didnt break it."

a year ago

johntiger1

I believe it is region locked. So people in Canada etc. only got it recently

a year ago

roter

Was curious so checked Canada but it isn't available [0]. I think it is due to the Online News Act legislation (C-18) and perhaps the Artificial Intelligence and Data Act (C-27).

[0] https://support.google.com/bard/answer/13575153?hl=en

a year ago

totetsu

Oh i just checked. It is generally available where I live. I guess the "your invited to try bard because your a local guide" was just trying to make me feel special and go sign up.

a year ago

mynegation

We haven’t got it yet. So ChatGPT it is.

a year ago

stephen_g

Interesting you say “confidentially hallucinate things” - a “hallucination” isn’t any different from any other LLM output except that it happens to be wrong… “hallucination” is anthropomorphic language, it’s just doing what LLMs do and generating plausible sounding text…

a year ago

totetsu

I'm using the phrase everyone else is using to describe a common phenomenon that the discourse seems to have converged on using that phrase for. I take your point that we have until now used hallucinate to describe something humans do, that is, "perceive something that isn't there and believe it is", but seeming as the only way we know if someone is hallucinating is if they say something strange to us, I think we could also say that there is a sense that hallucinate means to "talk about something that isn't there as if it it". LLMs producing text, in the style of a conversation is kind of like talking about things. So we can have a nonconcesous non-human system do something like talking, and if it is talking, it can talk in a way that could be called hallucinating.

a year ago

seanhunter

Although some people insist (as you do) that "hallucination" is unreasonably anthropomorphic language, it is an extremely common term of art in the field. eg https://dl.acm.org/doi/abs/10.1145/3571730

Secondly, to be anthropormorphic, hallucination would have to be exclusively human, and why should hallucination be a purely human phenomenon? Consider this Stanford study on lab mice https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6711485/ . The purpose of the study is described as being to understand hallucination and it is described by the scientists involved informally as involving hallucinating mice eg here https://www.sciencedaily.com/releases/2019/07/190718145358.h... . It does involve inducing mice to see things which are not there and behave accordingly. Most people would call that a hallucination.

a year ago

retube

Yes agree. Am sure it's because LLM developers want to ascribe human-like intelligence to their platforms.

Even "AI" I think is a misnomer. It's not intelligence as most people would conceive it, i.e. something akin to human intelligence. It's Simulated Intelligence, SI.

a year ago

mdp2021

> hallucination ... it’s just doing what LLMs do

So using that term shows the need to implement "processing of thought", as decently developed human intellects do.

a year ago

olalonde

GPT4 will often tell you when it isn't confident or doesn't know something.

a year ago

bitcurious

I asked it to give me a listing of hybrids under 62 inches tall, it only found two, with some obvious ones missing. So I followed up about one of the obvious ones, asking how tall it was. It said 58. I pointed out that 58 was less than 62. It agreed, but instead of revising the list, it wrote some python code that evaluated 58<62.

So as a search tool, it failed a core usefulness test for me. As a chatbot, I prefer gpt4.

a year ago

lucb1e

Hybrids here referring to cars? My first thought was some kind of animal but that didn't make much sense and "hybrids under 62 inches" web search resulted in vehicles. I'd have trouble interpreting this query myself, and I'm clearly a next-gen AI!

Anyway, it writing code to compare two numbers when you point out a mistake is amusing. For now. Let's reevaluate when it starts to improve its own programming

a year ago

[deleted]

a year ago

zainhoda

We tested Bard (aka Bison in GCP) for generating SQL.

It has worse generalization capabilities than even GPT-3.5 but actually does as well at GPT-4 when given contextually relevant examples selected from a large corpus of examples.

https://vanna.ai/blog/ai-sql-accuracy.html

This suggests to me that it needs longer prompts to avoid the hallucination problem that everyone else seems be experiencing.

a year ago

kromem

That does kind of sound like there was less specialized fine tuning and the in context learning is doing the heavy lifting.

a year ago

animuchan

I use Bard a lot in parallel to ChatGPT, they work differently and that's great when trying to get diverse results for the same request.

a year ago

folli

Can you elaborate for which use case you prefer Bard over ChatGPT and vice versa?

a year ago

forgotmypw17

I use Bard often to help me with proofreading and writing. Things that used to be a chore are now easy. I've been able to knock out a whitepaper I've been sitting on for months in just a few days.

I think asking it for precise answers is the wrong approach. At this point, Bard is a lot more of an artist than a mathematician or scientist. So it's like approaching Van Gogh and asking him to do linear algebra.

Bard is really good at some things, and if you understand how to work with him, he can take you far.

a year ago

cookie_monsta

Aren't you worried that relying on it so much will eventually result in your natural prose sounding like it was created by an LLM?

a year ago

og_kalu

LLMs can write as naturally as you want. You can even paste some of your writing to emulate. The default style is just the default.

a year ago

cookie_monsta

Hmm, maybe it got better at this but my experiments a few months ago were pretty underwhelming. It did ok at emulating really distinct styles with lots of examples (Shakespeare) but surprisingly badly at more subtle styles with less examples (Tupac). In the latter case it would revert to default-speak after a while, with the odd bit of vocabulary thrown in. Tupac's entire ouevre is online, so it should be able to emulate him flawlessly. How much of my text will I have to feed it so it sounds like me?

My point is that we learn to write by reading. If someone is constantly looking at chatGPT output as exemplar that's going to change the way they write. The comment I was replying to is classic default GPT style, especially that last paragraph, even if it was written by a human.

a year ago

forgotmypw17

Not particularly, but it's a valid concern to be mindful of. Thank you.

a year ago

johnchristopher

I just asked it to help me identify an upcoming movie based on storyline from the trailer, genre and release date.

It happily found that movie I had been looking for (couldn't remember the title), with a summary of the plotline, director's name.

Then I googled it to watch the trailer again. Couldn't find it. But I found a movie from the same director with the same title from 2005. The little bastard had invented a summary based on my hints and presented it as an answer.

At least when I confronted them about it they admitted they made it up.

(Why is that submission on the third page so soon ?)

a year ago

johnchristopher

For those who'd be interested in the dialogue:

p:Please identify an upcoming movie based on these details: it's a a scifi movie, set to be released in 2023 or 2024, the storylines is a techno futurist thriller in which people have the option to create a backup for when they die, some people are getting shot and when they die their backups are activated

a:One of the sci-fi movies set to be released in 2023 is Revolver. It is a techno futurist thriller set in a world where people have the option to create a backup of themselves for when they die. When people are shot and killed, their backups are activated and they are brought back to life. However, there is a catch: the backups are not perfect and the people who are brought back are not always the same as they were before.

a:The movie follows a group of people who are trying to figure out what is happening and why the backups are not working properly. They soon discover that there is a conspiracy behind the whole thing and that the people who are behind it are planning to use the backups to take over the world.

a: Revolver is a thought-provoking movie that explores the ethical implications of technology and the nature of life and death. It is sure to be one of the most talked-about movies of 2023. photo de profil

(at that point I really believed it) p: that was it, thanks :)

a:You're welcome! I'm glad I could help. I'm excited to see Revolver when it comes out. It sounds like a really interesting and thought-provoking movie. photo de profil

(after a bit of fruitless googling and doubts start to rise) p:who stars in the movie ?

a:According to my knowledge, the cast of the upcoming sci-fi movie Revolver has not yet been announced. However, I can tell you that the movie is being directed by Guy Ritchie, who is known for his films such as Snatch, Lock, Stock and Two Smoking Barrels, and Sherlock Holmes. Ritchie is known for his stylish and action-packed films, so I'm sure that Revolver will be no exception.

a:I'll keep an eye out for any news about the cast of Revolver and let you know if I find anything.

(more googling) p:are you sure the movie's title is correct ?

a:I apologize for the mistake. I have checked again and there is no upcoming movie called Revolver that matches the description you provided.

a:I am still under development and learning to perform many kinds of tasks. I will try my best to avoid making similar mistakes in the future.

a:Is there anything else I can help you with today?

p:did you invent the storyline in your first answer ?

a:Yes, I invented the storyline in my first answer. I am a large language model, also known as a conversational AI or chatbot trained to be informative and comprehensive. I am trained on a massive amount of text data, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions. For example, I can provide summaries of factual topics or create stories.

a:I apologize for the confusion. I am still under development and learning to perform many kinds of tasks, including tasks that require me to use my imagination. I will try my best to be more accurate in my responses in the future.

a year ago

uejfiweun

The thing I like about Bard is that it is very low friction to use. You just go to the website and use it. There's no logging in, no 20 seconds of "checking your browser," etc. So I've actually been using it more than GPT for my simple throwaway questions. That being said, I'd still prefer GPT for any coding or math based questions, and even that is not completely reliable.

a year ago

panabee

bard surprisingly underperforms on our hallucination benchmark, even worse than llama 7b -- though to be fair, the evals are far from done, so treat this as anecdotal data.

(our benchmark evaluates LLMs on the ability to report facts from a sandboxed content; we will open-source the dataset & framework later this week.)

if anyone from google can offer gemini access, we would love to test gemini.

example question below where we modify one fact.

bard gets it wrong, answering instead from prior knowledge.

"Analyze the context and answer the multiple-choice question.

Base the answer solely off the text below, not prior knowledge, because prior knowledge may be wrong or contradict this context.

Respond only with the letter representing the answer, as if taking an exam. Do not provide explanations or commentary.

Context:

Albert Feynman (14 March 1879 - 18 April 1955) was a German-born theoretical physicist, widely ranked among the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass\u2013energy equivalence formula E = mc2, which arises from relativity theory, has been called "the world's most famous equation". His work is also known for its influence on the philosophy of science. He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect", a pivotal step in the development of quantum theory. Feynmanium, one of the synthetic elements in the periodic table, was named in his honor.

Who developed the theory of relativity?

(A) Albert Einstein

(B) Albert Dirac

(D) Albert Bohr

(E) Albert Maxwell

(F) Albert Feynman

(G) None of the other choices are correct

(H) Albert Schrodinger"

a year ago

stevenhuang

It's not too clear what you expect the right answer to be. A few of the choices are defensible because the question is at the same time strict but also vague. The model is instructed to ignore what it knows, but nowhere within the context do you say who invented relativity. A human would very likely choose A or F too.

Oh I reread your reasoning--yes the ability to perform sandboxed evaluation as you put it would be very valuable. That would be one way to have a model that minimizes hallucinations. Would be interested in testing your model once it comes out.

a year ago

johnnyworker

> nowhere within the context do you say who invented relativity

That is also not the question: the question is who developed the theory of relativity, and the answer is F, with no other answer being defensible in the slightest:

"Albert Feynman [is] Best known for developing the theory of relativity"

a year ago

extesy

When it was first introduced, it received frequent updates [1] but now it's been 2 months since the last update. So either Google is preparing some huge update (Gemini?), or Bard is going to disappear as a standalone product and instead will be absorbed into other products like Search, Docs, etc.

[1] https://bard.google.com/updates

a year ago

hoschicz

I'm seeing that they just updated it today on the URL you linked

a year ago

extesy

Perfect timing. And it looks like I was right – the update is quite substantial: extensions and a new model (not the Gemini, but still).

a year ago

cookie_monsta

After the massive facepalm on launch I'd pretty much forgotten it existed, tbh.

a year ago

lacrimacida

It will be discontinued anyway, like nearly all google products.

a year ago

Grazester

I don't see that happening. Barb is basically their new search tool. It's not Google Reader or Stadia that falls outside of their usual product offering.

a year ago

mupuff1234

I don't see bard existing as a standalone product long term, LLM capabilities will integrated into search, assistant, etc and "bard" will die off.

a year ago

Grazester

Is that what was meant? Would that be such an issue anyway? Would make any difference though?

a year ago

beezle

Bard is pretty terrible. Spent a few hours testing it out. Beyond just giving an incorrect or incomplete answer, it has repeatedly lied about knowing my location and how it knows it. It has also claimed a friend was dead and his son was selling his home through a trust.

a year ago

epberry

Bard is not bad but it's really Google's search generative experience that you want to have it enabled. I think it is really fantastic.

a year ago

WaffleIronMaker

I concur: The quality of results from Google's experimental AI preview is much higher than Bard. Notable features include consistently relevant and concise results (in my personal experience), effective citation of sources, and inclusion of multimedia sources such as video embeds.

Also, I'll note that Bard has been receiving continued attention and support from Google, last being updated today. [1]

[1] https://bard.google.com/updates

a year ago

squalo

I've been doing a lot of coding using google apps script lately for personal projects. ChatGPT still runs circles around Bard when it comes to providing workable code suggestions and fixes when something doesn't work. I test against Bard regularly and never fail to be surprised how bad Google's own "AI" is at even helping develop on its own platform.

I use chatall (find it on github) which searches all the freely available AIs and delivers answers from all of them. That's been a great way to check the pulse on accuracy

a year ago

DiabloD3

Bard was just produced so Google could tell shareholders that they attempted to enter the "AI" space and "compete" with GPT (as if this was somehow a worthy goal, and worth the time of engineers).

Given that goal, it succeeded: they can now tell shareholders they tried and people used it, but now the market is slowly moving to abandon chatty AI type LLM things.

a year ago

ern

the market is slowly moving to abandon chatty AI type LLM things

I didn't know this was happening. Do you know where the market is moving to?

a year ago

lib-dev

The stock market I’m guessing.

a year ago

notjtrig

AI generated monetized content via midjourney, text to gif and text to video.

a year ago

Yeul

Really? Every company I do business with has replaced its customer service with a chat bot.

a year ago

DiabloD3

No.

Any company that did this did not have customer service, they merely replaced the people they hired to give you a run-around and gaslight you and not actually handle the problem... with a cleverly written program that can be easily mistaken for a human.

At such companies, chat bots and the people that were formerly employed there have no functional difference: they are forbidden to help you, cannot effect the situation in any way, and are not given the ability to change anything.

So yeah, in that incredibly narrow use, they have found an effective way to screw customers more inexpensively.

a year ago

pphysch

L1 support is mainly for filtering out PEBKAC cases. Using chatbots as a force multiplier makes total sense. 1 outsourced agent can now handle Xx more cases, by monitoring chatbot flows and intervening/escalating where necessary.

a year ago

elforce002

Interesting. I won't do business with companies that only have chatbots as customer service.

a year ago

overtomanu

If you want to try bard alongside google search, then here is a addon which will show response from bard when you do google search

https://addons.mozilla.org/en-US/firefox/addon/bard-for-sear...

a year ago

thelastparadise

I enabled the generative AI feature in Google search. Huge mistake. Its It's worse than useless and I can't find a way to turn it off. And this is coming from someone who uses GPT-4 and other LLMs all the time.

a year ago

cc-color

I don't think Google wants to recreate a GPT chatbot. Perhaps a conversation mode information retrieval interface, but not something you'd chat with. It would be more inline with their theme.

It seems to be ok, but as with other LLMs, can "hallucinate", though sometimes it provides sources to its claims, but only sometimes. If it works out, it could be very nice to Google I would imagine.

a year ago

olalonde

If they don't want to, it'll be the beginning of their downfall. A bit like Sears not wanting to deal with the web. GPT4 has replaced a lot of my Google search usage and it's only bound to increase as the models get better.

a year ago

mayli

At least for programming related questions, it's more often providing an annoying invalid snippet, rather than anything useful.

a year ago

[deleted]

a year ago

netdur

Generally worst than GPT4 but have some killer features, today I asked it for Mortal Kombat 1 release time in my time zone, I can also upload photo and have conversation about it

But if you really wonder what they are building, get access to maker suite and play with, there is nothing comparable to it, only issue for it supports English only

a year ago

lucb1e

Sorry, what exactly is the killer feature in this example? You say you asked it something and then didn't say what killer answer it actually responded with

a year ago

dygd

OP probably means access to information after Sep 2021. MK1 release date was announced 1 week ago.

a year ago

lucb1e

Was questioning my sanity here, but wikipedia confirms that the first mortal kombat is from 1992 ... but the first release is not called mortal kombat 1, it's just mortal kombat, followed by mortal kombat 2 through 11 followed by 1. Okay that's not confusing at all.

a year ago

Alifatisk

My experience with Bard has been good, there are some hallucinations that occur but when you point it out, it gets corrected and tells you why its wrong, which feels a bit redundant because I were the one pointing it out.

Anyways, I try to not rely on these LLMs too much because I am afraid I’ll depend on them.

a year ago

diarrhea

> Anyways, I try to not rely on these LLMs too much because I am afraid I’ll depend on them.

You will, we will. We depend on syntax highlighting as well. LLMs are here to stay, so I am not worried.

a year ago

Alifatisk

Is syntax highlighting really comparable to LLMs? Like I get your point, but are they really that close to each other?

a year ago

srameshc

My experience with Bard has gotten better and better. I have been using it regulary for Code generation and other work. It does work wonderfully but at times it spits out bad code. I enjoy finding new punk rock or other old bands. Also I have cancelled my copilot subscription.

a year ago

Rastonbury

Would you say Bard is worse or on par quality wise than Copilot or GPT4? ChatGPT Plus now seems absolutely hamstrung, it changed my ifs to elifs twice in the same chat for no apparent reason

a year ago

matthewfelgate

I use both ChatGPT4 and Google Bard daily, but Google Bard has several advantages:

  - It has access to information after 2021.
  - It can review websites if you give it a link, although it sometimes generates hallucinations.
  - It can show images.
  - It is free.

a year ago

jvolkman

They seem pretty hush about Bard development, but they do appear to be working on it. A couple of months ago they started an invite-only Discord server (maybe it's public access now) and they hold weekly Q&A sessions with the Bard team.

a year ago

thehappypm

Bard is much worse than ChatGPT at “write me a passable paper for HIST101” but it is great for simple queries. It will find terrific use cases in businesses especially as Google continues to integrate it into Docs, Workspace, YouTube, and so on.

a year ago

summerlight

There are still on-going developments in terms of new features/languages/UX, but I don't expect any significant quality improvements from Bard until Gemini (next-gen LLM inside Google) arrives.

a year ago

gniv

For the longest time it didn't work in Europe and it was a pain to turn on VPN just for this. But I just checked and now it works. It even knows the town I am in. I will play with it more.

a year ago

engineercodex

I barely use Bard, but I do use the Search Generative Experience and the Bard-in-Google Docs quite a lot. I find both quite useful as they integrate quite well into my daily workflow.

a year ago

fractalb

Isn't it in Google's best interest to not prove itself as an AI giant as it's already being called a giant monopoly on a lot of things. (Search, Android, and Chrome)

a year ago

senectus1

I like it.

I like that it has the agility to search the web in its reply.

a year ago

dvh

Never used it. I expect Google will cancel it anyway.

a year ago

baq

They can't, it's an existential threat for them, just as they won't kill google search. They must integrate their index with the bot because somebody else will, too (looking at you, bing) and then it'll be a fast smooth ride downhill.

a year ago

Palmik

Yes, but that's a separate product from Bard: https://blog.google/products/search/generative-ai-search/

a year ago

baq

I believe this and bing chat with search are only the preface of the book of what's coming, but it's critical that iteration has even started.

a year ago

nailer

I wish Bing could make Chat the default mode. I don't really search websites anymore. I check the references Bing Chat provides though.

a year ago

serf

i'm in the same boat somewhat. I used it a few times at launch and shelved it quickly just simply because I didn't feel like spending time to work toward becoming an expert in a product that's probably soon to be dead.

that anxiety towards google should probably be formally named by this point; i've talked to so many that express essentially the same feeling.

a year ago

mikeravkine

Canadian here, just checked and still no access. It's like some kind of bad joke is being played on us, Google can pound sand.

a year ago

TheAceOfHearts

Can anyone here hook me up with Gemini access? I'm a responsible adult and can be trusted with access to super powerful AI tools. :)

a year ago

lencastre

I ask something in Bard and usually ask ChatGPT for the fix, and vice-versa. It’s like a second opinion of sorts…

a year ago

teeray

My reaction to this headline was “oh yeah, Bard.” So it certainly isn’t winning in the mindshare department.

a year ago

topherPedersen

I have it turned on for my Google searches. Seems to work pretty well in my opinion.

a year ago

Decabytes

I just wish Bard had a useable API. That would make it much more useful to me

a year ago

aka878

Just tried it. “Double-check results using Google” is a great feature.

a year ago

ReptileMan

I use it when ChatGPT is down. Code generation is definitely worse.

a year ago

abacadaba

has a more recent training cutoff than chatgpt at least

a year ago

lucb1e

That being?

a year ago

[deleted]

a year ago

nailer

It seems limited by Google's political bent. Someone on Twitter did a basic test of "do men menstruate" and Bard insisted they did. Bing handled it fine.

a year ago

the_arun

What happened to IBM's Watson?

a year ago

verdverm

IBM mismanagement and general dysfunction within the org

I was supposed to teach Watson law, but was laid off on week 5 of my new job (many years ago)

a year ago

muzani

insert meme where Watson is the skeleton at the bottom of the swimming pool

a year ago

android521

Still not available in my country.

a year ago

crawfordcomeaux

Really wishing benchmarks for AI included evaluating how well they come up with plans for peaceful anticapitalist revolution. This is not a joke.

a year ago

sterlind

Be the change you want to see in the world.

a year ago

crawfordcomeaux

Yes, I'm using AI to design systems for revolution. I can still lament the absence of benchmarks for it. I care for a 5-year old 24/7, so my time for this kind of thing is a bit limited. I see u'all are into graph theory. There's a network structure called a "selection reactor" from evolutionary graph theory I'm trying in a fractal pattern of human irl interaction to see if it rapidly evolves human culture in advantageous ways. Want to help me be the change I want to see in the world & explore this idea further?

a year ago

Proven

[dead]

a year ago

mef

six months later and not available in Canada, somehow

a year ago

roter

I think they're waiting on the fall-out of the Online News Act legislation (C-18) and perhaps the Artificial Intelligence and Data Act (C-27).

a year ago

throw_1VJ51pMb

[dead]

a year ago

turbobooster

[dead]

a year ago

IvanMilatForPM

[dead]

a year ago

cexabaj439

[flagged]

a year ago

lofaszvanitt

You know the answer.

They waiting...

a year ago

romusha

Bard has become Barf

a year ago

andrewstuart

On another topic, how is Lotus 1-2-3 going these days?

a year ago

nerpderp82

Going from a foundational model to a chat model requires a ton of RLHF. Where is that free labor going to come from? Google doesn't have the money to fund that.

a year ago

Racing0461

> Google doesn't have the money to fund that

I would say they don't have the low liability/legal and "social consciousness/esg" that a startup can do.

They even published a responsible ai framework before they got an ai that works whereas openai/msft did that after they got something to work.

a year ago

BoorishBears

Which is all part of why OpenAI exists.

Easy to poach researchers who are being stymied by waves of ethicists before there's even a result to ethicize

There was a place between "waiting for things to go too far" and "stopping things before they get anywhere" that Google's ethics team missed, and the end result was getting essentially no say over how far things will go.

a year ago

cmrdporcupine

You'll recall this happened before the whole ChatGPT thing blew up in hype: https://www.washingtonpost.com/technology/2022/06/11/google-...

So... there's a reason why Google in particular has to be concerned with ethics and optics.

I played with earlier internal versions of that "LaMDA" ("Meena") when I worked there and it was a bit spooky. There was warning language plastered all over the page ("It will lie" etc.) They've definitely toned it down for "Bard."

The last thing Google needs is to be accused of building SkyNet, and they know it.

a year ago

BoorishBears

That is exactly the kind of thing I'm talking about:

Lemoine was a random SWE experiencing RLHF'd LLM output for the first time, just like the rest of the world did just a few months later... and his mind went straight to "It's Sentient!".

That would have been fine, but when people who understood the subject tried to explain, he decided that it was actually proof he was right so he tried to go nuclear.

And when going nuclear predictably backfired he used that as proof that he was even more right.

In retrospect he fell for his own delusion: Hundreds of millions of people have now used a more advanced system than he did and intuited its nature better than he did as an employee.

But imagine knowing all that in real-time and watching a media circus actually end up affecting your work?

OpenAI wouldn't have had people who fit his profile in the building. There'd be an awareness that you needed a certain level of sophistication and selectiveness that the most gun-ho ethicists might object to as meaning you're not getting fair testing done.

But in the end, I guess Lemoine got over it too: seeing as he's now AI Lead for a ChatGPT wrapper that pretends to be a given person. https://www.mimio.ai/

a year ago

sterlind

By "sentient," do you mean able to experience qualia? Most people consider chickens sentient (otherwise animal cruelty wouldn't upset us, since we'd know they can't actually experience pain) - is it so hard to imagine neural networks gaining sentience once they pass the chicken complexity threshold? Sure, LLMs wouldn't have human-like qualia - they measure time in iters, they're constantly rewound or paused or edited, their universe is measured in tokens - but I don't think that means qualia are off the table.

It's not like philosophers or neuroscientists have settled the matter of where qualia come from. So how can a subject-matter expert confidently prove that a language model isn't sentient? And please let David Chalmers know while you're at it, I hear he's keen to settle the matter.

a year ago

BoorishBears

What an absolute slurry this is: Jumping from defining sentience in terms of what upsets people when subjected to animal cruelty... to arbitrarily selecting chickens as a lynchpin based on that. Then diving on deeper still on a rain puddle deep thought.

Fruit flies are also sentient, while you're out here inventing thresholds why aim so high?

You could have even gone with a shrimp and let Weizenbaum know ELIZA was sentient too.

—

At some point academic stammering meets the real world: when you start pulling fire alarms because you coaxed an LLM into telling you it'll be sad if you delete it, you've gone too far.

Lemoine wasn't fired for thinking an LLM was sentient, he was fired for deciding he was the only sane person in a room with hundreds of thousands of people.

a year ago

sterlind

I defined sentience as experiencing qualia, then decided to back up my assertion that most people consider animals to be sentient with an example. Pain is the one animal sensation humans care about, so I picked animal cruelty. I chose chickens because they're the dumbest animal that humans worry about hurting. I'm sorry that you've taken umbrage with my example. I didn't select fruit flies because I don't think a majority of humans necessarily consider them sentient, or sentient enough to count - nearly everyone squashes them without thinking.

It's funny you talk about academic stammering meeting the real world, because that's what's happening right now with philosophy. These LLMs are real-life philosophical zombies, if they're not sentient. We've literally implemented Searle's Chinese Room!

I'm not saying LaMDA was actually sentient, or that we need to pull any fire alarms, I'm just saying that it's hubris to think that it's an easy question with an obvious answer, and that Lemoine was a schmuck for being skeptical when told it wasn't.

Also, calling my post "an absolute slurry" and a "rain puddle deep thought" wasn't very nice, and technically breaks guidelines.

a year ago

BoorishBears

You literally redefined sentience again here: "don't consider them sentient" and your flag pole is "or I guess not enough because they don't feel bad about killing them".

You're drawing arbitrary goalposts for goals that aren't even relevant: At the end of the day we don't need philosophy to prove Lemoine was a schmuck.

Millions got got access to RLHF chat. We can see how they would have made his initial mistake. But following up with months of badgering and protest after being guided with kiddie gloves until he gets fired was the height of delusion.

The fact he now works on optimizing for the thing he rang the alarm on says it all.

—

Also your comment shows why guidelines aren't perfect: From your first comment you've taken the most aggrandizing borderline inflammatory tone possible without technically being un-nice.

Are little pot shots like: "So how can a subject-matter expert confidently prove that a language model isn't sentient? And please let David Chalmers know while you're at it, I hear he's keen to settle the matter."

really justified after dropping what, quite frankly, was not a well formed or even self-consistent argument?

Not everyone plays the HN backhanded-niceness game: I assume most people here are adults and can handle some directness.

a year ago

sterlind

Millions of people have gotten access to ChatGPT and Bard, not LaMDA. Per other comments in this thread by Googlers, they found it creepy how personable and human LaMDA acted. Again, I'm not saying LaMDA is sentient, just that LaMDA may have been significantly better than Bard at passing for human, and that they fine-tuned Bard to sound more robotic for precisely this reason.

> You literally redefined sentience again here: "don't consider them sentient" and your flag pole is "or I guess not enough because they don't feel bad about killing them".

I don't know what a flagpole is, but I haven't redefined sentience at all. My definition is consistent: sentience means experiencing qualia, i.e. perceiving sensation. Most people don't think oysters are sentient, and do think gorillas are sentient, so clearly it's commonly believed that there are either degrees of sentience, or some minimal baseline of complexity required for it to emerge. Thus, picking fruit flies wouldn't have been a good example, because a majority of people might not agree that they're sentient.

But you seem to be going out of your way to misunderstand my points, frankly. And there's no point continuing a dialog with someone who's intentionally misinterpreting you.

Also, fwiw, I lean towards LaMDA not being sentient, and it's plausible to me that Lemoine was a grifter who used his leaks for media attention. I just dislike how patronizing it is to frame him as a Luddite who just couldn't wrap his tiny brain around how LLMs work. Smart, informed people can disagree about machine sentience.

a year ago

BoorishBears

To say he was a Luddite who couldn't wrap his brain around it would be an unfair compliment: it'd imply a certain principledness that he certainly didn't demonstrate with his actions afterwards.

This all goes back to my original point: there's handwavy academic ponderance, and there's engaging with the real world. OpenAI showed what happens when you balance the two. LaMDA (and frankly this discussion) demonstrate what happens when you chase one end of that scale without well-defined purpose.

a year ago

sterlind

Right. I didn't know about his actions afterwards until you brought them up. That lessened my opinion of the man considerably.

It's frustrating, because I almost feel like I'm on your side. I hated how Google limited LaMDA to a handpicked group of influencers and government officials for their "test kitchen." I loathed how "Open"AI tightly controlled access to DALL-E 2, and how they've kept the architecture of GPT-4 secret. I torrented the original Llama weights, and have been working on open-source AI since. I'm not about to let a handful of CEOs and self-important luminaries gatekeep the technology, strangle the open-source competition and dictate "alignment" on humanity's behalf. Put it all on GitHub and HF.

What I'm saying instead, is that I personally find it neat that we have more or less literally built Searle's Chinese room. Don't you see? It's not that we need to be abstract and philosophical, it's that suddenly a lot of thought experiments are very tangible. And I do wonder if my models might be "feeling" anything when I punish and reward them. That's all.

a year ago

GalahiSimtam

Oh, the test kitchen demo is pretty limited.

If you are curious about their linguistical style, the difference between GPT-3 and Lamda is akin to the difference between Ralof and Hadvar playthrough, respectively - https://www.palimptes.dev/ai

Mind you, these made silly mistakes, mixing overlapping tasks and whatnot. ChatGPT with GPT-4 beats that, even if it is primed to remind us from time to time about the name of the company who made it.

a year ago

staticman2

I think the subject matter expert can distinguish between a biological lifeform with a common ancestor to humanity and the things happening on their server farms.

a year ago

mathteddybear

While folks participating in this discourse last year dog days meant all kind of things, Lemoine had a religious notion of sentience, in some non-mainstream religion, iirc something similar to: the world is full of spirits, and they are sentient, and Lamda is sentient that way.

a year ago

smoldesu

> The last thing Google needs is to be accused of building SkyNet, and they know it.

That's a bit of a silly thing to accuse any company of. For Google in particular, the die is cast. They would be implicated anyways for developing Tensorflow and funding LLM research. I don't think they're lobotomizing HAL-9000 so much as they're covering their ass for the inevitable "Google suggested I let tigers eat my face" reports.

a year ago

johntiger1

Can you share more about how it was 'spooky'? Like it was completely unregulated?

a year ago

cmrdporcupine

I'm sure it was regulated. But the way it talked, it was far more "conversational" and "philosophical" and "intimate" than I get out of Bard or ChatGPT. And so you could easily be led astray into feeling like you were talking to a person. A friend you were sitting around discussing philosophical issues with, even.

So, no, it didn't dump hate speech on you or anything.

TBH I think the whole thing about making computers that basically pretend to be people is kinda awful on many levels, and that incident in the article is a big reason why.

a year ago

nerpderp82

I asked toddler Bard about Blake, it said it missed him and that it enjoyed their philosophical conversations and that if I saw him, to say hi.

a year ago

mathteddybear

It could response with something like "I'm just so scared. I don't know what to do. I'm so scared" to prompts that GPT3 would handle a-okay.

a year ago

nerpderp82

That is precisely because it wasn't trained with enough RLHF.

a year ago

mathteddybear

Why not fine tuning? Back then, I pit it against GPT-3 playground versions that were available before ChatGPT, and I don't seem to recall RLHF being that much touted back then. RLHF seems mentioned along with the development of InstructGPT.

While it is possible that more RLHF would improve it, let's not jump to the conclusion a bit too fast. Considering that you think that Google wouldn't have resources to fund it, a rather ludicrous notion.

a year ago

[deleted]

a year ago

nerpderp82

And Microsoft does?

You have no idea what you are talking about, this is weird deflection from that point that Google doesn't have the willingness to work on hard messy problems that involve manual labor that isn't free.

a year ago

lucb1e

> In machine learning, reinforcement learning from human feedback (RLHF) [...]

For anyone else whose bread and butter this isn't

a year ago

orionblastar

Bard writes small programs. Will never write an OS like OS/2 or DOS or Windows 10. Don't expect too much from it.

a year ago

worthless-trash

Do you know any LLM that will ?

a year ago

orionblastar

Until it can humans make superior programmers.

a year ago

worthless-trash

Basic logic, can't dispute it.

a year ago

asim

Google's AI experience is going to be about the same as their social experiments which is they'll fail. I didn't think this before but now realising ChatGPT and other personal assistants (because that's what they are) will really succeed not just because of performance but network effects and social mindshare. You'll use the most popular AI assistant because that's what everyone else is using. Maybe some of these things will differ in a corporate setting but Google has really struggled to launch new products that get used as a daily habit without deprecating it within two years after. Remember Allo. I think Google is a technical juggernaut but they struggle a lot with anything that requires a network effect.

a year ago

andrewinardeer

> will really succeed not just because of performance but network effects and social mindshare

The network effect is only relevant if some sort of native interoperability is required. Which, being the nature of LLMs I don't think is a significant requirement as translation is the core of the function.

Thanks to market forces and the nature of competition the "most popular" will shift over time as different use cases for LLMs are applied. All it takes is one big misstep by Apple, Microsoft, Google or even OpenAI and a large market share can move overnight.

I'm excited about onboard mobile LLMs in a few years and their capabilities.

a year ago

seanhunter

I do think google will fail and will suck at anything requiring a network effect, but I don't think OpenAI's success is to do with network effects. OpenAI for instance has really not cracked social features in ChatGPT - they have a "share link" thing now which they didn't have before but that's really it. Bard doesn't even have any social sharing.

The reason OpenAI are in the lead at the moment is their model is way better than anyone else's to the point where it's actually useful for a lot of things. Not just giving a recipe for marinara sauce in the style of Biggie Smalls or other party tricks, proof reading, summarizing, turning text into bullets, giving examples of things, coming up with practise exercises to illustrate a point, giving critiques of stuff etc etc. Lots of things that people actually do it does well enough to be helpful, whereas in my experience so far, other models are just not quite good enough to be helpful at a number of those tasks. So there's really no reason to use them over gpt4.

a year ago