Sora is here

1152 points

1/21/1970

17 days ago

by toomuchtodo

Comments

yeknoda

I've found using these and similar tools that the number of prompts and iterations required to create my vision (an image or video in my mind) is very large, and often the tool is not able to create what I had originally wanted. A way to test this is to take a piece of footage or an image as the ground truth, and test how much prompting and editing it takes to get the same or a similar result starting from scratch. It is basically not possible with the current tech and finite amounts of time and iterations.

17 days ago

jerf

It just plain isn't possible if you mean a prompt the size of what most people have been using lately, in the couple hundred character range. By sheer information theory, the number of possible interpretations of "a zoom in on a happy dog catching a frisbee" means that you can not match a particular clip out of the set with just that much text. You will need vastly more content; information about the breed, information about the frisbee, information about the background, information about timing, information about framing, information about lighting, and so on and so forth. Right now the AIs can't do that, which is to say, even if you sit there and type a prompt containing all that information, it is going to be forced to ignore most of the result. Under the hood, with the way the text is turned into vector embeddings, it's fairly questionable whether you'd agree that it can even represent such a thing.

This isn't a matter of human-level AI or superhuman-level AI; it's just straight up impossible. If you want the information to match, it has to be provided. If it isn't there, an AI can fill in the gaps with "something" that will make the scene work, but expecting it to fill in the gaps the way you "want" even though you gave it no indication of what that is is expecting literal magic.
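As a rough illustration of the information-theory point above, here is a back-of-the-envelope sketch comparing how much information a short prompt can carry with the size of a short compressed clip. The figures of about 1.3 bits per character of English text and 0.1 bits per pixel for compressed video are ballpark assumptions, not measurements, and the function names are made up for this example.

```python
import math

def prompt_bits(num_chars: int, bits_per_char: float = 1.3) -> float:
    """Rough information content of an English prompt.

    bits_per_char is an assumed ballpark for the entropy of English text.
    """
    return num_chars * bits_per_char

def clip_bits(width: int, height: int, fps: int, seconds: float,
              bits_per_pixel: float = 0.1) -> float:
    """Rough size of a compressed clip.

    bits_per_pixel is an assumed ballpark for lossy-compressed video.
    """
    return width * height * fps * seconds * bits_per_pixel

p = prompt_bits(200)               # a prompt of a couple hundred characters
c = clip_bits(1280, 720, 24, 5)    # five seconds of 720p video at 24 fps
print(f"prompt: ~{p:.0f} bits, clip: ~{c:.0f} bits, ratio: ~{c / p:.0f}x")
```

Under these assumptions the clip carries tens of thousands of times more information than the prompt, so the prompt alone cannot single out one particular clip; the model has to fill in the rest.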

Long term, you'll never have a coherent movie produced by stringing together a series of textual snippets because, again, that's just impossible. Some sort of long-form "write me a horror movie starring a precocious 22-year-old elf in a far-future Ganymede colony with a message about the importance of friendship" AI that generates a coherent movie of many scenes will have to be doing a lot of some sort of internal communication in an internal language to hold the result together between scenes, because what it takes to hold stuff coherent between scenes is an amount of English text not entirely dissimilar in size from the underlying representation itself. You might as well skip the English middleman and go straight to an embedding not constrained by a human language mapping.

17 days ago

LASR

What you are saying is totally correct.

And this applies to language / code outputs as well.

The number of times I’ve had engineers at my company type out 5 sentences and then expect a complete react webapp.

But what I’ve found in practice is using LLMs to generate the prompt with low-effort human input (eg: thumbs up/down, multiple-choice etc) is quite useful. It generates walls of text, but with metaprompting, that’s kind of the point. With this, I’ve definitely been able to get high ROI out of LLMs. I suspect the same would work for vision output.

17 days ago

kurthr

I'm not sure, but I think you're saying what I'm thinking.

Stick the video you want to replicate into o1 and ask for a descriptive prompt to generate a video with the same style and content. Take that prompt and put it into Sora. Iterate with human and o1-generated critical responses.

I suspect you can get close pretty quickly, but I don't know the cost. I'm also suspicious that they might have put in "safeguards" to prevent some high profile/embarrassing rip-offs.

17 days ago

robotresearcher

> Long term, you'll never have a coherent movie produced by stringing together a series of textual snippets because, again, that's just impossible.

17 days ago

dumbfounder

Why does the idea need to be generated by AI? Let people generate the ideas, the AI will help execute. I think soon (3-5 years) a determined person with no video skills will be able to put together a compelling movie (maybe a short). And that is massive. AI doesn’t have to do everything. Like all tech, it’s a productivity tool.

17 days ago

krainboltgreene

> Why does the idea need to be generated by AI?

This is the at-first-fun-but-now-frustrating infinite goal move. "AI (a stand in for literally anything) will do (anything) soon." -> "It won't do (thing), it's too complex." -> "Who said AI will do (thing)?"

17 days ago

flappyeagle

AI will self-drive cars in San Francisco

16 days ago

Breza

I'm suspicious of most claims of AI growth, but I think screenwriting is an area where there's real potential. There are many screenplays out there, many movie plots are very similar to each other, and human raters could help with training. And it's worth noting that the top four highest grossing movies right now are all sequels or film adaptations. It's not a huge leap to imagine an LLM in the future that's been trained on movie writing being able to create a movie script when given the Wicked musical. https://www.imdb.com/chart/boxoffice/

13 days ago

runarberg

The 2023 Writers Guild of America strike was in part to prevent screenplays being written entirely by generative AI.

So no, I don’t think this will happen either. Authors may use AI themselves as one tool in their toolbox as they write their script, but we will not see entire production screenplays written by generative AI set for theatrical release. The industry will simply not allow that to happen. At most you can have AI write a screenplay for your own amusement, not for publication.

13 days ago

sleepybrett

I'm thinking more of a Gibsonian 'Garage Kubrick'. A solitary auteur (or small team) that produces the film alone perhaps without even touching a camera, generating all the footage using AI (in the novel the auteur creates all the footage through photo/found-footage manipulation, or at least that's all we see in text). The script will probably be human written, I'm not talking about an AI producing a film from scratch, rather a film being produced using AI to create all the visuals and audio.

15 days ago

runarberg

That is a far more reasonable prediction but I don’t even see this future. This kind of “film making” will at best be something generated for the amusement of the creator (think, give me a specific episode of Star Trek where Picard ...) or as prototypes or concepts for something yet to be filmed with actual actors. And it certainly won’t be in theaters, not in 5 years, or ever.

Generative AI will not be able to approach the artistry of your average actor (not even a bad actor), it won’t be able to match the lighting or the score to the mood (unless you carefully craft that in your prompt). It won’t get creative with the camera angles (again unless you specifically prompt for a specific angle) or the cuts. And it probably won’t stay consistent with any of these, or otherwise break the consistency at the right moments, like an artist could.

If you manage to prompt the generative AI to create a full feature film with excellent acting, the correct lighting given the mood, a consistent tone with editing to match, etc. you have probably spent much more time and money on crafting the prompt than would otherwise have gone into simply hiring the crew to create your movie. The AI movie will certainly contain slop and be so visibly bad that it is guaranteed not to be in theaters.

Now if you hired that crew to make the movie instead, that crew might use AI as a tool to enhance their artistry, but you still need your specialized artists to use that tool correctly. That movie might make it to the theaters.

15 days ago

sleepybrett

blair witch project looked like shit, 'the cinematography doesn't approach a true director of photography', the actors were shit... etc. Given the right script and concept it can be amazing and the imperfection of AI can become part of the aesthetic.

14 days ago

runarberg

It was still a creative stroke of genius. The shit acting along with the shit cinematography was preceded by a brilliant marketing campaign where you expected this lack of skill by the film makers.

In music you also have plenty of artists that have no clue how to play their instruments, or progress their songs, but the music is nonetheless amazing.

Skill is not the only quality of art. A brilliant artist works with their limitations to produce work which is better than the sum of its parts. It will take AI the luck of ten billion universes before it produces anything like that.

13 days ago

SamPatt

It's a tool. The cleverness and artistry comes from the humans, not from the tools they use.

The AI isn't creating the fresh ideas. People are.

17 days ago

runarberg

So what you are saying is some aspects of movie making will use AI as parts of their jobs. That is very realistic and probably already happening.

Saying that large video models will be in theaters sounds like a completely different and much more ambitious prediction. I interpreted it as if large video models will produce whole movies on their own from a script of prompts. That there will be a single film maker with only a large video model and some prompts to make the movie. Such films will never be in the theater, unless by some grifter, and then it is certain to be a flop.

16 days ago

troupo

You should watch how movies are made sometime. How a script is developed. How changes to it are made. How storyboards are created. How actors are screened for roles. How locations are scouted, booked, and changed. How the gazillion of different departments end up affecting how a movie looks, is produced, made, and in which direction it goes (the wardrobe alone, and its availability and deadlines will have a huge impact on the movie).

What does "EXT. NIGHT" mean in a script? Is it cloudy? Rainy? Well lit? What are camera locations? Is the scene important for the context of the movie? What are characters wearing? What are they looking at?

What do actors actually do? How do they actually behave?

Here are a few examples of script vs. screen.

Here's a well described script of Whiplash. Tell me the one hundred million things happening on screen that are not in the script: https://www.youtube.com/watch?v=kunUvYIJtHM

Or here's the Joker interrogation from The Dark Knight. Same million different things, including actors (or the director) ignoring instructions in the script: https://www.youtube.com/watch?v=rqQdEh0hUsc

Here's A Few Good Men: https://www.youtube.com/watch?v=6hv7U7XhDdI&list=PLxtbRuSKCC...

and so on

---

Edit. Here's Annie Atkins on visual design in movies, including Grand Budapest Hotel: https://www.youtube.com/watch?v=SzGvEYSzHf4. And here's a small article summarizing some of it: https://www.itsnicethat.com/articles/annie-atkins-grand-buda...

Good luck finding any of these details in any of the scripts. See minute 14:16 where she goes through the script

Edit 2: do watch The Kerning chapter at 22:35 to see what it actually takes to create something :)

17 days ago

shermantanktop

I can't upvote this enough. This topic in the media space has generated a huge amount of naive speculation that amounts to "how hard could it be to do <thing i know nothing about>?"

17 days ago

FranzFerdiNaN

> "how hard could it be to do <thing i know nothing about>?"

This is most Hacker News comments summarized lmao. It's kinda my favorite thing of this place: just open any thread and you immediately see so many people rushing to say ''well just do X or Y'' or ''actually it's X or Y and not Z like the experts claim''. Love it.

17 days ago

shermantanktop

In this case, it’s movies and TV, which most people enjoy. So there’s a superficial accessibility to the problem which encourages this attitude.

Of course, HN being the place that it is, the same type of comments are made about quantum entanglement and solar panel efficiency.

16 days ago

bunabhucan

I agree with you.

At the same time I am curious in the "that person has too many fingers" sense at what a system trained on tens of thousands of movies plus scripts plus subtitles plus metadata etc. would generate.

I thought about it for a bit and I would want to watch a computer generated Sharknado 7 or Hallmark Christmas movie.

17 days ago

robotresearcher

Of course normally other people contribute to a movie after the writer. My comment mentioned three of the important roles. This whole thread is about tech that automates away those roles. That's the whole point.

17 days ago

dbspin

I think you've misunderstood the objection.

Let's pick something concrete. It's a medieval script, it opens with two knights fighting. OK so later in the script we learn their characters, historic counterparts etc. So your LLM can match nefarious villain to some kind of embedding, and doubtless has trained on countless images of a knight.

But the result is not naively going to understand the level of reality the script is going for - how closely to stick to historic parallels, how much to go fantastical with the depiction. The way we light and shoot the fight and how it coheres with the themes of the scene, the way we're supposed to understand the characters in the context of the scene and the overall story, the references the scene may be making to the genre or even specific other films etc.

This is just barely scraping the surface of the beginnings of thinking about mise en scene, blocking, framing etc. You can't skip these parts - and they're just as much of a challenge as temporal coherence, or performance generation or any of the other hard 'technical issues' that these models have shown no capacity to solve. They're decisions that have to be made to make a film coherent at all - not yet good or tasteful or creative or whatever.

Put another way - you'd need AGI to comprehend a script at the level of depth required to do the job of any HOD on any film. Such a thing is doubtless possible, but it's not going to be shortcut naively the way generating an image is - because it requires understanding in context, precisely what LLMs lack.

17 days ago

robotresearcher

> but the result is not naively going to understand the level of reality the script is going for…

We can already get detailed style guidance into picture generation. Declaring you want Picasso cubist, Warner Brothers cartoon, or hyper realistic works today. So do lighting instructions, color palettes, on and on.

These future models will not be large language models, they will be multi-modal. Large movie models if you like. They will have tons of context about how scenes within movies cohere, just as LLMs do within documents today.

17 days ago

troupo

So, we went from "just hand off movie script to automated director/DP/editor", and we're now rapidly approaching:

- you have to provide correct detailed instructions on lighting

- you have to provide correct detailed instructions on props

- you have to provide correct detailed instructions on clothing

- you have to provide correct detailed instructions on camera position and movement

- you have to provide correct detailed instructions on blocking

- you have to provide correct detailed instructions on editing

- you have to provide correct detailed instructions on music

- you have to provide correct detailed instructions on sound effects

- you have to provide correct detailed instructions on...

- ...

- repeat that for literally every single scene in the movie (up to 200 in extreme cases)

There's a reason I provided a few links for you to look at. I highly recommend the talk by Annie Atkins. Watch it, then open any movie script, and try to find any of the things she is talking about there (you can find actual movie scripts here: https://imsdb.com)

17 days ago

throwup238

There's two reasons to be hopeful about it though: AI/LLMs are very good at filling in all those little details so humans can cherry pick the parts that they like. I think that's where the real value is in for the masses - once these models can generate coherent scenes, people can start using them to explore the creative space and figure out what they like. Sort of like SegmentAnything and masking in inpainting but for the rest of the scene assembly. The other reason is that the models can probably be architected to figure out environmental/character/light/etc embeddings and use those to build up other coherent scenes, like we use language embeddings for semantic similarity.

That's how I've been using the image generators - lots of experimentation and throwing out the stuff that doesn't work. Then once I've got enough good generated images collected out of the tons of garbage, I fine tune a model and create a workflow that more consistently gives me those styles.

Now the models and UX to do this at a cinematic quality are probably 5-10 years away for video (and the studios are probably the only ones with the data to do it), but I'm relatively bullish on AI in cinema. I don't think AI will be doing everything end to end, but it might be a shortcut for people who can write a script and figure out the UX to execute the rest of the creative process by trial and error.

16 days ago

troupo

> AI/LLMs are very good at filling in all those little details so humans can cherry pick the parts that they like.

Where did you find AI/ML that are good at filling in actual required and consistent details?

I beg of you to watch Annie Atkins' presentation I linked: https://www.youtube.com/watch?v=SzGvEYSzHf4 and tell me how much intervention would AI/ML need to create all that, and be consistent throughout the movie?

> once these models can generate coherent scenes, people can start using them to explore the creative space and figure out what they like.

Define "coherent scene" and "explore". A scene must be both coherent and consistent, and conform to the overall style of the movie and...

Even such a simple thing as shot/reverse shot requires about a million various details and can be shot in a million different ways. Here's an exploration of just shot/reverse shot: https://www.youtube.com/watch?v=5UE3jz_O_EM

All those are coherent scenes, but the coherence comes from a million decisions: from lighting, camera position, lens choice, wardrobe, what surrounds the characters, what's happening in the background, makeup... There's no coherence without all these choices made beforehand.

Around 4:00 mark: "Think about how well you know this woman just from her clothes, and workspace". Now watch that scene. And then read its description in the script https://imsdb.com/scripts/No-Country-for-Old-Men.html:

--- start quote ---

    Chigurh enters. Old plywood paneling, gunmetal desk, litter
          of papers. A window air-conditioner works hard.
          A fifty-year-old woman with a cast-iron hairdo sits behind
          the desk.

--- end quote ---

And right after that there's a section on the rhythm of editing. Another piece in the puzzle of coherence in a scene.

> Then once I've got enough good generated images collected out of the tons of garbage, I fine tune a model and create a workflow that more consistently gives me those styles.

So, literally what I wrote here: https://news.ycombinator.com/item?id=42375280 :)

16 days ago

skydhash

That’s the same thing with digital art, even with the most effortless one (matte painting), there’s a plethora of decisions to make and techniques to use to have a coherent result. There’s a reason people go to school or trained themselves for years to get the needed expertise. If it was just data, someone would have written a guide that others would mindlessly follow.

16 days ago

robotresearcher

Not sure why you jumped there. I was thinking more like ‘make it look like Bladerunner if Kurosawa directed it, with a score like Zimmer.’

You’re really failing to let go of the idea that you need to prescribe every little thing. Like Midjourney today, you’ll be able to give general guidance.

Now, I don’t expect we’ll get the best movies this way. But paint by numbers stuff like many movies already are? A Hallmark Channel weepy? I bet we will.

16 days ago

troupo

> Not sure why you jumped there.

No jump.

Your original claim: "Submit a whole script the way a writer delivers a movie to a director. The (automated) director/DP/editor could maintain internal visual coherence, while the script drives the story coherence."

Two comments later it's this: "We can already get detailed style guidance into picture generation. Declaring you want Picasso cubist, Warner brothers cartoon, or hyper realistic works today. So does lighting instructions, color palettes, on and on."

I just re-wrote this with respect to movies.

> I was thinking more like ‘make it look like Bladerunner if Kurosawa directed it, with a score like Zimmer.’

Because, as we all know, every single movie by Kurosawa is the same, as is every single score by Hans Zimmer, so it's ridiculously easy to recreate any movie in that style, with that music.

> You’re really failing to let go of the idea that you need to prescribe every little thing. Like Midjourney today, you’ll be able to give general guidance.

Yes, and Midjourney today really sucks at:

- being consistent

- creating proper consistent details

A general prompt will give you a general result that is usually very far from what you actually have in mind.

And yes, you will have to prescribe a lot of small things if you want your movie to be consistent. And for your movie to make any sense.

Again, tell me how exactly your amazing magical AI director will know which wardrobe to choose, which camera angles to set up, which typography to use, which sound effects to make just from the script you hand in?

You can start with a very simple scene I referenced in my original reply: two people talking at the table in Whiplash.

> But paint by numbers stuff like many movies already are? A Hallmark Channel weepy? I bet we will.

Even those movies have more details and more care than you can get out of AIs (now, or in foreseeable future)

16 days ago

robotresearcher

> Again, tell me how exactly your amazing magical AI director will know which wardrobe to choose, which camera angles to set up, which typography to use, which sound effects to make just from the script you hand in?

I think you're still assuming I always want to choose those things. That's why we're talking past each other. A good movie making model would choose for me unless I give explicit directions. Today we don't see long-range coherence in the results of movie (or game engine) models, but the range is increasing, and I'm willing to bet we will see movie-length coherence in the next decade or so.

By the way, I also bet that if I pasted exactly the No Country for Old Men script scene description from up this thread into Midjourney today it would produce at least some compelling images with decent choices of wardrobe, lighting, set dressing, camera angle, exposure, etc etc. That's what these models do, because they're extrapolating and interpolating between the billion images they've seen that contained these human choices.

AFAIK Midjourney produces single images, so the relevant scope of consistency is inside the single image only. Not between images. A movie model needs coherence across ~160,000 images, which is beyond the state of the art today but I don't see why it's impossible or unreasonable in the long run.
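For reference, the ~160,000 figure is roughly what a feature-length runtime works out to at a standard frame rate. A minimal sketch of the arithmetic, assuming 24 frames per second and a runtime of about 110 minutes (both are assumptions for illustration, not numbers from the comment):

```python
def frame_count(minutes: int, fps: int = 24) -> int:
    """Number of individual frames in a film of the given length."""
    return minutes * 60 * fps

print(frame_count(110))  # 158400, i.e. roughly 160,000 frames
```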

> A general prompt will give you a general result that is usually very far from what you actually have in mind.

Which is only a problem if I have something in mind. Alternatively I can give no guidance, or loose guidance, make half a dozen variations, pick the one I like best. Maybe iterate a couple of times into that variation tree. Just like the image generators do.

15 days ago

krainboltgreene

This is such an incredibly confident comment. I'm in awe.

17 days ago

player1234

Cool since you know, at what point in the process do you swap out all the white ppl? Thanks in advance!

17 days ago

letmevoteplease

Shane Carruth (Primer) released interesting scripts for "A Topiary" and "The Modern Ocean" which now have no hope of being filmed. I hope AI can bring them to life someday. If we get tools like ControlNet for video, maybe Carruth could even "direct" them himself.

17 days ago

spoaceman7777

This exists already actually. Kling AI 1.5. Saw the demo on twitter two days ago, which shows a photo-to-video transformation on an image of three women standing on a beach, and the video transformation simulates the camera rotating, with the women moving naturally. Just involves a segment-anything style selection of the women, and drawing a basic movement vector.

> but image generations from it can (loosely) adhere to all rules and nuances in a multi-paragraph prompt

Flux certainly does not consistently do so across an arbitrary collection of multi-paragraph prompts, as anyone who's run more than a few long prompts past it would recognize; also, the tweet is wrong in the other direction as well: longer language-model-preprocessed prompts for models that use CLIP (like various SD1.5 and SDXL derivatives) are, in fact, a common and useful technique. (You’d kind of think that the fact that the generated prompt here is significantly longer than the 256 token window of T5 would be a clue that the 77 token limit of CLIP might not be as big of a constraint as the tweet was selling it as, too.)

17 days ago

lmm

> You might as well skip the English middleman and go straight to an embedding not constrained by a human language mapping.

How would you ever tweak or debug it in that case? It doesn't strictly have to be English, but some kind of human-readable representation of the intermediate stages will be vital.

17 days ago

amelius

Can't you just give it a photo of a dog, and then say "use this dog in this or that scene"?

17 days ago

artemisart

Yes, the idea works and was explored with dreambooth/textual inversion for image diffusion models.

https://dreambooth.github.io/ https://textual-inversion.github.io/

17 days ago

minimaxir

Both of those are of course out of date and require significant training instead of just feeding it a single image.

InstantID (https://replicate.com/zsxkib/instant-id) fixes that issue.

17 days ago

AuryGlenz

Dreambooth style training is in no way out of date.

If you just want a face, InstantID/PuLID work - but it’s not going to be very varied. Doing actual training means you can get any perspective, lighting, style, expression, etc - and have the whole body be accurate.

16 days ago

alpha_squared

How would that even work? A dog has physical features (legs, nose, eyes, ears, etc.) that they use to interact with the world around them (ground, tree, grass, sounds, etc.). And each one of those things has physical structures that compose senses (nervous system, optic nerves, etc.). There are layers upon layers of intricate complexity that took eons to develop and a single photo cannot encapsulate that level of complexity and density of information. Even a 3D scan can't capture that level of information. There is an implicit understanding of the physical world that helps us make sense of images. For example, a dog with all four paws standing on grass is within the bounds of possibility; a dog with six paws, two of which are on its head, is outside the bounds of possibility. An image generator doesn't understand that obvious delineation and just approximates likelihood.

17 days ago

int_19h

A single photo doesn't have to capture all that complexity. It's carried by all those countless dog photos and videos in the training set of the model.

17 days ago

krainboltgreene

Actually, it does have to capture all of that complexity because it's a photon-based analysis of reality. You cannot take a photo without doing that.

17 days ago

fennecbutt

This is correct and even image generation models aren't really trained for comprehension of image composition yet.

Even the models based off danbooru and E621 still aren't the best at that. And us furries like to tag art in detail.

The best we can really do at the moment is regional prompting, perhaps they need something similar for video.

9 days ago

echelon

For those not in this space, Sora is essentially dead on arrival.

Sora performs worse than closed source Kling and Hailuo, but more importantly, it's already trumped by open source too.

Tencent is releasing a fully open source Hunyuan model [1] that is better than all of the SOTA closed source models. Lightricks has their open source LTX model and Genmo is pushing Mochi as open source. Black Forest Labs is working on video too.

Sora will fall into the same pit that Dall-E did. SaaS doesn't work for artists, and open source always trumps closed source models.

Artists want to fine tune their models, add them to ComfyUI workflows, and use ControlNets to precision control the outputs.

Images are now almost 100% Flux and Stable Diffusion, and video will soon be 100% Hunyuan and LTX.

Sora doesn't have much market apart from name recognition at this point. It's just another inflexible closed source model like Runway or Pika. Open source has caught up with state of the art and is pushing past it.

[1] https://github.com/Tencent/HunyuanVideo

17 days ago

circlefavshape

Their online version is all in Chinese (or at least some Chinese-looking script I don't understand) ... and they recommend an 80GB GPU to run the thing, which costs ~€15-18k. Yikes, guess I won't be doing this at home anytime soon

17 days ago

baserev

[flagged]

17 days ago

And I think this realistically is going to be the shape of the tools to come in the foreseeable future.

17 days ago

echelon

You should see what people are building with Open Source video models like HunYuan [1] and ComfyUI + Control Nets. It blows Sora out of the water.

Check out the Banodoco Discord community [2]. These are the people pioneering steerable AI video, and it's all being built on top of open source.

[1] https://github.com/Tencent/HunyuanVideo

[2] https://banodoco.ai/

17 days ago

prmoustache

The whole point of AI stuff is not to produce exactly what you have in mind, but what you are describing. Same with text, code, images, video...

17 days ago

szundi

Sounds like we achieved 50% of AI then. The artificial is there, now we need the intelligence part.

17 days ago

baq

Sora should be evaluated on xkcd strips as inputs.

17 days ago

miltonlost

The adage "a picture is worth a thousand words" has the nice corollary "A thousand words isn't enough to be precise about an image".

Now expand that to movies and games and you can get why this whole generative-AI bubble is going to pop.

17 days ago

TeMPOraL

> Now expand that to movies and games and you can get why this whole generative-AI bubble is going to pop.

What will save it is that, no matter how picky you are as a creator, your audience will never know what exactly was that you dreamed up, so any half-decent approximation will work.

In other words, a corollary to your corollary is, "Fortunately, you don't need them to be, because no one cares about low-order bits".

Or, as we say in Poland, "What the eye doesn't see, the heart doesn't mourn."

17 days ago

jsheard

> What will save it is that, no matter how picky you are as a creator, your audience will never know what exactly was that you dreamed up, so any half-decent approximation will work.

Part of the problem is the "half decent approximations" tend towards a clichéd average, the audience won't know that the cool cyberpunk cityscape you generated isn't exactly what you had in mind, but they will know that it looks like every other AI generated cyberpunk cityscape and mentally file your creation in the slop folder.

I think the pursuit of fidelity has made the models less creative over time, they make fewer glaring mistakes like giving people six fingers but their output is ever more homogenized and interchangeable.

17 days ago

samatman

Empirically, we've passed the point where that's true, for someone not being lazy about it.

https://www.astralcodexten.com/p/how-did-you-do-on-the-ai-ar...

In other words, someone willing to tweak the prompt and press the button enough times to say "yeah, that one, that's really good" is going to have a result which cannot in fact be reliably binned as AI-generated.

17 days ago

lmm

I mean, no? None of the AI-generated images managed to be indistinguishable. Some people were much better than others at spotting the differences. He even quotes, at length, an artist giving a detailed breakdown of what's wrong with one of the images he thought was good.

17 days ago

TeMPOraL

Did you read the article? Respondents performed barely better than chance. Sure, no one was actually 100% wrong[0]. Just almost always wrong, with a noticeable bias towards liking AI art more.

The detailed breakdown you mention? Maybe it's accurate to that artist's thought process, maybe it's more of a rationalization; either way, it's not a general rule they, or anyone, could apply to any of the other AI images. Most of those in the article don't exhibit those "telltale signs", and the one that does - the Victorian Megaship - was actually made by a human artist with no AI in the mix.

EDIT:

Another image that stands out to me is Riverside Cafe. Like apparently a lot of other people, going by the article's comments, I assumed it was human-made, because we vaguely remembered Van Gogh painted something like it. He did, it's called Café Terrace at Night - and yet, despite immediately evoking the association, Riverside Cafe was made by AI, and is actually nothing like Café Terrace at Night at any level.

(I find it fascinating how this work looks like a copy of Van Gogh at first glance, for no obvious reason, but nothing alike once you pause to look closer. It's like... they have similar low-frequency spectra or something?)

EDIT2:

Played around with the two images in https://ejectamenta.com/imaging-experiments/fourifier/. There are some similarities in the spectra, I can't put my finger on them exactly. But it's probably not the whole answer. I'll try to do some more detailed experimentation later.

[0] - Nor should you expect it - it would mean either a perfect calibration, or be the equivalent of flipping a coin and getting heads 30 times in a row; it's not impossible, but you shouldn't expect to see it unless you're interviewing fewer people than literally the entire population of the planet.

17 days ago

lmm

Yes, I read the article. Did you?

> The average participant scored 60%, but people who hated AI art scored 64%, professional artists scored 66%, and people who were both professional artists and hated AI art scored 68%.

> The highest score was 98% (49/50), which 5 out of 11,000 people achieved. Even with 11,000 people, getting scores this high by luck alone is near-impossible.

17 days ago

samatman

This accurately boils down to "cannot reliably be binned as AI-generated". Your objection amounts to a vanishing few people who are informed that this is a test being able to do a pretty good job at it.

If about 0.05% of people (5 out of 11,000) who are specifically judging art as AI or not AI, in a test which presumably attracts people who would like to be able to do that thing, can do a 98% accurate job, and the average is around 60%: that isn't reliable.

If that doesn't work for you, I encourage you to take the test. Obviously since you've read the article there are some spoilers, but there's still plenty of chances to get it right or wrong. I think you'll discover that you, too, cannot do this reliably. Let us know what happens.
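For a sense of scale, the figures quoted from the article can be checked with a few lines of Python. This is a rough sketch that assumes each of the 50 answers is an independent fair coin flip for a pure guesser; that simplification is an assumption made here, not something the article states.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that one pure guesser gets 49 or more out of 50.
p_single = prob_at_least(49, 50)
# Expected number of such lucky guessers among 11,000 participants.
expected_lucky = 11_000 * p_single
# Share of participants who actually scored 49/50.
observed_share = 5 / 11_000

print(f"{p_single:.3e} {expected_lucky:.3e} {observed_share:.4%}")
```

Under that assumption the top scores are not luck, which supports the quoted claim, while the observed share of top scorers stays well under a tenth of a percent, which is the other side of the argument.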

16 days ago

lmm

I can't do it reliably and I don't want to - I learnt to spot certain popular video compression artifacts in my youth, and that has not enhanced my life. But any distinction that random people taking a casual internet survey get right 60% of the time is absolutely one that you can make reliably if you put in the effort. Look at something like chicken sexing.

16 days ago

randomcatuser

a somewhat counterintuitive argument is this: AI models will make the overall creative landscape more diverse and interesting, ie, less "average"!

Imagine the space of ideas as a circle, with stuff in the middle being easier to reach (the "cliched average"). Previously, traversing the circle was incredibly hard - we had to use tools like DeviantArt, Instagram, etc. to aggregate the diverse tastes of artists, hoping to find or create the style we're looking for. Getting art in a particular style meant hiring the artist. As a result, on average, what you see is the result of huge amounts of human curation, effort, and branding teams.

Now reduce the effort 1000x, and all of a sudden, it's incredibly easy to reach the edge of the circle (or closer to it). Sure, we might still miss some things at the very outer edge, but it's equivalent to building roads. Motorists appear, people with no time to sit down and spend 10000 hours to learn and master a particular style can simply remix art and create things wildly beyond their manual capabilities. As a result, the amount of content in the infosphere skyrockets, the tastemaking velocity accelerates, and you end up with a more interesting infosphere than you're used to.

17 days ago

TeMPOraL

To extend the analogy, imagine the circle as a probability distribution; for simplicity, imagine it's a bivariate normal joint distribution (aka. Gaussian in 3D) + some noise, and you're above it and looking down.

When you're commissioning an artist to make you some art, you're basically sampling from the entire distribution. Stuff in the middle is, as you say, easiest to reach, so that's what you'll most likely get. Generative models let more people do art, meaning there's more sampling happening, so the stuff further from the centre will be visited more often, too.
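The "more sampling reaches the edges more often" point can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the comment: a standard 2D normal with independent unit-variance components, no extra noise, and an arbitrary radius of 3 as the definition of "far from the centre".

```python
import math
import random

def tail_probability(radius: float) -> float:
    """P(distance from centre > radius) for a standard 2D normal."""
    return math.exp(-radius**2 / 2)

def count_far_samples(n: int, radius: float, rng: random.Random) -> int:
    """Draw n points and count those landing beyond the given radius."""
    far = 0
    for _ in range(n):
        x, y = rng.gauss(0, 1), rng.gauss(0, 1)
        if math.hypot(x, y) > radius:
            far += 1
    return far

rng = random.Random(0)
for n in (100, 10_000, 200_000):
    # Columns: sample count, observed far samples, expected far samples.
    print(n, count_far_samples(n, 3.0, rng), round(n * tail_probability(3.0), 1))
```

The share of far-out samples stays the same, but their absolute number grows with the number of draws, which is all the argument needs.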

However, AI tools also make another thing easier: moving and narrowing the sampling area. Much like with a very good human artist, you can find some work that's "out there", and ask for variations of it. However, there are only so many good artists to go around. AI making this process much easier and more accessible means more exploration of the circle's edges will happen. Not just "more like this weird thing", but also combinations of 2, 3, 4, N distinct weird things. So in a way, I feel that AI tools will surface creative art disproportionally more than they'll boost the common case.

Well, except for the fly in the ointment that's the advertising industry (aka. the cancer on modern society). Unfortunately, by far most of the creative output of humanity today is done for advertising purposes, and that goal favors the common, as it maximizes the audience (and is least off-putting). Deluge of AI slop is unavoidable, because slop is how the digital world makes money, and generative AI models make it cheaper than generative protein models that did it so far. Don't blame AI research for that, blame advertising.

17 days ago

zmgsabst

A small technical point:

Tastes are almost never normally distributed along a spectrum, but multi-modal. So the more dimensions you explore in, the more you end up with “islands of taste” on the surface of a hyper sphere and nothing like the normal distribution at all. This phenomenon is deeply tied to why “design by committee” (eg, in movies) always makes financial estimates happy but flops with audiences — there is almost no customer for average anything.

I agree with your conclusion.

17 days ago

circlefavshape

"Design by committee" is also how most hit movies are made. Hit songs too

17 days ago

zmgsabst

Do you have an example?

My experience with customer surveys indicates the opposite — that customers prefer you have an opinion.

16 days ago

circlefavshape

An example of a hit movie or song that was created by committee?

Inside Out 2 had the largest box office of any movie in 2024. Check out the "research and writing" section in its Wikipedia article https://en.wikipedia.org/wiki/Inside_Out_2#Research_and_writ... ... psychological consultants, a feedback loop with a group of teenagers, test screenings.

Or how about "Die with a smile" - currently number 1 in the global top 50 on Spotify. 5 songwriters

Or "APT." - currently number 2 in the global top 50 on Spotify. 11 songwriters

You don't have to look very hard

15 days ago

zmgsabst

Inside Out 2 has a single writer, who also worked on the first.

Consulting with SMEs, testing with audiences, etc isn’t “design by committee”.

Similarly, “Die With a Smile” seems to have been the work of two people with developed styles with support — again, not a committee:

> The collaboration was a result of Mars inviting Gaga to his studio where he had been working on new music. He presented the track in progress to her and the duo finished writing and recording the song the same day.

Apt seems to have started with a single person goofing around, then pitched as a collaboration and the expanded team entered at that point.

14 days ago

etiam

I like the picture, but I'd be more impressed with the exploration argument if we were collectively actually doing a good job giving recognition to original and substantial works that already exist. It'd be of greater service in that regard to create a high-quality artificial stand-in for that limited-quantity "attention" and "engagement" all the bloodsuckers seem so keen on harvesting.

(And I do blame the advertisers, but frankly anyone handing them new amplifiers, with entirely predictable consequences, is also not blameless.)

17 days ago

js8

I read this argument/analogy and the "AI slop will win" idea reminds me of the idea that "fake news will win".

That is based on the perception that it is easier than ever to create fake content, but fails to account for the fact that creating real content (for example, simply taking a video) is easier still. So while there is more fake content, there is also a lot more real content, and so manipulation of reality (for example, denying a genocide) is much harder today than ever.

Anyway, "the AI slop will win" is based on a similar misconception, that total creative output will not increase. But like with fake news, it probably will not be the case, and so the actual amount of good art will increase, too.

I think we are OK as long as normal humans prefer to create real news rather than fake news, and create innovative art rather than cliched art.

17 days ago

TeMPOraL

> I think we are OK as long as normal humans prefer to create real news rather than fake news, and create innovative art rather than cliched art.

So we're not OK.

I think I need to state my assumptions/beliefs here more explicitly.

First of all, "AI slop" is just the newest iteration on human-produced slop, which we're already drowning in. Not because people prefer to create slop, but because they're paid to do it, because most content is created by marketers and advertisers to sell you shit, and they don't want it to be better than strictly necessary for purpose.

It's the same with fake news, really. Fake news isn't new. Almost all news is fake news; what we call "fake news" is a particular flavor of bullshit that got popular as it got easier for random humans to publish stories competing with established media operations.

In both cases, AI is exacerbating the problem, but it did not create it - we were already drowning in slop.

Which leads me to a related point:

> Anyway, "the AI slop will win" is based on a similar misconception, that total creative output will not increase.

It will. But don't forget Sturgeon's law - "ninety percent of everything is crap"[0]. Again, for the past couple decades, we've been drowning in "creative output". It's not a new problem, it's just increasingly noticeable in the past years, because the Web makes it very easy for everyone to create more "creative output" (most of which is, again, advertising), and it finally started overwhelming our ability to filter out the crap and curate the gems.

Adding AI to the mix means more output, which per Sturgeon's law, means disproportionately more crap. That's not AI's fault, that's ours; it's still the same problem we had before.

TeMPOraL

> I think the pursuit of fidelity has made the models less creative over time (...) their output is ever more homogenized and interchangable.

Ironically, we're long past that point with human creators, at least when it comes to movies and games.

Take sci-fi movies, compare modern ones to the ones from the tail end of the 20th century. Year by year, VFX gets more and more detailed (and expensive) - more and better lights, finer details on every material, more stuff moving and emitting lights, etc. But all that effort arguably killed immersion and believability, by making scenes incomprehensible. There's way too much visual noise in action scenes in particular - bullets and lightning bolts zip around, and all that detail just blurs together. Contrast the 20th century productions - textures weren't as refined, but you could at least tell who's shooting who and when.

Or take video games, where all that graphics work makes everything look the same. Especially games that go for a realistic style, they're all homogeneous these days, and it's all cheap plastic.

(Seriously, what the fuck went wrong here? All that talk, and research, and work into "physically based rendering", yet in the end, all PBR materials end up looking like painted plastic. Raytracing seems to help a bit when it comes to liquids, but it still can't seem to make metals look like metals and not Fisher-Price toys repainted to gray.)

So I guess in this way, more precision just makes the audience give up entirely.

> they will know that it looks like every other AI generated cyberpunk cityscape and mentally file your creation in the slop folder.

The answer here is the same as with human-produced slop: don't. People are good at spotting patterns, so keep adding those low-order bits until it's no longer obvious you're doing the same thing everyone else is.

EDIT: Also, obligatory reminder that generative models don't give you the average of the training data with some noise mixed in; they sample from a learned distribution. The law of large numbers applies, but it just means that to get more creative output, you need to bias the sampling.
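One common way to bias sampling is a temperature parameter. The sketch below is only an illustration of that idea; the comment does not say which mechanism it has in mind, and the three scores are made-up numbers rather than output from any real model.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into sampling probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical scores for three options: one common, two rare.
logits = [4.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Raising the temperature moves probability mass from the common option to the rare ones, so the same model visits the unusual outputs more often.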

17 days ago

wongarsu

Video games (the much larger industry of the two, by revenue) seem to be closer to understanding this. AAA games dominate advertising and news cycles, but on any best-seller list AAA games are on par with indie and B games (I think they call them AA now?). For every successful $60M PBR-rendered Unreal 5 title there is an equally successful game with low-fidelity graphics but exceptional art direction, story or gameplay.

Western movie studios may discover the same thing soon, with the number of high-budget productions tanking lately.

17 days ago

robertlagrant

I agree. The one shining hope I have is the incredible art and animation style of Fortiche[0]'s Arcane[1] series. Watch that, and then watch any recent (and identikit) Pixar movie, and they are just streets ahead. It's just brilliant.

[0] https://en.wikipedia.org/wiki/Fortiche

[1] https://en.wikipedia.org/wiki/Arcane_(TV_series)

AI is neither.

A director letting actors "just be" knows exactly what he/she wants, and chooses actors accordingly. Just like the directors who want the most minute detail.

Clint Eastwood tries to do at most one take of a scene. David Fincher is infamous for his dozens of takes.

AI is neither Fincher nor Eastwood.

17 days ago

wcfrobert

Do artists really have a fully formed vision in their head? I suspect the creative process is much more iterative rather than one-directional.

17 days ago

skydhash

No one can have a fully formed vision. But intent, yes. Then you use techniques to materialize it. Word is a poor substitute for that intent, which is why there’s so many sketches in a visual project.

> Not all details matter, some do

This is a key observation, unfortunately generally solving for what details matter is extremely difficult.

I don’t think video generation models help with that problem, since you have even less control of details than you do with film.

At least before post.

17 days ago

og_kalu

The visuals are the absolute bottom of why DC movies have performed worse over the years.

The movies have just had much worse audience and critical reception.

17 days ago

throwup238

“A frame is worth a billion rays”

The last production I worked on averaged 16 hours per frame for the final rendering. The amount of information encoded in lighting, models, texture, maps, etc is insane.

17 days ago

I'm not sure how well suited GPUs are to the workload. They're also rather memory constrained. The Moana dataset is from 2016 so it's not exactly cutting edge but good luck loading it into vram.

https://www.disneyanimation.com/data-sets/?drawer=/resources...

https://datasets.disneyanimation.com/moanaislandscene/island...

> When everything is fully instantiated the scene contains more than 15 billion primitives.

17 days ago

https://sciencebehindpixar.org/pipeline/rendering

17 days ago

raincole

The point is not to be precise. It's to be "good enough".

Trust me, even if you work with human artists, you'll keep saying "it's not quite what I initially envisioned, but we don't have budget/time for another revision, so it's good enough for now." all the time.

17 days ago

beambot

Corollary: I couldn't create an original visual piece of art to save my life, so prompting is infinitely better than what I could do myself (or am willing to invest time in building skills). The gen-AI bubble isn't going to burst. Pareto always wins.

Maybe your AI bubble! If you define AI to be something like just another programming language yes you will be sadly disappointed. You see it as an employee with its own intuitions and ways of doing things that you're trying to micromanage.

I have a bad feeling that you'd be a horrible manager if you ever were one.

17 days ago

GistNoesis

(2020) https://arxiv.org/abs/2010.11929 : An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

(2021) https://arxiv.org/abs/2103.13915 : An Image is Worth 16x16 Words, What is a Video Worth?

After encoding, the models are usually cascaded with either an LLM or a diffusion model.

Natural image -> sequence of tokens, but not every possible sequence of tokens is reachable, just as plenty of letter combinations form nonsensical words.

Sequence of tokens -> natural image: if the initial sequence of tokens is nonsensical, the natural image will be garbage.

So usually you then model the token sequences so that the model produces sensible ones, like you would with an LLM, and you use the LLM to generate more tokens. It also gives you a natural interface for controlling token generation: you can express in words what modifications to make to the image. That lets you find the golden sequence of tokens which corresponds to the Mona Lisa by dialoguing with the LLM, which has been trained to translate from English to visual-word sequences.

Alternatively, instead of an LLM you can use a diffusion model. The visual words are usually continuous, but you can displace them iteratively with text using things like "controlnet" (Stable Diffusion).
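The "image as a sequence of visual words" step from the first paper can be sketched in plain Python. This is a minimal illustration that only does the splitting into 16x16 patches; the learned projection that turns each patch into an embedding is left out, and the dummy grayscale image is an assumption made here for the example.

```python
def image_to_patches(image: list[list[float]], patch: int = 16) -> list[list[float]]:
    """Split a 2D grayscale image into flattened patch x patch tiles, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile into a single "visual word" vector.
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            tokens.append(tile)
    return tokens

# A dummy 224x224 image whose pixel value encodes its position.
image = [[float(y * 224 + x) for x in range(224)] for y in range(224)]
tokens = image_to_patches(image)
print(len(tokens), len(tokens[0]))
```

A 224x224 image becomes 14 x 14 = 196 tokens of 256 values each, which is the sequence a transformer then consumes.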

17 days ago

stale2002

You are half right. It's funny because I use the same saying. Mine is "A picture is worth a thousand words. That's why it takes 1000 words to describe the exact image that you want! Much better to just use Image to Image instead".

That's my full quote on this topic. And I think it stands. Sure, people won't describe a picture. Instead, they will take an existing picture or video and modify it using AI. That is much, much simpler and more useful if you can film a scene and then animate it later with AI.

17 days ago

ben_w

> Now expand that to movies and games and you can get why this whole generative-AI bubble is going to pop.

The prior sentence does not imply the conclusion.

17 days ago

meta_x_ai

A picture is worth a thousand words.

A word is worth a thousand pictures. (E.g. Love)

It is abstraction all the way

17 days ago

gloosx

it is all Information to be precise.

17 days ago

8n4vidtmkvmk

Actually, I've gotten some great results with image2text2image with less than a thousand words. Maybe not enough for a video, but for some not too crazy images, it is enough!

17 days ago

fooker

Sure it's going to pop. But when is the important question.

Being too early about this and being wrong are the same.

17 days ago

szundi

Comment was probably rather about the 360 degree turning heads etc.

17 days ago

mrandish

I agree that people who want any meaningful precision in their visual results will inevitably be disappointed.

17 days ago

isoprophlex

And another thing that irks me: none of these video generators get motion right...

Especially anything involving fluid/smoke dynamics, or fast dynamic movements of humans and animals, suffers from the same weird motion artifacts. I can't describe it other than that the fluidity of the movements is completely off.

And as all genai video tools I've used are suffering from the same problem, I wonder if this is somehow inherent to the approach & somehow unsolvable with the current model architectures.

17 days ago

giantrobot

I think one of the biggest problems is the models are trained on 2D sequences and don't have any understanding of what they're actually seeing. They see some structure of pixels shift in a frame and learn that some 2D structures should shift in a frame over time. They don't actually understand the images are a 2D capture of an event that occurred in four dimensions and the thing that's been imaged is under the influence of unimaged forces.

I saw a Santa dancing video today and the suspension of disbelief was almost instantly dispelled when the cuffs of his jacket moved erratically. The GenAI was trying to get them to sway with arm movements but because it didn't understand why they would sway it just generated a statistical approximation of swaying.

GenAI also definitely doesn't understand 3D structures, as is easily demonstrated by completely incorrect morphological features. Even my dogs understand gravity: if I drop an object they're tracking (food) they know it should hit the ground. They also understand 3D space: if they stand on their back legs they can see over things or get a better perspective.

I've yet to see any GenAI that demonstrates even my dogs' level of understanding the physical world. This leaves their output in the uncanny valley.

17 days ago

jeroen

They don't even get basic details right. The ship in the 8th video changes with every camera change and birds appear out of nowhere.

17 days ago

When training an NN, you don’t have great control over what parts of the model do what or how.

Now instead of trying to discredit me, would you mind answering my question? Especially since, as you say, the theory is so simple.

Yeah I mean I would never pay you for anything.

You’ve convinced me that you’re small and know very little about the subject matter.

beefnugs

AI isn't trying to sell to you: a precise artist with real vision in your brain. It is selling to managers who want to shit out something in an evening that approximates anything, that writes ads that no one wants to see anyway, that produces surface level examples of how you can pay employees less because "their job is so easy"

17 days ago

spuz

Yes and the thing is, even for those tasks, it's incredibly difficult to achieve even the low bar that a typical advertising manager expects. Try it yourself for any real world task and you will see.

17 days ago

cornel_io

Counterpoint: our CEO spent 25 minutes shitting out a bunch of AI ads because he was frustrated with the pace of our advertising creative team. They hated the ads that he created, for the reasons you mention, but we tested them anyways and the best performing ones beat all of our "expert" team's best ads by a healthy margin (on all the metrics we care about, from CTR to IPM and downstream stuff like retention and RoAS).

Maybe we're in a honeymoon period where your average user hasn't gotten annoyed by all the slop out there and they will soon, but at least for now, there is real value here. Yes, out of 20 ads maybe only 2 outperform the manually created ones, but if I can create those 20 with a couple hundred bucks in GenAI credits and maybe an hour or two of video editing that process wipes the floor with the competition, which is several thousand dollars per ad, most of which are terrible and end up thrown away, too. With the way the platforms function now, ad creative is quickly becoming a volume-driven "throw it at the wall and see what sticks" game, and AI is great for that.

17 days ago

sarchertech

> Maybe we're in a honeymoon period where your average user hasn't gotten annoyed by all the slop out there and they will soon

It’s this. A video ad with a person morphing into a bird that takes off like a rocket with fire coming out of its ass, sure it might perform well because we aren’t saturated with that yet.

You’d probably get a similar result by giving a camera to a 5 year old.

But you also have to ask what that’s doing long term to your brand.

17 days ago

gonzobonzo

> Counterpoint: our CEO spent 25 minutes shitting out a bunch of AI ads because he was frustrated with the pace of our advertising creative team. They hated the ads that he created, for the reasons you mention, but we tested them anyways and the best performing ones beat all of our "expert" team's best ads by a healthy margin (on all the metrics we care about, from CTR to IPM and downstream stuff like retention and RoAS).

My guess is that the criticism of AI not being that good is correct, but many people don't realize that most humans also aren't that good, and that it's quite possible that the AI performs better than mediocre humans.

This shouldn't be much of a surprise, we've seen automation replace low skilled labor in a lot of industries. People seem uncomfortable with the possibility that there's actually a lot of low skilled labor in the creative industry that could also be easily replaced.

17 days ago

mewpmewp2

A/B/C/D testing is the perfect grounds for that. You can keep automatically generating and iterating quickly while A/B tests are constantly being run. This data on CTR can later be used to train the model better as well.
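As a rough sketch of how two ad variants could be compared on CTR, here is a two-proportion z-score in Python. The click and view counts are made-up numbers, and a real setup with many generated variants would also need to correct for testing many ads at once, which this does not do.

```python
import math

def ctr_z_score(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> float:
    """Two-proportion z-score for the CTR difference between variants B and A."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        return 0.0
    return (rate_b - rate_a) / std_err

# Made-up numbers: the human-made ad (A) versus a generated ad (B).
z = ctr_z_score(clicks_a=50, views_a=1000, clicks_b=80, views_b=1000)
print(f"z = {z:.2f}")
```

An absolute z-score above roughly 1.96 corresponds to the usual 5% two-sided significance threshold for a single comparison.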

17 days ago

soheil

You seem to speak from experience of being that manager... I'm not going to ask what you shit out in your evenings.

17 days ago

minimaxir

Way back in the days of GPT-2, there was an expectation that you'd need to cherry-pick at least 10% of your output to get something usable/coherent. GPT-3 and ChatGPT greatly reduced the need to cherry-pick, for better or for worse.

A human artist keeps state :). They keep it between drawing sessions, and more importantly, they keep very detailed state - their imagination or interpretation of what the thing (house, grizzled detective, etc.) is.

Most models people currently use don't keep state between invocations, and whatever interpretation they make from provided context (e.g. reference image, previous frame) is surface level and doesn't translate well to output. This is akin to giving each panel in a comic to a different artist, and also telling them to sketch it out by their gut, without any deep analysis of prior work. It's a big limitation, alright, but researchers and practitioners are actively working to overcome it.

(Same applies to LLMs, too.)

17 days ago

Der_Einzige

Btw there’s a way to match characters in a batch in the forge webUI which guarantees that all images in the batch have the same figure in it. Trivial to implement this in all other image generators. This critique is baseless.

16 days ago

staticman2

So prove it. If you are arguing in good faith that an AI can, via automation, draw a comic from a script with consistent figures, please tell an AI to draw the images in the first 3 pages of this script I pulled from the comic book script archive:

https://www.comicsexperience.com/wp-content/uploads/2018/09/...

Or if you can't do this, explain why the feature you mentioned cannot do this, and what it is good for.

16 days ago

TeMPOraL

As long as you're not asking for a zero-shot solution with a single model run three times in a row, this should be entirely doable, though I imagine ensuring the result would require a complex pipeline consisting of:

- An LLM to inflate descriptions in the script to very detailed prompts (equivalent to artist thinking up how characters will look, how the scene is organized);

- A step to generate a representative drawing of every character via txt2img - or more likely, multiple ones, with a multimodal LLM rating adherence to the prompt;

- A step to generate a lot of variations of every character in different poses, using e.g. ControlNet or whatever is currently the SOTA solution used by the Stable Diffusion community to create consistent variations of a character;

- A step to bake all those character variations into a LoRA;

- Finally, scenes would be generated by another call to txt2img, with prompts computed in step 1, and appropriate LoRAs active (this can be handled through prompt too).

Then iterate on that, e.g. maybe additional img2img to force comic book style (with a different SD derivative, most likely), etc.

Point being, every subproblem of the task has many different solutions already developed, with new ones appearing every month - all that's left to have an "AI artist" capable of solving your challenge is to wire the building blocks up. For that, you need just a trivial bit of Python code using existing libraries (e.g. hooking up to ComfyUI), and guess what, GPT-4 and Claude 3.5 Sonnet are quite good at Python.

EDIT: I asked Claude to generate "pseudocode" diagram of the solution from our two comments:

http://www.plantuml.com/plantuml/img/dLLDQnin4BthLmpn9JaafOR...

Each of the nodes here would be like 3-5 real ComfyUI nodes in practice.
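The wiring of the steps listed above can be sketched in plain Python. Every function below is a hypothetical stub that returns a string instead of calling a real model; none of these names come from ComfyUI or any other library, and the step where a multimodal LLM rates prompt adherence is left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Character:
    name: str
    prompt: str
    reference_images: list[str] = field(default_factory=list)
    lora: str = ""

# All functions below are hypothetical stand-ins for real model calls.
def expand_description(text: str) -> str:
    return f"detailed prompt for: {text}"  # step 1: LLM inflates the script

def txt2img(prompt: str, loras: tuple[str, ...] = ()) -> str:
    return f"image({prompt}; loras={','.join(loras)})"  # steps 2 and 5

def make_variations(image: str, count: int) -> list[str]:
    return [f"{image}#pose{i}" for i in range(count)]  # step 3: e.g. ControlNet

def train_lora(name: str, images: list[str]) -> str:
    return f"lora:{name}:{len(images)}"  # step 4: bake variations into a LoRA

def draw_comic(characters: dict[str, str],
               panels: list[tuple[str, list[str]]]) -> list[str]:
    cast = {}
    for name, description in characters.items():
        char = Character(name, expand_description(description))
        reference = txt2img(char.prompt)
        char.reference_images = make_variations(reference, count=8)
        char.lora = train_lora(name, char.reference_images)
        cast[name] = char
    pages = []
    for description, names in panels:
        # Activate the LoRA of every character that appears in the panel.
        loras = tuple(cast[n].lora for n in names)
        pages.append(txt2img(expand_description(description), loras))
    return pages

panels = draw_comic(
    {"detective": "a grizzled detective"},
    [("the detective enters a rainy alley", ["detective"])],
)
print(panels[0])
```

The point of the sketch is only the data flow: character state is built once and reused for every panel, which is what keeps figures consistent across scenes.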

15 days ago

staticman2

I appreciate the detailed response. I had a feeling the answer was some variation of "well I could get an AI to draw that but I'd have to hack at it for a few hours...". If a human has to work at it for hours, it's more like using Blender than "having an AI draw it" in my mind.

I suspect if someone went to the trouble to implement your above solution they'd find the end result isn't as good as they'd hoped. In practice you'd probably find one or more steps don't work correctly - for example, maybe today's multimodal LLMs can't evaluate prompt adherence acceptably. If the technology was ready the evidence would be pretty clear - I'd expect to see some very good, very quickly made comic books shown off by AI enthusiasts on reddit rather than the clearly limited, not very good comic book experiments which have been demonstrated so far.

15 days ago

TeMPOraL

> If a human has to work at it for hours, it's more like using Blender than "having an AI draw it" in my mind.

A human has to work at it too; more than a few hours when doing more than a few quick sketches (memory has its limits; there's a reason artists keep reference drawings around), and obviously they already put years into learning their skills before that. But fair - the human artist already knows how to do things that any given model doesn't yet[0], so we kind of have to assemble the overall flow ourselves for now[1].

Then again, you only need to assemble it once, putting those hours of work up front - and if it's done, and it works, it becomes fair to say that AI can, in fact, generate self-consistent comic books.

> I suspect if someone went to the trouble to implement your above solution they'd find the end result isn't as good as they'd hoped. In practice you'd probably find one or more steps don't work correctly- for example, maybe today's multimodal LLM's can't evaluate prompt adherence acceptably.

I agree. I obviously didn't try this myself either (yet, I'm very tempted to try it, to satisfy my own curiosity). However, between my own experience with LLMs and Stable Diffusion, and occasionally browsing Stable Diffusion subreddits, I'm convinced all individual steps work well (and have multiple working alternatives), except for the one you flagged, i.e. evaluating prompt adherence using a multimodal LLM - that last one I only feel should work, but I don't know for sure. However, see [1] for an alternative approach :).

My point thus is, all individual steps are possible, and wiring them together seems pretty straightforward, therefore the whole thing should work if someone bothers to do it.
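As a rough illustration of that wiring, here is a minimal sketch of a generate-then-check loop. Everything in it is hypothetical: `generate_panel` and `score_adherence` are stand-ins for an image generation workflow and a multimodal-LLM judge, stubbed out so the control flow runs on its own. It is not the ComfyUI graph described above, just the retry logic around the one step flagged as uncertain.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Panel:
    prompt: str
    seed: int
    score: float  # adherence score in [0, 1] assigned by the judge


def generate_with_check(
    prompt: str,
    generate_panel: Callable[[str, int], str],     # hypothetical image generator
    score_adherence: Callable[[str, str], float],  # hypothetical multimodal judge
    threshold: float = 0.8,
    max_attempts: int = 5,
) -> Tuple[Panel, bool]:
    """Re-roll the seed until the judge accepts a panel or attempts run out.

    Returns the best panel seen and whether it met the threshold.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best: Optional[Panel] = None
    for seed in range(max_attempts):
        image = generate_panel(prompt, seed)
        score = score_adherence(prompt, image)
        if best is None or score > best.score:
            best = Panel(prompt, seed, score)
        if score >= threshold:
            return best, True
    # Nothing passed: hand the best candidate to a human instead.
    return best, False


if __name__ == "__main__":
    # Stubs standing in for real models (assumptions, not real APIs).
    fake_generate = lambda prompt, seed: f"image-{seed}"
    fake_judge = lambda prompt, image: 0.9 if image == "image-3" else 0.4

    panel, accepted = generate_with_check(
        "a superhero planting vegetables", fake_generate, fake_judge
    )
    print(panel, accepted)
```

Whether the loop is useful depends entirely on how trustworthy the judge is; with an unreliable scorer, the `False` branch (fall back to a human) is the one that matters.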

> If the technology was ready the evidence would be pretty clear- I'd expect to see some very good, very quickly made comic books shown off by AI enthusiasts on reddit rather than the clearly limited/not very good comic book experiments which have been demonstrated so far.

I think the biggest concentration of enthusiasm is to be found in NSFW uses of SD :). On the one hand, you're right; we probably should've seen it done already. On the other hand, my impression is that most people doing advanced SD magic are perfectly satisfied with partially manual workflows. And it kind of makes sense - manual steps allow for flexibility and experimentation, and some things are much simpler to wire by hand or patch up with some tactical photoshopping, than to try and automate them fully. In particular, judging the quality of output is both easy for humans and hard to automate.

Still, I've recently seen ads for various AI apps claiming to do complex work (such as animating characters in photos) end-to-end automatically - exactly the kind of work that's typically done in a partially manual process. So I suspect fully-automated solutions are being built on a case-by-case basis, driven by businesses making apps for the general population; a process that lags some months behind what image gen communities figure out in the open.

[0] - Though arguably, LLMs contain the procedural knowledge of how a task should be done; just ask it to ELI5 or explain in WikiHow style.

[1] - In fact, I just asked Claude to solve this problem in detail, without giving it my own solution to look at (but hinting at the required complexity level); see this: https://cloud.typingmind.com/share/db36fc29-6229-4127-8336-b... (and excuse the weird errors; Claude is overloaded at the moment, so some responses had to be regenerated; also styling on the shared conversation sucks, so be sure to use the "pop out" button on diagrams to see them in detail).

At very high level, it's the same as mine, but one level below, it uses different tools and approaches, some of which I never knew about - like keeping memory in embedding space instead of text space, and using various other models I didn't know exist.

EDIT: I did some quick web search for some of the ideas Claude proposed, and discovered even more techniques and models I never heard of. Even my own awareness of the image generation space is only scratching the surface of what people are doing.

15 days ago

kevingadd

I work with professional artists all the time and this is not the case. They're generally quite good at extrapolating from a couple paragraphs into something fantastic, often exactly what I had in mind.

In comparison I've messed around with prompting image generator models quite a bit and it's not possible to get remotely close to the quality level of even rough paid concept work by a professional, and the credits to run these models aren't particularly cheap.

17 days ago

coffeebeqn

With real art you can start from somewhere and keep building on that foundation. Say you pick an angle to shoot from and test different actors and scenes from that angle. With AI you’re re-rolling the dice for every iteration. If you’re happy that it looks 80% correct then sure it’s maybe passable.

I think people are getting way ahead of their skis here. Even in 2D I can’t for example generate inventory images for weapons and items for a game yet. Which is an orders of magnitude simpler test case than video. They all are slightly different styles. If I don’t care that they all look different in strange ways then it’s useful - but any consumer will think it looks like crap

17 days ago

Yup, 100% agreed on that, and mentioned this caveat elsewhere. As you say - people don't pay attention to details (or lack of it), as long as the details are consistent. Inconsistencies stand out like sore thumbs. Which is why IMO it's best to have less details than to be inconsistent with them.

17 days ago

pbhjpbhj

>There is no problem unless you insist on reflecting what you had in mind exactly.

Not disagreeing, just noting: this is not how [most?] people's minds work {I don't think you're holding to that opinion particularly, I'm just reflecting on this point}. We have vague ideas until an implementation is shown, then we examine it and latch on to a detail and decide if it matches our idea or not. For me, if I'm imagining "a superhero planting vegetables in his garden" I've no idea what they're actually wearing, but when an artist or genAI shows me it's a brown coat then I'll say "no something more marvel". Then when ultimately they show me something that matches the idea I had _and_ matches my current conception of the idea I had... then I'll point out the fingernails are too long, when in the idea I hadn't even perceived the person had fingers, never mind too-long fingernails!

I'd warrant any actualised artistic work has some delta with the artist's current perception of the work; and a larger delta with their initial perception of it.

16 days ago

janalsncm

I disagree. Even without exactness, adding any reasonable constraints is impossible. Ask it to generate a realistic circuit diagram or chess board or any other thing where precision matters. Good luck going back and forth getting it right.

These are situations with relatively simple logical constraints, but an infinite number of valid solutions.

Keep in mind that we are not requiring any particular configuration of circuit diagram, just any diagram that makes sense. There are an infinite number of valid ones.
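To make "relatively simple logical constraints" concrete, here is a small sketch of a checker for the chess board example. The function name and the chosen rules (8x8 grid, exactly one king per side, no pawns on the first or last rank, at most 16 pieces and 8 pawns per side) are an illustrative subset picked for this sketch, not a full legality test. The point is that countless boards pass it, yet a generated image still has to land on one of them.

```python
from collections import Counter
from typing import List


def board_problems(rows: List[str]) -> List[str]:
    """Return the violated constraints; an empty list means the board passes.

    rows: 8 strings of 8 characters, rank 8 first. Uppercase is white,
    lowercase is black, '.' is an empty square (piece letters as in FEN).
    """
    if len(rows) != 8 or any(len(r) != 8 for r in rows):
        return ["board must be 8x8"]

    problems = []
    squares = "".join(rows)
    counts = Counter(squares)

    if set(squares) - set("KQRBNPkqrbnp."):
        problems.append("unknown piece symbol")
    if counts["K"] != 1 or counts["k"] != 1:
        problems.append("each side needs exactly one king")
    if any(c in "Pp" for c in rows[0] + rows[7]):
        problems.append("pawns cannot stand on the first or last rank")
    if counts["P"] > 8 or counts["p"] > 8:
        problems.append("a side has more than 8 pawns")
    white = sum(1 for c in squares if c.isupper())
    black = sum(1 for c in squares if c.islower())
    if white > 16 or black > 16:
        problems.append("a side has more than 16 pieces")
    return problems


START = [
    "rnbqkbnr",
    "pppppppp",
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",
    "RNBQKBNR",
]

if __name__ == "__main__":
    print(board_problems(START))  # the standard starting position passes
    print(board_problems(["K......K"] + START[1:]))  # two white kings, no black king
```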

17 days ago

TeMPOraL

That's using the wrong tool for the job :). Asking diffusion models to give you a valid circuit diagram is like asking a painter to paint you a pixel-perfect 300DPI image on a regular canvas, using their standard paintbrush. It ain't gonna work.

That doesn't mean it can't work with AI - it's that you may need to add something extra to the generative pipeline, something that can do circuit diagrams, and make the diffusion model supply style and extra noise (er, beautifying elements).

> Keep in mind that we are not requiring any particular configuration of circuit diagram, just any diagram that makes sense. There are an infinite number of valid ones.

On that note. I'm the kind of person that loves to freeze-frame movies to look at markings, labels, and computer screens, and one thing I learned is that humans fail at this task too. Most of the time the problems are big and obvious, ruining my suspension of disbelief, and importantly, they could be trivially solved if the producers grabbed a random STEM-interested intern and asked for advice. Alas, it seems they don't care.

This is just a specific instance of the general problem of "whatever you work with or are interested in, you'll see movies keep getting it wrong". Most of the time, it's somewhat defensible - e.g. most movies get guns wrong, but in a way people are used to, and one that makes the scenes more streamlined and entertaining. But with labels, markings and computer screens, doing it right isn't any more expensive, nor would it make the movie any less entertaining. It seems that the people responsible don't know better or don't care.

Let's keep that in mind when comparing AI output to the "real deal", so as not to set impossible standards that human productions don't match, and never did.

17 days ago

janalsncm

The issue isn’t any particular constraint. The issue is the inability to add any constraints at all.

In particular, internal consistency is one of the important constraints which viewers will immediately notice. If you’re just using sora for 5 second unrelated videos it may be less of an issue but if you want to do anything interesting you’ll need the clips to tie together which requires internal consistency.

17 days ago

mlboss

So what I am getting is a use case for a brain-computer interface.

17 days ago

hmottestad

When I first started learning Photoshop as a teenager I often knew what I wanted my final image to look like, but no matter how hard I tried I could never get there. It wasn't that it was impossible, it was just that my skills weren't there yet. I needed a lot more practice before I got good enough to create what I could see in my imagination.

Sora is obviously not Photoshop, but given that you can write basically anything you can think of I reckon it's going to take a long time to get good at expressing your vision in words that a model like Sora will understand.

17 days ago

corytheboyd

Free text is just the fundamentally wrong input for precision work like this. Because it is wrong for this doesn’t mean it has NO purpose, it’s still useful and impressive for what it is.

FWIW I too have been quite frustrated iterating with AI to produce a vision that is clear in my head. Past changing the broad strokes, once you start “asking” for specifics, it all goes to shit.

Still, it’s good enough at those broad strokes. If you want your vision to become reality, you either need to learn how to paint (or whatever the medium), or hire a professional, both being tough-but-fair IMO.

17 days ago

londons_explore

I don't think it'll be long before GUI tools catch up for editing video.

Things like rearranging things in the scene with drag'n'drop sound implementable (although incredibly GPU heavy)

17 days ago

ohthehugemanate

If you have a specific vision, you will have to express the detailed information of that vision into the digital realm somehow. You can use (more) direct tools like Premiere if you are fluent enough in their "language". Or you can use natural language to express the vision using AI. Either way you have to get the same amount of information into a digital format.

Also, AI sucks at understanding detail expressed in symbolic communication, because it doesn't understand symbols the way linguistic communication expects the receiver to understand them.

My own experience is that all the AI tools are great for shortcutting the first 70-80% or so. But the last 20% goes up an exponential curve of required detail which is easier and easier to express directly using tooling and my human brain.

Consider the analogy to a contract worker building or painting something for you. If all you have is a vague description, they'll make a good guess and you'll just have to live with that. But the more time you spend with them communicating (through description, mood boards, rough sketches, etc.) the more accurate to your detailed vision it will get. But you only REALLY get exactly what you want if you do it yourself, or sit beside them as they work and direct almost every step. And that last option is almost impossible if they can't understand symbolic meaning in language.

17 days ago

cube2222

Agreed. It’s still much better than what I could do myself without it, though.

(Talking about visual generative AI in general)

17 days ago

JKCalhoun

Yeah, but if I handed you a Maxfield Parrish it would be better than either of us can do — but not what I asked for.

Advertisers, I guess. Same folks who paid for everything else around here

17 days ago

adamc

Yeah, I just question if there are enough customers to make this work.

17 days ago

joe_the_user

The thing about Hollywood is that movies aren't made by a producer or director creating a description and an army of actors, tech and etc doing exactly that.

What happens is a description becomes a longer specification or script that's still good and hangs together in itself and then further iterations involving professionals who can't do "exactly what the director wants" but rather do something further that's good and close enough to what the director wants.

17 days ago

skydhash

Also, a team of experts and professionals that knows better than the director how a specific thing works.

16 days ago

diob

I believe it. I was just using AI to help out with some mandatory end of year writing exercises at work.

Eventually, it starts to muck with the earlier work that it did good on, when I'm just asking it to add onto it.

I was still happy with what I got in the end, but it took trial and error and then a lot of piecemeal coaxing with verification that it didn't do more than I asked along the way.

I can imagine the same for video or images. You have to examine each step post prompt to verify it didn't go back and muck with the already good parts.

17 days ago

planb

Iterations are the missing link.

jstummbillig

> A way to test this is to take a piece of footage or an image which is the ground truth, and test how much prompting and editing it takes to get the same or similar ground truth starting from scratch.

Sure, if you then do the same in reverse.

17 days ago

17 days ago

telenardo

For those curious (and still locked out) here’s a direct comparison of Sora vs. the open-source leaders (HunyuanVideo, Mochi and LTX):

https://app.checkbin.dev/snapshots/1f0f3ce3-6a30-4c1a-870e-2...

Pros:

- Some of the Sora results are absolutely stunning. Check out the detail on the lion, for example!
- The landscapes and aerial shots are absolutely incredible.
- Quality is much better than Mochi & LTX out of the box. Mochi/LTX seem to require specifically optimized workflows (I've seen great img2vid LTX results on Reddit that start with Flux image generations, for example). Hunyuan seems comparable to Sora!

Cons:

- Still nearly impossible to access Sora despite the “launch”. My generations today were in the 2000s, implying that it’s only open to a very small number of people. There’s no API yet, so it’s not an option for developers.
- Sora struggles with physical interactions. Watch the dancers moonwalk, or the ball go through the dog. HunyuanVideo seems to be a bit better in this regard.
- Can't run it locally (obviously).
- I haven't tested this, but I think it's safe to assume Sora will be censored extensively. HunyuanVideo is surprisingly open (I've seen NSFW generations!)
- I’m getting weird camera angles from Sora, but that could likely be solved with better prompting.

Overall, I’d say it’s the best model I've played with, though I haven’t spent much time on other non-open-source ones. Hunyuan gives it a run for its money, though!

17 days ago

spondyl

I can't speak to any of those videos in a technical sense but personally, I don't feel like any of them are good?

The vibe they give me is similar to the iPhone photography commercials where yes, in theory, a picnic in the park could look exactly like this except for all the parts that seem movie perfect.

I guess it's really more of a colour grading question where most of the Sora colour grading triggers that part of my brain that says "I'm watching a movie and this isn't real" without quite realising why.

A few of the Hunyuan videos in contrast seem a bit more believable even though they have some obvious glitches at times.

The other thing I think Sora has is that thing in commercials where no one else except the protagonist exists and nothing is ever inconvenient. The video of the teacher in a classroom with no students reminds me of that as well as the picnic in the park where there's wide open space with no one around.

I suppose it depends if the goal is to generate believable video and how you define believable.

17 days ago

zuminator

Hunyuan was more realistic but lower quality than Sora, shorter videos with lower resolution or bitrate. The downside to Sora's sharpness is that it makes mistakes more apparent. Also funny that Sora didn't understand the rolling dunes metaphor.

17 days ago

CSMastermind

Based on this it really seems like Hunyuan is a significantly better model. In nearly every example I preferred its output.

17 days ago

pen2l

Every day that passes I grow fonder of Google's decision to delay or otherwise keep a lot of this under the wraps.

The other day I was scrolling down on YouTube shorts and a couple videos invoked an uncanny valley response from me (I think it was a clip of an unrealistically large snake covering some hut) which was somehow fascinating and strange and captivating, and then scrolling down a few more, again I saw something kind of "unbelievable"... I saw a comment or two saying it's fake, and upon closer inspection: yeah, there were enough AI'esque artifacts that one could confidently conclude it's fake.

We'd known about AI slop permeating Facebook -- usually a Jesus figure made out of an unlikely set of things (like shrimp!) and we'd known that it grips eyeballs. And I don't even know in which box to categorize this; in my mind it conjures the image of those people on slot machines, mechanically and soullessly pulling levers because they are addicted. It's just so strange.

I can imagine now some of the conversations that might have happened at Google when they chose to keep a lot of innovations related to genAI under wraps (I'm being charitable here about their motives), and I can't help but agree.

And I can't help but be saddened about OpenAI's decisions to unload a lot of this before recognizing the results of unleashing this to humanity, because I'm almost certain it'll be used more for bad things than good things, I'm certain its application on bad things will secure more eyeballs than on good things.

17 days ago

lelandfe

I saw my first AI video that completely fooled commenters: https://imgur.com/a/cbjVKMU

This was not marked as AI-generated and commenters were in awe at this fuzzy train, missing the "AIGC" signs.

Some do care, e.g. some camera manufacturers or some news agencies. Surprisingly some social media platforms[1] want clear labels for AI generated content.

[1]: e.g. tiktok https://newsroom.tiktok.com/en-us/partnering-with-our-indust...

16 days ago

netdevphoenix

That's the easiest position imo. It's AI unless proven otherwise. No one has the time to pay this much attention to detail on a random video when the purpose of the video is just entertainment. What this might lead to, though, is people losing (or not learning) the skills needed to separate real content from AI-generated content.

16 days ago

jacobr1

And even if it isn't AI, it is quite possibly deceptively edited. Content provenance will be important in the future.

16 days ago

cess11

A precondition is likely that one has mainly watched CGI-heavy movies for most of one's life. Compared to old school analog movies or fairly raw photography, that looks as fake as the Coca-Cola Santa. There's a rather obvious lack of detail that real photography would have caught.

17 days ago

Quekid5

> A precondition is likely that one has mainly watched CGI-heavy movies for most of one's life.

B is / will be huge; the largest amount of "mindless" content is consumed on phones, with half attention, often with other distractions going on and in between doing other stuff, and can be watched on older / lower fidelity devices, slower internet connections, etc. AI content needs high resolution / big screens and focused attention to "discover".

The truth is... most people will simply not care. Raised eyebrow, hm, cute, next. Critical watching is reserved for critics like the crowd on HN and the like, but they represent only a small percentage of the target audience and revenue stream.

16 days ago

KennyBlanken

You can see the perspective/angle of the objects changing slightly as the camera moves in a way that makes it pretty obvious they're CG, AI or otherwise. That's always been a problem with AI generated imagery in video/animation; it changes too much frame to frame. If researchers figure out how to address that, yeah, we've got a problem. Until then - this looks worse tha

17 days ago

dmazzoni

> I’ve worked in CG for many years and despite the online nerd fests that decry CG imagery in films, 99% of those people can’t tell what’s CG or not unless it’s incredibly obvious.

I've noticed people assume things are CG that turn out to be practical effects, or 90% practical with just a bit of CG to add detail.

17 days ago

dagmx

Yep I’ve had that happen many times, where people assume my work is real and the practical is CG.

Worse, directors often lie about what’s practical and we’ll have replaced it with CG. So people online will cheer the “practicals” as being better visually, while not knowing what they’re even looking at.

I’ve seen interviews with actors even where they talk about how they look in a given shot or have done something, and not realize they’re not even really in the shot anymore.

People just have terrible eyes once you can convince them something is a certain way.

17 days ago

Der_Einzige

But films without CG are clearly superior and it’s not even in contention.

Lawrence of Arabia or Cleopatra alone have incredible fully live shot special effects which can not be easily replicated with CG and have aged like fine wine, unlike the trash early CG of the 80s and 90s which ruined otherwise great films like The Last Starfighter.

16 days ago

dagmx

I’m sorry, but you make an absurd argument.

You’re taking the best films of an era and comparing them to an arbitrary list of movies you don’t like? Adding to that, you’re comparing it to films in the infancy of a technology?

This is peak confusion of causality and correlation. There are tons of great films in that time frame with CG. Unless you’re going to argue that Jurassic Park is bad.

16 days ago

jacobr1

Jurassic Park isn't just a good example of CG, it is also a good example of making the right choices on practical vs CG (in the context of the technology of the time) and using a reasonable budget. You can have great CG, and crappy CG from cutting corners. Plenty of people that decry CG don't actually know how much there is, even in non-sci-fi movies like romcoms, just for post-editing. But when it is done well nobody notices; the complaints only come when it looks like crap. Great use of technology to achieve the artistic vision will stand the test of time.

16 days ago

lancesells

It's also directed by one of the best directors in history.

16 days ago

qingcharles

The worst bit about working in CG, or film-making in general, is finding it harder to enjoy films because you are hypersensitized to bad work.

17 days ago

dagmx

Yeah, totally. It’s not even just bad work, but I’m constantly breaking down shots as I’m watching them.

Especially because I’ve done both on set and virtual production, it’s hard to suspend disbelief in a lot of films.

16 days ago

circlefavshape

> Still, most people cannot tell reality from fiction. If you just tell them it’s real, they’ll most likely believe it.

This goes for conversation too! My neighbour recently told me about a mutual neighbour who walks 200 miles per day working on his farm. When I explained that this is impossible he said "I'll have to disagree with you there"

17 days ago

schoen

Maybe not strictly impossible, just slightly better than an ultramarathon world record pace?

https://www.reddit.com/r/Ultramarathon/comments/xhbs4d/sorok...

https://en.wikipedia.org/wiki/Aleksandr_Sorokin

So, not very convenient for a non-world-champion runner to do (let alone while doing farm work) (let alone on more than one occasion).
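For scale, the arithmetic behind that is short enough to check directly. The sketch below only converts the 200 miles per day from the anecdote into an average speed and pace, assuming all 24 hours are spent moving; the helper name `required_pace` is made up for illustration.

```python
KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def required_pace(miles: float, hours: float = 24.0):
    """Return (mph, km/h, minutes per mile) needed to cover `miles` in `hours`."""
    mph = miles / hours
    kmh = mph * KM_PER_MILE
    min_per_mile = 60.0 / mph
    return mph, kmh, min_per_mile


mph, kmh, pace = required_pace(200)
total_km = 200 * KM_PER_MILE
print(f"{mph:.2f} mph, {kmh:.2f} km/h, {pace:.1f} min/mile, {total_km:.1f} km total")
# prints: 8.33 mph, 13.41 km/h, 7.2 min/mile, 321.9 km total
```

That is a 7:12 mile held for 24 hours straight, every day, before any farm work gets done.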

17 days ago

Cthulhu_

That's a cultural issue that seems to have developed in the past years (decades? idk), where people take their own opinion (or what they think is their own opinion) as unchallengeable gospel.

In my opinion anyway, I'm gonna have to disagree with any counterpoints in advance.

16 days ago

dagmx

This is partially the result of being taught that every opinion is valid. What was taught as a nicety (don’t dismiss other people’s opinions was the intention) has evolved into all opinions are equal.

If all opinions are equal, and we’ve reinforced that you can find anything to strengthen an opinion, then facts don’t actually matter.

But I don’t think it’s actually all that recent. History is full of people saying that facts or logic don’t matter. The Americas were “discovered” by such a phenomenon.

16 days ago

Looks dope though. But what impressed me recently was some crypto-scam video, featuring "a clip" from Lex Fridman Podcast where Elon Musk "reveals" his new crypto or whatever (sadly, the one I saw is currently deleted). It didn't really look good, they were talking with weird pauses and intonations, and as awkward as these 2 normally are, here they were even more unnatural. There was so much audacity to it I laughed out loud.

But what I was thinking while enjoying the show was: people wouldn't do that, if it didn't work.

This is the point. There is no such thing as "completely fools commenters". I mean, it didn't fool you, apparently. (But don't be sad, I bet you were fooled by something else: you just don't know it, obviously.) But some of it always fools somebody.

I really liked how Thiel mentioned on some podcast that ChatGPT successfully passed the Turing test, which was implicitly assumed to be "the holy grail of AI", and nobody really noticed. This is completely true. We don't really think about ChatGPT as something that passes the Turing test; we think about how the fucking stupid, useless thing misled you with some mistake in calculations you decided to delegate to it. But realistically, if it doesn't pass, it's only because it is specifically trained to try to avoid passing it.

17 days ago

lelandfe

I wish you were right that there is no way to completely fool viewers, but I know you are not. I was fooled! Note that I call out "AIGC." If that wasn't there (I only noticed it on repeat views), I would have simply had no way to tell. These are early, primitive AI generated videos, and I'm already unable to differentiate. Many in this thread talk about movie CG; there are countless movie scenes that fool all viewers.

16 days ago

coffeebeqn

If someone were to train a model on the Joe Rogan podcast's whole run, I’m sure it would spit out extremely impressive fake results already.

17 days ago

vintermann

> people wouldn't do that, if it didn't work.

You can't assume that with scams. Quite often, scams are themselves sold as a get-rich-quick scheme, which like all GRQ schemes, they wouldn't be if they worked well.

17 days ago

peab

Think about this: you very well may have already seen AI videos that fooled you - you wouldn't know if you did.

16 days ago

[deleted]

17 days ago

coffeebeqn

One of the clearest signs in the current gen is that the typography looks bad still.

17 days ago

darkerside

People are smart enough to know that what you see in movies isn't real. It will just take a little time for people to realize that now applies to all videos and images.

17 days ago

nihil2501

The frequency is so high, and I am getting so burned out on checking comments to gauge how much everything is changing, that I've nearly given up subconsciously. Pretty close to just ignoring all images I see.

17 days ago

[deleted]

17 days ago

nurettin

This is definitely something the Japanese would do, but it is not a real train unless a thousand salarymen are crammed into it.

16 days ago

matwood

The bigger problem is that people think something this ridiculous could happen.

17 days ago

marci

Weirder things have been created. I could definitely see one being made for a movie.

16 days ago

espadrine

> I'm quite nervous for the future.

Videos like these were already achievable through VFX.

booleandilemma

No one is looking at her face though, they're looking at the giant hello kitty train. And you were only looking at her face because you were told it's an AI-generated video. I agree with superfrank that extreme skepticism of everything seen online is going to have to be the default, unfortunately.

17 days ago

vlovich123

Hard to not discount that as a compression artifact.

17 days ago

magicalhippo

Just like all the obvious signs[1] the moon landings were faked.

underdeserver

I think these things will get bigger and better much faster than we can learn to discern.

16 days ago

solfox

With zero doubt. Faster than we expect. And yet, it's nice that we are learning to distrust what we see before the "real real" stuff comes out.

16 days ago

echelon

Open source has already caught up with SOTA:

https://www.reddit.com/r/StableDiffusion/comments/1hav4z3/op...

These are even unfair comparisons because they're leveraging text-to-video instead of the more powerful image-to-video. In the latter case, the results are indistinguishable.

Video generation is about to be everywhere, and we're about to have the "Stable Diffusion" moment for video.

Look at the comments: people are already fawning over open source being uncensored.

Cat's out of the bag.

16 days ago

"More human than human" is our motto. https://youtu.be/ZbgmYhqFO-4?t=30

13 days ago

solfox

And yet, OP referred to a thread where the reality of the shorts was being questioned by "average" people. Imagine a world where OpenAI were the first out of the gates with this and just started producing their own videos without telling anyone about their technology or letting creators play with it. They'd make loads of money, probably could topple governments... I'm glad these tools are being made generally available versus the alternative.

16 days ago

quenix

It saddens me. Innovations in AI 'art' generation (music, audio, photo) have been a net negative to society and are already actively harming the Internet and our media sphere.

Like I said in another comment, LLMs are cool and useful, but who in the hell asked for AI art? It's good enough to fool people and break the fragile trust relationship we had with online content, but is also extremely shit and carries no meaning or depth whatsoever.

17 days ago

anxoo

>who in the hell asked for AI art?

everyone who has ever used stock photography, custom illustrators, and image editing. as AI improves, it will come after all of those industries.

that said, it is not OpenAI's goal to beat shutterstock, nor is it the goal of anthropic or google or meta. their goal is to make god: https://ia.samaltman.com/ . visual perception (and generation) is the near-term step on that path. every discussion of AI that doesn't acknowledge this goal, what all of these billions of dollars are aiming for, is myopic and naive.

17 days ago

rurp

There was a recent discussion in another HN thread that I think summed it up well. Good art rewards a careful viewer; the more you look at and think about good art, the more you get out of it. AI art does the opposite and punishes thoughtful consumers. There's no logical underpinning to the various details, it's just stuff mashed together in a superficially nice looking way.

16 days ago

mojuba

I think AI "art" can be as useful as the text generators, i.e. only within certain limits of dull and stupid stuff that needs to exist but has little to no value.

For example, you need to generate a landing page for your boring company: text, images, videos and the overall design (as well as code!) can be and should be generated because... who cares about your boring company's landing page, right?

17 days ago

whatevertrevor

One could ask why the boring company landing page exists in the first place though. If it's not providing value to humans to warrant actual attention being paid to it...

17 days ago

tomjen3

The world is in need of soap. Not the fancy beautiful artistic kind, but the kind that comes in containers and you put in bathrooms. This objectively saves lives and is one of the most boring things I can imagine.

17 days ago

carlosjobim

Then you don't understand the purpose of a landing page. If the boring company hires somebody to make the landing page who actually understands their job, the landing page will have great importance.

17 days ago

yunwal

> the landing page will have great importance.

Most companies don't need this. They need a page that has their contact info and some general information about services they provide so they can have a bare minimum internet presence and show up on google maps.

16 days ago

carlosjobim

Absolutely, if your company doesn't want to make sales or if you want to be bothered all the time by people calling and mailing only for them to find out your product isn't a fit for them. Or if you want third party sellers to take over most of your business like Booking.com, AirBnB, DoorDash or Amazon.

Companies who understand the importance of a customer friendly and functional web presence get a great return on their investment. And it's much better for the customer.

16 days ago

yunwal

I have an ice cream shop by me that doesn't even have a website. They're mobbed every day, because good ice cream is fairly self explanatory, and doesn't need a web presence

13 days ago

lobsterthief

You’re conflating “website” with “landing page”.

Your ice cream shop doesn’t need a landing page because of word of mouth and foot traffic.

Some project management platform for plumbers needs a highly tuned webpage because they’re competing with 20 other such systems, and there’s no line to walk past and assume it’s there because the software is good.

Believing that if you build great plumbing SAAS software, paying customers will magically appear, is naive.

A great product can sell itself. But that doesn’t mean that marketing and sales aren’t necessary in order to get the product in front of people, assuage their concerns, reassure them that it solves their problems, show social proof from others using it, and close the deal. A good landing page will do all of this ;)

13 days ago

dale_glass

> Like I said in another comment, LLMs are cool and useful, but who in the hell asked for AI art?

I did. I started messing around with computer graphics on DOS with QBASIC and consider AI art to be just an extension of that.

On the other hand I don't care all that much for LLMs most of the time. They're sometimes useful, but while I find AI art I enjoy very regularly, using an LLM for something is more a once every couple weeks event for me.

17 days ago

computerex

How do you know they are a net negative? What's your source?

17 days ago

quenix

My opinion ;-)

That's what HN is for

17 days ago

CamperBob2

It's quite well-supported on here, that's for sure.

Somewhere there's a site for "hackers" where it isn't, and I hope I stumble across that site at some point.

17 days ago

Cthulhu_

Do add "in my opinion" or prefix with "I think", because your definite wording implied you were stating a verifiable fact. Telling opinions like they are facts and then backtracking with "oh but it was just my opinion" is a big problem in (online?) society / discourse, and has led to a lot of misinformation and anti-scientific takes spreading.

"The earth is flat" - "Can you prove it?" - "Oh it's just my opinion". It's dishonest.

16 days ago

randomlurking

I agree with the first part. For me, AI art is the chance to have a somewhat creative outlet that I wouldn’t have otherwise, because I’m much worse at painting than I can stand. Drawing by prompts helps me be creative and work through some stuff - for that it’s also nice and interesting to see that the result differs from my mental image. I will tweak the prompt to some extent and to some extent go with some unintended elements of the drawing. I keep the drawing on my phone in the notes app with a title and the prompt.

sergiogdr

It isn’t about being smart (you assumed this is what ‘education’ was pointing at). Most people aren’t even aware of what’s happening besides extremely superficial things that they get here and there on the news. Can’t you honestly see the real potential for massive damage coming out of all this?

17 days ago

pixelsort

With respect to the American public, the majority can and do utilize nuanced thinking as a survival skill. The problem of the modern American era is not that our public is low in average intelligence. Rather, it is that, on average, we have been miseducated to seek the eradication of discomfort, uncertainty, inconveniences, and unknowns.

17 days ago

cma

That radio station in Hotel Rwanda could be a bad thing for you and people you cared about even if you personally could discern the lies so it wasn't fooling you.

17 days ago

krisboyz781

Actually you overestimate the general public's ability to discern what's real or not. On top of that, most people don't even care if it's real. This is exactly why Trump won.

Example: if a gen ai vid of a politician doing some crazy crime came out. Even if it were proven fake, people would start questioning everything and still act as if the politician were guilty

17 days ago

JPKab

"This is exactly why Trump won"

See the part of my comment you are replying to where I specifically stated that the motivation for all of this is that "Jethro doesn't vote the way I want him to". You've proven my point.

The censorious attitudes on HN were non-existent before Trump won in 2016. I know this for a fact. I've had my account on here since 2012, after 2 years of being just a reader.

Meanwhile, you overestimate how immune to misinformation and lies the average HN techy is. Just a few years ago, the majority of people on here believed, with utter conviction, that the bat-borne coronavirus lab in Wuhan had absolutely no connection with the bat-borne coronavirus epidemic that started in Wuhan and that only bigots and ignoramuses could draw such a conclusion. I experienced this whenever I brought up the blatantly obvious, common sense connection in these same comment threads in late 2020 or into mid 2021. The absolutely absurd denial of common sense by otherwise intelligent people was reminiscent of trying to talk to a religious fundamentalist about evolution while pointing at dinosaur fossils and having them continue to deny what was staring them in the face.

16 days ago

sekai

> "just be privileged as I was to get all the necessary education to be able to not be fooled by this tech". Yeah, very realistic and compassionate.

16 days ago

gambiting

>>isn't aware that videos can be faked with AI, or non-AI special effects

These two are very different things. My family believes all kinds of videos on the internet are fake. None of them have any idea what a tool like Sora can do. The gap between "oh this was probably special effects" to "you have to notice pixels shimmering around someone's hand to tell" is enormous.

>>My family is mostly working class in an economically depressed part of the Virginia/West Virginia coal country, and every single one of them is aware of this.

Your working class family has time to keep up with the advancements in generative AI for video? They have more free time than I do then. If we're sharing anecdotes about families then my family is from Polish coal country and their idea of AI is talking to your car and it responding poorly.

>>I maintain that the attitude driving this paternalistic, censorious attitude is arrogance and condescension.

I'm confused - who is displaying this "censorious" attitude here?

>> and your source of data for this is "your own experience"? Really?

Yes, really. I mean do you have anything else? You are also quoting things from your own experience.

16 days ago

sergiogdr

I’m not (exclusively) talking about formal education. There are lots of people (I would dare say the majority of the planet) that don’t have the ‘digital literacy’ required to handle what’s happening right now. Being from a developed country I am very much worried about this.

17 days ago

8n4vidtmkvmk

Fooled by what? Some of it looks real but is incredible enough that it should set off your BS sensor. Other stuff is/will be more subtle and we will have no way of knowing.

17 days ago

mrcwinn

Too charitable indeed. Google was simply unprepared and has inferior alternatives.

My prediction is that next year they will catch up a bit and will not be shy about releasing new technology. They will remain behind in LLMs but at least will more deeply envelope their own existing products, thus creating a narrative of improved innovation and profit potential. They will publicly acknowledge perceived risks and say they have teams ensuring it will be okay.

17 days ago

tziki

>They will remain behind in LLMs

The latest Gemini version (1206) is at least tied for the best LLM, if not the best outright.

16 days ago

pier25

I wish Google would allow me to remove the AI stuff from search results.

99% of the time it's either useless or wrong.

17 days ago

titzer

Strong plus one here. Not only that, but it uses gobs of energy in total. Google has reneged on all of its carbon promises to stay in the running for AI domination and to head off disruption to search ads business. Since I've unconsciously trained my brain to not look at the top search results anymore because they long ago turned into impossible-to-distinguish ads, I've quickly learned to just ignore the stupid AI summary. So it's an absurd waste of computational power to generate something wrong that I don't even want to see, and I can't even tell them to stop when they're wasting their own money to do so.

17 days ago

[deleted]

KeplerBoy

dyauspitr

Pandora’s box is open, not releasing models and tools is just going to result in someone else doing it.

17 days ago

whywhywhywhy

They didn’t keep it under wraps, it’s just the team considered the paper shipping not the product. They still shipped the papers that decentralized the knowledge.

stronglikedan

> there were enough AI'esque artifacts that one could confidently conclude it's fake.

Der_Einzige

If you’re not actively publishing at top conferences (e.g. NeurIPS), then this is a trash question and shows the lack of knowledge that many who are now entering the field will have.

Anything that you or others can answer to this which isn’t some stupid “gotcha” puzzle shit (lol it’s video cus LLMs aren’t video models amiright?) will be wrong because of things like structured decoding and the fact that ultra high temperature works with better samplers like min_p.

https://openreview.net/forum?id=FBkpCyujtS&noteId=mY7FMnuuC9
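For readers who haven't met min_p, here is a minimal sketch of the idea as it is usually described: keep only tokens whose probability is at least `min_p` times the top token's probability, then renormalize and sample. The function names, the default values, and the choice to apply temperature before the filter are assumptions made for illustration, not taken from any particular library.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * max(probs), then renormalize the rest."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def sample_min_p(logits, temperature=1.0, min_p=0.1, rng=random):
    """Sample a token index (assumed order: temperature first, then min_p)."""
    probs = min_p_filter(softmax(logits, temperature), min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because the cutoff scales with the top token's probability, a high temperature flattens the distribution without letting the long tail of very unlikely tokens back in, which is presumably the point being made about high temperatures above.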

16 days ago

woctordho

3e4a3ad9f05fdfb609dda6e5f512e52506f4c1053962e21bfd93f1ed81582d16ca0fef9574fb07ab62f8f5b1373b4ddd541804c0d176f4a557d900b05047e853

(This is the hash of a string that randomly popped into my mind. An LLM will write this with almost 0 probability --- until this is crawled into the training sets)
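A hedged sketch of the trick: publish only the digest now, and reveal the string later to prove you had it. The comment doesn't say which hash function was used; the value above is 128 hex digits long, which matches a 512-bit digest, so SHA-512 is assumed here purely for illustration. `commit` and `verify` are made-up helper names.

```python
import hashlib
import hmac


def commit(secret: str) -> str:
    """Return the hex digest to publish (SHA-512 is an assumption)."""
    return hashlib.sha512(secret.encode("utf-8")).hexdigest()


def verify(secret: str, published_digest: str) -> bool:
    """Check that a later-revealed string matches the published digest."""
    return hmac.compare_digest(commit(secret), published_digest.lower())
```

Anyone holding the original string can re-run `verify` against the posted value; guessing a matching string without knowing it is what makes the output so improbable.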

17 days ago

Kiro

You go first.

17 days ago

definitelynotai

[dead]

17 days ago

raincole

Considering google image search is polluted by AI-generated images at this moment, perhaps google is afraid of making the search even worse?

17 days ago

submeta

What I desperately need is a model that generates perfectly made PowerPoint slides. I have to create many presentations for management, and it’s a very time consuming task. It’s easy to outline my train of thoughts and let an LLM write the full text, but then to create a convincing presentation slide by slide takes days.

I know there is Beautiful.ai or Copilot for PowerPoint, but none of the existing tools really work for me because the results and the user flow aren’t convincing.

17 days ago

buzzy_hacker

Have you checked out Marp? https://marp.app/

Basically it generates slides from markdown, which is great even without LLMs. But you can get LLMs to output in markdown/Marp format and then use Marp to generate the slides.

I haven't looked into complicated slides, but works well for text-based ones.
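A rough sketch of that workflow, assuming the LLM step has already produced an outline as (title, bullets) pairs. The `outline_to_marp` helper and the sample outline are hypothetical. The `marp: true` front matter and the `---` slide separator are the Marp conventions the output relies on.

```python
def outline_to_marp(slides):
    """Render a list of (title, bullets) pairs as a Marp markdown deck."""
    parts = ["---", "marp: true", "---", ""]  # front matter enabling Marp
    for i, (title, bullets) in enumerate(slides):
        if i > 0:
            parts += ["---", ""]  # a line with '---' starts a new slide
        parts.append(f"# {title}")
        parts.append("")
        parts += [f"- {b}" for b in bullets]
        parts.append("")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical outline, e.g. parsed from an LLM response.
    deck = outline_to_marp([
        ("Q3 summary", ["Revenue up", "Churn flat"]),
        ("Next steps", ["Hire two engineers"]),
    ])
    print(deck)
```

The resulting text can be saved as a `.md` file and rendered with whichever Marp tooling you already use.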

17 days ago

agnishom

Looks interesting. I am on the hunt for clean tools for producing presentations. I really like Powerpoint, mainly because of their animation and vector editing features. However, I don't want to keep using a proprietary tool.

17 days ago

terhechte

You could also try Hyperdeck which uses Markdown for slides as well, but supports most of the animation features of Powerpoint as well as MathML and stuff like that (no vector editing though)

I don't really understand how this would work. Writing long paragraphs to prompt the AI is much more tedious than writing a few bullet points for the slides.

If you need the AI to help you brainstorm a good narrative, that is a different story

17 days ago

ShakataGaNai

Never used it but seen it mentioned in that space: https://gamma.app/

17 days ago

[deleted]

17 days ago

ghita_

there is a YC company that does that I think: https://www.rollstack.com/ i've never used them but I think they have many satisfied customers, maybe worth a shot!

17 days ago

MyFirstSass

Wow this is bad. And by bad i mean worse than leading open source and existing alternatives.

Is it me or does it seem like OpenAI revolutionized with both chatGPT and Sora, but they've completely hit the ceiling?

Honestly a bit surprised it happened so fast!

17 days ago

lanthissa

I think we're in the snapdragon age of AI for the next little bit, if you were around for early smartphones.

Each company would either rush to get a phone out with the new snapdragon chip, or take their time to polish a release and have a better phone late cycle. But the real improvements were just the chip.

Nvidia chips/larger data centers are the chips. the models are the plethora of android phones each generation.

>OpenAI and Google have to be extraordinarily strict

Why? Did the inventors of VHS tapes "have to be extraordinarily strict" and bake in safeguards because people might violate copyright laws, make porn, or tape something illegal?

Enforcing laws is the responsibility of the legal system. It sets a concerning precedent when companies like OAI would rather lobotomize their flagship products than risk them generating any Wrongthink.

17 days ago

lacoolj

If you're going to say something like this, you need to back it up with specific alternatives that provide a better result.

Besides just citing your sources, I'm genuinely curious what the best ones are for this so I can see the competition :)

17 days ago

echelon

HunYuan released by Tencent [1] is much better than Sora. It's 100% open source, is compatible with fine tuning, ComfyUI, control nets, and is receiving lots of active development.

That's not the only open video model, either. Lightricks' LTX, Genmo's Mochi, and Black Forest Labs' upcoming models will all be open source video foundation models.

Sora is commoditized like Dall-E at this point.

Video will be dominated by players like Flux and Stable Diffusion.

[1] https://github.com/Tencent/HunyuanVideo/

17 days ago

vlovich123

Something being available OSS is very different from a turnkey product solution, not to mention that Tencent's 60 GiB requirement requires a setup with like at least 3-4 GPUs which is quite rare & fairly expensive vs something time-sharing like Sora where you pay a relatively small amount per video.

I think the important thing is task quality and I haven't seen any evaluations of that yet.
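Back-of-envelope arithmetic for the GPU count, as a sketch only: it assumes the 60 GiB can be split evenly across cards, and the per-card memory sizes and the 20% overhead figure are illustrative assumptions, not measurements of any model.

```python
def gpus_needed(required_gib, gpu_gib, overhead_pct=20):
    """Ceiling of (required memory plus overhead) divided by per-GPU memory.

    Uses integer arithmetic to avoid floating-point rounding at the boundary.
    """
    needed = required_gib * (100 + overhead_pct)
    per_gpu = gpu_gib * 100
    return -(-needed // per_gpu)  # ceiling division


if __name__ == "__main__":
    for gpu_gib in (16, 24, 80):  # assumed per-card memory sizes
        print(gpu_gib, "GiB cards:", gpus_needed(60, gpu_gib))
```

Under these assumptions the count depends heavily on the card: several 16 or 24 GiB cards versus a single 80 GiB one.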

17 days ago

echelon

> Something being available OSS is very different from a turnkey product solution, not to mention that Tencent's 60 GiB requirement requires a setup with like at least 3-4 GPUs which is quite rare & fairly expensive vs something time-sharing like Sora where you pay a relatively small amount per video.

It took two weeks to go from Mochi running on 8xH100s to running on 3090s. I don't think you appreciate the rapidity at which open source moves in this space.

HunYuan landed less than one week ago with just one modality (text-to-video), and it's already got LoRA training and fine tuning code, Comfy nodes, and control nets. Their roadmap is technically impressive and has many more control levers in scope.

I don't think you realize how "commodity" these models are and how closed off "turn key solutions" quickly get out-innovated by the wider ecosystem: nobody talks about or uses Dall-E to any extent anymore. It's all about open models like Flux and Stable Diffusion.

{Text/Image/Video}-to-Video is an inadequate modality for creative work anyway, and OpenAI is already behind on pairing other types of input with their models. This is something that the open ecosystem is excelling at. We have perfect syncing to dance choreography, music reactive textures, and character consistency. Sora has none of that and will likely never have those things.

> something time-sharing like Sora where you pay a relatively small amount per video.

Creators would prefer to run all of this on their own machines rather than pay for hosted SaaS that costs them thousands of dollars.

- Image quality (probably midjourney, or an SDXL checkpoint + upscaler)

- Prompt adherence (flux, DALL-E 3)

EDIT: This is strictly around image generation. The main video competitors are Kling, Hailuo, and Runway.

17 days ago

sebazzz

SD does not generate video, does it?

17 days ago

xvector

https://stable-diffusion-art.com/animatediff/

17 days ago

CryptoBanker

It does as of recently.

17 days ago

amrrs

Minimax (from China) and Kling 1.5 from China. Recently Tencent launched its own.

16 days ago

kranke155

UPDATE: After watching direct comparison videos between prompts, I do think now that Sora is ahead. It's not a huge leap but it seems much better at keeping fine details roughly aligned.

For anyone who is curious where to find tons of SORA videos, go to reddit r/aivideo

16 days ago

echelon

HunYuan by Tencent. It's 100% open source too.

17 days ago

ElectroNomad

RunwayML

17 days ago

joe_the_user

Bad also in the sense once you get over the "boy, it's amazing they can do that", you immediately think "boy, they really shouldn't do that".

17 days ago

torginus

My working theory is that OpenAI is the 'moonshot' kind of company full of super smart researchers who like tackling hard problems, but have no time and effort for things like 'how do we create a UX people actually want to use', which actually requires a ton of painful back-and-forth and thoughtful design work.

This is not a problem as long as they do the ChatGPT thing, and sell an API and let others figure out how to build a UX around it, but here they seem to be gunning for creating a boxed product.

17 days ago

doctorpangloss

Yeah… they have defined the UX that everyone else is copying thus far. So I feel like you are pretty far off the mark.

17 days ago

shadowerm

No doubt. I was waiting so long for Sora but Runway already burned me out on AI video.

It was fun for a few days but far more limited than I would have ever expected.

Maybe Sora 5.0 will be something special. Right now though all these video models are basically shit.

17 days ago

Banditoz

What are some of the open source video models?

17 days ago

wslh

Could it be that text sources are plenty, and more dense than training for videos, and images?

17 days ago

NoNotTheDuo

Their example videos: https://openai.com/sora/, of the doors opening, are hilarious.

1. The first set of doors doesn't have any doorknobs or handles. https://ibb.co/PwqfzBq

2. The second set of doors has handles, and some very large/random hinges on the left door. https://ibb.co/JkDtc6r

3. The third set doesn't have any handles, but I can forgive that, because we're in a spaceship now. The problem is that the inside of the doors seems to have windows, but the outside of the doors doesn't have any windows. https://ibb.co/nwpXmtq & https://ibb.co/wr6v2g1

4. The best/most hilarious part for me. The doors have handles, but they are on the hinge side of the door. No idea how this would work. https://ibb.co/gWXDcfr

16 days ago

neop1x

There are more examples of its limitations.

The video with dogs shows three taxis transforming into one, the number of people under the tree changing https://player.vimeo.com/video/1037090356?h=07432076b5&loop=...

An example from the HunyuanVideo is terrible as well. Look at that awful tongue: https://hunyuanvideoai.com/part-1-3.mp4

And what we see in that marketing is probably the best they could generate. And I suppose it took a lot of prompt tweaking and regenerations.

The internet is already full of junk shorts and useless videos and soon there will be even more junk content everywhere. :(

15 days ago

ckcheng

I think they trained on one too many closet bifold doors [1].

If you look at the edge of the doors as they swing open, it seems their movement resembles bifold door movement (there's a wiggle to it common to bifold doors that normal doors never have). Plus they seem to magically reveal an inner fold that wasn't there before.

[1]: https://duckduckgo.com/?t=h_&q=interior+bifold+closet+doors&...

16 days ago

Imnimo

I feel like there is a sweet spot for AI generation of images and videos that I would describe as "charmingly bad", like the stuff we got from the old CLIP+VQGAN models. I feel like Sora has jumped past that into the valley of "unappealingly bad".

17 days ago

halyconWays

I think that's why humor and memes are such good targets for this type of stuff. If you look up videos like "luma memes compilation," it takes well-known memes and distorts them in uncanny, freaky, and bizarre ways. Yet the fact the original subject is a meme somehow bypasses the uncanny valley repulsion. We seem to accept that much more readily, for whatever reason.

17 days ago

[deleted]

17 days ago

azinman2

Technically it's amazing that this is possible at all. Yet I don't see how the world is better off for it on net. Aside from eliminating jobs in FX/filming/acting/set design/etc, what do we really gain? Amateur filmmakers can be more powerful? How about we put the same money into a fund for filmmakers to access. The negatives are plentiful, from the mundane reduction of our media to monolithic simulacra to putting the nail in the coffin for truth to exist unchallenged, let alone the 'fine tunes' that will continue to come for deepfakes that are literal (sexual) harassment.

Humans are not built for this power to be in the hands of everyone with low friction.

17 days ago

sumedh

> Amateur filmmakers can be more powerful?

YouTube turned everyone into broadcasters. Sora could help bring countless untold stories to life, straight from the imagination.

> Humans are not built for this power to be in the hands of everyone with low friction.

Why is having power concentrated in few hands better?

17 days ago

shortrounddev2

> Why is having power concentrated in few hands better?

Because most people are dangerous morons. I don't think most people should be allowed to operate a car, let alone the most powerful tool for misinformation that has ever existed

16 days ago

CamperBob2

I mean, you're clearly not wrong, but how do you propose implementing your worldview without doing even more harm to humanity?

The only thing worse than a powerful, dangerous tool in the hands of the masses is a powerful, dangerous tool controlled exclusively by powerful, dangerous people. (Cue the usual moronic analogies involving thermonuclear weapons...)

16 days ago

shortrounddev2

The problem of distributing access to dangerous things like AI and weapons has been a problem that humans have faced for a few thousand years. Governments are instituted among men, deriving their just powers from the consent of the governed. If there was a formula for good governance, we'd have fewer problems, but generally I believe in democracy, transparency, and liberalism.

16 days ago

washadjeffmad

I've taken to calling the digital artists I work with "the old masters" in light of the flood of inexpert, low effort AGI content. And they do use generative AI, pretty liberally for concept work and reference, but they know what they're doing and can turn it into great things.

>any video produced by Sora would be required to have a form of watermarking that's on par with what intellectual property owners require

It's a completely different thing. IP owners want watermarks on their IP so they can prosecute people who use their IP without giving credit, nobody's forcing them to watermark it.

17 days ago

phtrivier

I agree that's why they do it.

I happen to think that some states will want to prosecute people who publish realistic-looking AI generated images without making it explicit that they're generated. I'm wondering if watermarking could be an effective tool for that.

(If I were in a bad mood, I would say that we should make it explicit when images are too heavily photoshopped, too; but that's another debate, because tools like Sora make manufacturing lies several orders of magnitude cheaper.)

17 days ago

BoorishBears

> People would not respect the mandate, and we would consider that illegal, and use the monopoly on force to take money out of their bank account.

Imagine a culture that would harness their frustration at being left out in the direction of innovating on their own.

Defining the status quo on things like watermarks by leading the field and then demonstrating how to act from the front.

Seems like they'd be more effective than one that settles for derision and calling for taxes and rules from the back of the pack, so they can presumably profit off the terrible evil things being built.

17 days ago

phtrivier

That's going to sound luddite and backwards, but to be completely honest, I'm not 100% "frustrated" about being "left out" from "far west"-style AI image generation.

At this point, really, I can think of exactly two use cases:

* cheaply producing ads

* cheaply producing fake news

And it's terrifying, and the people jumping on the bandwagon are scaring me.

There is this quote in "13 days" [1] where people are discussing the Cuban missile crisis, and, while everyone is gladly / obliviously preparing for the upcoming nuclear holocaust, one gray-haired diplomat raises his hand and says "One of us in the room should be a coward" before asking for a more prudent option.

Maybe it's the age old tension between the "new world" racing forward and the "old world" hitting the brakes. Not necessarily a bad dynamic in the long run. [2]

Feel free to call me, and the whole block I live in, "coward" on this front.

[1] https://en.wikipedia.org/wiki/Thirteen_Days_(film)

[2] https://en.wikisource.org/wiki/French_address_on_Iraq_at_the...

17 days ago

AuryGlenz

I’ve used AI art generation to make birthday cards for all of my nieces and nephews, to entertain my friends (making them into crappy superheroes, anime girls, etc.), to quickly “brainstorm” logos, make assets for an app I’m making…

This feels like computer graphics and the 'screen space' techniques that got introduced in the Xbox 360 generation - reflection, shadows etc. all suffered from the inability to work with off screen information and gave wildly bad answers once off screen info was required.

The solution was simple - just maintain the information in world space, and sample for that. But simple does not mean cheap, and it led to a ton of redundant information (as in invisible in the final image) having to be kept track of.

17 days ago

> not to see all the spam their model was making when they did release it.

All replaced by open source LLMs at this point.

Most AI video will be produced by Hunyuan [1], LTX [2], and Mochi [3] in short order. These are the Flux / Stable Diffusion models for generative video. These can all be fine tuned to produce incredible results, and work with the Comfy ecosystem for wildly creative and controllable workflows.

I don't think it'll be possible for a closed source tool to compete with the open image/video ecosystem. Dall-E certainly didn't stay competitive for long. It's a totally different game.

[1] https://github.com/Tencent/HunyuanVideo

[2] https://huggingface.co/Lightricks/LTX-Video

[3] https://github.com/genmoai/mochi

17 days ago

jsheard

> I don't think it'll be possible for a closed source tool to compete with the open image/video ecosystem.

And I don't think the current status quo of open source models being entirely subsidised by startups and corporations is sustainable, they're all hemorrhaging money and their investors will only have so much patience before they expect returns. Enjoy it while it lasts.

17 days ago

echelon

It's game theory. If you don't have market share for your closed model, you release it as open source and let a community build upon it.

Is this a "guns don't kill" argument ?

Microsoft Word and Excel aren't generative tools. If Excel added a new headline feature to scan your financial sheets and auto-adjust the numbers to match what's expected when audited, you bet there would be backlash.

And regarding scrutiny, morphine is an immensely useful tool and its use is surely extremely monitored.

On the general point, our society values intent. Tools can just be tools when their primary purpose is in line with our values and they only behave according to the user's intent. AI will have to prove a lot to match both criteria.

17 days ago

belfalas

> And regarding scrutiny, morphine is an immensely useful tool and its use is surely extremely monitored.

I went to high school in a fairly affluent area and I promise you this is not true. If you have money and know how to talk to your doctor, you can get whatever you want. No questions asked.

You can even get prescription methamphetamine - and Walgreens will stock generic for it!

17 days ago

freedomben

Definitely not if you're a white male under 60 years old. They won't even give you opioids after surgery now because you are "high risk".

If you're really rich it may be a different story, but for any of the "middle class", good luck. And if you do find a doctor with some compassion, they are probably about to retire.

17 days ago

belfalas

All I can say is that I am speaking from life experience. It sounds like our experiences have been different.

17 days ago

makeitdouble

> If you have money and know how to talk to your doctor

That's a decently high bar I think ?

Imagine what you can do if you have money and know how to talk to your local police...

17 days ago

boznz

> If Excel added a new headline feature to scan your financial sheets and auto-adjust the numbers to match what's expected when audited

- Sounds like what my accountant already does.

17 days ago

lmm

Right, but accountants have qualifications and, more importantly, have to sign their name and accept liability for the accounts they're submitting. That's the part that's missing when "computer says ok".

17 days ago

Right but a gun can be had and presumably a nuclear warhead can’t, so even in countries who call the wrong sport “football” the law takes into account that some tools need to be regulated more than others.

16 days ago

Maybe you should talk with image editor developers, copier/scanner manufacturers and governments about the safeguards they shall implement to prevent counterfeiting money.

Because, at the end of the day, counterfeiting money is already illegal.

...and we should not censor tools, and judge people, not the tools they use.

17 days ago

rixed

Interestingly, you must know that any printing equipment that is good enough to output realistic banknotes is regulated to embed a protection preventing this use case.

Even more interestingly, and maybe that could help understand that even in the most principled argument there should be a limit: molecular 3d printers able to reproduce proteins (yes, this is a thing) are regulated to recognise a design from a database of dangerous pathogens and refuse to print.

17 days ago

miohtama

Gimp doesn't have the secret binary blob to "prevent counterfeiting" and there is no flood of forged money

https://www.reddit.com/r/GIMP/comments/3c7i55/does_gimp_have...

17 days ago

jpc0

Gimp makes printers now?

17 days ago

mayukh

So guns are ok? How about bombs?

17 days ago

8note

that works for locally hosted models, but if it's offered as a service, openai is publishing those verboten works to you, the person who requested it.

even if it is a local model, if you trained a model to spew nazi propaganda, you're still publishing nazi propaganda to the people who then go use it to make propaganda. it's just very summarized propaganda

17 days ago

gus_massa

Does this apply to the spell checker in Office 365 or Google Docs?

17 days ago

jimkleiber

Are hunting knives regulated the same way as rocket launchers? Both can be used to kill but at much different intensity levels.

17 days ago

HeavyStorm

Censorship of tools...

Then let parents choose when teenagers can start driving.

Also let's legalize ALL drugs.

Weapons should all be available to public.

Etc. Etc.

----

It's very naive to think that we shouldn't regulate "tools"; or that we shouldn't regulate software.

I do agree that in many cases the bad actors who misuse tools should be the ones punished, but we should always check the risk of putting something out there that can be used for evil.

17 days ago

AntiEgo

"Teens having a laugh" can escalate quickly to, "... at someone else's expense," and this distinction is EXACTLY the sort of subtlety an algorithm can't filter.

This does not need to become a thread about bullying and self harm, but it should be recognized that this example is not benign or victimless.

This genie is out of the bottle, let us hope that laws about users are enough when the tools evolve faster than legislative response.

[edit:spelling]

17 days ago

miltonlost

> It is unlikely no one is going to perform act of terrorism with this, or any kind of deep fakes that buy Easter European elections. The worst outcome is likely teens having a laugh.

And the teens are having a laugh by... creating deepfake nudes of their classmates? The tools are bad, and the toolmakers should feel deep guilt and shame for what they released on the world. Do you not know the story of Nobel and dynamite? Technology must be paired with morality.

17 days ago

miohtama

I am sure a school has a way to deal with pupils sharing such images, as the recent cases have proven. Deep fakes or real pictures. It is a social problem with an existing framework backed by decades of proven history, and it should be dealt with as such.

17 days ago

tomjen3

I can assure you that right now teens are sharing real nudes of their classmates. Do you want to restrict cameras and high speed internet too?

17 days ago

Aeolun

Technology is paired with morality. It’s just not the one you want.

17 days ago

botanical76

Is it? It seems to me to be paired with shareholders' interests, and nothing more.

The problem isn't whether we should regulate AI. It's whether it's even possible to regulate them without causing significant turmoil and damage to the society.

It's not hyperbole. Hunyuan was released before Sora. So regulating Sora does absolutely nothing unless you can regulate Hunyuan, which is 1) open source and 2) made by a Chinese company.

How do we expect the US govt to regulate that? Threatening to sanction China unless they stop doing AI research???

17 days ago

ssl-3

Easy-peasy. Just require all software to be cryptographically signed, with a trusted chain that leads to a government-vetted author, and make that author responsible for the wrongdoings of that software's users.

We're most of the way there with "our" locked-down, walled-garden pocket supercomputers. Just extend that breadth and bring it to the rest of computing using the force of law.

---

Can I hear someone saying something like "That will never work!"?

Perhaps we should meditate upon that before we leap into any new age of regulation.

17 days ago

kaszanka

This is well on its way thanks to Microsoft's aggressive push to put a TPM in every Windows 11 PC.

17 days ago

ssl-3

That's exactly the kind of logical conclusion I had hoped for someone here to reach in this bizarre sea of emotional pleas.

After over two decades of careful preparation, we're the stroke of a legislative pen away from having all of the software on our computers regulated by our friends in the government.

It's not even a slippery slope argument. In order to be effective, "We must regulate AI!" means the same thing as "We must regulate computer software!"

The two things are so identical that they're not even so different as two sides of the same coin are.

(Be careful what you wish for; you might just get it.)

17 days ago

FrustratedMonky

"to give society time to explore its possibilities and co-develop norms and safeguards"

Or, "this safety stuff is harder than we thought, we're just going to call 'tag you're it' on society"

Or,

-Oppenheimer : speaking "man, this nuclear safety stuff is hard, I'm just going to put it all out there and let society explore developing norms and safeguards".

-Society : Bombs Japan

-Oppenheimer : "No, not like that, oops".

17 days ago

usrnm

Oppenheimer was making a bomb from day 1, he knew exactly what he was doing and how it would be used. There aren't so many different use cases for a bomb, after all. It was a nice movie, but it does not absolve him

17 days ago

Arnt

Aren't you kind of saying that you don't have any answers so therefore OpenAI should have provided the answers?

17 days ago

[deleted]

17 days ago

xvector

Eh, society did a pretty good job overall.

The bomb was the end of conventional warfare between nuclear nations. MAD has created an era of peace unlike anything our species has ever seen before.

17 days ago

rurp

Well it works great, until it doesn't. We're perpetually a few bad decisions from a few possibly deranged actors away from obliterating all of those gains and then some.

17 days ago

xvector

Right, and in the meantime nuclear-armed countries mostly get to avoid the horrible, endless churn of death and war and teenagers being sent off to the meat grinder to push some border here or some border there.

17 days ago

nostromo

The irony is that users want more freedom and fewer safeguards.

But these companies are rightfully worried about regulators and legislatures, often led by pearl-clutching journalists, so we can't have nice things.

17 days ago

DFHippie

Recent events (many events in many places) show "users" don't think too hard before acting. And sometimes they act with inadequate or inaccurate information. If we want better outcomes, it behooves us to hire people to do the thinking that ordinary users see no point in doing for themselves. We call the people doing the hard thinking scientists, regulators, and journalists. The regulators, when empowered to do so by the government, can stop things from happening. The scientists and journalists can just issue warnings.

geor9e

Pretty off-topic, but yes, domains and land are often bought via shell companies for this reason. They probably didn't settle upon the name Sora until they had already secured the .com. That's a famous YC piece of advice. If you can't get the .com then rename. But for domains that everyone wants, like chat.com, OpenAI paid 8 figures for that one.

17 days ago

midasz

Huh didn't realize chat.com redirected to chatgpt

17 days ago

silvestrov

His review video is so much better than the announcement video at explaining what has been released.

17 days ago

gzer0

For the $20/month subscription: you get 50 generations a month. So it is included in your subscription already! Nice.

For the Pro $200/month subscription: you get unlimited generations a month (on a slower queue).

17 days ago

To me this is what all AI feels like. People want "hard to make things" because they feel special and unordinary. If anybody with a prompt can do it, it ain't gonna sell

17 days ago

themagician

"People don't want to see arbitrary fake worlds or places on earth that aren't real."

What? This is 90% of the Instagram/TikTok experience, and has been for years. No one cares if something is real. They care how it makes them feel.

The audience for this is every "creator" or "influencer". No one cares if the content is fake. They'll sell you a vacation package to a destination that doesn't exist and people will still rate it 3/5 stars for a $15 Starbucks gift card.

17 days ago

TrackerFF

I know a bunch of marketing people that have fully incorporated these tools into their workflow. So that's one group.

Also seen GenAI replace more and more stock media in many facets of business/professional services.

17 days ago

MatrixMan

> primarily to trick Facebook users

You say it like that's not the majority of the web.

17 days ago

sensanaty

Anyone who wants to waste your time/attention/money(!!) for cheap. Think all the bullshit useless jobs aka marketers, scammers, identity thieves.

Other than that, it's also so people can spam every single website with millions of hours of AI generated spam and earn 7 cents off of the 5000 people the algorithm randomly decides to show it to.

oh yes, Suipercideman

I'm still waiting on the future waves of PTSD from hyper realistic horror games. I can't think of a worse thing to do than hand a kid a VR headset (or game system) and have them play a game that is designed to activate every single fight or flight nerve in the body on a level that is almost indistinguishable from reality. 20 years ago that would have been the plot to a torture porn flick.

Even worse than that is when people get USED to it and no longer have a natural aversion to horrific scenes taking place in the real world.

This AI stuff accelerates that process of illusion but in every possible direction at once.

As much as people don't want to believe it, by beholding we are indeed changed.

17 days ago

dartos

That argument can be, and probably was, pointed at movies with color, movies with audio before that, comics, movies without audio, books, etc.

I don’t think that slippery slope holds up.

IIRC there’s pretty solid research showing that even children beyond the age of 8 can tell the difference between fiction and reality.

17 days ago

normalaccess

Distinguishing reality from fiction is useful, but it doesn’t shape our desires or define our values. As a culture, we’ve grown colder and more detached. Think of the first Dracula film—audiences were so shaken by a simple eerie face that some reportedly lost control in the theater. Compare that visceral reaction to the apathy we feel toward far more shocking imagery today.

If media didn’t profoundly affect us, how could exposure therapy rewire fears? Why would billions be spent on advertising if it didn’t work? Why would propaganda or education exist if ideas couldn’t be planted and nurtured through storytelling?

Is there any meaningful difference between a sermon from the pulpit and a feature film in the theater? Both are designed to influence, persuade, and reshape our worldview.

As Alan Moore aptly put it: "Art is, like magic, the science of manipulating symbols, words, or images to achieve changes in consciousness."

In my opinion the old adage holds true, you are what you eat. And we will soon be eating unimaginable mountains of artificial content cooked up by dream engines tuned to our every desire and whim.

17 days ago

lmm

ics

There’s also a huge difference in what people, even children, expect when sitting down to watch a movie versus seeing a clip of some funny cat/seal hybrid playing football while I’m looking for the Bluey episode we left off on. My daughter is almost five and cautiously asks “is that real?” about a lot of things now. It definitely makes me work harder when trying to explain the things that don’t look real but actually are; one could definitely feel like it takes some of the magic away from moments. I feel alright in my ability to handle it, it’s my responsibility to try, but it isn’t as simple as the Looney Tunes argument or, I believe, dramatic effects in movies and TV.

17 days ago

zoover2020

Yet, in a movie setting it's clear something is a special effect or the like, which is not the case for GenAI. Massive underestimation of the potential impact in this thread, scary.

17 days ago

brookst

Maybe. Or maybe some people massively underestimate our ability to cope with fiction and new media types.

I am sure that there were people decrying radio for all these same reasons (“how will the children know that the voices aren’t people in the same room?”)

17 days ago

kube-system

Not a bad point, those representations have, in some cases, caused widespread misunderstandings among people who learn about those concepts from movies... and this is all while simultaneously knowing "it's just a movie".

17 days ago

mojuba

Yes but a movie is a movie whereas these AI-generated videos will likely be used to replace stock footage in other (documentary, promotional, etc.) contexts

17 days ago

ssl-3

If the producer wants to publish bad physics, they get bad physics.

If the producer wants to publish good physics, they get good physics.

It doesn't matter if it is AI, CGI, live action, stop motion, pen-and-ink animation, or anything else.

The output is whatever the production team wants it to be, just as has been the case for as long as we've had cinema (or advertising or documentaries or TikToks or whatevers).

Nothing has changed.

17 days ago

People don't watch The Matrix expecting a documentary on how we all got plugged in. If someone generated the referenced ladybug movie for use in a science classroom, that's a problem.

17 days ago

fooker

I agree. The issue is in using it for teaching science though, not in generating it.

icepat

> The sinking of the elephant into snow - how deep is too deep? Should there be snow on the elephant or would it have melted from body heat? Should some of the snow fall off during movement or is it maybe packed down too tightly already?

Should there be an elephant in the snow? The layers of possible confusion, and subtle incorrect understandings go much deeper.

17 days ago

bbarnett

Yes, they were used to traverse mountain paths.

17 days ago

sccomps

With the same reasoning, do reindeer actually fly and pull a sleigh carrying a 200-pound man along with tons of gifts? I believe you're underestimating human intelligence and our ability to apply logic and reasoning.

17 days ago

anonu

> inaccurate impressions of physics

Or just inaccurate impressions of the physical world.

17 days ago

Terr_

> A little worried how young children watching these videos may develop inaccurate impressions of physics in nature.

I'm less concerned with physics for children--assuming they get enough time outdoors--and more about adulthood biases and media-literacy.

In particular, a turbocharged version of a problem we already have: People grow up watching movies and become subconsciously taught that flaws of the creation pipeline (e.g. lens flare, depth of field) are signs of "realism" in a general sense.

That manifests in things such as video-games where your human character somehow sees the world with crappy video-cameras for eyes. (Excepting a cyberpunk context, where that would actually make sense.)

17 days ago

gruntbuggly

Fair! I watched a lot of Superman as a kid and I killed myself jumping off a building

17 days ago

dylan604

Don't be an asshole. When learning to fly, learn by starting on the ground first, not from a tall building. --Bill Hicks

17 days ago

skybrian

Yes, entertainment spreads lots of myths. But bad physics from AI movies is only a tiny part of the problem. This is similar to worries about the misconceptions people might get from playing too many video games, reading too many novels, watching too much TV, or participating too much in social media.

It helps somewhat that people are fairly aware that entertainment is fake and usually don’t take it too seriously.

17 days ago

raincole

> A little worried how young children watching these videos may develop inaccurate impressions of physics in nature.

And why don't we worry about this with CGI?

CGI is not always made with a full physical simulation, and is not always intended to accurately represent real-world physics.

17 days ago

TeMPOraL

Me too. While I'm generally optimistic about generative art, at this point the models still have this dreamlike quality; things look OK at first glance, but you often get the feeling something is off. Because it is. Texture, geometry, lights, shadows, effects of gravity, etc. are more or less inconsistent.

I do worry that, as we get exposed more and more to such art, we'll become less sensitive to this feeling, which effectively means we'll become less calibrated to actual reality. I worry this will screw with people's "system 1" intuitions long-term (but then I can't say exactly how; I guess we'll find out soon enough).

17 days ago

uludag

Here's the obligatory AI enthusiast answer:

What is physics besides next token/frame prediction? I'm not sure these videos deserve the label "inaccurate" as who's to judge what way of generating next tokens/frames is better? Even if you judge the "physical" world to be "better", I think it's much more harmful to teach young children to be skeptical of AI as their futures will depend on integrating them in their lives. Also, with enough data, such models will not only match, but probably exceed "real-physics" models in quality, fidelity, and speed.

17 days ago

8note

i wouldn't expect young children to learn how to walk by watching people walk on a screen, regardless of if it's a real person walking, or an ai animation.

the real world gives way more stimulus

watching the animations might help them play video games, but i again imagine that the feedback is what will do the real job.

even for the real ladybug video, who says the behaviour on screen is similar to what a typical ladybug does? if it's on video, the ladybug was probably doing something weird and unexpected

17 days ago

darepublic

Sure this is problematic for society although I'm not concerned about what you are mentioning. I remember as a kid noticing how in Looney Tunes Wile E. Coyote could run off the cliff a few steps and thinking maybe there's a way to do that. Or kids arguing about whether it was possible to perform a sonic boom like in Street Fighter. Or jumping off the playground with an umbrella etc

17 days ago

The young generation that will grow up with these tools will have a completely different approach to anything virtual. Remember how people thought that the camera stole part of their soul when they saw themselves copied in a picture?

17 days ago

I know this sounds judgmental, but this reminds me of the idiom “touch grass”. Children should be outdoors observing real life and not be consuming AI slop. You are not overthinking this, this will most likely be bad for children and everyone in the long run.

17 days ago

tetris11

Also, I guess it's just normal for a car lane to just merge seamlessly into a pedestrian zone

17 days ago

[deleted]

17 days ago

hash07e

Yes, Bugs Bunny and Wile E. Coyote harmed our physics.

17 days ago

throwawayian

Don’t worry, you are.

17 days ago

andrewstuart

Kids are fine with fiction.

17 days ago

EternalFury

Many people say:

> these things will get bigger and better much faster than we can learn to discern

I would like to ask “Why?”

Clearly, these models are just one case of “NN can learn to map anything from one domain to another” and with enough training/overfitting they can approximate reality to a high degree.

But, why would it get better to any significant extent?

Because we can collect an infinite amount of video? Because we can train models to the point where they become generative video compression algorithms that have seen it all?

16 days ago

herval

> But, why would it get better to any significant extent?

Two years ago, the very best closed-source image model was unable to represent anything remotely realistic. Today, there's hundreds of open source models that can generate images that are literally indistinguishable from reality (like Flux). Not only that, there's an entire collection of tools and techniques around style transfer, facial reconstruction, pose control, etc. It's mindblowing, and every week there's a new paper making it even better. Some of that could have been more training data. Most of it wasn't.

I guess it's fair to extrapolate that same trend to video, since it's the arc text, audio and images have taken? No reason it would be different.

16 days ago

EternalFury

I get that. But, let’s say you have a glass, you fill it to one third, then to half, then to three quarters, then to full. Can you expect to fill it beyond full? Not every process has an infinite ramp.

It seems frontier labs have been throwing all the compute and all the data they could get their hands on at model training for at least the past 2 years. Is that glass a third full or is it nearly full already?

Is the process of filling that particular glass linear or does the top 20% of the glass require X times as much water to fill as the bottom 20%?

16 days ago

herval

I don’t see how that analogy makes any sense. We’re not talking about containers of a known and fixed size here, nor a single technique, nor a single method. Stuff like LLMs using Transformer architectures might have reached a plateau, for instance. But there’s tons of techniques _around_ those models that keep making them more capable (o1, etc), and also other architectures.

16 days ago

[deleted]

16 days ago

rushingcreek

As there was no mention of an API for either Sora or o1 Pro, I think this launch further marks OpenAI’s transition from an infrastructure company to a product company.

17 days ago

metzpapa

It seems like they're going in that direction, especially given the way they set up the Sora interface. It feels like it's nearing a video editing product.

17 days ago

jrflowers

“Right before the TikTok ban goes into effect” is incredible market timing for the release of a tool that is useless for anything other than terrible TikTok spam videos

17 days ago

jeroenhd

Hey now, no need to downplay the product here, it's also useful for spamming other video sharing platforms! Think Facebook timelines, which are already full of AI image barf, Twitter feeds, which mostly consist of AI text barf, and Youtube Shorts, which is full of existing AI animation barf!

Soon, lots of people can pay a modest sum to make the internet just a bit worse for everyone in exchange for a chance to make their money back!

17 days ago

itsdev

Who legitimately asked for, or wants this? It's cool on its face, sure.

What legitimate problem does it solve? Isn't AI supposed to make our lives easier, or is that just "not what it's supposed to be bro", or whatever. I've lost track at this point with all the hallucinations and poor/bad/really fucking bad responses. It's not 100% of the time, but that's the point of companies like OpenAI releasing stuff like this to the public... to be helpful and believable.

Deep fakes were bad enough. Shit like this is not helpful when given to the largely ignorant public. It's not going to be used for anything helpful, conducive, or otherwise beneficial.

It's impressive. Sure. I just fail to see what it's the solution to.

17 days ago

pparanoidd

It's not a solution to anything, it's simply just money. If they don't release a for-profit video generator, someone else will

17 days ago

akomtu

Sorat? https://anthroposophy.eu/Sorat

17 days ago

IanCal

Not available in

> the United Kingdom, Switzerland and the European Economic Area. We are working to expand access further in the coming months

Excellent to announce this lack of access after the launch of pro. At least I have no business reason for sora so it's not a loss there so much but annoying nonetheless.

17 days ago

robomartin

Here's something I find interesting: We have multiple paid accounts with OpenAI. In other words, we are paying customers. I have yet to see a single announcement or new development that we learn about through email. In most cases we learn these things when they get covered by some online outfit, posted on HN, etc.

OpenAI isn't the only company that seems to act in this manner. I find this to be interesting. Your paying customers actively want to know about what you are doing and, more than likely, would love to get a heads-up before the word goes out to the world. Hearing about things from third parties can make you feel like a company takes your business for granted or does not deem it important enough to feed you news when it happens.

It will not be available in the EU for now. I always feel disadvantaged when I read that sentence

17 days ago

hmmm-i-wonder

I'm not in the EU, but when I see something that is US only, I tend to assume it's doing something with privacy/user data/otherwise that is restricted in the EU.

Which means I generally avoid things that are not EU available even if they are available to me. It's not 100%, but it's a fairly decent measure of how much companies care about users to ensure they meet EU privacy laws from the start, vs if they provide some limited version or delayed version to the EU.

17 days ago

xvector

[flagged]

17 days ago

sksrbWgbfK

I wonder how all those European companies are doing it. They ship everything all the time, avoid the $billions fines, yet make mistakes like everybody else.

> how much the EU slowed down innovation

You say this all the time, yet we're doing fine. How come?

17 days ago

xvector

> I wonder how all those European companies are doing it.

Carefully crafted/gerrymandered laws that only rent seek from American big tech.

> You say this all the time, yet we're doing fine. How come?

You're not doing fine. I don't know how you can look back at the stagnation of the past two decades in the EU and think you're "doing fine." One of our companies is worth more than your entire tech industry. Your engineers get paid a fifth of what they could make here, so they often move here. In tech, you've fallen so far behind other superpowers that it's not even funny, and you're gleefully positioning yourself to fall even further behind. Your relative share of the global GDP is dropping.

You think you're doing fine, but if the EU doesn't plan on amending the regulatory-industrial complex that has caused its undeniable stagnation, it will eventually fall into irrelevancy, and be on the losing side of the rising global wealth inequality.

17 days ago

talldayo

> Carefully crafted/gerrymandered laws that only rent seek from American big tech.

Alright then; who else should have been covered with the DMA in your opinion? Which other companies created unfair tax arrangements that have avoided scrutiny for decades?

Oh, nobody as large as Apple? Huh. Sounds like they're not targeting American companies at all, but instead prioritizing the biggest violators.

16 days ago

rtsil

Maybe if a lot of you in big tech hadn't misused our personal data and sold it to the highest bidders, or hadn't stifled small tech innovations through monopoly, the EU wouldn't need to regulate you so hard.

17 days ago

talldayo

Bingo. People on this website love Apple products so much that they can't see past their own materialism to admit Apple is a bad business. It's fine to like Jony Ive's designs; fact of the matter is that Tim Cook is preventing innovation with his business decisions. Apple users are being segregated from novel and useful software because the first-party distributor gets cold feet thinking about it.

I guess they'll get their rude awakening someday. If xvector's comments here are any indication, it seems like they're starting to get out of the proverbial bed at least.

16 days ago

bcye

This is quite an exaggeration. Afaik there has been only a single GDPR fine over 1 billion € (Meta) and for some reason Apple seems to manage just fine (with GDPR).

17 days ago

xvector

> for some reason Apple seems to manage just fine (with GDPR).

Just fine?

Like the EU forcing Apple to pay $14B in back taxes after voiding a legal and consensual tax agreement between Apple and Ireland? [1]

Or the DMA resulting in an absurd $2B fine related to music streaming, in a transparent attempt to prop up Spotify (the dominant market leader in this space)? [2]

Both of these in the last couple of months alone? It's just rent-seeking with a pretend "we're doing it for the good of the people" facade.

[1]: https://en.wikipedia.org/wiki/Apple%27s_EU_tax_dispute

[2]: https://www.reuters.com/technology/apple-set-face-fine-under...

> and the EU stepped in and "re-interpreted" it to rent-seek.

No, they overrode the Irish decision because it was illegally anticompetitive. Please stop using Hacker News if your intention is to solely be butthurt over unfair rulings when they get corrected. Everyone on this website knows that Apple wields illegal anticompetitive power, nobody here should be surprised when Apple is forced to remediate tax fraud and deliberate DMA violations.

> The EU suffers economically when it falls behind technologically.

Well then it's a good thing Apple isn't leading the industry.

"Noooooo! Think of how many Vision Pro sales that Apple would miss out on by pulling out of Europe!" ...said nobody ever.

16 days ago

hmmm-i-wonder

>> for some reason Apple seems to manage just fine (with GDPR).

toomuchtodo

From "12 Days of OpenAI: Day 3"

https://www.youtube.com/watch?v=2jKVx2vyZOY (live as of this comment)

17 days ago

bbor

Over now, and pretty short/light on info AFAICT. That said, knowing what we know now about Altman made me physically unable to watch while he engages in sustained eye contact with the camera, so maybe missed something while skimming! On the upside, I'm so glad we have three billionaires cultivating three different cinema-supervillain vibes (Musk, Altman, & Zuckerberg). Much more fresh than the usual "oil baron" aesthetic that we know from the gilded age

17 days ago

liendolucas

In a not so distant future we might need to have some sort of regulation that forces uploaders (or content creators) to declare if videos have been generated with ai tech or not and depending on the content such declaration might carry legal consequences. On the other side hosting platforms should display clearly if such content was declared ai generated or not as well. Right now I can't see a simple and good enough solution as this that could mitigate the spread of malicious content.

16 days ago

i5heu

Or we will have some cryptography going on that is connected with your ID to prove that you are indeed human.

Trust on a society level is some other beast of difficult problem.

16 days ago

jusonchan81

My hunch is that it should be easy to decipher if it’s an AI video based on how the frames transition.

16 days ago

ilaksh

This is actually a different version from what they had before. What they released today is Sora Turbo.

17 days ago

madihaa

Account creation currently unavailable

17 days ago

The mammoths are walking over some pre-existing footprints, but they don't leave any prints of their own. I guess I'm getting hung up on little things. For a prompt of a few words, it looks pretty nice!

17 days ago

adultsthroaway

Genuinely curious who is doing this for adult content?

Complaints about Sora's quality and prompt complexity are likely not as important to auteurs in that category, especially with the ability to load a custom character etc

17 days ago

This seems pretty broken at the moment, I haven't actually managed to create a video, every prompt results in "There was an unexpected error running this prompt".

17 days ago

knicholes

At least you get to even see the page! I'm seeing "Sign ups are temporarily unavailable We’re currently experiencing heavy traffic and have temporarily disabled sign ups. We’re working to get them back up shortly so check back soon."

17 days ago

ilaksh

I can't even sign up. I assume it's a capacity issue.

17 days ago

petercooper

I wonder what it is about EU and UK law, in particular, that restricts its availability there. Their FAQs don't mention this.

If it's about training models on potentially personal information, the GDPR (EU and UK variants) kicks in, but then that hasn't restricted OpenAI's ability to deploy (Chat)GPT there. The same applies to broader copyright regulations around platforms needing to proactively prevent copyright violation, something GPT could also theoretically accomplish. Any (planned) EU-specific regulations don't apply to the UK, so I doubt it's those either.

The only thing that leaves, perhaps, is laws around the generation of deepfakes which both the UK and EU have laws about? But then why didn't that affect DALL-E? Anyone with a more detailed understanding of this space have any ideas?

17 days ago

MrKristopher

A lot has changed since ChatGPT was released. https://en.wikipedia.org/wiki/Digital_Markets_Act wasn't in effect back then. Microsoft hadn't made their big investment yet either. OpenAI is a growing target, and the laws are becoming more strict, so they need to be more cautious from a legal perspective, and they need to consider that compliance with EU laws will slow down their product development.

17 days ago

ilaksh

Part of it might also be capacity problems.

17 days ago

jiggawatts

“The version of Sora we are deploying has many limitations. It often generates unrealistic physics and struggles with complex actions over long durations. Although Sora Turbo is much faster than the February preview, we’re still working to make the technology affordable for everyone.”

So they demo the full model and release the quantised and censored model.

Does anyone else find this kind of bait & switch distasteful?

17 days ago

echelon

You don't need to worry. Open source video is already pulling ahead of closed source.

Hunyuan [1] is better than Sora Turbo and is 100% open source. It's got fine tuning code, LoRA training code, multiple modalities, controlnets, ComfyUI compatibility, and is rapidly growing an ecosystem around it.

Hunyuan is going to be the Stable Diffusion / Flux for video, and that doesn't bode well for Sora. Nobody even uses Dall-E in conversation anymore, and I expect the same to hold true for closed source foundation video models.

And if one company developing foundation video models in the open isn't good enough, then Lightricks' LTX and Genmo's Mochi should provide additional reassurance that this is going to be commoditized and made readily available to everyone.

I've even heard from the Banodoco [2] grapevine that Meta is considering releasing their foundation video model as open source.

[1] https://github.com/Tencent/HunyuanVideo/

[2] Banodoco is one of the best communities for open source foundation AI video; https://banodoco.ai/

17 days ago

mewpmewp2

Maybe, but the alternative would be to not demo results with state-of-the-art processing at all, which I wouldn't like either.

17 days ago

[deleted]

17 days ago

matthewmorgan

"Sora is not available in The United Kingdom yet". Available elsewhere, from Albania to Zimbabwe. Any particular reason why?

17 days ago

jack_pp

I'm surprised they put in 2 legged poodles

17 days ago

mark_l_watson

I have been subscribing to ChatGPT Plus for a long time. I just cancelled my subscription today because every time I try to login to sora.com, I get the too busy message. I have never been able to try it. Pissed me off.

Hopefully these types of issues blow over as they increase capacity or load decreases.

The lengthy generation times aren't fun to deal with though in any case. As good as the UX for the app itself is, there's little they can do about how long it takes for a video to generate compared to images. The near instant feedback is gone (just like old times)

17 days ago

adregan

Why keep building AI to do the things that people find fun to do rather than the mundane bullshit? All we’ll be left with is cleaning, folding laundry, and doing the dishes while AI does all the interesting things.

17 days ago

amelius

Because we don't have as much data about mundane bullshit.

17 days ago

bronco21016

How do we get it? Serious question. The take makes sense but how do we digitize doing the dishes?

17 days ago

sandwichsphinx

not dishes but the other day I saw this recent paper for clothes

>RoboHanger: Learning Generalizable Robotic Hanger Insertion for Diverse Garments

https://arxiv.org/abs/2412.01083

>To overcome the challenge of limited data, we build our own simulator and create 144 synthetic clothing assets to effectively collect high-quality training data.

the strategy is simulation

17 days ago

zlies

Is there any information on when it will be available in other countries, like Germany for example?

17 days ago

LukaD

What do you mean? “Sora is here” is not enough?

Sorry for the sarcasm but I’m just tired of this fuck Germany attitude by certain companies.

16 days ago

dcchambers

Meh. It's a cool POC and immediately useful for abstract imagery, but not for anything realistic.

[deleted]

17 days ago

Nina1000

The competition in text-to-video tools is heating up, but a key challenge remains: achieving desired results without exhausting resources. Runway, for instance, often consumes all your credits before producing something usable, even if you stick to their guidelines. Hailuo AI shows better consistency, while Sora Turbo sounds promising with potentially more mature generations. Progress is clear, but there’s still a way to go in perfecting these tools.

16 days ago

domid

I've been thinking about your point regarding credit consumption with these tools. I started exploring this at https://liveimage.ai (an AI video avatar generator) with a low-res preview system first. I'm curious if you think this kind of approach would be more useful - generating quick low-res previews that use minimal credits, then only processing high-res versions once you're happy with the result? Seems like it could help avoid wasting resources on full quality generations that don't match what you're looking for. I often wonder if jumping straight to high-res without previewing is partly why people burn through credits so quickly. Would love to hear if you've encountered other tools taking this kind of approach.

15 days ago

system2

I wonder when in the future ai images and videos will be remotely useful and easy to create. These are still weird and garbage quality.

17 days ago

rossjudson

Sora makes movies less interesting, regardless of how they're created.

The part made by Sora? About as interesting as the latest chess programs doing well at chess. woohoo/nice job.

The overall effect? Now we spend mental energy trying to figure out which parts are machine generated, and hence not worth anything. That mental energy is gone, sucked out of the cultural economy, and fed to the machinery of mediocrity.

17 days ago

SideQuark

If you have to spend mental energy trying to figure out which is which solely to blindly and automatically disregard something, maybe there’s useful stuff there to enjoy instead.

I certainly don’t dislike all the cool movies where special effects are CG just because the old time stop motion artists from 1950s Flash Gordon aren’t using sparklers. Similarly I’m not going to discount new creation that can be enjoyable no matter the provenance.

17 days ago

lmm

Huh? AI chess is much more interesting to watch than human chess.

17 days ago

avree

Star Wars? Not worth anything, huge reliance on "machine-generated" imagery. Lord of the Rings? Useless film, uses "machine-generated" imagery. Don't even get me started on anything by Pixar.

17 days ago

rossjudson

You know what I mean, ChatGPTWhatever. Stay out of human business.

17 days ago

sergiotapia

sorry for the tangent: can't remember a launch they've had where you could just use it. it's always "rollout", "later this quarter", "select users", what's the deal here?

The most impressive part is the temporal consistency in the demo videos.

The flower one is the best looking.

17 days ago

tetris11

That cat skateboarding off the path cut out just when it was getting interesting.

So we are now a few years into the AI video thing.

HunYuan is all everybody on Banodoco and the broader comfy ecosystem are talking about. And that's with Lightricks' LTX model having just been announced too.

HunYuan is seriously amazing and it looks like it'll be the Flux/Stable Diffusion of AI video.

Sora is cooked.

17 days ago

msp26

It depends on where you're looking haha.

17 days ago

This is imperfect but also the best people ever really do in the general case, and just orders of magnitude better than most people are currently doing

The issue isn't models like this, it's that people are eating a ton of information but have been strongly encouraged to be credulous, and a lion's share of that training is directly coming from the tech grift industrial complex

Interesting creative people are currently creating interesting output _without_ generative AI.

These tools are fascinating, though I can't help but feel that the main beneficiaries after all is said and done will be venture capitalists and tech/entertainment execs.

17 days ago

zombiwoof

Win me an Academy Award

17 days ago

joshstrange

OpenAI is a masterclass in pissing off paying customers.

I'm just about ready to cancel my ChatGPT subscription and move fully over to Claude because OpenAI has spit in my face one too many times.

I'm tired of announcements of things being available only to find out "No, they aren't" or "It's rolling out slowly" where "slowly" can mean days, weeks, or months (no exaggeration).

I'm tired of shit like this:

    Sign ups are temporarily unavailable
    We’re currently experiencing heavy traffic and have temporarily disabled sign ups. We’re working to 
    get them back up shortly so check back soon.

> We’re releasing it today as a standalone product at Sora.com to ChatGPT Plus and Pro users.

No you aren't, you might be rolling it out (see above for what that means) but it's not released, I'm a ChatGPT Plus user and I can't use it.

17 days ago

EliBullockPapa

I really don't think it's reasonable to expect them to onboard what is likely tens of thousands of sign ups in the first hour.

17 days ago

minimaxir

ChatGPT has far, far more concurrent users than tens of thousands. Sora is not a small hobby project by an amateur hacker that blew up.

17 days ago

joshstrange

I don't disagree, what I'm asking for is "truth in advertising". I'm not saying they need to give everyone access on day 1, I'm saying don't _say_ you've given everyone access if you haven't.

17 days ago

thomastraum

"a tool never made an artist"

so incredibly ugly.

16 days ago

iamleppert

Yawn, there are literally 10 different apps and wannabe startups that do video generation and AI videos have already flooded social media. This doesn't look any better than what is and has been already available to the masses. OpenAI announced this ages ago and never did give people access, now competitors have already captured the AI generated video for social media slop market.

We have yet to see any kind of AI created movie, like Toy Story was for computer 3D animation.

no API = not good enough

no pay per use = overpriced

17 days ago

zb3

not available in the EU = might use everything you did there against you, sell that data to the highest bidder

17 days ago

einsteinx2

It's pretty cool though, the kind of thing that'd be hard if it was what you actually wanted!

17 days ago

vunderba

"The Pelican inexplicably morphs to cycle in the opposite direction half way through"

Oof, if Sora can't even manage to maintain internal consistency of the world for a 5-second short, I can't imagine how much worse it'll be at longer video generation times.

17 days ago

benatkin

That's an awful result. It turning around has absolutely nothing to do with what you asked for. It's similar in nature to what the chatbot in the recent and ongoing scandal said, saying to come home to her, when it should have known that the idea would be nonsensical or could be taken to mean something horrendous. https://apnews.com/article/chatbot-ai-lawsuit-suicide-teen-a...

So you were lucky indeed to be able to run your prompt and share it, because the result was quite illuminating, but not in a way that looks good for Sora and OpenAI as a whole.

17 days ago

vletal

Image details 9/10 Animation 3/10 Temporal consistency 2/10

Verdict 4/10

17 days ago

ByThyGrace

Did you notice the frame rate (so to speak) of what's happening down the lake is much lower than the pelican's bicycle animation?

17 days ago

pushcx

I don't have a lot of mental model for how this works, but I was surprised to note that it seems to maintain continuity on the shapes of the bushes and brown spots on the grass that track out of frame on the left and then reappear as it pans back into frame.

17 days ago

[deleted]

17 days ago

benatkin

That must be exactly it. The simulated scene extends beyond what the camera is currently capturing.

17 days ago

alberth

Thanks, would you mind elaborating more on what you wrote below:

  Sora is built entirely around the idea of directly manipulating and editing and remixing the clips it generates, so the goal isn't to have it produce usable videos from a single prompt.

17 days ago

simonw

If you watch the OpenAI announcement they spend most of their time talking about the editing controls: https://www.youtube.com/watch?v=2jKVx2vyZOY

17 days ago

rjtavares

One of the highlights of any model release for me is checking your "pelican riding a bicycle" test.

17 days ago

17 days ago

marban

> if you don't add anything original in your prompt

Define "original". You could generate a pregnant Spongebob Squarepants and that would be original, but it would still be noise that doesn't inherently expand the creative space.

> don't spend much time selecting

That's the unexpected issue with the proliferation of generative AI now being accessible to nontechnical people. Most are lazy and go with the first generation that matches the vibe, which is the main reason why we have slop.

17 days ago

999900000999

Imagine a movie like Napoleon, but instead of needing 100 million and thousands of extras, you just need 5 actors and maybe a budget of 50k.

You could get something much more creative or historically accurate than whatever Hollywood deems marketable.

I think about AI like any other tool. For example I make music using various software.

Are drum machines cheating? Is electronic music computer slop compared to playing each instrument?

Is using a Mac and a 1k mic over a 30k studio cheating?

17 days ago

xena

The main comparator is Kasane Teto and Suno. Kasane Teto is functionally a piano that uses generative AI for vocal synthesis: https://youtu.be/s3VPKCC9LSs. This is an aid to the creative process. Suno lets you put in a description and completely bypass the creative process by instantly getting to the end: https://youtu.be/UpBVDSJorlU

Kokoro is art. Driveway is content. Art uses the medium and implementation to say something and convey messages. Content is what goes between the ads so the shareholders see a number increase.

I wish there were more things like Kokoro and less things like Driveway.

17 days ago

999900000999

What if you're making a short movie and Driveway is playing in the background during a scene?

It's like everything else. It's just a tool.

You can create an entire movie using a high end phone with quality that would have cost millions 40 years ago. Do real movies need film?

17 days ago

tguedes

My hope is that it will be the death of the aggregators and there will be more value in high-quality and authentic content. The past 10-15 years have rewarded people who appeal to the aggregation algorithms and get the most views. Hopefully going forward there's going to be more organic, word-of-mouth recommendations of high-quality content.

17 days ago

skepticATX

I felt this same way as image generation was rapidly improving, but I've been caught by surprise and impressed with how resilient we have been in the face of it.

Turns out it's surprisingly easy, at least for me, to tune out the slop. Some platforms will fall victim to it (Google image search, for one), but new platforms will spring up to take their place.

17 days ago

huijzer

Put more weight on your subscriptions. I don’t have much AI content in my YouTube suggestions. (Good luck AI generating an interview with Chris Lattner or Stephen Kotkin for example. It won’t work.)

17 days ago

yaj54

It will work within thousands of days.

17 days ago

ghita_

yeah i already have so many AI-generated videos in my feed on all social media it's insane. i spot them from afar for now but at some point i'll just be consuming content that took seconds to generate just to get money

17 days ago

[deleted]

17 days ago

meetpateltech

Pricing:

Plus Tier (20$/month)

- Up to 50 priority videos (1,000 credits)

- Up to 720p resolution and 5s duration

Pro Tier (200$/month)

- Up to 500 priority videos (10,000 credits)

- Unlimited relaxed videos

- Up to 1080p resolution, 20s duration and 5 concurrent generations

- Download without watermark

more info: https://help.openai.com/en/articles/10245774-sora-billing-cr...

17 days ago

jsheard

Called it, they were sitting on Sora until the $200 tier launched. Between the watermarking and 50 video limit the $20 tier is functionally a trial.

17 days ago

cube2222

Worth noting here that this is the existing ChatGPT subscription, you don’t need a separate one.

17 days ago

throwup238

From the FAQ [1], too:

>> Can I purchase more credits?

> We currently don’t support the ability to purchase more credits on a one-time basis.

> If you are on a ChatGPT Plus and would like to access more credits to use with Sora, you can upgrade to the Pro plan.

Ouch. Looks like they're really pushing this ChatGPT pro subscription. Between the watermark and being unable to buy more credits, the plus plan is basically a small trial.

[1] https://help.openai.com/en/articles/10245774-sora-billing-cr...

17 days ago

dbspin

Wow they're watermarking videos and limiting them to 720 at the 20 dollar price point? That's a bold move, considering their competition's pricing...

https://www.klingai.com/membership/membership-plan

Quality seems relatively similar based on the samples I've seen. With the same issues - object permanence, temporal stability, physics comprehension etc, being present in both. Kling has no qualms about copyright violation however.

17 days ago

minimaxir

At OpenAI's $20/mo price point, you can also only generate 16 720p 5s videos per month.

Kling doesn't seem to have more granular information publicly but I suspect it allows for more than 16 videos per month.

17 days ago

dbspin

You can do more than 16 videos for free on Kling per month. Let alone with their price plans. I'm sure it's not equivalent in capability, but all these models suffer from the same technical issues understanding prompts and maintaining physics / temporal coherence anyway.

17 days ago

[deleted]

17 days ago

kelseyzachow84

The 'art' is always good enough to trick most humans at a glance but clearly fake, plastic, and soulless when you look a bit closer. It has instilled somewhat of a paranoia in me when browsing images and genuinely worsened my experience consuming art on the internet overall. I've just recently found out that a jazz mix I found on YouTube and thought was pretty neat is fully AI generated, and the same happens when I browse niche artstyles on Instagram. Don't get me started on what this Sora release will do...

It changed my relationship consuming art online in general. When I see something that looks cool on the surface, my reaction is adversarial, one of suspicion. If it's recent, I default to assuming the piece is AI, and most of the time I don't have time or effort to sleuth the creator down and check. It's only been like a year, and it's already exhausting.

No one asked for AI art. I don't understand why corporations keep pushing it so much.

17 days ago

huehehue

There's this FinTech ad on the NYC subway right now. I can't remember the company, but the entire ad is just a picture of a guitar and some text.

Anyway, the guitar is AI generated, and it's really bad. There are 5 strings, which morph into 6 at the headstock. There's a trem bar jammed under the pickguard, somehow. There's a randomly placed blob on the guitar that is supposed to be a knob/button, but clearly is not. The pickups are visually distorted.

It's repulsive. You're trying to sell me on something, why would you put so little effort into your advertising? Why would you not just...take a picture of a real guitar? I so badly want to cover it up.

17 days ago

imiric

> You're trying to sell me on something, why would you put so little effort into your advertising? Why would you not just...take a picture of a real guitar?

Is this not evident? Because using AI is much cheaper and faster. Instead of finding the right guitar, paying for a good photographer, location, decoration, and all the associated logistics, a graphics designer can write a prompt that gets you 90% of the vision, for orders of magnitude less cost and time. AI is even cheaper and faster than using stock images and talented graphic designers, which is what we've been doing for the past few decades.

All our media channels, in both physical and digital spaces, will be flooded with this low-effort AI garbage from here on out. This is only the beginning. We'll need to use aggressive filtering and curation in order to find quality media, whether that's done manually by humans or automatically by other AI. Welcome to the future.

17 days ago

huehehue

I was able to find a similar public domain image in all of 5 seconds, so neither faster nor cheaper in this case.

In fact, it's not hard to imagine people using AI tools even if they're slower, more expensive, and yield worse quality results in the long run.

"When all you have is a hammer...".

17 days ago

DebtDeflation

Just need to add a hand with 6 fingers strumming it and it could be a meme.

17 days ago

wumeow

Reminds me of the new Coca Cola Christmas ad which is equally off-putting.

17 days ago

imiric

I don't understand why you see a distinction between models that generate text, and those that generate images, video or audio. They're all digital formats, and the technology itself is fairly agnostic about what it's actually generating.

Can't text also be considered art? There's as much art in poetry, lyrics, novels, scripts, etc. as in other forms of media.

The thing is that the generative tech is out of the bag, and there's no going back. So we'll have to endure the negative effects along with the positive.

17 days ago

quenix

Simple: I am equally offput when LLMs are used for generating poetry, lyrics, novels, scripts, etc. I don't like it when low-effort generated slop is passed off as art.

I just think that LLMs have genuine use for non-artistic things, which is why I said it's dangerous but may be useful if we play our cards right.

17 days ago

imiric

I see. Well, I agree to an extent, but there's no clear agreement about what constitutes art with human-generated works either. There are examples of paintings where the human clearly "just" slapped some colors on a canvas, yet they're highly regarded in art circles. Just because something is low-effort doesn't mean it's not art, or worthy of merit.

So we could say the same thing about AI-generated art. Maybe most of it is low-effort, but why can't it be considered art? There is a separate topic about human emotion being a key component these generated works are missing, but art is in the eyes of the beholder, after all, so who are we to judge?

Mind you, I'm merely playing devil's advocate here. I think that all of this technology has deep implications we're only beginning to grapple with, and art is a small piece of the puzzle.

17 days ago

quenix

You make a good point. I'm just spitballing here, but I think what sets generative art apart for me is the element of deception.

I'd be perfectly fine with a hypothetical world in which all generated art is clearly denoted as such. Like you said, art is in the eyes of the beholder. I welcome a world in which AI art lives side-by-side with traditional art, but clearly demarcated.

Unfortunately, the reality is very different.

AI art inherently tries to pass off as if it were made by a human. The result of the tools released in the past year is that my relationship with media online has become adversarial. I've been tricked in the past by AI music and images which were not labelled as such, which fosters a sort of paranoia that just isn't there with the examples you mentioned.

Yeah it's a desire to do so in a really short amount of time because there's other things I prioritize.

14 days ago

ronsor

I'm glad someone else said this. Hopefully we can get rid of that terrible disruptive camera too.

17 days ago

tim333

There's some of that but it produces some cool stuff too. I mean you have these new virtual worlds like this that didn't exist before https://youtu.be/y_4Kv_Xy7vs?t=13

The video there is kind of a combination of human design and AI which produces something beyond that which either would come up with on their own.

17 days ago

I am so intrigued with the new sora release. I hope it turns out well.

16 days ago

Daniela4565

[flagged]

> You all understand that Sam, the gaylord of OpenAI, is here to control you, correct?

I see this kind of comment from time to time. do you have any evidence to support this claim, or just paranoia vibes?

17 days ago

ruskyrubble

[flagged]

17 days ago

khushy

I can't wait for the safety features because I know there are those in society that would do bad things. But not me, though. I'd like the unlocked version.

17 days ago

ruskyrubble

[flagged]

17 days ago

rvz

That’s more than 20 VC-backed AI video generation startups destroyed in a microsecond and scrambling to compete against Sora in the race to zero.

17 days ago

ALittleLight

I'm not sure the little details are enough of a moat. Consider TikTok - people use cheap "special effects" to get the message across, e.g. if a man is playing a woman he might drape a towel over his head - it's silly and low quality but it gets the idea across to the viewer. Think too about programs like Archer or South Park that have (stylistically) low quality animation but still huge fan bases.

What I think this will unlock, maybe with a bit of improvement, is low quality video generation for a vast number of people. Do you have a short film idea? Know people with some? Likely millions of people will be able to use this to put together good enough short films - that yes, have terrible details, but are still good enough to watch. Some of those millions of newly enabled videos will have such strong ideas or writing behind them that it will make up for, or capitalize on, the weak video generation.

As the tools become easier, cheaper, faster, better etc more and more hobbyists will pick them up and try to use them. The user base will encourage the product to grow, and it will gradually consume film (assuming it can reach the point of being as or nearly as good as modern special effects).

I think of it like - when Steven Spielberg was young he used an 8mm camera, not as good as professional film equipment in the day, but good enough to create with. If I were a high school student interested in film I would absolutely be using stuff like this to create.

17 days ago

whynotminot

> What I think this will unlock, maybe with a bit of improvement, is low quality video generation for a vast number of people. Do you have a short film idea? Know people with some? Likely millions of people will be able to use this to put together good enough short films - that yes, have terrible details, but are still good enough to watch.

Sure, this is already happening on Reels, Tik Tok, etc. People are ok with low quality content on those platforms. Lazy AI will undoubtedly be more utilized here. But I don’t think it’s threatening Hollywood (well, aside from slowly destroying people’s attention spans for long form content, but that’s a different debate). People will still want high quality entertainment, even if they can also be satisfied with low fidelity stuff too.

I think this has always been true — think the difference between made for TV CGI and big-budget Hollywood movie CGI. Expectations are different in different mediums.

This current product is not good enough for Hollywood. As long as people have some desire for Hollywood level quality, this will not take those jobs.

The big caveat here is “yet” — when does this get good enough? And this is where my skepticism comes in, because the last mile is the hardest, and getting things mostly right isn’t really good enough for high quality content. (Remember how much the internet lost it over a Starbucks cup in Game of Thrones?)

The other caveat is maybe that our minds melt into stupidity to the point that we only watch things in low-fidelity 10-second clips that AI can capably run amok with. In which case I don’t really think AI actually takes over Hollywood so much as Hollywood, effectively high-fidelity long-form content, just ceases to exist altogether. That is the sad timeline.

17 days ago

I'd take that bet at 10:1 odds.

17 days ago

onlyrealcuzzo

I'd be careful.

OpenAI could be a big enough bubble in less than 5 years to buy the Oscar winner, even if the film is terrible.

Also, OP only said "an Oscar".

The Oscar committee could easily get themselves hyped enough on the AI bubble, to create an AI Oscar Film award.

No one said anything about making a "good" movie.

17 days ago

mdp2021

> OP only said "an Oscar"

...For soundtrack. (Sorry.)

But seriously: just as the democratization that made music production cheap brought some interesting or commercially successful endeavours, the increased effort from people who could not bring their dreams to reality because of the basic constraint of budget will probably bring some very good results, some even anthology-worthy, and lots of trash.

17 days ago

rsynnott

... Have you _seen_ the output from these things? I'm not sure actors need to panic just yet.

17 days ago

qilthewise

I mean that's a bold claim. I'd first let ChatGPT win an Oscar for writing the best screenplay, and only then would Sora come into the picture.

17 days ago

null_investor

I hope somebody pays for 100,000 Pro subscriptions and uses AI to request Sora to generate videos 24/7. Maybe Elon?

Even if they use queues, I'm sure they are running at a loss and the GPU time is going to cost 100x more than what they charge.

Creating false demand for AI can easily bankrupt their business, as they will believe people actually want to use that crap for that purpose.

17 days ago

minimaxir

Deliberately wasting electricity isn't exactly a moral win.

17 days ago

toasteros

Generative AI is a waste of electricity by definition.

17 days ago

mdp2021

> by definition

"Definition" does not mean "...plus your own assumptions".

The results are there. Optimal, no; somehow valuable, yes.

17 days ago