The Open Source KING is BACK. Stability's NEW AI Image Generator!

51,849 views

MattVidPro AI

Comments: 262

@matsleonrichter5305
7 months ago
Thanks for covering our work, thrilled to see how our research gets adopted this way. Also, I still find it hilarious that "Würstchen" stuck as the name of our architecture. Sorry in advance for all non-German speakers who break their tongues while trying to pronounce it.
@paulmuriithi9195
7 months ago
Wow, never knew a tongue could be broken... must be a bony tongue
@Athari-P
7 months ago
I'll just call it Worse Ten.
@CryptoTonight9393
7 months ago
Small sausage?
@KurtWoloch
7 months ago
@CryptoTonight9393 Yeah, small sausage... that's the translation. It's actually hard to find an English sequence of characters that sounds remotely like "Würstchen"... the "ü" being an umlaut of "u" which isn't used in English, and the "ch" is a single phoneme as well... imagine "k" but speaking it softly... in a way "ch" would be to "k" what "f" is to "p".
@RideShareRocks
7 months ago
The biggest problem I have is duplicating a face I've created in different poses. It's infuriating.
@MattVidPro
7 months ago
King's back.
@PigeonyStudios
7 months ago
Emperor Pigeon is back (me)
@The_Questionaut
7 months ago
@SW-fh7he it's subjective
@hipjoeroflmto4764
7 months ago
@SW-fh7he stop boofing Monster Energy
@SW-fh7he
7 months ago
@hipjoeroflmto4764 what do you mean?
@CrystalBreakfast
7 months ago
The king never left. 😅
@Lar_me
7 months ago
Trying to wrap my head around how it can get a 1024x1024 image from 24x24 o_o I really REALLY want to see Stability's models pull ahead of the competition soon! I hope the (supposedly) easier training times can allow Stable Cascade to reach Midjourney's level of detail somehow.
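A minimal sketch of the arithmetic behind the 1024x1024-to-24x24 figure quoted above. It only uses the two side lengths from the comment; the function name is made up for illustration, and latent channel counts are ignored because the comment doesn't give them.

```python
def compression_factor(image_side: int, latent_side: int) -> float:
    """Ratio between the image side length and the latent side length."""
    return image_side / latent_side


# Numbers taken from the comment: a 1024x1024 image and a 24x24 latent.
per_side = compression_factor(1024, 24)

# The number of spatial positions shrinks by the square of the per-side factor.
per_area = per_side ** 2

print(f"per side: about {per_side:.1f}x")
print(f"spatial positions: about {per_area:.0f}x fewer")
```

The per-side ratio works out to a bit under 43, which matches the "42" figure discussed elsewhere in these comments.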
@pon1
7 months ago
It probably can, this is only the base model, it is very general so it can probably do a lot better than SDXL when finetuned, and SDXL can achieve Midjourney level of detail in some circumstances (like in Fooocus using certain styles and settings).
@jeffwads
7 months ago
Reminds one of the quants.
@kuromiLayfe
7 months ago
Just wait till they figure out how to encode the image in subpixels 😂 1024x1024 encoded to 0.2 x 0.2 pixels
@jonmichaelgalindo
7 months ago
@kuromiLayfe You can actually escape the pigeonhole limit by just setting the font size to 0.
@Kylo27
7 months ago
Lol
@anthony_leckie
7 months ago
Great video as always, Matt. Very happy to see this new model. I got my first job using stable diffusion and video diffusion 1.1 last week. Very happy to see the new model.
@oholimoli
7 months ago
"Würstchen" is German and the translation could be "small sausage" 😂
@MattVidPro
7 months ago
ah..
@christianstein5130
7 months ago
Always funny to hear ü, ä and ö in English. In Poland it's easier to make a "smaller" version of a word, like wodka is a small woda (water)
@hy7at
7 months ago
I watched this whole thing mainly because of Matt saying "Würstchen" multiple times throughout this video 😁
@pierruno
7 months ago
Haha
@vitesh6429
7 months ago
I translated it from German, and it translated to 'hot dog'
@martianingreen
7 months ago
As a German speaker that's a really funny architecture name, literally just means sausage 😅
@CanadaBlue85
7 months ago
Sausage AI™
@pierruno
7 months ago
Haha
@sasbe1852
7 months ago
*The trivialization of sausage, to be more precise.
@justinwhite2725
7 months ago
I used to work at a German pub called Wurst. Closed during the pandemic.
@jtjames79
7 months ago
Even the text kerning was basically perfect. 😯
@GearForTheYear
7 months ago
Anyone else get the feeling that we're hitting diminishing returns with what's possible using the current NN architectures?
@pepenakamoto3675
7 months ago
Yes. But I think there is a clear movement of capital and intelligence towards advancement in other areas of AI
@GearForTheYear
7 months ago
@blakecasimir Right, I agree. It's just a bummer that we may see another protracted plateau before getting something genuinely revolutionary to use within a commercial context (i.e., better than humans). The Transformer arch is so close and yet so far away.
@BeginningInfluence55
7 months ago
@GearForTheYear You are right in terms of image fidelity/aesthetics. It won't get any better than Midjourney v6. However, prompt understanding and following is still not optimal. DALL-E 3 shows that it can be much better still. The problem is the training data. They lack more concepts than they provide. You can't create truly creative images because, for example, there is no training example of a horse riding a human, so it can't do it at all.
@clickpwn
7 months ago
It's not just a limitation of the architecture. A lot of it stems from the limitations of language itself. We train and guide these models using natural language, but words are not sufficient for pinpointing the exact image you are looking for. One picture is worth more than a thousand words, and using just a few sentences as a prompt will only get you a general image that could look okay but isn't exactly what you want, down to the nuance. Even if AI becomes smarter than humans, it still cannot read your mind and has only your words to go off of. Words carry too little bandwidth of information, and the only breakthrough I can think of is when we are able to upload our minds and thoughts directly to AI.
@jdietzVispop
7 months ago
@clickpwn great comment! So what to do about it?
@ahsookee
7 months ago
Würstchen is pronounced Vürst-yen. V as in view, ü like the u in lurk, st as in stash and yen like the currency.
@1Know1tHurts
7 months ago
Americans never give a fuck about how names and words from other languages are pronounced.
@chanm01
7 months ago
Just awesome. I kinda lost interest in text-to-image for a while. It isn't reliable enough to use in commercial applications yet (imo), and it didn't feel as competitive as text gen where almost every week there was news. Nice to see open source text-to-image making progress towards catching up to the state of the art in this field.
@Athari-P
7 months ago
Open-source isn't catching up with gpt-4, gpt-4 is still costly, gpt-5 tier doesn't exist. Overall, pretty meh too.
@2CSST2
7 months ago
Matt, I absolutely adore all your videos, but 42 is not orders and orders of magnitude greater than 8, it is barely half an order of magnitude!
@Athari-P
7 months ago
That's more than two orders of magnitude in binary though.
@ShoMorphias
7 months ago
This comment is half an order of magnitude more accurate than the subject matter!
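For anyone who wants to check the "orders of magnitude" back-and-forth in this thread, here is a small worked example. It assumes the two numbers being compared are 42 and 8, as quoted in the comments; the helper name is hypothetical.

```python
import math


def orders_of_magnitude(a: float, b: float, base: float = 10.0) -> float:
    """How many powers of `base` separate a from b."""
    return math.log(a / b, base)


# Numbers quoted in the thread: 42 versus 8.
decimal_orders = orders_of_magnitude(42, 8)          # base 10
binary_orders = orders_of_magnitude(42, 8, base=2)   # base 2

print(f"base 10: {decimal_orders:.2f} orders of magnitude")
print(f"base 2:  {binary_orders:.2f} orders of magnitude")
```

Under these assumptions the ratio is 5.25, which is less than one order of magnitude in base 10 and a little over two in base 2, so both the correction and the binary joke hold up.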
@IcyLucario
7 months ago
Awesome, glad to see SD keeping up. 1.5 is still relevant from the community, hope to see something like this treated the same way.
@jeffbull8781
7 months ago
I think this is more focused on efficiency and speed, which means things like animation and video (using similar methods) are going to be much more realistic. Currently the static models are being sort of shoehorned into animation workflows.
@abandonedmuse
7 months ago
Their video is insanely realistic. Been beta testing it for a few days already.
@drew5564
7 months ago
my boy!!!!!! whats good matt! just been sick recently and i have been away from yt as usual. im here now though, amazing video its looking like and i cant wait to get my popcorn and watch
@JonnyCrackers
7 months ago
Sick! Been hoping they'd come out with something to compete with Midjourney and Dall-E. I love Dall-E 3, but I get so tired of getting "prompt blocked" with prompts that have nothing offensive or copyrighted in them. Wasn't aware of Pinokio either, so I'm excited to give that a try. Thank you!
@DiceDecides
7 months ago
15:20 even though no mustache, there's something about the quality that's really soothingly satisfying I think!
@MyAmazingUsername
7 months ago
This was something I looked for a few days ago, since I am tired of SDXL being pretty bad compared to Dalle and Midjourney. Especially SDXL's extremely deformed hands and feet. So I checked Stability for news and saw nothing. Then your news dropped. Thanks. I just got excited about open source AI again.
@Airbender131090
7 months ago
Don't get your hopes up. This is not the model that will rival MJ. The next one probably will (but MJ will already have released v7 by then).
@RodgerE2472
7 months ago
Updated Forge UI is out too!!!
@hipjoeroflmto4764
7 months ago
Well, idk what that is, so yes, Matt should make a video
@Elwaves2925
7 months ago
I thought you meant a new update, with the ControlNet fixes but it's the one that's been out a few days. 😞
@abandonedmuse
7 months ago
Which one is Forge? Hard to keep up. Not sure I have used it.
@Elwaves2925
7 months ago
@abandonedmuse Search for SD Webui Forge.
@LouisGedo
7 months ago
From my testing, SDXL Turbo is utter garbage 💩 🤮. I'm looking forward to Cascade
@ahsookee
7 months ago
I didn't like it either, although I really tried.
@aouyiu
7 months ago
Garbage how? It just needs tweaking to reach its potential.
@LouisGedo
7 months ago
@aouyiu The quality of the images is like that of Midjourney 2 based on my testing... utter garbage
@AmandaFessler
7 months ago
I was starting to lose hope, but here they are! And with a focus on cost efficiency too! I hope it has backwards compatibility with 1.5. I have way too many loras of it stored up.
@Athari-P
7 months ago
All loras are tightly coupled with base models, nothing will be compatible with sd 1.5 ever.
@okolenmi7511
7 months ago
It will be better than Midjourney. 16x training performance + open source = magic
@gameswithoutfrontears416
7 months ago
I just did a quick text test. Wow, perfect on the first one, but then not so great on the follow ups.
@povang
7 months ago
bro this is crazy, looks like it'll blow midjourney out of the water once it gets in the hands of opensource trainers for a few more months down the line.
@davidbangsdemocracy5455
7 months ago
Perhaps, but image generators use Convolutional Neural Networks, and Transformers are for sequential data such as text. So I assume huge improvements will be realized with both types of models and whatever improvements are made to them. It may seem more subtle because they are already great, but they will be faster, more controllable, more efficient, and integrated into useful apps.
@MrTk3435
7 months ago
Good Job Matt!! Truly Exciting... We need more competition so, the subscription price will go Lower! ✨✨🤟✨✨
@MrPablosek
7 months ago
Does this mean it will require less VRAM to use? My 3070 struggles with SDXL without setting up various parameters and such to make it work and then it takes a pretty long time to generate an image.
@sherpya
7 months ago
I've read something on reddit about needing more instead
@kuromiLayfe
7 months ago
I think pretty much the same amount. The concept is similar to running a workflow in Comfy that generates an image at 256x256, then does image-to-image with an upscale to 1024x1024, and then runs once more to detail the final sampler output.
@FRareDom
7 months ago
This came at a time we needed it most
@AttenBot
7 months ago
i would love to make consistent 16-bit style video game character sprite sheets
@petitemasque5784
7 months ago
This model is non-commercial but if you want to make free games...
@AttenBot
7 months ago
Nah, I don't mind non-commercial; it's more of a personal project to achieve. Go have a look at the WWF Royal Rumble sprite sheets, for example: one sheet for one character, walking, running, jumping, punching, kicking, etc.
@MilesBellas
6 months ago
Elon needs to take over ! "Robin Rombach, Andreas Blattmann, and Dominik Lorenz essentially created Stable Diffusion while at a German university. Stability AI got involved after the publication of their research and offered them the company’s computing resources. According to Forbes, all three have now left Stability AI which is also experiencing cash flow problems." - Petapixel
@saymydomain9504
7 months ago
Mage and Leonardo will probably implement this model as soon as possible.
@I-Dophler
7 months ago
It's an interesting concern, especially with the rapid evolution in AI. While Transformers have indeed been groundbreaking, the tech field's nature is to innovate continuously. Who knows, the next big breakthrough could be just around the corner, rendering today's limitations a thing of the past.
@sinayagubi8805
7 months ago
I think you don't realize, this means opensource totally won today. just need to do this with language models too
@MattVidPro
7 months ago
You haven’t seen anything yet :)
@aouyiu
7 months ago
Meta might get us that, maybe sooner than you think now that Gemini is officially competing with ChatGPT.
@haileycollet4147
7 months ago
Miqu is getting there... It's not gpt4 level but it's definitely better than 3.5 all around, nearly as good as Gemini Ultra... And it's 70B 😂 It's coming!
@HouseOfSynister
7 months ago
Thanks for these videos! I learn so much from them, keep it up!
@vi6ddarkking
7 months ago
The Stable Zero123 model still has, and Stable Video Diffusion had, the same limited licence during its experimental phase. So nothing new here. Still, being vigilant is always the way to go.
@starblaiz1986
7 months ago
Do we have any idea based on past experience how long that licence will be limited? Are we talking weeks? Months? Over a year? 😮
@vi6ddarkking
7 months ago
@starblaiz1986 Once version 1.0 releases, it usually bounces to the new fully open source licence.
@Modioman69
7 months ago
I can’t wait to see what the trained models of Cascade end up producing later. Heck I say later but someone will probably have trained model by end of week or something with the current pace of things lol.
@hipjoeroflmto4764
7 months ago
Matt I just had or still have covid need to retest but this video made me feel good
@fire17102
7 months ago
Soon in SD5... For my kids, remake this folder of movies to take out all the non-wholesome parts. For example, in Bambi the mother doesn't die, no one's life is in danger, they all meet happily in the end. In The Lion King, Mufasa and Scar are good friends and Simba is raised with his dad. Ariel doesn't lose her voice. Remove nightmare fuel from Pinokio and Dumbo, etc. etc. Generate new wholesome scenes, keep characters and style as the originals, voice with 11Labs. We will actually be able to give nice content to our kids, without passing on any horror from the hydra studios.
@KurtWoloch
7 months ago
Interestingly, at 11:07 when the picture of Barack Obama comes together, at times it looks a bit like Alfred E. Neuman from the Mad magazine.
@jay_sensz
7 months ago
Not sure if I'm just spoiled by community-finetuned SDXL models and Fooocus, but I'm not terribly impressed by what I've seen so far. But then again I was initially underwhelmed by SDXL as well. What keeps me interested is the possibility of much more efficient finetuning compared to SDXL, but it might take a while for tooling and fine-tuned models to become available/usable.
@vitesh6429
7 months ago
With the same prompting, you can get better images (not definitive testing, just a couple of tests) than SDXL (NightVision XL); the images have an HDR Midjourney look to them.
@tonyzed6831
7 months ago
Wow, and in Pinokio already??? Love that!
@jeffwads
7 months ago
Wow. Never heard of this before.
@tonyzed6831
7 months ago
@jeffwads I think he made a video about it... Pinokio allows you to run AI tools on your PC without the hassle of installing complicated stuff, it's truly game-changing. But you'll need a good GPU with a lot of VRAM (I went "cheap" by buying a used 1080 Ti, and 11 GB of VRAM seems to be enough for what I do... for now).
@AzoreanProud
7 months ago
Nice
@BTMYYY
7 months ago
yoo this is so exciting, I love open source :D unfortunately it takes like 30 minutes to generate a photo locally on my 3060 with Pinokio
@ilplopperz
7 months ago
xD
@BTMYYY
7 months ago
Updated Pinokio, now it takes like 15 minutes
@goodtothinkwith
7 months ago
Würstchen? Um… little sausage? Hot dog?
@shaunralston
7 months ago
Always appreciate your being on the cutting edge of OS reporting, Matt.
@ahsookee
7 months ago
11:50 it's easier to finetune this way than starting from a model biased towards photorealism
@LoneBagels
7 months ago
God: "walter white eating a big mac inside of mcdonalds, there are blue crystals in the big mac burger, walter white is dressed in a yellow hazmat suit" Dall-E: "Even though I am just a tool and don't have a soul; I will pretend I have one. Therefore, I cannot do what my master commanded me to create, even though I'm fully capable of doing the job." God: "Kicks Dall-E from the heavens; Downloads Stable Cascade!"
@zingsnapbites
7 months ago
Are the images free to use commercially?
@ilyass-alami
7 months ago
Hi Matt, you can test the LLaVA 1.6 34B demo, an LLM vision assistant.
@godnyx117
5 months ago
Thanks for sharing!
@CrystalBreakfast
7 months ago
Nothing currently beats SD1.5 due to Controlnet, IP-Adapter, and LoRA training, just to name a few massive game-changers with NO comparable equivalents among closed models. SD1.5 can even animate. With a HUGE amount of control over just about everything. SDXL is still catching up while the community continues to expand 1.5's capabilities. It'll probably be a while before Cascade can be controlled to that degree. Everything else is just a toy. If you think there's anything SD1.5 can't do that a closed source model can, then quite frankly you're in the dark. The community has expanded 1.5 in so many ways that nothing else comes close.
@DezorianGuy
7 months ago
Why does it take Stable Cascade several minutes to generate an image with my RTX 3060 12GB? No problems with Stable Diffusion etc.
@LukePellen
7 months ago
Open Source FTW. Open Source means everyone is a winner.
@julianopajaro2005
7 months ago
Hey, Matt. Do you know any A.I. that makes Cinemagraphs?
@doben
7 months ago
I think "Imagen 2" can do that.
@twilightfilms9436
7 months ago
You mention Krea, and Krea uses SDXL under the hood, so I wonder if you have found a way to get Krea or Magnific results but for free using comfy or a1111? I actually wonder how come no one is even trying to do it……anyways, great video!
@consig1iere294
7 months ago
I am curious, why did it take so long for implementing the Würstchen tech? This was shown by the actual people behind Würstchen last year.
@3DArtistree
7 months ago
Of course when I just uninstalled Pinokio to make room for more checkpoint models! lol Hope someone ports it to Comfy in the next few days!
@havemoney
7 months ago
Happy Valentine's Day 💓
@moelleunbelievable
7 months ago
As a German, I have to admit, they did y'all dirty by calling internationally used software (or at least part of it) "Würstchen" 😂😂😂 ... It means small sausage, if anyone is wondering.
@jopansmark
7 months ago
It's over for Midjourney and OpenAI.
@A-uz3uj
7 months ago
It's crazy though, OpenAI just released Sora yesterday, way ahead of anyone else on AI video
@cysshorts1529
7 months ago
People: 1980: we will have flying ca- *literally 2024:*
@shazolislam6359
7 months ago
Honestly, I have a really interesting Question @mattvidpro. What is the relation between You and Lemon?
@Fustercluck06
7 months ago
I also feel mppy inside lol
@toCatchAnAI
7 months ago
Curious why they didn't show a benchmark against MJ
@AndreFelipeF
7 months ago
niccee, going to check right now!
@PostmetaArchitect
7 months ago
The model is not open source. It's non-commercial use only, the dataset is not available, and the training method is undisclosed. Just because you can run a model locally doesn't mean it's open source.
@AscendantStoic
7 months ago
What are the hardware requirements for running it locally?
@isajoha9962
7 months ago
This video makes me happy for the future.
@Ariane-qq9co
7 months ago
Nightshade is coming.
@RomiWadaKatsu
7 months ago
I'm running it locally and it's far slower than SDXL for some reason; the web demo works better. Also, the results are clearly inferior to DALL-E 3, so there must be some setting I'm missing. I'd say one can skip it until it's in the hands of someone who can run it to satisfactory levels
@fontenbleau
7 months ago
creating something from nothing by spells, is it Harry Potter in real life? It's a magic!
@alexnorth3393
7 months ago
Exciting news!!
@USBEN.
7 months ago
Looks a lot better.
@tradehut2782
7 months ago
OH my god... Talk about seeing something unexpected when opening KZitem
@nachod9772
7 months ago
tried it, but idk, DALL-E 3 gives me much more specific and better results
@raaghavgr1990
6 months ago
How many free prompts in a day do you get in the free plan of stable cascade?
@KlimovArtem1
7 months ago
I've tried it. Not even close to DALL-E 3 in following difficult prompts. Not even close to the realism of MJ v6.
@christopherd.winnan8701
7 months ago
Can it handle compound nouns yet? How about magnet fishing, for example?
@realWorsin
7 months ago
Requires 20gigs of VRAM though. That will eliminate most people.
@hermeticsense8805
7 months ago
42 is not orders of magnitude larger than 8. It isn't even 1 order of magnitude larger. I'm not completely confident in my criticism, but I hope my comment is useful. 2:00 4:51
@BlackMita
7 months ago
How censored is it though?
@MrPablosek
7 months ago
From what I saw, not at all.
@gionicol_
7 months ago
My honest reaction was: "Oh no..." 🤣 I'm really trying to catch up with everything, but oh boy, it's hard
@DesignDesigns
7 months ago
Stability AI is cool....
@TheGoodContent37
7 months ago
What specs should a PC have to run an SD model relatively fast? Is it all about the graphics card?
@0ceanswave
7 months ago
Close, but not even 1 order of magnitude, 1 if we round up.
@RickPMandel
7 months ago
The question I have, is, as always, how does it handle censorship? What happens if you give it a prompt that many AIs will label as NSFW, and will not render?
@GearForTheYear
7 months ago
It seems to just ignore those parts of the prompt. I couldn’t even get two mechs to shoot at each other.
@andyone7616
7 months ago
Can this model be used in Automatic1111?
@morizanova
7 months ago
Just trying it. Not a full test, but generating text seems OK
@zodiacblue9312
7 months ago
One of the example generations being literally a cherry being picked is hilarious
@ctrlartdel
7 months ago
Slow Magic!
@zodiacblue9312
7 months ago
@ctrlartdel yeah, really like his music, and the album cover is goated
@SchusterRainer
7 months ago
try photo taken on Fujifilm XT3
@FactsYall
7 months ago
less stats more image gens
@bladechild2449
7 months ago
I tried this locally and it took 3 minutes to create an image lol, and the textures just look awful.
@maxziebell4013
7 months ago
Würstchen = sausages
@ahsookee
7 months ago
Not quite, it can be both singular and plural. It's the diminutive of sausage, so a small sausage (both one and multiple)
@SW-fh7he
7 months ago
@ahsookee good German
@elihusolano5993
7 months ago
wow, just wow
@xbon1
7 months ago
sadly not as good as dall-e 3 but... it's a huge improvement. prompting is so manual compared to DALL-E 3 lol
@darkman237
7 months ago
No way to install apart from Pinokio?
@J0r1ckV
7 months ago
2:39 the images have been (image: UD, LR (UD 3, LR - { } 2, 5),
@roadrunner_meepmeep
7 months ago
I tried Stable Video Diffusion and it blew chunks... I went back to using Pica. And Pica is really.. not.. great.
@aouyiu
7 months ago
All of AI video is still in the early stages, like ChatGPT 1 stages. It will be where images are now, in a few years. Maybe sooner.
@pn4960
7 months ago
I am hyped!
@SjarMenace
7 months ago
Byebye paid image creators