iob.fyi/codecrafters will let you sign up to try CodeCrafters challenges yourself, if you're interested in seeing if you're smarter than an AI.
@Cephandrius016
2 days ago
I'm sure the hype of this model has nothing to do with OpenAI trying to fundraise $100 Billion right now
@cbnewham5633
2 days ago
Only a fool would join those dots... 😏
@realdevdiv
2 days ago
Duhhh
@RomeTWguy
2 days ago
PhD lvl bruh
@TheReferrer72
A day ago
It's not hype, as the channel owner now has to admit these LLMs are improving fast. Tech usually moves much slower than this.
@cbnewham5633
A day ago
@@TheReferrer72 do you think claiming "PhD level" intelligence is not hype? It's clearly not at that level, despite what OpenAI may claim.
@artscollab
2 days ago
I appreciate the grounded opinions shared on this channel, particularly as someone who has been building applications for almost 30 years using traditional patterns while also adopting new techniques. Whew, where does the time go?
@ApexFunplayer
2 days ago
I'm in the same boat here. I started as a kid and I'm still building applications regularly. To me, o1-preview is highly useful for certain things, while for other things it tends to produce only substandard or completely unnecessary code. Like standard AI, it often rewrites things you don't want or includes things that shouldn't be there. It's okay with Python, but the rest of the languages I've tried haven't yielded much.
@trappedcat3615
2 days ago
Much respect for adding chapter marks on a 5 minute video. You are amazing!
@danielraftery4550
2 days ago
Signed on as a member. Big fan of your stuff - only about 4 years of experience programming professionally, but I've been losing my mind at all the AI code bots. I am way faster without it - if that ever changed I'd be happy to use them, so it's nice to follow your progress in testing different models. Getting a more experienced perspective is appreciated too.
@hindsightcapital
2 days ago
I love this guy man, amazing counterbalance to otherwise overwhelming narratives. And exactly the right person to deliver this information
@MynamedidntFitDonkey
2 days ago
I like how this channel has turned from bashing AI coding into an AI coding benchmark channel.
@Tverse3
2 days ago
Soon he will be promoting the use of AI
@2639theboss
2 days ago
I mean, they're the same thing right now. Any basic benchmarking is "bashing" simply because of the sheer hype these companies are pushing.
@johnsandro7735
2 days ago
@@Tverse3 The bashing was for the unnecessary hype these tools were getting. But if they prove actually useful to even very experienced programmers, then why not use them? A tool's a tool; if it fails to help, discard it and move on.
@goldsucc6068
2 days ago
I tested this model on a real enterprise task (I even tried breaking the task into small, easy steps and removed the steps that require domain understanding) and it failed. But then I found a real use for it: sample generation. It created some sample SOAP test requests for a provided WSDL, and the structure was correct, so it saved me some time. It would be better to extract samples from the actual system, of course, but due to the nature of the project that was nearly impossible until the other team finished their job.
@RomeTWguy
2 days ago
You can achieve the same results with Sonnet 3.5 for a fraction of the cost
@goldsucc6068
2 days ago
@@RomeTWguy What do you mean? It cost me nothing; I have a subscription. It saved me time because constructing SOAP XML by hand takes time, and it did it in 50 seconds with the supplied data.
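For context on why this saves time: even a minimal SOAP 1.1 envelope is boilerplate-heavy. Here is a sketch using only Python's standard library; the service namespace and operation name are made up for illustration, not taken from any real WSDL:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace -- a real one would come from the WSDL.
SVC_NS = "http://example.com/orders"

def build_soap_request(operation: str, params: dict) -> str:
    """Build a minimal SOAP 1.1 request envelope for one operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{SVC_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical operation and parameter names:
xml = build_soap_request("GetOrder", {"orderId": 42})
```

Doing this by hand for dozens of operations, each with its own nested parameter structure, is exactly the tedium the commenter describes offloading.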
@TheReferrer72
A day ago
@@goldsucc6068 I don't understand; you could get GPT-3, you know, the one before ChatGPT, to do that.
@cbnewham5633
2 days ago
Yes. You are correct. As I show on my channel, it cannot do simple geometry. It's a lot better than other models, but everyone is swallowing OpenAI's hype about "PhD level" and the cherry-picked examples some KZitemrs have published for likes. It's nowhere near PhD level in the maths sphere; more like high school. You are about the only other person I've seen on KZitem so far who is pointing out real deficiencies. Subscribed.
@drdca8263
2 days ago
Didn’t the famous mathematician Terry Tao say that it seemed to him like it was getting close to like, a mediocre grad student, or something like that?
@cbnewham5633
2 days ago
@@drdca8263 it's very good at some things - especially when it comes to language (which is, after all, the main part of the architecture). Until they can integrate mathematics into it flawlessly and remove hallucinations then these models will remain stuck at the "passable" level averaged across all types of queries. Maths is really important because without that you don't get the rest of the sciences - certainly not for doing anything serious.
@Nnm26
A day ago
I passed it questions from the 2023 Putnam and it solved 80% of them. Idk why you'd even ask geometry questions of a model that can't even see, but do try it with problems from other math domains.
@Happyduderawr
A day ago
It's like a failing topology student in math. "High school" is a bit harsh. I ask it measure theory questions and it trips up. I reckon it might be a C's-get-degrees graduate :)
@cbnewham5633
19 hours ago
@@Happyduderawr OK, a bit harsh perhaps, but in certain things it is a real brainiac while in others brain-impaired. If we take the average, it is muddling along with a checkered academic history, but it won't be sweeping up the academic prizes any time soon. 😄
@roycohen.
2 days ago
It felt like a huge nothingburger. I really feel that we've hit the wall of LLMs, not to mention that these models cannot inherently reason, no matter how much Altman wants his investors to think they can.
@arnavprakash7991
2 days ago
Altman is not the only one working on this. This is not some handcrafted product made by OpenAI. It's a discovery that when you give transformers compute and data, they display emergent abilities. So anyone with compute and data can make these, and as we are seeing, they are.
@lyznav9439
2 days ago
@@arnavprakash7991 emergent stupidity
@RomeTWguy
2 days ago
@@arnavprakash7991 Anyone can add more compute at inference time, but they must also have fine-tuned it on CoT (chain-of-thought) datasets to simulate reasoning.
@arnavprakash7991
2 days ago
@@RomeTWguy Yeah, but now we have multiple solid LLMs (Claude, Llama, Gemini… I guess). What OpenAI did is replicable; the Llama 3 papers prove it. The next generation of LLMs now has 3 avenues for enhancement: training (model learning from data), hardware, and inference (model thinking/processing of inputs).
@Nnm26
A day ago
@@arnavprakash7991 Yeah, hundreds of billions of dollars of compute.
@DrMwenya
A day ago
Thank you for your honesty. I'm not a tech person, and I knew there was too much hype but only very small changes for the average user.
@szebike
2 days ago
It's a bit strange to me that they charge the user for the tokens it takes for reasoning, yet they don't show the reasoning in detail. It's like they can add any amount of extra tokens to your bill without you being able to check it.
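To put numbers on that concern: reasoning tokens are billed at the output rate even though they are never shown, so the hidden portion can dominate the bill. A sketch of the arithmetic; the per-million-token prices here are placeholders, not OpenAI's actual rates:

```python
def bill(prompt_tokens: int, completion_tokens: int, reasoning_tokens: int,
         in_price: float = 15.00, out_price: float = 60.00) -> dict:
    """Cost breakdown in dollars. Prices are hypothetical, per 1M tokens.
    Reasoning tokens are billed as output even though the user never sees them."""
    return {
        "input": prompt_tokens * in_price / 1e6,
        "visible_output": completion_tokens * out_price / 1e6,
        "hidden_reasoning": reasoning_tokens * out_price / 1e6,
    }

# A run that "thinks" for 4000 tokens but shows only 500:
c = bill(prompt_tokens=1000, completion_tokens=500, reasoning_tokens=4000)
```

With these (made-up) numbers, the unverifiable reasoning line item is several times larger than the visible output.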
@rezNezami
2 days ago
As for the part about being as good as a PhD at writing a piece of software: I think they're likely correct. Just pay attention; PhDs, even in CS, are not known for writing proper software applications!! haha
@Tverse3
2 days ago
I think software engineers are overestimating the value they provide. We could now have an AI model trained for each programming language, which would do better than most junior and mid-level devs... this profession is doomed. And as for senior engineers, they aren't special; with AI, the mid-level engineers will become seniors quickly.
@quantum5768
2 days ago
@@Tverse3 Citation needed. AI models need training data, and there are tons of problems in industry where training data is sparse, if it exists at all. The tests Internet of Bugs has been doing show that AI can't even handle a simple language like Python for relatively straightforward tasks. Why do you expect AI models to beat junior devs at things like C in embedded software, or C# in a more application-oriented field, if they can't handle a batteries-included language like Python?
@Easternromanfan
2 days ago
@@Tverse3 Useless hype comment. These LLMs' capabilities are vastly overestimated.
@realurilordjonhnsoni7342
2 days ago
@@Tverse3 Jesus Christ, being so confidently wrong and shallow is a skill in itself.
@carultch
2 days ago
@@realurilordjonhnsoni7342 Now that we have LLMs, it's trivially easy to be confidently wrong.
@Gredias
2 days ago
From OpenAI's own article: "These results do not imply that o1 is more capable than a PhD in all respects - only that the model is more proficient in solving some problems that a PhD would be expected to solve."
@Cephandrius016
2 days ago
Translates to: we hyper-trained the model on a subset of questions and now it can solve those problems "better" than some PhDs.
@califresh0807
2 days ago
@@Cephandrius016 People like you and him think you're smart saying blatantly obvious shit like: "iTs jUSt sToCHaStiC graDiEnT dEsCeNT!!! hOw cOUlD it eVeR (bullshit false equivalence here)". As if the researchers working on this shit day in and day out don't know everything you do and more. These models have gotten consistently better given more compute; that is just a fact. You can only argue the degree. But I bet fucking anything that in 2-3 years' time, this guy will be doing this video with an increasingly complex problem that he continues to downplay as "I classify this as easy to very easy". People like you and this guy think you're clever pointing out obvious fucking flaws while completely overlooking the broader direction, and none of you (this guy in the video especially) could ever fucking hope to build anything remotely as useful as these models.
@oldchris5258
2 days ago
What an extremely scientific way for them to measure their product's capabilities.
@jshowao
2 days ago
Yet a PhD student is expected to solve or make progress on an original problem through original research, so I'm not sure what that statement even means. I mean, ask the AI to solve one of the Millennium Problems or quantum gravity. I doubt it could.
@namensklauer
2 days ago
Honestly, I'm looking forward to 1-2 years in the future, when I expect AI models at this level will be open source and no longer locked behind some paid service.
@genericgorilla
5 hours ago
That's probably the most unlikely scenario
@leversofpower
2 days ago
I was looking forward to your commentary. Thanks.
@하하호호-h3u
2 days ago
Limitations of o1-preview:
1. The limitations of being a language model are still evident. Its perception of the physical world is very poor, making it difficult to use for tasks requiring spatial awareness.
2. While it actively uses the chain-of-thought technique, which significantly improves accuracy on tasks with a clear logical answer, this simultaneously makes its thinking process rigid. As a result, it performs worse than GPT-4 in areas involving subjective nuance and no clear answers, such as writing. In contrast, traditional language models like GPT-4 may have a higher occurrence of hallucinations, but this also makes them more adept at generating plausible responses, which ultimately aids tasks like creative writing.
Therefore, o1 is not a one-size-fits-all solution, and it seems necessary to first determine, based on the given task, whether to use the chain-of-thought technique or the traditional language-model approach before proceeding. Furthermore, o1 is merely another language model and not a fundamental leap forward; it's simply a specialization of the existing method. Due to the inherent limitations of language models, which learn about the world through language, achieving AGI (Artificial General Intelligence) is still a distant goal.
@bfranceschin1
2 days ago
Thanks for the update! o1-preview is available in the Supermaven VS Code extension.
@tlz124
2 days ago
I ask ChatGPT to do something, and every time it starts doing things I don't want it to, and I lose my mind figuring out how to ask the right thing to make it do what I want.
@EleroyGreen
2 days ago
I like how your facial expression in the video thumbnail provides the tl;dr on this 😀
@ladsbois7302
2 days ago
Thank you for your brief thoughts.
@Nnm26
A day ago
LLM capabilities are a bit weird: you can't base its intelligence on a few questions and declare it better or worse than a human. It's subpar to a PhD student in a lot of domains, but in others it's nothing short of superhuman. It'd be great if you could check out Kyle Kabasares' channel for the different tests he conducted on o1; there he actually uses PhD-level questions and it blew all of them out of the water.
@RandyRanderson404
2 days ago
I was looking forward to your assessment of o1.
@riser9644
2 days ago
The fact that he's saying it's better: now that's some progress.
@SuperMarioTomma95
2 days ago
It seems like they really have to hype these models to make people worry about job displacement and AI taking over most fields; otherwise, they wouldn’t be able to justify the billions invested in training and development needed to keep the improvements advancing at a competitive pace. Either this hype will turn into a self-fulfilling prophecy in the short to medium term, or the industry will hit a plateau as diminishing returns set in, leading to the AI bubble bursting. Ultimately, we’ll be left with advanced tools that, while highly capable, remain far from the true AGI people envision.
@arnavprakash7991
2 days ago
@@SuperMarioTomma95 ChatGPT is now the 13th most visited website globally, right behind Amazon. It's the only site besides Google, Amazon, and Yahoo in the top 13 that is not social media. So clearly hundreds of millions of people find it useful enough.
@Horizon-hj3yc
2 days ago
Yep... overhyped... one hype train arriving after the other... that's why I lost interest in AI... too much hype and not enough progress; still stuck in that old language-model architecture with all its flaws.
@arnavprakash7991
2 days ago
Transformers were invented in 2017, so at most we have seen 8 years of work on these types of neural networks, mostly niche work. We did not see industry-wide efforts until ChatGPT 3.5, released November 30, 2022; it has not even been 2 years. All of these developments have been maxing out what transformers can do. So even without further breakthroughs in architecture, this is enough to change society; it has already started to. And that's not even mentioning diffusion models.
@HelloCorbra
A day ago
It's already integrated into Cursor and other AI IDEs, if I'm not mistaken. You could have a look at it for the next video. Looking forward to it; good stuff as usual.
@Michael-yu9ix
2 days ago
The camera angle... if he's not moving his hands, it looks like a recording of a locked-in patient.
@theaugur1373
2 days ago
o1 is available in Cursor, but it's not included in the monthly fee. You have to pay separately.
@DoubleOhSilver
2 days ago
AI definitely isn't taking my job anytime soon, but it has been helping me a lot at work lately. If I already know what I need to do, I can tell AI to write it for me. Then I just fix it up a bit, rename stuff, clean it up, etc. But it has probably saved me a couple of hours at work this week.
@DoubleOhSilver
2 days ago
Anyway, my pay hasn't gone up, so I'm just taking those extra hours I gain off from work.
@JP-ek3mc
2 days ago
@@DoubleOhSilver This is the way
@jshowao
2 days ago
I've seen AI produce written prose, and it produces a lot of repetitive slop. Good to know you are happy with sentences starting with the same words over and over again. From what I've seen, you'd have to rewrite the whole thing.
@samuelyao2637
A day ago
Thank you so much!!
@mitchlindgren
2 days ago
To be fair, I’ve seen some PhDs who write pretty bad code 😂
@marcovoetberg6618
A day ago
I don't even know what it means to program at a PhD level. I'm not saying there are no PhDs who are good programmers, but there is nothing about having a PhD that makes someone a good programmer.
@cuentadeyoutube5903
2 days ago
4:07 o1 is already integrated in Cursor. But it is expensive
@artscollab
2 days ago
Interesting. Fixed monthly price or per token cost? I like Cursor so far.
@Happyduderawr
A day ago
But I'm a PhD and I'm shit at writing code.
@CherryBlossomStorm
6 hours ago
OK, but every PhD I've worked with has been garbage at writing code.
@absta1995
2 days ago
My prediction: even when AGI is achieved, this channel will call it overhyped
@krasensspenevpenev3167
A day ago
I'm doing a two-year master's degree in software architecture, because my bachelor's degree is in something else. With these new models coming out, I'm quite stressed about whether I'll have a future as a programmer. Do you think it is worth studying something that is not directly related to artificial intelligence? For example, there was a specialty in "software technologies with artificial intelligence"; that's what I mean by directly related to artificial intelligence.
@dabbieyt-xv9jd
23 hours ago
I don't understand why you made the video directly on o1 and skipped Project Strawberry.
@altffyra2365
2 days ago
Try to make it give you "Hello World" in the Bend language. I gave 4o 8 attempts, then I gave it the code and it actually got that wrong as well.
@estefencosta1835
2 days ago
That's actually one thing I'm unclear about in these videos: none of these models will give you the same response every time, so if you just run it once, it doesn't really tell you much. I'd rather understand not just that it fails, but how badly it fails each time if you run it 10 or 100 times. Is it 10% sort of OK, 40% pretty bad, and 50% terrible?
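The repeated-run evaluation described above is straightforward to sketch. In this sketch, `run_model` is a hypothetical stand-in for whatever calls the model, and the grading buckets are made up; in practice the grader would run generated code against a test suite:

```python
import collections

def grade(output: str) -> str:
    """Placeholder grader: bucket a model output into an outcome category.
    A real grader would compile/run the generated code against tests."""
    if "passes_all_tests" in output:
        return "ok"
    if "compiles" in output:
        return "partial"
    return "fail"

def failure_profile(run_model, prompt: str, n: int = 100) -> dict:
    """Run the same prompt n times and report the fraction of each outcome,
    since a single sampled run tells you little about a stochastic model."""
    counts = collections.Counter(grade(run_model(prompt)) for _ in range(n))
    return {bucket: count / n for bucket, count in counts.items()}
```

The output is exactly the kind of breakdown the commenter asks for, e.g. `{"ok": 0.10, "partial": 0.40, "fail": 0.50}` over 100 runs.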
@NukelimerCodes
2 days ago
Is it worth it to go back to uni for a CSC degree (Open Source Society University)?
@InternetOfBugs
2 days ago
It depends on what you're trying to accomplish. I did a discussion about university degrees on a podcast here: kzitem.info/news/bejne/x2-YtW2XjYqgmmU (Although I have never heard of Open Source Society University - so I don't know anything about it)
@doesthingswithcomputers
2 days ago
I've worked with PhDs; using that comparison is a really bad idea…
@EnigmaCodeCrusher
A day ago
Thanks
@martinsherry
2 days ago
I'm not really familiar with o1 at all at this stage, but is it possible to create a prompt to get o1 to ask you questions to get the details it needs to understand your requirements better (i.e. to better simulate the task of gathering requirements)?
@artscollab
2 days ago
It’s well worth a try as a supplement to human effort, in my opinion. The new o1 model is not available yet for OpenAI assistants, however the 4o model does well enough for now.
@spencerjames9417
A day ago
Altman is a bit evil, considering how many lives he's willing to throw under the bus for his toy that doesn't do anywhere near what he claims.
@eye776
10 hours ago
It's all about the money, money, money.
@ancwhor
2 days ago
10x the compute for 0.2% improvement imo
@generichuman_
2 days ago
if you think we got 0.2% improvement, then your opinion clearly isn't worth that much
@ancwhor
2 days ago
@@generichuman_ if you think it's more then your opinion clearly isn't worth that much
@justafreak15able
2 days ago
@@generichuman_ What was the most complex system you both worked on without AIs? Then compare whose opinion is worth more lol.
@ancwhor
2 days ago
@@justafreak15able An Express API backend linked to Python for an algorithm to manage distribution. Vue frontend. Self-taught. In prod.
@JohnDoe-jp4em
2 days ago
@@ancwhor This channel's viewers are like the inverse of an AI techbro sometimes. Instead of insane hype, it's constant insane lowballing. Did you even listen to the video? If someone with a track record of being skeptical of AI admits that it's significantly better at coding tasks and beats all other AIs he's tested, it's clearly a lot more than 0.2%.
@vintagewander
A day ago
Isn't the entire AI thing running on hype fuel?
@theneedytechie2468
2 days ago
There is no pleasing this guy 😂
@Protocultor
9 hours ago
If they're selling you something, and it doesn't achieve what they say it achieves, then no one should be pleased.
@leojack1225
A day ago
I am a math PhD and I cannot write any software.
@young9534
2 days ago
This is still o1 preview, not o1. If the benchmark results they released aren't lying, then o1 should be a nice jump in capabilities. I look forward to seeing your test videos when o1 is released
@personzorz
2 days ago
Several previous benchmarks have been lies.
@young9534
2 days ago
@@personzorz are you talking about the o1 results they released?
@Easternromanfan
2 days ago
@@young9534 He might be referring to the GPT-4 benchmarks, where they said it could pass the bar in the 95th percentile, but they used a very faulty way to measure it. IOB mentioned it previously. They do the same thing here when they say it is a gold medalist in the Math Olympiad with "adjusted time restrictions". They just don't mention that if they had taken it by the actual rules, it would've failed the first question.
@young9534
2 days ago
@@Easternromanfan yeah that makes sense. This is why I look forward to seeing this channel run tests on o1 when it gets released. I trust him more than OpenAI
@RomeTWguy
2 days ago
The actual model isn't far off from this based on the benchmarks
@LouStoriale
2 days ago
I've been working with AI for content creation and research for a few months now, and while there are still some flaws, the improvement has been significant. It's gone from 30-40% accurate to 60-80%, and even though I still need to edit most of the output, it’s saving me a ton of time. In just the last 5 days, it’s cut down weeks of work! If it keeps progressing like this, it’ll be incredibly useful by the end of 2025.
@mythbuster6126
2 days ago
What kind of research?
@austinclay427
2 days ago
I use both Claude and ChatGPT on a daily basis, but they have serious limitations and constantly make mistakes that I need to point out. That aside, they're very useful and can oftentimes point me to new technologies and solutions. Still, I'd say the tech has been relatively the same since ChatGPT 3.0.
@saxtant
2 days ago
HTTP server in a prompt? Use Express or FastAPI, or even better... Go.
@VoodooD0g
A day ago
It was integrated into Cursor on day 1....
@DeniSaputta
A day ago
2:44 Sam Altman: what is easy and what is difficult for a human is different for an AI.
@defnlife1683
2 days ago
Doesn't really understand the reqs? Sounds like it can substitute for a scrum manager or boss, not a dev.
@karlwest437
2 days ago
If models hallucinate, then surely telling them to think step by step just gives them more opportunities to hallucinate?
@drdca8263
2 days ago
Logic gates implemented in silicon sometimes have errors. It is possible to use a larger collection of logic gates implemented in silicon in order to make something which can detect and correct these errors. (For classical computing, the errors are, AIUI, more likely to occur in memory than during the computation, and so most of the hardware error correction is for correcting errors in data which is being stored, but the same thing applies to a lesser extent for errors that happen as part of the computation. As a side note: for quantum computers, the errors happening during the computation steps is a bigger issue and needs more attention than it needs in classical computing.) It is true that more steps does mean more opportunities for errors, but that doesn’t necessarily imply that each step on net increases the probability of an error.
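The redundancy argument above can be made concrete with a bit of arithmetic: if each attempt errs independently with probability p, a majority vote over n attempts can have a much lower error probability than a single attempt, even though there are more chances to err. (The independence assumption is the load-bearing one; repeated runs of the same model often make correlated mistakes, which is essentially the objection in the reply below.) A sketch using only the binomial formula:

```python
from math import comb

def majority_error(p: float, n: int) -> float:
    """Probability that a majority vote over n independent attempts is wrong,
    when each attempt errs independently with probability p (n odd)."""
    k = n // 2 + 1  # number of wrong votes needed for a wrong majority
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With a 10% per-attempt error rate, voting over 5 attempts
# drives the overall error rate down by more than an order of magnitude:
single = majority_error(0.10, 1)  # 0.10
voted = majority_error(0.10, 5)   # 0.00856
```

This is the same logic behind error-corrected memory and behind "self-consistency" style sampling for LLMs, under the stated independence assumption.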
@estefencosta1835
2 days ago
It does if "hallucinations" are just a fancy marketing term for screwups that happen when, in an attempt to mimic creativity, the program doesn't pick the most likely result from the dataset, and, being based on computational linguistics, doesn't know when it shouldn't do that.
@drdca8263
2 days ago
@@estefencosta1835 Hm, seems like a bit of a run-on sentence, but I suppose it would be hypocritical of me to complain too much about that… Yes, “hallucinations” is just a term people use for errors. Maybe slightly more specific? Your specific explanation for these errors, seems a bit unclear to me?
@estefencosta1835
2 days ago
@@drdca8263 If GenAI just chose the most likely response from a set of data, then every time you queried it, it would give back the same response. The reason ChatGPT gives the illusion of an intelligent response is what can be conceptualized as probabilistic responses, which is why it builds its responses bit by bit. If it's given too much latitude, its answers quickly lose coherency. If given too little, it doesn't present anything that seems novel and loses any potential capacity to solve tasks. But there isn't a sweet spot where it won't sometimes give you things that are either nonsensical or just flat-out wrong.
Hallucination is a term we use for disturbances in human sensory experience, but it was co-opted for GenAI back when people were trying to con everyone into believing these algorithms are sentient (see the "Sparks of AGI" paper). Using the term hallucination implies that GenAI is simply misrepresenting something. That is not accurate. GenAI simply runs on an algorithm and tries to parse meaningful strings from its training data, with computational linguistics as the backbone of how it decides what is or isn't meaningful. The more complex the task, the less accurate or interesting it gets. It's like giving an algorithm a set of Legos and asking it to build something new. Since the algorithm only knows what previous sets of Legos looked like (most of which isn't even relevant to what it's being asked to build), the best it can do is mash together bits of other sets based on probability, but not always the most probable pieces; otherwise it would get things wrong the same way every time. It can't actually build its own new Lego set from the ground up, and it has no way of verifying whether the Lego set it ends up building is even correct or satisfies the query. This is why programs like ChatGPT can be confidently incorrect.
What I'm less familiar with, but what it sounds like they're trying to do, is use non-GenAI methods to verify whether a piece of code is actually viable, in order to correct it before it spits the code out. Even if this were perfected, which I very much doubt, it gets you no closer to writing code that actually does what you want it to do; it simply eliminates some of the more obvious and elementary errors.
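The "too much latitude vs. too little" tradeoff described above is, concretely, the sampling temperature applied to next-token logits. A minimal sketch over a hypothetical three-token vocabulary (real models do this over tens of thousands of tokens per step):

```python
import math
import random

def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Sample one token from next-token logits.
    temperature == 0 is greedy: the same answer every time.
    Higher temperature flattens the distribution: more 'latitude'."""
    if temperature <= 0:
        return max(logits, key=logits.get)  # always the most likely token
    scaled = {tok: lg / temperature for tok, lg in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    weights = {tok: math.exp(s - m) for tok, s in scaled.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

# Hypothetical logits for one next-token decision:
logits = {"the": 2.0, "a": 1.5, "banana": -1.0}
greedy = sample_token(logits, 0.0, random.Random(0))  # always "the"
```

At temperature 0 you get the same token every call; at temperature 1 the less likely tokens start appearing, which is exactly why repeated queries diverge and why coherence degrades as the temperature rises.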
@karlwest437
A day ago
@@drdca8263 My point is, error correction doesn't work if the error-correction system itself hallucinates.
@tear728
2 days ago
These things will not be autonomous. At best they will be a "living" Stack Overflow.
@personzorz
2 days ago
And what happens to the real Stack Overflow that they are parasitic upon to function?
@tear728
2 days ago
@@personzorz it remains as relevant as ever
@Elintasokas
2 days ago
Alright, see you in a couple of years.
@arnavprakash7991
2 days ago
@@tear728 Do you actually use any of these LLMs? Have you used the most recent models? Or are you just making statements to make yourself feel better?
@drdca8263
2 days ago
What do you mean by “autonomous”? Do you mean like, “takes actions to earn enough money to pay for its continued server costs”, or do you just mean, “takes actions as if to accomplish some kind of goal”? If the latter: people have already kinda set up harnesses that do this?
@andreas_tech
2 days ago
Are you the future version of David Shapiro?
@cbnewham5633
2 days ago
Please no. David is all over the place - the new Bindu Reddy.
@darylallen2485
2 days ago
I knew videos of this nature would come. I think it's valid to point out when AI fails. However, when are people going to acknowledge what seems obvious to me but gets lost in these "I asked AI to do X and it failed" videos? Is the new AI model better than the previous version? Was the previous version better than the one before that? And before that, was the version released better than what came before? If yes, what is hype about this AI thing? Every version has been better for years now. Billions of dollars are being invested to continue the obvious trend that a 5-year-old could point out. Yet we still have the "this is all hype and BS, AI is completely fake" crowd. I don't get how anyone could still believe this AI trend is hype.
@InternetOfBugs
2 days ago
I tried to point out in this video that O1 is better than all the other models I've tried on the tests I've been using (and I'll be coming up with more tests). There is definitely a trend of it getting better, but it seems (to me, and to other people) that the rate at which it's getting better is slowing, and that the value of this cycle of AI improvement is going to be worth far less than what has already been invested in it. But I could be wrong - we'll see.
@darylallen2485
2 days ago
@@InternetOfBugs I agree that LLMs are a terrible product in the sense that the cost of inputs are significantly higher than the value of output.
@reboundmultimedia
A day ago
@@InternetOfBugs o1-preview is significantly worse in many areas than the final o1. The leap in math ELO scores o1 was able to achieve is not at all consistent with "slowing down."
@diadetediotedio6918
A day ago
@@reboundmultimedia The problem is that your measure is a bunch of specific tests. We measure the value of intelligence by the things it can bring to the world, not by blindly taking tests and measuring them.
@darylallen2485
2 hours ago
@@diadetediotedio6918 I'm curious what industry you work in. I work as a datacenter infrastructure engineer. I got to this role through my degree and multiple certifications (i.e. lots of blind tests and measuring them). Preparing for the tests taught me skills that the labor market values. Please elaborate on what industry we have that doesn't discriminate based on ability as determined by testing. Would you see a doctor who flunked out of medical school but had a real passion for helping people?
@rickandelon9374
2 days ago
Agentic, self-improving, and self-aware AI is going to change the economy, not these fancy demo products.
@rickandelon9374
2 days ago
Bullish on SSI Inc. and Ilya.
@4l3dx
2 days ago
It's impossible that you tested o1; we only have access to the o1-preview version.
@Easternromanfan
2 days ago
That's what he means
@jshowao
2 days ago
As if o1-preview and o1 are significantly different. Come on. Gmail was in beta for like a thousand years; when they finally released it, it was the same damn product.
@GordonFreeman-xd8rw
A day ago
I like watching this channel because it's like watching the wise man or the witch doctor of a cannibal tribe when they first encountered an airplane... b-b-but why would the Sun-God Bogosun give pale face such magic? I'm willing to bet that the goalposts will shift by EO2025 to "but it can't give you youtube clone from a single prompt, still meh"
@alonzoperez2470
2 days ago
It will replace programmers eventually 😌
@jshowao
2 days ago
God these AIs just suck
@Tverse3
2 days ago
I love programmers freaking out after every new GPT release; looks like they will face the same fate as artists. 😮
@larsfaye292
2 days ago
only the shit ones
@arnavprakash7991
2 days ago
So programming languages are human-readable formats for interacting with a computer. Do people think AI = computer stuff only? Any language-based/knowledge-based job can be automated, as LLMs excel at this. All white-collar work is at risk.
@cbnewham5633
2 days ago
AI is a tool. Sensible artists will incorporate it. The rest are Luddites with barely a grasp of how AI works. Programmers too will use this as a tool - but software engineering requires far more than writing a bunch of code.
@darkspace5762
2 days ago
Why do you love it? What job do you do?
@jshowao
2 days ago
@@cbnewham5633 Exactly. Most people who think AI will "replace everything" have never actually done the things they claim AI will replace. Because if they actually tried to do those things, they'd realize real quick that LLMs leave a lot to be desired. I've only ever used it as a tool to supplement my work, and only after I've double-checked the code it generates.
@Rh22-c9l
2 days ago
Just wait for ChatGPT 5; that's the big thing moving forward, that's the inflexion point.
@drdca8263
2 days ago
Is "inflection" vs "inflexion" a dialect/regional-spelling thing, or just a "you personally spell it differently" thing? It reminds me of how some old letters about math were written.
@Rh22-c9l
2 days ago
@@drdca8263 Made a mistake; too lazy to fix it, to be honest.
Comments: 184