Training AI Without Writing A Reward Function, with Reward Modelling

Views: 238,624

Robert Miles AI Safety

Comments: 925

@NoahTopper
4 years ago
"If you squint, the training process is sort of like a compiler." Totally brilliant statement.
@shouldb.studying4670
4 years ago
I had to squint AND tilt my head but I see what he means 🤣
@ZyTelevan
4 years ago
Code is data is code
@BlargVison
4 years ago
yeah that was a fantastic comparison that i won't forget
@kacperozieblowski3809
4 years ago
I agree
@filipgara3444
4 years ago
„No.”
@discosteve
4 years ago
Your point still stands, but nevertheless the scissors have a butt load of tech in the background that us normies aren't aware of (material science). Just wanted to mention that the humble pair of scissors deserves some praise.
@DDvargas123
4 years ago
I was thinking the same thing. We take for granted a lot of the cool tech around us all the time. Levers and Pulleys and other simple machines most of all. But Rob makes a good point that people don't commonly think of them as tech even though perhaps they should. Language is a cruel mistress.
@infinummjb
4 years ago
scissors are relatively low-tech, but tech nonetheless.
@columbus8myhw
4 years ago
Would you consider a scissors company a "tech company" the same way you'd consider Apple and SpaceX tech companies? What about post-its? Is 3M a tech company?
@DDvargas123
4 years ago
@@columbus8myhw 3M's company description is literally: "applies science and innovation to make a real impact by igniting progress and inspiring innovation in lives and communities across the globe." That sounds really tech company to me
@RobertMilesAI
4 years ago
I think if you took someone to a scissors factory and showed them all the machines and equipment of the production line, they'd call that technology. But not so much the scissors themselves
@TheLifeInMotion
4 years ago
According to Strongbad: "Technology is anything that you don't understand how it works and if you break it you have to buy a new one."
@chrisdaley2852
4 years ago
So retractable pens are technology. Got it.
@CH-bd6jg
4 years ago
@@chrisdaley2852 pen goes in, pen goes out. you can't explain that! just buy a new one!
@columbus8myhw
4 years ago
Chris Daley I mean, yes… but also I was willing to say scissors are technology so maybe I'm not a good judge of these things
@OrchidAlloy
4 years ago
@@chrisdaley2852 Yes they are
@diablominero
4 years ago
So my desktop computer isn't technology because I built it and could replace a single broken component rather than the whole thing?
@columbus8myhw
4 years ago
"Like, there's no point asking for feedback if you're already pretty sure you know what the answer is, right?" …Do you want me to answer that question?
@FortoFight
4 years ago
If you think about it, this is a lot closer to how a human learns. A human won't constantly bug you for feedback every single time it does something, nor will it learn how to do something properly from a standardised function (e.g. exam mark schemes). A human will independently use its available knowledge, and occasionally ask for help when it's unsure what to do.
@dannygjk
1 year ago
Do you have any children? Kids seek approval.
@owenpawling3956
1 year ago
@@dannygjk no, but he is right. Kids are just unsure more often.
@Nico-ur2po
1 year ago
@@dannygjk You don't correct a kid every time they talk using improper grammar or mix up word order. You correct them every now and then, and they learn over time combined with observing how other humans talk.
@dannygjk
1 year ago
@@Nico-ur2po I didn't, (I have two kids).
@atimholt
4 years ago
“Are scissors technology?” Me: yeah, of course. “Most people would say no.” ¯\_(ツ)_/¯
@totally_not_a_bot
4 years ago
Those of us who watch these videos don't really qualify as most people.
@pauljs75
4 years ago
Even sticks can count as technology, if implemented as tools in some way. (Combination of tools and methods to achieve some goals. Usually making a task easier, or doing something else that improves conditions for the tool user.) Obviously such is not the latest and greatest technology, which seems to be the definition this video is going for.
@lucar6897
4 years ago
I also think of calculators as artificial intelligence...
@shayneoneill1506
4 years ago
Yeah the part of my brain that did those anthropology units would never let me think scissors aren't technology
@NoahTopper
4 years ago
When I was a kid I definitely would have said no. But I remember at some point being taught that anything along the line of a pencil or chair was technology, and that sunk in. But I imagine a lot of people still have that initial instinct.
@Henrix1998
4 years ago
I can already imagine the Indian ML farms where thousands of people just evaluate learning
@TurkishLoserInc
4 years ago
Sounds a lot like the premise for The Matrix. "On a scale of 1-10, how real do you think this is?"
@Encypruon
4 years ago
It's called Amazon Mechanical Turk.
@Verrisin
4 years ago
Damn, that actually sounds likely... - Here is my idea: since AI will take all our jobs... There will be one job of the future: *Specifying preference.* - I actually don't hate it. :D
@Verrisin
4 years ago
... thinking about it: It kind of is the ideal job, isn't it? Do we, as humans, even want to do anything more than that? - Our job will be saying what we want in the world, and how we want things to work... It will even work as a voting mechanism for policies since they will be run by AI - that figures out how to best match our preferences... - I think this is the way... (or at least a good direction for now ^^)
@benalias5766
4 years ago
I can already imagine a complex AI which is surprisingly good at a wide variety of tasks... and turns out to have hired a load of people in India to do its work for it.
@stefano8936
4 years ago
Robert Miles: "what is technology?" Me: move the finger to calibrate the amount of video to skip Robert Miles: "don't skip ahead" Me: humbly obey
@GrixM
4 years ago
I feel betrayed because the next 5 minutes were just repetition of previous videos so I wish I had in fact skipped ahead.
@jnevercast
4 years ago
Yeah he got me too. I was about to skip just as he said don't skip. "Well okay!"
@Atariese
4 years ago
The thing is... the question he poses after that leads me down that rabbit hole and away from his video... definitely not the intent i would say
@riperian8954
2 years ago
@@GrixM lol i did exactly what you and OP did, only I was like 'okay okay that's enough of that' after about 2 minutes. still a brilliant video overall though xd.
@Noxeus1996
4 years ago
Definitely one of the best educational channels on YouTube.
@zacharieetienne5784
4 years ago
hold on to your papers and i'll see you, next time!
@CynicatPro
4 years ago
@@zacharieetienne5784 TwoMinutePapers is also super good X3
@hypebeastuchiha9229
1 year ago
@@CynicatPro he sucks
@TheMan83554
4 years ago
The thing about your channel is the little touches of 4th wall humour. Having backflip you say "wait I don't have to do a backflip?" Was brilliant.
@riccardoorlando2262
4 years ago
So in a couple years captchas will be reward predictor training? "Which of these is the better shoe design"?
@toxicpsion
4 years ago
nah, i'd bet they do it already; just more subtly than that.
@LoveScreamTrue
4 years ago
@@toxicpsion Like Google CAPTCHA? - "Select all traffic lights"
@johnnymellon7414
4 years ago
"Select all the pictures with Sarah Connor in them" ... wait what?
@z-beeblebrox
4 years ago
@@LoveScreamTrue Except it'll become "Select your favorite traffic lights"
@stribika0
4 years ago
Which of these places do you prefer as a shelter during a robot uprising?
@Macieks300
4 years ago
"in a later video" well... see you in 3 months then
4 years ago
This channel is always worth the wait :)
@griest5493
4 years ago
IKR, what a tease.
@MatthewStinar
4 years ago
You can't rush this kind of quality! Do you know how long it takes to read and digest all those research papers?
4 years ago
... almost there.
@Macieks300
4 years ago
@ to be fair Robert was on Computerphile in the meantime kzitem.info/news/bejne/lGeou2GMs3hmqqw
@megajor232
4 years ago
Watching your videos makes me feel smart without actually having to be
@benalias5766
4 years ago
Sounds like you're gaming your reward metric.
@ephemeralvapor8064
4 years ago
Maybe your evaluation of his teaching is: Good teacher = true Because he brings understanding lesser teachers could not in the same time and effort on your part.
@sharkinahat
4 years ago
I wouldn't mind an ad. YT trained me how to skip paid promotion.
@jacoblysinger
4 years ago
YouTube Vanced vanced.app
@rr.studios
4 years ago
@@jacoblysinger lol im using this app rn
@HansLemurson
4 years ago
What sort of reward function did you use?
@zeikjt
4 years ago
8:50 That backflip part was super enjoyable :D
@MrCreeper20k
4 years ago
17:25 Don't worry Robert, at least I don't mind an ad at the end. And if anyone should get that bread, it's you.
@briandoe5746
4 years ago
I am in a room by myself and I audibly cussed when I heard that OpenAI and DeepMind were working together on something. Google's apparent lack of concern with safety is one of the reasons I want your videos sir
@naturegirl1999
4 years ago
Brian Doe why is two AIs with different modes of thought working together a problem? Humans have different modes(parts of the brain specialized for different tasks) that combine the inputs from these disparate programs into a coherent idea of the world. Imagine trying to learn about your surroundings when the only sense you have is the ability to differentiate temperature and you will understand why certain AIs need others to help with things.
@briandoe5746
4 years ago
@@naturegirl1999 my main concern with AI is not the expediency that it gets to general intelligence. My concern with AI is the safety mechanisms and their capabilities when it gets to general intelligence. Google has multiple times proven to be unconcerned about the safety question. This is highly concerning
@jessgold551
4 years ago
I have watched all of Robert's videos several times. They're perfectly paced, well considered and clearly communicated. There is so much there that it's interesting to watch, sleep on it, and watch again later to catch more. I also enjoy the presentation and multiple interesting ways of presenting things like word popups and cut to screen as well as some graphics and clips. If it helps with demographics I am a former software engineer and still work in I.T.
@the1gip
4 years ago
You, sir, remain one of the most interesting educators on YouTube. The effort you've put in to making this video watchable and entertaining really shows. There's not too many people I can watch for nearly 18 minutes in front of a beige backdrop and still be hooked.
@Varue
1 year ago
Humans being able to simulate problems in their head to predict different outcomes is one of their greatest strengths, it means they can be confronted with new experiences they haven’t evolved specifically for and come up with a solution from a list of possible solutions and stand a much greater chance of overcoming the problem without dying
@OrioPrisco
4 years ago
Hey it's really cool for the viewers that you turned down that sponsorship offer, thanks
@dontfeo
4 years ago
Nah he should've taken it. U can skip it anyway and it would help him bring more content.
@FrotLopOfficial
4 years ago
Those last few minutes of your video will go unnoticed by many, but those of us who did notice very much appreciate it.
@amyshaw893
4 years ago
just replace the human with another ai, and get the human to rate that ai. not good enough? MOAR AI!!11!!
@DDvargas123
4 years ago
It's AIs all the way down!
@thehypnotoad5184
4 years ago
Just make an AI trained on footage of people doing back flips, no need for human input Even if the AI is "only" 99% accurate it should be enough
@DDvargas123
4 years ago
@@thehypnotoad5184 "footage of people doing backflips" IS human input
@thehypnotoad5184
4 years ago
@@DDvargas123 I mean the input already exists, it just needs to be collected. It's kinda going full circle, but it would be interesting to see if you can speed up the reward model that way
@rumplstiltztinkerstein
4 years ago
@@thehypnotoad5184 but the AI will find ways to exploit it. Nothing stops us from giving the footage and having a human check it from time to time, telling it to stop using its head as a catapult when the AI was supposed to be running
@weirdsciencetv4999
1 year ago
This channel is so underrated. I had to do just what he proposes in one of my experiments in college. The technique most definitely works!
@arthurguerra3832
4 years ago
I've been so long without your videos. Please upload more frequently so we can drink your intelligence and knowledge.
@haldir108
4 years ago
I am EAGERLY awaiting that video about self-teaching or whatever it is.
@wilhem13
4 years ago
A video upload ?? My day's already better. Great content my friend, THIS is why I don't watch TV anymore.
@rosborr4330
4 years ago
I subbed because you knew I'd skip ahead the moment you said 'What is technology?'. You win this round, Robert.
@explogeek
4 years ago
Loving your videos, I understand it takes time to research and script and edit, but I wish they came out more often...
@dontyoufuckinguwume8201
4 years ago
The guy has a full time job, the only way to get him to make more videos is to donate ^^
@brendanjackman3600
4 years ago
"Hmm, reward functions are a limiting factor on some ML capabilities. This is a problem. How do we solve problems? WITH ML"
@DDvargas123
4 years ago
Sometimes a solution is so good it can solve its own cons
@MichaelWBauer
4 years ago
It's definitely funny when you frame it this way, but it's also interesting to note the similarity here with the brain. The brain is a system of interconnected neural networks which each are responsible for certain aspects of our thinking capabilities. It's not too hard to imagine the connection between the logical extension of the results in this video and the architecture of the human brain.
@default632
4 years ago
@@MichaelWBauer Remember where the word neural network came from. Duh
@MatthewStinar
4 years ago
I think you're describing a Generative Adversarial Network. en.m.wikipedia.org/wiki/Generative_adversarial_network
@gus2747
4 years ago
"If you squint the training process is sort of like a compiler " --- great sentence!
@NoahTopper
4 years ago
12:19 I approve very greatly of your use of "eachother" as one word. The world needs this change. I don't know if you and I talked about this at all at the EA Hotel, but I've been trying to convince everyone to write it like that.
@squirlmy
4 years ago
I started to do that, but "spell correct" too often comes on and I've gotten used to following automated corrections. I'm wondering if automated (or even AI writing assistants) will slow the evolution of language and grammar, and perhaps even pronunciation will remain in stasis not because of any changing dialect cues of social status, origin (or adopted location), or otherwise, but because of how our "correcting" algorithms are programmed in communication devices.
@qwertyTRiG
4 years ago
@@squirlmy You've reminded me that I really need to create a dictionary with Oxford Spelling (en-GB-oed).
@discipleoferis549
4 years ago
I've been writing "eachother" for 15 years now. I've even told off some of my English teachers for trying to correct me. Heck... I remember back in 6th grade, I think, telling off my teacher for incorrectly correcting another student that had written "ain't". I was an opinionated 11-year-old, haha.
@NoahTopper
4 years ago
@@discipleoferis549 I told my high school English teacher that I was attempting to turn "eachother" into one word, and asked if she'd be willing to not mark it wrong when I used it. She was super on board.
@qwertyTRiG
4 years ago
@@NoahTopper It definitely makes sense. Similarly, I tend to distinguish between "alright" (acceptable) and "all right" (completely correct).
@augustinaslukauskas4433
4 years ago
I'm not surprised this result is amazing considering both OpenAI and DeepMind worked on it. I dream of working for one of them after uni. Thank you for explaining the paper so clearly and in an entertaining way!
@sam-you-is
1 year ago
did you make it sir
@igordmitriev7211
4 years ago
>We'll talk about them in a later video //Gets hyped, realises that it's the latest video on the channel, gets reminded of Patreon, enlists to see the video a bit sooner
@DamianReloaded
4 years ago
I would define intelligence as "the ability to autonomously identify problems and search for solutions to achieve goals"
@morkovija
4 years ago
Been a long time Rob! Hope you brought the sauce!
@non_complete
4 years ago
I agree wholeheartedly with your name.
@wilhem13
4 years ago
Most videos I MUST watch them on, at least x1.25.
@morkovija
4 years ago
@@wilhem13 means that your content information density is quite high. No way I can speed up mathologer for example. But easily 2-3x some non-narrated restoration videos
@V1ctoria00
4 years ago
Damn. I dont usually find a new channel by its latest video. I was hoping I could binge this topic here.
@wiktormigaszewski8684
4 years ago
This is what I always thought of making a good robot - you give it feedback while it learns, just like parents to a child. Very good that this concept has been put into practice. It is definitely going to be helpful for AI companies making robots for their clients who do not know exactly what they need. The guy from "two minute papers" would say "what a great time to be alive!" :-)
@reneko2126
4 years ago
Yeah, why not just raise AI like kids? kzitem.info/news/bejne/xpePr4lskoqjZqw
@narita_i
1 year ago
what a time to be alive
@bensonmiakoun7674
4 years ago
Highly interested for the next video! Thanks
@Alex2Buzz
4 years ago
Miles: "What is technology?" *VSauce music*
@ohokcool
4 years ago
Did u go to Palms Middle?
@panstromek
4 years ago
This is really on point for a problem I am trying to solve now. I do some computer vision for which it is way too complicated to create training data and way too complicated to write reward function, but it's the "You know it, when you see it" type of thing. Thanks for making this video ;)
@StromyYTA
4 years ago
These videos are awesome. Feel almost like I can keep up to date with AI progress.
@hypnotourist
4 years ago
Very clear presentation for a fascinating topic ! Your "patreon/human discussions" reward function has trained you well, so to speak :-)
@jayteegamble
4 years ago
meh, we don't mind a 60 second spiel if it gets us more of your awesome content (and we can skip forward anyway). Grab that bag imo
@diribigal
4 years ago
This is a tough problem since watching to the end is probably valued by YouTube's AI, and even though you and I wouldn't mind, some would. So how do the short term gains of the sponsorship compare to the long term dividends of the YouTube algorithm and extra subscribers, which increase visibility over time (perhaps by a minor amount)?
@sevret313
4 years ago
@@diribigal That's why you don't put the sponsor at the end, but the start.
@MidnightSt
4 years ago
...i don't know much about this area of IT, but the first thing that came to my mind after reading the video title was: "oh, yeah, what's a better idea than creating a black box that nobody knows how and why it works, and what its boundary conditions actually are? why, yes, creating such a black box without even explaining to it what is good and what is bad! BRILLIANT!"
@Telhias
4 years ago
With regards to puppeteering the robot to perform a backflip. There is a whole community of the Toribash game who do exactly that. It is a game in which every time period (measured in ms) you decide which joints to flex, extend, hold rigid and relax.
@MrLuMax5
4 years ago
In my opinion you could have done the sponsorship. It helps you as you help us, 60 seconds is like not that much and you deserve it for all the work.
@xxThabaxx
4 years ago
This is something I've been thinking a lot about as it could work similarly to how we tend to train children. It seems like you could first train a machine learning algorithm to recognize social cues (lingual and physical responses) regarding its behavior and build a reward function based on that. I think you still run into some complicated reward hacking situations like the machine wanting to force certain reactions. But it seems like it would get us closer.
@eathonhowell7414
1 year ago
This way of thinking is exactly what's getting me interested in this field. I cannot help but feel there is a comparison to be made between the inexact nature of child raising, and trying to "teach" artificial intelligence. General or otherwise. Hell, think of an individual cell within the body as an AGI and the totality of what humans are seems like a miracle.
@gwen9939
1 year ago
@@eathonhowell7414 You should probably watch the video called Why not just Raise AI like Kids.
@travboat
4 years ago
I think your opening question (what is technology) exemplifies the difficulties in making an intelligent AI (and the statement I just made is another example). Humans have a good ability to interpret things, or simply put, we know it when we see it. We know what technology is, we understand what pollution is, but trying to put those terms in a definite box is very difficult, and unfortunately that's how computers pretty much operate. We give them specific instructions, and that's what they do. Your channel is excellent, thanks for the interesting content!
@Felixkeeg
4 years ago
I am actually a bit disappointed that you didn't go for the backflip lol
@ruvimlashchuk6134
4 years ago
My disappointment is immeasurable, and my day is ruined.
@Suush
4 years ago
He forgot to program a reward function :P
@AsmageddonPrince
4 years ago
Your voice is so soothing, and videos so informative.
@EU_DHD
4 years ago
I like watching you talk about AI safety more than I like learning about AI safety. And I really like learning AI safety!
@unvergebeneid
4 years ago
Shade much? So you're not learning AI safety by watching him talk about it?
@EU_DHD
4 years ago
@@unvergebeneid Those are two aspects of the same thing. I just like the one aspect more than the other.
@cmoxiv
4 years ago
Mate, you are brilliant. Great content with a philosophical flavour. The last part about Patreon is probably the only thing that actually convinced me about supporting content creators on Patreon. Well done mate. Well done.
@crypticnomad
4 years ago
When people ask me what AI is I generally say that it is a universal function approximator.
@ZachAgape
4 years ago
The first videos I saw u in were the computerphile videos on AI which I enjoyed a lot, and thanks, this video was very interesting too! Also thank you for not wanting to waste 60 seconds of our time ^^
@firefoxmetzger9063
4 years ago
hmm. If samples are chosen based on unusual examples where the ensemble disagrees, what happens if the exploiting strategy has high agreement among members of the ensemble? It would never show up to the human for "correction" right, because the ensemble is confident about it? So rather than having to trust the network that performs the task, we now have to trust the ensemble training the reward function?
@MatthewStinar
4 years ago
I was thinking you would still want to throw in some strong matches just to verify.
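A minimal sketch of the query-selection idea discussed in this thread, assuming each ensemble member outputs a probability that option A beats option B for every candidate pair. The function name `select_queries` and the `random_fraction` parameter are hypothetical, not taken from the paper or the video. Most of the human's budget goes to pairs where the ensemble disagrees, and a small share goes to randomly chosen pairs it agrees on, which is one way to spot-check the confident cases raised above.

```python
import random
from statistics import pvariance


def select_queries(ensemble_preds, n_queries, random_fraction=0.2, seed=0):
    """Return indices of the comparison pairs to show a human.

    ensemble_preds[i] holds each ensemble member's predicted probability
    that option A beats option B for candidate pair i.
    random_fraction is a hypothetical knob: the share of the query budget
    spent on pairs the ensemble does NOT disagree about.
    """
    rng = random.Random(seed)
    n_random = int(n_queries * random_fraction)
    n_uncertain = n_queries - n_random

    # Rank pairs by disagreement: variance of the members' predictions.
    by_disagreement = sorted(
        range(len(ensemble_preds)),
        key=lambda i: pvariance(ensemble_preds[i]),
        reverse=True,
    )
    chosen = by_disagreement[:n_uncertain]

    # Spot-check a few pairs the ensemble is in agreement on.
    remaining = by_disagreement[n_uncertain:]
    chosen += rng.sample(remaining, min(n_random, len(remaining)))
    return chosen


preds = [
    [0.5, 0.5, 0.5],     # full agreement
    [0.1, 0.9, 0.5],     # strong disagreement
    [0.99, 0.98, 0.99],  # confident agreement
    [0.2, 0.8, 0.4],     # moderate disagreement
]
print(select_queries(preds, n_queries=2, random_fraction=0.5))
```

With `random_fraction=0.0` this reduces to pure disagreement-based selection, which is exactly the setting where a confidently agreed-upon exploit would never reach the human.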
@vincentguttmann2231
3 years ago
So really, this video was brought to you by... you, the patreons! You should make them decide on a sponsorship message!
@mrWade101
4 years ago
Scissors would be Old technology, whilst when most people say Technology they mean New technology.
@DarkPrject
4 years ago
This continues to be one of the most interesting channels on YouTube. Fascinating video. Can't wait to see the next one.
@BinaryReader
4 years ago
Technology is just another word for "Tool". Everything created by humans of some utility is a tool, and is therefore technology. I wasn't aware there was confusion around the definition.
@oldvlognewtricks
4 years ago
Queueing was created by humans and is of some utility. Queueing is not technology. Stand-up comedy was created by humans, and is of some utility. Stand-up comedy is not technology. It is difficult (or perhaps impossible) to write a definition that doesn’t raise exceptions, which I suspect was the point Robert was trying to make. Your example only confirms the point.
@BinaryReader
4 years ago
Not to get into a huge discussion here, but both of those could be loosely defined as technologies. What are jokes if not tools of social interaction? What is queuing if not a tool for social order (assuming you mean standing in line and not the computer science definition, which is also a technology)
@oldvlognewtricks
4 years ago
@@BinaryReader I continue to agree, and disagree. A joke and a queue might be tools, but 'technology' is more of a push. technology /tɛkˈnɒlədʒi/ - noun the application of scientific knowledge for practical purposes, especially in industry. "advances in computer technology" machinery and equipment developed from the application of scientific knowledge. "it will reduce the industry's ability to spend money on new technology" the branch of knowledge dealing with engineering or applied sciences. There is perhaps some science to comedy, but a social convention like queueing is hardly an application of science, so much as an emergent social expediency, or whatever. I'm not getting 'engineering' from either, except in the loosest sense. Alternatively, to take the definition to its logical conclusion, all human action is technology and the definition loses its usefulness. But you're right - no potential for confusion whatsoever ;) At best, there is comparative 'technology-ness' - a joke might be technology, but it's less technology than a smartphone. Maybe moreso than a punch to the face. Maybe it depends on context. Still works to make the 'this is not straightforward to define' point.
@squirlmy
4 years ago
@@BinaryReader Perhaps it's an Americanism, but there's another definition of "tool", and you're well on your way towards demonstrating it. Both of you actually, because none of us need or want an in depth discussion of the definitions of either word. Rob's brief mention of it doesn't warrant further commentary.
@drdca8263
4 years ago
Rob’s definition kind of closely matches Strong Bad’s definition, of “anything that’s really cool and you don’t know how it works”. Ryan North’s definition includes language, and I think basically any technique which has been invented. But yeah, like Rob says, it isn’t a big deal how we define it. Slightly different definitions can be used in different social circles, or even in different conversations among the same people.
@stefx5994
4 years ago
Hi Rob, Firstly many thanks for the amazing videos you produce - as a fellow Dev and Techie I find your content and delivery style some of the best and most informative on YouTube. Could I request a future video in which you explain the coding side of developing a basic AI Agent? It would be great to learn how to explore some of the concepts and interesting problems your videos highlight. There's a lot of frameworks, open source projects and tutorials out there already, but they present a very black box, end result focused approach rather than explaining what components we have and how they are working together to reach the end result... the type of complexity you seem to be fantastic at explaining :)
@RobertMilesAI
4 years ago
I've been thinking about a "Write an AGI from scratch" series, but it would be a lot
@dsdy1205
4 years ago
When you realise you've reinvented the parent-child relationship
@AugustusBohn0
3 years ago
nature wins again
@dsdy1205
3 years ago
God coming back to this comment a year later it sounds so stupid
@stephentaylor356
1 year ago
Not having a sponsor earned you a like and an extra comment from me...for what that's worth. Keep up your fantastic work.
@johnopalko5223
4 years ago
Thank you for not accepting sponsorship from a company that wanted you to do a 60-second spiel. There are companies who sponsor videos and are happy with just having their logo displayed in the corner once or twice. At most, they have the presenter start out with, "This video is sponsored by So-and-So. [One or two brief sentences.] Link in the description below." These are the companies that get it.
@Laborejo
4 years ago
"It is easier to write a program to evaluate a solution". This is also why artificial music composition does not produce even half-decent outcomes yet. Creating an artificial listener (or many of them) is still far down on the to-do list.
@postvideo97
4 years ago
There has been no research (that I know of) that uses human reward modeling for music generation. It could be the next breakthrough in music generation!
@Sceleri
4 years ago
this method could work for that tho you just tell it which beat is more fire
@ToriKo_
4 years ago
Sceleri exactly
@dasc000
4 years ago
emily howell: hold my beer
@JsbWalker
4 years ago
Have none of you heard of Emily Howell?
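The "just tell it which beat is better" idea from this thread can be sketched as fitting a scalar reward to pairwise human choices. This is a simplified illustration under stated assumptions: it assumes a Bradley-Terry-style logistic model and learns one number per item instead of training a neural network, and the name `fit_rewards` and its parameters are hypothetical.

```python
import math


def fit_rewards(n_items, comparisons, lr=0.1, epochs=500):
    """Fit one scalar reward per item from (winner, loser) index pairs.

    Assumes a Bradley-Terry-style model:
        P(w preferred over l) = 1 / (1 + exp(r[l] - r[w]))
    and runs plain gradient ascent on the log-likelihood of the choices.
    """
    r = [0.0] * n_items
    for _ in range(epochs):
        for w, l in comparisons:
            # Model's current probability that the human picks the winner.
            p = 1.0 / (1.0 + math.exp(r[l] - r[w]))
            # Gradient of log p with respect to r[w] is (1 - p),
            # and with respect to r[l] it is -(1 - p).
            g = 1.0 - p
            r[w] += lr * g
            r[l] -= lr * g
    return r


# A human says: beat 0 > beat 1, beat 1 > beat 2, beat 0 > beat 2.
human_choices = [(0, 1), (1, 2), (0, 2)]
rewards = fit_rewards(3, human_choices)
print(rewards)
```

The fitted numbers only matter relative to each other; a generator would then be trained to produce items the fitted reward scores highly.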
@tedstokes57
4 years ago
I like that there's a hint about the next video at the end
@bencrossley647
4 years ago
This sounds like a method to solve NP problems. Easy to verify Hard to solve.
@4.0.4
4 years ago
The year is 2069. A computer is granted the prize for solving the P vs NP problem. Despite the judges being unable to confirm that the overly-complex thesis the computer came up with was correct or not, it looked quite correct to all experts. A mathematician was quoted saying: "...I mean, in the two new branches of mathematics that the computer invented, the math does check out." It is unknown what the computer will do with the prize, but several paperclip factories report being contacted shortly after the prize money was deposited.
@bencrossley647
4 years ago
Chrysippus +1 for paperclips (assuming you’re referencing the game) It will work its way to a galactic army at some point.
@Kevin________
4 years ago
@@4.0.4 Alright... you win this comment section.
@griest5493
4 years ago
I was thinking the same thing when he said that. Also, the halting problem is a thing. The catch is that NNs are just making approximations.
@default632
4 years ago
@@4.0.4 Universal Paperclips, hours of wasted time for a reference on the interwebs. Worth it
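The "easy to verify, hard to solve" gap mentioned at the top of this thread can be illustrated with subset sum, a standard NP-complete problem. This is only an illustration of the asymmetry, with hypothetical function names; it does not suggest that a learned reward model solves NP problems, since such models only approximate an evaluation.

```python
from itertools import combinations


def verify(nums, target, indices):
    """Check a proposed answer in time linear in its length."""
    no_repeats = len(set(indices)) == len(indices)
    return no_repeats and sum(nums[i] for i in indices) == target


def solve(nums, target):
    """Brute-force search: tries up to 2**len(nums) - 1 subsets."""
    for size in range(1, len(nums) + 1):
        for indices in combinations(range(len(nums)), size):
            if sum(nums[i] for i in indices) == target:
                return list(indices)
    return None  # no non-empty subset reaches the target


nums = [3, 34, 4, 12, 5, 2]
answer = solve(nums, 9)
print(answer, verify(nums, 9, answer))
```

Checking a candidate takes one pass over it, while the search may have to examine exponentially many subsets.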
@stephen-torrence
4 years ago
Closest thing to a literal "bicycle for the mind" I've seen in AI research. Cool!
@fergochan
4 years ago
Great video, but there's still one thing I'm confused about: how do I tell if that simulated robot is doing a back flip or a front flip?
@SkyboxMonster
1 year ago
Patreon feedback. Congrats you are now smarter than many hundreds of Advertising specialists. Just think of every advertising blunder that the public caught. but the designers did not.
@xenoblad
4 years ago
You've been playing Raid: Shadow Legends for 10 years?!
@bscutajar
4 years ago
This is one of the best channels of youtube. The guy's explanations are extremely well done.
@sk8rdman
4 years ago
"Mattresses and VPNs." Someone watches SmarterEveryDay
@philipripper1522
4 years ago
The power of simple questions. I was like OH OH I KNOW THE ANSWER TO THAT QUESTION and immediately wanted to comment it. It took about ten full seconds before I realized that everyone would, and that it was rhetorical, regardless. I was so proud of myself and eager to show off for a few seconds, with such a simple thing. We humans are weird.
@BubbleManxx
4 years ago
I laughed at the Vsauce reference.
@Hexanitrobenzene
4 жыл бұрын
Could you provide a timestamp ? Looks like I missed it.
@BubbleManxx
4 жыл бұрын
@@Hexanitrobenzene Lol, it's at the very start of the video. When he pops up from the lower half of the screen and asks "What is technology?".
@Hexanitrobenzene
4 жыл бұрын
@@BubbleManxx Oh, that one :) Looks like I'm rusty on Vsauce, haven't watched him in a while...
@andersenzheng
4 жыл бұрын
@@Hexanitrobenzene Not your fault. There hasn't been one for a while
@Havermeijer
4 жыл бұрын
Your videos made AI an accessible topic for me. I love the pure logic and game-like thinking.
@DigitalicaEG
4 жыл бұрын
"Don't skip ahe..." Me: **skipping**
@geronimomiles312
Жыл бұрын
You choose to tackle issues which really clarify the meat of the process , and do fantastic. Really good stuff👍
@Deez-Master
4 жыл бұрын
We are getting close to having P=NP
@governmentofficial1409
4 жыл бұрын
Silicon Valley spoiler
@CyberAnalyzer
4 жыл бұрын
You are a hero. You democratise AI. I can't wait for the next video!
@realityChemist
4 жыл бұрын
"How do you learn when there's nobody who can teach you?" Read a textbook or a WikiHow article?
@Vode_ika
4 жыл бұрын
That is someone teaching you, via a book.
@realityChemist
4 жыл бұрын
@@Vode_ika True, I was thinking in the context of someone sitting there teaching you, like in this video. So I guess the answer is just unsupervised learning? Although I could have sworn Rob already did a video on that... Maybe it was someone else on Computerphile?
@drdca8263
4 жыл бұрын
Isn’t the answer “think very hard, write things down, and when you can do so safely, try many options, test your previous ideas both by the results of the options you took and by more thinking, repeat”?
@Biped
4 жыл бұрын
@@drdca8263 but that all requires some way of evaluating your results (aka having a reward function that teaches you)... It seems weird that there would be a way without that. I mean... the information has to come from somewhere...
@SimonBuchanNz
4 жыл бұрын
I would suspect the answer is, in fact, something like googling it, but this, of course, requires a pretty complete internal model of the world to start generating and testing against your own predictions. I'm struggling to think of alternatives that aren't just this in disguise though: the best I have is looking at a small set of successful examples and trying to break down from the solution used what the problem is, so you have something to test your own solutions against. If there's a decent way to describe that that isn't going to fall prey to small training data issues like overfitting, I'm excited: that's starting to really sound like the casual meaning of learning!
@LeanMeanLearningMachine
Жыл бұрын
Refreshing approach to the actor-critic model :)
@roberthoople
4 жыл бұрын
"Training AI Without Writing A Reward Function..." *Capitalism Drools*
@MatthewStinar
4 жыл бұрын
Watching this video made me realize how much corporations are like poorly programmed artificial intelligence, like the stamp collecting AI that decided to "Kill all humans." We take our instrumental goal of maximizing profits and assign that as the corporation's terminal goal. In pursuing its terminal goal of maximizing profits, the corporation decides to "Kill all humans." 😲
@rerere284
4 жыл бұрын
9:00 There's a game called Toribash where you do exactly this, but with a more complex body. It lets you specify the states of all the joints in 1-second time segments, playing out like speed clocks from chess when playing multiplayer.
@Havermeijer
4 жыл бұрын
I remember that game! You could pull someone's head off and stuff. Pretty difficult to master though. Also, the game kept sending me happy birthday emails for years and years. I didn't get one last time :(
@Karpata1
4 жыл бұрын
Hey if I have to hit the "L" button a couple times so you can get a couple hundreds or even a couple thousands of pounds I'm fine with it.
@BaronVonScrub
Жыл бұрын
No idea what video you followed up this one with, but my guess as to the solution to overly complex problems would be doing them piecewise. E.g., design a city plan. Start with the pieces. Reward model what makes a good road. Reward model what makes a good path. Reward model what makes a good bike lane, etc. Once you have these pieces to a reasonable level, you provide those as puzzle pieces to the next tier of model up. Reward model what makes a good intersection. Reward model what makes a good highway. Reward model what makes a good backstreet. Once again, next level up. Reward model what makes a good bus route. Reward model what makes a good suburb. Reward model what makes an area walkable. Continue this until you've reached a singularity of: Reward model what makes a good city plan. Now, this is more work-intensive, obviously, and still requires some level of understanding by the trainer, but it does make the problem remotely tractable by providing you with building-block pieces to make use of, minimizing each new layer's phase of "flailing around randomly". One weakness would be that you lose one of the strengths of more wide-focused NNs: the ability to consider the intercontext of the constituent parts and how they build on each other. My suggestion would be that you could provide limited-but-discouraged access to the lower rungs for adjustments within the higher models, to allow a level of intercontext. That is, the lower-level reward models continue to be developed, but to a lesser degree, as you work on the greater models.
@TheRealFaceyNeck
4 жыл бұрын
I wholeheartedly agree: it is MUCH easier to evaluate a solution than to generate a solution. You could pretty much define mathematics that way: trying to evaluate as many known solutions as possible, to get new information, and generate a solution, if-and-only-if previous solution evaluations proved unsuccessful.
@fish_wizard618
Жыл бұрын
It seems like this method of evaluation could also help AIs learn to do much more arbitrary things. Like if you wanted a “pretty” pattern, you could train it to make more patterns that you find pretty using this.
@mildpass
4 жыл бұрын
I wonder what it means for the reward model to be 'unsure'. Sounds similar to a reward function since it essentially determines how well the network will train. The idea of having neural networks all the way down is certainly interesting. Can't wait to see what kind of crazy stuff they can pull off with this architecture. Amazing content as always.
@esquilax5563
4 жыл бұрын
He put some text on the screen explaining it at 12:20. They use multiple reward models, and the ensemble is considered unsure if the individual components disagree with one another :)
@mildpass
4 жыл бұрын
@@esquilax5563 Those are an ensemble of "predictors" not reward models. Otherwise there would be an issue of how you prevent the ensemble of reward models from converging to the same reward model. I guess I will rephrase the question to what are the predictors then. I haven't read the paper yet but probably will at some point soon.
@esquilax5563
4 жыл бұрын
@@mildpass yes, predictors of the "real" human reward function. The individual predictors are reward models, and the ensemble is another model.
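The disagreement idea discussed in this thread can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each ensemble member outputs a scalar reward per clip, turns the reward difference into a preference probability with a logistic (Bradley-Terry style) model, and treats the variance of those probabilities across members as how "unsure" the ensemble is. The function names and the toy numbers are hypothetical.

```python
import math
from statistics import pvariance


def preference_probability(reward_left, reward_right):
    """Logistic (Bradley-Terry style) probability that the left clip is preferred."""
    return 1.0 / (1.0 + math.exp(reward_right - reward_left))


def disagreement(ensemble_rewards):
    """Variance of the predicted preference across ensemble members.

    ensemble_rewards: list of (reward_left, reward_right) tuples,
    one tuple per predictor in the ensemble (hypothetical layout).
    """
    probs = [preference_probability(left, right) for left, right in ensemble_rewards]
    return pvariance(probs)


def pick_query(candidate_pairs):
    """Return the index of the clip pair the ensemble disagrees on most."""
    scores = [disagreement(pair) for pair in candidate_pairs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: three predictors scoring two candidate clip pairs.
candidates = [
    [(1.0, 0.0), (1.1, 0.0), (0.9, 0.0)],   # predictors roughly agree
    [(2.0, 0.0), (-2.0, 0.0), (0.0, 0.0)],  # predictors strongly disagree
]
query_index = pick_query(candidates)  # the second pair would be shown to the human
```

Under these assumptions, the pair with the highest variance is the one worth spending a human comparison on, since the predictors already agree about the other one.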
@RichardEricCollins
4 жыл бұрын
Very interesting paper. Thanks. I'm currently studying "managing technical innovation" for my Masters. Some people on the course are taking the view that no innovation is closed; it's all open innovation if you buy even one screw for the product you're creating. Your definition early on, that once a technology is commonplace it ceases to be technology, is an interesting idea. I will be discussing this on our forums. The course I am doing took four weeks to define what technology was. I will be writing some reports on innovation in AI. This paper will be useful. :)
@harrisonfackrell
3 жыл бұрын
This makes me very, _very_ excited. You're kinda' blowing my mind.
@injinii4336
4 жыл бұрын
Surely scissors are an example of some of our most cutting-edge technology. Ba-dum-tss!
@esquilax5563
4 жыл бұрын
Good to see you on here again! You have some of the most fascinating content on KZitem
@maloxi1472
4 жыл бұрын
Thank you for bringing this idea to my attention! Holy cow! This is such a simple, yet beautiful idea!
@frib75
4 жыл бұрын
An amazing video. Never heard such a beautiful explanation of what reinforcement learning is. Thank you !
@Metrolonx
4 жыл бұрын
Love how the video quality grows with every video! Keep it up!
@orcu
4 жыл бұрын
I liked this explanation very much. Great work!
@gorgolyt
Жыл бұрын
Funny this appeared in my recommended feed, as I just came across this concept of Reward Models for the first time today, because they were used for ChatGPT. Great explanation, thanks.
@lukasmrazik3485
4 жыл бұрын
Quite a good chance I will talk about this in my master's State final exams. Thank you, sir, for saving a lot of my time!
@GermanTopGameTV
3 жыл бұрын
If you think about it, this is basically how any technology sport works. There is a reward (winning points in a championship) and a ruleset that was built upon evaluation of past performances (such as technological developments in motorsports). Every rule change is a kind of "Left is better/Right is better" judgment that keeps teams (agents) from doing stuff they shouldn't do to gain higher scores, and keeps the sport the way you want it to be. Motorsport in particular fits this example quite well. What we might take away from this is that groups of humans with a certain goal act similarly to AI agents: they follow the rules as written and don't give a damn about the intention of the rule.