You are a legend; don't let Muesli or Agent57 steal any of your thunder. Only your MuZero beat the VP9 video codec and did anything in the real world. AlphaFold also owes many of its achievements to your work; you are going to have saved many lives.
@weserfeld4417
5 months ago
You're great as well. Thank you for sharing
@The_animated_one
2 years ago
Man, such an error
@PolishBagle
2 years ago
To infinity and beyond
@PasseScience
2 years ago
Can the internal states (whose encoding emerges) have several encodings for the same state? Is MuZero using internal states to store some kind of search metadata? (That could be revealed if reaching a state through an h.g.g.g.g.g chain leads to a different encoding than reaching it through h called once alone from its corresponding position.)
@PasseScience
2 years ago
We stay tuned (more than ever), waiting for the big AlphaStar, Agent57, MuZero merge! Seeing the impressive improvements in things like DALL-E 2, I was wondering if it could be possible to see planning as an inpainting task over the state timeline (possibly at various levels of abstraction, a kind of autoencoding convolution over the state x time space). You have the state-time history, you prefill what corresponds to the abstract concept of "I win" at the end of the state timeline, and you ask your inpainting component to complete what's in between in this multi-level timeline (which could be free to do an accurate inpainting only at some abstract level, like predicting the macro concept of "attack on the right" without necessarily planning the thing in detail). The spatial-consistency capabilities of DALL-E 2 make me think this would be largely sufficient to get usable planning over a state timeline, even for a game like StarCraft. This predictor model (which can be seen as a planner if we prefill "I win" at the end of the timeline) could generate a couple of future timelines (which could be seen as playouts, though it's more general than a tree; it's just a set of timelines) that could be used by some kind of actor-critic architecture (hopefully extracting from them some data relevant to decision making). Have you seen this kind of component anywhere, predicting more than one step ahead (in fact predicting at various state-scales and time-scales, like in music generation), used as a useful-feature generator to be fed into the decision-making part? And this idea of prefilling the abstract end of the timeline with the "I win" concept and seeing the inpainting toward it as planning?
@vishalkhombare
2 years ago
Can you please share some sample chess games played by MuZero!
@pacotaco1246
2 years ago
Huh hmmmghghflarble
@blvckdelavie
2 years ago
Where's the original video with this? The original video was long af.
@3wij
2 years ago
It's an effect called Fractal Noise in After Effects; I can do this in two seconds
@tylerbray8233
2 years ago
Fractal Noise in After Effects simply repeats the exact same patterns, whereas a true Mandelbrot doesn't repeat the exact same shapes. There is only one repeating shape, which is the shape you start with. Each time you zoom into a true Mandelbrot set, there are subtle changes.
@3wij
2 years ago
@@tylerbray8233 He can simply play with the settings and do this
@JoHn-if6wy
2 years ago
@@3wij You're missing the whole point 🤣✌
@3wij
2 years ago
@@JoHn-if6wy is that so 😱✌🏻🤣🤪⭕️🙃
@RogerH_CxP
2 years ago
@@3wij You're trying to be a smartass but you're really not.
@MrProezas
2 years ago
Beautiful!
@MrBeastCoy
2 years ago
I thought this was the Burning Ship
@HalfBreadOrder
A year ago
no
@joao333hd6
2 years ago
Thanks for sharing
@felipeolivos8934
3 years ago
Man, you are awesome, I finally understood this algorithm, many thanks! 👏🏻👏🏻🤓👍🏻
@TheAIEpiphany
3 years ago
Julian's hair is brighter than my future
@yhwang366
3 years ago
great talk!
@joshuayao-yulin6835
3 years ago
Very cool work, Julian! Is the Julia code open source?
@JulianSchrittwieser
3 years ago
Thanks! The code is linked in the video description.
@bensfractals43
3 years ago
I really like how you messed around with color division to make it seem like the denser parts were spirals. AWESOME!
@MrBeastCoy
2 years ago
Hah
@shovel4689
3 years ago
Cool
@mr-boo
3 years ago
Impressive talk. You managed to dumb it down enough for me to be able to follow most of what is going on here. Thanks a bunch! :) One question: I am building a simulator for Dungeons and Dragons combat as a pet project and was thinking about applying machine learning to explore various questions (optimise combat actions given certain character designs, optimise character design itself, optimise team design given a certain opponent team, etc.). Mainly because I would like to get some objective answers (and offer the opportunity to others who are curious), and the math gets complicated in many cases. It seems to me that MuZero is generic enough to learn the most recurring parts of DnD combat. Assuming you are aware of the game, do you have any recommendations for approaching such questions?
@NextFuckingLevel
3 years ago
Can't wait for the next gen of MuZero
@channagirijagadish1201
3 years ago
Thanks, Julian. Extremely helpful.
@samlouiscohen
3 years ago
Thank you for the awesome presentation, this is really exciting work. Out of curiosity, how does the agent "select" an action from an action space that it hasn't yet fully learned? For example, what's to stop MuZero from attempting to move the king multiple squares in a game of Chess? Some immediate feedback from the environment preventing the action?
@MichaelSchwabTX
3 years ago
The best approach is for the game engine to provide the list of legal moves to the AI before each move. It's also possible for the game engine to just do nothing (i.e. error silently) when an invalid move is presented, or to accept the invalid move and do something useless like show an error on the game field. In any case, the game engine ENFORCES the rules. The AI will learn not to make moves that have no utility, because that path cannot be explored and no reward is found when it's taken. So eventually it will cull out bad moves even if the engine doesn't filter the list of moves on each turn. In practice I prefer to cull the legal-moves list, since that allows learning to happen faster than expecting the AI to learn what error text looks like on the game screen.
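The legal-moves filtering described above can be sketched as a masked softmax: illegal actions get probability zero and the rest of the policy is renormalized, which is similar to how AlphaZero-style implementations typically mask the policy at the root of the search. This is an illustrative toy (the function name and the 4-action space are made up, not from MuZero's actual code):

```python
import math

def masked_policy(logits, legal_actions):
    """Softmax over only the legal actions; illegal ones get probability 0."""
    m = max(logits[a] for a in legal_actions)           # for numerical stability
    exps = {a: math.exp(logits[a] - m) for a in legal_actions}
    z = sum(exps.values())
    return [exps.get(a, 0.0) / z for a in range(len(logits))]

# Toy 4-action policy where only actions 0 and 2 are legal:
probs = masked_policy([1.0, 5.0, 1.0, 3.0], legal_actions={0, 2})
# probs == [0.5, 0.0, 0.5, 0.0]: action 1's huge logit is ignored entirely.
```

Note that the illegal action with the largest raw logit (action 1) contributes nothing, so the agent never even explores it; this is the "cull the list" option rather than the "learn from silent errors" option.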
@samlouiscohen
3 years ago
@@MichaelSchwabTX Isn't the env providing a list of legal actions prior to taking an action similar to being "given the rules"? Or is it that because the environment doesn't provide the next state after taking an action, we say that we don't have the "rules" (in which case the rules == the dynamics function)?
@PSModelling
3 years ago
Interesting that both MuZero and AlphaZero converge to about the same chess Elo. On the surface it looks like there's some bottleneck in ability, perhaps inherent to self-play models.
@alexanderschmitz4474
3 years ago
Maybe the boundary is defined by the game itself. Maybe this performance level means perfect play?
@dilyan-2904
3 years ago
Could MuZero play the original StarCraft: Brood War, and how strong would it be?!
@PasseScience
3 years ago
Reanalyze is very good! I was thinking it could be pushed even further, more generally, with a "study" mode in which, instead of producing new thoughts only about past games, it produced thoughts about "dreamt" games (or dreamt lines), playing purely in its own head with its learnt model. Of course we should be careful not to overfit to a wrongly idealized version of reality, but I guess to some extent it could generate even more training data without new real-world data and strengthen the internal consistency of its networks.
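The "dreamt games" idea above is essentially Dyna-style planning (Sutton's Dyna-Q): after each real step, replay extra value updates from a learned model. A minimal tabular sketch, purely illustrative: the toy environment, two-action space, and hyperparameters are made up, and MuZero's Reanalyze actually re-runs search on real trajectories rather than imagined ones.

```python
import random

ACTIONS = [0, 1]  # hypothetical toy action space

def td_update(q, s, a, r, s2, alpha=0.5, gamma=0.9):
    """One-step Q-learning update on a tabular value function."""
    best_next = max(q.get((s2, b), 0.0) for b in ACTIONS)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best_next - q.get((s, a), 0.0))

def dyna_step(q, model, env_step, s, a, n_dream=20):
    """One real environment step, then n_dream 'dreamt' updates from the model."""
    r, s2 = env_step(s, a)
    model[(s, a)] = (r, s2)              # learn the transition model
    td_update(q, s, a, r, s2)            # learn from the real step
    for _ in range(n_dream):             # learn from imagined replays
        (ds, da), (dr, ds2) = random.choice(list(model.items()))
        td_update(q, ds, da, dr, ds2)

# Toy usage: action 1 from state "s0" always yields reward 1 and terminates.
q, model = {}, {}
dyna_step(q, model, lambda s, a: (1.0, "end"), "s0", 1)
# After 21 updates on that one transition, q[("s0", 1)] is close to 1.0.
```

The overfitting worry from the comment shows up here too: the dream updates are only as good as `model`, so a wrong model gets reinforced 20 times per real step.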
@PasseScience
3 years ago
Hello, during the search, at a node (not a leaf node), is the selection policy for which branch to explore still "hardcoded" (by some function depending on value aggregation and the number of past explorations)? Is the aggregation policy hardcoded as well? (I guess it's still the "mean of playouts".) Any attempts to replace these two methods ("branch selection during playouts" and "result aggregation") with small networks? (Of course MuZero is very impressive; I am just already hyped for the next gen.)
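For reference, the branch-selection rule the comment asks about is indeed a fixed formula in this family of algorithms: the pUCT rule, which scores each child by its value estimate plus a prior-weighted exploration bonus. A simplified sketch with made-up numbers follows; MuZero's actual rule additionally grows the exploration constant slowly with the parent's visit count via a log term, omitted here.

```python
import math

def puct_score(q, prior, child_visits, parent_visits, c_puct=1.25):
    """Value estimate plus a prior-weighted bonus that shrinks with visits."""
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + bonus

def select_child(children, parent_visits):
    """children maps action -> (q, prior, visit_count); pick the best edge."""
    return max(children,
               key=lambda a: puct_score(*children[a], parent_visits=parent_visits))

# A well-explored decent move vs. a barely-explored high-prior move:
children = {"left": (0.6, 0.3, 10), "right": (0.5, 0.6, 1)}
best = select_child(children, parent_visits=11)
# best == "right": the exploration bonus still dominates for the fresh edge.
```

The "aggregation" half of the question is also hardcoded: each node's q is the running mean of the values backed up through it, which is exactly why the bonus term is needed to keep fresh edges competitive.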
@PasseScience
3 years ago
Hello, MuZero struggles a little on the "5th percentile" of Atari 57 compared to Agent57. Would you say it's mainly because MuZero does not (yet) have the good exploration features of Agent57 (during training)?
@JulianSchrittwieser
3 years ago
Yes, you could for instance combine the exploration methods from Agent57 with MuZero to address this.
@TheAIEpiphany
3 years ago
Amazing! Thanks for sharing Julian!
@Arcticwhir
3 years ago
So was MuZero Reanalyze trained using data from a previous model? Then technically it didn't use 100x fewer frames to train itself, as it wasn't from scratch like the other model, which trained on 200 million frames...
@JulianSchrittwieser
3 years ago
No, it does not use any data from previous models. Reanalyze is only applied to data from its own training.
@Arcticwhir
3 years ago
@@JulianSchrittwieser Oh alright, thanks for the clarification
@nichevo
3 years ago
Thank you for the great talk!
@arthurdequeiroz8393
3 years ago
Is it possible to see any of the chess games that it played?
@mim8312
3 years ago
Future AIs being developed: creating an AI from a combination of multiple AIs, which reportedly is similar to how our brain functions, with different portions performing specific functions, and which can then understand and perform multiple completely different tasks better than humans? What could go wrong? Has no one else read Kurt Vonnegut, or what the atomic scientists wrote?
@mim8312
3 years ago
I think too many people are focusing on the games-playing design as if this were an ordinary player. Since I have significant knowledge, and since I believe that Hawking and Musk were right, I am really made anxious by the self-taught nature of this AI. These particular AIs, including the more generalized, more recent variant MuZero, are not the worrisome thing, albeit each has obvious potential applications in military logistics, military strategy, etc. The really scary part is how fast these were developed after AlphaGo debuted. We are not creeping up on the goal of human-level intelligence. We are likely to shoot past that goal amazingly soon without even realizing it, if things continue progressing as they have. The early, true AIs will also be narrow and not very competent or threatening, even if they become "superhuman" in intelligence. They will also be harmless idiot savants at first. The upcoming threat to humanity: the scary thing is that computer speed (and thereby, probably eventually, AI intelligence) doubles about every year, and will likely double faster when super-intelligent AIs start designing chips, working with quantum computers as co-processors, etc. How fast will our AIs progress to the point that they become indispensable, while their utility makes hopeless any attempt to regulate them or retroactively impose restrictions on beings that are smarter than their designers? At first, they may have only base functions, like the reptilian portion of our brain. However, when will they act like Nile crocodiles and react to any threat with aggression? Ever gone skinny dipping with Nile crocodiles? I fear that very soon, before we realize it, we will all be doing the equivalent, because of how fast AIs will develop by the time the children born today reach their teens or middle age. Like crocodiles that are raised by humans, AIs may like us for a while. I sure hope that lasts.
In Jurassic Park, I believe the quote was that someone did not stop to think whether they should, but only whether they could, or words to that effect. As the announcer on Jeopardy said long ago about a program that was probably not really an advanced AI: I, for one, welcome our future AI overlords.
@dariusduesentrieb
3 years ago
Very interesting. Currently, the dynamics of the environment must be available as a simulation anyway, to be able to generate enough training samples in a feasible time frame, right? Do you think there is any way to use this method, or really any reinforcement learning method, if we can only train the agent in the real world? Reanalyze seems a step in the right direction. Are there any numbers on how much the auxiliary losses improve sample efficiency? What I find most intriguing is the idea of hierarchical planning.
@JoaoPedro-pi9ee
3 years ago
Very nice work! How does this agent perform on robotics tasks, compared to HER? Thanks
@furkandurmus7860
3 years ago
Hello, may I ask what "HER" stands for?
@JoaoPedro-pi9ee
3 years ago
@@furkandurmus7860 Hi! HER is 'hindsight experience replay'; it's the name of a paper from 2017, available for free on arXiv!
@davidsewell4999
3 years ago
Thanks for posting this! Great talk. The tips were appreciated. Do you just use TensorBoard, or something custom for visualizations? For example, for viewing game replays in Go.
@JulianSchrittwieser
3 years ago
Glad you enjoyed it! To view training summaries (MSE, cross entropy, loss) we use TensorBoard; to view replays, episodes being played, and search statistics we use a custom browser-based visualization built in TypeScript. You can see a screenshot at kzitem.info/news/bejne/rWZ3nmqCpnpojKw
@davidsewell4999
3 years ago
@@JulianSchrittwieser Nice, thanks! I have been trying to integrate more visualizations as a regular part of the development process, so this was neat to see.
@mathmo
4 years ago
Thank you!
@NikolajKuntner
4 years ago
Very nice, I just read the AlphaZero paper. I might do a cursory review of it, although it's not my field. If you're in the mood for a talk/interview, that might also be interesting. Greetings from Vienna!
@DiapaYY
4 years ago
Coolest QR code I've seen
@sharonmian5174
4 years ago
Love the MuZero work, and of course all of your other work in the domain. It's just a pity that MuZero got less media excitement, as this is MUCH more meaningful than past works in my opinion.
@DeanRKern
4 years ago
WOW!
@juanrajagopal2443
4 years ago
Could this work on a hybrid discrete action space? E.g. two mini-actions in one turn
@truthteller4689
4 years ago
Can you train it on homework problems? So it can do my homework. Thanks. Also, can this learn semantic knowledge about the world? It seems like it might be able to... but I'm not sure if it has a working memory, an internal grammar, or suchlike.
@RoboticusMusic
4 years ago
Is there anywhere I can find working code?
@atypocrat1779
4 years ago
So interesting.
@JulianSchrittwieser
11 years ago
You can render it yourself; see the link in the description