Can a Reinforcement Learning Agent Learn with NO Rewards? Intrinsic Curiosity Coding Tutorial

7,569 views

Machine Learning with Phil

Comments: 24

@softerseltzer
3 years ago
Nice! Some weekend activity, thanks!
@WilliamChen-pp3qs
A month ago
How would it perform compared with HER (hindsight experience replay)?
@leo.y.comprendo
3 years ago
I was just reading about this!
@MachineLearningwithPhil
3 years ago
It's an awesome topic.
@tanerylmaz8340
A year ago
Hello there. Can we save the trained model in this example? And is it possible to test the trained model in another environment? How would we do that? That way we could see the success and performance of the trained model more clearly. Could you help?
@yualan2158
2 years ago
First of all, I have to thank you for making this video. I made the modifications needed to apply it to the "MountainCar-v0" problem, which is a real "sparse reward" environment. However, it doesn't work. Can you check whether the code succeeds in this environment? Thanks!
@bobingstern4448
3 years ago
Hey, I was working on a genetic, NEAT-like algorithm, but I don't know how to cross over two neural networks with different topologies. Is there a procedure for doing this, or do you just choose a random one when this happens?
@royvivat113
3 years ago
If you look at the NEAT paper, it explains specifically how to do it. It has to do with keeping track of the topological history, I believe.
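For readers following this thread, here is a minimal sketch of the crossover rule the NEAT paper describes: connection genes are lined up by their innovation numbers (the paper's "historical markings"), matching genes are inherited at random from either parent, and disjoint or excess genes come from the fitter parent. The genome representation (a plain dict from innovation number to weight) and the function name `crossover` are simplifications assumed for illustration, not the paper's data structures, and the equal-fitness case is left out.

```python
import random


def crossover(fitter, other, rng=random):
    """Cross two genomes given as {innovation_number: weight} dicts.

    `fitter` must be the parent with the higher fitness.
    This dict-based genome is a hypothetical simplification.
    """
    child = {}
    for innovation, weight in fitter.items():
        if innovation in other:
            # Matching gene: inherit from either parent at random.
            child[innovation] = rng.choice([weight, other[innovation]])
        else:
            # Disjoint or excess gene: inherit from the fitter parent.
            child[innovation] = weight
    return child


parent_a = {1: 0.5, 2: -0.3, 4: 0.8}            # fitter parent
parent_b = {1: 0.1, 2: 0.7, 3: -0.9, 5: 0.2}    # less fit parent
child = crossover(parent_a, parent_b, random.Random(0))
print(sorted(child))  # innovation numbers present in the child
```

Because the genes are aligned by innovation number rather than by position, the two parents can have different topologies and the child still ends up with a consistent set of connections.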
@chadmcintire4128
3 years ago
This seems really similar to the entropy of SAC.
@sounakmojumder5689
2 months ago
Hi, did anyone run this in Google Colab? Is there any problem with spawning?
@tsunamio7750
2 years ago
I'm pretty sure we can compact everything you said with fewer words and fewer domain-specific words. At some points, I can follow you, but the jargon is exploding my face.
@orsimhon133
A year ago
Hi Phil, thank you very much for this tutorial! As I understand ICM, the inverse model should be trained together with the encoder NN (which we do not use here) in order to inform the encoder about the parts of the state that are controllable by the agent. So if we don't need the encoder here, we also don't need the inverse model, do we? Hoping for some answers, thanks again!
@akashvyas7715
3 months ago
I was thinking the same thing. Did you try removing the inverse model?
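The question in this thread turns on how ICM combines its two models. As a rough sketch based on the ICM paper's formulas, the intrinsic reward is the scaled squared error of the forward model's prediction of the next state's features, and the module's training loss is a weighted sum of the inverse and forward losses. The function names and the default values of `eta` and `beta` below are placeholders assumed for illustration; this is not the code from the video. Without an encoder, the "features" are simply the raw states.

```python
def intrinsic_reward(predicted_next_features, next_features, eta=1.0):
    """Curiosity reward: (eta / 2) * squared forward-model prediction error.

    eta is a scaling factor; 1.0 is a placeholder, not a tuned value.
    """
    squared_error = sum(
        (pred - target) ** 2
        for pred, target in zip(predicted_next_features, next_features)
    )
    return 0.5 * eta * squared_error


def icm_loss(inverse_loss, forward_loss, beta=0.2):
    """Weighted sum of the two ICM losses: (1 - beta) * L_I + beta * L_F.

    beta here is an illustrative default.
    """
    return (1.0 - beta) * inverse_loss + beta * forward_loss


# Forward model predicted [1.0, 2.0]; the actual next features were [1.0, 4.0].
reward = intrinsic_reward([1.0, 2.0], [1.0, 4.0])
total = icm_loss(inverse_loss=1.0, forward_loss=2.0)
print(reward, total)
```

Setting `beta=1.0` zeroes out the inverse term, which is one simple way to try the experiment asked about here; whether performance holds up without it would need to be checked empirically.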
@61Marsh
3 years ago
I worked on this last year and ended up developing it, but my full solution never quite lived up to my expectations. I always wondered if I implemented it correctly; time to verify against yours. Thanks.
@MachineLearningwithPhil
3 years ago
Let me know if you've any improvements
@masternobody1896
3 years ago
@MachineLearningwithPhil you are the best
@masternobody1896
3 years ago
@MachineLearningwithPhil can you do an AI course, beginner to expert?
@amegatron07
2 years ago
Thank you very much for giving an example of how to implement ICM. I'm looking forward to trying it myself, and also to running my own further experiments with it. I could perhaps give one tip: as a strong adherent of separation of concerns, I believe it would be better to focus less on the parts of the code that are less relevant to the core topic, and perhaps just use already written components. I believe that would save a lot of time :)
@elijahberegovsky8957
2 years ago
First I’ve gotta say thank you for making this video. I’ve just read the paper, enjoyed it immensely, and wanted to find an implementation. And bang! here you are, with an in-depth guide on making it work. Also, please, do Never Give Up as well!
@tsunamio7750
2 years ago
Feature vector, feature map. We have so many terms.
@TaganMorgul
3 years ago
Thank you very much for such a detailed ICM explanation! I tried to implement it some time ago, but with gym envs like cart pole or lunar lander I found it doesn't perform as expected, probably due to the absence of the "state encoding" part, which I think is a very important part of the work. I also didn't use A3C for my experiments but rather A2C. In the end, I found that the "Random Network Distillation" algorithm works much better for the same purpose and is also free of the "TV on the wall" defect that ICM has.
@mehranzand2873
2 years ago
Thanks a lot
@qiluo6299
3 years ago
This is a great video, and thanks for sharing!
@MachineLearningwithPhil
3 years ago
Thanks for watching