1:12 Outline
1:36 Approaching New Problems
2:00 When you have a new algorithm
4:50 When you have a new task
6:21 POMDP design
9:31 Run baselines
10:56 Run algorithms reproduced from paper with more samples than stated
13:00 Ongoing development and tuning
13:18 Don't be satisfied if it works
14:50 Continually benchmark your code
15:25 Always use multiple random seeds
17:10 Always be ablating
18:21 Automate experiments
19:17 Question on frameworks for tracking experiment results
19:47 General tuning strategies for RL
19:58 Standardizing data
22:17 Generally important hyperparameters
25:10 General RL Diagnostics
26:15 Policy Gradient strategies
26:21 Entropy
27:02 KL
28:07 Explained variance
29:41 Policy initialization
30:21 Q-learning strategies
31:27 Miscellaneous advice
35:00 Questions
35:21 How long to wait until deciding whether code works or not
36:18 Unit tests
37:35 What algorithm to choose
39:28 Recommendations on older textbooks
40:27 Comment on evolution strategies and OpenAI blog post on it
43:49 Favorite hyperparameter search framework
@TheAIEpiphany
3 years ago
I love John's presenting style — he's super positive and enthusiastic. Great tips, thank you!
@agarwalaksbad
6 years ago
This is a super useful lecture. Thanks, John!
@FalguniDasShuvo
1 year ago
Wow! I love how simply John conveys great ideas. Very interesting lecture!
@SinaEbrahimi-ee3fq
3 months ago
Awesome talk! Still very relevant!
@ProfessionalTycoons
5 years ago
This was a great talk.
@cheeloongsoon9090
7 years ago
What a number to end the video, 44:44.
@BahriddinAbdiev
6 years ago
We (3 students) are exploring DQN and its variants, i.e. Double DQN, Double Dueling DQN, Prioritized Experience Replay, etc. There is one thing we are all facing: even when it converges, if you run it long enough, at some point it diverges again. Is this normal, or should it converge and stay there (or even keep improving)? Cheers!
@alexanderyau6347
5 years ago
Hi, I think it's normal, but I don't know why it happens. Maybe the model learned too much and became stupid, LOL.
@yoloswaggins2161
5 years ago
No, this is not supposed to happen. I've seen it happen for a couple of reasons, but the most common is scaling by a standard deviation that gets very close to 0 because the data is too similar.
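The failure mode described above can be sketched as follows. This is a minimal illustration, not code from the talk: the `standardize` helper and its epsilon default are assumptions, showing the common fix of adding a small constant to the denominator so normalization stays finite even when a batch of returns or observations is nearly constant.

```python
import numpy as np

def standardize(x, eps=1e-8):
    """Standardize an array without blowing up when the data is nearly constant.

    Dividing by the raw standard deviation produces NaN/inf as std -> 0
    (e.g. a replay buffer full of identical rewards); adding a small
    epsilon to the denominator keeps the output finite and bounded.
    """
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / (x.std() + eps)

# A batch of identical rewards: plain (x - mean) / std would divide by zero
# and yield NaN, while the epsilon-guarded version returns all zeros.
rewards = np.array([1.0, 1.0, 1.0, 1.0])
print(standardize(rewards))  # → [0. 0. 0. 0.]
```

The same guard is commonly applied when normalizing advantages in policy-gradient implementations, for exactly the reason the comment gives.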
@zhenghaopeng6633
4 years ago
Hi there! Can I upload this lecture to Bilibili, a famous YouTube-like video site in China? Many students are there and would like access to this insightful talk. Thanks!
@piyushjaininventor
1 year ago
Maybe view it on YouTube? It's free :)
@georgeivanchyk9376
4 years ago
If you cut all the times he said 'ah', the video would be half as long.
Comments: 15