Markov Decision Processes 2 - Reinforcement Learning | Stanford CS221: AI (Autumn 2019)

71,139 views

Stanford Online

Comments: 10

@albert2266
4 months ago
Just to clarify a concept: I think the claim at 7:29 is not right, because the value function shouldn't simply equal the Q-value. The value function is the expected utility over "all possible actions" at a given state, whereas Q_pi is the expected utility for "a given action" at a given state. So the value function should be the expectation of Q_pi rather than just equal to Q_pi. Please correct me if I'm wrong.
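For reference, a minimal sketch of how the two quantities relate. The Q-values and policies below are hypothetical numbers chosen for illustration, not taken from the lecture. Assuming a deterministic policy (one action per state), V_pi(s) is Q_pi(s, pi(s)); for a policy that randomizes over actions, V_pi(s) is the policy-weighted average of Q_pi(s, a).

```python
# Hypothetical Q-values for a single state "in" (illustrative numbers only).
q_pi = {("in", "stay"): 12.0, ("in", "quit"): 10.0}


def value_deterministic(q, state, policy):
    """V_pi(s) = Q_pi(s, pi(s)) when the policy picks one action per state."""
    return q[(state, policy[state])]


def value_stochastic(q, state, action_probs):
    """V_pi(s) = sum over a of pi(a|s) * Q_pi(s, a) when the policy randomizes."""
    return sum(p * q[(state, a)] for a, p in action_probs[state].items())


det_policy = {"in": "stay"}                          # assumed deterministic policy
stoch_policy = {"in": {"stay": 0.5, "quit": 0.5}}    # assumed stochastic policy

v_det = value_deterministic(q_pi, "in", det_policy)
v_stoch = value_stochastic(q_pi, "in", stoch_policy)
print(v_det, v_stoch)
```

Under these assumptions the two readings agree whenever the policy puts all of its probability on a single action, which may be why the two definitions look interchangeable in some presentations.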
@black-sci
6 months ago
Somehow the lecture left me confused by the end. Maybe I should rewatch it.
@aojing
6 months ago
A question left over from the previous lecture (MDP 1) is still hanging over lecture 2: what is the transition function for this class? Is it a function of the action?
@inventwithdean
3 months ago
It is a function of both State and Action.
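As a sketch of what "a function of both State and Action" can look like in practice, the transition function may be stored as a table keyed by (state, action) that returns a distribution over next states. The state names, action names, and probabilities below are assumptions chosen for illustration and are not confirmed by this thread.

```python
# Hypothetical transition table: T[(s, a)] maps next state s' -> probability.
T = {
    ("in", "stay"): {"in": 2 / 3, "end": 1 / 3},
    ("in", "quit"): {"end": 1.0},
}


def transition_prob(state, action, next_state):
    """Return T(s, a, s'), or 0.0 if the transition is not listed."""
    return T.get((state, action), {}).get(next_state, 0.0)


# For every (s, a) pair, the probabilities over s' should sum to 1.
for (s, a), dist in T.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

print(transition_prob("in", "stay", "end"))
print(transition_prob("in", "quit", "in"))
```

Keying the table by the (state, action) pair makes the dependence on the action explicit: the same state can lead to different next-state distributions depending on which action is taken.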
@henkjekel4081
A year ago
Yeah, you really need to be having an episode to play this game
@Moriadin
4 months ago
Not as good as the previous lecture. Harder to follow.
@JumbyG
A year ago
I think there may be a typo at 28:27: it states that Q_pi is (4+8+16)/3, but I believe it should be (4+8+12)/3. Please correct me if I am wrong.
@seaotterlabs1685
A year ago
I think it should be (4+8+16)/3, as I believe the last run has four rewards of 4.
@endoumamoru3835
9 months ago
He is calculating the sum of all the rewards you get in each episode. The first sum was 4 because there was only one reward, the next was 8 with two rewards, and the last was 16 because there were four rewards.
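The arithmetic in this thread can be checked with a short sketch. It assumes, following the replies above, three episodes with one, two, and four rewards of 4 each, and no discounting (a discount of 1 is an assumption); the function and variable names are hypothetical.

```python
def utility(rewards, discount=1.0):
    """Discounted sum of the rewards observed in one episode."""
    return sum(r * discount**t for t, r in enumerate(rewards))


# Episodes as described in the replies: 1, 2, and 4 rewards of 4.
episodes = [[4], [4, 4], [4, 4, 4, 4]]

# Utility of each episode, then the average of the observed utilities.
utilities = [utility(ep) for ep in episodes]
q_estimate = sum(utilities) / len(utilities)

print(utilities, q_estimate)
```

With these assumed episodes the utilities are 4, 8, and 16, so the average is (4+8+16)/3 rather than (4+8+12)/3.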