Thank you very much!! Statefulness is exactly the feature I couldn't figure out how to use correctly.
@julienkrywyk2216
7 years ago
Thank you very much for your videos!
@杨子文-q5k
6 years ago
Great videos! I have one question: how can the forecast delay be eliminated in the "Stateful LSTMs, Stacked" and "Stateful LSTM stacked DEEPER!" examples?
@scotth.hawley1560
7 years ago
Hi, great tutorial! Thanks! I notice that in the previous video you had a lookback of 20, and your notes even show x_t+0.... x_t+19, but then in this video at 0:38 you draw a green curly-brace below these 20 items and say there are 10 of them. How did the 20 become a 10?
@deepschoolai
7 years ago
Meant 20. Sorry.
@rajilsaraswat9763
7 years ago
Why did you pick 200 in the for loop for the fitting? Is it somehow related to the batch_size?
@2Moaka
7 years ago
Thanks for the video! Do you have any reference for what you mentioned about stacking LSTMs, namely that they can capture different frequencies?
@christianc8265
6 years ago
I think there is no need to rescale the data, as the tanh activation function is perfectly fine with the domain [-1, 1]. However, if you put the predicted chart into the training loop, you will see that this net is actually not converging at all but randomly hitting very good results (even in fewer than 10 epochs). I would recommend building a single LSTM layer with only 2 hidden state neurons, without any dropout layer or anything else, and using tanh on the final dense layer. Then plot the prediction chart and the loss within the loop and try out different loss functions and optimizers. This helped me understand a bit better how LSTMs work. With only 2 hidden states, mean_absolute_error and rmsprop you already get surprisingly good results, and with 3 hidden states you will see the network converge within 20 epochs.
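(For reference, a minimal sketch of the setup described above, assuming the Keras Sequential API; only the layer sizes, activation, loss, and optimizer come from the comment, while look_back and the input shape are hypothetical.)

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

look_back = 10  # hypothetical window length; data shaped (samples, look_back, 1)

model = Sequential()
model.add(LSTM(2, input_shape=(look_back, 1)))  # only 2 hidden state neurons
model.add(Dense(1, activation='tanh'))          # tanh on the final dense layer
model.compile(loss='mean_absolute_error', optimizer='rmsprop')
```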
@deepschoolai
6 years ago
You should always rescale data, and I'm assuming you mean the domain is (-infinity, infinity)? The reason we scale is to make it easier to pick the learning rate, and so that we don't change it across different problems. Scaling is a standard step that you do for any kind of ML work. And by the way, when you say 3 hidden states, did you mean LSTM(3, other_params)?
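(For reference, the standard rescaling step mentioned here, sketched with scikit-learn's MinMaxScaler; the specific scaler and the toy series are assumptions, not taken from the video.)

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.sin(np.linspace(0, 50, 500)).reshape(-1, 1)  # toy series in [-1, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data)  # rescaled to [0, 1]
```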
@christianc8265
6 years ago
The Math Student, no, I mean the data you generated is already in [-1, 1]; rescaling it to [0, 1] is not necessary with this exact data.
@liochon4417
5 years ago
Nice tutorial! My question is: what would the input be in a stacked LSTM network of 2 LSTM layers with different hidden unit sizes? In your example both LSTM layers have 32 hidden units, so the second LSTM layer can receive the 32-vector that the 1st LSTM layer outputs, but what if the first LSTM layer has 64 hidden units and the second 32?
@deepschoolai
5 years ago
The same way that the first Dense layer can have 64 units and the second Dense layer can have 32, LSTMs can do the same. The only thing is that you need to make sure the layer closest to the data has return_sequences=True, regardless of the task.
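(A sketch of that reply, assuming the Keras Sequential API; the 64/32 widths come from the question, and look_back and the single feature are hypothetical.)

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

look_back = 10  # hypothetical window length

model = Sequential()
# return_sequences=True so the next LSTM receives one 64-vector per time step
model.add(LSTM(64, return_sequences=True, input_shape=(look_back, 1)))
model.add(LSTM(32))  # consumes the sequence of 64-vectors, outputs a 32-vector
model.add(Dense(1))
```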
@liochon4417
5 years ago
@@deepschoolai So in practice the first LSTM(64) layer will output a 64-size vector as its hidden state, which is fed into the second LSTM(32) layer. Then this 64-size vector will be concatenated with the 32-size hidden state of that LSTM(32) layer, creating a 64+32=96-size vector, which after the forward pass of the gates will end up as the 32-size vector of the new hidden state?
@172yogendra
2 years ago
Hi, nice video. Could you please let me know what to do if we have multiple sequences in our input dataset? I.e., say we have 2000 data points across 20 sequences, so there are 100 data points per sequence, and 10 epochs. Do we need to reset states after every 2000 data points (i.e., 10 resets in total), or after each sequence, i.e., after every 100 data points (20 * 10 = 200 resets in total)?
@TheEichII
7 years ago
Thank you for the introduction. Is the dropout rate of 0.3 here referring to the fraction of edges kept in the network or the ones that are randomly deleted? I know in TF it refers to the ones kept, so I thought this was a kind of high dropout rate.
@deepschoolai
7 years ago
Yeah, Keras and TF differ in this respect. In Keras, 0.3 is the fraction being deleted. If I were using TF I would have set it to 0.7.
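(A sketch of the convention difference being discussed; the tf.nn.dropout line assumes the pre-2.x TensorFlow signature that took keep_prob.)

```python
from keras.layers import Dropout

drop = Dropout(0.3)  # Keras: 0.3 is the fraction *dropped*

# The equivalent in old (pre-2.x) TensorFlow, where the argument was the
# fraction *kept*:
#   dropped = tf.nn.dropout(x, keep_prob=0.7)
```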
@NilavraPathak
7 years ago
Hi again, is there a way I can retrieve the hidden layer values, especially for a stacked network? I don't know whether Keras has such a feature. I am interested to see what properties each layer is capturing, like you mentioned that two layers could capture two different periods.
@deepschoolai
7 years ago
Try get_weights: keras.io/layers/about-keras-layers/ (Google is your friend :P)
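(A sketch of that suggestion, assuming `model` is the stacked network from the video and `x` is an input batch; get_weights() returns the learned parameters, while a sub-model is one way to inspect what a layer actually outputs.)

```python
from keras.models import Model

# Learned parameters of the first LSTM: [kernel, recurrent_kernel, bias]
weights = model.layers[0].get_weights()

# To see a layer's activations rather than its weights, build a sub-model
# that stops at that layer:
hidden = Model(inputs=model.input, outputs=model.layers[0].output)
activations = hidden.predict(x)
```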
@kuatroka
7 years ago
I'm having trouble differentiating look_back and features. What is the difference between them? If it's a stock price, for example, I can have a vector of 5 daily percent changes [0.6, 0.5, 0.1, -0.7, 0.5] as one observation of input data X with 5 features, and a corresponding output Y which would be the 6th day's value = [0.6]. Or I can represent the same input X as one observation of a size-1 vector, [0.6], with look_back=5, which would describe the same situation. Is that right? I think I have a big confusion between look_back/time_step and features. Maybe you could explain what is what. Thanks
@deepschoolai
7 years ago
look_back is the number of time steps you look into the past before you do your prediction. Features would be different stocks like AAPL, GOOG, etc.
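(A sketch of the distinction using the numbers from the question; the 3D layout (samples, look_back, features) is the standard Keras LSTM input convention.)

```python
import numpy as np

# 5 daily percent changes are 5 *time steps* of 1 feature each:
x = np.array([0.6, 0.5, 0.1, -0.7, 0.5]).reshape(1, 5, 1)
# shape = (1 sample, look_back=5, features=1)

# With two stocks per day (say AAPL and GOOG returns), the same window
# would instead have shape (1, 5, 2): look_back=5, features=2.
```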
@kuatroka
7 years ago
Thanks.
@h.m.5054
7 years ago
My question is about stateful LSTMs: at 5:45 you say that we send in a batch of 10; however, it seems to me that what you sent in was a batch of 1 with 10 time steps, where your data is a 3D tensor (n_samples, sequence_length, n_features) and sequence_length corresponds here to our look-back time steps.
@deepschoolai
7 years ago
Sorry, yes, I should be careful with my terminology. So yes, 10 time steps and batch_size = 1.
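(A sketch of those shapes, assuming the stateful setup discussed in the video; stateful Keras layers require a fixed batch_input_shape.)

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# batch_size=1, 10 time steps, 1 feature
model.add(LSTM(32, batch_input_shape=(1, 10, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
```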
@h.m.5054
7 years ago
Ok, thanks for answering so fast :) I was working on this tutorial: machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/ and I have some questions related to stateful LSTMs. Is this an appropriate place to post them, or would you rather talk about it on Stack Overflow or another forum?
@deepschoolai
7 years ago
Stack Overflow is best. But this might answer your questions: stackoverflow.com/questions/38714959/understanding-keras-lstms
@h.m.5054
7 years ago
Thank you very much for that Stack Overflow link, it was really helpful. For info, I posted the remaining questions here: stackoverflow.com/questions/41695117/keras-stateful-lstm
@NilavraPathak
7 years ago
What if I want to infer from multiple lookbacks, like weekly, daily, and yearly scales at the same time? Do I stack with different lookbacks?
@deepschoolai
7 years ago
Having a lookback of 365 would mean you incorporate everything from those previous days, including the weekly and daily patterns. It might be cheaper, though, to use the last 7 days plus only the value from 365 days ago as input, as opposed to a continuous set of 365 days.
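(A sketch of that cheaper input; the make_window helper and the toy series are hypothetical, not from the video.)

```python
import numpy as np

series = np.sin(np.arange(1000) * 2 * np.pi / 365)  # toy daily series

def make_window(series, t):
    recent = series[t - 7:t]          # the last 7 days
    yearly = series[t - 365:t - 364]  # just the value from 365 days ago
    return np.concatenate([yearly, recent])  # 8 time steps instead of 365

x = make_window(series, t=400).reshape(1, 8, 1)
```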
@nilavrapathak136
7 years ago
Thanks. I was working with forecasting data that had an autoregression at a weekly level, so I used that; I also tried a daily autoregression, and I one-hot encoded the monthly factor as a feature, which worked pretty well. It works better with stacking, which led to my next question: is there any principle for stacking, or do you just keep stacking and observe the generalization error?
@deepschoolai
7 years ago
Ah, that's something I'm not sure there is a science for. In general, more stacked layers is better. And remember to look at the validation error or you'll overtrain it.
@magmelianimaamar8560
7 years ago
Why did you put 32 in the LSTM?
@deepschoolai
7 years ago
That's the number of hidden nodes of that layer. Quite similar to 'normal' neural nets in that respect.
@laksmeythong8419
6 years ago
Sir, 32 in the LSTM is the number of hidden nodes, and each node has an input gate, cell, forget gate, and output gate. Is that correct?
@wolfisraging
6 years ago
laksmey thong, no, that is wrong. You take the dot product of your inputs with the hidden-node weights, and from those final values you create all of the gates.
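(A worked check of that point, assuming a Keras LSTM(32) on a single feature: the four gates share one big weighted input, so the parameter count is 4 * ((input_dim + units) * units + units).)

```python
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(32, input_shape=(10, 1)))

print(model.count_params())      # 4352
print(4 * ((1 + 32) * 32 + 32))  # 4352: four gates from one shared multiply
```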
@oholimoli
4 years ago
I think this tutorial is wrong, or at least leaves out what should be noted about stacking: you are showing how to disable the reset of the hidden state after one batch is processed. Stacking gets interesting when it is done with the hidden states => www.tensorflow.org/api_docs/python/tf/keras/layers/StackedRNNCells
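(For reference, a sketch of the StackedRNNCells API linked above, assuming TF 2.x / tf.keras.)

```python
import tensorflow as tf

cells = [tf.keras.layers.LSTMCell(32) for _ in range(2)]  # two stacked cells
layer = tf.keras.layers.RNN(tf.keras.layers.StackedRNNCells(cells))
output = layer(tf.zeros([1, 10, 1]))  # (batch=1, steps=10, features=1)
```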
Comments: 36