Thanks for explaining the length normalization of beam search! At the end of your last video I was already wondering what would happen if some branches predict an EOS token.
@vinitabaniwal685
3 years ago
Thank you so much for this explanation!!
@AbhishekKumar-wf6io
A year ago
Voice of the century :D
@sandipansarkar9211
3 years ago
Nice explanation.
@luck3949
6 years ago
I guess the beam width could be made dynamic: for example, if the network says that with p = 1 the next letter is Z, then we can safely use width = 1, and if at some step the network has no idea what should come next, then it's better to use a larger width. Right?
@zhifengyang1850
6 years ago
Even in the extreme case where one word's probability equals 1, you still need the fixed beam width. Suppose the beam width is 3 and the RNN outputs A, B, C at time step 1 with probabilities 0.1, 0.2, 0.3 respectively. When we feed A in as the input at time step 2, we get the word Z with probability 1, and we get other outputs when we feed in B or C. But after the RNN produces its outputs at time step 2, you still have to compare all the candidates coming from the three prefixes A, B, and C.
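The point above can be sketched in a few lines of Python. This is a minimal, hypothetical example using the probabilities from the comment (the expansion probabilities for B and C are made up for illustration): even though prefix A predicts Z with probability 1, all candidates from all three beams are still ranked together before the top 3 are kept.

```python
# Hypothetical step-1 beams and their probabilities, as in the comment above.
beams = {"A": 0.1, "B": 0.2, "C": 0.3}

# Hypothetical step-2 expansions: A is certain (only Z can follow, p = 1),
# while B and C spread their probability over several words (made-up values).
expansions = {
    "A": {"Z": 1.0},
    "B": {"X": 0.6, "Y": 0.4},
    "C": {"X": 0.5, "Z": 0.5},
}

# Score every candidate continuation by its cumulative probability.
candidates = []
for prefix, p_prefix in beams.items():
    for word, p_word in expansions[prefix].items():
        candidates.append((prefix + word, p_prefix * p_word))

# Fixed beam width: compare ALL candidates across all prefixes, keep the best 3.
beam_width = 3
top = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
print([c[0] for c in top])  # the certain continuation "AZ" (p = 0.1) is pruned
```

Note that "AZ", despite being a certain continuation of A, loses to candidates from B and C once cumulative probabilities are compared, which is exactly why the full fixed-width comparison is still needed.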
@luck3949
6 years ago
Zhifeng Yang, thank you, now I see it. I did a little googling and found that there are in fact papers on dynamic beam width and dynamic pruning; it speeds up the search by roughly 10% at the same quality.
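One common way such dynamic pruning can work is to keep the fixed width as an upper bound but additionally drop candidates whose probability falls far below the best one. The sketch below is a hypothetical pruning rule (the `ratio` threshold and function name are illustrative, not taken from any specific paper): when the model is nearly certain, the beam collapses toward width 1; when the distribution is flat, the full width is kept.

```python
def dynamic_prune(candidates, max_width=3, ratio=0.5):
    """Keep at most max_width candidates, dropping any whose probability
    is below ratio * (probability of the best candidate).

    candidates: list of (hypothesis, probability) pairs.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:max_width]
    best = ranked[0][1]
    return [c for c in ranked if c[1] >= ratio * best]

# Confident model: one dominant candidate, beam shrinks to width 1.
print(dynamic_prune([("Z", 0.98), ("Y", 0.01), ("X", 0.01)]))

# Uncertain model: flat distribution, full width is kept.
print(dynamic_prune([("a", 0.34), ("b", 0.33), ("c", 0.33)]))
```

The saving comes from expanding fewer hypotheses at the next step whenever the model is confident, which matches the roughly-10% speedup mentioned above.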
Comments: 10