The confusion about the message passing (see the slide at the 56:30 timestamp) arose because the presenters didn't make it clear that Y_{i,k} is constant, irrespective of the number of layers; only different tunable weights W_{i,k} are introduced in each layer. So in each pass the GPU doesn't need to communicate with other GPUs to get updated values of the Y's, and h_{i,j} is already available on the local GPU.
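To make the point concrete, here is a minimal sketch (not the presenters' actual code; all names and shapes are hypothetical) of what one GPU does per layer under this reading: Y_i and h_i live locally and never change across layers, only the per-layer weight W[k] differs, so the loop contains no cross-GPU communication.

```python
import numpy as np

num_layers = 4
d = 8                                    # hypothetical feature dimension

Y_i = np.random.randn(d, d)              # fixed on this GPU, identical in every layer
h_i = np.random.randn(16, d)             # local features, already resident on this GPU
W = [np.random.randn(d, d) for _ in range(num_layers)]  # one tunable W_{i,k} per layer

x = h_i
for k in range(num_layers):
    # Purely local computation: Y_i and h_i are reused as-is, only W[k]
    # changes per layer, so no send/recv of "updated Y" values is needed here.
    x = np.tanh(x @ Y_i @ W[k])
```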