I agree with Lucius: the mean drags the curve away from the known data. We are assuming (and imposing) that the model is a multivariate Gaussian with mean = 0, so away from the known data the predicted value tends to this mean.
@MB-pt8hi
6 years ago
Excellent professor. Wish we had someone like him here
@lucius1701
10 years ago
About the question at around 15 min on the small red peak: my thinking is that the small red peak is dragged toward the mean value of all points because the length parameter is quite small, i.e. 0.1, so the other points impact that red peak more strongly than in the scenario where l is larger, e.g. 1.
@lucius1701
10 years ago
About the curve at 15 min in the video: we can also look at the estimates for points outside the data range, which go to the mean value. Check the direction of the curve at the beginning and end of the x-axis: when l is smaller, the curve outside the data range goes to the mean more quickly.
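A minimal sketch of this behaviour (my own illustration, assuming a squared-exponential kernel; not the lecture's gp.py):

```
import numpy as np

def kernel(a, b, l):
    # squared-exponential kernel
    return np.exp(-0.5 * (a.reshape(-1, 1) - b.reshape(1, -1))**2 / l**2)

X = np.array([-1.0, 0.0, 1.0])   # training inputs
y = np.sin(X)                    # training targets
Xs = np.array([2.0, 3.0])        # test points outside the data range

for l in (0.1, 1.0):
    K = kernel(X, X, l) + 1e-6 * np.eye(len(X))    # jitter for stability
    mu = kernel(Xs, X, l) @ np.linalg.solve(K, y)  # posterior mean K_* K^{-1} y
    print(l, mu)  # with l = 0.1 the mean has already collapsed to ~0 at x = 2
```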
@amerel-samman9929
2 years ago
The cultural and racial sensitivity of this class is on point. The professor comes in with that xenophobic question, only to be counteracted by a unanimous uniform probability of 1 for all countries.
@amerel-samman9929
2 years ago
ahh fuck he pushed on it... and the girl is like Mexico 0.5 eyyyy sorry Mexico
@JaysonSunshine
6 years ago
@51:19, the matrix should not be labeled as K_y, but merely K. Compare to slide 16 where K_y is initially defined.
@ner30
8 years ago
Hi Professor Nando, thanks a lot for your lectures, they help a lot to better understand things. Could you please provide the Python code you're talking about in your GP lecture? Thanks again.
@beincheekym8
7 years ago
you can find it here: www.cs.ubc.ca/~nando/540-2013/lectures/gp.py
@doncanas
7 years ago
It's a bit confusing, though: the code he explains in these lectures differs from what the file actually does. Instead of explicitly forming alpha = K^{-1} y for the mean, it solves one linear system between L and y and another between L and K_* (which corresponds to v here); alpha is never computed directly. The result is the same, because K_*^T K^{-1} y = (L^{-1} K_*)^T (L^{-1} y), so in the end he just takes a dot product between the two solves.
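For what it's worth, here is a minimal sketch of that equivalence (my own toy example, assuming a squared-exponential kernel; not the actual gp.py):

```
import numpy as np

# Two equivalent ways to get the GP posterior mean mu = K_*^T K^{-1} y.
def kernel(a, b, l=1.0):
    return np.exp(-0.5 * (a.reshape(-1, 1) - b.reshape(1, -1))**2 / l**2)

np.random.seed(0)
X = np.random.randn(5); y = np.sin(X)     # toy training data
Xs = np.random.randn(3)                   # toy test points

K = kernel(X, X) + 1e-6 * np.eye(len(X))  # jitter for numerical stability
Ks = kernel(X, Xs)                        # K_* (train x test)

# Way 1: explicitly form alpha = K^{-1} y, as derived in the lecture.
alpha = np.linalg.solve(K, y)
mu1 = Ks.T @ alpha

# Way 2: two solves against the Cholesky factor, as gp.py does.
L = np.linalg.cholesky(K)
Lk = np.linalg.solve(L, Ks)               # L^{-1} K_*
mu2 = Lk.T @ np.linalg.solve(L, y)        # (L^{-1} K_*)^T (L^{-1} y)

print(np.allclose(mu1, mu2))  # True: same mean either way
```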
@LunnarisLP
6 years ago
The code is meant to be changed by the students anyway; different hyperparameters should be tested.
@ScieLab
9 months ago
Hi Nando, is it possible to access the code that you mentioned in the lecture?
@kotslike
7 years ago
At 24:05 Nando asks "Which prior would you put on sigma?". I cannot hear the answer coming from the audience, and the subtitles didn't really help. Any thoughts?
@kotslike
7 years ago
OK, it is the inverse-Wishart, I guess.
@vermajiutube
6 years ago
I agree that it was hard to hear. I think he said the inverse-Wishart distribution: en.wikipedia.org/wiki/Inverse-Wishart_distribution . The inverse-Wishart reduces to the inverse-gamma in the univariate case.
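A quick numerical check of that univariate correspondence (my own sketch using scipy, not from the lecture): a 1-D inverse-Wishart(nu, psi) matches an inverse-gamma(nu/2, scale psi/2), and both have mean psi/(nu - 2) for nu > 2.

```
import numpy as np
from scipy.stats import invwishart, invgamma

nu, psi = 5.0, 2.0
s_iw = invwishart(df=nu, scale=psi).rvs(size=100000, random_state=0)
s_ig = invgamma(a=nu / 2, scale=psi / 2).rvs(size=100000, random_state=0)
print(s_iw.mean(), s_ig.mean())  # both close to psi/(nu - 2) = 2/3
```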
@MrSupermonkeyman34
2 years ago
I've tried coding a Bayesian optimizer from scratch using this method, and my covariance matrix always ends up singular, meaning I can't invert it. Is this a problem with my code, or is it something that can just happen? If so, how do I deal with it?
@MeshRoun
2 years ago
Singular with Cholesky's decomposition?
@charlesreynolds8696
6 years ago
Does anyone know where we can find the thing from David MacKay that Nando mentions about a construction showing the equivalence between a single hidden layer ANN and a GP? At ~53:40
@sksqhubham
5 years ago
You can watch here: kzitem.info/news/bejne/r5udvKmrgamSa4Y
@yoij-ov3sd
6 years ago
Around 28:10 he says the models are *insensitive* to _____ parameters and *sensitive* to _____ parameters. Which ones was he referring to?
@JaysonSunshine
6 years ago
The specification of the hyperparameters.
@mehdimashayekhi1675
6 years ago
I have a question about this part of the code: ```# draw samples from the posterior at our test points. L = np.linalg.cholesky(K_ + 1e-6*np.eye(n) - np.dot(Lk.T, Lk))```. I don't know where the ```np.dot(Lk.T, Lk)``` term comes from, either in the lecture or in the book. If anybody can explain, it would be appreciated.
@DarioCazzani
6 years ago
I base my answer on the algorithm on page 19 of the book: www.gaussianprocess.org/gpml/chapters/RW.pdf . The predictive variance from equation 2.26 is K_ - v.T*v, where K_ = k(Xstar, Xstar) and v is what the code calls Lk. Now, if you want to sample from a Gaussian with covariance K_ - v.T*v, you need its square root, hence the Cholesky. The 1e-6*np.eye(n) is there because the function is noisy. (Instead of 1e-6 we could have used the variance of the noise, which is given. But I guess we do not always have all the information in life.) Hope this helps.
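Here is a toy version of that step for anyone who wants to run it (my own sketch; the variable names follow gp.py, and I assume a squared-exponential kernel):

```
import numpy as np

def kernel(a, b, l=1.0):
    return np.exp(-0.5 * (a.reshape(-1, 1) - b.reshape(1, -1))**2 / l**2)

X = np.array([-1.0, 0.0, 1.0]); y = np.sin(X)   # training data
Xs = np.linspace(-2, 2, 50); n = len(Xs)        # test points

L = np.linalg.cholesky(kernel(X, X) + 1e-6 * np.eye(len(X)))
Lk = np.linalg.solve(L, kernel(X, Xs))    # v in the book, Lk in the code
mu = Lk.T @ np.linalg.solve(L, y)         # posterior mean
K_ = kernel(Xs, Xs)                       # k(Xstar, Xstar)

# Posterior covariance K_ - Lk.T @ Lk (eq. 2.26); the 1e-6 jitter keeps it
# positive definite so its Cholesky factor, the matrix square root, exists.
L_post = np.linalg.cholesky(K_ + 1e-6 * np.eye(n) - Lk.T @ Lk)
samples = mu[:, None] + L_post @ np.random.randn(n, 3)  # three posterior draws
```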
@beincheekym8
7 years ago
Does anyone have a link to the explanation of how we can show that a neural network with an infinite number of neurons is a Gaussian process? Nando de Freitas talks about it at 53:40.
@looper6394
7 years ago
Chapter 2 of Bayesian Learning for Neural Networks by Neal: www.csri.utoronto.ca/~radford/ftp/thesis.pdf
@alexfan1109
7 years ago
Good lecture. I suggest watching at 1.25x.
@ardeshirmoinian
4 years ago
Does anyone have a link to a description of calculating the hyperparameters using k-fold cross validation?
@franciscos.2301
3 years ago
I'm looking for the same. Did you find anything?
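Not a link, but the idea is simple enough to sketch (entirely my own illustration, assuming a squared-exponential kernel and a simple grid search over the length scale l):

```
import numpy as np

def kernel(a, b, l):
    return np.exp(-0.5 * (a.reshape(-1, 1) - b.reshape(1, -1))**2 / l**2)

def gp_mean(Xtr, ytr, Xte, l, jitter=1e-6):
    # GP posterior mean K_*^T K^{-1} y with a small jitter on the diagonal
    K = kernel(Xtr, Xtr, l) + jitter * np.eye(len(Xtr))
    return kernel(Xte, Xtr, l) @ np.linalg.solve(K, ytr)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

k = 5
folds = np.array_split(rng.permutation(len(X)), k)
for l in (0.1, 0.5, 1.0, 2.0):
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        mu = gp_mean(X[train], y[train], X[test], l)
        errs.append(np.mean((mu - y[test])**2))
    print(l, np.mean(errs))  # keep the l with the lowest average CV error
```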
@logicboard7746
7 years ago
@7:40 important
@kiliandervaux6675
3 years ago
But we don't have the last class he speaks about, right?
Comments: 31