This is the best and most detailed explanation of Graph CNN attention I've found. Great job!
@anupr567
2 жыл бұрын
Explained in terms of basic neural network terminology!! Great work 👍
@celestchowdhury2605
Жыл бұрын
Very good explanation! Clear and crisp; even I, a beginner, feel satisfied after watching this. It should get more recognition!
@DeepFindr
Жыл бұрын
Thanks
@NadaaTaiyab
2 жыл бұрын
I'd love it if you could explain multi-head attention as well. You really have such a good grasp of this very complex subject.
@DeepFindr
2 жыл бұрын
Hi! Thanks! Multi-head attention simply means that several attention mechanisms are applied at the same time. It's like cloning the regular attention. What exactly is unclear here? :)
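In case a code view helps, here is a minimal from-scratch sketch of that idea (plain PyTorch with a dense adjacency matrix; class and variable names are just for illustration, and self-loops are assumed so every row has at least one neighbor):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGraphAttention(nn.Module):
    """K independent attention 'clones'; their outputs are concatenated."""
    def __init__(self, in_dim, out_dim, num_heads):
        super().__init__()
        # one transform W and one attention layer a per head
        self.W = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_heads)])
        self.a = nn.ModuleList([nn.Linear(2 * out_dim, 1, bias=False) for _ in range(num_heads)])

    def forward(self, h, adj):
        # h: [N, in_dim] node features, adj: [N, N] adjacency (1 = edge)
        outputs = []
        for W, a in zip(self.W, self.a):
            z = W(h)                                              # [N, out_dim]
            pairs = torch.cat([z.unsqueeze(1).expand(-1, z.size(0), -1),
                               z.unsqueeze(0).expand(z.size(0), -1, -1)], dim=-1)
            e = F.leaky_relu(a(pairs)).squeeze(-1)                # raw scores, [N, N]
            e = e.masked_fill(adj == 0, float('-inf'))            # attend to neighbors only
            alpha = torch.softmax(e, dim=-1)                      # attention coefficients
            outputs.append(alpha @ z)                             # weighted aggregation per head
        return torch.cat(outputs, dim=-1)                         # concatenate the heads
```

In PyTorch Geometric this corresponds to simply setting the heads argument, e.g. GATConv(in_channels, out_channels, heads=8).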
@NadaaTaiyab
2 жыл бұрын
@@DeepFindr The math and code are hard to fully grasp. If you could break down the linear algebra with the matrix diagrams as you have done for single head attention, I think people would find that very helpful.
@NimaDmc
2 жыл бұрын
I have to admit that this is the best explanation of GAT and GNNs one can find. Fantastic explanation in very simple English. The quality of sound and video is great as well. Many thanks.
@DeepFindr
2 жыл бұрын
Thank you for your kind words
@toluolu9390
2 жыл бұрын
Very well explained. Thank you very much!
@Moreahead1
Жыл бұрын
Exceptionally clear explanation; the best video lecture on GNNs I have ever seen.
@leorayder-r5x
6 ай бұрын
Amazing!!! Well done, author!!!
@sadhananarayanan1031
Жыл бұрын
Thank you so much for this beautiful video. I have tried so many videos on GNNs and GATs, but this one definitely tops them. I finally understood the concept behind it. Keep up the good work :)
@sapirharary8262
3 жыл бұрын
Great video! your explanation was amazing. Thank you!!
@DeepFindr
3 жыл бұрын
Thanks :)
@Ryan라이언
Жыл бұрын
Best video for learning GNNs, thank you so much!
@marcusbluestone2822
Жыл бұрын
Very clear and helpful. Thank you so much!
@omarsoud2015
Жыл бұрын
Thanks for the best explanation.
@zheed4555
Жыл бұрын
This is very helpful!
@sajjadayobi688
2 жыл бұрын
A great explanation, many thanks
@PaxonFrady
20 күн бұрын
Why would the attention adjacency matrix be symmetric? If the weight vector is learnable, then it matters in which order the two input vectors are concatenated. There doesn't seem to be any reason to enforce symmetry.
@RyanOng-t2o
Жыл бұрын
Thanks for the great explanation! Just one thing that I do not really understand: how do you get the size [4, 8] of the learnable weight matrix? I understand that there are 4 rows due to the number of features for each node, but I'm not sure where the 8 columns come from.
@mistaroblivion
11 ай бұрын
I think 8 is the arbitrarily chosen dimensionality of the embedding space.
@KingMath22232
3 жыл бұрын
THANK YOU!
@james.oswald
3 жыл бұрын
Great Video!
@lightkira8281
3 жыл бұрын
Thank you!
@nastaranmarzban1419
2 жыл бұрын
Hi, hope you're doing well. Is there any graph neural network architecture that takes a multivariate dataset instead of graph-structured data as input? I'd be very thankful if you could answer; I really need it. Thanks in advance.
@DeepFindr
2 жыл бұрын
Hi! As the name implies, graph neural networks expect graph structured input. Please see my latest videos on how to convert a dataset to a graph. It's not that difficult :)
@nastaranmarzban1419
2 жыл бұрын
@@DeepFindr Thanks for the prompt response. Sure, I'll watch it right now. Would you please send the link?
@DeepFindr
2 жыл бұрын
kzitem.info/news/bejne/ooeLmZWhp5amoWk
@dmitrivillevald9274
3 жыл бұрын
Thank you for the great video! I wanted to ask: how is training of this network performed when the instances (input graphs) have a varying number of nodes and/or different adjacency matrices? It seems that W would not depend on the number of nodes (as its shape is 4 node features x 8 node embeddings), but the shape of the attention weight matrix Wa would (as it is proportional to the number of edges connecting node 1 with its neighbors).
@DeepFindr
3 жыл бұрын
Hi! The attention weight matrix always has the same shape. Its input size is twice the node embedding size, because it always takes one pair of neighboring nodes and predicts the attention coefficient for that pair. If a node has more neighbors, you get more of these pairs, but you can think of it as the batch dimension growing, not the input dimension. For instance, with node embeddings of size 3, the input to the fully connected network is something like [0.5, 1, 1, 0.6, 2, 1], i.e. the concatenated node embeddings of two neighbors (size = 3+3). It doesn't matter how many of these you feed through the attention weight matrix. If a node has 3 neighbors it would look like this: [0.5, 1, 1, 0.6, 2, 1], [0.5, 1, 1, 0.7, 3, 2], [0.5, 1, 1, 0.8, 4, 3]. The output is then one attention coefficient per neighbor, i.e. 3 in total. Hope this makes sense :)
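To make that concrete, a tiny sketch with those example numbers (a shared linear layer standing in for the attention weight vector; all values are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim = 3
attn = nn.Linear(2 * embed_dim, 1, bias=False)   # shared attention layer, input size is always 2 * embed_dim

h1 = torch.tensor([0.5, 1.0, 1.0])               # embedding of the central node
neighbors = torch.tensor([[0.6, 2.0, 1.0],
                          [0.7, 3.0, 2.0],
                          [0.8, 4.0, 3.0]])      # three neighbor embeddings

# concatenate the central node with each neighbor -> "batch" of 3 pairs, shape [3, 6]
pairs = torch.cat([h1.expand(len(neighbors), -1), neighbors], dim=1)

e = F.leaky_relu(attn(pairs)).squeeze(-1)        # one raw score per neighbor, shape [3]
alpha = torch.softmax(e, dim=0)                  # normalized attention coefficients
```

Adding a fourth neighbor just adds a fourth row to `pairs`; the layer itself never changes shape.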
3 жыл бұрын
@@DeepFindr If the graph sizes differ, say graph_1 has 2200 nodes (giving a 2200x2200 adjacency matrix) and graph_2 has 3000 nodes (3000x3000 adjacency matrix), you can zero-pad graph_1 to 3000. This way you have a fixed input size for graph_1 and graph_2. Zero padding creates dummy nodes with no connections, so the sum over neighboring nodes is 0 for them. With dummy features for the dummy nodes, you end up with fixed-size graphs.
@DeepFindr
3 жыл бұрын
Hi, yes, that's true! But the attention mechanism used here doesn't require a fixed graph size; it also works with a varying number of nodes. Still, padding is a good idea if you want identical shapes :)
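For reference, a minimal sketch of that padding idea (function name and shapes are illustrative; as noted above, GAT itself does not require it):

```python
import torch

def pad_graph(x, adj, max_nodes):
    """Zero-pad node features and adjacency matrix up to max_nodes.
    Padded rows are dummy nodes with no connections, so they contribute
    nothing when neighbor features are aggregated."""
    n = x.size(0)
    x_pad = torch.zeros(max_nodes, x.size(1))
    x_pad[:n] = x
    adj_pad = torch.zeros(max_nodes, max_nodes)
    adj_pad[:n, :n] = adj
    return x_pad, adj_pad

# e.g. pad a 2200-node graph so it matches a 3000-node graph
# x_pad, adj_pad = pad_graph(x, adj, max_nodes=3000)
```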
@王涛-d3y
3 жыл бұрын
Thanks for your awesome explanation, it's very clear and enlightening. But I have a question about the self-attention mechanism in this paper, since it doesn't seem very similar to the method in NLP. In NLP, the most common form of self-attention performs three linear transforms, which requires 3 weight matrices `W_q`, `W_k` and `W_v`. It then uses the results derived from W_q and W_k to get `a_ij`, the attention weight between token i and token j in a sentence. In this paper, it first uses `W`, `a` and `two node embeddings` to compute `alpha_ij` for each node pair. Then it uses `W`, `alpha` and `all node embeddings` to get the `new node embeddings`. Is my understanding correct? I'm also curious why the paper doesn't use a different `W` in the two steps. For example, we could use 2 weight matrices `W1` and `W2`, where `W1` is used to get `alpha_ij` and `W2` is used to calculate the `new node embeddings`.
@DeepFindr
3 жыл бұрын
Hi, yes, you are right that in NLP everything is differentiated into queries, keys and values. This means that word vectors get different transformations depending on their role (input query, key to match against, and output value multiplied with the attention). In the GAT paper all node vectors are transformed with only one matrix W, so there is no differentiation between q, k and v. Additionally, the attention coefficients are calculated with a weight vector, which is not done in the transformer model (there it's the dot product). So I would say GAT uses just another flavor of attention and we cannot compare them directly; the idea is the same but the implementation is slightly different. I don't know if I understood you correctly, but W is only applied once to transform all nodes, and then there is a second weight vector to calculate a_ij. Also, there are many variants of GNNs, and some do the same separation as in NLP. For example, if you have no self-loops, you usually apply one matrix W_1 to the node itself and another matrix W_2 to its neighbors; these can be seen like q and k above. Hope that helps! If not, let me know!
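To see the two flavors side by side, a tiny sketch (random tensors stand in for learned parameters; this is only meant to contrast the scoring functions, not to reproduce either paper exactly):

```python
import torch
import torch.nn.functional as F

d = 8
h_i, h_j = torch.randn(d), torch.randn(d)       # already transformed node embeddings (W h)

# GAT-style ("additive") score: a shared weight vector applied to the concatenation
a = torch.randn(2 * d)
e_gat = F.leaky_relu(a @ torch.cat([h_i, h_j]))

# Transformer-style ("dot-product") score: separate query/key projections
W_q, W_k = torch.randn(d, d), torch.randn(d, d)
e_dot = (W_q @ h_i) @ (W_k @ h_j) / d ** 0.5
```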
@王涛-d3y
3 жыл бұрын
@@DeepFindr Yes, I think I have figured it out. Thank you very much for your detailed and clear reply.
@kenbobcorn
3 жыл бұрын
This was simply a fantastic explanation video, I really do hope this video gets more coverage than it already has. It would be fantastic if you were to explain the concept of multi-head attention in another video. You've earned yourself a subscriber +1.
@DeepFindr
3 жыл бұрын
Thank you, I appreciate the feedback! Sure, I note it down :)
@sharadkakran531
3 жыл бұрын
Hi, Can you tell which tool you're using to make those amazing visualizations? All of your videos on GNNs are great btw :)
@DeepFindr
3 жыл бұрын
Thanks a lot! Haha I use active presenter (it's free for the basic version) but I guess there are better alternatives out there. Still experimenting :)
@Eisneim1
10 ай бұрын
very helpful tutorial, clearly explained!
@jianxianghuang1275
3 жыл бұрын
I especially love your background pics.
@metehkaya96
Күн бұрын
Perfect video for understanding GATs. However, I guess you forgot to add the sigmoid function when you demonstrate h1' as a sum of the products of hi* and the attention values, in the last seconds of the video: 13:51
@牢獄プンレク
3 жыл бұрын
Amazingly easy to understand. Thank you.
@dominikklepl7991
2 жыл бұрын
Thank you for the great video. I have one question: what happens if weighted graphs are used with an attention GNN? Do you think adding the attention-learned edge "weights" will improve the model compared to just having the input edge weights (e.g. training a GCNN with weighted graphs)?
@DeepFindr
2 жыл бұрын
Hi! Yes, I think so. The fact that the attention weights are learnable makes them more powerful than static weights. The model might still want to put more attention on a node because there is valuable information in its node features, independent of the edge weight. A real-world example might be data traffic between two network nodes: if less data is sent between two nodes, you would probably assign a smaller weight to the edge, yet the information coming from one of the nodes could still be very important, so the model pays more attention to it.
@eelsayed9380
2 жыл бұрын
Great explanation, really appreciated. Could you please make a video explaining the loss calculation and backpropagation in GNNs?
@anastassiya8526
25 күн бұрын
This was the best explanation and it gave me hope of understanding these mechanisms. Everything was so well explained and depicted, thank you!
@pi5549
10 ай бұрын
2:55 Looks like it should be sum(H * W), not sum(W * H). 5x4 * 4x8 works. I suggest you provide errata at the top of the description. Someone else has noticed an error later in the video.
@dharmendraprajapat4910
Жыл бұрын
4:00 Do you multiply the "node feature matrix" with the "adjacency matrix" before multiplying it with the "learnable weight matrix"?
@adityashahane1429
2 жыл бұрын
very well explained, provides a very intuitive picture of the concept. Thanks a ton for this awesome lecture series!
@sangramkapre
2 жыл бұрын
Awesome video! Quick question: do you have a video explaining Cluster-GCN? And if yes, do you know whether a similar clustering idea can be applied to other networks (like GAT) to be able to train the model on large graphs? Thanks!
@陈肇坤
2 жыл бұрын
Good explanation of the key idea. One question: what is the difference between GAT and self-attention constrained by an adjacency matrix (e.g. Softmax(Attn*Adj))? The memory used for GAT is D*N^2, which is D times the intermediate output of self-attention, so the number of nodes in a graph used with GAT cannot be too large because of memory. But it seems that both implement dynamic weighting of neighborhood information constrained by an adjacency matrix.
@DeepFindr
2 жыл бұрын
Hi, did you have a look at the implementation in PyG? pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/nn/conv/gat_conv.html#GATConv One of the key tricks in GNNs is usually to represent the adjacency matrix in COO format, so you have adjacency lists and not an N x N matrix. Using functions like gather or index_select you can then do a masked selection of the local nodes. Hope this helps :)
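Roughly, the sparse version works like this (a simplified sketch, not the actual PyG code):

```python
import torch

# adjacency in COO format: one column per edge (source -> target)
edge_index = torch.tensor([[0, 0, 1, 2],    # source nodes
                           [1, 2, 0, 0]])   # target nodes

h = torch.randn(3, 8)                        # node embeddings [N, dim]

# gather the embeddings of both endpoints of every edge
h_src = h.index_select(0, edge_index[0])     # [num_edges, dim]
h_dst = h.index_select(0, edge_index[1])     # [num_edges, dim]

# attention scores are then computed per edge, e.g. from torch.cat([h_src, h_dst], dim=1),
# so memory scales with the number of edges rather than with N^2
```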
@PostmetaArchitect
27 күн бұрын
It's almost as if it were just a normal neural network, but projected onto a graph.
@nurkleblurker2482
2 жыл бұрын
Extremely helpful. Very well explained in concrete and abstract terms.
@GaoyuanFanboy123
Жыл бұрын
Please use brackets and multiplication signs between the matrices so I can map the mathematical formula to the visualization.
@huaiyuzheng5577
3 жыл бұрын
Very nice video. Thanks for your work~
@pu239
3 жыл бұрын
This is pretty amazing content. The way you explain the concept is pretty great and I especially like the visual style and very neat looking visuals and animations you make. Thank you!
@DeepFindr
3 жыл бұрын
Thank you for your kind words :)
@NadaaTaiyab
2 жыл бұрын
Great! Thank you for explaining the math and the linear algebra with the simple tables.
@samuel2318
2 жыл бұрын
Clear explanation and visualization on attention mechanism. Really helpful in studying GNN.
@nazarzaki44
2 жыл бұрын
Great video! Thank you
@zacklee5787
3 ай бұрын
I have come to understand attention as key, query, value multiplication/addition. Do you know why this wasn't used and if it's appropriate to call it attention?
@DeepFindr
3 ай бұрын
Hi, query/key/value is just a design choice of the transformer model; attention is a more general technique than that specific architecture. There is also a GNN transformer (look for Graphormer) that follows the query/key/value pattern. The attention mechanism is detached from this concept and is simply a way to learn importance between embeddings.
@Kevoshea
5 ай бұрын
great video, thanks
@SylwiaNano
Жыл бұрын
Thx for the awesome explanation! A video with attention in CNN e.g. UNet would be great :)
@DeepFindr
Жыл бұрын
I briefly touch on that in my video on diffusion models. I've noted it down for the future, though.
@MariaPirozhkova
Жыл бұрын
Hi! Are what you explain in the "Basics" and the message-passing concept the same things?
@DeepFindr
Жыл бұрын
Yes, they are the same thing :) Passing messages is, in the end, nothing other than multiplying with the adjacency matrix. It's just a common term to better illustrate how the information is shared :)
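In its simplest (unnormalized) form, one such message-passing step is just a couple of matrix multiplications; here is a minimal sketch (the degree normalization of the adjacency matrix used in GCN is left out):

```python
import torch

A = torch.tensor([[1., 1., 0.],    # adjacency matrix with self-loops
                  [1., 1., 1.],
                  [0., 1., 1.]])
H = torch.randn(3, 4)              # node features [N, F]
W = torch.randn(4, 8)              # learnable weight matrix [F, F']

# each node's new state is the transformed sum of its own and its neighbors' features
H_new = torch.relu(A @ H @ W)      # [N, F']
```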
@nastaranmarzban1419
2 жыл бұрын
Hi, sorry to bother you, I have a question: what's the difference between soft attention and self-attention?
@DeepFindr
2 жыл бұрын
Hi! There is soft vs. hard attention; you can search for it on Google. For self-attention there are great tutorials, such as this one: peltarion.com/blog/data-science/self-attention-video
@snsacharya1737
Ай бұрын
A wonderful and succinct explanation with crisp visualisations about both the attention mechanism and the graph neural network. The way the learnable parameters are highlighted along with the intuition (such as a weighted adjacency matrix) and the corresponding matrix operations is very well done.
@n.a.7271
2 жыл бұрын
How is the learnable weight matrix formed? Do you have some material to understand it better?
@DeepFindr
2 жыл бұрын
This simply comes from dense (fully connected) layers. There are lots of resources, for example here: analyticsindiamag.com/a-complete-understanding-of-dense-layers-in-neural-networks/#:~:text=The%20dense%20layer's%20neuron%20in,vector%20of%20the%20dense%20layer.
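Concretely, the [4, 8] weight matrix from the video is nothing more than such a dense layer; a minimal sketch (note that PyTorch stores the weight transposed, as [8, 4]):

```python
import torch
import torch.nn as nn

linear = nn.Linear(in_features=4, out_features=8, bias=False)  # randomly initialized, updated by backprop

x = torch.randn(5, 4)        # 5 nodes with 4 features each
z = linear(x)                # 5 node embeddings of size 8

print(linear.weight.shape)   # torch.Size([8, 4]) -- the learnable weight matrix
```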
@mohammadrzakarimi2140
2 жыл бұрын
Your visual explanation is great and helps many people learn in minutes what would otherwise take hours! Please make more videos on specialized GNN topics! Thanks in advance!
@DeepFindr
2 жыл бұрын
I will soon upload more GNN content :)
@ayushsaha5539
Жыл бұрын
Why does the newly calculated state have more features than the original state? I don't understand.
@DeepFindr
Жыл бұрын
It's because the output dimension (number of neurons) of the neural network is different from the input dimension. You could also have fewer features, or the same number.
@raziehrezaei3156
2 жыл бұрын
such an easy-to-grasp explanation! such a visually nice video! amazing job!
That's a good point. I think the TransformerConv is the layer that uses dot-product attention. I'm also not aware of any reason why it was implemented like that. Maybe it's because this better accounts for the direction of information (i.e. source and target nodes): the dot product is commutative, so i*j is the same as j*i, and it can't distinguish the direction of information flow. Just an idea :)
@phamtam716
2 жыл бұрын
Hello, where does sigma come from? I mean, how do we choose it?
@DeepFindr
2 жыл бұрын
Sigma is typically the activation function in the neural network like ReLU, if that is what you are referring to :)
@ilyasaroui7745
2 жыл бұрын
How do you think it will behave with complete graphs only?
@DeepFindr
2 жыл бұрын
Well, it will simply calculate attention weights with all neighbor nodes, so every node attends to all other nodes. It's a bit like the transformer, which attends to all words. This paper might also be interesting: arxiv.org/abs/2105.14491
@snp27182
2 жыл бұрын
Good video, but you should have mentioned how in NLP a sequence of words is used to build a fully connected adjacency graph. This is why attention can be used on graph data: even in NLP, it's already operating ON graph data!
@tobigm1917
7 ай бұрын
Thank you very much! This was my introduction to GAT and helped me immediately get a good grasp of the basic concept :) I like the graphical support you provide for the explanation, it's great!
@kevon217
11 ай бұрын
Great walkthrough.
@SaketRamBandi
Ай бұрын
This might be the best and simplest explanation of GAT one can find! Thanks man
@cw9249
Жыл бұрын
Thank you. What if you also wanted to have edge features?
@DeepFindr
Жыл бұрын
Hi, I have a video on how to use edge features in GNNs :)
@Ssc2969
11 ай бұрын
Fantastic explanation.
@AbleLearners
8 ай бұрын
A great explanation.
@hlew2694
9 ай бұрын
This is the BEST video on GCN and GAT, really great, thank you!
@aditijuneja1848
Жыл бұрын
Hi, your explanations are really nice and easy to understand and seem rooted in fundamentals. Thank you for that. I am new to reading research papers; I sometimes find them difficult to understand and end up wasting a lot of time on not-so-important things. At least that is what I think my problem is, but it could also be something else, like missing prerequisites or gaps in my knowledge. Could you please make a video about this, help in the comments, or recommend some other resource for getting better at reading papers and understanding them from the bottom up? Thank you very much 🙏🙏
@scaredheart6109
Ай бұрын
AMAZING!
@khoaphamang3413
2 жыл бұрын
Super explanation!
@benjamintan3069
2 жыл бұрын
I need more Graph Neural Network related videos!!
@DeepFindr
2 жыл бұрын
There will be some more in the future. Anything in particular you are interested in? :)
@טסטטסט-ג3ש
2 жыл бұрын
Very understandable! Thank you. Can you share your presentation?
@DeepFindr
2 жыл бұрын
Sure! Can you send me an email to deepfindr@gmail.com and I'll attach it :) thx
@keteverma3441
2 жыл бұрын
@@DeepFindr Hey I have also sent you an email, could you please attach the presentation?
@AkhmadMizkat
Жыл бұрын
This is a really great explanation covering basic GNNs and GAT. Thank you so much!
@daesoolee1083
2 жыл бұрын
well explained.
@mamore.
3 жыл бұрын
most understandable explanation so far!
@kanalarchis
3 жыл бұрын
At 11:30, should the denominator have k instead of j? Also, this vector w_a, is it the same vector used for all edges? There isn't a different vector to learn for each node i, right? Thank you!
@DeepFindr
3 жыл бұрын
Oh yeah, you are right, it should be k... Yes, it's a shared vector used for all edges. Thanks for catching that!
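For anyone reading along, the corrected normalization over the neighborhood of node i is:

```latex
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}
```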
@maudentable
2 жыл бұрын
Awesome.....
@alexvass
Жыл бұрын
Thanks
9 ай бұрын
Your work has been an absolute game-changer for me! The way you break down complex concepts into understandable and actionable insights is truly commendable. Your dedication to providing in-depth tutorials and explanations has tremendously helped me grasp the intricacies of GNNs. Keep up the phenomenal work!
@mahmoudebrahimkhani1384
10 ай бұрын
simple and informative! Thank you!
@wenqichen4151
3 жыл бұрын
I really salute you for this detailed video! that's very intriguing and clear! thank you again!
@kodjigarpp
3 жыл бұрын
Thank you for sharing this clear and well-designed explanation.
@hyeongseonpark7018
3 жыл бұрын
Very Helpful Explanation! Thank you!
@mbzf2773
2 жыл бұрын
Thank you so much for this great video.
@salahaldeen1751
Жыл бұрын
Wonderful explanation! Thanks.
@hainingliu3471
Жыл бұрын
Very clear explanation. Thank you!
@البداية-ذ1ذ
3 жыл бұрын
Hello, thanks for sharing. Could you please explain how you get the learnable weight matrix? Is the matrix chosen randomly, or is there a method behind it, and is it equivalent to the Laplacian method? One more question: your embedding is only on the node level, right?
@DeepFindr
3 жыл бұрын
Hi, the learnable weight matrix is randomly initialized and then updated through backpropagation. It's just a classical fully connected neural network layer. Yes, the embedding is on the node level :)
@王硕-s3m
2 жыл бұрын
Very helpful video! Thank you for your great work! Two questions: 1. Could you please explain the Laplacian matrix in GCN? The GNN explained in this video is spatial-based, and I hope to get a better understanding of the spectral-based ones. 2. How do you draw those beautiful pictures? Could you share the source files? Thanks again!
@DeepFindr
2 жыл бұрын
Hi! The Laplacian is simply the degree matrix of the graph minus the adjacency matrix. Is there anything in particular you are interested in? :) My presentations are typically a mix of PowerPoint and Active Presenter, so I can send you the slides. For that, please send an email to deepfindr@gmail.com :)
@AndreaStevensKarnyoto
3 жыл бұрын
Very helpful video, but I'm still confused about some parts. Maybe I should watch it a few times. Thanks!
@DeepFindr
3 жыл бұрын
Hi! What is unclear to you? :)
@barondra38
3 жыл бұрын
Love your work and thick accent, thank you! These attention coefficients look very similar to weighted edges to me, so I want to ask a question: if my graph is an unweighted attributed graph, would GATConv produce a different output compared with GCNConv by Kipf and Welling?
@DeepFindr
3 жыл бұрын
Hahah, thanks! I'm not sure if I understood the question correctly. If you have an unweighted graph, GAT will still learn the attention coefficients (which can be seen as edge weights) based on the embeddings; they act as "learnable" edge weights. So I'm pretty sure that GATConv and GCNConv will produce different outputs. In my experience, the output embeddings obtained with the attention mechanism are better than those from plain GCN.
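If you want to try the comparison yourself, here is a minimal PyTorch Geometric sketch (toy graph, untrained layers, just to show that the two convolutions are drop-in alternatives):

```python
import torch
from torch_geometric.nn import GATConv, GCNConv

edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])   # unweighted edges in COO format
x = torch.randn(3, 4)                       # 3 nodes with 4 features each

gcn = GCNConv(4, 8)     # fixed, degree-based neighbor weighting
gat = GATConv(4, 8)     # attention coefficients learned from the embeddings

out_gcn = gcn(x, edge_index)
out_gat = gat(x, edge_index)                # generally different embeddings, even on an unweighted graph
```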
@amansah6615
2 жыл бұрын
Easy and excellent explanation, nice work.
@anvuong1099
2 жыл бұрын
Thank you for the wonderful content.
@farzinhaddadpour7192
Жыл бұрын
Very nice, thanks for the effort!
@bennicholl7643
2 жыл бұрын
How is the adjacency matrix derived?
@DeepFindr
2 жыл бұрын
Hi, what exactly do you mean by derived? :)
@bennicholl7643
2 жыл бұрын
@@DeepFindr What criteria decide which feature vectors are zeroed out?
@DeepFindr
2 жыл бұрын
This depends on the input graph. For the molecule, it's simply the atoms that are not connected to a specific atom. All nodes that are not connected to a specific node have a 0 in the corresponding adjacency matrix entries.
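As a toy example (a made-up three-atom molecule, just to show how the adjacency matrix follows from the bond list):

```python
import torch

# atoms 0, 1, 2 with bonds 0-1 and 1-2
num_atoms = 3
bonds = [(0, 1), (1, 2)]

adj = torch.zeros(num_atoms, num_atoms)
for i, j in bonds:
    adj[i, j] = 1.0
    adj[j, i] = 1.0          # undirected: bonds go both ways

# atoms that share no bond with atom k keep a 0 in row/column k
```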
@sqliu9489
2 жыл бұрын
Thanks for the video! I have a question: at 13:03, I think the 'adjacency matrix' consisting of {e_ij} could be symmetric, but after the softmax operation the 'adjacency matrix' consisting of {α_ij} should not be symmetric any more. Is that right?
@DeepFindr
2 жыл бұрын
Yes usually the attention weights do not have to be symmetric. Is that what you mean? :)
@sqliu9489
2 жыл бұрын
@@DeepFindr Yes. Thanks for your reply!
@Bwaaz
7 ай бұрын
Great quality thank you !
@sukantabasu
7 ай бұрын
Simply exceptional!
@geletamekonnen2323
2 жыл бұрын
Thank you bro. My confused head now gets the idea behind GNNs.
@DeepFindr
2 жыл бұрын
Hehe
@mydigitalwayia956
2 жыл бұрын
Thank you very much for the video. After having watched many others, I can say that yours is the best and the easiest to understand. I am very grateful to you. Regards
Comments: 179