
A Dive Into Multihead Attention, Self-Attention and Cross-Attention

Posted by admin
In this video, I will first give a recap of Scaled Dot-Product Attention, and then dive into Multihead Attention. After that, we will see two different ways of using the attention mechanism: Self-Attention and Cross-Attention.

Solution of the exercise: We have X: T1×d and Y: T2×d. We build Q from Y, so Q: T2×d. We build K and V from X, therefore K: T1×d and V: T1×d. Then the compatibility matrix QK^T has shape T2×T1, and the final output is Z = Softmax(QK^T / sqrt(d)) V, with Z: T2×d.
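The shape bookkeeping of the exercise can be checked numerically. Below is a minimal sketch of single-head cross-attention in NumPy, assuming hypothetical dimensions T1, T2, d and simple d×d projection matrices (none of these names come from the video; they are illustration only):

```python
import numpy as np

# Hypothetical dimensions: X has T1 tokens, Y has T2 tokens, embedding size d.
T1, T2, d = 5, 3, 8
rng = np.random.default_rng(0)

X = rng.standard_normal((T1, d))  # source sequence: provides keys and values
Y = rng.standard_normal((T2, d))  # target sequence: provides queries

# Learned projections (assumed square d x d here for simplicity).
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))
W_v = rng.standard_normal((d, d))

Q = Y @ W_q  # (T2, d): queries built from Y
K = X @ W_k  # (T1, d): keys built from X
V = X @ W_v  # (T1, d): values built from X

# Compatibility matrix QK^T scaled by 1/sqrt(d): shape (T2, T1).
scores = Q @ K.T / np.sqrt(d)

# Row-wise softmax (numerically stabilized by subtracting the row max).
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Final output Z = Softmax(QK^T / sqrt(d)) V: shape (T2, d).
Z = weights @ V

print(scores.shape)  # (3, 5), i.e. (T2, T1)
print(Z.shape)       # (3, 8), i.e. (T2, d)
```

With Y as the query source and X as the key/value source this is cross-attention; setting Y = X recovers self-attention, where all shapes collapse to T1.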
Posted July 9, 2023
