From course:

Intro to AI 2

Question:

Transformer - Multi-Head Attention

Author: Christian N



Answer:

1) Concatenate all the attention heads. 2) Multiply by a weight matrix W^o that was trained jointly with the model. 3) The result is the Z matrix, which captures information from all the attention heads. We can send this forward to the FFNN.
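The three steps in the answer can be sketched in NumPy. This is a minimal illustration, not the full attention computation: the per-head outputs and W^o are random placeholders, and the dimensions (8 heads, head size 64) are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical dimensions for illustration only
num_heads, seq_len, d_head = 8, 10, 64
d_model = num_heads * d_head

rng = np.random.default_rng(0)
# Stand-ins for the per-head attention outputs Z_0 ... Z_7,
# each of shape (seq_len, d_head)
heads = [rng.standard_normal((seq_len, d_head)) for _ in range(num_heads)]
# Stand-in for the output projection W^o (trained jointly with the model)
W_o = rng.standard_normal((d_model, d_model))

# 1) Concatenate all the attention heads along the feature axis
concat = np.concatenate(heads, axis=-1)   # shape: (seq_len, d_model)

# 2) Multiply by the weight matrix W^o
Z = concat @ W_o                          # shape: (seq_len, d_model)

# 3) Z captures information from all heads and is sent to the FFNN
print(Z.shape)  # (10, 512)
```

Each row of Z corresponds to one position in the sequence and is what the position-wise feed-forward network receives next.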

