In the previous article, we calculated the similarities between Queries and Keys.
We can use the output of the softmax function to determine how much each input word should contribute when encoding the word “Let’s”.
## Interpreting the Weights
In this case, “Let’s” is much more similar to itself than to “go”.
So after applying softmax:
- “Let’s” gets a weight close to 1 (100%)
- “go” gets a weight close to 0 (0%)
This means:
- “Let’s” contributes almost entirely to its own encoding
- “go” contributes very little
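The weighting above can be sketched with a plain softmax. The similarity scores here are made-up numbers for illustration, not values from the article:

```python
import math

def softmax(scores):
    # Exponentiate each score, then normalize so the weights sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical similarity scores: "Let's" vs. itself, then "Let's" vs. "go".
scores = [4.0, -2.0]
weights = softmax(scores)
print(weights)  # first weight is close to 1, second is close to 0
```

Because softmax exaggerates differences between scores, even a moderate gap in similarity produces weights near 1 and 0.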
## Creating Value Representations
To apply these weights, we create another set of values for each word.
- First, we create two values to represent “Let’s”
- Then, we scale these values by 1 (since its weight is 100%)
- Next, we create two values to represent “go”
- These values are scaled by 0 (since its weight is 0%)
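The scaling step above can be sketched as follows. The Value vectors and weights are assumed numbers for illustration (in a real transformer, the values come from multiplying each word's embedding by a learned Value weight matrix):

```python
# Hypothetical 2-dimensional Value vectors, one per word.
value_lets = [1.5, -0.3]   # the two values representing "Let's"
value_go   = [0.8, 2.1]    # the two values representing "go"

# Attention weights from the softmax step (assumed: roughly 1 and 0).
w_lets, w_go = 0.998, 0.002

# Scale each Value vector by its word's attention weight.
scaled_lets = [w_lets * v for v in value_lets]  # barely changed
scaled_go   = [w_go * v for v in value_go]      # shrunk toward zero
```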
## Combining the Values
Finally, we add the scaled values together:
The result is a new set of values that represent the word “Let’s”, now enriched by its relationship with all input words.
These final values are called the self-attention values for “Let’s”.
They combine information from all words in the sentence, weighted by how relevant each word is to “Let’s”.
We can now repeat the same process for the word “go”, which we will explore in the next article.
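Putting the whole pipeline together, here is a minimal sketch of computing the self-attention values for one word. All scores and Value vectors are assumed toy numbers, not values from the article:

```python
import math

def softmax(scores):
    # Convert raw similarity scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical similarity scores and 2-dimensional Value vectors.
scores = [4.0, -2.0]                  # "Let's" vs. itself, "Let's" vs. "go"
values = [[1.5, -0.3], [0.8, 2.1]]    # Value vectors for "Let's" and "go"

weights = softmax(scores)

# Weighted sum: scale each word's Value vector by its attention weight,
# then add the scaled vectors element-wise.
attention_lets = [
    sum(w * v[dim] for w, v in zip(weights, values))
    for dim in range(2)
]
print(attention_lets)  # dominated by the Value vector for "Let's"
```

Because the weight for “go” is near zero, the result is almost identical to the Value vector for “Let's”, which is exactly what the text describes.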
Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.
Just run:

```
ipm install repo-name
```

… and you’re done!
















