Breaking Down Transformers: The Core of Attention as Low-Rank Approximation

Why attention? How does it work?