I received my B.E. degree in Computer Science and Technology from Anhui University in 2021. I am currently pursuing a Ph.D. degree in Computer Science and Technology at Southeast University.
As a sophomore at Anhui University, I joined the Intelligent Software and Edge Computing (ISEC) lab to conduct research under the guidance of Prof. Xuejun Li and Prof. Yi Xu. My research interests include Edge Computing and Edge Intelligence. Starting this early gave me more extensive research experience than most of my peers. My current supervisor is Prof. Fang Dong at Southeast University.
BEng in Computer Science and Technology, 2021
Anhui University
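Flash Attention is the pioneering work behind today's best-performing solutions for Attention computation. It aims to reduce memory access overhead as much as possible by working at the level of the GPU's HBM (High Bandwidth Memory) and its on-chip SRAM (Static Random Access Memory), thereby accelerating Attention computation, and it performs especially well on long sequences.
However, Flash Attention is hard for LLM beginners to understand, because it requires a very deep understanding of how Attention is computed. The main difficulty lies in understanding that the Softmax computation can be split into blocks. This post uses plenty of illustrations, and even animations, to make Flash Attention easy to follow.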
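To give a first feel for why Softmax can be split into blocks, here is a minimal sketch in plain Python. It is not the Flash Attention kernel itself: it only shows that a running maximum and a rescaled running sum, updated one block at a time, yield the same result as computing Softmax over the whole vector at once. The function names `softmax` and `blockwise_softmax` and the `block_size` parameter are illustrative assumptions, and the input is assumed to be non-empty.

```python
import math


def softmax(xs):
    """Reference Softmax over the whole list (numerically stable)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def blockwise_softmax(xs, block_size):
    """Softmax whose normalizer is accumulated one block at a time.

    Hypothetical helper for illustration: only the running maximum and
    the running sum are carried from one block to the next.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(xs), block_size):
        block = xs[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the sum accumulated so far to the new maximum,
        # then add this block's contribution.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(x - new_max) for x in block
        )
        running_max = new_max
    # Final normalization uses the global maximum and global sum.
    return [math.exp(x - running_max) / running_sum for x in xs]


scores = [0.5, 2.0, -1.0, 3.5, 0.0, 1.25, 4.0]
full = softmax(scores)
blocked = blockwise_softmax(scores, block_size=3)
```

Here `full` and `blocked` agree up to floating-point rounding, whatever `block_size` is chosen. The rescaling factor `exp(running_max - new_max)` is what lets earlier blocks be corrected after a larger value shows up in a later block.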