On the convergence of encoder-only shallow transformers
Wu, Yongtao, Liu, Fanghui, Chrysos, Grigorios and Cevher, Volkan (2023) On the convergence of encoder-only shallow transformers. In: Thirty-seventh Conference on Neural Information Processing Systems, New Orleans, USA, 10-16 Dec 2023
PDF: WRAP-On-the-convergence-encoder-only-shallow-transformers-23.pdf (Accepted Version, 1227Kb)
Official URL: https://openreview.net/forum?id=8ZveVHfmIE
Abstract
In this paper, we aim to build the global convergence theory of encoder-only shallow Transformers under a realistic setting from the perspective of architectures, initialization, and scaling under a finite width regime. The difficulty lies in how to tackle the softmax in the self-attention mechanism, the core ingredient of the Transformer. In particular, we diagnose the scaling scheme, carefully tackle the input/output of softmax, and prove that quadratic overparameterization is sufficient for global convergence of our shallow Transformers under the He/LeCun initialization commonly used in practice. Besides, a neural tangent kernel (NTK) based analysis is also given, which facilitates a comprehensive comparison. Our theory demonstrates the separation in the importance of different scaling schemes and initialization. We believe our results can pave the way for a better understanding of modern Transformers, particularly on training dynamics.
| Item Type: | Conference Item (Paper) |
|---|---|
| Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software; T Technology > TK Electrical engineering. Electronics Nuclear engineering |
| Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science |
| Library of Congress Subject Headings (LCSH): | Electric transformers, Convergence, Neural networks (Computer science) |
| Official Date: | 10 December 2023 |
| Status: | Peer Reviewed |
| Publication Status: | Published |
| Date of first compliant deposit: | 8 November 2023 |
| Date of first compliant Open Access: | 8 November 2023 |
| Conference Paper Type: | Paper |
| Title of Event: | Thirty-seventh Conference on Neural Information Processing Systems |
| Type of Event: | Conference |
| Location of Event: | New Orleans, USA |
| Date(s) of Event: | 10-16 Dec 2023 |