
Addressing token uniformity in transformers via singular value transformation
Yan, Hanqi, Gui, Lin, Li, Wenjie and He, Yulan (2022) Addressing token uniformity in transformers via singular value transformation. In: Uncertainty in Artificial Intelligence, Eindhoven, The Netherlands, 01-05 Aug 2022. Published in: Proceedings of Machine Learning Research, 180 pp. 2181-2191. ISSN 2640-3498.
PDF: WRAP-Addressing-token-uniformity-transformers-via-singular-transformation-22.pdf (Accepted Version, 4046Kb)
Official URL: https://proceedings.mlr.press/v180/yan22b.html
Abstract
Token uniformity is commonly observed in transformer-based models: after passing through multiple stacked self-attention layers, different tokens come to share a large proportion of similar information. In this paper, we propose to use the distribution of singular values of the outputs of each transformer layer to characterise the phenomenon of token uniformity, and we empirically illustrate that a less skewed singular value distribution can alleviate the token uniformity problem. Based on our observations, we define several desirable properties of singular value distributions and propose a novel transformation function for updating the singular values. We show that, apart from alleviating token uniformity, the transformation function should preserve the local neighbourhood structure in the original embedding space. Our proposed singular value transformation function is applied to a range of transformer-based language models, including BERT, ALBERT, RoBERTa and DistilBERT, and improved performance is observed in semantic textual similarity evaluation and a range of GLUE tasks.
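The link the abstract draws between a skewed singular value distribution and token uniformity can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration: the toy matrix `H` stands in for one layer's output, the helper names are hypothetical, and the power-law rescaling in `flatten_spectrum` is a simple stand-in, not the transformation function proposed in the paper (whose form is not given in this record).

```python
import numpy as np


def mean_pairwise_cosine(H):
    """Average cosine similarity between distinct rows (tokens) of H."""
    unit = H / np.linalg.norm(H, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = H.shape[0]
    # Exclude the diagonal (each token's similarity with itself).
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


def top_singular_share(H):
    """Fraction of the total singular-value mass held by the largest value."""
    s = np.linalg.svd(H, compute_uv=False)
    return s[0] / s.sum()


def flatten_spectrum(H, power=0.5):
    """Illustrative stand-in only: compress the spectrum with s -> s**power.

    This is NOT the paper's transformation; it is just one simple way to
    make the singular value distribution less skewed.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U * s**power) @ Vt


rng = np.random.default_rng(0)

# Toy "layer output": 16 tokens x 32 dimensions, dominated by one direction
# shared by every token, which mimics token uniformity.
shared = rng.normal(size=(1, 32))
H = 5.0 * shared + rng.normal(size=(16, 32))

H_flat = flatten_spectrum(H)

print("top singular share:", top_singular_share(H), "->", top_singular_share(H_flat))
print("mean pairwise cosine:", mean_pairwise_cosine(H), "->", mean_pairwise_cosine(H_flat))
```

In this toy setting, compressing the spectrum lowers both the share of the largest singular value and the average cosine similarity between tokens. The sketch does not check the second property the abstract asks for, namely that local neighbourhood structure in the original embedding space is preserved.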
Item Type: | Conference Item (Paper)
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Natural language processing (Computer science) -- Congresses, Modeling languages (Computer science) -- Congresses, Machine learning -- Congresses
Journal or Publication Title: | Proceedings of Machine Learning Research
Publisher: | ML Research Press
ISSN: | 2640-3498
Official Date: | 1 August 2022
Volume: | 180
Page Range: | pp. 2181-2191
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Restricted or Subscription Access
Copyright Holders: | © The authors and PMLR 2022. MLResearchPress
Date of first compliant deposit: | 5 September 2022
Date of first compliant Open Access: | 5 September 2022
Conference Paper Type: | Paper
Title of Event: | Uncertainty in Artificial Intelligence
Type of Event: | Conference
Location of Event: | Eindhoven, The Netherlands
Date(s) of Event: | 01-05 Aug 2022