TRAILS
Paper-Conference
Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
Initially introduced as a machine translation model, the Transformer architecture has now become the foundation for modern deep …
Martin Courtois, Malte Ostendorff, Leonhard Hennig, Georg Rehm