ELMo & Transformer & BERT

Deep contextualized word representations

Abstract

The representations model both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).

Introduction

ELMo representations are deep, in the sense that they are a function of all of the internal layers of the biLM.
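To make "a function of all of the internal layers" concrete, here is a minimal NumPy sketch of how ELMo collapses the biLM layer outputs into one vector per token: a softmax-normalized weighted sum over layers, scaled by a task-specific factor gamma. The function and variable names are illustrative, not from the paper's released code.

```python
import numpy as np

def elmo_combine(layer_outputs, scalar_weights, gamma):
    """Collapse all biLM layer outputs into one ELMo vector per token.

    layer_outputs: shape (L+1, seq_len, dim) -- the token layer plus L biLSTM layers.
    scalar_weights: one raw score per layer, softmax-normalized below.
    gamma: task-specific scale factor.
    """
    s = np.exp(scalar_weights - np.max(scalar_weights))
    s = s / s.sum()                               # softmax over layers
    # weighted sum over the layer axis, then scale by gamma
    return gamma * np.einsum("l,lsd->sd", s, layer_outputs)

layers = np.random.randn(3, 5, 8)   # 2-layer biLM + token layer; 5 tokens, dim 8
vecs = elmo_combine(layers, np.zeros(3), gamma=1.0)
```

With all scalar weights equal (as above), the softmax is uniform and the result is simply the mean over layers; in practice the weights are learned per downstream task.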

Attention Is All You Need

Transformer PyTorch implementation
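The core computation any such implementation performs is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula (not the linked PyTorch code, which would use `torch` tensors and batched multi-head projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core Transformer operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

Q = np.random.randn(4, 16)   # 4 queries, d_k = 16
K = np.random.randn(6, 16)   # 6 keys
V = np.random.randn(6, 16)   # 6 values
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value vectors, so every row of the attention weights sums to 1.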

Transformer explained with animations