paper review: “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”

arxiv: https://arxiv.org/abs/1910.13461

Key points

- Proposes BART, a denoising sequence-to-sequence model with an autoregressive decoder, architecturally similar to a standard Transformer encoder + decoder.
- Examines five pretraining (noising) tasks and runs experiments to determine which is most helpful (a sketch of these appears below).
- Tests BART's performance with large-scale pretraining on downstream tasks.

Model Architecture

This work introduces BART, which is fundamentally a standard sequence-to-sequence Transformer: a bidirectional encoder paired with a left-to-right autoregressive decoder.
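
As noted in the key points, the paper compares five noising (pretraining) transformations: token masking, token deletion, text infilling, sentence permutation, and document rotation. Below is a minimal sketch of what these corruptions look like; it assumes whitespace tokenization and a hypothetical `<mask>` symbol, and is not the authors' implementation (which operates on subword tokens):

```python
# Minimal sketch (not the authors' code) of BART's five noising transformations.
# For illustration it works on whitespace-split tokens; the real model uses
# subword tokens and sentence boundaries from the training corpus.
import random

import numpy as np

MASK = "<mask>"


def token_masking(tokens, p=0.15):
    # Replace random tokens with <mask>, as in BERT.
    return [MASK if random.random() < p else t for t in tokens]


def token_deletion(tokens, p=0.15):
    # Delete random tokens; the model must also infer which positions are missing.
    return [t for t in tokens if random.random() >= p]


def text_infilling(tokens, lam=3.0, p=0.15):
    # Replace spans of tokens (lengths drawn from Poisson(lam)) with a single <mask>.
    # Simplification: zero-length spans (mask insertion in the paper) are rounded up to 1.
    out, i = [], 0
    while i < len(tokens):
        if random.random() < p:
            span = max(1, int(np.random.poisson(lam)))
            out.append(MASK)
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out


def sentence_permutation(text):
    # Shuffle the order of sentences (split here on full stops for simplicity).
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(sentences) + "."


def document_rotation(tokens):
    # Pick a token uniformly at random and rotate the document to start with it.
    start = random.randrange(len(tokens))
    return tokens[start:] + tokens[:start]


if __name__ == "__main__":
    doc = "BART is trained by corrupting documents and reconstructing the original text".split()
    print(text_infilling(doc))
```

In pretraining, the corrupted text is fed to the bidirectional encoder, and the decoder is trained to reconstruct the original document autoregressively.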