PyTorch implementation of sinusoidal position encoding

There are existing sinusoidal position encoding modules out there, but the ones I came across mostly assume that positions simply increment from 0 to the sequence length. For example, given a token embedding sequence of shape (B, L, D_token), the sinusoidal position encoding module takes this tensor as input, internally creates a (B, L) tensor where each row is (0, 1, 2, 3, …, L-1), and then applies the sinusoidal encoding to it.
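As a rough sketch of what such a module typically looks like (the class and argument names here are illustrative, not taken from the post, and d_model is assumed even):

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionEncoding(nn.Module):
    """Classic sinusoidal encoding that assumes positions 0, 1, ..., L-1."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)                      # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)                       # even dims
        pe[:, 1::2] = torch.cos(position * div_term)                       # odd dims
        self.register_buffer("pe", pe)                                     # (max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, D_token); positions are implicitly 0..L-1, read off from x's length
        return x + self.pe[: x.size(1)].unsqueeze(0)

# usage: SinusoidalPositionEncoding(d_model=512)(torch.randn(2, 10, 512))
```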

(more…)

paper review: “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”

arxiv: https://arxiv.org/abs/1910.13461

key points:

- Propose an autoregressive model named BART, which is architecturally similar to a standard Transformer encoder + decoder
- Check out 5 pretraining tasks and experiment to see which pretraining task is most helpful
- Test BART's performance with large-scale pretraining on downstream tasks

Read more…
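For a flavor of those pretraining (noising) tasks, one of them is text infilling: spans of tokens are replaced with a single mask token, with span lengths drawn from a Poisson(λ=3) distribution. A toy sketch follows (the whitespace tokenization, the mask_ratio value, and the helper name are my simplifications, not the paper's exact procedure):

```python
import random
import numpy as np

MASK = "<mask>"

def text_infilling(tokens, mask_ratio=0.3, poisson_lam=3.0):
    """Toy text-infilling noise: collapse random spans into a single <mask> token."""
    tokens = list(tokens)
    n_to_mask = int(len(tokens) * mask_ratio)               # how many tokens to corrupt in total
    masked = 0
    while masked < n_to_mask and len(tokens) > 1:
        span = max(1, int(np.random.poisson(poisson_lam)))  # span length ~ Poisson(3)
        span = min(span, n_to_mask - masked, len(tokens))
        start = random.randrange(0, len(tokens) - span + 1)
        tokens[start:start + span] = [MASK]                  # the whole span becomes one mask token
        masked += span
    return tokens

# >>> text_infilling("the quick brown fox jumps over the lazy dog".split())
# ['the', 'quick', '<mask>', 'jumps', 'over', 'the', 'lazy', 'dog']   # one possible output
```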