PyTorch implementation of sinusoidal position encoding

There are existing sinusoidal position encoding modules out there, but the ones I came across mostly assume that positions increment from 0 up to the sequence length. For example, given a token embedding sequence of shape (B, L, D_token), the module takes this tensor as input, internally creates a (B, L) tensor whose rows are (0, 1, 2, ..., L-1), and applies the sinusoidal encoding to those positions.
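
To make this concrete, here is a minimal sketch of the kind of module described above (the class name `SinusoidalPositionEncoding` and the `max_len` parameter are my own choices, not from any particular library). Note how the positions 0..L-1 are baked in at construction time, which is exactly the assumption being pointed out:

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionEncoding(nn.Module):
    """Typical sinusoidal PE module: positions are implicitly fixed to 0..L-1."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        # Positions are hard-coded to 0, 1, ..., max_len-1 up front.
        position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        # Frequency terms 1/10000^(2i/d_model); assumes d_model is even.
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float)
            * (-math.log(10000.0) / d_model)
        )  # (d_model/2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims get sin
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims get cos
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, d_model). The first L rows of the precomputed table are
        # added, i.e. positions 0..L-1 regardless of the tokens' true positions.
        return x + self.pe[:, : x.size(1)]


# Usage: the module never sees explicit positions, it just counts from 0.
pe = SinusoidalPositionEncoding(d_model=512)
x = torch.randn(2, 10, 512)  # (B, L, D_token)
out = pe(x)                  # positions 0..9 assumed implicitly
```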
