Properly setting dataloader and callback for validation in pytorch DDP

PyTorch DistributedDataParallel (DDP) is very useful and relatively well documented for creating a distributed training setup. However, the available documentation and tutorials mostly cover the “training” part and say little about validation callbacks that run during training.

It is easy to think that just using DistributedSampler for the validation dataloader would do all the work for you, as it does for the training dataloader, but it doesn’t. There are two main problems.
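One behaviour worth knowing when reusing DistributedSampler for validation is that, by default, it pads the index list so every process receives the same number of samples. The sketch below illustrates that padding in a single process. The dataset size and replica count are made up for illustration, and this is not necessarily one of the two problems the post goes on to describe.

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Hypothetical validation set with 5 samples, split across 2 processes.
dataset = TensorDataset(torch.arange(5))

# Passing num_replicas and rank explicitly means no process group has to be
# initialized, so this runs as a plain single-process script.
per_rank_indices = [
    list(DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False))
    for rank in range(2)
]
total_seen = sum(len(indices) for indices in per_rank_indices)

print(per_rank_indices)  # index 0 shows up on both ranks
print(total_seen)        # more than len(dataset) because of the padding
```

If every rank averages its own metric and the results are then averaged, the padded sample is counted twice, so a metric computed this way can differ slightly from the single-process value.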

pytorch implementation of sinusoidal position encoding

There are existing sinusoidal position encoding modules out there, but the ones I came across mostly assume that the position increments from 0 to the length of the sequence. For example, when a token embedding sequence of shape (B, L, D_token) is given, the sinusoidal position encoding module takes this tensor as input, internally creates a (B, L) tensor in which every row is (0, 1, 2, 3, …, L-1), and then applies the sinusoidal encoding to it.
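A minimal sketch of the alternative, assuming the caller supplies the (B, L) position tensor directly instead of having the module generate 0..L-1. The class name, the `max_period` parameter, and the choice to concatenate the sine and cosine halves (rather than interleave them) are assumptions made for illustration, not the post's actual code.

```python
import math

import torch
from torch import nn


class SinusoidalPositionEncoding(nn.Module):
    """Encodes arbitrary position values rather than assuming 0..L-1."""

    def __init__(self, dim, max_period=10000.0):
        super().__init__()
        if dim % 2 != 0:
            raise ValueError("dim must be even")
        # Frequencies 1 / max_period^(2i/dim) for i = 0 .. dim/2 - 1.
        exponent = torch.arange(0, dim, 2, dtype=torch.float32) / dim
        self.register_buffer("inv_freq", torch.exp(-math.log(max_period) * exponent))

    def forward(self, positions):
        # positions: (B, L) tensor holding any position values.
        angles = positions.unsqueeze(-1).float() * self.inv_freq  # (B, L, dim/2)
        # First half of the channels is sine, second half is cosine.
        return torch.cat([angles.sin(), angles.cos()], dim=-1)  # (B, L, dim)


encoder = SinusoidalPositionEncoding(dim=8)
positions = torch.tensor([[0, 5, 7], [3, 3, 100]])  # not a simple 0..L-1 range
encoding = encoder(positions)
print(encoding.shape)
```

Because the positions are an input, repeated or non-contiguous positions are encoded consistently: equal positions always map to equal vectors.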

creating your own pytorch scheduler

Here is an example of a scheduler that I subclassed in PyTorch.
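As an illustration of the general pattern, a custom scheduler can subclass `_LRScheduler` and override `get_lr`. The linear warmup rule, the class name, and the `warmup_steps` parameter below are hypothetical choices for this sketch and are not the scheduler from the post.

```python
import torch
from torch.optim.lr_scheduler import _LRScheduler


class LinearWarmupScheduler(_LRScheduler):
    """Hypothetical scheduler: ramps the lr linearly, then holds it constant."""

    def __init__(self, optimizer, warmup_steps, last_epoch=-1):
        # Attributes used by get_lr must exist before the base __init__ runs,
        # because the base class performs an initial step during construction.
        self.warmup_steps = warmup_steps
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # self.last_epoch counts how many times step() has been called.
        scale = min(1.0, (self.last_epoch + 1) / self.warmup_steps)
        return [base_lr * scale for base_lr in self.base_lrs]


model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = LinearWarmupScheduler(optimizer, warmup_steps=4)

lrs = []
for _ in range(5):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()   # step the optimizer before the scheduler
    scheduler.step()
print(lrs)
```

The base class stores each parameter group's starting rate in `base_lrs`, so `get_lr` only has to return one scaled value per group.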

error fix: onnxruntime “Type Error: Type ‘tensor(int64)’ of input parameter of operator(Min) in node is invalid”

Background: I was trying to convert a BERT-like PyTorch model from torch to ONNX and see if I could run it in onnxruntime. Here’s the environment I used: python 3.9, torch 1.9.0, onnx 1.10.2, onnxruntime-gpu 1.9.0. I converted my PyTorch model to ONNX with the following line, then I tried to Read more…

good summary on torch optim schedulers

is it okay to calculate loss in cpu and backpropagate in torch?

When using torch, it is common to run the network on the GPU, but I was never quite sure whether I had to calculate the loss on the same GPU as the output at all times. If I can do the loss calculation on the CPU, then it would help Read more…
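A small sketch of the question, assuming a toy linear model and random data (both hypothetical). Autograd records device transfers such as `.cpu()`, so a loss computed on the CPU can still backpropagate into parameters that live on another device. The script falls back to the CPU when no GPU is present, in which case the transfer is a no-op.

```python
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 2).to(device)
inputs = torch.randn(8, 4, device=device)
targets = torch.randint(0, 2, (8,))  # labels are kept on the CPU

outputs = model(inputs)                        # forward pass on `device`
loss = F.cross_entropy(outputs.cpu(), targets)  # loss computed on the CPU
loss.backward()                                # gradients flow back through .cpu()

grad_device = model.weight.grad.device
print(loss.device, grad_device)
```

The gradients end up on the same device as the parameters. Whether this is worthwhile depends on the cost of copying the outputs to the CPU on every step.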

torchvision fakedata example

torchvision supports various datasets, and one of them is the FakeData dataset. I was curious what it actually generates, and here is an example. The result is just a random RGB noise image.

cross entropy loss / focal loss implementation in pytorch

At the moment, the code is written for torch 1.4. Binary cross entropy loss: currently, torch 1.6 is out and, according to the PyTorch docs, the torch.max function can receive two tensors and return element-wise max values. However, in 1.4 this feature is not yet supported and that is Read more…
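For reference, a binary focal loss can be sketched on top of the built-in binary cross entropy. This is an illustrative version written against a recent torch release, not the torch 1.4 code from the post. The function name and the default `gamma` are assumptions, and the optional class-balancing weight is omitted.

```python
import torch
import torch.nn.functional as F


def binary_focal_loss(logits, targets, gamma=2.0):
    """Hypothetical helper: mean binary focal loss computed from raw logits."""
    # Per-element BCE equals -log(p_t), where p_t is the probability
    # the model assigns to the true class.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)
    # Down-weight well-classified examples by (1 - p_t)^gamma.
    return ((1.0 - p_t) ** gamma * bce).mean()


logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

plain_bce = F.binary_cross_entropy_with_logits(logits, targets)
focal = binary_focal_loss(logits, targets, gamma=2.0)
print(plain_bce.item(), focal.item())
```

With `gamma=0` the modulating factor is 1 and the function reduces to ordinary binary cross entropy, which makes a convenient sanity check.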

check the cuda version the torch package was built against and find the cudnn version used by torch

Use the following python snippet to check the cuda version the torch package was built against. Use the following python snippet to check the cudnn version used by torch.
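Both values can be read from the installed torch package. The sketch below uses `torch.version.cuda` and `torch.backends.cudnn.version()`. On a CPU-only build, both are expected to be `None`.

```python
import torch

# CUDA version the torch binary was built against (a string, or None on CPU-only builds).
cuda_version = torch.version.cuda

# cuDNN version torch is using (an integer, or None when cuDNN is unavailable).
cudnn_version = torch.backends.cudnn.version()

print("torch:", torch.__version__)
print("built against CUDA:", cuda_version)
print("cuDNN:", cudnn_version)
```

Note that this reports the CUDA version torch was compiled with, which may differ from the CUDA toolkit installed on the system.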

calculating gradient for selected tensors in pytorch

The above is example code showing how to calculate gradients for a few selected tensors. In this case, I only wanted to calculate the gradient of conv2.weight so that I could later update only this weight by an amount calculated from the gradient produced by Read more…
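One way to sketch this idea is with `torch.autograd.grad`, which returns gradients only for the tensors it is given. The two-layer model, the dummy loss, and the manual update step below are hypothetical stand-ins, not the post's original code.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical model with two conv layers named conv1 and conv2."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(4, 2, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv2(torch.relu(self.conv1(x)))


model = TinyNet()
loss = model(torch.randn(2, 1, 8, 8)).pow(2).mean()  # dummy loss

# Only conv2.weight is listed, so only its gradient is returned.
# Unlike loss.backward(), this does not populate .grad on any parameter.
(conv2_grad,) = torch.autograd.grad(loss, [model.conv2.weight])

# Illustrative manual update of just this weight (the step size is arbitrary).
with torch.no_grad():
    model.conv2.weight -= 0.01 * conv2_grad

print(conv2_grad.shape)
```

An alternative is to set `requires_grad = False` on every other parameter and call `loss.backward()` as usual, which leaves the gradient in `conv2.weight.grad` instead.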