Paper review: “Graph Attention Networks”

arXiv: https://arxiv.org/abs/1710.10903

Key points: the paper introduces the “graph attention network” (GAT), which is built from “graph attention layers” that bring the “self-attention” idea from Transformers to graph neural networks. A graph attention layer computes a weight for each of a node’s neighbors from the neighbors’ features, taking the other neighbors into account when normalizing; the sketch below illustrates the computation.
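As a minimal sketch of the mechanism (a single-head layer over a dense adjacency matrix, following the paper’s formulation; multi-head attention and sparse-graph handling are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """One single-head GAT layer over a dense adjacency matrix.

    Implements e_ij = LeakyReLU(a^T [W h_i || W h_j]), with the attention
    vector a split into two halves so the concatenation never materializes.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared linear transform
        self.a_src = nn.Linear(out_dim, 1, bias=False)   # first half of a
        self.a_dst = nn.Linear(out_dim, 1, bias=False)   # second half of a

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim) node features; adj: (N, N) adjacency with self-loops
        Wh = self.W(h)                                          # (N, out_dim)
        # e[i, j] = LeakyReLU(a_src . Wh_i + a_dst . Wh_j), via broadcasting
        e = F.leaky_relu(self.a_src(Wh) + self.a_dst(Wh).T, negative_slope=0.2)
        e = e.masked_fill(adj == 0, float("-inf"))              # attend only to neighbors
        alpha = torch.softmax(e, dim=1)                         # normalize per node
        return alpha @ Wh                                       # weighted neighbor aggregation
```

The paper additionally stacks K such heads (concatenating or averaging their outputs) and applies a nonlinearity; both are left out here for brevity.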

Properly setting up the dataloader and callbacks for validation in PyTorch DDP

PyTorch distributed data parallel (DDP) is very useful and relatively well supported for building a distributed training setup. However, the provided documentation and tutorials focus mostly on the “training” part and say little about validation callbacks that run during training.
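For context, the training-side setup that the documentation does cover looks roughly like this (a sketch assuming a single-node `torchrun` launch; `setup_ddp` is a name introduced here, not a PyTorch API):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp(model: torch.nn.Module) -> DDP:
    # torchrun sets LOCAL_RANK (and friends) for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    # Each process holds a full model replica; DDP syncs gradients across ranks.
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```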

It is easy to assume that simply using DistributedSampler for the validation dataloader will do all the work for you, as it does for the training dataloader, but it doesn’t. There are two main problems; for reference, the naive setup is sketched below.
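A naive validation loop that just mirrors the training dataloader might look like this (an illustrative sketch; `validate`, `val_dataset`, and the accuracy metric are placeholders, not the post’s code):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def validate(model: torch.nn.Module, val_dataset, batch_size: int = 64) -> float:
    # Each rank evaluates only its own shard of the validation set.
    sampler = DistributedSampler(val_dataset, shuffle=False)
    loader = DataLoader(val_dataset, batch_size=batch_size, sampler=sampler)

    model.eval()
    correct = torch.zeros(1, device="cuda")
    total = torch.zeros(1, device="cuda")
    with torch.no_grad():
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            preds = model(x).argmax(dim=1)
            correct += (preds == y).sum()
            total += y.numel()

    # Without this, each rank reports a metric over its shard only.
    dist.all_reduce(correct)
    dist.all_reduce(total)
    return (correct / total).item()
```

Two known pitfalls of this pattern: DistributedSampler pads the dataset with duplicated samples so that every rank receives an equal share (skewing per-sample metrics), and per-rank results still have to be reduced across processes explicitly, as the `all_reduce` calls above do.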
