paper review: “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”

arxiv: https://arxiv.org/abs/1910.13461 key points propose autoregressive model named BART, which is architecturally similar to standard transformer encoder + decoder Check out 5 pretraining tasks, and experiment which pretraining task is most helpful test BART performance with large scale pretraining on downstream tasks Model Architecture This work introduces BART, which is fundamentally Read more…

paper summary: “DocFormer: End-to-End Transformer for Document Understanding”

arxiv: https://arxiv.org/abs/2106.11539 this work proposes a backbone for visual document understanding domain. It uses text, visual, spatial features. Key points use text, visual, spatial features at each encoding layer, keep feeding in visual and spatial features on the input side. This has the ‘residual’ connection effect. text and visual features Read more…

paper summary: “VarifocalNet: An IoU-aware Dense Object Detector”(VFNet)

arxiv: https://arxiv.org/abs/2008.13367 key points another anchor-free point based object detection network introduce new loss, varifocal loss which is a forked version from focal loss. Makes some changes from focal loss to compensate positive/negative imbalance futher. instead of prediction classification and IOU score separately, this work predicts a single scalar which Read more…

paper summary: “Aggregated Residual Transformations for Deep Neural Networks” (ResNext Paper)

key point compared to resnet, the residual blocks are upgraded to have multiple “paths” or as the paper puts it “cardinality” which can be treated as another model architecture design hyper parameter. resnext architectures that have sufficient cardinality shows improved performance tldr: use improved residual blocks compared to resnet Different Read more…