paper summary: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

arxiv: https://arxiv.org/abs/2103.14030 Key points multi scale feature extraction. Could think of as adoption of FPN idea. restrict transformer operation to within each window and not the entire feature map → allows to keep overall complexity linear instead of quadratic apply shifted window to allow inter-window interaction fuse relative position information in Read more…