This work attaches an attention module to a CNN. Instead of naively computing a full 3D attention map, which is computationally expensive, it computes channel attention and spatial attention separately, achieving a similar effect with far fewer parameters.
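The separable-attention idea above can be sketched in numpy: a channel gate from globally pooled statistics, then a spatial gate from channel-pooled statistics. The bottleneck MLP weights here are random placeholders (a real module learns them), and the spatial gate uses a simple elementwise sum where the paper uses a convolution.

```python
import numpy as np

def channel_attention(x, reduction=2):
    # x: feature map of shape (C, H, W).
    # Squeeze spatial dims, pass through a small bottleneck MLP, gate channels.
    c = x.shape[0]
    avg = x.mean(axis=(1, 2))            # (C,) global average pooling
    mx = x.max(axis=(1, 2))              # (C,) global max pooling
    rng = np.random.default_rng(0)       # placeholder weights for the sketch
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0)
    gate = 1 / (1 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid -> (C,)
    return x * gate[:, None, None]

def spatial_attention(x):
    # Pool along the channel axis, then gate each spatial location.
    avg = x.mean(axis=0)                 # (H, W)
    mx = x.max(axis=0)                   # (H, W)
    gate = 1 / (1 + np.exp(-(avg + mx))) # sigmoid (a learned conv in practice)
    return x * gate[None, :, :]

x = np.ones((4, 8, 8))
out = spatial_attention(channel_attention(x))
print(out.shape)  # (4, 8, 8) -- attention reweights, never reshapes
```

The point of the factorization: two cheap 1D/2D gates applied in sequence stand in for one expensive C×H×W attention tensor.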
arxiv: https://arxiv.org/pdf/1704.04861.pdf

key points

- focus on optimizing for latency with small networks
- use depthwise separable convolutions to reduce computation as much as possible
- further reduce model size with width/resolution multipliers, at the cost of accuracy

depthwise separable convolution

This is a combination of a depthwise convolution and a pointwise convolution.
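The computational saving can be checked with the paper's cost model: a standard conv costs k·k·M·N·D_F·D_F multiply-adds, while the depthwise + pointwise factorization costs k·k·M·D_F·D_F + M·N·D_F·D_F, a reduction factor of 1/N + 1/k². A quick sketch:

```python
# Multiply-add counts for one conv layer: k x k kernel, M input channels,
# N output channels, D_F x D_F output feature map.
def standard_conv_cost(k, M, N, DF):
    return k * k * M * N * DF * DF

def depthwise_separable_cost(k, M, N, DF):
    depthwise = k * k * M * DF * DF   # one k x k filter per input channel
    pointwise = M * N * DF * DF       # 1x1 conv mixes channels
    return depthwise + pointwise

# Example layer: 3x3 kernel, 256 -> 256 channels, 14x14 output.
std = standard_conv_cost(3, 256, 256, 14)
sep = depthwise_separable_cost(3, 256, 256, 14)
print(std / sep)  # ~8.7x fewer multiply-adds for this layer
```

For 3x3 kernels the saving approaches 9x as the channel count grows, which is why MobileNet standardizes on them.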
- a model to predict a depth map
- maximize speed by making the network as light as possible
- focus not only on the encoder network but also on the decoder network for speed improvements
- MobileNet for the encoder; nearest-neighbor interpolation + NNConv5 for the decoder; use skip connections; use depthwise separable convolutions wherever possible; apply network pruning; use the TVM compiler stack to optimize depthwise separable convolutions, which are not well optimized in popular DL frameworks
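Nearest-neighbor interpolation is favored in the decoder precisely because it is nearly free: no arithmetic, just index repetition. A minimal sketch of a 2x nearest-neighbor upsample on a (C, H, W) feature map:

```python
import numpy as np

def nn_upsample(x, scale=2):
    # Nearest-neighbor interpolation: repeat each pixel `scale` times
    # along both spatial axes. x has shape (C, H, W).
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

feat = np.arange(4, dtype=float).reshape(1, 2, 2)
up = nn_upsample(feat)
print(up.shape)     # (1, 4, 4)
print(up[0])        # each source pixel becomes a 2x2 block
```

In the actual decoder this would be interleaved with the (hypothetically named here) NNConv5 blocks; the sketch only shows the interpolation step.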
arxiv link: https://arxiv.org/abs/1911.09070

key points

- multi-scale features with a weighted bi-directional FPN (BiFPN)
- model scaling: a compound scaling method that jointly scales up resolution/depth/width for the backbone, feature network, and box/class prediction networks
- uses an EfficientNet backbone

Bi-directional FPN

Key points of the bi-directional FPN, an enhancement of PANet with some modifications: remove nodes…
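The "weighted" part of the weighted BiFPN is its fast normalized fusion: each input feature map gets a learnable scalar weight w_i, kept non-negative with a ReLU and normalized as O = Σ_i (w_i / (ε + Σ_j w_j)) · I_i, which the paper uses as a cheaper alternative to softmax. A sketch:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    # Fuse same-resolution feature maps with learnable scalar weights,
    # constrained non-negative (ReLU) and normalized to sum to ~1.
    w = np.maximum(weights, 0)       # ReLU keeps each weight >= 0
    w = w / (eps + w.sum())          # normalization without softmax's exp
    return sum(wi * f for wi, f in zip(w, features))

a = np.ones((2, 2)) * 2.0
b = np.ones((2, 2)) * 4.0
fused = fast_normalized_fusion([a, b], np.array([1.0, 3.0]))
print(fused[0, 0])  # ~3.5: pulled toward the higher-weight input
```

The weights are learned per fusion node, so the network decides how much each resolution contributes instead of summing them equally as in the original FPN.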
arxiv link: https://arxiv.org/abs/1902.09630

key points

- using l1/l2 norm losses may not always align with the objective of improving IoU
- IoU is a good metric, but it cannot be used directly as a loss because it cannot backpropagate when there is no overlap at all
- GIoU is a modified IoU formula that…
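GIoU fixes the no-overlap case by subtracting a penalty based on C, the smallest box enclosing both inputs: GIoU = IoU − |C \ (A ∪ B)| / |C|. For disjoint boxes IoU is 0 everywhere, but the GIoU term still varies with distance, so a gradient exists. A sketch:

```python
def giou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2). GIoU = IoU - |C \ (A u B)| / |C|,
    # where C is the smallest box enclosing both inputs.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C.
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    area_c = (cx2 - cx1) * (cy2 - cy1)
    return iou - (area_c - union) / area_c

print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # negative for disjoint boxes
print(giou((0, 0, 1, 1), (0, 0, 1, 1)))  # 1.0 for a perfect match
```

GIoU ranges over (−1, 1]: it equals IoU when one box encloses the other and goes negative as disjoint boxes move apart, which is exactly what makes 1 − GIoU usable as a loss.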