Naively attaching an attention module to a CNN would mean computing a full 3D attention map, which is computationally expensive. Instead, this work proposes computing spatial attention and channel attention separately, which achieves a similar effect with far fewer parameters.
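A minimal PyTorch sketch of the decomposition, assuming a CBAM-style layout (the module names, reduction ratio, and 7x7 kernel below are illustrative assumptions, not the paper's exact implementation): channel attention pools over space to produce a C-dim weighting, and spatial attention pools over channels to produce an HxW weighting, so two cheap maps stand in for one full 3D map.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Squeeze spatial dims, reweight each channel (assumed reduction ratio 16).
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # (B, C) from max pooling
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    # Squeeze the channel dim, reweight each spatial location.
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        desc = torch.cat([x.mean(dim=1, keepdim=True),
                          x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return x * torch.sigmoid(self.conv(desc))
```

Applied one after the other, the two modules add only a small MLP and one k×k conv instead of a full C×H×W attention tensor.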
arxiv: https://arxiv.org/pdf/1704.04861.pdf. key points: focus on optimizing for latency and small networks; use depthwise separable convolutions to reduce computation as much as possible; further reduce model size with width/resolution multipliers, at the cost of accuracy.
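A sketch of one depthwise separable block, assuming the common conv → BN → ReLU layout: a 3x3 depthwise convolution filters each channel independently, then a 1x1 pointwise convolution mixes channels. Per output pixel this costs roughly k·k·C_in + C_in·C_out multiplies instead of k·k·C_in·C_out for a standard convolution.

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1, alpha=1.0):
    # alpha is the width multiplier: thins the layer uniformly (applied here
    # per block only for illustration; it trades accuracy for size/speed).
    in_ch, out_ch = int(in_ch * alpha), int(out_ch * alpha)
    return nn.Sequential(
        # depthwise: one 3x3 filter per input channel (groups=in_ch)
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # pointwise: 1x1 conv to mix information across channels
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```

The resolution multiplier works the same way but scales the input image size rather than the channel counts.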
- a model to predict a depth map
- maximize speed by making the network as light as possible
- focus not only on the encoder network but also on the decoder network for speed improvements
- MobileNet for the encoder; nearest-neighbor interpolation + NNConv5 for the decoder; skip connections; depthwise separable convolutions wherever possible; network pruning; and the TVM compiler stack to optimize depthwise separable convolutions, which are not well optimized in popular DL frameworks (see the decoder sketch below)
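A sketch of one decoder stage, assuming NNConv5 denotes a 5x5 depthwise separable convolution paired with 2x nearest-neighbor upsampling (the exact layer ordering and normalization here are assumptions):

```python
import torch.nn as nn

class NNConv5(nn.Module):
    # One decoder stage: 5x5 depthwise separable conv, then 2x nearest-neighbor
    # upsampling. Nearest-neighbor interpolation is used because it is cheap
    # compared to transposed convolutions.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 5, padding=2, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x):
        return self.up(self.conv(x))
```

Skip connections from encoder stages would be added to (or concatenated with) the matching decoder stage's output before the next stage.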
arxiv link: https://arxiv.org/abs/1911.09070. key points: multi-scale feature fusion with a weighted bi-directional FPN (BiFPN); model scaling via a compound scaling method that jointly scales up resolution, depth, and width for the backbone, feature network, and box/class prediction networks; uses an EfficientNet backbone.
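A sketch of the weighted fusion at one BiFPN node, assuming the fast normalized fusion variant: each incoming feature map gets a learnable non-negative scalar weight, and the weights are normalized so the output stays bounded. Class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    # Fast normalized fusion: out = sum_i (w_i * x_i) / (sum_i w_i + eps),
    # with w_i = ReLU(learnable scalar) to keep the weights non-negative.
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.raw_weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # inputs: list of feature maps already resized to the same shape
        w = torch.relu(self.raw_weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * x for wi, x in zip(w, inputs))
```

In a full BiFPN, each level fuses its top-down and bottom-up neighbors this way, typically followed by a depthwise separable convolution.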