arxiv: https://arxiv.org/pdf/2102.06171v1.pdf key points introduces NFNets, which combine multiple ideas to avoid batch norm while matching its performance. Beyond stacking existing batch-norm-free techniques, the paper introduces adaptive gradient clipping (AGC) to keep training stable, reaching results comparable to those of using Read more…
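A minimal NumPy sketch of unit-wise AGC as described in the paper: a gradient is rescaled whenever its norm exceeds a fraction (`clip_factor`) of the corresponding parameter norm, with norms taken per output unit (per row of a 2D weight matrix). The function name and epsilon values are my choices, not from the paper's code.

```python
import numpy as np

def adaptive_gradient_clip(param, grad, clip_factor=0.01, eps=1e-3):
    """Unit-wise adaptive gradient clipping (sketch).

    Clips `grad` so that, per output unit (last-axis row), its norm
    never exceeds clip_factor * norm(param). `eps` floors the parameter
    norm so freshly initialized near-zero weights are not frozen.
    """
    p_norm = np.maximum(np.linalg.norm(param, axis=-1, keepdims=True), eps)
    g_norm = np.maximum(np.linalg.norm(grad, axis=-1, keepdims=True), 1e-6)
    max_norm = clip_factor * p_norm
    # Scale down only the rows whose gradient norm is too large.
    scale = np.minimum(max_norm / g_norm, 1.0)
    return grad * scale
```

In practice this is applied per layer inside the optimizer step, before the parameter update.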
This work attaches an attention module to a CNN. Instead of naively computing a full 3D attention map, which is computationally expensive, it computes channel attention and spatial attention separately, achieving a similar effect with far fewer parameters. (more…)
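A shape-level NumPy sketch of the channel-then-spatial factorization: each sub-module squeezes one set of dimensions (spatial or channel) via average and max pooling and produces a gate for the other. The learned parts (the paper's shared MLP for channel attention and the convolution for spatial attention) are deliberately omitted; only the pooling/gating structure is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    """x: (C, H, W). Pool over space, gate each channel.
    (The paper passes the pooled vectors through a shared MLP; omitted.)"""
    avg = x.mean(axis=(1, 2))           # (C,)
    mx = x.max(axis=(1, 2))             # (C,)
    gate = sigmoid(avg + mx)            # (C,) values in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """x: (C, H, W). Pool over channels, gate each spatial location.
    (The paper applies a conv on the pooled maps; omitted.)"""
    avg = x.mean(axis=0)                # (H, W)
    mx = x.max(axis=0)                  # (H, W)
    gate = sigmoid(avg + mx)            # (H, W)
    return x * gate[None, :, :]

def factorized_attention(x):
    # Apply channel attention first, then spatial, as in the paper.
    return spatial_attention(channel_attention(x))
```

The saving comes from the gates being 1D (`C`) and 2D (`H, W`) instead of one full `C x H x W` map.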
arxiv: https://arxiv.org/pdf/1704.04861.pdf key points focuses on small networks optimized for latency. Uses depthwise separable convolutions to reduce computation as much as possible, and further shrinks the model with width/resolution multipliers, at the cost of some accuracy. A depthwise separable convolution is a combination of a depthwise convolution + a pointwise convolution. Read more…
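To see where the savings come from, a quick parameter count comparing a standard convolution against the depthwise + pointwise factorization (helper names are mine):

```python
def conv_params(c_in, c_out, k):
    # Standard conv: one k x k filter per (input channel, output channel) pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel,
    # plus a 1x1 pointwise conv to mix channels.
    return c_in * k * k + c_in * c_out

# 3x3 conv, 128 -> 128 channels:
print(conv_params(128, 128, 3))                 # 147456
print(depthwise_separable_params(128, 128, 3))  # 17536, roughly 8.4x fewer
```

The same ratio (roughly `1/c_out + 1/k^2`) applies to multiply-accumulate counts, which is why latency drops so sharply.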
- a model to predict depth maps
- maximize speed by making the network as light as possible
- focus on speeding up not only the encoder network but also the decoder network
- MobileNet for the encoder; nearest-neighbor interpolation + NNConv5 for the decoder; skip connections; depthwise separable convolutions wherever possible; network pruning; and the TVM compiler stack to optimize depthwise separable convolutions, which are not well optimized in popular DL frameworks.
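A tiny sketch of the nearest-neighbor interpolation step the decoder relies on: each pixel is simply repeated `scale x scale` times, which costs no learned parameters (the 5x5 depthwise-separable convolution that NNConv5 pairs with it is omitted here; the function name is mine).

```python
import numpy as np

def nearest_neighbor_upsample(x, scale=2):
    """x: (C, H, W) feature map -> (C, H*scale, W*scale).
    Repeats each pixel along both spatial axes."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

x = np.arange(4.0).reshape(1, 2, 2)
y = nearest_neighbor_upsample(x)     # shape (1, 4, 4)
```

Choosing parameter-free upsampling plus cheap convolutions keeps the decoder from dominating latency, which is the paper's point about not optimizing the encoder alone.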
arxiv link: https://arxiv.org/abs/1911.09070 key points multi-scale feature fusion with a weighted bi-directional FPN; model scaling via a compound scaling method that jointly scales up resolution/depth/width for the backbone, feature network, and box/class prediction network; uses an EfficientNet backbone. Bi-directional FPN Here are the key points of the bi-directional FPN: an enhancement of PANet with some modifications: remove nodes Read more…
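The "weighted" part of the bi-directional FPN can be sketched as the paper's fast normalized fusion: each input feature map gets a learnable scalar weight, the weights are passed through ReLU and normalized by their sum, and the maps are combined as a weighted average (the function name is mine; in the real model the weights are learned per fusion node).

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted fusion of same-shaped feature maps.

    w_i = relu(w_i); out = sum_i w_i * I_i / (eps + sum_j w_j).
    A cheaper alternative to softmax normalization with similar accuracy.
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # relu
    w = w / (eps + w.sum())                                # normalize
    return sum(wi * f for wi, f in zip(w, features))
```

Inputs at different pyramid levels are first resized to a common resolution before being fused this way.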