EfficientDet paper review
paper link: https://arxiv.org/pdf/1911.09070.pdf
BiFPN
multiple BiFPN layers are stacked for scaling
uses depth-wise separable convolution layers
bidirectional cross-scale connections + weighted feature fusion
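Since the BiFPN relies on depth-wise separable convolutions for efficiency, here is a minimal NumPy sketch of the idea: one spatial filter per input channel (depthwise), followed by a 1x1 channel-mixing step (pointwise). The shapes and the "valid" padding are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (H, W, C_in); dw_kernels: (k, k, C_in); pw_weights: (C_in, C_out).

    Depthwise step: each input channel gets its own k x k filter
    ('valid' padding, no channel mixing). Pointwise step: a 1x1 conv
    mixes channels, written here as a matmul over the channel axis.
    """
    H, W, C = x.shape
    k = dw_kernels.shape[0]
    out_h, out_w = H - k + 1, W - k + 1
    dw = np.empty((out_h, out_w, C))
    for c in range(C):  # each channel is filtered independently
        for i in range(out_h):
            for j in range(out_w):
                dw[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * dw_kernels[:, :, c])
    return dw @ pw_weights  # pointwise 1x1 conv
```

Compared to a standard convolution, this factorization is what keeps the extra BiFPN layers cheap in parameters and FLOPs.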
weighted feature fusion
- a different weight for each resolution's features
- the weights are learnable
The paper summarizes three approaches to weighted feature fusion:
- unbounded fusion: the weights are unbounded, which can cause training instability
- softmax-based fusion: better than unbounded fusion, since normalizing the weights removes the instability; but experiments show the softmax adds significant latency on GPU
- fast normalized fusion: experiments show accuracy similar to softmax-based fusion while being much faster
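A minimal sketch of fast normalized fusion: each learnable scalar weight is passed through a ReLU so it stays non-negative, then normalized by the sum of all weights plus a small epsilon (the paper uses 0.0001), avoiding the softmax entirely.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with learned non-negative weights."""
    # ReLU keeps the learnable scalar weights non-negative.
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    # Normalize by the sum (plus eps for stability) instead of a softmax.
    w = w / (eps + w.sum())
    return sum(wi * f for wi, f in zip(w, features))
```

Each normalized weight falls between 0 and 1, so the output stays bounded like the softmax version, but only cheap elementwise ops are needed.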
EfficientDet
- BiFPN + optimized backbone
- follows the one-stage detector paradigm
- uses EfficientNet as the backbone
- multiple BiFPN layers stacked as the feature network
- at the final stage, features from each BiFPN level are fed to the box/class prediction networks
- box/class network weights are shared across all feature levels
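The weight sharing can be sketched like this: a single set of head weights is applied to every pyramid level. The channel count and anchors-per-cell are hypothetical, and a 1x1 conv head is modeled as a matrix multiply over the channel axis for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

C, A = 64, 9  # hypothetical: feature channels, anchors per cell
# One set of box-head weights, reused at every level (A anchors * 4 box coords).
shared_box_head = rng.standard_normal((C, A * 4))

def box_predictions(feature):
    """feature: (H, W, C) -> (H, W, A * 4), using the shared weights."""
    return feature @ shared_box_head

# Feature maps from different pyramid levels share the same head.
pyramid = [rng.standard_normal((s, s, C)) for s in (64, 32, 16, 8, 4)]
outputs = [box_predictions(f) for f in pyramid]
```

Sharing the head across levels keeps the parameter count independent of the number of pyramid levels.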
compound scaling
To scale EfficientDet, the network width, depth, and input resolution are all scaled jointly with a single compound coefficient.
- BiFPN network: scale width and depth
- box/class prediction network: width fixed to match the BiFPN; depth increases linearly
- input image resolution: increases linearly
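The paper's scaling rules can be written down directly (D0..D6 correspond to phi = 0..6). Note the official configurations round the BiFPN width to hardware-friendly channel counts, so the width formula here is approximate:

```python
def efficientdet_scaling(phi):
    """EfficientDet compound scaling rules as given in the paper."""
    w_bifpn = 64 * (1.35 ** phi)   # BiFPN width (channels); rounded in practice
    d_bifpn = 3 + phi              # number of BiFPN layers
    d_head = 3 + phi // 3          # box/class prediction net depth
    resolution = 512 + 128 * phi   # input image resolution
    return w_bifpn, d_bifpn, d_head, resolution
```

For example, phi = 3 (D3) gives 6 BiFPN layers, head depth 4, and 896x896 input.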
ablation study
both the BiFPN and the EfficientNet backbone are crucial to the gains
the BiFPN uses depth-wise separable conv layers, which keeps its compute cost low
comments
the paper doesn't detail the box/class network architecture; it only mentions that anchors are used