paper link:


multiple bifpn layers for scaling

bifpn uses depth-wise separable convolution layers for feature fusion
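To see why this choice matters for efficiency, here is a rough parameter count. It assumes "depth-wise convolution" means a depth-wise separable convolution (a per-channel k x k convolution followed by a 1 x 1 point-wise convolution) and ignores biases and batch norm. The function names are only for illustration.

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: one k x k filter per (input, output) channel pair.
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    # Depth-wise step: one k x k filter per input channel.
    # Point-wise step: a 1 x 1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    # Example with a 3 x 3 kernel and 64 channels in and out.
    print(conv_params(3, 64, 64))
    print(depthwise_separable_params(3, 64, 64))
```

With a 3 x 3 kernel and 64 channels, the separable version needs roughly an eighth of the weights of a standard convolution.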

bidirectional cross-scale connections + weighted feature fusion.

weighted feature fusion

  • a different weight for each input feature, since features at different resolutions do not contribute equally to the output
  • learnable weights

the paper describes three approaches to weighted feature fusion

  • unbounded fusion: the weights are unconstrained scalars, which can cause training instability
  • softmax-based fusion: better than unbounded fusion, since softmax normalizes the weights to the range 0 to 1 and removes the instability. But the experiments found that the extra softmax causes a significant slowdown on GPU.
  • fast normalized fusion: each weight is kept non-negative with a ReLU and divided by the sum of all weights plus a small epsilon. No softmax, so much faster than softmax fusion.

experiments show that fast normalized fusion gives similar accuracy to softmax-based fusion while running much faster.
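The three fusion variants can be sketched side by side in NumPy. This is only an illustration, not the authors' implementation: it assumes the input features were already resized to the same shape, uses one scalar weight per input, and the function names are made up. The epsilon of 1e-4 follows the value I recall from the paper.

```python
import numpy as np


def unbounded_fusion(features, weights):
    # Plain weighted sum; the weights are unconstrained scalars.
    return sum(w * f for w, f in zip(weights, features))


def softmax_fusion(features, weights):
    # Softmax turns the weights into values in (0, 1) that sum to 1.
    w = np.asarray(weights, dtype=np.float64)
    e = np.exp(w - w.max())  # subtract the max for numerical stability
    norm = e / e.sum()
    return sum(n * f for n, f in zip(norm, features))


def fast_normalized_fusion(features, weights, eps=1e-4):
    # ReLU keeps each weight non-negative; dividing by the sum (plus eps)
    # bounds the normalized weights to [0, 1] without computing exp().
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)
    norm = w / (w.sum() + eps)
    return sum(n * f for n, f in zip(norm, features))


if __name__ == "__main__":
    feats = [np.ones((2, 2)), 3.0 * np.ones((2, 2))]
    print(unbounded_fusion(feats, [1.0, 1.0]))
    print(softmax_fusion(feats, [1.0, 1.0]))
    print(fast_normalized_fusion(feats, [1.0, 1.0]))
```

With equal weights, the softmax and fast normalized variants both return (approximately) the average of the inputs, while the unbounded variant returns their raw sum.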

EfficientDet

  • bifpn + optimized backbone
  • one stage detector paradigm
  • use EfficientNet as backbone
  • multiple bifpn layers
  • after the final bifpn layer, the fused features from each level are fed to the box/class prediction networks
  • the box/class network weights are shared across all feature levels

compound scaling

to scale EfficientDet, scale the network width, depth, and input resolution jointly, all driven by a single compound scaling coefficient.

  • bifpn network: scale both width and depth
  • box/class prediction network: width is fixed to match the bifpn width; depth increases linearly
  • input image resolution: increases linearly
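The scaling rules above can be written as a small function of the compound coefficient. The constants (64, 1.35, 3, 512, 128) are what I recall from the paper's scaling equations, not something stated in these notes, so treat them as assumptions. The published configurations also round the bifpn width to hardware-friendly values, so the width printed here is only approximate. The function and key names are made up for illustration.

```python
def efficientdet_config(phi):
    # phi is the compound scaling coefficient (0 for the smallest model).
    return {
        # bifpn width (channels) grows exponentially, depth grows linearly.
        "bifpn_width": 64 * (1.35 ** phi),
        "bifpn_depth": 3 + phi,
        # box/class head: width matches bifpn, depth grows slowly with phi.
        "head_depth": 3 + phi // 3,
        # Input resolution grows linearly in steps of 128 pixels.
        "input_resolution": 512 + 128 * phi,
    }


if __name__ == "__main__":
    for phi in range(4):
        print(phi, efficientdet_config(phi))
```

A single coefficient drives every component, which is the point of scaling jointly instead of tuning width, depth, and resolution separately.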

ablation study

both bifpn and the backbone are crucial

bifpn uses depth-wise conv layers


the paper doesn’t go into the specifics of the box/class networks. it only mentions that anchors are used.

