paper review: “High-Performance Large-Scale Image Recognition Without Normalization”

arxiv: https://arxiv.org/pdf/2102.06171v1.pdf

key points

The paper introduces NFNets (normalizer-free networks), which combine multiple ideas to avoid batch norm while still getting on-par performance. Beyond just using a bunch of non-BN techniques, it introduces adaptive gradient clipping (AGC), which is what makes the networks actually train well enough to reach comparable results, matching that of using …
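
To make the AGC idea concrete, here is a minimal sketch in plain Python. It assumes the unit-wise form of the rule: each row of a weight matrix is treated as one unit, and that row's gradient is rescaled whenever the ratio of gradient norm to weight norm exceeds a threshold. The function names and the default `clip` and `eps` values are illustrative assumptions, not taken from the authors' code.

```python
import math


def unitwise_norm(rows):
    """L2 norm of each row (one row = one output unit)."""
    return [math.sqrt(sum(x * x for x in row)) for row in rows]


def adaptive_grad_clip(weights, grads, clip=0.01, eps=1e-3):
    """Sketch of adaptive gradient clipping on a 2D weight matrix.

    For each unit, the gradient is rescaled so that
    ||g|| / max(||w||, eps) never exceeds `clip`.
    `clip` and `eps` defaults are illustrative assumptions.
    """
    w_norms = unitwise_norm(weights)
    g_norms = unitwise_norm(grads)
    clipped = []
    for g_row, w_norm, g_norm in zip(grads, w_norms, g_norms):
        # eps keeps zero-initialised weights from forcing the gradient to zero
        max_norm = clip * max(w_norm, eps)
        if g_norm > max_norm:
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            # gradient is already small relative to the weights: leave it alone
            clipped.append(list(g_row))
    return clipped
```

The difference from ordinary gradient clipping is that the threshold is relative to the size of the weights rather than a fixed absolute norm, so a unit with large weights is allowed a proportionally larger gradient.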