## ReLU, GELU, Swish, Mish: Activation Function Comparison

ReLU (2018), arXiv: https://arxiv.org/abs/1803.08375 — f(x) = max(0, x).

GELU (2016), arXiv: https://arxiv.org/abs/1606.08415 — although the GELU paper predates the ReLU paper cited above, GELU's popularity in the deep learning literature came later, driven by characteristics that compensate for ReLU's drawbacks. Like ReLU, GELU has no upper bound and is bounded below. However, while ReLU is abruptly zero across the entire negative input range, GELU decays to zero smoothly, letting small negative values pass through.
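The four activations named in the title can be sketched side by side; a minimal comparison using their standard formulas (exact GELU via the Gaussian CDF; the sample points are illustrative, not from the original text):

```python
import math

def relu(x):
    # f(x) = max(0, x): exactly zero for all negative inputs
    return max(0.0, x)

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard Gaussian CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x):
    # Swish (a.k.a. SiLU): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def mish(x):
    # Mish: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)
    return x * math.tanh(math.log1p(math.exp(x)))

# Unlike ReLU, the other three produce small nonzero outputs
# for negative inputs and transition smoothly through zero.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}  "
          f"swish={swish(x):+.4f}  mish={mish(x):+.4f}")
```

Note how ReLU maps every negative input to exactly zero, while GELU, Swish, and Mish return small negative values there before decaying toward zero, which is the smoothness contrast the text describes.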