1 possible answer(s) on “same dnn model with different params, the speed is different”

  1. I have tested it on MNN, and the speed is still different. From profiling, the slower one’s convolution layer is much slower. Can the weight values affect the speed of a layer? Too weird…

    I zeroed all weights that were smaller than 1e-15 and both models now run at the same speed. I suspect that the fusion process produces a lot of denormals by multiplying small numbers together. I have some doubts about my claim, though, because it’s unusual for a model to contain so many tiny weights that they cause serious performance degradation.
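    A minimal sketch of the suspected mechanism, using NumPy (my own illustration, not the poster’s code): multiplying two small float32 weights, as layer fusion effectively does, can underflow into the subnormal (denormal) range, and the poster’s fix amounts to flushing sub-threshold weights to zero.

    ```python
    import numpy as np

    # Smallest positive *normal* float32; anything smaller but nonzero is subnormal.
    tiny = np.finfo(np.float32).tiny  # about 1.1754944e-38

    # Two "small weight" values, as might appear when fusing consecutive layers.
    a = np.float32(1e-20)
    b = np.float32(1e-22)

    product = a * b  # about 1e-42: below the normal range, stored as a subnormal
    assert 0 < product < tiny

    # Emulating the poster's fix: flush weights below a threshold (1e-15) to zero.
    weights = np.array([1e-20, 0.5, -1e-30, 2.0], dtype=np.float32)
    flushed = np.where(np.abs(weights) < np.float32(1e-15), np.float32(0), weights)
    # flushed -> [0.0, 0.5, 0.0, 2.0]
    ```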

    Denormals have leading zeros in the mantissa, which is a not-so-normal representation. Normally, leading zeros are absorbed into the exponent to make room for as many significant digits as possible in the mantissa. When the number becomes so small that you cannot make the exponent any smaller without an underflow, leading zeros in the mantissa are used to store the value. Most hardware is optimized for handling normal numbers efficiently and often has an alternate slow path for dealing with denormals. When you enable fast math, you also enable flush-to-zero (treat denormals as zero). With FTZ, the hardware deals with them efficiently by simply ignoring them.
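    The representation above can be made concrete by decoding float32 bit fields (a sketch of my own, not from the thread): a subnormal has an all-zero exponent field with the value carried entirely in the mantissa, and FTZ amounts to treating any such value as zero.

    ```python
    import struct

    def float32_fields(x: float) -> tuple[int, int, int]:
        """Return the (sign, exponent, mantissa) bit fields of x as an IEEE-754 float32."""
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

    # A normal number: the exponent field is nonzero.
    sign, exp, man = float32_fields(1.0)
    assert exp == 127 and man == 0  # 1.0 = 1.0 * 2**(127 - 127)

    # A subnormal: the exponent field is all zeros; the value lives in the mantissa alone.
    sign, exp, man = float32_fields(1e-42)
    assert exp == 0 and man != 0

    # FTZ simply treats any value with a zero exponent field as 0.0.
    def ftz(x: float) -> float:
        _, exp, _ = float32_fields(x)
        return 0.0 if exp == 0 else x

    assert ftz(1e-42) == 0.0
    assert ftz(1.0) == 1.0
    ```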

    The CUDA backend probably didn’t face this issue because the convolutions are largely implemented with fused multiply-add (FMA) ops, and the FMA pipeline can handle denormals at full speed. Only multi-instruction sequences like sqrt require special handling of denormals.