I’m new to Deep Learning and TensorFlow. From studying tutorials, research papers, and online lectures, it appears that people always use the execution order ReLU -> Max-Pooling. But in the case of e.g. 2×2 max-pooling, it seems we can save 75% of the ReLU operations by simply reversing the execution order to Max-Pooling -> ReLU. This computes exactly the same result using only a quarter of the ReLU operations, because taking the maximum commutes with applying a non-decreasing function: f(max(S)) = max(f(s) for s in S). This reversal works in general for max-pooling combined with any non-decreasing activation function (which I guess they all are?), but it won’t work for average-pooling.
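The equivalence is easy to check numerically. Here is a minimal NumPy sketch (the helper names `relu` and `max_pool_2x2` are my own, not TensorFlow ops), verifying that pooling before the ReLU gives the same output as the conventional order:

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: a non-decreasing function
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    # 2x2 max-pooling on a (H, W) array with even H, W,
    # using a reshape so each 2x2 block becomes two axes to max over
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))

conventional = max_pool_2x2(relu(x))   # ReLU -> Max-Pooling
reversed_order = relu(max_pool_2x2(x)) # Max-Pooling -> ReLU (4x fewer ReLU inputs)
assert np.allclose(conventional, reversed_order)
```

The assertion holds because for each 2×2 window the maximum element is the same before and after applying any non-decreasing function; the same check would fail for average-pooling.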
This is an optimization that TensorFlow could perform automatically when compiling the computation graph. I haven’t quite figured out how to use TensorBoard yet, so I can’t tell if this automatic reversal of ReLU and max-pooling is already being done in TensorFlow.
So I’ve done a few experiments instead, timing the training of a convolutional net on MNIST, but the results are inconclusive for the two execution orders. Perhaps this means that TensorFlow already does the reversal automatically; or there’s no consistent advantage because the saved ReLU operations are such a tiny fraction of the overall computational cost; or perhaps it takes much larger images than MNIST and much deeper convolutional networks for the performance difference to become apparent.
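To isolate the effect from everything else going on during training, one can micro-benchmark just the two orderings on larger feature maps. A plain NumPy sketch (the sizes and helper names are arbitrary assumptions, and NumPy timings won’t match TensorFlow’s fused GPU kernels, so this only gauges the rough magnitude of the saving):

```python
import time
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    # 2x2 max-pooling over a batch of (N, H, W) feature maps
    n, h, w = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2).max(axis=(2, 4))

# Hypothetical "large image" workload: 64 feature maps of 256x256
x = np.random.default_rng(0).standard_normal((64, 256, 256))

def avg_seconds(f, reps=10):
    t0 = time.perf_counter()
    for _ in range(reps):
        f()
    return (time.perf_counter() - t0) / reps

t_relu_first = avg_seconds(lambda: max_pool_2x2(relu(x)))  # conventional
t_pool_first = avg_seconds(lambda: relu(max_pool_2x2(x)))  # reversed

print(f"ReLU -> Pool: {t_relu_first:.4f}s, Pool -> ReLU: {t_pool_first:.4f}s")
```

Even here the difference may be small, since the pooling reshape-and-max dominates the element-wise ReLU; that would be consistent with the inconclusive MNIST timings.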