I have two Caffe models with the same network architecture but different parameter files. Curiously, they have very different inference speeds: one is 2X faster than the other. Why could this happen?
I zeroed all weights whose magnitude was smaller than 1e-15, and now both models run at the same speed. I suspect that the fusion process is producing a lot of denormals by multiplying small numbers with other small numbers. I have some doubts about this claim, though, because it's unusual for a model to be filled with so many tiny weights that they cause serious performance degradation.

Denormals have leading zeros in the mantissa, which is a not-so-normal representation. Normally, leading zeros are counted in the exponent to make room for as many significant digits as possible in the mantissa. When a number becomes so small that the exponent cannot be made any smaller without underflowing, leading zeros in the mantissa are used to store the value. Most hardware is optimized for handling normal numbers efficiently and often takes alternate slow paths for dealing with denormals. When you enable fast math, you are also enabling flush-to-zero (treat denormals as zero). With FTZ, the hardware deals with them efficiently by simply flushing them to zero.
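For reference, a minimal pycaffe sketch of that zeroing experiment might look like the following; the prototxt/caffemodel paths are placeholders, and the 1e-15 cutoff is the one used above:

```python
import numpy as np
import caffe

# Placeholder paths -- substitute the real prototxt and caffemodel.
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

THRESHOLD = 1e-15  # cutoff used in the experiment above

for layer_name, blobs in net.params.items():
    for blob in blobs:  # e.g. [weights, biases] for a conv layer
        tiny = np.abs(blob.data) < THRESHOLD
        blob.data[tiny] = 0.0  # flush sub-threshold weights to exact zero

net.save('model_zeroed.caffemodel')
```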
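To check whether denormals are actually the bottleneck on a given machine, a quick numpy timing comparison along these lines can help (a hedged sketch: the array size, repetition count, and the observed gap are machine-dependent assumptions):

```python
import time
import numpy as np

def bench(x, reps=200):
    # Time repeated multiplies; a denormal input stays denormal under *0.5,
    # so every iteration exercises the slow path if the CPU has one.
    t0 = time.perf_counter()
    for _ in range(reps):
        x * np.float32(0.5)  # result discarded; only the timing matters
    return time.perf_counter() - t0

n = 1_000_000
normal  = np.full(n, 1e-3,  dtype=np.float32)  # well inside normal range
subnorm = np.full(n, 1e-40, dtype=np.float32)  # below FLT_MIN (~1.18e-38)

print('normal  : %.3f s' % bench(normal))
print('denormal: %.3f s' % bench(subnorm))
```

On hardware with a slow denormal path, the second timing is typically several times larger; with FTZ/DAZ enabled (for example, in a build compiled with fast math), the gap disappears because the subnormal inputs are flushed to zero.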
The CUDA backend probably didn't face this issue because the convolutions are largely implemented using fused multiply-add ops, and the FMA pipeline can handle denormals. Only multi-instruction sequences like sqrt require special handling of denormals.