[cv::transform] Enable CV_SIMD for the 16U case on AArch64.

System information (version)
  • OpenCV => 4.2 (master branch)
  • Operating System / Platform => Linux/ARM64 (aarch64 – NEON)
  • Compiler => gcc

Detailed description

Enable CV_SIMD for cv::transform in the 16U case. My current configuration reports an x-factor of roughly 2.2 to 2.5 on the Mat_Transform::Size_MatType and Transform::OCL_TransformFixture tests, and no degradation in the other cases.

I will link a patch to this issue.
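
For readers unfamiliar with the code path in question, here is a minimal scalar sketch of what a 3-channel to 3-channel 16U transform computes, assuming a row-major 3x4 float matrix whose fourth column is a constant offset. The names `saturate_u16` and `transform_16u_ref` are hypothetical helpers written for illustration; this is not OpenCV's implementation, only a reference that a SIMD version would be expected to match.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not OpenCV code): clamp to the ushort range, then
// round to nearest, in the spirit of cv::saturate_cast<ushort>(float).
static inline std::uint16_t saturate_u16(float v)
{
    const float clamped = std::min(65535.0f, std::max(0.0f, v));
    return static_cast<std::uint16_t>(std::lrintf(clamped));
}

// Scalar reference for scn == 3, dcn == 3 (assumption: m is a row-major
// 3x4 matrix, and len is the number of pixels, not the number of values).
static void transform_16u_ref(const std::uint16_t* src, std::uint16_t* dst,
                              const float* m, int len)
{
    for (int x = 0; x < len * 3; x += 3)
    {
        const float v0 = src[x], v1 = src[x + 1], v2 = src[x + 2];
        dst[x]     = saturate_u16(m[0] * v0 + m[1] * v1 + m[2]  * v2 + m[3]);
        dst[x + 1] = saturate_u16(m[4] * v0 + m[5] * v1 + m[6]  * v2 + m[7]);
        dst[x + 2] = saturate_u16(m[8] * v0 + m[9] * v1 + m[10] * v2 + m[11]);
    }
}
```

Each output pixel needs three dot products plus an offset, which is why vectorizing this loop can pay off when the SIMD path is enabled.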

Issue submission checklist
  • [X] I report the issue, it’s not a question
  • [X] I checked the problem with documentation, FAQ, open issues,
    answers.opencv.org, Stack Overflow, etc and have not found solution
  • [X] I updated to latest OpenCV version and the issue is still there
  • There is reproducer code and related data files: videos, images, onnx, etc

1 possible answer on “[cv::transform] Enable CV_SIMD for the 16U case on AArch64.”

  1. BTW, I realized that the title of this issue and of the PR says 16U, while the actual PR targets 32F, so the title and the contents conflict.
    I removed the AArch64 exclusion from both #if guards and ran the performance test.

    $ git diff
    diff --git a/modules/core/src/matmul.simd.hpp b/modules/core/src/matmul.simd.hpp
    index 38973ea..50e85ad 100644
    --- a/modules/core/src/matmul.simd.hpp
    +++ b/modules/core/src/matmul.simd.hpp
    @@ -1537,7 +1537,7 @@ transform_8u( const uchar* src, uchar* dst, const float* m, int len, int scn, in
     static void
     transform_16u( const ushort* src, ushort* dst, const float* m, int len, int scn, int dcn )
     {
    -#if CV_SIMD && !defined(__aarch64__) && !defined(_M_ARM64)
    +#if CV_SIMD
         if( scn == 3 && dcn == 3 )
         {
             int x = 0;
    @@ -1606,7 +1606,7 @@ transform_16u( const ushort* src, ushort* dst, const float* m, int len, int scn,
     static void
     transform_32f( const float* src, float* dst, const float* m, int len, int scn, int dcn )
     {
    -#if CV_SIMD && !defined(__aarch64__) && !defined(_M_ARM64)
    +#if CV_SIMD
         int x = 0;
         if( scn == 3 && dcn == 3 )
         {
    
    platform        compiler   chip        test                                         before  after  x-factor
    --------------  ---------  ----------  -------------------------------------------  ------  -----  --------
    Le Potato       GCC 5.4.0  Cortex A53  Size_MatType_Mat_Transform.Mat_Transform/16  146.46  48.88  2.996
    Jetson Nano     GCC 7.5.0  Cortex A57  Size_MatType_Mat_Transform.Mat_Transform/16   60.76  27.09  2.243
    Raspberry Pi 4  GCC 8.3.0  Cortex A72  Size_MatType_Mat_Transform.Mat_Transform/16   54.13  23.59  2.295
    Le Potato       GCC 5.4.0  Cortex A53  Size_MatType_Mat_Transform.Mat_Transform/19   23.14  45.76  0.506
    Jetson Nano     GCC 7.5.0  Cortex A57  Size_MatType_Mat_Transform.Mat_Transform/19   11.10  15.42  0.720
    Raspberry Pi 4  GCC 8.3.0  Cortex A72  Size_MatType_Mat_Transform.Mat_Transform/19    9.58  27.73  0.345

    Size_MatType_Mat_Transform.Mat_Transform/16 is for 16U
    Size_MatType_Mat_Transform.Mat_Transform/19 is for 32F

    So, I’m not sure why the behavior has changed since #9753, but we shouldn’t enable the universal SIMD version for 32F. The proposal does seem valid for 16U.
    However, the PR must be modified, because it changes the wrong line.
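
The x-factor column in the table above is consistent with "before" divided by "after", so values above 1 are speedups and values below 1 are regressions. A minimal sketch of that arithmetic follows; `x_factor` and `is_regression` are hypothetical names used only for illustration, and the timings are taken in whatever unit the performance test reports.

```cpp
#include <cassert>
#include <cmath>

// x-factor as it appears in the table: time before the patch divided by
// time after it. Hypothetical helper, not part of OpenCV's perf tooling.
static double x_factor(double before, double after)
{
    return before / after;
}

// A run counts as a regression when the patched code is slower than before.
static bool is_regression(double before, double after)
{
    return x_factor(before, after) < 1.0;
}
```

Applied to the Raspberry Pi 4 rows, `x_factor(54.13, 23.59)` is a speedup for 16U while `is_regression(9.58, 27.73)` holds for 32F, which matches the conclusion that only the 16U guard should be relaxed.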