
Data type error when building SwiftTransformer for DistServe #5

@HalberdOfPineapple


Hi all. I ran into some problems when building SwiftTransformer as a dependency of DistServe (https://github.com/LLMServe/DistServe).

The dependency versions on my machine are as follows:

gcc (Spack GCC) 12.3.0

(NVCC)
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0

cmake version 3.27.7

The operating system is Ubuntu 20.04.1 LTS.

The DistServe installation instructions include the following commands:

git clone https://github.com/LLMServe/SwiftTransformer.git && cd SwiftTransformer && git submodule update --init --recursive
cmake -B build
cmake --build build -j$(nproc)

However, cmake --build build -j$(nproc) fails with the following two errors, both caused by type mismatches:
1):

[ 71%] Built target gmock_main
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc: In instantiation of 'void naiveGemmStridedBatched(cublasOperation_t, cublasOperation_t, int, int, int, T, const T*, long long int, const T*, long long int, T, T*, long long int, int) [with T = __half]':
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:120:27:   required from 'void CublasWrapperTestSuite_gemmStridedBatched_Test<gtest_TypeParam_>::TestBody() [with gtest_TypeParam_ = __half]'
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:70:1:   required from here
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: error: ambiguous overload for 'operator*' (operand types are 'const __half' and 'float')
   50 |                 Carray[batch * stride_c + i * ldc + j] = alpha * sum + beta * Carray[batch * stride_c + i * ldc + j];
      |                                                          ~~~~~~^~~~~
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long long unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long long int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(long int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(unsigned int, float)' (built-in)
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:50:64: note: candidate: 'operator*(float, float)' (built-in)

This happens because alpha has type const __half (T = __half) while sum is a float, and the compiler finds several built-in operator* candidates that match equally well, so the call is ambiguous.

2):

/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:45: error: call of overloaded 'fabs(__half)' is ambiguous
   93 |                                         fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
      |                                         ~~~~^~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/features.h:490,
                 from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/x86_64-pc-linux-gnu/bits/os_defines.h:39,
                 from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/x86_64-pc-linux-gnu/bits/c++config.h:655,
                 from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/functional:48,
                 from /scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:1:
/usr/include/bits/mathcalls.h:162:1: note: candidate: 'double fabs(double)'
  162 | __MATHCALLX (fabs,, (_Mdouble_ __x), (__const__));
      | ^~~~~~~~~~~
In file included from /appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/random:38,
                 from /scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/cublas_wrapper.cc:2:
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:241:3: note: candidate: 'constexpr float std::fabs(float)'
  241 |   fabs(float __x)
      |   ^~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:245:3: note: candidate: 'constexpr long double std::fabs(long double)'
  245 |   fabs(long double __x)
      |   ^~~~
/scratch/work/liw10/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:75: error: call of overloaded 'fabs(__half)' is ambiguous
   93 |                                         fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
      |                                                                       ~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/bits/mathcalls.h:162:1: note: candidate: 'double fabs(double)'
  162 | __MATHCALLX (fabs,, (_Mdouble_ __x), (__const__));
      | ^~~~~~~~~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:241:3: note: candidate: 'constexpr float std::fabs(float)'
  241 |   fabs(float __x)
      |   ^~~~
/appl/scibuilder-spack/aalto-rhel9-dev/2024-01-compilers/software/linux-rhel9-haswell/gcc-11.4.1/gcc-12.3.0-xh5vv5d/lib/gcc/x86_64-pc-linux-gnu/12.3.0/../../../../include/c++/12.3.0/cmath:245:3: note: candidate: 'constexpr long double std::fabs(long double)'
  245 |   fabs(long double __x)
      |   ^~~~

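This happens because answer[i]-reference[i] has type __half (more precisely, the template parameter T, here instantiated as __half), and the compiler cannot choose between the double, float, and long double overloads of fabs for that type.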

To work around these two problems, I made the following local changes:

  1. Change line 50 in .../SwiftTransformer/src/unittest/util/cublas_wrapper.cc to:
Carray[batch * stride_c + i * ldc + j] = alpha * static_cast<T>(sum) + beta * Carray[batch * stride_c + i * ldc + j];
  2. Change line 93 in .../SwiftTransformer/src/unittest/unittest_utils.h to:
fabs(static_cast<float>(answer[i]-reference[i])), 
fabs(static_cast<float>(answer[i]-reference[i])) / fabs(static_cast<float>(reference[i])));

With these changes the build completes, and the subsequent pip install of DistServe also succeeds. However, I am not sure whether they add overhead, since converting __half to float widens each value from 16 to 32 bits. (Sorry if this is a naive question; I am new to this field.)
If these modifications look reasonable, I can open a pull request for them.
