
external: Add xpu library.

Merged Felix Weiglhofer requested to merge fweig/cbmroot:add-xpu into master
All threads resolved!

Refs #2490.

Edited by Volker Friese


Activity

  • assigned to @f.uhlig

  • Volker Friese requested review from @f.uhlig

  • Volker Friese changed the description

  • @fweig,

    there is a problem with the dlopen call on macosx. The flag RTLD_DEEPBIND is not defined on macosx. If I understand the information correctly, RTLD_DEEPBIND searches for symbols first in the local scope before searching in the global scope (https://man7.org/linux/man-pages//man3/dlmopen.3.html). In another discussion there was a statement that this order is the default on macosx (https://stackoverflow.com/questions/7203161/does-rtld-first-on-mac-do-the-job-of-rtld-deep-bind-on-linux). Unfortunately, the documentation isn't really good.

    I would propose adding a preprocessor statement which, in the case of macosx, creates the handle without RTLD_DEEPBIND:

    handle = dlopen(name, RTLD_LAZY);

    so the code block would become

    #if defined __APPLE__
      handle = dlopen(name, RTLD_LAZY);
    #elif defined __linux__
      handle = dlopen(name, RTLD_LAZY | RTLD_DEEPBIND);
    #endif
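A variation on the proposal above, as a minimal sketch: instead of testing for the operating system, test for the flag itself. `open_library` is a hypothetical helper name and not part of xpu. Note that the `__APPLE__` / `__linux__` form leaves `handle` unassigned on any other platform, which this variant avoids.

```cpp
#include <dlfcn.h>

// Hypothetical helper (not part of xpu): open a shared library and ask for
// local-scope-first symbol lookup wherever the platform offers a flag for it.
inline void* open_library(const char* name)
{
  int flags = RTLD_LAZY;
#if defined(RTLD_DEEPBIND)
  // RTLD_DEEPBIND is a glibc extension; it is not defined on macOS.
  flags |= RTLD_DEEPBIND;
#endif
  // Returns nullptr on failure; dlerror() then describes the problem.
  return dlopen(name, flags);
}
```

A caller would use it as `handle = open_library(name);`. Whether the macOS default lookup order really matches RTLD_DEEPBIND is the open question from the comment above, and this sketch does not settle it.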
    • Resolved by Florian Uhlig

      @fweig,

      when compiling the code with the proposed fix, the compilation continues but fails with the following error:

      external/xpu/src/xpu/driver/cpu/cpu_driver.cpp:75:32: error: use of undeclared identifier '_SC_AVPHYS_PAGES'
          *free = pagesize * sysconf(_SC_AVPHYS_PAGES);

      sysconf is available on macosx, but the argument _SC_AVPHYS_PAGES is not known there. Currently I have no idea how to retrieve this information on macosx. Could you explain what the function cpu_driver::meminfo should do?
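One possible way around the missing `_SC_AVPHYS_PAGES`, as a sketch only: on macOS the free page count can be read from the Mach host statistics and the total memory from the `hw.memsize` sysctl. `query_meminfo` is a hypothetical stand-in, and it assumes that `cpu_driver::meminfo` is meant to report free and total host memory in bytes, which the thread does not confirm. The macOS branch is an assumption about how the port could look, not the change that was merged.

```cpp
#include <cstddef>
#include <cstdint>
#include <unistd.h>

#if defined(__APPLE__)
#include <mach/mach.h>
#include <sys/sysctl.h>
#endif

// Hypothetical stand-in for cpu_driver::meminfo (assumed semantics):
// writes free and total physical memory in bytes, returns false on failure.
inline bool query_meminfo(std::size_t* free_bytes, std::size_t* total_bytes)
{
  const long pagesize = sysconf(_SC_PAGESIZE);
  if (pagesize <= 0) return false;

#if defined(__APPLE__)
  // macOS lacks _SC_AVPHYS_PAGES; ask the Mach host for VM statistics.
  vm_statistics64_data_t vmstat;
  mach_msg_type_number_t count = HOST_VM_INFO64_COUNT;
  if (host_statistics64(mach_host_self(), HOST_VM_INFO64,
                        reinterpret_cast<host_info64_t>(&vmstat),
                        &count) != KERN_SUCCESS) {
    return false;
  }

  // Total physical memory in bytes.
  std::uint64_t total = 0;
  std::size_t len = sizeof(total);
  if (sysctlbyname("hw.memsize", &total, &len, nullptr, 0) != 0) return false;

  *free_bytes = static_cast<std::size_t>(vmstat.free_count)
                * static_cast<std::size_t>(pagesize);
  *total_bytes = static_cast<std::size_t>(total);
  return true;
#else
  // Linux/glibc: both page counts are available through sysconf.
  const long avail = sysconf(_SC_AVPHYS_PAGES);
  const long phys = sysconf(_SC_PHYS_PAGES);
  if (avail < 0 || phys < 0) return false;

  *free_bytes = static_cast<std::size_t>(pagesize) * static_cast<std::size_t>(avail);
  *total_bytes = static_cast<std::size_t>(pagesize) * static_cast<std::size_t>(phys);
  return true;
#endif
}
```

On macOS, `free_count` covers only pages that are completely unused, so the reported value is a conservative estimate; inactive pages that the kernel could reclaim are not included.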

  • @fweig,

    I have created a merge request to your source code which at least compiles on my Mac. Since the changes are implemented using a preprocessor statement, I don't expect any problems with the Linux implementation. I did not test whether the code works. Do you have a test suite so that I can try to run it on macosx?

  • Florian Uhlig resolved all threads

  • added 1 commit

  • Felix Weiglhofer added 10 commits

  • Updated MR to include macosx fixes.

    • Resolved by Florian Uhlig

      @fweig,

      I fixed (worked around) the compiler errors concerning the math functions, but I finally ran into some linker errors on macosx. Maybe you have an idea what the problem could be.

      [ 38%] Linking CXX shared library libTestKernels.dylib
      cd /Users/uhlig/software/fair/cbm/cbmroot_git/external/xpu/build/test && /usr/local/Cellar/cmake/3.22.1/bin/cmake -E cmake_link_script CMakeFiles/TestKernels.dir/link.txt --verbose=1
      /Library/Developer/CommandLineTools/usr/bin/c++  -Wall -Wextra -Werror -O3 -Xclang -fopenmp -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.1.sdk -mmacosx-version-min=10.15 -dynamiclib -Wl,-headerpad_max_install_names -o libTestKernels.dylib -install_name @rpath/libTestKernels.dylib CMakeFiles/TestKernels.dir/TestKernels.cpp.o 
      Undefined symbols for architecture x86_64:
        "xpu::detail::logger::write(char const*, ...)", referenced from:
            xpu::detail::action_runner<xpu::detail::kernel_tag, empty_kernel, xpu::no_smem, void (*)(xpu::no_smem&)>::call(float*, xpu::grid) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, vector_add, xpu::no_smem, void (*)(xpu::no_smem&, float const*, float const*, float*, int)>::call(float*, xpu::grid, float const*, float const*, float*, int) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, vector_add_timing, xpu::no_smem, void (*)(xpu::no_smem&, float const*, float const*, float*, int)>::call(float*, xpu::grid, float const*, float const*, float*, int) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, sort_float, sort_floats_smem, void (*)(sort_floats_smem&, float*, int, float*, float**)>::call(float*, xpu::grid, float*, int, float*, float**) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, sort_struct, sort_kv_smem, void (*)(sort_kv_smem&, key_value_t*, int, key_value_t*, key_value_t**)>::call(float*, xpu::grid, key_value_t*, int, key_value_t*, key_value_t**) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, merge, xpu::block_merge<float, 64, 8, (xpu::driver_t)0>::storage_t, void (*)(xpu::block_merge<float, 64, 8, (xpu::driver_t)0>::storage_t&, float const*, unsigned long, float const*, unsigned long, float*)>::call(float*, xpu::grid, float const*, unsigned long, float const*, unsigned long, float*) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, merge_single, xpu::block_merge<float, 64, 1, (xpu::driver_t)0>::storage_t, void (*)(xpu::block_merge<float, 64, 1, (xpu::driver_t)0>::storage_t&, float const*, unsigned long, float const*, unsigned long, float*)>::call(float*, xpu::grid, float const*, unsigned long, float const*, unsigned long, float*) in TestKernels.cpp.o
            ...
        "xpu::detail::logger::instance()", referenced from:
            xpu::detail::action_runner<xpu::detail::kernel_tag, empty_kernel, xpu::no_smem, void (*)(xpu::no_smem&)>::call(float*, xpu::grid) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, vector_add, xpu::no_smem, void (*)(xpu::no_smem&, float const*, float const*, float*, int)>::call(float*, xpu::grid, float const*, float const*, float*, int) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, vector_add_timing, xpu::no_smem, void (*)(xpu::no_smem&, float const*, float const*, float*, int)>::call(float*, xpu::grid, float const*, float const*, float*, int) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, sort_float, sort_floats_smem, void (*)(sort_floats_smem&, float*, int, float*, float**)>::call(float*, xpu::grid, float*, int, float*, float**) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, sort_struct, sort_kv_smem, void (*)(sort_kv_smem&, key_value_t*, int, key_value_t*, key_value_t**)>::call(float*, xpu::grid, key_value_t*, int, key_value_t*, key_value_t**) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, merge, xpu::block_merge<float, 64, 8, (xpu::driver_t)0>::storage_t, void (*)(xpu::block_merge<float, 64, 8, (xpu::driver_t)0>::storage_t&, float const*, unsigned long, float const*, unsigned long, float*)>::call(float*, xpu::grid, float const*, unsigned long, float const*, unsigned long, float*) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, merge_single, xpu::block_merge<float, 64, 1, (xpu::driver_t)0>::storage_t, void (*)(xpu::block_merge<float, 64, 1, (xpu::driver_t)0>::storage_t&, float const*, unsigned long, float const*, unsigned long, float*)>::call(float*, xpu::grid, float const*, unsigned long, float const*, unsigned long, float*) in TestKernels.cpp.o
            ...
        "xpu::sincos(float, float*, float*)", referenced from:
            void test_device_funcs::impl<xpu::no_smem>(xpu::no_smem&, variant*) in TestKernels.cpp.o
        "thread-local wrapper routine for xpu::detail::this_thread::grid_dim", referenced from:
            _.omp_outlined. in TestKernels.cpp.o
            _.omp_outlined..11 in TestKernels.cpp.o
            _.omp_outlined..14 in TestKernels.cpp.o
            _.omp_outlined..17 in TestKernels.cpp.o
            _.omp_outlined..20 in TestKernels.cpp.o
            _.omp_outlined..23 in TestKernels.cpp.o
            _.omp_outlined..26 in TestKernels.cpp.o
            ...
        "thread-local wrapper routine for xpu::detail::this_thread::block_idx", referenced from:
            void vector_add::impl<xpu::no_smem>(xpu::no_smem&, float const*, float const*, float*, int) in TestKernels.cpp.o
            void vector_add_timing::impl<xpu::no_smem>(xpu::no_smem&, float const*, float const*, float*, int) in TestKernels.cpp.o
            void sort_float::impl<sort_floats_smem>(sort_floats_smem&, float*, int, float*, float**) in TestKernels.cpp.o
            void sort_struct::impl<sort_kv_smem>(sort_kv_smem&, key_value_t*, int, key_value_t*, key_value_t**) in TestKernels.cpp.o
            void get_thread_idx::impl<xpu::no_smem>(xpu::no_smem&, int*) in TestKernels.cpp.o
            _.omp_outlined. in TestKernels.cpp.o
            _.omp_outlined..11 in TestKernels.cpp.o
            ...
        "___kmpc_for_static_fini", referenced from:
            _.omp_outlined. in TestKernels.cpp.o
            _.omp_outlined..11 in TestKernels.cpp.o
            _.omp_outlined..14 in TestKernels.cpp.o
            _.omp_outlined..17 in TestKernels.cpp.o
            _.omp_outlined..20 in TestKernels.cpp.o
            _.omp_outlined..23 in TestKernels.cpp.o
            _.omp_outlined..26 in TestKernels.cpp.o
            ...
        "___kmpc_for_static_init_4", referenced from:
            _.omp_outlined. in TestKernels.cpp.o
            _.omp_outlined..11 in TestKernels.cpp.o
            _.omp_outlined..14 in TestKernels.cpp.o
            _.omp_outlined..17 in TestKernels.cpp.o
            _.omp_outlined..20 in TestKernels.cpp.o
            _.omp_outlined..23 in TestKernels.cpp.o
            _.omp_outlined..26 in TestKernels.cpp.o
            ...
        "___kmpc_fork_call", referenced from:
            xpu::detail::action_runner<xpu::detail::kernel_tag, empty_kernel, xpu::no_smem, void (*)(xpu::no_smem&)>::call(float*, xpu::grid) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, vector_add, xpu::no_smem, void (*)(xpu::no_smem&, float const*, float const*, float*, int)>::call(float*, xpu::grid, float const*, float const*, float*, int) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, vector_add_timing, xpu::no_smem, void (*)(xpu::no_smem&, float const*, float const*, float*, int)>::call(float*, xpu::grid, float const*, float const*, float*, int) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, sort_float, sort_floats_smem, void (*)(sort_floats_smem&, float*, int, float*, float**)>::call(float*, xpu::grid, float*, int, float*, float**) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, sort_struct, sort_kv_smem, void (*)(sort_kv_smem&, key_value_t*, int, key_value_t*, key_value_t**)>::call(float*, xpu::grid, key_value_t*, int, key_value_t*, key_value_t**) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, merge, xpu::block_merge<float, 64, 8, (xpu::driver_t)0>::storage_t, void (*)(xpu::block_merge<float, 64, 8, (xpu::driver_t)0>::storage_t&, float const*, unsigned long, float const*, unsigned long, float*)>::call(float*, xpu::grid, float const*, unsigned long, float const*, unsigned long, float*) in TestKernels.cpp.o
            xpu::detail::action_runner<xpu::detail::kernel_tag, merge_single, xpu::block_merge<float, 64, 1, (xpu::driver_t)0>::storage_t, void (*)(xpu::block_merge<float, 64, 1, (xpu::driver_t)0>::storage_t&, float const*, unsigned long, float const*, unsigned long, float*)>::call(float*, xpu::grid, float const*, unsigned long, float const*, unsigned long, float*) in TestKernels.cpp.o
            ...
      ld: symbol(s) not found for architecture x86_64
      clang: error: linker command failed with exit code 1 (use -v to see invocation)
  • Dear @f.uhlig,

    you have been identified as code owner of at least one file which was changed with this merge request.

    Please check the changes and approve them or request changes.

  • Felix Weiglhofer added 2 commits

    • e465a41f - 1 commit from branch computing:master
    • 9bc776b5 - external: Add xpu library.

  • Felix Weiglhofer added 2 commits

    • de9e8423 - 1 commit from branch computing:master
    • 45d43d92 - external: Add xpu library.

  • Florian Uhlig resolved all threads

  • Florian Uhlig approved this merge request

  • Florian Uhlig added 6 commits

  • Florian Uhlig enabled an automatic merge when the pipeline for a244d714 succeeds

  • merged
