Skip to content
Snippets Groups Projects

Unpacker for STS using XPU

Merged Dominik Smith requested to merge d.smith/cbmroot:UnpackStsGpu into master
All threads resolved!

A port of cbm::algo::UnpackSts and related classes to the XPU framework. Results of the original unpacker are reproduced on the CPU. Further work on external classes is needed to enable GPU execution.

Update:

  • Rebased to include !1112 (merged) (needed as precursor).
  • Removed "Draft" flag (ready for merge, see comments below).
Edited by Dominik Smith

Merge request reports

Merge request pipeline #22133 passed

Merge request pipeline passed for 0ea1f88b

Merged by Volker FrieseVolker Friese 2 years ago (May 9, 2023 5:07pm UTC)

Loading

Pipeline #22134 passed

Pipeline passed for a62d40ea on master

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Dominik Smith requested review from @v.friese

    requested review from @v.friese

  • assigned to @v.friese

    • Resolved by Dominik Smith

      This MR supplies two new classes, UnpackStsXpu (in cbm::algo) and a corresponding task CbmTaskUnpackXpu. Compilation of the CPU succeeds, and the reproduction of the output of the old class (cbm::algo::UnpackSts) has been verified.

      The logic of the implementation is such, that execution on the GPU should also be possible, with little to no modifications (the unpacker has been fully parallelized across timeslice components and microslices). At present, GPU execution is prevented mainly because the classes CbmStsDigi and stsxyter::Message do not fully support XPU. For the later of these, XPU support is entirely absent. For the former, partial XPU support exists, but no standard constructor (that receives address, channel, time and charge) has been implemented (what mainly appears to be missing is the PackAddressAndTime() function). This needs to be addressed before the new unpacker can be tested on the GPU.

      Otherwise, I think this version is essentially in the desired form. Configuration of the unpacker is done using the StsReadoutConfig class, in a way which is fully compatible with the original cbm::algo::UnpackSts.

      A review by @fweig would be appreciated.

      There are some areas where improvements may be possible upon further development of the XPU framework. I will comment on these individually below.

      My also be of interest to: @se.gorbunov @p.-a.loizeau

      Edited by Dominik Smith
  • Dominik Smith
  • Dominik Smith
  • Dear @f.uhlig, @v.friese, @p.-a.loizeau,

    you have been identified as code owner of at least one file which was changed with this merge request.

    Please check the changes and approve them or request changes.

  • Dominik Smith
  • Dominik Smith added 1 commit

    added 1 commit

    • 83dbc212 - UnpackStsXpu: Added missing device-to-host copy operations.

    Compare with previous version

    • Resolved by Dominik Smith

      @d.smith.

      in my opinion the license header in all files is wrong. If I am correct these files are new and were committed by you so the license header should be

      /* Copyright (C) 2023 GSI Helmholtzzentrum fuer Schwerionenforschung, Darmstadt
         SPDX-License-Identifier: GPL-3.0-only
         Authors: Dominik Smith [committer] */

      Probably you started with an already existing version such that Volker and Pierre were in the list. I think we should start in such cases with a new list of authors.

      @v.friese, @p.-a.loizeau, would you agree, since I removed you from the list.

      Edited by Florian Uhlig
    • Resolved by Dominik Smith

      Yes, I kept Volker and Pierre in because these files are based on earler files and parts of the code was copied over. Whatever the policy is is fine with me.

      Actually, strictly speaking I should also replace GSI with FAIR since that is my employer.

  • Dominik Smith added 18 commits

    added 18 commits

    • 83dbc212...26317081 - 7 commits from branch computing:master
    • 5a67af8c - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
    • fd0a43a1 - Rearranging the XPU unpacker classes.
    • aca3fea1 - UnpackStsXpu: Moving stuff from task to algo.
    • d3e59ffa - UnpackStsXpu: Started implementation of parameter buffers.
    • 8f764de0 - UnpackStsXpu: Implemented filling of buffers.
    • b21c5af6 - UnpackStsXpu: Started to implement unpacker logic.
    • e66e89a3 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
    • 06cec441 - UnpackStsXpu: Writing digis to buffer now.
    • 530d2ce1 - UnpackStsXpu: Results seem to come out right on CPU.
    • 18b3aec7 - UnpackStsXpu: Added missing device-to-host copy operations.
    • f652b647 - UnpackStsXpu: Fixed copyright headers.

    Compare with previous version

  • Felix Weiglhofer
  • Dominik Smith added 14 commits

    added 14 commits

    • f652b647...e6b096cb - 2 commits from branch computing:master
    • e73afeef - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
    • aca544a2 - Rearranging the XPU unpacker classes.
    • e3a2bd2f - UnpackStsXpu: Moving stuff from task to algo.
    • 56fbe02d - UnpackStsXpu: Started implementation of parameter buffers.
    • 0d25e3ec - UnpackStsXpu: Implemented filling of buffers.
    • 06fe0737 - UnpackStsXpu: Started to implement unpacker logic.
    • 04b019c5 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
    • a2fab665 - UnpackStsXpu: Writing digis to buffer now.
    • 96a62739 - UnpackStsXpu: Results seem to come out right on CPU.
    • 2372327a - UnpackStsXpu: Added missing device-to-host copy operations.
    • d64d74b7 - UnpackStsXpu: Fixed copyright headers.
    • 48252411 - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.

    Compare with previous version

  • Dominik Smith mentioned in merge request !1112 (merged)

    mentioned in merge request !1112 (merged)

  • Dominik Smith added 14 commits

    added 14 commits

    • 2066ec75 - 1 commit from branch computing:master
    • 85fc84eb - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
    • 7a23360d - Rearranging the XPU unpacker classes.
    • 192e00be - UnpackStsXpu: Moving stuff from task to algo.
    • 64bc0f37 - UnpackStsXpu: Started implementation of parameter buffers.
    • 0ddd4ed0 - UnpackStsXpu: Implemented filling of buffers.
    • cc21a13f - UnpackStsXpu: Started to implement unpacker logic.
    • eb60249b - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
    • c24a4247 - UnpackStsXpu: Writing digis to buffer now.
    • 68327de4 - UnpackStsXpu: Results seem to come out right on CPU.
    • ef14d52c - UnpackStsXpu: Added missing device-to-host copy operations.
    • 3f8f4554 - UnpackStsXpu: Fixed copyright headers.
    • 77d730fd - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
    • a9f88d1c - UnpackStsXpu: Added blocksize setting.

    Compare with previous version

  • @fweig Added a command that sets the block size (currently 32, should be optimized). Not sure whether some other settings are needed (gridsize is passed to the kernel).

  • Dominik Smith
  • Felix Weiglhofer mentioned in merge request !1117 (merged)

    mentioned in merge request !1117 (merged)

  • Dominik Smith added 46 commits

    added 46 commits

    • a9f88d1c...59ccf451 - 32 commits from branch computing:master
    • 182cd608 - Minimal changes to the StsSyterMessage class for XPU compatibility.
    • b2adc016 - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
    • a4ec1ed8 - Rearranging the XPU unpacker classes.
    • c53d8093 - UnpackStsXpu: Moving stuff from task to algo.
    • 9321b7ed - UnpackStsXpu: Started implementation of parameter buffers.
    • e6541f69 - UnpackStsXpu: Implemented filling of buffers.
    • 3b538cae - UnpackStsXpu: Started to implement unpacker logic.
    • 533e9188 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
    • 0624e0e0 - UnpackStsXpu: Writing digis to buffer now.
    • b7d962f6 - UnpackStsXpu: Results seem to come out right on CPU.
    • 6458a3a0 - UnpackStsXpu: Added missing device-to-host copy operations.
    • 874f7b8e - UnpackStsXpu: Fixed copyright headers.
    • 2bce1fb1 - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
    • 764ae3b3 - UnpackStsXpu: Added blocksize setting.

    Compare with previous version

  • Dominik Smith marked this merge request as ready

    marked this merge request as ready

  • Dominik Smith changed the description

    changed the description

    • Resolved by Volker Friese

      @f.uhlig @fweig @v.friese @p.-a.loizeau

      Using the changes from !1112 (merged) and !1117 (merged) I was now able to compile the code with the CUDA backend and test it on a Quadro RTX 4000 using the first timeslice of the 1588_node8_1_0000.tsa file. It appears to reproduce the output of the CPU version. This MR is hence ready for merge in principle (removed the "Draft" flag).

      Regarding performance, a simple comparison shows that the original CPU version takes ca. 1200 ms to complete (measuring only the "Exec" stage of the Task), while the XPU GPU version takes around 1300 ms, so slightly slower. This, however, is misleading. Enabling the "verbose" mode of XPU shows that the actual execution of the unpacker kernel takes around 10 ms. This implies that nearly the entire runtime is due to host-device-transfer operations and / or stuff that still happens on the CPU (filling of input buffers, sorting of output etc.).

      This, in a sense, is good news, since it implies that we can probably completely forget about optimizing the GPU-memory access pattern for now (i.e. restoring coalesced access). It also implies that we will likely win against the CPU by far once we move additional operations (e.g. parsing the output buffer and sorting the digis) to the GPU and once we chain different steps of the readout chain together.

      @v.friese @fweig If you both agree, I would like to hand this over to the FIAS group now.

  • Dominik Smith added 17 commits

    added 17 commits

    • 764ae3b3...ee7ffcac - 2 commits from branch computing:master
    • 6a81a9e3 - Minimal changes to the StsSyterMessage class for XPU compatibility.
    • c8280927 - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
    • 087a4044 - Rearranging the XPU unpacker classes.
    • 28da657c - UnpackStsXpu: Moving stuff from task to algo.
    • 49b517d0 - UnpackStsXpu: Started implementation of parameter buffers.
    • 3a4396d3 - UnpackStsXpu: Implemented filling of buffers.
    • cef9c090 - UnpackStsXpu: Started to implement unpacker logic.
    • 01c945e5 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
    • cbe990d3 - UnpackStsXpu: Writing digis to buffer now.
    • 0974fa80 - UnpackStsXpu: Results seem to come out right on CPU.
    • ccaf1fe6 - UnpackStsXpu: Added missing device-to-host copy operations.
    • 7053ce74 - UnpackStsXpu: Fixed copyright headers.
    • b0a74b53 - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
    • b7e0aba2 - UnpackStsXpu: Added blocksize setting.
    • 295cc1d5 - UnpackStsXpu: Added filter of timeslice components by system id (equipment ids...

    Compare with previous version

  • Dominik Smith
  • Dominik Smith mentioned in merge request !1132 (closed)

    mentioned in merge request !1132 (closed)

  • Dominik Smith mentioned in merge request !1135 (closed)

    mentioned in merge request !1135 (closed)

  • Can be merged if @fweig agrees.

  • Dominik Smith requested review from @fweig and removed review request for @v.friese

    requested review from @fweig and removed review request for @v.friese

  • Felix Weiglhofer approved this merge request

    approved this merge request

  • Merging is fine with me. We'll have to revisit this anyway, once i have all the GPU changes from my fork merged.

  • Volker Friese resolved all threads

    resolved all threads

  • Dominik Smith added 104 commits

    added 104 commits

    • 295cc1d5...486529b2 - 90 commits from branch computing:master
    • 486529b2...49cab63f - 4 earlier commits
    • 052a70d4 - UnpackStsXpu: Implemented filling of buffers.
    • ca94a8d1 - UnpackStsXpu: Started to implement unpacker logic.
    • 77615fd2 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
    • 23f98b51 - UnpackStsXpu: Writing digis to buffer now.
    • e57c27ce - UnpackStsXpu: Results seem to come out right on CPU.
    • f99b38f0 - UnpackStsXpu: Added missing device-to-host copy operations.
    • e7f0c178 - UnpackStsXpu: Fixed copyright headers.
    • 55d4f67d - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
    • d5a9cf6f - UnpackStsXpu: Added blocksize setting.
    • 0ea1f88b - UnpackStsXpu: Added filter of timeslice components by system id (equipment ids...

    Compare with previous version

  • Dominik Smith resolved all threads

    resolved all threads

  • merged

  • Please register or sign in to reply
    Loading