Unpacker for STS using XPU
A port of cbm::algo::UnpackSts and related classes to the XPU framework. Results of the original unpacker are reproduced on the CPU. Further work on external classes is needed to enable GPU execution.
Update:
- Rebased to include !1112 (merged) (needed as precursor).
- Removed "Draft" flag (ready for merge, see comments below).
Merge request reports
Activity
requested review from @v.friese
assigned to @v.friese
- Resolved by Dominik Smith
This MR supplies two new classes,
UnpackStsXpu
(incbm::algo
) and a corresponding taskCbmTaskUnpackXpu
. Compilation of the CPU succeeds, and the reproduction of the output of the old class (cbm::algo::UnpackSts
) has been verified.The logic of the implementation is such, that execution on the GPU should also be possible, with little to no modifications (the unpacker has been fully parallelized across timeslice components and microslices). At present, GPU execution is prevented mainly because the classes
CbmStsDigi
andstsxyter::Message
do not fully support XPU. For the later of these, XPU support is entirely absent. For the former, partial XPU support exists, but no standard constructor (that receives address, channel, time and charge) has been implemented (what mainly appears to be missing is thePackAddressAndTime()
function). This needs to be addressed before the new unpacker can be tested on the GPU.Otherwise, I think this version is essentially in the desired form. Configuration of the unpacker is done using the
StsReadoutConfig
class, in a way which is fully compatible with the originalcbm::algo::UnpackSts
.A review by @fweig would be appreciated.
There are some areas where improvements may be possible upon further development of the
XPU
framework. I will comment on these individually below.My also be of interest to: @se.gorbunov @p.-a.loizeau
Edited by Dominik Smith
- Resolved by Dominik Smith
- Resolved by Volker Friese
Dear @f.uhlig, @v.friese, @p.-a.loizeau,
you have been identified as code owner of at least one file which was changed with this merge request.
Please check the changes and approve them or request changes.
added CodeOwners label
- Resolved by Dominik Smith
added 1 commit
- 83dbc212 - UnpackStsXpu: Added missing device-to-host copy operations.
- Resolved by Dominik Smith
in my opinion the license header in all files is wrong. If I am correct these files are new and were committed by you so the license header should be
/* Copyright (C) 2023 GSI Helmholtzzentrum fuer Schwerionenforschung, Darmstadt SPDX-License-Identifier: GPL-3.0-only Authors: Dominik Smith [committer] */
Probably you started with an already existing version such that Volker and Pierre were in the list. I think we should start in such cases with a new list of authors.
@v.friese, @p.-a.loizeau, would you agree, since I removed you from the list.
Edited by Florian Uhlig
- Resolved by Dominik Smith
Yes, I kept Volker and Pierre in because these files are based on earler files and parts of the code was copied over. Whatever the policy is is fine with me.
Actually, strictly speaking I should also replace GSI with FAIR since that is my employer.
added 18 commits
-
83dbc212...26317081 - 7 commits from branch
computing:master
- 5a67af8c - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
- fd0a43a1 - Rearranging the XPU unpacker classes.
- aca3fea1 - UnpackStsXpu: Moving stuff from task to algo.
- d3e59ffa - UnpackStsXpu: Started implementation of parameter buffers.
- 8f764de0 - UnpackStsXpu: Implemented filling of buffers.
- b21c5af6 - UnpackStsXpu: Started to implement unpacker logic.
- e66e89a3 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
- 06cec441 - UnpackStsXpu: Writing digis to buffer now.
- 530d2ce1 - UnpackStsXpu: Results seem to come out right on CPU.
- 18b3aec7 - UnpackStsXpu: Added missing device-to-host copy operations.
- f652b647 - UnpackStsXpu: Fixed copyright headers.
Toggle commit list-
83dbc212...26317081 - 7 commits from branch
- Resolved by Dominik Smith
added 14 commits
-
f652b647...e6b096cb - 2 commits from branch
computing:master
- e73afeef - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
- aca544a2 - Rearranging the XPU unpacker classes.
- e3a2bd2f - UnpackStsXpu: Moving stuff from task to algo.
- 56fbe02d - UnpackStsXpu: Started implementation of parameter buffers.
- 0d25e3ec - UnpackStsXpu: Implemented filling of buffers.
- 06fe0737 - UnpackStsXpu: Started to implement unpacker logic.
- 04b019c5 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
- a2fab665 - UnpackStsXpu: Writing digis to buffer now.
- 96a62739 - UnpackStsXpu: Results seem to come out right on CPU.
- 2372327a - UnpackStsXpu: Added missing device-to-host copy operations.
- d64d74b7 - UnpackStsXpu: Fixed copyright headers.
- 48252411 - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
Toggle commit list-
f652b647...e6b096cb - 2 commits from branch
mentioned in merge request !1112 (merged)
added 14 commits
-
2066ec75 - 1 commit from branch
computing:master
- 85fc84eb - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
- 7a23360d - Rearranging the XPU unpacker classes.
- 192e00be - UnpackStsXpu: Moving stuff from task to algo.
- 64bc0f37 - UnpackStsXpu: Started implementation of parameter buffers.
- 0ddd4ed0 - UnpackStsXpu: Implemented filling of buffers.
- cc21a13f - UnpackStsXpu: Started to implement unpacker logic.
- eb60249b - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
- c24a4247 - UnpackStsXpu: Writing digis to buffer now.
- 68327de4 - UnpackStsXpu: Results seem to come out right on CPU.
- ef14d52c - UnpackStsXpu: Added missing device-to-host copy operations.
- 3f8f4554 - UnpackStsXpu: Fixed copyright headers.
- 77d730fd - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
- a9f88d1c - UnpackStsXpu: Added blocksize setting.
Toggle commit list-
2066ec75 - 1 commit from branch
@fweig Added a command that sets the block size (currently 32, should be optimized). Not sure whether some other settings are needed (gridsize is passed to the kernel).
- Resolved by Dominik Smith
mentioned in merge request !1117 (merged)
added 46 commits
-
a9f88d1c...59ccf451 - 32 commits from branch
computing:master
- 182cd608 - Minimal changes to the StsSyterMessage class for XPU compatibility.
- b2adc016 - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
- a4ec1ed8 - Rearranging the XPU unpacker classes.
- c53d8093 - UnpackStsXpu: Moving stuff from task to algo.
- 9321b7ed - UnpackStsXpu: Started implementation of parameter buffers.
- e6541f69 - UnpackStsXpu: Implemented filling of buffers.
- 3b538cae - UnpackStsXpu: Started to implement unpacker logic.
- 533e9188 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
- 0624e0e0 - UnpackStsXpu: Writing digis to buffer now.
- b7d962f6 - UnpackStsXpu: Results seem to come out right on CPU.
- 6458a3a0 - UnpackStsXpu: Added missing device-to-host copy operations.
- 874f7b8e - UnpackStsXpu: Fixed copyright headers.
- 2bce1fb1 - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
- 764ae3b3 - UnpackStsXpu: Added blocksize setting.
Toggle commit list-
a9f88d1c...59ccf451 - 32 commits from branch
- Resolved by Volker Friese
@f.uhlig @fweig @v.friese @p.-a.loizeau
Using the changes from !1112 (merged) and !1117 (merged) I was now able to compile the code with the CUDA backend and test it on a Quadro RTX 4000 using the first timeslice of the 1588_node8_1_0000.tsa file. It appears to reproduce the output of the CPU version. This MR is hence ready for merge in principle (removed the "Draft" flag).
Regarding performance, a simple comparison shows that the original CPU version takes ca. 1200 ms to complete (measuring only the "Exec" stage of the Task), while the XPU GPU version takes around 1300 ms, so slightly slower. This, however, is misleading. Enabling the "verbose" mode of XPU shows that the actual execution of the unpacker kernel takes around 10 ms. This implies that nearly the entire runtime is due to host-device-transfer operations and / or stuff that still happens on the CPU (filling of input buffers, sorting of output etc.).
This, in a sense, is good news, since it implies that we can probably completely forget about optimizing the GPU-memory access pattern for now (i.e. restoring coalesced access). It also implies that we will likely win against the CPU by far once we move additional operations (e.g. parsing the output buffer and sorting the digis) to the GPU and once we chain different steps of the readout chain together.
@v.friese @fweig If you both agree, I would like to hand this over to the FIAS group now.
added 17 commits
-
764ae3b3...ee7ffcac - 2 commits from branch
computing:master
- 6a81a9e3 - Minimal changes to the StsSyterMessage class for XPU compatibility.
- c8280927 - Implemented a minimal example of executable XPU code as a stub for a new Sts unpacker.
- 087a4044 - Rearranging the XPU unpacker classes.
- 28da657c - UnpackStsXpu: Moving stuff from task to algo.
- 49b517d0 - UnpackStsXpu: Started implementation of parameter buffers.
- 3a4396d3 - UnpackStsXpu: Implemented filling of buffers.
- cef9c090 - UnpackStsXpu: Started to implement unpacker logic.
- 01c945e5 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
- cbe990d3 - UnpackStsXpu: Writing digis to buffer now.
- 0974fa80 - UnpackStsXpu: Results seem to come out right on CPU.
- ccaf1fe6 - UnpackStsXpu: Added missing device-to-host copy operations.
- 7053ce74 - UnpackStsXpu: Fixed copyright headers.
- b0a74b53 - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
- b7e0aba2 - UnpackStsXpu: Added blocksize setting.
- 295cc1d5 - UnpackStsXpu: Added filter of timeslice components by system id (equipment ids...
Toggle commit list-
764ae3b3...ee7ffcac - 2 commits from branch
- Resolved by Volker Friese
mentioned in merge request !1132 (closed)
mentioned in merge request !1135 (closed)
Can be merged if @fweig agrees.
- Resolved by Dominik Smith
@d.smith I cannot rebase from gitlab - can you do it locally?
added 104 commits
-
295cc1d5...486529b2 - 90 commits from branch
computing:master
- 486529b2...49cab63f - 4 earlier commits
- 052a70d4 - UnpackStsXpu: Implemented filling of buffers.
- ca94a8d1 - UnpackStsXpu: Started to implement unpacker logic.
- 77615fd2 - UnpackStsXpu: Implemented message processing loop. Only hit messages missing.
- 23f98b51 - UnpackStsXpu: Writing digis to buffer now.
- e57c27ce - UnpackStsXpu: Results seem to come out right on CPU.
- f99b38f0 - UnpackStsXpu: Added missing device-to-host copy operations.
- e7f0c178 - UnpackStsXpu: Fixed copyright headers.
- 55d4f67d - UnpackStsXpu: Optimized code a bit by using d_buffers where appropriate.
- d5a9cf6f - UnpackStsXpu: Added blocksize setting.
- 0ea1f88b - UnpackStsXpu: Added filter of timeslice components by system id (equipment ids...
Toggle commit list-
295cc1d5...486529b2 - 90 commits from branch