Add multithreading to CbmTaskUnpack and fix compiler errors on several systems
This MR adds basic multithreading to CbmTaskUnpack. It is intended as a first step in this direction and as a way to gain practical experience with it.
A significant performance improvement comes from using std::sort with the std::execution::par_unseq execution policy when time-sorting all digis at the end.
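A minimal sketch of a parallel time sort, assuming a hypothetical Digi struct with a GetTime() accessor (the real CBM digi classes differ). It only illustrates passing the std::execution::par_unseq policy to std::sort; with GCC/libstdc++ the parallel algorithms typically rely on Intel TBB, so linking with -ltbb may be needed to actually run in parallel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <execution>
#include <vector>

// Hypothetical stand-in for a CBM digi: only the timestamp matters here.
struct Digi {
  double time;        // assumed timestamp
  uint32_t address;   // assumed channel address
  double GetTime() const { return time; }
};

// Sort all digis by time using the parallel + vectorised execution policy.
// The comparator must not throw and must not depend on call order.
void SortByTime(std::vector<Digi>& digis)
{
  std::sort(std::execution::par_unseq, digis.begin(), digis.end(),
            [](const Digi& a, const Digi& b) { return a.GetTime() < b.GetTime(); });
}
```

std::sort is not stable, so digis with identical timestamps may change their relative order; if that matters, std::stable_sort also accepts an execution policy.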
In addition, the components are processed in parallel, which yields a further improvement. Here, the performance gain appears to be limited by the step that appends all digis to a common output vector.
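One possible shape for component-level parallelism is sketched below, assuming one std::async task per component; this is not necessarily how the MR implements it. Component, UnpackComponent and UnpackAll are hypothetical names, and the unpacker here just copies its input. The final merge into the common vector runs on a single thread, which is the kind of step the text identifies as limiting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical digi and component types; the real unpackers differ.
struct Digi {
  double time;
  uint32_t address;
};

struct Component {
  std::vector<Digi> rawDigis;  // placeholder for the component's payload
};

// Hypothetical per-component unpacker: here it simply copies the input.
std::vector<Digi> UnpackComponent(const Component& comp) { return comp.rawDigis; }

// Unpack all components concurrently, then merge into one common vector.
std::vector<Digi> UnpackAll(const std::vector<Component>& components)
{
  // Launch one asynchronous task per component.
  std::vector<std::future<std::vector<Digi>>> futures;
  futures.reserve(components.size());
  for (const auto& comp : components) {
    futures.push_back(std::async(std::launch::async, UnpackComponent, std::cref(comp)));
  }

  // Collect results in component order; get() blocks until each task is done.
  std::vector<std::vector<Digi>> results;
  results.reserve(futures.size());
  std::size_t total = 0;
  for (auto& f : futures) {
    results.push_back(f.get());
    total += results.back().size();
  }

  // Serial merge: reserving avoids reallocations, but the copy itself
  // is still performed by a single thread.
  std::vector<Digi> all;
  all.reserve(total);
  for (const auto& r : results) {
    all.insert(all.end(), r.begin(), r.end());
  }
  return all;
}
```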
Further parallelization within the components (i.e., at the microslice level) is not yet attempted in this MR.
On the mFLES login node, the Unpack task runs at about 300 MB/s on our benchmark file, using the following command:
bin/cbmreco_fairrun -i ~/tmp/1588_node8_1_0000.tsa -c ../reco/tasks/CbmRecoConfigExample.yaml
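A figure like the 300 MB/s above is simply bytes processed divided by wall-clock time. The helper below is a hypothetical sketch of that arithmetic, not how cbmreco_fairrun reports throughput; it assumes 1 MB = 10^6 bytes, since the MR does not say whether MB or MiB is meant.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Throughput in MB/s, assuming 1 MB = 1e6 bytes.
double ThroughputMBps(std::uint64_t bytes, std::chrono::duration<double> elapsed)
{
  return static_cast<double>(bytes) / 1e6 / elapsed.count();
}

// Hypothetical helper: time an arbitrary callable that processes `bytes` of input.
template <typename F>
double MeasureThroughput(std::uint64_t bytes, F&& work)
{
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return ThroughputMBps(bytes, stop - start);  // converts to seconds as double
}
```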