More cleanup of tracking core.
Summary:
- Cleaned up ca::TrackFinder.
- In ca::TrackFinder, replaced separate vector instances which store upper and lower bounds of windows by single vectors that store these bounds together as a pair.
- Cleaned up ca::TripletConstructor.
- Removed global ca::TrackFit instance from ca::TripletConstructor.
- Removed access to ca::InputData from ca::TripletConstructor (all needed data is already in ca::WindowData).
- Cleaned up and refactored ca::TrackFinderWindow.
- Removed global triplets array in ca::TrackFinderWindow.
Supercedes !1899 (closed). See !1899 (closed) for additional details.
Merge request reports
Activity
requested review from @s.zharko
assigned to @se.gorbunov
mentioned in merge request !1899 (closed)
added 1 commit
- fd22a015 - Removed global triplets array in ca::TrackFinderWindow.
added 1 commit
- 8d4fb6ea - Removed global triplets array in ca::TrackFinderWindow.
added 1 commit
- a0623944 - Cleaned up and optimized ca::TripletConstructor.
added 1 commit
- 4be99d8d - Cleaned up and optimized ca::TripletConstructor.
@s.zharko @se.gorbunov Regarding the topic of memory allocations (continuing the discussion from the software meeting):
In this MR I removed some class-fields and replaced them by local objects, to improve encapsulation. I understand that this can, at least in principle, come at the expense of having additional memory allocation operations. It was pointed out by @se.gorbunov that class-fields were used specifically to avoid this.
My opinion regarding this, is that it is not necessarily ideal to make this blanket assumption, without profiling the code. This would be a case of "premature optimization". It is possible that the compiler already optimizes away many of the memory allocations, while others occur outside of critical areas of the code and hence don't have much of an impact.
Concerning the specific example of the code here, I did in fact profile the code using "perf" and found the following: There are two instances where having strictly local objects slows the code down.
The first is the array
ca::Branch [constants::size::MaxNstations] new_tr
inTrackFinderWindow::CreateTrackCandidates()
. It is only used in this function, but having this strictly local slows the code down by a lot. I have thus used a class-field here, to which a reference is created, and the class-field is clearly marked as having only one use case.The second is the
ca::TrackFit
instances which are used inTripletConstructor
in many different places. Here it is disadvantageous to have strictly local objects, which introduces a sizable performance hit. This can be avoided by creating the object once per call toTripletConstructor::CreateTripletsForHit()
and passing it by reference.Regarding the other arrays, particularly the storage of triplets
fvTriplets
, or the temporary storage of hits and tracks, i.e.fHitM_3
etc., removing these did not increase the runtime in any significant fashion (if there is a difference it was smaller than the between-runs fluctuations on my machine). There was a small effect on the performance profile shown byperf
, such thatmalloc
operations moved from about 1,0% of runtime up to about 1,5% of run time. But since this doesn't seem to be noticeable in practice, I would go for stronger encapsulation here.To be sure, I will investigate this again on a different PC, which has a significantly smaller cache, amongst other things.
Regarding the issue of keeping this code in a "GPU-ready" form (raised by @se.gorbunov), I would propose not to worry too much about this here. My impression is the code will have to be substantially changed anyway, if a GPU port is attempted, and the final result will look very different.
@s.zharko @se.gorbunov Ok, I have now thoroughly tested this code also on an older PC, and can confirm that these changes produce absolutely no detectable performance hit.
@s.zharko @se.gorbunov The following just occured to me: As of now, we are actually storing the input hit data twice.
There is the full array inside the
ca::InputData
class, as well as the partial array inside theca::WindowData
class. In the inner functions (such as calls to the triplet constructor) both versions are used, and references to both object types (WindowData and InputData) are kept.It should be sufficient to keep only the WindowData and do aways with InputData in the inner functions.
added 9 commits
-
6390397e - 1 commit from branch
computing:master
- eb42d9f8 - Removed global ca::TrackFit instance from ca::TripletConstructor.
- 71aface5 - Cleaned up ca::TripletConstructor.
- 89392c25 - - Cleaned up ca::TrackFinder.
- a9253a66 - Cleaned up and refactored ca::TrackFinderWindow.
- dc5b9bde - Removed global triplets array in ca::TrackFinderWindow.
- fb57b65e - Cleaned up and optimized ca::TripletConstructor.
- 1f6a7ea0 - Refactored ca::TrackFinderWindow.
- 567491cb - Removed access to ca::InputData from ca::TripletConstructor (all needed data...
Toggle commit list-
6390397e - 1 commit from branch
added 1 commit
- 0c829877 - Switched from global to window-based hit index in ca::Branch. Allows...
added 1 commit
- 51c36b59 - Switched from global to window-based hit index in ca::Branch. Allows...
added 1 commit
- 6a1579e8 - Switched from global to window-based hit index in ca::Branch. Allows...