
Fix mrich 2024 online crash

All threads resolved!

Try to catch or avoid the out-of-range exception thrown by a map.at() call in the online unpacker algo for the mRICH.
For this, the sequence error detection from the old unpacker was copied with the same treatment: skip only the faulty word.
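
A minimal sketch of how such a lookup can avoid the exception altogether, by using `find()` instead of `at()`. The map type, the `AddressMap` alias and the `LookupAddress` helper are hypothetical names for illustration; the real unpacker's map is not shown in this MR.

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical mapping from a DiRICH address to some per-board value.
// The actual key/value types in the unpacker are an assumption here.
using AddressMap = std::map<uint16_t, uint32_t>;

// Returns the mapped value, or std::nullopt for an unknown address,
// instead of letting map.at() throw std::out_of_range.
std::optional<uint32_t> LookupAddress(const AddressMap& map, uint16_t address)
{
  const auto it = map.find(address);
  if (it == map.end()) return std::nullopt;  // caller decides what to skip
  return it->second;
}
```

The caller can then treat an empty result as corrupt data and decide how much of the buffer to skip, rather than relying on a try/catch around `at()`.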

The current state of the MR works in the sense that it can process the tsa files 2906_node8_0?_0000.tsa without crashing.
However, it is still not 1:1 with the old unpacker: in addition to the errors that the old one finds for addresses 0x7321 and 0x7170, the new one also finds errors for address 0x9600, which as far as I know does not exist. This hints that the new algo unpacker can sometimes get seriously offset in the buffer.

As a more robust solution, I would propose to skip the full subsubevent instead of a single word, in both the old and the new unpacker algos.
I think this would provide the best chance to rescue the data of the other DiRICHes, including some coming after the corrupted ones in the MS buffer.
You can find an example of this for the new algo as comments in my [TEMP] commit.
Let me know if this is OK, in which case I would implement it in this MR.
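
To illustrate the proposal, here is a rough sketch of skipping a full subsubevent instead of a single word. It assumes that each subsubevent starts with a header word carrying the payload size (in words) in the upper 16 bits and the address in the lower 16 bits; that layout, and the names `SubSubEventHeader`, `DecodeHeader` and `ProcessBuffer`, are assumptions for illustration and not the code of the [TEMP] commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed header layout: upper 16 bits = payload size in words,
// lower 16 bits = address of the sending board.
struct SubSubEventHeader {
  uint16_t size;
  uint16_t address;
};

inline SubSubEventHeader DecodeHeader(uint32_t word)
{
  return {static_cast<uint16_t>(word >> 16), static_cast<uint16_t>(word & 0xffff)};
}

// Walks a buffer of subsubevents and returns the addresses of the blocks
// that were accepted. Blocks with an unknown address are skipped as a whole,
// so the following blocks can still be decoded.
std::vector<uint16_t> ProcessBuffer(const std::vector<uint32_t>& words,
                                    const std::set<uint16_t>& knownAddresses,
                                    std::size_t& nSkipped)
{
  std::vector<uint16_t> processed;
  nSkipped        = 0;
  std::size_t pos = 0;
  while (pos < words.size()) {
    const SubSubEventHeader hdr = DecodeHeader(words[pos]);
    const std::size_t next      = pos + 1 + hdr.size;  // start of next block
    if (next > words.size()) {
      // Declared size runs past the buffer: nothing after it can be trusted.
      ++nSkipped;
      break;
    }
    if (knownAddresses.count(hdr.address) == 0) {
      // Unknown address: drop the whole block, not just one word.
      ++nSkipped;
      pos = next;
      continue;
    }
    // Payload words[pos + 1 .. next) would be decoded here.
    processed.push_back(hdr.address);
    pos = next;
  }
  return processed;
}
```

This only rescues the following blocks when the size field of the corrupted block is itself intact; if the size is wrong as well, the walk can still end up offset in the buffer.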

PS: For testing purposes, this MR also includes the local commit from cbmfles01 which introduced larger TOT ranges for the NCAL/FSD DiRICHes.

Of interest to: @c.pauly, @ma.beyer, @v.friese, @d.smith

Merge request reports

Merge request pipeline #29044 passed

Merge request pipeline passed for b4e72d23

Approval is optional
Ready to merge by members who can write to the target branch.

Merge details

  • 41 commits will be added to master.
  • Source branch will be deleted.

Activity

  • added 49 commits

    • 269ed6ff...5cb9f2f5 - 45 commits from branch computing:master
    • 5eb12c7a - [mRICH] increased TOT range for NCAL/FSD in monitor
    • 3f0f0ddd - [TEMP] In mRICH online unpacker, try to catch/skip invalid data
    • 1640f04b - [TEMP] In mRICH online unpacker, handle data with less words than normal but proper header info
    • e2d738d4 - [TEMP] In mRICH online unpacker ProcessTimeDataWord, fix return to avoid out of bounds access


  • Martin Beyer
  • added 21 commits

    • e2d738d4...79173df5 - 18 commits from branch computing:master
    • c6d518ff - [mRICH] increased TOT range for NCAL/FSD in monitor
    • 9971960b - In mRICH online unpacker, skip subsubevents with corrupt/too few data
    • 1036c2c5 - In mRICH legacy unpacker, skip subsubevents with corrupt/too few data


  • Martin Beyer resolved all threads


  • I applied the changes while

    1. keeping the logs as debug
    2. adding some monitor counter to keep track of how many blocks are skipped and why

    I also applied the same change to the old unpacker.

    Now checking with a "bad" run that both unpackers work fine, then I will rebase and remove the draft state
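
    A rough sketch of what such per-reason skip counters could look like. The reason names and the `SkipMonitor` class are hypothetical and only illustrate the idea; they are not the counters added in this MR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reasons for skipping a block; the real list may differ.
enum class SkipReason : std::size_t
{
  UnknownAddress = 0,
  TooFewWords,
  SequenceError,
  Count  // number of reasons, keep last
};

// Keeps one counter per skip reason so the monitor can report
// how many blocks were dropped and why.
class SkipMonitor {
 public:
  void Record(SkipReason reason) { ++fCounts[static_cast<std::size_t>(reason)]; }

  uint64_t Get(SkipReason reason) const { return fCounts[static_cast<std::size_t>(reason)]; }

  uint64_t Total() const
  {
    uint64_t sum = 0;
    for (const uint64_t c : fCounts) sum += c;
    return sum;
  }

 private:
  std::array<uint64_t, static_cast<std::size_t>(SkipReason::Count)> fCounts{};
};
```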

  • added 4 commits

    • 48031ef5 - 1 commit from branch computing:master
    • 2a4fe648 - [mRICH] increased TOT range for NCAL/FSD in monitor
    • 38631b59 - In mRICH online unpacker, skip subsubevents with corrupt/too few data
    • 47f63a7f - In mRICH legacy unpacker, skip subsubevents with corrupt/too few data


  • Pierre-Alain Loizeau resolved all threads


  • Also added a note that the CTS subevents should be handled better, as I see that most "bad ID" detections are CTS subsubevents with address 0x8xxx (and the online version has an additional block of them, which I suspect is the "last" one not being skipped).

  • Pierre-Alain Loizeau marked this merge request as ready


  • added 1 commit

    • e96bfa17 - In mRICH legacy unpacker, skip subsubevents with corrupt/too few data


    • Resolved by Martin Beyer

      For file /lustre/cbm/prod/beamtime/2024/03/mcbm/2906/2906_node8_00_0001.tsa
      I observe a difference of 5 digis between online and offline (offline has 5 more).

      Comparing RichDigi.fAddress and tree2.RichDigi.fAddress
      Different number of entries:    2943817 vs    2943812
  • added 6 commits

    • e96bfa17...939d833c - 2 commits from branch computing:master
    • bfa59558 - [mRICH] increased TOT range for NCAL/FSD in monitor
    • 6ebbcd75 - In mRICH online unpacker, skip subsubevents with corrupt/too few data
    • dbcc1ed3 - In mRICH legacy unpacker, skip subsubevents with corrupt/too few data
    • bf9530f0 - [mcbm 2024] in rich only macro, allow switch on/off overlap MS


  • Martin Beyer resolved all threads


  • From my side the MR looks good. Many thanks @p.-a.loizeau for your effort.

  • Martin Beyer approved this merge request


  • I am fixing the format and will then set it to approved so that @v.friese can merge it.

  • added 3 commits

    • 607bbd4e - In mRICH online unpacker, skip subsubevents with corrupt/too few data
    • 800609cf - In mRICH legacy unpacker, skip subsubevents with corrupt/too few data
    • d697f0b7 - [mcbm 2024] in rich only macro, allow switch on/off overlap MS


  • Pierre-Alain Loizeau approved this merge request


  • Also ready for merging from my side

  • Volker Friese added 9 commits


    • d697f0b7...6463a47e - 5 commits from branch computing:master
    • d4122fdd - [mRICH] increased TOT range for NCAL/FSD in monitor
    • 7f7d3952 - In mRICH online unpacker, skip subsubevents with corrupt/too few data
    • f9e77b11 - In mRICH legacy unpacker, skip subsubevents with corrupt/too few data
    • b4e72d23 - [mcbm 2024] in rich only macro, allow switch on/off overlap MS


  • merged
