Skip to content
Snippets Groups Projects

Adding storage of online histograms and a conversion function online->ROOT

Open Sergei Zharko requested to merge s.zharko/cbmroot:online-histo-dump into master
5 unresolved threads

The merge request introduces a option of storing online histograms into an output binary file to the cbmreco, if the histogram server is not available. To enable histogram storage one has to provide an output filename with -o option and add the histogram storage step with "-O Histograms".

cbmreco execution

Here several scenarios of running QA in the cbmreco are available:

(a) Running with a histogram server (the "--histogram " option is enabled): a standard cbmreco execution with a histogram server. The histograms are reset each time, when the cbmreco forms a message with the histograms and sends it to the histogram server.

(b) Running with the "-O Histograms" option and without the "--histogram" one: QA tasks are executed, the histograms collect data from all timeslices and are stored on the finishing step of the cbmreco.

(c) Running with the "--histogram" and "-O Histograms" simultaneously [wrong configuration of the online binary]: the "-O Histogram" option is ignored with a warning, and the scenario (a) is processed.

(d) Running without "--histogram" or "-O Histogram" option: the QA tasks are not processed.

The feature is available only for those QA tasks, which are registered using the qa::Manager class.

The file with the histograms has the same name prefix as the data output and the ".qa.out" extension.

Conversion of online histograms

To convert the online histograms output into a ROOT file with ROOT histograms, one can call the cbm::qa::OnlineInterface::ConvertOutput(const std::string& inFilename, std::string outFilename) function from a ROOT session or a ROOT macro.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
82 85 void RegisterNewTask(std::string_view name);
83 86
84 87 private:
88 friend class boost::serialization::access;
89 /// \brief Serialization method
90 /// \note To be used in a batch mode without a histogram server (otherwise histograms are reset on
91 /// each call of Send(histoSender)).
92 template<class Archive>
93 void serialize(Archive& ar, const unsigned int /*version*/)
  • This may be a minor thing... I find it a bit inconsistent that when sending, we serialize the qa::HistogramContainer class (and the canvas configuration), but when writing to file we do not use the serialization of fHistograms but instead dive into the class to serialize its members. Also, this is not really a serialization of the qa::Data class because many members are missing.

    Maybe it would be more consistent to have the QaManager::WriteHistograms call fpData->Write(...) as it is the case with Send()? Then we would not need to add any additional serialization really.

  • Author Maintainer

    The issue here, that the qa::HistogramContainer class contains not only the histograms, but also extra information such as timeslice ID or timeslice time stampt, which are needed to monitor the histogram evolution with time on the histogram server side. When one stores the integrated histograms in the batch mode, this extra info is not needed. Some extra information, which is not required for the histogram server, may be also included here. So, the way with the serialize method looks more clear to me, then one with custom functions for writing and reading histograms.

  • Yes, I see your point. However, the philosophy of how the class serialization is supposed to be used, to my understanding, is that you add it to a struct or class to be able to recreate this same structure later or elsewhere. That means that if we use it for transferring or storing data, we should expect to get the same structure again. This structure can either be a specific interface structure, which is only filled for the purpose of data transfer. Or it may be a structure that is directly used. The latter of course has some disadvantages if we use the structure already for slightly different purposes internally, as we do here: when serializing, the not-precisely-fitting structure stays this way.

    I can think of two somehow clean ways to handle this situation:

    1. Use container classes that contain exactly the right data fields for the application. In this case it may mean that you would not re-use the histogram container class but instead create a purpose-built one, maybe with a common base class, or templating. Then, you can cleanly (de)serialize precisely that class.
    2. Use a common superset of data members, which might not all be used all the time, and take the very slight overhead.

    In the MR, however, I see that the problem is somehow dodged by implementing the serialization twice on two different levels of hierarchy, with one of them not really able to recreate the class it claims to serialize.

  • Please register or sign in to reply
  • 45 45 Cluster,
    46 46 Hit,
    47 47 Track,
    48 Histograms, //< Optional output of histograms for debuging, if the histogram server is not provided
    • This may be a bit confusing and not the ideal way to go. As I understand it, the QA output is not part of the RecoData structure and also not stored in the RecoResultsArchive. This enum, to my understanding, is used to specify which components of the .rra file should be filled. The QA output as I understand it here is a completely separate and independent output, in which case I think we should treat it as such.

    • Author Maintainer

      I am going to change it then in the next commit:

      • remove "Histograms" from the reco archive
      • introduce an option "--batch-qa-output=<file.qa.out>", which explicitly specifies the filename with histograms and enables the histogram storage
    • Please register or sign in to reply
  • 120 121 {"DigiEvent", RecoData::DigiEvent},
    121 122 {"Cluster", RecoData::Cluster},
    122 123 {"Hit", RecoData::Hit},
    123 {"Track", RecoData::Track}
    124 {"Track", RecoData::Track},
    125 {"Histograms", RecoData::Histograms}
  • 526 533 file << MakeReportYaml(fTimesliceTimesAcc);
    527 534 }
    528 535 }
    536
    537 if (fSender == nullptr && Opts().HasOutput(RecoData::Histograms)) {
    538 fs::path outPath = Opts().OutputFile();
    539 std::string outFilename = (outPath.parent_path() / (outPath.stem().string() + ".qa.out")).string();
    540 fQaManager->WriteHistograms(outFilename, Opts().CompressArchive());
    541 }
    • I fully welcome the additional possibility to write QA information to a file for later debugging or analysis. I think it makes a lot of sense, especially when running on the batch system and for development and debugging of the QA.

      However, an issue that I see is that the code cannot decide between two possible clean implementation options:

      1. Add the QA information to the RecoResults. It can stay empty in normal operation, creating almost no overhead. Conversely, when only the QA is of interest, the remaining fields can stay empty, but the structure stays intact.
      2. Create a second data path, completely independent of RecoResults. Call it QaData or QaResults and copy the functionality of RecoResults with respect to code structure and program options. Then both exist fully independently.

      I personally have no clear favorite between these two. The structurally best way may depend on the outlook: Will we also store non-histogram QA data for debugging in the future, maybe the monitoring information? Will we always just store one set of histograms per program invocation and not one per timeslice?

      Now the issue here is that I have the feeling that this decision was not really taken. Some structures and functionality were duplicated, but some were not. This causes interdependencies that are hard to understand and to document. I think once we know the direction, the required changes to the code will not be massive.

    • Author Maintainer

      I would go for the second variant

    • Please register or sign in to reply
  • Jan de Cuveland requested changes

    requested changes

  • Please register or sign in to reply
    Loading