Calorimeter Data Monitoring News
Benoit Viaud
(LAL-in2p3)
B. Viaud, Calo Mtg Aug. 31st 2011
Overview
• Reminder: too many alarms make the monitoring inefficient;
• A survey of the Monitors' behavior over 2011;
• A few proposed improvements.
Reminder
• Marie-Noelle (early May 2011):
• there are too many alarms issued by the monitoring, and not all of them have real consequences;
• this lowers the Data Quality shifters' vigilance: they eventually overlook important issues.
• I surveyed the 2011 monitoring data to determine which alarms are indeed too noisy and to see what can be done.
Survey of the monitors over 2011
• Most of the Monitoring is based on those monitors:
+ A few others based on collision data.
• Quantities such as the PMT response to an LED pulse, the pedestal position, etc. are measured in each cell: a cell is faulty if the average over n events is outside a certain range. The number of faulty cells determines the severity of the conclusion: warning/alarm/fatal.
• This monitoring is repeated every 10-15 minutes.
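The per-cell check described above can be sketched as follows. This is only an illustration, not the actual monitoring code: the allowed range, the severity thresholds and the function names are hypothetical placeholders, since the real values are tuned by the experts for each monitor.

```python
from statistics import mean

# Hypothetical values, for illustration only.
ALLOWED_RANGE = (90.0, 110.0)   # allowed range for the per-cell average
SEVERITY_THRESHOLDS = [         # (minimum number of faulty cells, severity), highest first
    (50, "fatal"),
    (10, "alarm"),
    (1, "warning"),
]


def is_faulty(values, allowed_range=ALLOWED_RANGE):
    """A cell is faulty if the average over its n events is outside the range."""
    low, high = allowed_range
    return not (low <= mean(values) <= high)


def severity(cells, thresholds=SEVERITY_THRESHOLDS):
    """Map the number of faulty cells to ok/warning/alarm/fatal.

    `cells` maps a cell id to the list of values measured in that cell.
    """
    n_faulty = sum(is_faulty(values) for values in cells.values())
    for min_faulty, level in thresholds:
        if n_faulty >= min_faulty:
            return level
    return "ok"


# Toy saveset with three cells; only cell 1 averages outside the range.
example_cells = {0: [100.0, 101.0], 1: [120.0, 130.0], 2: [95.0, 99.0]}
example_severity = severity(example_cells)
```

Both the range and the thresholds are explicit parameters, so either can be re-tuned without touching the logic.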
• I analyzed all the 15-minute savesets taken in 2011 (up to Aug. 13th, physics fills only, discarding those created automatically at the end of a run);
• The goal is to count the number of warnings and alarms issued by each monitor per unit of time (fill), and to spot those which "overwhelm" the Data Manager (DM).
• Action to be taken: to be discussed with the corresponding experts (re-tune the ranges and thresholds to reduce the number of alarms while keeping the calo safe).
• Correlations are expected among the monitors: confirm them in practice. Correlated monitors can be grouped into a single item to simplify the DM's work.
• The scripts developed for this study can easily determine the effect of varying the thresholds.
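A minimal sketch of the counting step, and of how a threshold variation can be tested, is given below. It is not the script used for this study: the input format (one `(fill, number of faulty cells)` pair per saveset) and all threshold values are assumptions made for the example.

```python
from collections import defaultdict

LEVELS = ("warning", "alarm", "fatal")


def count_per_fill(savesets, thresholds):
    """Count, for each fill, the savesets that are at least in each level.

    savesets:   iterable of (fill_number, n_faulty_cells) pairs (assumed format).
    thresholds: dict mapping each level to its minimum number of faulty cells.
    """
    counts = defaultdict(lambda: {"total": 0, **{level: 0 for level in LEVELS}})
    for fill, n_faulty in savesets:
        entry = counts[fill]
        entry["total"] += 1
        for level in LEVELS:
            if n_faulty >= thresholds[level]:
                entry[level] += 1
    return dict(counts)


# Toy data: three savesets in fill 1806, one in fill 1944 (made-up numbers).
toy_savesets = [(1806, 0), (1806, 3), (1806, 12), (1944, 60)]

current = count_per_fill(toy_savesets, {"warning": 1, "alarm": 10, "fatal": 50})
# Same data with a looser, hypothetical warning threshold.
relaxed = count_per_fill(toy_savesets, {"warning": 5, "alarm": 10, "fatal": 50})
```

Comparing `current` and `relaxed` fill by fill shows how many warnings a given re-tuning would remove, and whether the alarms in the problematic fills survive.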
Example: Ecal_Unexpected Signal
NB: All the other monitors shown in back-up
[Figure: x-axis: Fill Number; y-axis: # Savesets. Series: # of savesets in the fill; # of savesets at least in Warning; # of savesets at least in Alarm; # of savesets in Fatal.]
Example: Ecal_Unexpected Signal
Normalized to the number of savesets in the fill
[Figure: x-axis: Fill Number; y-axis: # Savesets. Reference dates: Fill 1613: 13-03-2011; Fill 1806: 25-05-2011; Fill 1944: 14-07-2011; Fill 2025: 13-08-2011.]
Correlated Monitors
• PedestalChi2 & AveragePedestalNoise alarms are always accompanied by a PedestalNoise alarm;
• Most of the PedestalNoise & PedestalShiftOverNoise alarms are accompanied by a PedestalShift alarm.
[Plots: Ecal_AveragePedestalNoise, Ecal_PedestalChi2, Ecal_PedestalNoise, Ecal_PedestalShiftOverNoise, Ecal_PedestalShift, Hcal_AveragePedestalNoise, Hcal_PedestalChi2, Hcal_PedestalNoise, Hcal_PedestalShift]
Group them into a single Pedestal alarm in the DM page. Keep the full picture in the Piquet page for finer diagnostics.
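One way to quantify such correlations, and to build the grouped alarm, is sketched here. The representation (one set of alarmed saveset ids per monitor) and the toy numbers are assumptions made for the example, not the format or the results of the actual study.

```python
def accompanied_fraction(alarms_a, alarms_b):
    """Fraction of the alarms of monitor A that are accompanied by an alarm of B.

    Both arguments are sets of saveset ids in which the monitor was in alarm.
    Returns None if monitor A never alarmed.
    """
    if not alarms_a:
        return None
    return len(alarms_a & alarms_b) / len(alarms_a)


def grouped_alarm(*alarm_sets):
    """A grouped monitor is in alarm whenever at least one member is."""
    return set().union(*alarm_sets)


# Toy saveset ids, for illustration only.
pedestal_chi2 = {1, 2, 3}
pedestal_noise = {1, 2, 3, 4}

chi2_given_noise = accompanied_fraction(pedestal_chi2, pedestal_noise)
noise_given_chi2 = accompanied_fraction(pedestal_noise, pedestal_chi2)
pedestal_group = grouped_alarm(pedestal_chi2, pedestal_noise)
```

A fraction of 1.0 in one direction means the first monitor adds no new alarmed saveset to the group, so the DM loses nothing from seeing the grouped item only, while the Piquet page keeps the individual monitors.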
Correlated Monitors
• LEDNoise & LargeLEDNoise
• LowLEDSignal & OutRangeLED & NoGainMonitor
[Plots: Ecal_LEDNoise, Ecal_LargeLEDNoise, Hcal_LEDNoise, Hcal_LargeLEDNoise, Ecal_LowLEDSignal, Ecal_OutRangeLED, Ecal_NoGainMonitor, Hcal_LowLEDSignal, Hcal_OutRangeLED, Hcal_NoGainMonitor]
Group them into a single LEDNoise and a single NoGainMonitor
Even vs. Odd in Prs/Spd
• Group Odd and Even in DM plots.
Proposal
• Replace this: [figure not recovered]
• By this: [figure not recovered]
• Much simpler for the DM.
Noisy Monitors
• Those which issue a Warning/Alarm/Fatal at least every few days;
• There are a few of them (see next slides);
• Summing up all the alarms: something pretty much every day.
• Now: study the pattern behind those alarms and discuss with the experts to make them quieter while staying safe (e.g. optimized ranges and thresholds). The next slides contain my first remarks.
Noisy Monitors
• Some alarms appear simultaneously in many monitors;
• This happens when something a bit dramatic occurred (at least, something must have happened);
• I guess we want those alarms; we should see to it that they're still there after the monitoring ranges/thresholds have been optimized.
• Ex: Fills 1738, 1743, 1944
[Plots: HCAL_LEDNoise, ECAL_LEDNoise]
• Fill 1944: right after LHCb restarted on July 14th, shortly after a power cut.
• Fill 1743: misconfiguration of ODIN, LED pulsing in a physics BXID.
Noisy Monitors: Spd Fake Signal
• I observe one faulty saveset every few hours; everything is OK 15 minutes before/after. Instability in the pedestal?
• Most of the time, it is not very much above the Warning threshold.
Noisy Monitors: Prs PedestalMeans
• Shows up after LHCb restarted on July 14th + power cut.
FEB11 on crate 2 changed by Stephane ?
Noisy Monitors: Ecal/Prs Low Occupancy
• Other alarms are simultaneous for Ecal and Prs. They are always (save one time) due to the very first saveset analyzed in the fill, typically 1 to 5 minutes after the start.
• PS2FEB11 is visible on the left of the PRS plot.
• Do they really appear in the alarm section of the presenter? If yes, discarding the first saveset will reduce their rate a lot.
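The effect of discarding the first saveset of each fill can be estimated along these lines. This is a sketch under assumptions: the record format `(fill, minutes since the start of the fill, in alarm)` and the toy values are invented for the example.

```python
def alarm_count(savesets, skip_first=False):
    """Count the savesets in alarm, optionally ignoring the first one of each fill.

    savesets: list of (fill_number, minutes_since_fill_start, in_alarm) tuples
              (assumed format).
    """
    # Time of the earliest saveset of each fill.
    first_time = {}
    for fill, minutes, _ in savesets:
        if fill not in first_time or minutes < first_time[fill]:
            first_time[fill] = minutes

    n_alarms = 0
    for fill, minutes, in_alarm in savesets:
        if skip_first and minutes == first_time[fill]:
            continue
        n_alarms += bool(in_alarm)
    return n_alarms


# Toy data: both fills alarm in their very first saveset.
toy = [
    (1, 3, True), (1, 18, False), (1, 33, False),
    (2, 2, True), (2, 17, True),
]
all_alarms = alarm_count(toy)
without_first = alarm_count(toy, skip_first=True)
```

The difference between the two counts is the number of alarms that come only from the start-of-fill saveset.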
Noisy Monitors
• Known for a long time. Find something to fix it…
[Plots: fill 1799 (21/5/11) and fill 2040 (22/8/11)]
Summary and Prospects
• Surveyed 2011 monitoring data to find ways to reduce the number of alarms to be handled by the Shift Data Manager;
• Many alarms correlated/simultaneous: could group them into a single one;
• This will require a bit of coding (creating new monitoring histos): one of my next steps.
• A few monitors trigger an alarm every few days; combining everything, it means something almost every day. I'm presently looking at that to determine whether this can be re-optimized (fewer alarms and still safe).
• The scripts written for this study can be made available to the Piquets (after a bit of cleaning). They could be used every day to monitor, as a whole, the fills taken in the past 24 hours.
Back-up