35t shifter report
Matthew Tamsett, 27/10/15
Outline
This talk will present an overview of my experiences as a 35t shifter over the past 2 days.
Contents
1. Workflow
2. Monitoring
3. Two interesting DAQ features
4. Outlook
Workflow
The workflow consisted of following the instructions on this web page:
https://cdcvs.fnal.gov/redmine/projects/35ton/wiki/Instructions_for_Shifters
and trying to exercise the DAQ in many different ways. A particular focus has been determining the rates and configurations at which we can run stably.
I originally used the “just_do_it.sh” script, which worked out of the box.
However, it removed some control from the shifter and had some odd features (such as spawning two MessageLoggers and not cleaning up after itself).
DRC (Tingjun) and John Freeman have provided excellent support and debugging.
Monitoring
The outcomes of the tests are monitored via the command line, the MessageLogger and the online monitoring:
http://lbne-dqm.fnal.gov/OnlineMonitoring/
and the ganglia web page (after I found out that it existed):
http://lbne35t-gateway01.fnal.gov/ganglia/
The Ganglia pages required more care to interpret: they are presented per host machine, so they had to be cross-correlated with the MessageLogger output to be useful.
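The cross-correlation between per-host metric plots and MessageLogger output can be sketched as a nearest-timestamp lookup. This is only an illustration under stated assumptions: the host name, the message text, the metric values and the helper names (`nearest_sample`, `correlate`) are all invented here, and nothing below talks to Ganglia or the MessageLogger directly.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_sample(times, values, when):
    """Return (time, value) of the metric sample closest in time to `when`.

    `times` must be sorted ascending and non-empty.
    """
    i = bisect_left(times, when)
    # Only the samples just before and just after `when` can be the nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    best = min(candidates, key=lambda j: abs(times[j] - when))
    return times[best], values[best]


def correlate(log_entries, metrics_by_host):
    """Attach the nearest metric sample from the same host to each log entry."""
    rows = []
    for when, host, text in log_entries:
        times, values = metrics_by_host[host]
        sample_time, value = nearest_sample(times, values, when)
        rows.append((when, host, text, sample_time, value))
    return rows


# Made-up data for illustration only (hypothetical host, message and values).
t0 = datetime(2000, 1, 1, 12, 0, 0)
metrics_by_host = {
    "host-a": (
        [t0 + timedelta(seconds=15 * k) for k in range(5)],  # one sample per 15 s
        [10.0, 10.0, 9.5, 2.0, 0.0],                         # e.g. an event rate
    ),
}
log_entries = [(t0 + timedelta(seconds=44), "host-a", "example warning")]

rows = correlate(log_entries, metrics_by_host)
for row in rows:
    print(row)
```

Keeping the lookup per host mirrors the way the plots are organised, so a message from one machine is only ever compared with that machine's metrics.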
Comments are posted to the elog.
Command line tools
• Lots of detailed information available
• Needed John to show me how to use them
• Once I had seen them in use, they were very useful
Message logger
• Really great
• A vast amount of useful information
• Very good and useful filtering mechanisms
• Primary source of information on almost all problems
Elog
• Pretty good.
• Everyone seems to use it.
• However, it requires scrolling up and down to check whether people have posted comments. An alert feature would help.
Online monitoring
• Seems to work very well.
• Plots show up in short order and contain a lot of useful information.
• It can be difficult to pick out which pathologies are interesting, as there are spikes, missing data and other features all over the place.
• That said, it has noticeably improved over just the last two days.
• Posted comments on style and content to the Elog.
Ganglia
• Lots of information.
• Lots of it is difficult to interpret for a non-expert.
• A few particularly useful plots could be collected on a separate page.
• Very useful plots include:
‣ Data logger event rate
‣ Data logger event size
‣ Event builder incomplete event rate
I’m sure there are lots of other great things in there too.
Event rates
[Plot: event rates for runs 4021 (4 RCE), 4022 (2 RCE) and 4023 (1 RCE)]
Annotations on the plot, in the order presented:
• Run started.
• Sub-run roll-over.
• Back pressure related error seen (after 7 minutes).
• Run ended and the next one started. The spike is incomplete events being flushed.
• Back pressure seen (after 13 minutes).
• Run ended and the next one started. The final run was ended manually; no errors seen.
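Figures such as "after 7 minutes" were read off by comparing the run start with the first back pressure message. A minimal sketch of that arithmetic is shown here, assuming the messages are already available as (timestamp, text) pairs; the function name `minutes_to_first_match` and all timestamps and message texts are invented for illustration and are not taken from the actual runs.

```python
from datetime import datetime


def minutes_to_first_match(run_start, messages, keyword):
    """Minutes from `run_start` to the first message containing `keyword`.

    `messages` is an iterable of (timestamp, text) pairs in any order.
    Returns None if no message at or after `run_start` matches.
    """
    for when, text in sorted(messages):
        if when >= run_start and keyword in text.lower():
            return (when - run_start).total_seconds() / 60.0
    return None


# Made-up example data (not real run timings).
run_start = datetime(2000, 1, 1, 12, 0, 0)
messages = [
    (datetime(2000, 1, 1, 12, 5, 0), "Back pressure error in event builder"),
    (datetime(2000, 1, 1, 12, 3, 0), "Sub-run roll-over"),
]

elapsed = minutes_to_first_match(run_start, messages, "back pressure")
print(elapsed)
```

Recording this number for each configuration would give a simple, comparable measure of how long a given setup runs before back pressure sets in.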
Incomplete events
[Plot: incomplete events for run 4033 (14 RCE, 1 Hz trigger)]
• The build-up of back pressure in the event builder is visible.
• Incomplete events are flushed at sub-run roll-over, but that only seems to make things worse.
• Back pressure errors appear at the sub-run 01-02 roll-over.
Outlook
The DAQ is clearly a work in progress.
The shifter's role has been more that of a tester than a shifter.
Comments seem to be that having someone testing the DAQ out in various ways is proving valuable.
Guidance on the sort of tests to run would be valuable.
I would not have been able to understand most errors/warnings without John and Tingjun.
Lots of people are working on the system:
• Problems aren’t always repeatable.
• Performance isn’t very consistent (lots of unknown variables).
• Benchmarking isn’t always possible.