35t shifter report
Matthew Tamsett, 27/10/15

Outline

This talk presents an overview of my experiences as a 35t shifter over the past two days.

Contents

1. Workflow

2. Monitoring

3. Two interesting DAQ features

4. Outlook

Workflow

The workflow consisted of following the instructions on this web page:

https://cdcvs.fnal.gov/redmine/projects/35ton/wiki/Instructions_for_Shifters

and exercising the DAQ in many different ways. The particular focus has been on determining the rates and configurations at which we can run stably.

I originally started with the “just_do_it.sh” script, which worked out of the box.

However, the script removed some control from the shifter and had some odd features (such as spawning two MessageLoggers and not cleaning up after itself).

DRC (Tingjun) and John Freeman have provided excellent support and debugging.

Monitoring

The outcomes of the tests are monitored via the command line tools, the MessageLogger and the online monitoring:

http://lbne-dqm.fnal.gov/OnlineMonitoring/

and the Ganglia web page (once I found out that it existed):

http://lbne35t-gateway01.fnal.gov/ganglia/

The Ganglia pages required more care to interpret: they are presented per host machine, so using them required cross-correlating with the MessageLogger.

Comments are posted to the Elog.

Command line tools

• Lots of detailed information available

• Needed John to show me how to use them

• Once I had been shown them, they were very useful

Message logger

• Really great

• A vast amount of useful information

• Very good and useful filtering mechanisms

• Primary source of information on almost all problems

Elog

• Pretty good.

• Everyone seems to use it.

• However, checking whether anyone has posted new comments requires scrolling up and down; an alert feature could help.

Online monitoring

• Seems to work very well.

• Plots show up in short order and contain a lot of useful information.

• It can be difficult to pick out the interesting pathologies, as there are spikes, missing data and other features all over the place.

• That said, it has noticeably improved over just the last two days.

• Posted comments on style and content to the Elog.

Ganglia

• Lots of information.

• Much of it is difficult for a non-expert to interpret.

• It would help to pick out a few particularly useful plots onto a separate page.

• Very useful plots include:

‣ Data logger event rate

‣ Data logger event size

‣ Event builder incomplete event rate

I’m sure there are lots of other great things in there too.

Event rates

[Plot: event rates for run 4021 (4 RCEs), run 4022 (2 RCEs) and run 4023 (1 RCE)]

Run started here

Event rates

[Plot: event rates for run 4021 (4 RCEs), run 4022 (2 RCEs) and run 4023 (1 RCE)]

Sub-run roll-over

Event rates

[Plot: event rates for run 4021 (4 RCEs), run 4022 (2 RCEs) and run 4023 (1 RCE)]

Back-pressure-related error seen (after 7 minutes)

Event rates

[Plot: event rates for run 4021 (4 RCEs), run 4022 (2 RCEs) and run 4023 (1 RCE)]

Run ended here and the next one started. The spike is incomplete events being flushed.

Event rates

[Plot: event rates for run 4021 (4 RCEs), run 4022 (2 RCEs) and run 4023 (1 RCE)]

Back pressure seen here (after 13 minutes)

Event rates

[Plot: event rates for run 4021 (4 RCEs), run 4022 (2 RCEs) and run 4023 (1 RCE)]

Run ended and the next one started. The final run was ended manually; no errors were seen.

Incomplete events

[Plot: incomplete events, run 4033 (14 RCEs, 1 Hz trigger)]

The build-up of back pressure in the event builder can be seen.

Incomplete events

[Plot: incomplete events, run 4033 (14 RCEs, 1 Hz trigger)]

These are flushed at sub-run roll-over, but that only seems to make things worse.

Incomplete events

[Plot: incomplete events, run 4033 (14 RCEs, 1 Hz trigger)]

Back-pressure errors appear at the sub-run 01-02 roll-over.

Outlook

The DAQ is clearly a work in progress.

The shifter's role has been more that of a tester than a shifter.

Feedback suggests that having someone test the DAQ in various ways is proving valuable.

Guidance on the sort of tests to run would be valuable.

I would not have been able to understand most errors and warnings without John and Tingjun.

Lots of people are working on the system:

• Problems aren’t always repeatable.

• Performance isn’t very consistent (lots of unknown variables).

• Benchmarking isn’t always possible.