Supplemental Material: Data on Tevatron Complex Reliability
Elliott McCrory, 25 February 2004
Reliability Data (Supplemental Material) – McCrory 2
Presentation Outline
• The main talk is elsewhere; the data here are supplemental to that talk
Downtime Logger, D18
Console Application: Screen Shot
Web Page: 4 Weeks of Downtime
Focus on one subsystem: BPS
Entries like these are used as the basis for the analysis.
Tevatron Complex Subsystems
BDIAG BMAG BMISC BPS BRF BVAC BWATR CMISC CNET CR&LK CTIME EXPAR FDIAG FMISC FPS FWATR
LDIAG LMAG LMISC LPS LRF LVAC LWATR MIDIAG MIMISC MIPS MIRF MISC MIVAC MIWATR MIMAG MIXFER
POWER RADTRIP RRMISC SAFETY TCRYO TDIAG TMAG TMISC TPS TQPM TQUEN TRF TVAC TWATR WATER
NTF PACC (L) PBCOOL PBDIAG PBMAG PBMISC PBPS PBRF PBTRGT PBVAC PBWATR PBXFER
Definition of Terms
MTBF
• Mean Time Between Failure
• Simple arithmetic average of the time between failures
λ
• Used to characterize the probability of an interval in the classic random queuing problem
• E.g., the arrival interval of cars at a toll booth: λ = 1/<t> = 1/σ
• Actually, not σ but the R.M.S.; calculated here as λ = 2/(MTBF + σ(MTBF))
Average Downtime Fraction
• Average Downtime Fraction = [Σ (time down)] / (specified interval)
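The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code behind these slides: the helper name `reliability_summary` is hypothetical, times are assumed to be in hours, and σ is assumed to be the population standard deviation (RMS spread about the mean) of the failure intervals.

```python
from statistics import mean, pstdev


def reliability_summary(failure_times_h, downtimes_h, interval_h):
    """Hypothetical helper: MTBF, sigma, lambda and average downtime fraction.

    failure_times_h: sorted start times of failures, in hours
    downtimes_h: length of each downtime within the interval, in hours
    interval_h: length of the specified interval, in hours
    """
    if len(failure_times_h) < 2:
        raise ValueError("need at least two failures to form an interval")
    # Time between consecutive failures
    gaps = [b - a for a, b in zip(failure_times_h, failure_times_h[1:])]
    mtbf = mean(gaps)              # simple arithmetic average
    sigma = pstdev(gaps)           # assumption: population std. dev. of the gaps
    lam = 2.0 / (mtbf + sigma)     # lambda = 2 / (MTBF + sigma(MTBF))
    downtime_fraction = sum(downtimes_h) / interval_h
    return mtbf, sigma, lam, downtime_fraction


if __name__ == "__main__":
    # Illustrative numbers only, not data from these slides
    mtbf, sigma, lam, frac = reliability_summary([0, 2, 6, 8], [0.5, 0.25], 10)
    print(f"MTBF={mtbf:.3f} h  sigma={sigma:.3f} h  lambda={lam:.3f}/h  down={frac:.3f}")
```

For a truly exponential distribution of intervals, MTBF and σ are equal, so this λ reduces to 1/MTBF.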
MTBF For Each Major System: Linac
[Plot: running-average MTBF (hours) vs. days since 1/1/2003, averaged over 120 days and over 30 days; the Fall 2003 shutdown is marked; higher is good, lower is bad]
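One way to produce running averages like the 30-day and 120-day curves is sketched below. The slides do not say exactly how the window was applied, so this assumes a trailing window: for each evaluation day, average the intervals between the failures that fall inside (day − window, day]. The function name `running_mtbf` is hypothetical.

```python
from statistics import mean


def running_mtbf(failure_days, window_days, eval_days):
    """Trailing-window MTBF (assumed method, not necessarily the original one).

    failure_days: sorted failure times, in days since 1/1/2003
    window_days: window width, e.g. 30 or 120
    eval_days: days at which to evaluate the running average
    Returns a list of (day, MTBF in days); MTBF is None when the
    window holds fewer than two failures.
    """
    result = []
    for day in eval_days:
        # Failures inside the trailing window (day - window_days, day]
        inside = [t for t in failure_days if day - window_days < t <= day]
        gaps = [b - a for a, b in zip(inside, inside[1:])]
        result.append((day, mean(gaps) if gaps else None))
    return result


if __name__ == "__main__":
    failures = [1, 3, 7, 8, 20]  # illustrative failure days, not real data
    for day, mtbf in running_mtbf(failures, window_days=10, eval_days=[10, 25]):
        print(day, mtbf)
```

The result is in days; multiplying by 24 or 1440 gives hours or minutes, the units used on these plots.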
MTBF For Each Major System: Booster
[Plot: running-average MTBF (minutes) vs. days since 1/1/2003; higher is good, lower is bad]
MTBF For Each Major System: PBar
[Plot: running-average MTBF (minutes) vs. days since 1/1/2003; higher is good, lower is bad]
A lot less downtime
MTBF For Each Major System: Main Injector
[Plot: running-average MTBF (minutes) vs. days since 1/1/2003; higher is good, lower is bad]
MTBF For Each Major System: Tevatron
[Plot: running-average MTBF (minutes) vs. days since 1/1/2003; higher is good, lower is bad]
MTBF For Each Major System: Controls
[Plot: running-average MTBF (minutes) vs. days since 1/1/2003; higher is good, lower is bad]
Tevatron as 2 Machines
• The machines are called “Low Beta” and “Not Low Beta”
  • It is “Low Beta” if the energy is above 900 GeV; otherwise, it is called “Not Low Beta”
  • Only one machine exists at a time
• Time between failures
  • Only count time in which the machine “exists”
  • That is, count minutes at 980 GeV to determine the time between failures at 980 GeV
  • Ditto for “Not Low Beta”
• Eliminate obvious bad points
  • E.g., 5000+ minutes of shutdown between failures at 150 GeV for the TeV this winter
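The bookkeeping described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the original analysis: the Tevatron energy history is assumed to be available as contiguous segments of constant energy, failures as minute timestamps, and `in_state_intervals`, `mtbf_by_state`, and the `max_interval` cut standing in for "eliminate obvious bad points" are all hypothetical.

```python
from statistics import mean

LOW_BETA_MIN_GEV = 900  # "Low Beta" if the energy is above 900 GeV (from the slide)


def machine_state(energy_gev):
    return "Low Beta" if energy_gev > LOW_BETA_MIN_GEV else "Not Low Beta"


def in_state_intervals(segments, failures, max_interval=None):
    """Time between failures, counting only minutes spent in each machine state.

    segments: sorted, non-overlapping (start_min, end_min, energy_gev) tuples
    failures: sorted failure times in minutes
    max_interval: optional cut for "obvious bad points" (hypothetical parameter)
    """
    clock = {}       # in-state minutes since the previous failure in that state
    started = set()  # states that have already seen one failure
    intervals = {"Low Beta": [], "Not Low Beta": []}
    i = 0
    for start, end, energy in segments:
        state = machine_state(energy)
        t = start
        while i < len(failures) and failures[i] < end:
            f = failures[i]
            i += 1
            if f < start:
                continue  # failure not covered by any segment: ignored
            clock[state] = clock.get(state, 0) + (f - t)
            if state in started:  # the first failure only starts the clock
                intervals[state].append(clock[state])
            started.add(state)
            clock[state] = 0
            t = f
        # Minutes left in this segment count toward the state's next interval
        clock[state] = clock.get(state, 0) + (end - t)
    if max_interval is not None:
        intervals = {s: [x for x in v if x <= max_interval] for s, v in intervals.items()}
    return intervals


def mtbf_by_state(intervals):
    """Mean in-state time between failures, or None if a state has no intervals."""
    return {s: (mean(v) if v else None) for s, v in intervals.items()}


if __name__ == "__main__":
    # Illustrative history: alternate between 150 GeV and 980 GeV every 100 minutes
    segs = [(0, 100, 150), (100, 200, 980), (200, 300, 150), (300, 400, 980)]
    fails = [50, 150, 250, 350, 380]
    print(mtbf_by_state(in_state_intervals(segs, fails)))
```

In the example, the interval between the failures at minutes 150 and 350 counts only the 100 minutes spent at 980 GeV, not the 100 minutes at 150 GeV in between.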
Tevatron as 2 Machines: MTBF
[Plot: failure intervals at 150 GeV (T150) and at 980 GeV (T980), each with a 10-point moving average, from 1/1/2003 to 2/5/2004; y-axis from 0 to 180]
MTBF is getting longer!
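The "10 per. Mov. Avg." curves read like spreadsheet-style trailing moving averages over the last 10 failure intervals. A minimal sketch, assuming that definition (the first nine points have no average); `moving_average` is a hypothetical helper:

```python
def moving_average(values, period=10):
    """Trailing moving average over `period` points.

    Entry i averages values[i - period + 1 .. i]; the first period - 1
    entries are None because the window is not yet full.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    averages = []
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= period:
            window_sum -= values[i - period]  # drop the point leaving the window
        averages.append(window_sum / period if i >= period - 1 else None)
    return averages


if __name__ == "__main__":
    # Illustrative failure intervals (minutes), not data from the plot
    intervals = [20, 35, 15, 40, 60, 25, 55, 70, 45, 80, 90, 65, 100]
    print(moving_average(intervals))
```

A rising moving average corresponds to the "MTBF is getting longer" trend.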
λ(t) For Each Major System: Linac
[Plot: running-average λ(t) vs. days since 1/1/2003, averaged over 120 days and over 30 days; higher is bad, lower is good]
λ(t) For Each Major System: Booster
[Plot: running-average λ(t) vs. days since 1/1/2003; higher is bad, lower is good]
λ(t) For Each Major System: Main Injector
[Plot: running-average λ(t) vs. days since 1/1/2003; higher is bad, lower is good]
λ(t) For Each Major System: PBar
[Plot: running-average λ(t) vs. days since 1/1/2003; higher is bad, lower is good]
λ(t) For Each Major System: Tevatron
[Plot: running-average λ(t) vs. days since 1/1/2003; higher is bad, lower is good]
λ(t) For Each Major System: Controls
[Plot: running-average λ(t) vs. days since 1/1/2003; higher is bad, lower is good]
Average Downtime Fraction: Linac
[Plot: running-average downtime duration (minutes) vs. days since 1/1/2003, averaged over 120 days and over 30 days; higher is bad, lower is good]
Average Downtime Fraction: Booster
[Plot: running-average downtime duration (minutes) vs. days since 1/1/2003; higher is bad, lower is good]
Average Downtime Fraction: Main Injector
[Plot: running-average downtime duration (minutes) vs. days since 1/1/2003; higher is bad, lower is good]
Average Downtime Fraction: PBar
[Plot: running-average downtime duration (minutes) vs. days since 1/1/2003; higher is bad, lower is good]
Average Downtime Fraction: Tevatron
[Plot: running-average downtime duration (minutes) vs. days since 1/1/2003; higher is bad, lower is good]
Average Downtime Fraction: Controls
[Plot: running-average downtime duration (minutes) vs. days since 1/1/2003; higher is bad, lower is good]
Radioactive Decay
[Plot: radioactive decay of a substance with a half-life of 20; substance remaining vs. time (0 to 100)]
$\lambda e^{-\lambda t}$ is the frequency distribution of the classic random queuing problem
$F(t) = 2^{-t/\tau} = e^{-\ln(2)\, t/\tau} \equiv e^{-\lambda t}$, with $\tau = 20$
$\lambda = \ln(2)/\tau \approx 0.03466$
$\langle t \rangle = \sigma = 1/\lambda \approx 28.9$
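The numbers on this slide can be checked directly. This short sketch assumes τ = 20 in the slide's (unspecified) time units and confirms that 2^(−t/τ) and e^(−λt) agree when λ = ln 2/τ; the constant and function names are illustrative.

```python
import math

TAU = 20.0                 # half-life from the slide (time units unspecified)
LAM = math.log(2) / TAU    # decay constant: lambda = ln(2) / tau
MEAN_LIFE = 1.0 / LAM      # <t> = sigma = 1 / lambda for an exponential


def remaining_half_life(t):
    """Fraction remaining, written with the half-life: 2**(-t/tau)."""
    return 2.0 ** (-t / TAU)


def remaining_exponential(t):
    """The same fraction written as exp(-lambda * t)."""
    return math.exp(-LAM * t)


if __name__ == "__main__":
    print(f"lambda = {LAM:.5f}")      # 0.03466
    print(f"<t> = {MEAN_LIFE:.1f}")   # 28.9
    for t in (0, 20, 40, 100):
        assert math.isclose(remaining_half_life(t), remaining_exponential(t))
```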
Controls Downtimes
Time | Duration | Minutes | System | Description
1/2/03 11:30 | 10 min. | 10 | CR&LK | Orbmp timer card work
1/3/03 8:30 | 1.00 hrs. | 60 | CMISC | CONS2 getting eaten by rtvscan, localcon, winvmenue, auto-reboots
1/8/03 16:19 | 47 min. | 47 | CNET | Unable to talk to Low Beta QPMs - network problem.
1/28/03 1:55 | 55 min. | 55 | CR&LK | Booster beam permit down.
1/30/03 12:30 | 1.77 hrs. | 106.2 | CMISC | TLG got sick - leaky memory is bad.
2/10/03 10:27 | 1 min. | 1 | CMISC | MI abort on MECAR
2/13/03 7:18 | 33 min. | 33 | CR&LK | MI correctors in crate 24 coming back with calculation errors
2/17/03 2:15 | 55 min. | 55 | CR&LK | MI CAMAC fiber repeater PS @ MI40 is dead.
2/19/03 17:17 | 58 min. | 58 | CR&LK | TeV crate A8 PS and fan pack failure. Both were replaced.
2/19/03 18:37 | 41 min. | 41 | CR&LK | TeV $A8 PS replacement was a bad spare. A workable spare was installed.
2/21/03 16:30 | 14 min. | 14 | CMISC | Booster FE @ >100% of bandwidth
3/6/03 13:30 | 1.50 hrs. | 90 | CMISC | BBM violation: 120 GeV to AP0. No beam was present.
3/27/03 2:50 | 50 min. | 50 | CMISC | False BBM violation. 8GeV-MI integration problem.
4/5/03 23:55 | 15 min. | 15 | CNET | No shot zone held off proton loading due to known network problem
4/8/03 8:00 | 1.43 hrs. | 85.8 | CR&LK | E48 kicker not charging for reverse tune up. 465 card replaced.
4/11/03 4:16 | 1.73 hrs. | 103.8 | CTIME | Tev Crate $3D TClock problems, replace 178 card
4/18/03 15:50 | 2.93 hrs. | 175.8 | CNET | A2 thermometry crate networking repair.
4/25/03 4:40 | 3.17 hrs. | 190.2 | CR&LK | C:DQ0 trip traced to bad crate controller for Tev $D9
4/25/03 16:25 | 9 min. | 9 | CMISC | MI LLRF needed a reboot.
4/25/03 19:23 | 35 min. | 35 | CMISC | MI LLRF needed a reboot.
4/26/03 21:06 | 12 min. | 12 | CTIME | Rebooting TLG front end
5/16/03 11:58 | 40 min. | 40 | CMISC | T:STORE and C:FILE set to wrong store number (2511)
5/22/03 7:25 | 21 min. | 21 | CMISC | Proton loading controls problem.
6/9/03 0:08 | 55 min. | 55 | CR&LK | Damper problems due to bad PS in Booster crate $93
6/20/03 3:50 | 10 min. | 10 | CR&LK | Bad 170 card in Pbar crate $57.
6/20/03 10:01 | 1 min. | 1 | CTIME | Pbar trip due to TLG $29 spacing.
6/24/03 22:15 | 1.25 hrs. | 75 | CR&LK | Tev E1 vacuum controls problem.
6/27/03 9:27 | 5 min. | 5 | CR&LK | Booster off due to lost power in crates $40 and $41
7/3/03 23:30 | 2.92 hrs. | 175.2 | CMISC | Communications lost with all TeV sector frig VME's.
7/7/03 13:55 | 21 min. | 21 | CR&LK | 453 card failure; MI-3; crate $85
7/8/03 20:40 | 47 min. | 47 | CR&LK | Booster MADC #8 failure. Chassis changed out.
7/14/03 14:14 | 13 min. | 13 | CR&LK | Lost Crates 70, 71, 72 when a circuit breaker tripped when used by a heat gun
7/17/03 4:16 | 2.77 hrs. | 166.2 | CMISC | Pbar MADC #14 P.S. failure
7/25/03 9:29 | 5 min. | 5 | CR&LK | I:HT848 C453 card replacement: MI3, C84, slot 5
7/25/03 10:27 | 2 min. | 2 | CR&LK | MI node crate $84 changeout
7/25/03 10:56 | 4 min. | 4 | CR&LK | MI node crate $84 troubleshooting
7/25/03 20:25 | 55 min. | 55 | CR&LK | TeV crate $CC power supply changeout
7/30/03 21:04 | 5.02 hrs. | 301.2 | CR&LK | Quench due to A1 CIA failure.
7/31/03 15:44 | 6 min. | 6 | CMISC | CAMAC link driver for Switchyard front end failed.
8/4/03 20:13 | 55 min. | 55 | CR&LK | PBar Crate $5B takes down stacking
8/5/03 8:25 | 2.17 hrs. | 130.2 | CMISC | FCC without power affecting network. MiniBooNE DAC specifically.
8/10/03 9:20 | 59 min. | 59 | CR&LK | C:S7B1A replacement.
8/13/03 17:16 | 44 min. | 44 | CTIME | Booster EAPS trip due to BRF1. 220 sec timeline was the cause.
8/14/03 10:00 | 2.00 hrs. | 120 | CMISC | VAX OACs restarts due to Database hardware problem.
8/18/03 9:00 | 1.83 hrs. | 109.8 | CR&LK | Bad PS in Tev crate $C5.
8/27/03 11:43 | 14 min. | 14 | CR&LK | SY CAMAC link down, held off MiniBooNE beam permit.
8/28/03 20:00 | 4.73 hrs. | 283.8 | CTIME | MiniBooNE toroid timing problems at MI10
1/5/04 18:12 | 7 min. | 7 | CR&LK | MI crate $65 needed a restore after a 453 card swap
1/7/04 21:10 | 24 min. | 24 | CR&LK | C:SFB2 465 card swap problems (dip switch incorrectly set)
1/13/04 9:40 | 1.25 hrs. | 75 | CMISC | MiniBooNE DAQ problems
1/14/04 12:55 | 12 min. | 12 | CTIME | HVF12 PS problems due to timeline.
1/20/04 10:30 | 30 min. | 30 | CMISC | TRF gates will not come on.
1/22/04 7:01 | 9 min. | 9 | CR&LK | C467 replaced for I:LAM41.
1/22/04 23:45 | 1.93 hrs. | 115.8 | CMISC | Alarms screen not operating
1/31/04 17:42 | 7 min. | 7 | CTIME | P1 line trip due to TLG mixup.
2/4/04 16:00 | 1.50 hrs. | 90 | CMISC | MiniBooNE and stacking beam inhibited due to HEP inhibit ($53) generated.
(Data dump of all Controls downtimes since 1-1-2003)
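The Duration column mixes minutes and hours, while the Minutes column puts every entry on a common scale for totals. A minimal sketch of that conversion and a per-system tally, using a handful of rows from the Controls Downtimes table; `duration_to_minutes` and `minutes_by_system` are hypothetical helpers, and parsing the full log this way is an assumption.

```python
from collections import defaultdict


def duration_to_minutes(text):
    """Convert a Duration entry such as '47 min.' or '1.77 hrs.' to minutes."""
    value, unit = text.split()
    if unit.startswith("hr"):
        return float(value) * 60.0
    if unit.startswith("min"):
        return float(value)
    raise ValueError(f"unrecognized unit in {text!r}")


# A few rows from the Controls Downtimes table: (time, duration, system)
ROWS = [
    ("1/2/03 11:30", "10 min.", "CR&LK"),
    ("1/3/03 8:30", "1.00 hrs.", "CMISC"),
    ("1/8/03 16:19", "47 min.", "CNET"),
    ("1/30/03 12:30", "1.77 hrs.", "CMISC"),
    ("4/11/03 4:16", "1.73 hrs.", "CTIME"),
]


def minutes_by_system(rows):
    """Total downtime minutes per system code."""
    totals = defaultdict(float)
    for _time, duration, system in rows:
        totals[system] += duration_to_minutes(duration)
    return dict(totals)


if __name__ == "__main__":
    for system, minutes in sorted(minutes_by_system(ROWS).items()):
        print(f"{system:6s} {minutes:7.1f} min")
```

Dividing a system's total by the length of the period, in the same units, gives its Average Downtime Fraction as defined earlier.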
λ For All Systems
[Bar chart, log scale from 0.001 to 1: λ for each system]
Booster 0.136, Controls 0.009, Experiment 0.006, Linac 0.183, Main Inj 0.044, Pbar 0.022, Tevatron 0.037, Misc 0.043
Probability of running for one hour without a failure ≈ 1 − λ
E.g., probability that the Linac stays up for 5 hours: $(1 - 0.183)^5 = 0.364$
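The Linac example can be reproduced numerically. The sketch below compares the slide's hourly form (1 − λ)^t with the constant-rate survival probability e^(−λt), assuming λ is a rate per hour; 1 − λ is the first-order approximation of e^(−λ), and for a rate as large as the Linac's the two differ noticeably (0.364 versus about 0.40 over 5 hours). The function names are hypothetical.

```python
import math

LAMBDA_LINAC = 0.183  # Linac failure rate per hour, as read from the chart


def p_up_hourly(lam, hours):
    """Slide's estimate: a (1 - lambda) chance of no failure in each hour."""
    return (1.0 - lam) ** hours


def p_up_exponential(lam, hours):
    """Survival probability for a constant failure rate: exp(-lambda * t)."""
    return math.exp(-lam * hours)


if __name__ == "__main__":
    print(f"{p_up_hourly(LAMBDA_LINAC, 5):.3f}")       # 0.364
    print(f"{p_up_exponential(LAMBDA_LINAC, 5):.3f}")  # 0.401
```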
Quality of λ Fit Assumption
[Bar chart: Q for Booster, Controls, Experiment, Linac, Main Inj, Pbar, Tevatron, Misc; y-axis from 0 to 1.2]
$Q = \langle t \rangle / \sigma$, which equals 1 for an exponential distribution of failure intervals
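Q can be computed from a list of failure intervals as sketched below; `fit_quality` is a hypothetical name, and σ is assumed to be the population standard deviation. Because the mean and standard deviation of an exponential distribution are equal, Q near 1 supports the λ assumption; Q above 1 means the intervals are more regular than random, and Q below 1 means failures are more clustered.

```python
import random
from statistics import mean, pstdev


def fit_quality(intervals):
    """Q = <t>/sigma for a list of failure intervals."""
    sigma = pstdev(intervals)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("intervals have zero spread; Q is undefined")
    return mean(intervals) / sigma


if __name__ == "__main__":
    rng = random.Random(2004)
    # Synthetic exponential intervals with mean 30, not real downtime data
    sample = [rng.expovariate(1 / 30.0) for _ in range(100_000)]
    print(round(fit_quality(sample), 2))  # should be close to 1
```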