Homework: Design for Operational Feasibility: Wireless Immersive Training Vest Monitoring System

Prepared by SYSENG 368 Group 5:
Chris Blanchard – [email protected]
Caunt – [email protected]
Michael Donnerstein – [email protected]
Neuman – [email protected]
Ramachandran – [email protected]

Submitted: October 10th, 2012
Safety

Definition
According to CMMI +SAFE V1.2 (TECHNICAL NOTE CMU/SEI-2007-TN-006 Carnegie Mellon March 2007), Safety can be defined as “An acceptable level of risk. Absolute safety (i.e., zero risk) is not generally achievable. Therefore, we define safety in terms of the level of risk that is deemed acceptable.”
According to MIL-STD-882E, Safety can be defined as “Freedom from conditions that can cause death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.”
For the purposes of this report, safety will be defined as the expectation that the system, under defined conditions, does not increase the risk of death, injury, occupational illness, damage to or loss of property, loss of system availability, or damage to the environment above an acceptable level.
Technical Performance Measures
We will undertake a series of safety and hazard analyses throughout the system lifecycle. These will be performed in accordance with a standard agreed upon with the customer and will need to address customer-specific safety articles, university safety guidelines, and any relevant government safety regulations. The analyses will need to be reviewed by one or more safety subject matter experts to ensure that all required aspects of safety and hazards have been addressed.
The safety analyses will include a hazard list that is maintained throughout the lifecycle and covers system and subsystem hazards; a maintained hazard analysis that determines causal links and mitigations at both the system and subsystem levels; analysis of all changes to the system to ensure that system safety is not compromised; and investigations into all mishaps and near misses. Reporting and investigation of near misses is particularly important, as they can be used to prevent mishaps in the future.
Analysis approach
The safety and hazard analyses will use a risk-based approach. The level of risk will be assessed using a Risk Assessment Matrix (see Table 3 below). The matrix is generated by plotting the severity of a potential mishap against the probability that it will occur. The tables below are taken from MIL-STD-882E.
SEVERITY CATEGORIES

Catastrophic (1): Could result in one or more of the following: death, permanent total disability, irreversible significant environmental impact, or monetary loss equal to or exceeding $10M.
Critical (2): Could result in one or more of the following: permanent partial disability, injuries or occupational illness that may result in hospitalization of at least three personnel, reversible significant environmental impact, or monetary loss equal to or exceeding $1M but less than $10M.
Marginal (3): Could result in one or more of the following: injury or occupational illness resulting in one or more lost work day(s), reversible moderate environmental impact, or monetary loss equal to or exceeding $100K but less than $1M.
Negligible (4): Could result in one or more of the following: injury or occupational illness not resulting in a lost work day, minimal environmental impact, or monetary loss less than $100K.
Table 1: Severity Categories
PROBABILITY LEVELS

Each level gives the description for a specific individual item, followed by the fleet or inventory description.

Frequent (A): Likely to occur often in the life of an item. Fleet or inventory: continuously experienced.
Probable (B): Will occur several times in the life of an item. Fleet or inventory: will occur frequently.
Occasional (C): Likely to occur sometime in the life of an item. Fleet or inventory: will occur several times.
Remote (D): Unlikely, but possible to occur in the life of an item. Fleet or inventory: unlikely, but can reasonably be expected to occur.
Improbable (E): So unlikely, it can be assumed occurrence may not be experienced in the life of an item. Fleet or inventory: unlikely to occur, but possible.
Eliminated (F): Incapable of occurrence. This level is used when potential hazards are identified and later eliminated.
Table 2: Probability Levels
Using the severity categories from Table 1 and the probability levels from Table 2, the following risk assessment matrix is constructed in MIL-STD-882E.
RISK ASSESSMENT MATRIX

Probability \ Severity: Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4)
Frequent (A): High | High | Serious | Medium
Probable (B): High | High | Serious | Medium
Occasional (C): High | Serious | Medium | Low
Remote (D): Serious | Medium | Medium | Low
Improbable (E): Medium | Medium | Medium | Low
Eliminated (F): Eliminated
Table 3: Risk Assessment Matrix
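The lookup in Table 3 can be sketched as a simple mapping from severity category and probability level to a risk level. This is an illustration only; the function and structure names below are ours, not part of MIL-STD-882E.

```python
# Risk assessment matrix from Table 3 (MIL-STD-882E), keyed by
# probability level (A-E) and severity category (1-4).
RISK_MATRIX = {
    "A": {1: "High", 2: "High", 3: "Serious", 4: "Medium"},
    "B": {1: "High", 2: "High", 3: "Serious", 4: "Medium"},
    "C": {1: "High", 2: "Serious", 3: "Medium", 4: "Low"},
    "D": {1: "Serious", 2: "Medium", 3: "Medium", 4: "Low"},
    "E": {1: "Medium", 2: "Medium", 3: "Medium", 4: "Low"},
}

def assess_risk(severity: int, probability: str) -> str:
    """Return the risk level for a severity category (1-4) and a
    probability level (A-F); level F hazards have been eliminated."""
    if probability == "F":
        return "Eliminated"
    return RISK_MATRIX[probability][severity]

print(assess_risk(1, "D"))  # a Catastrophic/Remote hazard -> Serious
```

A trainer-side hazard log could use such a lookup to flag any High or Serious entries for immediate mitigation.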
Test and evaluation plans
Testing the safety of the system is done via analysis of the individual safety and hazard analyses. The acceptability of the residual risk will need to be agreed upon with the customer while also considering any legal implications.
Reliability
Definition
Reliability is the measure of a system's ability to perform satisfactorily under a given duty cycle for a given time period. A system is considered reliable when it can meet the given duty cycle without interruption by failure to operate satisfactorily. Reliability evaluations are a main component of evaluating the successful operation of a system and are therefore critical in satisfying customer needs.
Technical Performance Measures
Trainer Notification
Data Transmission
Component Operation
Data Recording Fidelity
User Alert System
System Duty Cycle
Transmission Distance
Analysis Approach
Figure 1, below, is a diagram of the components and their interfaces. This representation is important for the remainder of the reliability section: it allows easy visualization of the system and shows the interfaces, where problems tend to exist. Visualizing the interfaces makes the completion of a FMECA easier and possible issues harder to miss.
Figure 1: Component Interfaces
System Life Cycle
A life cycle / duty cycle is needed to calculate or assume reliability for any system. This is demonstrated in the following equation, where f(t) is the probability density function of the random variable t (time to failure):

R(t) = ∫_t^∞ f(t) dt
The assumed life cycle is shown below; usage is divided by component and normalized to a duty cycle of three years.

Component | Assumptions | Daily Duty Cycle | Life Duty Cycle
Mote System in Vest | Always operating during training | 8 hours | 6240 hours
Software | Always operating during training | 8 hours | 6240 hours
Meshlium Converter | Always operating during training | 8 hours | 6240 hours
Recording System | On 5% of the time during training | 0.4 hours | 312 hours
Tactor Relay | On 2% of the time during training | 0.16 hours | 125 hours

Note: Assumes memory handles incoming data quickly, as with any computer, and that faux pas by a single user will not exceed one per minute on average.

Table 4: Life Cycles
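The life duty cycles in Table 4 follow from the daily duty cycles over the three-year life. As a sketch, assuming 8-hour training days, 5 days per week, and 52 weeks per year (an assumption consistent with the 6240-hour figure, but not stated explicitly in the table):

```python
# Life-duty-cycle arithmetic behind Table 4.
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52
YEARS = 3

def life_duty_cycle(fraction_of_training: float) -> float:
    """Hours a component operates over the three-year life, given the
    fraction of training time it is switched on."""
    daily = HOURS_PER_DAY * fraction_of_training
    return daily * DAYS_PER_WEEK * WEEKS_PER_YEAR * YEARS

print(round(life_duty_cycle(1.0), 1))   # mote/software/Meshlium: 6240.0
print(round(life_duty_cycle(0.05), 1))  # recording system: 312.0
print(round(life_duty_cycle(0.02), 1))  # tactor relay: 124.8 (~125)
```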
Reliability Predictions
First, it is important to note that at this stage it would be inappropriate to attach firm reliability numbers to the components; reliability information will become apparent as a result of testing, although the required system reliability is a known value. The only measure of reliability we will use is a comparison of the MTBF to the assumed duty cycle. MTBM is not utilized because all components are COTS and easily replaced.
Component | Life Duty Cycle | MTBF
a) Mote System in Vest | 6240 hours | 390,000 hours
b) Software | 6240 hours | N/A (dependent on complexity)
c) Meshlium Converter | 6240 hours | 390,000 hours
d) Recording System | 312 hours | 1.2 × 10^6 hours
e) Tactor Relay | 125 hours | 50,000 hours

Table 5: Estimated MTBF
Sources:
a) Similar to router (component c)
b) N/A
c) http://www.cisco.com/en/US/prod/collateral/wireless/ps5678/ps10092/datasheet_c78-502793.html
d) http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701176.pdf
e) http://www.radio-electronics.com/info/data/semicond/leds-light-emitting-diodes/lifespan-lifetime-expectancy-mtbf.php
Worst Case Stack Up
The reliability for the system was set at 99% for the duty cycle. If we assume that the reliability of each component is equal, each of the five components would need a reliability of 0.99^(1/5), or approximately 99.8 percent, so that the series product meets the 99% system target. That value would have to be met for all components of the system; this can be done in various ways and is explained in the reliability acceptance testing section.
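The allocation arithmetic can be sketched as follows: splitting a 99% series-system target equally across five components gives each component a target of 0.99^(1/5), and multiplying the five component reliabilities back together recovers the system target.

```python
# Worst-case equal allocation of the 99% system reliability target
# across the five series components.
SYSTEM_TARGET = 0.99
N_COMPONENTS = 5

per_component = SYSTEM_TARGET ** (1 / N_COMPONENTS)
print(round(per_component, 3))                   # 0.998
print(round(per_component ** N_COMPONENTS, 4))   # recovers 0.99
```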
Test and Evaluation Plans
Reliability Acceptance Testing
Reliability testing would be preferred on the components of the system; there are two main ways this can be accomplished. For the purposes of this section, the term "life" or "lives" shall be defined as the amount of time and severity over which the component must function satisfactorily.

The first method is success-based testing, in which a number of lives are run in order to demonstrate reliability. Success-based testing is generally done either by taking many samples and running them to a small number of lives, or by doing the inverse and running a few components to a high number of lives. The approach chosen will depend on available samples, test time, and prototype expense.

The second method is testing to failure, or accumulated failures. This is generally done with a given sample size, with the components run until they no longer perform satisfactorily. Failure points can then be graphed and analyzed; a Weibull analysis is typically done, an example of which is displayed below. Reliability on the y-axis can then be compared to the lives, hours, or cycles on the x-axis. Failure-based testing is usually preferred when time allows, because it enables the Weibull plot to be compared against other duty cycles and allows characterization of a known design.
Figure 2: Weibull Probability
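The Weibull reliability model behind a plot of this kind can be sketched as below. The shape (beta) and scale (eta) values here are invented for illustration, not fitted from test data; the eta value simply reuses the 390,000-hour MTBF from Table 5 as a characteristic life.

```python
import math

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)**beta): probability of surviving to time t
    under a Weibull failure model with shape beta and scale eta."""
    return math.exp(-((t / eta) ** beta))

# Reliability at the 6240-hour life duty cycle for a hypothetical
# component with eta = 390000 hours and beta = 1 (random failures):
print(weibull_reliability(6240, beta=1.0, eta=390_000))
```

With beta = 1 the model reduces to the exponential (constant failure rate) case; beta > 1 would model wear-out, beta < 1 infant mortality.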
A FMECA example that reflects the reliability evaluation of the system is shown below; it must include all functions and account for all failure modes. Completing a FMECA helps ensure that all possible issues are considered and accounted for, using the three rating columns below: severity (SEV), occurrence (OCC), and detection (DET). Every potential failure mode should be identified and rated in each of these columns, and the Risk Priority Number (RPN) is the product of the three ratings. If the RPN is above 40, a design change or corrective action must be implemented to lessen the occurrence or increase detection. Once all items are below 40, the design is approved. The scales for the ratings are shown in the next section.
Item / Function of the Part: Data Gathering on User
Potential Failure Mode (loss of function or value to customer): Lost Data
Potential Effect(s) of Failure: Training exercise cannot be graded appropriately
SEV: 7
Potential Cause(s) / Mechanism(s) of Failure: Distance from Meshlium Converter
OCC: 1
Current Design Controls / Detection (analytical or physical validation method planned or completed): Evaluating that distance requirements are met by design
DET: 3
Preventive Action: Confirmation of training area, to make sure users are prevented from leaving the training area
RPN: 21
Table 6: FMECA Example
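The RPN arithmetic from the example row in Table 6 can be sketched as below (function names are ours; the 40 threshold is the one stated in the text).

```python
# RPN = SEV x OCC x DET, with corrective action required above 40.
def rpn(sev: int, occ: int, det: int) -> int:
    return sev * occ * det

def needs_action(sev: int, occ: int, det: int, threshold: int = 40) -> bool:
    return rpn(sev, occ, det) > threshold

# The lost-data failure mode from Table 6: SEV 7, OCC 1, DET 3.
print(rpn(7, 1, 3))           # 21
print(needs_action(7, 1, 3))  # False: below the 40 threshold
```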
STANDARD FMECA RISK RANKINGS

Severity (rating – effect: criteria):
10 – Hazardous without warning: Very high severity ranking when a potential failure mode affects safety without warning.
9 – Hazardous with warning: Very high severity ranking when a potential failure mode affects safety with warning.
8 – Very high: System inoperable, with loss of primary function.
7 – High: System operable, but at a reduced level of performance; customer dissatisfied.
6 – Moderate: System operable, but missing tertiary functions.
5 – Low: System operable with reduced functionality; customer experiences some level of dissatisfaction.
4 – Very low: System aesthetics/weight does not conform; defect noticed by most customers.
3 – Minor: System aesthetics/weight does not conform; defect noticed by the average customer.
2 – Very minor: System aesthetics/weight does not conform; defect noticed by a discriminating customer.
1 – None: No effect.

Occurrence (rating – probability of failure: possible failure rate):
10 – Very high, failure is almost inevitable: > 1 in 2.
9 – Very high, failure is almost inevitable: 1 in 3.
8 – High, repeated failures: 1 in 8.
7 – High, repeated failures: 1 in 20.
6 – Moderate, occasional failures: 1 in 80.
5 – Moderate, occasional failures: 1 in 400.
4 – Moderate, occasional failures: 1 in 2,000.
3 – Low, relatively few failures: 1 in 15,000.
2 – Low, relatively few failures: 1 in 150,000.
1 – Remote, failure is unlikely: < 1 in 1,500,000.

Detection (rating – likelihood of detection by design control):
10 – Absolute uncertainty: Design Control will not and/or cannot detect a potential cause/mechanism and subsequent failure mode, or there is no Design Control.
9 – Very remote: Very remote chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
8 – Remote: Remote chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
7 – Very low: Very low chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
6 – Low: Low chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
5 – Moderate: Moderate chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
4 – Moderately high: Moderately high chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
3 – High: High chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
2 – Very high: Very high chance the Design Control will detect a potential cause/mechanism and subsequent failure mode.
1 – Almost certain: Design Controls will almost certainly detect a potential cause/mechanism and subsequent failure mode.
Table 7: Standard FMECA Rankings
Maintainability

Definition
Maintainability is defined as the ease and ability of a system to have maintenance performed. It includes consideration of methods to ensure maintenance can be done effectively, safely, at the least practical cost, and in the least amount of time, while minimizing the expenditure of support resources and without jeopardizing the mission of the system.
Technical Performance Measures
One of our customer's requirements is that system uptime shall be greater than 99% during active mission time. This allows a maximum of approximately five minutes of downtime during an eight-hour mission. Therefore, our design for maintainability must ensure that if a failure occurs, the system can be restored to working condition within that five-minute window, and our design must allow for a minimum of maintenance periods in each mission session. While redundancy could be considered to meet these maintainability goals, our system has also been designed with a mind to keeping maintenance costs and overall system cost within the customer's budget.
The most critical measure of maintainability for our system will be Mean Corrective Maintenance Time (MCT). When a system fails, the series of steps taken to bring the system back into full operation is the corrective maintenance cycle. MCT is the failure-rate-weighted average of these corrective maintenance times.
MCT = [ Σ_{i=1}^{n} λ_i · Mct_i ] / [ Σ_{i=1}^{n} λ_i ]
λ_i is defined as the failure rate of the ith component. Our system contains both hardware and software components, and therefore the maintainability of both must be considered. As discussed by Blanchard and Fabrycky, the corrective maintenance cycle can be visualized as in Figure 3.
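The weighted-average calculation above can be sketched as follows. The failure rates and repair times used in the example are invented placeholders, not vendor data.

```python
# MCT = sum(lambda_i * Mct_i) / sum(lambda_i): each component's
# corrective maintenance time weighted by its failure rate.
def mean_corrective_time(failure_rates, repair_times):
    weighted = sum(l * m for l, m in zip(failure_rates, repair_times))
    return weighted / sum(failure_rates)

# Two hypothetical components: one fails rarely and is a quick swap,
# one fails more often and takes longer to restore.
rates = [1 / 390_000, 1 / 50_000]  # failures per hour
times = [5 / 60, 30 / 60]          # repair times in hours
print(mean_corrective_time(rates, times))
```

Note that the frequently failing component dominates the average, which is why spares for high-failure-rate items matter most for the five-minute restoration window.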
Figure 3: Corrective Maintenance Cycle

Quickly assessing that a failure has occurred is the first step in minimizing our MCT. Because our system monitors signals from the soldiers constantly, a failure will be detected by the absence of received data. The software will be developed to analyze the stream of incoming data for any cessation that indicates a fault. A more difficult fault to detect is one that causes the system to transmit incorrect data. To eliminate the risk of this fault type, an initialization of the hardware and software is recommended at the onset of each mission session. During this initialization phase, all systems shall be tested, all recognizable gestures shall be made to confirm proper transmission and recognition, and vital statistics shall be initially monitored.
The second major step will be to isolate the problem component for replacement or repair. The hardware components of our system consist of a control center, wireless routing components, and the power supplies and cabling for these installations. A failure of the computer or hard drive in the control room should be recognized immediately for troubleshooting. Loss of coverage over a portion of the mission area may not be recognized until soldiers are deployed into that area; at that time, the loss of received data will indicate a failure of part of the wireless network, and maintenance personnel can be immediately dispatched.
Because all hardware components are COTS, replacement of the identified failed item can quickly be made and the mission returned to operation. All components have been selected for easy ‘Plug and Play’ capability allowing a simple exchange of components to be all the maintenance required.
The bulk of the errors in the software developed as part of this system are expected to be identified during the development and testing phases prior to implementation. Extensive on-site training is expected to be performed. Finally, software support will be available at all times during initial deployment. It is expected that errors discovered in the software after deployment will be critical and will have longer corrective times than hardware switch-outs.
Because our system has the possibility of several small and quick failures and a few larger, longer-lead-time repairs, we expect repair times to roughly follow a log-normal distribution.
Figure 4: Repair Time Distribution
The y-axis represents the number of repairs anticipated, while the x-axis represents the length of the repair time for a given failure. The 'fat' right-hand tail of the distribution graphically represents the extended downtime associated with a software error or an outright computer failure in the control room.
Analysis approach
We expect to calculate the initial MCT from data supplied from our COTS vendors. During our initial field testing we will complement this data with real world results. As stated, in order to meet customer requirements, total failure time cannot exceed 5 minutes per training exercise. To meet this goal, sufficient spare batteries and other hardware components must be on hand. In addition, we will recommend a program of preventative maintenance in order to minimize in-mission failures. The generalized preventative maintenance flow sheet is shown in Figure 5.
Figure 5: Preventative Maintenance Cycle
This preventative maintenance procedure will be recommended for implementation prior to each mission, each day.
Routine preventative and general maintenance tasks will be the responsibility of the trainers and military staff. However, during initial roll-out of the system, the team will be on-site to conduct training on the steps to be taken. A preliminary version of maintenance tasks to be completed is listed below in Table 8.
Description of Task | Frequency | Responsible Party
Removal of Batteries for Charging | Daily / Post-Mission | Training Support Staff
System Power-Up | Daily / Post-Mission | Trainer
System Capability Test | Daily / Pre-Mission | Trainer
Data Back-Up | Weekly | Trainer
Component Physical Inspection | Quarterly | Training Support Staff

Table 8: Typical Maintenance Tasks
Test and evaluation plans
Evaluation of our estimates for MCT will not be possible until the system is deployed in the field and mission results can be analyzed. However, during our initial testing we should be able to arrive at a reasonably accurate mean corrective maintenance time and use it to determine the expected availability for the customer.
Availability

Definition
Availability is the probability that a system, when used under stated conditions in an ideal support environment (i.e., readily available tools, spares, maintenance, personnel, etc.), will operate satisfactorily at any point in time as required.
Availability may be expressed in three ways.
1. Inherent Availability (Ai) – excludes preventative or scheduled maintenance, logistics delay, and administrative delay.

Ai = MTBF / (MTBF + MTTR)

where MTTR = Mct = mean corrective maintenance time and MTBF = mean time between failures.

2. Achieved Availability (Aa) – includes preventative (scheduled) maintenance.

Aa = MTBM / (MTBM + M)

where M = mean active maintenance time and MTBM = mean time between maintenance.

3. Operational Availability (Ao) – includes "everything".

Ao = MTBM / (MTBM + MDT)

where MDT = mean maintenance downtime.
For the components chosen, we will need to obtain the MTTR and MTBF figures from the vendors. COTS vendors for the laptop and router components do not offer this information on their websites, as this type of information is generally commercially sensitive and typically requires a signed non-disclosure agreement before the vendor will supply the numbers.
Technical Performance Measures
The hardware used in this system will be COTS products. Manufacturing lead time may impact the mean downtime due to a spares outage. Estimates for the various availabilities can be calculated using figures from the Maintainability analysis and vendor supplied data.
Analysis approach
An initial estimate of the various availabilities will be created once the various input measures are available using the formulas in the definition section. These estimates will be updated with actual data as it becomes available. Actual values for availability measures can only be calculated after a sustained period of operations provide actual measures for MTBF, MTTR, MTBM, M, and MDT.
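The three availability formulas from the definition section can be sketched as below. All of the input figures here are invented placeholders standing in for the vendor and field data that is not yet available; only the 390,000-hour MTBF echoes Table 5.

```python
# The three availability measures: inherent, achieved, operational.
def inherent(mtbf: float, mttr: float) -> float:
    return mtbf / (mtbf + mttr)

def achieved(mtbm: float, m_bar: float) -> float:
    return mtbm / (mtbm + m_bar)

def operational(mtbm: float, mdt: float) -> float:
    return mtbm / (mtbm + mdt)

MTBF, MTTR = 390_000.0, 0.5           # hours (MTTR assumed)
MTBM, M_BAR, MDT = 2_000.0, 1.0, 4.0  # hours (all assumed)

print(inherent(MTBF, MTTR))     # near 1 for highly reliable COTS parts
print(achieved(MTBM, M_BAR))
print(operational(MTBM, MDT))
```

As expected, Ao ≤ Aa ≤ Ai, since each successive measure folds in more sources of downtime.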
Test and evaluation plans
The calculated values for availability will be tested by simulating failures and testing maintenance personnel’s ability to diagnose and correct those failures.
Affordability

Definition
Affordability is defined in terms of the total lifecycle cost of our system, including the costs associated with development, production, support, and eventual disposal.
Technical Performance Measures
The system has been designed with a goal of keeping overall costs at or below $10,000. Overall affordability of the project will be based upon keeping the total lifecycle costs of the system under the budgeted cost. In this case, the lifecycle of the system is defined to be the manufacturer warranty of the COTS equipment.
Analysis Approach
Due to the COTS nature of the chosen equipment, it can be assumed that the costs of research and development will be nominal and therefore do not contribute to the overall system lifecycle costs. That leaves costs associated with production, support, and disposal as the primary drivers for this system. A top down Cost Breakdown Structure for the system is depicted below in Figure 6.
Figure 6: Top-Level Cost Breakdown Structure (Total System Cost → Development Costs; Production Costs: Hardware, Software, Labor, Testing; Support Costs: Maintenance, Operation; Disposal Costs: Recycle, Disposal – assumed $0)

As stated, the system has been designed with a goal of keeping overall costs at or below $10,000. Using the numbers presented in the CDR, a rolled-up cost per sub-element is given in Table 9 below. Note that these costs do not include any allowance for potential cost growth, which was assumed to be 10% in order to cover unanticipated modifications. Because many of the cost elements remain initial estimates, there is a significant risk of cost increases due to engineering changes and unforeseen issues.
Phase | Sub-Element Cost
Development | $0
Production | $7,267
Support | $450
Disposal | $150
TOTAL | $7,842

Table 9: Rolled-Up Sub-Element Costs
This clearly shows that the bulk of overall system costs are incurred in the production phase. A broken out cost structure for productions is shown in Table 10.
Production Phase | Individual Cost
Hardware | $3,817
Software | $1,200
Testing | $500
Training | $500
Miscellaneous | $1,250
Production Total | $7,267

Table 10: Production Cost Breakdown
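The production roll-up and the budget check with the assumed 10% growth allowance can be sketched as:

```python
# Production costs from Table 10, plus the 10% growth allowance
# assumed in the text, checked against the $10,000 goal.
production = {
    "Hardware": 3817,
    "Software": 1200,
    "Testing": 500,
    "Training": 500,
    "Miscellaneous": 1250,
}

total = sum(production.values())
with_growth = total * 1.10  # assumed 10% cost-growth allowance

print(total)                   # 7267
print(round(with_growth, 1))   # 7993.7
print(with_growth <= 10_000)   # True: within the $10,000 goal
```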
Maintenance personnel will be required to provide periodic support for the maintainability of the system. While this is not a nominal expense, it is assumed that these personnel will not be funded out of the budget for this system.
Test and Evaluation Plans
Expenditures will be tracked over the lifecycle of the project. Because the bulk of project expenditures are expected in the production phase, we should know if the system will be within budget prior to entering the support and disposal phases. As such, assumptions for support and disposal costs will be used when determining the affordability of the system.
Supportability

Definition
Supportability refers to the inherent characteristics of design and installation that enable the effective and efficient maintenance and support of the system throughout its planned life cycle.
Technical Performance Measures
Using the maintainability and reliability information for each of our system's components, we will calculate the probability of success with spares available for each element in the system configuration.
That probability is calculated using the formula

P(X_t ≤ k) = Σ_{x=0}^{k} P(X_t = x) = Σ_{x=0}^{k} [ (nλt)^x · e^(−nλt) / x! ]

where λ = failure rate per time unit, n = number of systems, and t = time units.
Once that result is determined, we will need to calculate the number of spare parts required to be kept on hand so that the probability that a spare part is available is at an acceptable level to meet our system availability. The probability is calculated using the formula

P = Σ_{n=0}^{S} [ R · (−ln R)^n / n! ]

where
P = probability of having a spare of a particular item available when required
S = number of spare parts carried in stock
R = composite reliability, R = e^(−Kλt)
K = quantity of parts used of a particular type
ln R = natural logarithm of R.
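The spares formula above can be sketched as follows. The example values (ten installed motes, the 1/390,000-per-hour failure rate implied by Table 5, one 6240-hour life) are illustrative assumptions.

```python
import math

# P = sum over n = 0..S of R * (-ln R)^n / n!, with composite
# reliability R = exp(-K * lam * t): the probability that a spare
# is on hand when needed, given S spares of a part with K installed
# units, failure rate lam per hour, over t hours.
def spare_probability(s: int, k: int, lam: float, t: float) -> float:
    r = math.exp(-k * lam * t)
    return sum(r * (-math.log(r)) ** n / math.factorial(n)
               for n in range(s + 1))

# Availability of a spare with 0, 1, and 2 spares in stock:
for spares in (0, 1, 2):
    print(spares, spare_probability(spares, 10, 1 / 390_000, 6240))
```

Each additional spare raises the probability toward 1, so this loop directly supports choosing the smallest stock level that meets the availability target.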
Analysis approach
An initial estimate for supportability will be created once the various input measures are available, using the formulas above. These estimates will be updated with actual reliability data as it becomes available.
Test and evaluation plans
Supportability will be tested in the following ways:
1. Using vendor reliability data and comparing it to actual usage over a short period;
2. Performing a maintainability demonstration;
3. Evaluating support personnel;
4. Evaluating maintenance procedures; and
5. Evaluating vendor and administrative lead times.
Disposability

Definition
The concept of disposability is concerned with the termination or elimination of a system after it completes its life cycle. It is an important design-dependent parameter in product development. After serving its life cycle, a system or product may be completely terminated or recycled, depending upon its utilization.
Technical Performance Measures
Green product: The product should be such that, at the end of its useful life, it passes through disassembly and other reclamation processes to reuse non-hazardous and renewable materials.
Clean processes: The process of developing or building the system should minimize the use of natural resources, the generation of waste, and the usage of power.
Eco-factory: The physical location where the device or system is developed or manufactured. It focuses on implementing an environmentally conscious design and manufacturing (ECDM) approach.
Analysis approach
Disposability takes a system life cycle approach that aims at maintaining an effective and sustainable environment. The disposability function can be achieved either by eliminating the entire system/product or by reusing/recycling parts of the system that retain useful capability. The advantage of recycling is that it results in reduced disposal costs and increased total product value.
The disposability of our system will be carried out in the following way:

1. The system is divided into obsolete components (not technically feasible), phased-out components, and non-repairable failed components.
2. All of these components are evaluated and, based on the classification above, we determine whether to recycle them or dispose of them completely.
3. The components are then put into categories depending on their reusability. Components which are not usable are disposed of or recycled.
4. Components that are reusable are used again in the system. Components which are partially reusable, or reusable after modification, are evaluated again, and a decision is made to dispose of those that would require extensive modification.
5. Finally, the disposed components are checked for environmental impacts and disposed of safely.
Test and evaluation plans
Recycling: A major factor in disposability; we have requirements addressing a critical percentage of components or materials that must be recycled. Batteries are rechargeable, will be used five days a week for 45 weeks, and will then be recycled.
Demanufacturing: The disassembly and recycling of obsolete products. The goal is to remove and recycle every component used in our system where possible.
Recycling during production: Recycle the waste produced when the system is made or built, as far as possible.