
Homework: Design for Operational Feasibility: Wireless Immersive Training Vest Monitoring System

Prepared by SYSENG 368 Group 5:
Chris Blanchard – [email protected]
Gareth Caunt – [email protected]
Michael Donnerstein – [email protected]
Christopher Neuman – [email protected]
Varun Ramachandran – [email protected]

Submitted: October 10th, 2012

Safety

Definition

According to CMMI +SAFE V1.2 (Technical Note CMU/SEI-2007-TN-006, Carnegie Mellon, March 2007), safety can be defined as “An acceptable level of risk. Absolute safety (i.e., zero risk) is not generally achievable. Therefore, we define safety in terms of the level of risk that is deemed acceptable.”

According to MIL-STD-882E, safety can be defined as “Freedom from conditions that can cause death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.”

For the purposes of this report, safety will be defined as the expectation that the system, under defined conditions, does not increase the risk of death, injury, occupational illness, damage to or loss of property, loss of system availability, or damage to the environment above an acceptable level.

Technical Performance Measures

We will undertake a series of safety and hazard analyses throughout the system lifecycle. These will be conducted in accordance with a standard agreed with the customer and will need to address customer-specific safety articles, university safety guidelines, and any relevant government safety regulations. The analyses will need to be reviewed by one or more safety subject matter experts to ensure that all required aspects of safety and hazards have been addressed.

The safety analyses will include a hazard list that is maintained throughout the lifecycle and includes system and subsystem hazards; a maintained hazard analysis that determines causal links and mitigations at both the system and subsystem levels; analysis of all changes to the system to ensure that system safety is not compromised; and investigations into all mishaps and near misses. Reporting and investigating near misses is particularly important, as they can be used to prevent future mishaps.

Analysis approach

The safety and hazard analyses will use a risk-based approach. The level of risk will be assessed using a Risk Assessment Matrix (see Table 3 below). The matrix is generated by setting the severity of a potential mishap against the probability that it will occur. The tables below are taken from MIL-STD-882E.


SEVERITY CATEGORIES

Catastrophic (Category 1): Could result in one or more of the following: death, permanent total disability, irreversible significant environmental impact, or monetary loss equal to or exceeding $10M.

Critical (Category 2): Could result in one or more of the following: permanent partial disability, injuries or occupational illness that may result in hospitalization of at least three personnel, reversible significant environmental impact, or monetary loss equal to or exceeding $1M but less than $10M.

Marginal (Category 3): Could result in one or more of the following: injury or occupational illness resulting in one or more lost work day(s), reversible moderate environmental impact, or monetary loss equal to or exceeding $100K but less than $1M.

Negligible (Category 4): Could result in one or more of the following: injury or occupational illness not resulting in a lost work day, minimal environmental impact, or monetary loss less than $100K.

Table 1: Severity Categories

PROBABILITY LEVELS

Frequent (A) – Specific individual item: likely to occur often in the life of an item. Fleet or inventory: continuously experienced.

Probable (B) – Specific individual item: will occur several times in the life of an item. Fleet or inventory: will occur frequently.

Occasional (C) – Specific individual item: likely to occur sometime in the life of an item. Fleet or inventory: will occur several times.

Remote (D) – Specific individual item: unlikely, but possible to occur in the life of an item. Fleet or inventory: unlikely, but can reasonably be expected to occur.

Improbable (E) – Specific individual item: so unlikely, it can be assumed occurrence may not be experienced in the life of an item. Fleet or inventory: unlikely to occur, but possible.

Eliminated (F) – Incapable of occurrence. This level is used when potential hazards are identified and later eliminated.

Table 2: Probability Levels

Using the Severity from Table 1 and the Probability Level from Table 2, the following risk assessment matrix is constructed in MIL-STD-882E.


RISK ASSESSMENT MATRIX

Probability \ Severity   Catastrophic (1)   Critical (2)   Marginal (3)   Negligible (4)
Frequent (A)             High               High           Serious        Medium
Probable (B)             High               High           Serious        Medium
Occasional (C)           High               Serious        Medium         Low
Remote (D)               Serious            Medium         Medium         Low
Improbable (E)           Medium             Medium         Medium         Low
Eliminated (F)           Eliminated

Table 3: Risk Assessment Matrix
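As an illustration of how Table 3 is applied during the analyses, the short Python sketch below encodes the matrix as a lookup. The function and data structure are our own illustration, not part of MIL-STD-882E.

```python
# Illustrative lookup of the Table 3 risk level from a mishap's severity
# category (1-4) and probability level (A-F). Structure is ours, not the
# standard's.
RISK_MATRIX = {
    ("A", 1): "High",    ("A", 2): "High",    ("A", 3): "Serious", ("A", 4): "Medium",
    ("B", 1): "High",    ("B", 2): "High",    ("B", 3): "Serious", ("B", 4): "Medium",
    ("C", 1): "High",    ("C", 2): "Serious", ("C", 3): "Medium",  ("C", 4): "Low",
    ("D", 1): "Serious", ("D", 2): "Medium",  ("D", 3): "Medium",  ("D", 4): "Low",
    ("E", 1): "Medium",  ("E", 2): "Medium",  ("E", 3): "Medium",  ("E", 4): "Low",
}

def risk_level(probability: str, severity: int) -> str:
    """Return the Table 3 risk level; level F is always Eliminated."""
    if probability.upper() == "F":
        return "Eliminated"
    return RISK_MATRIX[(probability.upper(), severity)]

# Example: a Remote (D), Critical (2) mishap is Medium risk.
assert risk_level("D", 2) == "Medium"
```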

Test and evaluation plans

Testing the safety of the system is done via review of the individual safety and hazard analyses. The acceptability of the risk will need to be agreed with the customer while also considering any other legal impacts.


Reliability

Definition

Reliability is the measure of the system performing satisfactorily under a given duty cycle for a given time period. A system is considered reliable when it is able to complete the given duty cycle without interruption from failures to operate satisfactorily. Reliability evaluations are a main component of evaluating the successful operation of a system and are therefore critical in satisfying the customer's needs.

Technical Performance Measures

Trainer Notification
Data Transmission
Component Operation
Data Recording Fidelity
User Alert System
System Duty Cycle
Transmission Distance

Analysis Approach

Figure 1 below is a diagram of the components and their interfaces. This representation is important for the remainder of the reliability section, since it allows easy visualization of the system and shows the interfaces, where problems tend to exist. Visualizing the interfaces makes completing a FMECA easier and makes possible issues harder to miss.

Figure 1: Component Interfaces


System Life Cycle

A life cycle / duty cycle is needed to calculate or assume reliability for any system. This is demonstrated in the following equation, where the time to failure t is a random variable with density function f(t):

$R(t) = \int_{t}^{\infty} f(t)\,dt$
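Under the common assumption of an exponentially distributed time to failure, f(t) = (1/MTBF)·e^(−t/MTBF), the integral reduces to R(t) = e^(−t/MTBF). The sketch below evaluates this for the mote-system figures that appear later in Table 5; the exponential assumption is ours, not stated in the vendor data.

```python
import math

# Sketch: reliability over the duty cycle under an assumed exponential
# time-to-failure density, where R(t) = exp(-t / MTBF).
def reliability(t_hours: float, mtbf_hours: float) -> float:
    return math.exp(-t_hours / mtbf_hours)

# Mote system: 6240-hour life duty cycle against a 390,000-hour MTBF.
print(f"R(6240 h) = {reliability(6240, 390_000):.4f}")  # ~0.9841
```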

The assumed life cycle is shown below; usage is divided by component and normalized to a duty cycle of three years.

COMPONENT ASSUMPTIONS

Component             Assumptions                          Daily Duty Cycle   Life Duty Cycle
Mote System in Vest   Always operating during training     8 hours            6240 hours
Software              Always operating during training     8 hours            6240 hours
Meshlium Converter    Always operating during training     8 hours            6240 hours
Recording System      On 5% of the time during training    0.4 hours          312 hours
Tactor Relay          On 2% of the time during training    0.16 hours         125 hours

Note: Assumes that memory handles incoming data quickly, as with any computer, and that a faux pas by one user will not exceed once per minute on average.

Table 4: Life Cycles

Reliability Predictions

First, it is important to note that at this stage it would be inappropriate to attach a reliability number to the components; reliability information will become apparent as a result of testing, although the required system reliability is a known value. The only measure of reliability we will use is a comparison of the MTBF to the assumed duty cycle. MTBM is not utilized because all components are COTS and easily replaced.

Component                Life Duty Cycle   MTBF
a) Mote System in Vest   6240 hours        390,000 hours
b) Software              6240 hours        N/A (dependent on complexity)
c) Meshlium Converter    6240 hours        390,000 hours
d) Recording System      312 hours         1.2 x 10^6 hours
e) Tactor Relay          125 hours         50,000 hours

Table 5: Estimated MTBF


Sources:
a) Similar to router (component c)
b) N/A
c) http://www.cisco.com/en/US/prod/collateral/wireless/ps5678/ps10092/datasheet_c78-502793.html
d) http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701176.pdf
e) http://www.radio-electronics.com/info/data/semicond/leds-light-emitting-diodes/lifespan-lifetime-expectancy-mtbf.php

Worst Case Stack Up

The reliability for the system was set at 99% for the duty cycle. If we assume that the reliability of each component is equal, each of the five components would have to achieve a reliability of 0.99^(1/5), or approximately 99.8 percent, for the series system to meet the target. That value would have to be met for all components of the system; this can be done in various ways and will be explained in the reliability acceptance testing section.
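A two-line check of that allocation (our sketch):

```python
# Worst-case stack-up check: for n equally reliable components in series,
# each must meet the nth root of the system reliability target.
R_SYS, N = 0.99, 5
r_component = R_SYS ** (1 / N)
print(f"Required per-component reliability: {r_component:.5f}")  # ~0.998

assert abs(r_component ** N - R_SYS) < 1e-12  # series product recovers target
```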

Test and Evaluation Plans

Reliability Acceptance Testing

Reliability testing would be preferred on the components of the system; there are two main ways this can be accomplished. For the purposes of this section, the term “life” or “lives” shall be defined as the amount of time and severity over which the component must function satisfactorily.

The first method is success-based testing, in which numerous lives are run in order to demonstrate reliability. Success-based testing is generally done by taking many samples and running them to a small number of lives, or the inverse: taking few components to a high number of lives. The approach chosen will depend on available samples, test time, and prototype expense.

The second method tests components to failure, or accumulated failures. This is generally done with a given sample size, running the components until they no longer perform satisfactorily. Failure points can then be graphed and analyzed; a Weibull analysis is typically done, an example of which is displayed below. Reliability on the y-axis can then be compared to the lives, hours, or cycles on the x-axis. Failure-based testing is usually preferred when time allows, because it enables the Weibull plot to be compared against other duty cycles and allows characterization of a known design.


Figure 2: Weibull Probability
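As a sketch of how such a failure-based Weibull analysis might be carried out (the failure times below are hypothetical placeholders, not test data):

```python
import numpy as np
from scipy.stats import weibull_min

# Hypothetical failure times from a run-to-failure test, in hours.
failure_hours = np.array([4100, 5200, 5900, 6400, 7300, 8800])

# Fit shape (beta) and scale (eta); location fixed at zero, as is typical.
beta, _, eta = weibull_min.fit(failure_hours, floc=0)

# Reliability at the 6240-hour life duty cycle: R(t) = exp(-(t/eta)**beta).
r = np.exp(-(6240 / eta) ** beta)
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, R(6240 h) = {r:.3f}")
```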

A FMECA example reflecting the reliability evaluation of the system is shown below; it must include all functions and account for all failure modes. Completing a FMECA helps ensure that all possible issues are considered and accounted for. This is done using three rating columns: severity (SEV), occurrence (OCC), and detection (DET). Every potential failure mode should be identified and rated in these columns. If the RPN (the product of the three ratings) is above 40, a design change or corrective action must be implemented to lessen the occurrence or increase detection. Once all items are below 40, the design is approved. The scales for the ratings are shown in the next section.

Item / Function of the Part: Data Gathering on User
Potential Failure Mode (loss of function or value to customer): Lost Data
Potential Effect(s) of Failure: Training exercise cannot be graded appropriately
SEV: 7
Potential Cause(s) / Mechanism(s) of Failure: Distance from Meshlium Converter
OCC: 1
Current Design Controls / Detection (analytical or physical validation method planned or completed): Evaluating that distance requirements are met by design
DET: 3
Prevention: Confirmation of training area, to make sure users are prevented from leaving the training area
RPN: 21


Table 6: FMECA Example
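The screening rule described above (RPN as the product of the three ratings, with corrective action required above 40) can be sketched as follows; the class and field names are our illustration, not part of a FMECA standard:

```python
from dataclasses import dataclass

@dataclass
class FmecaRow:
    failure_mode: str
    sev: int  # severity rating, 1-10
    occ: int  # occurrence rating, 1-10
    det: int  # detection rating, 1-10

    @property
    def rpn(self) -> int:
        return self.sev * self.occ * self.det

    def needs_action(self, threshold: int = 40) -> bool:
        return self.rpn > threshold

# The Table 6 row: SEV 7, OCC 1, DET 3 gives RPN 21, below the threshold.
row = FmecaRow("Lost Data", sev=7, occ=1, det=3)
print(row.rpn, row.needs_action())  # 21 False
```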

STANDARD FMECA RISK RANKINGS

Severity:
10 – Hazardous without warning: very high severity ranking when a potential failure mode affects safety without warning
9 – Hazardous with warning: very high severity ranking when a potential failure mode affects safety with warning
8 – Very high: system inoperable, with loss of primary function
7 – High: system operable, but at reduced level of performance; customer dissatisfied
6 – Moderate: system operable, but missing tertiary functions
5 – Low: system operable with reduced functionality; customer experiences some level of dissatisfaction
4 – Very low: system aesthetics / weight does not conform; defect noticed by most customers
3 – Minor: system aesthetics / weight does not conform; defect noticed by average customer
2 – Very minor: system aesthetics / weight does not conform; defect noticed by discriminating customer
1 – None: no effect

Occurrence:
10 – Very high: failure is almost inevitable (> 1 in 2)
9 – Very high: failure is almost inevitable (1 in 3)
8 – High: repeated failures (1 in 8)
7 – High: repeated failures (1 in 20)
6 – Moderate: occasional failures (1 in 80)
5 – Moderate: occasional failures (1 in 400)
4 – Moderate: occasional failures (1 in 2,000)
3 – Low: relatively few failures (1 in 15,000)
2 – Low: relatively few failures (1 in 150,000)
1 – Remote: failure is unlikely (< 1 in 1,500,000)

Detection (likelihood of detection by design control):
10 – Absolute uncertainty: design control will not and/or cannot detect a potential cause/mechanism and subsequent failure mode, or there is no design control
9 – Very remote: very remote chance the design control will detect a potential cause/mechanism and subsequent failure mode
8 – Remote: remote chance the design control will detect a potential cause/mechanism and subsequent failure mode
7 – Very low: very low chance the design control will detect a potential cause/mechanism and subsequent failure mode
6 – Low: low chance the design control will detect a potential cause/mechanism and subsequent failure mode
5 – Moderate: moderate chance the design control will detect a potential cause/mechanism and subsequent failure mode
4 – Moderately high: moderately high chance the design control will detect a potential cause/mechanism and subsequent failure mode
3 – High: high chance the design control will detect a potential cause/mechanism and subsequent failure mode
2 – Very high: very high chance the design control will detect a potential cause/mechanism and subsequent failure mode
1 – Almost certain: design controls will almost certainly detect a potential cause/mechanism and subsequent failure mode

Table 7: Standard FMECA Rankings


Maintainability

Definition

Maintainability is defined as the ease and ability with which a system can have maintenance performed. It includes consideration of methods to ensure that maintenance can be done effectively, safely, at the least practical cost, and in the least amount of time, while minimizing the expenditure of support resources and without jeopardizing the mission of the system.

Technical Performance Measures

One of our customer’s requirements is that system uptime shall be greater than 99% during active mission time. This allows a maximum of approximately 5 minutes of downtime during an eight-hour mission. Therefore, our design for maintainability must ensure that if a failure occurs, the system can be restored to working condition within that five-minute window, and our design must allow for a minimum number of maintenance periods in each mission session. While redundancy of systems can be considered to meet these maintainability goals, our system has also been designed with a view to keeping maintenance costs and overall system cost within the customer’s budget.

The most critical measure of maintainability for our system will be Mean Corrective Maintenance Time (MCT). When a system fails, the series of steps to bring it back into full operation is the corrective maintenance cycle. MCT is the average of these cycle times, weighted by component failure rates:

$MCT = \dfrac{\sum_{i=1}^{n} \lambda_i \, Mct_i}{\sum_{i=1}^{n} \lambda_i}$

λi is defined as the failure rate of the ith component. Our system contains both hardware and software components and therefore the maintainability of both must be considered. As discussed by Blanchard and Fabrycky, the corrective maintenance cycle can be visualized in Figure 3.
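A minimal sketch of that weighted average (the failure rates and repair times below are hypothetical placeholders, not vendor data):

```python
# Weighted MCT: sum(lambda_i * Mct_i) / sum(lambda_i).
components = [
    # (name, failure rate per hour, corrective maintenance time in minutes)
    ("vest mote",      1 / 390_000, 2.0),
    ("tactor relay",   1 / 50_000,  4.0),
    ("software fault", 1 / 100_000, 30.0),  # placeholder rate
]

num = sum(rate * mct for _, rate, mct in components)
den = sum(rate for _, rate, _ in components)
print(f"MCT = {num / den:.1f} minutes")
```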

Figure 3: Corrective Maintenance Cycle

Quickly assessing that a failure has occurred is the first step in minimizing our MCT. Because our system constantly monitors signals from the soldiers, the detection of a failure will be accomplished by the absence of received data. The software will be developed to analyze the stream of incoming data for any cessation, which indicates a fault. A more difficult fault to detect will be one that causes the system to transmit incorrect data. To eliminate the risk of this fault type, an initialization of the hardware and software will be recommended at the onset of each mission session. During this initialization phase, all systems shall be tested, all recognizable gestures shall be made to verify proper transmission and recognition, and vital statistics shall be monitored from the start.

The second major step will be to isolate the problem component for replacement or repair. The hardware components of our system consist of a control center, wireless routing components, and the power supplies and cabling for these installations. A failure of the computer or hard drive in the control room should be recognized immediately for troubleshooting. Loss of coverage in a portion of the mission area may not be recognized until soldiers are deployed into those areas; at that time, the absence of received data will indicate a failure of part of the wireless network. Maintenance personnel can then be immediately dispatched.

Because all hardware components are COTS, replacement of the identified failed item can quickly be made and the mission returned to operation. All components have been selected for easy ‘Plug and Play’ capability allowing a simple exchange of components to be all the maintenance required.

The bulk of the errors in the software developed as a part of this system are expected to be identified during the development and testing phases prior to implementation. Extensive on-site training is expected to be performed. Finally, software support will be available at all times during initial deployment. It is expected that errors discovered in the software after deployment will be critical and will have longer corrective times than the hardware swap-outs.

Because our system is subject to the possibility of several small, quick failures and a few larger, longer-lead-time repairs, we expect repair times to roughly follow a log-normal distribution.

Figure 4: Repair Time Distribution

The y-axis represents the number of repairs anticipated, while the x-axis represents the length of the repair time for a given failure. The ‘fat’ right-hand tail of the distribution graphically represents the extended downtime associated with a software error or an outright computer failure in the control room.
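The sketch below illustrates that shape with simulated repair times; the distribution parameters are illustrative guesses, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Median repair ~2 minutes, heavy right tail for software/computer failures.
repairs_min = rng.lognormal(mean=np.log(2.0), sigma=1.0, size=10_000)

print(f"median = {np.median(repairs_min):.1f} min, "
      f"mean = {repairs_min.mean():.1f} min, "
      f"95th percentile = {np.percentile(repairs_min, 95):.1f} min")
# The mean exceeding the median reflects the fat right-hand tail.
```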


Analysis approach

We expect to calculate the initial MCT from data supplied from our COTS vendors. During our initial field testing we will complement this data with real world results. As stated, in order to meet customer requirements, total failure time cannot exceed 5 minutes per training exercise. To meet this goal, sufficient spare batteries and other hardware components must be on hand. In addition, we will recommend a program of preventative maintenance in order to minimize in-mission failures. The generalized preventative maintenance flow sheet is shown in Figure 5.


Figure 5: Preventative Maintenance Cycle

This preventative maintenance procedure will be recommended for implementation prior to each mission, each day.

Routine preventative and general maintenance tasks will be the responsibility of the trainers and military staff. However, during initial roll-out of the system, the team will be on-site to conduct training on the steps to be taken. A preliminary version of maintenance tasks to be completed is listed below in Table 8.

Description of Task                  Frequency              Responsible Party
Removal of Batteries for Charging    Daily / Post-Mission   Training Support Staff
System Power-Up                      Daily / Post-Mission   Trainer
System Capability Test               Daily / Pre-Mission    Trainer
Data Back-Up                         Weekly                 Trainer
Component Physical Inspection        Quarterly              Training Support Staff

Table 8: Typical Maintenance Tasks

Test and evaluation plans

Evaluation of our estimates for MCT will not be possible until the system is deployed in the field and mission results can be analyzed. However, during our initial testing we should be able to arrive at a reasonably accurate Mean Corrective Maintenance Time and use it to determine expected availability for the customer.


Availability

Definition

Availability is the probability that a system, when used under stated conditions in an ideal support environment (i.e., readily available tools, spares, maintenance personnel, etc.), will operate satisfactorily at any point in time as required.

Availability may be expressed in three ways.

1. Inherent Availability (Ai) – excludes preventative or scheduled maintenance, logistics delay, and administrative delay.

$A_i = \dfrac{MTBF}{MTBF + MTTR}$

where MTTR = Mct = mean corrective maintenance time, and MTBF = mean time between failures.

2. Achieved Availability (Aa) – includes preventative (scheduled) maintenance.

$A_a = \dfrac{MTBM}{MTBM + \bar{M}}$

where M = mean active maintenance time, and MTBM = mean time between maintenance.

3. Operational Availability (Ao) – includes “everything”.

$A_o = \dfrac{MTBM}{MTBM + MDT}$

where MDT = mean maintenance downtime.
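A small sketch computing the three measures from the definitions above; the input figures are placeholders until vendor data is obtained.

```python
# Availability measures from the definitions above; inputs in hours.
def inherent(mtbf: float, mttr: float) -> float:
    return mtbf / (mtbf + mttr)

def achieved(mtbm: float, m_bar: float) -> float:
    return mtbm / (mtbm + m_bar)

def operational(mtbm: float, mdt: float) -> float:
    return mtbm / (mtbm + mdt)

# Hypothetical placeholder figures:
print(f"Ai = {inherent(390_000, 0.5):.6f}")
print(f"Aa = {achieved(2_000, 0.25):.6f}")
print(f"Ao = {operational(2_000, 0.75):.6f}")
```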

For the components we have chosen, we will need to obtain the MTTR and MTBF figures from the vendors. The COTS vendors for the laptop and router components do not offer this information on their websites, as this type of information is generally commercially sensitive and requires a signed non-disclosure agreement before the vendor is willing to supply the numbers.

Technical Performance Measures

The hardware used in this system will be COTS products. Manufacturing lead time may impact the mean downtime due to a spares outage. Estimates for the various availabilities can be calculated using figures from the Maintainability analysis and vendor-supplied data.


Analysis approach

An initial estimate of the various availabilities will be created once the various input measures are available, using the formulas in the definition section. These estimates will be updated with actual data as it becomes available. Actual values for availability measures can only be calculated after a sustained period of operations provides actual measures for MTBF, MTTR, MTBM, M, and MDT.

Test and evaluation plans

The calculated values for availability will be tested by simulating failures and testing maintenance personnel’s ability to diagnose and correct those failures.


Affordability

Definition

Affordability is defined as the total lifecycle cost of our system, including the costs associated with development, production, support, and eventual disposal.

Technical Performance Measures

The system has been designed with a goal of keeping overall costs at or below $10,000. Overall affordability of the project will be based upon keeping the total lifecycle costs of the system under the budgeted cost. In this case, the lifecycle of the system is defined to be the manufacturer warranty period of the COTS equipment.

Analysis Approach

Due to the COTS nature of the chosen equipment, it can be assumed that the costs of research and development will be minimal and therefore do not contribute to the overall system lifecycle costs. That leaves the costs associated with production, support, and disposal as the primary drivers for this system. A top-down Cost Breakdown Structure for the system is depicted below in Figure 6.

Figure 6: Top Level Cost Breakdown Structure (total system cost broken into development, production, support, and disposal costs)

As stated, the system has been designed with a goal of keeping overall costs at or below $10,000. Using the numbers presented in the CDR, a rolled-up cost per sub-element is given in Table 9 below. Note that these costs do not include any allowance for potential cost growth, which was assumed to be 10% in order to cover unanticipated modifications. Because many of the elements of cost remain initial estimates, there is a significant risk of cost increases due to engineering changes and unforeseen issues.

Phase Sub-Element   Cost
Development         $0
Production          $7,267
Support             $450
Disposal            $150
TOTAL               $7,842

Table 9: Rolled-Up Sub-Element Costs

This clearly shows that the bulk of overall system costs are incurred in the production phase. A broken-out cost structure for production is shown in Table 10.

Production Phase Element   Individual Cost
Hardware                   $3,817
Software                   $1,200
Testing                    $500
Training                   $500
Miscellaneous              $1,250
Production Total           $7,267

Table 10: Production Cost Breakdown
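A quick margin check against the $10,000 goal, applying the 10% growth allowance mentioned above (our sketch, using the Table 9 total):

```python
BUDGET = 10_000
TOTAL_LIFECYCLE = 7_842  # rolled-up total from Table 9

with_growth = TOTAL_LIFECYCLE * 1.10  # 10% allowance for modifications
print(f"With growth: ${with_growth:,.0f}  Margin: ${BUDGET - with_growth:,.0f}")
```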

Maintenance personnel will be required to provide periodic support for the maintainability of the system. While this is not a negligible expense, it is assumed that these personnel will not be funded out of the budget for this system.

Test and Evaluation Plans

Expenditures will be tracked over the lifecycle of the project. Because the bulk of project expenditures are expected in the production phase, we should know if the system will be within budget prior to entering the support and disposal phases. As such, assumptions for support and disposal costs will be used when determining the affordability of the system.


Supportability

Definition

Supportability refers to the inherent characteristics of design and installation that enable the effective and efficient maintenance and support of the system throughout its planned life cycle.

Technical Performance Measures

Using the maintainability and reliability information for each of our system’s components, we will calculate the probability of success with spares available for each element in the system configuration.

That probability is calculated using the formula

$P(X_t \le k) = \sum_{x=0}^{k} P(X_t = x) = \sum_{x=0}^{k} \dfrac{(n\lambda t)^x e^{-n\lambda t}}{x!}$

where λ = failure rate per time unit, n = number of systems, and t = time units.

Once that result is determined, we will need to calculate the number of spare parts required to be kept on hand so that the probability of a spare part being available is at an acceptable level to meet our system availability. That probability is calculated using the formula

$P = \sum_{n=0}^{s} \dfrac{R\,(-\ln R)^n}{n!}$

where
P = probability of having a spare of a particular item available when required,
s = number of spare parts carried in stock,
R = composite reliability, $R = e^{-K\lambda t}$,
K = quantity of parts used of a particular type, and
ln R = natural logarithm of R.
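A sketch of the spares formula above; the inputs are placeholders, using the tactor relay figures from Table 5 for illustration.

```python
import math

# P = sum_{n=0..s} R * (-ln R)^n / n!, with composite reliability
# R = exp(-K * lambda * t). Note -ln R = K*lambda*t, the expected demand.
def spare_availability(s: int, k_parts: int, lam: float, t: float) -> float:
    r = math.exp(-k_parts * lam * t)
    m = -math.log(r)
    return sum(r * m**n / math.factorial(n) for n in range(s + 1))

# Example: 10 tactor relays, lambda = 1/50,000 per hour, 125-hour duty cycle.
for s in range(3):
    print(f"s = {s}: P = {spare_availability(s, 10, 1 / 50_000, 125):.4f}")
```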

Analysis approach

An initial estimate for supportability will be created once the various input measures are available, using the formulas above. These estimates will be updated with actual reliability data as it becomes available.


Test and evaluation plans

Supportability will be tested in the following ways:

1. Using vendor reliability data and comparing it to actual usage over a short period;
2. Performing a maintainability demonstration;
3. Evaluating support personnel;
4. Evaluating maintenance procedures; and
5. Evaluating vendor and administrative lead times.


Disposability

Definition

The concept of disposability is concerned with the termination or elimination of a system after it completes its life cycle. It is an important design-dependent parameter in product development. After serving its life cycle, the system or product may be completely terminated or recycled, depending upon its utilization.

Technical Performance Measures

Green product: at the end of its useful life, the product should pass through disassembly and other reclamation processes so that non-hazardous and renewable materials can be reused.

Clean processes: the process of developing or building the system should minimize the use of natural resources, the generation of waste, and the usage of power.

Eco factory: the physical location where the device or system is developed or manufactured. It focuses on implementing an environmentally conscious design and manufacturing (ECDM) approach.

Analysis approach

This is a system life cycle approach that aims at maintaining an effective and sustainable environment. The disposability function can be achieved either by eliminating the entire system/product or by reusing/recycling parts of the system that still have useful capabilities. The advantage of recycling is that it results in reduced disposal costs and increased total product value.

The disposability of our system will be carried out in the following way:

The system is divided into obsolete components (no longer technically feasible), phased-out components, and non-repairable failed components.

All of these components are evaluated, and based on the classification above, we determine whether to recycle them or dispose of them completely.

The components are then put into categories depending on their reusability. Components which are not usable are disposed of or recycled.

Components that are reusable are used again in the system. Components which are partially reusable, or reusable after modification, are evaluated again, and a decision is made to dispose of the ones that require extensive modification.

Finally, the components marked for disposal are checked for environmental impacts and then destroyed.


Test and evaluation plans

Recycling: a major factor in disposability; we have requirements addressing a critical percentage of components or materials which have to be recycled. Batteries are rechargeable, will be used 5 days a week for 45 weeks, and will be recycled.

Demanufacturing: the disassembly and recycling of obsolete products. The goal is to remove and recycle every component used in our system in some way.

Recycling during production: recycle, as far as possible, the waste produced when the system is made or built.
