© 2016 PT&C Forensic Consulting Services, P.A.

Expert Tips for Investigating IT Equipment Failures

Tom Bonse, Project Manager

Jared Fegan, Project Consultant


Our Background in Technology

• Electrical Engineers
• Mechanical Engineers
• Computer/Network Hardware Experts
• Software Engineers
• Network Security Experts
• EnCE-certified Computer Forensics Experts
• Certified Ethical Hackers
• IPC certified application specialists
• Various Microsoft certifications
• Various other certifications

The Course Agenda

• Starting the claims investigation
• What are common types of IT equipment?
  – Components of this equipment
• Common Perils
• Evaluation and testing of the equipment
• Hard Disk Drives and common failure modes
• What is equipment restoration?
  – Science and feasibility
  – Techniques used
• Case studies
• Q&A - Conclusion

What Makes IT Equipment Losses Unique?

• Increased Business Interruption/Extra Expense

• Multiple Stakeholders
• Warranty Issues
• Certification Issues
• Privacy Issues
• Different Manufacturer Philosophies

Starting the claim process

• Identifying the equipment
• Understanding the claimed event
• Determining if this event is plausible
• Gathering data that supports or conflicts with the reported events of the loss
• Examination of the equipment claimed
• Determination of the cause of the claimed event

Approaching A Server Loss

• Types of Server and Storage Equipment
  – Rack Mount Server
  – Tower Server
  – NAS / SAN / File Server
  – Storage Array

Approaching a Point of Sale (PoS) Loss

• Sales Terminals
• Peripherals – Printers/Scanners etc.
• Backend server
• Software

Approaching a Dental/Medical Loss

• Equipment upgrades
• Software Capability

Interior of a Rack Mount Server

• Power Supply
• Redundant Array of Independent Disks (RAID) Controller
• Processors
• Motherboard

Interior of a Tower Server

• Processor
• Power Supply
• Redundant Array of Independent Disks (RAID) Controller
• SCSI Backplane
• Motherboard

Servers within servers?

• Software-based servers that reside on the same physical piece of equipment.
  – Referred to as a Virtual Machine or a Virtual Server.
  – A virtual machine (VM) is an isolated software container that can run its own operating system (OS) and applications as if it were a physical computer. A virtual machine behaves exactly like a physical computer and contains its own virtual central processing unit (CPU), random access memory (RAM), virtual hard disk drives (HDDs) and network interface card (NIC). The VM shares the resources (RAM, CPU, storage space) of the physical server based on the configuration set when the VM is created.
  – No fixed limit on the number that can be on the same equipment (limited by resources).
  – Shared resources (processing, memory etc.)
  – Appears to other equipment as if it is located on its own server.
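
To illustrate the shared-resource point above, here is a minimal sketch that subtracts each VM's configured CPU, RAM and storage from the physical server and checks whether anything is oversubscribed. The `Resources` type, the function names and all figures are hypothetical, and real hypervisors may deliberately overcommit CPU or memory, so treat this as a simplified model rather than how any particular product accounts for resources.

```python
from dataclasses import dataclass

@dataclass
class Resources:
    cpu_cores: int
    ram_gb: int
    storage_gb: int

def remaining_capacity(host, vms):
    """Return what is left on the host after subtracting every VM allocation."""
    return Resources(
        cpu_cores=host.cpu_cores - sum(vm.cpu_cores for vm in vms),
        ram_gb=host.ram_gb - sum(vm.ram_gb for vm in vms),
        storage_gb=host.storage_gb - sum(vm.storage_gb for vm in vms),
    )

def fits(host, vms):
    """True when no resource is allocated beyond what the host physically has."""
    left = remaining_capacity(host, vms)
    return left.cpu_cores >= 0 and left.ram_gb >= 0 and left.storage_gb >= 0

# Hypothetical host carrying four identical VMs.
host = Resources(cpu_cores=16, ram_gb=64, storage_gb=2000)
four_vms = [Resources(cpu_cores=4, ram_gb=16, storage_gb=500) for _ in range(4)]
print(remaining_capacity(host, four_vms), fits(host, four_vms))
```

In a claim, the same inventory shows how many separate systems (and data sets) were riding on one damaged physical server.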

Four VMs on a single server

Why is it important to ask whether VMs are present?

What is RAID?

• Redundant Array of Independent Disks
OR
• Redundant Array of Inexpensive Disks

RAID is a technology that employs the simultaneous use of two (2) or more hard disk drives to achieve greater levels of performance, reliability or redundancy, and/or larger data volume sizes.

What is a RAID Controller?

• A RAID controller is a device which manages the physical hard disk drives and presents them to the computer as logical units. It almost always implements hardware RAID and often provides additional disk cache.

• Many RAID Controllers require uninterrupted power for the onboard memory. This is due to all of the configuration settings being stored on the random access memory (RAM).

• Not to be confused with a Host Bus Adapter (HBA) which has no RAID configuration abilities.


Typical Types of RAID Arrays

• RAID 1
RAID 1, also known as a mirror, typically consists of two (2) Hard Disk Drives (HDDs) that write the same data simultaneously. This provides fault tolerance from hard disk drive errors and/or failures that can be experienced during normal operation. As long as no more than one (1) of the hard disk drives fails, this fault tolerance allows the server to provide uninterrupted service by operating on the remaining hard disk drive.

• RAID 5
A RAID 5 configuration requires a minimum of three (3) HDDs that write data across all the HDDs simultaneously. This provides fault tolerance in the event that no more than one (1) HDD experiences errors and/or failures during normal operation. In the event that no more than one (1) of the HDDs fails, the server will continue to provide uninterrupted service by operating on the remaining HDDs. While operating this way, the RAID configuration will be identified as being in a degraded state. If a second HDD goes offline, the RAID 5 will exceed the built-in redundancy.
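
The single-drive fault tolerance of RAID 5 comes from parity. The sketch below is a toy model under stated assumptions: one stripe, three "drives" holding two data blocks plus one XOR parity block, and hypothetical function names. Real controllers rotate the parity block across drives and work on much larger stripes, so this only shows the principle.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one block per drive)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild_missing(stripe, failed_index):
    """Recompute the block of ONE failed drive from the surviving drives."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

# Three "drives": two data blocks and their parity (illustrative contents).
stripe = make_stripe([b"ACCT", b"DATA"])
recovered = rebuild_missing(stripe, failed_index=1)
print(recovered)
```

With two blocks missing there is no longer enough information to solve for either one, which is why a second offline HDD exceeds the redundancy.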

Graphical Representation


Failures Experienced

• Lightning Damage
• Power Surge
• Loss of Power to surrounding area, or Battery Backup (UPS) issues
• Cooling/Environmental Control System failure
• Equipment in high contamination area or exposed to contaminants
• Natural Disaster – Tornado, flooding, etc.
• Vandalism
• Hard Disk Drive Failures
• RAID Controller Failure
• Software Corruption
• Application Errors
• Human Errors

Lightning

• $686 billion in lightning losses each year
• 29% had no lightning detected on loss date
• Insurance Industry is left to determine the cause of the damage

Types of Lightning Damage

Direct Strike
Strike hits property, causing visible damage.

Inductive Coupling
The magnetic fields created by nearby lightning strikes induce a voltage in long cables (30 feet or longer), which can damage components at the ends of the cables.

How Lightning Damages

Lightning requires a point of entry into a piece of equipment (power, phone, or network cables…etc.)

Lightning will follow an electrical path inside the equipment

Lightning typically causes catastrophic and instantaneous damage


Path of Power

• Examination of equipment will identify an electrical power path into the device.
• Uninterruptible Power Supply (UPS)
• Surge Protector
• Power Strip or Power Distribution Unit (PDU)
• Power Supply

Power Surge

A large sudden increase of voltage (or current). This would cause damage to the components in the power path.

Can be caused by events as major as a short circuit in the utility equipment or as subtle as a neighboring facility turning on an air conditioning unit.


Lightning vs. Power Surge

• Lightning will provide a high amount of energy for a short period of time.
• Power Surge will provide a lower amount of energy than lightning for a longer amount of time.

[Figure: energy plotted against time for lightning and a power surge]

Equipment Examination

• Examination of equipment will identify an electrical path into the device.
• There may not always be viewable damage, especially with inductive coupling events. In some cases, the failed operation of a portion of the computer may be the extent of identifying an issue.

So, how can we test to verify the root cause of damage if we cannot turn it on?

Visual Inspection

Visual inspection can reveal problems such as degraded / leaking capacitors or other physical damage.

It can also reveal that critical components are missing.

Component Level Equipment Testing

• Break down the device into subcomponents:
  – Power Supply
  – Motherboard
  – RAM
  – Processor
  – Hard Disk Drive
  – Etc.

Verification of Operation

• Each of the items can be placed into a testing system and validated for proper operation

• The testing completed is above and beyond normal everyday operation. Therefore, it will discover a failure that might not be discovered during normal system operation.

• Extended testing in a controlled environment helps identify possible latent or triggered failures.

• In many cases, a forced logical error can be configured. Therefore, the level of operation and/or response from a device can be verified by this method.

• This testing, in conjunction with the mounted components, can determine the exact mode of failure.

Gathering of operational data

• SATA & IDE HDDs have Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) logs.
• SCSI and SAS HDDs have P/G and log files.
• Servers can have management logs and event logs.
• Operating Systems have various system and application logs that can be reviewed.
• Networks may have a management system called Intelligent Platform Management Interface (IPMI). This has the ability to track, identify, and respond to events on the network and associated devices.
• Each obtained log will provide the historical operational statistics that can be reviewed for errors or issues.
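
As one way of reviewing such logs, the sketch below scans a S.M.A.R.T. attribute table for entries that commonly indicate media problems. It assumes a text layout like the attribute table printed by smartmontools (`smartctl -A`); the sample rows are invented for illustration, and the watch list and function names are assumptions, not a rule taken from the presentation.

```python
# Invented sample in the column layout assumed above; not data from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       24
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       41230
194 Temperature_Celsius     0x0022   068   051   000    Old_age   Always       -       32
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

# Attributes where any non-zero raw count deserves a closer look (assumed list).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

def parse_attributes(text):
    """Turn each attribute row into a dict; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "threshold": int(fields[5]),
            "raw": fields[9],
        })
    return rows

def flag_concerns(rows):
    """List attributes at or below their threshold, or watched with a non-zero raw count."""
    concerns = []
    for row in rows:
        below = row["threshold"] > 0 and row["value"] <= row["threshold"]
        nonzero = row["name"] in WATCHED and row["raw"].isdigit() and int(row["raw"]) > 0
        if below or nonzero:
            concerns.append(row["name"])
    return concerns

print(flag_concerns(parse_attributes(SAMPLE)))
```

A flagged attribute is a prompt for further testing, not a conclusion about the cause of loss.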

Examination

Following the gathering of the logs, etc.:
• Powering of the equipment
• Errors presented during the boot process
• Software/Application errors presented

Focus on the subsystems to identify failures and the causes of these failures. Review all available data to recreate the incident on the date of loss.

A Hard Disk Drive

• Typical HDDs in today's computer equipment:
  1. Small Computer System Interface (SCSI)
  2. Integrated Drive Electronics (IDE)
  3. Serial Advanced Technology Attachment (SATA)
     - Solid State Drive (SSD)
  4. Serial Attached SCSI (SAS)

The Types of Interface

• SCSI
• SATA
• SAS
• IDE

Internal Components of a HDD

Air Filter Packet

Actuator Axis or Head Stack

Magnet

Actuator

Platters

Spindle

Read/Write Heads

Actuator Arm

Preamp Chip

Ribbon Cable


Internal Components of a HDD

Actuator Arm

Read / Write Head

Recording Surface - Platter

Map of the Platter

(A) Track: A track is a circular path on the surface of a disk.

(B) Sector: A sector can be thought of as a wedge-shaped area of a disk. The term sector, however, is more often used as a synonym for block.

(C) Block: The intersection of a track and a sector is called a block. These blocks are the smallest unit of storage on a HDD. Block = 512 bytes.

(D) Cluster: A cluster is the logical amount of disk space that can be allocated to hold a file. The smallest size of a cluster is one (1) sector.
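
The block and cluster definitions above lead to a simple piece of arithmetic: a file always occupies a whole number of clusters, so some allocated space is usually left unused. The sketch below works this through for a hypothetical 10,000-byte file; the 512-byte block comes from the text, while the cluster sizes and function names are assumptions chosen for illustration.

```python
SECTOR_BYTES = 512  # block size given in the text

def clusters_needed(file_bytes, sectors_per_cluster):
    """Whole clusters required to hold a file of the given (non-zero) size."""
    cluster_bytes = SECTOR_BYTES * sectors_per_cluster
    return -(-file_bytes // cluster_bytes)  # integer ceiling division

def slack_bytes(file_bytes, sectors_per_cluster):
    """Allocated but unused bytes in the file's last cluster."""
    cluster_bytes = SECTOR_BYTES * sectors_per_cluster
    return clusters_needed(file_bytes, sectors_per_cluster) * cluster_bytes - file_bytes

# Hypothetical 10,000-byte file on volumes with 1-sector and 8-sector clusters.
for spc in (1, 8):
    print(spc, clusters_needed(10_000, spc), slack_bytes(10_000, spc))
```

With one-sector clusters the file takes 20 clusters and leaves 240 bytes unused; with eight-sector (4,096-byte) clusters it takes 3 clusters and leaves 2,288 bytes unused.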

Data Losses for Consideration

• Where was the data stored? (Storage Method)
• What data was lost? (Database, images, documents etc.)
• What kind of peril caused the data to be lost?
  – Hardware error – component failure
  – Software error – software update, error in program
  – Environment – over temperature, natural disaster etc.
  – Power event – power loss corrupts data, configuration and software.
  – Human error – deletion of data or a virtual machine, poor maintenance, incorrect maintenance.
• Professional data recovery services are a means to recover data that is otherwise deemed unrecoverable.
  – Typically between $800 and $2,000 per drive for normal service.
  – Upcharge for faster service.
  – Each Virtual Machine (VM) is often treated as its own recovery, which increases the cost of data recovery.

Reasons for Drive Failures

• Degraded sectors / bad sectors
• Head crash
• Intermittently failing read/write heads
• Damaged read/write heads
• Head position tracking issues
• Firmware (software) corruption
• Known issues

Bad Sectors

A bad sector is a common occurrence involving a sector of the hard disk drive (HDD) media failing either due to physical damage of the media, or a deterioration of the magnetic media on which information is stored. Because of this failure, information cannot be written to the given area of the drive media.

Head Crash

A head crash is where a read/write head makes contact with the platter during operation. Because of the speed of rotation of the platters and the cross axial movement of the read/write heads, the event of the read/write heads touching the platter is often catastrophic. Evidence of this event appears as concentric rings or arcs viewable on the surface of the platters within the HDD. This is where the read/write heads are actually scratching or removing the magnetic layer from the platter. In addition to this platter damage, the read/write head sustains irreparable damage due to the delicacy of the components.

Other Examples


Damaged Heads - Close Up


Types of Contamination

Fire examples: flammable gas leaks such as silane, electrical fires, flammable solvent fires, furnace fires

Chemical spills: burst chemical supply line, leaking gas pipe, operator chemical spill

Water spillage: burst process cooling water, burst deionized water or ultra-pure water pipes, condensation, roof leaks

Construction dust: ceiling tile installation/removal, concrete floor install/repair


Contamination Perspective

• 0.01 µm ≤ Tobacco smoke ≤ 1.00 µm

Effects of Contamination Including Corrosion

Contamination effects include:
o Cosmetic damage, odor, obscuration, mechanical binding, short circuits/arcing, reduced thermal dissipation, increased contact resistance and especially corrosion (see below).

o Corrosion and Corrosives
  o Water and water vapor (including humidity) combine with ions to form corrosive acids. Example: chloride ions (Cl) combine with water (H2O) to form hydrochloric acid (HCl).
  o Avoid contact with zinc, brass, galvanized iron, aluminum, copper and copper alloys since violent reactions occur.
  o Elevated temperatures and humidity will cause the reactivity to accelerate. Reducing the environmental influences can reduce surface deterioration.

o Corrosive Ions in Smoke
  o Sulfates - From burning wood, cardboard, paper, etc.
  o Nitrates - From burning nylon carpets, drapes, and certain plastics
  o Chlorides - From burning plastics, such as PVC and electrical wiring

Restoration of Contaminated Equipment

Feasibility of Restoration (Is restoration a viable option?)

Equipment loss professional should look at the following:
• Heat damage, arcing, corrosion, physical damage
• Conditions of loss site
• Circumstances surrounding the equipment
• Concerns regarding business interruption or extra expense
• Surface wipe sample and surface conductivity test results

Restorable - No signs of corrosion, surface conductivity test results low

Not restorable - Contamination effects caused excessive corrosion, arcing and overheating

Science of Restoration

DOE Study vs. IPC Standard

1. DOE study threshold is 20 µg/in² of aggregate chloride equivalent. More suitable for manufacturing facilities, machine shops etc.
2. IPC J-Standard threshold is 10.06 µg/in² of aggregate sodium chloride equivalent. More suitable for data centers, medical facilities etc.

Failure Probability


IT Equipment Special Considerations

Special considerations:
• BI costs can be enormous - quick, decisive action is essential.
• Contamination incidents - it is extremely important to accurately determine the extent of contamination (the Insured normally cannot do this). Expert and analytical lab services are required to support the assessment.
• A very good understanding of the manufacturing technologies, chemicals and gases used, as well as cleanroom and facilities experience, is needed.
• Lateral thinking for solving problems.

Fact or Fiction?

• All manufacturers condemn equipment if contaminated by water or smoke? FICTION
  In fact:
  – Siemens Medical along with Allianz started equipment restoration in the 1970s
  – Third party service companies are utilized to perform the needed repairs and reinstate their service contracts.

• All electronic circuit boards are damaged if contaminated by water. FICTION
  In fact:
  – Many electronic circuit boards are water resistant with conformal coating
  – De-ionized water is an integral part in the manufacturing of electronic circuit boards

Provided by Aqueous Technologies

Equipment Restoration Basics

• Can be completed on pieces affected by:
  – Smoke and soot
  – Water or excessive humidity
  – Construction dust
  – Chemical contamination

• Restoration is a viable option if:
  – Minimal heat damage
  – Minimal arcing or short circuiting
  – Low levels of chloride corrosion/oxidation

Surface conductivity – Sodium Chloride (NaCl) (24.9 µg/cm² = 160 µg/in²)

Laboratory wipe sample test results from a 24-port switch indicate Sodium Chloride equivalent > 10.06 µg/in²
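
The surface conductivity figure above mixes two area units, so a quick conversion helps when comparing a lab result against the thresholds quoted in these slides (10.06 µg/in² for the IPC J-Standard, 20 µg/in² for the DOE study). The sketch below is only that arithmetic; the function and constant names are made up for illustration, and it says nothing about how a lab derives the reading.

```python
CM2_PER_IN2 = 2.54 ** 2  # 1 in² = 6.4516 cm²

IPC_LIMIT_UG_PER_IN2 = 10.06  # threshold quoted in the slides
DOE_LIMIT_UG_PER_IN2 = 20.0   # threshold quoted in the slides

def per_cm2_to_per_in2(ug_per_cm2):
    """Convert a surface concentration from µg/cm² to µg/in²."""
    return ug_per_cm2 * CM2_PER_IN2

def per_in2_to_per_cm2(ug_per_in2):
    """Convert a surface concentration from µg/in² to µg/cm²."""
    return ug_per_in2 / CM2_PER_IN2

def exceeds(reading_ug_per_in2, limit_ug_per_in2):
    """True when a reading is above the chosen threshold."""
    return reading_ug_per_in2 > limit_ug_per_in2

reading = per_cm2_to_per_in2(24.9)  # the 24.9 µg/cm² figure from the slide
print(round(reading, 1), exceeds(reading, IPC_LIMIT_UG_PER_IN2))
print(round(per_in2_to_per_cm2(IPC_LIMIT_UG_PER_IN2), 2))
```

The 24.9 µg/cm² reading converts to roughly 160 µg/in², well above either threshold, and the 10.06 µg/in² limit corresponds to about 1.56 µg/cm².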

Equipment Restoration Techniques

• Dry Techniques - HEPA vacuum, agitation with brushes, low pressure deionized compressed air.

• Modified Wet - A combination of dry techniques as well as hand detail utilizing aqueous spray solutions.

• Aqueous Wet - Thorough disassembly of power supplies, control circuitry and mechanical assemblies. Aqueous wash utilizing deionized water and cleaning solutions.

• Overnight drying – where applicable, in a heated chamber to remove moisture. In addition, a vacuum chamber can be used to enhance moisture removal and reduce drying time.

TekPro - Vacuum drying chamber

Why Consider Restoration as a Strategy for an IT Equipment Claim?

1. Cost efficient
   • Less than 30% of replacement costs in most cases
2. Reduces down time and BI losses
   • Restoration can often be completed faster than replacement, especially with high-end medical pieces
   • Reduces installation and configuration costs
   • Removes need to train personnel on new equipment

Preserving Your Options

• Disconnect Power
• Control Humidity
• Use Temporary Barriers
• Remove Excess Water
• Consider a Preservative
• Retain Professional Advice
• Mechanical Preservation
  - Lubricating Agents
• Electronics Preservation

Case Study 1

CLAIM: Over-temperature event causes servers to shut down, and they now will not reboot properly

• Logs obtained from the HDDs showed that no thermal limits were exceeded. Subsequent testing revealed no errors or failures.
• Server logs identify a thermal event (reached high limit).
• Thermal event triggered thermal shutdown of the server in order to protect the server from damage.
  – Once shut down, the server's storage temperature specifications apply
• Testing of the hardware separate from the software showed full functionality.
• Shutdown was not graceful but abrupt, which caused damage to OS files.
• Server does not need to be replaced.
• OS needs to be reloaded; server is restorable. Database is not damaged and can be fully restored to use.

Case Study 2

CLAIM: Multiple hard disk drive (HDD) failures on a server

– Server experienced multiple HDD failures simultaneously
– RAID controller failure caused inconsistent reporting of the HDDs
– HDDs sent to a data recovery facility, which identified that the HDDs were not physically damaged
– Obtained the server for testing
– Identified HDDs are fully operational
– Server main components are fully operational
– No logs on the server or RAID controller
– Put the server and HDDs together and tested
  • Reviewed the RAID controller
  • Found that the RAID controller was reporting the physical HDDs inconsistently
– Physical drive – two (2) HDDs offline
– Logical drive – one online, second offline
– Rebooted – all online and system RAID was Optimal
– OS is corrupted and requires reloading

Case Study 3

CLAIM: Tornado damaged buildings including server room full of servers and other network equipment.

• Inventory obtained for claimed equipment
• Visual inspection of claimed equipment, including internal inspection
• Wipe samples of contaminants taken to obtain lab results and see if equipment is restorable
• Testing of equipment to determine whether units are functional/operational
• Send wipe samples and obtain lab results
• Share lab results and restoration options with insured and clients to inform them and obtain buy-in to the process
• Provide list of equipment that needs to be replaced and what can be restored
• Provide quotes for restoration and replacement equipment
• Begin restoration and turnkey approach to restoring insured's equipment to pre-loss condition
• Insured's operation restored before critical time period with all parties satisfied with results.

Questions?

Tom Bonse
919.328.0793

Jared Fegan