
COMP1321 Backup of data Richard Henson December 2015


Page 1: COMP1321 Backup of data Richard Henson December 2015

COMP1321 Backup of data

Richard Henson

December 2015

Page 2: COMP1321 Backup of data Richard Henson December 2015

Week 10 – Back Up and Fault Tolerance

Objectives
– describe different kinds of solutions available for data backup
– explain the concept and principles of fault tolerance

Page 3: COMP1321 Backup of data Richard Henson December 2015

If it can go wrong, it will!

… Murphy’s Law

Page 4: COMP1321 Backup of data Richard Henson December 2015

Backing Up and Fault Tolerance

In terms of computing…
– “back up” is what is done to data, in case the original is corrupted for some reason
  » e.g. all computer users should back up any files they may save, and want to use again later…
  » e.g. all network users should have their “user data” saved as part of good network processes
– “fault tolerance” is more fundamental, and concerned with 100% availability
  » relates to hardware as well as the software required to manage that hardware…

but back up is an essential part of fault tolerance

Page 5: COMP1321 Backup of data Richard Henson December 2015

Fault Tolerance of Data

Data on storage media is easily corrupted or deleted
– magnetic disks particularly sensitive to data loss
– must be backed up at all times

Useful also to store contents of memory
– otherwise lost whenever there is a power interruption or system malfunction

Page 6: COMP1321 Backup of data Richard Henson December 2015

Backing Up Data

If complete hard disk is regularly copied…
– massive amounts of data will soon accumulate…
– need a storage medium that can store very large quantities

Once upon a time, tape storage was the preferred method
– use a new back up tape every day
– keep the old ones carefully labelled in a safe place

Page 7: COMP1321 Backup of data Richard Henson December 2015

Which Data Should be Backed Up?

Can be classified into several types
– System data
– Critical System data
– Application data
– User data

May be backed up in different ways, with differing regularity

Page 8: COMP1321 Backup of data Richard Henson December 2015

What data should NOT be backed up

Probably two categories:
– that which is safely stored elsewhere and can be restored at leisure
  » e.g. applications on CD
– that which won’t be used again and won’t be missed
  » e.g. temporary files
  » read/unread emails that aren’t important
  » saved Word files, etc. that are no longer needed

Page 9: COMP1321 Backup of data Richard Henson December 2015

Clearing out data that is no longer needed

According to a visionary researcher:
– "All computer-mediated processes produce data. Unless dealt with, it stays around.
– And its after-effects can be pretty toxic.
– And, just as 100 years ago we ignored pollution in our rush to build the Industrial Age, today we’re ignoring data in our rush to build the Information Age.
– And, I believe, 100 years from now our great-grandchildren will look back at the decisions we made and wonder how we could have been so ignorant and short-sighted."

(Bruce Schneier, 2008)

Page 10: COMP1321 Backup of data Richard Henson December 2015

Automatic tidying up of data

The answer is simple…
– BACK UP processes should be accompanied by DELETE processes!
– not yet accepted practice…

This is good information management
– reduces risk of information getting into the wrong hands
– and ensures compliance with UK Data Protection Legislation

Page 11: COMP1321 Backup of data Richard Henson December 2015

Essential/Important System Data

Essential: what is needed for a healthy boot up
– Microsoft networks refer to this as SYSTEM STATE DATA
– highly dynamic
  » regular back up essential

Data to support utilities
– required for “housekeeping” duties
– back up every time not essential
  » data available on CD

Data to support services
– as utilities data

Page 12: COMP1321 Backup of data Richard Henson December 2015

Backing up User Data

A number of approaches available:
– Incremental backup
  » only files changed since the previous backup (of any kind) are backed up, so different files are copied on Monday, Tuesday, etc…
– Differential backup
  » just files that have changed since the last full backup (later datestamp) are backed up
– Full backup
  » all data backed up

What about critical system data? e.g. Windows registry settings
– differential backup?
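The difference between the three approaches can be sketched in a few lines of Python. This is only an illustration, taking the usual definitions (full = everything, differential = changed since the last full backup, incremental = changed since the last backup of any kind). The `FileInfo` class, the `select_files` function and the timestamps are hypothetical names and values invented for the example, not part of any real backup product.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical record of a file and when it was last modified."""
    name: str
    modified: float  # time of last change, in arbitrary units (e.g. days)


def select_files(files, kind, last_full, last_backup):
    """Return the names of the files a backup run of the given kind would copy.

    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any kind
    """
    if kind == "full":
        return [f.name for f in files]
    if kind == "differential":
        return [f.name for f in files if f.modified > last_full]
    if kind == "incremental":
        return [f.name for f in files if f.modified > last_backup]
    raise ValueError(f"unknown backup kind: {kind}")


# Assumed timeline: full backup at t=1, incremental backup at t=2, now t=3.
files = [
    FileInfo("old.txt", 0.2),     # unchanged since before the full backup
    FileInfo("report.doc", 1.5),  # changed after the full backup
    FileInfo("budget.xls", 2.5),  # changed after the incremental backup
]

full_set = select_files(files, "full", last_full=1, last_backup=2)
diff_set = select_files(files, "differential", last_full=1, last_backup=2)
incr_set = select_files(files, "incremental", last_full=1, last_backup=2)
```

In this sketch the differential run copies more than the incremental run, but a restore would need only the full backup plus the latest differential, whereas an incremental scheme needs every incremental since the full backup.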

Page 13: COMP1321 Backup of data Richard Henson December 2015

Tape Backup?

Can store many gigabytes of data on a single tape

Storage is fairly rapid, BUT… tape is no longer regarded as the natural choice…
– storage medium is still magnetic
– retrieval time can be very slow

Page 14: COMP1321 Backup of data Richard Henson December 2015

The Backup Process

Handled by software
– easily scheduled to be automatic

Data could be backed up to a variety of alternative media
– e.g. removable hard disk

A lot of backup data will accumulate…
– general rule to dump data after three backup “generations”
– known as grandfather-father-son
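The "three generations" rule can be sketched as follows. This is the simplest possible reading (keep the three most recent backup sets, discard the oldest when a fourth arrives); real grandfather-father-son schemes usually rotate daily, weekly and monthly sets. The `GenerationStore` class and the labels are hypothetical, made up for the example.

```python
from collections import deque


class GenerationStore:
    """Keeps only the most recent backup generations (three by default)."""

    def __init__(self, generations=3):
        # A deque with maxlen silently drops the oldest item when full.
        self.kept = deque(maxlen=generations)

    def add(self, label):
        """Record a new backup; return the label of the one dumped, if any."""
        dumped = self.kept[0] if len(self.kept) == self.kept.maxlen else None
        self.kept.append(label)
        return dumped


store = GenerationStore()
dumped = [store.add(day) for day in ["mon", "tue", "wed", "thu"]]
# After four backups, "mon" (the great-grandfather) has been dumped and
# "tue", "wed", "thu" remain as grandfather, father and son.
```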

Page 15: COMP1321 Backup of data Richard Henson December 2015

Other Alternatives to Tape Backup

Server data could also be backed up to:
– a USB-linked hard drive
– another computer on the network
– a computer on another network in a different location
  » easily achievable via the Internet
  » data will be preserved in the event of a fire or environmental catastrophe

Page 16: COMP1321 Backup of data Richard Henson December 2015

Verification of Backup

One thing to THINK that the data is being backed up

Quite another to ensure that this has indeed occurred!
– no reason to assume that the backup will be completely effective
– plenty that could go wrong

Data backup routine should check:
– that the data has indeed been copied
– that there are no errors

Good backup software should make such checks automatically!
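One common way to make both checks is to compare a checksum of the original with a checksum of the copy. The sketch below uses SHA-256 from the Python standard library; the function names `sha256_of` and `backup_and_verify` and the file names are assumptions for illustration, and real backup software may verify in other ways.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, destination):
    """Copy source to destination, then confirm the two checksums match."""
    shutil.copy2(source, destination)
    return sha256_of(source) == sha256_of(destination)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data.txt"
    copy = Path(tmp) / "data.bak"
    original.write_bytes(b"important user data")

    verified = backup_and_verify(original, copy)  # copy is intact

    # Simulate a backup that has been damaged after it was written.
    copy.write_bytes(b"corrupted")
    still_matches = sha256_of(original) == sha256_of(copy)
```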

Page 17: COMP1321 Backup of data Richard Henson December 2015

Restoring Backed Up Data

Should happen…
– as part of a regular routine
– just like the backing up itself…

No good backing the data up to tape or disk if it can’t easily be recovered!
– or can’t be copied back to the right place…

Back up software should always be tested in “restore” mode as well…

Page 18: COMP1321 Backup of data Richard Henson December 2015

Beyond Backup: “Thinking the unthinkable...”

Humans are optimistic
– we HOPE things won’t ever go wrong…
– but they do!!!

ANY network device could go wrong at any time
– could affect network performance
– could even bring the whole network to a halt…
– with time, the business/organisation will be kaput

Software can also fail
– may go into an endless loop
– may need to be restarted

Page 19: COMP1321 Backup of data Richard Henson December 2015

A Standard for “Business Continuity Planning”

BS 25999 (a British Standard, since superseded by the international standard ISO 22301):
– taking “Murphy’s Law” contingency planning to all aspects of the organisation
  » recent UK e.g.: flooding
  » need to prepare for it so the business can continue…

These days, a business’s most important asset often is its information
– stored in digital format
– copy needs to be kept in a different location

Page 20: COMP1321 Backup of data Richard Henson December 2015

Fault Tolerance and Computer Systems

All about availability

Any organisation now dependent on digital data

Power cut… people stop work… most of what they do involves a computer

Good fault tolerance is about minimising the chances of this happening…

Page 21: COMP1321 Backup of data Richard Henson December 2015

Definition of “Fault Tolerant”?

“A computer system or component designed so that, in the event that a component fails, a backup component or procedure can immediately take its place with no loss of service”

Page 22: COMP1321 Backup of data Richard Henson December 2015

Fault Tolerance role of the Network Operating System

Each important hardware component on the network should have a backup that can take over in the event of a failure

NOS should therefore
– detect failures
– enable a backup to automatically take over when the fault is detected...

Page 23: COMP1321 Backup of data Richard Henson December 2015

Achieving Fault Tolerance

ONE APPROACH…
– carefully written software
  » software detects failure of other software
  » takes evasive action in real time
– hardware has an embedded system that:
  » detects failure
  » rapidly swaps alternative hardware into action

Makes sense for the operating system to do all of this…
– detects both hardware and software failure
  » restarts program(s)
  » swaps in alternative pre-wired hardware
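The "detect failure, then restart the program" idea can be sketched as a very small supervisor. This is a toy illustration only: the function `run_with_restart`, the `flaky_task` example and the restart limit are invented here, and a real operating system supervises separate processes rather than Python functions.

```python
def run_with_restart(task, max_restarts=3):
    """Run task(); if it fails, restart it, up to max_restarts times.

    Returns (result, number_of_restarts_needed).
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            # Failure detected: give up only once the restart budget is spent.
            if restarts >= max_restarts:
                raise
            restarts += 1


# Hypothetical program that crashes twice before running to completion.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RuntimeError("simulated software failure")
    return "done"


result, restarts_needed = run_with_restart(flaky_task)
```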

Page 24: COMP1321 Backup of data Richard Henson December 2015

Concept of Data “Mirroring”

Problem with periodic backup:
– data copied the previous night
– what if the system hard disk goes kaput in the middle of the next day?

Copy of all data should additionally be stored “shorter term” on further media
– easiest way is to have another disk in reserve
– everything copied to system disk also copied to mirror

Page 25: COMP1321 Backup of data Richard Henson December 2015

Disk Mirroring

Increases boot/system disk fault tolerance under most conditions

In its simplest form:
– all data held on one disk
– second disk is an exact copy of the first

When anything is written to disk…
– written simultaneously to both disks

[Diagram: the disk controller writes data to Disk A and writes the same data to Disk B]
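A toy model of the behaviour described on this slide, written in Python: every write goes to both disks, so reads can continue from Disk B if Disk A fails. The `MirroredDisk` class and its method names are hypothetical; real mirroring is done by the disk controller or the operating system, not by application code.

```python
class MirroredDisk:
    """Toy mirror set: two dictionaries stand in for Disk A and Disk B."""

    def __init__(self):
        self.disk_a = {}
        self.disk_b = {}
        self.a_failed = False

    def write(self, block, data):
        # Every write goes to both disks (only to B once A has failed).
        if not self.a_failed:
            self.disk_a[block] = data
        self.disk_b[block] = data

    def fail_disk_a(self):
        """Simulate Disk A crashing and losing its contents."""
        self.a_failed = True
        self.disk_a.clear()

    def read(self, block):
        # Read from A while it is healthy, otherwise fall back to the mirror.
        source = self.disk_b if self.a_failed else self.disk_a
        return source[block]


mirror = MirroredDisk()
mirror.write(0, b"system data")
mirror.fail_disk_a()
recovered = mirror.read(0)      # still readable from Disk B
mirror.write(1, b"new data")    # system keeps going on the surviving disk
```

Note that after the failure the model is running on a single disk, so fault tolerance has been lost until Disk A is replaced and the mirror rebuilt.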

Page 26: COMP1321 Backup of data Richard Henson December 2015

Where even Mirroring alone is not enough…

If the system crashes and will not reboot…
– operating system doesn’t get reloaded
– therefore the mirror never gets activated
  » and copied files cannot be read…

Page 27: COMP1321 Backup of data Richard Henson December 2015

Recovering the system after a damaged Mirrored Boot Disk

Boot program can only point to one disk at a time…

If the boot disk crashes…
– the system boot program will fail to access a disk at all next time it restarts…

System needs an alternative boot up…
– e.g. use a boot floppy or CD to restart the system…

Boot program can then be modified to point to the backup, not the faulty disk

Page 28: COMP1321 Backup of data Richard Henson December 2015

Remedial action after a broken mirror

Just because the system is up and running again, doesn’t mean the emergency is over…

Fault-tolerance MUST be restored before ANYONE can relax
– replacement disk must be added asap to replace the damaged one
– the mirror must then be re-established
– all the disk copying required to re-establish system fault-tolerance may take some time…

Page 29: COMP1321 Backup of data Richard Henson December 2015

Relative Merits of Mirroring (system availability)

Advantage:
– system keeps going as normal if a non-boot disk crashes

Disadvantages:
– disk write operations take longer
– half of available disk space is used up (only 50% efficient use of storage)

Page 30: COMP1321 Backup of data Richard Henson December 2015

Hardware flaw with Mirroring

Regardless of the boot disk problem, disk mirroring is STILL not entirely fault-tolerant!
– both disks connected to the same hard disk controller
– if the controller card goes down, neither disk will be accessible

Page 31: COMP1321 Backup of data Richard Henson December 2015

Disk Duplexing

Separate controller card for each disk
– if one card goes down, only the disk connected to it is affected

NOTE:
– use of duplexing DOES NOT eradicate the potential re-booting problem caused by a damaged boot disk
– needs the same solution as mirroring

[Diagram: Controller A and Controller B on the motherboard, each connected to its own disk (Disk A, Disk B)]

Page 32: COMP1321 Backup of data Richard Henson December 2015

Problem: Too Much redundancy of disk space

Redundancy = disk space used by the system for backup copies / total disk space

Both mirroring and duplexing:
– Redundancy = 0.5 (50%)
– Rather high
– Half of available space tied up in backup!

Solution: RAID (Redundant Array of Inexpensive Disks)
– less redundancy
– still full backup

Page 33: COMP1321 Backup of data Richard Henson December 2015

What is RAID?

A system of several disks where part of each disk is used to store system data, and the rest stores backup data

If all the disks are linked together and just used for primary data (i.e. no backup):
– the arrangement is known as a stripe set
– also known as RAID 0 (i.e. zero fault tolerance)

Page 34: COMP1321 Backup of data Richard Henson December 2015

Categories of RAID providing fault tolerance

RAID 1 – mirroring or duplexing

RAID 2 – backup using disks that do not have their own error-checking

RAID 3 – backup using disks with their own error checking
– striped across disks at byte level
– parity data stored on one drive

RAID 4 – similar to RAID 3
– but data striped in whole blocks, not per byte
– poor data write performance

Page 35: COMP1321 Backup of data Richard Henson December 2015

RAID 5 (the best!)

Can use different number of disks (minimum three)

Each disk divided into sections

One parity section in each disk

Data write faster than RAID 4

Redundancy depends on number of disks used…

Page 36: COMP1321 Backup of data Richard Henson December 2015

Example of RAID 5 (four disks)

– Each of the four disks divided into four sections
  » one section for parity in each
  » writes do not always go to the same parity disk
  » data write therefore faster than RAID 4, read slower
  » Redundancy = ¼
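How a parity section allows lost data to be rebuilt can be shown with XOR, which is how RAID parity is conventionally calculated. The sketch below treats one stripe across four disks as three data blocks plus one parity block; the block contents and the `parity` function name are made up for the example.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe on a four-disk set: three data blocks and one parity block.
data = [b"DISK", b"FAUL", b"TOLR"]
p = parity(data)

# Suppose the disk holding data[1] fails. XOR-ing the surviving data
# blocks with the parity block reproduces the missing block.
rebuilt = parity([data[0], data[2], p])
```

Because any one block of a stripe can be rebuilt from the others, the set survives the loss of a single disk, but not of two at once.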

Page 37: COMP1321 Backup of data Richard Henson December 2015

RAID 5 (five disks)

– RAID 5 using five disks (most popular), each divided into five sections
  » One section for parity as before
  » Redundancy = 1/5
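The redundancy figures quoted for mirroring and RAID 5 follow a simple pattern, sketched below. The function names and the 500 GB disk size are illustrative assumptions; the arithmetic assumes equal-sized disks with one disk's worth of space given over to parity.

```python
from fractions import Fraction


def mirror_redundancy():
    """Mirroring or duplexing: one of every two disks holds the copy."""
    return Fraction(1, 2)


def raid5_redundancy(disks):
    """RAID 5: one section in every `disks` sections holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return Fraction(1, disks)


def raid5_usable_gb(disks, disk_size_gb):
    """Space left for primary data once the parity share is removed."""
    return (disks - 1) * disk_size_gb


four_disk = raid5_redundancy(4)      # 1/4, as in the four-disk example
five_disk = raid5_redundancy(5)      # 1/5, as in the five-disk example
usable = raid5_usable_gb(5, 500)     # five 500 GB disks (assumed size)
```

Adding disks lowers the redundancy, which is the sense in which it "depends on number of disks used".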

Page 38: COMP1321 Backup of data Richard Henson December 2015

Hot Swapping

Disks that can be removed and replaced without rebooting the system

If a disk that belongs to a RAID system fails…
– the system can continue
– but fault tolerance is immediately lost

Helpful and quicker to replace the disk, and re-establish RAID:
– as soon as possible
– without having to turn the power off and reboot

Page 39: COMP1321 Backup of data Richard Henson December 2015

Fault Tolerance and Re-boot

If a system crashes and/or is rebooted…
– availability is temporarily lost

Needs to be a reserve system (backup server) that will perform that system’s functions in the meantime

Network Operating system needs to synchronise processes across systems to enable this to take place…

Page 40: COMP1321 Backup of data Richard Henson December 2015

The Backup Server

Essential for 100% availability

Should be configured as a replacement for the main server
– also needs to be a domain controller
– must also have a copy of the users database, regularly synchronised with the main domain controller
– also configured to be able to log users onto the network

Page 41: COMP1321 Backup of data Richard Henson December 2015

Backup of Settings before Reconfiguration by Rebooting

New hardware will be added to a server from time to time:
– more memory?
– extra hard disk?
– new video, sound, or network card?

Hot swapping may well NOT be supported

Server will have to reboot and reconfigure

Page 42: COMP1321 Backup of data Richard Henson December 2015

Backup of Settings before Reconfiguration by Rebooting

If the new drivers are not correct:
– system may not reboot properly
– may be difficult to remove drivers

In such circumstances, system needs a “rollback” feature, so the old hardware can be put back, as well as…
– previous settings safely stored where they can be easily retrieved
– previous settings restored as an option on boot-up

Page 43: COMP1321 Backup of data Richard Henson December 2015

Keeping Servers Cool!

Servers work hard (especially the disks…)

Can get hot
– will reduce MTBF (mean time between failures) of components

Need good ventilation at all times…

Page 44: COMP1321 Backup of data Richard Henson December 2015

Minimising Effects of Power Failure

Power failure can ruin hardware
– mains spikes can overheat components
– sudden lack of power will lose data currently being processed

Best to protect all hardware:
– bottom line: surge protector
– better: UPS (uninterruptible power supply)

Page 45: COMP1321 Backup of data Richard Henson December 2015

The UPS

Battery packs that can provide mains voltage after a power cut
– for a few minutes (cheap but effective)
– or half an hour (expensive, less down time)

NOS needs to make sure the UPS automatically cuts in when voltage drops sharply

Power continuation must include the backup domain controller, so synchronisation can occur
– procedure of “graceful degradation”
  » allows processing to go to completion
  » allows new system settings to be written

Page 46: COMP1321 Backup of data Richard Henson December 2015

The Fault Tolerant Network Operating System

A Fault Tolerant system needs to have good control of hardware, backup hardware and software

The NOS, and those who configure it, need to use fault tolerance effectively so an organisational network will
– keep going… (accessibility)
– do what is expected… (reliability, stability)

Page 47: COMP1321 Backup of data Richard Henson December 2015

Network Operating Systems and Fault Tolerance

Many features to make fault tolerance kick in automatically

However, fault tolerance is only restored once the faulty component has been replaced and its replacement configured to work as the new backup…

You’ll see how all this can be achieved in the practicals…