HIS 2015: Prof. Ian Phillips - Stronger than its weakest link


1

Stronger than its weakest link

High Integrity Software Conference (HIS'15), 5 Nov 15: Bristol.

Pdf & SlideCast @ http://ianp24.blogspot.com

Opinions expressed are my own...

Prof. Ian Phillips Principal Staff Engineer

ARM Ltd ian.phillips@arm.com

Visiting Prof. at ...

Contribution to Industry Award 2008

2v0

2

High Integrity Software !?

..Or..

"The scientific method assumes that a system with perfect integrity yields a singular extrapolation within its domain that one can test against observed results" (Wikipedia)

§ Is Software the weakest link in High Integrity Systems?
§ Such that improving it is all that's necessary to produce High Integrity Systems?

§ When we say 'Software', are we actually thinking 'Computation'?
§ But Computation is about results, not about implementation technologies!

3

We know what Proper Computing is...
§ HPC and Mainframe... maybe Workstation
§ But not really Laptop or... (Heaven forbid) a Pocketable?

4

Graham's Orrery - c1700

§ A machine to Compute the position of the planets
§ Single-Task, Continuous Time, Analogue, Mechanical, Computer (With backlash!)

George Graham. Clock-Maker (1674-1751)

5

Amsler’s Planimeter - c1856

Planimeter 2015 !

§ A Machine for Computing the Area of an arbitrary 2D shape
§ Technology: Precision Mechanics, Analogue
§ Available today... Electronically enhanced

Jakob Amsler-Laffon. Mathematician, physicist, engineer (1823-1912)
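A planimeter gets the area by tracing only the boundary of the shape (it mechanises Green's theorem). The digital counterpart, for a boundary sampled as a list of vertices, is the shoelace formula. A minimal sketch in C; the `Point` type and the name `polygon_area` are illustrative choices, not part of any planimeter's design.

```c
#include <stddef.h>

typedef struct { double x, y; } Point;

/* Area enclosed by a closed boundary traced through n vertices (shoelace
 * formula). Like a planimeter, it needs only the outline, not the interior.
 * A clockwise trace gives a negative signed area, so the magnitude is returned. */
double polygon_area(const Point *v, size_t n)
{
    double twice_area = 0.0;
    for (size_t i = 0; i < n; i++) {
        const Point *a = &v[i];
        const Point *b = &v[(i + 1) % n];   /* wrap back to the start point */
        twice_area += a->x * b->y - b->x * a->y;
    }
    return (twice_area < 0 ? -twice_area : twice_area) / 2.0;
}
```

A 4 x 3 rectangle traced as (0,0), (4,0), (4,3), (0,3) gives 12; the trace direction does not matter because the absolute value is taken.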

6

IN (x): Enumerated Phenomena

OUT (y): Processed Data/Information

y = F(x)

§ State (s) and Time (t) are implicit or explicit variables in this
§ And so are Accuracy (a), Reliability (r) and Cost ($)
§ All of which can be balanced (Architected) to meet End-Customer needs
§ Exceeding needs almost always 'costs' more!

...Technologies and Methodologies just offer 'star$' options over basic functionality ...Not all of which will be commercially valuable

Computing is solving a Model of a Subset of Reality ... Fast enough to be useful and affordable by its customer

y = F(x, s, t, a, r, $)

7

[Figure: Moore's Law (ITRS'99): Transistors/Chip (M) and Transistor/PM (K) plotted against Approximate Process Geometry (scale 10nm to 100um)]

http://en.wikipedia.org/wiki/Moore’s_law

Digital Electronics Changed the Computation Game ...

8

2012: Nvidia’s Tegra 3 Processor Unit (Around 1B transistors)

NB: The Tegra 3 is similar to the Apple A4

9

Computing Systems

§ The System is perceived at its Human Interface
§ Though the actual interface is (usually) relatively dumb
§ And its Compute Engine is almost always remote (and maybe shared)

10

The Invisible Face of Computing Today

Unrecognised but Vital... All need to be Dependable

11

The Visible Face of Computing Today

Essential but not Vital... But BIG-BIG-BIG $

12

§ Digital Electronics
§ Software
§ Memory
§ Optics
§ Analogue Electronic
§ Sensors/Transducers
§ Mechanics
§ Micro-Motors
§ Displays
§ Discharge Tube
§ Robotic Assembly
§ Plastic, Metal, Glass

Input: Image (Light) => Compute (Process Image) => Output: SD Card (Electrons)

...Many Technologies seamlessly cooperating, to Enhance Human Memory... Traditional silos (inc. SW and HW) are just a means to this end!

Electronic System (Cyber-physical System) - c2015

Incorporating DIGIC5+ (ARM)

System-Level Computation

‘Classic’ Computer

13

Computing for the Masses ... ... Technology Products are Increasingly ‘Intelligent’

[Figure: Millions of Units, 1970 to 2030, compared with Human Population, for successive platforms: Main Frame, Mini Computer, Personal Computer, Desktop Internet, Mobile Internet]

1st Era: Select work-tasks

2nd Era: Broad-based computing for specific tasks

3rd Era: Computing as part of our lives

Technology is the Driver

Consumer is the Driver

...Old Markets are still there; but don't drive the Technology today!

14

Typical 2015 Computing Platform ... ... is just 137.2 x 70.5 x 5.9 mm

15

Typical 2015 Computing Platform: Exynos 5422

Eight 32-bit CPUs (big.LITTLE):
• Four big (2.1GHz ARM A15) for heavy tasks;
• Four small (1.5GHz ARM A7) for lighter tasks.

+ Nine Mali GPU cores...

...A ~30 Core Heterogeneous Multi-Processor... In your Shirt Pocket!

One Board... 21 significant ‘Chips’
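The big.LITTLE split (heavy tasks to the big A15 cluster, lighter tasks to the LITTLE A7 cluster) is, at its simplest, a placement decision driven by recent load. The sketch below shows only that idea; the thresholds, the hysteresis band, and the names `Cluster` and `place_task` are illustrative assumptions, not how the Exynos 5422 or any operating-system scheduler actually decides.

```c
typedef enum { CLUSTER_LITTLE, CLUSTER_BIG } Cluster;

/* Hypothetical thresholds, as a percentage of a LITTLE core's capacity. */
#define UP_THRESHOLD   80u   /* move to big when load exceeds this */
#define DOWN_THRESHOLD 30u   /* move back to LITTLE only below this */

/* Decide where a task should run next, given its recent load (0-100%) and
 * where it runs now. The gap between the two thresholds is hysteresis: it
 * stops a task bouncing between clusters on every small change in load. */
Cluster place_task(unsigned load_pct, Cluster current)
{
    if (current == CLUSTER_LITTLE && load_pct > UP_THRESHOLD)
        return CLUSTER_BIG;
    if (current == CLUSTER_BIG && load_pct < DOWN_THRESHOLD)
        return CLUSTER_LITTLE;
    return current;
}
```

A task at 50% load stays wherever it already is; only a clear rise or fall moves it, trading a little efficiency for stability.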

16

2010: Apple’s A4 SIP Package (Cross-section)

IC Packaging Technology
§ The processor is the centre rectangle. The silver circles beneath it are solder balls.
§ Two rectangles above are RAM die, offset to make room for the wirebonds.
§ Putting the RAM close to the processor reduces latency, making RAM faster, and reduces power consumption... But increases cost.

§ Memory: Unknown
§ Processor: Samsung/Apple (ARM Processor)
§ Packaging: Unknown (SIP Technology)

Source ... http://www.ifixit.com

[Figure labels: Processor SOC Die; 2 Memory Dies; Glue; Memory ‘Package’; 4-Layer Platform ‘Package’]

Steve Jobs, WWDC 2010

17

2013: Samsung Solid-State Memory

§ Smart Memory (eMMC)
§ 16-128Gb in a single package
§ 8Gb/die. Stacked 2-16 die/package
§ Handles errors in the API (Smart Interface)
§ Package just 1.4mm thick! (11.5 x 13 x 1.4mm)... Smaller than a postage stamp
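The package capacities on this slide follow directly from the die density and the stack height:

```latex
C_{\text{package}} = n_{\text{die}} \times 8\ \text{Gb}, \qquad 2 \le n_{\text{die}} \le 16
\;\Longrightarrow\; 16\ \text{Gb} \le C_{\text{package}} \le 128\ \text{Gb}
```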

18

[Figure: Moore's Law (ITRS'99): Transistors/Chip (M) and Transistor/PM (K) against Approximate Process Geometry, annotated with the “Verification Gap” and design efforts of 100py, 1,800py and 8,500py]

Moore’s Law: Increasing Design Challenge...

http://en.wikipedia.org/wiki/Moore’s_law
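The design challenge follows from the arithmetic of doubling. A minimal sketch, assuming the commonly quoted doubling period of about two years (the slide does not state one) and counting whole doubling periods only; the function name is illustrative.

```c
#include <limits.h>

/* Transistors per chip after `years`, if the count doubles every
 * `period_years`. Whole periods only; saturates instead of overflowing.
 * A rough trend sketch, not a model of any real process roadmap. */
unsigned long long transistors_after(unsigned long long start,
                                     unsigned years, unsigned period_years)
{
    if (period_years == 0)
        return start;                       /* no period given: no growth */
    unsigned doublings = years / period_years;
    for (unsigned i = 0; i < doublings; i++) {
        if (start > ULLONG_MAX / 2)
            return ULLONG_MAX;              /* would overflow: saturate */
        start *= 2;
    }
    return start;
}
```

Starting from the Intel 4004's roughly 2,300 transistors (1971), `transistors_after(2300, 40, 2)` gives 2,411,724,800: about 2.4 billion after 20 doublings, the same order as the ~1B-transistor processors shown earlier. Designer productivity has not grown anything like 2^20-fold over the same period, which is the widening gap the chart annotates.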

19

§ They sell things that Their Customers desire and can afford
§ To satisfy the End-Customer's needs... In an End-Product which may be several ‘layers’ above them.

§ Focus on their Core Competencies as a Component Provider in a Global Market
§ Avoid Commoditisation by Differentiation

§ Improved Cost and Quality (by improving Process).. and..
§ Improved Business-Models (which make the Money).. and..
§ Improved Functionality (by new Technology and Methods)

§ But New Product Development is a Cost and a Risk to be Minimised
§ Technology (HW, SW, Mechanics, Optics, Graphene, etc) just enables Options!
§ New-Technology may cost more (including risk) than it delivers in Product Value!
§ Over-Design costs... Business can't afford the Precautionary Principle!

...Because successful End-Products fund their entire (RD&I) Value-Chains... Reuse of their Technologies becomes an economic necessity in other markets!

Computing Technologies in Business Context: Businesses have to be Competitive, Money-Making Machines today ...

20

Component and Sub-Systems from Global Enterprise ... ... Global Teams contributing Specialist Knowledge & Knowhow

§ Apple ID’d 159 Tier-1 Suppliers...
§ Thousands of Engineers Globally

§ Est. 10x Tier-2 Suppliers...
§ Including Virtual Components¹ and Sub-Systems (ARM and other IP Providers)

§ Multiple Technologies...
§ Hardware, Software, Optics, Mechanics, Acoustics, RF, Plastics, etc
§ Manufacturing, Test, Qualification, etc.
§ Methods, Tools, Training, etc

§ Tens of thousands of Engineers Globally... More than 90% of Technology and Methods are Reused (productivity)!

1: Virtual Components do not appear on the BOM

21

§ But the only way to economically realise this potential is by product evolution; reusing and reusing again the work of our technical predecessors...
§ Hardware, Software and other Technologies; Methods and Tools; and throughout the stack
§ In-Company: Sourced and Evolved from Predecessor Products
§ Ex-Company: Sourced from businesses with Specialist Knowledge/Experience
§ Reuse Improves Quality; as objects are designed more carefully, and bug-fixes are incremental
§ Reuse Improves Productivity; as objects can be deployed without understanding their implementation technology (or its limitations)... It delivers working systems quickly with finite teams; but the dependability cannot be quantified!

...Despite this, Commercial Technologies will be used in Systems on which people Depend
§ The cost of alternatives will be several orders of magnitude too great
§ The issue is (just) making dependable systems using undependable components

Designer Productivity has become the Limiting Factor. The Customer Expectation of the Billions of available Transistors is irresistible!

22

ARM: Delivers Reuse-Based Productivity ...

....24 Processors in 6 Families for different Application Domains

[Figure: the processor range, from about 50MTr down to about 50KTr]

23

... Tools to create optimal Heterogeneous Multi-Processors ...

[Figure: example SoC built around the CoreLink™ CCN-504 Cache Coherent Network: four Quad Cortex-A15 clusters with L2 caches; ACE interfaces; Snoop Filter; 8-16MB L3 cache; two CoreLink™ DMC-520 memory controllers with x72 DDR4-3200 PHYs; NIC-400 Network Interconnect to Flash, GPIO, USB and AHB peripherals; Interrupt Control; IO Virtualisation with System MMU; DSPs; DPI and Crypto; PCIe, 10-40GbE and SATA]

§ Dual channel DDR3/4 x72
§ Up to 4 cores per cluster
§ Up to 4 coherent clusters
§ Integrated L3 cache
§ Up to 18 AMBA interfaces for I/O coherent accelerators and IO
§ Peripheral address space
§ Heterogeneous processors – CPU, GPU, DSP and accelerators
§ Virtualized Interrupts
§ Uniform System memory

24

… Other Tools, Libraries and Partners to Realize the Potential

§ Technology to build Electronic System solutions:
§ Software, Drivers, OS-Ports, Tools, Utilities to create efficient systems with optimized software solutions
§ Diverse Physical Components, including CPU and GPU processors designed for specific tasks
§ Interconnect System IP delivering coherency and the quality of service required for lowest memory bandwidth
§ Optimised Cell-Libraries for highly optimized SoC implementations

§ Well Connected to Partners in the Life-Cycle:
§ For complementary tools and methods required by System Developers

§ Global Technology, Global Partners:
§ >900 Licences; Millions of Developers

25

Are the Outcomes of this 'chain' Dependable? Evidently so: They are Functional and Dependable enough to satisfy Billions/yr!

(2Q2015)

Smart-Phone shipments 2Q15 - 185 million (~0.75B/yr)

...The probability of a 'fairly reliable' system failing, when you need to use it for an 'improbable' event, is 'highly improbable'... And mostly this is enough
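The 'highly improbable' outcome is a product of two probabilities. Assuming the system's failure and the demanding event are independent (an assumption that common-cause failures can break), the chance that the system is down exactly when it is needed is their product. The function name and the numbers below are illustrative, not measured data.

```c
/* Probability that an independent failure coincides with a rare demand.
 * p_unavailable: fraction of time the system is not working (0..1)
 * p_event:       probability the improbable event occurs (0..1)
 * Valid only if the two are independent of each other. */
double p_fail_on_demand(double p_unavailable, double p_event)
{
    return p_unavailable * p_event;
}
```

For example, a phone that is unusable 1% of the time and an emergency with a 0.1% chance of occurring give `p_fail_on_demand(0.01, 0.001)`, about 1e-5: 'fairly reliable' times 'improbable' is 'highly improbable'.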

26

Design is Transforming a Model of Behaviour ... ... evolving a Mathematical Model to meet Non-Functional Constraints

§ Create Functional-Model¹ on a 'Generic' Platform (functions F1-F5)
§ Evolving the Model (& Platform) until Functional and Non-Functional Performance is Adequate.
§ Transform to a Functional-Model on an 'Optimal' (HW/SW) Platform

[Figure: the 'Optimal' Platform: functions F1-F5 mapped onto Threads, RTOS/Drivers, a Hardware Interface, hardware blocks HW1-HW4, Bus(es) and Processor(s)]

NOTE: 'Final SW' is still a Model of Behaviour!

1: This includes a Model of Execution such as a Java VM.
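One common way to check that the 'Optimal' implementation still matches the Functional-Model is to keep the generic model as an executable reference and compare the two. In the sketch below, bit counting is only a stand-in function (it is not from the slide): a plain loop plays the 'generic' model, a bit-parallel version plays the transformed one, and the comparison is exhaustive over a small input range. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic functional model: obviously correct, not fast. */
unsigned popcount_model(uint32_t x)
{
    unsigned n = 0;
    for (; x != 0; x >>= 1)
        n += x & 1u;
    return n;
}

/* Transformed implementation: bit-parallel counting, fewer operations. */
unsigned popcount_fast(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                  /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                    /* add the four bytes */
}

/* Compare the two on every input in [0, limit]. Exhaustive only for small
 * ranges; for a realistic state space this can only ever be a sample. */
bool models_agree(uint32_t limit)
{
    for (uint32_t x = 0; ; x++) {
        if (popcount_model(x) != popcount_fast(x))
            return false;
        if (x == limit)
            return true;
    }
}
```

`models_agree(0xFFFFu)` checks all 65,536 16-bit inputs; the full 32-bit space would already take far longer, which is why equivalence for real designs rests on sampling or formal methods.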

27

§ All models are a simplification of reality; therefore they all have limitations
§ "All models are wrong, but some are useful" (G.E. Box)

§ Normal Software Design Methods are create-it-wrong, test-it-right...
§ Quality is established by Test; and bug-fixes/patches in the field (An inherently poor method)
§ Software Reuse offers hugely improved Productivity (Not using it is not an option)
§ Software Reuse offers improved Quality (But over what?)

§ Examination shows that all code has high residual errors...
§ Well structured and tested Source-Code has ~5 errors per 1,000 lines of code (E-KLOC)
§ Commercial code is typically ~5x worse than this
§ Most errors are harmless – But there is no useful correlation

§ Formal-Methods are better; but cost is high if you can't utilise (normal) legacy code.
§ But even 'Perfect-Software' still has to execute on an Imperfect-Platform

..."YES!": But Good-Enough satisfies the Commercial Imperative for most applications

Is Software (Logic) Inherently Undependable? Software is a Model of Reality, executing on a Hardware and Software Platform
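The residual-error rates translate directly into expected counts. A sketch of the arithmetic using the slide's approximate rates (~5 errors per KLOC for well-structured, tested code; ~5x that for typical commercial code); the function name and the 1-million-line example are illustrative.

```c
/* Expected residual errors in a code base.
 * lines:           source lines of code
 * errors_per_kloc: residual error density (errors per 1,000 lines) */
double expected_residual_errors(double lines, double errors_per_kloc)
{
    return lines / 1000.0 * errors_per_kloc;
}

/* Approximate rates quoted on the slide. */
#define WELL_TESTED_E_KLOC 5.0
#define COMMERCIAL_E_KLOC  (5.0 * WELL_TESTED_E_KLOC)
```

A 1-million-line software stack therefore carries on the order of 5,000 residual errors at the better rate and about 25,000 at the commercial rate. Most are harmless, but nothing in the count says which ones are not.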

28

Open Source is Dependable? "Somebody will see the bugs!" (But only if they look!)

1: http://www.wired.com/2014/04/heartbleedslesson/ 2: http://veridicalsystems.com/blog/of-money-responsibility-and-pride/

“It is now very clear that OpenSSL development could benefit from dedicated full-time, properly funded developers”

“OSF typically receives only $2,000 a year in donations”

§ OpenSSL HeartBleed bug (2014)¹
§ Update was received just before a Public Holiday
§ Editor was a known and high-quality source
§ Code was reviewed informally and released

§ Editor was conflicted with day-job, family and holiday pressure²
§ Too few resources to do a proper job.

§ This was a classic E-KLOC error...
§ Not a Coding, Formatting, or Functional error
§ It was a System error (an omission in a non-functional aspect of the code).

...Was the ‘fault’ with the software Source (OpenSSL Software Foundation (OSF))? ...Or a User Community too ready to believe in the Myth of Open Source software?
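Heartbleed (CVE-2014-0160) was a missing bounds check: the TLS heartbeat handler trusted the payload length claimed in the request and echoed that many bytes back, reading past the real message into adjacent memory. Below is a simplified, hypothetical handler with the check in place; it is not OpenSSL's code, and the function and constant names are invented. The message layout follows the RFC 6520 heartbeat message (1-byte type, 2-byte length, payload, at least 16 bytes of padding).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HB_HEADER_LEN  3u    /* 1-byte type + 2-byte payload length */
#define HB_MIN_PADDING 16u   /* minimum padding after the payload */

/* Copy the payload to be echoed into out (capacity out_cap).
 * Returns the payload length, or -1 if the message must be discarded. */
long hb_echo_payload(const uint8_t *msg, size_t msg_len,
                     uint8_t *out, size_t out_cap)
{
    if (msg_len < HB_HEADER_LEN + HB_MIN_PADDING)
        return -1;                                 /* too short to be valid */
    size_t claimed = ((size_t)msg[1] << 8) | msg[2];
    /* The omitted check: the claimed length must fit in what was received. */
    if (HB_HEADER_LEN + claimed + HB_MIN_PADDING > msg_len)
        return -1;
    if (claimed > out_cap)
        return -1;
    memcpy(out, msg + HB_HEADER_LEN, claimed);     /* bounds proven above */
    return (long)claimed;
}
```

A request that claims a 4-byte payload and really carries one is echoed; one claiming 16,384 bytes inside a 23-byte message is silently dropped, as RFC 6520 requires for malformed heartbeats. Functionally both paths "work", which is why the omission survived review.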

29

§ Boolean Mathematics (HDL) is Dependable; but implementation depends on reliably mapping its equations to the physical world through Logic-Gates
§ A Gate is a Saturated Analogue circuit; with Non-Functional attributes.

§ CMOS has been a 'reliable' Boolean mapping for 30 years, but...
§ Today’s 20nm transistors (14nm soon) have larger variability, and there are many more on a chip (Typically 1B in 2014)
§ At 70degC, Vtn = 130mV (sigma ~25mV): around 1 in 5 million transistors have Vt < 0 (Can’t be turned off)
§ So that’s >100 transistors/chip that don’t switch off
§ And there's another >100 that only turn on weakly (low drive/slow)
§ This is intrinsic (atomic), so they will always be randomly located!

..."NO!": Today’s chips shouldn’t work! (So why do they?)

So is Hardware (Logic) Dependable? 1/3

[Figure: CMOS NAND gate schematic: inputs A and B, supply +V, output OUT]

30

Mitigating this we have...
§ Weak Transistors: Not all...
§ Are at 70degC even if the die is (But some will be higher)
§ Are Minimum Size (Larger ‘area’ reduces variability)
§ Are on Critical Paths; and the probability of there being more than one on a path is low!

§ CMOS Logic: Is very robust and will continue to function with out-of-spec transistors
§ Leaky Gates and Faster Transitions are seldom functional failures (but they do hit reliability!)
§ Speed variations on a path average out (on average!)
§ Errors are frequently difficult to detect (and thus correct!)

§ Memory: Analogue Circuits are much more sensitive to transistor variation. But...
§ Failures are easier to detect (and work around)
§ Spare rows/columns are included to fix manufacturing (static) defects... but not dynamic (use)
§ NV-M limited write-cycles and bit failures are shielded by their smart API... to some degree.

...Hardware failure is not always easily spotted at the functional level!

So is Hardware (Logic) Dependable? 2/3
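Spare-row repair is essentially a small remapping table written once at manufacturing test. The sketch below models it in software; `RowRepair`, `NUM_SPARES` and the function names are hypothetical, and real memories implement this in hardware (typically fuse-programmed). Because the table is fixed after test, it covers static defects but nothing that fails later in use.

```c
#include <stdbool.h>

#define NUM_SPARES 4   /* hypothetical number of spare rows */

typedef struct {
    int bad_row[NUM_SPARES];   /* logical row replaced by each spare; -1 = unused */
} RowRepair;

void repair_init(RowRepair *r)
{
    for (int i = 0; i < NUM_SPARES; i++)
        r->bad_row[i] = -1;
}

/* Record a row found defective at test. Returns false when no spare is
 * left, i.e. the part cannot be repaired and fails test. */
bool repair_add(RowRepair *r, int row)
{
    for (int i = 0; i < NUM_SPARES; i++) {
        if (r->bad_row[i] == row)
            return true;                   /* already repaired */
        if (r->bad_row[i] < 0) {
            r->bad_row[i] = row;           /* allocate the next free spare */
            return true;
        }
    }
    return false;
}

/* Translate a logical row to a physical row; spares follow the main array. */
int repair_map(const RowRepair *r, int row, int num_main_rows)
{
    for (int i = 0; i < NUM_SPARES; i++)
        if (r->bad_row[i] == row)
            return num_main_rows + i;
    return row;
}
```

A row that starts failing in the field is not in the table, so `repair_map` keeps returning the faulty physical row: the repair is invisible above the memory interface, and so is its absence.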

31

§ And we haven't included imponderables...
§ Internally and Externally generated noise? (Greater susceptibility at lower voltages)
§ High-energy particles? (Greater susceptibility at smaller geometries)
§ Wear-out: Vt/Gain drift and Electro-Migration? (Greater susceptibility at smaller geometries)
§ Local Hot-Spots? (140C is not uncommon on chip)
§ Limitations of Verification and Test (State-Space exploration is always a sub-set)

§ We are repeatedly multiplying tiny-improbables by ever larger numbers...
§ And many of the values are only guesses!
§ We have no real idea about the reliability/dependability of modern Systems or Components

§ But we know that as process geometries shrink, Susceptibility will get worse...
§ Chips will get ever more complex (and more chips will be used in more complex Systems)
§ Transistors will get smaller and Designers will erode safety margins to get performance

...Despite this, Chips and Systems do Yield more than we would rightly expect... ...So we must be utilising Unknown Safety Margins!

So is Hardware (Logic) Dependable? 3/3
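Why multiplying tiny improbables by large numbers matters: if each of N components fails independently with a small probability p, the chance that at least one fails is

```latex
P_{\text{any}} = 1 - (1 - p)^{N} \;\approx\; 1 - e^{-Np} \;\approx\; Np
\qquad (p \ll 1,\ Np \ll 1)
```

With illustrative (not measured) values p = 10^-10 and N = 10^9, Np = 0.1 and 1 - e^-0.1 is about 0.095: a per-device risk that looks negligible becomes roughly a one-in-ten chance per chip. Since p is often only a guess, the result inherits that uncertainty, and the independence assumption itself may not hold.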

32

Killing a Sacred Cow: SW and HW Logic are the Same ... They have different characteristics, so choice is a System Architectural decision!

// A master-slave type D-Flip Flop
module flop (data, clock, clear, q, qb);
  input data, clock, clear;
  output q, qb;
  // primitive #delay instance-name
  // (output, input1, input2, .....),
  nand #10 nd1 (a, data, clock, clear),
           nd2 (b, ndata, clock),
           nd4 (d, c, b, clear),
           nd5 (e, c, nclock),
           nd6 (f, d, nclock),
           nd8 (qb, q, f, clear);
  nand #9  nd3 (c, a, d),
           nd7 (q, e, qb);
  not  #10 inv1 (ndata, data),
           inv2 (nclock, clock);
endmodule

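'Hardware' Language (Verilog), above; 'Software' Language (C), below:

#include <stdio.h>
#include <time.h>
/* Use the PC's timer to check */
/* processing time */
int main(void)
{
    clock_t start, deltime;
    long junk;
    volatile long i;    /* volatile: stops the compiler removing the empty loop */
    float secs;
LOOP:
    printf("input loop count: ");
    if (scanf("%ld", &junk) != 1)    /* stop on end of input or bad input */
        return 0;
    start = clock();
    for (i = 0; i < junk; i++)
        ;                            /* empty body: the loop itself is being timed */
    deltime = clock() - start;
    secs = (float) deltime / CLOCKS_PER_SEC;
    printf("for %ld loops, #tics = %ld, secs = %f\n", junk, (long) deltime, secs);
    goto LOOP;
}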

[Figure: the two flows side by side. Target Platform: CMOS (HW) / CPU (SW); Target Architecture Info; Compilers: HW / SW; Configuration Files: HW / SW]

33

§ By the time you are writing Applications you are hugely dependent on the layered-accuracy of other people's work beneath

...Both Hardware and Software

So whilst Boolean Mathematics is Absolute ... ... all implementations of it are not

A Software View

A Hardware View

34

§ We Can’t Design them Right
§ HW is SW; and Coding errors remain. State-space too big for simulation exploration. Can’t model or explore whole Systems and they are too complex for Formal methods. Reuse embodies unknown bugs.

§ We Can’t Make them Right
§ Chips are subject to Process Imperfections and Variability. Chips and Systems are subject to Verification and Test Escapes. Boolean math is absolute; logic cells and real layouts are not.

§ We Can’t Keep them Right
§ Chips are susceptible to Supply Transients, Wear-Out and High-Energy particles. Most damage is not immediately obvious.

...And it will all get worse as process geometries shrink

...Yet every year we make Billions of Systems that work! "The Naysayers are just Harbingers of Doom!"

So Complex Electronic Systems are Impossible!

35

§ System-Level Dependability is what matters...
§ Component and Sub-System dependability is inherently poor (and will get worse).

§ Productivity demands that Dependable Systems must Reuse Components and Sub-Systems (Physical and Virtual); and the affordable ones are of Commercial quality!
§ Clean-Sheet design is not an option for almost all complex products! ...the cost-is-no-object customer is an endangered species

§ Increasing the Dependability of Components and Sub-Systems helps; but can never be enough

§ ARM product is really: 'Enhanced Reuse for Electronic System Design and Manufacture'

...The Only Place to implement System-Level Dependability on an Undependable Platform is at the System-Layer!

§ Reliable components and sub-systems will help, but cannot ever be enough
§ Predominantly a 'Software' challenge; but not alone (Don't forget the simple Watch-Dog)

Dependable on Undependable: Any Methods that are based on perfection in HW or SW are untenable ...
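The 'simple Watch-Dog' is the classic system-level safety net: an independent timer that the software must keep re-arming ('kicking'); if it stops, because it has hung or has failed its own checks, the timer expires and forces corrective action. Below is a software model of the idea with hypothetical names (`Watchdog`, `wd_kick`, `wd_tick`); a real watchdog is an independent hardware timer, which is exactly what lets it act when the software cannot.

```c
#include <stdbool.h>

typedef struct {
    unsigned timeout_ticks;   /* ticks allowed between kicks */
    unsigned remaining;       /* ticks left before the watchdog fires */
} Watchdog;

void wd_init(Watchdog *w, unsigned timeout_ticks)
{
    w->timeout_ticks = timeout_ticks;
    w->remaining = timeout_ticks;
}

/* Called by the supervised software, ideally only when its own checks pass. */
void wd_kick(Watchdog *w)
{
    w->remaining = w->timeout_ticks;
}

/* Called from an independent timer tick. Returns true when the watchdog
 * fires (the system failed to show it is alive in time) and re-arms it. */
bool wd_tick(Watchdog *w)
{
    if (w->remaining > 0)
        w->remaining--;
    if (w->remaining == 0) {
        w->remaining = w->timeout_ticks;
        return true;          /* caller takes corrective action, e.g. reset */
    }
    return false;
}
```

The design choice that matters is where the kick comes from: kicking from a high-level health check, rather than from a low-level timer interrupt, is what turns the watchdog into a system-level check rather than a check that the CPU is merely running.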

36

The Real Conclusions

§ Systems are what End-Customers buy; they expect them to be Dependable Enough
§ A subjective concept; which is Application, State and Context dependent (& Technology independent)

§ Commercial Components (HW/SW) will be the building blocks of Dependable Systems
§ Commercial use gives us the Technologies which we are economically bound to use today
§ Though they work better than we would rightly expect, we cannot quantify their quality
§ Improving their Quality/Reliability/Dependability helps; but 100% is an asymptotic goal!

§ The System Knows what the System Wants
§ So: System behaviour and robustness must be handled at the System-Level (Top-Level); only it can know the expected action and appropriate corrective action for its domain.
§ And: Because of the size of the Functional and Non-Functional Space, conformance cannot be measured; so it will require a Policy Based approach.

...Meanwhile systems that people depend on will be produced ...The Commercial Imperative can’t/won't wait for the 'right methodology'

37

The END Is Very Nigh ...

Pdf & SlideCast through http://ianp24.blogspot.com