Hawkes Intel Microcode

    Notes on Intel Microcode Updates

    Ben Hawkes
    December, 2012 - March, 2013

    Introduction

    All modern CPU vendors have a history of design and implementation defects, ranging from relatively benign stability issues to potential security vulnerabilities. The latest CPU errata release for second generation Intel Core processors describes a total of 120 errata, or hardware bugs. Although most of these errata are listed as "No Fix", Intel has supported the ability to apply stability and security updates to the CPU in the form of microcode updates for well over a decade*.

    Unfortunately, the microcode update format is undocumented. Researchers are currently prevented from gaining any sort of detailed understanding of the microcode format, which means that it is impossible to study the updates to clearly establish whether any security issues are being fixed by microcode patches. The following document is a summary of notes I gathered while investigating the Intel microcode update mechanism.

    * The earliest Intel microcode release appears to be from January 29, 2000. Since that date, a further 29 distinct microcode DAT files have been released.

    Acknowledgements

    The initial idea to study Intel's microcode update mechanism was inspired directly by Tavis Ormandy's exploratory work on this subject in 2011. Furthermore, I'd like to thank Emilia Käsper, Tavis Ormandy, Gynvael Coldwind and Thomas Dullien for their outstanding technical assistance and encouragement.

    How does the microcode update mechanism work?

    Microcode updates are applied to a CPU by writing the virtual address of the Intel-supplied undocumented binary blob to a model specific register (MSR) called IA32_UCODE_WRITE. This is a privileged operation that is normally performed by the system BIOS at boot time, but modern operating system kernels also include support for applying microcode updates.

    The BIOS (or operating system) should verify that the supplied update correctly matches the running hardware before attempting the WRMSR operation. In order to do so, each microcode update comes packaged with a short header containing various update metadata. The header is documented by Intel in Volume 3 of the Developer's Manual. It contains three pieces of information required for validation: the microcode revision, processor signature, and processor flags.

    The microcode revision is an incremental version number: you can only successfully apply an update if the current microcode revision is less than the revision supplied. The BIOS will typically extract the current microcode revision by issuing a RDMSR on the IA32_UCODE_REV MSR and then compare this value against the revision contained in the new microcode update's header.

    The processor signature is a unique identifier for the hardware model that the microcode will apply to. The signature of the running hardware can be retrieved using the CPUID instruction, and then compared against the value supplied in the microcode header. According to Intel, "each microcode update is designed specifically for a given extended family, extended model, type, family, model, and stepping of the processor." The processor flags field is similar. Intel says: "the BIOS uses the processor flags field in conjunction with the platform Id bits in MSR (17H) to determine whether or not an update is appropriate to load on a processor."

    Once a microcode update has been applied using IA32_UCODE_WRITE, the BIOS will typically issue a CPUID instruction and then read the IA32_UCODE_REV MSR again. If the revision number has increased, the update was applied successfully.
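
    The pre-load checks described in this section can be sketched in a few lines of Python. This is only an illustration, not BIOS or kernel code. The 48-byte field layout follows the documented header in the Intel manual; the names UpdateHeader, parse_header and update_is_applicable are invented for this example, and the sketch assumes that bit N of the processor flags field corresponds to platform ID N. Extended signature tables and the header checksum are deliberately ignored.

```python
import struct
from typing import NamedTuple

# Documented header: nine little-endian 32-bit fields followed by
# 12 reserved bytes (48 bytes in total).
HEADER_FORMAT = "<9I12x"


class UpdateHeader(NamedTuple):
    header_version: int
    revision: int
    date: int
    signature: int
    checksum: int
    loader_revision: int
    processor_flags: int
    data_size: int
    total_size: int


def parse_header(update: bytes) -> UpdateHeader:
    """Decode the documented header at the start of a microcode update."""
    return UpdateHeader(*struct.unpack_from(HEADER_FORMAT, update, 0))


def update_is_applicable(hdr: UpdateHeader, cpu_signature: int,
                         platform_id: int, current_revision: int) -> bool:
    """Apply the three checks: signature, processor flags, revision."""
    if hdr.signature != cpu_signature:
        return False
    # Assumption: processor_flags is a bit mask indexed by platform ID.
    if not hdr.processor_flags & (1 << platform_id):
        return False
    # The new revision must be strictly greater than the running one.
    return hdr.revision > current_revision
```

    In a real loader, cpu_signature would come from CPUID, platform_id from MSR 17H, and current_revision from IA32_UCODE_REV; here they are plain integers so the logic can be exercised without privileged access.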

    Observation #1: What does a microcode update look like?

    Since 2008 Intel has regularly released DAT files containing the most up-to-date microcode revisions for each processor. Prior to this, microcode update data was shipped as part of the open source tool microcode_ctl. An archive of all microcode DAT releases can be found here.

    So what does the undocumented blob portion of the microcode update look like? It appears that there are at least two different formats for the undocumented blob: the old format, used up until the Pentium 4 and certain early models of the Intel Core 2, and the new format, used from that point onwards. This article covers the new style format only.

    The following graphic shows a microcode update for an Intel Core i5 M460 (i.e. with the documented microcode header stripped):

    It is immediately clear that there is a plaintext structure (96 bytes in length) at the start of the undocumented blob. Some easily identifiable fields are colorized:

    - Microcode revision number.
    - Release date (note that this date is sometimes one day prior to the microcode header date).
    - Real length of microcode update (counted in 4-byte words).
    - Processor signature.

    And some less easily identifiable fields that appear to be in common usage are marked in grey:

    - Possible flags field? May not be in use in recent hardware types.
    - Possible loader version?
    - Possible length field (when non-zero)? Not consistently used.

    Observation #2: Is there any structure in the microcode update after the 96-byte header?

    Most of the data located after the 96-byte header appears to be random and without structure. However, performing a longest common substring analysis on an archive of every unique microcode update (available in binary format here) showed that different revisions for the same (or similar) processor signatures will share some common byte strings:

    In this figure, two distinct strings have been identified:

    - In green, a 2048-bit string that is constant between microcode revisions.
    - In red, a 32-bit string that is constant for all microcode updates using the new style format.

    In total, 12 unique 2048-bit strings were found to be shared across 24 processor signatures. The extracted data is available here (in the format).
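
    A longest common substring search of this kind can be approximated with Python's standard difflib module. The sketch below is not the tooling used for the analysis above; longest_common_run is a name chosen for this example, and the approach is slow on large inputs, so a real archive-wide comparison would likely need a suffix-based structure instead.

```python
from difflib import SequenceMatcher


def longest_common_run(a: bytes, b: bytes) -> bytes:
    """Return the longest byte string that appears in both a and b."""
    # autojunk=False stops difflib from discarding frequently
    # occurring byte values, which matters for binary data.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]
```

    Running this over pairs of update blobs for the same processor signature, and keeping only runs of 256 bytes or more, would be one way to surface candidate shared strings.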

    Note that 2048 bits is a commonly used length for an RSA modulus, and that 0x00000011 (decimal 17) is a commonly used value for an RSA exponent. This suggests that these common strings may be an RSA public key. Further evidence to support this claim is that:

    - Each of the values is strictly 2048 bits in length, i.e. the most significant bit is always set.
    - None of the values are trivially factorable by 2, i.e. the values are all odd.
    - None of the values are factorable by any value between 2 and 2^32.
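
    The three checks above are straightforward to express in code. The following is a minimal sketch with invented names (small_primes, looks_like_rsa_modulus). To keep it fast, it only trial-divides by primes below 2^16 by default, whereas the analysis above went up to 2^32.

```python
def small_primes(limit: int) -> list:
    """Sieve of Eratosthenes: all primes up to and including limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def looks_like_rsa_modulus(n: int, bits: int = 2048,
                           trial_limit: int = 1 << 16) -> bool:
    """Heuristic test: exact bit length, odd, and no small prime factor."""
    if n.bit_length() != bits:      # most significant bit must be set
        return False
    if n % 2 == 0:                  # trivially factorable by 2
        return False
    return all(n % p for p in small_primes(trial_limit))
```

    Passing these checks does not prove that a value is an RSA modulus; it only shows that the value is not obviously something else.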

    Observation #3: Can the length of the microcode update be verified?

    The length field of the 96-byte microcode header (shaded in green in fig 1) can be verified using a fault injection analysis. The idea is to sequentially mutate each byte of a valid microcode update, attempt to apply the update, and record whether the update was applied successfully or not.

    The underlying assumption here is that the CPU should validate the integrity of the microcode update, but may not validate the integrity of padding (since the size of a microcode update must be a multiple of 1024 bytes, it is assumed that padding is normally required).

    Testing on an Intel Core i5 M460 (sig 0x20655, pf 0x800), the expected length of the microcode update (in revision 3) is 1668 bytes (0x1a1 * 4). Sequentially flipping a bit in each byte from offset 0 to 2000 and waiting for the first successfully applied update gives the following results:

    This result was observed on Intel Core 2 Duo P9500, Intel Core i5 M460 and Intel Core i5 2520M chips. For all other experiments below, results were reproduced on Intel Core i5 M460, Intel Core i5 2520M, and Intel Xeon W3690 chips.
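
    The fault injection loop itself is simple; the hard part is the privileged step of actually applying each mutated update. The sketch below abstracts that step behind a hypothetical try_apply callback, which on real hardware would have to write the blob address to the update MSR and compare the revision before and after. The function name first_unchecked_offset is invented for this example.

```python
from typing import Callable, Optional


def first_unchecked_offset(update: bytes,
                           try_apply: Callable[[bytes], bool],
                           start: int = 0,
                           end: Optional[int] = None) -> Optional[int]:
    """Flip one bit per byte and return the first offset at which the
    mutated update is still accepted, or None if every mutation fails."""
    end = len(update) if end is None else min(end, len(update))
    for offset in range(start, end):
        mutated = bytearray(update)
        mutated[offset] ^= 0x01          # flip the lowest bit of this byte
        if try_apply(bytes(mutated)):    # hypothetical privileged step
            return offset
    return None
```

    With a simulated loader that only validates the first 0x1a1 * 4 = 1668 bytes of its input, this function returns 1668, the first offset that falls into unvalidated padding.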

    Observation #4: How many cycles does an update take to be applied successfully?

    To collect the average number of cycles the CPU took to successfully apply a microcode update, a specialized system was set up that would:

    1. Boot the system with an initial microcode revision.
    2. Install a Linux kernel module that:
       a. Invalidates caches (wbinvd)
       b. Stops instruction prefetch (sync_core)
       c. Disables interrupts for the running core (local_irq_disable)
       d. Records the timestamp counter (rdtsc)
       e. Applies the next microcode update revision (wrmsr MSR_IA32_UCODE_WRITE)
       f. Records the timestamp counter
    3. Record the rdtsc delta in syslog.
    4. Reboot.

    The cache invalidation and interrupt disable were intended to reduce variance in the timing delta. Rebooting is required to reset the system to the original microcode revision, as successfully applied revisions must be strictly incremental.

    The exact cycle value will vary significantly between different types of hardware (older hardware was observed to take significantly more cycles); however, a baseline value can be used in further timing analysis on the same hardware. For example, the baseline average time delta across 2000 applications of microcode revision 3 for an Intel Core i5 M460 is:

    Average: 488953 cycles
    Sample standard deviation: 12270 cycles

    The high variation in the sample deltas collected is presumed to be caused by multicore systems. If the microcode update mechanism has to achieve a consistent state across all available instruction pipelines (including consistency across hyperthreads, prefetched instructions, instruction caches on all cores), this could result in a high level of variance, as the collection mechanism used here only cleans internal state for the running core.
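
    For reference, the two baseline statistics quoted above can be computed from a list of logged rdtsc deltas as follows. This is a small sketch using the Python standard library; summarize_deltas is a name chosen for this example, and the input is assumed to be the per-boot cycle deltas collected from syslog.

```python
import statistics
from typing import Sequence, Tuple


def summarize_deltas(deltas: Sequence[int]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of cycle deltas."""
    # statistics.stdev uses the n - 1 denominator, i.e. the sample
    # standard deviation rather than the population one.
    return statistics.mean(deltas), statistics.stdev(deltas)
```

    The sample form is the appropriate one here because the 2000 measurements are a sample of the update's timing behavior, not the complete population.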

    Observation #5: Does the number of cycles change depending on the location of a fault?

    Using the baseline timing delta above, it is possible to find deviations by flipping every possible bit position in the microcode update and attempting to apply the malformed update. All of these update attempts will fail, but the idea is that certain fields may be treated differently by the microcode update mechanism, and that this may show up in the cycle delta.

    Running this test on an Intel Core i5 M460 gives the following results:

    This chart shows the results of the first 1000 bit positions being flipped. Three distinct areas of interest can be seen. All other bit positions above 1000 return a cycle count matching the failure case seen above.

    The first area of interest, between bit offsets 32 and 63, corresponds to an unknown word in the 96-byte header that always has the value 0x000000a1. This may serve as a magic value, checked when the microcode is first loaded to ensure that an expected format has been received.

    The second area of interest is a single bit at offset 64, which appears to correspond to a flags field. In the original analysis, this bit was set. However, clearing the bit and repeating the analysis shows identical results to figure 4, except with a significantly lower average count of cycles for the normal failure case. The decrease in cycle count appears to be proportional to the number of physical cores on the system, which may suggest this bit is used to decide whether the update will be iteratively applied to all cores, or only applied on a single core.

    The third area of interest, between bit offsets 233 and 253, corresponds to the microcode size field.
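
    To relate these bit offsets to header fields, it helps to convert each one into a 32-bit word index and byte offset. The helper below is a small sketch with an invented name (locate_bit). It assumes that bit offsets are counted sequentially from the start of the undocumented blob in 32-bit words; the bit order within each word is not stated in the text and is left as-is.

```python
from typing import Tuple


def locate_bit(bit_offset: int) -> Tuple[int, int, int]:
    """Map a bit offset to (word index, byte offset of word, bit in word)."""
    word_index, bit_in_word = divmod(bit_offset, 32)
    return word_index, word_index * 4, bit_in_word
```

    Under that assumption, offsets 32 to 63 cover the whole of word 1 (byte offset 4), offset 64 is the first bit of word 2 (byte offset 8), and offsets 233 to 253 all fall inside word 7 (byte offset 28).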

    Observation #6: What happens if the microcode size field is modified?

    Modifying each bit position results in an incrementally higher cycle count. To investigate this further, a second analysis was run that records the cycle count for each size value between 0 and 10000. The following shows the results of this analysis on an Intel Core i5 2520M:

    In this chart we can see a clear correlation between an increasing size value and an increasing cycle count. This chart also appears to show artifacts from running this analysis on a multicore system (note that the i5 2520M presents four logical processors, and that four main trend lines can be seen).

    Running the size modification analysis with an incorrect magic value (i.e. replacing 0x000000a1 with a different value) results in a flat chart with no correlation between value and cycle count. This suggests that the magic value is checked prior to the size value being used.

    Due to the high level of noise while running this analysis on a multicore system, the analysis was rerun with symmetric multiprocessing (SMP) and Hyper-Threading disabled. A clear linear correlation between length value and cycle count is seen. The following data is taken from an Intel Core i5 2520M:

    With this cleaner data, it is possible to observe new timing behavior. By displaying a smaller sample, clear timing shelves are seen as the size value increases:

    By observing the individual points of the timing shelves, it is clear that each timing shelf has 16 points. Since each single increase in size value corresponds to a 4-byte increase in microcode data, 16 points represent 64 bytes, or 512 bits, of data.

    512 bits is the standard message block size for popular cryptographic hash functions such as MD5, SHA-1 and SHA-2. The timing shelves observed match what we would expect from a Merkle-Damgård hash function, as each new shelf represents the increased number of cycles required to process a new message block.
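
    The shelf pattern can be reproduced with a toy cost model in which the loader spends a fixed number of cycles per 64-byte hash block. The sketch below is purely illustrative: the base and per-block cycle counts are made-up constants, the function names are invented, and real measurements would need a noise tolerance rather than exact equality when grouping points into shelves.

```python
from typing import List, Sequence


def simulated_cycles(size_words: int, base: int = 1000,
                     per_block: int = 500, block_bytes: int = 64) -> int:
    """Toy model: cost grows by per_block for every started hash block."""
    size_bytes = size_words * 4
    blocks = (size_bytes + block_bytes - 1) // block_bytes   # ceiling
    return base + per_block * blocks


def shelf_lengths(cycles_by_size: Sequence[int]) -> List[int]:
    """Lengths of consecutive runs of identical cycle counts."""
    runs: List[List[int]] = []
    for cycles in cycles_by_size:
        if runs and runs[-1][0] == cycles:
            runs[-1][1] += 1
        else:
            runs.append([cycles, 1])
    return [length for _, length in runs]
```

    Sweeping size values 1 through 64 through this model yields four shelves of 16 points each, matching the 16 * 4 * 8 = 512 bit step described above.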

    In public key signature schemes, it is normal to sign a hash of the data message instead of signing the entire message contents. This means that a hash operation being observed in the early stages of the microcode loader process is an expected result.

    The lack of timing artifacts corresponding to symmetric key algorithm block sizes (i.e. 128 bits) may also indicate that authentication of the microcode contents is occurring prior to decryption of the microcode contents (i.e. the ciphertext is authenticated). Given the space constraints of a modern CPU architecture, this design is not entirely unexpected, as it allows the processor to load the decrypted content directly, without having to store the plaintext for authentication purposes.

    Observation #7: What other data is in the first 706 bytes of a microcode update?

    Note that the first shelf is observed after supplying a size value of 176 (or 704 bytes of microcode data), and that supplying a size value of 706 bytes or less results in a constant timing shelf. This would suggest that there exists a minimum length of non-variable-length data that will be hashed regardless of the supplied microcode size field. This data includes the undocumented microcode header and the RSA public key that has been discussed above.

    If we assume that the presence of an RSA public key suggests the usage of RSA as a digital signature algorithm, then it stands to reason that an RSA signature will be found in the microcode update. If this signature value is calculated using the public key embedded in the microcode update, then we would expect to find a 2048-bit value that is strictly less than the modulus value (since the signature is calculated using this modulus).

    Examining the 2048 bits that are contiguously after the public key exponent value (0x00000011), we find a valid candidate for an RSA signature. In every case, the 2048-bit value after the exponent is strictly less than the 2048 bits prior to the exponent (the presumed RSA modulus).
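
    This layout (modulus, then exponent, then signature candidate) can be searched for mechanically. The sketch below uses an invented function name and makes one notable assumption: that the multi-byte values are stored little-endian. The byte order is not stated in the text, so it is exposed as a parameter.

```python
from typing import List, Tuple


def find_key_and_signature(blob: bytes, key_bytes: int = 256,
                           exponent: int = 0x11,
                           byteorder: str = "little"
                           ) -> List[Tuple[int, int, int]]:
    """Find (exponent offset, modulus, signature) candidates in blob."""
    marker = exponent.to_bytes(4, byteorder)
    results = []
    pos = blob.find(marker, key_bytes)
    while pos != -1 and pos + 4 + key_bytes <= len(blob):
        modulus = int.from_bytes(blob[pos - key_bytes:pos], byteorder)
        signature = int.from_bytes(blob[pos + 4:pos + 4 + key_bytes],
                                   byteorder)
        # Keep only full-length odd moduli with a smaller signature.
        if (modulus.bit_length() == key_bytes * 8 and modulus % 2 == 1
                and signature < modulus):
            results.append((pos, modulus, signature))
        pos = blob.find(marker, pos + 1)
    return results
```

    Because the byte pattern of the exponent can also occur by chance inside random-looking data, each hit should be treated as a candidate to be cross-checked against other revisions, not as a confirmed key location.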

    We can attempt to recover the originally signed data by raising the signature value to the power of 0x00000011 and then reducing the result modulo the modulus value. The results of this operation can be found here. The format of this file is.

    The result appears to use PKCS#1 v1.5 padding, with the block type set for a private key operation. It is also clear that earlier processor models used a 160-bit digest for the signature hash, which is consistent with SHA-1. Later processor models use a 256-bit digest, which is consistent with SHA-2.
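
    The recovery step and the padding check can be sketched as follows. The function names are invented for this example, and the code only covers the generic PKCS#1 v1.5 block structure (0x00, block type, padding, 0x00, payload); whether the payload is a bare digest or wraps it in further encoding is left to the caller to inspect.

```python
from typing import Tuple


def recover_encoded_message(signature: int, modulus: int,
                            exponent: int = 0x11) -> bytes:
    """Compute signature^exponent mod modulus as a big-endian block."""
    size = (modulus.bit_length() + 7) // 8
    return pow(signature, exponent, modulus).to_bytes(size, "big")


def parse_pkcs1_v15(em: bytes) -> Tuple[int, bytes]:
    """Return (block type, payload) or raise ValueError."""
    if len(em) < 11 or em[0] != 0x00 or em[1] not in (0x01, 0x02):
        raise ValueError("not a PKCS#1 v1.5 block")
    sep = em.find(b"\x00", 2)
    if sep < 10:   # separator missing, or fewer than 8 padding bytes
        raise ValueError("padding too short or separator missing")
    # Block type 01 (private key operation) pads with 0xFF bytes only.
    if em[1] == 0x01 and em[2:sep] != b"\xff" * (sep - 2):
        raise ValueError("block type 01 padding must be 0xFF bytes")
    return em[1], em[sep + 1:]
```

    A payload length of 20 bytes would correspond to the 160-bit digest case and 32 bytes to the 256-bit case.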

    All attempts at recreating these hash values using standard SHA implementations have failed. Several non-standard variations of Merkle-Damgård strengthening were also attempted. This may indicate that a non-standard initial vector or some other non-standard structural variation is used when calculating the signed hash value.

    Attempts to insert a new public key and signature for the same PKCS#1 signed data into the microcode also failed, which suggests that the public key is part of the authenticated data, or that a hash of the expected/official public key is stored in factory-embedded memory and verified after authentication.

    Interestingly, it was observed that setting the most significant word of the public key modulus to zero results in a hardware reset (in the case of a single core system, this manifests as a hardware halt/freeze, not a system restart). This may suggest a division by zero error exists in the microcode authentication routine.

    Conclusion

    Studying the Intel microcode update mechanism through data analysis and timing analysis has revealed properties about the cryptographic design of this system:

    - Several previously undocumented header fields have been identified and described.
    - The results suggest that microcode updates are authenticated using a 2048-bit RSA signature.
    - The RSA signature operation appears to be constant time (i.e. unaffected by changes to the supplied exponent, modulus or signature value).
    - Timing analysis reveals 512-bit steps correlating to supplied microcode length. This is a common message block size for cryptographic hash functions such as SHA-1 and SHA-2.
    - The RSA signature was located, and the signed data is a PKCS#1 v1.5 encoded hash value. Older processor models use a 160-bit digest (SHA-1), and newer processor models use a 256-bit digest (SHA-2).