ARM Microcontroller Code Size Analysis
32-bit Microcontroller Code Size Analysis
Draft 1.2.4
Joseph Yiu, Andrew Frame
Overview
Microcontroller application program code size can directly affect the cost and power consumption of products; therefore it is almost always viewed as an important factor in the selection of a microcontroller for embedded projects. Since the release and availability of 32-bit processors such as the ARM Cortex-M3, more and more microcontroller users have discovered the benefits of switching to 32-bit products: lower power, greater energy efficiency, smaller code size and much better performance. Whilst most of the benefits of using 32-bit microcontrollers are widely known, the code size advantage of 32-bit microcontrollers is less obvious.
In this article we will explain why 32-bit microcontrollers can reduce application code size whilst still achieving high system performance and ease of use.
Typical myths of program size
Myth #1: 8-bit and 16-bit microcontrollers have smaller code size
There is a common misconception that switching from an 8-bit microcontroller to a 32-bit microcontroller will result in much bigger code size. Why? Many people have the impression that 8-bit microcontrollers use 8-bit instructions and 32-bit microcontrollers use 32-bit instructions. This impression is often reinforced by slightly misleading marketing from the 8-bit and 16-bit microcontroller vendors.
In reality, many instructions in 8-bit microcontrollers are 16 bits, 24 bits or some other size larger than 8 bits. For example, PIC18 instructions are 16 bits wide (a few take two 16-bit words). In the 8051 architecture, some instructions are 1 byte long, but many others are 2 or 3 bytes long.
So would code size be better after moving to a 16-bit microcontroller? Not necessarily. Taking the MSP430 as an example, a single-operand instruction can take 4 bytes (32 bits) and a double-operand instruction can take 6 bytes (48 bits). In the worst case, an extended immediate/index instruction in MSP430X can take 8 bytes (64 bits).
So how about the code size for ARM Cortex microcontrollers? The ARM Cortex-M3 and Cortex-M0 processors are based on Thumb-2 technology, which provides excellent code density. Thumb-2 microcontrollers have 16-bit instructions as well as 32-bit instructions, and the functionality of the 32-bit instructions is a superset of the 16-bit versions. In most cases a C compiler will use the 16-bit version of an instruction. The 32-bit version is only used when the operation cannot be performed with a 16-bit instruction. As a result, most of the instructions in an ARM Cortex microcontroller program are 16 bits. That's even smaller than some of the instructions in 8-bit microcontrollers.
Figure 1: Size of a single instruction in various processors (minimum and maximum instruction size, in bits, for 8051, PIC18, PIC24, MSP430/MSP430X and ARM)
Within a compiled program for Cortex-M processors, 32-bit instructions can make up only a small portion of the total instruction count. For example, when the Dhrystone program image is compiled for the Cortex-M3, only 15.8% of its instructions are 32-bit, giving an average instruction size of 18.53 bits. For the Cortex-M0 the proportion of 32-bit instructions is even lower, at 5.4%, for an average instruction size of 16.9 bits.
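The average sizes quoted above follow directly from the mix of 16-bit and 32-bit instructions. This is a minimal sketch that assumes every Thumb-2 instruction is either 16 or 32 bits wide. Under that assumption the average width is 16(1 - f) + 32f = 16 + 16f bits, where f is the fraction of 32-bit instructions. A value of f = 0.158 gives about 18.53 bits, and f = 0.054 gives about 16.86 bits, which is consistent with the figures above. The function names are illustrative only.

```c
#include <stddef.h>

/* Average instruction width in bits when a fraction `frac32` of the
   instructions are 32-bit and the rest are 16-bit (Thumb-2 assumption). */
static double avg_instruction_bits(double frac32)
{
    return 16.0 * (1.0 - frac32) + 32.0 * frac32;   /* = 16 + 16 * frac32 */
}

/* Code size in bytes for a given number of 16-bit and 32-bit instructions. */
static size_t thumb2_code_bytes(size_t n16, size_t n32)
{
    return 2u * n16 + 4u * n32;
}
```

For example, 1000 instructions of which 158 are 32-bit occupy 2 x 842 + 4 x 158 = 2316 bytes. The same 1000 instructions would occupy 4000 bytes if every one of them were 32 bits wide.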
Myth #2: My application only processes 8-bit and 16-bit data
Many embedded developers think that if their application only processes 8-bit data, then there is no benefit in switching to a 32-bit microcontroller. However, a careful look at the output from the C compiler shows that, in most cases, the humble int data type is actually 16 bits. The C standard requires int to be at least 16 bits, even on an 8-bit processor. So whenever you use an int as a for loop index, compare a value with an int, or call a C library function that takes an integer argument (e.g. memcpy()), you are actually using 16-bit or larger data. This can affect code size and performance in various ways:
For each integer computation, an 8-bit processor needs multiple instructions to carry out the operation. This directly increases the code size and the clock cycle count.
Saving the integer value to memory, or loading an immediate value from program ROM into it, takes multiple instructions and multiple clock cycles.
Since an integer takes up two 8-bit registers, more registers are needed to hold the same number of integer variables. When the register bank does not have enough registers for all the local variables, some of them have to be stored in memory. An 8-bit microcontroller might therefore need more memory accesses, which increases code size and reduces performance and power efficiency. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.
Since an 8-bit microcontroller needs more registers to hold an integer, it also needs more stack operations than a 32-bit microcontroller. This applies when passing variables to a function via the stack, and when saving register contents during context switching or interrupt servicing. The extra stack operations increase the program size. They can also affect interrupt latency, because an Interrupt Service Routine (ISR) must save every register it uses at ISR entry and restore it at ISR exit. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.
There is even more bad news for 8-bit microcontroller users: memory address pointers take multiple bytes, so data processing that involves pointers can be extremely inefficient.
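To make the first point in the list above concrete, the sketch below spells out in C what an 8-bit CPU has to do for a single 16-bit int addition. It adds the low bytes, keeps the carry, then adds the high bytes plus the carry. On the 8051, for example, this is an ADD followed by an ADDC, plus the moves needed to get each byte into the accumulator. The sketch illustrates that assumption and is not actual compiler output. A 32-bit core performs the same addition on whole registers in one instruction. The function name is hypothetical.

```c
#include <stdint.h>

/* Illustration only: a 16-bit addition performed one byte at a time with
   an explicit carry, as an 8-bit CPU must do it. */
static uint16_t add16_on_8bit(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu), b_hi = (uint8_t)(b >> 8);

    /* Step 1 (e.g. ADD): add the low bytes; bit 8 of the sum is the carry. */
    unsigned lo_sum = (unsigned)a_lo + b_lo;
    uint8_t  carry  = (uint8_t)(lo_sum >> 8);

    /* Step 2 (e.g. ADDC): add the high bytes plus the carry, wrapping at 8 bits. */
    uint8_t r_hi = (uint8_t)(a_hi + b_hi + carry);
    uint8_t r_lo = (uint8_t)lo_sum;

    return (uint16_t)(((unsigned)r_hi << 8) | r_lo);
}
```

Every further 16-bit operation (compare, subtract, load, store) is split in the same way. The number of instructions grows with it.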
Myth #3: A 32-bit processor is not efficient at handling 8-bit and 16-bit data
Most 32-bit processors are actually very efficient at handling 8-bit and 16-bit data. Compact memory access instructions are available for signed and unsigned 8-bit, 16-bit and 32-bit data; on Cortex-M processors, for example, LDRB/LDRSB, LDRH/LDRSH and LDR. There are also instructions included specifically for data type conversion, such as SXTB, SXTH, UXTB and UXTH for sign and zero extension. Overall, handling 8-bit and 16-bit data on 32-bit processors such as the ARM Cortex microcontrollers is just as easy and efficient as handling 32-bit data.
Myth #4: C libraries for ARM processors are too big
There are various C library options for ARM processors. For microcontroller applications, a number of compiler vendors have developed C libraries with a much smaller footprint. For example, the ARM development tools include a smaller version of the C library called MicroLib. These C libraries are designed specifically for microcontrollers and keep application code small and efficient.
Myth #5: Interrupt handling on ARM microcontrollers is more complex
On the ARM Cortex microcontrollers, interrupt service routines are just normal C functions. The Nested Vectored Interrupt Controller (NVIC) supports vectored and nested interrupts with no need for software intervention. In fact, setting up and processing an interrupt request is much simpler than on 8-bit and 16-bit microcontrollers. Generally you only need to program the priority level of an interrupt and then enable it.
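At register level, the two steps described above are small. This is a minimal sketch built on the following assumptions:
- the standard ARMv6-M/ARMv7-M NVIC register map, with interrupt set-enable registers at 0xE000E100 and priority registers at 0xE000E400;
- a device that implements 3 priority bits (the real number is device-specific, and CMSIS device headers provide it as __NVIC_PRIO_BITS).

The helper names are hypothetical. In practice, the CMSIS-Core functions NVIC_SetPriority() and NVIC_EnableIRQ() do this job. The register base pointers are passed as parameters so that the logic can be exercised on a host.

```c
#include <stdint.h>

#define PRIO_BITS 3u   /* assumption: implemented priority bits (device-specific) */

/* Real NVIC base addresses; only dereferenced when running on a target. */
#define NVIC_ISER ((volatile uint32_t *)0xE000E100u)   /* set-enable registers */
#define NVIC_IPR  ((volatile uint32_t *)0xE000E400u)   /* priority registers   */

/* Step 1: set the priority of external interrupt `irq`. Each 32-bit IPR
   word holds four 8-bit fields; only the top PRIO_BITS bits of a field are
   implemented. A word read-modify-write is used because the Cortex-M0 does
   not allow byte access to these registers. */
static void nvic_set_priority(volatile uint32_t *ipr, unsigned irq, unsigned prio)
{
    unsigned shift = (irq & 3u) * 8u;   /* byte lane within the word */
    uint32_t field = (uint32_t)(prio & ((1u << PRIO_BITS) - 1u)) << (8u - PRIO_BITS);
    uint32_t word  = ipr[irq >> 2];

    word &= ~((uint32_t)0xFFu << shift);   /* clear the old priority */
    word |= field << shift;                /* insert the new one     */
    ipr[irq >> 2] = word;
}

/* Step 2: enable external interrupt `irq`. ISER is write-1-to-set, so
   writing a single bit leaves every other interrupt unchanged. */
static void nvic_enable_irq(volatile uint32_t *iser, unsigned irq)
{
    iser[irq >> 5] = (uint32_t)1u << (irq & 31u);
}
```

On a target, the calls would be `nvic_set_priority(NVIC_IPR, 5, 2);` followed by `nvic_enable_irq(NVIC_ISER, 5);`.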
The interrupt vectors are stored in a vector table at the beginning of memory, normally in flash, so no software steps are needed to install them. When an interrupt request takes place, the processor automatically fetches the corresponding interrupt vector and starts to execute the ISR. A hardware sequence pushes some of the registers (R0-R3, R12, LR, PC and xPSR) to the stack and restores them automatically when the interrupt handler exits. The hardware stacking sequence does not cover the remaining registers. Compiler-generated code pushes each of those onto the stack only if the ISR uses and modifies it.
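The vector table mechanism described above can be modelled on a host in a few lines. This is a hypothetical sketch, not startup code for any particular device. The handler names and IRQ numbers are made up, and `simulate_irq()` stands in for what the hardware does automatically: fetch the vector and call the handler. The point of the sketch is that each handler is an ordinary C function, with no special keyword or return sequence.

```c
#include <stdint.h>

typedef void (*irq_handler_t)(void);

/* Hypothetical state updated by the handlers. */
static volatile uint32_t timer_ticks;
static volatile uint32_t uart_rx_count;

/* On a Cortex-M, ISRs are ordinary C functions. */
static void TIMER0_IRQHandler(void) { timer_ticks++; }
static void UART0_IRQHandler(void)  { uart_rx_count++; }

/* Model of the external-interrupt part of the vector table. On real
   hardware the table is a constant array in flash whose first 16 entries
   are the initial SP value and the system exception handlers, so
   external IRQ n is found at entry 16 + n. */
static irq_handler_t const irq_vectors[] = {
    TIMER0_IRQHandler,   /* IRQ 0 (hypothetical assignment) */
    UART0_IRQHandler,    /* IRQ 1 (hypothetical assignment) */
};

/* Models the hardware: fetch the vector for `irq` and call it. Register
   stacking and unstacking are not modelled. */
static void simulate_irq(unsigned irq)
{
    if (irq < sizeof irq_vectors / sizeof irq_vectors[0])
        irq_vectors[irq]();
}
```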
What about moving to 16-bit microcontrollers?
16-bit microcontrollers can be efficient at handling 16-bit integers and 8-bit data (e.g. strings). However, their code size is still not as small as that of 32-bit processors:
Handling of 32-bit data: if the application handles any long integer (32-bit) or floating-point types, the efficiency of 16-bit processors is greatly reduced. Each processing operation needs multiple instructions, and so does each data transfer between the processor and memory.
Register usage: when processing 32-bit data, 16-bit processors need two registers to hold each 32-bit variable. This reduces the number of variables that the register bank can hold. The result is lower processing speed and more stack operations and memory accesses.
Memory addressing modes: many 16-bit architectures provide only basic addressing modes, similar to those of 8-bit architectures. As a result, code density is poor in applications that process complex data sets.
64 Kbyte limitation: many 16-bit processors can address only 64 Kbytes of memory, which limits the functionality of the application. Some 16-bit architectures have extensions that allow more than 64 Kbytes of memory to be accessed. However, these extensions carry a code size and clock cycle overhead. For example, a memory pointer becomes larger than 16 bits, and processing it might require multiple instructions and multiple registers.
Instruction Set efficiency
When customers port their applications from 8-bit architectures to ARM Cortex microcontrollers, they very often find that the total code size has decreased dramatically. For example, Melfas (a leading company in capacitive-sensing touch screen controllers) evaluated the Cortex-M0 processor. They found that the Cortex-M0 program size was less than half that of the 8051, and that the Cortex-M0 delivered five times the performance at the same clock frequency. This could, for example, enable them to run the application at 1/5 of the clock speed of the equivalent 8051 product. That reduces power consumption, and the smaller program flash requirement lowers product cost at the same time.
So how does the ARM architecture provide such big advantages? The key factor is Thumb-2 technology, which provides a highly efficient unified instruction set.
Powerful addressing modes
The ARM Cortex microcontrollers support a number of addressing modes for memory transfer instructions. For example:
Immediate offset (Address = Register value + offset)
Register offset (Address = Register value 1 + shifted(Register value 2); the shift is available on the Cortex-M3)
PC-relative (Address = Current PC value + offset)
Stack pointer-relative (Address = SP + offset)
Multiple register load and store, with optional automatic base address update
PUSH/POP instructions with multiple registers
Because of these addressing modes, data transfers between registers and memory can be handled with fewer instructions. The PUSH and POP instructions support multiple registers. As a result, in most cases a function needs only one PUSH at its beginning and one POP at its end to save and restore registers. The POP can even be combined with the return, by popping the return address directly into the PC, to further reduce the instruction count.
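The C constructs below map naturally onto the addressing modes listed above. The instruction in each comment is what a compiler typically emits for that construct. This is an illustrative assumption: actual output depends on the compiler, the options and the surrounding code. The record type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical record type used only to illustrate addressing modes. */
typedef struct {
    uint16_t id;      /* offset 0 */
    uint16_t flags;   /* offset 2 */
    uint32_t value;   /* offset 4 */
} record_t;

/* Field access: base register + immediate offset, e.g. LDR r0, [r0, #4]. */
static uint32_t record_value(const record_t *r)
{
    return r->value;
}

/* Array indexing: base register + index register. On the Cortex-M3 the
   index can be scaled in the same instruction, e.g. LDR r0, [r0, r1, LSL #2];
   the Cortex-M0 needs a separate shift first. */
static uint32_t table_lookup(const uint32_t *table, uint32_t i)
{
    return table[i];
}

/* Small block copy: a compiler may use LDM/STM multiple-register transfers. */
static void copy_four_words(uint32_t *dst, const uint32_t *src)
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```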
Conditional branches
Almost all processors provide conditional branch instructions. ARM processors improve on this in two ways: they have separate branch conditions for signed and unsigned results, and they provide a good branch range.
For example, the Cortex-M0 has more branch conditions available than the MSP430. This makes it possible to generate more compact code whether the data being processed is signed or unsigned. On the MSP430, the same operation might require multiple conditional branch instructions.
Generally, the same situation applies to many 8-bit and 16-bit microcontrollers: when dealing with signed data, the conditional branch might require additional steps.
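Separate signed and unsigned conditions matter because the same 32-bit pattern compares differently depending on how it is interpreted. On a Cortex-M, each function below can typically compile to one CMP followed by one conditional branch or conditional operation. The first uses a signed condition (LT) and the second an unsigned one (LO). The exact code depends on the compiler. The function names are hypothetical.

```c
#include <stdint.h>

/* Signed conditions (GE, LT, GT, LE) test the N and V flags (plus Z for
   GT/LE); unsigned conditions (HS, LO, HI, LS) test the C flag (plus Z
   for HI/LS). Either way, one CMP sets all the flags needed. */
static int is_less_signed(int32_t a, int32_t b)
{
    return a < b;       /* 0xFFFFFFFF is read as -1 here */
}

static int is_less_unsigned(uint32_t a, uint32_t b)
{
    return a < b;       /* 0xFFFFFFFF is read as 4294967295 here */
}
```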
In addition to the branch instructions available on the Cortex-M0, the Cortex-M3 processor supports compare-and-branch instructions (CBZ and CBNZ). These test a register for zero and branch in a single instruction, which further simplifies some conditional branch sequences.
Conditional execution
Conditional execution is another feature that gives ARM Cortex-M3 microcontrollers more compact code. The Cortex-M3 supports an instruction called IT (IF-THEN), which allows up to 4 subsequent instructions to be executed conditionally. This reduces the need for additional branches. For example,
if(xpos1
    MOV.W  R11, R14
    MOV.W  R12, R13
    JMP    Label2
Label1
    MOV.W  R11, R13
    MOV.W  R12, R14
Label2
This results in an extra two bytes for the MSP430 when compared to the Cortex-M3.
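The following is a separate, hypothetical illustration of the same idea; it is not the xpos example itself. The function assigns two outputs from one comparison. On the Cortex-M3, a compiler can typically implement the body as a CMP followed by an IT block, with no branch. A processor without conditional execution needs a conditional jump plus an unconditional JMP, as in the MSP430 sequence above. The code actually generated depends on the compiler and optimization settings.

```c
#include <stdint.h>

/* Hypothetical example: order two values so that *lo <= *hi. */
static void order_pair(int32_t a, int32_t b, int32_t *lo, int32_t *hi)
{
    if (a < b) {        /* one CMP sets the flags for both paths         */
        *lo = a;        /* "then" part: may be executed conditionally     */
        *hi = b;
    } else {
        *lo = b;        /* "else" part: the opposite condition in an IT block */
        *hi = a;
    }
}
```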
Multiply and Divide
Both the Cortex-M0 and Cortex-M3 processors support single-cycle multiply operations; on the Cortex-M0, the single-cycle multiplier is an implementation option. The Cortex-M3 also has multiply and multiply-accumulate instructions with 32-bit or 64-bit results. These instructions greatly reduce the code size needed to multiply large variables.
Most other 8-bit and 16-bit microcontrollers also have multiply instructions. However, because of their limited register size, a multiplication often takes multiple steps when the result needs more than 8 or 16 bits.
The MSP430 does not have a multiply instruction (TI application report SLAA329, reference 2). A multiplication must therefore use either a memory-mapped hardware multiplier or a software routine based on adds and shifts. Even when a hardware multiplier is present, its memory-mapped nature adds the overhead of transferring data to and from the external hardware. In addition, using the multiplier within an interrupt handler could cause existing data in the multiplier to be lost. For this reason, interrupts are usually disabled before a multiply operation and re-enabled after it completes. This adds further software overhead and affects interrupt latency and determinism.
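Below is a minimal sketch of the software add-and-shift approach mentioned above, for an unsigned 16 x 16 to 32-bit product. It shows why software multiplication costs both code size and cycles: each bit of the multiplier needs a conditional add and two shifts. The function name is hypothetical, and the routine is not taken from the TI application report.

```c
#include <stdint.h>

/* Unsigned 16 x 16 -> 32-bit multiply using only adds and shifts. */
static uint32_t mul16_shift_add(uint16_t a, uint16_t b)
{
    uint32_t result = 0;
    uint32_t addend = a;    /* a << k for the current multiplier bit k */
    unsigned bits   = b;

    while (bits != 0u) {
        if (bits & 1u)      /* multiplier bit set: add the shifted multiplicand */
            result += addend;
        addend <<= 1;       /* move to the next bit position */
        bits   >>= 1;
    }
    return result;
}
```

On the Cortex-M0 and Cortex-M3, the equivalent C expression `(uint32_t)a * b` can compile to a single MUL.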
The Cortex-M3 processor also has unsigned and signed integer divide instructions (UDIV and SDIV). These reduce code size in applications that perform integer division, because the C library no longer needs to include a function for handling divide operations.
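Some processors have no hardware divide. This includes the Cortex-M0 and many 8-bit and 16-bit MCUs. On these, the C library must supply a routine along the lines of the shift-and-subtract sketch below. The sketch is a simple restoring division; real library implementations are usually more optimized. On the Cortex-M3, `n / d` on unsigned 32-bit operands can compile to a single UDIV. The function name is hypothetical, and the caller must ensure that d != 0.

```c
#include <stdint.h>

/* Unsigned 32-bit restoring division, one quotient bit per iteration.
   Precondition: d != 0. The remainder is stored through `rem` if non-null. */
static uint32_t udiv_shift_sub(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint64_t r = 0;     /* partial remainder; 64-bit so the shift cannot overflow */
    uint32_t q = 0;

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= d) {                     /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= (uint32_t)1u << i;
        }
    }
    if (rem != 0)
        *rem = (uint32_t)r;
    return q;
}
```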
Powerful instruction set
Beyond the standard data processing, memory access and program control instructions, the Cortex microcontrollers support a number of other instructions that help with data type conversion. The Cortex-M3 processor also supports a number of bit-field operations. These reduce the software overhead in tasks such as peripheral control and communication data processing.
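The sketch below shows typical bit-field manipulation on a peripheral register, written in portable C with hypothetical helper names. When `lsb` and `width` are compile-time constants, a Cortex-M3 compiler can often turn the extract into a single UBFX and the insert into a BFI. Without these instructions, the same work needs separate shift, AND and OR instructions. Callers must keep lsb <= 31 and 1 <= width <= 32 - lsb.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (1 <= width <= 32). */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32u) ? 0xFFFFFFFFu : (((uint32_t)1u << width) - 1u);
}

/* Read a bit field (a candidate for UBFX on the Cortex-M3). */
static uint32_t field_get(uint32_t reg, unsigned lsb, unsigned width)
{
    return (reg >> lsb) & low_mask(width);
}

/* Write a bit field, leaving all other bits unchanged (a candidate for BFI). */
static uint32_t field_set(uint32_t reg, unsigned lsb, unsigned width, uint32_t value)
{
    uint32_t mask = low_mask(width) << lsb;
    return (reg & ~mask) | ((value << lsb) & mask);
}
```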
Breaking the 64 Kbyte memory barrier
As already mentioned, many 8-bit and 16-bit microcontrollers can address only 64 Kbytes of memory. Because of the nature of 8-bit and 16-bit architectures, the coding efficiency of these microcontrollers often drops dramatically once the application exceeds the 64 Kbyte barrier. In 8-bit and 16-bit microcontrollers (e.g. 8051, PIC24, C166), this is often handled by memory bank switching or memory segmentation, with the C compiler generating the switching code automatically. Every access to a function or data item in a different memory page needs bank switching code, which further increases the program size.
Figure 2: Increased code size overhead of memory bank switching or segmentation in 8-bit and 16-bit systems
Memory bank switching not only creates larger code; it also greatly reduces system performance. This is especially true when the data being processed is spread across different memory banks. For example, copying a block of data from one page to another can be very costly in terms of performance. The problem is particularly severe for 8-bit microcontrollers like the 8051, because the MCS-51 architecture has no proper support for memory bank switching. Bank switching therefore has to be done by saving and updating bank control registers, such as I/O port registers. In addition, the page switching code usually has to be placed in a congested shared memory space of limited size. At the same time, some of the memory pages might not be fully utilized, so memory space is wasted.
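The following hypothetical host-side model (not 8051 code) makes the cost of copying between banks concrete. Memory is split into four 64 Kbyte banks, and only one bank is visible through the bank-select register at a time. A naive byte-by-byte copy between different banks therefore has to reselect the bank for every read and every write. The `current_bank` variable stands in for the I/O port or control register used on a real system. On real hardware, each change to it would cost extra instructions.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS 4u
#define BANK_SIZE 0x10000u              /* 64 Kbytes per bank */

static uint8_t       banked_mem[NUM_BANKS][BANK_SIZE];
static unsigned      current_bank;      /* models the bank-select register */
static unsigned long bank_switches;     /* how often it had to be rewritten */

/* Rewrite the bank-select register only when the bank actually changes. */
static void select_bank(unsigned bank)
{
    if (bank != current_bank) {
        current_bank = bank;
        bank_switches++;
    }
}

/* Naive copy through a single bank window; offsets wrap within a bank,
   as they would with 16-bit addresses. */
static void banked_copy(unsigned dst_bank, uint16_t dst_off,
                        unsigned src_bank, uint16_t src_off, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        select_bank(src_bank);
        uint8_t byte = banked_mem[current_bank][(uint16_t)(src_off + i)];
        select_bank(dst_bank);
        banked_mem[current_bank][(uint16_t)(dst_off + i)] = byte;
    }
}
```

In this model, a 16-byte copy between two banks rewrites the bank register 32 times, whereas a copy within a single bank rewrites it at most once. With the flat 32-bit address space of a Cortex-M there is no such register at all.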
For the 8-bit and 16-bit microcontrollers that support more than 64 Kbytes of memory, this support often comes at a price. The MSP430X design overcomes the 64 Kbyte memory barrier by widening the Program Counter (PC) and registers to 20 bits. No memory paging is involved, yet some MSP430X instructions are considerably larger than their original MSP430 equivalents. For example, when the large memory model is used, a double-operand instruction can take 8 bytes rather than 6 (a 33% increase):
MSP430 double-operand instruction: an instruction word (Op-code, bits 15:12 | Rsrc, bits 11:8 | Ad, bit 7 | B/W, bit 6 | As, bits 5:4 | Rdst, bits 3:0), followed by a source-or-destination word (15:0) and a destination word (15:0).
MSP430X double-operand instruction: the same instruction word and following words, preceded by an extension word (00011 | Source 19:16 | A/L | Rsrv | Destination 19:16).
Figure 3: Support of a larger memory system increases the size of some instructions in MSP430X
Apart from the size of the instruction itself, 20-bit addressing also increases the number of stack operations required. Since the memory is only 16 bits wide, saving a 20-bit address pointer needs two stack push operations. This results in extra instructions and poor utilization of the stack memory.
Figure 4: Use of the large memory data model in MSP430X increases code size
As a result, an MSP430X application has lower code density when it uses the large memory model, which is required whenever the address range exceeds 64 Kbytes.
ARM Cortex microcontrollers use 32-bit linear addressing, which provides 4 GB of memory space for embedded applications. There is therefore no paging overhead, and the programming model is easy to use.
Examples
To compare code size with 8-bit and 16-bit processors, a number of test cases were compiled, and the results are presented here. The tests are based on the MSP430 Competitive Benchmarking document from Texas Instruments (SLAA205C, reference 1). The results listed show the total program memory size in bytes.
MSP430 results:
The tests were compiled using IAR Embedded Workbench 4.20.1, with the hardware multiplier enabled and the optimization level set to High with Size optimization. Unless otherwise specified, the Small data model is used and type double is 32 bits. The results are taken from the linker output report (CODE + CONST).
ARM Cortex processor results:
The tests were compiled using RealView Development Suite 4.0 SP2, with optimization level 3 for size, a minimal vector table and MicroLib. The results are taken from the linker output report (VECTORS + CODE).
Test | Generic MSP430 | MSP430F5438 | MSP430F5438 (large data model) | Cortex-M3
Math 8-bit | 198 | 198 | 202 | 144
Math 16-bit | 144 | 144 | 144 | 144
Math 32-bit | 256 | 244 | 256 | 120
Math Float | 1122 | 1122 | 1162 | 600
Matrix 2-dim 8-bit | 180 | 178 | 196 | 184
Matrix 2-dim 16-bit | 268 | 246 | 290 | 256
Matrix mult | 276 | 228 | (linker error) | 228
Switch 8-bit | 200 | 218 | 218 | 160
Switch 16-bit | 198 | 218 | 218 | 160
FIR filter (Note 1) | 1202 | 1170 | 1222 | 716 (820 without modification)
Dhry | 923 | 893 | 1079 | 900
Whet (Note 2) | 6434 | 6308 | 6614 | 4384 (8496 without modification)
Note 1: The constant data array in the FIR filter test is modified to use a 16-bit data type on the Cortex-M processor (const unsigned short int INPUT[]).
Note 2: When certain math functions are used (sin, cos, atan, sqrt, exp, log), the ARM C standard library uses the double-precision versions by default. This can result in a significantly larger program size unless adjustments are made. To achieve an equivalent comparison, the program code is edited so that the single-precision versions are used (sinf, cosf, atanf, sqrtf, expf, logf). Some of the constant definitions have also been changed to single precision (e.g. 1.0 becomes 1.0F).
Figure 5: Code size comparison for basic operations
The total sizes for the simple tests (integer math, matrix and switch tests) are:
Summary for simple tests | Generic MSP430 | MSP430F5438 | Cortex-M3
Total size (bytes) | 1720 | 1674 | 1396
Advantage (% smaller than Generic MSP430) | (baseline) | 2.7% | 18.8%
For applications that use floating point, Cortex microcontrollers have a significant advantage, whereas the Dhrystone program sizes are closer.
Figure 6: Code size comparison for floating-point operations and benchmark suites
The total sizes for the benchmark and floating-point tests (Dhrystone, Whetstone, FIR filter and Math Float) are:
Summary for benchmark and floating-point tests | Generic MSP430 | MSP430F5438 | Cortex-M3
Total size (bytes) | 9681 | 9493 | 6600
Advantage (% smaller than Generic MSP430) | (baseline) | 1.9% | 31.8%
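The two summary tables can be recomputed from the results table. The sketch below uses only the numbers from that table. It expresses the advantage relative to the Generic MSP430 column, which is how the percentages appear to have been calculated, and it reproduces them to within rounding. The FIR filter and Whetstone entries use the modified Cortex-M3 figures (716 and 4384). The names are illustrative.

```c
#include <stddef.h>

/* Code sizes in bytes from the results table.
   Columns: Generic MSP430, MSP430F5438, Cortex-M3. */
static const unsigned simple_tests[][3] = {
    { 198,  198,  144},   /* Math 8-bit          */
    { 144,  144,  144},   /* Math 16-bit         */
    { 256,  244,  120},   /* Math 32-bit         */
    { 180,  178,  184},   /* Matrix 2-dim 8-bit  */
    { 268,  246,  256},   /* Matrix 2-dim 16-bit */
    { 276,  228,  228},   /* Matrix mult         */
    { 200,  218,  160},   /* Switch 8-bit        */
    { 198,  218,  160},   /* Switch 16-bit       */
};

static const unsigned benchmark_tests[][3] = {
    {1122, 1122,  600},   /* Math Float             */
    {1202, 1170,  716},   /* FIR filter (modified)  */
    { 923,  893,  900},   /* Dhrystone              */
    {6434, 6308, 4384},   /* Whetstone (modified)   */
};

/* Sum one column of a results table. */
static unsigned column_total(const unsigned (*rows)[3], size_t n_rows, size_t col)
{
    unsigned total = 0;
    for (size_t i = 0; i < n_rows; i++)
        total += rows[i][col];
    return total;
}

/* Percentage by which `size` is smaller than `baseline`. */
static double percent_smaller(unsigned baseline, unsigned size)
{
    return 100.0 * ((double)baseline - (double)size) / (double)baseline;
}
```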
Observations:
1. The results show that the Cortex microcontrollers have better code density than the MSP430 in most cases. The remaining tests show code density similar to the MSP430.
2. One of the tests (FIR filter) uses an integer data type for a constant array. An integer is 32 bits on the ARM processor but 16 bits on the MSP430, so the program has been modified to allow a direct comparison.
3. When the large data memory model is used with the MSP430, the code size increases by up to about 20% (Dhrystone: 1079 bytes versus 893 bytes on the MSP430F5438).
4. We were unable to reproduce all of the results claimed in the Texas Instruments document. This may be because constant data stored in ROM was omitted from their code size calculations.
Additional investigation on floating point
While analysing the results of the Whetstone benchmark, it became apparent that the two compilers handled floating point differently. The MSP430 C compiler generated only single-precision floating-point operations, while the ARM C compiler generated double-precision operations for some of the math functions used.
After the code was changed to use only single-precision floating point, the code size dropped dramatically, well below the MSP430 code size.
The IAR MSP430 compiler has a floating-point option, "Size of type double", which is set to 32 bits (single precision) by default. If it is set to 64 bits (as in the ARM C compiler), the code size increases significantly.
Program size (bytes) | Generic MSP430 | MSP430F5438
Type double is 32-bit | 6434 | 6308
Type double is 64-bit | 11510 | 11798
These results match those seen for the ARM Cortex-M3 processor.
Program size (bytes) | Cortex-M3
Whetstone modified to use single precision only | 4384
Out-of-box compile of Whetstone (uses double precision for math functions) | 8496
Setting type double to 32 bits is quite sensible for small microcontroller applications, where the C code might only need to process source data generated by a 12-bit or 14-bit ADC. However, benchmarks built with different default types can produce very different results and do not give an accurate comparison.
Recommendations on how to get the smallest code size with Cortex-M microcontrollers
Use MicroLib
The ARM development tools have an option to use the area-optimized MicroLib rather than the standard C libraries. MicroLib is suitable for most embedded applications and has a much smaller code size than the standard C library.
Ensure the use of area optimizations
The performance of Cortex-M microcontrollers is much higher than that of 16-bit and 8-bit microcontrollers. When porting applications from these microcontrollers, you can therefore generally select the highest area optimization rather than optimizing for speed. The resulting performance will still be much higher than that of a 16-bit or 8-bit system running at the same clock frequency.
Use the right data type
When porting applications from 8-bit or 16-bit microcontrollers, you might need to change the data type of constant arrays to get the smallest program size. For example, an integer is normally 16 bits on 8-bit and 16-bit microcontrollers, whereas on ARM microcontrollers integers are 32 bits.
Type | Number of bits in 8051 | Number of bits in MSP430 | Number of bits in ARM
char, unsigned char | 8 | 8 | 8
enum | 8/16 | 16 | 8/16/32 (smallest is chosen)
short, unsigned short | 16 | 16 | 16
int, unsigned int | 16 | 16 | 32
long, unsigned long | 32 | 32 | 32
float | 32 | 32 | 32
double | 32 | 32 | 64
When porting a constant array of integers from an 8-bit or 16-bit architecture, you should change the data type from int to short int so that the constant array stays the same size. For example,
const int mydata[] = {1234, 5678, /* ... */};
This should be changed to:
const short int mydata[] = {1234, 5678, /* ... */};
For an array of integer variables (non-constant data), changing from int to short int might also prevent an increase in memory usage during software porting. Most other data (e.g. scalar variables) does not require modification.
Floating-point functions
Some floating-point functions are single precision on 8-bit or 16-bit microcontrollers but double precision by default on ARM microcontrollers, as we found in the Whetstone test analysis. When porting application code from 8-bit or 16-bit microcontrollers to an ARM microcontroller, you might have to switch math functions to their single-precision versions and modify constant definitions so that the program behaves in the same way. For example, one section of the Whetstone program code uses math functions that are double precision in ARM compilers:
X = T * atan(T2 * sin(X) * cos(X) / (cos(X + Y) + cos(X - Y) - 1.0));
Y = T * atan(T2 * sin(Y) * cos(Y) / (cos(X + Y) + cos(X - Y) - 1.0));
To use single precision only, the program code has to be changed to:
X = T * atanf(T2 * sinf(X) * cosf(X) / (cosf(X + Y) + cosf(X - Y) - 1.0F));
Y = T * atanf(T2 * sinf(Y) * cosf(Y) / (cosf(X + Y) + cosf(X - Y) - 1.0F));
Similarly, constant definitions such as:
/* Module 7: Procedure calls */ X = 1.0; Y = 1.0; Z = 1.0;
should be changed to the following for single-precision representation:
/* Module 7: Procedure calls */ X = 1.0F; Y = 1.0F; Z = 1.0F;
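In C, an unsuffixed floating constant such as 1.0 has type double. Under the usual arithmetic conversions, an expression that mixes a float variable with such a constant is therefore evaluated in double precision. The Cortex-M3 has no floating-point unit, so this means calls to double-precision software routines. The F suffix keeps the arithmetic in single precision. Here is a minimal sketch with hypothetical function names:

```c
/* The same subtraction with and without the F suffix. */
static float minus_one_promoted(float x)
{
    return x - 1.0;     /* x is converted to double, subtracted in double,
                           and the result is converted back to float */
}

static float minus_one_single(float x)
{
    return x - 1.0F;    /* evaluated as float throughout */
}
```

The expression types can be checked with `sizeof`. For a float `x`, `sizeof(x - 1.0)` equals `sizeof(double)`, while `sizeof(x - 1.0F)` equals `sizeof(float)`.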
Define peripherals as a data structure
You can also reduce program size by defining the registers of a peripheral as a data structure. For example, instead of representing the SysTick timer registers as
#define SYSTICK_CTRL  (*((volatile unsigned long *)(0xE000E010)))
#define SYSTICK_LOAD  (*((volatile unsigned long *)(0xE000E014)))
#define SYSTICK_VAL   (*((volatile unsigned long *)(0xE000E018)))
#define SYSTICK_CALIB (*((volatile unsigned long *)(0xE000E01C)))
you can define the SysTick registers as:
typedef struct {
  volatile unsigned int CTRL;
  volatile unsigned int LOAD;
  volatile unsigned int VAL;
  volatile const unsigned int CALIB;   /* read-only calibration register */
} SysTick_Type;
#define SysTick ((SysTick_Type *) 0xE000E010)
With this approach, only one address constant needs to be stored in the program ROM. The register accesses use this address constant with a different offset for each register. When a peripheral needs a sequence of hardware register accesses, a data structure can reduce code size and improve performance. Most 8-bit microcontrollers lack this addressing mode feature, which can result in much larger code for the same task.
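The same idea can be checked at compile time and exercised on a host. In this sketch, `SysTick_Regs`, `SYSTICK_BASE` and `systick_start()` are hypothetical names; CMSIS provides the equivalent `SysTick_Type` and `SysTick_Config()`. The static assertions confirm that the member offsets match the register addresses used in the #define version above. The helper performs all four register writes through one base pointer, so a compiler can use a single base register with offsets 0, 4 and 8. The CTRL value 0x7 sets the CLKSOURCE, TICKINT and ENABLE bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as SysTick_Type above, written with <stdint.h> types. */
typedef struct {
    volatile uint32_t       CTRL;    /* 0xE000E010 */
    volatile uint32_t       LOAD;    /* 0xE000E014 */
    volatile uint32_t       VAL;     /* 0xE000E018 */
    volatile const uint32_t CALIB;   /* 0xE000E01C, read-only */
} SysTick_Regs;

/* Compile-time checks that the member offsets match the register map. */
_Static_assert(offsetof(SysTick_Regs, LOAD)  == 0x4, "LOAD must be at +0x4");
_Static_assert(offsetof(SysTick_Regs, VAL)   == 0x8, "VAL must be at +0x8");
_Static_assert(offsetof(SysTick_Regs, CALIB) == 0xC, "CALIB must be at +0xC");

#define SYSTICK_BASE ((SysTick_Regs *)0xE000E010u)   /* for use on a target */

/* Start SysTick with a period of `ticks` processor clock cycles. */
static void systick_start(SysTick_Regs *st, uint32_t ticks)
{
    st->CTRL = 0u;            /* stop the counter while configuring      */
    st->LOAD = ticks - 1u;    /* reload value (the field is 24 bits wide) */
    st->VAL  = 0u;            /* any write clears the current value      */
    st->CTRL = 0x7u;          /* CLKSOURCE | TICKINT | ENABLE             */
}
```

On a target, the call would be `systick_start(SYSTICK_BASE, 1000u);`. On a host, an ordinary `SysTick_Regs` variable can stand in for the hardware.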
Conclusions
32-bit processors provide code size equal to, and more often better than, 8-bit and 16-bit architectures, whilst delivering much better performance.
For users of 8-bit microcontrollers, moving to a 16-bit architecture can solve some of the inherent problems of 8-bit architectures. However, the overall benefit of migrating from 8-bit to 16-bit is much smaller than the benefit of migrating to the 32-bit Cortex processors.
The power consumption and cost of 32-bit microcontrollers have fallen dramatically over the last few years, and 32-bit processors have become the best choice for many embedded projects.
References
The following articles on the MSP430 are referenced:
1. MSP430 Competitive Benchmarking (SLAA205C), http://focus.ti.com/lit/an/slaa205c/slaa205c.pdf
2. Efficient Multiplication and Division Using MSP430 (SLAA329), http://focus.ti.com/lit/an/slaa329/slaa329.pdf