VASP: Some Accumulated Wisdom
J. M. Skelton
WMD Group Meeting, 21st September 2015
WMD Group Meeting, September 2015 | Slide 2
Convergence: Parameters
• Four key technical parameters in a VASP calculation:
o Basis set: ENCUT and PREC (or, alternatively, NGX, NGY, NGZ)
o k-point sampling: KPOINTS file and SIGMA
o [For certain types of pseudopotential.] Augmentation grid: ENAUG and PREC (or, alternatively, NGXF, NGYF, NGZF)
o Which space the projection operators are applied in (LREAL)
Convergence: Augmentation grid
• A second, finer mesh is used to represent the charge density near the ion cores: controlled by ENAUG (or PREC + EAUG in the POTCAR files), which determines NG*F
Convergence: ZnS revisited
• For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine). This is equivalent to increasing NG*F, but without also increasing NG* as in the QHA-ExC paper, which is evidently unnecessary (!) and wasteful
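ENCUT / eV   ENAUG / eV   NG*   NG*F   Noise?   t / min
550          575.892      120   160    ✗        -
650          575.892      128   160    ✗        -
750          575.892      140   160    ✗        -
850          575.892      150   160    ✗        -
550          675.892      120   180    ✓        116
550          775.892      120   192    ✓        108
550          875.892      120   200    ✓        113
(✓ appears to mark the acceptable, noise-free runs, which are the only ones with timings; ✗ the noisy ones.)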
The VASP SCF cycle
• The SCF cycle proceeds in two phases:
o The plane-wave coefficients are initialised randomly and “pre-optimised” within a fixed potential given by the superposition of atomic densities (INIWAV, NELMDL)
o The wavefunctions and density are then optimised self-consistently to convergence (EDIFF, NELMIN, NELM)
o If an initial charge density exists (e.g. from a previous SCF or converged CHGCAR/WAVECAR), the first step can be skipped (ISTART, ICHARG)
• To accelerate convergence, the output density from a step N is not fed directly into the next step N+1, but is mixed with the input density (IMIX, INIMIX, MIXPRE, MAXMIX, AMIX, AMIN, AMIX_MAG, BMIX, BMIX_MAG, WC)
• For the mathematically-minded: http://th.fhi-berlin.mpg.de/th/Meetings/DFT-workshop-Berlin2011/presentations/2011-07-14_Marsman_Martijn.pdf
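To see why mixing helps, here is a toy sketch in Python. It uses plain linear mixing on a made-up one-dimensional "SCF map"; the function names and the map itself are hypothetical, and the mixers VASP actually uses are more sophisticated than this. The parameter alpha only loosely plays the role of a mixing amplitude such as AMIX.

```python
def iterate(scf_map, x0, alpha, n_steps):
    """Fixed-point iteration where the next input is a linear mix
    of the current input and the current output."""
    x_in = x0
    for _ in range(n_steps):
        x_out = scf_map(x_in)
        # alpha = 1 feeds the output straight back in (no mixing)
        x_in = (1.0 - alpha) * x_in + alpha * x_out
    return x_in


def toy_scf_map(x):
    # Hypothetical map with its fixed point at x = 2. The slope is -1.5,
    # so feeding the output straight back in overshoots more each step.
    return -1.5 * x + 5.0


unmixed = iterate(toy_scf_map, 1.0, alpha=1.0, n_steps=20)
mixed = iterate(toy_scf_map, 1.0, alpha=0.3, n_steps=20)
print("error without mixing:", abs(unmixed - 2.0))
print("error with mixing:   ", abs(mixed - 2.0))
```

Without mixing the error grows at every step, while with alpha = 0.3 the effective slope drops to 0.25 and the iteration settles onto the fixed point.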
The VASP SCF cycle
       N       E                     dE             d eps         ncg    rms          rms(c)
DAV:   1     0.425437171796E+04    0.42544E+04   -0.38613E+05   920   0.178E+03
DAV:   2    -0.114846409831E+04   -0.54028E+04   -0.51653E+04  1130   0.323E+02
DAV:   3    -0.169662738043E+04   -0.54816E+03   -0.53994E+03  1130   0.100E+02
DAV:   4    -0.171494085624E+04   -0.18313E+02   -0.18206E+02  1160   0.198E+01
DAV:   5    -0.171553585547E+04   -0.59500E+00   -0.59387E+00  1220   0.331E+00    0.706E+01
RMM:   6    -0.159733114612E+04    0.11820E+03   -0.21124E+02   920   0.147E+01    0.352E+01
RMM:   7    -0.157358217358E+04    0.23749E+02   -0.82778E+01   920   0.937E+00    0.173E+01
RMM:   8    -0.157195752202E+04    0.16247E+01   -0.10028E+01   922   0.344E+00    0.736E+00
RMM:   9    -0.157170732229E+04    0.25020E+00   -0.24051E+00   920   0.173E+00    0.186E+00
RMM:  10    -0.157170709721E+04    0.22508E-03   -0.17654E-01   932   0.561E-01    0.965E-01
RMM:  11    -0.157173130475E+04   -0.24208E-01   -0.10240E-01   920   0.332E-01    0.466E-01
RMM:  12    -0.157174953342E+04   -0.18229E-01   -0.23004E-02   920   0.198E-01    0.213E-01
RMM:  13    -0.157175624413E+04   -0.67107E-02   -0.12470E-02   920   0.134E-01    0.938E-02
RMM:  14    -0.157175705572E+04   -0.81159E-03   -0.49641E-03   922   0.781E-02    0.577E-02
RMM:  15    -0.157175711576E+04   -0.60039E-04   -0.62130E-04   922   0.302E-02    0.211E-02
RMM:  16    -0.157175714692E+04   -0.31162E-04   -0.18825E-04   932   0.152E-02    0.146E-02
RMM:  17    -0.157175715237E+04   -0.54516E-05   -0.37827E-05   935   0.701E-03    0.564E-03
RMM:  18    -0.157175715526E+04   -0.28845E-05   -0.88070E-06   824   0.340E-03    0.361E-03
RMM:  19    -0.157175715551E+04   -0.24851E-06   -0.27408E-06   657   0.209E-03
   1 F= -.15717572E+04 E0= -.15717572E+04 d E =-.291254-147
• N: between NELMIN and NELM steps in total; the first NELMDL steps are in a fixed potential
• DAV / RMM: minimisation algorithm
• E: total free energy
• dE, d eps: change in total energy and eigenvalues
• ncg: number of evaluations of Ĥ|Ψ⟩
• rms(c): difference in input and output density; oscillations probably indicate convergence problems
• F, E0: total free and zero-broadening (σ → 0) energy
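The column meanings above can be put to work with a short Python sketch that pulls the SCF lines out of OSZICAR-style text and counts how often rms(c) goes up from one step to the next. The function names are hypothetical, and the parser assumes whitespace-separated columns; in real files adjacent numbers can occasionally run together, which this sketch does not handle.

```python
import re

# "<ALG>: <N> <E> <dE> <d eps> <ncg> <rms> [<rms(c)>]"
SCF_LINE = re.compile(r"^\s*([A-Z]{3}):\s*(\d+)\s+(.*)$")


def parse_scf_lines(text):
    """Return one dict per SCF step found in the text."""
    steps = []
    for line in text.splitlines():
        match = SCF_LINE.match(line)
        if match is None:
            continue
        fields = match.group(3).split()
        if len(fields) < 5:
            continue
        steps.append({
            "algo": match.group(1),
            "n": int(match.group(2)),
            "E": float(fields[0]),
            "dE": float(fields[1]),
            "d_eps": float(fields[2]),
            "ncg": int(fields[3]),
            "rms": float(fields[4]),
            # rms(c) is absent while the potential is held fixed
            "rms_c": float(fields[5]) if len(fields) > 5 else None,
        })
    return steps


def rms_c_increases(steps):
    """Count steps where rms(c) rose relative to the previous value."""
    values = [s["rms_c"] for s in steps if s["rms_c"] is not None]
    return sum(1 for a, b in zip(values, values[1:]) if b > a)


# Four lines taken from the example output on this slide
sample = """
DAV:   4    -0.171494085624E+04   -0.18313E+02   -0.18206E+02  1160   0.198E+01
DAV:   5    -0.171553585547E+04   -0.59500E+00   -0.59387E+00  1220   0.331E+00    0.706E+01
RMM:   6    -0.159733114612E+04    0.11820E+03   -0.21124E+02   920   0.147E+01    0.352E+01
RMM:   7    -0.157358217358E+04    0.23749E+02   -0.82778E+01   920   0.937E+00    0.173E+01
"""
steps = parse_scf_lines(sample)
print(len(steps), "steps;", rms_c_increases(steps), "increases in rms(c)")
```

A count that keeps growing as the run proceeds would be one crude signal of the oscillations mentioned above.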
The ALGO tag
• ALGO is the “recommended” tag for selecting the electronic-minimisation algorithm
• Most of the algorithms have “subswitches”, which can be selected using IALGO
• I tend to use one of four ALGOs:
o RMM-DIIS (ALGO = VeryFast): fastest per SCF step, best parallelised, and converges quickly close to a minimum, but can struggle with difficult systems
o Blocked Davidson (ALGO = Normal): slower than RMM-DIIS, but usually stable, although can still struggle with difficult problems (e.g. magnetism, meta-GGAs and hybrids)
o Davidson/RMM-DIIS (ALGO = Fast): uses ALGO = Normal for the “pre-optimisation”, then switches to ALGO = VeryFast; a good default choice
o All-band conjugate gradient (ALGO = All): slow, but very stable; use as a fallback when ALGO = Normal struggles, and for hybrids
Taming TPSS (and other meta-GGAs)
• In my experience, meta-GGAs can sometimes be more difficult to converge than standard GGA functionals (or even hybrids)
!ALGO = Normal | All
!GGA = PS
METAGGA = TPSS | revTPSS | M06L
LASPH = .TRUE.
LMIXTAU = .TRUE.
!ENAUG = MAX(EAUG) * 1.5
!NGXF = <>; NGYF = <>; NGZF = <>;
• ALGO: RMM-DIIS (ALGO = Fast | VeryFast) sometimes struggles
• GGA: don’t forget that (rev)TPSS are based on PBE
• LASPH: aspherical gradient corrections inside PAW spheres
• LMIXTAU: pass kinetic-energy density to the charge-density mixer
• ENAUG, NG*F: may need to increase ENAUG/NG*F if very accurate forces are needed (e.g. phonons)
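The ENAUG = MAX(EAUG) * 1.5 rule can be sketched in Python as below. This assumes each dataset in the POTCAR carries a line of the form "EAUG = <value>"; the function name and the sample text are hypothetical, and the factor of 1.5 is just the starting point from the slide, to be checked by a convergence test.

```python
import re

# Matches "EAUG = 575.892" but not a tag such as ENAUG
EAUG_PATTERN = re.compile(r"\bEAUG\s*=\s*([0-9]*\.?[0-9]+)")


def suggested_enaug(potcar_text, factor=1.5):
    """Return factor * max(EAUG) over all datasets in the text, in eV."""
    values = [float(v) for v in EAUG_PATTERN.findall(potcar_text)]
    if not values:
        raise ValueError("no EAUG entries found")
    return factor * max(values)


# Illustrative text only: 575.892 is the default ENAUG from the ZnS table,
# taken here as the larger EAUG; the second value is invented.
sample = """
   EAUG   =  575.892
   EAUG   =  400.000
"""
enaug = suggested_enaug(sample)
print(f"ENAUG = {enaug:.3f}")
```

In practice the text would be read from the POTCAR file itself, e.g. with open("POTCAR").read().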
Parallelisation
• The newest versions of VASP implement four levels of parallelism:
o k-point parallelism: KPAR
o Band parallelism and data distribution: NCORE and NPAR
o Parallelisation and data distribution over plane-wave coefficients (= FFTs; done over planes along NGZ): LPLANE
o Parallelisation of some linear-algebra operations using ScaLAPACK (notionally set at compile time, but can be controlled using LSCALAPACK)
• Effective parallelisation will…:
o …minimise (relatively slow) communication between MPI processes,…
o …distribute data to reduce memory requirements…
o …and make sure the MPI processes have enough work to keep them busy
Parallelisation: Workload distribution
[Figure: cores split into KPAR k-point groups, NPAR band groups and NGZ FFT groups (?)]
• Workload distribution over KPAR k-point groups, NPAR band groups and NGZ plane-wave coefficient (FFT) groups [not 100% sure how this works…]
Parallelisation: Data distribution
[Figure: data shown against KPAR k-point groups, NPAR band groups and NGZ FFT groups (?)]
• Data distribution over NPAR band groups and NGZ plane-wave coefficient (FFT) groups [also not 100% sure how this works…]
Parallelisation: KPAR
• During a standard DFT calculation, k-points are independent -> k-point parallelism should be linearly scaling, although perhaps not in practice: https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/
• <#cores> must be divisible by KPAR, but the parallelisation is via a “round-robin” algorithm, so <#k-points> does not need to be divisible by KPAR -> check how many irreducible k-points you have (head IBZKPT) and set KPAR accordingly
[Figure: round-robin assignment of three k-points (k1, k2, k3) to k-point groups over rounds R1, R2, R3]
KPAR = 1: t = 3 [OK]
KPAR = 2: t = 2 [Bad]
KPAR = 3: t = 1 [Good]
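The round-robin picture can be reproduced with a few lines of Python. This is a sketch under the simplifying assumption that every k-point costs the same and that each round handles one k-point per group; the function names are hypothetical.

```python
import math


def kpoint_rounds(n_kpoints, kpar):
    """Number of round-robin rounds needed to process all k-points."""
    return math.ceil(n_kpoints / kpar)


def kpar_efficiency(n_kpoints, kpar):
    """Fraction of group-rounds that actually have a k-point to work on."""
    return n_kpoints / (kpoint_rounds(n_kpoints, kpar) * kpar)


def kpar_candidates(n_kpoints, n_cores):
    """KPAR values that divide the core count and do not exceed the
    number of irreducible k-points."""
    return [k for k in range(1, n_kpoints + 1) if n_cores % k == 0]


for kpar in (1, 2, 3):
    print(kpar, kpoint_rounds(3, kpar), kpar_efficiency(3, kpar))
```

For three k-points, KPAR = 2 needs two rounds but leaves one group idle in the second, which is why it is marked as the bad choice.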
NCORE: number of cores in each band group
NPAR: number of bands treated simultaneously
Parallelisation: NCORE and NPAR
NCORE = <#cores> / NPAR
• Why not NCORE = 1, NPAR = <#cores> (the default)? More band groups (probably) increases memory pressure and incurs a substantial communication overhead
[Figure: benchmark plot with annotated speed-ups of 7.08x, 6.41x and 6.32x]
Parallelisation: NCORE and NPAR
• WARNING: VASP will increase the default NBANDS to the next multiple of the number of band groups
• Since the electronic minimisation scales as a power of NBANDS, this can backfire in calculations with a large NPAR (e.g. those requiring NPAR = <#cores>)
Cores   NBANDS (default)   NBANDS (adjusted)
96      455                480
128     455                512
192     455                576
256     455                512
384     455                768
512     455                512
Example system:
• 238 atoms w/ 272 electrons
• Default NBANDS = 455
Default NBANDS formulas:
NBANDS = NELECT/2 + NIONS/2
NBANDS = (3/5) NELECT + NMAG
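The adjusted values in the table can be reproduced with a short Python sketch, assuming one band group per core (NPAR = <#cores>) and that the adjustment is a simple round-up to the next multiple. The function name is hypothetical.

```python
def adjusted_nbands(default_nbands, n_groups):
    """Round NBANDS up to the next multiple of the number of band groups."""
    # Ceiling division written with integer arithmetic only
    return -(-default_nbands // n_groups) * n_groups


for cores in (96, 128, 192, 256, 384, 512):
    print(cores, adjusted_nbands(455, cores))
```

Checking this before submitting a job shows whether a given core count inflates NBANDS badly, as 384 cores does in the example above.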
Parallelisation: Memory
• KPAR: current implementation does not distribute data over k-point groups -> KPAR = N will use N× more memory than KPAR = 1
• NPAR/NCORE: data is distributed over band groups -> decreasing NPAR / increasing NCORE by a factor of N will reduce memory requirements by N×
• NPAR takes precedence over NCORE: if you use “master” INCAR files, make sure you don’t define both
• The defaults for NPAR/NCORE (NPAR = <#cores>, NCORE = 1) are usually a poor choice for both memory and performance
• Band parallelism for hybrid functionals has been supported since VASP 5.3.5; for memory-intensive calculations, it is a good alternative to underpopulating nodes
• LPLANE: distributes data over plane-wave coefficients, and speeds things up by reducing communication during FFTs; the default is LPLANE = .TRUE., and should only need to be changed for massively-parallel architectures (e.g. BG/Q)
Parallelisation: ScaLAPACK
• RMM-DIIS (ALGO = VeryFast | Fast) involves three steps:
o EDDIAG: subspace diagonalisation
o RMM-DIIS: electronic minimisation
o ORTHCH: wavefunction orthogonalisation
Routine    312 atoms        624 atoms        1,248 atoms       1,872 atoms
EDDIAG     2.90 (18.64%)    12.97 (22.24%)   75.26 (26.38%)    208.29 (31.31%)
RMM-DIIS   12.39 (79.63%)   42.73 (73.27%)   187.62 (65.78%)   379.80 (57.10%)
ORTHCH     0.27 (1.74%)     2.62 (4.49%)     22.36 (7.84%)     77.11 (11.59%)
• EDDIAG and ORTHCH formally scale as N^3, and rapidly begin to dominate the SCF cycle time for large calculations
• A good ScaLAPACK library can improve the performance of these routines in massively-parallel calculations
See also: https://www.nsc.liu.se/~pla/blog/2014/01/30/vasp9k/
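As a rough check on the scaling, the timings in the table can be turned into effective exponents between the smallest and largest systems. This Python sketch assumes a simple power law t ∝ N^p between those two points only; the function and variable names are hypothetical, and the result says nothing about the units of the timings, which the table does not give.

```python
import math

atoms = [312, 624, 1248, 1872]
times = {
    "EDDIAG": [2.90, 12.97, 75.26, 208.29],
    "RMM-DIIS": [12.39, 42.73, 187.62, 379.80],
    "ORTHCH": [0.27, 2.62, 22.36, 77.11],
}


def effective_exponent(n_small, t_small, n_large, t_large):
    """Exponent p in t ~ N**p estimated from two (N, t) points."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


exponents = {
    name: effective_exponent(atoms[0], t[0], atoms[-1], t[-1])
    for name, t in times.items()
}
for name, p in exponents.items():
    print(f"{name}: p = {p:.2f}")
```

On these numbers EDDIAG and ORTHCH grow noticeably faster with system size than the RMM-DIIS step, which is consistent with their rising share of the cycle time.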
Parallelisation: My “rules of thumb”
• For x86_64 IB systems (Archer, Balena, Neon…):
o Use KPAR in preference to NPAR
o Set NPAR = (<#nodes>/KPAR) or NCORE = <#cores/node>
o 1 node/band group per 50 atoms; may want to use 2 nodes/50 atoms for hybrids, or decrease to ½ node per band group for <10 atoms
o ALGO = Fast is usually a good choice, except for badly-behaved systems
o Leave LPLANE at the default (.TRUE.)
• For the IBM BG/Q (STFC Hartree):
o The Hartree machine currently uses VASP 5.2.x -> no KPAR
o Try to choose a square number of cores, and set NPAR = sqrt(<#cores>)
o Consider setting LPLANE = .FALSE. if <#cores> ≥ NGZ
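The NPAR/NCORE rules above can be written down as a small Python helper. This is only a sketch of the rules of thumb, not a tuned recommendation; the function names are hypothetical, and the x86 case assumes whole nodes are assigned to each k-point group.

```python
import math


def x86_settings(n_nodes, cores_per_node, kpar=1):
    """NPAR = <#nodes>/KPAR, or equivalently NCORE = <#cores/node>.
    Only one of NPAR and NCORE should go into the INCAR."""
    if n_nodes % kpar != 0:
        raise ValueError("number of nodes must be divisible by KPAR")
    return {"KPAR": kpar, "NPAR": n_nodes // kpar, "NCORE": cores_per_node}


def bgq_npar(n_cores):
    """NPAR = sqrt(<#cores>) for a square number of cores."""
    root = math.isqrt(n_cores)
    if root * root != n_cores:
        raise ValueError("choose a square number of cores")
    return root


print(x86_settings(n_nodes=4, cores_per_node=24, kpar=2))
print(bgq_npar(1024))
```

With 4 nodes of 24 cores and KPAR = 2, each k-point group gets 48 cores, so NPAR = 2 and NCORE = 24 describe the same layout.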