VASP: Some Accumulated Wisdom

J. M. Skelton

WMD Group Meeting, 21st September 2015



WMD Group Meeting, September 2015 | Slide 2

Convergence: Parameters

• Four key technical parameters in a VASP calculation:

o Basis set: ENCUT and PREC (or, alternatively, NGX, NGY, NGZ)

o k-point sampling: KPOINTS file and SIGMA

o Augmentation grid (for certain types of pseudopotential): ENAUG and PREC (or, alternatively, NGXF, NGYF, NGZF)

o Which space the projection operators are applied in (LREAL)


Convergence: Augmentation grid

• A second, finer mesh is used to represent the charge density near the ion cores: controlled by ENAUG (or PREC + EAUG in the POTCAR files), which determines NG*F


Convergence: ZnS revisited

• For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine) - equivalent to increasing NG*F [but without also increasing NG* as in the QHA-ExC paper, which is evidently unnecessary (!)]


Convergence: ZnS revisited

• For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine) - equivalent to increasing NG*F, but without also increasing NG*, which is wasteful

ENCUT / eV   ENAUG / eV   NG*   NG*F   Noise?   t / min
550          575.892      120   160    ✗        -
650          575.892      128   160    ✗        -
750          575.892      140   160    ✗        -
850          575.892      150   160    ✗        -
550          675.892      120   180    ✓        116
550          775.892      120   192    ✓        108
550          875.892      120   200    ✓        113


The VASP SCF cycle

• The SCF cycle proceeds in two phases:

o The plane-wave coefficients are initialised randomly and “pre-optimised” within a fixed potential given by the superposition of atomic densities (INIWAV, NELMDL)

o The wavefunctions and density are then optimised self-consistently to convergence (EDIFF, NELMIN, NELM)

o If an initial charge density exists (e.g. from a previous SCF or converged CHGCAR/WAVECAR), the first step can be skipped (ISTART, ICHARG)

• To accelerate convergence, the output density from a step N is not fed directly into the next step N+1, but is mixed with the input density (IMIX, INIMIX, MIXPRE, MAXMIX, AMIX, AMIN, AMIX_MAG, BMIX, BMIX_MAG, WC)

• For the mathematically-minded: http://th.fhi-berlin.mpg.de/th/Meetings/DFT-workshop-Berlin2011/presentations/2011-07-14_Marsman_Martijn.pdf
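The idea of mixing input and output densities can be illustrated with a toy fixed-point problem. The sketch below uses plain linear mixing on a single number, which is only the simplest possible scheme and not what VASP does by default (its mixers are more sophisticated). The function names, the toy update rule and the parameter alpha (loosely analogous in spirit to a mixing fraction such as AMIX) are all assumptions made for illustration.

```python
def scf_linear_mixing(update, rho_in, alpha=0.3, tol=1e-8, max_steps=200):
    """Fixed-point iteration rho -> update(rho) with simple linear mixing.

    Returns (converged_rho, number_of_steps). Raises RuntimeError if the
    residual |rho_out - rho_in| has not dropped below tol after max_steps.
    """
    for step in range(1, max_steps + 1):
        rho_out = update(rho_in)
        if abs(rho_out - rho_in) < tol:
            return rho_in, step
        # Feed only a fraction alpha of the new "density" into the next step.
        rho_in = (1.0 - alpha) * rho_in + alpha * rho_out
    raise RuntimeError("SCF loop did not converge")


def toy_update(rho):
    """Hypothetical stand-in for one SCF step; its fixed point is rho = 0.8."""
    return 2.0 - 1.5 * rho


rho, steps = scf_linear_mixing(toy_update, rho_in=0.0, alpha=0.3)
print(f"mixed iteration converged to {rho:.6f} in {steps} steps")
```

With alpha = 1.0 (output fed straight back in), the same toy problem overshoots further on every step and never converges, which is the behaviour that mixing is meant to suppress.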


The VASP SCF cycle

       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.425437171796E+04    0.42544E+04   -0.38613E+05   920   0.178E+03
DAV:   2    -0.114846409831E+04   -0.54028E+04   -0.51653E+04  1130   0.323E+02
DAV:   3    -0.169662738043E+04   -0.54816E+03   -0.53994E+03  1130   0.100E+02
DAV:   4    -0.171494085624E+04   -0.18313E+02   -0.18206E+02  1160   0.198E+01
DAV:   5    -0.171553585547E+04   -0.59500E+00   -0.59387E+00  1220   0.331E+00    0.706E+01
RMM:   6    -0.159733114612E+04    0.11820E+03   -0.21124E+02   920   0.147E+01    0.352E+01
RMM:   7    -0.157358217358E+04    0.23749E+02   -0.82778E+01   920   0.937E+00    0.173E+01
RMM:   8    -0.157195752202E+04    0.16247E+01   -0.10028E+01   922   0.344E+00    0.736E+00
RMM:   9    -0.157170732229E+04    0.25020E+00   -0.24051E+00   920   0.173E+00    0.186E+00
RMM:  10    -0.157170709721E+04    0.22508E-03   -0.17654E-01   932   0.561E-01    0.965E-01
RMM:  11    -0.157173130475E+04   -0.24208E-01   -0.10240E-01   920   0.332E-01    0.466E-01
RMM:  12    -0.157174953342E+04   -0.18229E-01   -0.23004E-02   920   0.198E-01    0.213E-01
RMM:  13    -0.157175624413E+04   -0.67107E-02   -0.12470E-02   920   0.134E-01    0.938E-02
RMM:  14    -0.157175705572E+04   -0.81159E-03   -0.49641E-03   922   0.781E-02    0.577E-02
RMM:  15    -0.157175711576E+04   -0.60039E-04   -0.62130E-04   922   0.302E-02    0.211E-02
RMM:  16    -0.157175714692E+04   -0.31162E-04   -0.18825E-04   932   0.152E-02    0.146E-02
RMM:  17    -0.157175715237E+04   -0.54516E-05   -0.37827E-05   935   0.701E-03    0.564E-03
RMM:  18    -0.157175715526E+04   -0.28845E-05   -0.88070E-06   824   0.340E-03    0.361E-03
RMM:  19    -0.157175715551E+04   -0.24851E-06   -0.27408E-06   657   0.209E-03
   1 F= -.15717572E+04 E0= -.15717572E+04  d E =-.291254-147

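Annotations on the output above:

• N: between NELMIN and NELM steps in total; the first NELMDL steps are in a fixed potential

• DAV / RMM: minimisation algorithm

• E: total free energy

• dE, d eps: change in total energy and eigenvalues

• ncg: number of evaluations of H|Ψ⟩

• rms(c): difference in input and output density; oscillations probably indicate convergence problems

• F, E0: total free and zero-broadening (σ → 0) energy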
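Since oscillations in the input/output density difference are flagged as a warning sign, it can be handy to pull that column out of the SCF log automatically. The following is a minimal sketch, assuming the column layout shown in the output above (algorithm label, step number, then E, dE, d eps, ncg, rms and an optional rms(c)); the function names and the short sample are illustrative only, not part of any VASP tooling.

```python
import re

# One SCF line: algorithm label, step number, then the numeric columns.
ROW = re.compile(r"^\s*(DAV|RMM):\s*(\d+)\s+(.*)$")

SAMPLE = """\
DAV:   4    -0.171494085624E+04   -0.18313E+02   -0.18206E+02  1160   0.198E+01
DAV:   5    -0.171553585547E+04   -0.59500E+00   -0.59387E+00  1220   0.331E+00    0.706E+01
RMM:   6    -0.159733114612E+04    0.11820E+03   -0.21124E+02   920   0.147E+01    0.352E+01
RMM:   7    -0.157358217358E+04    0.23749E+02   -0.82778E+01   920   0.937E+00    0.173E+01
"""


def parse_scf_steps(text):
    """Return one dict per SCF step; rms_c is None while the potential is fixed."""
    steps = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip headers, "F=" summary lines, blank lines
        values = [float(v) for v in match.group(3).split()]
        steps.append({
            "algo": match.group(1),
            "n": int(match.group(2)),
            "E": values[0],
            "dE": values[1],
            "rms_c": values[5] if len(values) > 5 else None,
        })
    return steps


def count_rms_c_increases(steps):
    """Count steps where rms(c) went up relative to the previous step."""
    rms_c = [s["rms_c"] for s in steps if s["rms_c"] is not None]
    return sum(1 for prev, cur in zip(rms_c, rms_c[1:]) if cur > prev)


steps = parse_scf_steps(SAMPLE)
print(len(steps), "steps parsed;", count_rms_c_increases(steps), "increases in rms(c)")
```

A count well above zero late in the run would be a hint to try a more robust ALGO or gentler mixing; a steadily falling rms(c), as in the run above, is what a healthy calculation looks like.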


The ALGO tag

• ALGO is the “recommended” tag for selecting the electronic-minimisation algorithm

• Most of the algorithms have “subswitches”, which can be selected using IALGO

• I tend to use one of four ALGOs:

o RMM-DIIS (ALGO = VeryFast): fastest per SCF step, best parallelised, and converges quickly close to a minimum, but can struggle with difficult systems

o Blocked Davidson (ALGO = Normal): slower than RMM-DIIS, but usually stable, although can still struggle with difficult problems (e.g. magnetism, meta-GGAs and hybrids)

o Davidson/RMM-DIIS (ALGO = Fast): uses ALGO = Normal for the “pre-optimisation”, then switches to ALGO = VeryFast; a good default choice

o All-band conjugate gradient (ALGO = All): slow, but very stable; use as a fallback when ALGO = Normal struggles, and for hybrids


Taming TPSS (and other meta-GGAs)

!ALGO = Normal | All

!GGA = PS
METAGGA = TPSS | revTPSS | M06L

LASPH = .TRUE.
LMIXTAU = .TRUE.

!ENAUG = MAX(EAUG) * 1.5
!NGXF = <>; NGYF = <>; NGZF = <>;

• In my experience, meta-GGAs can sometimes be more difficult to converge than standard GGA functionals (or even hybrids)

• ALGO: RMM-DIIS (ALGO = Fast | VeryFast) sometimes struggles

• GGA: don’t forget - (rev)TPSS are based on PBE

• LASPH: aspherical gradient corrections inside PAW spheres

• LMIXTAU: pass kinetic-energy density to the charge-density mixer

• ENAUG / NG*F: may need to increase ENAUG/NG*F if very accurate forces are needed (e.g. phonons)
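The ENAUG = MAX(EAUG) * 1.5 suggestion can be turned into a small helper. This is a sketch only: it assumes that each pseudopotential block in the POTCAR contains a line of the form "EAUG = <value>", and the sample text below is a hypothetical stand-in (the second value is invented for illustration; 575.892 is the default ENAUG from the ZnS table).

```python
import re


def suggest_enaug(potcar_text, factor=1.5):
    """Return factor * max(EAUG) over all EAUG entries found in the text."""
    eaug = [float(v) for v in re.findall(r"EAUG\s*=\s*([0-9]*\.?[0-9]+)", potcar_text)]
    if not eaug:
        raise ValueError("no EAUG entries found")
    return factor * max(eaug)


# Hypothetical two-species excerpt; only the EAUG lines matter here.
SAMPLE_POTCAR = """\
   EAUG   =  575.892
   EAUG   =  402.436
"""

print(f"ENAUG = {suggest_enaug(SAMPLE_POTCAR):.3f}")
```

For a real calculation the text would be read from the POTCAR file itself, and the factor of 1.5 should be treated as a starting point to converge against rather than a fixed rule.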


Parallelisation

• The newest versions of VASP implement four levels of parallelism:

o k-point parallelism: KPAR

o Band parallelism and data distribution: NCORE and NPAR

o Parallelisation and data distribution over plane-wave coefficients (= FFTs; done over planes along NGZ): LPLANE

o Parallelisation of some linear-algebra operations using ScaLAPACK (notionally set at compile time, but can be controlled using LSCALAPACK)

• Effective parallelisation will…:

o …minimise (relatively slow) communication between MPI processes, …

o …distribute data to reduce memory requirements…

o …and make sure the MPI processes have enough work to keep them busy


Parallelisation: Workload distribution

[Diagram labels: Cores; KPAR k-point groups; NPAR band groups; NGZ FFT groups (?)]

• Workload distribution over KPAR k-point groups, NPAR band groups and NGZ plane-wave coefficient (FFT) groups [not 100% sure how this works…]


Parallelisation: Data distribution

[Diagram labels: Data; KPAR k-point groups; NPAR band groups; NGZ FFT groups (?)]

• Data distribution over NPAR band groups and NGZ plane-wave coefficient (FFT) groups [also not 100% sure how this works…]


Parallelisation: KPAR

• During a standard DFT calculation, k-points are independent -> k-point parallelism should be linearly scaling, although perhaps not in practice: https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/

• <#cores> must be divisible by KPAR, but the parallelisation is via a “round-robin” algorithm, so <#k-points> does not need to be divisible by KPAR -> check how many irreducible k-points you have (head IBZKPT) and set KPAR accordingly

Example with three k-points (k1, k2, k3) processed in rounds R1, R2, R3:

KPAR   R1           R2   R3   t
1      k1           k2   k3   3 [OK]
2      k1, k2       k3   -    2 [Bad]
3      k1, k2, k3   -    -    1 [Good]
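The round-robin picture above reduces to a little arithmetic: the number of rounds is the k-point count divided by KPAR, rounded up, and any slots left empty in the last round are idle k-point groups. A minimal sketch follows, assuming every k-point costs the same amount of time; the function names are made up for illustration.

```python
import math


def kpoint_rounds(n_kpoints, kpar):
    """Number of round-robin rounds needed to process all k-points."""
    return math.ceil(n_kpoints / kpar)


def kpar_efficiency(n_kpoints, kpar):
    """Fraction of k-point-group slots doing useful work (1.0 = no idle groups)."""
    return n_kpoints / (kpoint_rounds(n_kpoints, kpar) * kpar)


def good_kpar_values(n_kpoints, n_cores):
    """KPAR values that divide both the core count and the k-point count."""
    return [k for k in range(1, min(n_kpoints, n_cores) + 1)
            if n_cores % k == 0 and n_kpoints % k == 0]


for kpar in (1, 2, 3):
    print(kpar, kpoint_rounds(3, kpar), kpar_efficiency(3, kpar))
```

For the three-k-point example, KPAR = 2 takes two rounds but leaves one group idle in the second, which is why it is marked [Bad]: it uses twice the cores of KPAR = 1 for a 1.5x reduction in rounds.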

Parallelisation: NCORE and NPAR

NCORE: number of cores per band group

NPAR: number of bands treated simultaneously

NCORE = <#cores> / NPAR

• Why not NCORE = 1, NPAR = <#cores> (the default)? - more band groups (probably) increase memory pressure and incur a substantial communication overhead

[Figure: plot annotated with 7.08x, 6.41x and 6.32x]


Parallelisation: NCORE and NPAR

• WARNING: VASP will round the default NBANDS up to the nearest multiple of the number of groups

• Since the electronic minimisation scales as a power of NBANDS, this can backfire in calculations with a large NPAR (e.g. those requiring NPAR = <#cores>)

Cores   NBANDS (default)   NBANDS (adjusted)
96      455                480
128     455                512
192     455                576
256     455                512
384     455                768
512     455                512

NBANDS = NELECT/2 + NIONS/2

NBANDS = (3/5) NELECT + NMAG

Example system:

• 238 atoms w/ 272 electrons

• Default NBANDS = 455
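The adjusted values in the table are consistent with rounding the default NBANDS up to the next multiple of the number of band groups, taking NPAR = <#cores>. The sketch below reproduces that arithmetic; the function name is made up, and the rounding rule is inferred from the table rather than taken from the VASP source.

```python
def adjusted_nbands(default_nbands, n_groups):
    """Round default_nbands up to the next multiple of n_groups."""
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-default_nbands // n_groups) * n_groups


for cores in (96, 128, 192, 256, 384, 512):
    nbands = adjusted_nbands(455, cores)
    print(f"{cores:4d} cores: NBANDS {455} -> {nbands} (+{nbands - 455} bands)")
```

The jump at 384 cores (455 to 768 bands) shows how an unlucky core count can add far more bands than the calculation needs.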


Parallelisation: Memory

• KPAR: current implementation does not distribute data over k-point groups -> KPAR = N will use N x more memory than KPAR = 1

• NPAR/NCORE: data is distributed over band groups -> decreasing NPAR/increasing NCORE by a factor of N will reduce memory requirements by N x

• NPAR takes precedence over NCORE - if you use “master” INCAR files, make sure you don’t define both

• The defaults for NPAR/NCORE (NPAR = <#cores>, NCORE = 1) are usually a poor choice for both memory and performance

• Band parallelism for hybrid functionals has been supported since VASP 5.3.5; for memory-intensive calculations, it is a good alternative to underpopulating nodes

• LPLANE: distributes data over plane-wave coefficients, and speeds things up by reducing communication during FFTs - the default is LPLANE = .TRUE., and should only need to be changed for massively-parallel architectures (e.g. BG/Q)


Parallelisation: ScaLAPACK

• RMM-DIIS (ALGO = VeryFast | Fast) involves three steps:

o EDDIAG: subspace diagonalisation

o RMM-DIIS: electronic minimisation

o ORTHCH: wavefunction orthogonalisation

Routine    312 atoms        624 atoms        1,248 atoms       1,872 atoms
EDDIAG     2.90 (18.64%)    12.97 (22.24%)   75.26 (26.38%)    208.29 (31.31%)
RMM-DIIS   12.39 (79.63%)   42.73 (73.27%)   187.62 (65.78%)   379.80 (57.10%)
ORTHCH     0.27 (1.74%)     2.62 (4.49%)     22.36 (7.84%)     77.11 (11.59%)

• EDDIAG and ORTHCH formally scale as N^3, and rapidly begin to dominate the SCF cycle time for large calculations

• A good ScaLAPACK library can improve the performance of these routines in massively-parallel calculations

See also: https://www.nsc.liu.se/~pla/blog/2014/01/30/vasp9k/
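One way to read the timing table is to estimate an effective scaling exponent p, assuming t ∝ N^p between the smallest and largest systems. The sketch below does this for each routine; only the ratios matter, so the (unstated) time unit is irrelevant. The names are illustrative, and a two-point fit like this is a rough guide rather than a proper scaling analysis.

```python
import math

ATOMS = (312, 624, 1248, 1872)
TIMINGS = {
    "EDDIAG": (2.90, 12.97, 75.26, 208.29),
    "RMM-DIIS": (12.39, 42.73, 187.62, 379.80),
    "ORTHCH": (0.27, 2.62, 22.36, 77.11),
}


def scaling_exponent(n_small, t_small, n_large, t_large):
    """Exponent p such that t_large / t_small = (n_large / n_small) ** p."""
    return math.log(t_large / t_small) / math.log(n_large / n_small)


exponents = {
    routine: scaling_exponent(ATOMS[0], times[0], ATOMS[-1], times[-1])
    for routine, times in TIMINGS.items()
}
for routine, p in exponents.items():
    print(f"{routine:9s} effective exponent ~ {p:.2f}")
```

For these timings the end-to-end exponents come out at roughly 2.4 (EDDIAG), 1.9 (RMM-DIIS) and 3.2 (ORTHCH), which matches the trend in the percentages: the two roughly cubic routines take a growing share of the cycle as the system gets larger.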


Parallelisation: My “rules of thumb”

• For x86_64 IB systems (Archer, Balena, Neon…):

o Use KPAR in preference to NPAR

o Set NPAR = (<#nodes>/KPAR) or NCORE = <#cores/node>

o 1 node/band group per 50 atoms; may want to use 2 nodes/50 atoms for hybrids, or decrease to ½ node per band group for <10 atoms

o ALGO = Fast is usually a good choice, except for badly-behaved systems

o Leave LPLANE at the default (.TRUE.)

• For the IBM BG/Q (STFC Hartree):

o The Hartree machine currently uses VASP 5.2.x -> no KPAR

o Try to choose a square number of cores, and set NPAR = sqrt(<#cores>)

o Consider setting LPLANE = .FALSE. if <#cores> ≥ NGZ
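Two of the rules above are simple enough to encode directly. The helpers below are a sketch with made-up names: the first returns the NPAR and NCORE values implied by the x86_64 rule (assuming, for simplicity, that the node count divides evenly by KPAR), and the second applies the BG/Q square-root rule.

```python
import math


def x86_band_settings(n_nodes, cores_per_node, kpar=1):
    """NPAR = <#nodes>/KPAR and the equivalent NCORE = <#cores/node>."""
    if n_nodes % kpar != 0:
        raise ValueError("n_nodes must be divisible by KPAR for this helper")
    return {"KPAR": kpar, "NPAR": n_nodes // kpar, "NCORE": cores_per_node}


def bgq_npar(n_cores):
    """NPAR = sqrt(<#cores>); only defined for a square number of cores."""
    root = math.isqrt(n_cores)
    if root * root != n_cores:
        raise ValueError("choose a square number of cores")
    return root


print(x86_band_settings(n_nodes=4, cores_per_node=24, kpar=2))
print(bgq_npar(256))
```

The dictionary reports NPAR and NCORE as two equivalent ways of expressing the same layout; only one of them should go into the INCAR, since NPAR takes precedence if both are set.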