Performance and Scaling Effects of MD Simulations using NAMD 2.7 and 2.8
Grad OS Course Project
Kevin Kastner, Xueheng Hu




Introduction

- Molecular Dynamics (MD)
- MD is extremely computationally intensive
  - Primarily due to the sheer size of the system
  - Large system simulations can potentially take thousands of years on a modern desktop
- NAMD
  - Parallelized simulation tool for MD
  - Most recent release is 2.8

Our course project is mainly about investigating the performance attributes of molecular dynamics simulations.

Molecular dynamics is a virtual simulation that depicts the movements of individual atoms and simple molecules in a given system. MD simulations are usually very computationally intensive, primarily due to the sheer size of the systems being simulated; large system simulations could potentially take thousands of years to complete on a modern desktop.

The simulation tool that we're using is NAMD, and the most recent release is 2.8.

GPCR Simulation Example

Here are some videos of what a molecular dynamics simulation does. (Start left video) On the left is a G-protein coupled receptor, or GPCR, protein in a 10 ns simulation; that is, it shows the amount of movement the actual protein would make in only 10 ns of real time. (Start right video) On the right is the same protein as on the left, except this one also shows all of the surrounding water and lipid atoms that are also being calculated in the MD simulation. It should also be noted that there are more atoms in the protein that are not shown. As all of these atoms are being considered when doing the MD calculations, you can see how it would be quite computationally intensive.

Summary of Work Completed
- Performance comparison: NAMD 2.7 vs 2.8
  - Tested three different systems using each version, comparing the efficiency of each
  - How different sizes/complexities of the systems affect the performance of NAMD
- NAMD scaling analysis
- Force field comparison

Our work contains two main parts. First, we compared the performance of the newest version of NAMD, 2.8, and the previous version, 2.7, on three systems of differing size.

Secondly, we explored how different sizes and complexities of the systems affect the performance of NAMD. We performed a scaling analysis of both versions of NAMD, finding the optimum number of cores as well as the peak performance for systems of varying size. We also performed a force field comparison by running one of the systems in NAMD with the AMBER force field in addition to running it with CHARMM.

Performance Metrics
- Performance Efficiency
- Performance Efficiency per Core
- Normalized Performance Efficiency per Core

(x: core set; base: 12 cores)

Performance Efficiency: the time covering the actual movement of the protein divided by the corresponding time used to complete the simulation of that same amount of movement.

Performance Efficiency per Core: the efficiency of a specific core set divided by the efficiency when the base core set is used.

Normalized Performance Efficiency per Core: the performance ratio divided by the core set ratio.
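Putting the three definitions above into symbols (using the slide's notation of x for the core set and 12 for the base set; the names t_sim and t_wall for the simulated time and the wall-clock time are our own labels, added here only for clarity), one way to write them is:

    E(x) = \frac{t_{\mathrm{sim}}(x)}{t_{\mathrm{wall}}(x)}           % performance efficiency
    R(x) = \frac{E(x)}{E(12)}                                          % performance efficiency per core (relative to the 12-core base set)
    N(x) = \frac{R(x)}{x/12} = \frac{E(x)}{E(12)} \cdot \frac{12}{x}   % normalized performance efficiency per core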

Simulation Systems

(a) Octopamine receptor, a GPCR: 56824 atoms
(b) DHFR-TS fusion protein: 82026 atoms
(c) Ubiquitin: 7051 atoms

These are the three systems that we ran.
(a) The first is an octopamine receptor protein in a lipid membrane, solvated with water and ions, containing about 57000 atoms.
(b) The next is a dihydrofolate reductase-thymidylate synthase fusion protein (DHFR-TS), solvated in water and containing about 80000 atoms.
(c) The last is ubiquitin, solvated in water and containing about 7000 atoms.

The simulation systems shown here were tested on the Kraken high-performance computing cluster, which was discussed by Dr. Timothy Stitt in one of our guest lectures. We tested the simulation efficiency of the two NAMD versions with varying numbers of cores, which I will refer to as core sets. We did 5 runs for each core set, for each version of NAMD, on the three different systems, using the CHARMM force field.
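As an illustration of the bookkeeping this implies (five runs per core set, per NAMD version, per system), the short Python sketch below averages per-run efficiencies and reports a standard deviation for the error bars; the runs dictionary and its numbers are hypothetical placeholders, not measured values from our tests.

    from statistics import mean, stdev

    # Hypothetical per-run performance efficiencies, keyed by
    # (NAMD version, system size in atoms, core set). Real values would come
    # from the timing output of the actual Kraken runs.
    runs = {
        ("2.7", 56824, 300): [0.081, 0.079, 0.080, 0.082, 0.078],
        ("2.8", 56824, 300): [0.076, 0.075, 0.077, 0.074, 0.076],
        ("2.8", 56824, 504): [0.095, 0.097, 0.094, 0.096, 0.095],
    }

    # Average the five runs for each (version, system, core set); the standard
    # deviation is what the error bars on the efficiency plots would show.
    for (version, atoms, cores), eff in sorted(runs.items()):
        print(f"NAMD {version}, {atoms} atoms, {cores} cores: "
              f"mean efficiency {mean(eff):.4f} +/- {stdev(eff):.4f}")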

Results - 57000 Atoms

57000 Atom Efficiency

Here are the performance results for the octopamine receptor system containing about 57000 atoms. Shown here are the average performance efficiencies over all 5 runs, with the corresponding standard deviations displayed as error bars. As can be seen, NAMD 2.7 does the same as or better than version 2.8 up to and including 300 cores, supporting our earlier results. However, something unexpected happened: for the 396-core and higher core sets, NAMD 2.8 did much better than NAMD 2.7. Furthermore, NAMD 2.7's efficiency begins to decline at approximately the same point at which NAMD 2.8's efficiency has its most drastic increase (with the exception of the beginning, of course). We are as of yet uncertain why this occurs. This graph also gives us an approximate optimal number of cores for our 57000 atom system on each version of NAMD, with 2.7's optimum being around 300 cores and 2.8's being around 504.

Results - 57000 Atoms

57000 Atom Efficiency per Core

This chart shows the average estimated efficiency of each core in each core set compared to our baseline metric of 12 cores. Note that 12 cores are used instead of 1 because this is the smallest number of cores that can be used on Kraken. As expected, you see a decrease in the efficiency of each core as the number of cores used increases. This is of course due to the increasing amount of communication needed between the cores to complete the task. In agreement with the previous chart, NAMD 2.7 makes better use of each core in each core set up to and including 300 cores. However, NAMD 2.8 outperforms 2.7 in the 396 through 1008 core sets, which was also shown in the previous chart.
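The trend described here, per-core efficiency falling as the core count grows because more time goes to communication between the cores, can be illustrated with a toy model; the formula and the comm_cost constant below are purely illustrative assumptions and are not taken from the project's measurements.

    # Toy model: useful work scales with the number of cores n, while a
    # communication term also grows with n and eats into the speedup.
    def per_core_efficiency(n: int, base: int = 12, comm_cost: float = 0.002) -> float:
        """Illustrative per-core efficiency of an n-core set relative to the base set."""
        speedup = n / (1.0 + comm_cost * n)
        base_speedup = base / (1.0 + comm_cost * base)
        return (speedup / base_speedup) / (n / base)  # normalized per-core efficiency

    for n in (12, 96, 300, 504, 1008):
        print(f"{n:>5} cores: normalized per-core efficiency {per_core_efficiency(n):.2f}")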

Results - 80000 Atoms

80000 Atom Efficiency

Here are the performance results for the DHFR-TS system containing about 80000 atoms, making it the largest system in our test set. In agreement with our previous results, NAMD 2.7 outperforms for the lower core sets and NAMD 2.8 outperforms at the higher sets (in this case the dividing line appears to be at 192 through 300 cores). Once again, at the higher core sets NAMD 2.8 outperforms 2.7 by a larger margin than NAMD 2.7 ever outperforms 2.8 (as shown in this figure, NAMD 2.7's advantage at the lower core sets is barely noticeable, while 2.8's is readily apparent at the higher core sets). Something worthy of note is the large performance spike that appears at the 1008 core set for NAMD 2.8.

Results - 80000 Atoms

80000 Atom Efficiency per Core

This chart shows the estimated percentage efficiency per core for the 80000 atom system. As expected, the efficiency of each core decreases with each increase in the total number of cores in the set, due to more time being spent communicating between cores. This graph also demonstrates the efficiency reversal between the two NAMD versions, though here it appears to be at the 504 core set. This is because NAMD 2.8 actually outperforms 2.7 slightly at the base set, so even though the two versions have equivalent efficiencies for the 192 and 300 core sets, NAMD 2.8 appears less efficient.

Results - 7000 Atoms

7000 Atom Efficiency

Here are the performance results for the ubiquitin system containing about 7000 atoms, making it the smallest system in our test set. This system did not follow the trend of the other two systems, as NAMD 2.8 did better than or equal to 2.7 for nearly every core set. Even more interesting is that NAMD 2.8 outperforms 2.7 the most in the 96 through 192 core sets, instead of at the higher core sets as had been seen previously. Also note that tests with more than 2016 cores were performed, but they had efficiencies less than or equal to what is shown at their respective 2016 core sets, so they were left out for better visibility of the lower-end core sets.

Results - 7000 Atoms

7000 Atom Efficiency per Core

Shown here is the estimated percentage efficiency per core for the 7000 atom system. As expected, the efficiency of each core decreases with each increase in the total number of cores in the set, due to more time being spent communicating between cores. This chart also demonstrates the large per-core efficiency increase of NAMD 2.8 over NAMD 2.7 in the 96, 120, and 192 core sets.

Results - NAMD Scaling Analysis
- Optimal Number of Cores
- Peak Performance

We mentioned before the possibility of finding an optimum number of cores for each version of NAMD. These graphs were generated by taking the peak performance values for each system and plotting the size of the systems against either the number of cores (indicate left graph) or the efficiency (indicate right graph). Please note that even though the 7000 atom system had a higher efficiency at the 2016 core set for both versions of NAMD, the peak values found in the lower core sets were chosen, as they have a better per-core efficiency and would not be as wasteful of the Kraken service units being allocated. For NAMD 2.7 an apparent optimum number of cores was found, at around 300 cores. However, we were not able to find an optimum number of cores for NAMD 2.8, due to the sudden jump in optimum cores for our largest system.
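The peak-picking step described above amounts to selecting, for each system and NAMD version, the core set with the highest mean efficiency; a minimal Python sketch of that selection is shown below, again with hypothetical placeholder efficiencies rather than our measured data.

    # Hypothetical mean efficiencies keyed by (NAMD version, system size in atoms, core set).
    mean_efficiency = {
        ("2.7", 7051, 96): 0.110, ("2.7", 7051, 192): 0.105,
        ("2.7", 56824, 300): 0.081, ("2.7", 56824, 504): 0.074,
        ("2.8", 56824, 300): 0.076, ("2.8", 56824, 504): 0.096,
        ("2.8", 82026, 504): 0.070, ("2.8", 82026, 1008): 0.078,
    }

    # For each (version, system), the "optimal" core set is the one with peak mean efficiency.
    best = {}
    for (version, atoms, cores), eff in mean_efficiency.items():
        key = (version, atoms)
        if key not in best or eff > best[key][1]:
            best[key] = (cores, eff)

    for (version, atoms), (cores, eff) in sorted(best.items()):
        print(f"NAMD {version}, {atoms} atoms: peak efficiency {eff:.3f} at {cores} cores")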

Peak performances did appear to be found and are shown here. As expected, the smallest system performed the best, and there appears to be a general decrease as the system size increases, though there is a slight increase in performance for NAMD 2.8 from the 57000 atom system to the 80000 atom system.

These results indicate that NAMD 2.8 was optimized for using larger core sets (indicate left graph) on larger systems (indicate right graph).

Results - Force Field Comparison
- NAMD 2.7, 57000 atoms
- NAMD 2.8, 57000 atoms

Force field comparisons were also done to see if the differing types of parameterization used by the force fields had any influence on NAMD's efficiency. The AMBER force field was used on the 57000 atom system, and the performances for both versions of NAMD are displayed here. For both versions of NAMD, performance was nearly identical for both force fields, though there was a slight decrease in AMBER's performance at the 504 core set in NAMD 2.8. Despite this decrease, it appears that choosing different force fields has little bearing on the efficiency of either version of NAMD.

Summary of Results
- Performance Difference
  - 57000 and 80000 atom: NAMD 2.8 was optimized for performance using larger core sets
  - 7000 atom: odd results, two possible reasons:
    - the performance optimization only works for larger simulation systems
    - the performance of either version will start to increase again if given enough cores, and the efficiencies may potentially reverse once again
- NAMD Scaling Analysis
  - Optimal Number of Cores
  - Peak Performance
- Force Field Comparison
  - CHARMM vs AMBER

So in summary, our project had three main parts. First, the Performance Difference (get from slide). Next, for the NAMD Scaling Analysis: we found an apparent optimum number of cores for NAMD 2.7, but could not find one for NAMD 2.8 due to a sudden jump for the 80000 atom system; we also found the peak performance trend in both versions of NAMD, and the surprising result that NAMD 2.8 may be optimized for larger systems. Finally, a Force Field Comparison of CHARMM vs AMBER was done for the 57000 atom system; despite the differing parameters defined by each force field, the efficiencies of both force fields were nearly identical.

Future Work
- More test cases to obtain empirical data for performance boundaries
- Deeper analysis of performance differences
  - System calls
  - Network communications
  - (We need to find out which tools are available on Kraken)

Potential future work that could come from this project would be to include more test cases to better determine the performance scaling and boundaries. This includes more core sets for each of the systems already tested, as well as new systems to test. Something else that could be done would be to find tools that will run on Kraken and do a deeper analysis of what is going on in each of the cores. Some aspects that would potentially be very useful to study are the system calls being made by the cores, as well as the network communications going on between them.

Other tests that could be done would be a CPU vs GPU comparison, as well as testing systems of equal size but differing complexity. **Look at statistical tools that look at MPI**

Questions?

(Start video) So with that, are there any questions?