apeNEXT: Computational Challenges and First Physics Results
Florence, 8-10 February, 2007
From Theories to Model Simulations A Computational Challenge
G.C. Rossi
University of Rome Tor Vergata
INFN - Sezione di Tor Vergata
Outline of the talk

1. Introduction
   ● New powerful computational tools
   ● Cross-fertilization of nearby research fields
   ● New challenges and opportunities
2. Applications
   ● Field Theory and Statistical Mechanics
   ● Dynamical Systems
   ● Systems of biological interest
   ● …
3. The case of biological systems
   ● What was done with APE
   ● What we are doing
   ● What one may think of doing
4. A future for a biocomputer?
Introduction

FROM PROGRESSIVE SEPARATION TO OVERLAPPING AREAS OF INTEREST

NATURAL SCIENCE: Mathematics, Physics, Biology

Meteorology, Fluido-dynamics, Metabolic networks, Turbulence, Chaos,
Statistical Physics, Quantum Field Theory, Structure of Macro-molecules, Econo-physics

Similar Mathematical Description and Algorithms
⇒ Cross-fertilization among nearby Research Fields

Dealt with by Numerical Tools: Numerical Simulations on Dedicated Computers
⇒ Advances in Computer Developments
Methods: Stochastic Methods, PDE & Stability Analysis

Wide Spectrum of Applications: Lattice QCD, Computational Astrophysics, Weather Forecasting, Genome Project, Computational Biology

New Architectures: Parallel Platforms, PC-Clusters, GRID

Exponential Increase of CPU Time, Memory, Storing Capacity
(Yet another instance of Zipf's law!)
No. 1: BlueGene/L at DOE's LLNL, with 280 Tflops
Intelligent Exploitation of the available unprecedented Computational Power is driving us:

● to conceive New Problems
● to more Realistic Model systems
● to promising New Computational Strategies

We are entering the era of Intelligent Computing
Examples

1) Lattice QCD
   ● The plaquette expectation value
   ● The η′ mass
   ● Full QCD with Wilson fermions (either twisted or not)
   ● …

2) Dynamical Systems
   ● Weather forecasting
   ● Climate change scenarios
   ● Fluido-dynamics and turbulence
   ● …

3) Systems of Biological interest
   ● Bi-layer models of cell membranes
   ● Heteropolymer models of proteins
   ● Models of bio-macromolecule recognition processes
   ● …
I'm not going to talk about Lattice QCD, nor about Dynamical Systems
Some computational issues in simulating Systems of Biological interest
G. La Penna (CNR) F. Guerrieri (UTV)
V. Minicozzi (UTV) S. Furlan (CNR)
S. Morante (UTV) G. Salina (INFN)
Rather I will try to comment on
What was done (by us) with APE machines
Machines: Cubetto, Torre, APE100, APEmille, APEnext
Systems: Gas/Liquid Butane and H2O, Lipidic Membranes, Diffusive systems
Results: Extensive studies of Phase Diagram, Smooth portability
Methods: MD, MC, HMC, FFT
1. Good use of parallelism
2. Efficient codes
3. Can one move on to more interesting systems?
Model simulations of Phospholipid Bilayers (DMMC, DMPC)

A) 64 (32+32) DMMC Molecules
B) 512 (256+256) DMMC Molecules
C) 36 (18+18) DMPC Molecules + 72 (36+36) H2O Molecules

La Penna, Minicozzi, Morante, Rossi, Salina
Phospholipid Bilayer Phase Diagram

[Figure: phase diagram in the temperature T vs. % water (0-60) plane, with Lα (fluid), Pβ′, Lβ′ (gel) and Lc (solid) regions and their coexistence regions with H2O]

(Superficial) Density: an important physiological parameter
More interesting systems/problems
• 3D structure of Macromolecules (folding and misfolding)
• Antigen – Antibody recognition
• Protein – DNA interaction
• Protein docking and translocation
• …
by ab initio (Q.M.) methods
• Signal transduction
• Metabolic Networks
• …
by PDE’s and stochastic simulations
3D structure of Macromolecules
Functionality of a protein requires correct folding
α-helix, β-sheet, ...
Misfolding leads to malfunctioning, often to severe pathologies
Creutzfeldt-Jakob disease BSE (mad cow) Alzheimer disease Cystic fibrosis ...
Question: Can we tell the 3D-Structure from the linear a.a. sequence? Answer:
Very difficult: it looks like an NP-complete problem!
A test case: Prion Protein PrP (A bit of phenomenology)
PrP is a cell glycoprotein (highly expressed in the central nervous system of many mammals), whose physiological role is still unclear
It is, however, known to selectively bind copper, Cu
Mature PrP has a flexible, disordered, N-terminal (23-120) and a globular C-terminal (121-231)
The N-terminal domain contains four (in humans) copies (repeats) of the octa-peptide PHGGGWGQ, each capable of binding Cu
Cu-octarepeat interaction is cooperative in nature and can possibly have a role in disease related PrP misfolding and aggregation
Experiments more specifically indicate that the Cu binding site is located within the HGGG tetra-peptide
An ideal case for Car-Parrinello ab initio simulations
BoPrP (bovine): α-helices = green, β-strands = cyan, segments with non-regular secondary structure = yellow, flexibly disordered "tail" (23-121) = yellow dots
HuPrP (human): α-helices = orange, β-strands = cyan, segments with non-regular secondary structure = yellow, flexibly disordered "tail" (23-121) = yellow dots
Initial 2x[Cu(+2) HG(-)G(-)G] configuration

[Figure: initial configuration; atom colour legend: Cu(+2), O, N, C, H; box volume V = (15 Å)3]
Furlan, La Penna, Guerrieri, Morante, Rossi, in JBIC (2007)
[Figure: 1.8 ps trajectory @ 300K of the two complexes [HG(-)G(-)G]1,2 with [Cu(+2)]1,2]

Final 2x[Cu(+2) HG(-)G(-)G] configuration

[Figure: final configuration; atom colour legend: Cu(+2), O, N, C, H]

No Cu atoms ⇒ No binding
• Quantum-ESPRESSO (www.pwscf.org), freely available
P. Giannozzi, F. de Angelis, R. Car, J. Chem. Phys. 120, 5903 (2004)
• Ultrasoft (Vanderbilt) pseudopotentials – PBE functional
(Fortran 90 – FFTW – OpenMPI)
Standard Benchmarks for CPMD - I

water: 32 molecules, 256 electrons, cutoff = 70 Ry, 1 g/cm3
PSC Lemieux cluster: QUAD Alpha @ 1 GHz, up to 3000 processors
CPAIMD code, very sophisticated implementation software (Charm++)
S. Kumar, Y. Shi, L.V. Kale, M.E. Tuckerman and G.J. Martyna (2002)

# Processors   Nodes   Time/step [sec]
        32        8        3.45
        64       16        2.41
       128       32        1.24
       256       64        0.89
       384      128        0.64
       768      256        0.51
      1024      512        0.37
      1360      680        0.30
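The scaling behaviour behind these numbers can be made explicit. A minimal sketch (the function name and choice of baseline are ours, not from the talk) that derives speedup and parallel efficiency from the Lemieux time-per-step table:

```python
def speedup_and_efficiency(times, base_procs=None):
    """Given {processors: time_per_step}, return {processors: (speedup, efficiency)}
    relative to the smallest processor count measured (or base_procs)."""
    procs = sorted(times)
    p0 = base_procs if base_procs is not None else procs[0]
    t0 = times[p0]
    out = {}
    for p in procs:
        s = t0 / times[p]        # speedup relative to the p0 run
        e = s / (p / p0)         # efficiency: measured speedup / ideal speedup
        out[p] = (s, e)
    return out

# Time/step [sec] from the Benchmark I table (PSC Lemieux, 70 Ry cutoff)
lemieux = {32: 3.45, 64: 2.41, 128: 1.24, 256: 0.89}

for p, (s, e) in speedup_and_efficiency(lemieux).items():
    print(f"{p:4d} procs: speedup {s:4.2f}, efficiency {e:4.0%}")
```

Efficiency drops below 50% already at 256 processors, the same pattern that motivates the later remark that scaling with the number of nodes is limited.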
Standard Benchmarks for CPMD - II

water: 32 molecules, 256 electrons, cutoff = 100 Ry
IBM BlueGene/L: PowerPC 440 @ 0.7 GHz, up to 1024 processors
CPMD code, very sophisticated implementation software
IBM Res. Lab. (Zurich), M.P.I. Stuttgart
J. Hutter and A. Curioni (2006)

# Processors   Time/step [sec]
        16        6.6
        32        3.35
        64        1.85
       128        0.97
       256        0.64
       512        0.45
Standard Benchmarks for CPMD - III

SiC: 1000 atoms, 4000 electrons, cutoff = 60 Ry
IBM cluster: Power4 @ 1.3 GHz, up to 40 x 32 processors
CPMD code (Fortran 77)
IBM Res. Zurich Lab., M.P.I. Stuttgart
J. Hutter and A. Curioni (2006)

# Processors   MPI   SMP   Time/step [sec]
       512      64     8       99.4
      1024     256     4       71.9
      1024     128     8       56.3
      1232     154     8       52.1
Standard Benchmarks for CPMD - IV

water: 32 molecules, 256 electrons, V = (10 Å)3
cutoff: small = 25 Ry, large = 250 Ry
Fermi cluster: Intel/Pentium IV @ 1.7 GHz, up to 16 processors
ESPRESSO code (Fortran 90)
S. Furlan, F. Guerrieri, G. La Penna, S. Morante and G.C. Rossi (2007)

# Processors   Time/step [sec]
         8       15.2
        14       18.4

not worse than the other big platforms!
CP computational burden (from J. Hutter)
Typical for a long-range problem on a non-optimized communication network
A Few (Tentative) Conclusions
• If one so wishes, APE (like BlueGene/L) can be used for CPMD
• (Small) PC-clusters almost as good as large parallel platforms
efficiency is limited by communication latency
scaling with number of nodes is limited (upper bound)
Question: Do we really need such a massive parallelism?
A(n intermediate a)nswer: a compact, fully connected, cluster
Single-processor performance saturates with the data-set size N:
f(N) ~ f0 (1 - e^(-kN/n_1/2))

Serial FFT time: T_s ~ c_s N log N
P cyclically ordered processors (N/P lumps of data per processor)

P processors working on N/P lumps + non-local communication steps of N/P lumps
Network topology: 1D Fast Fourier Transform
(processors Pr1, Pr2, …, PrP)

Nearest-neighbour comm. ring: P comm. steps of N/P lumps = N data moved
T_P ~ a (N/P) log(N/P) + c N

One-way comm. ring: P^2 comm. steps of N/P lumps = N P data moved
T_P ~ a (N/P) log(N/P) + c N P

Ideal communication network: all-to-all
All-to-all comm.: 1 comm. step of N/P lumps = N/P data moved
T_P ~ a (N/P) log(N/P) + c N/P
• # physical links grows with P2
• Machine linear dimension grows with P
• Very hard (impossible?) to maintain synchronization @ GHz over few 102 meters
• Limited number of processors with 5-10 Tflops peak speed
⇒ a communication network as connected as possible
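The three topology cost estimates above can be compared numerically. This is a toy model only: the compute coefficient c and communication coefficient b are illustrative assumptions, and the function name is ours, not code from the talk.

```python
from math import log2

def fft_time(N, P, topology, c=1.0, b=5.0):
    """Toy cost model for a 1D FFT of N data on P processors (N/P lumps each).
    Compute term: c * (N/P) * log2(N/P); communication term depends on topology:
      'nn_ring'  : P comm. steps of N/P lumps  -> b * N
      'oneway'   : P^2 comm. steps of N/P lumps -> b * N * P
      'alltoall' : 1 comm. step of N/P lumps    -> b * N / P
    c and b are illustrative cost coefficients (assumptions, not measured)."""
    compute = c * (N / P) * log2(N / P)
    comm = {'nn_ring': b * N, 'oneway': b * N * P, 'alltoall': b * N / P}[topology]
    return compute + comm

N, P = 1 << 20, 64
for topo in ('oneway', 'nn_ring', 'alltoall'):
    print(f"{topo:9s}: estimated time/step = {fft_time(N, P, topo):.3e}")
```

Whatever the coefficients, the ordering all-to-all < nearest-neighbour ring < one-way ring holds, which is why the slide argues for the best-connected network one can build.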
Towards a biocomputer…
Two (among many) options
1) A 3x3x3 cube of Cell processors: 27 x 200 Gflops = 5.4 Tflops
   1-dimensional all-to-all comm. in each dimension = 81 links
   (full 3-dimensional all-to-all = 351 links)

2) A 5x5x5 cube of APE(next)2 processors: 125 x 50(?) Gflops = 6.25 Tflops
   1-dimensional all-to-all comm. in each dimension = 750 links
   (full 3-dimensional all-to-all = 7750 links)
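The quoted link counts follow from simple combinatorics: a fully connected line of n processors needs n(n-1)/2 links and an n x n x n cube has 3n^2 such lines, while full all-to-all among all N = n^3 processors needs N(N-1)/2 links. A short check (helper names are ours):

```python
def links_per_dimension(n):
    """All-to-all within each 1D line of an n x n x n cube:
    n*(n-1)//2 links per line, 3*n^2 lines (n^2 per dimension)."""
    return 3 * n * n * (n * (n - 1) // 2)

def links_full(n):
    """Full 3D all-to-all among all n^3 processors."""
    N = n ** 3
    return N * (N - 1) // 2

print(links_per_dimension(3), links_full(3))  # 81 351
print(links_per_dimension(5), links_full(5))  # 750 7750
```

The quadratic growth of links_full with the processor count is the concrete form of the "# physical links grows with P^2" objection raised above.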
[Figure: each processor (x,y,z) is directly linked to x+1…x+4, y+1…y+4, z+1…z+4 along its three axis lines]
1) Cell (3x3x3, or 2x2x2): processor available; networking?
2) APE (5x5x5, or 4x4x4): processor under development; networking know-how available
Conclusions

To attack "biological recognition events":
• we need a factor 10^3 in simulation times (few nanoseconds)
• system size (few 10^3 - 10^4 atoms/electrons) is almost OK
• not too far from target with current technology!

Very exciting perspectives