pNeo

Mark Hereld

PetaFLOP Applications Working Group – January 16, 2004

Motivations

• Simulation of epileptiform activity in the neocortex
– understand and fix seizures
• pNeo
– derived from PGENESIS pNeocortex
– Streamlined
– Compiled
– Customized
– Instrumented and profiled
• Performance model
– Computation
– Communication
– System overhead
• Scaling to a billion cells and millions of steps
– 10K to 100K nodes

Nano Intro Neuro

• Hodgkin-Huxley model
• Simulation layout
– Multiple-compartment cells
– Multiple cell types
– Multiple classes of dense interconnect
– Parallel partitioning
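
For readers new to the Hodgkin-Huxley model, here is a minimal single-compartment sketch in Python. It is not pNeo code: the parameters are the standard textbook squid-axon values, the integrator is plain forward Euler, and the names `rates` and `simulate` are hypothetical. The default step of 0.01 ms is assumed to correspond to the 10 usec dt quoted in the talk.

```python
import math

# Textbook Hodgkin-Huxley squid-axon parameters (assumed, not taken from pNeo).
C_M = 1.0                            # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3    # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.4  # reversal potentials, mV


def _ratio(x, scale):
    """x / (1 - exp(-x / scale)), using the limit value at x = 0."""
    if abs(x) < 1e-7:
        return scale
    return x / (1.0 - math.exp(-x / scale))


def rates(v):
    """Return (alpha, beta) pairs in 1/ms for the m, h and n gates at v (mV)."""
    m = (0.1 * _ratio(v + 40.0, 10.0), 4.0 * math.exp(-(v + 65.0) / 18.0))
    h = (0.07 * math.exp(-(v + 65.0) / 20.0),
         1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0)))
    n = (0.01 * _ratio(v + 55.0, 10.0), 0.125 * math.exp(-(v + 65.0) / 80.0))
    return m, h, n


def simulate(i_ext, t_stop_ms, dt_ms=0.01):
    """Forward-Euler integration; returns the membrane potential trace in mV.

    i_ext is a constant injected current density in uA/cm^2.
    """
    v = -65.0
    gates = [a / (a + b) for a, b in rates(v)]  # start m, h, n at steady state
    trace = [v]
    for _ in range(int(round(t_stop_ms / dt_ms))):
        m, h, n = gates
        i_ion = (G_NA * m ** 3 * h * (v - E_NA)
                 + G_K * n ** 4 * (v - E_K)
                 + G_L * (v - E_L))
        # Update gates and voltage from the same (old) state.
        gates = [x + dt_ms * (a * (1.0 - x) - b * x)
                 for x, (a, b) in zip(gates, rates(v))]
        v += dt_ms * (i_ext - i_ion) / C_M
        trace.append(v)
    return trace
```

Calling `simulate(10.0, 50.0)` drives the compartment with a constant current, while `simulate(0.0, 20.0)` leaves it at rest. A multi-compartment cell would couple several such compartments through axial resistances, which this sketch omits.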

Modeling

[Figure: compartment diagrams for the four cell types. Superficial and deep pyramidal cells are labeled IS, Soma, Na, K, Spike, Ex, Inh; basket and chandelier cells are labeled Soma, Na, K, Spike, Ex.]

Execution time* per step per cell

Cell type               Time
Superficial Pyramidal   9 ms
Deep Pyramidal          12 ms
Basket                  6 ms
Chandelier              6 ms

• Integration of state in each compartment
• Minor housekeeping
• One spike connection

• dt = 0.00001 sec (10 usec)
• 0.1 sec => 10 sec
• 10,000 steps => 1 M steps

* Times measured on a 400 MHz Pentium 2
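
The step counts on this slide follow from dividing simulated time by dt. A small Python sketch of that arithmetic follows. The helper names `step_count` and `serial_cost` are hypothetical, and multiplying cell counts by per-step cost by steps is an assumed first-order estimate, not the performance model from the talk. The result of `serial_cost` is in whatever time unit the per-cell costs are given in.

```python
def step_count(sim_seconds, dt_seconds=1e-5):
    """Number of time steps needed to cover sim_seconds at step size dt_seconds."""
    return round(sim_seconds / dt_seconds)


def serial_cost(cells_by_type, cost_per_step_by_type, n_steps):
    """Assumed first-order serial cost: sum over cell types of
    (number of cells) * (cost per step per cell) * (number of steps)."""
    per_step = sum(count * cost_per_step_by_type[cell_type]
                   for cell_type, count in cells_by_type.items())
    return per_step * n_steps


short_run = step_count(0.1)   # 0.1 sec of simulated time
long_run = step_count(10.0)   # 10 sec of simulated time
```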

Wiring Diagram and Heartbeat

[Figure: wiring diagrams of excitatory connections and inhibitory connections. Detail from a slice of human focal neocortex.]

[Figure: real neural activity alongside the corresponding simulated activity.]

Interconnect and Partitioning

• ~8000 connections per cell (within a factor of a few)
• 30 ns to process a spike event
• Cell grids
– 6 cell types
– 5 µm spacing typ.
– ~10^5 cells / mm^2
• Connection template
– Several conn. types
– Annular with hole
– 500 µm, 5 µm
– 10% probability, e.g.
• Processor partitioning
– Memory limited w/current simulator
– up to ~400 cells per node
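
One way to read the connection template is sketched below in Python. Several things here are assumptions rather than details from the talk: the function name `annular_targets`, the interpretation of 500 and 5 as the outer radius and hole radius in micrometers, a square grid with 5 µm spacing, and an independent 10% draw for every candidate target.

```python
import math
import random


def annular_targets(src, grid_shape, spacing_um=5.0, r_inner_um=5.0,
                    r_outer_um=500.0, prob=0.10, rng=None):
    """Pick target grid positions for one source cell.

    A candidate qualifies if its distance from src is strictly greater than
    r_inner_um (the hole) and at most r_outer_um; each qualifying candidate
    is then kept with probability prob.
    """
    rng = rng or random.Random()
    rows, cols = grid_shape
    si, sj = src
    reach = int(r_outer_um // spacing_um)  # search window in grid units
    targets = []
    for i in range(max(0, si - reach), min(rows, si + reach + 1)):
        for j in range(max(0, sj - reach), min(cols, sj + reach + 1)):
            dist = spacing_um * math.hypot(i - si, j - sj)
            if r_inner_um < dist <= r_outer_um and rng.random() < prob:
                targets.append((i, j))
    return targets


# Example: one source cell in the middle of a 301 x 301 grid.
example = annular_targets((150, 150), (301, 301), rng=random.Random(1))
```

Under these assumptions a single template yields a few thousand targets per source cell, so a handful of connection types would land in the neighborhood of the ~8000 connections per cell quoted on this slide.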

Pseudocode

TIME = 0;
/* TIME LOOP */
do {
    foreach Object in MODEL {
        segment.INIT();
    }
    Synchronize;
    ExchangeData;
    foreach SpikingObject in MODEL {
        if (potential >= threshold)
            foreach SynapticConnection {
                CallEventAction(msg->dst);
            }
    }
    foreach ChannelObject in MODEL {
        foreach ContactPotential {
            Adjust(V);
        }
        integrate;
    }
    foreach CompartmentObject in MODEL {
        foreach ContactPotential {
            Adjust(Vm, Rm);
        }
        integrate;
    }
    step TIME;
} while ( TIME < TOTAL_SIM_TIME )

What I’ve Been Doing

• pNeocortex running on PGENESIS
– aggregate: timing and memory
  • GENESIS and custom instrumentation
– Chiba and Jazz [PVM]
– 256 nodes
– 100K neurons (30K unit cells)
• Serial pNeo (sNeo?)
– component: timing, memory
  • gprof, custom instrumentation
– special configurations
– modeling

Scaling with Problem Size

[Figure: Simulation Time (16x16 Processors). Sim Time [seconds], 10 to 100000, plotted against Number of Cells, 1000 to 100000, on logarithmic axes. Legend entry: N4.]

Memory Model

• Objects
– by class
• Synaptic Connections
– objects
– Messages
• Aggregation
– neuron types
  • Superficial Pyramidal
  • Deep Pyramidal
  • Basket cells
  • Chandelier cells

Execution Time Model

• Model building or loading
• Coarse phases of simulation loop
– Integration
– Spike
– Communication
• Fine-grain model
– Compartment
– Cell
– Spike Event

Projections

• Million Steps
– 0.1 ms
• 10K Nodes
• Spike Rate
– 1E-3 / spikegen / step
• Connectivity
– 12E+3 / spikegen
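
A rough Python sketch of how these projection parameters might combine into a spike-processing estimate. The formula is an assumption, not the performance model from the talk: it supposes one spike generator per cell, cells and spike events spread evenly over nodes, the billion-cell target from the Motivations slide, and the 30 ns per spike event quoted on the Interconnect and Partitioning slide. Integration and communication time are ignored. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    n_cells: float = 1e9        # target problem size (assumed: one spikegen per cell)
    n_nodes: float = 1e4        # 10K nodes
    n_steps: float = 1e6        # million steps
    spike_rate: float = 1e-3    # spikes per spikegen per step
    connectivity: float = 12e3  # connections per spikegen
    t_event: float = 30e-9      # seconds to process one spike event

    def events_per_step_per_node(self):
        """Spike events a node handles per step, assuming an even spread."""
        cells_per_node = self.n_cells / self.n_nodes
        return cells_per_node * self.spike_rate * self.connectivity

    def spike_seconds(self):
        """Total per-node time spent on spike events over the whole run."""
        return self.events_per_step_per_node() * self.t_event * self.n_steps


projection = Projection()
print(f"{projection.events_per_step_per_node():.3g} spike events per step per node")
print(f"{projection.spike_seconds() / 3600:.3g} hours spent on spike events")
```

Changing any field, for example `Projection(n_nodes=1e5)`, shows how the estimate moves across the 10K to 100K node range.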

Optimizations: MEMORY USE

• Save memory by generating connection lists on the fly each time they are needed (seeded algorithm).
• Save memory by compressing connection sublists.
– The large number of connections for a relatively small number of cells (per node) suggests there is a lot of redundancy in the connection patterns or subpatterns.
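
The seeded on-the-fly idea can be illustrated with a short Python sketch. This is an assumed scheme, not pNeo's implementation: each cell's connection list is regenerated from a random number generator seeded by the cell id, so only the seed has to be stored. The name `connections_for`, the seed formula, and the uniform choice of targets are all hypothetical.

```python
import random


def connections_for(cell_id, n_cells, n_conn, base_seed=2004):
    """Regenerate the target list for cell_id instead of storing it.

    The same (cell_id, base_seed) pair always yields the same list.
    Targets are drawn uniformly here, so repeats and self-connections
    are possible; a real template would constrain them.
    """
    rng = random.Random(base_seed * 1_000_003 + cell_id)
    return [rng.randrange(n_cells) for _ in range(n_conn)]


# The list is rebuilt identically every time the cell spikes.
first = connections_for(7, n_cells=1000, n_conn=50)
again = connections_for(7, n_cells=1000, n_conn=50)
```

The cost is recomputation on every spike, which is the memory-versus-time tension between this slide and the execution-time slide.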

Optimizations: EXECUTION TIME

• It looks like the time to process spike events is the dominant contributor.
– Streamlining this would improve execution time for extremely large runs.
– This goal is at odds with the memory-saving methods above: computation (replacing lists) might take more time rather than less to process connection lists.

QUESTIONS

• Why do the timings for S_Pyr only and D_Pyr only not add up to the timing for BOTH?

• Why is there a Tfreebee term to adjust for the very low first spike step in modeltiming runs?

• What's a good way to measure or estimate firing rate so that it can be used in the model?

• Is there a memory leak? Why does memory use increase during the simulation?

pNeo: Next Steps

• Limits
– Memory is limiting current size of simulation per node
– Communication dominates time at present
  • PVM => Ethernet => Slow
– Computation hot spots (?)
• Redemptive tactics
– Lightweight connections
  • Tighten up or compress data structures
  • Construct on the fly?
– Myrinet
  • pNeo => MPI => Fast
– Detailed performance analysis
  • Parallel version