Hank’s Activities
Longhorn/XD AHM
Austin, TX, December 20, 2010
Volume rendering of 4608^3 combustion data set
Image credit: Mark Howison
Volume rendering of flame data set using VisIt + IceT on Longhorn. Image credit: Tom Fogal
My perception of my role in Longhorn/XD
Help users succeed via:
- Direct support
- Ensuring necessary algorithms/functionality are in place
- Researching the most effective way to utilize Longhorn
- Also helping test the machine through aggressive usage
Collaborate with / facilitate for other project members
Provide visibility for the center externally (outreach, etc.)
Outline
- Researching how to best use Longhorn
  - HW-accelerated volume rendering on Longhorn
  - SW ray casting on Longhorn
- Collaborations
  - Manta/VisIt
  - VDF/VisIt
- User support
  - Analysis of 4K^3 turbulent data
  - Connected components algorithms
  - Other user support
- Outreach
HW-accelerated volume rendering on Longhorn
“Large Data Visualization on Distributed Memory Multi-GPU Clusters”, HPG 2010
Authors: Fogal, Childs, Shankar, Krueger, Bergeron, and Hatcher
- Ran VisIt + IceT on Longhorn, varying data size and number of GPUs
- Staged data on the CPU, then transferred it to the GPU (high transfer time, but can look at bigger data sets; see the sketch below)
Volume rendering of flame data set using VisIt + IceT on Longhorn. Image credit: Tom Fogal
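A minimal sketch of the staging pattern above, with CuPy standing in for the renderer's CUDA side; the brick size and variable names are illustrative, not from the study:

    # Sketch: stage a brick in host memory, then pay the host-to-device
    # transfer cost. This is the "high transfer time" that buys the
    # ability to look at data sets larger than GPU memory alone allows.
    import time
    import numpy as np
    import cupy as cp  # assumes a CUDA-capable node with CuPy installed

    brick = np.random.rand(512, 512, 512).astype(np.float32)  # ~512 MB, staged on CPU

    t0 = time.perf_counter()
    d_brick = cp.asarray(brick)          # host -> device copy
    cp.cuda.Stream.null.synchronize()    # wait for the copy to finish
    print(f"host->device transfer: {time.perf_counter() - t0:.3f} s")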
HW-accelerated volume rendering on Longhorn
Observation about CPU volume rendering:
                   Number of cores
                   Large    Small
    Ray evaluation Fast     Slow
    Compositing    Slow     Fast
Paper purpose: study the performance characteristics of GPU volume rendering at high concurrency on big data.
Idea: GPU volume rendering has the computational horsepower to do ray evaluation quickly, but will have many fewer MPI participants (see the sketch below).
(Figure: tradeoff space: big data, lots of GPUs, fast-ish on small data)
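A minimal sketch of the two phases, assuming mpi4py and numpy: each rank ray-casts its own brick into a partial RGBA image, and a naive gather-to-root composite stands in for IceT's compositing. The toy "render" and image size are illustrative:

    # Two-phase parallel volume rendering sketch (run under mpirun with
    # mpi4py). Phase 1 is embarrassingly parallel ray evaluation; phase 2
    # is compositing, whose cost grows with the number of MPI participants.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    H = W = 256
    # Phase 1: ray evaluation. Toy stand-in: each rank "renders" its brick
    # into a semi-transparent RGBA image.
    rgba = np.zeros((H, W, 4), np.float32)
    rgba[..., rank % 3] = 0.5   # fake color contribution from this brick
    rgba[..., 3] = 0.3          # fake opacity

    # Phase 2: compositing. Real systems order bricks by view depth and
    # use algorithms like direct-send; gather-to-root keeps this short.
    images = comm.gather(rgba, root=0)
    if rank == 0:
        final = np.zeros((H, W, 4), np.float32)
        for img in images:                    # assumed front-to-back order
            a = final[..., 3:4]
            final[..., :3] += (1 - a) * img[..., 3:4] * img[..., :3]
            final[..., 3:4] = a + (1 - a) * img[..., 3:4]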
Software ray-casting
Previous work (not XD-related): “MPI-Hybrid Parallelism for Volume Rendering on Large Multi-Core Systems”, EGPGV 2010
Authors: Howison, Bethel, and Childs
- Strong scaling study up to 216,000 cores on the ORNL Jaguar machine, looking at 4608^3 data
- Study outcome: hybrid parallelism benefits this algorithm, primarily during the compositing phase, since there are fewer participants in MPI communication (a worked example follows)
- One of two EGPGV best paper winners; invited to write a follow-on article for TVCG
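A back-of-the-envelope illustration of the "fewer participants" point, under the simplifying (assumed) model that compositing exchanges on the order of n*(n-1) messages among n participants; the threads-per-rank figure is hypothetical:

    # Toy message-count model for compositing among n MPI participants.
    def composite_messages(n):
        return n * (n - 1)

    cores = 216_000                  # Jaguar strong-scaling study size
    threads_per_rank = 6             # hypothetical hybrid configuration
    pure_mpi = composite_messages(cores)
    hybrid = composite_messages(cores // threads_per_rank)
    print(f"pure MPI: {pure_mpi:.2e} messages, hybrid: {hybrid:.2e} messages")
    # Hybrid cuts compositing traffic by ~threads_per_rank^2 (36x here),
    # which is why the benefit shows up primarily in the compositing phase.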
Volume rendering of combustion data set
Image credit: Mark Howison
Software ray-casting
TVCG article (unpublished research) adds:
- Weak scaling study (up to 22K^3) on Jaguar
- GPU scaling study on Longhorn
GPU scaling study:
- Went up to 448 GPUs
- Purpose: similar to the Fogal work, but with a different spin … show that hybrid parallelism is beneficial. Instead of pthreads or OpenMP on the CPU, we are now using CUDA on the GPU.
Scaling results on GPU
(Figure: scaling curves for the 2308^3 and 2308^3-4608^3 data sets)
Software ray-casting on Longhorn
Two caveats:
(1) We didn’t optimize for CUDA, so we could have had favorable numbers to an even higher concurrency level.
(2) But Jaguar @ 46K processors has more memory and can look at way bigger data sets.
Takeaway: for this algorithm and this data size, Longhorn is as powerful as 46K processors of Jaguar.
Manta/VisIt
- Carson Brownlee delivers integration of VisIt and Manta via vtkManta objects.
- Hank does some small work: updates the work from VisIt 2.0 to VisIt 2.2 and makes a branch for Hank and Carson to put fixes on.
- Testing: Carson and Hank create a list of issues and are in the process of tracking them down.
Rendering of isosurface by VisIt using Manta
Visualizing and Analyzing Large-Scale Turbulent Flow
Detect, track, classify, and visualize features in large-scale turbulent flow.
Analysis effort by Kelly Gaither (TACC), Hank Childs (LBNL), & Cyrus Harrison (LLNL).
Stresses two algorithms that are difficult in a distributed memory parallel setting:
1. Can we identify connected components?
2. Can we characterize their shape?
VisIt calculated connected components on a 4K^3 turbulence data set in parallel using TACC's Longhorn machine. 2 million components were initially identified, and then the map expression was used to select only the components that had a total volume greater than 15. Data courtesy of P.K. Yeung and Diego Donzis.
Identifying connected components in parallel is difficult:
- Hard to do efficiently
- Tremendous bookkeeping problem
A 4-stage algorithm finds local connectivity and then merges globally (a sketch follows below).
Participating in a 2011 EGPGV submission describing this algorithm and its performance. Authors: Harrison, Childs, Gaither.
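A minimal sketch of the local-then-global idea (not the submission's exact four stages): label cells within each block using union-find, then merge labels across block boundaries. The 1D domain and block layout are illustrative:

    # Sketch: per-block component labeling, then a global merge across the
    # block boundary via union-find. 1D keeps the bookkeeping visible; the
    # real algorithm works on distributed 3D meshes.
    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(parent, a, b):
        parent[find(parent, a)] = find(parent, b)

    mask = [1, 1, 0, 1, 1, 1,   1, 0, 0, 1, 1, 0]  # two "blocks" of 6 cells
    parent = list(range(len(mask)))

    # Local stage: connectivity within each block only.
    for block_start in (0, 6):
        for i in range(block_start + 1, block_start + 6):
            if mask[i] and mask[i - 1]:
                union(parent, i, i - 1)

    # Global stage: resolve labels that span the block boundary.
    if mask[6] and mask[5]:
        union(parent, 6, 5)

    labels = {find(parent, i) for i, m in enumerate(mask) if m}
    print(f"{len(labels)} connected components")  # 3 for this mask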
We used shape characterization to assist our feature tracking.
Shape characterization metric: chord length distribution. Difficult to perform efficiently in a distributed memory setting.
(Figure: line scan filter spanning processors P0-P3, feeding a line scan analysis sink)
The line scan filter proceeds in five stages (a sketch follows):
1) Choose lines
2) Calculate intersections
3) Segment redistribution
4) Analyze lines
5) Collect results
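A minimal serial sketch of the measurement itself (the distributed filter above parallelizes these stages): cast scan lines through a binary feature mask and histogram the lengths of the in-feature segments. The 2D mask and axis-aligned lines are simplifications:

    # Sketch: chord length distribution of a binary 2D mask, using
    # axis-aligned scan lines (the real filter uses random lines in 3D
    # and redistributes segments across processors).
    import numpy as np

    mask = np.zeros((64, 64), bool)
    mask[16:40, 8:56] = True      # hypothetical feature

    chords = []
    for row in mask:              # stages 1-2: choose lines, find hits
        inside = np.flatnonzero(row)
        if inside.size == 0:
            continue
        # stage 4: split hits into contiguous runs; record run lengths
        breaks = np.flatnonzero(np.diff(inside) > 1)
        for run in np.split(inside, breaks + 1):
            chords.append(len(run))

    hist, edges = np.histogram(chords, bins=8)  # stage 5: collect results
    print(hist, edges)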
It is our hope that chord length distributions, a characteristic function, can assist in tracking component behavior over time.
My role in this effort
Easily summarized: “use VisIt to get results to Kelly”
Several iterations (a hedged scripting sketch follows):
- Started with just statistics of components
- Looked at how variation in isovalue affected the statistics
- Added in chord length distributions as a characteristic function
- Took still images of each component for visual inspection
- (recently) Extracted each component as its own surface for combined inspection
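A hedged sketch of the kind of VisIt CLI (Python) session behind these iterations. The expression and query names below ("conn_components", "Connected Components Summary") are recalled from VisIt's documentation and may differ by version; the file path, mesh, and variable names are hypothetical:

    # Run inside VisIt's CLI (visit -cli -s script.py), where these
    # functions are built in. All names below are assumptions.
    OpenDatabase("localhost:/path/to/turbulence.bov")

    # Label each connected component of the (hypothetical) mesh.
    DefineScalarExpression("comps", "conn_components(mesh)")

    AddPlot("Pseudocolor", "comps")
    DrawPlots()

    # Per-component statistics (e.g., volume); downstream, a map-style
    # expression can keep only components with total volume > 15.
    Query("Connected Components Summary")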
VDF/VisIt
- John Clyne and Dan Lagreca add a VDF reader to VisIt.
- Hank performs some testing and debugging.
- Still lots to do:
  - Formal commit to the VisIt repo; also add in the new VisIt multi-res hooks.
  - Study how well large features are preserved across refinement levels.
  - Use the coarsest versions in conjunction with analysis code from Janine Bennett.
Other user support
Small amount of effort helping Saju Varghese and Kentaro Nagmine of UNLV:
- Fixed a VisIt bug with ray-casting + point meshes
- Helped them format their data into BOV format (example below)
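For reference, a minimal sketch of BOV ("Brick of Values") conversion: one flat binary file plus a small text header. The header keys follow VisIt's BOV reader as I recall them; the dimensions, file names, and variable are hypothetical:

    # Write a raw float32 brick plus the .bov header VisIt reads.
    import numpy as np

    nx = ny = nz = 64
    field = np.random.rand(nz, ny, nx).astype(np.float32)  # placeholder data
    field.tofile("density.values")                          # raw binary payload

    header = (
        "TIME: 0.0\n"
        "DATA_FILE: density.values\n"
        f"DATA_SIZE: {nx} {ny} {nz}\n"
        "DATA_FORMAT: FLOAT\n"
        "VARIABLE: density\n"
        "DATA_ENDIAN: LITTLE\n"
        "CENTERING: zonal\n"
        "BRICK_ORIGIN: 0.0 0.0 0.0\n"
        "BRICK_SIZE: 1.0 1.0 1.0\n"
    )
    with open("density.bov", "w") as f:
        f.write(header)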
Outreach & Service
VisIt tutorials:
- SC10 (beginning and advanced), Nov 2010, NOLA
- Users at US ARL, Sep 2010, Aberdeen, MD
- SciDAC 2010, July 2010, Chattanooga, TN
Speaker at NSF Extreme Scale I/O and Data Analysis Workshop, March 2010, Austin, TX
Participant in NSF Workshop on SW Development Environments, Sep 2010, Washington DC
Given ~10 additional talks at various venues this year
Proposed Future Plans
- Continue collaboration with Kelly on analyzing turbulent flow
- Formally integrate VDF
- Multi-res study with John & Kelly; would like to do 1T cell runs on Longhorn
- Continued user support, esp. CIG
- Connected components @ EGPGV
- VisIt + GPU
Two trillion cell data set, rendered in VisIt by David Pugmire on the ORNL Jaguar machine
Summary
- Researching how to best use Longhorn
  - HW-accelerated volume rendering on Longhorn
  - SW ray casting on Longhorn
- Collaborating with other Longhorn/XD members: Manta/VisIt, VDF/VisIt
- Doing user support
  - Helping Kelly analyze 4K^3 turbulent data
  - Working to make sure the connected components algorithm is up to snuff
  - Some user support done, and more to come…
- Performing outreach activities