Data-Intensive Science: Addressing common needs with shared tools
Christopher Stubbs
Professor
Department of Physics
Department of Astronomy
[email protected]@fas.harvard.edu
Storing, analyzing, and exploiting large data sets
• Searching for dark matter and dark energy
• Searching for new elementary particles
• Detailed imaging of brain function
Some common threads
• Ambitious instruments, copious data
  • E.g. tens of TB per night from imminent astronomy surveys
• Loosely coupled computing
  • Don’t need linked analysis that uses all images
• Diverse applications from common data
• Simulations are an integral aspect
• Build apparatus here, run it elsewhere
• International collaborations
• Computer science aspects
  • World’s largest non-proprietary databases
  • Clustering, data mining, file system optimization…
27 km
CERN, outside Geneva
Seriously Big Toys.
Harvard involvement in ATLAS detector:
• J. DaCosta and G. Brandenberg at CERN now, in shakedown
• Built muon chambers here
• J. Huth plays leadership role in scientific computing for LHC
Event Simulations
>30 million event simulations are typical
• Pick an interaction
• Propagate through model of the detector
• Measure detection efficiencies
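The three simulation steps can be illustrated with a toy Monte Carlo. This is a minimal sketch, not the ATLAS simulation chain: the event model, the acceptance cuts, and the 95% hardware efficiency in `detector_accepts` are all invented for illustration, and the function names are hypothetical.

```python
import math
import random


def simulate_event(rng):
    """Pick an interaction: a toy particle with a random polar angle
    (radians) and an exponentially distributed energy (arbitrary units)."""
    return {
        "theta": rng.uniform(0.0, math.pi),
        "energy": rng.expovariate(1.0 / 50.0),  # mean energy of 50 (assumed)
    }


def detector_accepts(event, rng):
    """Propagate through a toy detector model (all numbers are assumptions):
    blind within 0.2 rad of the beam axis, an energy threshold of 10,
    and a 95% chance that the hardware registers the particle."""
    in_acceptance = 0.2 < event["theta"] < math.pi - 0.2
    above_threshold = event["energy"] > 10.0
    fired = rng.random() < 0.95
    return in_acceptance and above_threshold and fired


def detection_efficiency(n_events, seed=0):
    """Measure the detection efficiency as detected / simulated,
    with the binomial standard error on that fraction."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_events):
        if detector_accepts(simulate_event(rng), rng):
            detected += 1
    eff = detected / n_events
    err = math.sqrt(eff * (1.0 - eff) / n_events)
    return eff, err


if __name__ == "__main__":
    eff, err = detection_efficiency(100_000)
    print(f"efficiency = {eff:.4f} +/- {err:.4f}")
```

The statistical error shrinks as one over the square root of the number of simulated events, which is one reason tens of millions of events are simulated.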
On-the-fly event reconstruction
• Find tracks and trigger/store if interesting
• Precise track determination
• Aggregate event statistics
ATLAS computing
• 5 million lines of code
• 200 developers, worldwide
• 200 collision events per second
• Automated event selection in firmware
• Selected subset of events to disk
• These selected events distributed worldwide to a hierarchy of data centers.
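The idea of automated event selection can be sketched in software, although the slides say the real selection runs in firmware. In this illustration the event layout (`track_pt`), the threshold value, and the function names are hypothetical and are not the ATLAS trigger logic.

```python
def passes_trigger(event, pt_threshold=20.0):
    """Hypothetical rule: keep an event if at least one track exceeds
    a transverse-momentum threshold (units and value are assumptions)."""
    return any(pt > pt_threshold for pt in event["track_pt"])


def select_events(stream, pt_threshold=20.0):
    """Yield only the events that pass the trigger. Working as a generator
    means the full event stream never has to be held in memory."""
    for event in stream:
        if passes_trigger(event, pt_threshold):
            yield event


if __name__ == "__main__":
    # A tiny made-up stream of events.
    stream = [
        {"id": 1, "track_pt": [3.1, 7.4]},
        {"id": 2, "track_pt": [25.0, 4.2]},
        {"id": 3, "track_pt": []},
        {"id": 4, "track_pt": [19.9, 41.3]},
    ]
    kept = [event["id"] for event in select_events(stream)]
    print("events written to disk:", kept)
```

Only the selected subset would go on to disk and then out to the data centers.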
Sky Surveys in Astronomy
Optical: PanSTARRS
• 1.4 Gpix, 1.8 m
Radio: Mileura Wide-Field Array
• 1 km array of 8000 custom antennas
• 128 gigabit/s computing challenge
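To put the 128 gigabit/s figure in more familiar units, the short calculation below converts it to bytes per second and per day. It assumes decimal prefixes (1 gigabit = 10^9 bits) and, for the daily total, that the rate is sustained for a full 24 hours. Both are assumptions made for illustration.

```python
BITS_PER_GIGABIT = 1e9   # decimal prefix (assumption)
SECONDS_PER_DAY = 86_400


def bytes_per_second(gigabits_per_second):
    """Convert a data rate in gigabit/s to bytes/s."""
    return gigabits_per_second * BITS_PER_GIGABIT / 8


def bytes_per_day(gigabits_per_second):
    """Total bytes in a day if the rate is sustained for 24 hours (assumption)."""
    return bytes_per_second(gigabits_per_second) * SECONDS_PER_DAY


if __name__ == "__main__":
    rate = 128  # gigabit/s, from the slide
    print(f"{bytes_per_second(rate) / 1e9:.0f} GB/s")
    print(f"{bytes_per_day(rate) / 1e15:.2f} PB/day")
```

Under those assumptions the rate works out to 16 GB/s, or roughly 1.4 PB per day.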
Our View of the Expanding Universe
Close, Recent / Far, Ancient
Expansion causes stretching of light, “redshift”
Expansion history can be mapped by measuring both distances and redshifts
(Hubble Space Telescope, NASA)
Supernovae are powerful cosmological probes
Distances to ~6% from brightness
Redshifts from features in spectra
Redshift = Δλ / λ
[Figure: distance to supernova (nearby to far away) versus redshift Δλ/λ (0.01 to 1.0). Schmidt et al., High-z SN Team]
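The redshift definition above can be turned into a one-line calculation. In this sketch, λ is taken to be the rest (emitted) wavelength of a spectral feature and Δλ the difference between the observed and rest wavelengths. The function name and the example wavelengths are illustrative, not values from the talk.

```python
def redshift(observed_wavelength, rest_wavelength):
    """Redshift z = (observed - rest) / rest, i.e. delta-lambda over lambda.
    Both wavelengths must be in the same units."""
    if rest_wavelength <= 0:
        raise ValueError("rest wavelength must be positive")
    return (observed_wavelength - rest_wavelength) / rest_wavelength


if __name__ == "__main__":
    # Illustrative example: a line with rest wavelength 656.28 nm
    # observed at 984.42 nm has been stretched by 50 percent.
    z = redshift(984.42, 656.28)
    print(f"z = {z:.3f}")
```

A redshift of 1.0, the right-hand edge of the plot, corresponds to light whose wavelength has doubled.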
Near Earth Asteroids
• Inventory of solar system is incomplete
• R=1 km asteroids are dinosaur killers
• R=300 m asteroids in ocean wipe out a coastline
• Demanding project: requires mapping the sky down to 24th magnitude every few days, individual exposures not to exceed ~20 sec.
• PanSTARRS will detect NEAs to ~400 m
Cosmic Cinematography: Challenges
The “static” sky:
• optimal co-adding of images
• database issues
The transient sky:
• variability classification
• asteroid association and orbits
• light curve analysis
• fusion with other data sets
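As an illustration of the co-adding item in the static-sky list, the sketch below stacks several exposures with inverse-variance weights. This is one common weighting choice, not necessarily the scheme used by any survey mentioned here. It assumes the images are already registered to the same pixel grid and background-subtracted, that each has a single known noise variance, and it ignores differences in image sharpness between exposures. The function name is hypothetical.

```python
def coadd(images, variances):
    """Inverse-variance weighted co-add.

    images:    list of 2-D lists of pixel values, all the same shape
               and already aligned (assumption).
    variances: one noise variance per image (assumption: uniform noise
               across each image).
    Returns (stacked_image, variance_of_stack)."""
    if len(images) != len(variances) or not images:
        raise ValueError("need one variance per image and at least one image")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    n_rows, n_cols = len(images[0]), len(images[0][0])
    stacked = [
        [
            sum(w * img[r][c] for w, img in zip(weights, images)) / total_weight
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]
    # For independent noise, the stack's variance is 1 / (sum of weights).
    return stacked, 1.0 / total_weight


if __name__ == "__main__":
    noisy = [[1.0, 2.0], [3.0, 4.0]]
    clean = [[1.2, 2.2], [3.2, 4.2]]
    stack, var = coadd([noisy, clean], variances=[4.0, 1.0])
    print(stack, var)
```

Noisier exposures get less weight, so the variance of the stack is never larger than that of the best single exposure.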
A New Approach to Radio Astronomy Hardware
A Brief History of the Universe
Era of Reionization
• culmination of structure formation
• first luminous structures
• turning point after the Dark Ages
[Figure labels: ionized, neutral (H), ionized; z~6.2; “The Gap”]
BOOLARDY
Lincoln Greenhill (CfA), MWA project
IIC affords us the opportunity to share resources, tools and know-how
• Shared hardware maximizes effectiveness
• Shared archival data storage, cooperatively
• Reap benefits of sophisticated system administrators and database professionals
  • People are quantized, unaffordable for single group
• Learn from each other on technical topics of common interest
  • Often large discrepancies across subfields, IIC raises all boats.
8K x 8K pixel array
16 independent amplifiers
Each is a 1024 x 2048 subimage