leo-harris
Features of the SDSS
• Special 2.5m telescope, at Apache Point, NM
– 3 degree field of view
– Zero distortion focal plane
• Two surveys in one
– Photometric survey in 5 bands - 200 million objects
– Spectroscopic redshift survey - 1 million distances
• Automated data reduction
– Over 120 man-years of development (Fermilab + collaboration scientists)
• Very high data volume
– Expect over 40 TB of raw data
– About 2 TB processed catalogs
– Data made available to the public
Data Processing Pipelines
All raw data (40 TB) saved at Fermilab
SDSS Data Products
• Object catalog, 500 GB: parameters of >10^8 objects
• Redshift catalog, 1 GB: parameters of 10^6 objects
• Atlas images, 1500 GB: 5-color cutouts of >10^8 objects
• Spectra, 60 GB: in a one-dimensional form
• Derived catalogs, 20 GB: clusters, QSO absorption lines
• 4x4 pixel all-sky map, 60 GB: heavily compressed
• Corrected frames, 15 TB
Accessing the Data
• Few fixed access patterns
– one cannot build indices for all possible queries
– worst case scenario is linear scan of the whole table
• Increasingly large differences between
– Random access
– Sequential I/O
• Often much faster to scan than to seek
• Good layout of data => more sequential I/O
• Geometric indexing – partitioning in storage
• Using Objectivity/DB
• Ported to MS SQL Server (w. Jim Gray)
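The scan-versus-seek trade-off can be illustrated with a simple cost model. The sketch below is not taken from the SDSS archive software. The row size of 5 kB follows from the 500 GB object catalog divided by 10^8 objects, while the transfer rate and seek time are hypothetical round numbers chosen only to show where the break-even point falls.

```python
def scan_time(n_rows, row_bytes, mb_per_s):
    """Seconds to read the whole table sequentially."""
    return n_rows * row_bytes / (mb_per_s * 1e6)


def seek_time(n_hits, seek_ms):
    """Seconds to fetch n_hits rows with one random access each."""
    return n_hits * seek_ms / 1e3


def break_even_fraction(row_bytes, mb_per_s, seek_ms):
    """Fraction of rows above which a full scan beats random access."""
    per_row_scan = row_bytes / (mb_per_s * 1e6)  # sequential cost per row
    per_row_seek = seek_ms / 1e3                 # random-access cost per row
    return per_row_scan / per_row_seek


# 10^8 rows of ~5 kB (500 GB catalog); 50 MB/s and 10 ms per seek are assumptions.
N_ROWS, ROW_BYTES, MB_PER_S, SEEK_MS = 10**8, 5000, 50.0, 10.0

frac = break_even_fraction(ROW_BYTES, MB_PER_S, SEEK_MS)
print(f"full scan: {scan_time(N_ROWS, ROW_BYTES, MB_PER_S):.0f} s")
print(f"scan wins once a query touches more than {frac:.2%} of the rows")
```

Under these assumed numbers a query only needs to touch a small fraction of the table before a sequential scan becomes cheaper, which is why a layout that keeps related objects contiguous on disk pays off.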
SDSS in GriPhyN
• Two Tier2 Nodes (FNAL+JHU)
– testing framework on real data in different scenarios
• FNAL node
– massive reprocessing of images
    • full regeneration of catalogs from the images (on disk)
    • gravitational lensing, finer morphological classification
    • image coaddition, differencing
• JHU node
– catalog calculations, integrated with database
    • tasks require lots of data, can be run in parallel
    • various statistical calculations, likelihood analyses
    • power spectra, correlation functions, Monte-Carlo
• Public access
– creating virtual data for NVO services (implemented later)
The SDSS Southern Survey
• Scanning a single stripe on the sky >30 times over
• Coaddition => extra depth
• Differencing => time dimension
• Multiple ways to combine the stripes
– Rerun the pipelines with custom parameters
– Build a new object catalog
– Perform particular science analysis (lensing map)
• On the right timescale to try GriPhyN framework
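Why coaddition buys depth: averaging N exposures with independent noise lowers the noise by a factor of sqrt(N), which corresponds to a gain of 2.5 log10(sqrt(N)) magnitudes in limiting depth. The following is a minimal simulation sketch, assuming equally weighted exposures of an empty sky with Gaussian noise. The pixel count and noise level are made-up values, and the function names are illustrative rather than part of any SDSS pipeline.

```python
import math
import random
import statistics


def coadd(exposures):
    """Pixel-by-pixel mean of equally weighted exposures."""
    return [sum(px) / len(exposures) for px in zip(*exposures)]


def difference(exposure, reference):
    """Pixel-by-pixel difference of one epoch against a reference image."""
    return [a - b for a, b in zip(exposure, reference)]


def depth_gain_mag(n_exposures):
    """Magnitude gain in limiting depth for a sqrt(N) noise reduction."""
    return 2.5 * math.log10(math.sqrt(n_exposures))


random.seed(0)
N_EXP, N_PIX, SIGMA = 30, 5000, 1.0  # hypothetical values
exposures = [[random.gauss(0.0, SIGMA) for _ in range(N_PIX)]
             for _ in range(N_EXP)]

stacked = coadd(exposures)
residual = difference(exposures[0], stacked)  # one epoch minus the deep stack

noise_ratio = statistics.pstdev(exposures[0]) / statistics.pstdev(stacked)
print(f"noise ratio: {noise_ratio:.2f} (sqrt(N) = {math.sqrt(N_EXP):.2f})")
print(f"depth gain for {N_EXP} exposures: {depth_gain_mag(N_EXP):.2f} mag")
```

The same stack serves both purposes named on the slide: the mean image is the deeper one, and subtracting it from each epoch isolates what changed between visits.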
Large Scale Statistical Analysis
• Galaxy distribution has non-trivial clustering patterns
– Reflects conditions in the early universe
• Spatial statistical tools to be run on object catalog, applying many different cuts to the data
– Spatial power spectrum
– Correlation functions
• These algorithms typically scale as N^2 or N^3 with the number of objects!
• Some of the analyses will partition well (likelihood), others will not (pair counts)
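The N^2 scaling comes from visiting every pair of objects. As an illustration, here is a brute-force pair-count histogram, the building block of two-point correlation function estimators, together with the simple "natural" estimator DD/RR - 1. This is a sketch under stated assumptions, not the survey's analysis code: the points are random positions in a unit square, and the bin edges and function names are hypothetical.

```python
import math
import random
from itertools import combinations


def pair_counts(points, bin_edges):
    """Histogram of pair separations; visits all N(N-1)/2 pairs."""
    counts = [0] * (len(bin_edges) - 1)
    for (x1, y1), (x2, y2) in combinations(points, 2):
        r = math.hypot(x1 - x2, y1 - y2)
        for i in range(len(counts)):
            if bin_edges[i] <= r < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def natural_estimator(dd, rr, n_data, n_rand):
    """xi = norm * DD / RR - 1 per bin, with norm correcting for catalog sizes."""
    norm = (n_rand * (n_rand - 1)) / (n_data * (n_data - 1))
    return [norm * d / r - 1.0 if r > 0 else float("nan")
            for d, r in zip(dd, rr)]


random.seed(1)
N_DATA, N_RAND = 200, 400                # hypothetical catalog sizes
EDGES = [0.0, 0.1, 0.2, 0.4, 2.0]        # last bin catches all remaining pairs
data = [(random.random(), random.random()) for _ in range(N_DATA)]
rand = [(random.random(), random.random()) for _ in range(N_RAND)]

dd = pair_counts(data, EDGES)
rr = pair_counts(rand, EDGES)
xi = natural_estimator(dd, rr, N_DATA, N_RAND)
print("DD:", dd)
print("xi:", [round(x, 3) for x in xi])
```

Because the data here are unclustered, xi should scatter around zero. Splitting the catalog into spatial partitions does not remove the difficulty, since pairs that straddle a partition boundary still have to be counted, which is one reason pair counts partition poorly.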
Trends in Astronomy
Future dominated by detector improvements
Total area of 3m+ telescopes in the world in m^2, total number of CCD pixels in megapixels, as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.
• Moore’s Law growth in CCD capabilities
• Gigapixel arrays on the horizon
• Improvements in computing and storage will track growth in data volume
• Investment in software is critical, and growing
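The growth factors quoted above can be turned into doubling times. The sketch below assumes steady exponential growth over the 25 years, which the slide does not itself state; the function name is illustrative.

```python
import math


def doubling_time_years(growth_factor, span_years):
    """Doubling time implied by steady exponential growth over span_years."""
    return span_years * math.log(2) / math.log(growth_factor)


SPAN_YEARS = 25
glass = doubling_time_years(30, SPAN_YEARS)     # telescope collecting area
pixels = doubling_time_years(3000, SPAN_YEARS)  # total CCD pixels

print(f"glass area doubles every {glass:.1f} years")
print(f"CCD pixels double every {pixels:.1f} years")
```

Under that assumption the pixel count doubles roughly every two years, comparable to the doubling period usually quoted for Moore's Law, while collecting area doubles only about every five years.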
VO- The challenges
• Large number of new surveys
– multi-TB in size, 100 million objects or more
– individual archives planned, or under way
• Multi-wavelength view of the sky
– coverage at more than 13 wavelengths within 5 years
• Size of the archived data: 40,000 square degrees is 2 trillion pixels
– One band: 4 terabytes
– Multi-wavelength: 10-100 terabytes
– Time dimension: 10 petabytes
• Current techniques inadequate
• Scalable hardware/networking requirements
• Transition to the new astronomy
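The archive-size figures above can be checked with a few lines of arithmetic. This sketch works backwards from the slide's numbers: the pixel scale is derived from the quoted area and pixel count, and the 2 bytes per pixel is an assumption inferred from 4 terabytes for 2 trillion pixels, not something the slide states.

```python
ARCSEC_PER_DEG = 3600


def implied_pixel_scale_arcsec(area_sq_deg, n_pixels):
    """Side of a square pixel, in arcsec, if n_pixels tile area_sq_deg."""
    area_sq_arcsec = area_sq_deg * ARCSEC_PER_DEG ** 2
    return (area_sq_arcsec / n_pixels) ** 0.5


def band_size_tb(n_pixels, bytes_per_pixel):
    """Uncompressed size of one band in terabytes (10^12 bytes)."""
    return n_pixels * bytes_per_pixel / 1e12


SKY_SQ_DEG = 40_000      # from the slide
N_PIXELS = 2e12          # from the slide
BYTES_PER_PIXEL = 2      # assumption: 16-bit pixels

scale = implied_pixel_scale_arcsec(SKY_SQ_DEG, N_PIXELS)
one_band = band_size_tb(N_PIXELS, BYTES_PER_PIXEL)
print(f"implied pixel scale: {scale:.2f} arcsec")
print(f"one band: {one_band:.0f} TB")
```

The quoted numbers are therefore mutually consistent with pixels of about half an arcsecond stored at two bytes each.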
MACHO, 2MASS, DENIS, SDSS, DPOSS, GSC-II, VISTA, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE, ...