Big Data in Science (Lessons from astrophysics)

Michael Drinkwater, UQ & CAASTRO

1. Preface

- Contributions by Jim Gray

- Astronomy data flow

2. Past Glories

- Why it was easy to be world-leading

3. Future challenges

- Why really big data makes us worry!

CSIRO Parkes radio telescope

1. Preface: Jim Gray (Microsoft eScience)

› Much of what I discuss was already said by the late Jim Gray:

› “I have been hanging out with astronomers for about the last 10 years… I look at their telescopes… $15-20M worth of capital equipment with about 20-50 people operating the instrument… millions of lines of code are needed to analyse all this information. In fact the software cost dominates the capital expenditure!”

› Jim Gray on eScience, in The Fourth Paradigm, eds Hey, Tansley & Tolle, 2009. (emphasis added)

research.microsoft.com

Jim Gray, Microsoft Research

1. Preface: Astronomy Data Flow

Telescope → Raw Images → Output Image → Catalogues → Database → Science

2. Past Glories

› 20 years ago

- Easy to lead the world!

› UKST photographic all sky survey

- 1 image = 1 GB

- All-sky image = 1 TB

- All-sky catalogue = 100 MB

- Put online with two summer student projects

2. Past Glories

› Why did astronomy lead the way with (old) big data?

› 1) Telescopes are expensive so only a few data sources

- Data complex so only a few software packages, especially for national projects

- => easy to adopt a common data file format

› 2) Astronomers had strong computing skills

- => easy to search relatively large discovery space

CSIRO's ASKAP radio telescope with its innovative phased array receiver technology. (Image: Dragonfly Media)

2. Past Glories

› Problems with the old approach in astronomy

- Most team projects underestimate or ignore database budget

- Astronomers too independent – skeptical of computer science expertise

- Bespoke solutions not scalable or sustainable

The Anglo-Australian Telescope (Image: AAO) – used for many team projects

2. Past Glories

› WiggleZ Dark Energy Survey

- 5 year observing project

- $5M facility time + $1.5M grants + 20 team salaries

- Database $40k (donated by host as not funded)

› Success!

- 4 tests proving Einstein’s General Relativity correct

- Many other results

- 1425 citations

› Failure!

- Database failed as not supported
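
To put the database budget in proportion, here is a rough sketch that uses only the dollar figures on this slide. The variable names are illustrative, and team salaries are left out because no amount is given for them, so the true proportion is smaller still.

```python
# Database cost relative to the quantified WiggleZ funding.
# Only the dollar figures from the slide are used; the 20 team salaries
# are excluded because the slide gives no amount for them.
FACILITY_TIME = 5_000_000   # $5M facility time
GRANTS = 1_500_000          # $1.5M grants
DATABASE = 40_000           # $40k database (donated by host)

quantified_total = FACILITY_TIME + GRANTS
database_ratio = DATABASE / quantified_total

print(f"database cost is {database_ratio:.2%} of the quantified funding")
```

On these numbers the database cost well under 1% of the quantified funding, yet its lack of support is what the slide lists as the project's failure.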

3. Future Challenges

› New projects so large astronomy must change…

- 1995 Schmidt photographic survey: 1 TB

- 2006 Sloan Digital Sky Survey: 25 TB

- …

- 2022-32 Large Synoptic Survey Telescope: 130 PB in 10 years

- 2030-? Square Kilometre Array radio telescope: 10 PB per day!

- More data per day than entire internet per year
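
The jump in scale is easier to see as daily rates. The sketch below converts the figures above, assuming decimal units (1 PB = 1000 TB) and 365.25 days per year; the variable names are illustrative.

```python
# Average daily data rates implied by the survey figures above.
# Assumptions: decimal units (1 PB = 1000 TB), 365.25 days per year.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365.25

sdss_total_tb = 25       # Sloan Digital Sky Survey: 25 TB
lsst_total_pb = 130      # LSST: 130 PB in 10 years
lsst_years = 10
ska_pb_per_day = 10      # SKA: 10 PB per day

lsst_tb_per_day = lsst_total_pb * TB_PER_PB / (lsst_years * DAYS_PER_YEAR)
ska_tb_per_day = ska_pb_per_day * TB_PER_PB

# How long the SKA would need to collect the full 10-year LSST volume.
ska_days_to_match_lsst = lsst_total_pb / ska_pb_per_day

# How many complete SDSS data sets fit into one day of SKA data.
sdss_per_ska_day = ska_tb_per_day / sdss_total_tb

print(f"LSST: {lsst_tb_per_day:.0f} TB/day, SKA: {ska_tb_per_day:.0f} TB/day")
print(f"SKA matches the LSST total in {ska_days_to_match_lsst:.0f} days")
print(f"One SKA day holds {sdss_per_ska_day:.0f} SDSS data sets")
```

On these assumptions LSST averages roughly 36 TB per day, while the SKA would collect the entire ten-year LSST volume in 13 days.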

The LSST: 8.4 m telescope mirror, 3.2 Gpixel camera

3. Future Challenges

› Challenges we know how to solve (Jim Gray predicted most of these)

- Realistic funding

- Scalable database structure: how to avoid i/o limits

- Must move the query to the data

- Efficient database design (Jim’s 20 questions to define functionality)
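
"Move the query to the data" can be illustrated with a toy sketch. The `Node` class, the `count_bright` query, and the magnitude cut are all hypothetical and stand in for a partitioned catalogue; this is not the API of any real database.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical storage node holding one partition of a catalogue."""
    magnitudes: List[float]

    def fetch_all(self) -> List[float]:
        # "Data to the query": every row crosses the network.
        return list(self.magnitudes)

    def run(self, query: Callable[[List[float]], int]) -> int:
        # "Query to the data": only a small result crosses the network.
        return query(self.magnitudes)


def count_bright(rows: List[float]) -> int:
    # Smaller magnitude means a brighter object; the cut at 18 is arbitrary.
    return sum(1 for m in rows if m < 18.0)


nodes = [Node([17.2, 19.5, 21.0]), Node([16.8, 17.9, 22.3]), Node([20.1])]

# Approach 1: pull every row to the client, then filter there.
pulled = [m for node in nodes for m in node.fetch_all()]
count_pulled = count_bright(pulled)

# Approach 2: run the filter on each node and combine the partial counts.
count_pushed = sum(node.run(count_bright) for node in nodes)

print(count_pulled, count_pushed, len(pulled), len(nodes))
```

Both approaches give the same answer, but the first moves every row while the second moves one number per node. At petabyte scale that difference is what keeps a query clear of the i/o limits.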

3. Future Challenges

› Nasty challenges we are yet to solve…

- Complex data mining way beyond SQL

- “Teaching software engineering to the whole community”¹

- Real-time analysis for transient events

- Cross-matching different large databases in different locations
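
To show why cross-matching is hard at scale, here is a minimal brute-force sketch of a positional cross-match between two small catalogues. The function names, the sample coordinates, and the 2 arcsecond radius are illustrative assumptions, not taken from any survey pipeline.

```python
import math
from typing import List, Optional, Sequence, Tuple

Source = Tuple[float, float]  # (RA, Dec) in degrees


def separation_deg(a: Source, b: Source) -> float:
    """Angular separation of two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))


def cross_match(cat_a: Sequence[Source], cat_b: Sequence[Source],
                radius_deg: float) -> List[Optional[int]]:
    """For each source in cat_a, return the index of the nearest cat_b
    source within radius_deg, or None if there is none."""
    matches: List[Optional[int]] = []
    for a in cat_a:
        best, best_sep = None, radius_deg
        for j, b in enumerate(cat_b):
            sep = separation_deg(a, b)
            if sep <= best_sep:
                best, best_sep = j, sep
        matches.append(best)
    return matches


# Made-up positions: two sources with a counterpart, one without.
optical = [(10.0000, -30.0000), (150.0, 2.0), (300.0, 10.0)]
radio = [(150.0002, 2.0001), (10.0001, -30.0001), (200.0, 45.0)]

matches = cross_match(optical, radio, radius_deg=2 / 3600)  # 2 arcsec
print(matches)
```

This version compares every source with every other source, so the work grows as the product of the two catalogue sizes and assumes both catalogues sit in one place. Neither holds for billion-row databases at different sites, which is why the problem remains on the unsolved list.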

“The data collected by the SKA in a single day would take nearly two million years to play back on an iPod.” (skatelescope.org)

1. Mario Juric, LSST Data Management Project Scientist

Postscript: Jim Gray (Microsoft eScience)

› Jim Gray’s rules for large data design:

- Scientific computing is increasingly data intensive

- Solution is a “scale-out” architecture

- Bring computations to the data, rather than data to the computations

- Start the design with the 20 top questions

- Go from "working to working"

- From “Gray’s Laws: Database-centric Computing in Science”, Szalay & Blakeley, in The Fourth Paradigm, eds Hey, Tansley & Tolle, 2009.

research.microsoft.com

Jim Gray, Microsoft Research