Rise of the Scientific Database John A. De Goes, @jdegoes

Slides from the talk, "Rise of the Scientific Database" at Strata 2012 (Santa Clara).

Page 1: Rise of the scientific database

Rise of the Scientific Database

John A. De Goes, @jdegoes

Page 2

Agenda

• Scientific Computing & Databases

• Blessing / Curse of the RDBMS

• Power of the Array

• Scientific Databases

• Hadoop

• Summary & Conclusions

Page 3

What is Scientific Computing?

"Scientific computing is concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems."

—Wikipedia

Page 4

Timeline of scientific computing, from the 1940's through the 2010's to the future. Milestones, in slide order: finite element methods, numeric linear algebra, linear programming, Monte Carlo, finite differences, Fortran, modern numerical linear algebra, gradient methods, finite difference for PDEs, stable SVD algorithms, iterative methods, stable pseudoinverses, FFT, APL invented, SAS released, LINPACK, MATLAB, conjugate gradient, Poisson solvers, large-scale eigenvalue solvers, GNU Octave, Python, SPSS, J, LAPACK, Mathematica, SciLab, SciPy, PDL, Rasdaman, NumPy, Hadoop, Mahout, HPCC, CUDA, OpenCL, BrookGPU, Julia, Spark, MLBase, SciDB, MonetDB / SciQL, ???

Page 5

What is a Database?

"A technology that combines the ability to store data with a high-level, high-performance means of storing, retrieving, and manipulating that data without having to write code or have knowledge of the mechanisms of implementation."

Page 6

Timeline of databases, from the 1960's through the 2010's to the future. Milestones, in slide order: CODASYL, IMS, SABRE, Relational Model, Ingres (QUEL), System R (SEQUEL), SQL/DBS, DBS2, Oracle, "RDBMS", SQL wins, DB2, DBase, SQL Server, other solutions, ODBMS, MySQL, PostgreSQL, MongoDB, CouchDB, Riak, Neo4j, Julia, Spark, MLBase, SciDB, MonetDB / SciQL, ???

Page 7

The Relationship between Scientific Computing & Databases

Scientific Computing

Data Analysis

Scientific Databases

Page 8

The Database Landscape

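Figure: the database landscape, a grid of Operational vs. Analytical workloads against Structured, Semi-structured, Unstructured, and Scientific data. Cells are annotated with years (1970, 1980, 2000, 2005) and question marks; the workload labels read "sums & counts", "gets & puts", and "data analysis".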

Page 9

Relational Algebra

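• Projection: $\pi_{a_1, \ldots, a_n}(R)$
• Selection: $\sigma_{\varphi}(R)$
• Rename: $\rho_{a/b}(R)$
• Natural join: $R \bowtie S$
• Theta join: $R \bowtie_{\theta} S$
• Semijoin: $R \ltimes S$
• Antijoin: $R \triangleright S$
• Division: $R \div S$
• Left outer join: $R ⟕ S$
• Right outer join: $R ⟖ S$
• Full outer join: $R ⟗ S$
• Aggregation: ${}_{G_1, G_2, \ldots, G_m}\mathcal{G}_{f_1(A_1'), f_2(A_2'), \ldots, f_k(A_k')}(r)$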
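To make a few of these operators concrete, here is a minimal Python sketch that models a relation as a list of dicts (one dict per row) and implements selection, projection, natural join, and grouped aggregation. The representation and the function names (`select`, `project`, `natural_join`, `aggregate`) are illustrative assumptions, not any database's API.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Relation = List[Row]


def select(r: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in r if pred(row)]


def project(r: Relation, attrs: Iterable[str]) -> Relation:
    """Projection (pi): keep only the named attributes, removing duplicate rows."""
    attrs = list(attrs)
    seen = set()
    out: Relation = []
    for row in r:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r: Relation, s: Relation) -> Relation:
    """Natural join: combine rows that agree on every shared attribute.

    Assumes every row in a relation has the same attributes (a fixed schema).
    With no shared attributes this degenerates to the Cartesian product.
    """
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])
    return [{**x, **y} for x in r for y in s if all(x[a] == y[a] for a in shared)]


def aggregate(r: Relation, group_by: List[str], col: str,
              f: Callable[[List[object]], object], name: str) -> Relation:
    """Aggregation (script G): group rows by group_by and apply f to col per group."""
    groups: Dict[Tuple, List[object]] = defaultdict(list)
    for row in r:
        groups[tuple(row[g] for g in group_by)].append(row[col])
    return [{**dict(zip(group_by, k)), name: f(v)} for k, v in groups.items()]
```

For example, with `emp = [{"name": "ann", "dept": 1}, {"name": "bob", "dept": 2}]` and `dept = [{"dept": 1, "title": "R&D"}]`, `natural_join(emp, dept)` keeps only Ann's row, extended with her department title.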

Page 10

The Curse of RDBMS

Sets (rows)

Tuples (columns)

???

Page 11

The Curse of RDBMS

Sets (rows)

Tuples (columns)

Arrays

Page 12

The Power of the Array

• Linear Algebra

• Transforms (Fourier, wavelet, etc.)

• Spatial Analysis

• Temporal Analysis

• Etc.
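As a rough illustration of why arrays suit these workloads, the sketch below applies two of the listed kinds of analysis directly to plain Python lists: a naive discrete Fourier transform (a transform) and a trailing moving average (a simple temporal analysis). Both are straightforward loops written for clarity, not the FFT or a tuned numerical library, and the function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(xs: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform: X_k = sum_t x_t * exp(-2*pi*i*k*t/n)."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
        for k in range(n)
    ]


def moving_average(xs: Sequence[float], window: int) -> List[float]:
    """Trailing moving average: output i is the mean of the `window` values ending at xs[i + window - 1]."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[float] = []
    running = 0.0
    for i, x in enumerate(xs):
        running += x
        if i >= window:
            running -= xs[i - window]  # drop the value that just left the window
        if i >= window - 1:
            out.append(running / window)
    return out
```

Both functions index by position and rely on neighboring elements, which is natural for an array and awkward for an unordered set of rows.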

Page 13

Poor Man’s Arrays

-- Matrix product X * Y, each matrix stored as a table of (row, col, value) triples
SELECT X.row AS row, Y.col AS col,
SUM(X.value * Y.value) AS value
FROM X, Y WHERE X.col = Y.row
GROUP BY X.row, Y.col
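A matrix product over (row, col, value) triples joins X.col with Y.row and then sums the partial products per output cell. The Python sketch below mirrors that join-then-group plan over a hypothetical triple representation, next to the same product computed on dense nested lists, to show how much bookkeeping the relational encoding adds. Function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]  # (row, col, value); absent cells are treated as zero


def matmul_triples(x: List[Triple], y: List[Triple]) -> List[Triple]:
    """Relational plan: join on x.col = y.row, then GROUP BY (x.row, y.col) with SUM."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    for xr, xc, xv in x:
        for yr, yc, yv in y:
            if xc == yr:  # the join predicate X.col = Y.row
                sums[(xr, yc)] += xv * yv
    return sorted((r, c, v) for (r, c), v in sums.items())


def matmul_dense(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Array plan: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]
```

The dense version says what it means in one expression; the triple version has to rebuild indices through a join and a grouping step.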

Page 14

Poor Man’s Arrays

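-- Running total of sales in descending order, ties broken by name (descending)
SELECT A.name, A.sales, SUM(B.sales) AS running_total
FROM Sales AS A, Sales AS B
WHERE A.sales < B.sales
   OR (A.sales = B.sales AND A.name <= B.name)
GROUP BY A.name, A.sales
ORDER BY A.sales DESC, A.name DESC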
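A self-join running total compares every row with every other row, so it does quadratic work to produce what is, on an ordered array, a single linear prefix sum. The Python sketch below computes both versions over a hypothetical list of (name, sales) pairs, assuming unique names. It orders by sales descending with ties broken by name descending; under that ordering a tied row must also include the tied rows ranked before it, which is why its self-join predicate compares names with `<=`.

```python
from itertools import accumulate
from typing import List, Tuple

Sale = Tuple[str, float]  # (name, sales); names assumed unique


def running_total_self_join(rows: List[Sale]) -> List[Tuple[str, float, float]]:
    """Self-join style: for each row A, sum B.sales over every row ranked at or before A."""
    out = []
    for a_name, a_sales in rows:
        total = sum(
            b_sales
            for b_name, b_sales in rows
            if a_sales < b_sales or (a_sales == b_sales and a_name <= b_name)
        )
        out.append((a_name, a_sales, total))
    # Order by sales descending, then name descending.
    return sorted(out, key=lambda t: (t[1], t[0]), reverse=True)


def running_total_array(rows: List[Sale]) -> List[Tuple[str, float, float]]:
    """Array style: sort once, then take a linear prefix sum."""
    ordered = sorted(rows, key=lambda t: (t[1], t[0]), reverse=True)
    totals = accumulate(sales for _, sales in ordered)
    return [(name, sales, total) for (name, sales), total in zip(ordered, totals)]
```

Both return the same rows; only the array version scales linearly after the sort.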

Page 15

Poor Man’s Arrays

Page 16

What is a Scientific Database?

• First-class support for multidimensional arrays
  • Creation
  • Manipulation
  • Composition
• Capable of expressing whole analyses, not just snippets
• Tremendous benefits across multiple dimensions
  • Scalability & Performance
  • Expressiveness & Usability
  • Robustness & Accuracy

Page 17

Array Algebra

• Many different approaches (NRCA, SciQL, AFL, ODMG, etc.)
• Possible to define as extensions to a relational core (but not necessary)
• Most approaches share a common core
  • Array deconstruction
  • Array construction
  • Array reduction
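The three shared operations can be sketched in a few lines of Python. This is a toy, hypothetical algebra over dense 2-D arrays stored as nested lists: `construct` builds an array by evaluating a function at every index of a domain (in the spirit of rasdaman's MARRAY), `subset` deconstructs an array by extracting a rectangular region, and `condense` reduces a function over an index range to one value (in the spirit of rasdaman's CONDENSE). None of these names are the API of rasdaman, SciDB, or SciQL.

```python
import operator
from functools import reduce
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
Array2D = List[List[float]]


def construct(shape: Tuple[int, int], f: Callable[[int, int], float]) -> Array2D:
    """Construction: evaluate f at every (i, j) in the domain [0, rows) x [0, cols)."""
    rows, cols = shape
    return [[f(i, j) for j in range(cols)] for i in range(rows)]


def subset(a: Array2D, r0: int, r1: int, c0: int, c1: int) -> Array2D:
    """Deconstruction: extract the half-open region rows [r0, r1), cols [c0, c1)."""
    return [row[c0:c1] for row in a[r0:r1]]


def condense(op: Callable[[T, T], T], n: int, f: Callable[[int], T]) -> T:
    """Reduction: combine f(0), ..., f(n - 1) with the binary operator op (requires n >= 1)."""
    return reduce(op, (f(k) for k in range(n)))


def matmul(a: Array2D, b: Array2D) -> Array2D:
    """Matrix product as construct-of-condense: C[i][j] = sum over k of a[i][k] * b[k][j]."""
    return construct(
        (len(a), len(b[0])),
        lambda i, j: condense(operator.add, len(b), lambda k: a[i][k] * b[k][j]),
    )
```

Composing construction with reduction expresses the matrix product directly, with no join or grouping step.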

Page 18

Scientific Databases

Rasdaman SciDB MonetDB (+SciQL)

Page 19

What About Hadoop?

• Commonly used in scientific computing
• No scientific database technology
• But many useful programming libraries
  • Hama
  • Mahout
  • Cascading
• Hadoop doesn’t make it easy
  • YARN should help (Tez?)
  • Balancing needs help
• Not the only game in town anymore (BDAS, MPI-2, HPCC, etc.)
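To show what "Hadoop doesn't make it easy" means for array work, the sketch below simulates a MapReduce-style matrix-vector product in plain Python: the mapper emits (row, partial product) pairs from (row, col, value) triples, a shuffle groups them by key, and the reducer sums each group. The `map_phase`, `shuffle`, and `reduce_phase` functions are a hypothetical in-memory simulation, not Hadoop, Hama, or Mahout APIs. Note that every mapper needs access to the whole vector, which is exactly the kind of data distribution a real cluster job has to arrange by hand.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[int, int, float]  # matrix cell (row, col, value)


def map_phase(cells: List[Triple], vector: List[float]) -> Iterator[Tuple[int, float]]:
    """Mapper: for each cell M[i][j], emit key i with partial product M[i][j] * v[j]."""
    for i, j, value in cells:
        yield i, value * vector[j]


def shuffle(pairs: Iterator[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Shuffle: group intermediate values by key, as the framework would between phases."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[int, List[float]]) -> Dict[int, float]:
    """Reducer: sum the partial products for each output row."""
    return {key: sum(values) for key, values in groups.items()}


def mat_vec(cells: List[Triple], vector: List[float]) -> Dict[int, float]:
    """Matrix-vector product expressed as map, shuffle, and reduce phases."""
    return reduce_phase(shuffle(map_phase(cells, vector)))
```

Even this single product needs three phases and an explicit key scheme; iterative algorithms repeat the whole cycle once per iteration.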

Page 20

Conclusions

• Scientific computing can benefit from a scientific database

• Success of RDBMS was also a curse

• NoSQL and big data are catalysts for disruption

• Still early for scientific databases

• Hadoop loves/hates science

Page 21

Resources

John A. De Goes, @jdegoes

SciDB / Array Functional Language: http://bit.ly/VdXJkA

Rasdaman / rasql: http://en.wikipedia.org/wiki/Rasdaman

MonetDB / SciQL: http://monetdb.org

Precog / Quirrel: http://precog.com

Query Language for Multidimensional Arrays: Design, Implementation, & Optimization Techniques