Slides from the talk, "Rise of the Scientific Database" at Strata 2012 (Santa Clara).
Rise of the Scientific Database
John A. De Goes, @jdegoes
Agenda
• Scientific Computing & Databases
• Blessing / Curse of the RDBMS
• Power of the Array
• Scientific Databases
• Hadoop
• Summary & Conclusions
What is Scientific Computing?
"Scientific computing is concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems."
—Wikipedia
Timeline of scientific computing, by decade from the 1940's through the 2010's and on to "The Future" (milestones in slide order):
Finite element methods, Numeric linear algebra, Linear programming, Monte Carlo, Finite differences, Fortran, Modern numerical linear algebra, Gradient methods, Finite difference for PDEs, Stable SVD algorithms, Iterative methods, Stable pseudoinverses, FFT, APL invented, SAS released, LINPACK, MATLAB, Conjugate gradient, Poisson solvers, Large-scale eigenvalue solvers, GNU Octave, Python, SPSS, J, LAPACK, Mathematica, SciLab, SciPy, PDL, Rasdaman, NumPy, Hadoop, Mahout, HPCC, CUDA, OpenCL, BrookGPU, Julia, Spark, MLBase, SciDB, MonetDB / SciQL, ???
What is a Database?
"A technology that combines the ability to store data with a high-level, high-performance means of storing, retrieving, and manipulating that data without having to write code or have knowledge of the mechanisms of implementation."
Timeline of databases, by decade from the 1960's through the 2010's and on to "The Future" (milestones in slide order):
CODASYL, IMS, SABRE, Relational Model, Ingres (QUEL), System R (SEQUEL), SQL/DBS, DBS2, Oracle, "RDBMS", SQL wins, DB2, DBase, SQL Server, Other solutions, ODBMS, MySQL, PostgreSQL, MongoDB, CouchDB, Riak, Neo4j, Julia, Spark, MLBase, SciDB, MonetDB / SciQL, ???
The Relationship between Scientific Computing & Databases
Scientific Computing
Data Analysis
Scientific Databases
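The Database Landscape
(Figure: a grid of database categories. One axis: Operational (gets & puts), Analytical (sums & counts), Scientific (data analysis). Other axis: Structured, Semi-structured, Unstructured. Cells are labeled with the years 1970, 1980, 2000, 2000, and 2005; the remaining cells are marked "?".)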
Relational Algebra
• Projection (π)
• Selection (σ)
• Rename (ρ)
• Natural join: R ⋈ S
• Theta join: R ⋈θ S
• Semijoin: R ⋉ S
• Antijoin: R ▷ S
• Division: R ÷ S
• Left outer join: R ⟕ S
• Right outer join: R ⟖ S
• Full outer join: R ⟗ S
• Aggregation: $G_1, G_2, \ldots, G_m \; g_{f_1(A_1'), f_2(A_2'), \ldots, f_k(A_k')}(r)$
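To make a few of the operators listed above concrete, here is a minimal sketch in Python. It models a relation as a list of dicts, which is an assumption made for illustration; the helper names (`select`, `project`, `natural_join`, `aggregate`) and the sample `emp` and `dept` relations are hypothetical and do not come from the talk.

```python
# Relations are modeled as lists of dicts (attribute name -> value).
# All names below are hypothetical, chosen only for this illustration.

def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]

def project(rel, attrs):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    out = []
    for row in rel:
        reduced = {a: row[a] for a in attrs}
        if reduced not in out:  # relations are sets, so no duplicates
            out.append(reduced)
    return out

def natural_join(r, s):
    """Natural join: pair rows that agree on every shared attribute."""
    out = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out

def aggregate(rel, group_attrs, attr, fn):
    """Aggregation: group by group_attrs, then apply fn to attr per group."""
    groups = {}
    for row in rel:
        key = tuple(row[g] for g in group_attrs)
        groups.setdefault(key, []).append(row[attr])
    return {key: fn(vals) for key, vals in groups.items()}

emp = [
    {"name": "ann", "dept": 1, "pay": 10},
    {"name": "bob", "dept": 2, "pay": 20},
    {"name": "cy", "dept": 1, "pay": 30},
]
dept = [{"dept": 1, "title": "lab"}, {"dept": 2, "title": "ops"}]

well_paid = select(emp, lambda row: row["pay"] > 15)
dept_ids = project(emp, ["dept"])
joined = natural_join(emp, dept)
pay_by_dept = aggregate(emp, ["dept"], "pay", sum)
```

Every operator here consumes and produces unordered collections of rows. None of them has a notion of position or neighborhood, which is the gap the following slides turn to.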
The Curse of RDBMS
• Sets → rows
• Tuples → columns
• ??? → Arrays
The Power of the Array
• Linear Algebra
• Transforms (Fourier, wavelet, etc.)
• Spatial Analysis
• Temporal Analysis
• Etc.
Poor Man’s Arrays
SELECT X.row AS row, Y.col AS col,
       SUM(X.value * Y.value) AS value
FROM X, Y WHERE X.col = Y.row
GROUP BY X.row, Y.col
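The query above is a matrix multiplication over matrices stored as (row, col, value) triples. A minimal sketch of that reading follows, assuming an in-memory SQLite database and small dense matrices. The helper names (`to_rows`, `sql_matmul`, `array_matmul`) are hypothetical, and the `row` column is quoted in the SQL because some engines treat it as a keyword.

```python
import sqlite3

def to_rows(matrix):
    """Flatten a nested-list matrix into (row, col, value) triples."""
    return [(i, j, v) for i, r in enumerate(matrix) for j, v in enumerate(r)]

def sql_matmul(x, y):
    """Multiply two matrices with the self-join/GROUP BY formulation."""
    con = sqlite3.connect(":memory:")
    for name, matrix in (("X", x), ("Y", y)):
        con.execute(f'CREATE TABLE {name} ("row" INTEGER, col INTEGER, value REAL)')
        con.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", to_rows(matrix))
    cursor = con.execute(
        'SELECT X."row", Y.col, SUM(X.value * Y.value) '
        'FROM X, Y WHERE X.col = Y."row" '
        'GROUP BY X."row", Y.col'
    )
    result = [[0.0] * len(y[0]) for _ in x]
    for i, j, v in cursor:
        result[i][j] = v
    con.close()
    return result

def array_matmul(x, y):
    """The same product written directly against arrays (nested lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

X = [[1, 2], [3, 4]]
Y = [[5, 6], [7, 8]]
product = sql_matmul(X, Y)
```

Both functions compute the same product. The array version states the operation directly, while the relational version has to encode the indices as data and recover the structure with a join.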
Poor Man’s Arrays
SELECT A.name, A.sales, SUM(B.sales) AS running_total
FROM Sales AS A, Sales AS B
WHERE A.sales < B.sales OR
      (A.sales = B.sales AND A.name = B.name)
GROUP BY A.name, A.sales
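The self-join above computes a running total over rows ranked by descending sales. As a rough comparison, the sketch below runs the same query in an in-memory SQLite database and sets it beside an ordered-sequence version built on `itertools.accumulate`. The sample data, the helper names, and the added `ORDER BY` clause are assumptions for illustration, and the two versions are only expected to agree when sales values are distinct.

```python
import sqlite3
from itertools import accumulate

# Hypothetical sample data: (name, sales) with distinct sales values.
sales = [("ann", 10), ("bob", 30), ("cy", 20)]

def sql_running_total(rows):
    """Running total via the self-join; ORDER BY added for a stable result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Sales (name TEXT, sales INTEGER)")
    con.executemany("INSERT INTO Sales VALUES (?, ?)", rows)
    cursor = con.execute(
        "SELECT A.name, A.sales, SUM(B.sales) AS running_total "
        "FROM Sales AS A, Sales AS B "
        "WHERE A.sales < B.sales OR (A.sales = B.sales AND A.name = B.name) "
        "GROUP BY A.name, A.sales "
        "ORDER BY A.sales DESC"
    )
    result = cursor.fetchall()
    con.close()
    return result

def array_running_total(rows):
    """Running total over an ordered sequence: sort once, then accumulate."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    totals = accumulate(amount for _, amount in ordered)
    return [(name, amount, total) for (name, amount), total in zip(ordered, totals)]

from_sql = sql_running_total(sales)
from_array = array_running_total(sales)
```

The join compares every row with every other row, so its work grows quadratically with the table size unless the engine optimizes it, while the ordered version is one sort followed by a single pass.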
What is a Scientific Database?
• First-class support for multidimensional arrays
  • Creation
  • Manipulation
  • Composition
• Capable of expressing whole analyses, not just snippets
• Tremendous benefits across multiple dimensions
  • Scalability & Performance
  • Expressiveness & Usability
  • Robustness & Accuracy
Array Algebra
• Many different approaches (NRCA, SciQL, AFL, ODMG, etc.)
• Possible to define as extensions to relational core (but not necessary)
• Most approaches share common core
  • Array deconstruction
  • Array construction
  • Array reduction
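One way to picture the common core named above is the following sketch on two-dimensional nested lists. It is an illustrative reading of "deconstruction", "construction", and "reduction", not the operator set of NRCA, SciQL, AFL, or any other specific system, and the function names are hypothetical.

```python
def construct(shape, fn):
    """Construction: build an array by evaluating fn at every index."""
    rows, cols = shape
    return [[fn(i, j) for j in range(cols)] for i in range(rows)]

def deconstruct(arr, row_range, col_range):
    """Deconstruction: pull a sub-array out by index ranges."""
    return [[arr[i][j] for j in col_range] for i in row_range]

def reduce_axis(arr, fn, axis):
    """Reduction: collapse one axis with an aggregate function.

    axis=0 aggregates down each column; any other value aggregates each row.
    """
    if axis == 0:
        return [fn(col) for col in zip(*arr)]
    return [fn(row) for row in arr]

grid = construct((3, 4), lambda i, j: i * 4 + j)
window = deconstruct(grid, range(1, 3), range(0, 2))
col_sums = reduce_axis(grid, sum, axis=0)
row_max = reduce_axis(grid, max, axis=1)
```

Here `grid` is a 3 by 4 array holding 0 through 11, `window` is its lower-left 2 by 2 corner, and the two reductions collapse it along columns and along rows respectively.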
Scientific Databases
• Rasdaman
• SciDB
• MonetDB (+SciQL)
What About Hadoop?
• Commonly used in scientific computing
• No scientific database technology
• But many useful programming libraries
  • Hama
  • Mahout
  • Cascading
• Hadoop doesn’t make it easy
  • YARN should help (Tez?)
  • Balancing needs help
• Not the only game in town anymore (BDAS, MPI-2, HPCC, etc.)
Conclusions
• Scientific computing can benefit from a scientific database
• Success of RDBMS was also a curse
• NoSQL and big data are catalysts for disruption
• Still early for scientific databases
• Hadoop loves/hates science
Resources
SciDB / Array Functional Language: http://bit.ly/VdXJkA
Rasdaman / rasql: http://en.wikipedia.org/wiki/Rasdaman
MonetDB / SciQL: http://monetdb.org
Precog / Quirrel: http://precog.com
Query Language for Multidimensional Arrays: Design, Implementation, & Optimization Techniques