Skytree big data london meetup - may 2013

DESCRIPTION

Slides from Paul Salazar's talk on Skytree at the 18th Big Data London meetup.

Page 1: Skytree   big data london meetup - may 2013

SAME DATA. BETTER RESULTS.

PAUL SALAZAR [email protected]

Page 2

SKYTREE’S FOCUS: PRODUCTION-GRADE MACHINE LEARNING

Machine learning: the modern science of finding patterns and making predictions from data. Also known as multivariate statistics, data mining, pattern recognition, or advanced/predictive analytics.

Page 3

Machine Learning Use Cases

•  Predict categories and classes: Classification
•  Predict values and numbers: Regression
•  Grouping and segmentation: Clustering
•  Detection and characterization: Density Estimation
•  Visualization and reduction: Dimension Reduction
•  Find similar items: Multidimensional Querying

Example Skytree Algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression

Recommendations, Predictions, Outlier Detection
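As a rough illustration of how density estimation can support outlier detection, the sketch below scores each point with a plain Gaussian kernel density estimate and flags the lowest-density point. This is a generic textbook formulation, not Skytree's implementation; the function name `gaussian_kde`, the bandwidth, and the sample values are all assumptions made for the example.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at x from 1-D samples.

    Each evaluation sums over all samples, so scoring every sample
    costs O(N^2) kernel evaluations in this naive form.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

# Hypothetical 1-D measurements: six values near 1.0 and one far away.
samples = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 5.0]
density = gaussian_kde(samples, bandwidth=0.3)  # bandwidth chosen by hand

# Score every sample and treat the lowest-density one as the outlier.
scores = [density(x) for x in samples]
outlier = samples[scores.index(min(scores))]
print("lowest-density sample:", outlier)
```

The isolated value receives almost no support from its neighbours, so its estimated density is the smallest. A real application would choose the bandwidth from the data rather than fixing it by hand.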

Page 4

What are the current options for ML for Big Data?

1.  Just use a subset of the data
–  e.g. just take the first 1,000 rows. Result to expect: Capture only the broadest patterns. → Lower accuracy.

2.  Just use a simple ML method
–  e.g. use logistic regression instead of nonlinear SVM. Result to expect: Entire types of patterns cannot be found. → Lower accuracy.

3.  Just use simple parallelism/MapReduce
–  i.e. replace all the for-loops with parallel ones. Result to expect: Only the simplest of ML methods (not O(N^2)/O(N^3)) can be significantly sped up this way. → See #2.

4.  Just throw it in the cloud
–  i.e. somehow use the large compute power of the cloud. Result to expect: The cost of sending it to the cloud is even greater than the compute cost. → See #1. See also #3.
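Some back-of-the-envelope arithmetic helps show why option 3 has limited reach for quadratic methods. The sketch below compares the work each worker still does when an O(N^2) loop is split evenly with the work of a hypothetical N log N method on one core. The dataset size, worker count, and helper names are assumptions, and constant factors and communication costs are ignored.

```python
import math

def quadratic_ops_per_worker(n: int, workers: int) -> float:
    """Operations per worker when an O(N^2) loop is split evenly."""
    return n * n / workers

def linearithmic_ops(n: int) -> float:
    """Operations for a hypothetical N * log2(N) method on a single worker."""
    return n * math.log2(n)

n = 1_000_000      # hypothetical number of rows
workers = 1_000    # hypothetical cluster size

per_worker = quadratic_ops_per_worker(n, workers)
single_core = linearithmic_ops(n)

print(f"O(N^2) split over {workers} workers: {per_worker:.2e} ops each")
print(f"O(N log N) on one core:            {single_core:.2e} ops")
print(f"ratio: {per_worker / single_core:.0f}x")
```

Under these assumptions, a thousand workers running the quadratic method each still do far more work than one core running the lower-complexity method, and the gap widens as N grows.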

Page 5

Skytree’s Unique Differentiation: Fundamental Technology Breakthrough

Complexity of State-of-the-Art Machine Learning methods:
1.  Querying: all-nearest-neighbors O(N^2)
2.  Density estimation: kernel density estimation O(N^2), kernel conditional density est. O(N^3)
3.  Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N^2), kernel discriminant O(N^2), support vector machine O(N^3)
4.  Regression: linear regression, LASSO, kernel regression O(N^2), regression tree, Gaussian process regression O(N^3)
5.  Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3); Gaussian graphical models, discrete graphical models
6.  Clustering: k-means, mean-shift O(N^2), hierarchical clustering O(N^3)
7.  Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), n=2, 3, 4, …

►  Unfortunately O(N^2), O(N^3) are computationally prohibitive for big data

Skytree has invented a way to reduce the complexity of the above methods from O(N^2) and O(N^3) to O(N) or O(N log N).
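The slides do not describe how Skytree achieves this reduction. For readers unfamiliar with the general idea, the sketch below contrasts brute-force all-nearest-neighbors, which is O(N^2), with a textbook k-d tree, which answers each query in roughly O(log N) on average for low-dimensional data. It is a generic, swapped-in technique rather than Skytree's method, and all function names and the random test data are assumptions.

```python
import math
import random

def brute_force_all_nn(points):
    """O(N^2): compare every point against every other point."""
    result = []
    for i, p in enumerate(points):
        best = (-1, math.inf)
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                if d < best[1]:
                    best = (j, d)
        result.append(best)
    return result

def build_kdtree(indices, points, depth=0):
    """Recursively split the point indices on alternating axes at the median."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kdtree(indices[:mid], points, depth + 1),
        "right": build_kdtree(indices[mid + 1:], points, depth + 1),
    }

def nearest(node, points, query_index, best=(-1, math.inf)):
    """Nearest neighbor of points[query_index], excluding the point itself."""
    if node is None:
        return best
    q = points[query_index]
    i = node["index"]
    if i != query_index:
        d = math.dist(q, points[i])
        if d < best[1]:
            best = (i, d)
    diff = q[node["axis"]] - points[i][node["axis"]]
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, points, query_index, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[1]:
        best = nearest(far, points, query_index, best)
    return best

def tree_all_nn(points):
    """All-nearest-neighbors using one k-d tree query per point."""
    tree = build_kdtree(list(range(len(points))), points)
    return [nearest(tree, points, i) for i in range(len(points))]

random.seed(0)
points = [(random.random(), random.random()) for _ in range(300)]
brute = brute_force_all_nn(points)
fast = tree_all_nn(points)
print("results agree:", all(math.isclose(b[1], f[1]) for b, f in zip(brute, fast)))
```

Both functions return the same neighbor distances; the tree version simply skips most of the pairwise comparisons by pruning regions that cannot contain a closer point. The pruning becomes less effective as the number of dimensions grows, which is one reason production systems use more sophisticated structures.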

Page 6

Performance

Up to 10,000x speedups (on one CPU)

Page 7

How Does Skytree Do This?

•  Deep knowledge of algorithms: Drawing from the latest from academia
•  Smart programming: Efficient ways to compute order N^2 and N^3
•  Distributed systems: Take advantage of parallel computing speed

Page 8

Team

EXECUTIVE TEAM
•  Martin Hack, CEO & Co-Founder: Sun, GreenBorder (Google)
•  Alexander Gray, PhD, CTO & Co-Founder: Leading Light for Large-Scale, Fast Algorithms
•  Paul Salazar, VP Sales: Red Hat, Greenplum
•  Leland Wilkinson, PhD, VP Data Visualization: Creator of SYSTAT (SPSS/IBM)
•  Tim Marsland, PhD, VP Engineering: Sun Fellow, CTO Software, Apple, Oracle

BOARD OF DIRECTORS
•  Rick Lewis, USVP
•  Noah Doyle, Javelin Venture Partners
•  David Toth, Founder and CEO NetRatings (Nielsen)

TECH ADVISORY BOARD
•  Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
•  Prof. David Patterson, UC Berkeley: systems (inventor RISC, RAID)
•  Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
•  Prof. James Demmel, UC Berkeley: high-performance computing

INVESTORS
USVP, Javelin Venture Partners, Scott McNealy, UPS

Page 9

Product Overview

Skytree Adviser for Desktop: Data Science for Everyone
Skytree Server for Enterprises: Enterprise Machine Learning

Advanced Analytics:
•  Predict Categories/Classes
•  Detect Anomalies
•  Find Trends
•  Predict Values/Numbers
•  Identify Patterns
•  Find Outliers

Page 10

Thank you for learning about Skytree. Read more at www.skytree.net

•  We’re hiring: check out our careers page.
•  Download Skytree Adviser for Free.
•  Pick up a T-Shirt.