Slides from Paul Salazar's talk on SkyTree at the 18th Big Data London meetup.
SKYTREE’S FOCUS: PRODUCTION-GRADE MACHINE LEARNING
Machine learning: the modern science of finding patterns and making predictions from data. Also known as multivariate statistics, data mining, pattern recognition, or advanced/predictive analytics.
Machine Learning Use Cases
• Predict categories and classes: Classification
• Predict values and numbers: Regression
• Grouping and segmentation: Clustering
• Detection and characterization: Density Estimation
• Visualization and reduction: Dimension Reduction
• Find similar items: Multidimensional Querying
Example Skytree Algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
Recommendations, Predictions, Outlier Detection
What are the current options for ML for Big Data?
1. Just use a subset of the data
   – e.g. just take the first 1,000 rows. Result to expect: Capture only the broadest patterns. → Lower accuracy.
2. Just use a simple ML method
   – e.g. use logistic regression instead of nonlinear SVM. Result to expect: Entire types of patterns cannot be found. → Lower accuracy.
3. Just use simple parallelism/MapReduce
   – i.e. replace all the for-loops with parallel ones. Result to expect: Only the simplest of ML methods (not O(N^2)/O(N^3)) can be significantly sped up this way. → See #2.
4. Just throw it in the cloud
   – i.e. somehow use the large compute power of the cloud. Result to expect: The cost of sending it to the cloud is even greater than the compute cost. → See #1. See also #3.
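To make option 3 concrete, the arithmetic below is a rough sketch (not from the slides) of why parallelising the loops alone does not rescue an O(N^2) method. It assumes ideal linear speedup with the number of cores, which real clusters do not reach, and the helper names `pairwise_ops` and `cores_needed` are hypothetical, chosen only for illustration.

```python
import math

def pairwise_ops(n: int) -> int:
    """Number of pairwise evaluations for a brute-force O(N^2) method on n points."""
    return n * n

def cores_needed(n: int, baseline_n: int, baseline_cores: int = 1) -> int:
    """Cores required to keep wall-clock time constant when the data grows
    from baseline_n to n points. ASSUMPTION: perfect linear parallel speedup."""
    work = baseline_cores * pairwise_ops(n)
    return -(-work // pairwise_ops(baseline_n))  # integer ceiling division

if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(
            f"N={n:>9,}  N^2={pairwise_ops(n):.1e}  "
            f"N*log2(N)={n * math.log2(n):.1e}  "
            f"cores to match N=10,000 on one core: {cores_needed(n, 10_000):,}"
        )
```

Under these assumptions, every 10x increase in data needs 100x more cores just to stand still, whereas an O(N log N) method grows only slightly faster than the data itself.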
Skytree’s Unique Differentiation: Fundamental Technology Breakthrough
Complexity of State-of-the-Art Machine Learning methods:
1. Querying: all-nearest-neighbors O(N^2)
2. Density estimation: kernel density estimation O(N^2), kernel conditional density estimation O(N^3)
3. Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N^2), kernel discriminant O(N^2), support vector machine O(N^3)
4. Regression: linear regression, LASSO, kernel regression O(N^2), regression tree, Gaussian process regression O(N^3)
5. Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3); Gaussian graphical models, discrete graphical models
6. Clustering: k-means, mean-shift O(N^2), hierarchical clustering O(N^3)
7. Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), n = 2, 3, 4, …
► Unfortunately, O(N^2) and O(N^3) are computationally prohibitive for big data.
Skytree has invented a way to reduce the complexity of the above methods from O(N^2) and O(N^3) to O(N) or O(N log N).
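The slides do not say how Skytree achieves this reduction, so the sketch below should not be read as Skytree's method. It only illustrates the general idea on item 1 (all-nearest-neighbors), using a standard textbook structure, a k-d tree, as a stand-in: the brute-force version compares every pair of points, while the tree version skips whole regions that cannot contain a closer neighbor. All function names are hypothetical, and the code assumes at least two points.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_all_nn(points):
    """All-nearest-neighbors by comparing every pair: O(N^2) distance evaluations."""
    n = len(points)
    return [
        min((j for j in range(n) if j != i), key=lambda j: dist2(points[i], points[j]))
        for i in range(n)
    ]

def build_kdtree(indices, points, depth=0):
    """Recursively split the point indices at the median along alternating axes."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kdtree(indices[:mid], points, depth + 1),
        "right": build_kdtree(indices[mid + 1:], points, depth + 1),
    }

def nearest(node, points, query_idx, best=None):
    """Return (squared distance, index) of the nearest neighbor of points[query_idx]."""
    if node is None:
        return best
    q = points[query_idx]
    i = node["index"]
    if i != query_idx:  # a point is not its own neighbor
        d = dist2(q, points[i])
        if best is None or d < best[0]:
            best = (d, i)
    diff = q[node["axis"]] - points[i][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, points, query_idx, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if best is None or diff * diff < best[0]:
        best = nearest(far, points, query_idx, best)
    return best

def kdtree_all_nn(points):
    """All-nearest-neighbors using one k-d tree query per point."""
    tree = build_kdtree(list(range(len(points))), points)
    return [nearest(tree, points, i)[1] for i in range(len(points))]

if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(300)]
    print("same neighbors:", brute_force_all_nn(pts) == kdtree_all_nn(pts))
```

Because this simple build re-sorts at every level, construction costs roughly O(N log^2 N). Each query takes about O(log N) on average for low-dimensional, well-spread data, but the pruning becomes less effective as the number of dimensions grows, so the worst case can approach the brute-force cost.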
Performance
Up to 10,000x speedups (on one CPU)
How Does Skytree Do This?
• Deep knowledge of algorithms: drawing on the latest from academia
• Smart programming: efficient ways to compute O(N^2) and O(N^3) methods
• Distributed systems: take advantage of parallel computing speed
Team
EXECUTIVE TEAM
• Martin Hack, CEO & Co-Founder: Sun, GreenBorder (Google)
• Alexander Gray, PhD, CTO & Co-Founder: Leading Light for Large-Scale, Fast Algorithms
• Paul Salazar, VP Sales: RedHat, Greenplum
• Leland Wilkinson, PhD, VP Data Visualization: Creator of SYSTAT (SPSS/IBM)
• Tim Marsland, PhD, VP Engineering: Sun Fellow, CTO Software, Apple, Oracle
BOARD OF DIRECTORS
• Rick Lewis, USVP
• Noah Doyle, Javelin Venture Partners
• David Toth, Founder and CEO NetRatings (Nielsen)
TECH ADVISORY BOARD
• Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
• Prof. David Patterson, UC Berkeley: systems (inventor RISC, RAID)
• Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
• Prof. James Demmel, UC Berkeley: high-performance computing
INVESTORS
USVP, Javelin Venture Partners, Scott McNealy, UPS
Product Overview
Skytree Adviser for Desktop: Data Science for Everyone
Skytree Server for Enterprises: Enterprise Machine Learning
Advanced Analytics:
• Predict Categories/Classes
• Predict Values/Numbers
• Detect Anomalies
• Identify Patterns
• Find Trends
• Find Outliers
Thank you for learning about Skytree
Read more at www.skytree.net
• We’re hiring: check out our careers page.
• Download Skytree Adviser for Free.
• Pick up a T-Shirt.