Building a High-Performance and Accurate Machine-Learning Platform
Reference Architecture | Enterprise Artificial Intelligence

Executive Summary

Today’s enterprises are increasingly relying on artificial intelligence (AI) and machine learning. In a recent survey of executives at pioneering companies, 90 percent report already having AI strategies in place.1 In a different survey, cognitive technologies and AI were consistently cited as the top emerging technologies in which companies planned to invest.2 AI is getting all this attention because it can deliver a steady stream of business insights that can increase productivity and efficiency while lowering operational costs. It may also enhance regulatory compliance and enable extensive personalization of goods and services—all of which can lead to a competitive advantage in today’s fiercely competitive landscape.

In this reference architecture, we explore how H2O.ai* products, combined with 2nd generation Intel® Xeon® Scalable processors and Intel® DC Solid State Drives (SSDs) can improve AI performance as measured by training and inference time. What’s more, the reference architecture described herein achieves these performance gains while maintaining model accuracy. For example, when the XGBoost* algorithm is optimized for Intel® technology, the algorithm performs model training up to 4.5X faster than the native algorithm;3 integrating Intel® Data Analytics Acceleration Library (Intel® DAAL) with H2O Driverless AI* results in a 3X speedup on the MNIST dataset.4 Substantiating details are provided in the rest of this document and in the endnotes.

Using the information and reference architecture presented here, you can take the next step toward unleashing the power of AI in your organization.

Figure 1. Reference architecture for an H2O.ai* and Intel® technology machine-learning platform. The platform combines 2nd generation Intel® Xeon® Scalable processors, Intel® DC Solid State Drives, Intel® Data Analytics Acceleration Library, H2O* Sparkling Water*, and H2O Driverless AI*.

Combining H2O.ai products, 2nd generation Intel® Xeon® Scalable processors, and Intel® DC SSDs can decrease training time and inference time without sacrificing model accuracy


Table of Contents

Executive Summary
Introduction
Solution Overview
H2O.ai Reference Architecture
H2O* Setup Guide
Benchmarking Results
Summary
Appendix A: Full Benchmark Configuration Details
Appendix B: Benchmarking XGBoost* Parameters
Appendix C: Intel® DAAL Setup and H2O Driverless AI* Parameters


Introduction

H2O.ai* is a suite of artificial intelligence (AI), machine learning (ML), and deep learning (DL) products that support distributed parallel processing across clusters, with an in-memory architecture. H2O.ai products support both supervised and unsupervised learning using industry-standard parallelized algorithms that take advantage of fine-grained in-memory MapReduce*.

The H2O.ai suite includes two open source products (H2O* and Sparkling Water*), as well as H2O Driverless AI* (enterprise software).

Open Source H2O*

H2O.ai open source products are Apache* v2 licensed with enterprise support subscriptions. They are built for data scientists, with support for R* and Python* and an interactive graphical user interface (GUI), H2O Flow*.

H2O is a collection of in-memory, distributed machine-learning algorithms that can be used with existing big-data infrastructures. These can include bare metal, Apache Hadoop*, and Apache Spark* clusters. It can ingest data directly from the Hadoop Distributed File System* (HDFS*), Spark, Amazon S3*, Azure Data Lake*, or any other data source, into its in-memory distributed key-value store. H2O can be deployed in three ways, as described in Table 1.

We focused our H2O open source benchmarking tests on the second deployment method in Table 1, Sparkling Water. Because it uses Spark, which has been optimized for Intel® architecture, Sparkling Water’s performance reflects these optimizations.

Table 1. H2O* Deployment Choices

Method | Details
H2O* Standalone Cluster | H2O is responsible for distribution and inter-node communication.
Sparkling Water* | H2O is integrated with Apache Spark*. H2O jobs are submitted to the Spark master and then executed within Spark executors.
H2O on Hadoop* | YARN* (Yet Another Resource Negotiator*) schedules H2O jobs as MapReduce* tasks on Hadoop.

Combining Spark (a powerful general-purpose, open source, in-memory platform) with H2O’s machine-learning capabilities provides a seamless experience for users who want to make a query using Spark SQL, feed the results into H2O to build a model and make predictions, and then use the results again in Spark. As shown in Figure 2, Sparkling Water is launched inside a Spark executor, which is created after application submission. At this point, H2O starts services, including distributed key-value (K/V) store and memory manager, and orchestrates them into a cloud. The topology of the created cloud exactly matches the topology of the underlying Spark cluster. The figure represents the internal Sparkling Water cluster.

Figure 2. H2O Sparkling Water* allows users to combine the fast, scalable machine learning algorithms of H2O with the capabilities of Apache Spark*.

[Figure 2 diagram: a Sparkling Water* application, submitted via spark-submit, runs against the Spark* Master JVM; each Spark Worker JVM hosts a Spark Executor JVM containing an embedded H2O* instance, and together these executors form the Sparkling Water cluster.]
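The Spark-to-H2O round trip described above can be scripted in a few lines of PySparkling. The following is a minimal sketch rather than code from this reference architecture: the CSV path, SQL filter, and frame names are placeholders, and the frame-conversion method names (as_h2o_frame/as_spark_frame) are those used by the Sparkling Water 2.4 Python API.

# Minimal PySparkling sketch: Spark SQL -> H2O model -> predictions back in Spark.
# Paths, column names, and the SQL filter are illustrative placeholders.
from pyspark.sql import SparkSession
from pysparkling import H2OContext
from h2o.estimators.gbm import H2OGradientBoostingEstimator

spark = SparkSession.builder.appName("sparkling-water-demo").getOrCreate()
hc = H2OContext.getOrCreate(spark)        # starts H2O inside the Spark executors

# Query the data with Spark SQL.
flights = spark.read.csv("hdfs:///data/airlines.csv", header=True, inferSchema=True)
flights.createOrReplaceTempView("flights")
subset = spark.sql("SELECT * FROM flights WHERE Year >= 1998")

# Feed the result into H2O and build a model.
train = hc.as_h2o_frame(subset, "train")  # Sparkling Water 2.4 method name
gbm = H2OGradientBoostingEstimator(ntrees=50)
gbm.train(y="IsDepDelayed", training_frame=train)

# Bring the predictions back into Spark for further processing.
predictions = hc.as_spark_frame(gbm.predict(train))
predictions.show(5)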


H2O Driverless AI*

H2O Driverless AI is a high-performance, single-node computing platform for automatic development and rapid deployment of state-of-the-art predictive analytics models. As shown in Figure 3, H2O Driverless AI automates several time-consuming aspects of a typical data science workflow, including data visualization, model optimization, feature engineering, predictive modeling, and scoring pipelines. Here’s how Driverless AI works:

1. Drag and drop data. It can accept tabular data from plain text sources and a variety of external cloud and desktop data sources, including HDFS, SQL, Amazon S3, Snowflake*, Google BigQuery*, Azure Blob Storage*, and local data storage.

2. Automatic visualization. It can help you understand the data shape, outliers, missing values, and more through a variety of automated visualization techniques.

3. Automatic model optimization. It uses best-practice model recipes and the power of high-performance computing to iterate across thousands of possible models, including advanced feature engineering and parameter tuning. The end result is the selection of the best possible model for the data.

4. Automatic scoring pipelines. It uses ultra-low latency Python or Java* automatic scoring pipelines that include feature transformations and models.

Driverless AI targets business applications such as loss-given-default, probability of default, customer churn, campaign response, fraud detection, anti-money laundering, demand forecasting, and predictive asset maintenance models (or in machine-learning parlance: common regression, binomial classification, and multinomial classification problems). Built for domain users, analysts, and data scientists, Driverless AI features a GUI-based interface for end-to-end data science. The software is licensed on a per-seat basis, with an annual subscription.

Benchmarking Overview and Key Performance Metrics

This reference architecture showcases H2O Sparkling Water running on multi-node servers equipped with 2nd generation Intel® Xeon® Scalable processors. The following three benchmarking tests were performed; see the “Benchmarking Results” section later in this document for complete details.

• Running the Gradient Boosting Machines (GBM) algorithm on a 120 GB Airlines dataset, using Sparkling Water. The algorithm predicted if a flight had a high chance of being delayed or canceled. H2O built the predictive model using 10 years’ worth of data (120 million rows).

• Running the XGBoost* algorithm on the MNIST dataset (60,000 training images and 10,000 test images). Intel has optimized the XGBoost algorithm (which is used by GBM) to improve threading and instruction prefetching. Our benchmark tests reveal how these optimizations can improve XGBoost performance.

• Assessing the performance benefit of integrating Intel® Data Analytics Acceleration Library (Intel® DAAL) with Driverless AI, using the MNIST and HIGGS datasets.

For more information about the three datasets, see the section “Data Preparation” later in this document. For the open source benchmarks, the key performance metrics are training time (how long it takes to build the model), inference time (how long it takes to make the predictions on the test data), and model accuracy. For Driverless AI, the key performance metrics are model and feature tuning time, feature evolution time, final pipeline training time, and model accuracy.

Figure 3. H2O Driverless AI* saves time and effort by automating data visualization, model optimization, and scoring.

[Figure 3 diagram, “How H2O Driverless AI* Works”: (1) Drag and Drop Data creates the dataset; (2) Automatic Visualization; (3) Automatic Model Optimization iterates over model recipes, each with advanced feature engineering, algorithm selection, and model tuning; (4) Automatic Scoring Pipeline.]


Solution Overview

The reference architecture described in this document represents a base configuration for a typical ML solution. Besides the H2O.ai products, the reference architecture consists of 2nd generation Intel Xeon Scalable processors, Intel DAAL, and Intel® DC Solid State Drives (SSDs).

2nd Generation Intel® Xeon® Scalable Processors

The 2nd generation Intel Xeon Scalable processors are optimized for demanding data center workloads. This processor family features higher frequencies than previous generations of Intel Xeon Scalable processors, along with architecture improvements and AI and DL inference workload enhancements.

Intel® Data Analytics Acceleration Library (Intel® DAAL)

Intel DAAL helps applications deliver predictions more quickly and analyze large data sets without increasing compute resources. It optimizes data ingestion and algorithmic compute together for high performance, and it supports off-line, streaming, and distributed usage models to meet a range of application needs.

Intel DAAL can help with all stages of analytics:

• Pre-processing (decompression, filtering, and normalization)

• Transformation (aggregation and dimension reduction)

• Analysis (summary statistics and clustering)

• Modeling (training, parameter estimation, and simulation)

• Validation (hypothesis testing and model error detection)

• Decision making (forecasting and decision trees)

New features include high-performance logistic regression, extended GBM functionality, and user-defined data modification procedures. To help with your most difficult big data analytics challenges, use Priority Support to consult with Intel engineers.

daal4py is a simplified Python API to Intel DAAL that allows for faster usage of the framework. It provides highly configurable machine learning kernels, some of which support streaming input data and/or can be easily and efficiently scaled out to clusters of workstations.
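As a concrete illustration of the daal4py programming style, the short sketch below runs the library's K-means kernels on random data. It is based on daal4py's documented usage rather than on the benchmark code in this paper, and the data and parameters are placeholders.

# Minimal daal4py sketch: K-means clustering on random data (illustrative only).
import numpy as np
import daal4py as d4p

data = np.random.rand(10000, 20).astype(np.float64)

# Choose starting centroids, then run the clustering kernel.
init = d4p.kmeans_init(10, method="plusPlusDense").compute(data)
result = d4p.kmeans(10, 50).compute(data, init.centroids)   # 10 clusters, 50 iterations max

print("Iterations run:", result.nIterations)
print("Centroid matrix shape:", result.centroids.shape)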

Intel® DC Solid State Drives (SSDs)

Intel DC SSDs are optimized for Intel Xeon Scalable processors and are available in a variety of media types and capacities. For example, Intel® 3D NAND SSDs (such as the Intel® SSD DC P4500, P4501, and P4600 Series) are optimized for cloud infrastructures, offering outstanding quality, reliability, advanced manageability, and serviceability to minimize service disruptions.

Intel® Optane™ DC SSDs deliver:

• High throughput for breakthrough performance

• Low latency for response under load

• Predictably fast service for quality of service

• Ultra-high endurance

H2O.ai Reference Architecture

The following sections provide the node configurations and give details on cluster and firmware requirements.

Reference Configurations

Tables 2 and 3 below detail the reference architectures for an H2O Sparkling Water cluster (must have at least five worker nodes) and a Driverless AI compute node. Table 4 provides further network infrastructure information. For complete benchmarking configurations, including BIOS and OS kernel version, refer to Appendix A: Full Benchmark Configuration Details.

Cluster Requirements and Firmware Settings

Table 5 lists the technologies that are required to run the benchmarks described in the “Benchmarking Results” section, later in this document. Table 6 lists firmware setting recommendations.


Table 2. Five-Node Sparkling Water* Configuration

Hardware | Description | Required or Recommended
CPU | 2x Intel® Xeon® Gold 6248 processor (20 cores, 2.5 GHz, 150W) | Required
Memory (minimum) | 384 GB (12x 32 GB 2666 MHz DDR4 ECC RDIMM) | Recommended
Storage | Capacity Tier: 6x 2 TB Intel® SSD DC P4610; Cache Tier: 2x 375 GB Intel® Optane™ SSD DC P4800X; Boot Drive: 1x 480 GB SATA-based Intel® SSD D3-S4610 | Recommended
Data Network | Integrated 10 GbE (Intel® Ethernet Connection X722) | Required
Management Network | Integrated 1 GbE port 0/RMM port | Required

Table 3. Single-Node H2O Driverless AI* Configuration

Hardware | Description | Required or Recommended
CPU | 2x Intel® Xeon® Platinum 8260L processor (24 cores, 2.4 GHz, 165W) | Required
Memory (minimum) | 384 GB (12x 32 GB 2666 MHz DDR4 ECC RDIMM) | Recommended
Storage | Capacity Tier: 6x 2 TB Intel® SSD DC P4610; Cache Tier: 2x 375 GB Intel® Optane™ SSD DC P4800X; Boot Drive: 1x 480 GB SATA-based Intel® SSD D3-S4610 | Recommended
Data Network | Integrated 10 GbE (Intel® Ethernet Connection X722) | Required
Management Network | Integrated 1 GbE port 0/RMM port | Required

Table 4. Network Infrastructure Details

Network Type | Description | Required or Recommended
Data | 10 GbE 48-port switch | Recommended
Management | 1 GbE 48-port switch | Recommended

Table 5. Cluster Requirements

Software | Version
Linux* Distribution | CentOS* Linux release 7.6 or Red Hat Enterprise Linux* (RHEL*) 7
Apache Spark* | 2.4.0
Apache Hadoop* | 2.7.3
Java Development Kit* (JDK*) | Oracle* JDK 1.8.0 update 181
H2O Driverless AI* (DAI) | DAI 1.5 or newer
Sparkling Water* | sparkling-water-2.4
Intel® Data Analytics Acceleration Library (Intel® DAAL) | Intel DAAL version 2019.1.1

Table 6. Firmware Settings

Firmware Option | Setting
Intel® Hyper-Threading Technology | Advanced > Processor Configuration > Intel HTT > Enabled
CPU Power & Performance Policy | Advanced > Power & Performance > CPU Power & Performance Policy > Performance
Workload Configuration | Advanced > Power & Performance > Workload Configuration > Balanced
CPU P-State Control | Advanced > Power & Performance > CPU P State Control > Intel® Turbo Boost Technology > Enabled


H2O* Setup Guide

The following sections give an overview of how to install Sparkling Water and Driverless AI, as well as how to prepare the input data.

Install Sparkling Water*

Sparkling Water is an integration of H2O into the Spark ecosystem. It facilitates the use of H2O algorithms in Spark workflows. It is designed as a regular Spark application and provides a way to start H2O services on each node of a Spark cluster and access data stored in data structures of Spark and H2O. Follow these general steps to install Sparkling Water:

1. Download the latest version of Spark (if not already installed) from the Spark downloads page. For the benchmarks described in this document, we installed Spark release 2.4.

2. Choose a package type: Pre-built for Hadoop 2.7 and later

3. Download the Sparkling Water package from the H2O.ai downloads page.

4. Point SPARK_HOME to the existing installation of Spark and export variable MASTER.

5. Install Sparkling Water on each node.

6. Create an H2O cloud inside the Spark cluster.

Detailed installation instructions are here.

Install H2O Driverless AI

For the benchmarks described in this document, we installed Driverless AI v1.5.4 on a single node with 2x Intel® Xeon® Platinum 8260L processors. We installed the TAR SH version of Driverless AI v1.5.4 from here. Detailed Driverless AI installation instructions are here.

Once Driverless AI is installed, start the service by running the run-dai.sh script:

# Start Driverless AI.
./run-dai.sh

All logs are captured in the logs folder; the file named dai.log is the log file for Driverless AI runs.

Once Driverless AI is started, you can access the GUI on the default port 12345 from a browser at https://<server-name>:12345. Figure 4 shows a partial screenshot of the GUI.

To stop Driverless AI, run the kill-dai.sh shell script:

# Stop Driverless AI.
./kill-dai.sh

Figure 4. H2O Driverless AI* GUI example.


Data Preparation

In our benchmarking tests, we used the three datasets described below.

Airlines Dataset
This dataset consists of flight arrival and departure details for all commercial U.S. flights from 1987 to 2008. The approximately 120 million records (CSV format) occupy 120 GB.

To obtain the dataset, run “wget https://s3.amazonaws.com/h2o-airlines-unpacked/allyears_10.csv”

More information can be found here.

MNIST Dataset
The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from the National Institute of Standards and Technology (NIST). The digits have been size-normalized and centered in a fixed-size image. The MNIST dataset can be downloaded from here.

HIGGS Dataset
This is a classification problem to distinguish between a signal process that produces Higgs bosons and a background process that does not. The data was produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The last seven features are functions of the first 21 features; these are high-level features derived by physicists to help discriminate between the two classes. There is an interest in using deep-learning methods to eliminate the need for physicists to manually develop such features. The last 500,000 examples are used as a test set. The HIGGS dataset can be downloaded from here.
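For reference, the HIGGS train/test split described above can be reproduced with a few lines of pandas. This is a minimal sketch; the local file name is an assumption (the UCI download is a gzipped, headerless CSV whose first column is the class label).

# Split the HIGGS dataset as described above: the last 500,000 rows form the test set.
import pandas as pd

higgs = pd.read_csv("HIGGS.csv.gz", header=None)   # file name/path is a placeholder

test = higgs.tail(500_000)
train = higgs.iloc[: len(higgs) - 500_000]

print(f"train rows: {len(train):,}  test rows: {len(test):,}")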

Benchmarking Results

The following three sections describe the benchmarking results obtained from Sparkling Water, the optimized XGBoost algorithm, and integrating Intel DAAL with Driverless AI.

Benchmarking Sparkling Water

For this benchmarking test, we used Sparkling Water in cluster mode (for setup instructions, see the H2O documentation). In this mode, H2O launches through Spark workers, and Spark manages the job scheduling and communication between nodes. All five Intel® Xeon® Gold 6248 processor-based worker nodes were used to run H2O jobs.

We used the Airlines dataset (see “Data Preparation” above) to predict the probability of a flight being delayed or cancelled, given certain parameters. Sparkling Water built the predictive model by examining the “IsDepDelayed” column/category, using 10 years’ worth of data (120 million rows). We used the GBM machine-learning algorithm, which is commonly used for regression and classification problems; it produces a prediction model as an ensemble of weak decision trees that are added iteratively to improve accuracy.

The overall process is to first train the model, and then test it.

Training on the Airlines Dataset
We followed these steps to train the model:

1. Start the Spark master and worker nodes, and form a cloud of five nodes (see https://spark.apache.org/docs/latest/spark-standalone.html#installing-spark-standalone-to-a-cluster).

2. Once Spark is started, launch a PySparkling shell and create an H2OContext as follows:

cd sparkling-water-2.4.8
bin/pysparkling --conf spark.ext.h2o.nthreads=1

from pysparkling import *
import h2o
hc = H2OContext.getOrCreate(spark)

3. Open H2O Flow from any node IP address: http://<ip-address>:54321. More information on H2O Flow can be found here.

4. Verify the cluster status of five nodes from Admin > Cluster Status.

5. Create a Network File System (NFS) server on one node with a shared directory, and mount the share on all the other nodes. Copy the Airlines dataset to this share. All the H2O nodes need to have access to the Airlines dataset because loading it into memory is distributed across nodes.


6. Import the Airlines dataset into H2O from the NFS share.

7. Parse the loaded Airlines dataset.

8. Once the dataset is parsed, it is loaded into memory as a .hex frame. View the data to ensure it was properly imported and parsed.

9. Split the dataset into Training (75 percent) and Inference (25 percent) frames.

10. Build the model:
   a. Select GBM as the algorithm.
   b. Select the training frame (frame_0.750).
   c. Select the response column (“IsDepDelayed”).
   d. Click the Build Model button.

11. Observe the build progress through Admin > Water Meter. In our benchmark, the Water Meter shows that all CPU cores are being used at full capacity (98 to 99 percent), as indicated by the blue area in Figure 5.


Figure 5. The H2O Sparkling Water* graphical user interface (GUI) can show CPU utilization during model training.

12. Once the model build is complete, the GUI displays the job run time (Training Time) in minutes and seconds. For our test, Sparkling Water completed the model training in 3 minutes and 45 seconds—that’s about 400,000 rows per second.5

13. Click the View button to inspect the model’s Receiver Operating Characteristic (ROC) curve, logloss scoring history, and training metrics.

Inferencing on the Airlines Dataset
Once the model is built using 75 percent of the Airlines dataset, inferencing can be performed using the remaining 25 percent of data. Follow these steps:

1. Click the Predict button in the model.

2. Select the test data frame (frame_0.25).

3. Run Inferencing on the selected data by again clicking on Predict (see Figure 6).

4. Inspect the inference metrics and ROC curve. Our benchmark completed the inference in just 21 seconds—almost 1.5 million rows per second.6


Figure 6. Once the model training is complete, starting the model inferencing is as easy as clicking the Predict button.
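The same train/split/predict flow driven through H2O Flow above can also be scripted against the running H2O cloud from Python. The sketch below is illustrative rather than the exact benchmark script; the node name, NFS path, and seed are placeholders.

# Scripted equivalent of the H2O Flow walkthrough above (illustrative only).
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.connect(ip="node1", port=54321)                      # attach to the running H2O cloud

airlines = h2o.import_file("/mnt/nfs/allyears_10.csv")   # NFS share path is a placeholder

# 75/25 split, matching the Flow walkthrough.
train, test = airlines.split_frame(ratios=[0.75], seed=42)

gbm = H2OGradientBoostingEstimator()
gbm.train(y="IsDepDelayed", training_frame=train)

predictions = gbm.predict(test)                          # inference on the held-out 25 percent
print(gbm.model_performance(test).auc())                 # ROC AUC on the test frame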


Benchmarking XGBoost*

Intel has optimized the XGBoost open source algorithm, available here. We made changes to improve thread utilization and scalability, as well as to optimize C libraries. Our benchmark test compared the performance of the native XGBoost version 0.81 to the performance of the optimized XGBoost 0.82 integrated into H2O, using the MNIST dataset and the parameters detailed in Appendix B: Benchmarking XGBoost* Parameters. Our benchmark test revealed up to a 4.5X speedup in training time with the optimized XGBoost algorithm (see Figure 7).7
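For readers who want to run a comparable measurement outside of H2O, the sketch below times XGBoost training with the histogram tree method and the Appendix B hyperparameters. It is a generic illustration, not the harness used for the results reported here, and random arrays stand in for the real MNIST data.

# Generic XGBoost training-time measurement (illustrative; not the benchmark harness).
# Random data stands in for MNIST; substitute the real 60,000 x 784 training arrays to reproduce.
import time
import numpy as np
import xgboost as xgb

X = np.random.rand(60_000, 784).astype(np.float32)
y = np.random.randint(0, 10, size=60_000)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "multi:softmax",
    "num_class": 10,
    "max_depth": 6,
    "eta": 0.3,            # learn_rate in Appendix B
    "subsample": 1,
    "tree_method": "hist",
}

start = time.time()
booster = xgb.train(params, dtrain, num_boost_round=100)   # ntrees = 100
print(f"Training time: {time.time() - start:.1f} s")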

Benchmarking H2O Driverless AI

We used the MNIST and HIGGS datasets for benchmarking Driverless AI. The base benchmarking used the out-of-the-box install of Driverless AI V1.5. Then we compared those results with the results obtained by integrating Intel DAAL with Driverless AI.

Benchmarking Native Driverless AI
Because Driverless AI runs all of its algorithms (XGBoost, Generalized Linear Model* (GLM*), LightGBM*, and TensorFlow*) by default, we modified Driverless AI to run only XGBoost and benchmarked the results. We chose this approach so that we could measure the improvement of XGBoost performance when Intel DAAL is integrated with Driverless AI. We followed these steps:

1. Modify the config.toml configuration file to set the parameters to run only XGBoost.

2. Run Driverless AI using the GUI at https://<server-name>:12345.

3. Load the MNIST dataset and set the target column to the last column (c785). Also load the HIGGS dataset and set the target column to the first column (c1).

4. Run experiments using these Driverless AI experiment settings: Accuracy=7, Time=2, Interpretability=5.

Benchmarking Intel DAAL Integrated into Driverless AI
For integrating Intel DAAL into Driverless AI, we followed these steps (setup details are in Appendix C: Intel® DAAL Setup and H2O Driverless AI* Parameters):

1. Put the Intel DAAL and Intel® Threading Building Blocks (Intel® TBB) headers into the Driverless AI ROOT folder.

2. Install the DAAL4PY routine, which includes the Intel-optimized XGBoost algorithm.

3. Modify the training and inferencing functions in the sklearn.py and core.py Python routines to call the DAAL4PY functions instead of the native XGBoost (a minimal illustrative sketch follows these steps).
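To make step 3 concrete, the snippet below shows the kind of daal4py gradient-boosted-tree training and prediction calls that can stand in for native XGBoost. It is a minimal, self-contained sketch on random data; the actual wrapper code patched into sklearn.py and core.py is not reproduced here.

# Minimal daal4py gradient boosted trees sketch (illustrative of step 3 above).
# Random data stands in for the engineered features produced by Driverless AI.
import numpy as np
import daal4py as d4p

X_train = np.random.rand(10_000, 50).astype(np.float64)
y_train = np.random.randint(0, 2, size=(10_000, 1)).astype(np.float64)
X_test = np.random.rand(1_000, 50).astype(np.float64)

train_algo = d4p.gbt_classification_training(nClasses=2, maxIterations=100, maxTreeDepth=6)
train_result = train_algo.compute(X_train, y_train)

predict_algo = d4p.gbt_classification_prediction(nClasses=2)
predictions = predict_algo.compute(X_test, train_result.model).prediction

print("Predicted class counts:", np.bincount(predictions.ravel().astype(int)))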

Overall, Intel DAAL contributed to significant performance improvements.8 For the MNIST dataset, we saw up to a 3X improvement in all three key metrics (model and feature tuning time, feature evolution time, and final pipeline training time); for the HIGGS dataset, the overall improvement was up to 1.5X (see Figure 8), and the most impressive result was a 6X improvement in final pipeline training time. For both datasets, the accuracy was nearly identical for both training and test data—meaning that the speedup provided by Intel DAAL does not negatively affect accuracy. See Tables 7 and 8 for complete data.

Figure 7. Optimized XGBoost* performance: training time for the MNIST dataset on a 2-socket Intel® Xeon® Gold 6248 processor is up to 4.5X faster than with the native algorithm (normalized performance; higher is better).7


Table 7. MNIST Dataset Performance (10 classes, 60,000 rows, 785 columns)

Benchmark Configuration | Model and Feature Tuning Time (s) | Feature Evolution Time (s) | Final Pipeline Training Time (s) | All Experiment Time (s) | Training Dataset Accuracy | Test Dataset Accuracy
Base (Driverless AI* with Native XGBoost*) | 148.05 | 3775.92 | 96.55 | 4048.21 | 0.99504 | 0.99590
Intel® DAAL Integrated with Driverless AI (DAAL4PY Routine with Optimized XGBoost) | 45.04 | 1258.73 | 28.41 | 1346.74 | 0.99519 | 0.99612
Speedup | 3X | 3X | 3X | 3X

Table 8. HIGGS Dataset Performance (2 classes, 1 million rows, 28 columns)

Benchmark Configuration | Model and Feature Tuning Time (s) | Feature Evolution Time (s) | Final Pipeline Training Time (s) | All Experiment Time (s) | Training Dataset Accuracy | Test Dataset Accuracy
Base (Driverless AI* with Native XGBoost*) | 96.37 | 1139.43 | 89.70 | 1325.5 | 0.73447 | 0.74070
Intel® DAAL Integrated with Driverless AI (DAAL4PY Routine with Optimized XGBoost) | 56.05 | 783.85 | 14.67 | 854.57 | 0.73323 | 0.73323
Speedup | 1.7X | 1.45X | 6X | 1.6X

Figure 8. Training time speedup with Intel® Data Analytics Acceleration Library and Optimized XGBoost*.

[Figure 8 chart: normalized training-time performance of H2O Driverless AI with native XGBoost versus Driverless AI with optimized XGBoost on a 2-socket Intel® Xeon® Platinum 8260 processor: up to 3X faster for the MNIST dataset and up to 1.5X for the HIGGS dataset; higher is better.]


Summary

H2O open source software has a distributed architecture that takes advantage of Intel architecture and scales linearly as cluster nodes are added. Sparkling Water is seamlessly integrated with Spark, which is optimized for Intel architecture. All parts of the pipeline—from importing a dataset, to parsing, to building and inferencing models—scale with core count. As illustrated in this document, integrating the optimized XGBoost algorithm into H2O makes training up to 4.5X faster for the MNIST dataset.

Driverless AI is a single-node training solution that provides enhanced performance running on an Intel® processor with many cores. By applying XGBoost optimizations with Intel DAAL, performance can be improved by up to 3X while maintaining accuracy.

Find the solution that is right for your organization. Contact your Intel representative or visit https://builders.intel.com/intelselectsolutions.

Learn More

You may find the following resources helpful:

• H2O.ai home page

• Intel® Xeon® Scalable processors

• Intel® Data Analytics Acceleration Library

• Intel® DC Solid State Drives


Appendix A: Full Benchmark Configuration Details

Table A1. H2O Sparkling Water* Benchmark System Configuration

Description | Configuration
Platform | S2600WFT
# Nodes | 5
# Sockets | 2
CPU | Intel® Xeon® Gold 6248 processor (2.5 GHz, 150W)
Cores/socket, Threads/socket | 20/40
Ucode (microcode) | 0x400000a
Intel® Hyper-Threading Technology | On
Intel® Turbo Boost Technology | On
BIOS version, including microcode version (from /proc/cpuinfo) | SE5C620.86B.0D.01.0134.100420181737, microcode 0x400000a
System DDR Memory Configuration (slots/capacity/run-speed) | 12 slots/16 GB/2666
System DCPMM Configuration (slots/capacity/run-speed) | N/A
Total Memory/Node (DDR+DCPMM) | 384 GB + 0
Storage, Capacity Tier | 6x 2 TB Intel® SSD DC P4610
Storage, Cache Tier | 2x 375 GB Intel® Optane™ SSD DC P4800X
Storage, Boot Drive | 1x 480 GB SATA-based Intel® SSD D3-S4610
NIC | 1x Intel® Ethernet Connection X722
PCH | N/A
Other Hardware (Accelerator) | N/A
OS | CentOS* Linux* release 7.6.1810 (Core)
Kernel | 3.10.0-862.14.4.el7.x86_64
Sparkling Water* | v2.4
Apache Spark* | v2.4.0
Apache Hadoop* | v2.7.3
Java Development Kit* (JDK*) | Oracle* JDK v1.8.0 update 181
Dataset | Airline (https://github.com/h2oai/h2o-2/wiki/Hacking-Airline-DataSet-with-H2O)


Table A2. H2O Driverless AI* Benchmark System Configuration

Description | Configuration
Platform | S2600WFT
# Nodes | 1
# Sockets | 2
CPU | Intel® Xeon® Platinum 8260L processor (2.4 GHz, 165W)
Cores/socket, Threads/socket | 24/48
Ucode (microcode) | 0x400000A
Intel® Hyper-Threading Technology | On
Intel® Turbo Boost Technology | On
BIOS version, including microcode version (from /proc/cpuinfo) | SE5C620.86B.0D.01.0159.100720181711, microcode 0x400000A
System DDR Memory Configuration (slots/capacity/run-speed) | 12 slots/16 GB/2666
System DCPMM Configuration (slots/capacity/run-speed) | N/A
Total Memory/Node (DDR+DCPMM) | 384 GB + 0
Storage, Capacity Tier | 6x 2 TB Intel® SSD DC P4610
Storage, Cache Tier | 2x 375 GB Intel® Optane™ SSD DC P4800X
Storage, Boot Drive | 1x 480 GB SATA-based Intel® SSD D3-S4610
NIC | 1x Intel® Ethernet Connection X722
PCH | N/A
Other Hardware (Accelerator) | N/A
OS | CentOS* Linux* release 7.6.1810 (Core)
Kernel | 3.10.0-862.14.4.el7.x86_64
H2O Driverless AI* | v1.5
Intel® Data Analytics Acceleration Library and Intel® Threading Building Blocks | Baseline configuration: N/A; Test configuration: Intel DAAL v2019.1.1 and Intel TBB v2019.4
Datasets | MNIST (http://yann.lecun.com/exdb/mnist/); HIGGS (https://archive.ics.uci.edu/ml/datasets/HIGGS)


Appendix B: Benchmarking XGBoost* Parameters

We used these parameters to benchmark the optimized XGBoost* algorithm:

keep_cross_validation_models = False
keep_cross_validation_predictions = False
score_each_iteration = False
max_depth = 6
learn_rate = 0.3
subsample = 1
tree_method = "hist"
ntrees = 100
backend = "cpu"
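For orientation, these parameters map onto the H2O Python API roughly as shown below. This is a hedged sketch, not the benchmark driver itself: it assumes the estimator accepts the same parameter names listed above, and the dataset path is a placeholder (MNIST imported into H2O gets columns C1 through C785, with C785 as the label, as in the Driverless AI steps earlier in this document).

# Sketch: the Appendix B parameters expressed through H2O's XGBoost estimator.
import h2o
from h2o.estimators.xgboost import H2OXGBoostEstimator

h2o.init()
train = h2o.import_file("mnist_train.csv")     # placeholder path
train["C785"] = train["C785"].asfactor()       # treat the digit labels as classes

xgb_model = H2OXGBoostEstimator(
    keep_cross_validation_models=False,
    keep_cross_validation_predictions=False,
    score_each_iteration=False,
    max_depth=6,
    learn_rate=0.3,
    subsample=1,
    tree_method="hist",
    ntrees=100,
    backend="cpu",
)
xgb_model.train(y="C785", training_frame=train)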

Appendix C: Intel® DAAL Setup and H2O Driverless AI* Parameters

The following information describes the modifications we made to integrate Intel® Data Analytics Acceleration Library (Intel® DAAL) with H2O Driverless AI*.

1. Install DAAL4PY and all required Intel® libraries:

conda create -n DAAL4PY -c intel python=3.6 impi-devel tbb-devel daal daal-include cython jinja2 numpy daal4py

2. Install H2O Driverless AI v1.5:

wget https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/dai/rel-1.5.4-65/x86_64-centos7/dai-1.5.4-linux-x86_64.sh
chmod 755 dai-1.5.4-linux-x86_64.sh
./dai-1.5.4-linux-x86_64.sh
cp -r miniconda3/envs/DAAL4PY/include/{algorithms,data_management,serial,services,tbb,daal.h} dai-1.5.4-linux-x86_64/python/include/
cp -r miniconda3/envs/DAAL4PY/lib/{libdaal_*,libtbb*} dai-1.5.4-linux-x86_64/python/lib/
export PATH=dai-1.5.4-linux-x86_64/python/bin:$PATH
export LD_LIBRARY_PATH=dai-1.5.4-linux-x86_64/python/lib:$LD_LIBRARY_PATH
git clone https://github.com/IntelPython/daal4py.git
export DAALROOT=/root/dai-1.5.4-linux-x86_64/python && export TBBROOT=$DAALROOT && export MPIROOT=$DAALROOT && export NO_DIST=1 && export NO_STREAM=1
conda activate DAAL4PY
cd daal4py && python setup.py build_ext && python setup.py install

3. Modify config.toml to run XGBoost* and disable all other algorithms:

max_cores = 56
fixed_ensemble_level = 0
feature_engineering_effort = 0
parameter_tuning_num_models = 1
params_xgboost = "{'max_depth': 6, 'grow_policy': 'depthwise', 'early_stopping_rounds': None, 'early_stopping_threshold': None}"
params_tune_xgboost = "{'max_depth': [6], 'grow_policy': ['depthwise']}"
enable_xgboost = "on"
enable_glm = "off"
enable_lightgbm = "off"
enable_rf = "off"
max_nestimators = 100
max_nestimators_feature_evolution_factor = 1.0


Solution Provided By:

1 MIT Sloan Management Review, September 17, 2018, “2018 MIT Sloan Management Review and The Boston Consulting Group (BCG) Artificial Intelligence Global Executive Study and Research Report.” sloanreview.mit.edu/projects/artificial-intelligence-in-business-gets-real

2 Deloitte Insights, August 8, 2018, “2018 Global CIO Survey: Manifesting Legacy.” deloitte.com/insights/us/en/topics/leadership/global-cio-survey-2018.html

3 Tested by Intel March 7, 2019. Baseline Configuration: See Table A1 for complete configuration details. Test configuration: See Table A1 and Appendix B for complete configuration details. The only difference between the baseline and test configurations was that the baseline used the native XGBoost* algorithm, while the test configuration used the optimized XGBoost algorithm (github.com/dmlc/xgboost).

4 Tested by Intel March 7, 2019. H2O Driverless AI configured to run only Boost, with the following settings: Accuracy=7, Time=2, Interpretability=5. Baseline configuration: See Table A2 for complete configuration details. Test configuration: See Table A2 and Appendix C for complete configuration details.

5 Tested by Intel March 7, 2019. See Table A1 for complete configuration details.
6 See endnote 5.
7 See endnote 3. The configurations were the same as listed in Table A1 except that H2O Sparkling Water* was not used.
8 See endnote 4.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation.

Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit intel.com/benchmarks.

Configurations: See Appendices A, B, and C for details. Performance results are based on Intel testing as of March 7, 2019 and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

Intel does not control or audit third-party data. You should review this content, consult other sources, and confirm whether referenced data are accurate.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

Intel® Turbo Boost Technology requires a PC with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software, and overall system configuration. Check with your PC manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see intel.com/technology/turboboost.

Results in this document have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. For more complete information about performance and benchmark results, visit intel.com/benchmarks.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel, the Intel logo, Xeon, and Optane are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

© Intel Corporation 0719/JSTA/KC/PDF 340301-001US