www.bsc.es
Automating Big Data Benchmarking and Performance Analysis with ALOJA
David Carrera, Senior Researcher
November 2015
Hadoop
  - > 100 tunable parameters
  - Obscure and interrelated
      • mapred.{map,reduce}.tasks.speculative.execution
      • io.sort.mb 100 (300)
      • io.sort.record.percent 5% (15%)
      • io.sort.spill.percent 80% (95-100%)
  - Similar for Hive, Spark, HBase
Dominated by rules of thumb
  - Number of containers in parallel
      • 0.5 - 2 per CPU core
Large stack for tuning
Setting up your Big Data system
Image source: Intel® Distribution for Apache Hadoop
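The containers rule of thumb above can be turned into a quick calculation. This is a minimal sketch, assuming a hypothetical helper `container_range` (not part of Hadoop or ALOJA) and an assumed rounding policy; only the 0.5 to 2 per-core bounds come from the slide.

```python
import math


def container_range(cpu_cores, low_per_core=0.5, high_per_core=2.0):
    """Return (min, max) parallel containers for one node.

    The per-core bounds follow the rule of thumb on the slide. The rounding
    (at least one container, lower bound rounded down, upper bound rounded up)
    is an assumption made for this illustration.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be >= 1")
    low = max(1, math.floor(cpu_cores * low_per_core))
    high = max(low, math.ceil(cpu_cores * high_per_core))
    return low, high


# A 4-core node leaves anything from 2 to 8 containers as "reasonable".
print(container_range(4))
```

Even for a single node type the rule leaves a fourfold spread, which is one reason the deck argues for benchmarking rather than guessing.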
Default values in Apache source not ideal
Large and spread-out ecosystem
  - Different distributions
  - Product claims
Each job is different
  - No one-size-fits-all solution
Cloud vs. on-premise
  - IaaS
      • Tens of different VMs to choose from
  - PaaS
      • HDInsight, CloudBigData, EMR
New economical HW
  - SSDs, InfiniBand networking
How do I set my system? Too many options
[Figure: sample mappers and reducers for 3 popular benchmarks: Terasort, K-means, Wordcount]
Ecosystem is not transparent
  - Needs auditing
Product claims on performance and TCO
BSC's project ALOJA: towards cost-effective Big Data
Benchmarking and analysis tools
Online repository and largest Big Data repository
  - 50,000+ runs of HiBench, TPC-H, and [some] BigBench
  - Over 100 HW configurations tested
      • Of different node/VM types, disks, and networks
      • Cloud: multi-cloud provider, including both IaaS and PaaS
      • On-premise: high-end, HPC, commodity, low-power
Community
  - Collaborations with industry and academia
  - Presented in different conferences and workshops
  - Visibility: 47 different countries
http://aloja.bsc.es
[Diagram: Big Data Benchmarking, Online Repository, Web Analytics]
Commands and providers
Provisioning commands
Connect
  - Node and cluster
  - Builds SSH cmd line
      • SSH proxies
Deploy
  - Creates a cluster
  - Sets SSH credentials
  - If created, updates config as needed
  - If stopped, starts nodes
Start, Stop
Delete
Queue jobs to clusters
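The Deploy behaviour listed above reads as an idempotent command: what it does depends on the state the cluster is already in. This is a minimal sketch of that decision logic. The state names and step labels are hypothetical and are not ALOJA commands.

```python
from enum import Enum


class ClusterState(Enum):
    MISSING = "missing"   # cluster has not been created yet
    STOPPED = "stopped"   # cluster exists but its nodes are powered off
    RUNNING = "running"   # cluster exists and its nodes are up


def deploy_actions(state):
    """Return the ordered steps a deploy would take for a given cluster state.

    Step names are illustrative labels for this sketch only.
    """
    if state is ClusterState.MISSING:
        return ["create_cluster", "set_ssh_credentials"]
    actions = []
    if state is ClusterState.STOPPED:
        actions.append("start_nodes")
    # An existing cluster only gets its configuration refreshed as needed.
    actions.append("update_config")
    return actions


for state in ClusterState:
    print(state.value, "->", deploy_actions(state))
```

Written this way, running deploy twice in a row is safe: the second call finds a running cluster and only refreshes its configuration.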
Providers
On-premise
  - Custom settings for clusters
      • Multiple disk types
      • Different architectures
Cloud IaaS
  - Azure, OpenStack, Rackspace, AWS (testing)
Cloud PaaS
  - HDInsight, CloudBigData, EMR soon
Code at https://github.com/Aloja/aloja/tree/master/aloja-deploy
Cluster and node definitions: multi-provider abstraction
Steps to define a cluster
Import defaults (if any)
  - Sets OS version
Select provider
  - Azure, RackSpace, AWS, On-premise, vagrant…
Name the cluster and size
Optional
  - Select VM type
  - Attached disks
  - Define metadata
  - And costs
Nodes can also be defined
  - For Web, share folders, etc.
You can logically split clusters
Azure 8-datanode sample
# load AZURE defaults
source $CONF_DIR/cluster_defaults.conf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 # in GB
# details
vmCores=4
vmRAM=7 # in GB
# costs
clusterCostHour=1.584 # in USD
clusterType=IaaS
Source sample: https://github.com/Aloja/aloja/blob/master/shell/conf/cluster_al-08.conf
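A cluster definition like the Azure sample is a set of shell variable assignments, so other tools can read it easily. This is a minimal sketch, assuming hypothetical helpers `parse_cluster_conf` and `run_cost_usd` (neither is ALOJA code). The embedded values mirror the sample, and the hourly cost is read as 1.584 USD, which is an assumption about where the decimal point belongs.

```python
SAMPLE_CONF = """\
# load AZURE defaults
source $CONF_DIR/cluster_defaults.conf
clusterName=azure-large-8
numberOfNodes=8
diskSize=1024 # in GB
clusterCostHour=1.584 # in USD (decimal position assumed)
clusterType=IaaS
"""


def parse_cluster_conf(text):
    """Collect key=value assignments, ignoring comments and other commands."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line or " " in line.split("=", 1)[0]:
            continue  # blank lines, comments, and commands such as `source`
        key, value = line.split("=", 1)
        settings[key] = value.strip().strip('"')
    return settings


def run_cost_usd(settings, exec_seconds):
    """Cost of one benchmark run, pro-rated from the cluster's hourly price."""
    return float(settings["clusterCostHour"]) * exec_seconds / 3600.0


conf = parse_cluster_conf(SAMPLE_CONF)
print(conf["clusterName"], f"{run_cost_usd(conf, 1800):.3f} USD for a 30-minute run")
```

Keeping the hourly price next to the hardware description is what lets an execution time be turned into an execution cost later in the deck.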
Entry point to explore the results collected from the executions
  - Index of executions
      • Quick glance of executions
      • Searchable, sortable
  - Execution details
      • Performance charts and histograms
      • Hadoop counters
      • Jobs and task details
Data management of benchmark executions
  - Data importing from different clusters
  - Execution validation
  - Data management and backup
Cluster definitions
  - Cluster capabilities (resources)
  - Cluster costs
Sharing results
  - Download executions
  - Add external executions
Documentation and references
  - Papers, links, and feature documentation
2) ALOJA-WEB Online Repository
Available at http://aloja.bsc.es
Comparing 3 runs on the same cluster with different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
  - 400s, 2 containers, local disk
  - 800s, 3 containers, local disk
  - 600s, 2 containers, remote disk
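The three execution times above are easier to compare as ratios. This is a minimal sketch that expresses each run as a slowdown factor relative to the fastest one. The helper name `slowdown_vs_fastest` is hypothetical, and the times and labels are the ones listed on the slide.

```python
# Execution time in seconds for each configuration, as listed on the slide.
runs = {
    "2 containers, local disk": 400,
    "3 containers, local disk": 800,
    "2 containers, remote disk": 600,
}


def slowdown_vs_fastest(times):
    """Map each run label to its execution time divided by the fastest time."""
    fastest = min(times.values())
    return {label: seconds / fastest for label, seconds in times.items()}


for label, factor in slowdown_vs_fastest(runs).items():
    print(f"{label}: {factor:.2f}x")
```

On these numbers, adding a third container doubles the run time and moving to remote disks costs 50%, all on the same cluster.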
Comparing 3 runs on the same cluster with different configs
CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
  - Moderate iowait
  - Higher iowait
  - Very high iowait
Comparing 3 runs on the same cluster with different configs
CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
  - 1 blocked process
  - 4 blocked processes
  - 4 blocked processes (map phase)
Impact of HW configurations on speedup
Disks and network; cloud remote volumes
[Charts: speedup (higher is better) for HDD-ETH, HDD-IB, SSD-ETH, and SSD-IB, and for cloud volumes: local only; 1, 2, or 3 remotes; 3, 2, or 1 remotes with tmp local]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
Clusters by cost-effectiveness
URL: http://aloja.bsc.es/clustercosteffectiveness
• Cluster ID reference
• RL-06 = 8 performance1-8 VMs
• RL-16 = 8 general1-8 VMs
• RL-19 = 8 io1-15 VMs
• RL-33 = 8 performance2-30 VMs
• RL-30 = 8 io1-30 VMs
[Chart labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8]
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
  - X axis: number of datanodes (cluster size)
  - Left Y: execution time (lower is better)
  - Right Y: execution cost
Cost/performance scalability of cluster size
[Chart: execution time and execution cost by cluster size, with the recommended size marked]
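The trade-off on this screen can be reproduced with a few lines of arithmetic: a larger cluster finishes sooner but bills more nodes per hour. This is a minimal sketch under stated assumptions. The sample times, the per-node price, and the rule of recommending the cheapest size are all hypothetical; the deck does not say how ALOJA picks its recommended size.

```python
def execution_cost(exec_seconds, cost_per_node_hour, datanodes):
    """Cost of one run: hours used times hourly price times number of datanodes."""
    return exec_seconds / 3600.0 * cost_per_node_hour * datanodes


def recommend_size(times_by_size, cost_per_node_hour):
    """Pick the cluster size with the lowest execution cost (ties go to the faster run)."""
    costs = {
        nodes: execution_cost(seconds, cost_per_node_hour, nodes)
        for nodes, seconds in times_by_size.items()
    }
    best = min(costs, key=lambda nodes: (costs[nodes], times_by_size[nodes]))
    return best, costs


# Hypothetical sample data: number of datanodes -> execution time in seconds.
sample_times = {4: 3600, 8: 1500, 16: 900, 32: 700}
best, costs = recommend_size(sample_times, cost_per_node_hour=0.2)

for nodes in sorted(costs):
    print(f"{nodes:>2} datanodes: {sample_times[nodes]:>4}s, {costs[nodes]:.2f} USD")
print("cheapest size:", best)
```

With this made-up data the run keeps getting faster up to 32 nodes, but the cost is lowest at 8, which is the shape of curve the screen is meant to expose.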
Modeling Hadoop - Methodology
Methodology
  - 3-step learning process
  - Different split sizes tested (10% ≤ training ≤ 50%)
  - Different learning algorithms: regression trees, nearest-neighbor methods, linear/multinomial regressions, neural networks, deep learning
Learning results
  - Mean Absolute Errors ~250s (ranges in [100s, 6000s])
  - Relative Absolute Errors between [0.10, 0.25]
      • Depend on benchmark and number of examples per benchmark
      • Some executions are/may be anomalies
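The split and the two error metrics above can be sketched in plain Python. This is a minimal illustration, not the ALOJA implementation. It assumes the rows left after the training split are halved between validation and testing, and it assumes the common definition of Relative Absolute Error (total absolute error divided by the error of always predicting the mean). All function names are hypothetical.

```python
import random


def split_dataset(rows, train_frac, seed=0):
    """Shuffle rows and split them into training, validation, and testing sets."""
    if not 0.10 <= train_frac <= 0.50:
        raise ValueError("train_frac is outside the 10%-50% range tested in the slides")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = (len(shuffled) - n_train) // 2  # assumption: halve the remainder
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the same unit as the inputs (seconds here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to a predictor that always outputs the mean."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline


train, valid, test = split_dataset(range(100), train_frac=0.5)
print(len(train), len(valid), len(test))

# Made-up execution times (seconds) and predictions, for illustration only.
actual = [1000, 2000, 3000]
predicted = [1100, 1900, 3300]
print(mean_absolute_error(actual, predicted), relative_absolute_error(actual, predicted))
```

A Relative Absolute Error below 1 means the model beats the mean predictor, so the reported 0.10 to 0.25 range corresponds to errors of roughly a tenth to a quarter of that baseline.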
[Diagram: 3-step learning workflow. Labels: ALOJA data set; training, validation, and testing splits; train; test the model; select this model? (YES/NO); tune algorithm and re-train; final model]
Knowledge Discovery
Make analyzing results easier
  - Multi-variable visualization
  - Trees separating relevant attributes
  - Other interesting tools
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
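A tree descriptor like this one maps directly onto nested dictionaries, which makes it easy to ask which attribute moves the predicted time most. This is a minimal sketch with hypothetical helper names. The leaf values are copied from the slide, except the last one, which is garbled in the source and is assumed here to be 1241s.

```python
# Predicted execution time in seconds, keyed by Disk -> Net -> IOFBuf.
TREE = {
    "HDD": {"ETH": {"128KB": 2935, "64KB": 2942},
            "IB": {"128KB": 3118, "64KB": 3125}},
    "SSD": {"ETH": {"128KB": 1248, "64KB": 1256},
            "IB": {"128KB": 1233, "64KB": 1241}},  # 1241 is an assumed reading
}


def leaves(tree):
    """Flatten the nested tree into {(disk, net, iofbuf): seconds}."""
    return {
        (disk, net, buf): seconds
        for disk, nets in tree.items()
        for net, bufs in nets.items()
        for buf, seconds in bufs.items()
    }


def max_effect(tree, position):
    """Largest gap in predicted time between leaves that differ only at `position`.

    position 0 is the disk, 1 the network, 2 the I/O file buffer size.
    """
    flat = leaves(tree)
    largest = 0
    for key_a, time_a in flat.items():
        for key_b, time_b in flat.items():
            differing = [i for i in range(3) if key_a[i] != key_b[i]]
            if differing == [position]:
                largest = max(largest, abs(time_a - time_b))
    return largest


for name, position in (("Disk", 0), ("Net", 1), ("IOFBuf", 2)):
    print(f"{name}: up to {max_effect(TREE, position)}s difference")
```

Read this way, the tree says the disk type dominates, the network matters mainly on HDD, and the buffer size changes the prediction by only a few seconds.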
Thank you
For further information please contact
david.carrera@bsc.es
Commands and providers
Provisioning commands
Connect
ndash Node and Cluster
ndash Builds SSH cmd line
bull SSH proxies
Deploy
ndash Creates a cluster
ndash Sets SSH credentials
ndash If created updates config as needed
ndash If stopped starts nodes
Start Stop
Delete
Queue jobs to clusters
Providers
On-premise
ndash Custom settings for clusters
bull Multiple disk types
bull Different architectures
Cloud IaaS
ndash Azure OpenStack Rackspace AWS
(testing)
Cloud PaaS
ndash HDInsight CloudBigData EMR soon
Code at httpsgithubcomAlojaalojatreemasteraloja-deploy
Cluster and nodes definitions multi-provider abstraction
Steps to define a cluster
Import defaults (if any)ndash Sets OS version
Select providerndash Azure RackSpace AWS On-
premise vagranthellip
Name the cluster and size
Optionalndash Select VM type
ndash Attached disks
ndash Define metadata
ndash And costs
Nodes can also be definedndash For Web share folders etc
You can logically split clusters
Azure 8-datanode sample
load AZURE defaults
source $CONF_DIRcluster_defaultsconf
clusterName=azure-large-8
numberOfNodes=8
vmSize=Large
attachedVolumes=3
diskSize=1024 in GB
details
vmCores=4
vmRAM=7 in GB
costs
clusterCostHour=1584 in USD
clusterType=IaaS
Source sample httpsgithubcomAlojaalojablobmastershellconfcluster_al-08conf
8
Entry point for explore the results collected from the executions
ndash Index of executions
bull Quick glance of executions
bull Searchable Sortablendash Execution details
bull Performance charts and histograms
bull Hadoop counters
bull Jobs and task details
Data management of benchmark executionsndash Data importing from different clusters
ndash Execution validation
ndash Data management and backup
Cluster definitions ndash Cluster capabilities (resources)
ndash Cluster costs
Sharing resultsndash Download executions
ndash Add external executions
Documentation and Referencesndash Papers links and feature documentation
2) ALOJA-WEB Online Repository
Available at httpalojabsces
Comparing 3 runs on the same cluster with different configs
Mappers and reducers, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
- 400s, 2 containers, local disk
- 800s, 3 containers, local disk
- 600s, 2 containers, remote disk
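The comparison URL repeats one `execs[]` parameter per execution ID, with `[]` URL-encoded as `%5B%5D`. A small sketch of building such a link follows. The helper name `perfcharts_url` is hypothetical, and the host, path, and `?` separator are assumptions reconstructed from the URL on this slide, whose punctuation was lost.

```shell
#!/bin/sh
# Hypothetical helper: build a perfcharts comparison URL from execution IDs.
# "%5B%5D" is the URL-encoded form of "[]", so each ID is sent as execs[]=ID.
perfcharts_url() {
  base="http://aloja.bsc.es/perfcharts"   # assumed host and path
  query=""
  for id in "$@"; do
    # Join parameters with "&"; nothing is prepended before the first one.
    query="${query:+$query&}execs%5B%5D=$id"
  done
  printf '%s\n' "$base?$query"
}

perfcharts_url 90086 90088 90104
```

Called with the three execution IDs from the slide, it prints a link with three `execs%5B%5D=` parameters joined by `&`.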
Comparing 3 runs on the same cluster with different configs
CPU utilization, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
- Moderate iowait
- Higher iowait
- Very high iowait
Comparing 3 runs on the same cluster with different configs
CPU queues, 48-node cluster
URL: http://aloja.bsc.es/perfcharts?execs%5B%5D=90086&execs%5B%5D=90088&execs%5B%5D=90104
- 1 blocked process
- 4 blocked processes
- 4 blocked processes (map phase)
Impact of HW configurations on speedup
Disks and network; cloud remote volumes
[Figure: speedup (higher is better) by disk and network (HDD-ETH, HDD-IB, SSD-ETH, SSD-IB) and by cloud volume setup (local only; 1, 2, or 3 remotes; 3, 2, or 1 remotes with tmp local)]
Results using http://hadoop.bsc.es/configimprovement
Details: https://raw.githubusercontent.com/Aloja/aloja/master/publications/BSC-MSR_ALOJA.pdf
Clusters by cost-effectiveness
URL: http://aloja.bsc.es/clustercosteffectiveness
• Cluster ID reference
  • RL-06 = 8 performance1-8 VMs
  • RL-16 = 8 general1-8 VMs
  • RL-19 = 8 io1-15 VMs
  • RL-33 = 8 performance2-30 VMs
  • RL-30 = 8 io1-30 VMs
[Figure: clusters plotted by cost-effectiveness; point labels: Performance2-30, Io1-30, Io1-15, General1-8, Performance1-8]
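Cost-effectiveness comes down to what one execution costs on a given cluster: the execution time in hours multiplied by the cluster's hourly price. A minimal sketch of that ranking is below. The helper name `rank_by_cost` and the input format are hypothetical, and the times and prices are made up for illustration, not ALOJA results.

```shell
#!/bin/sh
# Hypothetical input: cluster ID, execution time (s), cluster cost (USD/hour).
# The numbers are made up for illustration; they are not ALOJA results.
runs="RL-06 1200 2.00
RL-16 1500 1.20
RL-19 900 3.60"

# Execution cost = execution time in hours * hourly cluster cost.
# Prints "cost cluster time", sorted from cheapest to most expensive run.
rank_by_cost() {
  awk '{ printf "%.2f %s %d\n", $2 / 3600 * $3, $1, $2 }' | LC_ALL=C sort -n
}

printf '%s\n' "$runs" | rank_by_cost
```

In this made-up data the slowest cluster is the cheapest per run and the fastest is the most expensive, which is why time and cost have to be read together.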
Cost/Performance: scalability of cluster size
This shows a sample of a new screen (with sample data) to find the most cost-effective cluster size
- X axis: number of datanodes (cluster size)
- Left Y: execution time (lower is better)
- Right Y: execution cost
[Figure: execution time and execution cost against cluster size, with the recommended size marked]
Modeling Hadoop: Methodology
Methodology
- 3-step learning process
- Different split sizes tested (10 ≤ training ≤ 50)
- Different learning algorithms: regression trees, nearest-neighbors methods, linear/multinomial regressions, neural networks, deep learning
Learning results
- Mean Absolute Errors ~250s (ranges in [100s, 6000s])
- Relative Absolute Errors between [0.10, 0.25]
  • Depend on the benchmark and the number of examples per benchmark
  • Some executions are (or may be) anomalies
[Diagram: the ALOJA data set is split into Training, Validation, and Testing. Train, then test the model; if NO, tune the algorithm and re-train; if YES, select this model and test it to obtain the final model]
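The two error figures quoted above can be computed from pairs of observed and predicted execution times. The sketch below is illustrative only: the helper name `error_metrics` is hypothetical, the data is made up, and RAE is taken here as the mean of |predicted - observed| / observed, which is an assumption because the slides do not define it.

```shell
#!/bin/sh
# Hypothetical input: observed and predicted execution times in seconds.
# The numbers are made up for illustration; they are not ALOJA results.
pairs="400 450
800 700
600 660"

# MAE = mean(|predicted - observed|)
# RAE = mean(|predicted - observed| / observed)   (assumed definition)
error_metrics() {
  awk '{
         err = $2 - $1
         if (err < 0) err = -err
         abs_sum += err
         rel_sum += err / $1
       }
       END { if (NR > 0) printf "MAE=%.1fs RAE=%.3f\n", abs_sum / NR, rel_sum / NR }'
}

printf '%s\n' "$pairs" | error_metrics
```

For the three made-up pairs this prints `MAE=70.0s RAE=0.117`.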
Knowledge Discovery
Make analyzing results easier
- Multi-variable visualization
- Trees separating relevant attributes
- Other interesting tools
Tree Descriptor
Disk=HDD
  Net=ETH
    IOFBuf=128KB ⇒ 2935s
    IOFBuf=64KB ⇒ 2942s
  Net=IB
    IOFBuf=128KB ⇒ 3118s
    IOFBuf=64KB ⇒ 3125s
Disk=SSD
  Net=ETH
    IOFBuf=128KB ⇒ 1248s
    IOFBuf=64KB ⇒ 1256s
  Net=IB
    IOFBuf=128KB ⇒ 1233s
    IOFBuf=64KB ⇒ 1241s
Thank you
For further information, please contact david.carrera@bsc.es
www.bsc.es