Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs

Grigori Fursin
cTuning foundation and cKnowledge SAS
github.com/ctuning/ck, cKnowledge.io

Accepted for Philosophical Transactions of the Royal Society A (submitted: 12 June 2020)

Abstract

This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge). The CK concept is to decompose research projects into reusable components that encapsulate research artifacts and provide unified application programming interfaces (APIs), command-line interfaces (CLIs), meta descriptions and common automation actions for related artifacts. The CK framework is used to organize and manage research projects as a database of such components.

Inspired by the USB "plug and play" approach for hardware, CK also helps to assemble portable workflows that can automatically plug in compatible components from different users and vendors (models, datasets, frameworks, compilers, tools). Such workflows can build and run algorithms on different platforms and environments in a unified way using the customizable CK program pipeline with software detection plugins and the automatic installation of missing packages.

This article presents a number of industrial projects in which the modular CK approach was successfully validated in order to automate benchmarking, auto-tuning and co-design of efficient software and hardware for machine learning (ML) and artificial intelligence (AI) in terms of speed, accuracy, energy, size and various costs. The CK framework also helped to automate the artifact evaluation process at several computer science conferences, as well as to make it easier to reproduce, compare and reuse research techniques from published papers, deploy them in production, and automatically adapt them to continuously changing datasets, models and systems.

The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge, best practices, artifacts, workflows and experimental results in a common, portable and reproducible format at cKnowledge.io.

Keywords: research automation, reproducibility, reusability, portability, DevOps, AIOps, MLOps, FAIR principles, knowledge management, best practices, collaboration, optimization, portable workflow, microservices, adaptive container, machine learning, artificial intelligence, edge devices, MLPerf, API, crowd-benchmarking, crowd-tuning, auto-tuning, software/hardware co-design, efficient systems, Pareto efficiency, optimization repository

1 Motivation

Ten years ago I developed the cTuning.org platform and released all my research code and data into the public domain in order to crowdsource the training of our machine learning-based compiler (MILEPOST GCC) [41]. My intention was to accelerate the very time-consuming auto-tuning process and help our compiler to learn the most efficient optimizations across real programs, datasets, platforms and environments provided by volunteers.

We had a positive response from the community and, in just a few days, I collected as many optimization results as I had during the entire MILEPOST project. However, the initial excitement quickly faded when I struggled to reproduce most of the performance numbers and ML model predictions because even a tiny change in software, hardware, environment or the run-time state of the system could influence performance, while I did not have a mechanism to detect such changes [51, 43].

Figure 1: Reproducing research papers and adopting novel techniques in production is a tedious, repetitive and time-consuming process because of continuously changing software, hardware, models and datasets, and a lack of common formats and APIs for shared artifacts (code, data, models, experimental results, scripts and so on). Practitioners can take many months to rebuild, optimize, reproduce and deploy a working system based on published results and a new SW/HW stack.

Worse still, I could not compare these empirical results with other published techniques because they rarely included the full experiment specification with the shared research code and all the necessary artifacts needed to be able to reproduce results. Furthermore, it was always very difficult to add new tools, benchmarks and datasets to any research code because it required numerous changes in different ad-hoc scripts, repetitive recompilation of the whole project when new software was released, complex updates of database tables with results, and so on [42].

These problems motivated me to establish the non-profit cTuning foundation and continue working with the community on a common methodology and open-source tools to enable collaborative, reproducible, reusable and trustable R&D. We helped to initiate reproducibility initiatives and support artifact evaluation at several computer science conferences in collaboration with the ACM [5, 40]. We also promoted sharing of code, artifacts and results in a unified way along with research papers [4].

This community service gave me a unique chance to participate in reproducibility studies of more than 100 research papers at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), the International Symposium on Code Generation and Optimization (CGO), the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), Supercomputing, the International Conference on Machine Learning and Systems (MLSys) and other computer science conferences over the past five years [27]. I also started deploying some of these techniques in production in collaboration with my industrial partners in order to better understand all the problems associated with building trustable, reproducible and production-ready computational systems.

This practical experience confirmed my previous findings: while sharing ad-hoc research code, artifacts, trained models and Docker images along with research papers is a great step forward, it is only the tip of the reproducibility iceberg [42]. The major challenge afterwards is threefold: it is necessary to figure out how to use research techniques outside original containers with other data, code and models; to run them in a reliable and efficient way across rapidly evolving software, heterogeneous hardware and legacy platforms with continuously changing interfaces and data formats; and to balance multiple characteristics including speed, latency, accuracy, memory size, power consumption, reliability and costs (Figure 1).

2 Collective Knowledge concept

When helping to organize the artifact evaluation process at systems and machine learning conferences, I decided to introduce an Artifact Appendix and a reproducibility checklist [4, 27]. My goal was to help researchers to describe how to configure, build, run, validate and compare research techniques in a unified way across different conferences and journals. It also led me to notice that most research projects use some ad-hoc scripts, often with hardwired paths, to perform the same repetitive tasks, including downloading models and datasets, detecting platform properties, installing software dependencies, building research code, running experiments, validating outputs, reproducing results, plotting graphs and generating papers [42].

This experience motivated me to search for a solution for automating such common tasks and making them reusable and customizable across different research projects, platforms and environments. First, I started looking at related tools that were introduced to automate experiments and make research more reproducible:

• Workflow frameworks such as MLFlow [57], Kedro [52], Amazon SageMaker [3], Kubeflow [34], Apache Taverna [56], popper [48], CommonWL [36] and many others help to abstract and automate data science operations. They are very useful for data scientists but do not yet provide a universal mechanism to automatically build and run algorithms across different platforms, environments, libraries, tools, models and datasets. Researchers and engineers often have to implement this functionality for each new project from scratch, which can become very complicated, particularly when targeting new hardware, embedded devices, TinyML and IoT.

• Machine learning benchmarking initiatives such as MLPerf [53], MLModelScope [39] and Deep500 [37] attempt to standardise machine learning model benchmarking and make it more reproducible. However, production deployment, integration with complex systems and adaptation to continuously changing user environments, platforms, tools and data are currently out of their scope.

• Package managers such as Spack [46] and EasyBuild [47] are very useful for rebuilding and fixing the whole software environment. However, the integration with workflow frameworks, automatic adaptation to existing environments, native cross-compilation, particularly for embedded devices, and support for non-software packages (models, datasets, scripts) are still in progress.

• Container technology such as Docker [50] is very useful for preparing and sharing stable software releases. However, it hides the software chaos rather than solving it, lacks common APIs for research projects, requires an enormous amount of space, has very poor support for embedded devices, and does not yet help integrate models with existing projects, legacy systems and user data.

• The PapersWithCode platform [24] helps to find relevant research code for machine learning papers and keeps track of state-of-the-art ML research using public scoreboards with non-validated experimental results from papers. However, my experience reproducing research papers suggests that sharing ad-hoc research code is not enough to make research techniques reproducible, customizable, portable and trustable [42].

While testing all these useful tools and analyzing the Jupyter notebooks, Docker images and GitHub repositories shared alongside research papers, I started thinking that it would be possible to reorganize them as some sort of database of reusable components with a common API, command line, web interface and meta description. We could then reuse artifacts and some common automation actions across different projects while applying DevOps principles to research projects.

Furthermore, we could also gradually abstract and interconnect all existing tools rather than rewriting or substituting them. This, in turn, helped to create "plug and play" workflows that could automatically connect compatible components from different users and vendors (models, datasets, frameworks, compilers, tools, platforms) while minimising manual interventions and providing a common interface for all shared research projects.

I called my project Collective Knowledge (CK) because my goal was to connect researchers and practitioners in order to share their knowledge, best practices and research results in the form of reusable components, portable workflows and automation actions with common APIs and meta descriptions.

3 CK framework and an open CK format

I have developed the CK framework as a small Python library with minimal dependencies to be very portable, while offering the possibility of being re-implemented in other languages such as C, C++, Java and Go. The CK framework has a unified CLI, a Python API and a JSON-based web service to manage CK repositories and to add, find, update, delete, rename and move CK components (sometimes called CK entries or CK data) [9].

CK repositories are human-readable databases of reusable CK components that can be created in any local directory and inside containers, pulled from GitHub and similar services, and shared as standard archive files [7]. CK components simply wrap (encapsulate) user artifacts and provide an extensible JSON meta description with common automation actions [6] for related artifacts.

Automation actions are implemented using CK modules: Python modules with functions exposed in a unified way via the CK API and CLI, which use dictionaries for input and output (extensible and unified CK input/output (I/O)).


Figure 2: Organizing software projects as a human-readable database of reusable components that wrap artifacts and provide unified APIs, CLI, meta descriptions and common automation actions for related artifacts. The CK APIs and control files are highlighted in blue. The figure shows four views of the same projects: (1) a GitHub repository or archive file, (2) a user home directory, (3) a Jupyter/Colab notebook accessing CK entries via the Python API, and (4) a Docker image exposing the same CK CLI.

The use of dictionaries makes it easier to support continuous integration tools and web services and to extend the functionality while keeping backward compatibility. The unified I/O also makes it possible to reuse such actions across projects and to chain them together into unified CK pipelines and workflows.
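To make this concrete, the following is a minimal sketch (not the actual CK source) of such an automation action: a plain Python function that takes a dictionary, returns a dictionary, and signals errors through a non-zero 'return' key, so that any caller, CLI wrapper or web service can handle failures in the same way:

import platform

def detect_platform(i):
    # 'i' is an extensible input dictionary; unknown keys are ignored,
    # which keeps old callers working when new options are added
    if i.get('out') == 'con':
        print('Detecting platform properties ...')
    # Success: 'return' == 0; failure: 'return' > 0 plus an 'error' string
    # (the unified CK error-handling convention).
    return {'return': 0,
            'os': platform.system(),
            'machine': platform.machine()}

r = detect_platform({'out': 'con'})
if r['return'] > 0:
    print('Error: ' + r['error'])
else:
    print(r)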

As I wanted CK to be non-intrusive and technology-neutral, I decided to use a simple two-level directory structure to wrap user artifacts into CK components, as shown in Figure 2. The root directory of the CK repository contains the .ckr.json file to describe the repository and specify dependencies on other CK repositories to explicitly reuse their components and automation actions.

CK uses .cm directories (Collective Meta), similar to .git, to store the meta information of all components, as well as Unique IDs to be able to find components even if their user-friendly names have changed over time (CK alias). CK modules are always stored in module/<CK module name> directories in the CK repository. CK components are stored in <CK module name>/<CK data name> directories. Each CK component has a .cm directory with the meta.json file describing a given artifact and the info.json file keeping the provenance of a given artifact, including copyrights, licenses, creation date, names of all contributors and so on.

The CK framework has an internal default CK repository with stable CK modules and the most commonly used automation actions across many research projects. When the CK framework is used for the first time, it also creates a local CK repository in the user space to be used as a scratchpad.

After discussing the CK CLI with colleagues, I decided to implement it in a way that resembles natural language, to make it easier for users to remember the commands:

ck <action> <CK module name> (flags) (@input.json or @input.yaml)
ck <action> <CK module name>:<CK data name> (flags)
ck <action> <CK repository name>:<CK module name>:<CK data name>

The next example demonstrates how to compile and run a shared automotive benchmark on any platform and then create a copy of the CK program component:


pip install ck

ck pull repo --url=https://github.com/ctuning/ck-crowdtuning

ck search dataset --tags=jpeg
ck search program:cbench-automotive-*

ck find program:cbench-automotive-susan
ck load program:cbench-automotive-susan

ck help program

ck compile program:cbench-automotive-susan --speed
ck run program:cbench-automotive-susan --env.OMP_NUM_THREADS=4

ck run program --help

ck cp program:cbench-automotive-susan local:program:new-program-workflow
ck find program:new-program-workflow

ck benchmark program:new-program-workflow --record --record_uoa=my-test
ck replay experiment:my-test

The CK program module describes dependencies on software detection plugins and meta packages using simple tags and version ranges that the community has to agree on:

"compiler": {
  "name": "C++ compiler",
  "sort": 10,
  "tags": "compiler,lang-cpp"
},
"library": {
  "name": "TensorFlow C++ API",
  "sort": 20,
  "version_from": [1, 13, 1],
  "version_to": [2, 0, 0],
  "no_tags": "tensorflow-lite",
  "tags": "lib,tensorflow,vstatic"
}

I also implemented a simple access function in the CK Python API to access all the CK functionality in a very simple and unified way:

import ck.kernel as ck

# Equivalent of "ck compile program:cbench-automotive-susan --speed"
r=ck.access({'action':'compile',
             'module_uoa':'program',
             'data_uoa':'cbench-automotive-susan',
             'speed':'yes'})
if r['return']>0: ck.err(r)  # unified error handling
print(r)

# Equivalent of "ck run program:cbench-automotive-susan --env.OMP_NUM_THREADS=4"
r=ck.access({'action':'run',
             'module_uoa':'program',
             'data_uoa':'cbench-automotive-susan',
             'env':{'OMP_NUM_THREADS':4}})
if r['return']>0: ck.err(r)  # unified error handling
print(r)


Such an approach allowed me to connect colleagues, students, researchers and engineers from different workgroups to implement, share and reuse automation actions and CK components rather than reimplementing them from scratch. Furthermore, the Collective Knowledge concept supported the so-called FAIR principles to make data findable, accessible, interoperable and reusable [55], while also extending them to code and other research artifacts.

4 CK components and workflows to automate machine learning and systems research

One of the biggest challenges I have faced throughout my research career, automating the co-design of efficient and self-optimizing computing systems, has been how to deal with rapidly evolving hardware, models, datasets, compilers and research techniques. It is for this reason that my first use case for the CK framework was to work with colleagues to collaboratively solve these problems and thus enable trustable, reliable and efficient computational systems that can be easily deployed and used in the real world.

We started using CK as a flexible playground to decompose complex computational systems into reusable, customizable and non-virtualized CK components while agreeing on their APIs and meta descriptions. As the first step, I implemented basic actions that could automate the most common R&D tasks that I encountered during artifact evaluation and in my own research on systems and machine learning [42]. I then shared the automation actions to analyze platforms and user environments in a unified way, detect already installed code, data and ML models (CK software detection plugins [32]), and automatically download, install and cross-compile missing packages (CK meta packages [31]). At the same time, I provided initial support for different compilers, operating systems (Linux, Windows, MacOS, Android) and hardware from different vendors, including Intel, Nvidia, Arm, Xilinx, AMD, Qualcomm, Apple, Samsung and Google.

Such an approach allowed my collaborators [12] to create, share and reuse different CK components with unified APIs to detect, install and use different AI and ML frameworks, libraries, tools, compilers, models and datasets. The unified automation actions, APIs and JSON meta descriptions of all CK components also helped us to connect them into platform-agnostic, portable and customizable program pipelines (workflows) with a common interface across all research projects while applying the DevOps methodology.

Such "plug and play" workflows [45] can automatically adapt to evolving environments, models, datasets and non-virtualized platforms by automatically detecting the properties of a target platform, finding all required dependencies and artifacts (code, data and models) on a user platform with the help of CK software detection plugins [32], installing missing dependencies using portable CK meta packages [31], building and running code, and unifying and testing outputs [25]. Moreover, rather than substituting or competing with existing tools, the CK approach helped to abstract and interconnect them in a relatively simple and non-intrusive way.
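For example, resolving the dependencies of such a workflow typically comes down to a few commands of the following kind (a sketch based on the CK CLI used throughout this article; the repository name and tags are illustrative examples):

ck pull repo:ck-mlperf
ck detect soft --tags=lib,tensorflow
ck install package --tags=dataset,imagenet
ck run program:object-detection-onnx-py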

CK also helps to protect user workflows whenever external files or packages are broken, disappear or move to another URL, because it is possible to fix such issues in a shared CK meta package without changing existing workflows. For example, our users have already taken advantage of this functionality when the Eigen library moved from BitBucket to GitLab, and when the old ImageNet dataset was no longer supported but could still be downloaded via BitTorrent and other peer-to-peer services.

Modular CK workflows can help to keep track of the information flow within such workflows, gradually expose configuration and optimization parameters as vectors via dictionary-based I/O, and combine public and private code and data. They also help to monitor, model and auto-tune system behaviour, retarget research code and machine learning models to different platforms from data centers to edge devices, integrate them with legacy systems and reproduce results.

Furthermore, we can use CK workflows inside standard containers such as Docker while providing a unified CK API to customize, rebuild and adapt them to any platform (adaptive CK containers) [2], thus making research techniques truly portable, reproducible and reusable. I envision that such adaptive CK containers and portable workflows can complement existing marketplaces to deliver portable, customizable, trustable and efficient AI and ML solutions that can be continuously optimized across diverse models, datasets, frameworks, libraries, compilers and run-time systems.

In spite of some initial doubts, this collaborative CK approach has worked well to decompose complex research projects and computational systems into reusable components while agreeing on common APIs and meta descriptions: the CK functionality evolved from just a few core CK modules and abstractions to hundreds of CK modules [30] and thousands of reusable CK components and workflows [33] that automate the most repetitive research tasks, particularly for AI, ML and systems R&D, as shown in Figure 3.


Figure 3: The Collective Knowledge framework provided a flexible playground for researchers and practitioners to decompose complex computational systems into reusable components while agreeing on APIs and meta descriptions. Over the past five years (2015-2020), the Collective Knowledge project has grown to thousands of reusable CK components and automation actions.

5 CK use cases

5.1 Unifying benchmarking, auto-tuning and machine learning

The MILEPOST and cTuning projects were like the Apollo mission: on the one hand, we managed to demonstrate that it was indeed possible to crowdsource auto-tuning and machine learning across multiple users to automatically co-design efficient software and hardware [41, 44]. On the other hand, we exposed so many issues when using machine learning for system optimization in the real world that I had to stop this research and have since focused on solving a large number of related engineering problems.

For this reason, my first step was to test the CK concept by converting all artifacts and automation actions from all my past research projects related to self-optimizing computer systems into reusable CK components and workflows. I shared them with the community in CK-compatible Git repositories [7] and started reproducing experiments from my own and related research projects [20].

I then implemented a customizable and portable program pipeline as a CK module to unify benchmarking and auto-tuning while supporting all research techniques and experiments from my PhD research and the MILEPOST project [10]. Such a pipeline could perform compiler auto-tuning and software/hardware co-design combined with machine learning in a unified way across different programs, datasets, frameworks, compilers, ML models and platforms, as shown in Figure 4.

The CK program pipeline helps users to gradually expose different design choices and optimization parameters from all CK components (models, frameworks, compilers, run-time systems, hardware) via unified CK APIs and meta descriptions, and thus enable whole-system ML and auto-tuning. It also helps users to keep track of all information passed between components in complex computational systems to ensure the reproducibility of results while finding the most efficient configuration on a Pareto frontier in terms of speed, accuracy, energy and other characteristics, also exposed via unified CK APIs. More importantly, it can now be reused and extended in other real-world projects [12].
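The following is a minimal sketch of this idea, assuming the CK Python API from Section 3; the 'pipeline' action follows the program pipeline described above, while the 'choices' and 'characteristics' keys are illustrative rather than the exact CK schema:

import ck.kernel as ck

for flags in ['-O1', '-O2', '-O3']:
    # Hypothetical invocation of the CK program pipeline with one exposed
    # design choice (compiler flags); key names are illustrative.
    r = ck.access({'action': 'pipeline',
                   'module_uoa': 'program',
                   'data_uoa': 'cbench-automotive-susan',
                   'choices': {'compiler_flags': flags}})
    if r['return'] > 0:
        ck.err(r)  # unified error handling
    # Behaviour (e.g. execution time, code size) comes back in the same
    # unified dictionary format and can be recorded, compared or modelled.
    print(flags, r.get('characteristics', {}))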

5.2 Bridging the growing gap between education, research and practice

During the MILEPOST project, I noticed how difficult it is to start using research techniques in the real world. New software, hardware, datasets and models usually become available only at the end of such research projects, making it very challenging, time-consuming and costly to make research software work with the latest systems or legacy technology.

That prompted us to organize a proof-of-concept project with the Raspberry Pi foundation to check if it was possible to use portable CK workflows and components to enable sustainable research software that can automatically adapt to rapidly evolving systems. We also wanted to provide an open-source tool for students and researchers to help them share their code, data and models as reusable, portable and customizable workflows and artifacts: something now known as FAIR principles [55].

For this purpose, we decided to reuse the CK program workflow to demonstrate that it was possible to crowdsource compiler auto-tuning across any Raspberry Pi device with any environment and any version of any compiler (GCC or LLVM) to automatically improve the performance and code size of the most popular RPi applications. CK helped to automate experiments, collect performance numbers on live CK scoreboards, and automatically plug in CK components with various machine learning and predictive analytics techniques, including decision trees, nearest neighbour classifiers, support vector machines (SVM) and deep learning, to automatically learn the most efficient optimizations [45].

Figure 4: Portable, customizable and reusable program pipeline (workflow) assembled from CK components to unify benchmarking, auto-tuning and machine learning. The pipeline models the behaviour of a computational system as a function b = B(c, f, s) of all choices (ML/SW/HW), features and the run-time state, and flattens its inputs into vectors to unify auto-tuning, statistical analysis and machine learning. It is possible to gradually expose design search spaces, optimizations, features and the run-time state from all CK components to make it easier to unify performance modelling and auto-tuning.

With this project, we demonstrated that it was possible to reuse portable CK workflows and let users participate in collaborative auto-tuning (crowd-tuning) on new systems, while sharing the best optimizations and unexpected behaviour on public CK scoreboards, even after the project ended.

5.3 Co-designing efficient software and hardware for AI, ML and other emerging workloads

While helping companies to assemble efficient software and hardware for image classification, object detection and other emerging AI and ML workloads, I noticed that it can easily take several months to build an efficient and reliable system before moving it to production.

This process is so long and tedious because one has to navigate a multitude of design decisions when selecting components from different vendors for different applications (image classification, object detection, natural language processing (NLP), speech recognition and many others) while trading off speed, latency, accuracy, energy and other costs: what network architecture to deploy and how to customize it (ResNet, MobileNet, GoogleNet, SqueezeNet, SSD, GNMT); what framework to use (PyTorch vs MXNet vs TensorFlow vs TF Lite vs Caffe vs CNTK); what compilers to use (XLA vs nGraph vs Glow vs TVM); and what libraries and which optimizations to employ (ArmNN vs MKL vs OpenBLAS vs cuDNN). These choices are generally a consequence of the target hardware platform (CPU vs GPU vs DSP vs FPGA vs TPU vs Edge TPU vs numerous accelerators). Worse still, this semi-manual process is usually repeated from scratch for each new version of hardware, models, frameworks, libraries and datasets.

My modular CK program pipeline, shown in Figure 5, helped to automate this process. We extended it slightly to plug in different AI and ML algorithms, datasets, models, frameworks and libraries for different hardware, such as CPU, GPU, DSP and TPU, and different target platforms from servers to Android devices and IoT [12]. We also customized this ML workflow with new CK plugins that performed pre- and post-processing of different models and datasets to make them compatible with different frameworks, backends and hardware, while unifying benchmarking results such as throughput, latency, mAP (mean Average Precision), recall and other characteristics. We also exposed different design and optimization parameters, including model topology, batch sizes, hardware frequency, compiler flags and so on.

Eventually, CK allowed us to automate and systematise the design space exploration (DSE) of AI/ML/software/hardware stacks and distribute it across diverse platforms and environments [8]. This is possible because CK automatically detects all necessary dependencies on any platform, installs and/or rebuilds the prerequisites, runs experiments, and records all results together with the complete experiment configuration (resolved dependencies and their versions, environment variables, optimization parameters and so on) in a unified JSON format inside CK repositories. CK also ensured the reproducibility of results while making it easier to analyze and visualize them, either locally using Jupyter notebooks and standard toolsets, or within workgroups using universal CK dashboards, also implemented as CK modules [20].

Figure 5: The use of the CK framework to automate benchmarking, optimization and co-design of efficient software and hardware for machine learning and artificial intelligence. CK dependencies describe compatible versions of code, data and models for a given algorithm and target platform; CK actions can then automatically detect or download and install all the necessary dependencies for a given target. The goal is to make it easier to reproduce, reuse, adopt and build upon ML and systems research.

Note that it is also possible to share the entire experimental setup in the CK format inside Docker containers, thus automating all the DSE steps using the unified CK API instead of trying to figure them out from ReadMe files. This method enables CK-powered adaptive containers that help users start using and customizing research techniques across diverse software and hardware, from servers to mobile devices, in just a few simple steps, while sharing experimental results within workgroups or along with research papers in the CK format, reproducing and comparing experiments, and even automatically reporting unexpected behaviour such as bugs and mispredictions [35].

Eventually, I managed to substitute my original cTuning framework completely with this modular, portable, customizable and reproducible experimental framework, while addressing most of the engineering and reproducibility issues exposed by the MILEPOST and cTuning projects [41]. It also helped me to return to my original research on lifelong benchmarking, optimization and co-design of efficient software and hardware for emerging workloads, including machine learning and artificial intelligence.

5.4 Automating MLPerf and enabling portable MLOps

The modularity of my portable CK program workflow helped to enable portable MLOps when combined with AI and ML components, FAIR principles and the DevOps methodology. For example, my CK workflows and components were reused and extended by General Motors and dividiti to collaboratively benchmark and optimize deep learning implementations across diverse devices from servers to the edge [11]. They were also used by Amazon to enable scaling of deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL [49]. Finally, the CK framework made it easier to prepare, submit and reproduce MLPerf inference benchmark results (a "fair and useful benchmark for measuring training and inference performance of ML hardware, software, and services") [53, 23]. These results are aggregated in the public optimization repository [20] to help users find and compare different AI/ML/software/hardware stacks in terms of speed, accuracy, power consumption, costs and other collected metrics.

5.5 Enabling reproducible papers with portable workflows and reusable artifacts

Ever since my very first research project, I have wanted to be able to easily find all artifacts (code, data, models) from research papers, reproduce and compare results in just a few clicks, and immediately test research techniques in the real world with different platforms, environments and data. It is for that reason that one of my main goals when designing the CK framework was to use portable CK workflows for these purposes.

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for the collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their requirements and constraints.
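For illustration, selecting such a Pareto frontier is straightforward once all metrics are exposed in a unified dictionary format. The following self-contained sketch (with made-up numbers) keeps every result that no other result beats on all metrics at once, assuming all metrics are minimized (accuracy is therefore expressed as an error rate):

def dominates(a, b, keys):
    # 'a' dominates 'b' if it is no worse on all metrics and better on one
    return all(a[k] <= b[k] for k in keys) and any(a[k] < b[k] for k in keys)

def pareto_frontier(results, keys):
    return [r for r in results
            if not any(dominates(o, r, keys) for o in results)]

results = [  # made-up examples: latency (s/image), error = 1 - accuracy
    {'name': 'fpga',   'latency': 0.002, 'error': 0.35},
    {'name': 'server', 'latency': 0.004, 'error': 0.25},
    {'name': 'rpi',    'latency': 0.500, 'error': 0.25}]  # dominated

print(pareto_frontier(results, ['latency', 'error']))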

We then published all five papers with our unified artifact appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results also exhibited great diversity:

• Latency: 4 .. 500 milliseconds per image.

• Throughput: 2 .. 465 images per second.

• Top-1 accuracy: 41 .. 75 percent.

• Top-5 accuracy: 65 .. 93 percent.

• Model size (pre-trained weights): 2 .. 130 megabytes.

• Peak power consumption: 2.5 .. 180 Watts.

• Device frequency: 100 .. 2600 megahertz.

• Device cost: 40 .. 1200 dollars.

• Cloud usage cost: 2.6E-6 .. 9.5E-6 dollars per inference.

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics, such as power or memory consumption. We also used these CK workflows to crowdsource design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].

After validating that my portable CK program workflow could support reproducible papers for deep learning systems, I decided to conduct one more test to check if CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography. However, it is not yet known when the first demonstration of quantum advantage will be achieved or what shape it will take.

Figure 6: The CK dashboard with reproduced results from the ACM ASPLOS-ReQuEST'18 tournament to co-design Pareto-efficient image classification in terms of speed, accuracy, energy and other costs. Each paper submission was accompanied by a portable CK workflow to be able to reproduce and compare results in a unified way.

This led me to co-organize several quantum hackathons, similar to the ReQuEST tournament [26], with IBM, Rigetti, Riverlane and dividiti. My main goal was to check if we could aggregate and share multidisciplinary knowledge about the state of the art in quantum computing using portable CK workflows that can run on classical hardware and quantum platforms from IBM, Rigetti and other companies; can be connected to a public dashboard to simplify the reproducibility and comparison of different algorithms across different platforms; and can be extended by the community even after the hackathons.

Figure 7 shows the results from one such quantum hackathon, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework, so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository, the participants were able to start running quantum experiments with standard CK commands:

ck pull repo --url=https://github.com/ctuning/ck-qiskit
ck run program:qiskit-demo --cmd_key=quantum_coin_flip

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and with ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows.

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make CK onboarding as simple as possible and to help researchers automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.

Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible quantum hackathons. Over 80 participants worked together to solve a quantum machine learning problem and minimise the time to solution.

5.6 Connecting researchers and practitioners to co-design efficient computational systems

The CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and help them collaboratively co-design efficient computational systems. Scientists can then work with a higher-level abstraction while allowing engineers to continue improving the lower-level abstractions for continuously evolving software and hardware and deploying new techniques in production without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers while applying the DevOps methodology and agile principles in scientific research.

Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors to automatically find the most efficient systems on the Pareto frontier.

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components, assemble workflows and automatically validate them across diverse platforms and environments, or connect them with legacy systems.

• The concept of backward compatibility of CK APIs and the lack of versioning similar to Java made it challenging to keep stable and bug-free workflows in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services such as GitHub and GitLab to make it more user-friendly.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and the co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI: I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and assemble portable workflows. It also includes the open-source CK client [16] that provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and the existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

Figure 9: cKnowledge.io: an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and live papers validated by the community (cKnowledge.io/reproduced-results), and keep track of state-of-the-art research techniques that can be deployed in production.

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test.

This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. The solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (a hypothetical manifest sketch follows the list):

• prepare a Python virtual environment (can be skipped for the native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre/post-processing scripts.
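A hypothetical sketch of such a manifest is shown below; the key names are illustrative and do not reproduce the exact schema of the published solution files (available at cKnowledge.io/solutions), but the dependencies mirror the tasks listed above:

{
  "name": "demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows",
  "target_os": "linux",
  "tasks": [
    {"prepare": "python-venv"},
    {"install": {"tags": "dataset,coco"}},
    {"detect":  {"tags": "compiler or python"}},
    {"install": {"tags": "lib,tensorflow", "version_from": [1, 14, 0]}},
    {"install": {"tags": "model,ssd-mobilenet"}},
    {"compile": "program:object-detection-tf-py"}
  ]
}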

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

# Install the CK client from PyPI:
pip install cbench

# Download and build the solution on a given machine (example for Linux):
cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

# Run the solution on a given machine:
cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or against results shared by other users (cKnowledge.io/demo-result).

After validating this solution on a given platform, the users can also clone it and update the JSON description to retarget this benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, servers with CUDA-enabled GPUs, and so on.

The users also have the possibility to integrate such ML solutions with production systems with the help of unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser (cKnowledge.io/demo-solution).

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows \
  /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux \
  /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing all research projects as a database of reusable components, portable workflows and reproducible experiments based on FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community to quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards to let the community automatically validate and update experimental results even after the project ends, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustable.

However, CK is still a proof of concept and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly, to simplify the onboarding process, and to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions, with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics, can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherington, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St. John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo "Collective Knowledge packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON IO to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cknowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large scale visual recognition challenge, where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state of the art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cknowledge.io/?q=reproduced-papers AND portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The machine learning toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, July 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In The 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning, 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29-35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST'14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296-327, 2011. 10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC'15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12, November 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn De Weirdt. EasyBuild: building software with ease. Pages 572-582, November 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561-1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265-276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557-W561, May 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Engineering Bulletin, 41(4):39-45, 2018.

Figure 1: Reproducing research papers and adopting novel techniques in production is a tedious, repetitive and time-consuming process because of continuously changing software, hardware, models and datasets, and a lack of common formats and APIs for shared artifacts (code, data, models, experimental results, scripts and so on). (The figure contrasts researchers, who develop new software, design new hardware and publish new algorithms and models with accuracy reported on standard datasets, with engineers and practitioners, who must somehow build a working system from the published results on a new software/hardware stack with real data, struggle to reproduce the reported accuracy, speed, size and costs, and can take many months of continuous rebuilding, optimizing and reproducing before deployment.)

Worse still, I could not compare these empirical results with other published techniques because they rarely included the full experiment specification with the shared research code and all the artifacts needed to reproduce results. Furthermore, it was always very difficult to add new tools, benchmarks and datasets to any research code because it required numerous changes in different ad-hoc scripts, repetitive recompilation of the whole project when new software was released, complex updates of database tables with results, and so on [42].

These problems motivated me to establish the non-profit cTuning foundation and continue working with the community on a common methodology and open-source tools to enable collaborative, reproducible, reusable and trustable R&D. We helped to initiate reproducibility initiatives and support artifact evaluation at several computer science conferences in collaboration with the ACM [5, 40]. We also promoted sharing of code, artifacts and results in a unified way along with research papers [4].

This community service gave me a unique chance to participate in reproducibility studies of more than 100 research papers at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), the International Symposium on Code Generation and Optimization (CGO), the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), Supercomputing, the International Conference on Machine Learning and Systems (MLSys) and other computer science conferences over the past five years [27]. I also started deploying some of these techniques in production in collaboration with my industrial partners in order to better understand all the problems associated with building trustable, reproducible and production-ready computational systems.

This practical experience confirmed my previous findings: while sharing ad-hoc research code, artifacts, trained models and Docker images along with research papers is a great step forward, it is only the tip of the reproducibility iceberg [42]. The major challenge afterwards is threefold: it is necessary to figure out how to use research techniques outside original containers with other data, code and models; run them in a reliable and efficient way across rapidly evolving software, heterogeneous hardware and legacy platforms with continuously changing interfaces and data formats; and balance multiple characteristics including speed, latency, accuracy, memory size, power consumption, reliability and costs (Figure 1).

2 Collective Knowledge concept

When helping to organize the artifact evaluation process at systems and machine learning conferences, I decided to introduce an Artifact Appendix and a reproducibility checklist [4, 27]. My goal was to help researchers to describe how to configure, build, run, validate and compare research techniques in a unified way across different conferences and journals. It also led me to notice that most research projects use ad-hoc scripts, often with hardwired paths, to perform the same repetitive tasks, including downloading models and datasets, detecting platform properties, installing software dependencies, building research code, running experiments, validating outputs, reproducing results, plotting graphs and generating papers [42].

This experience motivated me to search for a solution for automating such common tasks and making them reusable and customizable across different research projects, platforms and environments. First, I started looking at related tools that were introduced to automate experiments and make research more reproducible:

• Workflow frameworks such as MLflow [57], Kedro [52], Amazon SageMaker [3], Kubeflow [34], Apache Taverna [56], Popper [48], Common Workflow Language [36] and many others help to abstract and automate data science operations. They are very useful for data scientists but do not yet provide a universal mechanism to automatically build and run algorithms across different platforms, environments, libraries, tools, models and datasets. Researchers and engineers often have to implement this functionality for each new project from scratch, which can become very complicated, particularly when targeting new hardware, embedded devices, TinyML and IoT.

• Machine learning benchmarking initiatives such as MLPerf [53], MLModelScope [39] and Deep500 [37] attempt to standardise machine learning model benchmarking and make it more reproducible. However, production deployment, integration with complex systems, and adaptation to continuously changing user environments, platforms, tools and data are currently out of their scope.

• Package managers such as Spack [46] and EasyBuild [47] are very useful for rebuilding and fixing the whole software environment. However, integration with workflow frameworks, automatic adaptation to existing environments, native cross-compilation (particularly for embedded devices), and support for non-software packages (models, datasets, scripts) are still in progress.

• Container technology such as Docker [50] is very useful for preparing and sharing stable software releases. However, it hides the software chaos rather than solving it, lacks common APIs for research projects, requires an enormous amount of space, has very poor support for embedded devices, and does not yet help integrate models with existing projects, legacy systems and user data.

• The PapersWithCode platform [24] helps to find relevant research code for machine learning papers and keeps track of state-of-the-art ML research using public scoreboards with non-validated experimental results from papers. However, my experience reproducing research papers suggests that sharing ad-hoc research code is not enough to make research techniques reproducible, customizable, portable and trustable [42].

While testing all these useful tools and analyzing the Jupyter notebooks, Docker images and GitHub repositories shared alongside research papers, I started thinking that it would be possible to reorganize them as some sort of database of reusable components with a common API, command line, web interface and meta description. We could then reuse artifacts and common automation actions across different projects while applying DevOps principles to research projects.

Furthermore, we could also gradually abstract and interconnect all existing tools rather than rewriting or substituting them. This, in turn, helped to create "plug and play" workflows that could automatically connect compatible components from different users and vendors (models, datasets, frameworks, compilers, tools, platforms) while minimising manual interventions and providing a common interface for all shared research projects.

I called my project Collective Knowledge (CK) because my goal was to connect researchers and practitioners in order to share their knowledge, best practices and research results in the form of reusable components, portable workflows and automation actions with common APIs and meta descriptions.

3 CK framework and an open CK format

I have developed the CK framework as a small Python library with minimal dependencies to be very portable, while offering the possibility of being re-implemented in other languages such as C, C++, Java and Go. The CK framework has a unified CLI, a Python API and a JSON-based web service to manage CK repositories and to add, find, update, delete, rename and move CK components (sometimes called CK entries or CK data) [9].

CK repositories are human-readable databases of reusable CK components that can be created in any local directory and inside containers, pulled from GitHub and similar services, and shared as standard archive files [7]. CK components simply wrap (encapsulate) user artifacts and provide an extensible JSON meta description with common automation actions [6] for related artifacts.

Automation actions are implemented using CK modules, which are Python modules with functions exposed in a unified way via the CK API and CLI and which use dictionaries for input and output (extensible and unified CK input/output (IO)).

Figure 2: Organizing software projects as a human-readable database of reusable components that wrap artifacts and provide unified APIs, CLI, meta descriptions and common automation actions for related artifacts. The CK APIs and control files are highlighted in blue. (The figure shows the same structure in four settings: (1) a GitHub repository or archive file with the .ckr.json repository descriptor, .cm aliases, and dataset, program, paper and module components, each carrying .cm/meta.json and .cm/info.json control files; (2) a user home directory organized the same way; (3) a Jupyter/Colab notebook that calls ck.access to find all dataset entries across CK-compatible projects and reuse automation actions such as feature extraction, even when the original project structure has been forgotten or its authors have left; and (4) a Docker image exposing the familiar CK CLI, e.g. ck pull repo:ck-crowdtuning, ck ls dataset:image*, ck compile program:detect-edges --speed, ck run program:detect-edges, ck benchmark program:detect-edges --record --record_uoa=autotuning and ck reproduce experiment:autotuning.)

The use of dictionaries makes it easier to support continuous integration tools and web services, and to extend the functionality while keeping backward compatibility. The unified IO also makes it possible to reuse such actions across projects and to chain them together into unified CK pipelines and workflows.
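To make the dictionary-based IO concrete, here is a minimal sketch of a CK-style automation action; the function name and all keys except the CK error-handling convention ('return' and 'error') are illustrative:

import os

def find_datasets(i):
    # A CK-style automation action: a Python function with unified
    # dictionary-based input and output (illustrative sketch).
    path = i.get('path', '.')
    if not os.path.isdir(path):
        # By CK convention, 'return' > 0 signals an error to the caller.
        return {'return': 1, 'error': 'path not found: ' + path}
    lst = [f for f in os.listdir(path) if f.endswith('.json')]
    return {'return': 0, 'lst': lst}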

As I wanted CK to be non-intrusive and technology-neutral, I decided to use a simple two-level directory structure to wrap user artifacts into CK components, as shown in Figure 2. The root directory of the CK repository contains the .ckr.json file to describe the repository and specify dependencies on other CK repositories to explicitly reuse their components and automation actions.

CK uses .cm directories (Collective Meta), similar to .git, to store the meta information of all components as well as Unique IDs so that components can be found even if their user-friendly names change over time (CK aliases). CK modules are always stored in module/<CK module name> directories in the CK repository. CK components are stored in <CK module name>/<CK data name> directories. Each CK component has a .cm directory with a meta.json file describing the given artifact and an info.json file keeping its provenance, including copyrights, licenses, creation date, names of all contributors and so on.
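For illustration, a simplified CK repository following this layout (reconstructed from the components shown in Figure 2) could look as follows:

my-ck-repo/
  .ckr.json                # describes the repository and its dependencies
  dataset/
    .cm/alias-a-images     # CK alias pointing to a unique ID
    images/
      1.png                # the wrapped artifact itself
      .cm/meta.json        # meta description of this component
      .cm/info.json        # provenance: authors, licenses, creation date
  program/
    detect-edges/
      program.cpp, Makefile, run.sh
      .cm/meta.json
  module/
    program/
      module.py            # automation actions for all "program" components
      .cm/meta.json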

The CK framework has an internal default CK repository with stable CK modules and the most commonly used automation actions across many research projects. When the CK framework is used for the first time, it also creates a local CK repository in the user space to be used as a scratch pad.

After discussing the CK CLI with colleagues, I decided to implement it in a way similar to natural language to make it easier for users to remember the commands:

ck <action> <CK module name> (flags) (input.json or input.yaml)
ck <action> <CK module name>:<CK data name> (flags)
ck <action> <CK repository name>:<CK module name>:<CK data name>

The next example demonstrates how to compile and run the shared automotive benchmark on any platform, and then create a copy of the CK program component:

pip install ck

ck pull repo --url=https://github.com/ctuning/ck-crowdtuning

ck search dataset --tags=jpeg
ck search program:cbench-automotive-*
ck find program:cbench-automotive-susan
ck load program:cbench-automotive-susan
ck help program

ck compile program:cbench-automotive-susan --speed
ck run program:cbench-automotive-susan --env.OMP_NUM_THREADS=4
ck run program --help

ck cp program:cbench-automotive-susan local:program:new-program-workflow
ck find program:new-program-workflow

ck benchmark program:new-program-workflow --record --record_uoa=my-test
ck replay experiment:my-test

The CK program module describes dependencies on software detection plugins and meta packages using simple tags and version ranges that the community has to agree on:

"compiler": {
  "name": "C++ compiler",
  "sort": 10,
  "tags": "compiler,lang-cpp"
},

"library": {
  "name": "TensorFlow C++ API",
  "sort": 20,
  "version_from": [1,13,1],
  "version_to": [2,0,0],
  "no_tags": "tensorflow-lite",
  "tags": "lib,tensorflow,vstatic"
}

I also implemented a simple access function in the CK Python API to access all the CK functionality in a very simple and unified way:

import ck.kernel as ck

# Equivalent of "ck compile program:cbench-automotive-susan --speed"
r=ck.access({'action':'compile',
             'module_uoa':'program',
             'data_uoa':'cbench-automotive-susan',
             'speed':'yes'})
if r['return']>0: return r   # unified error handling

print (r)

# Equivalent of "ck run program:cbench-automotive-susan --env.OMP_NUM_THREADS=4"
r=ck.access({'action':'run',
             'module_uoa':'program',
             'data_uoa':'cbench-automotive-susan',
             'env':{'OMP_NUM_THREADS':4}})
if r['return']>0: return r   # unified error handling

print (r)


Such an approach allowed me to connect colleagues, students, researchers and engineers from different workgroups to implement, share and reuse automation actions and CK components rather than reimplementing them from scratch. Furthermore, the Collective Knowledge concept supported the so-called FAIR principles, making data findable, accessible, interoperable and reusable [55], while also extending them to code and other research artifacts.

4 CK components and workflows to automate machine learning and systems research

One of the biggest challenges I have faced throughout my research career automating the co-design of efficient and self-optimizing computing systems has been how to deal with rapidly evolving hardware, models, datasets, compilers and research techniques. It is for this reason that my first use case for the CK framework was to work with colleagues to collaboratively solve these problems and thus enable trustable, reliable and efficient computational systems that can be easily deployed and used in the real world.

We started using CK as a flexible playground to decompose complex computational systems into reusable, customizable and non-virtualized CK components while agreeing on their APIs and meta descriptions. As the first step, I implemented basic actions that could automate the most common R&D tasks that I encountered during artifact evaluation and in my own research on systems and machine learning [42]. I then shared the automation actions to analyze platforms and user environments in a unified way, detect already installed code, data and ML models (CK software detection plugins [32]), and automatically download, install and cross-compile missing packages (CK meta packages [31]). At the same time, I provided initial support for different compilers, operating systems (Linux, Windows, MacOS, Android) and hardware from different vendors including Intel, Nvidia, Arm, Xilinx, AMD, Qualcomm, Apple, Samsung and Google.

Such an approach allowed my collaborators [12] to create, share and reuse different CK components with unified APIs to detect, install and use different AI and ML frameworks, libraries, tools, compilers, models and datasets. The unified automation actions, APIs and JSON meta descriptions of all CK components also helped us to connect them into platform-agnostic, portable and customizable program pipelines (workflows) with a common interface across all research projects while applying the DevOps methodology.

Such "plug&play" workflows [45] can automatically adapt to evolving environments, models, datasets and non-virtualized platforms by automatically detecting the properties of a target platform, finding all required dependencies and artifacts (code, data and models) on a user platform with the help of CK software detection plugins [32], installing missing dependencies using portable CK meta packages [31], building and running code, and unifying and testing outputs [25]. Moreover, rather than substituting or competing with existing tools, the CK approach helped to abstract and interconnect them in a relatively simple and non-intrusive way.
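For example, adapting a workflow to a new machine typically boils down to a few commands of the form shown later in Figure 5 (the tags below are illustrative and depend on the CK repositories in use):

ck detect platform                         # collect properties of the target platform
ck detect soft --tags=compiler,gcc         # register an already installed compiler
ck install package --tags=lib,tensorflow   # download or build a missing dependency
ck show env                                # list all detected and installed environments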

CK also helps to protect user workflows whenever external files or packages are broken, disappear or move to another URL, because it is possible to fix such issues in a shared CK meta package without changing existing workflows. For example, our users have already taken advantage of this functionality when the Eigen library moved from BitBucket to GitLab, and when the old ImageNet dataset was no longer supported but could still be downloaded via BitTorrent and other peer-to-peer services.

Modular CK workflows can help to keep track of the information flow within such workflows, gradually expose configuration and optimization parameters as vectors via dictionary-based IO, and combine public and private code and data. They also help to monitor, model and auto-tune system behaviour, retarget research code and machine learning models to different platforms from data centers to edge devices, integrate them with legacy systems, and reproduce results.

Furthermore, we can use CK workflows inside standard containers such as Docker while providing a unified CK API to customize, rebuild and adapt them to any platform (the "adaptive CK container") [2], thus making research techniques truly portable, reproducible and reusable. I envision that such adaptive CK containers and portable workflows can complement existing marketplaces to deliver portable, customizable, trustable and efficient AI and ML solutions that can be continuously optimized across diverse models, datasets, frameworks, libraries, compilers and run-time systems.

In spite of some initial doubts, my collaborative CK approach has worked well to decompose complex research projects and computational systems into reusable components while agreeing on common APIs and meta descriptions: the CK functionality evolved from just a few core CK modules and abstractions to hundreds of CK modules [30] and thousands of reusable CK components and workflows [33] that automate the most repetitive research tasks, particularly for AI, ML and systems R&D, as shown in Figure 3.

Figure 3: The Collective Knowledge framework provided a flexible playground for researchers and practitioners to decompose complex computational systems into reusable components while agreeing on APIs and meta descriptions. Over the past five years (2015-2020), the Collective Knowledge project has grown to thousands of reusable CK components and automation actions.

5 CK use cases

5.1 Unifying benchmarking, auto-tuning and machine learning

The MILEPOST and cTuning projects were like the Apollo mission: on the one hand, we managed to demonstrate that it was indeed possible to crowdsource auto-tuning and machine learning across multiple users to automatically co-design efficient software and hardware [41, 44]. On the other hand, we exposed so many issues when using machine learning for system optimization in the real world that I had to stop this research and have since focused on solving a large number of related engineering problems.

For this reason, my first step was to test the CK concept by converting all artifacts and automation actions from all my past research projects related to self-optimizing computer systems into reusable CK components and workflows. I shared them with the community in CK-compatible Git repositories [7] and started reproducing experiments from my own or related research projects [20].

I then implemented a customizable and portable program pipeline as a CK module to unify benchmarking and auto-tuning while supporting all research techniques and experiments from my PhD research and the MILEPOST project [10]. Such a pipeline could perform compiler auto-tuning and software/hardware co-design combined with machine learning in a unified way across different programs, datasets, frameworks, compilers, ML models and platforms, as shown in Figure 4.

The CK program pipeline helps users to gradually expose different design choices and optimization parameters from all CK components (models, frameworks, compilers, run-time systems, hardware) via unified CK APIs and meta descriptions, and thus enable whole ML and system auto-tuning. It also helps users to keep track of all information passed between components in complex computational systems, to ensure the reproducibility of results while finding the most efficient configuration on a Pareto frontier in terms of speed, accuracy, energy and other characteristics, also exposed via unified CK APIs. More importantly, it can now be reused and extended in other real-world projects [12].
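As a sketch, this pipeline can be driven from the command line by reusing the cbench-automotive-susan example from Section 3 (the flags are illustrative and may differ between CK versions):

ck pipeline program:cbench-automotive-susan --speed
ck autotune program:cbench-automotive-susan --iterations=100 --record --record_uoa=susan-autotuning
ck replay experiment:susan-autotuning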

5.2 Bridging the growing gap between education, research and practice

During the MILEPOST project, I noticed how difficult it is to start using research techniques in the real world. New software, hardware, datasets and models usually appear only at the end of such research projects, making it very challenging, time-consuming and costly to make research software work with the latest systems or with legacy technology.

That prompted us to organize a proof-of-concept project with the Raspberry Pi foundation to check if it was possible to use portable CK workflows and components to enable sustainable research software that can automatically adapt to rapidly evolving systems. We also wanted to provide an open-source tool for students and researchers to help them share their code, data and models as reusable, portable and customizable workflows and artifacts - something now known as FAIR principles [55].

For this purpose, we decided to reuse the CK program workflow to demonstrate that it was possible to crowdsource compiler auto-tuning across any Raspberry Pi device with any environment and any version of any compiler (GCC or LLVM) to automatically improve the performance and code size of the most popular RPi applications. CK helped to automate experiments and collect performance numbers on live CK scoreboards.

Figure 4: Portable, customizable and reusable program pipeline (workflow) assembled from CK components to unify benchmarking, auto-tuning and machine learning. It is possible to gradually expose design search spaces, optimizations, features and the run-time state from all CK components to make it easier to unify performance modelling and auto-tuning. (The pipeline, implemented as the CK program module with a pipeline function, compiles and runs code via CK component wrappers with unified JSON input and output exposing behaviour, choices, features and state. It models the behaviour of a computational system as b = B(c, f, s), a function of all choices, features and the run-time state, flattening inputs into vectors to unify auto-tuning, statistical analysis and machine learning, and it supports choosing optimization strategies, software/hardware design space exploration, statistical analysis, Pareto-frontier detection, behaviour modelling to predict optimizations, and complexity reduction.)

CK also helped to automatically plug in CK components with various machine learning and predictive analytics techniques, including decision trees, nearest neighbour classifiers, support vector machines (SVM) and deep learning, to automatically learn the most efficient optimizations [45].

With this project, we demonstrated that it was possible to reuse portable CK workflows and let users participate in collaborative auto-tuning ("crowd-tuning") on new systems while sharing the best optimizations and unexpected behaviour on public CK scoreboards, even after the project ended.

5.3 Co-designing efficient software and hardware for AI, ML and other emerging workloads

While helping companies to assemble efficient software and hardware for image classification, object detection and other emerging AI and ML workloads, I noticed that it can easily take several months to build an efficient and reliable system before moving it to production.

This process is so long and tedious because one has to navigate a multitude of design decisions when selecting components from different vendors for different applications (image classification, object detection, natural language processing (NLP), speech recognition and many others) while trading off speed, latency, accuracy, energy and other costs: which network architecture to deploy and how to customize it (ResNet, MobileNet, GoogleNet, SqueezeNet, SSD, GNMT); which framework to use (PyTorch vs MXNet vs TensorFlow vs TF Lite vs Caffe vs CNTK); which compilers to use (XLA vs nGraph vs Glow vs TVM); and which libraries and optimizations to employ (ArmNN vs MKL vs OpenBLAS vs cuDNN). The choice is generally a consequence of the target hardware platform (CPU vs GPU vs DSP vs FPGA vs TPU vs Edge TPU vs numerous accelerators). Worse still, this semi-manual process is usually repeated from scratch for each new version of hardware, models, frameworks, libraries and datasets.

My modular CK program pipeline, shown in Figure 5, helped to automate this process. We extended it slightly to plug in different AI and ML algorithms, datasets, models, frameworks and libraries for different hardware such as CPU, GPU, DSP and TPU, and for different target platforms from servers to Android devices and IoT [12]. We also customized this ML workflow with new CK plugins that performed pre- and post-processing of different models and datasets to make them compatible with different frameworks, backends and hardware, while unifying benchmarking results such as throughput, latency, mAP (mean Average Precision), recall and other characteristics. We also exposed different design and optimization parameters, including model topology, batch sizes, hardware frequency, compiler flags and so on.

Figure 5: The use of the CK framework to automate benchmarking, optimization and co-design of efficient software and hardware for machine learning and artificial intelligence. The goal is to make it easier to reproduce, reuse, adopt and build upon ML and systems research. (The figure shows research papers connected to reusable CK components and automation actions (tasks) exposed via a unified CK API/CLI for Python, C, C++, Java and REST, with JSON input describing the environment, dependencies and parameters and JSON output describing results, characteristics and features. CK dependencies describe compatible versions of code, data and models for a given algorithm and target platform so that CK actions can automatically detect or download and install them, e.g. ck pull repo:ck-mlperf; ck detect soft --tags=lib,tensorflow; ck install package --tags=imagenet; ck run program:object-detection-onnx-py; ck autotune program; ck reproduce experiment.)

Eventually, CK allowed us to automate and systematise the design space exploration (DSE) of AI/ML/software/hardware stacks and distribute it across diverse platforms and environments [8]. This is possible because CK automatically detects all necessary dependencies on any platform, installs and/or rebuilds the prerequisites, runs experiments and records all results together with the complete experiment configuration (resolved dependencies and their versions, environment variables, optimization parameters and so on) in a unified JSON format inside CK repositories. CK also ensured the reproducibility of results while making it easier to analyze and visualize them locally using Jupyter notebooks and standard toolsets, or within workgroups using universal CK dashboards, also implemented as CK modules [20].
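As a rough illustration, a recorded experiment point might look like the following JSON; the exact CK schema is richer, and all field values here are invented (the choices/features/characteristics vocabulary follows Figure 4):

{
  "choices":         { "compiler_flags": "-O3", "batch_size": 2 },
  "features":        { "platform": "linux-x86_64", "cpu_name": "..." },
  "characteristics": { "latency_ms": 42.7, "top1_accuracy": 0.71 },
  "dependencies":    { "lib-tensorflow": "1.13.1" }
}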

Note that it is also possible to share the entire experimental setup in the CK format inside Docker containers, thus automating all the DSE steps using the unified CK API instead of trying to figure them out from the ReadMe files. This method enables CK-powered adaptive containers that help users to start using and customizing research techniques across diverse software and hardware, from servers to mobile devices, in just a few simple steps, while sharing experimental results within workgroups or along with research papers in the CK format, reproducing and comparing experiments, and even automatically reporting unexpected behaviour such as bugs and mispredictions [35].

Eventually, I managed to substitute my original cTuning framework completely with this modular, portable, customizable and reproducible experimental framework, while addressing most of the engineering and reproducibility issues exposed by the MILEPOST and cTuning projects [41]. It also helped me to return to my original research on lifelong benchmarking, optimization and co-design of efficient software and hardware for emerging workloads, including machine learning and artificial intelligence.

5.4 Automating MLPerf and enabling portable MLOps

The modularity of my portable CK program workflow helped to enable portable MLOps when combined with AI and ML components, FAIR principles and the DevOps methodology. For example, my CK workflows and components were reused and extended by General Motors and dividiti to collaboratively benchmark and optimize deep learning implementations across diverse devices from servers to the edge [11]. They were also used by Amazon to enable scaling of deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL [49]. Finally, the CK framework made it easier to prepare, submit and reproduce MLPerf inference benchmark results (a "fair and useful benchmark for measuring training and inference performance of ML hardware, software, and services") [53, 23]. These results are aggregated in the public optimization repository [20] to help users find and compare different AI/ML/software/hardware stacks in terms of speed, accuracy, power consumption, costs and other collected metrics.

5.5 Enabling reproducible papers with portable workflows and reusable artifacts

Ever since my very first research project, I have wanted to be able to easily find all artifacts (code, data, models) from research papers, reproduce and compare results in just a few clicks, and immediately test research techniques in the real world with different platforms, environments and data. It is for that reason that one of my main goals when designing the CK framework was to use portable CK workflows for these purposes.

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their requirements and constraints.

We then published all five papers with our unified Artifact Appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results also exhibited great diversity:

• Latency: 4 .. 500 milliseconds per image.

• Throughput: 2 .. 465 images per second.

• Top-1 accuracy: 41 .. 75 percent.

• Top-5 accuracy: 65 .. 93 percent.

• Model size (pre-trained weights): 2 .. 130 megabytes.

• Peak power consumption: 2.5 .. 180 Watts.

• Device frequency: 100 .. 2600 megahertz.

• Device cost: 40 .. 1200 dollars.

• Cloud usage cost: 2.6E-6 .. 9.5E-6 dollars per inference.

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics such as power or memory consumption. We also used these CK workflows to crowdsource design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].

After validating that my portable CK program workflow could support reproducible papers for deep learning systems, I decided to conduct one more test to check if CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography.

CK workflow1 with validated results

AWS with c518xlarge instance Intelreg Xeonreg Platinum 8124M

From the authors ldquoThe 8-bit optimized model is automatically

generated with a calibration process from FP32 model without the

need of fine-tuning or retraining We show that the inference

throughput and latency with ResNet-50 Inception-v3 and SSD are

improved by 138X-29X and 135X-3X respectively with negligible

accuracy loss from IntelCaffe FP32 baseline and by 56X-75X and

26X-37X from BVLC Cafferdquo

httpsgithubcomctuningck-request-asplos18-caffe-intel

Figure 6 The CK dashboard with reproduced results from the ACM ASPLOS-REQUESTrsquo18 tournament to co-designPareto-efficient image classification in terms of speed accuracy energy and other costs Each paper submission wasaccompanied by the portable CK workflow to be able to reproduce and compare results in a unified way

learning drug discovery materials optimization financeand cryptography However it is not yet known whenthe first demonstration of quantum advantage will beachieved or what shape it will take

This led me to co-organize several Quantum hackathons, similar to the ReQuEST tournament [26], with IBM, Rigetti, Riverlane and dividiti. My main goal was to check if we could aggregate and share multidisciplinary knowledge about the state-of-the-art in quantum computing using portable CK workflows that can run on classical hardware and on quantum platforms from IBM, Rigetti and other companies, can be connected to a public dashboard to simplify the reproducibility and comparison of different algorithms across different platforms, and can be extended by the community even after the hackathons.

Figure 7 shows the results from one such Quantum hackathon, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository, the participants were able to start running quantum experiments with standard CK commands:

ck pull repo --url=https://github.com/ctuning/ck-qiskit
ck run program:qiskit-demo --cmd_key=quantum_coin_flip

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and with ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows.

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make the CK onboarding as simple as possible and to help researchers automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.

Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible Quantum hackathons. Over 80 participants worked together to solve a quantum machine learning problem and minimise time to solution; the dashboard highlights the most efficient design.

5.6 Connecting researchers and practitioners to co-design efficient computational systems

CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and help them collaboratively co-design efficient computational systems. Scientists could then work with a higher-level abstraction, while allowing engineers to continue improving the lower-level abstractions for continuously evolving software and hardware and deploying new techniques in production without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers while applying the DevOps methodology and agile principles in scientific research.

Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors to automatically find the most efficient systems on the Pareto frontier.

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components and assemble workflows, automatically validate them across diverse platforms and environments, and connect them with legacy systems.

• The concept of backward compatibility of CK APIs and the lack of versioning, similar to Java, made it challenging to keep stable and bug-free workflows in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services, such as GitHub and GitLab, to make it more user-friendly.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI: I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and to assemble portable workflows. It also includes the open-source CK client [16] that provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and the existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

Figure 9: cKnowledge.io: an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and live papers validated by the community, and keep track of state-of-the-art research techniques that can be deployed in production. CK solutions and adaptive containers with reusable components, portable workflows, common APIs and unified IO can be shared along with research papers (cKnowledge.io/solutions, cKnowledge.io/c/docker) to make it easier to reproduce the results and try the algorithms with the latest or different components, while "live" papers can be validated by the community across diverse platforms, models and datasets (cKnowledge.io/reproduced-results).

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test.

This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. This solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (a purely illustrative sketch of such a manifest follows the list):

• prepare a Python virtual environment (can be skipped for the native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre/post-processing scripts.
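For illustration only, such a manifest could be sketched as follows, written here as a Python dictionary (the actual file is JSON or YAML, its exact schema is defined by the CK client, and every field name below is hypothetical rather than the real schema):

# Hypothetical sketch of a CK solution manifest; the real schema is
# defined by the CK client (cbench), so treat all field names here
# as illustrative only.
solution = {
    'name': 'demo-obj-detection-coco-tf-cpu',
    'tasks': [
        {'action': 'prepare-python-venv', 'optional': True},
        {'action': 'install-dataset', 'tags': 'dataset,coco'},
        {'action': 'detect-compiler-or-interpreter'},
        {'action': 'install-framework', 'tags': 'lib,tensorflow'},
        {'action': 'install-model', 'tags': 'model,ssd-mobilenet'},
        {'action': 'install-other-dependencies'},
        {'action': 'compile-and-prepare-scripts'},
    ],
}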

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

Install the CK client from PyPI:

pip install cbench

Download and build the solution on a given machine (example for Linux):

cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

Run the solution on a given machine:

cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows


The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution and compare them against the official MLPerf results or against the results shared by other users: cKnowledge.io/demo-result.

After validating this solution on a given platform, the users can also clone it and update the JSON description to retarget this benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, servers with CUDA-enabled GPUs and so on.

The users also have the possibility to integrate such ML solutions with production systems with the help of unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser: cKnowledge.io/demo-solution.

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows \
  /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux \
  /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing all research projects as a database of reusable components, portable workflows and reproducible experiments based on the FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the https://cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards to let the community automatically validate and update experimental results even after the project, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustable.

However, CK is still a proof-of-concept, and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherington, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St. John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo: "Collective Knowledge: packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON IO to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cknowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions, based on the FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large scale visual recognition challenge, where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cknowledge.io/?q=reproduced-papers AND portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, 7 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), IEEE, May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning, 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29-35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296-327, 2011. 10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12, Nov 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn De Weirdt. EasyBuild: building software with ease. pages 572-582, 11 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561-1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265-276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557-W561, 05 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41(4):39-45, 2018.


I noticed that most research projects use some ad-hoc scripts, often with hardwired paths, to perform the same repetitive tasks, including downloading models and datasets, detecting platform properties, installing software dependencies, building research code, running experiments, validating outputs, reproducing results, plotting graphs and generating papers [42].

This experience motivated me to search for a solution for automating such common tasks and making them reusable and customizable across different research projects, platforms and environments. First, I started looking at related tools that were introduced to automate experiments and make research more reproducible:

• Workflow frameworks such as MLflow [57], Kedro [52], Amazon SageMaker [3], Kubeflow [34], Apache Taverna [56], Popper [48], Common Workflow Language [36] and many others help to abstract and automate data science operations. They are very useful for data scientists but do not yet provide a universal mechanism to automatically build and run algorithms across different platforms, environments, libraries, tools, models and datasets. Researchers and engineers often have to implement this functionality for each new project from scratch, which can become very complicated, particularly when targeting new hardware, embedded devices, TinyML and IoT.

• Machine learning benchmarking initiatives such as MLPerf [53], MLModelScope [39] and Deep500 [37] attempt to standardise machine learning model benchmarking and make it more reproducible. However, production deployment, integration with complex systems and adaptation to continuously changing user environments, platforms, tools and data are currently out of their scope.

• Package managers such as Spack [46] and EasyBuild [47] are very useful for rebuilding and fixing the whole software environment. However, the integration with workflow frameworks, automatic adaptation to existing environments, native cross-compilation (particularly for embedded devices), and support for non-software packages (models, datasets, scripts) are still in progress.

• Container technology such as Docker [50] is very useful for preparing and sharing stable software releases. However, it hides the software chaos rather than solving it, lacks common APIs for research projects, requires an enormous amount of space, has very poor support for embedded devices, and does not yet help to integrate models with existing projects, legacy systems and user data.

• The PapersWithCode platform [24] helps to find relevant research code for machine learning papers and keeps track of state-of-the-art ML research using public scoreboards with non-validated experimental results from papers. However, my experience reproducing research papers suggests that sharing ad-hoc research code is not enough to make research techniques reproducible, customizable, portable and trustable [42].

While testing all these useful tools and analyzing the Jupyter notebooks, Docker images and GitHub repositories shared alongside research papers, I started thinking that it would be possible to reorganize them as some sort of database of reusable components with a common API, command line, web interface and meta description. We could then reuse artifacts and some common automation actions across different projects while applying DevOps principles to research projects.

Furthermore, we could also gradually abstract and interconnect all existing tools rather than rewriting or substituting them. This, in turn, helped to create "plug and play" workflows that could automatically connect compatible components from different users and vendors (models, datasets, frameworks, compilers, tools, platforms) while minimising manual interventions and providing a common interface for all shared research projects.

I called my project Collective Knowledge (CK) because my goal was to connect researchers and practitioners in order to share their knowledge, best practices and research results in the form of reusable components, portable workflows and automation actions with common APIs and meta descriptions.

3 CK framework and an open CK format

I have developed the CK framework as a small Python library with minimal dependencies to be very portable, while offering the possibility of being re-implemented in other languages such as C, C++, Java and Go. The CK framework has a unified CLI, a Python API and a JSON-based web service to manage CK repositories and to add, find, update, delete, rename and move CK components (sometimes called CK entries or CK data) [9].

CK repositories are human-readable databases of reusable CK components that can be created in any local directory and inside containers, pulled from GitHub and similar services, and shared as standard archive files [7]. CK components simply wrap (encapsulate) user artifacts and provide an extensible JSON meta description with common automation actions [6] for related artifacts.

Automation actions are implemented using CK modules, which are Python modules with functions exposed in a unified way via the CK API and CLI, and which use dictionaries for input and output (extensible and unified CK input/output (IO)). The use of dictionaries makes it easier to support continuous integration tools and web services and to extend the functionality while keeping backward compatibility. The unified IO also makes it possible to reuse such actions across projects and to chain them together into unified CK pipelines and workflows.
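As a rough sketch (not the actual CK source code), a CK module exposes each automation action as a Python function that takes one dictionary and returns one dictionary with a unified error code; the 'get_features' action below mirrors the example used in Figure 2, and its fields are illustrative:

# module.py - a minimal sketch of a CK module exposing one automation
# action; the dictionary-in/dictionary-out convention follows the CK API,
# while the specific field names below are illustrative.
def get_features(i):
    # 'i' is the unified dictionary input, e.g. {'image': '/path/1.png'}
    image = i.get('image', '')
    if image == '':
        # Unified error handling: non-zero 'return' code plus 'error' text
        return {'return': 1, 'error': 'image path is not specified'}
    # Placeholder for real feature extraction
    features = {'filename': image}
    # The output is again a dictionary, so other actions, web services and
    # pipelines can consume it and chain actions together
    return {'return': 0, 'features': features}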

1) GitHub repo or archive file:

.ckr.json
.cm/alias-a-dataset
.cm/alias-a-module
.cm/alias-a-program
.cm/alias-a-paper
dataset/.cm/alias-a-images
dataset/images/1.png
dataset/images/.cm/meta.json
dataset/images/.cm/info.json
program/.cm/alias-a-detect-edges
program/detect-edges/program.cpp
program/detect-edges/Makefile
program/detect-edges/run.sh
program/detect-edges/check-output.sh
program/detect-edges/autotune.sh
program/detect-edges/.cm/meta.json
program/detect-edges/.cm/info.json
paper/.cm/alias-a-report
paper/report/pldi.tex
paper/report/.cm/meta.json
paper/report/.cm/info.json
module/.cm/alias-a-program
module/program/module.py
module/program/.cm/meta.json
module/program/.cm/info.json

2) User home directory:

$HOME/project/2000-forgot-everything/
.ckr.json
.cm/alias-a-dataset
.cm/alias-a-program
.cm/alias-a-experiment
.cm/alias-a-paper
dataset/images-2000/2.png
dataset/images-2000/.cm/meta.json
program/crazy-algorithm/source.cpp
program/crazy-algorithm/build.bat
program/crazy-algorithm/.cm/meta.json
experiment/autotuning/many.logs
experiment/autotuning/lots-of-stats.sql
experiment/autotuning/.cm/meta.json
paper/asplos/source.tex
paper/asplos/.cm/meta.json

3) Jupyter/Colab notebook:

import matplotlib.pyplot as plt
import pandas
import numpy
import ck.kernel as ck

# We can now access all our software projects as a database:
r=ck.access({'action':'search',
             'module_uoa':'dataset',
             'add_meta':'yes'})
if r['return']>0: ck.err(r)

list_of_all_ck_entries=r['lst']

for ck_entry in list_of_all_ck_entries:
    # CK will find all dataset entries in all CK-compatible projects,
    # even old ones - you don't need to remember the project structure.
    # Furthermore, you can continue reusing a project even if students
    # or engineers leave:
    image=ck_entry['path']+ck_entry['meta']['image_filename']
    ...

# Call a reusable CK automation action to extract features:
features=ck.access({'action':'get_features',
                    'module_uoa':'dataset',
                    'image':image})
if features['return']>0: ck.err(features)

4) Docker image: install a stable OS and packages, set the environment, and use the familiar CK API/CLI to run experiments inside or outside the VM; data can be moved outside the VM in the CK format to continue processing it via CK:

$ ck pull repo:ck-crowdtuning
$ ck add repo:2000-forgot-everything
$ ck ls dataset:image*
dataset:images
dataset:images-2000
$ ck ls program
program:detect-edges
program:object-classification
$ ck compile program:detect-edges --speed
Detecting compilers on your system:
1) LLVM 10.0.1
2) GCC 8.1
3) GCC 9.3
4) ICC 19.1
$ ck run program:detect-edges
Searching for datasets...
Select dataset:
1) images
2) images-2000
$ ck benchmark program:detect-edges --record --record_uoa=autotuning
$ ck reproduce experiment:autotuning

Figure 2: Organizing software projects as a human-readable database of reusable components that wrap artifacts and provide unified APIs, CLI, meta descriptions and common automation actions for related artifacts. The CK APIs and control files are highlighted in blue.


As I wanted CK to be non-intrusive and technology-neutral, I decided to use a simple two-level directory structure to wrap user artifacts into CK components, as shown in Figure 2. The root directory of a CK repository contains the .ckr.json file to describe the repository and to specify dependencies on other CK repositories in order to explicitly reuse their components and automation actions.

CK uses .cm directories (Collective Meta), similar to .git, to store the meta information of all components as well as Unique IDs to be able to find components even if their user-friendly names have changed over time (CK alias). CK modules are always stored in module/<CK module name> directories in the CK repository. CK components are stored in <CK module name>/<CK data name> directories. Each CK component has a .cm directory with the meta.json file describing a given artifact and the info.json file keeping the provenance of that artifact, including copyrights, licenses, creation date, names of all contributors and so on.
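As a small illustration of this layout (a sketch following the naming conventions above, not a CK API), locating the meta description of a component only requires the repository, module and data names:

import json
import os

# Sketch: resolve <repo>/<CK module name>/<CK data name>/.cm/meta.json
# and read the meta description of a CK component.
def load_meta(repo_path, module_name, data_name):
    meta_file = os.path.join(repo_path, module_name, data_name, '.cm', 'meta.json')
    with open(meta_file) as f:
        return json.load(f)

# Example: meta = load_meta('ck-crowdtuning', 'program', 'cbench-automotive-susan')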

The CK framework has an internal default CK repository with stable CK modules and the most commonly used automation actions across many research projects. When the CK framework is used for the first time, it also creates a local CK repository in the user space to be used as a scratch pad.

After discussing the CK CLI with colleagues, I decided to implement it in a way similar to natural language to make it easier for users to remember the commands:

ck <action> <CK module name> (flags) (input.json or input.yaml)
ck <action> <CK module name>:<CK data name> (flags)
ck <action> <CK repository name>:<CK module name>:<CK data name>

The next example demonstrates how to compile and run a shared automotive benchmark on any platform and then create a copy of the CK program component:


pip install ck

ck pull repo --url=https://github.com/ctuning/ck-crowdtuning

ck search dataset --tags=jpeg

ck search program:cbench-automotive-*

ck find program:cbench-automotive-susan

ck load program:cbench-automotive-susan

ck help program

ck compile program:cbench-automotive-susan --speed
ck run program:cbench-automotive-susan --env.OMP_NUM_THREADS=4

ck run program --help

ck cp program:cbench-automotive-susan local:program:new-program-workflow
ck find program:new-program-workflow

ck benchmark program:new-program-workflow --record --record_uoa=my-test
ck replay experiment:my-test

The CK program module describes dependencies on software detection plugins and meta packages using simple tags with version ranges that the community has to agree on:

"compiler": {
  "name": "C++ compiler",
  "sort": 10,
  "tags": "compiler,lang-cpp"
},

"library": {
  "name": "TensorFlow C++ API",
  "sort": 20,
  "version_from": [1,13,1],
  "version_to": [2,0,0],
  "no_tags": "tensorflow-lite",
  "tags": "lib,tensorflow,vstatic"
}
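The version ranges are encoded as lists of numbers, which makes the compatibility check straightforward; here is a minimal sketch of the idea (CK's exact comparison semantics may differ, for example in how the upper bound is treated):

# Check whether a detected software version satisfies a declared range;
# Python compares lists element by element, which matches the encoding above.
def version_ok(version, version_from, version_to):
    return version_from <= version <= version_to

print(version_ok([1, 15, 2], [1, 13, 1], [2, 0, 0]))  # True
print(version_ok([2, 1, 0], [1, 13, 1], [2, 0, 0]))   # False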

I also implemented a simple access function in the CK Python API to access all the CK functionality in a very simple and unified way:

import ck.kernel as ck

# Equivalent of "ck compile program:cbench-automotive-susan --speed"
r=ck.access({'action':'compile',
             'module_uoa':'program',
             'data_uoa':'cbench-automotive-susan',
             'speed':'yes'})
if r['return']>0: return r # unified error handling
print (r)

# Equivalent of "ck run program:cbench-automotive-susan --env.OMP_NUM_THREADS=4"
r=ck.access({'action':'run',
             'module_uoa':'program',
             'data_uoa':'cbench-automotive-susan',
             'env':{'OMP_NUM_THREADS':4}})
if r['return']>0: return r # unified error handling
print (r)


Such an approach allowed me to connect colleagues, students, researchers and engineers from different workgroups to implement, share and reuse automation actions and CK components rather than reimplementing them from scratch. Furthermore, the Collective Knowledge concept supported the so-called FAIR principles of making data findable, accessible, interoperable and reusable [55], while also extending them to code and other research artifacts.

4 CK components and workflows to automate machine learning and systems research

One of the biggest challenges I have faced throughout my research career, automating the co-design process of efficient and self-optimizing computing systems, has been how to deal with rapidly evolving hardware, models, datasets, compilers and research techniques. It is for this reason that my first use case for the CK framework was to work with colleagues to collaboratively solve these problems and thus enable trustable, reliable and efficient computational systems that can be easily deployed and used in the real world.

We started using CK as a flexible playground to decompose complex computational systems into reusable, customizable and non-virtualized CK components while agreeing on their APIs and meta descriptions. As the first step, I implemented basic actions that could automate the most common R&D tasks that I encountered during artifact evaluation and in my own research on systems and machine learning [42]. I then shared the automation actions to analyze platforms and user environments in a unified way, detect already installed code, data and ML models (CK software detection plugins [32]), and automatically download, install and cross-compile missing packages (CK meta packages [31]). At the same time, I provided initial support for different compilers, operating systems (Linux, Windows, MacOS, Android) and hardware from different vendors, including Intel, Nvidia, Arm, Xilinx, AMD, Qualcomm, Apple, Samsung and Google.

Such an approach allowed my collaborators [12] to create, share and reuse different CK components with unified APIs to detect, install and use different AI and ML frameworks, libraries, tools, compilers, models and datasets. The unified automation actions, APIs and JSON meta descriptions of all CK components also helped us to connect them into platform-agnostic, portable and customizable program pipelines (workflows) with a common interface across all research projects while applying the DevOps methodology.

Such "plug&play" workflows [45] can automatically adapt to evolving environments, models, datasets and non-virtualized platforms by automatically detecting the properties of a target platform, finding all required dependencies and artifacts (code, data and models) on a user platform with the help of CK software detection plugins [32], installing missing dependencies using portable CK meta packages [31], building and running code, and unifying and testing outputs [25]. Moreover, rather than substituting or competing with existing tools, the CK approach helped to abstract and interconnect them in a relatively simple and non-intrusive way.
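For example, this detect-or-install logic can be scripted via the unified CK API; the following is a simplified sketch (the 'soft' and 'package' modules and the equivalent "ck detect soft" / "ck install package" commands exist in CK, but the exact tags and available packages depend on which shared repositories are installed):

import ck.kernel as ck

# Try to detect an already installed TensorFlow library on this platform
# (roughly equivalent to "ck detect soft --tags=libtensorflow")
r = ck.access({'action': 'detect',
               'module_uoa': 'soft',
               'tags': 'libtensorflow'})
if r['return'] > 0:
    # Fall back to installing a compatible version via a CK meta package
    # (roughly equivalent to "ck install package --tags=libtensorflow")
    r = ck.access({'action': 'install',
                   'module_uoa': 'package',
                   'tags': 'libtensorflow'})
    if r['return'] > 0:
        ck.err(r)  # unified error handling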

CK also helps to protect user workflows whenever external files or packages are broken, disappear or move to another URL, because it is possible to fix such issues in a shared CK meta package without changing existing workflows. For example, our users have already taken advantage of this functionality when the Eigen library moved from BitBucket to GitLab, and when the old ImageNet dataset was no longer supported but could still be downloaded via BitTorrent and other peer-to-peer services.

Modular CK workflows can help to keep track of the information flow within such workflows, gradually expose configuration and optimization parameters as vectors via the dictionary-based IO, and combine public and private code and data. They also help to monitor, model and auto-tune system behaviour, retarget research code and machine learning models to different platforms from data centers to edge devices, integrate them with legacy systems and reproduce results.
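To make the idea of exposing parameters as vectors concrete, here is a minimal, self-contained sketch (the real CK key encoding differs, but the principle is the same): nested dictionaries of choices are flattened into uniquely named keys that an auto-tuner or ML model can treat as one vector:

# Flatten a nested dictionary of design/optimization choices into flat
# keys, so auto-tuners and ML models can treat them as one vector.
def flatten(d, prefix=''):
    out = {}
    for k, v in d.items():
        key = prefix + '#' + str(k)
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out

choices = {'compiler': {'flags': '-O3'}, 'model': {'batch_size': 4}}
print(flatten(choices))
# {'#compiler#flags': '-O3', '#model#batch_size': 4}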

Furthermore, we can use CK workflows inside standard containers such as Docker while providing a unified CK API to customize, rebuild and adapt them to any platform (adaptive CK containers) [2], thus making research techniques truly portable, reproducible and reusable. I envision that such adaptive CK containers and portable workflows can complement existing marketplaces to deliver portable, customizable, trustable and efficient AI and ML solutions that can be continuously optimized across diverse models, datasets, frameworks, libraries, compilers and run-time systems.

In spite of some initial doubts, my collaborative CK approach has worked well to decompose complex research projects and computational systems into reusable components while agreeing on common APIs and meta descriptions: the CK functionality evolved from just a few core CK modules and abstractions to hundreds of CK modules [30] and thousands of reusable CK components and workflows [33] that automate the most repetitive research tasks, particularly for AI, ML and systems R&D, as shown in Figure 3.

Figure 3: The Collective Knowledge framework provided a flexible playground for researchers and practitioners to decompose complex computational systems into reusable components while agreeing on APIs and meta descriptions. Between 2015 and 2020, the Collective Knowledge project grew to thousands of reusable CK components and automation actions.

5 CK use cases

5.1 Unifying benchmarking, auto-tuning and machine learning

The MILEPOST and cTuning projects were like the Apollo mission: on the one hand, we managed to demonstrate that it was indeed possible to crowdsource auto-tuning and machine learning across multiple users to automatically co-design efficient software and hardware [41, 44]. On the other hand, we exposed so many issues when using machine learning for system optimization in the real world that I had to stop this research and have since focused on solving a large number of related engineering problems.

For this reason, my first step was to test the CK concept by converting all artifacts and automation actions from all my past research projects related to self-optimizing computer systems into reusable CK components and workflows. I shared them with the community in CK-compatible Git repositories [7] and started reproducing experiments from my own and related research projects [20].

I then implemented a customizable and portable program pipeline as a CK module to unify benchmarking and auto-tuning while supporting all research techniques and experiments from my PhD research and the MILEPOST project [10]. Such a pipeline could perform compiler auto-tuning and software/hardware co-design combined with machine learning in a unified way across different programs, datasets, frameworks, compilers, ML models and platforms, as shown in Figure 4.

The CK program pipeline helps users to gradually expose different design choices and optimization parameters from all CK components (models, frameworks, compilers, run-time systems, hardware) via unified CK APIs and meta descriptions, and thus enable whole ML and system auto-tuning. It also helps users to keep track of all information passed between components in complex computational systems to ensure the reproducibility of results while finding the most efficient configuration on a Pareto frontier in terms of speed, accuracy, energy and other characteristics, also exposed via unified CK APIs. More importantly, it can now be reused and extended in other real-world projects [12].

Figure 4: Portable, customizable and reusable program pipeline (workflow) assembled from CK components to unify benchmarking, auto-tuning and machine learning. Each CK module wraps code, data and models with a unified JSON input and output describing behavior, choices, features and state; the behavior of a computational system is modelled as b = B(c, f, s), a function of all (ML/SW/HW) choices, features and the run-time state, and inputs are flattened into vectors to unify auto-tuning, statistical analysis and machine learning. It is possible to gradually expose the design search space, optimizations, features and the run-time state from all CK components to make it easier to unify performance modelling and auto-tuning.
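To restate the model from Figure 4 in symbols (the concrete instantiation below is my own illustrative example, not an exhaustive list):

b = B(c, f, s)

Here b is the vector of observed behaviour characteristics (e.g. latency, accuracy, energy), c the vector of design and optimization choices (e.g. compiler flags, model topology, batch size), f the features of the workload and platform (e.g. dataset and hardware properties), and s the run-time state (e.g. processor frequency, cache and memory contention). Auto-tuning then amounts to searching over c, for fixed or observed f and s, for configurations whose b lies on the Pareto frontier.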

5.2 Bridging the growing gap between education, research and practice

During the MILEPOST project, I noticed how difficult it is to start using research techniques in the real world. New software, hardware, datasets and models are usually available only at the end of such research projects, making it very challenging, time-consuming and costly to make research software work with the latest systems or with legacy technology.

That prompted us to organize a proof-of-concept project with the Raspberry Pi foundation to check if it was possible to use portable CK workflows and components to enable sustainable research software that can automatically adapt to rapidly evolving systems. We also wanted to provide an open-source tool for students and researchers to help them share their code, data and models as reusable, portable and customizable workflows and artifacts - something now known as the FAIR principles [55].

For this purpose, we decided to reuse the CK program workflow to demonstrate that it was possible to crowdsource compiler auto-tuning across any Raspberry Pi device, with any environment and any version of any compiler (GCC or LLVM), to automatically improve the performance and code size of the most popular RPi applications. CK helped to automate the experiments, collect performance numbers on live CK scoreboards, and automatically plug in CK components with various machine learning and predictive analytics techniques, including decision trees, nearest neighbour classifiers, support vector machines (SVM) and deep learning, to automatically learn the most efficient optimizations [45].

With this project, we demonstrated that it was possible to reuse portable CK workflows and let users participate in collaborative auto-tuning (crowd-tuning) on new systems while sharing the best optimizations and unexpected behaviour on public CK scoreboards, even after the project.

5.3 Co-designing efficient software and hardware for AI, ML and other emerging workloads

While helping companies to assemble efficient software and hardware for image classification, object detection and other emerging AI and ML workloads, I noticed that it can easily take several months to build an efficient and reliable system before moving it to production.

This process is so long and tedious because one has to navigate a multitude of design decisions when selecting components from different vendors for different applications (image classification, object detection, natural language processing (NLP), speech recognition and many others) while trading off speed, latency, accuracy, energy and other costs: what network architecture to deploy and how to customize it (ResNet, MobileNet, GoogleNet, SqueezeNet, SSD, GNMT); what framework to use (PyTorch vs MXNet vs TensorFlow vs TF Lite vs Caffe vs CNTK); what compilers to use (XLA vs nGraph vs Glow vs TVM); and what libraries and which optimizations to employ (ArmNN vs MKL vs OpenBLAS vs cuDNN). These choices are generally a consequence of the target hardware platform (CPU vs GPU vs DSP vs FPGA vs TPU vs numerous accelerators). Worse still, this semi-manual process is usually repeated from scratch for each new version of hardware, models, frameworks, libraries and datasets.

My modular CK program pipeline, shown in Figure 5, helped to automate this process. We extended it slightly to plug in different AI and ML algorithms, datasets, models, frameworks and libraries for different hardware, such as CPU, GPU, DSP and TPU, and for different target platforms, from servers to Android devices and IoT [12]. We also customized this ML workflow with new CK plugins that performed pre- and post-processing of different models and datasets to make them compatible with different frameworks, backends and hardware while unifying benchmarking results such as throughput, latency, mAP (mean Average Precision), recall and other characteristics. We also exposed different design and optimization parameters, including model topology, batch sizes, hardware frequency, compiler flags and so on.

Figure 5: The use of the CK framework to automate benchmarking, optimization and co-design of efficient software and hardware for machine learning and artificial intelligence. The goal is to make it easier to reproduce, reuse, adopt and build upon ML and systems research. CK dependencies describe compatible versions of code, data and models for a given algorithm and a given target platform; CK actions can then automatically detect or download and install all the necessary dependencies for a given target, with commands such as "ck pull repo:ck-mlperf", "ck detect soft --tags=libtensorflow", "ck install package --tags=imagenet", "ck run program:object-detection-onnx-py", "ck autotune program" and "ck reproduce experiment".

Eventually, CK allowed us to automate and systematise design space exploration (DSE) of AI/ML/software/hardware stacks and distribute it across diverse platforms and environments [8]. This is possible because CK automatically detects all necessary dependencies on any platform, installs and/or rebuilds the prerequisites, runs experiments and records all results together with the complete experiment configuration (resolved dependencies and their versions, environment variables, optimization parameters and so on) in a unified JSON format inside CK repositories. CK also ensured the reproducibility of results while making it easier to analyze and visualize results locally, using Jupyter notebooks and standard toolsets, or within workgroups, using universal CK dashboards also implemented as CK modules [20].
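As a rough illustration of such local analysis, the following sketch queries the recorded experiment entries through the same dictionary-based CK API used elsewhere in this article; which meta fields are available depends on the workflow, so the printed 'tags' key is only an assumption:

import ck.kernel as ck

# Hedged sketch: find all recorded experiment entries across local
# CK repositories and load their JSON meta for analysis, e.g. inside
# a Jupyter notebook.
r = ck.access({'action': 'search',
               'module_uoa': 'experiment',
               'add_meta': 'yes'})
if r['return'] > 0:
    ck.err(r)   # unified CK error handling

for entry in r['lst']:
    # 'data_uoa' is the entry name; other meta keys are workflow-specific.
    print(entry['data_uoa'], entry.get('meta', {}).get('tags', []))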

Note that it is also possible to share the entire experimental setup in the CK format inside Docker containers, thus automating all the DSE steps using the unified CK API instead of trying to figure them out from the README files. This method enables CK-powered adaptive containers that help users to start using and customizing research techniques across diverse software and hardware, from servers to mobile devices, in just a few simple steps, while sharing experimental results within workgroups or along research papers in the CK format, reproducing and comparing experiments, and even automatically reporting unexpected behaviour such as bugs and mispredictions [35].

Eventually, I managed to substitute my original cTuning framework completely with this modular, portable, customizable and reproducible experimental framework, while addressing most of the engineering and reproducibility issues exposed by the MILEPOST and cTuning projects [41]. It also helped me to return to my original research on lifelong benchmarking, optimization and co-design of efficient software and hardware for emerging workloads, including machine learning and artificial intelligence.

5.4 Automating MLPerf and enabling portable MLOps

The modularity of my portable CK program workflow helped to enable portable MLOps when combined with AI and ML components, FAIR principles and the DevOps methodology. For example, my CK workflows and components were reused and extended by General Motors and dividiti to collaboratively benchmark and optimize deep learning implementations across diverse devices from servers to the edge [11]. They were also used by Amazon to enable scaling of deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL [49]. Finally, the CK framework made it easier to prepare, submit and reproduce MLPerf inference benchmark results (a "fair and useful benchmark for measuring training and inference performance of ML hardware, software, and services") [53, 23]. These results are aggregated in the public optimization repository [20] to help users find and compare different AI/ML/software/hardware stacks in terms of speed, accuracy, power consumption, costs and other collected metrics.

5.5 Enabling reproducible papers with portable workflows and reusable artifacts

Ever since my very first research project, I wanted to be able to easily find all artifacts (code, data, models) from research papers, reproduce and compare results in just a few clicks, and immediately test research techniques in the real world with different platforms, environments and data. It is for that reason that one of my main goals when designing the CK framework was to use portable CK workflows for these purposes.

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for the collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their requirements and constraints.
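The selection logic behind such a scoreboard is a standard non-domination filter. A minimal sketch follows, assuming (purely for illustration) that each submission is reduced to a latency to be minimized and an accuracy to be maximized; the real scoreboard exposes more metrics:

# Hedged sketch: keep only Pareto-optimal submissions when minimizing
# latency and maximizing accuracy. All field names and values are
# illustrative.
def pareto_frontier(solutions):
    def dominates(a, b):
        # a dominates b if it is no worse on both metrics
        # and strictly better on at least one of them
        return (a['latency'] <= b['latency'] and
                a['accuracy'] >= b['accuracy'] and
                (a['latency'] < b['latency'] or a['accuracy'] > b['accuracy']))
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions)]

submissions = [{'name': 'fpga', 'latency': 0.02, 'accuracy': 0.41},
               {'name': 'gpu',  'latency': 0.05, 'accuracy': 0.75},
               {'name': 'cpu',  'latency': 0.50, 'accuracy': 0.75}]

print(pareto_frontier(submissions))   # 'cpu' is dominated by 'gpu'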

We then published all five papers with our unified artifact appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results also exhibited great diversity:

• Latency: 4 .. 500 milliseconds per image.

• Throughput: 2 .. 465 images per second.

• Top-1 accuracy: 41 .. 75 percent.

• Top-5 accuracy: 65 .. 93 percent.

• Model size (pre-trained weights): 2 .. 130 megabytes.

• Peak power consumption: 2.5 .. 180 Watts.

• Device frequency: 100 .. 2600 megahertz.

• Device cost: 40 .. 1200 dollars.

• Cloud usage cost: 2.6E-6 .. 9.5E-6 dollars per inference.

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics such as power or memory consumption. We also used these CK workflows to crowdsource the design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].
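Since the CK CLI is a thin wrapper around the dictionary-based ck.access API, such adaptation steps can also be scripted. A minimal sketch, reusing the software tag shown in Figure 5 (the exact API mapping here is an assumption based on how CK CLI commands translate to API calls):

import ck.kernel as ck

# Hedged sketch: detect a TensorFlow library already installed on the
# current machine (the API equivalent of
# "ck detect soft --tags=libtensorflow"); if the detection fails, a
# CK meta package could be installed instead.
r = ck.access({'action': 'detect',
               'module_uoa': 'soft',
               'tags': 'libtensorflow'})
if r['return'] > 0:
    ck.err(r)   # unified CK error handling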

After validating that my portable CK program workflow could support reproducible papers for deep learning systems, I decided to conduct one more test to check if CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography. However, it is not yet known when the first demonstration of quantum advantage will be achieved or what shape it will take.

Figure 6: The CK dashboard with reproduced results from the ACM ASPLOS-ReQuEST'18 tournament to co-design Pareto-efficient image classification in terms of speed, accuracy, energy and other costs. Each paper submission was accompanied by a portable CK workflow to be able to reproduce and compare results in a unified way. (One of the validated workflows, from https://github.com/ctuning/ck-request-asplos18-caffe-intel on an AWS c5.18xlarge instance with Intel Xeon Platinum 8124M, reported that an automatically generated 8-bit model improved inference throughput and latency over the FP32 baseline with negligible accuracy loss, without fine-tuning or retraining.)

This led me to co-organize several quantum hackathons, similar to the ReQuEST tournament [26], with IBM, Rigetti, Riverlane and dividiti. My main goal was to check if we could aggregate and share multidisciplinary knowledge about the state-of-the-art in quantum computing using portable CK workflows that can run on classical hardware and on quantum platforms from IBM, Rigetti and other companies, can be connected to a public dashboard to simplify the reproducibility and comparison of different algorithms across different platforms, and can be extended by the community even after the hackathons.

Figure 7 shows the results from one of such quantum hackathons, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework, so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository, the participants were able to start running quantum experiments with a standard CK command:

ck pull repo --url=https://github.com/ctuning/ck-qiskit
ck run program:qiskit-demo --cmd_key=quantum_coin_flip

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and with ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows.

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make the CK onboarding as simple as possible and to help researchers automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.

Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible quantum hackathons. Over 80 participants worked together to solve a quantum machine learning problem and minimise the time to solution.

5.6 Connecting researchers and practitioners to co-design efficient computational systems

CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and to help them collaboratively co-design efficient computational systems. Scientists can then work with a higher-level abstraction, while engineers continue improving the lower-level abstractions for continuously evolving software and hardware and deploy new techniques in production without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers, while applying the DevOps methodology and agile principles in scientific research.

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components and assemble workflows, automatically validate them across diverse platforms and environments, and connect them with legacy systems.

• The concept of backward compatibility of CK APIs and the lack of versioning (similar to Java) made it challenging to keep stable and bug-free workflows in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services such as GitHub and GitLab to make it more user-friendly.

Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles, while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors to automatically find the most efficient systems on the Pareto frontier.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI: I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and to assemble portable workflows. It also includes the open-source CK client [16], which provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare the state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and the existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project (cKnowledge.io/test).

Figure 9: cKnowledge.io, an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and "live" papers validated by the community, and keep track of state-of-the-art research techniques that can be deployed in production.

This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. This solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (see the illustrative sketch after this list):

• prepare a Python virtual environment (can be skipped for the native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre/post-processing scripts.
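The manifest itself is not reproduced in this article, so the following sketch only illustrates, as a Python dictionary, what such a JSON description of tasks and dependencies on CK components might look like; every key name, tag and version number here is a hypothetical placeholder:

# Hedged, hypothetical sketch of a solution manifest; the real JSON
# file shipped with the solution may use different keys and values.
solution = {
    'prepare': [
        {'task': 'python-venv', 'optional': True},
        {'task': 'install-dataset', 'tags': 'dataset,coco'},
        {'task': 'detect-compiler', 'tags': 'compiler,lang-cpp'},
        {'task': 'install-package', 'tags': 'lib,tensorflow',
         'version_from': [1, 13, 1]},
        {'task': 'install-package', 'tags': 'model,tf,ssd-mobilenet'}
    ],
    'run': {'program': 'object-detection-tf-py'}
}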

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

# Install the CK client from PyPI
pip install cbench

# Download and build the solution on a given machine (example for Linux)
cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

# Run the solution on a given machine
cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows


The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or against the results shared by other users (cKnowledge.io/demo-result).

After validating this solution on a given platform, the users can also clone it and update the JSON description to retarget this benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, and servers with CUDA-enabled GPUs.

The users also have the possibility to integrate such ML solutions with production systems with the help of unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser (cKnowledge.io/demo-solution).

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows \
  /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux \
  /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing all research projects as a database of reusable components, portable workflows and reproducible experiments based on FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community to quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards to let the community automatically validate and update experimental results even after the project has ended, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustable.

However, CK is still a proof-of-concept, and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherington, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo: "Collective Knowledge packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON IO to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cknowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large-scale visual recognition challenge, where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cknowledge.io/?q=reproduced-papers AND portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The machine learning toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications collected during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, July 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19). IEEE, May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning, 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29-35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad, and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296-327, 2011. doi:10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12, November 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn De Weirdt. EasyBuild: building software with ease. Pages 572-582, November 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561-1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow, and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265-276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557-W561, May 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41:39-45, 2018.

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 4: Grigori Fursin cTuning foundation and cKnowledge SAS

1) GitHub repo or archive file

ckrjson

cmalias-a-dataset

cmalias-a-module

cmalias-a-program

cmalias-a-paper

datasetcmalias-a-images

datasetimages1png

cmmetajson

cminfojson

programcmalias-a-detect-edges

programdetect-edgesprogramcpp

Makefile

runsh

check-outputsh

autotunesh

cmmetajson

cminfojson

papercmalias-a-report

paperreportplditex

cmmetajson

cminfojson

modulecmalias-a-program

moduleprogrammodulepy

cmmetajson

cminfojson

2) User home directory

$HOMEproject2000-forgot-everything

ckrjson

cmalias-a-dataset

cmalias-a-program

cmalias-a-experiment

cmalias-a-paper

datasetimages-20002png

cmmetajson

programcrazy-algorithmsourcecpp

buildbat

cmmetajson

experimentautotuningmanylogs

lots-of-statssql

cmmetajson

paperasplossourcetex

cmmetajson

3) Jupytercolab notebook

import matplotlibpyplot as plt

import pandas

import numpy

import ckkernel as ck

We can now access all our software projects as a database

r=ckaccess(lsquoactionrsquorsquosearchrsquo

lsquomodule_uoarsquorsquodatasetrsquo

lsquoadd_metarsquorsquoyesrsquo)

if r[lsquoreturnrsquo]gt0 ckerr(r)

list_of_all_ck_entries=r[lsquolstrsquo]

for ck_entry in list_of_ck_entries

CK will find all dataset entries in all CK-compatible projects

even old ones ndash you donrsquot need to remember

the project structure Furthermore you can continue

reusing project even if students or engineers leave

image=ck_entry[lsquopathrsquo]+ck_entry[lsquometarsquo][lsquoimage_filenamersquo]

hellip

Call reusable CK automation action to extract features

features=ckaccess(lsquoactionrsquorsquoget_featuresrsquo

lsquomodule_uoarsquorsquodatasetrsquo

lsquoimagersquoimage)

if features[lsquoreturnrsquo]gt0 ckerr(features)

4) Docker image

Install stable OS and packages

Set environment

Use familiar CK APICLI

to run experiments

inside or outside your VM

Move data outside VM in the CK format

to continue processing it via CK

$ ck pull repock-crowdtuning

$ ck add repo2000-forgot-everything

$ ck ls datasetimage

datasetimages

datasetimages-2000

$ ck ls program

programdetect-edges

programobject-classification

$ ck compile programdetect-edges --speed

Detecting compilers on your systemhellip

1) LLVM 1001

2) GCC 81

3) GCC 93

4) ICC 191

$ ck run programdetect-edges

Searching for datasets

Select dataset

1) images

2) images-2000

$ ck benchmark programdetect-edges

--record --ecord_uoa=autotuning

$ ck reproduce experimentautotuning

Figure 2 Organizing software projects as a human-readable database of reusable components that wrap artifactsand provide unified APIs CLI meta descriptions and common automation actions for related artifacts The CKAPIs and control files are highlighted in blue

makes it easier to support continuous integration toolsand web services and to extend the functionality whilekeeping backward compatibility The unified IO alsomakes it possible to reuse such actions across projectsand to chain them together into unified CK pipelines andworkflows

As I wanted CK to be non-intrusive and technologyneutral I decided to use a simple 2-level directorystructure to wrap user artifacts into CK componentsas shown in Figure 2 The root directory of theCK repository contains the ckrjson file to describethe repository and specify dependencies on other CKrepositories to explicitly reuse their components andautomation actions

CK uses cm directories (Collective Meta) similar togit to store meta information of all components as wellas Unique IDs to be able to find components even if their

user-friendly names have changed over time (CK alias)CK modules are always stored in module ltCK modulename gt directories in the CK repository CK componentsare stored in ltCK module name gt ltCK data namegt directories Each CK component has a cm directorywith the metajson file describing a given artifact andinfojson file to keep the provenance of a given artifactincluding copyrights licenses creation date names of allcontributors and so on

The CK framework has an internal default CKrepository with stable CK modules and the mostcommonly used automation actions across many researchprojects When the CK framework is used for the firsttime it also creates a local CK repository in the userspace to be used as a scratch pad

After discussing the CK CLI with colleagues I decidedto implement it in a similar way to natural language tomake it easier for users to remember the commands

ck ltact iongt ltCK module namegt ( f l a g s ) (input j son or input yaml )ck ltact iongt ltCK module namegtltCK data namegt ( f l a g s )ck ltact iongt ltCK repo s i t o r y namegtltCK module namegtltCK data namegt

The next example demonstrates how to compile andrun the shared automotive benchmark on any platformand then create a copy of the CK program component

4

pip i n s t a l l ck

ck pu l l repo minusminusu r l=https github com ctuning ckminuscrowdtuning

ck search datase t minusminustags=jpeg

ck search program cbenchminusautomotiveminuslowast

ck f i nd program cbenchminusautomotiveminussusan

ck load program cbenchminusautomotiveminussusan

ck help program

ck compi le program cbenchminusautomotiveminussusan minusminusspeedck run program cbenchminusautomotiveminussusan minusminusenv OMPNUMTHREADS=4

ck run program minusminushelp

ck cp program cbenchminusautomotiveminussusan l o c a l program newminusprogramminusworkflowck f i nd program newminusprogramminusworkflow

ck benchmark program newminusprogramminusworkflow minusminusr ecord minusminusrecord uoa=myminust e s tck rep lay experiment myminust e s t

The CK program module describes dependencies onsoftware detection plugins and meta packages using

simple tags and version ranges that the community hasto agree on

rdquo compi le r rdquo

rdquonamerdquo rdquoC++ compi le r rdquo rdquo s o r t rdquo 10 rdquo tags rdquo rdquo compiler langminuscpprdquo

rdquo l i b r a r y rdquo

rdquonamerdquo rdquoTensorFlow C++ APIrdquo rdquo s o r t rdquo 20 rdquo ve r s i on f rom rdquo [ 1 1 3 1 ] rdquo v e r s i o n t o rdquo [ 2 0 0 ] rdquo no tags rdquo rdquo tensor f l owminus l i t e rdquo rdquo tags rdquo rdquo l i b t ensor f l ow v s t a t i c rdquo

I also implemented a simple access function in the CKPython API to access all the CK functionality in a very

simple and unified way

import ck k e rne l as ck

Equivalent o f rdquock compi le program cbenchminusautomotiveminussusan minusminusspeed rdquor=ck a c c e s s ( rsquo a c t i on rsquo rsquo compi le rsquo

rsquo module uoa rsquo rsquo program rsquo rsquo data uoa rsquo rsquo cbenchminusautomotiveminussusan rsquo rsquo speed rsquo rsquo yes rsquo )

i f r [ rsquo r e turn rsquo ]gt0 r e turn r un i f i e d e r r o r handl ing

p r i n t ( r )

Equivalent o f rdquock run program cbenchminusautomotiveminussusan minusminusenv OMPNUMTHREADS=4r=ck a c c e s s ( rsquo a c t i on rsquo rsquo run rsquo

rsquo module uoa rsquo rsquo program rsquo rsquo data uoa rsquo rsquo cbenchminusautomotiveminussusan rsquo rsquo env rsquo rsquoOMPNUMTHREADSrsquo 4 )

i f r [ rsquo r e turn rsquo ]gt0 r e turn r un i f i e d e r r o r handl ing

p r i n t ( r )

5

Such an approach allowed me to connect colleaguesstudents researchers and engineers from differentworkgroups to implement share and reuse automationactions and CK components rather than reimplementingthem from scratch Furthermore the CollectiveKnowledge concept supported the so-called FAIRprinciples to make data findable accessible interoperableand reusable [55] while also extending it to code and otherresearch artifacts

4 CK components and workflowsto automate machine learningand systems research

One of the biggest challenges I have faced throughoutmy research career automating the co-design process ofefficient and self-optimizing computing systems has beenhow to deal with rapidly evolving hardware modelsdatasets compilers and research techniques It is forthis reason that my first use case for the CK frameworkwas to work with colleagues to collaboratively solve theseproblems and thus enable trustable reliable and efficientcomputational systems that can be easily deployed andused in the real world

We started using CK as a flexible playground todecompose complex computational systems into reusablecustomizable and non-virtualized CK components whileagreeing on their APIs and meta descriptions As the firststep I implemented basic actions that could automatethe most common RampD tasks that I encountered duringartifact evaluation and in my own research on systemsand machine learning [42] I then shared the automationactions to analyze platforms and user environmentsin a unified way detect already installed code dataand ML models (CK software detection plugins [32])and automatically download install and cross-compilemissing packages (CK meta packages [31]) At the sametime I provided initial support for different compilersoperating systems (Linux Windows MacOS Android)and hardware from different vendors including IntelNvidia Arm Xilinx AMD Qualcomm Apple Samsungand Google

Such an approach allowed my collaborators [12] tocreate share and reuse different CK components withunified APIs to detect install and use different AI andML frameworks libraries tools compilers models anddatasets The unified automation actions APIs andJSON meta descriptions of all CK components also helpedus to connect them into platform-agnostic portableand customizable program pipelines (workflows) witha common interface across all research projects whileapplying the DevOps methodology

Such rdquoplugampplayrdquo workflows [45] can automaticallyadapt to evolving environments models datasets andnon-virtualized platforms by automatically detecting the

properties of a target platform finding all requireddependencies and artifacts (code data and models)on a user platform with the help of CK softwaredetection plugins [32] installing missing dependenciesusing portable CK meta packages [31] building andrunning code and unifying and testing outputs [25]Moreover rather than substituting or competing withexisting tools the CK approach helped to abstract andinterconnect them in a relatively simple and non-intrusiveway

CK also helps to protect user workflows wheneverexternal files or packages are broken disappear or moveto another URL because it is possible to fix such issuesin a shared CK meta package without changing existingworkflows For example our users have already takenadvantage of this functionality when the Eigen librarymoved from BitBucket to GitLab and when the oldImageNet dataset was no longer supported but couldstill be downloaded via BitTorrent and other peer-to-peerservices

Modular CK workflows can help to keep track of theinformation flow within such workflows gradually exposeconfiguration and optimization parameters as vectors viadictionary-based IO and combine public and privatecode and data They also help to monitor modeland auto-tune system behaviour retarget research codeand machine learning models to different platforms fromdata centers to edge devices integrate them with legacysystems and reproduce results

Furthermore we can use CK workflows inside standardcontainers such as Docker while providing a unifiedCK API to customize rebuild and adapt them toany platform (Adaptive CK container) [2] thus makingresearch techniques truly portable reproducible andreusable I envision that such adaptive CK containers andportable workflows can complement existing marketplacesto deliver portable customizable trustable and efficientAI and ML solutions that can be continuously optimizedacross diverse models datasets frameworks librariescompilers and run-time systems

In spite of some initial doubts my collaborative CKapproach has worked well to decompose complex researchprojects and computational systems into reusablecomponents while agreeing on common APIs and metadescriptions the CK functionality evolved from just afew core CK modules and abstractions to hundreds of CKmodules [30] and thousands of reusable CK componentsand workflows [33] to automate the most repetitiveresearch tasks particularly for AI ML and systems RampDas shown in Figure 3

6

2015 2020

Figure 3 Collective Knowledge framework provided a flexible playground for researchers and practitioners todecompose complex computational systems into reusable components while agreeing on APIs and meta descriptionsOver the past 5 years the Collective Knowledge project has grown to thousands of reusable CK components andautomation actions

5 CK use cases

51 Unifying benchmarking auto-tuningand machine learning

The MILEPOST and cTuning projects were like theApollo mission on the one hand we managed todemonstrate that it was indeed possible to crowdsourceauto-tuning and machine learning across multipleusers to automatically co-design efficient software andhardware [41 44] On the other hand we exposed somany issues when using machine learning for systemoptimization in the real world that I had to stop thisresearch and have since focused on solving a large numberof related engineering problems

For this reason my first step was to test the CK conceptby converting all artifacts and automation actions fromall my past research projects related to self-optimizingcomputer systems into reusable CK components andworkflows I shared them with the community inthe CK-compatible Git repositories [7] and startedreproducing experiments from my own or related researchprojects [20]

I then implemented a customizable and portableprogram pipeline as a CK module to unify benchmarkingand auto-tuning while supporting all research techniquesand experiments from my PhD research and theMILEPOST project [10] Such a pipeline could performcompiler auto-tuning and softwarehardware co-designcombined with machine learning in a unified way acrossdifferent programs datasets frameworks compilers MLmodels and platforms as shown in Figure 4

The CK program pipeline helps users to graduallyexpose different design choices and optimizationparameters from all CK components (modelsframeworks compilers run-time systems hardware)via unified CK APIs and meta descriptions and thusenable the whole ML and system auto-tuning It also

helps users to keep track of all information passedbetween components in complex computational systemsto ensure the reproducibility of results while finding themost efficient configuration on a Pareto frontier in termsof speed accuracy energy and other characteristicsalso exposed via unified CK APIs More importantlyit can be now reused and extended in other real-worldprojects [12]

52 Bridging the growing gap betweeneducation research and practice

During the MILEPOST project I noticed how difficult itis to start using research techniques in the real worldNew software hardware datasets and models are usuallyavailable at the end of such research projects makingit very challenging time-consuming and costly to makeresearch software work with the latest systems or legacytechnology

That prompted us to organize a proof-of-conceptproject with the Raspberry Pi foundation to checkif it was possible to use portable CK workflows andcomponents to enable sustainable research software thatcan automatically adapt to rapidly evolving systemsWe also wanted to provide an open-source tool forstudents and researchers to help them share their codedata and models as reusable portable and customizableworkflows and artifacts - something now known as FAIRprinciples [55]

Figure 4: Portable, customizable and reusable program pipeline (workflow) assembled from CK components (compile program, run code, perform statistical analysis, detect the Pareto frontier, model behaviour and predict optimizations, reduce complexity) to unify benchmarking, auto-tuning and machine learning. The behaviour of a computational system is modelled as a function b = B(c, f, s) of all choices (ML/SW/HW), features and the run-time state, with unified JSON inputs and outputs flattened into vectors to unify auto-tuning, statistical analysis and machine learning. It is possible to gradually expose the design search space, optimizations, features and the run-time state from all CK components to make it easier to unify performance modelling and auto-tuning.

For this purpose, we decided to reuse the CK program workflow to demonstrate that it was possible to crowdsource compiler auto-tuning across any Raspberry Pi device, with any environment and any version of any compiler (GCC or LLVM), to automatically improve the performance and code size of the most popular RPi applications. CK helped to automate experiments, collect performance numbers on live CK scoreboards, and automatically plug in CK components with various machine learning and predictive analytics techniques, including decision trees, nearest neighbour classifiers, support vector machines (SVM) and deep learning, to automatically learn the most efficient optimizations [45].

With this project, we demonstrated that it was possible to reuse portable CK workflows and let users participate in collaborative auto-tuning (crowd-tuning) on new systems while sharing the best optimizations and unexpected behaviour on public CK scoreboards, even after the end of the project.

5.3 Co-designing efficient software and hardware for AI, ML and other emerging workloads

While helping companies to assemble efficient software and hardware for image classification, object detection and other emerging AI and ML workloads, I noticed that it can easily take several months to build an efficient and reliable system before moving it to production.

This process is so long and tedious because one has to navigate a multitude of design decisions when selecting components from different vendors for different applications (image classification, object detection, natural language processing (NLP), speech recognition and many others) while trading off speed, latency, accuracy, energy and other costs: which network architecture to deploy and how to customize it (ResNet, MobileNet, GoogleNet, SqueezeNet, SSD, GNMT); which framework to use (PyTorch vs MXNet vs TensorFlow vs TF Lite vs Caffe vs CNTK); which compilers to use (XLA vs nGraph vs Glow vs TVM); and which libraries and optimizations to employ (ArmNN vs MKL vs OpenBLAS vs cuDNN). The choice is generally a consequence of the target hardware platform (CPU vs GPU vs DSP vs FPGA vs TPU vs Edge TPU vs numerous accelerators). Worse still, this semi-manual process is usually repeated from scratch for each new version of hardware, models, frameworks, libraries and datasets.

My modular CK program pipeline, shown in Figure 5, helped to automate this process. We extended it slightly to plug in different AI and ML algorithms, datasets, models, frameworks and libraries for different hardware such as CPUs, GPUs, DSPs and TPUs, and for different target platforms from servers to Android devices and IoT [12]. We also customized this ML workflow with new CK plugins that performed pre- and post-processing of different models and datasets to make them compatible with different frameworks, backends and hardware, while unifying benchmarking results such as throughput, latency, mAP (mean Average Precision), recall and other characteristics. We also exposed different design and optimization parameters, including model topology, batch sizes, hardware frequency, compiler flags and so on.

Figure 5: The use of the CK framework to automate benchmarking, optimization and co-design of efficient software and hardware for machine learning and artificial intelligence. The goal is to make it easier to reproduce, reuse, adopt and build upon ML and systems research. Reusable CK components expose automation actions (tasks) via a unified CK API and CLI (Python, C/C++, Java, REST) with a JSON input (environment, dependencies, parameters), a JSON output (results, characteristics, features) and extensible metadata. CK dependencies describe compatible versions of code, data and models for a given algorithm and target platform, so CK actions can automatically detect or download and install all the necessary dependencies for a given target (for example: ck pull repo:ck-mlperf; ck detect soft --tags=lib,tensorflow; ck install package --tags=imagenet; ck run program:object-detection-onnx-py; ck autotune program; ck reproduce experiment).

Eventually, CK allowed us to automate and systematise design space exploration (DSE) of AI/ML/software/hardware stacks and distribute it across diverse platforms and environments [8]. This is possible because CK automatically detects all necessary dependencies on any platform, installs and/or rebuilds the prerequisites, runs experiments, and records all results together with the complete experiment configuration (resolved dependencies and their versions, environment variables, optimization parameters and so on) in a unified JSON format inside CK repositories. CK also ensured the reproducibility of results while making it easier to analyze and visualize them, either locally using Jupyter notebooks and standard toolsets or within workgroups using universal CK dashboards, themselves implemented as CK modules [20].
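As a purely conceptual sketch (the actual CK experiment format is richer and its keys differ), such a unified JSON record bundles everything needed to reproduce a run:

import json

# Hypothetical, simplified example of one recorded experiment entry
# (the real CK experiment schema is richer and uses different keys):
experiment = {
    'choices': {'model': 'ssd-mobilenet',     # selected or tuned design choices
                'framework': 'tensorflow',
                'batch_size': 1,
                'compiler_flags': '-O3'},
    'deps': {'lib-tensorflow': '1.13.1',      # resolved dependency versions
             'dataset-coco': '2017'},
    'env': {'OMP_NUM_THREADS': 4},            # captured environment variables
    'characteristics': {'latency_ms': 42.0,   # measured behaviour
                        'throughput_fps': 23.8,
                        'mAP': 0.23}
}

print(json.dumps(experiment, indent=2))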

Note that it is also possible to share the entire experimental setup in the CK format inside Docker containers, thus automating all the DSE steps using the unified CK API instead of trying to figure them out from ReadMe files. This method enables CK-powered adaptive containers that help users to start using and customizing research techniques across diverse software and hardware, from servers to mobile devices, in just a few simple steps, while sharing experimental results within workgroups or along with research papers in the CK format, reproducing and comparing experiments, and even automatically reporting unexpected behaviour such as bugs and mispredictions [35].

Eventually, I managed to substitute my original cTuning framework completely with this modular, portable, customizable and reproducible experimental framework while addressing most of the engineering and reproducibility issues exposed by the MILEPOST and cTuning projects [41]. It also helped me to return to my original research on lifelong benchmarking, optimization and co-design of efficient software and hardware for emerging workloads, including machine learning and artificial intelligence.

5.4 Automating MLPerf and enabling portable MLOps

The modularity of my portable CK program workflow helped to enable portable MLOps when combined with AI and ML components, FAIR principles and the DevOps methodology. For example, my CK workflows and components were reused and extended by General Motors and dividiti to collaboratively benchmark and optimize deep learning implementations across diverse devices from servers to the edge [11]. They were also used by Amazon to enable scaling of deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL [49]. Finally, the CK framework made it easier to prepare, submit and reproduce MLPerf inference benchmark results (a "fair and useful benchmark for measuring training and inference performance of ML hardware, software and services") [53, 23]. These results are aggregated in the public optimization repository [20] to help users find and compare different AI/ML/software/hardware stacks in terms of speed, accuracy, power consumption, costs and other collected metrics.

5.5 Enabling reproducible papers with portable workflows and reusable artifacts

Ever since my very first research project, I have wanted to be able to easily find all artifacts (code, data, models) from research papers, reproduce and compare results in just a few clicks, and immediately test research techniques in the real world with different platforms, environments and data. It is for that reason that one of my main goals when designing the CK framework was to use portable CK workflows for these purposes.

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their own requirements and constraints.

We then published all five papers with our unified artifact appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results also exhibited great diversity:

• Latency: 4 to 500 milliseconds per image.

• Throughput: 2 to 465 images per second.

• Top-1 accuracy: 41 to 75 percent.

• Top-5 accuracy: 65 to 93 percent.

• Model size (pre-trained weights): 2 to 130 megabytes.

• Peak power consumption: 2.5 to 180 Watts.

• Device frequency: 100 to 2600 megahertz.

• Device cost: 40 to 1200 dollars.

• Cloud usage cost: 2.6E-6 to 9.5E-6 dollars per inference.

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics, such as power or memory consumption. We also used these CK workflows to crowdsource design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].
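A minimal Python sketch of this adaptation step, assuming the ck package and the relevant CK repositories are installed (the tag names below are illustrative):

import ck.kernel as ck

# Try to detect a TensorFlow library already installed on this platform ...
r = ck.access({'action': 'detect',
               'module_uoa': 'soft',
               'tags': 'lib,tensorflow'})

# ... and fall back to installing a compatible CK meta package if it is missing.
if r['return'] > 0:
    r = ck.access({'action': 'install',
                   'module_uoa': 'package',
                   'tags': 'lib,tensorflow'})
    if r['return'] > 0:
        print(r.get('error', ''))   # unified CK error handling
        exit(1)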

After validating that my portable CK program workflow could support reproducible papers for deep learning systems, I decided to conduct one more test to check if CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography. However, it is not yet known when the first demonstration of quantum advantage will be achieved or what shape it will take.

Figure 6: The CK dashboard with reproduced results from the ACM ASPLOS-ReQuEST'18 tournament to co-design Pareto-efficient image classification in terms of speed, accuracy, energy and other costs. Each paper submission was accompanied by a portable CK workflow to reproduce and compare results in a unified way. One highlighted CK workflow with validated results ran on an AWS c5.18xlarge instance (Intel Xeon Platinum 8124M); from the authors: "The 8-bit optimized model is automatically generated with a calibration process from the FP32 model without the need of fine-tuning or retraining. We show that the inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38X-2.9X and 1.35X-3X respectively with negligible accuracy loss from the IntelCaffe FP32 baseline, and by 56X-75X and 26X-37X from BVLC Caffe" (https://github.com/ctuning/ck-request-asplos18-caffe-intel).

This led me to co-organize several Quantum hackathons, similar to the ReQuEST tournament [26], together with IBM, Rigetti, Riverlane and dividiti. My main goal was to check if we could aggregate and share multidisciplinary knowledge about the state-of-the-art in quantum computing using portable CK workflows that can run on classical hardware and on quantum platforms from IBM, Rigetti and other companies, can be connected to a public dashboard to simplify the reproducibility and comparison of different algorithms across different platforms, and can be extended by the community even after the hackathons.

Figure 7 shows the results from one such Quantum hackathon, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework, so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository, the participants were able to start running quantum experiments with standard CK commands:

ck pull repo --url=https://github.com/ctuning/ck-qiskit
ck run program:qiskit-demo --cmd_key=quantum_coin_flip

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and with ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows.

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make CK onboarding as simple as possible and to help researchers automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.

Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible Quantum hackathons, where over 80 participants worked together to solve a quantum machine learning problem and minimise the time to solution; the dashboard highlights the most efficient design.

5.6 Connecting researchers and practitioners to co-design efficient computational systems

The CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and help them to collaboratively co-design efficient computational systems. Scientists can then work with a higher-level abstraction while allowing engineers to continue improving the lower-level abstractions for continuously evolving software and hardware and to deploy new techniques in production without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers while applying the DevOps methodology and agile principles in scientific research.

6 CK platform

Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets: researchers publish new algorithms and models with reported accuracy, engineers share portable Collective Knowledge solutions to automatically build and rebuild systems using the latest or stable software and hardware, and practitioners attempt to reproduce results with real data while auto-tuning to balance all characteristics (real accuracy, real speed, size, costs) depending on the use case (cloud vs edge). The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors to automatically find the most efficient systems on the Pareto frontier.

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components, assemble workflows and automatically validate them across diverse platforms and environments, or connect them with legacy systems.

• The concept of backward compatibility of CK APIs and the lack of versioning, similar to Java, made it challenging to keep workflows stable and bug-free in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services such as GitHub and GitLab to make it more user-friendly.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI: I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and to assemble portable workflows. It also includes the open-source CK client [16] that provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and the existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (the public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test.

Figure 9: cKnowledge.io: an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components (algorithms, models, datasets, software and hardware descriptions, each with a unified CK JSON API) needed to co-design efficient and self-optimizing computational systems, enable reproducible and "live" papers validated by the community across diverse platforms, models and datasets (cKnowledge.io/reproduced-results), and keep track of state-of-the-art research techniques that can be deployed in production. CK solutions and adaptive containers (cKnowledge.io/solutions, cKnowledge.io/c/docker) can be shared along with research papers to make it easier to reproduce the results and try the algorithms with the latest or different components, following unified steps to initialize, build, run and validate each solution on any tech stack.

This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. The solution is based on a simple JSON file (a hypothetical sketch follows the list below) describing the following tasks and their dependencies on CK components:

• prepare a Python virtual environment (can be skipped for the native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile the object detection program for a given machine and prepare pre/post-processing scripts.
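The exact manifest schema is defined by the CK client and evolves over time; purely as a hypothetical illustration (all keys and task names below are invented for this sketch and differ from the real cbench/CK schema), such a JSON description could be generated from Python as follows:

import json

# Hypothetical, simplified sketch of a solution manifest; the real
# cbench/CK schema differs and contains more fields.
solution = {
    'name': 'demo-obj-detection-coco-tf-cpu',
    'tasks': [
        {'step': 'prepare-python-virtual-env', 'optional': True},
        {'step': 'install-dataset', 'tags': 'coco', 'images': 5000},
        {'step': 'detect-compilers-and-python'},
        {'step': 'install-framework', 'tags': 'tensorflow'},
        {'step': 'install-model', 'tags': 'ssd-mobilenet'},
        {'step': 'install-other-deps'},
        {'step': 'compile-and-prepare-scripts'}
    ]
}

with open('solution.json', 'w') as f:
    json.dump(solution, f, indent=2)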

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

# Install the CK client from PyPI:
pip install cbench

# Download and build the solution on a given machine (example for Linux):
cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

# Run the solution on a given machine:
cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows


The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or against the results shared by other users: cKnowledge.io/demo-result.

After validating this solution on a given platform, the users can also clone it and update its JSON description to retarget the benchmark to other devices and operating systems, such as MacOS, Windows, Android phones and servers with CUDA-enabled GPUs.

The users also have the possibility to integrate such ML solutions with production systems with the help of the unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser: cKnowledge.io/demo-solution.

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows \
  /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux \
  /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker with portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure needed to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing research projects as a database of reusable components, portable workflows and reproducible experiments based on the FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community to quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards to let the community automatically validate and update experimental results even after a project ends, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustable.

However, CK is still a proof of concept, and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions, with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics, can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherignton, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St. John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo: "Collective Knowledge packaging and sharing": https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data: https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly: https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences: https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences: https://cTuning.org/ae

[6] Automation actions with a unified API and JSON I/O to share best practices and introduce DevOps principles to scientific research: https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows: https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware: https://cknowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability): https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project: https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations: https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases: https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit): https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets: https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge: https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers: https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition: https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large-scale visual recognition challenge, where software programs compete to correctly classify and detect objects and scenes: http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions: https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences: https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge: https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators and ML cloud platforms: https://mlperf.org

[23] MLPerf inference benchmark results: https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables: https://paperswithcode.com

[25] Portable and reusable CK workflows: https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results: https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology: https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences: https://cknowledge.io/?q="reproduced-papers" AND "portable-workflow-ck"

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads: https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules: https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms: https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine: https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers: https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes: https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework: http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, 7 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19), IEEE, May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning, 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29–35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad, and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296–327, 2011. 10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges and Stijn Weirdt. EasyBuild: building software with ease. Pages 572–582, 11 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561–1570, 2017.

[49] Gaurav Kaul, Suneel Marthi and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265–276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557–W561, 05 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41(4):39–45, 2018.

Page 5: Grigori Fursin cTuning foundation and cKnowledge SAS

pip i n s t a l l ck

ck pu l l repo minusminusu r l=https github com ctuning ckminuscrowdtuning

ck search datase t minusminustags=jpeg

ck search program cbenchminusautomotiveminuslowast

ck f i nd program cbenchminusautomotiveminussusan

ck load program cbenchminusautomotiveminussusan

ck help program

ck compi le program cbenchminusautomotiveminussusan minusminusspeedck run program cbenchminusautomotiveminussusan minusminusenv OMPNUMTHREADS=4

ck run program minusminushelp

ck cp program cbenchminusautomotiveminussusan l o c a l program newminusprogramminusworkflowck f i nd program newminusprogramminusworkflow

ck benchmark program newminusprogramminusworkflow minusminusr ecord minusminusrecord uoa=myminust e s tck rep lay experiment myminust e s t

The CK program module describes dependencies onsoftware detection plugins and meta packages using

simple tags and version ranges that the community hasto agree on

rdquo compi le r rdquo

rdquonamerdquo rdquoC++ compi le r rdquo rdquo s o r t rdquo 10 rdquo tags rdquo rdquo compiler langminuscpprdquo

rdquo l i b r a r y rdquo

rdquonamerdquo rdquoTensorFlow C++ APIrdquo rdquo s o r t rdquo 20 rdquo ve r s i on f rom rdquo [ 1 1 3 1 ] rdquo v e r s i o n t o rdquo [ 2 0 0 ] rdquo no tags rdquo rdquo tensor f l owminus l i t e rdquo rdquo tags rdquo rdquo l i b t ensor f l ow v s t a t i c rdquo

I also implemented a simple access function in the CKPython API to access all the CK functionality in a very

simple and unified way

import ck k e rne l as ck

Equivalent o f rdquock compi le program cbenchminusautomotiveminussusan minusminusspeed rdquor=ck a c c e s s ( rsquo a c t i on rsquo rsquo compi le rsquo

rsquo module uoa rsquo rsquo program rsquo rsquo data uoa rsquo rsquo cbenchminusautomotiveminussusan rsquo rsquo speed rsquo rsquo yes rsquo )

i f r [ rsquo r e turn rsquo ]gt0 r e turn r un i f i e d e r r o r handl ing

p r i n t ( r )

Equivalent o f rdquock run program cbenchminusautomotiveminussusan minusminusenv OMPNUMTHREADS=4r=ck a c c e s s ( rsquo a c t i on rsquo rsquo run rsquo

rsquo module uoa rsquo rsquo program rsquo rsquo data uoa rsquo rsquo cbenchminusautomotiveminussusan rsquo rsquo env rsquo rsquoOMPNUMTHREADSrsquo 4 )

i f r [ rsquo r e turn rsquo ]gt0 r e turn r un i f i e d e r r o r handl ing

p r i n t ( r )

5

Such an approach allowed me to connect colleaguesstudents researchers and engineers from differentworkgroups to implement share and reuse automationactions and CK components rather than reimplementingthem from scratch Furthermore the CollectiveKnowledge concept supported the so-called FAIRprinciples to make data findable accessible interoperableand reusable [55] while also extending it to code and otherresearch artifacts

4 CK components and workflowsto automate machine learningand systems research

One of the biggest challenges I have faced throughoutmy research career automating the co-design process ofefficient and self-optimizing computing systems has beenhow to deal with rapidly evolving hardware modelsdatasets compilers and research techniques It is forthis reason that my first use case for the CK frameworkwas to work with colleagues to collaboratively solve theseproblems and thus enable trustable reliable and efficientcomputational systems that can be easily deployed andused in the real world

We started using CK as a flexible playground todecompose complex computational systems into reusablecustomizable and non-virtualized CK components whileagreeing on their APIs and meta descriptions As the firststep I implemented basic actions that could automatethe most common RampD tasks that I encountered duringartifact evaluation and in my own research on systemsand machine learning [42] I then shared the automationactions to analyze platforms and user environmentsin a unified way detect already installed code dataand ML models (CK software detection plugins [32])and automatically download install and cross-compilemissing packages (CK meta packages [31]) At the sametime I provided initial support for different compilersoperating systems (Linux Windows MacOS Android)and hardware from different vendors including IntelNvidia Arm Xilinx AMD Qualcomm Apple Samsungand Google

Such an approach allowed my collaborators [12] tocreate share and reuse different CK components withunified APIs to detect install and use different AI andML frameworks libraries tools compilers models anddatasets The unified automation actions APIs andJSON meta descriptions of all CK components also helpedus to connect them into platform-agnostic portableand customizable program pipelines (workflows) witha common interface across all research projects whileapplying the DevOps methodology

Such rdquoplugampplayrdquo workflows [45] can automaticallyadapt to evolving environments models datasets andnon-virtualized platforms by automatically detecting the

properties of a target platform finding all requireddependencies and artifacts (code data and models)on a user platform with the help of CK softwaredetection plugins [32] installing missing dependenciesusing portable CK meta packages [31] building andrunning code and unifying and testing outputs [25]Moreover rather than substituting or competing withexisting tools the CK approach helped to abstract andinterconnect them in a relatively simple and non-intrusiveway

CK also helps to protect user workflows wheneverexternal files or packages are broken disappear or moveto another URL because it is possible to fix such issuesin a shared CK meta package without changing existingworkflows For example our users have already takenadvantage of this functionality when the Eigen librarymoved from BitBucket to GitLab and when the oldImageNet dataset was no longer supported but couldstill be downloaded via BitTorrent and other peer-to-peerservices

Modular CK workflows can help to keep track of theinformation flow within such workflows gradually exposeconfiguration and optimization parameters as vectors viadictionary-based IO and combine public and privatecode and data They also help to monitor modeland auto-tune system behaviour retarget research codeand machine learning models to different platforms fromdata centers to edge devices integrate them with legacysystems and reproduce results

Furthermore we can use CK workflows inside standardcontainers such as Docker while providing a unifiedCK API to customize rebuild and adapt them toany platform (Adaptive CK container) [2] thus makingresearch techniques truly portable reproducible andreusable I envision that such adaptive CK containers andportable workflows can complement existing marketplacesto deliver portable customizable trustable and efficientAI and ML solutions that can be continuously optimizedacross diverse models datasets frameworks librariescompilers and run-time systems

In spite of some initial doubts my collaborative CKapproach has worked well to decompose complex researchprojects and computational systems into reusablecomponents while agreeing on common APIs and metadescriptions the CK functionality evolved from just afew core CK modules and abstractions to hundreds of CKmodules [30] and thousands of reusable CK componentsand workflows [33] to automate the most repetitiveresearch tasks particularly for AI ML and systems RampDas shown in Figure 3

6

2015 2020

Figure 3 Collective Knowledge framework provided a flexible playground for researchers and practitioners todecompose complex computational systems into reusable components while agreeing on APIs and meta descriptionsOver the past 5 years the Collective Knowledge project has grown to thousands of reusable CK components andautomation actions

5 CK use cases

51 Unifying benchmarking auto-tuningand machine learning

The MILEPOST and cTuning projects were like theApollo mission on the one hand we managed todemonstrate that it was indeed possible to crowdsourceauto-tuning and machine learning across multipleusers to automatically co-design efficient software andhardware [41 44] On the other hand we exposed somany issues when using machine learning for systemoptimization in the real world that I had to stop thisresearch and have since focused on solving a large numberof related engineering problems

For this reason my first step was to test the CK conceptby converting all artifacts and automation actions fromall my past research projects related to self-optimizingcomputer systems into reusable CK components andworkflows I shared them with the community inthe CK-compatible Git repositories [7] and startedreproducing experiments from my own or related researchprojects [20]

I then implemented a customizable and portableprogram pipeline as a CK module to unify benchmarkingand auto-tuning while supporting all research techniquesand experiments from my PhD research and theMILEPOST project [10] Such a pipeline could performcompiler auto-tuning and softwarehardware co-designcombined with machine learning in a unified way acrossdifferent programs datasets frameworks compilers MLmodels and platforms as shown in Figure 4

The CK program pipeline helps users to graduallyexpose different design choices and optimizationparameters from all CK components (modelsframeworks compilers run-time systems hardware)via unified CK APIs and meta descriptions and thusenable the whole ML and system auto-tuning It also

helps users to keep track of all information passedbetween components in complex computational systemsto ensure the reproducibility of results while finding themost efficient configuration on a Pareto frontier in termsof speed accuracy energy and other characteristicsalso exposed via unified CK APIs More importantlyit can be now reused and extended in other real-worldprojects [12]

52 Bridging the growing gap betweeneducation research and practice

During the MILEPOST project I noticed how difficult itis to start using research techniques in the real worldNew software hardware datasets and models are usuallyavailable at the end of such research projects makingit very challenging time-consuming and costly to makeresearch software work with the latest systems or legacytechnology

That prompted us to organize a proof-of-conceptproject with the Raspberry Pi foundation to checkif it was possible to use portable CK workflows andcomponents to enable sustainable research software thatcan automatically adapt to rapidly evolving systemsWe also wanted to provide an open-source tool forstudents and researchers to help them share their codedata and models as reusable portable and customizableworkflows and artifacts - something now known as FAIRprinciples [55]

For this purpose we decided to reuse the CKprogram workflow to demonstrate that it was possible tocrowdsource compiler auto-tuning across any RaspberryPi device with any environment and any version ofany compiler (GCC or LLVM) to automatically improvethe performance and code size of the most popularRPi applications CK helped to automate experimentscollect performance numbers on live CK scoreboardsand automatically plug in CK components with various

7

Choose optimization

strategy

Perform SWHW DSE (skeleton transforms

hardware config compiler flags

model topology and params hellip)

Perform stat

analysis

Detect (Pareto)frontier

Model behavior predict

optimizations

Reduce complexity

CK program module with pipeline function

Compile program

Run code

CK module (code data and model wrappers) with a unified IO

CK

inp

ut

(JSO

Nd

ict)

CK

ou

tpu

t (J

SON

dic

t)Unified input

Behavior

Choices

Features

State

Action

Unified output

Behavior

Choices

Features

State

b = B( c f s ) hellip hellip hellip hellip

Model behavior of a computational system as a function of all choices

(MLSWHW) featuresand run-time state

Flatten input into vectorsto unify auto-tuning

statistical analysis and machine learning

Frameworks tools datasets models Generated files

Set environment

for a given version

Automationaction(Python

function)

Figure 4 Portable customizable and reusable program pipeline (workflow) assembled from the CK componentsto unify benchmarking auto-tuning and machine learning It is possible to gradually expose design search spaceoptimizations features and the run-time state from all CK components to make it easier to unify performancemodelling and auto-tuning

machine learning and predictive analytics techniquesincluding decision trees nearest neighbour classifierssupport vector machines (SVM) and deep learning toautomatically learn the most efficient optimizations [45]

With this project we demonstrated that it waspossible to reuse portable CK workflows and let usersparticipate in collaborative auto-tuning (crowd-tuning)on new systems while sharing best optimizations andunexpected behaviour on public CK scoreboards evenafter the project

53 Co-designing efficient software andhardware for AI ML and otheremerging workloads

While helping companies to assemble efficient softwareand hardware for image classification object detectionand other emerging AI and ML workloads I noticed thatit can easily take several months to build an efficient andreliable system before moving it to production

This process is so long and tedious because one has to navigate a multitude of design decisions when selecting components from different vendors for different applications (image classification, object detection, natural language processing (NLP), speech recognition and many others) while trading off speed, latency, accuracy, energy and other costs: what network architecture to deploy and how to customize it (ResNet, MobileNet, GoogleNet, SqueezeNet, SSD, GNMT); what framework to use (PyTorch vs MXNet vs TensorFlow vs TF Lite vs Caffe vs CNTK); what compilers to use (XLA vs nGraph vs Glow vs TVM); and what libraries and which optimizations to employ (ArmNN vs MKL vs OpenBLAS vs cuDNN), which is generally a consequence of the target hardware platform (CPU vs GPU vs DSP vs FPGA vs TPU vs Edge TPU vs numerous accelerators). Worse still, this semi-manual process is usually repeated from scratch for each new version of hardware, models, frameworks, libraries and datasets.

My modular CK program pipeline, shown in Figure 5, helped to automate this process. We extended it slightly to plug in different AI and ML algorithms, datasets, models, frameworks and libraries for different hardware such as CPU, GPU, DSP and TPU, and different target platforms, from servers to Android devices and IoT [12]. We also customized this ML workflow with new CK plugins that performed pre- and post-processing of different models and datasets to make them compatible with different frameworks, backends and hardware, while unifying benchmarking results such as throughput, latency, mAP (mean Average Precision), recall and other characteristics. We also exposed different design and optimization parameters, including model topology, batch sizes, hardware frequency, compiler flags and so on.
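To illustrate the unified dictionary-based I/O, here is a minimal Python sketch of invoking such a workflow; it assumes the standard ck.access entry point of the CK kernel, while the program name and environment key are illustrative rather than taken from a specific repository:

import ck.kernel as ck

# Ask the CK program module to run a workflow; CK first resolves all
# dependencies (framework, model, dataset) for the current platform.
r = ck.access({'action': 'run',
               'module_uoa': 'program',
               'data_uoa': 'object-detection-onnx-py',  # illustrative workflow name
               'env': {'CK_BATCH_SIZE': 2}})            # one exposed optimization choice

# CK convention: 'return' > 0 signals an error described in 'error';
# on success, r is the unified JSON output (a Python dictionary).
if r['return'] > 0:
    raise RuntimeError(r.get('error', 'CK error'))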



Figure 5: The use of the CK framework to automate benchmarking, optimization and co-design of efficient software and hardware for machine learning and artificial intelligence. The goal is to make it easier to reproduce, reuse, adopt and build upon ML and systems research. (The figure shows reusable CK components implementing automation actions (tasks) from research papers as CK modules (Python implementations of given actions) with a unified CK API (CLI, Python, C/C++/Java, REST), JSON input (environment, dependencies, parameters), JSON output (results, characteristics, features) and extensible metadata describing the code, data and dependencies compatible with each action. CK dependencies describe compatible versions of code, data and models for a given algorithm and a given target platform, so CK actions can automatically detect or download and install all the necessary dependencies for a given target. Portable, file-based CK solutions and workflows are driven by unified commands such as: ck pull repo:ck-mlperf; ck detect soft --tags=lib,tensorflow; ck install package --tags=imagenet; ck run program:object-detection-onnx-py; ck autotune program; ck reproduce experiment.)

Eventually, CK allowed us to automate and systematise design space exploration (DSE) of AI/ML/software/hardware stacks and distribute it across diverse platforms and environments [8]. This is possible because CK automatically detects all necessary dependencies on any platform, installs and/or rebuilds the prerequisites, runs experiments, and records all results together with the complete experiment configuration (resolved dependencies and their versions, environment variables, optimization parameters and so on) in a unified JSON format inside CK repositories. CK also ensured the reproducibility of results while making it easier to analyze and visualize them locally using Jupyter notebooks and standard toolsets, or within workgroups using universal CK dashboards, also implemented as CK modules [20].
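As an illustration of such a record, the sketch below shows a hypothetical experiment point following the unified structure from Figure 4 (behaviour, choices, features, state); the field names and values are invented for this example and do not reproduce the exact schema of CK experiment entries:

# A hypothetical CK experiment point: measured results are stored together
# with the complete configuration that produced them (cf. b = B(c, f, s)).
experiment_point = {
    'characteristics': {              # behaviour (b): what was measured
        'execution_time_s': 0.042,
        'accuracy_top1': 0.761,
    },
    'choices': {                      # choices (c): design and optimization decisions
        'compiler_flags': '-O3',
        'batch_size': 1,
    },
    'features': {                     # features (f): platform and workload properties
        'platform_os': 'linux-64',
        'cpu_name': 'Intel Xeon Platinum 8124M',
    },
    'state': {                        # state (s): run-time state of the system
        'cpu_frequency_mhz': 2600,
    },
    'dependencies': {                 # resolved versions for reproducibility
        'lib-tensorflow': '1.14.0',
    },
}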

Note that it is also possible to share the entire experimental setup in the CK format inside Docker containers, thus automating all the DSE steps using the unified CK API instead of trying to figure them out from the ReadMe files. This method enables CK-powered adaptive containers that help users to start using and customizing research techniques across diverse software and hardware, from servers to mobile devices, in just a few simple steps, while sharing experimental results within workgroups or along research papers in the CK format, reproducing and comparing experiments, and even automatically reporting unexpected behaviour such as bugs and mispredictions [35].

Eventually I managed to substitute my original cTuning framework completely with this modular, portable, customizable and reproducible experimental framework, while addressing most of the engineering and reproducibility issues exposed by the MILEPOST and cTuning projects [41]. It also helped me to return to my original research on lifelong benchmarking, optimization and co-design of efficient software and hardware for emerging workloads, including machine learning and artificial intelligence.

5.4 Automating MLPerf and enabling portable MLOps

The modularity of my portable CK program workflow helped to enable portable MLOps when combined with AI and ML components, FAIR principles and the DevOps methodology. For example, my CK workflows and components were reused and extended by General Motors and dividiti to collaboratively benchmark and optimize deep learning implementations across diverse devices from servers to the edge [11]. They were also used by Amazon to enable the scaling of deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL [49]. Finally, the CK framework made it easier to prepare, submit and reproduce MLPerf inference benchmark results ("a fair and useful benchmark for measuring training and inference performance of ML hardware, software, and services") [53, 23]. These results are aggregated in the public optimization repository [20] to help users find and compare different AI/ML/software/hardware stacks in terms of speed, accuracy, power consumption, costs and other collected metrics.

5.5 Enabling reproducible papers with portable workflows and reusable artifacts

Ever since my very first research project I wanted to be able to easily find all artifacts (code, data, models) from research papers, reproduce and compare results in just a few clicks, and immediately test research techniques in the real world with different platforms, environments and data. It is for that reason that one of my main goals when designing the CK framework was to use portable CK workflows for these purposes.

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for the collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their requirements and constraints.
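Since Pareto filtering is central to this methodology, the following self-contained sketch shows the idea on (latency, accuracy) pairs; the numbers are invented, and a real scoreboard would of course use the full set of collected metrics:

def pareto_frontier(points):
    # Keep only non-dominated (latency, accuracy) points: a point is
    # dominated if another one is at least as fast and at least as
    # accurate, and strictly better in one of the two metrics.
    frontier = []
    for i, (lat, acc) in enumerate(points):
        dominated = any(
            l2 <= lat and a2 >= acc and (l2 < lat or a2 > acc)
            for j, (l2, a2) in enumerate(points) if j != i)
        if not dominated:
            frontier.append((lat, acc))
    return frontier

# Example with invented measurements: (0.40, 0.65) is dominated by (0.30, 0.70).
print(pareto_frontier([(0.004, 0.41), (0.30, 0.70), (0.40, 0.65), (0.50, 0.75)]))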

We then published all five papers with our unified Artifact Appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results also exhibited great diversity:

• Latency: 4–500 milliseconds per image.

• Throughput: 2–465 images per second.

• Top-1 accuracy: 41–75 percent.

• Top-5 accuracy: 65–93 percent.

• Model size (pre-trained weights): 2–130 megabytes.

• Peak power consumption: 2.5–180 Watts.

• Device frequency: 100–2600 megahertz.

• Device cost: 40–1200 dollars.

• Cloud usage cost: 2.6E-6 to 9.5E-6 dollars per inference.

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics such as power or memory consumption. We also used these CK workflows to crowdsource the design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].
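A minimal Python sketch of this detect-or-rebuild adaptation step, assuming the standard ck.access kernel API (the tags are illustrative and mirror the unified CLI commands shown in Figure 5):

import ck.kernel as ck

# Try to reuse an already installed TensorFlow library on this machine
# (CLI equivalent: ck detect soft --tags=lib,tensorflow) ...
r = ck.access({'action': 'detect', 'module_uoa': 'soft', 'tags': 'lib,tensorflow'})
if r['return'] > 0:
    # ... and, if it is not found, rebuild it via a CK meta package
    # (CLI equivalent: ck install package --tags=lib,tensorflow).
    r = ck.access({'action': 'install', 'module_uoa': 'package', 'tags': 'lib,tensorflow'})
    if r['return'] > 0:
        raise RuntimeError(r.get('error', 'CK error'))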

After validating that my portable CK program workflow could support reproducible papers for deep learning systems, I decided to conduct one more test to check if CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography. However, it is not yet known when the first demonstration of quantum advantage will be achieved or what shape it will take.


Figure 6: The CK dashboard with reproduced results from the ACM ASPLOS-REQUEST'18 tournament to co-design Pareto-efficient image classification in terms of speed, accuracy, energy and other costs. Each paper submission was accompanied by a portable CK workflow to be able to reproduce and compare results in a unified way. (The entry shown reports a CK workflow with validated results on an AWS c5.18xlarge instance with an Intel Xeon Platinum 8124M. From the authors: "The 8-bit optimized model is automatically generated with a calibration process from the FP32 model without the need of fine-tuning or retraining. We show that the inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38X-2.9X and 1.35X-3X respectively with negligible accuracy loss from the IntelCaffe FP32 baseline, and by 5.6X-7.5X and 2.6X-3.7X from BVLC Caffe." See https://github.com/ctuning/ck-request-asplos18-caffe-intel.)


This led me to co-organize several Quantum hackathons, similar to the REQUEST tournament [26], with IBM, Rigetti, Riverlane and dividiti. My main goal was to check if we could aggregate and share multidisciplinary knowledge about the state-of-the-art in quantum computing using portable CK workflows that can run on classical hardware and quantum platforms from IBM, Rigetti and other companies, can be connected to a public dashboard to simplify the reproducibility and comparison of different algorithms across different platforms, and can be extended by the community even after the hackathons.

Figure 7 shows the results from one such Quantum hackathon, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository,

the participants were able to start running quantum experiments with standard CK commands:

ck pull repo --url=https://github.com/ctuning/ck-qiskit

ck run program:qiskit-demo --cmd_key=quantum_coin_flip

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows.

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make CK onboarding as simple as possible and to help researchers automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.


Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible Quantum hackathons. Over 80 participants worked together to solve a quantum machine learning problem and minimise time to solution. (The dashboard highlights the most efficient design.)


5.6 Connecting researchers and practitioners to co-design efficient computational systems

CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and help them to collaboratively co-design efficient computational systems. Scientists could then work with a higher-level abstraction, while allowing engineers to continue improving the lower-level abstractions for continuously evolving software and hardware and deploying new techniques in production, without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers, while applying the DevOps methodology and agile principles in scientific research.

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components and assemble workflows, automatically validate them across diverse platforms and environments, and connect them with legacy systems.


Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors to automatically find the most efficient systems on the Pareto frontier. (The timeline in the figure shows researchers publishing new algorithms and models with reported accuracy and other results; engineers developing new software and hardware and sharing portable Collective Knowledge solutions and adaptive CK containers that automatically rebuild the system using the latest or stable software and hardware; and practitioners attempting to reproduce results with real data and autotuning the system to balance real accuracy, speed, size and costs depending on the use case (cloud vs edge).)


• The concept of backward compatibility of CK APIs and the lack of versioning similar to Java made it challenging to keep stable and bug-free workflows in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services, such as GitHub and GitLab, to make it more user-friendly.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI: I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and assemble portable workflows. It also includes the open-source CK client [16] that provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test


Figure 9: cKnowledge.io: an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and live papers validated by the community, and keep track of the state-of-the-art research techniques that can be deployed in production. (The figure shows CK solutions and adaptive containers with reusable components, portable workflows, common CK JSON APIs and unified I/O across algorithms (object detection, object classification, speech recognition, training and inference), models (MobileNets, ResNet, VGG, SqueezeDet), software (TensorFlow, Caffe, MXNet, PyTorch/Caffe2), data sets (ImageNet, KITTI, VOC, real data sets) and hardware (CPU, GPU, TPU, NN accelerators). CK helps to initialise, build, run and validate such solutions on any tech stack. Solutions and adaptive containers can be shared along with ML, AI and systems papers (cKnowledge.io/solutions, cKnowledge.io/c/docker) to make it easier to reproduce the results and try the algorithms with the latest or different components, enabling "live" research papers whose results, shared by volunteers alongside the original results from the authors, can be validated by the community at cKnowledge.io/reproduced-results.)


This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. This solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (a hypothetical sketch of such a manifest follows the list):

• prepare a Python virtual environment (can be skipped for the native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre/post-processing scripts.
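Below is a purely hypothetical sketch of such a manifest, written as a Python dictionary; the task and component names are invented for illustration and do not reproduce the actual cbench/cKnowledge.io schema:

# A hypothetical solution manifest: an ordered list of tasks, each
# delegated to a CK component (meta package, software plugin or program).
solution = {
    'name': 'demo-obj-detection-coco-tf-cpu',          # illustrative short name
    'tasks': [
        {'action': 'prepare-venv'},                     # optional Python virtual environment
        {'action': 'install-package', 'tags': 'dataset,coco'},
        {'action': 'detect-soft',     'tags': 'compiler,python'},
        {'action': 'install-package', 'tags': 'lib,tensorflow', 'version': '1.14'},
        {'action': 'install-package', 'tags': 'model,tf,ssd-mobilenet'},
        {'action': 'run-program',     'name': 'object-detection-tf-py'},
    ],
}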

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

Install the CK client from PyPI:

  pip install cbench

Download and build the solution on a given machine (example for Linux):

  cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

Run the solution on a given machine:

  cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows


The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or with the results shared by other users: cKnowledge.io/demo-result

After validating this solution on a given platform, the users can also clone it and update the JSON description to retarget this benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, or servers with CUDA-enabled GPUs, and so on.

The users also have the possibility to integrate such ML solutions with production systems with the help of unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser: cKnowledge.io/demo-solution

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing all research projects as a database of reusable components, portable workflows and reproducible experiments based on FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community to quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards to let the community automatically validate and update experimental results even after a project ends, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustable.

However, CK is still a proof-of-concept and there remains a lot to simplify and improve. Future work will intend to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherington, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St. John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo: "Collective Knowledge packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON IO to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cknowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large-scale visual recognition challenge, where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cknowledge.io/?q=reproduced-papers AND portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform, organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, July 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In The 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19). IEEE, May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning, 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29–35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad, and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296–327, 2011. doi:10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, November 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn De Weirdt. EasyBuild: building software with ease, pages 572–582, November 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561–1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265–276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557–W561, May 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Engineering Bulletin, 41(4):39–45, 2018.

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 6: Grigori Fursin cTuning foundation and cKnowledge SAS

Such an approach allowed me to connect colleaguesstudents researchers and engineers from differentworkgroups to implement share and reuse automationactions and CK components rather than reimplementingthem from scratch Furthermore the CollectiveKnowledge concept supported the so-called FAIRprinciples to make data findable accessible interoperableand reusable [55] while also extending it to code and otherresearch artifacts

4 CK components and workflowsto automate machine learningand systems research

One of the biggest challenges I have faced throughoutmy research career automating the co-design process ofefficient and self-optimizing computing systems has beenhow to deal with rapidly evolving hardware modelsdatasets compilers and research techniques It is forthis reason that my first use case for the CK frameworkwas to work with colleagues to collaboratively solve theseproblems and thus enable trustable reliable and efficientcomputational systems that can be easily deployed andused in the real world

We started using CK as a flexible playground todecompose complex computational systems into reusablecustomizable and non-virtualized CK components whileagreeing on their APIs and meta descriptions As the firststep I implemented basic actions that could automatethe most common RampD tasks that I encountered duringartifact evaluation and in my own research on systemsand machine learning [42] I then shared the automationactions to analyze platforms and user environmentsin a unified way detect already installed code dataand ML models (CK software detection plugins [32])and automatically download install and cross-compilemissing packages (CK meta packages [31]) At the sametime I provided initial support for different compilersoperating systems (Linux Windows MacOS Android)and hardware from different vendors including IntelNvidia Arm Xilinx AMD Qualcomm Apple Samsungand Google

Such an approach allowed my collaborators [12] tocreate share and reuse different CK components withunified APIs to detect install and use different AI andML frameworks libraries tools compilers models anddatasets The unified automation actions APIs andJSON meta descriptions of all CK components also helpedus to connect them into platform-agnostic portableand customizable program pipelines (workflows) witha common interface across all research projects whileapplying the DevOps methodology

Such rdquoplugampplayrdquo workflows [45] can automaticallyadapt to evolving environments models datasets andnon-virtualized platforms by automatically detecting the

properties of a target platform finding all requireddependencies and artifacts (code data and models)on a user platform with the help of CK softwaredetection plugins [32] installing missing dependenciesusing portable CK meta packages [31] building andrunning code and unifying and testing outputs [25]Moreover rather than substituting or competing withexisting tools the CK approach helped to abstract andinterconnect them in a relatively simple and non-intrusiveway

CK also helps to protect user workflows wheneverexternal files or packages are broken disappear or moveto another URL because it is possible to fix such issuesin a shared CK meta package without changing existingworkflows For example our users have already takenadvantage of this functionality when the Eigen librarymoved from BitBucket to GitLab and when the oldImageNet dataset was no longer supported but couldstill be downloaded via BitTorrent and other peer-to-peerservices

Modular CK workflows can help to keep track of theinformation flow within such workflows gradually exposeconfiguration and optimization parameters as vectors viadictionary-based IO and combine public and privatecode and data They also help to monitor modeland auto-tune system behaviour retarget research codeand machine learning models to different platforms fromdata centers to edge devices integrate them with legacysystems and reproduce results

Furthermore we can use CK workflows inside standardcontainers such as Docker while providing a unifiedCK API to customize rebuild and adapt them toany platform (Adaptive CK container) [2] thus makingresearch techniques truly portable reproducible andreusable I envision that such adaptive CK containers andportable workflows can complement existing marketplacesto deliver portable customizable trustable and efficientAI and ML solutions that can be continuously optimizedacross diverse models datasets frameworks librariescompilers and run-time systems

In spite of some initial doubts my collaborative CKapproach has worked well to decompose complex researchprojects and computational systems into reusablecomponents while agreeing on common APIs and metadescriptions the CK functionality evolved from just afew core CK modules and abstractions to hundreds of CKmodules [30] and thousands of reusable CK componentsand workflows [33] to automate the most repetitiveresearch tasks particularly for AI ML and systems RampDas shown in Figure 3

6

2015 2020

Figure 3 Collective Knowledge framework provided a flexible playground for researchers and practitioners todecompose complex computational systems into reusable components while agreeing on APIs and meta descriptionsOver the past 5 years the Collective Knowledge project has grown to thousands of reusable CK components andautomation actions

5 CK use cases

51 Unifying benchmarking auto-tuningand machine learning

The MILEPOST and cTuning projects were like theApollo mission on the one hand we managed todemonstrate that it was indeed possible to crowdsourceauto-tuning and machine learning across multipleusers to automatically co-design efficient software andhardware [41 44] On the other hand we exposed somany issues when using machine learning for systemoptimization in the real world that I had to stop thisresearch and have since focused on solving a large numberof related engineering problems

For this reason my first step was to test the CK conceptby converting all artifacts and automation actions fromall my past research projects related to self-optimizingcomputer systems into reusable CK components andworkflows I shared them with the community inthe CK-compatible Git repositories [7] and startedreproducing experiments from my own or related researchprojects [20]

I then implemented a customizable and portableprogram pipeline as a CK module to unify benchmarkingand auto-tuning while supporting all research techniquesand experiments from my PhD research and theMILEPOST project [10] Such a pipeline could performcompiler auto-tuning and softwarehardware co-designcombined with machine learning in a unified way acrossdifferent programs datasets frameworks compilers MLmodels and platforms as shown in Figure 4

The CK program pipeline helps users to graduallyexpose different design choices and optimizationparameters from all CK components (modelsframeworks compilers run-time systems hardware)via unified CK APIs and meta descriptions and thusenable the whole ML and system auto-tuning It also

helps users to keep track of all information passedbetween components in complex computational systemsto ensure the reproducibility of results while finding themost efficient configuration on a Pareto frontier in termsof speed accuracy energy and other characteristicsalso exposed via unified CK APIs More importantlyit can be now reused and extended in other real-worldprojects [12]

52 Bridging the growing gap betweeneducation research and practice

During the MILEPOST project I noticed how difficult itis to start using research techniques in the real worldNew software hardware datasets and models are usuallyavailable at the end of such research projects makingit very challenging time-consuming and costly to makeresearch software work with the latest systems or legacytechnology

That prompted us to organize a proof-of-conceptproject with the Raspberry Pi foundation to checkif it was possible to use portable CK workflows andcomponents to enable sustainable research software thatcan automatically adapt to rapidly evolving systemsWe also wanted to provide an open-source tool forstudents and researchers to help them share their codedata and models as reusable portable and customizableworkflows and artifacts - something now known as FAIRprinciples [55]

For this purpose we decided to reuse the CKprogram workflow to demonstrate that it was possible tocrowdsource compiler auto-tuning across any RaspberryPi device with any environment and any version ofany compiler (GCC or LLVM) to automatically improvethe performance and code size of the most popularRPi applications CK helped to automate experimentscollect performance numbers on live CK scoreboardsand automatically plug in CK components with various

7

Choose optimization

strategy

Perform SWHW DSE (skeleton transforms

hardware config compiler flags

model topology and params hellip)

Perform stat

analysis

Detect (Pareto)frontier

Model behavior predict

optimizations

Reduce complexity

CK program module with pipeline function

Compile program

Run code

CK module (code data and model wrappers) with a unified IO

CK

inp

ut

(JSO

Nd

ict)

CK

ou

tpu

t (J

SON

dic

t)Unified input

Behavior

Choices

Features

State

Action

Unified output

Behavior

Choices

Features

State

b = B( c f s ) hellip hellip hellip hellip

Model behavior of a computational system as a function of all choices

(MLSWHW) featuresand run-time state

Flatten input into vectorsto unify auto-tuning

statistical analysis and machine learning

Frameworks tools datasets models Generated files

Set environment

for a given version

Automationaction(Python

function)

Figure 4 Portable customizable and reusable program pipeline (workflow) assembled from the CK componentsto unify benchmarking auto-tuning and machine learning It is possible to gradually expose design search spaceoptimizations features and the run-time state from all CK components to make it easier to unify performancemodelling and auto-tuning

machine learning and predictive analytics techniquesincluding decision trees nearest neighbour classifierssupport vector machines (SVM) and deep learning toautomatically learn the most efficient optimizations [45]

With this project we demonstrated that it waspossible to reuse portable CK workflows and let usersparticipate in collaborative auto-tuning (crowd-tuning)on new systems while sharing best optimizations andunexpected behaviour on public CK scoreboards evenafter the project

53 Co-designing efficient software andhardware for AI ML and otheremerging workloads

While helping companies to assemble efficient softwareand hardware for image classification object detectionand other emerging AI and ML workloads I noticed thatit can easily take several months to build an efficient andreliable system before moving it to production

This process is so long and tedious because onehas to navigate a multitude of design decisions whenselecting components from different vendors for differentapplications (image classification object detectionnatural language processing (NLP) speech recognitionand many others) while trading off speed latencyaccuracy energy and other costs what networkarchitecture to deploy and how to customize it (ResNet

MobileNet GoogleNet SqueezeNet SSD GNMT) whatframework to use (PyTorch vs MXNet vs TensorFlowvs TF Lite vs Caffe vs CNTK) what compilers touse (XLA vs nGraph vs Glow vs TVM) what librariesand which optimizations to employ (ArmNN vs MKL vsOpenBLAS vs cuDNN) This is generally a consequenceof the target hardware platform (CPU vs GPU vs DSPvs FPGA vs TPU vs Edge TPU vs numerousaccelerators) Worse still this semi-manual process isusually repeated from scratch for each new version ofhardware models frameworks libraries and datasets

My modular CK program pipeline shown in Figure 5helped to automate this process We extended itslightly to plug in different AI and ML algorithmsdatasets models frameworks and libraries for differenthardware such as CPU GPU DSP and TPU anddifferent target platforms from servers to Android devicesand IoT [12] We also customized this ML workflowwith the new CK plugins that performed pre- andpost-processing of different models and datasets to makethem compatible with different frameworks backendsand hardware while unifying benchmarking results suchas throughput latency mAP (mean Average Precision)recall and other characteristics We also exposed differentdesign and optimization parameters including modeltopology batch sizes hardware frequency compiler flagsand so on

Eventually CK allowed us to automate and

8

Research papers

Automation actions (tasks)

Unified CK APICLI PythonCC++Java REST

hellip

CK module(Python implementation

of a given action)

JSON input

EnvironmentDependencies

Parameters

JSON output

ResultsCharacteristics

Features

JSON describing codedata and dependencies

compatible with this action

Extensible metadata

hellip

Reusable CK components

CK dependencies describe compatible versions of code data and modelsfor a given algorithm and a given target platform CK actions can then automatically

detect or download amp install all the necessary dependencies for a given target

ck pull repock-mlperf

CK component

Unified API and JSON IO

ck detect soft --tags=libtensorflow

ck install package ndashtags=imagenet

ck run programobject-detection-onnx-py

ck autotuneprogram

ck reproduceexperiment hellip

Portablefile-based

CK solutionsampworkflows

Figure 5 The use of the CK framework to automate benchmarking optimization and co-design of efficient softwareand hardware for machine learning and artificial intelligence The goal is to make it easier to reproduce reuse adoptand build upon ML and systems research

systematise design space exploration (DSE) ofAIMLsoftwarehardware stacks and distribute itacross diverse platforms and environments [8] Thisis possible because CK automatically detects allnecessary dependencies on any platform installs andorrebuilds the prerequisites runs experiments and recordsall results together with the complete experimentconfiguration (resolved dependencies and their versionsenvironment variables optimization parameters and soon) in a unified JSON format inside CK repositoriesCK also ensured the reproducibility of results whilemaking it easier to analyze and visualize results locallyusing Jupyter notebooks and standard toolsets orwithin workgroups using universal CK dashboards alsoimplemented as CK modules [20]

Note that it is also possible to share the entireexperimental setup in the CK format inside Dockercontainers thus automating all the DSE steps using theunified CK API instead of trying to figure them outfrom the ReadMe files This method enables CK-poweredadaptive containers that help users to start using andcustomizing research techniques across diverse softwareand hardware from servers to mobile devices in justa few simple steps while sharing experimental resultswithin workgroups or along research papers in the CKformat reproducing and comparing experiments andeven automatically reporting unexpected behaviour suchas bugs and mispredictions [35]

Eventually, I managed to substitute my original cTuning framework completely with this modular, portable, customizable and reproducible experimental framework, while addressing most of the engineering and reproducibility issues exposed by the MILEPOST and cTuning projects [41]. It also helped me to return to my original research on lifelong benchmarking, optimization and co-design of efficient software and hardware for emerging workloads, including machine learning and artificial intelligence.

5.4 Automating MLPerf and enabling portable MLOps

The modularity of my portable CK program workflow helped to enable portable MLOps when combined with AI and ML components, FAIR principles and the DevOps methodology. For example, my CK workflows and components were reused and extended by General Motors and dividiti to collaboratively benchmark and optimize deep learning implementations across diverse devices from servers to the edge [11]. They were also used by Amazon to enable scaling of deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL [49]. Finally, the CK framework made it easier to prepare, submit and reproduce MLPerf inference benchmark results ("a fair and useful benchmark for measuring training and inference performance of ML hardware, software, and services") [53, 23]. These results are aggregated in the public optimization repository [20] to help users find and compare different AI/ML/software/hardware stacks in terms of speed, accuracy, power consumption, costs and other collected metrics.

5.5 Enabling reproducible papers with portable workflows and reusable artifacts

Ever since my very first research project, I wanted to be able to easily find all artifacts (code, data, models) from research papers, reproduce and compare results in just a few clicks, and immediately test research techniques in the real world with different platforms, environments and data. It is for that reason that one of my main goals when designing the CK framework was to use portable CK workflows for these purposes.

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their requirements and constraints.
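The final selection step is conceptually simple. The following minimal Python sketch (written for this article, not taken from the CK code base) shows how a Pareto frontier can be extracted from such results when we minimize latency and price and maximize accuracy:

# Keep only solutions not dominated by any other solution.
# Each result is (latency_s, price_usd, accuracy): lower latency and
# price are better, higher accuracy is better.
def dominates(a, b):
    return (a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]) and a != b

def pareto_frontier(results):
    return [r for r in results
            if not any(dominates(other, r) for other in results)]

results = [(0.004, 1200, 0.75), (0.500, 40, 0.41), (0.050, 300, 0.70)]
print(pareto_frontier(results))  # all three are Pareto-optimal here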

We then published all five papers with our unified artifact appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float)

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure

The reproduced results also exhibited great diversity:

• Latency: 4–500 milliseconds per image

• Throughput: 2–465 images per second

• Top-1 accuracy: 41–75 percent

• Top-5 accuracy: 65–93 percent

• Model size (pre-trained weights): 2–130 megabytes

• Peak power consumption: 2.5–180 Watts

• Device frequency: 100–2600 megahertz

• Device cost: 40–1200 dollars

• Cloud usage cost: 2.6E-6–9.5E-6 dollars per inference

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics, such as power or memory consumption. We also used these CK workflows to crowdsource the design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].
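The adaptation mechanism can also be scripted via the CK Python API. In this hedged sketch, the actions mirror the 'ck detect soft' and 'ck install package' CLI commands from Figure 5, while the exact tags are assumptions that depend on the installed CK repositories:

import ck.kernel as ck

# First try to detect a TensorFlow library already installed on this machine
# (tags are assumed examples).
r = ck.access({'action': 'detect', 'module_uoa': 'soft', 'tags': 'lib,tensorflow'})

# If nothing was detected, rebuild the dependency from a CK meta package.
if r['return'] > 0:
    r = ck.access({'action': 'install', 'module_uoa': 'package', 'tags': 'lib,tensorflow'})
    if r['return'] > 0:
        ck.err(r)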

Figure 6: The CK dashboard with reproduced results from the ACM ASPLOS-ReQuEST'18 tournament to co-design Pareto-efficient image classification in terms of speed, accuracy, energy and other costs. Each paper submission was accompanied by a portable CK workflow to be able to reproduce and compare results in a unified way. (The dashboard entry shown is CK workflow 1 with validated results on an AWS c5.18xlarge instance with an Intel® Xeon® Platinum 8124M. From the authors: "The 8-bit optimized model is automatically generated with a calibration process from FP32 model without the need of fine-tuning or retraining. We show that the inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38X-2.9X and 1.35X-3X respectively with negligible accuracy loss from IntelCaffe FP32 baseline and by 5.6X-7.5X and 2.6X-3.7X from BVLC Caffe." See https://github.com/ctuning/ck-request-asplos18-caffe-intel.)

After validating that my portable CK program workflow can support reproducible papers for deep learning systems, I decided to conduct one more test to check if CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography. However, it is not yet known when the first demonstration of quantum advantage will be achieved or what shape it will take.

This led me to co-organize several Quantum hackathons, similar to the ReQuEST tournament [26], with IBM, Rigetti, Riverlane and dividiti. My main goal was to check if we could aggregate and share multidisciplinary knowledge about the state-of-the-art in quantum computing using portable CK workflows that can run on classical hardware and quantum platforms from IBM, Rigetti and other companies, can be connected to a public dashboard to simplify reproducibility and comparison of different algorithms across different platforms, and can be extended by the community even after hackathons.

Figure 7 shows the results from one of such Quantum hackathons, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework, so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository, the participants were able to start running quantum experiments with a standard CK command:

ck pull repo --url=https://github.com/ctuning/ck-qiskit
ck run program:qiskit-demo --cmd_key=quantum_coin_flip
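Because the CK CLI is a thin wrapper around the unified JSON API, the same two steps can also be scripted from Python. This is only a sketch, and the 'url' and 'cmd_key' keys are my assumptions about how the CLI flags map to the API:

import ck.kernel as ck

# Pull the repository with the QISKit workflows (mirrors 'ck pull repo --url=...').
r = ck.access({'action': 'pull', 'module_uoa': 'repo',
               'url': 'https://github.com/ctuning/ck-qiskit'})
if r['return'] > 0: ck.err(r)

# Run the demo program with the same command key as in the CLI example above.
r = ck.access({'action': 'run', 'module_uoa': 'program', 'data_uoa': 'qiskit-demo',
               'cmd_key': 'quantum_coin_flip', 'out': 'con'})
if r['return'] > 0: ck.err(r)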

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows.
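The job-manager abstraction follows the same pattern as other CK plugins: one action with unified I/O dispatches to a concrete backend. The sketch below is my own illustration of the idea rather than the actual CK module, and the backend command lines are assumptions:

import subprocess

# Unified 'submit' action dispatching to different HPC job managers
# (illustrative only; real job scripts and flags vary per site).
def submit_job(job_manager, script_path):
    cmd = {'slurm': ['sbatch', script_path],        # assumed Slurm invocation
           'flux':  ['flux', 'batch', script_path]} # assumed Flux invocation
    return subprocess.run(cmd[job_manager], capture_output=True, text=True)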

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make the CK onboarding as simple as possible and help researchers to automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.

Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible Quantum hackathons. Over 80 participants worked together to solve a quantum machine learning problem and minimise time to solution. (The dashboard highlights the most efficient design.)

5.6 Connecting researchers and practitioners to co-design efficient computational systems

CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and help them to collaboratively co-design efficient computational systems. Scientists could then work with a higher-level abstraction while allowing engineers to continue improving the lower-level abstractions for continuously evolving software and hardware and to deploy new techniques in production without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers, while applying the DevOps methodology and agile principles in scientific research.

Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors to automatically find the most efficient systems on the Pareto frontier. (The figure shows, along a timeline, researchers using CK to publish new algorithms and models with reported accuracy and other results; engineers using CK and adaptive CK containers to develop new software and design new hardware; and practitioners using CK to attempt to reproduce results with real data, share portable Collective Knowledge solutions that automatically build a system using the latest or stable software and hardware, autotune it to balance all characteristics depending on the use case (cloud vs edge), measure real accuracy, speed, size and costs, and automatically rebuild the system at each increment.)

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components and assemble workflows, automatically validate them across diverse platforms and environments, and connect them with legacy systems.

• The concept of backward compatibility of CK APIs and the lack of versioning, similar to Java, made it challenging to keep stable and bug-free workflows in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git, a powerful but quite complex CLI-based tool that requires extra web services such as GitHub and GitLab to make it more user-friendly.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI. I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and assemble portable workflows. It also includes the open-source CK client [16] that provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare the state-of-the-art research results during artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

Figure 9: cKnowledge.io, an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and live papers validated by the community, and keep track of the state-of-the-art research techniques that can be deployed in production. (The figure shows algorithms (object detection, object classification, speech recognition, training/inference), models (MobileNets, ResNet, VGG, SqueezeDet), software (TensorFlow, Caffe, MXNet, PyTorch/Caffe2), data sets (ImageNet, KITTI, VOC, real data sets) and hardware (CPU, GPU, TPU, NN accelerators), each exposed via a unified CK JSON API. CK solutions and adaptive containers with reusable components, portable workflows, common APIs and unified I/O can be initialized, built, run and validated on any tech stack, shared along with ML/AI and systems papers (cKnowledge.io/solutions, cKnowledge.io/c/docker), and connected to "live" research papers whose results, shared by volunteers, are validated against the original results from the authors (cKnowledge.io/reproduced-results).)

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test.

This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. This solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (a sketch of such a description follows the list):

• prepare a Python virtual environment (can be skipped for the native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre/post-processing scripts.
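A heavily abridged sketch of what such a JSON description might look like is shown below; the field names and tags are illustrative assumptions rather than the exact cbench schema:

{
  "name": "demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows",
  "tasks": [
    { "action": "install package", "tags": "dataset,coco,val" },
    { "action": "detect soft", "tags": "compiler,python" },
    { "action": "install package", "tags": "lib,tensorflow,vcpu" },
    { "action": "install package", "tags": "model,tf,ssd-mobilenet" },
    { "action": "compile program", "data_uoa": "object-detection-tf-py" }
  ]
}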

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

Install the CK client from PyPI:
pip install cbench

Download and build the solution on a given machine (example for Linux):
cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

Run the solution on a given machine:
cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows


The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or with the results shared by other users: cKnowledge.io/demo-result.

After validating this solution on a given platform, the users can also clone it and update the JSON description to retarget this benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, servers with CUDA-enabled GPUs, and so on.

The users also have the possibility to integrate such ML solutions with production systems with the help of unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser: cKnowledge.io/demo-solution.

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing all research projects as a database of reusable components, portable workflows and reproducible experiments based on FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community to quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards to let the community automatically validate and update experimental results even after the project ends, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustable.

However, CK is still a proof-of-concept, and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherington, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo: "Collective Knowledge packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON I/O to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cKnowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large-scale visual recognition challenge, where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cKnowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cKnowledge.io/?q=reproduced-papers AND portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cKnowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, July 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In The 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19). IEEE, May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning, 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29–35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad, and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296–327, 2011. doi:10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn De Weirdt. EasyBuild: building software with ease. pages 572–582, 11 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561–1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow, and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265–276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557–W561, 05 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Engineering Bulletin, 41(4):39–45, 2018.


Page 7: Grigori Fursin cTuning foundation and cKnowledge SAS

2015 2020

Figure 3 Collective Knowledge framework provided a flexible playground for researchers and practitioners todecompose complex computational systems into reusable components while agreeing on APIs and meta descriptionsOver the past 5 years the Collective Knowledge project has grown to thousands of reusable CK components andautomation actions

5 CK use cases

51 Unifying benchmarking auto-tuningand machine learning

The MILEPOST and cTuning projects were like theApollo mission on the one hand we managed todemonstrate that it was indeed possible to crowdsourceauto-tuning and machine learning across multipleusers to automatically co-design efficient software andhardware [41 44] On the other hand we exposed somany issues when using machine learning for systemoptimization in the real world that I had to stop thisresearch and have since focused on solving a large numberof related engineering problems

For this reason my first step was to test the CK conceptby converting all artifacts and automation actions fromall my past research projects related to self-optimizingcomputer systems into reusable CK components andworkflows I shared them with the community inthe CK-compatible Git repositories [7] and startedreproducing experiments from my own or related researchprojects [20]

I then implemented a customizable and portableprogram pipeline as a CK module to unify benchmarkingand auto-tuning while supporting all research techniquesand experiments from my PhD research and theMILEPOST project [10] Such a pipeline could performcompiler auto-tuning and softwarehardware co-designcombined with machine learning in a unified way acrossdifferent programs datasets frameworks compilers MLmodels and platforms as shown in Figure 4

The CK program pipeline helps users to graduallyexpose different design choices and optimizationparameters from all CK components (modelsframeworks compilers run-time systems hardware)via unified CK APIs and meta descriptions and thusenable the whole ML and system auto-tuning It also

helps users to keep track of all information passedbetween components in complex computational systemsto ensure the reproducibility of results while finding themost efficient configuration on a Pareto frontier in termsof speed accuracy energy and other characteristicsalso exposed via unified CK APIs More importantlyit can be now reused and extended in other real-worldprojects [12]

52 Bridging the growing gap betweeneducation research and practice

During the MILEPOST project I noticed how difficult itis to start using research techniques in the real worldNew software hardware datasets and models are usuallyavailable at the end of such research projects makingit very challenging time-consuming and costly to makeresearch software work with the latest systems or legacytechnology

That prompted us to organize a proof-of-conceptproject with the Raspberry Pi foundation to checkif it was possible to use portable CK workflows andcomponents to enable sustainable research software thatcan automatically adapt to rapidly evolving systemsWe also wanted to provide an open-source tool forstudents and researchers to help them share their codedata and models as reusable portable and customizableworkflows and artifacts - something now known as FAIRprinciples [55]

For this purpose we decided to reuse the CKprogram workflow to demonstrate that it was possible tocrowdsource compiler auto-tuning across any RaspberryPi device with any environment and any version ofany compiler (GCC or LLVM) to automatically improvethe performance and code size of the most popularRPi applications CK helped to automate experimentscollect performance numbers on live CK scoreboardsand automatically plug in CK components with various

7

Choose optimization

strategy

Perform SWHW DSE (skeleton transforms

hardware config compiler flags

model topology and params hellip)

Perform stat

analysis

Detect (Pareto)frontier

Model behavior predict

optimizations

Reduce complexity

CK program module with pipeline function

Compile program

Run code

CK module (code data and model wrappers) with a unified IO

CK

inp

ut

(JSO

Nd

ict)

CK

ou

tpu

t (J

SON

dic

t)Unified input

Behavior

Choices

Features

State

Action

Unified output

Behavior

Choices

Features

State

b = B( c f s ) hellip hellip hellip hellip

Model behavior of a computational system as a function of all choices

(MLSWHW) featuresand run-time state

Flatten input into vectorsto unify auto-tuning

statistical analysis and machine learning

Frameworks tools datasets models Generated files

Set environment

for a given version

Automationaction(Python

function)

Figure 4 Portable customizable and reusable program pipeline (workflow) assembled from the CK componentsto unify benchmarking auto-tuning and machine learning It is possible to gradually expose design search spaceoptimizations features and the run-time state from all CK components to make it easier to unify performancemodelling and auto-tuning

machine learning and predictive analytics techniquesincluding decision trees nearest neighbour classifierssupport vector machines (SVM) and deep learning toautomatically learn the most efficient optimizations [45]

With this project we demonstrated that it waspossible to reuse portable CK workflows and let usersparticipate in collaborative auto-tuning (crowd-tuning)on new systems while sharing best optimizations andunexpected behaviour on public CK scoreboards evenafter the project

53 Co-designing efficient software andhardware for AI ML and otheremerging workloads

While helping companies to assemble efficient softwareand hardware for image classification object detectionand other emerging AI and ML workloads I noticed thatit can easily take several months to build an efficient andreliable system before moving it to production

This process is so long and tedious because onehas to navigate a multitude of design decisions whenselecting components from different vendors for differentapplications (image classification object detectionnatural language processing (NLP) speech recognitionand many others) while trading off speed latencyaccuracy energy and other costs what networkarchitecture to deploy and how to customize it (ResNet

MobileNet GoogleNet SqueezeNet SSD GNMT) whatframework to use (PyTorch vs MXNet vs TensorFlowvs TF Lite vs Caffe vs CNTK) what compilers touse (XLA vs nGraph vs Glow vs TVM) what librariesand which optimizations to employ (ArmNN vs MKL vsOpenBLAS vs cuDNN) This is generally a consequenceof the target hardware platform (CPU vs GPU vs DSPvs FPGA vs TPU vs Edge TPU vs numerousaccelerators) Worse still this semi-manual process isusually repeated from scratch for each new version ofhardware models frameworks libraries and datasets

My modular CK program pipeline shown in Figure 5helped to automate this process We extended itslightly to plug in different AI and ML algorithmsdatasets models frameworks and libraries for differenthardware such as CPU GPU DSP and TPU anddifferent target platforms from servers to Android devicesand IoT [12] We also customized this ML workflowwith the new CK plugins that performed pre- andpost-processing of different models and datasets to makethem compatible with different frameworks backendsand hardware while unifying benchmarking results suchas throughput latency mAP (mean Average Precision)recall and other characteristics We also exposed differentdesign and optimization parameters including modeltopology batch sizes hardware frequency compiler flagsand so on

Eventually CK allowed us to automate and

8

Research papers

Automation actions (tasks)

Unified CK APICLI PythonCC++Java REST

hellip

CK module(Python implementation

of a given action)

JSON input

EnvironmentDependencies

Parameters

JSON output

ResultsCharacteristics

Features

JSON describing codedata and dependencies

compatible with this action

Extensible metadata

hellip

Reusable CK components

CK dependencies describe compatible versions of code data and modelsfor a given algorithm and a given target platform CK actions can then automatically

detect or download amp install all the necessary dependencies for a given target

ck pull repock-mlperf

CK component

Unified API and JSON IO

ck detect soft --tags=libtensorflow

ck install package ndashtags=imagenet

ck run programobject-detection-onnx-py

ck autotuneprogram

ck reproduceexperiment hellip

Portablefile-based

CK solutionsampworkflows

Figure 5 The use of the CK framework to automate benchmarking optimization and co-design of efficient softwareand hardware for machine learning and artificial intelligence The goal is to make it easier to reproduce reuse adoptand build upon ML and systems research

systematise design space exploration (DSE) ofAIMLsoftwarehardware stacks and distribute itacross diverse platforms and environments [8] Thisis possible because CK automatically detects allnecessary dependencies on any platform installs andorrebuilds the prerequisites runs experiments and recordsall results together with the complete experimentconfiguration (resolved dependencies and their versionsenvironment variables optimization parameters and soon) in a unified JSON format inside CK repositoriesCK also ensured the reproducibility of results whilemaking it easier to analyze and visualize results locallyusing Jupyter notebooks and standard toolsets orwithin workgroups using universal CK dashboards alsoimplemented as CK modules [20]

Note that it is also possible to share the entireexperimental setup in the CK format inside Dockercontainers thus automating all the DSE steps using theunified CK API instead of trying to figure them outfrom the ReadMe files This method enables CK-poweredadaptive containers that help users to start using andcustomizing research techniques across diverse softwareand hardware from servers to mobile devices in justa few simple steps while sharing experimental resultswithin workgroups or along research papers in the CKformat reproducing and comparing experiments andeven automatically reporting unexpected behaviour suchas bugs and mispredictions [35]

Eventually I managed to substitute my originalcTuning framework completely with the modular

portable customizable and reproducible experimentalframework while addressing most of the engineering andreproducibility issues exposed by the MILEPOST andcTuning projects [41] It also helped me to return to myoriginal research on lifelong benchmarking optimizationand co-design of efficient software and hardware foremerging workloads including machine learning andartificial intelligence

54 Automating MLPerf and enablingportable MLOps

The modularity of my portable CK program workflowhelped to enable portable MLOps when combined withAI and ML components FAIR principles and theDevOps methodology For example my CK workflowsand components were reused and extended by GeneralMotors and dividiti to collaboratively benchmark andoptimize deep learning implementations across diversedevices from servers to the edge [11] They were alsoused by Amazon to enable scaling of deep learningon AWS using C5 instances with MXNet TensorFlowand BigDL [49] Finally the CK framework madeit easier to prepare submit and reproduce MLPerfinference benchmark results (rdquofair and useful benchmarkfor measuring training and inference performance ofML hardware software and servicesrdquo) [53 23] Theseresults are aggregated in the public optimizationrepository [20] to help users find and compare differentAIMLsoftwarehardware stacks in terms of speed

9

accuracy power consumption costs and other collectedmetrics

55 Enabling reproducible papers withportable workflows and reusableartifacts

Ever since my very first research project I wanted to beable to easily find all artifacts (code data models) fromresearch papers reproduce and compare results in just afew clicks and immediately test research techniques inthe real world with different platforms environments anddata It is for that reason that one of my main goalswhen designing the CK framework was to use portableCK workflows for these purposes

I had the opportunity to test my CK approachwhen co-organizing the first reproducible tournamentat the ACM ASPLOSrsquo18 conference (the InternationalConference on Architectural Support for ProgrammingLanguages and Operating Systems) The tournamentrequired participants to co-design Pareto-efficient systemsfor deep learning in terms of speed accuracy energycosts and other metrics [29] We wanted to extendexisting optimization competitions tournaments andhackathons including Kaggle [19] ImageNet [18] theLow-Power Image Recognition Challenge (LPIRC) [21]DAWNBench (an end-to-end deep learning benchmarkand competition) [17] and MLPerf [53 22] with acustomizable experimental framework for collaborativeand reproducible optimization of Pareto-efficient softwareand hardware stack for deep learning and other emergingworkloads

This tournament helped to validate my CK approachfor reproducible papers The community submittedfive complete implementations (code data scripts etc)for the popular ImageNet object classification challengeWe then collaborated with the authors to convert theirartifacts into the CK format evaluate the convertedartifacts on the original or similar platforms andreproduce the results based on the rigorous artifactevaluation methodology [5] The evaluation metricsincluded accuracy on the ImageNet validation set (50000images) latency (seconds per image) throughput (imagesper second) platform price (dollars) and peak powerconsumption (Watts) Since collapsing all metricsinto one to select a single winner often results inover-engineered solutions we decided to aggregate allreproduced results on a universal CK scoreboard shownin Figure 6 and let users select multiple implementationsfrom a Pareto frontier based on their requirements andconstraints

We then published all five papers with our unifiedartifact appendix [4] and a set of ACM reproducibilitybadges in the ACM Digital Library [38] accompaniedby adaptive CK containers (CK-powered Docker)and portable CK workflows covering a very diversemodelsoftwarehardware stack

bull Models MobileNets ResNet-18 ResNet-50Inception-v3 VGG16 AlexNet SSD

bull Data types 8-bit integer 16-bit floating-point(half) 32-bit floating-point (float)

bull AI frameworks and libraries MXNetTensorFlow Caffe Keras Arm Compute LibrarycuDNN TVM NNVM

bull Platforms Xilinx Pynq-Z1 FPGA Arm CortexCPUs and Arm Mali GPGPUs (Linaro HiKey960and T-Firefly RK3399) a farm of Raspberry Pidevices NVIDIA Jetson TX1 and TX2 and IntelXeon servers in Amazon Web Services Google Cloudand Microsoft Azure

The reproduced results also exhibited great diversity

bull Latency 4 500 milliseconds per image

bull Throughput 2 465 images per second

bull Top 1 accuracy 41 75 percent

bull Top 5 accuracy 65 93 percent

bull Model size (pre-trained weights) 2 130megabytes

bull Peak power consumption 25 180 Watts

bull Device frequency 100 2600 megahertz

bull Device cost 40 1200 dollars

bull Cloud usage cost 26E-6 95E-6 dollars perinference

The community can now access all the aboveCK workflows under permissive licenses and continuecollaborating on them via dedicated GitHub projects withCK repositories These workflows can be automaticallyadapted to new platforms and environments by eitherdetecting already installed dependencies (frameworkslibraries datasets) or rebuilding dependencies using CKmeta packages supporting Linux Windows MacOS andAndroid They can be also extended to expose newdesign and optimization choices such as quantizationas well as evaluation metrics such as power or memoryconsumption We also used these CK workflows tocrowdsource the design space exploration across devicesprovided by volunteers such as mobile phones laptopsand servers with the best solutions aggregated on liveCK scoreboards [20]

After validating that my portable CK programworkflow can support reproducible papers for deeplearning systems I decided to conduct one more testto check if CK could also support quantum computingRampD Quantum computers have the potential to solvecertain problems dramatically faster than conventionalcomputers with applications in areas such as machine

10

CK workflow1 with validated results

AWS with c518xlarge instance Intelreg Xeonreg Platinum 8124M

From the authors ldquoThe 8-bit optimized model is automatically

generated with a calibration process from FP32 model without the

need of fine-tuning or retraining We show that the inference

throughput and latency with ResNet-50 Inception-v3 and SSD are

improved by 138X-29X and 135X-3X respectively with negligible

accuracy loss from IntelCaffe FP32 baseline and by 56X-75X and

26X-37X from BVLC Cafferdquo

httpsgithubcomctuningck-request-asplos18-caffe-intel

Figure 6 The CK dashboard with reproduced results from the ACM ASPLOS-REQUESTrsquo18 tournament to co-designPareto-efficient image classification in terms of speed accuracy energy and other costs Each paper submission wasaccompanied by the portable CK workflow to be able to reproduce and compare results in a unified way

learning drug discovery materials optimization financeand cryptography However it is not yet known whenthe first demonstration of quantum advantage will beachieved or what shape it will take

This led me to co-organize several Quantum hackathonssimilar to the REQUEST tournament [26] with IBMRigetti Riverlane and dividiti My main goal was tocheck if we could aggregate and share multidisciplinaryknowledge about the state-of-the-art in quantumcomputing using portable CK workflows that can runon classical hardware and quantum platforms from IBMRigetti and other companies can be connected to a publicdashboard to simplify reproducibility and comparison ofdifferent algorithms across different platforms and can beextended by the community even after hackathons

Figure 7 shows the results from one of suchQuantum hackathons where over 80 participants fromundergraduate and graduate students to startup foundersand experienced professionals from IBM and CERNworked together to solve a quantum machine learningproblem designed by Riverlane All participants weregiven some labelled quantum data and asked to developalgorithms for solving a classification problem

We also taught participants how to perform theseexperiments in a collaborative reproducible andautomated way using the CK framework so that theresults could be transfered to industry For examplewe introduced the CK repository with workflows andcomponents for the Quantum Information Science Kit(QISKit) - an open source software development kit(SDK) for working IBM Q quantum processors [13]Using the CK program workflow from this repository

the participants were able to start running quantumexperiments with a standard CK command

ck pull repo --url=https://github.com/ctuning/ck-qiskit
ck run program:qiskit-demo --cmd_key=quantum_coin_flip
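Since every CK automation action is also exposed through a unified Python API, the same experiment can be scripted. Below is a minimal sketch assuming the standard ck.access convention of the CK kernel (a JSON-compatible dictionary in, a dictionary out, with a non-zero 'return' value signalling an error); the exact keys mirror the CLI flags above:

import ck.kernel as ck

# Equivalent of: ck run program:qiskit-demo --cmd_key=quantum_coin_flip
r = ck.access({'action': 'run',
               'module_uoa': 'program',
               'data_uoa': 'qiskit-demo',
               'cmd_key': 'quantum_coin_flip'})
if r['return'] > 0:
    ck.err(r)  # standard CK convention: print the unified error and exit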

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and with ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows, as sketched below.
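The sketch below illustrates the idea of such an abstraction: one CK-style action receives a unified JSON dictionary and dispatches to whichever job manager was detected on the cluster. The function, dictionary keys and Flux invocation are hypothetical illustrations, not the actual CK module:

import subprocess

def submit_job(i):
    # 'i' follows the CK convention: a JSON-compatible dictionary in and out.
    manager = i['job_manager']        # e.g. detected by a CK software plugin
    nodes = str(i.get('nodes', 1))
    script = i['script']
    if manager == 'slurm':
        cmd = ['sbatch', '--nodes=' + nodes, script]
    elif manager == 'flux':
        # Assumption: Flux batch submission syntax of that period.
        cmd = ['flux', 'mini', 'batch', '-N', nodes, script]
    else:
        return {'return': 1, 'error': 'unsupported job manager: ' + manager}
    out = subprocess.run(cmd, capture_output=True, text=True)
    return {'return': out.returncode, 'stdout': out.stdout, 'stderr': out.stderr}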

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make the CK onboarding as simple as possible and to help researchers automatically convert their ad hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.


Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible Quantum Hackathons, where over 80 participants worked together to solve a quantum machine learning problem and minimise time to solution; the most efficient design is highlighted.


5.6 Connecting researchers and practitioners to co-design efficient computational systems

CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and to help them collaboratively co-design efficient computational systems. Scientists could then work with a higher-level abstraction, while engineers continued improving the lower-level abstractions for continuously evolving software and hardware and deploying new techniques in production, without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers while applying the DevOps methodology and agile principles in scientific research.

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it was not easy to discuss and test APIs, add new components and assemble workflows, automatically validate them across diverse platforms and environments, or connect them with legacy systems.


Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors to automatically find the most efficient systems on the Pareto frontier. (The figure depicts a timeline in which researchers publish new algorithms and models, engineers develop new software and design new hardware, and practitioners use shared Collective Knowledge solutions and adaptive CK containers to automatically rebuild the system, attempt to reproduce reported results with standard datasets and real data, and auto-tune real accuracy, speed, size and costs depending on the use case, e.g. cloud vs edge.)


• The concept of backward compatibility of CK APIs and the lack of versioning (similar to Java) made it challenging to keep workflows stable and bug-free: in the real world, any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services such as GitHub and GitLab to make it more user-friendly.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI: I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and to assemble portable workflows. It also includes the open-source CK client [16], which provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and the existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test


Figure 9: cKnowledge.io, an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and live papers validated by the community, and keep track of the state-of-the-art research techniques that can be deployed in production. (The figure shows CK solutions and adaptive containers exposing algorithms, models, software, datasets and hardware through unified CK JSON APIs and an initialize-build-run-validate flow; such solutions can be shared along with ML/AI and systems papers, with the original results from the authors and the results shared by volunteers aggregated at cKnowledge.io/reproduced-results, cKnowledge.io/solutions and cKnowledge.io/c/docker.)


This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. The solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (a schematic manifest is sketched after this list):

• prepare a Python virtual environment (can be skipped for a native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect the C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow version;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre- and post-processing scripts.
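A schematic manifest for such a solution might look as follows. The exact schema is defined by the CK client, so the field names below are illustrative assumptions only:

{
  "name": "demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows",
  "tags": ["object-detection", "tensorflow", "coco"],
  "tasks": [
    {"action": "detect",  "tags": "compiler,python"},
    {"action": "install", "tags": "dataset,coco,val", "images": 5000},
    {"action": "install", "tags": "lib,tensorflow", "version": "1.14"},
    {"action": "install", "tags": "model,tf,ssd-mobilenet"},
    {"action": "compile", "program": "object-detection-tf-py"}
  ]
}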

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

# Install the CK client from PyPI:
pip install cbench

# Download and build the solution on a given machine (example for Linux):
cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

# Run the solution on a given machine:
cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows


The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or against the results shared by other users: cKnowledge.io/demo-result

After validating this solution on a given platform, the users can also clone it and update its JSON description to retarget the benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, or servers with CUDA-enabled GPUs.
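For instance, retargeting the cloned solution from a Linux host to an Android device could amount to changing a couple of fields in its JSON description. CK provides operating system entries such as linux-64 and android23-arm64; the keys below are again only indicative:

{
  "host_os": "linux-64",
  "target_os": "android23-arm64"
}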

The users can also integrate such ML solutions with production systems with the help of the unified CK APIs, as demonstrated by connecting the above CK solution for object detection to a webcam in any browser: cKnowledge.io/demo-solution

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows \
  /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux \
  /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing all research projects as a database of reusable components, portable workflows and reproducible experiments based on FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards to let the community automatically validate and update experimental results even after the project ends, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustworthy.

However, CK is still a proof of concept, and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications, similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions, with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics, can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.



Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherignton, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo "Collective Knowledge packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON I/O to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cknowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large scale visual recognition challenge where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cknowledge.io/?q=reproduced-papersANDportable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, 7 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In The 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19). IEEE, May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning. 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29–35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296–327, 2011. doi:10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn Weirdt. EasyBuild: building software with ease. pages 572–582, 11 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561–1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265–276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557–W561, 05 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41:39–45, 2018.



References

[1] ACM Pilot demo rdquoCollective Knowledge packagingand sharingrdquo httpsyoutubeDIkZxraTmGM

[2] Adaptive CK containers with a unified CK API tocustomize rebuild and adapt them to any platformand environment as well as to integrate them withexternal tools and data httpscKnowledgeio

cdocker

[3] Amazon SageMaker fully managed service to buildtrain and deploy machine learning models quicklyhttpsawsamazoncomsagemaker

[4] Artifact Appendix and reproducibility checklist tounify and automate the validation of results atsystems and machine learning conferences https

cTuningorgaechecklisthtml

[5] Artifact Evaluation reproducing results frompublished papers at machine learning and systemsconferences httpscTuningorgae

[6] Automation actions with a unified API andJSON IO to share best practices and introduceDevOps principles to scientific research https

cKnowledgeioactions

[7] CK-compatible GitHub GitLab and BitBucketrepositories with reusable components automationactions and workflows httpscKnowledgeio

repos

[8] CK dashboard with results from CK-poweredautomated and multi-objective design spaceexploration of image classification across differentmodels software and hardware https

cknowledgeioml-optimization-repository

[9] CK GitHub an open-source and cross-platformframework to organize software projects as adatabase of reusable components with commonautomation actions unified APIs and extensiblemeta descriptions based on FAIR principles(findability accessibility interoperability andreusability) httpsgithubcomctuningck

[10] CK repository with CK workflows andcomponents to reproduce the MILEPOSTproject httpsgithubcomctuning

reproduce-milepost-project

[11] CK use case from General Motors CollaborativelyBenchmarking and Optimizing Deep LearningImplementations httpsyoutube

1ldgVZ64hEI

[12] Collective Knowledge real-world use cases https

cKnowledgeorgpartners

[13] Collective Knowledge Repository for the QuantumInformation Software Kit (QISKit) https

githubcomctuningck-qiskit

[14] Collective Knowledge Repository to automatethe installation execution customization andvalidation of the SeisSol application from theSC18 Student Cluster Competition ReproducibilityChallenge across different platforms environmentsand datasets httpsgithubcomctuning

ck-scc18wiki

[15] Collective Knowledge workflow to prepare digitalartifacts for the Student Cluster CompetitionReproducibility Challenge httpsgithubcom

ctuningck-scc

[16] Cross-platform CK client to unify preparationexecution and validation of research techniquesshared along with research papers https

githubcomctuningcbench

16

[17] DAWNBench an end-to-end deep learningbenchmark and competition httpsdawn

csstanfordedubenchmark

[18] Imagenet challenge (ILSVRC) Imagenet large scalevisual recognition challenge where software programscompete to correctly classify and detect objects andscenes httpwwwimage-netorg

[19] Kaggle platform for predictive modelling andanalytics competitions httpswwwkagglecom

[20] Live scoreboards with reproduced results fromresearch papers at machine learning andsystems conferences httpscKnowledgeio

reproduced-results

[21] LPIRC low-power image recognition challengehttpsrebootingcomputingieeeorglpirc

[22] MLPerf a broad ML benchmark suite for measuringperformance of ML software frameworks MLhardware accelerators and ML cloud platformshttpsmlperforg

[23] MLPerf Inference Benchmark results https

mlperforginference-results

[24] PapersWithCodecom provides a free and openresource with Machine Learning papers code andevaluation tables httpspaperswithcodecom

[25] Portable and reusable CK workflows https

cKnowledgeioprograms

[26] Quantum Collective Knowledge and reproduciblequantum hackathons keeping track of thestate-of-the-art in quantum computing usingportable CK workflows live CK dashboards andreproducible results httpscknowledgeioc

eventreproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based onthe cTuning evaluation methodology https

cKnowledgeioreproduced-papers

[28] Reproduced papers with CK workflows duringartifact evaluation at ML and Systemsconferences httpscknowledgeioq=

reproduced-papersANDportable-workflow-ck

[29] ReQuEST open tournaments on collaborativereproducible and Pareto-efficient softwarehardwareco-design of deep learning and other emergingworkloads httpscknowledgeiocevent

request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules https

cKnowledgeiomodules

[31] Shared CK meta packages (code data models)with automated installation across diverse platformshttpscKnowledgeiopackages

[32] Shared CK plugins to detect software (modelsframeworks data sets scripts) on a user machinehttpscKnowledgeiosoft

[33] The Collective Knowledge platform organizingknowledge about artificial intelligence machinelearning computational systems and otherinnovative technology in the form of portableworkflows automation actions and reusableartifacts from reproduced papers https

cKnowledgeio

[34] The Machine Learning Toolkit for Kuberneteshttpswwwkubefloworg

[35] The public database of image misclassificationsduring collaborative benchmarking of MLmodels across Android devices using the CKframework httpcknowledgeorgrepo

webphpwcid=modelimageclassification

collective_training_set

[36] Peter Amstutz Michael R Crusoe NebojsaTijanic Brad Chapman John Chilton MichaelHeuer Andrey Kartashov Dan Leehr HerveMenager Maya Nedeljkovich Matt Scales StianSoiland-Reyes and Luka Stojanovic CommonWorkflow Language v10 httpsdoiorg10

6084m9figshare3115156v2 7 2016

[37] T Ben-Nun M Besta S Huber A NZiogas D Peter and T Hoefler A ModularBenchmarking Infrastructure for High-Performanceand Reproducible Deep Learning IEEE May 2019The 33rd IEEE International Parallel amp DistributedProcessing Symposium (IPDPSrsquo19)

[38] Luis Ceze Ceze Natalie Enright Jerger BabakFalsafi Grigori Fursin Anton Lokhmotov ThierryMoreau Adrian Sampson and Phillip StanleyMarbell ACM ReQuESTrsquo18 Proceedings ofthe 1st on Reproducible Quality-Efficient SystemsTournament on Co-Designing Pareto-Efficient DeepLearning 2018

[39] Cheng Li and Abdul Dakkak and Jinjun Xiongand Wei Wei and Lingjie Xu and Wen-meiHwu Across-Stack Profiling and Characterizationof Machine Learning Models on GPUs https

arxivorgabs190806869 2019

[40] Bruce R Childers Grigori Fursin ShriramKrishnamurthi and Andreas Zeller ArtifactEvaluation for Publications (Dagstuhl PerspectivesWorkshop 15452) Dagstuhl Reports 5(11)29ndash352016

[41] Grigori Fursin Collective Tuning Initiativeautomating and accelerating development andoptimization of computing systems httpshal

inriafrinria-00436029v2 June 2009

17

[42] Grigori Fursin Enabling reproducible ML andSystems research the good the bad and the uglyhttpsdoiorg105281zenodo4005773 2020

[43] Grigori Fursin and Christophe DubachCommunity-driven reviewing and validation ofpublications In Proceedings of the 1st ACMSIGPLAN Workshop on Reproducible ResearchMethodologies and New Publication Models inComputer Engineering TRUST rsquo14 New York NYUSA 2014 Association for Computing Machinery

[44] Grigori Fursin Yuriy Kashnikov Abdul WahidMemon Zbigniew Chamski Olivier Temam MirceaNamolaru Elad Yom-Tov Bilha MendelsonAyal Zaks Eric Courtois Francois Bodin PhilBarnard Elton Ashton Edwin Bonilla JohnThomson Christopher Williams and MichaelF P OrsquoBoyle Milepost gcc Machine learningenabled self-tuning compiler InternationalJournal of Parallel Programming 39296ndash3272011 101007s10766-010-0161-2

[45] Grigori Fursin Anton Lokhmotov DmitrySavenko and Eben Upton A CollectiveKnowledge workflow for collaborative researchinto multi-objective autotuning and machinelearning techniques httpscknowledgeio

reportrpi3-crowd-tuning-2017-interactiveJanuary 2018

[46] T Gamblin M LeGendre M R Collette G LLee A Moody B R de Supinski and S FutralThe spack package manager bringing order tohpc software chaos In SC rsquo15 Proceedings ofthe International Conference for High PerformanceComputing Networking Storage and Analysis pages1ndash12 Nov 2015

[47] Kenneth Hoste Jens Timmerman Andy Georgesand Stijn Weirdt Easybuild Building software withease pages 572ndash582 11 2012

[48] I Jimenez M Sevilla N Watkins C MaltzahnJ Lofstead K Mohror A Arpaci-Dusseau andR Arpaci-Dusseau The popper convention Makingreproducible systems evaluation practical In2017 IEEE International Parallel and DistributedProcessing Symposium Workshops (IPDPSW) pages1561ndash1570 2017

[49] Gaurav Kaul Suneel Marthi and Grigori FursinScaling deep learning on AWS using C5 instanceswith MXNet TensorFlow and BigDL From theedge to the cloud httpsconferencesoreillycomartificial-intelligenceai-eu-2018

publicscheduledetail71549 2018

[50] Dirk Merkel Docker Lightweight linux containersfor consistent development and deployment LinuxJ 2014(239) March 2014

[51] Todd Mytkowicz Amer Diwan Matthias Hauswirthand Peter F Sweeney Producing wrong datawithout doing anything obviously wrong InProceedings of the 14th International Conference onArchitectural Support for Programming Languagesand Operating Systems pages 265ndash276 2009

[52] QuantumBlack Kedro the open source library forproduction-ready machine learning code https

githubcomquantumblacklabskedro 2019

[53] Vijay Janapa Reddi Christine Cheng David KanterPeter Mattson Guenther Schmuelling Carole-JeanWu Brian Anderson Maximilien Breughe MarkCharlebois William Chou Ramesh Chukka CodyColeman Sam Davis Pan Deng Greg DiamosJared Duke Dave Fick J Scott Gardner ItayHubara Sachin Idgunji Thomas B Jablin JeffJiao Tom St John Pankaj Kanwar David LeeJeffery Liao Anton Lokhmotov Francisco MassaPeng Meng Paulius Micikevicius Colin OsborneGennady Pekhimenko Arun Tejusve RaghunathRajan Dilip Sequeira Ashish Sirasao Fei SunHanlin Tang Michael Thomson Frank Wei EphremWu Lingjie Xu Koichi Yamada Bing Yu GeorgeYuan Aaron Zhong Peizhao Zhang and YuchenZhou MLPerf Inference Benchmark https

arxivorgabs191102549 2019

[54] Carsten Uphoff Sebastian Rettenberger MichaelBader Elizabeth H Madden Thomas UlrichStephanie Wollherr and Alice-Agnes GabrielExtreme scale multi-physics simulations of thetsunamigenic 2004 sumatra megathrust earthquakeIn Proceedings of the International Conference forHigh Performance Computing Networking Storageand Analysis SC rsquo17 New York NY USA 2017Association for Computing Machinery

[55] Mark D Wilkinson Michel Dumontier IJsbrand JanAalbersberg Gabrielle Appleton Myles AxtonArie Baak Niklas Blomberg Jan-Willem BoitenLuiz Bonino da Silva Santos Philip E Bourneet al The fair guiding principles for scientific datamanagement and stewardship Scientific data 32016

[56] Katherine Wolstencroft Robert Haines DonalFellows Alan Williams David Withers StuartOwen Stian Soiland-Reyes Ian Dunlop AleksandraNenadic Paul Fisher Jiten Bhagat KhalidBelhajjame Finn Bacall Alex Hardisty AbrahamNieva de la Hidalga Maria P Balcazar VargasShoaib Sufi and Carole Goble The Tavernaworkflow suite designing and executing workflowsof Web Services on the desktop web or in thecloud Nucleic Acids Research 41(W1)W557ndashW56105 2013

18

[57] Matei Zaharia Andrew Chen Aaron Davidson AliGhodsi Sue Ann Hong Andy Konwinski SiddharthMurching Tomas Nykodym Paul Ogilvie ManiParkhe Fen Xie and Corey Zumar Acceleratingthe Machine Learning Lifecycle with MLflow IEEEData Eng Bull 4139ndash45 2018

19

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 9: Grigori Fursin cTuning foundation and cKnowledge SAS

Figure 5: The use of the CK framework to automate benchmarking, optimization and co-design of efficient software and hardware for machine learning and artificial intelligence. The goal is to make it easier to reproduce, reuse, adopt and build upon ML and systems research. (The figure shows research papers decomposed into automation actions with a unified CK API/CLI (Python, C/C++, Java, REST); CK modules (Python implementations of given actions) with JSON input (environment, dependencies, parameters), JSON output (results, characteristics, features) and extensible metadata describing compatible code and data; and reusable CK components assembled into portable, file-based CK solutions and workflows. CK dependencies describe compatible versions of code, data and models for a given algorithm and target platform, so CK actions can automatically detect or download and install all the necessary dependencies for a given target. Example commands: ck pull repo:ck-mlperf; ck detect soft --tags=libtensorflow; ck install package --tags=imagenet; ck run program:object-detection-onnx-py; ck autotune program; ck reproduce experiment.)

The CK framework also helped to systematise design space exploration (DSE) of AI/ML/software/hardware stacks and distribute it across diverse platforms and environments [8]. This is possible because CK automatically detects all necessary dependencies on any platform, installs and/or rebuilds the prerequisites, runs experiments, and records all results together with the complete experiment configuration (resolved dependencies and their versions, environment variables, optimization parameters and so on) in a unified JSON format inside CK repositories. CK also ensured the reproducibility of results while making it easier to analyze and visualize them locally using Jupyter notebooks and standard toolsets, or within workgroups using universal CK dashboards, also implemented as CK modules [20].
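To make this concrete, here is a minimal sketch in Python of the kind of experiment record such a workflow could store; the field names are illustrative assumptions, not the exact CK schema:

import json

# Hypothetical experiment record (illustrative field names, not the exact CK schema).
experiment = {
    "resolved_dependencies": {
        "lib-tensorflow": {"version": "1.14.0"},
        "dataset-imagenet-val": {"version": "2012"},
    },
    "environment": {"OMP_NUM_THREADS": "4"},
    "optimization_parameters": {"batch_size": 1},
    "characteristics": {"latency_ms": 42.7, "top1_accuracy": 0.71},
}

# Storing the record as unified JSON lets dashboards and Jupyter notebooks
# aggregate and compare results later.
with open("experiment-0001.json", "w") as f:
    json.dump(experiment, f, indent=2)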

Note that it is also possible to share the entire experimental setup in the CK format inside Docker containers, thus automating all the DSE steps using the unified CK API instead of trying to figure them out from the README files. This method enables CK-powered adaptive containers that help users start using and customizing research techniques across diverse software and hardware, from servers to mobile devices, in just a few simple steps, while sharing experimental results within workgroups or along with research papers in the CK format, reproducing and comparing experiments, and even automatically reporting unexpected behaviour such as bugs and mispredictions [35].

Eventually, I managed to substitute my original cTuning framework completely with this modular, portable, customizable and reproducible experimental framework, while addressing most of the engineering and reproducibility issues exposed by the MILEPOST and cTuning projects [41]. It also helped me to return to my original research on lifelong benchmarking, optimization and co-design of efficient software and hardware for emerging workloads, including machine learning and artificial intelligence.

5.4 Automating MLPerf and enabling portable MLOps

The modularity of my portable CK program workflow helped to enable portable MLOps when combined with AI and ML components, FAIR principles and the DevOps methodology. For example, my CK workflows and components were reused and extended by General Motors and dividiti to collaboratively benchmark and optimize deep learning implementations across diverse devices, from servers to the edge [11]. They were also used by Amazon to enable scaling of deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL [49]. Finally, the CK framework made it easier to prepare, submit and reproduce MLPerf inference benchmark results ("a fair and useful benchmark for measuring training and inference performance of ML hardware, software, and services") [53, 23]. These results are aggregated in the public optimization repository [20] to help users find and compare different AI/ML/software/hardware stacks in terms of speed, accuracy, power consumption, costs and other collected metrics.

5.5 Enabling reproducible papers with portable workflows and reusable artifacts

Ever since my very first research project, I wanted to be able to easily find all artifacts (code, data, models) from research papers, reproduce and compare results in just a few clicks, and immediately test research techniques in the real world with different platforms, environments and data. It is for that reason that one of my main goals when designing the CK framework was to use portable CK workflows for these purposes.

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for the collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their requirements and constraints.
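As a minimal illustration of this selection step, the following Python sketch keeps only the results on the Pareto frontier over two objectives (lower latency, higher accuracy); the record fields and numbers are hypothetical:

def pareto_frontier(results):
    # A result is dominated if another result is at least as good on both
    # objectives and strictly better on at least one of them.
    frontier = []
    for r in results:
        dominated = any(
            o["latency"] <= r["latency"] and o["accuracy"] >= r["accuracy"]
            and (o["latency"] < r["latency"] or o["accuracy"] > r["accuracy"])
            for o in results
        )
        if not dominated:
            frontier.append(r)
    return frontier

# Illustrative records only (not actual tournament numbers).
results = [
    {"name": "fpga-ssd",      "latency": 0.004, "accuracy": 0.41},
    {"name": "rpi-mobilenet", "latency": 0.500, "accuracy": 0.65},
    {"name": "xeon-resnet50", "latency": 0.050, "accuracy": 0.75},
]
print(pareto_frontier(results))  # keeps fpga-ssd and xeon-resnet50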

We then published all five papers with our unified Artifact Appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results also exhibited great diversity:

• Latency: 4 .. 500 milliseconds per image.

• Throughput: 2 .. 465 images per second.

• Top-1 accuracy: 41 .. 75 percent.

• Top-5 accuracy: 65 .. 93 percent.

• Model size (pre-trained weights): 2 .. 130 megabytes.

• Peak power consumption: 2.5 .. 180 Watts.

• Device frequency: 100 .. 2600 megahertz.

• Device cost: 40 .. 1200 dollars.

• Cloud usage cost: 2.6E-6 .. 9.5E-6 dollars per inference.

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics, such as power or memory consumption. We also used these CK workflows to crowdsource design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].
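As a rough sketch of this detect-or-rebuild behaviour, the following Python fragment uses CK's single ck.access entry point, which takes a dictionary mirroring the CLI; the exact action keys and tags below are assumptions based on the documented commands ck detect soft and ck install package:

import ck.kernel as ck

# Try to detect an already installed TensorFlow library on this machine
# (roughly equivalent to: ck detect soft --tags=libtensorflow).
r = ck.access({'action': 'detect', 'module_uoa': 'soft', 'tags': 'libtensorflow'})

# If detection fails, fall back to rebuilding the dependency from a CK meta
# package (roughly equivalent to: ck install package --tags=libtensorflow).
if r['return'] > 0:
    r = ck.access({'action': 'install', 'module_uoa': 'package', 'tags': 'libtensorflow'})
    if r['return'] > 0:
        # CK calls return a dictionary with a non-zero 'return' code and an
        # 'error' string on failure.
        raise RuntimeError(r.get('error', 'unknown CK error'))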

Figure 6: The CK dashboard with reproduced results from the ACM ASPLOS-ReQuEST'18 tournament to co-design Pareto-efficient image classification in terms of speed, accuracy, energy and other costs. Each paper submission was accompanied by a portable CK workflow to reproduce and compare the results in a unified way. (The highlighted CK workflow with validated results ran on an AWS c5.18xlarge instance with Intel® Xeon® Platinum 8124M processors. From the authors: "The 8-bit optimized model is automatically generated with a calibration process from the FP32 model, without the need of fine-tuning or retraining. We show that the inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38X-2.9X and 1.35X-3X respectively, with negligible accuracy loss from the IntelCaffe FP32 baseline, and by 56X-75X and 26X-37X from BVLC Caffe." https://github.com/ctuning/ck-request-asplos18-caffe-intel)

After validating that my portable CK program workflow could support reproducible papers for deep learning systems, I decided to conduct one more test to check whether CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography. However, it is not yet known when the first demonstration of quantum advantage will be achieved, or what shape it will take.

This led me to co-organize several quantum hackathons, similar to the ReQuEST tournament [26], with IBM, Rigetti, Riverlane and dividiti. My main goal was to check whether we could aggregate and share multidisciplinary knowledge about the state-of-the-art in quantum computing using portable CK workflows that can run on classical hardware and on quantum platforms from IBM, Rigetti and other companies, can be connected to a public dashboard to simplify the reproducibility and comparison of different algorithms across different platforms, and can be extended by the community even after the hackathons.

Figure 7 shows the results from one such quantum hackathon, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework, so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository, the participants were able to start running quantum experiments with standard CK commands:

ck pull repo --url=https://github.com/ctuning/ck-qiskit
ck run program:qiskit-demo --cmd_key=quantum_coin_flip

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and with ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows.

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make CK onboarding as simple as possible and to help researchers automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.

Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible quantum hackathons. Over 80 participants worked together to solve a quantum machine learning problem and minimise the time to solution. (The figure highlights the most efficient design.)

5.6 Connecting researchers and practitioners to co-design efficient computational systems

CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and help them collaboratively co-design efficient computational systems. Scientists can then work with a higher-level abstraction, while allowing engineers to continue improving the lower-level abstractions for continuously evolving software and hardware and to deploy new techniques in production without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers while applying the DevOps methodology and agile principles in scientific research.

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components and assemble workflows, automatically validate them across diverse platforms and environments, and connect them with legacy systems.

• The concept of backward compatibility of CK APIs and the lack of versioning, similar to Java, made it challenging to keep stable and bug-free workflows in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services such as GitHub and GitLab to make it more user-friendly.

Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors in order to automatically find the most efficient systems on the Pareto frontier. (The figure sketches a timeline in which researchers publish new algorithms and models with reported accuracy, engineers design new hardware and share portable Collective Knowledge solutions that automatically rebuild a system using the latest or stable software and hardware, and practitioners attempt to reproduce results, feed in real data, and autotune the system to balance real accuracy, speed, size and costs depending on the use case, such as cloud versus edge.)

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI: I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and to assemble portable workflows. It also includes the open-source CK client [16], which provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].
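As a small usage sketch, the documented cb commands can also be driven from Python via subprocess; the solution name below is the object-detection demo used in Section 7:

import subprocess

# Initialise (download and build) and then benchmark a shared CK solution
# using the cbench CLI (pip install cbench).
solution = "demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"
subprocess.run(["cb", "init", solution], check=True)
subprocess.run(["cb", "benchmark", solution], check=True)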

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com, and the existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (the public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test

Figure 9: cKnowledge.io: an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and "live" papers validated by the community, and keep track of state-of-the-art research techniques that can be deployed in production. (The figure shows algorithms, models, software, datasets and hardware exposed through unified CK JSON APIs; CK solutions and adaptive containers with reusable components, portable workflows, common APIs and unified I/O that can be initialised, built, run and validated on any tech stack; and results shared by volunteers next to the original results from the authors: cKnowledge.io/solutions, cKnowledge.io/c/docker, cKnowledge.io/reproduced-results.)

This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. This solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (a hypothetical sketch of such a manifest follows the list):

• prepare a Python virtual environment (can be skipped for the native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre/post-processing scripts.
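For illustration, the following Python fragment sketches what such a JSON description might contain; the schema (task names, tags and fields) is a hypothetical simplification, not the exact cbench format:

import json

# Hypothetical solution manifest (illustrative schema only).
manifest = {
    "name": "demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows",
    "tasks": [
        {"task": "prepare-python-venv", "optional": True},
        {"task": "install-dataset", "tags": "dataset,coco", "images": 5000},
        {"task": "detect-compilers-or-interpreters"},
        {"task": "install-framework", "tags": "lib,tensorflow", "version": "1.15.2"},
        {"task": "install-model", "tags": "model,ssd-mobilenet"},
        {"task": "resolve-remaining-dependencies"},
        {"task": "compile-and-prepare-scripts"},
    ],
}

with open("cb-solution.json", "w") as f:
    json.dump(manifest, f, indent=2)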

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

# Install the CK client from PyPI
pip install cbench

# Download and build the solution on a given machine (example for Linux)
cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

# Run the solution on a given machine
cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or against the results shared by other users: cKnowledge.io/demo-result

After validating this solution on a given platform, the users can also clone it and update its JSON description to retarget this benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, servers with CUDA-enabled GPUs, and so on.

The users also have the possibility to integrate such ML solutions with production systems with the help of unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser: cKnowledge.io/demo-solution

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows \
  /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux \
  /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build from scratch all the infrastructure needed to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I have developed the Collective Knowledge framework and how it can help to address the issues above by organizing all research projects as a database of reusable components, portable workflows and reproducible experiments based on FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I have developed to organize knowledge, particularly about AI, ML and computational systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community to find useful methods from research papers, quickly build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards that let the community automatically validate and update experimental results even after the project is completed, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustable.

However, CK is still a proof of concept, and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on the development of a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions, with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics, can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherington, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St. John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev, and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo: "Collective Knowledge: packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, and to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON I/O to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cknowledge.io/ml-optimization-repository

[9] CK on GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: Collaboratively Benchmarking and Optimizing Deep Learning Implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): the ImageNet large-scale visual recognition challenge, where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: the low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring the performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cknowledge.io/?q=reproduced-papers+AND+portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, 7 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. IEEE, May 2019. The 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19).

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning. 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-Stack Profiling and Characterization of Machine Learning Models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact Evaluation for Publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29–35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad, and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296–327, 2011. doi:10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn Weirdt. EasyBuild: building software with ease. pages 572–582, 11 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561–1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow, and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265–276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf Inference Benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557–W561, 05 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the Machine Learning Lifecycle with MLflow. IEEE Data Eng. Bull., 41:39–45, 2018.

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 10: Grigori Fursin cTuning foundation and cKnowledge SAS

accuracy power consumption costs and other collectedmetrics

55 Enabling reproducible papers withportable workflows and reusableartifacts

Ever since my very first research project I wanted to beable to easily find all artifacts (code data models) fromresearch papers reproduce and compare results in just afew clicks and immediately test research techniques inthe real world with different platforms environments anddata It is for that reason that one of my main goalswhen designing the CK framework was to use portableCK workflows for these purposes

I had the opportunity to test my CK approach when co-organizing the first reproducible tournament at the ACM ASPLOS'18 conference (the International Conference on Architectural Support for Programming Languages and Operating Systems). The tournament required participants to co-design Pareto-efficient systems for deep learning in terms of speed, accuracy, energy, costs and other metrics [29]. We wanted to extend existing optimization competitions, tournaments and hackathons, including Kaggle [19], ImageNet [18], the Low-Power Image Recognition Challenge (LPIRC) [21], DAWNBench (an end-to-end deep learning benchmark and competition) [17] and MLPerf [53, 22], with a customizable experimental framework for the collaborative and reproducible optimization of Pareto-efficient software and hardware stacks for deep learning and other emerging workloads.

This tournament helped to validate my CK approach for reproducible papers. The community submitted five complete implementations (code, data, scripts, etc.) for the popular ImageNet object classification challenge. We then collaborated with the authors to convert their artifacts into the CK format, evaluate the converted artifacts on the original or similar platforms, and reproduce the results based on the rigorous artifact evaluation methodology [5]. The evaluation metrics included accuracy on the ImageNet validation set (50,000 images), latency (seconds per image), throughput (images per second), platform price (dollars) and peak power consumption (Watts). Since collapsing all metrics into one to select a single winner often results in over-engineered solutions, we decided to aggregate all reproduced results on a universal CK scoreboard, shown in Figure 6, and let users select multiple implementations from a Pareto frontier based on their requirements and constraints.
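To make the selection criterion concrete: a submission stays on the Pareto frontier if no other submission is at least as good on every metric and strictly better on at least one. The following minimal Python sketch (illustrative only, not code from the CK framework) shows such a filter, assuming every metric has been transformed so that lower is better (accuracy, for example, is negated):

    # Keep only the non-dominated points: a point is dominated if another
    # point is no worse on every metric (lower is better for all metrics).
    def pareto_frontier(points):
        frontier = []
        for p in points:
            dominated = any(
                q != p and all(q[i] <= p[i] for i in range(len(p)))
                for q in points
            )
            if not dominated:
                frontier.append(p)
        return frontier

    # Example points: (latency in s/image, peak power in Watts, -top-1 accuracy)
    results = [(0.004, 180, -0.75), (0.5, 25, -0.41),
               (0.05, 90, -0.70), (0.06, 95, -0.60)]
    print(pareto_frontier(results))  # the last point is dominated and dropped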

We then published all five papers with our unified artifact appendix [4] and a set of ACM reproducibility badges in the ACM Digital Library [38], accompanied by adaptive CK containers (CK-powered Docker) and portable CK workflows covering a very diverse model/software/hardware stack:

• Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, AlexNet, SSD.

• Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).

• AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, NNVM.

• Platforms: Xilinx Pynq-Z1 FPGA; Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399); a farm of Raspberry Pi devices; NVIDIA Jetson TX1 and TX2; and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results also exhibited great diversity:

• Latency: 4 to 500 milliseconds per image.

• Throughput: 2 to 465 images per second.

• Top-1 accuracy: 41 to 75 percent.

• Top-5 accuracy: 65 to 93 percent.

• Model size (pre-trained weights): 2 to 130 megabytes.

• Peak power consumption: 2.5 to 180 Watts.

• Device frequency: 100 to 2600 megahertz.

• Device cost: 40 to 1200 dollars.

• Cloud usage cost: 2.6E-6 to 9.5E-6 dollars per inference.

The community can now access all the above CK workflows under permissive licenses and continue collaborating on them via dedicated GitHub projects with CK repositories. These workflows can be automatically adapted to new platforms and environments by either detecting already installed dependencies (frameworks, libraries, datasets) or rebuilding dependencies using CK meta packages supporting Linux, Windows, MacOS and Android. They can also be extended to expose new design and optimization choices, such as quantization, as well as evaluation metrics, such as power or memory consumption. We also used these CK workflows to crowdsource the design space exploration across devices provided by volunteers, such as mobile phones, laptops and servers, with the best solutions aggregated on live CK scoreboards [20].
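For illustration, the following sketch shows how such adaptation could be scripted with the CK Python API; the exact tags are illustrative and depend on which CK repositories and software plugins are installed on the user's machine:

    import ck.kernel as ck  # CK Python API from github.com/ctuning/ck

    # Try to detect a dependency (here TensorFlow) already installed on
    # the host using CK software detection plugins:
    r = ck.access({'action': 'detect', 'module_uoa': 'soft',
                   'tags': 'lib,tensorflow'})
    if r['return'] > 0:
        # Nothing detected: rebuild the dependency from a CK meta package,
        # which downloads, builds and registers it for later reuse:
        r = ck.access({'action': 'install', 'module_uoa': 'package',
                       'tags': 'lib,tensorflow'})
        if r['return'] > 0:
            ck.err(r)  # print the unified CK error message and exit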

After validating that my portable CK program workflow could support reproducible papers for deep learning systems, I decided to run one more test to check whether CK could also support quantum computing R&D. Quantum computers have the potential to solve certain problems dramatically faster than conventional computers, with applications in areas such as machine learning, drug discovery, materials optimization, finance and cryptography. However, it is not yet known when the first demonstration of quantum advantage will be achieved or what shape it will take.

[Figure 6 content: CK workflow #1 with validated results on an AWS c5.18xlarge instance (Intel Xeon Platinum 8124M). From the authors: "The 8-bit optimized model is automatically generated with a calibration process from the FP32 model without the need of fine-tuning or retraining. We show that the inference throughput and latency with ResNet-50, Inception-v3 and SSD are improved by 1.38x-2.9x and 1.35x-3x respectively with negligible accuracy loss from the IntelCaffe FP32 baseline, and by 5.6x-7.5x and 2.6x-3.7x from BVLC Caffe." See https://github.com/ctuning/ck-request-asplos18-caffe-intel]

Figure 6: The CK dashboard with reproduced results from the ACM ASPLOS ReQuEST'18 tournament to co-design Pareto-efficient image classification in terms of speed, accuracy, energy and other costs. Each paper submission was accompanied by a portable CK workflow to reproduce and compare results in a unified way.

This led me to co-organize several quantum hackathons, similar to the ReQuEST tournament [26], together with IBM, Rigetti, Riverlane and dividiti. My main goal was to check whether we could aggregate and share multidisciplinary knowledge about the state-of-the-art in quantum computing using portable CK workflows that can run on classical hardware and on quantum platforms from IBM, Rigetti and other companies, can be connected to a public dashboard to simplify the reproducibility and comparison of different algorithms across different platforms, and can be extended by the community even after the hackathons.

Figure 7 shows the results from one of such quantum hackathons, where over 80 participants, from undergraduate and graduate students to startup founders and experienced professionals from IBM and CERN, worked together to solve a quantum machine learning problem designed by Riverlane. All participants were given some labelled quantum data and asked to develop algorithms for solving a classification problem.

We also taught participants how to perform these experiments in a collaborative, reproducible and automated way using the CK framework, so that the results could be transferred to industry. For example, we introduced the CK repository with workflows and components for the Quantum Information Science Kit (QISKit), an open-source software development kit (SDK) for working with IBM Q quantum processors [13]. Using the CK program workflow from this repository, the participants were able to start running quantum experiments with a standard CK command:

    ck pull repo --url=https://github.com/ctuning/ck-qiskit
    ck run program:qiskit-demo --cmd_key=quantum_coin_flip

Whenever ready, the participants could submit their solutions to the public CK dashboards to let other users validate and reuse their results [20, 26].

Following the successful validation of portable CK workflows for reproducible papers, I continued collaborating with ACM [1] and with ML and systems conferences to automate the tedious artifact evaluation process [5, 42]. For example, we developed several CK workflows to support the Student Cluster Competition Reproducibility Challenge (SCC) at the Supercomputing conference [15]. We demonstrated that it was possible to reuse the CK program workflow to automate the installation, execution, customization and validation of the SeisSol application (Extreme Scale Multi-Physics Simulations of the Tsunamigenic 2004 Sumatra Megathrust Earthquake) [54] from the SC18 Student Cluster Competition Reproducibility Challenge across several supercomputers and HPC clusters [14]. We also showed that it was possible to abstract HPC job managers, including Slurm and Flux, and connect them with our portable CK workflows, as sketched below.
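Conceptually, this abstraction is a thin adapter layer; the sketch below is mine rather than CK code (class names are illustrative) and only shows how one workflow can target Slurm or Flux through a single interface:

    import subprocess

    # A unified job-manager interface: the portable workflow only calls
    # submit() and never needs to know which scheduler runs underneath.
    class JobManager:
        def submit(self, script: str) -> None:
            raise NotImplementedError

    class SlurmManager(JobManager):
        def submit(self, script: str) -> None:
            subprocess.run(['sbatch', script], check=True)

    class FluxManager(JobManager):
        def submit(self, script: str) -> None:
            subprocess.run(['flux', 'batch', script], check=True)

    def run_workflow(manager: JobManager, script: str) -> None:
        manager.submit(script)  # the same workflow runs on either cluster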

Some authors have already started using CK to share their research artifacts and workflows at different ML and systems conferences during artifact evaluation [28]. My current goal is to make CK onboarding as simple as possible and to help researchers automatically convert their ad-hoc artifacts and scripts into CK workflows, reusable artifacts, adaptive containers and live dashboards.

Figure 7: The CK dashboard connected with portable CK workflows to visualize and compare public results from reproducible quantum hackathons, where over 80 participants worked together to solve a quantum machine learning problem and minimise time to solution; the most efficient design is highlighted.

5.6 Connecting researchers and practitioners to co-design efficient computational systems

CK use cases demonstrated that it was possible to develop and use a common research infrastructure with different levels of abstraction to bridge the gap between researchers and practitioners and help them collaboratively co-design efficient computational systems. Scientists could then work with a higher-level abstraction, while engineers could continue improving the lower-level abstractions for continuously evolving software and hardware and deploy new techniques in production without waiting for each other, as shown in Figure 8. Furthermore, the unified interfaces and meta descriptions of all CK components and workflows made it possible to explain what was happening inside complex and "black box" computational systems, integrate them with legacy systems, use them inside "adaptive" Docker containers, and share them along with published papers while applying the DevOps methodology and agile principles in scientific research.

6 CK platform

The practical use of CK as a portable and customizable workflow framework in multiple academic and industrial projects exposed several limitations:

• The distributed nature of the CK technology, the lack of a centralized place to keep all CK components, automation actions and workflows, and the lack of a convenient GUI made it very challenging to keep track of all contributions from the community. As a result, it is not easy to discuss and test APIs, add new components and assemble workflows, automatically validate them across diverse platforms and environments, and connect them with legacy systems.

[Figure 8 diagram: researchers using CK publish new algorithms and models with reported accuracy and other results; engineers using CK develop new software, design new hardware and share a portable Collective Knowledge solution with an adaptive CK container to automatically rebuild the system using the latest or stable software and hardware; practitioners using CK attempt to reproduce results with standard datasets and real data, and autotune the system to balance all characteristics depending on the use case (cloud vs edge), measuring real accuracy, real speed, size and costs incrementally over time.]

Figure 8: The CK concept helps to connect researchers and practitioners to co-design complex computational systems using DevOps principles while automatically adapting to continuously evolving software, hardware, models and datasets. The CK framework and public dashboards also help to unify, automate and crowdsource the benchmarking and auto-tuning process across diverse components from different vendors in order to automatically find the most efficient systems on the Pareto frontier.

• The concept of backward compatibility of CK APIs and the lack of versioning, similar to Java, made it challenging to keep workflows stable and bug-free in the real world: any bug in a reusable CK component from one GitHub project could easily break dependent workflows in another GitHub project.

• The CK command-line interface, with its access to all automation actions and their numerous parameters, was too low-level for researchers. This is similar to the situation with Git: a powerful but quite complex CLI-based tool that requires extra web services such as GitHub and GitLab to make it more user-friendly.

This feedback from CK users motivated me to start developing cKnowledge.io (Figure 9), an open web-based platform with a GUI to aggregate, version and test all CK components and portable workflows. I also wanted to replace the aging optimization repository at cTuning.org with an extensible and modular platform to crowdsource and reproduce tedious experiments, such as benchmarking and co-design of efficient systems for AI and ML, across diverse platforms and data provided by volunteers.

The CK platform is inspired by GitHub and PyPI. I see it as a collaborative platform to share reusable automation actions for repetitive research tasks and to assemble portable workflows. It also includes the open-source CK client [16] that provides a common API to initialise, build, run and validate different research projects based on a simple JSON or YAML manifest. This client is connected with live scoreboards on the CK platform to collaboratively reproduce and compare state-of-the-art research results during the artifact evaluation that we helped to organize at ML and systems conferences [20].

My intention is to use the CK platform to complement and enhance MLPerf, the ACM Digital Library, PapersWithCode.com and existing reproducibility initiatives and artifact evaluation at ACM, IEEE and NeurIPS conferences with the help of CK-powered adaptive containers, portable workflows, reusable components, "live" papers and reproducible results validated by the community using realistic data across diverse models, software and hardware.

7 CK demo: automating and customizing AI/ML benchmarking

I prepared a live and interactive demonstration of the CK solution that automates the MLPerf inference benchmark [53], connects it with the live CK dashboard (public optimization repository) [20], and helps volunteers to crowdsource benchmarking and design space exploration across diverse platforms, similar to the Collective Tuning Initiative [41] and the SETI@home project: cKnowledge.io/test

[Figure 9 diagram: CK solutions and adaptive containers expose reusable components, each with a CK JSON API: algorithms (object detection, object classification, speech recognition, training/inference, ...), models (MobileNets, ResNet, VGG, SqueezeDet, ...), software (TensorFlow, Caffe, MXNet, PyTorch/Caffe2, ...), datasets (ImageNet, KITTI, VOC, real data sets, ...) and hardware (CPU, GPU, TPU, NN accelerators, ...). Portable workflows with common APIs and unified input/output (initialize, build, run, validate) help to run solutions on any tech stack. CK solutions and adaptive containers can be shared along with ML/AI and systems papers (cKnowledge.io/solutions, cKnowledge.io/c/docker) to make it easier to reproduce the results and try the algorithms with the latest or different components; results shared by volunteers are compared against the original results from the authors, enabling "live" research papers validated by the community across diverse platforms, models and datasets: cKnowledge.io/reproduced-results]

Figure 9: cKnowledge.io: an open platform to organize scientific knowledge in the form of portable workflows, reusable components and reproducible research results. It already contains many automation actions and components needed to co-design efficient and self-optimizing computational systems, enable reproducible and live papers validated by the community, and keep track of state-of-the-art research techniques that can be deployed in production.

This demonstration shows how to use a unified CK API to automatically build, run and validate object detection based on SSD-MobileNet, TensorFlow and the COCO dataset across Raspberry Pi computers, Android phones, laptops, desktops and data centers. This solution is based on a simple JSON file describing the following tasks and their dependencies on CK components (a hypothetical sketch of such a manifest follows the list):

• prepare a Python virtual environment (can be skipped for a native installation);

• download and install the COCO dataset (50 or 5000 images);

• detect the C++ compilers or Python interpreters needed for object detection;

• install the TensorFlow framework with a specified version for a given target machine;

• download and install the SSD-MobileNet model compatible with the selected TensorFlow;

• manage the installation of all other dependencies and libraries;

• compile object detection for a given machine and prepare pre/post-processing scripts.
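Since I cannot reproduce the exact manifest here, the JSON sketch below only illustrates the declarative style of such a solution description; all field names and values are hypothetical rather than the real cbench schema:

    {
      "name": "demo-obj-detection-coco-tf-cpu",
      "tasks": [
        {"action": "prepare-python-venv", "optional": true},
        {"action": "install-dataset", "tags": "dataset,coco", "images": 5000},
        {"action": "detect-soft", "tags": "compiler,cpp"},
        {"action": "install-package", "tags": "lib,tensorflow", "version": "1.14.0"},
        {"action": "install-package", "tags": "model,ssd-mobilenet"},
        {"action": "compile-program", "target": "object-detection"}
      ]
    }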

This solution was published on the cKnowledge.io platform using the open-source CK client [16] to help users participate in crowd-benchmarking on their own machines as follows:

    # Install the CK client from PyPI:
    pip install cbench

    # Download and build the solution on a given machine (example for Linux):
    cb init demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

    # Run the solution on a given machine:
    cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows

The users can then see the benchmarking results (speed, latency, accuracy and other characteristics exposed through the CK workflow) on the live CK dashboard associated with this solution, and compare them against the official MLPerf results or against the results shared by other users: cKnowledge.io/demo-result

After validating this solution on a given platform, the users can also clone it and update the JSON description to retarget the benchmark to other devices and operating systems, such as MacOS, Windows, Android phones, and servers with CUDA-enabled GPUs.

The users can also integrate such ML solutions with production systems with the help of the unified CK APIs, as demonstrated by connecting the above CK solution for object detection with a webcam in any browser: cKnowledge.io/demo-solution

Finally, it is possible to use containers with CK repositories, workflows and common APIs as follows:

    docker run ctuning/cbrain-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows \
      /bin/bash -c "cb benchmark demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows"

    docker run ctuning/cbench-mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux \
      /bin/bash -c "cb benchmark mlperf-inference-v0.5-detection-openvino-ssd-mobilenet-coco-500-linux"

Combining Docker and portable CK workflows enables "adaptive" CK containers for MLPerf that can be easily customized, rebuilt with different ML models, datasets, compilers, frameworks and tools encapsulated inside CK components, and deployed in production [2].
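As a sketch of what "adaptive" means in practice, the hypothetical Dockerfile below bakes the CK client into a container and leaves the solution name as a build argument, so the same image can be rebuilt around different models, datasets and frameworks (this is illustrative, not the exact ctuning image definition):

    FROM ubuntu:18.04
    RUN apt-get update && apt-get install -y python3 python3-pip git && \
        pip3 install cbench
    # The CK solution is a build argument: rebuilding the image with a
    # different solution swaps the models, datasets and frameworks inside.
    ARG SOLUTION=demo-obj-detection-coco-tf-cpu-benchmark-linux-portable-workflows
    ENV SOLUTION=${SOLUTION}
    RUN cb init ${SOLUTION}
    CMD cb benchmark ${SOLUTION}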

8 Conclusions and future work

My very first research project, to prototype an analog neural network, stalled in the late 1990s because it took me far too long to build all the infrastructure from scratch: to model and train Hopfield neural networks, generate diverse datasets, co-design and optimize software and hardware, run and reproduce all experiments, compare them with other techniques from published papers, and use this technology in practice in a completely different environment.

In this article, I explain why I developed the Collective Knowledge framework and how it can help to address the issues above by organizing research projects as a database of reusable components, portable workflows and reproducible experiments based on the FAIR principles (findable, accessible, interoperable and reusable). I also describe how the CK framework attempts to bring DevOps and "Software 2.0" principles to scientific research and to help users share and reuse best practices, automation actions and research artifacts in a unified way alongside reproducible papers. Finally, I demonstrate how the CK concept helps to complement, unify and interconnect existing tools, platforms and reproducibility initiatives with common APIs and extensible meta descriptions, rather than rewriting them or competing with them.

I present several use cases of how CK helps to connect researchers and practitioners to collaboratively design more reliable, reproducible and efficient computational systems for machine learning, artificial intelligence and other emerging workloads that can automatically adapt to continuously evolving software, hardware, models and datasets. I also describe the cKnowledge.io platform that I developed to organize knowledge, particularly about AI, ML, systems and other innovative technology, in the form of portable CK workflows, automation actions, reusable artifacts and reproducible results from research papers. My goal is to help the community to quickly find useful methods from research papers, build them on any tech stack, integrate them with new or legacy systems, start using them in the real world with real data, and combine Collective Knowledge and AI to build better AI systems.

Finally, I demonstrate the concept of "live" research papers connected with portable CK workflows and online CK dashboards that let the community automatically validate and update experimental results even after a project ends, detect and share unexpected behaviour, and fix problems collaboratively [45]. I believe that such a collaborative approach can make computational research more reproducible, portable, sustainable, explainable and trustworthy.

However, CK is still a proof-of-concept and there remains a lot to simplify and improve. Future work will aim to make CK more user-friendly and to simplify the onboarding process, as well as to standardise all APIs and JSON meta descriptions. It will also focus on developing a simple GUI to create and share automation actions and CK components, assemble portable workflows, run experiments, compare research techniques, generate adaptive containers, and participate in lifelong AI, ML and systems optimization.

My long-term goal is to use CK to develop a virtual playground, an optimization repository and a marketplace where researchers and practitioners assemble AI, ML and other novel applications similar to live species that continue to evolve, self-optimize and compete with each other across diverse tech stacks from different vendors and users. The winning solutions with the best trade-offs between speed, latency, accuracy, energy, size, costs and other metrics can then be selected at any time from the Pareto frontier based on user requirements and constraints. Such solutions can be immediately deployed in production on any platform, from data centers to the edge, in the most efficient way, thus accelerating AI, ML and systems innovation and digital transformation.

Acknowledgements

I thank Sam Ainsworth, Erik Altman, Lorena Barba, Victor Bittorf, Unmesh D. Bordoloi, Steve Brierley, Luis Ceze, Milind Chabbi, Bruce Childers, Nikolay Chunosov, Marco Cianfriglia, Albert Cohen, Cody Coleman, Chris Cummins, Jack Davidson, Alastair Donaldson, Achi Dosanjh, Thibaut Dumontet, Debojyoti Dutta, Daniil Efremov, Nicolas Essayan, Todd Gamblin, Leo Gordon, Wayne Graves, Christophe Guillon, Herve Guillou, Stephen Herbein, Michael Heroux, Patrick Hesse, James Hetherington, Kenneth Hoste, Robert Hundt, Ivo Jimenez, Tom St John, Timothy M. Jones, David Kanter, Yuriy Kashnikov, Gaurav Kaul, Sergey Kolesnikov, Shriram Krishnamurthi, Dan Laney, Andrei Lascu, Hugh Leather, Wei Li, Anton Lokhmotov, Peter Mattson, Thierry Moreau, Dewey Murdick, Luigi Nardi, Cedric Nugteren, Michael O'Boyle, Ivan Ospiov, Bhavesh Patel, Gennady Pekhimenko, Massimiliano Picone, Ed Plowman, Ramesh Radhakrishnan, Ilya Rahkovsky, Vijay Janapa Reddi, Vincent Rehm, Alka Roy, Shubhadeep Roychowdhury, Dmitry Savenko, Aaron Smith, Jim Spohrer, Michel Steuwer, Victoria Stodden, Robert Stojnic, Michela Taufer, Stuart Taylor, Olivier Temam, Eben Upton, Nicolas Vasilache, Flavio Vella, Davide Del Vento, Boris Veytsman, Alex Wade, Pete Warden, Dave Wilkinson, Matei Zaharia, Alex Zhigarev and other great colleagues for interesting discussions, practical use cases and useful feedback.

References

[1] ACM pilot demo: "Collective Knowledge: packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON I/O to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cknowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: collaboratively benchmarking and optimizing deep learning implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify the preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large scale visual recognition challenge where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf inference benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with machine learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cknowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and systems conferences. https://cknowledge.io/?q=reproduced-papers+AND+portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform, organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The machine learning toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications collected during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, 7 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter and T. Hoefler. A modular benchmarking infrastructure for high-performance and reproducible deep learning. IEEE, May 2019. The 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19).

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning, 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu and Wen-mei Hwu. Across-stack profiling and characterization of machine learning models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi and Andreas Zeller. Artifact evaluation for publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29-35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering, TRUST '14, New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296-327, 2011. 10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12, Nov 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges and Stijn Weirdt. EasyBuild: building software with ease. pages 572-582, 11 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561-1570, 2017.

[49] Gaurav Kaul, Suneel Marthi and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux J., 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265-276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang and Yuchen Zhou. MLPerf inference benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '17, New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557-W561, 05 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41:39-45, 2018.

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 11: Grigori Fursin cTuning foundation and cKnowledge SAS

CK workflow1 with validated results

AWS with c518xlarge instance Intelreg Xeonreg Platinum 8124M

From the authors ldquoThe 8-bit optimized model is automatically

generated with a calibration process from FP32 model without the

need of fine-tuning or retraining We show that the inference

throughput and latency with ResNet-50 Inception-v3 and SSD are

improved by 138X-29X and 135X-3X respectively with negligible

accuracy loss from IntelCaffe FP32 baseline and by 56X-75X and

26X-37X from BVLC Cafferdquo

httpsgithubcomctuningck-request-asplos18-caffe-intel

Figure 6 The CK dashboard with reproduced results from the ACM ASPLOS-REQUESTrsquo18 tournament to co-designPareto-efficient image classification in terms of speed accuracy energy and other costs Each paper submission wasaccompanied by the portable CK workflow to be able to reproduce and compare results in a unified way

learning drug discovery materials optimization financeand cryptography However it is not yet known whenthe first demonstration of quantum advantage will beachieved or what shape it will take

This led me to co-organize several Quantum hackathonssimilar to the REQUEST tournament [26] with IBMRigetti Riverlane and dividiti My main goal was tocheck if we could aggregate and share multidisciplinaryknowledge about the state-of-the-art in quantumcomputing using portable CK workflows that can runon classical hardware and quantum platforms from IBMRigetti and other companies can be connected to a publicdashboard to simplify reproducibility and comparison ofdifferent algorithms across different platforms and can beextended by the community even after hackathons

Figure 7 shows the results from one of suchQuantum hackathons where over 80 participants fromundergraduate and graduate students to startup foundersand experienced professionals from IBM and CERNworked together to solve a quantum machine learningproblem designed by Riverlane All participants weregiven some labelled quantum data and asked to developalgorithms for solving a classification problem

We also taught participants how to perform theseexperiments in a collaborative reproducible andautomated way using the CK framework so that theresults could be transfered to industry For examplewe introduced the CK repository with workflows andcomponents for the Quantum Information Science Kit(QISKit) - an open source software development kit(SDK) for working IBM Q quantum processors [13]Using the CK program workflow from this repository

the participants were able to start running quantumexperiments with a standard CK command

ck pu l l repo minusminusu r l=https github com ctuning ckminusq i s k i tck run program q i s k i t minusdemo minusminuscmd key=quantum co in f l i p

Whenever ready the participants could submit theirsolutions to the public CK dashboards to let other usersvalidate and reuse their results [20 26]

Following the successful validation of portableCK workflows for reproducible papers I continuedcollaborating with ACM [1] and ML and systemsconferences to automate the tedious artifact evaluationprocess [5 42] For example we developed several CKworkflows to support the Student Cluster CompetitionReproducibility Challenge (SCC) at the Supercomputingconference [15] We demonstrated that it was possibleto reuse the CK program workflow to automate theinstallation execution customization and validation ofthe SeisSol application (Extreme Scale Multi-PhysicsSimulations of the Tsunamigenic 2004 SumatraMegathrust Earthquake) [54] from the SC18 StudentCluster Competition Reproducibility Challenge acrossseveral supercomputers and HPC clusters [14] Wealso showed that it was possible to abstract HPC jobmanagers including Slurm and Flux and connect themwith our portable CK workflows

Some authors already started using CK to share theirresearch research artifacts and workflows at different MLand systems conferences during artifact evaluation [28]My current goal is to make the CK onboarding assimple as possible and help researchers to automaticallyconvert their ad-hoc artifacts and scripts into CKworkflows reusable artifacts adaptive containers and live

11

The most efficient design

Figure 7 The CK dashboard connected with portable CK workflows to visualize and compare public results fromreproducible Quantum Hackathons Over 80 participants worked together to solve a quantum machine learningproblem and minimise time to solution

dashboards

56 Connecting researchers andpractitioners to co-design efficientcomputational systems

CK use cases demonstrated that it was possible to developand use a common research infrastructure with differentlevels of abstraction to bridge the gap between researchersand practitioners and help them to collaborativelyco-design efficient computational systems Scientistscould then work with a higher-level abstraction whileallowing engineers to continue improving the lower-levelabstractions for continuously evolving software andhardware in deploying new techniques in productionwithout waiting for each other as shown in Figure 8Furthermore the unified interfaces and meta descriptionsof all CK components and workflows made it possible to

explain what was happening inside complex and rdquoblackboxrdquo computational systems integrate them with legacysystems use them inside rdquoadaptiverdquo Docker and sharethem along with published papers while applying theDevOps methodology and agile principles in scientificresearch

6 CK platform

The practical use of CK as a portable and customizableworkflow framework in multiple academic and industrialprojects exposed several limitations

bull The distributed nature of the CK technology thelack of a centralized place to keep all CK componentsautomation actions and workflows and the lackof a convenient GUI made it very challenging tokeep track of all contributions from the community

12

Develop new software

Publish new algorithms and models Publish new algorithms and models

Reported accuracy and other results

Design new hardware

Researchers using CK

Engineers using CKAdaptive CK container

Real data

Attempt to reproduce results

Increment

Practitioners using CK

Share portable Collective Knowledge solution to automatically build a system using the latest or stable software and hardware

Use standard data set

Autotune to balance all characteristics depending on use

case (cloud vs edge)

Real accuracyReal speedSizeCosts

Automatically rebuild system

Timeline

Figure 8 The CK concept helps to connect researchers and practitioners to co-design complex computational systemsusing DevOps principles while automatically adapting to continuously evolving software hardware models anddatasets The CK framework and public dashboards also help to unify automate and crowdsource the benchmarkingand auto-tuning process across diverse components from different vendors to automatically find the most efficientsystems on the Pareto frontier

As a result it is not easy to discuss and testAPIs add new components and assemble workflowsautomatically validate them across diverse platformsand environments and connect them with legacysystems

bull The concept of backward compatibility of CKAPIs and the lack of versioning similar to Javamade it challenging to keep stable and bug-freeworkflows in the real world any bug in a reusableCK component from one GitHub project couldeasily break dependent workflows in another GitHubproject

bull The CK command-line interface with the access to allautomation actions with numerous parameters wastoo low-level for researchers This is similar to thesituation with Git a powerful but quite complex andCLI-based tool that requires extra web services suchas GitHub and GitLab to make it more user-friendly

This feedback from CK users motivated me to startdeveloping cKnowledgeio (Figure 9) an open web-basedplatform with a GUI to aggregate version and test allCK components and portable workflows I also wantedto replace aging optimization repository at cTuningorgwith an extensible and modular platform to crowdsourceand reproduce tedious experiments such as benchmarkingand co-design of efficient systems for AI and ML acrossdiverse platforms and data provided by volunteers

The CK platform is inspired by GitHub and PyPII see it as a collaborative platform to share reusableautomation actions for repetitive research tasks andassemble portable workflows It also includes theopen-source CK client [16] that provides a commonAPI to initialise build run and validate differentresearch projects based on a simple JSON or YAMLmanifest This client is connected with live scoreboardson the CK platform to collaboratively reproduce andcompare the state-of-the-art research results duringartifact evaluation that we helped to organize at ML andSystems conferences [20]

My intention is to use the CK platform to complementand enhance MLPerf the ACM Digital LibraryPapersWithCodecom and existing reproducibilityinitiatives and artifact evaluation at ACM IEEE andNeurIPS conferences with the help of CK-poweredadaptive containers portable workflows reusablecomponents rdquoliverdquo papers and reproducible resultsvalidated by the community using realistic data acrossdiverse models software and hardware

7 CK demo automatingand customizing AIMLbenchmarking

I prepared a live and interactive demonstration ofthe CK solution that automates the MLPerf inference

13

CK JSON API

Object detection

Object classification

Speech recognitionhellip

Algorithms

Traininginferencehellip

CK JSON API

MobileNets

ResNet

VGGhellip

SqueezeDethellip

Models

CK JSON API

TensorFlow

Caffe

MXNethellip

PyTorchCaffe2hellip

Software

CK JSON API

ImageNet

KITTI

VOChellip

Real data setshellip

Data sets

CK JSON API

CPU

GPU

TPUhellip

NN acceleratorshellip

Hardware

CK solutions and adaptive containers with reusable components portable workflows common APIs and unified IO

Algorithms

Models

Datasets

Software Hardware

CK helps to run solutions on any tech stack

Initialize Build Run Validate

CK enables ldquoliverdquo research papers that can be validated by the community across diverse platforms models and datasets cKnowledgeioreproduced-resultsUnified

inputUnified output

MLAI papers

cKnowledgeio platform

Systems papers

CK solutions and adaptive containers can be shared along with research papers to make it easier to reproduce the results and try the algorithms with the latest or different components

cKnowledgeiosolutionscKnowledgeiocdocker

Results shared by volunteers

Original results from the authors

Figure 9 cKnowledgeio an open platform to organize scientific knowledge in the form of portable workflows reusablecomponents and reproducible research results It already contains many automation actions and components neededto co-design efficient and self-optimizing computational systems enable reproducible and live papers validated bythe community and keep track of the state-of-the-art research techniques that can be deployed in production

benchmark [53] connects it with the live CKdashboard (public optimization repository) [20] andhelps volunteers to crowdsource benchmarking and designspace exploration across diverse platforms similar to theCollective Tuning Initiative [41] and the SETIhomeproject cKnowledgeiotest

This demonstration shows how to use a unified CK APIto automatically build run and validate object detectionbased on SSD-Mobilenet TensorFlow and COCO datasetacross Raspberry Pi computers Android phones laptopsdesktops and data centers This solution is based on asimple JSON file describing the following tasks and theirdependencies on CK components

bull prepare a Python virtual environment (can beskipped for the native installation)

bull download and install the Coco dataset (50 or 5000images)

bull detect C++ compilers or Python interpreters neededfor object detection

bull install Tensorflow framework with a specified versionfor a given target machine

bull download and install the SSD-MobileNet modelcompatible with selected Tensorflow

bull manage installation of all other dependencies andlibraries

bull compile object detection for a given machine andprepare prepost-processing scripts

This solution was published on the cKnowledgeioplatform using the open-source CK client [16] to helpusers participate in crowd-benchmarking on their ownmachines as follows

I n s t a l l the CK c l i e n t from PyPi us ing pip i n s t a l l cbench

Download and bu i ld the s o l u t i o n on a given machine ( example f o r Linux ) cb i n i t demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

Run the s o l u t i o n on a given machine cb benchmark demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

14

The users can then see the benchmarking results(speed latency accuracy and other exposedcharacteristics through the CK workflow) on thelive CK dashboard associated with this solutionand compare them against the official MLPerfresults or with the results shared by other userscKnowledgeiodemo-result

After validating this solution on a given platform theusers can also clone it and update the JSON descriptionto retarget this benchmark to other devices and operating

systems such as MacOS Windows Android phonesservers with CUDA-enabled GPUs and so on

The users have the possibility to integrate such MLsolutions with production systems with the help of unifiedCK APIs as demonstrated by connecting the above CKsolution for object detection with the webcam in anybrowser cKnowledgeiodemo-solution

Finally it is possible to use containers with CKrepositories workflows and common APIs as follows

docker run ctuning cbrainminusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows bin bash minusc rdquocb benchmark demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

docker run ctuning cbenchminusmlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux bin bash minusc rdquocb benchmark mlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux

Combining Docker and portable CK workflows enablesrdquoadaptiverdquo CK containers for MLPerf that can be easilycustomized rebuilt with different ML models datasetscompilers frameworks and tools encapsulated inside CKcomponents and deployed in production [2]

8 Conclusions and future work

My very first research project to prototype an analogneural network stalled in the late 1990s because it took mefar too long to build from scratch all the infrastructureto model and train Hopfield neural networks generatediverse datasets co-design and optimize software andhardware run and reproduce all experiments comparethem with other techniques from published papers anduse this technology in practice in a completely differentenvironment

In this article I explain why I have developedthe Collective Knowledge framework and how it canhelp to address the issues above by organizing allresearch projects as a database of reusable componentsportable workflows and reproducible experiments basedon FAIR principles (findable accessible interoperableand reusable) I also describe how the CK frameworkattempts to bring DevOps and rdquoSoftware 20rdquo principlesto scientific research and to help users share andreuse best practices automation actions and researchartifacts in a unified way alongside reproducible papersFinally I demonstrate how the CK concept helpsto complement unify and interconnect existing toolsplatforms and reproducibility initiatives with commonAPIs and extensible meta descriptions rather thanrewriting them or competing with them

I present several use cases of how CK helps to connectresearchers and practitioners to collaboratively designmore reliable reproducible and efficient computationalsystems for machine learning artificial intelligence andother emerging workloads that can automatically adapt

to continuously evolving software hardware models anddatasets I also describe the httpscKnowledgeioplatform that I have developed to organize knowledgeparticularly about AI ML systems and other innovativetechnology in the form of portable CK workflowsautomation actions reusable artifacts and reproducibleresults from research papers My goal is to help thecommunity to find useful methods from research papersquickly build them on any tech stack integrate them withnew or legacy systems start using them in the real worldwith real data and combine Collective Knowledge and AIto build better AI systems

Finally I demonstrate the concept of rdquoliverdquo researchpapers connected with portable CK workflows and onlineCK dashboards to let the community automaticallyvalidate and update experimental results even after theproject detect and share unexpected behaviour andfix problems collaboratively [45] I believe that such acollaborative approach can make computational researchmore reproducible portable sustainable explainable andtrustable

However CK is still a proof-of-concept and thereremains a lot to simplify and improve Future workwill intend to make CK more user-friendly and tosimplify the onboarding process as well as to standardiseall APIs and JSON meta descriptions It will alsofocus on development of a simple GUI to create andshare automation actions and CK components assembleportable workflows run experiments compare researchtechniques generate adaptive containers and participatein lifelong AI ML and systems optimization

My long-term goal is to use CK to develop avirtual playground an optimization repository and amarketplace where researchers and practitioners assembleAI ML and other novel applications similar to livespecies that continue to evolve self-optimize and competewith each other across diverse tech stacks from differentvendors and users The winning solutions with the best

15

trade-offs between speed latency accuracy energy sizecosts and other metrics can then be selected at any timefrom the Pareto frontier based on user requirements andconstraints Such solutions can be immediately deployedin production on any platform from data centers to theedge in the most efficient way thus accelerating AI MLand systems innovation and digital transformation

Acknowledgements

I thank Sam Ainsworth Erik Altman Lorena BarbaVictor Bittorf Unmesh D Bordoloi Steve BrierleyLuis Ceze Milind Chabbi Bruce Childers NikolayChunosov Marco Cianfriglia Albert Cohen CodyColeman Chris Cummins Jack Davidson AlastairDonaldson Achi Dosanjh Thibaut Dumontet DebojyotiDutta Daniil Efremov Nicolas Essayan Todd GamblinLeo Gordon Wayne Graves Christophe Guillon HerveGuillou Stephen Herbein Michael Heroux PatrickHesse James Hetherignton Kenneth Hoste RobertHundt Ivo Jimenez Tom St John Timothy M JonesDavid Kanter Yuriy Kashnikov Gaurav Kaul SergeyKolesnikov Shriram Krishnamurthi Dan Laney AndreiLascu Hugh Leather Wei Li Anton Lokhmotov PeterMattson Thierry Moreau Dewey Murdick Luigi NardiCedric Nugteren Michael OrsquoBoyle Ivan Ospiov BhaveshPatel Gennady Pekhimenko Massimiliano Picone EdPlowman Ramesh Radhakrishnan Ilya Rahkovsky VijayJanapa Reddi Vincent Rehm Alka Roy ShubhadeepRoychowdhury Dmitry Savenko Aaron Smith JimSpohrer Michel Steuwer Victoria Stodden RobertStojnic Michela Taufer Stuart Taylor Olivier TemamEben Upton Nicolas Vasilache Flavio Vella Davide DelVento Boris Veytsman Alex Wade Pete Warden DaveWilkinson Matei Zaharia Alex Zhigarev and other greatcolleagues for interesting discussions practical use casesand useful feedback

References

[1] ACM Pilot demo "Collective Knowledge packaging and sharing". https://youtu.be/DIkZxraTmGM

[2] Adaptive CK containers with a unified CK API to customize, rebuild and adapt them to any platform and environment, as well as to integrate them with external tools and data. https://cKnowledge.io/c/docker

[3] Amazon SageMaker: fully managed service to build, train and deploy machine learning models quickly. https://aws.amazon.com/sagemaker

[4] Artifact Appendix and reproducibility checklist to unify and automate the validation of results at systems and machine learning conferences. https://cTuning.org/ae/checklist.html

[5] Artifact Evaluation: reproducing results from published papers at machine learning and systems conferences. https://cTuning.org/ae

[6] Automation actions with a unified API and JSON I/O to share best practices and introduce DevOps principles to scientific research. https://cKnowledge.io/actions

[7] CK-compatible GitHub, GitLab and BitBucket repositories with reusable components, automation actions and workflows. https://cKnowledge.io/repos

[8] CK dashboard with results from CK-powered automated and multi-objective design space exploration of image classification across different models, software and hardware. https://cKnowledge.io/ml-optimization-repository

[9] CK GitHub: an open-source and cross-platform framework to organize software projects as a database of reusable components with common automation actions, unified APIs and extensible meta descriptions based on FAIR principles (findability, accessibility, interoperability and reusability). https://github.com/ctuning/ck

[10] CK repository with CK workflows and components to reproduce the MILEPOST project. https://github.com/ctuning/reproduce-milepost-project

[11] CK use case from General Motors: Collaboratively Benchmarking and Optimizing Deep Learning Implementations. https://youtu.be/1ldgVZ64hEI

[12] Collective Knowledge: real-world use cases. https://cKnowledge.org/partners

[13] Collective Knowledge Repository for the Quantum Information Software Kit (QISKit). https://github.com/ctuning/ck-qiskit

[14] Collective Knowledge Repository to automate the installation, execution, customization and validation of the SeisSol application from the SC18 Student Cluster Competition Reproducibility Challenge across different platforms, environments and datasets. https://github.com/ctuning/ck-scc18/wiki

[15] Collective Knowledge workflow to prepare digital artifacts for the Student Cluster Competition Reproducibility Challenge. https://github.com/ctuning/ck-scc

[16] Cross-platform CK client to unify preparation, execution and validation of research techniques shared along with research papers. https://github.com/ctuning/cbench

[17] DAWNBench: an end-to-end deep learning benchmark and competition. https://dawn.cs.stanford.edu/benchmark

[18] ImageNet challenge (ILSVRC): ImageNet large scale visual recognition challenge where software programs compete to correctly classify and detect objects and scenes. http://www.image-net.org

[19] Kaggle: platform for predictive modelling and analytics competitions. https://www.kaggle.com

[20] Live scoreboards with reproduced results from research papers at machine learning and systems conferences. https://cKnowledge.io/reproduced-results

[21] LPIRC: low-power image recognition challenge. https://rebootingcomputing.ieee.org/lpirc

[22] MLPerf: a broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators and ML cloud platforms. https://mlperf.org

[23] MLPerf Inference Benchmark results. https://mlperf.org/inference-results

[24] PapersWithCode.com: a free and open resource with Machine Learning papers, code and evaluation tables. https://paperswithcode.com

[25] Portable and reusable CK workflows. https://cKnowledge.io/programs

[26] Quantum Collective Knowledge and reproducible quantum hackathons: keeping track of the state-of-the-art in quantum computing using portable CK workflows, live CK dashboards and reproducible results. https://cKnowledge.io/c/event/reproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based on the cTuning evaluation methodology. https://cKnowledge.io/reproduced-papers

[28] Reproduced papers with CK workflows during artifact evaluation at ML and Systems conferences. https://cKnowledge.io/?q=reproduced-papers+AND+portable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cKnowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, 7 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. IEEE, May 2019. The 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19).

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning. 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-Stack Profiling and Characterization of Machine Learning Models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact Evaluation for Publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29–35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and Systems research: the good, the bad and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering, TRUST '14, New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: Machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296–327, 2011. 10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, Nov 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn De Weirdt. EasyBuild: Building software with ease. pages 572–582, 11 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: Making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561–1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: From the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: Lightweight Linux containers for consistent development and deployment. Linux J., 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265–276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf Inference Benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '17, New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557–W561, 05 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the Machine Learning Lifecycle with MLflow. IEEE Data Eng. Bull., 41:39–45, 2018.

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 12: Grigori Fursin cTuning foundation and cKnowledge SAS

The most efficient design

Figure 7 The CK dashboard connected with portable CK workflows to visualize and compare public results fromreproducible Quantum Hackathons Over 80 participants worked together to solve a quantum machine learningproblem and minimise time to solution

dashboards

56 Connecting researchers andpractitioners to co-design efficientcomputational systems

CK use cases demonstrated that it was possible to developand use a common research infrastructure with differentlevels of abstraction to bridge the gap between researchersand practitioners and help them to collaborativelyco-design efficient computational systems Scientistscould then work with a higher-level abstraction whileallowing engineers to continue improving the lower-levelabstractions for continuously evolving software andhardware in deploying new techniques in productionwithout waiting for each other as shown in Figure 8Furthermore the unified interfaces and meta descriptionsof all CK components and workflows made it possible to

explain what was happening inside complex and rdquoblackboxrdquo computational systems integrate them with legacysystems use them inside rdquoadaptiverdquo Docker and sharethem along with published papers while applying theDevOps methodology and agile principles in scientificresearch

6 CK platform

The practical use of CK as a portable and customizableworkflow framework in multiple academic and industrialprojects exposed several limitations

bull The distributed nature of the CK technology thelack of a centralized place to keep all CK componentsautomation actions and workflows and the lackof a convenient GUI made it very challenging tokeep track of all contributions from the community

12

Develop new software

Publish new algorithms and models Publish new algorithms and models

Reported accuracy and other results

Design new hardware

Researchers using CK

Engineers using CKAdaptive CK container

Real data

Attempt to reproduce results

Increment

Practitioners using CK

Share portable Collective Knowledge solution to automatically build a system using the latest or stable software and hardware

Use standard data set

Autotune to balance all characteristics depending on use

case (cloud vs edge)

Real accuracyReal speedSizeCosts

Automatically rebuild system

Timeline

Figure 8 The CK concept helps to connect researchers and practitioners to co-design complex computational systemsusing DevOps principles while automatically adapting to continuously evolving software hardware models anddatasets The CK framework and public dashboards also help to unify automate and crowdsource the benchmarkingand auto-tuning process across diverse components from different vendors to automatically find the most efficientsystems on the Pareto frontier

As a result it is not easy to discuss and testAPIs add new components and assemble workflowsautomatically validate them across diverse platformsand environments and connect them with legacysystems

bull The concept of backward compatibility of CKAPIs and the lack of versioning similar to Javamade it challenging to keep stable and bug-freeworkflows in the real world any bug in a reusableCK component from one GitHub project couldeasily break dependent workflows in another GitHubproject

bull The CK command-line interface with the access to allautomation actions with numerous parameters wastoo low-level for researchers This is similar to thesituation with Git a powerful but quite complex andCLI-based tool that requires extra web services suchas GitHub and GitLab to make it more user-friendly

This feedback from CK users motivated me to startdeveloping cKnowledgeio (Figure 9) an open web-basedplatform with a GUI to aggregate version and test allCK components and portable workflows I also wantedto replace aging optimization repository at cTuningorgwith an extensible and modular platform to crowdsourceand reproduce tedious experiments such as benchmarkingand co-design of efficient systems for AI and ML acrossdiverse platforms and data provided by volunteers

The CK platform is inspired by GitHub and PyPII see it as a collaborative platform to share reusableautomation actions for repetitive research tasks andassemble portable workflows It also includes theopen-source CK client [16] that provides a commonAPI to initialise build run and validate differentresearch projects based on a simple JSON or YAMLmanifest This client is connected with live scoreboardson the CK platform to collaboratively reproduce andcompare the state-of-the-art research results duringartifact evaluation that we helped to organize at ML andSystems conferences [20]

My intention is to use the CK platform to complementand enhance MLPerf the ACM Digital LibraryPapersWithCodecom and existing reproducibilityinitiatives and artifact evaluation at ACM IEEE andNeurIPS conferences with the help of CK-poweredadaptive containers portable workflows reusablecomponents rdquoliverdquo papers and reproducible resultsvalidated by the community using realistic data acrossdiverse models software and hardware

7 CK demo automatingand customizing AIMLbenchmarking

I prepared a live and interactive demonstration ofthe CK solution that automates the MLPerf inference

13

CK JSON API

Object detection

Object classification

Speech recognitionhellip

Algorithms

Traininginferencehellip

CK JSON API

MobileNets

ResNet

VGGhellip

SqueezeDethellip

Models

CK JSON API

TensorFlow

Caffe

MXNethellip

PyTorchCaffe2hellip

Software

CK JSON API

ImageNet

KITTI

VOChellip

Real data setshellip

Data sets

CK JSON API

CPU

GPU

TPUhellip

NN acceleratorshellip

Hardware

CK solutions and adaptive containers with reusable components portable workflows common APIs and unified IO

Algorithms

Models

Datasets

Software Hardware

CK helps to run solutions on any tech stack

Initialize Build Run Validate

CK enables ldquoliverdquo research papers that can be validated by the community across diverse platforms models and datasets cKnowledgeioreproduced-resultsUnified

inputUnified output

MLAI papers

cKnowledgeio platform

Systems papers

CK solutions and adaptive containers can be shared along with research papers to make it easier to reproduce the results and try the algorithms with the latest or different components

cKnowledgeiosolutionscKnowledgeiocdocker

Results shared by volunteers

Original results from the authors

Figure 9 cKnowledgeio an open platform to organize scientific knowledge in the form of portable workflows reusablecomponents and reproducible research results It already contains many automation actions and components neededto co-design efficient and self-optimizing computational systems enable reproducible and live papers validated bythe community and keep track of the state-of-the-art research techniques that can be deployed in production

benchmark [53] connects it with the live CKdashboard (public optimization repository) [20] andhelps volunteers to crowdsource benchmarking and designspace exploration across diverse platforms similar to theCollective Tuning Initiative [41] and the SETIhomeproject cKnowledgeiotest

This demonstration shows how to use a unified CK APIto automatically build run and validate object detectionbased on SSD-Mobilenet TensorFlow and COCO datasetacross Raspberry Pi computers Android phones laptopsdesktops and data centers This solution is based on asimple JSON file describing the following tasks and theirdependencies on CK components

bull prepare a Python virtual environment (can beskipped for the native installation)

bull download and install the Coco dataset (50 or 5000images)

bull detect C++ compilers or Python interpreters neededfor object detection

bull install Tensorflow framework with a specified versionfor a given target machine

bull download and install the SSD-MobileNet modelcompatible with selected Tensorflow

bull manage installation of all other dependencies andlibraries

bull compile object detection for a given machine andprepare prepost-processing scripts

This solution was published on the cKnowledgeioplatform using the open-source CK client [16] to helpusers participate in crowd-benchmarking on their ownmachines as follows

I n s t a l l the CK c l i e n t from PyPi us ing pip i n s t a l l cbench

Download and bu i ld the s o l u t i o n on a given machine ( example f o r Linux ) cb i n i t demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

Run the s o l u t i o n on a given machine cb benchmark demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

14

The users can then see the benchmarking results(speed latency accuracy and other exposedcharacteristics through the CK workflow) on thelive CK dashboard associated with this solutionand compare them against the official MLPerfresults or with the results shared by other userscKnowledgeiodemo-result

After validating this solution on a given platform theusers can also clone it and update the JSON descriptionto retarget this benchmark to other devices and operating

systems such as MacOS Windows Android phonesservers with CUDA-enabled GPUs and so on

The users have the possibility to integrate such MLsolutions with production systems with the help of unifiedCK APIs as demonstrated by connecting the above CKsolution for object detection with the webcam in anybrowser cKnowledgeiodemo-solution

Finally it is possible to use containers with CKrepositories workflows and common APIs as follows

docker run ctuning cbrainminusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows bin bash minusc rdquocb benchmark demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

docker run ctuning cbenchminusmlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux bin bash minusc rdquocb benchmark mlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux

Combining Docker and portable CK workflows enablesrdquoadaptiverdquo CK containers for MLPerf that can be easilycustomized rebuilt with different ML models datasetscompilers frameworks and tools encapsulated inside CKcomponents and deployed in production [2]

8 Conclusions and future work

My very first research project to prototype an analogneural network stalled in the late 1990s because it took mefar too long to build from scratch all the infrastructureto model and train Hopfield neural networks generatediverse datasets co-design and optimize software andhardware run and reproduce all experiments comparethem with other techniques from published papers anduse this technology in practice in a completely differentenvironment

In this article I explain why I have developedthe Collective Knowledge framework and how it canhelp to address the issues above by organizing allresearch projects as a database of reusable componentsportable workflows and reproducible experiments basedon FAIR principles (findable accessible interoperableand reusable) I also describe how the CK frameworkattempts to bring DevOps and rdquoSoftware 20rdquo principlesto scientific research and to help users share andreuse best practices automation actions and researchartifacts in a unified way alongside reproducible papersFinally I demonstrate how the CK concept helpsto complement unify and interconnect existing toolsplatforms and reproducibility initiatives with commonAPIs and extensible meta descriptions rather thanrewriting them or competing with them

I present several use cases of how CK helps to connectresearchers and practitioners to collaboratively designmore reliable reproducible and efficient computationalsystems for machine learning artificial intelligence andother emerging workloads that can automatically adapt

to continuously evolving software hardware models anddatasets I also describe the httpscKnowledgeioplatform that I have developed to organize knowledgeparticularly about AI ML systems and other innovativetechnology in the form of portable CK workflowsautomation actions reusable artifacts and reproducibleresults from research papers My goal is to help thecommunity to find useful methods from research papersquickly build them on any tech stack integrate them withnew or legacy systems start using them in the real worldwith real data and combine Collective Knowledge and AIto build better AI systems

Finally I demonstrate the concept of rdquoliverdquo researchpapers connected with portable CK workflows and onlineCK dashboards to let the community automaticallyvalidate and update experimental results even after theproject detect and share unexpected behaviour andfix problems collaboratively [45] I believe that such acollaborative approach can make computational researchmore reproducible portable sustainable explainable andtrustable

However CK is still a proof-of-concept and thereremains a lot to simplify and improve Future workwill intend to make CK more user-friendly and tosimplify the onboarding process as well as to standardiseall APIs and JSON meta descriptions It will alsofocus on development of a simple GUI to create andshare automation actions and CK components assembleportable workflows run experiments compare researchtechniques generate adaptive containers and participatein lifelong AI ML and systems optimization

My long-term goal is to use CK to develop avirtual playground an optimization repository and amarketplace where researchers and practitioners assembleAI ML and other novel applications similar to livespecies that continue to evolve self-optimize and competewith each other across diverse tech stacks from differentvendors and users The winning solutions with the best

15

trade-offs between speed latency accuracy energy sizecosts and other metrics can then be selected at any timefrom the Pareto frontier based on user requirements andconstraints Such solutions can be immediately deployedin production on any platform from data centers to theedge in the most efficient way thus accelerating AI MLand systems innovation and digital transformation

Acknowledgements

I thank Sam Ainsworth Erik Altman Lorena BarbaVictor Bittorf Unmesh D Bordoloi Steve BrierleyLuis Ceze Milind Chabbi Bruce Childers NikolayChunosov Marco Cianfriglia Albert Cohen CodyColeman Chris Cummins Jack Davidson AlastairDonaldson Achi Dosanjh Thibaut Dumontet DebojyotiDutta Daniil Efremov Nicolas Essayan Todd GamblinLeo Gordon Wayne Graves Christophe Guillon HerveGuillou Stephen Herbein Michael Heroux PatrickHesse James Hetherignton Kenneth Hoste RobertHundt Ivo Jimenez Tom St John Timothy M JonesDavid Kanter Yuriy Kashnikov Gaurav Kaul SergeyKolesnikov Shriram Krishnamurthi Dan Laney AndreiLascu Hugh Leather Wei Li Anton Lokhmotov PeterMattson Thierry Moreau Dewey Murdick Luigi NardiCedric Nugteren Michael OrsquoBoyle Ivan Ospiov BhaveshPatel Gennady Pekhimenko Massimiliano Picone EdPlowman Ramesh Radhakrishnan Ilya Rahkovsky VijayJanapa Reddi Vincent Rehm Alka Roy ShubhadeepRoychowdhury Dmitry Savenko Aaron Smith JimSpohrer Michel Steuwer Victoria Stodden RobertStojnic Michela Taufer Stuart Taylor Olivier TemamEben Upton Nicolas Vasilache Flavio Vella Davide DelVento Boris Veytsman Alex Wade Pete Warden DaveWilkinson Matei Zaharia Alex Zhigarev and other greatcolleagues for interesting discussions practical use casesand useful feedback

References

[1] ACM Pilot demo rdquoCollective Knowledge packagingand sharingrdquo httpsyoutubeDIkZxraTmGM

[2] Adaptive CK containers with a unified CK API tocustomize rebuild and adapt them to any platformand environment as well as to integrate them withexternal tools and data httpscKnowledgeio

cdocker

[3] Amazon SageMaker fully managed service to buildtrain and deploy machine learning models quicklyhttpsawsamazoncomsagemaker

[4] Artifact Appendix and reproducibility checklist tounify and automate the validation of results atsystems and machine learning conferences https

cTuningorgaechecklisthtml

[5] Artifact Evaluation reproducing results frompublished papers at machine learning and systemsconferences httpscTuningorgae

[6] Automation actions with a unified API andJSON IO to share best practices and introduceDevOps principles to scientific research https

cKnowledgeioactions

[7] CK-compatible GitHub GitLab and BitBucketrepositories with reusable components automationactions and workflows httpscKnowledgeio

repos

[8] CK dashboard with results from CK-poweredautomated and multi-objective design spaceexploration of image classification across differentmodels software and hardware https

cknowledgeioml-optimization-repository

[9] CK GitHub an open-source and cross-platformframework to organize software projects as adatabase of reusable components with commonautomation actions unified APIs and extensiblemeta descriptions based on FAIR principles(findability accessibility interoperability andreusability) httpsgithubcomctuningck

[10] CK repository with CK workflows andcomponents to reproduce the MILEPOSTproject httpsgithubcomctuning

reproduce-milepost-project

[11] CK use case from General Motors CollaborativelyBenchmarking and Optimizing Deep LearningImplementations httpsyoutube

1ldgVZ64hEI

[12] Collective Knowledge real-world use cases https

cKnowledgeorgpartners

[13] Collective Knowledge Repository for the QuantumInformation Software Kit (QISKit) https

githubcomctuningck-qiskit

[14] Collective Knowledge Repository to automatethe installation execution customization andvalidation of the SeisSol application from theSC18 Student Cluster Competition ReproducibilityChallenge across different platforms environmentsand datasets httpsgithubcomctuning

ck-scc18wiki

[15] Collective Knowledge workflow to prepare digitalartifacts for the Student Cluster CompetitionReproducibility Challenge httpsgithubcom

ctuningck-scc

[16] Cross-platform CK client to unify preparationexecution and validation of research techniquesshared along with research papers https

githubcomctuningcbench

16

[17] DAWNBench an end-to-end deep learningbenchmark and competition httpsdawn

csstanfordedubenchmark

[18] Imagenet challenge (ILSVRC) Imagenet large scalevisual recognition challenge where software programscompete to correctly classify and detect objects andscenes httpwwwimage-netorg

[19] Kaggle platform for predictive modelling andanalytics competitions httpswwwkagglecom

[20] Live scoreboards with reproduced results fromresearch papers at machine learning andsystems conferences httpscKnowledgeio

reproduced-results

[21] LPIRC low-power image recognition challengehttpsrebootingcomputingieeeorglpirc

[22] MLPerf a broad ML benchmark suite for measuringperformance of ML software frameworks MLhardware accelerators and ML cloud platformshttpsmlperforg

[23] MLPerf Inference Benchmark results https

mlperforginference-results

[24] PapersWithCodecom provides a free and openresource with Machine Learning papers code andevaluation tables httpspaperswithcodecom

[25] Portable and reusable CK workflows https

cKnowledgeioprograms

[26] Quantum Collective Knowledge and reproduciblequantum hackathons keeping track of thestate-of-the-art in quantum computing usingportable CK workflows live CK dashboards andreproducible results httpscknowledgeioc

eventreproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based onthe cTuning evaluation methodology https

cKnowledgeioreproduced-papers

[28] Reproduced papers with CK workflows duringartifact evaluation at ML and Systemsconferences httpscknowledgeioq=

reproduced-papersANDportable-workflow-ck

[29] ReQuEST open tournaments on collaborativereproducible and Pareto-efficient softwarehardwareco-design of deep learning and other emergingworkloads httpscknowledgeiocevent

request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules https

cKnowledgeiomodules

[31] Shared CK meta packages (code data models)with automated installation across diverse platformshttpscKnowledgeiopackages

[32] Shared CK plugins to detect software (modelsframeworks data sets scripts) on a user machinehttpscKnowledgeiosoft

[33] The Collective Knowledge platform organizingknowledge about artificial intelligence machinelearning computational systems and otherinnovative technology in the form of portableworkflows automation actions and reusableartifacts from reproduced papers https

cKnowledgeio

[34] The Machine Learning Toolkit for Kuberneteshttpswwwkubefloworg

[35] The public database of image misclassificationsduring collaborative benchmarking of MLmodels across Android devices using the CKframework httpcknowledgeorgrepo

webphpwcid=modelimageclassification

collective_training_set

[36] Peter Amstutz Michael R Crusoe NebojsaTijanic Brad Chapman John Chilton MichaelHeuer Andrey Kartashov Dan Leehr HerveMenager Maya Nedeljkovich Matt Scales StianSoiland-Reyes and Luka Stojanovic CommonWorkflow Language v10 httpsdoiorg10

6084m9figshare3115156v2 7 2016

[37] T Ben-Nun M Besta S Huber A NZiogas D Peter and T Hoefler A ModularBenchmarking Infrastructure for High-Performanceand Reproducible Deep Learning IEEE May 2019The 33rd IEEE International Parallel amp DistributedProcessing Symposium (IPDPSrsquo19)

[38] Luis Ceze Ceze Natalie Enright Jerger BabakFalsafi Grigori Fursin Anton Lokhmotov ThierryMoreau Adrian Sampson and Phillip StanleyMarbell ACM ReQuESTrsquo18 Proceedings ofthe 1st on Reproducible Quality-Efficient SystemsTournament on Co-Designing Pareto-Efficient DeepLearning 2018

[39] Cheng Li and Abdul Dakkak and Jinjun Xiongand Wei Wei and Lingjie Xu and Wen-meiHwu Across-Stack Profiling and Characterizationof Machine Learning Models on GPUs https

arxivorgabs190806869 2019

[40] Bruce R Childers Grigori Fursin ShriramKrishnamurthi and Andreas Zeller ArtifactEvaluation for Publications (Dagstuhl PerspectivesWorkshop 15452) Dagstuhl Reports 5(11)29ndash352016

[41] Grigori Fursin Collective Tuning Initiativeautomating and accelerating development andoptimization of computing systems httpshal

inriafrinria-00436029v2 June 2009

17

[42] Grigori Fursin Enabling reproducible ML andSystems research the good the bad and the uglyhttpsdoiorg105281zenodo4005773 2020

[43] Grigori Fursin and Christophe DubachCommunity-driven reviewing and validation ofpublications In Proceedings of the 1st ACMSIGPLAN Workshop on Reproducible ResearchMethodologies and New Publication Models inComputer Engineering TRUST rsquo14 New York NYUSA 2014 Association for Computing Machinery

[44] Grigori Fursin Yuriy Kashnikov Abdul WahidMemon Zbigniew Chamski Olivier Temam MirceaNamolaru Elad Yom-Tov Bilha MendelsonAyal Zaks Eric Courtois Francois Bodin PhilBarnard Elton Ashton Edwin Bonilla JohnThomson Christopher Williams and MichaelF P OrsquoBoyle Milepost gcc Machine learningenabled self-tuning compiler InternationalJournal of Parallel Programming 39296ndash3272011 101007s10766-010-0161-2

[45] Grigori Fursin Anton Lokhmotov DmitrySavenko and Eben Upton A CollectiveKnowledge workflow for collaborative researchinto multi-objective autotuning and machinelearning techniques httpscknowledgeio

reportrpi3-crowd-tuning-2017-interactiveJanuary 2018

[46] T Gamblin M LeGendre M R Collette G LLee A Moody B R de Supinski and S FutralThe spack package manager bringing order tohpc software chaos In SC rsquo15 Proceedings ofthe International Conference for High PerformanceComputing Networking Storage and Analysis pages1ndash12 Nov 2015

[47] Kenneth Hoste Jens Timmerman Andy Georgesand Stijn Weirdt Easybuild Building software withease pages 572ndash582 11 2012

[48] I Jimenez M Sevilla N Watkins C MaltzahnJ Lofstead K Mohror A Arpaci-Dusseau andR Arpaci-Dusseau The popper convention Makingreproducible systems evaluation practical In2017 IEEE International Parallel and DistributedProcessing Symposium Workshops (IPDPSW) pages1561ndash1570 2017

[49] Gaurav Kaul Suneel Marthi and Grigori FursinScaling deep learning on AWS using C5 instanceswith MXNet TensorFlow and BigDL From theedge to the cloud httpsconferencesoreillycomartificial-intelligenceai-eu-2018

publicscheduledetail71549 2018

[50] Dirk Merkel Docker Lightweight linux containersfor consistent development and deployment LinuxJ 2014(239) March 2014

[51] Todd Mytkowicz Amer Diwan Matthias Hauswirthand Peter F Sweeney Producing wrong datawithout doing anything obviously wrong InProceedings of the 14th International Conference onArchitectural Support for Programming Languagesand Operating Systems pages 265ndash276 2009

[52] QuantumBlack Kedro the open source library forproduction-ready machine learning code https

githubcomquantumblacklabskedro 2019

[53] Vijay Janapa Reddi Christine Cheng David KanterPeter Mattson Guenther Schmuelling Carole-JeanWu Brian Anderson Maximilien Breughe MarkCharlebois William Chou Ramesh Chukka CodyColeman Sam Davis Pan Deng Greg DiamosJared Duke Dave Fick J Scott Gardner ItayHubara Sachin Idgunji Thomas B Jablin JeffJiao Tom St John Pankaj Kanwar David LeeJeffery Liao Anton Lokhmotov Francisco MassaPeng Meng Paulius Micikevicius Colin OsborneGennady Pekhimenko Arun Tejusve RaghunathRajan Dilip Sequeira Ashish Sirasao Fei SunHanlin Tang Michael Thomson Frank Wei EphremWu Lingjie Xu Koichi Yamada Bing Yu GeorgeYuan Aaron Zhong Peizhao Zhang and YuchenZhou MLPerf Inference Benchmark https

arxivorgabs191102549 2019

[54] Carsten Uphoff Sebastian Rettenberger MichaelBader Elizabeth H Madden Thomas UlrichStephanie Wollherr and Alice-Agnes GabrielExtreme scale multi-physics simulations of thetsunamigenic 2004 sumatra megathrust earthquakeIn Proceedings of the International Conference forHigh Performance Computing Networking Storageand Analysis SC rsquo17 New York NY USA 2017Association for Computing Machinery

[55] Mark D Wilkinson Michel Dumontier IJsbrand JanAalbersberg Gabrielle Appleton Myles AxtonArie Baak Niklas Blomberg Jan-Willem BoitenLuiz Bonino da Silva Santos Philip E Bourneet al The fair guiding principles for scientific datamanagement and stewardship Scientific data 32016

[56] Katherine Wolstencroft Robert Haines DonalFellows Alan Williams David Withers StuartOwen Stian Soiland-Reyes Ian Dunlop AleksandraNenadic Paul Fisher Jiten Bhagat KhalidBelhajjame Finn Bacall Alex Hardisty AbrahamNieva de la Hidalga Maria P Balcazar VargasShoaib Sufi and Carole Goble The Tavernaworkflow suite designing and executing workflowsof Web Services on the desktop web or in thecloud Nucleic Acids Research 41(W1)W557ndashW56105 2013

18

[57] Matei Zaharia Andrew Chen Aaron Davidson AliGhodsi Sue Ann Hong Andy Konwinski SiddharthMurching Tomas Nykodym Paul Ogilvie ManiParkhe Fen Xie and Corey Zumar Acceleratingthe Machine Learning Lifecycle with MLflow IEEEData Eng Bull 4139ndash45 2018

19

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 13: Grigori Fursin cTuning foundation and cKnowledge SAS

Develop new software

Publish new algorithms and models Publish new algorithms and models

Reported accuracy and other results

Design new hardware

Researchers using CK

Engineers using CKAdaptive CK container

Real data

Attempt to reproduce results

Increment

Practitioners using CK

Share portable Collective Knowledge solution to automatically build a system using the latest or stable software and hardware

Use standard data set

Autotune to balance all characteristics depending on use

case (cloud vs edge)

Real accuracyReal speedSizeCosts

Automatically rebuild system

Timeline

Figure 8 The CK concept helps to connect researchers and practitioners to co-design complex computational systemsusing DevOps principles while automatically adapting to continuously evolving software hardware models anddatasets The CK framework and public dashboards also help to unify automate and crowdsource the benchmarkingand auto-tuning process across diverse components from different vendors to automatically find the most efficientsystems on the Pareto frontier

As a result it is not easy to discuss and testAPIs add new components and assemble workflowsautomatically validate them across diverse platformsand environments and connect them with legacysystems

bull The concept of backward compatibility of CKAPIs and the lack of versioning similar to Javamade it challenging to keep stable and bug-freeworkflows in the real world any bug in a reusableCK component from one GitHub project couldeasily break dependent workflows in another GitHubproject

bull The CK command-line interface with the access to allautomation actions with numerous parameters wastoo low-level for researchers This is similar to thesituation with Git a powerful but quite complex andCLI-based tool that requires extra web services suchas GitHub and GitLab to make it more user-friendly

This feedback from CK users motivated me to startdeveloping cKnowledgeio (Figure 9) an open web-basedplatform with a GUI to aggregate version and test allCK components and portable workflows I also wantedto replace aging optimization repository at cTuningorgwith an extensible and modular platform to crowdsourceand reproduce tedious experiments such as benchmarkingand co-design of efficient systems for AI and ML acrossdiverse platforms and data provided by volunteers

The CK platform is inspired by GitHub and PyPII see it as a collaborative platform to share reusableautomation actions for repetitive research tasks andassemble portable workflows It also includes theopen-source CK client [16] that provides a commonAPI to initialise build run and validate differentresearch projects based on a simple JSON or YAMLmanifest This client is connected with live scoreboardson the CK platform to collaboratively reproduce andcompare the state-of-the-art research results duringartifact evaluation that we helped to organize at ML andSystems conferences [20]

My intention is to use the CK platform to complementand enhance MLPerf the ACM Digital LibraryPapersWithCodecom and existing reproducibilityinitiatives and artifact evaluation at ACM IEEE andNeurIPS conferences with the help of CK-poweredadaptive containers portable workflows reusablecomponents rdquoliverdquo papers and reproducible resultsvalidated by the community using realistic data acrossdiverse models software and hardware

7 CK demo automatingand customizing AIMLbenchmarking

I prepared a live and interactive demonstration ofthe CK solution that automates the MLPerf inference

13

CK JSON API

Object detection

Object classification

Speech recognitionhellip

Algorithms

Traininginferencehellip

CK JSON API

MobileNets

ResNet

VGGhellip

SqueezeDethellip

Models

CK JSON API

TensorFlow

Caffe

MXNethellip

PyTorchCaffe2hellip

Software

CK JSON API

ImageNet

KITTI

VOChellip

Real data setshellip

Data sets

CK JSON API

CPU

GPU

TPUhellip

NN acceleratorshellip

Hardware

CK solutions and adaptive containers with reusable components portable workflows common APIs and unified IO

Algorithms

Models

Datasets

Software Hardware

CK helps to run solutions on any tech stack

Initialize Build Run Validate

CK enables ldquoliverdquo research papers that can be validated by the community across diverse platforms models and datasets cKnowledgeioreproduced-resultsUnified

inputUnified output

MLAI papers

cKnowledgeio platform

Systems papers

CK solutions and adaptive containers can be shared along with research papers to make it easier to reproduce the results and try the algorithms with the latest or different components

cKnowledgeiosolutionscKnowledgeiocdocker

Results shared by volunteers

Original results from the authors

Figure 9 cKnowledgeio an open platform to organize scientific knowledge in the form of portable workflows reusablecomponents and reproducible research results It already contains many automation actions and components neededto co-design efficient and self-optimizing computational systems enable reproducible and live papers validated bythe community and keep track of the state-of-the-art research techniques that can be deployed in production

benchmark [53] connects it with the live CKdashboard (public optimization repository) [20] andhelps volunteers to crowdsource benchmarking and designspace exploration across diverse platforms similar to theCollective Tuning Initiative [41] and the SETIhomeproject cKnowledgeiotest

This demonstration shows how to use a unified CK APIto automatically build run and validate object detectionbased on SSD-Mobilenet TensorFlow and COCO datasetacross Raspberry Pi computers Android phones laptopsdesktops and data centers This solution is based on asimple JSON file describing the following tasks and theirdependencies on CK components

bull prepare a Python virtual environment (can beskipped for the native installation)

bull download and install the Coco dataset (50 or 5000images)

bull detect C++ compilers or Python interpreters neededfor object detection

bull install Tensorflow framework with a specified versionfor a given target machine

bull download and install the SSD-MobileNet modelcompatible with selected Tensorflow

bull manage installation of all other dependencies andlibraries

bull compile object detection for a given machine andprepare prepost-processing scripts

This solution was published on the cKnowledgeioplatform using the open-source CK client [16] to helpusers participate in crowd-benchmarking on their ownmachines as follows

I n s t a l l the CK c l i e n t from PyPi us ing pip i n s t a l l cbench

Download and bu i ld the s o l u t i o n on a given machine ( example f o r Linux ) cb i n i t demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

Run the s o l u t i o n on a given machine cb benchmark demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

14

The users can then see the benchmarking results(speed latency accuracy and other exposedcharacteristics through the CK workflow) on thelive CK dashboard associated with this solutionand compare them against the official MLPerfresults or with the results shared by other userscKnowledgeiodemo-result

After validating this solution on a given platform theusers can also clone it and update the JSON descriptionto retarget this benchmark to other devices and operating

systems such as MacOS Windows Android phonesservers with CUDA-enabled GPUs and so on

The users have the possibility to integrate such MLsolutions with production systems with the help of unifiedCK APIs as demonstrated by connecting the above CKsolution for object detection with the webcam in anybrowser cKnowledgeiodemo-solution

Finally it is possible to use containers with CKrepositories workflows and common APIs as follows

docker run ctuning cbrainminusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows bin bash minusc rdquocb benchmark demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

docker run ctuning cbenchminusmlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux bin bash minusc rdquocb benchmark mlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux

Combining Docker and portable CK workflows enablesrdquoadaptiverdquo CK containers for MLPerf that can be easilycustomized rebuilt with different ML models datasetscompilers frameworks and tools encapsulated inside CKcomponents and deployed in production [2]

8 Conclusions and future work

My very first research project to prototype an analogneural network stalled in the late 1990s because it took mefar too long to build from scratch all the infrastructureto model and train Hopfield neural networks generatediverse datasets co-design and optimize software andhardware run and reproduce all experiments comparethem with other techniques from published papers anduse this technology in practice in a completely differentenvironment

In this article I explain why I have developedthe Collective Knowledge framework and how it canhelp to address the issues above by organizing allresearch projects as a database of reusable componentsportable workflows and reproducible experiments basedon FAIR principles (findable accessible interoperableand reusable) I also describe how the CK frameworkattempts to bring DevOps and rdquoSoftware 20rdquo principlesto scientific research and to help users share andreuse best practices automation actions and researchartifacts in a unified way alongside reproducible papersFinally I demonstrate how the CK concept helpsto complement unify and interconnect existing toolsplatforms and reproducibility initiatives with commonAPIs and extensible meta descriptions rather thanrewriting them or competing with them

I present several use cases of how CK helps to connectresearchers and practitioners to collaboratively designmore reliable reproducible and efficient computationalsystems for machine learning artificial intelligence andother emerging workloads that can automatically adapt

to continuously evolving software hardware models anddatasets I also describe the httpscKnowledgeioplatform that I have developed to organize knowledgeparticularly about AI ML systems and other innovativetechnology in the form of portable CK workflowsautomation actions reusable artifacts and reproducibleresults from research papers My goal is to help thecommunity to find useful methods from research papersquickly build them on any tech stack integrate them withnew or legacy systems start using them in the real worldwith real data and combine Collective Knowledge and AIto build better AI systems

Finally I demonstrate the concept of rdquoliverdquo researchpapers connected with portable CK workflows and onlineCK dashboards to let the community automaticallyvalidate and update experimental results even after theproject detect and share unexpected behaviour andfix problems collaboratively [45] I believe that such acollaborative approach can make computational researchmore reproducible portable sustainable explainable andtrustable

However CK is still a proof-of-concept and thereremains a lot to simplify and improve Future workwill intend to make CK more user-friendly and tosimplify the onboarding process as well as to standardiseall APIs and JSON meta descriptions It will alsofocus on development of a simple GUI to create andshare automation actions and CK components assembleportable workflows run experiments compare researchtechniques generate adaptive containers and participatein lifelong AI ML and systems optimization

My long-term goal is to use CK to develop avirtual playground an optimization repository and amarketplace where researchers and practitioners assembleAI ML and other novel applications similar to livespecies that continue to evolve self-optimize and competewith each other across diverse tech stacks from differentvendors and users The winning solutions with the best

15

trade-offs between speed latency accuracy energy sizecosts and other metrics can then be selected at any timefrom the Pareto frontier based on user requirements andconstraints Such solutions can be immediately deployedin production on any platform from data centers to theedge in the most efficient way thus accelerating AI MLand systems innovation and digital transformation

Acknowledgements

I thank Sam Ainsworth Erik Altman Lorena BarbaVictor Bittorf Unmesh D Bordoloi Steve BrierleyLuis Ceze Milind Chabbi Bruce Childers NikolayChunosov Marco Cianfriglia Albert Cohen CodyColeman Chris Cummins Jack Davidson AlastairDonaldson Achi Dosanjh Thibaut Dumontet DebojyotiDutta Daniil Efremov Nicolas Essayan Todd GamblinLeo Gordon Wayne Graves Christophe Guillon HerveGuillou Stephen Herbein Michael Heroux PatrickHesse James Hetherignton Kenneth Hoste RobertHundt Ivo Jimenez Tom St John Timothy M JonesDavid Kanter Yuriy Kashnikov Gaurav Kaul SergeyKolesnikov Shriram Krishnamurthi Dan Laney AndreiLascu Hugh Leather Wei Li Anton Lokhmotov PeterMattson Thierry Moreau Dewey Murdick Luigi NardiCedric Nugteren Michael OrsquoBoyle Ivan Ospiov BhaveshPatel Gennady Pekhimenko Massimiliano Picone EdPlowman Ramesh Radhakrishnan Ilya Rahkovsky VijayJanapa Reddi Vincent Rehm Alka Roy ShubhadeepRoychowdhury Dmitry Savenko Aaron Smith JimSpohrer Michel Steuwer Victoria Stodden RobertStojnic Michela Taufer Stuart Taylor Olivier TemamEben Upton Nicolas Vasilache Flavio Vella Davide DelVento Boris Veytsman Alex Wade Pete Warden DaveWilkinson Matei Zaharia Alex Zhigarev and other greatcolleagues for interesting discussions practical use casesand useful feedback

References

[1] ACM Pilot demo rdquoCollective Knowledge packagingand sharingrdquo httpsyoutubeDIkZxraTmGM

[2] Adaptive CK containers with a unified CK API tocustomize rebuild and adapt them to any platformand environment as well as to integrate them withexternal tools and data httpscKnowledgeio

cdocker

[3] Amazon SageMaker fully managed service to buildtrain and deploy machine learning models quicklyhttpsawsamazoncomsagemaker

[4] Artifact Appendix and reproducibility checklist tounify and automate the validation of results atsystems and machine learning conferences https

cTuningorgaechecklisthtml

[5] Artifact Evaluation reproducing results frompublished papers at machine learning and systemsconferences httpscTuningorgae

[6] Automation actions with a unified API andJSON IO to share best practices and introduceDevOps principles to scientific research https

cKnowledgeioactions

[7] CK-compatible GitHub GitLab and BitBucketrepositories with reusable components automationactions and workflows httpscKnowledgeio

repos

[8] CK dashboard with results from CK-poweredautomated and multi-objective design spaceexploration of image classification across differentmodels software and hardware https

cknowledgeioml-optimization-repository

[9] CK GitHub an open-source and cross-platformframework to organize software projects as adatabase of reusable components with commonautomation actions unified APIs and extensiblemeta descriptions based on FAIR principles(findability accessibility interoperability andreusability) httpsgithubcomctuningck

[10] CK repository with CK workflows andcomponents to reproduce the MILEPOSTproject httpsgithubcomctuning

reproduce-milepost-project

[11] CK use case from General Motors CollaborativelyBenchmarking and Optimizing Deep LearningImplementations httpsyoutube

1ldgVZ64hEI

[12] Collective Knowledge real-world use cases https

cKnowledgeorgpartners

[13] Collective Knowledge Repository for the QuantumInformation Software Kit (QISKit) https

githubcomctuningck-qiskit

[14] Collective Knowledge Repository to automatethe installation execution customization andvalidation of the SeisSol application from theSC18 Student Cluster Competition ReproducibilityChallenge across different platforms environmentsand datasets httpsgithubcomctuning

ck-scc18wiki

[15] Collective Knowledge workflow to prepare digitalartifacts for the Student Cluster CompetitionReproducibility Challenge httpsgithubcom

ctuningck-scc

[16] Cross-platform CK client to unify preparationexecution and validation of research techniquesshared along with research papers https

githubcomctuningcbench

16

[17] DAWNBench an end-to-end deep learningbenchmark and competition httpsdawn

csstanfordedubenchmark

[18] Imagenet challenge (ILSVRC) Imagenet large scalevisual recognition challenge where software programscompete to correctly classify and detect objects andscenes httpwwwimage-netorg

[19] Kaggle platform for predictive modelling andanalytics competitions httpswwwkagglecom

[20] Live scoreboards with reproduced results fromresearch papers at machine learning andsystems conferences httpscKnowledgeio

reproduced-results

[21] LPIRC low-power image recognition challengehttpsrebootingcomputingieeeorglpirc

[22] MLPerf a broad ML benchmark suite for measuringperformance of ML software frameworks MLhardware accelerators and ML cloud platformshttpsmlperforg

[23] MLPerf Inference Benchmark results https

mlperforginference-results

[24] PapersWithCodecom provides a free and openresource with Machine Learning papers code andevaluation tables httpspaperswithcodecom

[25] Portable and reusable CK workflows https

cKnowledgeioprograms

[26] Quantum Collective Knowledge and reproduciblequantum hackathons keeping track of thestate-of-the-art in quantum computing usingportable CK workflows live CK dashboards andreproducible results httpscknowledgeioc

eventreproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based onthe cTuning evaluation methodology https

cKnowledgeioreproduced-papers

[28] Reproduced papers with CK workflows duringartifact evaluation at ML and Systemsconferences httpscknowledgeioq=

reproduced-papersANDportable-workflow-ck

[29] ReQuEST: open tournaments on collaborative, reproducible and Pareto-efficient software/hardware co-design of deep learning and other emerging workloads. https://cknowledge.io/c/event/request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules. https://cKnowledge.io/modules

[31] Shared CK meta packages (code, data, models) with automated installation across diverse platforms. https://cKnowledge.io/packages

[32] Shared CK plugins to detect software (models, frameworks, data sets, scripts) on a user machine. https://cKnowledge.io/soft

[33] The Collective Knowledge platform: organizing knowledge about artificial intelligence, machine learning, computational systems and other innovative technology in the form of portable workflows, automation actions and reusable artifacts from reproduced papers. https://cKnowledge.io

[34] The Machine Learning Toolkit for Kubernetes. https://www.kubeflow.org

[35] The public database of image misclassifications during collaborative benchmarking of ML models across Android devices using the CK framework. http://cknowledge.org/repo/web.php?wcid=model.image.classification:collective_training_set

[36] Peter Amstutz, Michael R. Crusoe, Nebojsa Tijanic, Brad Chapman, John Chilton, Michael Heuer, Andrey Kartashov, Dan Leehr, Herve Menager, Maya Nedeljkovich, Matt Scales, Stian Soiland-Reyes, and Luka Stojanovic. Common Workflow Language, v1.0. https://doi.org/10.6084/m9.figshare.3115156.v2, July 2016.

[37] T. Ben-Nun, M. Besta, S. Huber, A. N. Ziogas, D. Peter, and T. Hoefler. A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning. In the 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS'19). IEEE, May 2019.

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning. 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-Stack Profiling and Characterization of Machine Learning Models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact Evaluation for Publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29–35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad, and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering (TRUST '14), New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296–327, 2011. DOI: 10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–12, November 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn Weirdt. EasyBuild: building software with ease, pages 572–582, November 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561–1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265–276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf Inference Benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17), New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557–W561, May 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the Machine Learning Lifecycle with MLflow. IEEE Data Engineering Bulletin, 41(4):39–45, 2018.

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 15: Grigori Fursin cTuning foundation and cKnowledge SAS

The users can then see the benchmarking results(speed latency accuracy and other exposedcharacteristics through the CK workflow) on thelive CK dashboard associated with this solutionand compare them against the official MLPerfresults or with the results shared by other userscKnowledgeiodemo-result

After validating this solution on a given platform theusers can also clone it and update the JSON descriptionto retarget this benchmark to other devices and operating

systems such as MacOS Windows Android phonesservers with CUDA-enabled GPUs and so on

The users have the possibility to integrate such MLsolutions with production systems with the help of unifiedCK APIs as demonstrated by connecting the above CKsolution for object detection with the webcam in anybrowser cKnowledgeiodemo-solution

Finally it is possible to use containers with CKrepositories workflows and common APIs as follows

docker run ctuning cbrainminusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows bin bash minusc rdquocb benchmark demominusobjminusdetec t i onminuscocominust fminuscpuminusbenchmarkminusl inuxminusportab leminusworkf lows

docker run ctuning cbenchminusmlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux bin bash minusc rdquocb benchmark mlperfminusi n f e r en c eminusv05minus detec t i onminusopenvinominusssdminusmobilenetminuscocominus500minus l i nux

Combining Docker and portable CK workflows enablesrdquoadaptiverdquo CK containers for MLPerf that can be easilycustomized rebuilt with different ML models datasetscompilers frameworks and tools encapsulated inside CKcomponents and deployed in production [2]

8 Conclusions and future work

My very first research project to prototype an analogneural network stalled in the late 1990s because it took mefar too long to build from scratch all the infrastructureto model and train Hopfield neural networks generatediverse datasets co-design and optimize software andhardware run and reproduce all experiments comparethem with other techniques from published papers anduse this technology in practice in a completely differentenvironment

In this article I explain why I have developedthe Collective Knowledge framework and how it canhelp to address the issues above by organizing allresearch projects as a database of reusable componentsportable workflows and reproducible experiments basedon FAIR principles (findable accessible interoperableand reusable) I also describe how the CK frameworkattempts to bring DevOps and rdquoSoftware 20rdquo principlesto scientific research and to help users share andreuse best practices automation actions and researchartifacts in a unified way alongside reproducible papersFinally I demonstrate how the CK concept helpsto complement unify and interconnect existing toolsplatforms and reproducibility initiatives with commonAPIs and extensible meta descriptions rather thanrewriting them or competing with them

I present several use cases of how CK helps to connectresearchers and practitioners to collaboratively designmore reliable reproducible and efficient computationalsystems for machine learning artificial intelligence andother emerging workloads that can automatically adapt

to continuously evolving software hardware models anddatasets I also describe the httpscKnowledgeioplatform that I have developed to organize knowledgeparticularly about AI ML systems and other innovativetechnology in the form of portable CK workflowsautomation actions reusable artifacts and reproducibleresults from research papers My goal is to help thecommunity to find useful methods from research papersquickly build them on any tech stack integrate them withnew or legacy systems start using them in the real worldwith real data and combine Collective Knowledge and AIto build better AI systems

Finally I demonstrate the concept of rdquoliverdquo researchpapers connected with portable CK workflows and onlineCK dashboards to let the community automaticallyvalidate and update experimental results even after theproject detect and share unexpected behaviour andfix problems collaboratively [45] I believe that such acollaborative approach can make computational researchmore reproducible portable sustainable explainable andtrustable

However CK is still a proof-of-concept and thereremains a lot to simplify and improve Future workwill intend to make CK more user-friendly and tosimplify the onboarding process as well as to standardiseall APIs and JSON meta descriptions It will alsofocus on development of a simple GUI to create andshare automation actions and CK components assembleportable workflows run experiments compare researchtechniques generate adaptive containers and participatein lifelong AI ML and systems optimization

My long-term goal is to use CK to develop avirtual playground an optimization repository and amarketplace where researchers and practitioners assembleAI ML and other novel applications similar to livespecies that continue to evolve self-optimize and competewith each other across diverse tech stacks from differentvendors and users The winning solutions with the best

15

trade-offs between speed latency accuracy energy sizecosts and other metrics can then be selected at any timefrom the Pareto frontier based on user requirements andconstraints Such solutions can be immediately deployedin production on any platform from data centers to theedge in the most efficient way thus accelerating AI MLand systems innovation and digital transformation

Acknowledgements

I thank Sam Ainsworth Erik Altman Lorena BarbaVictor Bittorf Unmesh D Bordoloi Steve BrierleyLuis Ceze Milind Chabbi Bruce Childers NikolayChunosov Marco Cianfriglia Albert Cohen CodyColeman Chris Cummins Jack Davidson AlastairDonaldson Achi Dosanjh Thibaut Dumontet DebojyotiDutta Daniil Efremov Nicolas Essayan Todd GamblinLeo Gordon Wayne Graves Christophe Guillon HerveGuillou Stephen Herbein Michael Heroux PatrickHesse James Hetherignton Kenneth Hoste RobertHundt Ivo Jimenez Tom St John Timothy M JonesDavid Kanter Yuriy Kashnikov Gaurav Kaul SergeyKolesnikov Shriram Krishnamurthi Dan Laney AndreiLascu Hugh Leather Wei Li Anton Lokhmotov PeterMattson Thierry Moreau Dewey Murdick Luigi NardiCedric Nugteren Michael OrsquoBoyle Ivan Ospiov BhaveshPatel Gennady Pekhimenko Massimiliano Picone EdPlowman Ramesh Radhakrishnan Ilya Rahkovsky VijayJanapa Reddi Vincent Rehm Alka Roy ShubhadeepRoychowdhury Dmitry Savenko Aaron Smith JimSpohrer Michel Steuwer Victoria Stodden RobertStojnic Michela Taufer Stuart Taylor Olivier TemamEben Upton Nicolas Vasilache Flavio Vella Davide DelVento Boris Veytsman Alex Wade Pete Warden DaveWilkinson Matei Zaharia Alex Zhigarev and other greatcolleagues for interesting discussions practical use casesand useful feedback

References

[1] ACM Pilot demo rdquoCollective Knowledge packagingand sharingrdquo httpsyoutubeDIkZxraTmGM

[2] Adaptive CK containers with a unified CK API tocustomize rebuild and adapt them to any platformand environment as well as to integrate them withexternal tools and data httpscKnowledgeio

cdocker

[3] Amazon SageMaker fully managed service to buildtrain and deploy machine learning models quicklyhttpsawsamazoncomsagemaker

[4] Artifact Appendix and reproducibility checklist tounify and automate the validation of results atsystems and machine learning conferences https

cTuningorgaechecklisthtml

[5] Artifact Evaluation reproducing results frompublished papers at machine learning and systemsconferences httpscTuningorgae

[6] Automation actions with a unified API andJSON IO to share best practices and introduceDevOps principles to scientific research https

cKnowledgeioactions

[7] CK-compatible GitHub GitLab and BitBucketrepositories with reusable components automationactions and workflows httpscKnowledgeio

repos

[8] CK dashboard with results from CK-poweredautomated and multi-objective design spaceexploration of image classification across differentmodels software and hardware https

cknowledgeioml-optimization-repository

[9] CK GitHub an open-source and cross-platformframework to organize software projects as adatabase of reusable components with commonautomation actions unified APIs and extensiblemeta descriptions based on FAIR principles(findability accessibility interoperability andreusability) httpsgithubcomctuningck

[10] CK repository with CK workflows andcomponents to reproduce the MILEPOSTproject httpsgithubcomctuning

reproduce-milepost-project

[11] CK use case from General Motors CollaborativelyBenchmarking and Optimizing Deep LearningImplementations httpsyoutube

1ldgVZ64hEI

[12] Collective Knowledge real-world use cases https

cKnowledgeorgpartners

[13] Collective Knowledge Repository for the QuantumInformation Software Kit (QISKit) https

githubcomctuningck-qiskit

[14] Collective Knowledge Repository to automatethe installation execution customization andvalidation of the SeisSol application from theSC18 Student Cluster Competition ReproducibilityChallenge across different platforms environmentsand datasets httpsgithubcomctuning

ck-scc18wiki

[15] Collective Knowledge workflow to prepare digitalartifacts for the Student Cluster CompetitionReproducibility Challenge httpsgithubcom

ctuningck-scc

[16] Cross-platform CK client to unify preparationexecution and validation of research techniquesshared along with research papers https

githubcomctuningcbench

16

[17] DAWNBench an end-to-end deep learningbenchmark and competition httpsdawn

csstanfordedubenchmark

[18] Imagenet challenge (ILSVRC) Imagenet large scalevisual recognition challenge where software programscompete to correctly classify and detect objects andscenes httpwwwimage-netorg

[19] Kaggle platform for predictive modelling andanalytics competitions httpswwwkagglecom

[20] Live scoreboards with reproduced results fromresearch papers at machine learning andsystems conferences httpscKnowledgeio

reproduced-results

[21] LPIRC low-power image recognition challengehttpsrebootingcomputingieeeorglpirc

[22] MLPerf a broad ML benchmark suite for measuringperformance of ML software frameworks MLhardware accelerators and ML cloud platformshttpsmlperforg

[23] MLPerf Inference Benchmark results https

mlperforginference-results

[24] PapersWithCodecom provides a free and openresource with Machine Learning papers code andevaluation tables httpspaperswithcodecom

[25] Portable and reusable CK workflows https

cKnowledgeioprograms

[26] Quantum Collective Knowledge and reproduciblequantum hackathons keeping track of thestate-of-the-art in quantum computing usingportable CK workflows live CK dashboards andreproducible results httpscknowledgeioc

eventreproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based onthe cTuning evaluation methodology https

cKnowledgeioreproduced-papers

[28] Reproduced papers with CK workflows duringartifact evaluation at ML and Systemsconferences httpscknowledgeioq=

reproduced-papersANDportable-workflow-ck

[29] ReQuEST open tournaments on collaborativereproducible and Pareto-efficient softwarehardwareco-design of deep learning and other emergingworkloads httpscknowledgeiocevent

request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules https

cKnowledgeiomodules

[31] Shared CK meta packages (code data models)with automated installation across diverse platformshttpscKnowledgeiopackages

[32] Shared CK plugins to detect software (modelsframeworks data sets scripts) on a user machinehttpscKnowledgeiosoft

[33] The Collective Knowledge platform organizingknowledge about artificial intelligence machinelearning computational systems and otherinnovative technology in the form of portableworkflows automation actions and reusableartifacts from reproduced papers https

cKnowledgeio

[34] The Machine Learning Toolkit for Kuberneteshttpswwwkubefloworg

[35] The public database of image misclassificationsduring collaborative benchmarking of MLmodels across Android devices using the CKframework httpcknowledgeorgrepo

webphpwcid=modelimageclassification

collective_training_set

[36] Peter Amstutz Michael R Crusoe NebojsaTijanic Brad Chapman John Chilton MichaelHeuer Andrey Kartashov Dan Leehr HerveMenager Maya Nedeljkovich Matt Scales StianSoiland-Reyes and Luka Stojanovic CommonWorkflow Language v10 httpsdoiorg10

6084m9figshare3115156v2 7 2016

[37] T Ben-Nun M Besta S Huber A NZiogas D Peter and T Hoefler A ModularBenchmarking Infrastructure for High-Performanceand Reproducible Deep Learning IEEE May 2019The 33rd IEEE International Parallel amp DistributedProcessing Symposium (IPDPSrsquo19)

[38] Luis Ceze Ceze Natalie Enright Jerger BabakFalsafi Grigori Fursin Anton Lokhmotov ThierryMoreau Adrian Sampson and Phillip StanleyMarbell ACM ReQuESTrsquo18 Proceedings ofthe 1st on Reproducible Quality-Efficient SystemsTournament on Co-Designing Pareto-Efficient DeepLearning 2018

[39] Cheng Li and Abdul Dakkak and Jinjun Xiongand Wei Wei and Lingjie Xu and Wen-meiHwu Across-Stack Profiling and Characterizationof Machine Learning Models on GPUs https

arxivorgabs190806869 2019

[40] Bruce R Childers Grigori Fursin ShriramKrishnamurthi and Andreas Zeller ArtifactEvaluation for Publications (Dagstuhl PerspectivesWorkshop 15452) Dagstuhl Reports 5(11)29ndash352016

[41] Grigori Fursin Collective Tuning Initiativeautomating and accelerating development andoptimization of computing systems httpshal

inriafrinria-00436029v2 June 2009

17

[42] Grigori Fursin Enabling reproducible ML andSystems research the good the bad and the uglyhttpsdoiorg105281zenodo4005773 2020

[43] Grigori Fursin and Christophe DubachCommunity-driven reviewing and validation ofpublications In Proceedings of the 1st ACMSIGPLAN Workshop on Reproducible ResearchMethodologies and New Publication Models inComputer Engineering TRUST rsquo14 New York NYUSA 2014 Association for Computing Machinery

[44] Grigori Fursin Yuriy Kashnikov Abdul WahidMemon Zbigniew Chamski Olivier Temam MirceaNamolaru Elad Yom-Tov Bilha MendelsonAyal Zaks Eric Courtois Francois Bodin PhilBarnard Elton Ashton Edwin Bonilla JohnThomson Christopher Williams and MichaelF P OrsquoBoyle Milepost gcc Machine learningenabled self-tuning compiler InternationalJournal of Parallel Programming 39296ndash3272011 101007s10766-010-0161-2

[45] Grigori Fursin Anton Lokhmotov DmitrySavenko and Eben Upton A CollectiveKnowledge workflow for collaborative researchinto multi-objective autotuning and machinelearning techniques httpscknowledgeio

reportrpi3-crowd-tuning-2017-interactiveJanuary 2018

[46] T Gamblin M LeGendre M R Collette G LLee A Moody B R de Supinski and S FutralThe spack package manager bringing order tohpc software chaos In SC rsquo15 Proceedings ofthe International Conference for High PerformanceComputing Networking Storage and Analysis pages1ndash12 Nov 2015

[47] Kenneth Hoste Jens Timmerman Andy Georgesand Stijn Weirdt Easybuild Building software withease pages 572ndash582 11 2012

[48] I Jimenez M Sevilla N Watkins C MaltzahnJ Lofstead K Mohror A Arpaci-Dusseau andR Arpaci-Dusseau The popper convention Makingreproducible systems evaluation practical In2017 IEEE International Parallel and DistributedProcessing Symposium Workshops (IPDPSW) pages1561ndash1570 2017

[49] Gaurav Kaul Suneel Marthi and Grigori FursinScaling deep learning on AWS using C5 instanceswith MXNet TensorFlow and BigDL From theedge to the cloud httpsconferencesoreillycomartificial-intelligenceai-eu-2018

publicscheduledetail71549 2018

[50] Dirk Merkel Docker Lightweight linux containersfor consistent development and deployment LinuxJ 2014(239) March 2014

[51] Todd Mytkowicz Amer Diwan Matthias Hauswirthand Peter F Sweeney Producing wrong datawithout doing anything obviously wrong InProceedings of the 14th International Conference onArchitectural Support for Programming Languagesand Operating Systems pages 265ndash276 2009

[52] QuantumBlack Kedro the open source library forproduction-ready machine learning code https

githubcomquantumblacklabskedro 2019

[53] Vijay Janapa Reddi Christine Cheng David KanterPeter Mattson Guenther Schmuelling Carole-JeanWu Brian Anderson Maximilien Breughe MarkCharlebois William Chou Ramesh Chukka CodyColeman Sam Davis Pan Deng Greg DiamosJared Duke Dave Fick J Scott Gardner ItayHubara Sachin Idgunji Thomas B Jablin JeffJiao Tom St John Pankaj Kanwar David LeeJeffery Liao Anton Lokhmotov Francisco MassaPeng Meng Paulius Micikevicius Colin OsborneGennady Pekhimenko Arun Tejusve RaghunathRajan Dilip Sequeira Ashish Sirasao Fei SunHanlin Tang Michael Thomson Frank Wei EphremWu Lingjie Xu Koichi Yamada Bing Yu GeorgeYuan Aaron Zhong Peizhao Zhang and YuchenZhou MLPerf Inference Benchmark https

arxivorgabs191102549 2019

[54] Carsten Uphoff Sebastian Rettenberger MichaelBader Elizabeth H Madden Thomas UlrichStephanie Wollherr and Alice-Agnes GabrielExtreme scale multi-physics simulations of thetsunamigenic 2004 sumatra megathrust earthquakeIn Proceedings of the International Conference forHigh Performance Computing Networking Storageand Analysis SC rsquo17 New York NY USA 2017Association for Computing Machinery

[55] Mark D Wilkinson Michel Dumontier IJsbrand JanAalbersberg Gabrielle Appleton Myles AxtonArie Baak Niklas Blomberg Jan-Willem BoitenLuiz Bonino da Silva Santos Philip E Bourneet al The fair guiding principles for scientific datamanagement and stewardship Scientific data 32016

[56] Katherine Wolstencroft Robert Haines DonalFellows Alan Williams David Withers StuartOwen Stian Soiland-Reyes Ian Dunlop AleksandraNenadic Paul Fisher Jiten Bhagat KhalidBelhajjame Finn Bacall Alex Hardisty AbrahamNieva de la Hidalga Maria P Balcazar VargasShoaib Sufi and Carole Goble The Tavernaworkflow suite designing and executing workflowsof Web Services on the desktop web or in thecloud Nucleic Acids Research 41(W1)W557ndashW56105 2013

18

[57] Matei Zaharia Andrew Chen Aaron Davidson AliGhodsi Sue Ann Hong Andy Konwinski SiddharthMurching Tomas Nykodym Paul Ogilvie ManiParkhe Fen Xie and Corey Zumar Acceleratingthe Machine Learning Lifecycle with MLflow IEEEData Eng Bull 4139ndash45 2018

19

  • 1 Motivation
  • 2 Collective Knowledge concept
  • 3 CK framework and an open CK format
  • 4 CK components and workflows to automate machine learning and systems research
  • 5 CK use cases
    • 51 Unifying benchmarking auto-tuning and machine learning
    • 52 Bridging the growing gap between education research and practice
    • 53 Co-designing efficient software and hardware for AI ML and other emerging workloads
    • 54 Automating MLPerf and enabling portable MLOps
    • 55 Enabling reproducible papers with portable workflows and reusable artifacts
    • 56 Connecting researchers and practitioners to co-design efficient computational systems
      • 6 CK platform
      • 7 CK demo automating and customizing AIML benchmarking
      • 8 Conclusions and future work
Page 16: Grigori Fursin cTuning foundation and cKnowledge SAS

trade-offs between speed latency accuracy energy sizecosts and other metrics can then be selected at any timefrom the Pareto frontier based on user requirements andconstraints Such solutions can be immediately deployedin production on any platform from data centers to theedge in the most efficient way thus accelerating AI MLand systems innovation and digital transformation

Acknowledgements

I thank Sam Ainsworth Erik Altman Lorena BarbaVictor Bittorf Unmesh D Bordoloi Steve BrierleyLuis Ceze Milind Chabbi Bruce Childers NikolayChunosov Marco Cianfriglia Albert Cohen CodyColeman Chris Cummins Jack Davidson AlastairDonaldson Achi Dosanjh Thibaut Dumontet DebojyotiDutta Daniil Efremov Nicolas Essayan Todd GamblinLeo Gordon Wayne Graves Christophe Guillon HerveGuillou Stephen Herbein Michael Heroux PatrickHesse James Hetherignton Kenneth Hoste RobertHundt Ivo Jimenez Tom St John Timothy M JonesDavid Kanter Yuriy Kashnikov Gaurav Kaul SergeyKolesnikov Shriram Krishnamurthi Dan Laney AndreiLascu Hugh Leather Wei Li Anton Lokhmotov PeterMattson Thierry Moreau Dewey Murdick Luigi NardiCedric Nugteren Michael OrsquoBoyle Ivan Ospiov BhaveshPatel Gennady Pekhimenko Massimiliano Picone EdPlowman Ramesh Radhakrishnan Ilya Rahkovsky VijayJanapa Reddi Vincent Rehm Alka Roy ShubhadeepRoychowdhury Dmitry Savenko Aaron Smith JimSpohrer Michel Steuwer Victoria Stodden RobertStojnic Michela Taufer Stuart Taylor Olivier TemamEben Upton Nicolas Vasilache Flavio Vella Davide DelVento Boris Veytsman Alex Wade Pete Warden DaveWilkinson Matei Zaharia Alex Zhigarev and other greatcolleagues for interesting discussions practical use casesand useful feedback

References

[1] ACM Pilot demo rdquoCollective Knowledge packagingand sharingrdquo httpsyoutubeDIkZxraTmGM

[2] Adaptive CK containers with a unified CK API tocustomize rebuild and adapt them to any platformand environment as well as to integrate them withexternal tools and data httpscKnowledgeio

cdocker

[3] Amazon SageMaker fully managed service to buildtrain and deploy machine learning models quicklyhttpsawsamazoncomsagemaker

[4] Artifact Appendix and reproducibility checklist tounify and automate the validation of results atsystems and machine learning conferences https

cTuningorgaechecklisthtml

[5] Artifact Evaluation reproducing results frompublished papers at machine learning and systemsconferences httpscTuningorgae

[6] Automation actions with a unified API andJSON IO to share best practices and introduceDevOps principles to scientific research https

cKnowledgeioactions

[7] CK-compatible GitHub GitLab and BitBucketrepositories with reusable components automationactions and workflows httpscKnowledgeio

repos

[8] CK dashboard with results from CK-poweredautomated and multi-objective design spaceexploration of image classification across differentmodels software and hardware https

cknowledgeioml-optimization-repository

[9] CK GitHub an open-source and cross-platformframework to organize software projects as adatabase of reusable components with commonautomation actions unified APIs and extensiblemeta descriptions based on FAIR principles(findability accessibility interoperability andreusability) httpsgithubcomctuningck

[10] CK repository with CK workflows andcomponents to reproduce the MILEPOSTproject httpsgithubcomctuning

reproduce-milepost-project

[11] CK use case from General Motors CollaborativelyBenchmarking and Optimizing Deep LearningImplementations httpsyoutube

1ldgVZ64hEI

[12] Collective Knowledge real-world use cases https

cKnowledgeorgpartners

[13] Collective Knowledge Repository for the QuantumInformation Software Kit (QISKit) https

githubcomctuningck-qiskit

[14] Collective Knowledge Repository to automatethe installation execution customization andvalidation of the SeisSol application from theSC18 Student Cluster Competition ReproducibilityChallenge across different platforms environmentsand datasets httpsgithubcomctuning

ck-scc18wiki

[15] Collective Knowledge workflow to prepare digitalartifacts for the Student Cluster CompetitionReproducibility Challenge httpsgithubcom

ctuningck-scc

[16] Cross-platform CK client to unify preparationexecution and validation of research techniquesshared along with research papers https

githubcomctuningcbench

16

[17] DAWNBench an end-to-end deep learningbenchmark and competition httpsdawn

csstanfordedubenchmark

[18] Imagenet challenge (ILSVRC) Imagenet large scalevisual recognition challenge where software programscompete to correctly classify and detect objects andscenes httpwwwimage-netorg

[19] Kaggle platform for predictive modelling andanalytics competitions httpswwwkagglecom

[20] Live scoreboards with reproduced results fromresearch papers at machine learning andsystems conferences httpscKnowledgeio

reproduced-results

[21] LPIRC low-power image recognition challengehttpsrebootingcomputingieeeorglpirc

[22] MLPerf a broad ML benchmark suite for measuringperformance of ML software frameworks MLhardware accelerators and ML cloud platformshttpsmlperforg

[23] MLPerf Inference Benchmark results https

mlperforginference-results

[24] PapersWithCodecom provides a free and openresource with Machine Learning papers code andevaluation tables httpspaperswithcodecom

[25] Portable and reusable CK workflows https

cKnowledgeioprograms

[26] Quantum Collective Knowledge and reproduciblequantum hackathons keeping track of thestate-of-the-art in quantum computing usingportable CK workflows live CK dashboards andreproducible results httpscknowledgeioc

eventreproducible-quantum-hackathons

[27] Reproduced papers with ACM badges based onthe cTuning evaluation methodology https

cKnowledgeioreproduced-papers

[28] Reproduced papers with CK workflows duringartifact evaluation at ML and Systemsconferences httpscknowledgeioq=

reproduced-papersANDportable-workflow-ck

[29] ReQuEST open tournaments on collaborativereproducible and Pareto-efficient softwarehardwareco-design of deep learning and other emergingworkloads httpscknowledgeiocevent

request-reproducible-benchmarking-tournament

[30] Shared and versioned CK modules https

cKnowledgeiomodules

[31] Shared CK meta packages (code data models)with automated installation across diverse platformshttpscKnowledgeiopackages

[32] Shared CK plugins to detect software (modelsframeworks data sets scripts) on a user machinehttpscKnowledgeiosoft

[33] The Collective Knowledge platform organizingknowledge about artificial intelligence machinelearning computational systems and otherinnovative technology in the form of portableworkflows automation actions and reusableartifacts from reproduced papers https

cKnowledgeio

[34] The Machine Learning Toolkit for Kuberneteshttpswwwkubefloworg

[35] The public database of image misclassificationsduring collaborative benchmarking of MLmodels across Android devices using the CKframework httpcknowledgeorgrepo

webphpwcid=modelimageclassification

collective_training_set

[36] Peter Amstutz Michael R Crusoe NebojsaTijanic Brad Chapman John Chilton MichaelHeuer Andrey Kartashov Dan Leehr HerveMenager Maya Nedeljkovich Matt Scales StianSoiland-Reyes and Luka Stojanovic CommonWorkflow Language v10 httpsdoiorg10

6084m9figshare3115156v2 7 2016

[37] T Ben-Nun M Besta S Huber A NZiogas D Peter and T Hoefler A ModularBenchmarking Infrastructure for High-Performanceand Reproducible Deep Learning IEEE May 2019The 33rd IEEE International Parallel amp DistributedProcessing Symposium (IPDPSrsquo19)

[38] Luis Ceze, Natalie Enright Jerger, Babak Falsafi, Grigori Fursin, Anton Lokhmotov, Thierry Moreau, Adrian Sampson, and Phillip Stanley-Marbell. ACM ReQuEST'18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-Designing Pareto-Efficient Deep Learning. 2018.

[39] Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu. Across-Stack Profiling and Characterization of Machine Learning Models on GPUs. https://arxiv.org/abs/1908.06869, 2019.

[40] Bruce R. Childers, Grigori Fursin, Shriram Krishnamurthi, and Andreas Zeller. Artifact Evaluation for Publications (Dagstuhl Perspectives Workshop 15452). Dagstuhl Reports, 5(11):29-35, 2016.

[41] Grigori Fursin. Collective Tuning Initiative: automating and accelerating development and optimization of computing systems. https://hal.inria.fr/inria-00436029v2, June 2009.

[42] Grigori Fursin. Enabling reproducible ML and systems research: the good, the bad, and the ugly. https://doi.org/10.5281/zenodo.4005773, 2020.

[43] Grigori Fursin and Christophe Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering, TRUST '14, New York, NY, USA, 2014. Association for Computing Machinery.

[44] Grigori Fursin, Yuriy Kashnikov, Abdul Wahid Memon, Zbigniew Chamski, Olivier Temam, Mircea Namolaru, Elad Yom-Tov, Bilha Mendelson, Ayal Zaks, Eric Courtois, Francois Bodin, Phil Barnard, Elton Ashton, Edwin Bonilla, John Thomson, Christopher Williams, and Michael F. P. O'Boyle. Milepost GCC: machine learning enabled self-tuning compiler. International Journal of Parallel Programming, 39:296-327, 2011. doi:10.1007/s10766-010-0161-2.

[45] Grigori Fursin, Anton Lokhmotov, Dmitry Savenko, and Eben Upton. A Collective Knowledge workflow for collaborative research into multi-objective autotuning and machine learning techniques. https://cknowledge.io/report/rpi3-crowd-tuning-2017-interactive, January 2018.

[46] T. Gamblin, M. LeGendre, M. R. Collette, G. L. Lee, A. Moody, B. R. de Supinski, and S. Futral. The Spack package manager: bringing order to HPC software chaos. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12, November 2015.

[47] Kenneth Hoste, Jens Timmerman, Andy Georges, and Stijn De Weirdt. EasyBuild: building software with ease. pages 572-582, November 2012.

[48] I. Jimenez, M. Sevilla, N. Watkins, C. Maltzahn, J. Lofstead, K. Mohror, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. The Popper convention: making reproducible systems evaluation practical. In 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1561-1570, 2017.

[49] Gaurav Kaul, Suneel Marthi, and Grigori Fursin. Scaling deep learning on AWS using C5 instances with MXNet, TensorFlow, and BigDL: from the edge to the cloud. https://conferences.oreilly.com/artificial-intelligence/ai-eu-2018/public/schedule/detail/71549, 2018.

[50] Dirk Merkel. Docker: lightweight Linux containers for consistent development and deployment. Linux J., 2014(239), March 2014.

[51] Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. Producing wrong data without doing anything obviously wrong! In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 265-276, 2009.

[52] QuantumBlack. Kedro: the open source library for production-ready machine learning code. https://github.com/quantumblacklabs/kedro, 2019.

[53] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, and Yuchen Zhou. MLPerf Inference Benchmark. https://arxiv.org/abs/1911.02549, 2019.

[54] Carsten Uphoff, Sebastian Rettenberger, Michael Bader, Elizabeth H. Madden, Thomas Ulrich, Stephanie Wollherr, and Alice-Agnes Gabriel. Extreme scale multi-physics simulations of the tsunamigenic 2004 Sumatra megathrust earthquake. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '17, New York, NY, USA, 2017. Association for Computing Machinery.

[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 2016.

[56] Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Research, 41(W1):W557-W561, May 2013.

[57] Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41:39-45, 2018.
