
Bachelor Informatica

Extending ManyMan with additional back-ends for big.LITTLE and Parallella

Floris Turkenburg

June 17, 2015

Supervisor(s): Roy Bakker (CSA, UvA)

Signed:

Informatica, Universiteit van Amsterdam

Abstract

As the demand for high performance, yet power efficient, processors still increases, new many-core systems and architectures are rapidly being developed. For research purposes, understanding the capabilities of these systems and comparing them is important, which generally requires tools for testing and monitoring. It is desirable to have one tool that can be used on all systems, instead of separate tools for every system. With this in mind, ManyMan, an interactive visualization and dynamic management tool for many-core systems, has been extended for use on a big.LITTLE system and the Parallella-16 system.

The ManyMan developed for the big.LITTLE has been found to be easily modifiable for use on regular Linux systems, such as laptops. The Epiphany chip on the Parallella does not allow easy monitoring or management; accordingly, there is still some room left for improvement in the Parallella ManyMan.

With this expansion, ManyMan has taken its first steps towards more global use in many-core systems.

Contents

1 Introduction
    1.1 Outline

2 Related work
    2.1 Gpfmon
    2.2 EnergyMonitor

3 Hardware
    3.1 big.LITTLE
        3.1.1 The board
        3.1.2 Power sensors
    3.2 Parallella
        3.2.1 The board
        3.2.2 The Epiphany chip

4 Software
    4.1 Kivy
    4.2 The big.LITTLE kernel
    4.3 Epiphany SDK

5 Implementation
    5.1 Back-end
        5.1.1 Monitoring
        5.1.2 Task creation
        5.1.3 Task interaction
        5.1.4 big.LITTLE specific back-end
        5.1.5 Parallella specific back-end
    5.2 Front-end
        5.2.1 The main view
        5.2.2 The detailed views

6 Evaluation

7 Conclusions

8 Future work

CHAPTER 1

Introduction

With the ever-continuing demand for more processing power, and the human desire to push computers to their limits, researchers have always been looking for ways to improve computers and their performance. One way of achieving increased performance is increasing the number of transistors on a chip, made possible by technological progress in decreasing the transistor size. Following Moore’s Law, which states that the number of transistors on a single chip doubles roughly every 18 to 24 months, this way of increasing computing performance has been the main trend for years. However, about ten years ago, researchers and manufacturers encountered a different barrier. The tiny transistors are harder to regulate in terms of power and are thus becoming less power-efficient. Increasing the clock frequency of the processor, which has also been a means to improve performance, likewise requires more power to be supplied. With more power comes more heat, which cannot be sufficiently drained from the dense chips because cooling technology does not improve fast enough [23].

In order to continue improving performance, researchers and chip manufacturers have turned to multi- and many-core systems. These systems contain many smaller cores on a chip instead of a small number of big cores. Besides the increase in performance, these systems also provide improvements in terms of power consumption. Different types of cores can be integrated on one chip to match the needs of different usage models, idle cores can be powered down to save power, and load can be balanced better to distribute heat across the chip, improving reliability and power leakage [18, 14].

With new many-core systems being developed frequently, it is important to be able to get an understanding of each system and how it works. However, despite all these new many-core systems being developed, there is a lack of tools available for users to visualize and monitor the many-core system. Therefore, ManyMan was created by Jimi van der Woning in 2012 [24]. ManyMan is a tool that offers interactive visualization and dynamic management of many-core systems, initially developed for the 48-core Intel Single-chip Cloud Computer (SCC) [19]. It provides the user with information such as CPU usage and memory usage on chip, core and process level. Not only does ManyMan combine all this different information into one tool, it also gives the user the ability to manage the many-core system, such as starting tasks on specified cores and scaling frequencies. These properties make ManyMan a tool that makes the underlying hardware more accessible to the user, and that is of great use in the process of understanding and testing the capabilities of the many-core system. To improve the user’s experience, ManyMan has been optimized for usage on a multi-touch device, while it can still be controlled using a mouse and keyboard.

In the field of research and education, it is often relevant to discover and compare the capabilities of multiple types of many-core systems or boards. Facilitated by the current many-core trend, more and more systems become available. If every board or system provides its own tool for managing and monitoring the many-core, researchers and students not only need to get an understanding of the many-core system, but also need to learn to work with the tool in order to do so. This costs time and effort better spent otherwise, and having to work with all kinds of different tools can be a nuisance. Therefore, having one general tool that can be used for different many-core systems is desired. One tool to manage them all, a ‘many many-core manager’. Setting out to develop such a tool, the goal of this project is to explore the possibilities of extending ManyMan to support additional many-core systems by developing ManyMan software for two many-core systems. The many-core systems in question are the big.LITTLE [16] and the Parallella [11], which are both available at the CSA group of the University of Amsterdam for this project.

For this project, the following items are of importance:

• How to retrieve the details and real time information from the many-core systems.

• Processing of this information for use in the visualization.

• Controlling the many-core system remotely through the front-end.

• Modification of the ManyMan front-end to properly represent the targeted many-core system.

This thesis describes the software that has been developed in order to extend ManyMan for use on the big.LITTLE and Parallella. The many-core systems are briefly described and future improvements of ManyMan are proposed. In the scope of this thesis, “the big.LITTLE” or “the big.LITTLE system” refers to the ODROID-XU3 board/system (see section 3.1) used for this project, unless specified differently.

1.1 Outline

In chapter 2, some related works are discussed. The hardware that is used for this project is described in chapter 3, followed by the relevant software tools in chapter 4. The back- and front-ends that have been made for the big.LITTLE and Parallella are described in chapter 5 and are evaluated in chapter 6. Finally, the conclusions of the project are discussed in chapter 7, ending with suggestions for future work in chapter 8.


CHAPTER 2

Related work

Despite the rapid development in many-core systems and architectures, monitoring, visualization and management tools for these systems are still scarce. Van der Woning already discussed several available tools, with their advantages and disadvantages, in [24]. The following sections add a few tools to the list.

2.1 Gpfmon

Gpfmon [5] is a graphical front-end to pfmon, a performance monitoring tool originally developed for the Linux 2.6 kernel by Hewlett-Packard and CERN. Gpfmon provides a convenient and user-friendly way to launch pfmon/perfmon2 [15] monitoring sessions, providing an advantage on top of pfmon to both less advanced users and advanced users requiring visualization capabilities. Not only does the tool relieve users from writing 250-character long command lines, it also provides visual aid in event selection, plots, and management and comparison of projects and monitoring sessions. With this tool, one can get a visualized representation of the collected information about the performance of a system or application, such as stall cycles, TLB misses and memory access latency, which is retrieved from the Performance Monitoring Unit (PMU) by perfmon2/pfmon. Gpfmon also supports remote monitoring sessions via SSH, lifting the burden of running the GUI from the monitored machine, and enabling the monitoring of machines on a less robust network, which for example might not support X-forwarding [20]. Some features of gpfmon are shown in figure 2.1.

However, the information that this tool provides is too extensive and too detailed for ManyMan purposes. Furthermore, it does not allow (easy) task management such as task migration, and simultaneous system-wide and per-task/thread monitoring is not possible.

Figure 2.1: Some of gpfmon’s views (from [5])


Figure 2.2: The ODROID-XU3 EnergyMonitor tool

2.2 EnergyMonitor

The EnergyMonitor is a tool provided by Hardkernel [6] to monitor the power consumption of the ODROID-XU3 board. This tool monitors the voltage (in volts), power (in watts) and current (in amperes) for the big cores (A15), the LITTLE cores (A7), the GPU and the DRAM separately (see figure 2.2). The values for the power consumption are obtained from the power sensors which are integrated on the board. Besides the power statistics, the tool also displays the current CPU/GPU frequencies per processor and the temperatures for the big cores and the GPU. This tool has been developed for easy access to the power statistics of the board, aiding developers and users in debugging power consumption on the ODROID-XU3 board. As this tool is limited to solely displaying the statistics and does not offer any interaction, it does not meet the requirements of ManyMan, and has not been used as is. It has, however, been a reference point when implementing the retrieval of power consumption in the ManyMan back-end for the big.LITTLE.


CHAPTER 3

Hardware

For this project, the existing ManyMan tool is extended to support use on two additional many-core architectures. These are the big.LITTLE [16] and the Parallella [11], which are discussed in sections 3.1 and 3.2 respectively.

3.1 big.LITTLE

In 2011, ARM announced their big.LITTLE technology, which served as one of their answers to the demand for more performance but also better power efficiency in mobile devices. The big.LITTLE architecture provides big processors for maximum compute performance paired with LITTLE processors for maximum power efficiency. The two types are fully coherent and have the same instruction set architecture (ISA). This allows the same instructions or program to be executed on both processor types in a consistent manner, facilitating easy task migration between the big and LITTLE cores.

3.1.1 The board

The specific board that is used for this project is the ODROID-XU3, displayed in figures 3.1 and 3.2. The main component of this board is the Samsung Exynos 5422 Application Processor. This System-on-Chip contains a Cortex-A15 2.0GHz quad core with 2MB L2-Cache (the big cores) and a Cortex-A7 1.4GHz quad core CPU with 512KB L2-Cache (the LITTLE cores). Also on this chip are the ARM Mali-T628 MP6 600MHz GPU and 2GB of LPDDR3 RAM, which runs at 933MHz. An overview is shown in figure 3.3.

The board includes 5 USB Host ports (4x USB2.0, 1x USB3.0), a 10/100 Ethernet port, MicroSD and eMMC connectors for storage and boot, and a Micro HDMI connector. A 5V/4A DC adapter must be used to power the board. The board came with a fan which is mounted over the Exynos 5422 for cooling and can be controlled through the PWM cooling fan connector.

Figure 3.1: The ODROID-XU3 board


Figure 3.2: The ODROID-XU3 board (top view)

Figure 3.3: ODROID-XU3 block diagram


3.1.2 Power sensors

Integrated on the board are four current and power monitors, placed on the I2C buses to the A15 quad core, A7 quad core, GPU and DRAM. These monitors are INA231 Current/Power monitors from Texas Instruments [7], requiring a power supply of 2.7V to 5.5V for operation. Each monitor can report the current (in amperes), power (in watts) and voltage (in volts) on buses that vary from 0V to 28V. Having a maximum Gain Error of 0.5%, the INA231 offers high accuracy in the measurements. The INA231 is specified to operate in temperatures ranging from -40°C to +125°C, which comfortably covers the conditions in this project (and in most other cases).

3.2 Parallella

In 2008, Andreas Olofsson, a processor designer, founded Adapteva with the mission to create an easily programmable general-purpose floating-point processor with more than 10 times the energy efficiency of legacy CPU architectures. The new architecture had to be scalable to thousands of cores, easily programmable in ANSI-C, have a high raw performance (2 GFLOPS/core), be implementable by a small team of engineers and reach an energy efficiency of 50 GFLOPS/W. A year and a half later, in 2009, Olofsson announced the Epiphany many-core architecture. This architecture “was a clean slate design based on a bare-bones floating-point RISC instruction set architecture (ISA) and a packet based mesh Network-On-Chip (NOC) for effectively connecting together thousands of individual processors on a single chip” [22], targeting a power consumption of 1W per 50 GFLOPS of performance.

In May 2011, Adapteva introduced their first Epiphany based product, a 16-core 32 GFLOPS chip (E16G301). A few months later, in August 2011, Adapteva released a 64-core design (the E64G401) which could achieve a performance of 50 GFLOPS/W (and even 70 GFLOPS/W without I/O).

Being a new company without an established community, Adapteva did not gain much of a position in the market with their Epiphany chips, despite these being the most energy-efficient floating-point processors available. In order to establish a community around the Epiphany and to finance further development, Adapteva started a Kickstarter project [12] in September 2012, named “Parallella”. Within a month, Adapteva had raised close to 1M USD from almost 5,000 project backers. In June 2014, all of the Parallella boards promised to the backers were delivered.

Figure 3.4: The Parallella-16 Zynq 7020

3.2.1 The board

The Parallella-16 board is a fully open-source credit-card sized computer containing a 16-core Epiphany E16G301 coprocessor, a Xilinx Zynq 7010/7020, and 1 GB of RAM (see figures 3.4 and 3.5). The Xilinx Zynq is a System-on-Chip (SoC) containing two ARM Cortex-A9 processor cores and FPGA logic. A Gigabit-Ethernet port, USB port and MicroHDMI port are present on the board, along with a MicroSD card slot from which the board is booted. To power the board, either a 5V DC barrel connector or MicroUSB can be used. On the back/bottom of the board, four expansion connectors are placed to provide access to the power supply, the I2C, UART, GPIO and JTAG interfaces, and the Epiphany chip eLink interface. The eLink interface is used to exchange data between the Epiphany coprocessor and the ARM cores, and is implemented in the FPGA logic on the Zynq. A block of 32 MiB memory is shared between the ARM cores and the Epiphany by default. The Epiphany system is shown in more detail in figure 3.6.

Figure 3.5: Parallella-16 board top view

Figure 3.6: The Adapteva Epiphany System (from [26])


3.2.2 The Epiphany chip

The Epiphany E16G301 is a System-on-Chip containing 16 superscalar floating-point RISC CPUs (eCores). Each eCore is capable of two floating-point operations per clock cycle and one integer calculation per clock cycle. The CPU is efficiently programmable in C/C++ and has a general-purpose instruction set, specialized for compute intensive applications. The Software Development Kit that is provided to program the Epiphany is described in section 4.3. The memory architecture is a flat and unprotected memory map, providing up to 1MB of local memory for each core. Each core can access its own local memory, other cores’ memories, and shared off-chip DRAM. The local memory per core consists of four separate banks to support simultaneous instruction fetching, data fetching, and multicore communication.

The communication in the Epiphany chip is supported by the eMesh Network-on-Chip (NoC), which consists of three independent 2D scalable mesh networks: one for off-chip write transactions, one for on-chip write transactions and one for read requests.


CHAPTER 4

Software

In order to extend the ManyMan tool, some additional software was required. First, and most importantly, a newer version of the Kivy framework was used for the front-end; this is discussed in section 4.1. Second, section 4.2 describes the changes that have been made in the ODROID-XU3 kernel to allow for frequency scaling. Finally, in order to write programs for the Epiphany coprocessor, Adapteva provides the Epiphany SDK (eSDK), which is discussed in section 4.3.

4.1 Kivy

The ManyMan front-end is completely written in Python and uses the Kivy framework [8] to build up the application. Kivy is an open source project and provides good support for multi-touch purposes. It is highly portable and can be run not only on standard operating systems such as Linux, Windows and Mac OS, but also on mobile operating systems like Android and iOS, without needing to change the source code of the application. For these reasons, Kivy is a suitable candidate for the ManyMan tool.

The original ManyMan supported Kivy version 1.2.0; now, three years later, Kivy has reached version 1.9.0. This version has been used in the scope of this project. The upgrade to this newer version of Kivy required a few changes in the front-end due to name changes and deprecated properties. For example, the image name popup-background has been changed to modalview-background, and updates to the text-input widget and the introduction of FocusBehavior caused the self-made extension of this widget by the original writer of ManyMan to become obsolete, and even unusable. Besides these few changes that concern the ManyMan tool directly, the most important changes in Kivy are internal. Many new features have been added to the framework and bugs have been fixed in order to provide a better user experience. For a detailed overview of the changes made to Kivy, one can consult their website [21]. As mentioned before, Kivy is an open source project, and as such, it also provides a platform for users to submit their home-made classes, improvements and extensions to the framework. This contributes greatly to the development of Kivy and its aim to provide users with the best experience and broad possibilities.

4.2 The big.LITTLE kernel

The big.LITTLE system used in this project runs the Linux kernel made for the ODROID-XU3 by Hardkernel [10]. The default kernel configuration, however, does not support manual CPU-frequency scaling, which is required for the ManyMan tool. Therefore, a new kernel has been compiled with some changes in the kernel configuration. In the CPU Frequency scaling section, the CONFIG_CPU_FREQ_GOV_USERSPACE option was enabled.


Figure 4.1: The eSDK framework (from [2])

4.3 Epiphany SDK

Adapteva has released a fully open source software development kit (SDK) to facilitate the writing of parallelized C code for the Epiphany chip, the eSDK [2]. The eSDK framework is shown in figure 4.1. Some of the key components of the eSDK are the optimized ANSI-C compiler, a multi-core debugger, communication and hardware utility libraries and an Eclipse IDE. Each core in the Epiphany chip runs a separate program, which is built and loaded onto the eCore by a host processor (the ARM Cortex-A9 on the Parallella). The host processor can access the eCores by use of the Epiphany Host Library (eHAL). This library provides methods for loading programs on the eCores, starting the programs, resetting cores, and passing messages to communicate with the eCores. Utilities on the Epiphany cores are provided by a standard C environment and the eLib API.

The basic steps to run a program on the Epiphany are: 1) the host program initializes a workgroup by specifying the number of rows and columns and the position of the start node in the group; 2) the host resets the nodes and loads the device-side executable on the eCores; 3) the host signals all the eCores to start the execution; 4) the host communicates with the eCores either through shared memory or through the local memory of a core; 5) when the execution is complete, the host is signalled and it reads the results either from the eCore’s local memory or from shared memory [26].


CHAPTER 5

Implementation

ManyMan consists of two separate layers: the front-end, which is responsible for the visualization and the interaction with the user and typically runs on a touch-supported device, and the back-end, which runs on the many-core system or the system that controls the many-core device, and handles the execution of tasks and the retrieval of the required data. For this project, as the title suggests, two additional back-ends have been created, one for the big.LITTLE system and one for the Parallella-16. These back-ends are described in section 5.1, starting with the general part of the back-ends followed by the more specific features for the separate systems. Additionally, as the two many-core systems differ in both characteristics and available information from one another, as well as from the Intel SCC, the front-end has been modified to support these characteristics. The two resulting front-ends are described in section 5.2. For more detailed information, one can consult Van der Woning’s articles [24, 25] and the source code [9].

5.1 Back-end

ManyMan is open source and published under the GNU General Public License [4], and thus the source code was available for this project. This eliminated the need to create the back-ends (as well as the front-ends) from scratch and, as a result, the basis of the software remains similar to the original ManyMan. The ManyMan structure is complex but well designed and implemented, so it takes some dedication to fully understand the program. Once the code is fully understood, however, it is a well-structured piece of software to work with.

The back-end functions as the server in the ManyMan software. It initializes the basic information about the many-core system, such as the number of cores and the core groups (in terms of frequency and/or voltage level). This is loaded from a default settings dictionary in the code but can also be updated by providing a settings file when starting the back-end. A TCP server thread is started to accept an incoming connection from the front-end and handle communication.
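The way defaults and a settings file can be combined may be sketched as follows. This is a minimal illustration and not the ManyMan code: the key names, the port number and the use of JSON as the file format are all assumptions made for the example.

```python
import json

# Hypothetical defaults; the real ManyMan settings keys and values may differ.
DEFAULT_SETTINGS = {
    "address": "0.0.0.0",
    "port": 11111,
    "cores": 8,
    # Core groups: cores that share one frequency and/or voltage level.
    "core_groups": {"LITTLE": [0, 1, 2, 3], "big": [4, 5, 6, 7]},
}


def load_settings(path=None):
    """Return the defaults, overridden by the keys found in a settings file.

    The file is assumed to be JSON holding a single object.
    """
    settings = dict(DEFAULT_SETTINGS)
    if path is not None:
        with open(path) as handle:
            settings.update(json.load(handle))
    return settings
```

Keys that are absent from the file keep their default value, so a settings file only has to list what differs for a particular board.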

5.1.1 Monitoring

One of the most important parts of ManyMan is the monitoring of the many-core systems. The back-end needs to retrieve the information about running processes, the payload of the cores, memory usage, etc. and send this to the front-end so it can be visualized. This information is retrieved through several commands and/or programs, described in the following paragraphs.

In order to retrieve the CPU usage for the individual CPU cores, the command mpstat is used for each core. This command outputs the CPU usage of the specified core with an interval of 1 second. The format of the output, however, can differ between systems; for example, it depends on the time format of the system. On some systems, the AM/PM indication is added to the time. This has to be taken into account when parsing the output, since the output lines are split on spaces. Currently, the back-end handles both cases, with and without the AM/PM indication. Originally, the top command was used to retrieve the payload per core. This was suitable for the Intel SCC, since the top command would only obtain the information for the core it was run on. However, on the big.LITTLE and the Parallella, this command retrieves the payload information of the system as a whole, covering all CPUs, which makes top unsuitable for retrieving the payload of individual cores. For this reason, top has been replaced by mpstat. In addition, unlike top, mpstat produces little information beyond what is needed.
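The AM/PM issue can be illustrated with a small parser for the lines printed by a command such as `mpstat -P 4 1`. This is a sketch and not the ManyMan parser: it assumes the usual sysstat layout, in which the CPU number follows the timestamp and %idle is the last column, and the function name is hypothetical.

```python
def parse_mpstat_line(line):
    """Return (core, usage_percent) for one mpstat data line, or None.

    Assumed layout: a timestamp, an optional AM/PM token, the CPU number,
    and %idle as the last column.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] == "Average:":
        return None
    # A 12-hour clock adds an extra AM/PM field after the timestamp.
    cpu_index = 2 if fields[1] in ("AM", "PM") else 1
    cpu = fields[cpu_index]
    if not cpu.isdigit():
        return None  # banner, column header or "all" summary line
    # Some locales print a decimal comma instead of a point.
    idle = float(fields[-1].replace(",", "."))
    return int(cpu), round(100.0 - idle, 2)
```

Deriving the usage from %idle avoids depending on the exact number of columns in between, which varies between sysstat versions.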

The memory usage for each core is determined by the collective memory usage of the tasks that are running on said core. Note that knowing the memory usage per core might not be directly useful, since the big.LITTLE and Parallella are shared-memory systems (for the ARM cores), but it could help to identify the core where a memory-intensive task is running. In order to prevent each core from parsing the same data in search of its individual tasks, a top command is run for each task, specified with the task’s Process ID (PID). This causes top to only output the process information for the given PID. This output is then filtered on the PID using grep, resulting in output that only contains the lines with the process information. This saves the trouble and time of parsing the additional (irrelevant) lines in the back-end. As an example, top produces about 6 lines before the line containing the process information; as every line is read individually, 6 (useless) iterations would have to be done before the required information is found.

Note that the determined memory usage per core only consists of the memory usage of the tasks on that core that have been started using ManyMan. There is no simple way to retrieve the total memory usage per core: one would have to retrieve the memory usage of every process running on the system, and determine to which core it belongs (which could be more than one core). As this would have to be done once every second, it would burden the system unnecessarily, since monitoring per-core memory usage is often not of great importance and even useless on shared-memory systems.
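The per-core aggregation described above can be sketched as follows, for lines obtained from something like `top -b -d 1 -p <PID> | grep <PID>`. The column positions assume the default procps layout, and both function names are hypothetical; this is an illustration, not the ManyMan implementation.

```python
from collections import defaultdict


def _to_float(text):
    """Parse a number that may use a decimal comma."""
    return float(text.replace(",", "."))


def parse_top_line(line):
    """Return (pid, cpu_percent, mem_percent) from one top process line.

    Assumes the default procps column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    fields = line.split()
    return int(fields[0]), _to_float(fields[8]), _to_float(fields[9])


def memory_per_core(task_core, samples):
    """Sum %MEM over the ManyMan-started tasks assigned to each core.

    task_core maps a task PID to its core; samples are parse_top_line results.
    Processes that ManyMan did not start are ignored.
    """
    usage = defaultdict(float)
    for pid, _cpu, mem in samples:
        if pid in task_core:
            usage[task_core[pid]] += mem
    return dict(usage)
```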

5.1.2 Task creation

In order to start a new task, a Python subprocess is opened which first checks if the given task is an existing executable on the system. If so, the Parent Process ID (PPID) is printed to the output; it is needed to determine the task’s PID, as described later. Then the task is assigned to the specified core and started, using the taskset command. This command can be used to set or retrieve the CPU affinity of an existing process, or launch a new process with a given CPU affinity, which makes it an ideal command to start and move tasks on the many-core systems. The output of the task is set to be line buffered, in order to prevent a delay when reading the output.
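Launching a task pinned to a core can be sketched with `taskset -c`. The helper names below are hypothetical and the sketch simplifies the procedure described above: it skips the PPID bookkeeping, and `bufsize=1` only makes the reading side of the pipe line buffered, so the task itself may still buffer its output.

```python
import shutil
import subprocess


def build_taskset_command(core, program, args=()):
    """Build the argv that launches `program` pinned to a single core."""
    return ["taskset", "-c", str(core), program, *args]


def start_task(core, program, args=()):
    """Start `program` on `core`; return the Popen object, or None."""
    if shutil.which(program) is None:
        return None  # not an existing executable on this system
    return subprocess.Popen(
        build_taskset_command(core, program, args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered reads on our side of the pipe
    )
```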

When the task is started, its Process ID can be determined. This is done by using the ps command and filtering on the PPID of the task with grep, which yields the PIDs of the child processes. Since the actual process that is running and needs to be monitored is not necessarily the direct child of the PPID, for instance when the task is executed via a shell script or with sudo, the found child processes are recursively checked for children until the process that matches the task name has been found. Usually, the recursion only consists of one branch, as the initial PPID is a new process that generally only starts one child process and does not fork. The intermediate PIDs of the parents are also stored for later use in the task interaction. One could wonder why the task name is not used directly to find the PID, but since multiple tasks with the same name can run at a time, this would not guarantee the right PID. Note that problems may arise when the task finishes before the PID could be determined. This could be fixed by checking whether the Python subprocess has terminated and acting accordingly.
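The recursive search for the task’s PID can be sketched on a process table parsed from `ps -e -o pid=,ppid=,comm=`. This is an illustration under assumptions rather than the ManyMan code: the original filters ps output with grep, whereas this sketch parses the whole table once, and both function names are hypothetical.

```python
def parse_ps(output):
    """Parse `ps -e -o pid=,ppid=,comm=` output into {pid: (ppid, name)}."""
    table = {}
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            table[int(parts[0])] = (int(parts[1]), parts[2].strip())
    return table


def find_task_pid(processes, ppid, taskname, intermediates=()):
    """Search the descendants of `ppid` for the process named `taskname`.

    Returns (pid, intermediate_pids), or None when no descendant matches.
    The intermediate PIDs are the wrappers (sudo, shell) between the
    starting process and the task.
    """
    for pid, (parent, name) in processes.items():
        if parent != ppid:
            continue
        if name == taskname:
            return pid, list(intermediates)
        found = find_task_pid(processes, pid, taskname, (*intermediates, pid))
        if found is not None:
            return found
    return None
```

Starting the search at the known PPID, instead of matching on the name alone, is what keeps two simultaneously running tasks with the same name apart.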

5.1.3 Task interaction

When a task is created, certain actions can be performed to interact with the task. A task canbe paused/resumed, moved between cores, stopped and killed.

Pausing and stopping a task are both done by sending a POSIX STOP signal (SIGSTOP) to the process. If there are intermediate processes, such as sudo, these are stopped as well, which has to be taken into account when continuing the task. Internally, pausing and stopping a task have become the same operation, and the distinction between the two is now mostly a visual feature.

In order to move tasks from one core to another on the Intel SCC, it was necessary to checkpoint a task, which was done using the Berkeley Lab Checkpoint/Restart (BLCR) library [17]. Checkpointing creates a context file that can be moved to another core in order to restart the task there. A checkpointed task can be terminated, and as such it releases its occupied resources. As evaluated by van der Woning [24], checkpointing a task creates a lot of overhead and writing to disk can be slow. This, plus the fact that checkpointing is not necessary on the big.LITTLE and the Parallella, resulted in the BLCR commands being replaced by taskset. The downside is that neither a stopped task nor a paused task releases its resources. If memory usage is an issue, BLCR could be re-implemented.

To kill a task, a KILL signal is sent to the process, the top command monitoring the task is stopped and the task instance is removed from the back-end. When continuing a task, it is important to send a CONT signal not only to the process but also to any intermediate processes, which were found and stored when determining the PID; otherwise the back-end can hang on a task thread when trying to read the output of the task. Moving a task to a specific core is also done using taskset with the PID of the task, after which the task is continued in case it was not already running.
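The signal handling described above may be sketched as follows. The function names and the order in which the processes are signalled are assumptions made for the example; the `send` parameter exists only so that the logic can be exercised without real processes.

```python
import os
import signal


def pause_task(pid, intermediates=(), send=os.kill):
    """Stop the task and any wrapper processes (sudo, shell) above it."""
    for target in (*intermediates, pid):
        send(target, signal.SIGSTOP)


def resume_task(pid, intermediates=(), send=os.kill):
    """Continue the wrappers as well as the task, so that reading the
    task's output does not block on a wrapper that is still stopped."""
    for target in (*intermediates, pid):
        send(target, signal.SIGCONT)


def kill_task(pid, send=os.kill):
    """Terminate the task unconditionally."""
    send(pid, signal.SIGKILL)


def move_command(pid, core):
    """Build the taskset argv that re-pins a running process to `core`."""
    return ["taskset", "-cp", str(core), str(pid)]
```

A move would then run `move_command(...)` and call `resume_task(...)` afterwards if the task was paused.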

5.1.4 big.LITTLE specific back-end

For the big.LITTLE system, it is possible to change the CPU frequency of the big and LITTLE core groups. This is done using the cpufreq-info and cpufreq-set commands from cpufrequtils. When starting up the back-end, cpufreq-info is called to check if the userspace governor is available (see section 4.2); this governor is needed to be able to manually set the frequencies. If it is available, this governor is set for each core, along with the minimum and maximum frequency for the core. These limits are determined from the frequency tables in the settings. For the LITTLE cores, the frequency ranges from 200 to 1400MHz, and for the big cores the frequency ranges from 200 to 2000MHz, both in steps of 100MHz.

When the back-end receives a request to change the frequency, it will use cpufreq-set to set the frequency of the specified core. In the big.LITTLE system used for this project, the big cores were grouped together, and the LITTLE cores were grouped together, which means that all cores in the same group run at the same frequency. Accordingly, setting the frequency of all cores in a group or setting the frequency of just one core in a group will have the same result.
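The group behaviour can be modelled with a small bookkeeping class, sketched below. The class and the core numbering (cores 0-3 as LITTLE, cores 4-7 as big) are assumptions made for illustration only; the -f option of cpufreq-set sets a specific frequency under the userspace governor.

```python
class FrequencyGroups:
    """Track one shared frequency per core group."""

    def __init__(self, groups):
        # groups maps a group name to the list of core ids it contains.
        self.groups = groups
        self.freq_mhz = {name: None for name in groups}

    def group_of(self, core):
        for name, cores in self.groups.items():
            if core in cores:
                return name
        raise ValueError("unknown core %r" % core)

    def set_frequency(self, core, mhz):
        """Record a request for one core; every core in its group follows."""
        self.freq_mhz[self.group_of(core)] = mhz
        return ["cpufreq-set", "-c", str(core), "-f", "%dMHz" % mhz]

    def frequency_of(self, core):
        return self.freq_mhz[self.group_of(core)]


# Assumed core numbering, for illustration only.
groups = FrequencyGroups({"LITTLE": [0, 1, 2, 3], "big": [4, 5, 6, 7]})
```

Requesting 1800MHz for core 5 in this model changes the reported frequency of cores 4 to 7 and leaves the LITTLE group untouched, which mirrors the behaviour described above.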

The big.LITTLE system also allows the power usage of several components to be monitored. This is done by INA231 sensors on the I2C buses. At the start-up of the back-end, these sensors are enabled. Then a simple script is used to read the values from the sensors for the big and the LITTLE cores. The volts, watts and amperes can be retrieved from the sensors, but currently only the watts are used in the front-end.
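A minimal sketch of the last step, reducing the sensor readings to the watts that the front-end uses, is given below. The format in which the script returns the values is not specified in this text, so the whitespace-separated "volts amperes watts" line used here is purely hypothetical, as are the function names.

```python
from collections import namedtuple

Reading = namedtuple("Reading", ["volts", "amperes", "watts"])


def parse_reading(text):
    """Parse one hypothetical 'volts amperes watts' line into a Reading."""
    volts, amperes, watts = (float(field) for field in text.split())
    return Reading(volts, amperes, watts)


def power_sample(raw):
    """Keep only the watts per core group, as needed by the front-end graph."""
    return {group: parse_reading(line).watts for group, line in raw.items()}
```

The volts and amperes remain available in the `Reading` tuple, should a later version of the front-end want to display them.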

An important side note: the big.LITTLE back-end must be run with root privileges (sudo), as cpufreq-set requires super-user rights.

5.1.5 Parallella specific back-end

The monitoring of the Epiphany coprocessor is not a trivial task. In order to start a task or program on the Epiphany chip, one must start a host program on the host core(s) (the ARM A9 CPUs) which loads an Epiphany program onto the Epiphany cores and is responsible for the execution. This makes it impossible for the back-end to track tasks running on the Epiphany chip and their status. To get any information about a running process on the Epiphany, the process must provide this information itself, for example by writing to certain registers, and the host program must read and process this information. As there is no standard method (yet) of doing this, it is program dependent. It has to be noted that running two incompatible programs can cause the Parallella board to freeze or crash, again showing the challenges faced when managing and monitoring tasks on the Epiphany. This problem occurs, for example, when the ERM (Epiphany Resource Manager) program, provided by Adapteva in the epiphany-examples repository [1], is active when starting another Epiphany example program (besides the “erm example” program). Since writing programs for the Epiphany chip is not the focus of this project, the ManyMan back-end is able to start, and retrieve results from, host programs for the Epiphany, but no real-time monitoring of Epiphany tasks has yet been implemented.

Figure 5.1: Epiphany power consumption as a function of the voltage (from [3], edited)

Directly changing the frequencies of the CPU and coprocessor on the Parallella is not possible; they are fixed at 667MHz for the Zynq ARM A9 dual-core and 600MHz for the Epiphany. It is however possible to change the voltage of the Epiphany chip to some extent. Using the eVolt program, provided by Adapteva in the Parallella-utils repository [13], the voltage can be set, ranging from 0.900V to 1.200V. As the voltage and clock speed are related, changing the voltage will also affect the frequency. Figure 5.1 shows the power consumption of the Epiphany as a function of the voltage with all 16 cores executing a heavy-duty workload. The maximum operating frequency is shown for each voltage level.
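Before a voltage request is passed on, it is useful to keep it within the supported range. The sketch below clamps a requested value to the 0.900V to 1.200V range mentioned above and snaps it to a grid. The function name and the 0.010V step are assumptions made for this example; the command-line interface of eVolt is deliberately not shown.

```python
EPIPHANY_MIN_VOLT = 0.900
EPIPHANY_MAX_VOLT = 1.200


def clamp_voltage(requested, step=0.010):
    """Clamp a requested voltage to the supported range and snap it to a grid.

    The step size is a hypothetical slider resolution, not an eVolt property.
    """
    bounded = min(max(requested, EPIPHANY_MIN_VOLT), EPIPHANY_MAX_VOLT)
    steps = round((bounded - EPIPHANY_MIN_VOLT) / step)
    return round(EPIPHANY_MIN_VOLT + steps * step, 3)
```

A request outside the range is thus silently moved to the nearest limit rather than rejected; raising an error instead would be an equally valid design.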

Since the Parallella does not come with an active cooling system (like a fan), it has been found useful to monitor the temperature of the board. The temperature is retrieved with the xtemp utility, from the same repository as mentioned above, slightly modified for compatibility with the back-end. This utility retrieves the temperature of the Zynq chip in degrees Celsius.

5.2 Front-end

The visualization of the many-core systems is done by the front-end. The front-end typically does not run on the many-core system itself, but on a separate device. When the front-end is started, it will connect to the back-end, whose IP address is supplied in the default settings or through a separate settings file. When connected, the front-end will receive the data needed to build up the user interface, such as the configuration of the cores.

5.2.1 The main view

The main overview in the ManyMan front-end consists of several components. Figure 5.2 shows the overview for the big.LITTLE and figure 5.3 shows the overview for the Parallella.


On the left side, the list of available tasks is shown and a button is provided to add new tasks to this list (see figure 5.4). When this button is pressed, a pop-up opens where the path to a new program/task can be entered, using an included virtual keyboard or the system keyboard. The task will be added to the task list by clicking on the create button. A task can be started on a core by dragging it from the task list onto a core in the middle section of the overview. Each task in the list provides two buttons: the left button duplicates the task in the list, the right button starts the task on a core chosen by ManyMan. The graph underneath the task list displays the overall CPU usage of the system. As of the next patch, this graph will also display the total memory usage of the system.

The right side (figure 5.5) of the main view contains a help and an exit button, and the list of finished and failed tasks executed by ManyMan. In the bottom right corner, a graph is drawn. For the big.LITTLE system, this graph displays the power usage in watts for the big and LITTLE core groups. For the Parallella, this graph is used to show the temperature of the Zynq chip in degrees Celsius.

The middle part of the overview visualizes the cores of the many-core system, along with sliders to set the CPU frequency, on the big.LITTLE, or the voltage level of the Epiphany, on the Parallella. The cores are displayed in a way that corresponds to the characteristics of the many-core system: the big and LITTLE cores for the big.LITTLE, and the ARM A9 dual-core and 16-core coprocessor for the Parallella. Each core has a coloured overlay that visualizes the CPU usage on said core, ranging from fully covering and red at 100% CPU usage, to not covering and green at 0% CPU usage.
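The mapping from CPU usage to the overlay can be sketched as below. The text only fixes the two end points (green and not covering at 0%, red and fully covering at 100%); the linear blend in between and the function name are assumptions made for this example.

```python
def usage_overlay(cpu_percent):
    """Map CPU usage (0-100) to an overlay colour and a coverage fraction."""
    # Out-of-range samples are clamped so the overlay never overflows the core.
    level = min(max(cpu_percent, 0.0), 100.0) / 100.0
    # Assumed linear blend from green (idle) to red (fully loaded).
    red, green, blue = level, 1.0 - level, 0.0
    return {"rgb": (red, green, blue), "coverage": level}
```

The colour components are floats between 0 and 1, the convention Kivy uses for colours, so the result could be applied to a widget without further conversion.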

5.2.2 The detailed views

Clicking on a core will open a pop-up that can be dragged, scaled and rotated. These properties allow the user to arrange multiple pop-ups in a way that they find useful. This pop-up displays more detailed information about the corresponding core (figure 5.6). It shows a list of the tasks on the core and their states, and two graphs displaying the CPU and memory usage of the tasks on the core. From here, a task can be moved to a different core by dragging it from the pop-up to a core in the main view. If the task is not released on a core, it will be stopped and moved to the task list in the main view. The tasks in the list contain an information icon which, when clicked on, opens a pop-up with the information of the task (see figure 5.7).

Figure 5.2: The big.LITTLE ManyMan main view.


Figure 5.3: The Parallella ManyMan main view.

Figure 5.4: The task list and CPU usage.

Figure 5.5: Finished tasks and power usage.

Figure 5.6: Core information pop-up.

Figure 5.7: Task information pop-up.

The task information pop-up again contains two graphs to display the CPU and memory usage of the task. On the right, there is a scroll view containing the last 100 lines of output of the task. This number can be changed, but note that too high a number might slow down the front-end. The complete output of a task will also be written to a file on the front-end device.

Above the output, buttons are provided to control the task. The stop and pause buttons will send a request to the back-end to stop the task, but not kill or terminate it. The difference between stopping and pausing a task is that a paused task will remain on the core, and a stopped task will be moved back to the task list in the main view. This is however mainly a visual feature. Internally, paused and stopped tasks do not differ and will both still be on a core and occupying resources, as explained in section 5.1.3. When a task is paused, the pause button becomes a resume button, in order to continue the paused task on the core.
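The bounded output view of the task information pop-up, combined with the complete log on the front-end device, can be sketched with a fixed-length deque. The class name is an assumption made for this example, and an in-memory stream stands in for the output file.

```python
import io
from collections import deque


class TaskOutput:
    """Keep the newest lines for the scroll view and log every line."""

    def __init__(self, log, max_lines=100):
        self.visible = deque(maxlen=max_lines)  # oldest line drops out when full
        self.log = log                          # any writable text stream

    def append(self, line):
        self.visible.append(line)
        self.log.write(line + "\n")


# Usage with an in-memory stream standing in for the output file.
log = io.StringIO()
output = TaskOutput(log, max_lines=3)
for number in range(5):
    output.append("line %d" % number)
print(list(output.visible))  # ['line 2', 'line 3', 'line 4']
```

Because the deque discards old lines by itself, the cost of showing the output stays constant no matter how much a task prints, while the log still receives all five lines.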

The smart-move button will move the task to a different core, chosen by the back-end. The choice of a core is based on the current workload of the core and the number of tasks assigned to the core. It has to be noted that the “smart-move” option does not (yet) take the characteristics of the big and LITTLE cores into account when determining the best core.
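One possible way to combine the two criteria is sketched below. How the back-end actually weighs workload against the number of tasks is not specified here, so the ordering used (lowest workload first, fewest tasks as tie-breaker) and the function name are assumptions.

```python
def choose_core(cores, exclude=None):
    """Pick the least loaded core, breaking ties on the number of tasks.

    cores maps a core id to a (cpu_percent, task_count) tuple;
    exclude is the core the task currently runs on, if any.
    """
    candidates = {core: stats for core, stats in cores.items() if core != exclude}
    if not candidates:
        raise ValueError("no core available")
    # The core id is the final tie-breaker, which keeps the choice deterministic.
    return min(candidates,
               key=lambda core: (candidates[core][0], candidates[core][1], core))
```

For example, with cores 1 and 2 both at 10% usage, the core holding fewer tasks is selected.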

A kill button has been added to terminate a running task and remove it from the system. A killed task will not end up in the finished task list, but its output will still be saved to a file.


CHAPTER 6

Evaluation

When evaluating ManyMan, one could perform usability tests for the front-end. This has already been done by van der Woning when developing ManyMan for the Intel SCC. The results of these tests pointed out that ManyMan is an intuitive tool for visualization and management of many-core systems and that it looked great. This was the general opinion of both Computer Science and non-Computer Science students. As the new front-ends created for this project do not introduce any radical changes with regard to the original front-end, these test results are also assumed to apply to this project.

During the development of the new front- and back-ends, it has been noted that the ManyMan for the big.LITTLE system can easily be modified to apply to regular Linux systems. Several times during the development of the big.LITTLE back- and front-end, a regular Acer laptop, containing four CPUs and running Linux Mint 17.1, has been used to test and run the program. The reason for doing this was that the big.LITTLE board was not always accessible, for instance when working at home. The only real changes that have to be made in order to properly run ManyMan on regular Linux systems are the disabling or removing of the power monitoring functions in the back-end, as most systems do not provide the power sensors, and some small changes to hard-coded properties in the front-end which were implemented for the big.LITTLE (such as the layout difference between big and LITTLE cores). Besides this, providing the appropriate settings files when starting the back- and front-end will take care of most of the differences in regular system characteristics.

A tool like ManyMan is not just a nice graphic toy to visualize many-core systems; it can also be used for research purposes. By using ManyMan, one can easily run test programs on specific cores while being able to scale the CPU frequencies with a single click. Provided with real-time feedback in the form of graphs as well as values, the user can easily see and interpret the results of their actions. For example, the power consumption of the big.LITTLE can be tested when running programs on the big and/or LITTLE cores, at different frequencies. All of this is done in a user-friendly manner, sparing the user the trouble of opening multiple terminals for running the test programs, changing cores, adjusting the frequency and retrieving the power usage, and of needing to know all the commands for doing this.


CHAPTER 7

Conclusions

ManyMan has been developed to offer interactive visualization and dynamic management of many-core systems. With this tool, many-core systems become more accessible and easier to test and evaluate. Few, if any, tools are yet available that provide the user with both information about the many-core system on task, core and system level, and the ability to start and manage tasks. ManyMan provides these possibilities through an intuitive, user-friendly application with multi-touch support. With the large range of many-core systems on offer, it has become important to test and compare different many-core systems, with regard to both research and education. Needing to use many different tools for different many-core systems is a nuisance. As a result, a general tool is desired that can be used on different many-core systems in a consistent way. This project has set out to extend ManyMan for use on two new many-core systems, in order to set ManyMan on its path to become this general tool.

Two new back-ends have been created, one for the ODROID-XU3 big.LITTLE system and one for the Parallella-16. Task migration on these systems is done via the taskset command. This command has replaced the use of the BLCR library, providing faster migration of tasks. However, with checkpointing disabled, stopped and paused tasks will remain on the CPU and never release resources until they are either killed or finished. This can be an issue on systems with little RAM, in which case the BLCR library could be re-implemented.

The kernel for the big.LITTLE has been slightly modified to allow for CPU frequency scaling. Power statistics are retrieved from the INA231 sensors, which include power consumption, current and voltage for the big cores, LITTLE cores, GPU and DRAM. However, currently only the power consumption in watts is used in ManyMan. When changing the frequencies on the big.LITTLE, results can immediately be seen in terms of power consumption displayed in the bottom-right graph in the front-end. It has also been found that the big.LITTLE ManyMan can easily be ported to a regular Linux system, such as a laptop, because the big.LITTLE does not have very special or abnormal hardware or characteristics (its strength lies in efficiently scheduling tasks between the big and LITTLE cores). Providing settings files to the front- and back-end, suited to the regular system, will take care of most of the differences.

Monitoring the Epiphany chip on the Parallella turned out not to be an easy task. In order to get any information about the processes running on the Epiphany chip, these processes must provide this themselves and the host program (running on the ARM A9 dual-core) must properly retrieve this information. As such, it is the responsibility of the programmer to provide process information. This makes it nearly impossible for the back-end to perform any monitoring of tasks running on the Epiphany. It also has to be noted that running incompatible programs on the Epiphany, or improperly resetting the Epiphany between program executions, can cause the Epiphany to crash, along with the rest of the board.

The front-ends have been adjusted to display the underlying many-core systems in a way that matches their characteristics, contributing to the intuitive and user-friendly properties of ManyMan. The new front-ends have been built to support Kivy version 1.9.0, introducing some bug fixes and improving user experience through updated features. Furthermore, the ability to properly kill running tasks has been implemented, and some issues with widgets not being updated, or being updated incorrectly, have been solved. However, these fixes have not yet been applied to the Intel SCC ManyMan.

Besides having a nice graphic interface, ManyMan is also suited for research purposes. Through ManyMan, test programs can easily be run while monitoring information such as power consumption or memory usage, and scaling frequencies, all with just a couple of clicks (or taps). This relieves the user of the need to have multiple terminals open to perform these tasks and to memorize the correct commands, which could cause the user to lose sight of what is going on.

By providing such easy-to-use and clear features to the user, and with the expanding support for additional many-core systems and architectures, ManyMan is well on its way to becoming the general visualization and management tool for many-core systems and potentially even for regular PCs.


CHAPTER 8

Future work

During the development of this project, some tasks turned out to be more complicated and more work than expected. For instance, compiling the new kernel and successfully flashing the Micro SD card took more time than necessary due to inexperience with this process. Also, the Parallella was initially unstable and rarely booted correctly, requiring a lot of reboots and power disconnections. Flashing a more recent version of the Ubuntu image to a new Micro SD card eventually solved this. As a result, time did not allow the implementation of all features wanted in ManyMan, which means there is some room for future work.

Most importantly, the monitoring of tasks on the Epiphany is currently not sufficient for the needs of the user. One would like to keep track of which program is running on which eCore. This could, for instance, be retrieved directly from the host program, which loads these programs onto the specific eCores. This, however, will require the programs to follow a certain format for providing this information, and the back-end should be able to process this information. Furthermore, one would also like to monitor the percentage of activity on the Epiphany and its memory usage. The Epiphany Resource Manager (ERM) and erm example program from the Epiphany example programs provided by Adapteva [1] are an example of tracking the activity on the Epiphany.

In the ManyMan for the big.LITTLE and Parallella, some bugs introduced by the SCC ManyMan were fixed and a more recent version of Kivy has been used. The ManyMan for the SCC, however, has not been modified. Applying these bug fixes and updating the Kivy version in the SCC ManyMan will help ManyMan to stay up-to-date.

Currently, ManyMan consists of three separate front-ends and three separate back-ends for the three many-core systems. To keep ManyMan organized and modular, the three front-ends could be integrated into one front-end, in which the user can switch between the available many-core systems, either internally in the front-end, for instance via a drop-down list, or by supplying the settings file for the targeted many-core system. In the current front-ends, solely providing a corresponding settings file is not sufficient for switching between many-core systems.

Improvements in the smart-move function can also be made, such as taking into account the big and LITTLE core characteristics when selecting the most suitable core to run the task on. Also, a scheduler could be implemented to let the back-end switch running tasks between cores. As an example, if a task is using 100% of the CPU on a LITTLE core, the scheduler can decide to move the task to a big core.
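The example rule at the end of this paragraph could be sketched as follows. No such scheduler exists in ManyMan yet, so everything here is hypothetical: the function name, the data layout and the choice to move a task only when an idle big core is available.

```python
def plan_migrations(tasks, core_type, idle_big_cores, threshold=100.0):
    """Return (task, target_core) moves for saturated tasks on LITTLE cores.

    tasks maps a task name to a (core id, cpu_percent) tuple,
    core_type maps a core id to "big" or "LITTLE".
    """
    free = list(idle_big_cores)
    moves = []
    for name in sorted(tasks):
        core, usage = tasks[name]
        # Only tasks that saturate a LITTLE core are candidates for a move.
        if core_type[core] == "LITTLE" and usage >= threshold and free:
            moves.append((name, free.pop(0)))
    return moves
```

Each planned move could then be carried out with the existing task migration of the back-end. A practical scheduler would probably also need some hysteresis to keep tasks from bouncing between core groups.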


Bibliography

[1] Epiphany-examples GitHub. Online, https://github.com/adapteva/epiphany-examples [Visited June 2015].

[2] Epiphany SDK Reference. Online, http://www.adapteva.com/docs/epiphany_sdk_ref.pdf [Visited June 2015].

[3] Epiphany E16G301 datasheet. Online, http://adapteva.com/docs/e16g301_datasheet.pdf [Visited June 2015].

[4] GNU Licenses. Online, http://www.gnu.org/licenses/.

[5] The gpfmon home page. Online, http://andrzejn.web.cern.ch/andrzejn/ [Visited June 2015].

[6] Hardkernel EnergyMonitor. Online, https://github.com/hardkernel/EnergyMonitor [Visited June 2015].

[7] INA231 Power Monitor, Texas Instruments. Online, http://www.ti.com/product/ina231 [Visited June 2015].

[8] Kivy Organization. Kivy - Open source Python library for rapid development of applications that make use of innovative user interfaces, such as multi-touch apps. Online, http://www.kivy.org/ [Visited May 2015].

[9] Manyman source. Online, https://github.com/FlorisTurkenburg/ManyMan.

[10] ODROID-XU3 Kernel. Online, https://github.com/hardkernel/linux/tree/odroidxu3-3.10.y [Visited April 2015].

[11] Parallella. https://www.parallella.org/.

[12] Parallella Kickstarter project. Online, https://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone [Visited June 2015].

[13] Parallella-utils GitHub. Online, https://github.com/parallella/parallella-utils [Visited June 2015].

[14] S. Borkar. Thousand core chips: a technology perspective. In Proceedings of the 44th annual Design Automation Conference, pages 746–749. ACM, 2007.

[15] S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux. Citeseer, 2006.

[16] P. Greenhalgh. big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7. ARM Whitepaper, 2011. http://www.arm.com/files/downloads/big_LITTLE_Final_Final.pdf.

[17] P. H. Hargrove and J. C. Duell. Berkeley lab checkpoint/restart (BLCR) for Linux clusters. In Journal of Physics: Conference Series, volume 46, page 494. IOP Publishing, 2006.

[18] J. Held, J. Bautista, and S. Koehl. From a Few Cores to Many: A Tera-scale Computing Research Overview. White paper, Intel, 2006.


[19] J. Howard, S. Dighe, S. R. Vangal, G. Ruhl, N. Borkar, S. Jain, V. Erraguntla, M. Konow, M. Riepen, M. Gries, et al. A 48-core IA-32 processor in 45 nm CMOS using on-die message-passing and DVFS for performance and power scaling. Solid-State Circuits, IEEE Journal of, 46(1):173–183, 2011.

[20] S. Jarp, R. Jurga, and A. Nowak. Perfmon2: a leap forward in performance monitoring. In Journal of Physics: Conference Series, volume 119, page 042017. IOP Publishing, 2008.

[21] Kivy. Changelog. Online, http://www.kivy.org/#changelog [Visited May 2015].

[22] A. Olofsson, T. Nordstrom, and Z. Ul-Abdin. Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany. arXiv preprint arXiv:1412.5538, 2014.

[23] K. Olukotun and L. Hammond. The future of microprocessors. Queue, 3(7):26–29, 2005.

[24] J. van der Woning. Interactive visualization and dynamic task management of many-core systems. A case study: The Intel Single-chip Cloud Computer. 2012. http://dare.uva.nl/cgi/arno/show.cgi?fid=447352.

[25] J. van der Woning and R. Bakker. Interactive Visual Task Management on the 48-core Intel SCC. In The 6th Many-core Applications Research Community (MARC) Symposium, pages 40–45. ONERA, The French Aerospace Lab, 2012.

[26] A. Varghese, B. Edwards, G. Mitra, and A. P. Rendell. Programming the Adapteva Epiphany 64-core Network-on-chip Coprocessor. In Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, pages 984–992. IEEE, 2014.
