

EPCC News

Issue 78, Autumn 2015

The newsletter of EPCC, the supercomputing centre at the University of Edinburgh

In this issue:
Our new European exascale projects
Fortissimo: HPC-based simulation for industry
Alan Turing Institute for data science
EPCC’s Silver Jubilee
Driving life sciences forward with big data

Also in this issue: Simulating clouds with the Met Office


Contents

Staff profiles: Meet some of the people who work at EPCC
HPC simulation for industry: Fortissimo makes it easier
Working towards exascale: Introducing our new projects
Big data, big compute: How to extract value
Big data and life sciences: EPCC seminars raise awareness of the benefits
Alan Turing Institute: The new national institute for data science
Big data and genomics: Data analytics support global supply chain
Numerical weather prediction: Improving the Met Office cloud model
ARCHER eCSE: Supporting research software on ARCHER
HPC energy efficiency: Adept benchmarks released
Women in HPC: Celebrating two years
Learn with us: We’ve launched two online learning courses
Event reviews: EPCC’s Silver Jubilee; ParCo 2015 in Edinburgh; PRACE Summer of HPC
Outreach: Building a PC – what’s inside an HPC box?

Contact us

www.epcc.ed.ac.uk
[email protected]
+44 (0)131 650 5030

EPCC is a supercomputing centre based at The University of Edinburgh, which is a charitable body registered in Scotland with registration number SC005336.

From the Directors

Welcome to the Autumn 2015 edition of EPCC News.

As we noted in the last edition, this is EPCC’s Silver Jubilee year. It was a great pleasure to see so many past and present EPCC staff at our 25th Birthday Party celebration in September. We were overwhelmed by the number of people who remember their time at EPCC, often at the start of their careers, with great fondness. We were amazed by the distances some people travelled to be at the event – including a ‘day trip’ from the West Coast of the USA.

September also saw the first International Review of EPCC for over a decade. Initial feedback from the Panel of Reviewers is that EPCC is a successful organisation with great staff and a wide variety of activities. The detailed feedback and recommendations, expected by November, will be designed to help us take our next steps forward and grow sensibly and sustainably. With this in mind, and our recent project successes, EPCC has been hiring over the summer. With just over 80 staff today, we expect to be close to 90 by Christmas.

A key driver of our current growth is the surge in demand for data-focussed projects, particularly in the bioinformatics and medical domains. This issue of EPCC News has several articles about this increasingly important area for EPCC.

Although much of the current hype around big data is unjustified, it is clear that across all areas of scientific and industrial research, the ability to capture ever larger amounts of data is producing a new set of challenges within many organisations. While it is often characterised as an analytical challenge, much of the demand is driven by the need for better ‘data engineering’ – making sure data can be stored, transmitted and accessed in the optimal fashion. It is still the case that moving more than 100 terabytes around for analytical purposes requires considerable thought and planning.
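To put that last figure in perspective, here is a rough, illustrative calculation (a minimal Python sketch using assumed sustained bandwidths, not measured EPCC figures) of how long a 100-terabyte transfer takes:

# Rough, illustrative arithmetic only: how long does it take to move 100 TB
# at a given sustained network bandwidth? (Assumed figures, not EPCC data.)

def transfer_time_hours(data_terabytes: float, sustained_gbit_per_s: float) -> float:
    """Time to move `data_terabytes` at `sustained_gbit_per_s` (decimal units)."""
    data_bits = data_terabytes * 1e12 * 8              # TB -> bits
    seconds = data_bits / (sustained_gbit_per_s * 1e9)
    return seconds / 3600.0

if __name__ == "__main__":
    for gbps in (1, 10, 100):
        hours = transfer_time_hours(100, gbps)
        print(f"100 TB at a sustained {gbps:>3} Gbit/s: {hours:8.1f} hours "
              f"({hours / 24:.1f} days)")

Even a dedicated 10 Gbit/s link held at its full rate for a day is optimistic once protocol overheads and shared infrastructure are taken into account, which is why such transfers need planning rather than ad-hoc copying.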

We hope you enjoy this issue of EPCC News. If you have any suggestions for future articles please let us know – and we are always keen to hear from former EPCC staff on where your careers have taken you.


If you’re going to Supercomputing’15 in Texas, look out for us in Booth 2503.

Pictured: our new booth, under construction.


Alison Kennedy & Mark Parsons
EPCC Executive Directors

[email protected]
[email protected]


Staff profiles

Meet Applications Developer Manos Farsarakis and Systems Architect Donald Scobbie.

I first came to EPCC in 2013 as an MSc student. Having wholeheartedly enjoyed my experience at EPCC from all aspects (except perhaps the weather!), I was delighted to stay on as an Applications Developer.

I currently work on a number of very different projects. This is how I like it. I think that in science, just as in life, one should not be narrow-minded. You should explore all different directions and when possible combine knowledge and expertise in order to advance science and humanity.

Most of my work since I started my job at EPCC has been on the Intel Parallel Computing Centre project. I am investigating the performance optimisation potential of GS2, a physics application used to study low-frequency turbulence in magnetised plasma, on Intel Xeon processors and Intel Xeon Phi coprocessors.

Having many years’ experience in education, I am very pleased to be involved in many aspects of ARCHER training. Most notably, I lead Software Carpentry workshops and am currently organising our first Data Carpentry workshop. I was also happy to be one of the lead developers of the well-received ‘ARCHER Challenge’ Supercomputing App developed by EPCC for ARCHER outreach.

I’ve had two very interesting years here at EPCC and I can only hope that the ones to come are just as exciting!

Emmanouil (Manos) Farsarakis, [email protected]


Donald Scobbie, [email protected]

I recently joined EPCC after nearly three years at the Health Informatics Centre and the Farr Institute at the University of Dundee. There I worked on a range of system issues in research computing within the University and the NHS, encompassing research data management life cycle models, system security and information governance, Safe Haven design, security and operation, and large scale data management and processing for genetic and image data.

I’m relatively new to academic and research computing, with most of my prior experience being in embedded real-time control, telecomms and large-scale network management systems. I have worked in research and development roles and in customer-facing solution and service delivery roles, in which I have always tried to remain as much a hands-on engineer as the positions allowed. However, increasingly, my preferred choice of programming language has been giving colleagues cause for concern; at least the compilers are still being maintained.

Much of my experience has been in the field of Test and Measurement and I hold eight patents in network related metric generation, data acquisition and state modelling. More recently I have been concerned with software defined platforms and particularly the control and testing of software defined networks. I very much look forward to exploring these at EPCC.


Fortissimo Marketplace: open for business

Fortissimo lowers the barriers for industrial users to access HPC-based simulation by offering the benefits of the cloud computing model (such as pay-per-use), together with support for first-time users.

Making HPC simulation more accessible to SMEs will increase their competitiveness – one of the key goals of the ICT for Manufacturing SMEs (I4MS) programme that supports Fortissimo.

The Fortissimo project, led by EPCC, has developed the Marketplace based on requirements generated by a series of experiments in which small companies have developed applications that can run on cloud-based HPC systems. In addition to the technical work, each experiment has addressed the business aspects of using HPC in the cloud in order to establish viable business models.

The work done by EPCC and Glasgow-based Integrated Environmental Solutions (IES) demonstrates the scale of benefits that can be achieved. IES provides software used by architects to assess the effects of the sun on building designs. By using cloud-based HPC, the software can run simulations in hours rather than days, dramatically increasing productivity and cutting costs without the end-user needing to know anything about HPC or needing to buy expensive hardware.

The Marketplace will be the one-stop shop for finding cloud-based HPC applications. After a lot of work to design and implement the initial set of features, the Marketplace went public last month as we start to add services. As more features and services are added, the Marketplace will grow and ultimately become self-sustaining.

Mark Sawyer, [email protected]

www.fortissimo-marketplace.com
www.iesve.com


Building the exascale future

We are very proud to be leading two new projects focused on the Exascale challenge: NEXTGenIO and INTERTWinE. Both started on 1 October 2015 and will run for 36 months.

The European Commission has recently funded a round of Horizon 2020 FETHPC research projects focused on the Exascale challenge. In total, 19 new projects successfully bid for this funding: 6 of them, the ‘larger’ projects, investigate Exascale system solutions, whereas the 13 ‘smaller’ projects address more specific problems, such as algorithms or APIs.

NEXTGenIO

NEXTGenIO is one of the larger projects. It addresses a key challenge not just for Exascale, but for HPC and data-intensive computing in general: the challenge of I/O performance.

As core counts have increased over the years, the performance of I/O subsystems has struggled to keep up with computational performance and has become a key bottleneck on today’s largest systems.

NEXTGenIO will develop a prototype computing platform that uses on-node non-volatile memory, bridging the latency gap between DRAM and disk and thus removing this bottleneck. In addition to the hardware that will be built as part of the project, NEXTGenIO will develop the software stack (from OS and runtime support to programming models and tools) that goes hand-in-hand with this new hardware architecture. Two particular focal points are a data- and power-aware job scheduling system, and an I/O workload and workflow simulator that will allow us to stress-test our developments. We believe that the new platform being developed by NEXTGenIO will be capable of delivering transformational performance.

NEXTGenIO is a collaboration between HPC centres (EPCC and BSC), hardware developers and system integrators (Intel and Fujitsu), tools developers (TUD and Allinea), and end users (ECMWF and Arctur).

NEXTGenIO: www.nextgenio.eu
Project Coordinator: Mark Parsons [email protected]
Project Manager: Michèle Weiland [email protected]

INTERTWinE

INTERTWinE addresses a problem that is already apparent today, and will become increasingly critical as heterogeneous and hierarchical system architectures become more complex.

It is unlikely that in the short to medium term there will be a single programming model or API that can target all aspects of such an architecture effectively. INTERTWinE therefore tackles the interoperability of parallel programming models that each target specific levels of the system hierarchy, at both the implementation and specification level.

INTERTWinE brings together the principal European organisations driving the implementation of parallel programming models: EPCC, BSC, KTH, INRIA, the Fraunhofer Society, the German Aerospace Centre, TS-SFR Solutions for Research, the University Jaume I de Castellon and the University of Manchester.

INTERTWinE: www.intertwine-project.eu
Project Coordinator: Mark Bull [email protected]
Project Manager: George Beckett [email protected]

Michèle Weiland, [email protected]


Big data, big compute

Rob, [email protected]

The adoption of big data analytics techniques by business is a relatively new phenomenon, but the application of computational, statistical and probabilistic techniques to extract bulk properties (averages, states, rules, phase transitions) from complex systems can be traced back to early 20th-century statistical mechanics. Science has been doing data analytics for a long time.

Gathering data for their own sake is pointless without the analysis techniques and computing power to process them. But what techniques, and what kind of computing power? The differing nature of data-driven analysis in different fields and areas of application demands a variety of computational approaches. Through our current activities in big data – some of them described elsewhere in this newsletter – we’re aiming to understand how big data and big compute can be brought together to create, well, big value.

Big data analytics today typically means the application of statistical mathematics to large, complex datasets in order to uncover hidden patterns and provide useful insights. It is a response to the big data explosion; statistical approaches are the only feasible methods for many data-driven research questions, and many of these may demand serious computing power.

Volume, variety, velocity

The use of the “three Vs” to characterise “big data” provides a useful starting point for classifying digital data and the types of computation that can be brought to bear upon them.

The volume of a dataset can be measured in two ways – the size of the individual digital objects (file size in bytes, for instance), and the number of digital objects in the dataset. These are not exclusive: a dataset may comprise a large number of very large digital objects, and both measures need to be factored into the data’s underlying “computability”. And the numbers are staggering. Current estimates suggest that 40 zettabytes – 43 trillion gigabytes – will be created globally by 2020.

Variety in a dataset is something of a misnomer; it refers properly to the complexity of analysis required for individual digital objects. In this sense, scientific data tends to be low in variety – simulations and sensors produce numbers encoded in well-specified ways – although the increasing quantity of science data recorded as digital images introduces the need for more statistical techniques in terms of classification theory, fuzzy matching and so on. Additionally, complexity can very quickly mount up once we start to look at combining data from multiple heterogeneous and potentially distributed sources.

The velocity of a dataset means the speed with which it changes; the live customer transaction database in a busy supermarket, for instance. In terms simply of the dataset there is a trade off between velocity and volume: a high-velocity dataset can be recorded as a static one of large volume (a transaction log, an initial state and a sequence of differences, or a time series of individual records).
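As a toy illustration of that trade-off, the sketch below (invented data, not from any real system) records a fast-changing inventory as an initial state plus a sequence of differences, from which any intermediate snapshot can be rebuilt:

# Toy illustration of the velocity/volume trade-off: a fast-changing record
# (e.g. stock levels in a busy store) captured as an initial state plus a
# sequence of differences. All data below are invented for the example.

initial_state = {"beans": 120, "bread": 80, "milk": 200}

# Each "tick" only records what changed (a difference), not the full state.
differences = [
    {"beans": 118},                 # two tins sold
    {"milk": 196, "bread": 79},
    {"beans": 117, "milk": 190},
]

def replay(state, diffs, up_to):
    """Reconstruct the state after `up_to` differences have been applied."""
    snapshot = dict(state)
    for diff in diffs[:up_to]:
        snapshot.update(diff)
    return snapshot

print(replay(initial_state, differences, 2))
# {'beans': 118, 'bread': 79, 'milk': 196}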

One aspect of time which can be critical to the recording of a dataset (as opposed to its analysis) is whether the data are recording an observation or just a measurement. An observation is time-critical – eg a supernova explosion – whereas a measurement is, in principle, reproducible. Simulation data, for instance, are always reproducible; gene sequences are reproducible from the same strands of DNA.



In some cases, a measurement which is in principle reproducible may be practically irreproducible – experimentally crashing a very large crude carrier into a pier, for instance. In these cases, the data can be treated as observations.

Time criticality in big data computing

Whether a dataset comprises observations or measurements speaks more to the care required to preserve it than to the need to treat it differently in computational terms. The velocity of the dataset becomes important when compared to the time to analyse one of its component objects, and the degree of time criticality of the compute process itself.

In the context of computing with big data, time criticality can be thought of as the answer to the question “Do we have to analyse these data now, or can it wait until tomorrow?” If the answer is “tomorrow” (for a definition of “tomorrow” which is case-dependent, of course), the computing process is probably not time critical.

Time criticality in computing is not the same as data velocity, nor necessarily the same as real-time analytics. It is more a measure of the timescale on which the insights provided by the data-crunching process continue to be relevant, useful and valuable (another “V”): a weather forecast for yesterday has little value.

Much scientific data analysis is not time-critical, unlike in business where insights from data may have a useful life measured in months or weeks. Time-critical analytics are characteristic of decision-support systems.

Time criticality can have a significant impact on the design of the whole approach to data analysis. A sub-second decision timeframe demands enormously high throughput and blazingly fast algorithms – perhaps field programmable gate arrays or other reconfigurable hardware; high volume, high velocity data may need to be processed and reduced on a timescale dictated by the available storage capacity; algorithms may not be able to “wait” for a complete dataset to arrive but instead begin to work incrementally with data as soon as they become available.
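The last point, incremental processing, is worth a small illustration. A classic approach is a one-pass running statistic that updates as each value arrives, so no complete dataset ever needs to be held; the sketch below (plain Python, not tied to any particular EPCC system) uses Welford’s method for a streaming mean and variance:

# Minimal sketch of incremental processing: a one-pass (Welford) running mean
# and variance that update as each value arrives, so the complete dataset is
# never needed in memory. Not tied to any particular EPCC system.

class RunningStats:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for reading in (12.1, 11.8, 12.4, 12.0):   # values arriving one at a time
    stats.update(reading)
    print(f"after {stats.n} readings: mean={stats.mean:.3f}, var={stats.variance:.4f}")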

The key to applying big computing to big data, then, lies in understanding not only the nature of the question being asked but also the nature of the underlying data – how big, how many, how complex, how time-critical. Here the three Vs can provide a useful means of classifying problems: big data, big compute, or both at once?

EPCC’s big data projects address the problems of processing, storing and analysing raw data. Current examples include:

AstroData: Astronomical data

Aviagen: Genomic datasets

EUDAT: European Data Infrastructure

Farr Institute of Health Informatics Research: Building a new secure, data-driven computational platform for health research

ICORDI: International Collaboration on Data Infrastructure

Next Generation Sequencing: Collaborating with Edinburgh Genomics to provide the infrastructure and support to optimise high-throughput gene sequencing

PERICLES: Fighting ‘semantic decay’

Read more: www.epcc.ed.ac.uk/research/data/data-research-projects


HPC and big data in life sciences

EPCC recently ran a series of seminars for the Scottish life sciences and healthcare industries. Our objective was to accelerate the adoption of high performance computing (HPC) and big data in these sectors by raising awareness of the potential benefits of the technologies.

HPC and big data have the potential to drastically accelerate discovery life cycles across the life sciences in areas such as:

Chemistry: including bio-chemistry, molecular modelling, and protein folding

Bio-engineering: including agricultural engineering

Genomics and proteomics: including next generation sequencing

Biology: including molecular biology

Pharmacology: including pharmacokinetic/pharmacodynamic (PK/PD) modelling

Analytics: including statistical analysis and bioinformatics

Energy: including biofuels.

This can significantly strengthen a company’s competitive standing. Extrapolated across the entire Scottish life sciences sector, accelerated discovery lifecycles could be a catalyst for growth, drastically improving the standing of the Scottish sector in global markets. In turn, this could protect and create jobs in technology provision and technology service consumption.

The Scottish life sciences and healthcare sectors consist of a large number of companies, many of them small to medium sized enterprises (SMEs), which face particular problems with technology adoption. HPC and big data technologies can be expensive to acquire and maintain, companies commonly don’t have the knowhow to exploit them, and generally they lack access to other resources such as financial capital.

Showing the benefits

To accelerate adoption of such technologies, awareness has to be raised both of the technical benefits and of the help that is available.


The industry-targeted talks at our seminars included:

• Demystifying Big Data (EPCC)

• HPC, Big Data, and BioInformatics (EPCC)

• The Scottish Big Data Ecosystem (EPCC)

• How Genomic Big Data is shaping animal breeding for the 21st century (Aviagen)

• Genomic Research at Scale (Aridhia Informatics)

• Use of Real World Data to Support Drug Development (GSK)

• Big Data challenges and solutions in Healthcare and Life Sciences (IBM)

• Scottish Innovation Centre Value (The Data Lab)


These events were of significant value in uncovering the details of the needs and wants of Scottish life science companies. This information will be invaluable in shaping a number of follow-on services in HPC and big data including training, consultancy and software development for delivery by EPCC and its partners.

The events were delivered by EPCC, a recognised leader in the delivery of HPC and big data services, in collaboration with three sector-focused centres: BioCity, Edinburgh BioQuarter and BioDundee.

With broad geographic community reach, these centres have a unified focus to accelerate developments across the Scottish life sciences and healthcare sectors. By encouraging collaborations between industry, academia and the public sector they have each built exceptionally strong community networks; valuable targets for disseminating the benefits of HPC and big data technologies.

Support for small and medium sized industries

The seminars’ technical content was complemented with talks about how Scottish life science companies can accelerate adoption of HPC and big data technologies.

With many SMEs attending, the aspects of the seminars that explained the Scottish HPC and big data ecosystem were of particular value. The events were particularly relevant in explaining how barriers to adoption could be lowered for SMEs by working with organisations such as Scottish Enterprise (SE), the Scottish Innovation Centres (such as The Data Lab, the Digital Health Institute and Stratified Medicine Scotland), Farr Institute Scotland, and newly created academic collaborations such as the Alan Turing Institute at the University of Edinburgh.

George Graham, [email protected]

The seminars were well-attended with around 80 delegates in total, and 100% of delegates rated the events as useful or very useful to their business needs.

Find out more about our big data work: www.epcc.ed.ac.uk/research/data

Image from Adam Carter’s ‘Demystifying big data’ talk.


The Alan Turing Institute

Kevin Collins, Assistant Principal Industry Engagement, Industry Funding and Big Data, University of Edinburgh, [email protected]

Mark, [email protected]

Alan Turing Institute website:www.turing.ac.uk

As data science becomes increasingly pervasive across industry and commerce, the Institute will attract the best data scientists and mathematicians from the UK and across the globe to push the boundaries of how we use big data in a fast-moving, competitive global knowledge economy. Creating industry impact and addressing skills gaps have been identified as key priorities for the Institute, and EPCC sees a clear opportunity to engage with the Institute in this context.

Since the development of the Institute was announced by Chancellor George Osborne in March 2014, rapid progress has been made and a number of key milestones have already been achieved. The Institute is now fully constituted and has begun operations. It will be headquartered at the British Library with significant additional activity undertaken across the UK by the founding universities. The leadership of the Institute has been appointed, including City veteran Howard Covington as Chairman and Andrew Blake, former Director of Microsoft Research UK, as Director.

The Alan Turing Institute marked its first few days of operations with the confirmation of £10 million of research funding from Lloyd’s Register Foundation, a research partnership with GCHQ, a collaboration with Cray Inc. and EPSRC, and its first research activities. The latter collaboration leverages the ARCHER national supercomputing service operated by EPCC at Edinburgh.

The breadth of industry and commercial interest in the Institute has been high, which is very encouraging, and discussions are ongoing with a number of companies that are expected to join Lloyd’s Register Foundation as strategic partners in due course.

As part of the Institute’s role to engage UK companies and the physical and social sciences communities, a series of big data summits is being held across the UK this autumn. Three of these will be in Edinburgh, focused on media, credit risk and sensors. Further seminars covering a diverse range of topics – including health, customer-facing industries, finance, big data for SMEs, manufacturing, privacy and security, and government and policy – will be held at other universities.

These summits are aimed at encouraging discussion of the challenges encountered in dealing with big data and identifying high impact research areas for the Institute.

For more information see the Alan Turing Institute website at www.turing.ac.uk. The initial milestones signal excellent progress towards the Institute having a significant early impact on the national imperative to make the UK a global leader in data science and big data research and application.

The Alan Turing Institute is the UK’s national institute for data science, named in honour of the war-time hero and father of Computer Science. Over the past year EPCC has been involved in helping the University bid for and win a key role as one of six joint venture partners in the Institute.

Edinburgh’s fellow joint venture partners are the Universities of Cambridge, Oxford and Warwick, University College London, and the Engineering and Physical Sciences Research Council (EPSRC).


Driving efficiencies in a global supply chain with big data

As a breeding company for broiler chickens, the accuracy of Aviagen’s selection processes is critical to the company’s success in a highly competitive global market.

Genomic selection techniques are routinely used at Aviagen’s R&D centre in Scotland as part of its pedigree breeding operations.

Currently the genomic information collected comes from genotype data from SNP (single nucleotide polymorphism) arrays. However, SNP arrays cover only a fraction of genetic variation. Using whole genome sequence data could drastically improve the selection process by directly correlating genetic architecture to economic value. Increasing selection accuracy using advanced genome selection techniques, underpinned by large-scale data analytics, has the potential to considerably strengthen Aviagen’s ability to deliver genetic improvements into the broiler value chain.

There are, however, significant costs associated with using whole genome sequencing. With SNP data, for example, more than 600,000 data points (SNP markers) are available for every selection candidate in the breeding programme. Due to the prolific reproductive performance of chickens, the number of genotyped animals is now close to 150,000, and the amount of genomic data accumulated grows rapidly as more chicken lines are genotyped. Using whole genome sequence data would incur an order of magnitude (at least) increase in the data points collected per bird. Aviagen’s current data analytics platforms are unable to cope with this step change in data processing requirement.
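A back-of-envelope calculation using the figures above shows the scale of that step change; the byte-per-genotype-call assumption below is purely illustrative, as real encodings and file formats vary:

# Back-of-envelope scale of the problem, using the figures quoted above and
# assuming (purely for illustration) one byte per stored genotype call.

snp_markers_per_bird = 600_000        # SNP array markers per selection candidate
genotyped_birds = 150_000             # genotyped animals to date

snp_calls = snp_markers_per_bird * genotyped_birds
print(f"SNP genotype calls: {snp_calls:.2e}")            # ~9e10 data points
print(f"At 1 byte per call: ~{snp_calls / 1e9:.0f} GB")  # ~90 GB

# Whole-genome sequence data means at least an order of magnitude more data
# points per bird, i.e. roughly a terabyte-scale analytics problem before
# any downstream statistics are run.
whole_genome = snp_calls * 10
print(f"Whole-genome (x10): ~{whole_genome / 1e12:.1f} TB at 1 byte per call")

Even before any statistics are run, simply storing, moving and indexing data at this scale is beyond a conventional workstation-based workflow.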

The solution for Aviagen is to invest in a radically upgraded data analytics platform and workflow. In collaboration with EPCC, and assisted by Scottish Enterprise, a project is underway to leverage cutting-edge technological innovations in big data management techniques and analytical frameworks to develop novel tools and pipelines that will store, search, visualise and analyse genotype and sequence data.

Together with an upgraded IT infrastructure, this will deliver an industry-leading selection process, significantly strengthening Aviagen’s position in the global broiler market, driving revenue and market share growth. In turn this will have a positive impact on the Scottish economy, with a significant Scottish broiler sector directly linked to Aviagen’s strength as a supplier.


It’s not only the Scottish economy that stands to benefit from this collaboration. The ability to select the most efficient targets for production will have a considerable advantage in sustaining and securing the global poultry supply chain. Yields can be improved whilst at the same time drastically reducing environmental footprint, and improving animal health and welfare.

George Graham, [email protected]


Numerical modelling of clouds and atmospheric flows

The Met Office/NERC Cloud model (MONC) has been developed in a collaboration between EPCC and the Met Office. MONC delivers a highly scalable and flexible Large Eddy Simulation (LES) model capable of simulating clouds and other turbulent flows at resolutions of tens of metres on very large domains.

Numerical Weather Prediction (NWP) and climate models have developed considerably over the past few decades, along with the computational power to drive them.

Thirty years ago, resolvable length scales of atmospheric flows were on the order of 100km in operational models, whereas now they are on the order of 10km for global operational models and 1km for regional models. With an increase in resolution comes increased accuracy, but even at these higher resolutions the fundamental fluid motions of clouds and turbulent flows remain at the subgrid scale.

In order for models to represent and account for the interaction of these small-scale flows with the larger-scale meteorology, physically-based parametrizations are developed. A key tool in understanding the fundamental physics of these flows and thus development of the parametrizations is Large Eddy Simulation (LES).

A highly scalable LES

The Met Office Large Eddy Model (LEM) has been the workhorse of cloud process modelling in the Met Office and many UK universities. Originally developed in the 1990s and written predominantly in Fortran 77 with severe limitations in its MPI decomposition, it has failed to keep pace with modern HPC architectures and now lacks the scalability enjoyed by its contemporary operational models.

Despite the shortcomings of the software implementation of the LEM, the scientific foundation is well regarded and well tested and so the MONC project sought to produce a new LES model that is built on the science of the LEM, but with modern software design that is capable of running on tens of thousands of cores and enables high resolution, large domain simulations.

A flexible approach

MONC will be used to simulate a wide variety of atmospheric flows, such as dry boundary layers, fog, stratocumulus or deep moist convection. Each requires its own particular scientific configuration using varying levels of complexity or different numerical implementation.



In recognition of the wide variety of choices a scientist may want to have, and with an eye on future development of new or more efficient codes, a flexible plug ‘n’ play approach was adopted when building the new model. This involved a software infrastructure whereby different components of the model can be chosen and configured at run time. A centrally held and updated model state which can be passed to each component allows for a very simple interface and so allows rapid development of new components.
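MONC itself is written in Fortran, but the idea is easy to sketch in a few lines of Python: components register themselves by name, a configuration chooses which ones run, and every enabled component receives the same centrally held model state. The names and numbers below are invented for illustration and are not MONC’s actual interface:

# Illustrative sketch (in Python; MONC itself is Fortran) of the plug 'n' play
# idea: components are chosen from a registry at run time via configuration,
# and each one receives the same centrally held model state. All names here
# are invented for the example.

REGISTRY = {}

def component(name):
    """Register a callable under `name` so it can be enabled by configuration."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator

@component("buoyancy")
def buoyancy(state):
    state["w_increment"] = state.get("w_increment", 0.0) + 0.01 * state["theta"]

@component("diagnostics")
def diagnostics(state):
    print(f"step {state['step']}: theta={state['theta']:.2f}")

def run(enabled, steps):
    state = {"theta": 300.0, "step": 0}       # centrally held model state
    for step in range(steps):
        state["step"] = step
        for name in enabled:                  # only the configured components run
            REGISTRY[name](state)

run(enabled=["buoyancy", "diagnostics"], steps=3)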

Gathering insight

A scalable and flexible model is all very well, but the key thing a scientist requires is the output of the diagnostic state of the system.

The complex nature of many of the simulations which will be carried out with MONC leads to a wealth of interesting and insightful diagnostics which could potentially be used to understand the underlying physics of the behaviour of clouds. To get at this information, the scalable core of the model needs to be extended to a scalable and efficient diagnostics system.

Another innovation in MONC is the diagnostic server. This allows diagnostics to be calculated and gathered on demand and then farmed out to dedicated processes, which can asynchronously compute statistics and write out to NetCDF files.
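The general pattern is easy to sketch (below in Python, writing plain text rather than NetCDF to keep the example self-contained; this is not the actual MONC implementation): compute workers push finished diagnostics onto a queue and carry on, while a dedicated process drains the queue and does the file I/O asynchronously:

# Minimal sketch of the general pattern behind a diagnostic server: compute
# workers hand finished diagnostics to a dedicated process that writes them
# out asynchronously. Not the actual MONC implementation.

import multiprocessing as mp

def writer(queue: mp.Queue, path: str) -> None:
    """Dedicated I/O process: drain the queue and write records to disk."""
    with open(path, "w") as out:
        while True:
            record = queue.get()
            if record is None:           # sentinel: no more diagnostics
                break
            out.write(f"{record['step']},{record['mean_theta']:.3f}\n")

def compute(queue: mp.Queue, steps: int) -> None:
    """Compute worker: do the model work, hand diagnostics off and move on."""
    theta = 300.0
    for step in range(steps):
        theta += 0.1                     # stand-in for real model timestepping
        queue.put({"step": step, "mean_theta": theta})
    queue.put(None)

if __name__ == "__main__":
    q = mp.Queue()
    io_proc = mp.Process(target=writer, args=(q, "diagnostics.csv"))
    io_proc.start()
    compute(q, steps=5)                  # computation is not blocked by file I/O
    io_proc.join()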

Just the beginning

The original MONC project is now coming to an end and has undoubtedly been a great success.

A beta release of the code is in preparation and the code is now hosted on the Met Office Shared Repository, where the community can start to engage. EPCC involvement will continue with follow-on funding to make further optimisations to the dynamical solver and the microphysics components.

It is only a matter of time before the fruits of our labour pull through to fundamental understanding of clouds and vital improvements in climate prediction and weather forecasting.

Dr Ben Shipway
Met Office Scientist
www.metoffice.gov.uk/research/people/ben-shipway

“EPCC has been doing the code development of MONC, and using our experience of HPC we have produced a model which has already been run on over 32,000 cores and over 2 billion grid points.

“This scale is far beyond what the community can currently work at and we see no reason why MONC can not be scaled beyond this to 100,000 cores in the future.

“Our modular design of MONC means that it is trivial to plug in additional functionality as future scientists require.”

Nick Brown, EPCC

These images show the same simulation of a bubble of warm air rising through the atmosphere, with different filters applied.

To begin with the bubble was warmest in the middle. The simulation has run for 230 seconds and in that time the warmest, less dense, air has risen up faster (higher) than the surrounding cooler, more dense, air.


ARCHER eCSE programme

The ARCHER embedded Computational Science & Engineering (eCSE) programme provides funding for researchers across the UK to work on the development of software running on ARCHER. This can include improving the performance or usability of the software, adding new functionality, or work to improve its long-term sustainability. The programme is run by a team of three people within the CSE team at EPCC: Chris Johnson, Xu Guo and Lorna Smith.

The programme provides funding for established ARCHER communities, but some is also set aside for “New Communities” – that is, communities new to ARCHER looking to move simulations from a Tier-2 (regional) level to a Tier-1 (national) level.

Our job at EPCC is to oversee the whole process. This starts with the issuing of a call. Once proposals have been received we arrange reviews and an associated panel meeting where decisions are made on which proposals to fund. We then let applicants know the results and projects are set up. Unsuccessful applicants are given feedback on their proposal and advice for future submissions.

Throughout the project we liaise with the technical staff working on the projects and run many courses and webinars which they attend. Finally we deal with the closing of projects and collecting of final reports, summaries of which can be found on the eCSE webpages on the ARCHER website.

Calls are issued every 4 months and, over the course of 4 years, a total of 56 person-years (672 person-months) of effort will be awarded. So far 486 person-months have been awarded across 49 projects from 5 separate funding calls.

The frequent nature of calls means that as soon as the review process from a call is complete, it’s time to open the next call! However, this provides momentum and we believe having a regular timetable of calls helps users with the process of submitting proposals.

Technical staff working on eCSE projects may be researchers located at the institution of the PI, third parties, or can include staff from the centralised CSE support team or a mixture of the above. So far we have PIs and staff from 21 different institutions around the UK with a wide variety of projects and subject areas.

Around the time of a call opening we run an ARCHER webinar giving advice on how to submit a proposal. This gives prospective applicants a chance to see how the process works and to ask questions.

Chris Johnson, [email protected]

More information about the eCSE programme can be found on the ARCHER website: www.archer.ac.uk/community/eCSE/

Bone modelling on ARCHER. From the “Voxel-based finite element modelling with VOX-FE2” eCSE project completed by Neelofer Banglawala and Iain Bethune (both EPCC) and Michael Fagan and Richard Holbrey (University of Hull). This is a Paraview visualisation of 3D stress and strain patterns within an 8-million element simulation.

The next eCSE call (eCSE07) opens on 24th November, 2015 and closes on 19th January, 2016.


Adept Benchmark Suite released

The Adept project addresses the challenge of energy-efficient use of parallel technologies. Adept builds on the expertise of software developers from high-performance computing (HPC) in exploiting parallelism for performance, and on the expertise of embedded systems engineers in managing energy usage.

After a couple of years of measuring and trying to understand the power and energy consumption of (parallel) software and hardware, we have now released one of the key tools that we’ve been using as part of this research: the Adept Benchmark Suite!

While measuring performance (ie time to solution) is well understood, doing the same for power or energy is much less straightforward and often hardware dependent. The Adept Benchmark Suite relies on third party power measurement (such as instrumentation of the hardware) to be in place. However, to get users started with initial experiments, we provide a library to use RAPL (Running Average Power Limit) counters on Intel processors to measure the power of CPUs and memory, as well as some example code on how to use this library within the Adept Benchmarks.
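As a flavour of the kind of measurement involved, the sketch below reads the package energy counter through the Linux powercap interface and converts it to an average power. It is an independent illustration, not the Adept library itself: it assumes an Intel CPU exposing /sys/class/powercap/intel-rapl:0, read permission on the counter, and it ignores counter wrap-around:

# Minimal, independent sketch of the kind of RAPL measurement the suite relies
# on, using the Linux powercap sysfs interface (not the Adept library itself).
# Assumes an Intel CPU exposing /sys/class/powercap/intel-rapl:0 and read
# permission on the counter; the energy counter is in microjoules and wraps.

import time

RAPL_PACKAGE = "/sys/class/powercap/intel-rapl:0"   # package 0; path may differ

def read_energy_uj() -> int:
    with open(f"{RAPL_PACKAGE}/energy_uj") as f:
        return int(f.read())

def average_power_watts(interval_s: float = 1.0) -> float:
    """Average package power over `interval_s`, ignoring counter wrap-around."""
    start = read_energy_uj()
    time.sleep(interval_s)
    end = read_energy_uj()
    return (end - start) / 1e6 / interval_s          # microjoules -> joules -> watts

if __name__ == "__main__":
    print(f"Package power: {average_power_watts():.1f} W")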

The Adept Benchmark Suite consists of a wide range of benchmarks representative of common workloads, from high-performance embedded computing all the way to high-performance scientific computing.

They are designed to characterise the efficiency (both in terms of performance and energy) of computer systems, from the hardware and system software stack to the compilers and programming models. Care was taken to expose and (where at all feasible) eliminate overheads so that measurements can be as accurate as possible. We want to get to the core cost of different computations and this often relies on techniques to avoid the compiler or CPU being smarter than you would like them to be.

Find out more

Download the benchmarks from EPCC’s GitHub: https://github.com/EPCCed

We would very much welcome any feedback or comments – please let us know if you have suggestions for improvements or any feature requests!

Contact

By email: [email protected]
On Twitter: @adept_project


Michèle Weiland, [email protected]

Nick Johnson, [email protected]


Women in HPC: powering careers for women in High Performance Computing

In September, Women in HPC, in collaboration with BCSWomen (British Computer Society), ran the second annual UK Women in HPC workshop at the BCS in London. The 40 attendees heard from five speakers about different career paths in HPC, from working as a journalist in the field, to selling hardware, to being an academic. The day was rounded off with a panel discussion on working in HPC and overcoming pitfalls (including how to ask your manager for a pay rise!), and finally a speed networking event.

Upcoming events

Women in HPC at SC15:

• Workshop: Changing the face of HPC. 20 Nov 2015, 8.30-12.00.

• Women in HPC BoF: Pathways and Roadblocks. 18 Nov 2015, 10.30-12.00.

WiSE Software Carpentry for Women in HPC:

• In collaboration with ARCHER and SSI, Women in HPC will run the first UK WiSE Software Carpentry course, building on the success of the programme in the USA. 14-15 Dec 2015. Registration: www.archer.ac.uk/training/courses/2015/12/sw_carp_manchester/

Toni Collis, [email protected]

Women in HPC will turn two years old at the end of this year!

For details of all our events please visit:www.womeninhpc.org.uk/events


Following a highly successful 2014, Women in HPC has provided an exciting programme of international events in 2015.

In May we ran a two-day ‘Introduction to HPC’ training course in collaboration with the PRACEDays15 conference in Dublin, Ireland. This was followed by our most successful international workshop yet, at ISC 2015 in Frankfurt, discussing the work of WHPC, showcasing early-career women, and considering how WHPC can further improve the representation of women.

We will be finishing the year on a high with a BoF and workshop at SC15 in Austin, Texas.


Online distance learning from EPCC: apply now!

EPCC’s online distance learning courses start in January 2016. Our flexible, online courses are designed to fit around your other commitments. Students are given access to the computing facilities of the ARCHER system, the UK national supercomputer service.

Practical Introduction to Data Science

This course introduces the important ideas and concepts of data science and allows students to gain the basic skills that would be expected of a data scientist.

It has two broad themes: the importance of looking after data (so that it can be analysed) and data analysis.

The data management strand covers subjects relating to databases and data storage, archiving and legal and ethical issues. We then cover data analytics techniques and look at how some of these can be done at scale with technologies such as Hadoop.

Successful completion of one of these stand-alone courses leads to a Postgraduate Professional Development award from The University of Edinburgh. The courses can also be taken as part of an online Certificate in Data Science, Technology and Innovation from The University of Edinburgh. Each course corresponds to 20 SCQF credits and is assessed online through coursework.

To find out more about the expected learning outcomes, course structure, assessment, entry requirements and course fees, see: www.epcc.ed.ac.uk/online-courses


Practical Introduction to HPC

High-performance computing (HPC) is a fundamental technology used in solving scientific and commercial problems. Modern supercomputers are parallel computers, comprising thousands of processors.

Parallel programming techniques can also be applied to smaller systems such as multi-core desktops, graphics processors and computing clusters.

The course has three main themes: Hardware, Architectures & System Software; Parallel Programming; Applications on HPC systems.

You will explore these topics by running parallel programs on real HPC systems such as the UK national supercomputer, ARCHER.
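To give a flavour of the practicals, a first parallel exercise usually amounts to every process reporting its rank and taking part in a simple collective operation. Course exercises on ARCHER are typically written in C or Fortran; the mpi4py sketch below is just an equivalent illustration (run with, for example, mpirun -n 4 python hello.py):

# The flavour of a first parallel exercise: each process reports its rank and
# takes part in a simple collective. (Course exercises on ARCHER are typically
# in C or Fortran; this mpi4py version is an equivalent sketch only.)

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's identifier
size = comm.Get_size()      # total number of processes in the job

print(f"Hello from rank {rank} of {size}")

# A simple collective: sum each rank's number across all processes.
total = comm.reduce(rank, op=MPI.SUM, root=0)
if rank == 0:
    print(f"Sum of ranks 0..{size - 1} is {total}")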



Adam Carter, [email protected]



Celebrating EPCC’s Silver Jubilee

September saw an unparalleled coming together of EPCC staff, past and present, in celebration of the centre’s considerable achievements over the past 25 years.

Since 1990, EPCC has trained and employed generations of HPC experts and industry professionals.

As attendance at the dinner organised to celebrate the 25th anniversary of EPCC so clearly demonstrated, the sense of belonging to the EPCC community extends beyond the duration of training or employment. Over 160 people attended the celebration at the University’s Playfair Library with many coming from overseas as well as within the UK.

EPCC was born through the ideas of three people – David Wallace, Stuart Pawley and Ken Bowler – in what was then the Department of Physics. Indeed, EPCC had an unusually long gestation period of 9 years, with the first simulations run on parallel computers in London and the only method of communication by post. Never say that things have not improved, or that network performance has never been worse – it has!

Computers at Edinburgh soon followed, however, and the first two ICL Distributed Array Processors (DAPs) enabled researchers in Physics to publish 186 computational science papers in six years. This was a truly remarkable feat, especially when you remember that these computers had less performance and memory than today’s mobile phones.

EPCC developed through links with Roland Ibbett in Computer Science and Richard Field in the University’s Computing Service, and gained a business-orientated culture through its first Chairman, the late Jeff Collins.


Maureen, [email protected]


Above, left to right: Prof. Arthur Trew, Prof. Sir David Wallace, Prof. Roland Ibbett and Mr Richard Field.

Above, left to right: Mr Brian Gilmore, Prof. Mark Parsons, Prof. Lesley Yellowlees and Prof. Richard Kenway.



Our experience clearly demonstrates the importance of starting from the right place with the right ethos because no-one 25 years ago would have predicted the range of activities that we have undertaken successfully, or the degree to which our working practices have been adopted across many sectors of the University.

The evening commenced with a drinks reception, after which the current Directors, Mrs Alison Kennedy and Professor Mark Parsons, began proceedings by formally welcoming guests and thanking staff and supporters from the past 25 years. During his speech, Professor Parsons personally thanked Professor Francis Wray for his exceptional contribution to EPCC over many years.

After brief speeches, guests enjoyed a three-course dinner and surprise entertainment from operatic act the Singing Waiters. Everyone present cherished the opportunity to catch up with ex-colleagues and to make new connections. The festive atmosphere that enveloped the evening continued long into the night.

“It was wonderful to see so many staff from the past 25 years coming together to celebrate this important milestone. As we often say, EPCC is not about machines but its people. Organising the celebration gave us an opportunity to re-establish contact with many ex-staff who are pursuing successful careers in this country and abroad. The dinner has given us a wonderful opportunity to meet again and exchange memories of the past as well as hopes for the future.”

Mark Parsons, EPCC Executive Director


ParCo 2015 comes to town

The International Parallel Computing Conference (ParCo) was held in Edinburgh from 1–4 September. ParCo 2015 was the 16th instance of the series, which started in 1983, making it the longest-running series of international conferences in Europe on advances in the development and application of parallel computing technologies.

Over 150 delegates attended the conference, which featured 50 main-track papers and 8 mini-symposia. Themes for the conference included multicore and heterogeneous computing, the challenges of exascale, and programming methods.

Contributors were from across the world with around 20 different countries represented, giving the conference a truly international dimension. In a series of keynote talks, Simon McIntosh-Smith, Rick Stevens, Keshav Pingali and Steve Furber shared their insights in the areas of extreme scaling, neuromorphic chip technology, data-centric foundations for parallel programming and bio-inspired massive parallelism respectively. The full programme can be found at www.parco2015.org.

The conference was hosted in the Informatics Forum and was organised by the School of Informatics and EPCC, together with conference series organisers, Gerhard Joubert (Technical University of Clausthal) and Frans Peters (ParCo Conferences). In addition to the main conference programme, the delegates enjoyed a historical walking tour of Edinburgh and a dinner in the inspiring surroundings of the Playfair Library.

Proceedings of the conference will be published later this year, and we would like to thank all those involved in organising and participating in ParCo 2015 for making it a success. We would also like to thank our industry sponsors Cray, SGI, DDN and Seagate.

Mark Sawyer, [email protected]

Simon McIntosh-Smith, Head of the HPC Research Group at the University of Bristol, delivers his keynote: Scientific Software Challenges in the Extreme Scaling Era.

ParCo takes place every two years, and an announcement about the next conference will be made in due course at the conference series homepage:www.parco.org


PRACE Summer of HPC 2015

The PRACE Summer of HPC (SoHPC) programme, now in its third year, offers senior undergraduate and postgraduate students the chance to undertake a two-month summer placement at an HPC centre in another country.

In total, 20 students participated in this year’s programme, hosted at HPC centres in Cyprus, the Czech Republic, Germany, Hungary, Ireland, Italy, Slovenia, Spain and the UK.

EPCC welcomed three of these students to Edinburgh: Anna Chantzoplaki from Greece, Jana Boltersdorf from Germany, and Ondrej Vysocky from the Czech Republic.

After a training week in Barcelona with all the other SoHPC students, our visitors had to adjust to the “summer” climate in Edinburgh – but they soon settled into life here and started making the most of the experience, both in the workplace and outside it.

During their 8 weeks in Edinburgh, each student worked on their own project, mentored by EPCC staff members.

Jana and Ondrej both worked on projects looking at the FFEA (fluctuating finite element analysis) code, written by Dr Sarah Harris’s Theoretical Physics group at the University of Leeds. Jana was investigating introducing MPI parallelisation into the code, while Ondrej worked on developing the user interface.

Anna worked on the “Design a supercomputer” challenge application, to be used in EPCC’s outreach activities. This app allows people at outreach events such as science festivals to compete against each other to design their own virtual supercomputer, constrained by budget and energy efficiency.

In the last week of the programme, Anna was able to join the EPCC outreach team for a successful trial of the application at a high school in Edinburgh.


For first-hand accounts of the students’ experiences, see: https://summerofhpc.prace-ri.eu/blogs-2015/

The students’ final presentations can be viewed on YouTube – search for “Summer of HPC 2015 playlist”.

The blogs and video presentations are also available on public Facebook and Twitter feeds: www.facebook.com/SummerOfHPC www.twitter.com/SummerOfHPC

Summer of HPC website: http://summerofhpc.prace-ri.eu/

Catherine Inglis, [email protected]



What’s inside the box? Building computers from scratch

In September, a team of us attended Bang Goes The Borders, a science festival hosted by St Mary’s School in Melrose. We took along one of our new activities: the Build-a-PC Junkyard Challenge.

I have often been struck by how many people have never seen inside any computer – let alone an HPC machine – and so have no idea that many of the components are the same; it’s how they are connected that makes the difference.

I thought it would be good to get some old computer kit together and let people have a go at building it up, learning along the way what each part of a computer is for and how they fit together.

I was lucky enough to get ten old PCs from one of the School of Physics’ computing labs, and set about turning them into an activity that could be used at a science festival. After breaking down one of the machines into individual components I found that you could strip down the whole machine to just the case and system board, only needing 5 screws to put it back together!

After installing Ubuntu Linux with a “Congratulations” banner (to be displayed when the machine booted) and putting together an instruction booklet, I packed the kit into my car and headed off to Melrose to try it out.

We ran the Junkyard Challenge four times, with kids working in pairs (sometimes with a parent’s help) on their own machine. It took about 30-40 mins for each group to successfully build and boot up their machine, giving us time to disassemble them again ready for the next group.

In total we managed 22 successful builds during the day, and all of the people I spoke to said they really enjoyed it! For many this was the first time they had ever taken the lid off a computer to see what was inside it, let alone handling all the components and piecing them together. Amazingly, almost all of the machines booted up first time!


Iain Bethune, [email protected]

Read more outreach stories on our blog: www.epcc.ed.ac.uk/blog



Who’s Wee Archie?

Nick, [email protected], [email protected]

The EPCC Outreach team is always looking for new ways to introduce supercomputing to a general audience. Remote connections to ARCHER could be used, but how would audiences know what is happening? Is it really running on a remote system or is it faked? Enter Wee Archie: a portable, functional cluster developed by EPCC to demonstrate applications and concepts relating to parallel systems.

Wee Archie consists of 18 Raspberry Pi 2 Model B boards with three network switches, housed in a custom-designed casing. Sixteen of these boards are for computation; the other two form a control and monitoring system. Each board has an associated LED matrix used to show live operational metrics.

Wee Archie has been designed to meet the following criteria:

• Transportability – the design is robust and portable, and only requires a network and power cable to be attached.

• Functionality – Wee Archie will demonstrate parallel concepts, and this is best done through running applications.

• Visibility – the connections between components of complex systems are often hidden from the general user. Wee Archie is designed to expose the connections and hardware safely.

Using Wee Archie: an application

Dinosaur racing has been a popular activity at EPCC outreach events for several years: audiences can configure their own simulated dinosaur, with the resulting movement data used to create races against other dinosaurs. This has worked well when the application is able to run the simulation engine on ARCHER, the UK national HPC service. But the amount of data that needs to be communicated is large and not every outreach venue has a sufficiently fast internet connection. A laptop can run the simulation, but is very slow and lessens the impact of the application.

Wee Archie with its 64 computation cores addresses this problem. The dinosaur simulation code has been ported to Wee Archie, allowing us to run the code in a parallel system without concern about remote connections.

The audience can watch the machine in use, seeing how multiple parts of the system work together: as the LED matrices change from idle to busy, each shows the amount of work being done to simulate their dinosaur.


Left: the custom cluster housing. Above: a computation node – board and matrix indicator.


EPCC is one of Europe’s leading supercomputing centres and operates ARCHER, a 118,080-core Cray XC30 system.

ARCHER is the UK academic High Performance Computer System.

Postgraduate Master’s Degrees in High Performance Computing and in HPC with Data Science

These MSc programmes are offered by EPCC, an institute at the University of Edinburgh. They equip participants with the multidisciplinary skills and knowledge to lead the way in the fields of High Performance Computing and Data Science.

Through our strong links with industry, we also offer our students the opportunity to undertake their Master’s dissertation with one of a wide range of local companies.

The University of Edinburgh is consistently ranked among the top 50 universities in the world*.

*Times Higher World University Ranking

www.epcc.ed.ac.uk/msc