The impact of supercomputers on MSR

Y. Kamei, C. Huang, A. Osaka, N. Ubayashi

MSR Next Generation 2014 @ HKUST

Who am I?

❖ Yasutaka Kamei http://posl.ait.kyushu-u.ac.jp/~kamei/

❖ My research interests are
•  Understanding OSS Collaboration
•  Improving Software Quality
•  Scaling up MSR Analysis

Today...

❖ Derive messages from the HPC community to the MSR community.
•  Make use of High Performance Computing (HPC) in MSR.

2014: A Space Odyssey

❖ MSR researchers will soon explore treasure in the Universe.
•  2004 → 2014
•  "Diversity in software engineering research" @ FSE 2013: 20,028 projects as the Universe
•  Challenges in Mining Whole Software Universe

One solution is

❖ Supercomputer

❖ In the case of FX10, each node has
•  CPU: 16 cores
•  Memory: 32 GB

× 4,800 nodes
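To get a feel for the scale, the per-node figures above can simply be multiplied out. This is a minimal sketch: the variable names are illustrative, and the totals are plain products of the numbers on the slide, not measured or guaranteed capacity.

```python
# Per-node figures and node count as quoted on the slide for FX10.
CORES_PER_NODE = 16
MEMORY_GB_PER_NODE = 32
NODES = 4_800

# Aggregate figures are simple products; real usable capacity depends on
# the resource group and job limits, which the slide does not state.
total_cores = CORES_PER_NODE * NODES
total_memory_gb = MEMORY_GB_PER_NODE * NODES

print(f"Total cores:  {total_cores:,}")
print(f"Total memory: {total_memory_gb:,} GB")
```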

However…

❖ The adoption rate for HPC is still low.

•  Domain-specific techniques for using HPC?
•  Only Fortran and C?
•  My tool is implemented by

Prof. Chiba says

❖ Through collaboration in the CREST project, we can use Java, Ruby and Python on FX10!

Case Study

❖ Evaluate the impact that HPC can have on MSR analyses.

❖ Apply HPC (FX10) to Code Clone Detection.

Code Clone

❖ A code fragment that has identical or similar code fragments elsewhere in the source code.

[Figure: copy-and-paste produces several clone fragments; together they form a code clone.]

Hotta et al. CSMR 2012

Type-3 Clones

❖ Programmers often make some changes to code fragments after copy-and-paste.

Zhang et al. ICSM 2012

Original fragment:

    final public void daload() {
        countLabels = 0;
        try {
            position++;
            bCodeStream[i++] = OPC_daload;
        } catch (Exception e) {
            resizeByteArray(OPC_daload);
        }
    }

Fragment after copy-and-paste, with an added code fragment (gap):

    final public void daload() {
        countLabels = 0;
        stackDepth += 2;                // gap: added code fragment
        if (stackDepth > stackMax)      // gap: added code fragment
            stackMax = stackDepth;      // gap: added code fragment
        try {
            position++;
            bCodeStream[i++] = OPC_daload;
        } catch (Exception e) {
            resizeByteArray(OPC_daload);
        }
    }

The two fragments are Type-3 clones.
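The gap idea can be illustrated with a small line-based comparison. This is only a sketch under stated assumptions: it uses Python's standard `difflib` on trimmed source lines, which is a much simpler technique than the PDG-based detection Scorpio performs, and the helper names `normalize`, `similarity` and `gap_lines` are hypothetical.

```python
import difflib

ORIGINAL = """
final public void daload() {
    countLabels = 0;
    try {
        position++;
        bCodeStream[i++] = OPC_daload;
    } catch (Exception e) {
        resizeByteArray(OPC_daload);
    }
}
"""

PASTED = """
final public void daload() {
    countLabels = 0;
    stackDepth += 2;
    if (stackDepth > stackMax)
        stackMax = stackDepth;
    try {
        position++;
        bCodeStream[i++] = OPC_daload;
    } catch (Exception e) {
        resizeByteArray(OPC_daload);
    }
}
"""


def normalize(code):
    """Return the non-empty lines of a fragment with indentation removed."""
    return [line.strip() for line in code.splitlines() if line.strip()]


def similarity(a, b):
    """Line-level similarity in [0, 1]; 1.0 means identical line sequences."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def gap_lines(a, b):
    """Lines that appear only in b, i.e. the code added after copy-and-paste."""
    lines_a, lines_b = normalize(a), normalize(b)
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b)
    gaps = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            gaps.extend(lines_b[j1:j2])
    return gaps


print(f"similarity: {similarity(ORIGINAL, PASTED):.2f}")
for line in gap_lines(ORIGINAL, PASTED):
    print("gap:", line)
```

A line-based comparison like this is sensitive to formatting and renaming, which is one reason dedicated clone detectors work on richer program representations.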

Our collaborator

❖ Dr. Keisuke Hotta
•  Postdoc, Osaka University, Japan
•  Visiting Researcher, Bremen University, Germany

❖ He helps our group use Scorpio (a jar file), which is a PDG-based Type-3 clone detection tool.

Case Study Setting

❖ Environment

            CPU                Memory [GB] per node   Cores × Nodes
Desktop 1   Intel® Core™ i7    16                     12 × 1
Desktop 2   Xeon E5-2630 v2    144                    12 × 1
FX10        SPARC64™ IXfx      32                     16 × 190

❖ Dataset
•  Apache CXF
•  LOC: 830K
•  Size: 150 MB

FX10 is much faster!

Time
•  Desktop 1: 127h28m42s
•  Desktop 2: 2h15m
•  FX10: 16m58s
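The timings above can be turned into speedup factors relative to FX10. This is a minimal sketch: the `to_seconds` helper is hypothetical and only handles the `NhNmNs` notation used on the slide.

```python
import re

# Wall-clock times as reported on the slide.
TIMES = {
    "Desktop 1": "127h28m42s",
    "Desktop 2": "2h15m",
    "FX10": "16m58s",
}


def to_seconds(text):
    """Convert a duration such as '2h15m' or '16m58s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


seconds = {name: to_seconds(value) for name, value in TIMES.items()}

# Speedup of FX10 over each machine: time on that machine / time on FX10.
speedup = {name: secs / seconds["FX10"] for name, secs in seconds.items()}

for name, factor in speedup.items():
    print(f"{name}: {seconds[name]} s, FX10 is {factor:.1f}x faster")
```

With these numbers FX10 is roughly 450 times faster than Desktop 1 and roughly 8 times faster than Desktop 2. The three machines differ in CPU, memory and node count, so the factors describe whole configurations rather than per-core performance.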

How to run Scorpio in FX10

❖ Only 20-30 lines of (bash) code are needed to run Scorpio on FX10.

    #!/bin/bash
    #PJM -L "rscgrp=debug"
    #PJM -L "node=190"
    #PJM -L "elapse=30:00"
    #PJM -j
    #PJM -S
    module load Java
    …
    java -jar scorpio.jar

•  node: How many nodes do we use?
•  elapse: How long do we use FX10?
•  -j, -S: What are output options?
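Because the three questions above map onto a handful of directives, the job script is easy to generate from parameters. The following is a sketch only: `render_job_script` is a hypothetical helper, the lines elided on the slide are not reconstructed, and launching the jar with `java -jar` is an assumption about how Scorpio is started.

```python
def render_job_script(nodes, elapse, command, rscgrp="debug"):
    """Build the text of a job script using the directives shown on the slide.

    nodes   -- how many nodes to use
    elapse  -- how long to use FX10, e.g. "30:00"
    command -- the command line to run (an assumption, see above)
    rscgrp  -- resource group; "debug" is the value shown on the slide
    """
    lines = [
        "#!/bin/bash",
        f'#PJM -L "rscgrp={rscgrp}"',
        f'#PJM -L "node={nodes}"',
        f'#PJM -L "elapse={elapse}"',
        "#PJM -j",  # output options, as on the slide
        "#PJM -S",
        "module load Java",
        # The slide elides further lines here; they are intentionally omitted.
        command,
    ]
    return "\n".join(lines) + "\n"


script = render_job_script(
    nodes=190,
    elapse="30:00",
    command="java -jar scorpio.jar",
)
print(script)
```

Keeping the node count and elapsed time as parameters makes it straightforward to rerun the same analysis at a different scale.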

Our current challenges

•  Done: Apache CXF (6,000 files)
•  Doing: All Apache projects (770,000 files)
•  ToDo: UCI Dataset (390,000,000 files)
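The jump between these datasets is easier to appreciate as growth factors and as files per node. This is a rough sketch: the even split across nodes is an assumption made for illustration, and clone detection compares fragments across files, so real work does not divide this cleanly.

```python
# File counts as quoted on the slide.
FILES = {
    "Apache CXF": 6_000,
    "All Apache projects": 770_000,
    "UCI Dataset": 390_000_000,
}

baseline = FILES["Apache CXF"]

# How many times larger each dataset is than the completed Apache CXF study.
growth = {name: count / baseline for name, count in FILES.items()}


def files_per_node(total_files, nodes):
    """Files each node would handle under an even split (rounded up)."""
    return -(-total_files // nodes)


for name, count in FILES.items():
    print(
        f"{name}: {growth[name]:,.0f}x the baseline, "
        f"{files_per_node(count, 190):,} files/node on 190 nodes, "
        f"{files_per_node(count, 4_800):,} files/node on 4,800 nodes"
    )
```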
