UW-MSR Workshop: Accelerating the Pace of Software Tools Research: Sharing Infrastructure, August 2001


Software Engineering Tools Research on Only $10 a Day
William Griswold, University of California, San Diego

2

Goals
1. How program analysis is used in software engineering, and how that impacts research

2. Issues for tool implementation, infrastructure

3. Infrastructure approaches and example uses

– My lab’s experiences, interviews with 5 others

– Not every infrastructure out there, or IDE infras

4. Lessons learned

5. Challenges and opportunities

3

Base Assumptions
• Software engineering is about coping with the complexity of software and its development
– Scale, scope, arbitrariness of real world

• Evaluation of SE tools is best done in settings that manifest these complexities
– Experiment X involves a tool user with a need

– Hard to bend real settings to your tool

• Mature infrastructure can put more issues within reach at lower cost
– Complete & scalable tools, suitable for more settings

Background

4

Role of Program Analysis in SE

• Behavioral: find/prevent bugs; find invariants

– PREfix, Purify, HotPath, JInsight, DejaVu, TestTube

• Structural: find design anomalies, architecture

– Lackwit, Womble, RM, Seesoft, RIGI, slicers

• Evolutionary: enhance, subset, restructure

– Restructure, StarTool, WolfPack

Discover hidden or dispersed program properties, display them in a natural form, and assist in their change

5

Analysis Methods
• Dynamic

– Trace analysis

– Testing

• Static
– Lexical (e.g., grep, diff)

– Syntactic

– Data-flow analysis, abstract interpretation

– Constraint equation solving

– Model checking; theorem proving

Issues are remarkably similar across methods
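The lexical end of this spectrum is worth seeing concretely: a grep-style cross-referencer is a few lines, knows nothing about scope or type, and shows why lexical tools are fast but imprecise. A minimal sketch (the sample code and function name are illustrative, not from any tool above):

```python
import re

def find_references(source: str, identifier: str):
    """Lexical cross-reference: report (line, column) of each whole-word match.

    Like grep, this knows nothing about scopes or types -- fast but imprecise.
    """
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in pattern.finditer(line):
            hits.append((lineno, m.start() + 1))
    return hits

code = """int total = 0;
total = total + x;
int subtotal = 1;  /* no whole-word match here */
"""
print(find_references(code, "total"))  # → [(1, 5), (2, 1), (2, 9)]
```

Note the word boundaries: `total` inside `subtotal` is correctly skipped, but a `total` in a comment or dead code would still be reported.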

6

Use in Iterative Analysis Cycle
1. Programmer identifies problem or task

Is this horrific hack the cause of my bug?

2. Choose program source model and analysis
I think I’ll do a slice with Sprite. [data-flow analysis]

3. Extract (and analyze) model
[Programmer feeds code to slicer, chooses variable reference in code that has wrong value]

4. Render model (and analysis)
[Tool highlights reached source text]

5. Reason about results, plan course of action
Nope, that hack didn’t get highlighted…

Steps 2-4 may be done manually, with ad hoc automation, an interactive tool, or a batch tool.


User-tool “interface” is rich and dynamic.

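Step 3 of the cycle can be made concrete with a toy backward slicer. This is a flow-insensitive sketch over hand-written def/use sets, not Sprite’s actual data-flow analysis; the statement numbers and variables are invented for illustration:

```python
def backward_slice(defs, uses, criterion):
    """Toy backward slice over statement-level def/use sets.

    defs[s] = variable defined at statement s (or None);
    uses[s] = variables read at statement s.
    Returns statements that may affect the value at `criterion`.
    Flow-insensitive: a real slicer tracks reaching definitions.
    """
    slice_stmts = {criterion}
    wanted = set(uses[criterion])          # variables whose defs we still need
    changed = True
    while changed:
        changed = False
        for s in sorted(defs, reverse=True):
            if s not in slice_stmts and defs[s] in wanted:
                slice_stmts.add(s)
                wanted |= set(uses[s])     # transitively pull in their inputs
                changed = True
    return slice_stmts

# 1: x = input();  2: y = x + 1;  3: z = 5;  4: print(y)
defs = {1: "x", 2: "y", 3: "z", 4: None}
uses = {1: [], 2: ["x"], 3: [], 4: ["y"]}
print(sorted(backward_slice(defs, uses, 4)))  # → [1, 2, 4]
```

Statement 3 is excluded because `z` never flows into the criterion, which is exactly the kind of highlighting the programmer reasons about in step 5.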

7

Interactive, Graphical, Integrated

8

The Perfect Tool User
“Your tool will solve all sorts of problems. But it’ll have to analyze my entire 1 MLOC program, which doesn’t compile right now, and is written in 4 languages. I want the results as fast as compilation, with an intuitive graphical display linked back to the source and integrated into our IDE. I want to save the results, and have them automatically updated as I change the program. Oh, I use Windows, but some of my colleagues use Unix. It’s OK if the tool misses stuff or returns lots of data, we can post-process. We just want a net win.”

For our most recent tool, the first study involved a 500 KLOC Fortran/C app developed on SGI’s


9

Unique Infrastructure Challenges
• Wide-spectrum needs (e.g., GUI)

– Provide function and/or outstanding interoperability

• Whole-program analysis versus interactivity
– Demand, precompute, reuse [Harrold], modularize

• Source-to-source analysis and transformation
– Analyze, present, modify as programmer sees it

• Ill-defined task space and process structure

Saving grace is programmer: intelligent, adaptive
– Can interpret, interpolate, iterate; adjust process
– Requires tool (and hence infrastructure) support

10

Infrastructure Spectrum
• Monolithic environment

– Generative environment (Gandalf, Synthesizer Generator), programming language (Refine)

– Reuse model: high-level language

• Generator (compiler) or interpreter

• Component-based
– Frameworks, toolkits (Ponder), IDE plug-in support

– Reuse model: interface

• Piecewise replacement and innovation

• Subclassing (augmentation, specialization)

11

Monolithic Environments
• Refine: syntactic analysis & trans env [Reasoning]

– Powerful C-like functional language w/ lazy eval.

– AST datatype w/grammar and pattern language

– Aggregate ADT’s, GUI, persistence, C/Cobol targets

– Wolfpack C function splitter took 11 KLOC (1/2 reps, 5% LISP), no pointer analysis; slow [Lakhotia]

• CodeSurfer: C program slicing tool [GrammaTech]

– Rich GUI, PDG in repository, Scheme “back door”

– ~500 LOC to prototype globals model [Lakhotia]

– Not really meant for extension, code transformation

Great for prototyping and one-shot tasks

12

Components Overview
1. Standalone components
– Idea: “Ad hoc” composition, lots of choices
– Example Component: EDG front-ends
– Example Tools: static: Alloy [Jackson]; dynamic: Daikon [Ernst]
2. Component architectures
– Idea: Components must conform to design rules
– Examples: data arch: Aristotle [Harrold]; control arch: Icaria [Atkinson]
3. Analyses (tools) as components
– Idea: Infrastructure-independent tool design
– Example: StarTool [Hayes]

13

Standalone Components
• Component generators
– Yacc, lex, JavaCC, JLex, JJTree, ANTLR …
– Little help for scoping, type checking (symbol tables)

• Representation packages for various languages
– Icaria (C AST), GNAT (Ada), EDG (*), …

• GUI systems galore, mostly generic
– WFC, Visual Basic, Tcl/Tk, Swing; dot, vcg

• Databases and persistence frameworks

• Few OTS analyses available
– Model checkers and SAT constraint solvers

14

Edison Design Group Front-Ends
• Front-ends for C/C++, Fortran, Java (new)

– Lexing, parsing, elaborated AST, generates C

• Thorough static error checking
– Know what you get, but not robust to errors

• API’s best for translation to IR
– Simple things can be hard; white-box reuse

• Precise textual mappings
– C/C++ AST is post-processed, but columns correct

• C++ front-end can’t handle some features

15

Alloy Tool [Jackson]

• Property checker for Alloy OO spec language
– Takes spec and property, finds counterexamples

– Uses SAT constraint solvers for analysis back-end

– Spec language designed explicitly for analyzability

• Front-end
– Wrote own lexer (JLex), parser (CUP), AST

– Eased because of analyzability

• Translation to SAT formula “IR”
– Aggregate is mapped to collection of scalars

– Several stages of formula rewriting

Ad Hoc Component Example, Static Analysis

16

Alloy, cont’d
• Uses 3 SAT solvers, each with strengths

– National challenge resulted in standard SAT “IR”

– Allowed declarative format for hooking in a solver

• Java Swing for general GUI, dot for graphs
– Scalars are mapped back to aggregates, etc., and results are reported as counterexamples

– Currently don’t map results directly back to program
• Expects to use variables as a way to map to source

• About 20 KLOC of new code to build Alloy
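The standard SAT “IR” referred to above is the DIMACS CNF format, which is what lets a tool swap solver back-ends declaratively. A sketch of emitting DIMACS plus a tiny brute-force stand-in for a real solver (the exhaustive check is illustrative only; Alloy uses off-the-shelf solvers):

```python
from itertools import product

def to_dimacs(num_vars, clauses):
    """Emit the standard DIMACS CNF text most SAT solvers accept.

    Clauses are lists of nonzero ints; negative means negated literal.
    """
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

def brute_force_sat(num_vars, clauses):
    """Tiny 2^n reference check: a stand-in for a real solver back-end."""
    for bits in product([False, True], repeat=num_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return assign
    return None  # unsatisfiable

# (x1 or x2) and (not x1 or x2): satisfiable with x2 = True
clauses = [[1, 2], [-1, 2]]
print(to_dimacs(2, clauses))
print(brute_force_sat(2, clauses))  # → {1: False, 2: True}
```

Because the format is a plain text contract, hooking in a new solver reduces to writing this file and parsing the solver’s assignment back, which is the integration win the slide describes.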

17

Alloy: Lessons
• Designing for analyzability a major benefit

– Eases all aspects of front-end and translation to SAT

– Adding 3 kinds of polymorphism added 20 KLOC!

• SAT solver National Challenge a boon
– Several good solver components
– Standard IR eased integration

• SAT solver start/stop protocol the hardest
– Primitive form of computational steering
– Subprocess control, capturing/interpreting output

18

Daikon Tool [Ernst]

• Program invariant detector for C and Java
– Instruments program at proc entries/exits, runs it
– Infers variable value patterns at program points

• Programs with test-suites have been invaluable
– Class programs with grading suites
– Siemens/Rothermel C programs with test-suites

• Front-end the least interesting, 1/2 the work
– Parser, symbol table, AST/IR manipulation, unparser
• Get any two: manipulation toys with symbol table
• Symbol table the hardest, unparser the easiest

– Lots of choices, a few false starts

Ad Hoc Component Example, Dynamic Analysis
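The core idea of dynamic invariant detection can be sketched in a few lines: observe variable values at a program point across runs, propose candidate invariants from templates, and keep only those never falsified. This toy uses two invented templates (constancy and non-negativity); Daikon’s actual grammar of invariants is far richer:

```python
def infer_invariants(traces):
    """Propose template invariants over observed values; drop any falsified.

    traces: list of {variable: value} observations at one program point
    (e.g., a procedure exit). Returns surviving invariants as strings.
    """
    names = sorted(traces[0])
    survivors = {}
    for n in names:
        vals = [t[n] for t in traces]
        if all(v == vals[0] for v in vals):          # constancy template
            survivors[n] = f"{n} == {vals[0]}"
        elif all(v >= 0 for v in vals):              # non-negativity template
            survivors[n] = f"{n} >= 0"
    return survivors

# Observations from three runs at a hypothetical procedure exit
traces = [{"size": 3, "cap": 10}, {"size": 0, "cap": 10}, {"size": 7, "cap": 10}]
print(infer_invariants(traces))  # → {'cap': 'cap == 10', 'size': 'size >= 0'}
```

This also makes the slide’s point about test suites: the inferred invariants are only as trustworthy as the variety of runs feeding them.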

19

Daikon: Choosing Java Front-End
• Byte-code instrumenters (JOIE, Bobby)

– Flexible and precise insertion points
– Loss of names complicates mapping to source
– Byte codes generated are compiler dependent
– Debugging voluminous instrumentation is hard

• Source-level instrumentation
– Java lacks “insertability”, e.g., no comma operator
– Invalidates symbol table, etc.
– Chose Jikes, an open source compiler (got 2 of 4)
• Added AST manipulation good enough to unparse

• New byte-code instrumenters; EDG for Java

20

Ad Hoc Components: Critique
• Freedom is great, but integration is weak

– Data bloat: replicated and unused functionality
– Minimal support for mapping between reps

• Data: implementation of precise mappings
• Control: synchronize to compute only what’s needed

• Scalability a huge issue; data-flow information for a 1 MLOC program, highly optimized:
– 500 MB AST
– 500 MB BB/CFG
– 500 MB bit-vectors

Component-based architecture to the rescue

Space translates to time by stressing memory hierarchy

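The bit-vector figure above comes from classic iterative data-flow analysis, where every definition in the program costs one bit per basic block. A minimal reaching-definitions sketch (the three-node CFG and gen/kill sets are invented; Python ints stand in for packed bit-vectors):

```python
def reaching_definitions(succ, gen, kill):
    """Iterative bit-vector reaching definitions: one bit per definition.

    succ maps node -> successor list; gen/kill map node -> bit masks.
    Python's arbitrary-width ints serve as the bit-vectors.
    """
    nodes = list(succ)
    in_set = {n: 0 for n in nodes}
    out_set = {n: 0 for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            preds = [p for p in nodes if n in succ[p]]
            new_in = 0
            for p in preds:
                new_in |= out_set[p]                  # meet: union over preds
            new_out = gen[n] | (new_in & ~kill[n])    # transfer function
            if new_in != in_set[n] or new_out != out_set[n]:
                in_set[n], out_set[n], changed = new_in, new_out, True
    return in_set, out_set

# Defs d0 (bit 0, node A) and d1 (bit 1, node B) of one variable; A -> B -> C
succ = {"A": ["B"], "B": ["C"], "C": []}
gen = {"A": 0b01, "B": 0b10, "C": 0b00}
kill = {"A": 0b10, "B": 0b01, "C": 0b00}
ins, outs = reaching_definitions(succ, gen, kill)
print(bin(ins["C"]))  # → 0b10: only d1 reaches C
```

At 1 MLOC the same vectors are hundreds of thousands of bits wide per block, which is where the 500 MB and the memory-hierarchy pressure come from.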

21

Aristotle Infrastructure [Harrold]
• Data-flow analysis and testing infra for C

• Database is universal integration mechanism
– Provides uniform, loose integration

• Separately compiled tools can write and read DB
– Added ProLangs framework [Ryder] at modest cost

• Scalability benefits
– Big file system overcomes space problem
– Persistence mitigates time problem

• Performance still an issue, hasn’t been focus
– Loose control integration produces reps in toto
– DB implemented with flat files

Data-based Component Architecture

22

Icaria Infrastructure [Atkinson]

• Scalable data-flow (and syntactic) infra for C
– Hypothesis: need optimized components, control integration, and user control for good performance

• Space- and time-tuned data structures
– AST, BB’s, CFG; bit-vectors semi-sparse & factored
– Memory allocation pools, free “block”
– Steensgaard pointer analysis

• Also piggybacked with CFG build pass for locality

• Event-based demand-driven architecture
– Compute all on demand; even discard/recompute
– Persistently store “undemandable” information

Control-based Component Architecture
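The demand-driven idea can be sketched as a provider that builds representations only when asked, caches them, and lets clients discard under memory pressure. This is a toy illustration, not Icaria’s actual event protocol; the representation names and builders are invented:

```python
class DemandProvider:
    """Demand-driven infrastructure core: representations are computed
    on demand, cached, and discardable (then recomputed if re-demanded)."""

    def __init__(self):
        self.builders = {}
        self.cache = {}
        self.build_count = {}

    def register(self, name, builder, depends_on=()):
        """Register a representation and the reps it is derived from."""
        self.builders[name] = (builder, depends_on)

    def demand(self, name):
        """Return the rep, building it (and its inputs) only if needed."""
        if name not in self.cache:
            builder, deps = self.builders[name]
            inputs = [self.demand(d) for d in deps]
            self.cache[name] = builder(*inputs)
            self.build_count[name] = self.build_count.get(name, 0) + 1
        return self.cache[name]

    def discard(self, name):
        """Free a rep under memory pressure; a later demand rebuilds it."""
        self.cache.pop(name, None)

infra = DemandProvider()
infra.register("ast", lambda: ["parse tree"])
infra.register("cfg", lambda ast: {"entry": ast}, depends_on=("ast",))

infra.demand("cfg")       # builds ast, then cfg
infra.discard("ast")      # free memory; cfg stays cached
infra.demand("cfg")       # cache hit: ast is not rebuilt
print(infra.build_count)  # → {'ast': 1, 'cfg': 1}
```

The architectural rule is visible in the design: anything a client wants must go through `demand`, which is exactly the protocol conformance the “Price of Performance” slide warns about.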

23

Event-based Demand Architecture

24

Icaria: User Control
• Declarative precision management

– Context sensitivity (call stack modelling)
– Pointer analysis (e.g., distinguish struct fields)

• Iteration strategies
– With tuned bit-vector stealing and reclamation

• Declarative programmer input
– ANSI/non-ANSI typing, memory allocators, …
– Adds precision, sometimes speed-up

• Termination control
– Suspend/resume buttons, procedural hook
– Because analysis is a means to an end (a task)

25

Icaria: The Price of Performance
• Must conform to architectural rules to get performance benefits
– E.g., can’t demand/discard/redemand your AST unless it meets architecture’s protocol
• May cascade into a lot of front-end work

– Can buy in modularly, incrementally
• “Demand” in batch

• Don’t discard

• Reconsider demand strategy for new analysis
– I.e., when to discard, what to save persistently

26

Icaria: Scenario – Java Retarget
• Use existing AST or derive off of Ponder’s

• Rethink pointer analysis
– Calls through function pointers mean bad CG
– Intersect (filter) Steensgaard with language types?
• Modular; variant works for C

• Rethink 3-address code and call-graph
– Small methods (many, deep calling contexts)
– “Allocation contexts” instead of calling contexts?
• Context sensitivity module would support

• Existing analyses not likely reusable OTS

27

Icaria: Applications
• Icaria supports Cawk, Sprite slicer, StarTool

– Cawk generated by Ponder syntactic infra [Atkinson]

– Slicer is 6 KLOC: 50% GUI, 20% equations
• Discard AST, CFG
• Persistently store backwards call-graph

• Scalability
– Simple Cawk scripts run at 500 KLOC/minute

– Sliced gcc (200 KLOC) on 200 MHz/200 MB UltraSparc
• 1 hour → 1/2 minute by tuning function pointers
• Dependent on program and slice
• Other parameters less dramatic

28

Designing for Reusable Analyses
• Approaches assume that tool is coded “within” infrastructure
– Complicates migration to a new infrastructure

• Genoa [Devanbu] and sharlit [Tjiang] are “monolithic” language/generator solutions

• How to design a reusable “analysis component”?
– A client of infrastructure, so incomplete

• Addressed for StarTool reengineering tool
– Only front-end infra and target lang., not Tcl/Tk GUI

Analysis Components

29

StarTool: Main View
• “Referenced-by” relation for entity in clustered hierarchy

• Views are navigable, customizable, and annotatable

30

StarTool: Adapter Approach [Hayes]
[Diagram: Star wired directly to Infra with a “?”, versus Star reaching Infra through an Adapter]
• What adapter interface allows best retargets?
• Interpose an adapter [GHJV] to increase separation of analysis and infra
• Low-level: a few small, simple operations
– E.g., generic tree traversal ops
More responsibility in Star relieves all future adapters
Did 3 retargets, including to GNAT Ada AST [Dewar]

31

StarTool: Lessons Learned
• Retargets range from 500 to 2000 LOC

– Precise mappings to source, language complexity

• Best interface assumes nothing about infra
– In extreme, don’t assume there’s an AST at all

– Means providing operations that make StarTool’s implementation easy (despite that there’s just one)
• E.g., iterator for “all references similar to this”
• Metaquery operations resolve feature specifics

– Gives adapter lots of design room, can choose best
– More, bigger ops; mitigated by template class [GHJV]

– Got multi-language tool using 2 levels of adapters
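The adapter idea can be sketched as a handful of generic traversal operations, with the analysis written only against them. The tuple-based AST and the operation names here are invented for illustration, not StarTool’s actual interface:

```python
class TreeAdapter:
    """Low-level adapter interface: a few small, generic traversal ops.

    An analysis written against this interface never touches a particular
    infrastructure's AST, so retargeting means writing only a new adapter.
    """
    def root(self):
        raise NotImplementedError
    def children(self, node):
        raise NotImplementedError
    def label(self, node):
        raise NotImplementedError

class TupleTreeAdapter(TreeAdapter):
    """Adapter over a hypothetical (label, children) tuple representation."""
    def __init__(self, tree):
        self.tree = tree
    def root(self):
        return self.tree
    def children(self, node):
        return node[1]
    def label(self, node):
        return node[0]

def find_labels(adapter, wanted):
    """Infrastructure-independent analysis: collect nodes by label."""
    hits, work = [], [adapter.root()]
    while work:
        node = work.pop()
        if adapter.label(node) == wanted:
            hits.append(node)
        work.extend(adapter.children(node))
    return hits

ast = ("module", [("func", [("ref", [])]), ("ref", [])])
print(len(find_labels(TupleTreeAdapter(ast), "ref")))  # → 2
```

Retargeting to a different infrastructure means subclassing `TreeAdapter` once; `find_labels` and every other analysis built on it come along for free, which is the “relieves all future adapters” payoff.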

32

Observations
• Infrastructures for prototyping or scalability

– 1000 LOC effort won’t scale-up, yet

– Absolute effort is lessening, scale increasing

– Boring stuff is still 1/2+ effort

• Trend towards components
– Span of requirements, performance, IDE integration

– Many components are programmable, however

• Interactive whole-program analysis stresses modularity (reuse) of infrastructure
– Much reuse is white-box

Conclusion!

33

Observations, cont’d
• Retargeting is expensive, defies infrastructure

– Symbol table (scoping, typing), and base analyses
– Language proliferation & evolution continue, slowly
– Tool retargets lag language definition, maybe a lot

• Bigger components are better [Sullivan]
– Many small components complicate integration
– Mitigates symbol-table issue
– Reuse still hard, sometimes white-box

• Language analyzability has big impact
– Front-end, mappings, precise and fast analysis
– Designers need to consider consequences

34

Open Issues
• Effective infrastructures for “deep” analysis
– In principle not hard
– In practice, performance/precision tradeoffs can require significant rewrites for “small” change

• Out of private toolbox, beyond white-box reuse
– Fragile modularity, complexity, documentation

• Robustness
– Useful for incomplete or evolving systems
– Complicates the analysis, results harder to interpret

• Modification: beyond instrumentation & translation

35

Emerging Challenges
• Integration into IDE’s

– GUI dependence, native AST; reuse across IDE’s

• What is a program? What is the program?
– Multi-language programs

– Federated applications, client-server apps

– Trend is towards writing component glue
• Less source code (maybe), but huge apps
• How to treat vast, numerous packages? Sans source?
• Current tools provide/require stub code

• Multi-threading is entering the mainstream

36

Opportunities
• Faster computers, better OS’s and compilers

– Basic Dell’s can take two processors, and it works

• Compatibility packages: Cygwin, VMware, Exceed

• Emergence of Java, etc., for tool construction
– Better type systems, garbage collection
– API model, persistence, GUI, multi-threading
– (Maybe better analyzability, too)

• Infrastructure
– Modular analyses [Ryder], incremental update
– Visualization toolkits (e.g., SGI’s MineSet)

• Open source: share, improve; benchmarks

37

URLs
Refine www.reasoning.com

CodeSurfer www.grammatech.com

EDG www.edg.com

Alloy sdg.lcs.mit.edu/alloy

Daikon cs.washington.edu/homes/mernst/daikon

Aristotle www.cc.gatech.edu/aristotle

ProLangs www.prolangs.rutgers.edu

Icaria, etc. www.cs.ucsd.edu/~wgg/Software

38

Thanks!
Michael Ernst: Dynamic analysis

Daniel Jackson: Alloy

Mik Kersten: IDE integration

Mary Jean Harrold: Aristotle

Arun Lakhotia: Refine and CodeSurfer

Nicholas Mitchell: Compiler infras, EDG

John Stasko: Visualization

Michelle Strout: Compiler infrastructures

Kevin Sullivan: Mediators and components
