2
Outline
• Legacy systems
• Reverse architecting
• Architecture exploration
  – Extraction
  – Abstraction
  – Presentation
• Evaluation
3
Motivation
• Multi-channel distribution
  – Web-enable existing applications
• Due diligence / QA
  – Company merger
• Helping software immigrants
• Estimating new functionality
• Documentation at best out of date
4
Legacy Systems
Definition: any information system that significantly resists evolution to meet new and changing business requirements.
Characteristics:
• Large
• Geriatric
• Outdated languages
• Outdated databases
• Isolated
5
Software Volume
• Capers Jones software size estimate: 700,000,000,000 lines of code (7 × 10^9 function points; 1 fp ≈ 110 lines of code)
• Total number of programmers: 10,000,000
  – 40% new development, 45% enhancements, 15% repair
  – 2020: 30%, 55%, 15%
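The figures above can be checked with a few lines of Python. This is a minimal sketch; the constant and function names are ours. Note that 7 × 10^9 function points at about 110 lines each gives 7.7 × 10^11 lines, so the 700,000,000,000 on the slide is best read as a rounded order of magnitude.

```python
# Figures from the slide (Capers Jones); names are illustrative only.
FUNCTION_POINTS = 7 * 10**9   # estimated world-wide function points
LOC_PER_FP = 110              # approx. lines of code per function point
PROGRAMMERS = 10_000_000      # total number of programmers

# Effort split in percent: current figures and the 2020 figures.
SPLIT = {
    "current": {"new development": 40, "enhancements": 45, "repair": 15},
    "2020": {"new development": 30, "enhancements": 55, "repair": 15},
}


def total_loc(function_points: int, loc_per_fp: int) -> int:
    """Convert a function point count into an estimated number of lines."""
    return function_points * loc_per_fp


def programmers_per_activity(total: int, split: dict) -> dict:
    """Distribute the programmers over activities using integer arithmetic."""
    return {activity: total * pct // 100 for activity, pct in split.items()}


if __name__ == "__main__":
    print(f"{total_loc(FUNCTION_POINTS, LOC_PER_FP):,} lines of code")
    for year, split in SPLIT.items():
        print(year, programmers_per_activity(PROGRAMMERS, split))
```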
7
Reverse Architecting: Motivation
• Architecture description lost or outdated
• Obtain the advantages of an explicit architecture:
  – Stakeholder communication
  – Explicit design decisions
  – Transferable abstraction
  – Architecture conformance checking
  – Quality attribute analysis
8
Software Architecture
The structure(s) of a system, which comprise software components, the externally visible properties of those components, and the relationships among them
9
Architectural Structures
• Module structure
• Data model structure
• Process structure
• Call structure
• Type structure
• GUI flow
• ...
10
The 4 + 1 View Model
• Logical view
• Process view
• Development view
• Physical view
• Use case view
Extract & compare!
11
Reverse Engineering
The process of analyzing a subject system with two goals in mind: to identify the system's components and their interrelationships; and to create representations of the system in another form or at a higher level of abstraction.
Examples: decompilation (another form), reverse architecting (higher level of abstraction)
12
Reengineering
The examination and alteration of a subject system to reconstitute it in a new form, and the subsequent implementation of that new form.
Goes beyond analysis: actually improve the system.
14
Program Understanding
The task of building mental models of an underlying software system at various abstraction levels, ranging from models of the code itself to ones of the underlying application domain, for software maintenance, evolution, and reengineering purposes.
Accounts for about 50% of maintenance effort!
15
Cognitive Processes
• Building a mental model
  – Top down / bottom up / opportunistic
• Generate and validate hypotheses
• Chunking: create higher structures from chunks of low-level information
• Cross referencing: understand relationships
16
Supporting Program Understanding
• Architects build up mental models:
  – various abstractions of the software system
  – hierarchies for varying levels of detail
  – graph-like structures for dependencies
• How can we support this process?
  – infer a number of predefined abstractions
  – enrich the system’s source code with these abstractions
  – let the architect explore the result
17
Architecture Exploration
Lesson from compiler construction: split processing into separate stages
• parsing turns source code into an intermediate form
• optimisation improves the intermediate form
• code generation emits the machine code
Similarity with compilers: translate source code into a form that can be processed by machines
Goal: translate source code into a form that can easily be processed by humans
18
Architecture Exploration
• Extract source models from system artifacts
• Query/manipulate to infer new knowledge
• Present different views on results
Figure: artifacts, extract, repository, query, view, results
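The extract / query / view cycle can be sketched in Python. This is a minimal, hypothetical illustration: the program names, the in-memory set used as repository and the `CALL 'NAME'` pattern are assumptions, not part of any of the tool sets discussed later.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical artifacts: program name -> (simplified) source text.
ARTIFACTS = {
    "MAIN": "CALL 'ORDERS'. CALL 'PRINT'.",
    "ORDERS": "CALL 'STOCK'.",
    "STOCK": "",
    "PRINT": "",
}


def extract(artifacts: dict[str, str]) -> set[tuple[str, str]]:
    """Extract: store (caller, callee) facts in a simple in-memory repository."""
    repository = set()
    for program, text in artifacts.items():
        for callee in re.findall(r"CALL '([A-Z0-9-]+)'", text):
            repository.add((program, callee))
    return repository


def query_reachable(repository: set[tuple[str, str]], start: str) -> set[str]:
    """Query: infer new knowledge, here all programs transitively called from start."""
    graph = defaultdict(set)
    for caller, callee in repository:
        graph[caller].add(callee)
    seen, stack = set(), [start]
    while stack:
        for callee in graph[stack.pop()]:
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def view(start: str, result: set[str]) -> str:
    """View: present the query result as plain text."""
    return "\n".join(f"{start} -> {name}" for name in sorted(result))


if __name__ == "__main__":
    facts = extract(ARTIFACTS)
    print(view("MAIN", query_reachable(facts, "MAIN")))
```

Keeping the three stages as separate functions mirrors the compiler-construction lesson: each stage can be replaced (a real parser, a database, a graph viewer) without touching the others.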
20
Source Model Extraction
Derive information from system artifacts: variable usage, call graphs, file dependencies, database access, …
Challenges:
• Accurate & complete results
• Flexible: easy to write and adapt
• Robust: deal with irregularities in input
21
Grammar Challenges
• Syntax errors
• Language dialects
• Local idioms
• Missing parts
• Embedded languages
• Preprocessing
• Additional problem: grammar availability
  – process languages without a grammar (e.g. undisclosed proprietary languages)
  – development of a full grammar is expensive (Cobol: 1500 productions, 4-5 months)
22
Processing Artifacts
• Syntactical analysis: generate / hand-code / reuse a parser
• Lexical analysis: tools like Perl, grep, AWK or LSME, MultiLex; generally easier to develop

               accurate   complete   flexible   robust
syntactical       +          +          –         –
lexical           –          –          +         +
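A lexical extractor in the spirit of the grep/Perl approach takes only a few lines of Python. The pattern below is a hypothetical choice that matches only static `CALL 'NAME'` statements. It shows both sides of the table: it is trivial to write and adapt and never fails on odd input, but it also reports a call that only occurs in a comment line until the extractor is adapted.

```python
import re

# Assumed pattern: static COBOL calls of the form CALL 'NAME'.
# Dynamic calls such as CALL WS-PROG-NAME are deliberately not matched.
CALL_PATTERN = re.compile(r"\bCALL\s+'([A-Z][A-Z0-9-]*)'")


def lexical_calls(source: str, skip_comments: bool = False) -> list:
    """Return (line number, callee) pairs found by pattern matching alone."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # In fixed-format COBOL, a '*' in column 7 (index 6) marks a comment line.
        if skip_comments and line[6:7] == "*":
            continue
        for match in CALL_PATTERN.finditer(line):
            found.append((lineno, match.group(1)))
    return found


SOURCE = "\n".join([
    "       PROCEDURE DIVISION.",
    "           CALL 'PAYROLL' USING WS-REC.",
    "      *    CALL 'OBSOLETE' USING WS-REC.",
    "           IF X = 1 CALL 'TAXCALC' END-IF.",
])

# [(2, 'PAYROLL'), (3, 'OBSOLETE'), (4, 'TAXCALC')]: the comment is a false positive
print(lexical_calls(SOURCE))
# [(2, 'PAYROLL'), (4, 'TAXCALC')]: the adapted extractor skips comment lines
print(lexical_calls(SOURCE, skip_comments=True))
```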
23
Island Grammars
Grammar containing:
• detailed productions for constructs of interest
• liberal productions that catch the remainder
Islands: accuracy & completeness
Water: robustness
24
Island Grammars
Figure: parse tree of an input under a “standard” grammar compared with its parse tree under an island grammar
25
Island Grammars
Accept a larger language: catch dialects, syntax errors, embedded languages, …
Figure: the language L is contained in the larger language Lisland accepted by the island grammar
26
Island Grammars
Figure: full grammar GL compared with island grammars Gi and Gi’
• Often a smaller grammar
• Can share productions with the full grammar
• Can have a different structure
27
Example (Water)
lexical syntax
  ~[] -> Water {avoid}
context-free syntax
  Water -> Part
  Part* -> Input
Water is “fall-back”
28
Example (Program Calls)
lexical syntax
  ~[] -> Water {avoid}
  [A-Z][A-Z0-9]* -> Id
context-free syntax
  Water -> Part
  Part* -> Input
  "CALL" Id -> Call
  Call -> Part
Water is “fall-back”
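To make the Water fall-back concrete, here is a hand-written Python approximation of the Program Calls grammar. It is not SDF and not generalized parsing: it simply tries the island production `"CALL" Id -> Call` first at every position and otherwise consumes one character as water. The word-boundary check and the treatment of layout are our own assumptions.

```python
import re

ID = re.compile(r"[A-Z][A-Z0-9]*")  # lexical syntax: [A-Z][A-Z0-9]* -> Id
LAYOUT = re.compile(r"\s+")         # assumed layout between "CALL" and Id


def parse_island(text: str) -> list:
    """Split text into ("call", Id) islands and ("water", chunk) runs.

    At each position, try the island production first; only if it fails,
    consume one arbitrary character as water (mimicking {avoid}).
    """
    parts, i, water_start = [], 0, 0
    while i < len(text):
        # Assumption: the keyword must not be glued to a preceding letter/digit.
        at_word_start = i == 0 or not text[i - 1].isalnum()
        if at_word_start and text.startswith("CALL", i):
            layout = LAYOUT.match(text, i + 4)
            ident = ID.match(text, layout.end()) if layout else None
            if ident:
                if water_start < i:
                    parts.append(("water", text[water_start:i]))
                parts.append(("call", ident.group()))
                i = water_start = ident.end()
                continue
        i += 1  # water: one arbitrary character
    if water_start < len(text):
        parts.append(("water", text[water_start:]))
    return parts


def called_programs(text: str) -> list:
    """Keep only the islands: the names of called programs."""
    return [value for kind, value in parse_island(text) if kind == "call"]


print(called_programs("MOVE A TO B. CALL PAYROLL USING A. %% junk CALL TAX1"))
```

Input that no production describes, such as `%% junk`, simply ends up as water instead of causing a syntax error, which is where the robustness of the approach comes from.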
30
Query and Manipulate
Goals:
• infer new knowledge & abstractions
• filter information
Example structures:
• Perform graph
• Call graph (OI, PVL)
• Screen flow
• Batch job
• Subsystem dbs
In search of more abstraction
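A typical query of this kind lifts a program-level call graph to subsystem level and filters out internal calls. The sketch below assumes a hypothetical program-to-subsystem mapping. It shows one way to build a subsystem dependency view and is not a specific tool's query language.

```python
from collections import Counter

# Hypothetical program-level call graph and subsystem assignment.
CALLS = {
    ("ORD01", "ORD02"),
    ("ORD01", "STK01"),
    ("ORD02", "STK01"),
    ("STK01", "STK02"),
}
SUBSYSTEM = {"ORD01": "ORDERS", "ORD02": "ORDERS", "STK01": "STOCK", "STK02": "STOCK"}


def lift(calls, subsystem_of):
    """Lift calls to subsystem level and count the program calls behind each edge.

    Calls inside one subsystem are filtered out, so only cross-subsystem
    dependencies remain.
    """
    lifted = Counter()
    for caller, callee in calls:
        source, target = subsystem_of[caller], subsystem_of[callee]
        if source != target:
            lifted[(source, target)] += 1
    return lifted


print(lift(CALLS, SUBSYSTEM))  # Counter({('ORDERS', 'STOCK'): 2})
```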
31
Combining Data & Functionality
• Cluster analysis
  – technique for finding groups in data
  – relies on metrics to compare the distance between data items
• Concept analysis
  – for finding groups too
  – relies on maximal subsets of data items sharing a set of features
32
Cluster Analysis
• Calculate a distance (similarity) number between all data items (record fields)
• Use clustering to find a hierarchy

Field Name   P1 P2 P3 P4
NAME          1  0  0  0
TITLE         1  0  0  0
INITIAL       1  0  0  0
PREFIX        1  0  0  0
NUMBER        0  0  0  1
NUMBER-EXT    0  0  0  1
ZIPCODE       0  0  0  1
STREET        0  0  1  1
CITY          0  1  0  1
33
Dendrogram (distance axis from 0 to 1), built up step by step from the table above:
• Name, Title, Initial, Prefix form one cluster
• Number, Nb-Ext, Zipcode form a second cluster
• City joins the Number / Nb-Ext / Zipcode cluster; distance is 1
• Street joins that cluster as well
40
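Dendrogram from Real Data (distance axis from 0 to 2)
Figure: leaves as recoverable from the figure, in order:
Amount, AccountOfficeName, BankCity, IntAccountOfficeType, PaymentKind, RelationNr, ChangeDate, TitleCd, Prefix, Initial, ZipCd, CountyCd, StreetNr, MortSeqNr, MortNr, City, Street, Name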
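The dendrogram steps can be reproduced with a small agglomerative clustering in Python on the feature table from the Cluster Analysis slide. Two choices here are our assumptions, not necessarily the metric of the original work: the distance between two fields is the number of programs in which exactly one of them is used (Hamming distance), and clusters are merged by single linkage. With these choices, City and Street each join the Number / Nb-Ext / Zipcode cluster at distance 1, and the Name group joins everything at distance 2.

```python
from itertools import combinations

# Usage table from the slide: 1 if the field is used in program P1..P4.
FIELDS = {
    "NAME": (1, 0, 0, 0),
    "TITLE": (1, 0, 0, 0),
    "INITIAL": (1, 0, 0, 0),
    "PREFIX": (1, 0, 0, 0),
    "NUMBER": (0, 0, 0, 1),
    "NUMBER-EXT": (0, 0, 0, 1),
    "ZIPCODE": (0, 0, 0, 1),
    "STREET": (0, 0, 1, 1),
    "CITY": (0, 1, 0, 1),
}


def hamming(u, v):
    """Number of programs in which exactly one of the two fields is used."""
    return sum(a != b for a, b in zip(u, v))


def single_linkage(table):
    """Agglomerative clustering; returns (distance, merged cluster) per step."""
    clusters = [frozenset([name]) for name in table]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Single linkage: distance between the closest members.
            d = min(hamming(table[a], table[b]) for a in c1 for b in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((d, c1 | c2))
    return merges


if __name__ == "__main__":
    for distance, cluster in single_linkage(FIELDS):
        print(distance, sorted(cluster))
```

The list of merges is exactly what a dendrogram draws: each entry is a horizontal bar at the given distance joining two subtrees.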
41
Concept Analysis
Relies on maximal subsets of data items sharing a set of features
Concept analysis finds a lattice

Field Name   P1 P2 P3 P4
NAME          x
TITLE         x
INITIAL       x
PREFIX        x
NUMBER                 x
NUMBER-EXT             x
ZIPCODE                x
STREET              x  x
CITY             x     x
42
Concept Lattice
Each concept pairs a set of items (field names) with the set of features they share; the lattice runs from top (all variables) to bottom (all features P1 P2 P3 P4).
• top: All Variables
• P1: Name, Title, Initial, Prefix
• P4: Number, Nb-Ext, Zipcode, Street, City
• P3 P4: Street
• P2 P4: City
• bottom: P1 P2 P3 P4
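The lattice can be recomputed from the feature table with a brute-force Python sketch: for every subset of features, take the items having all of them (the extent), then the features those items share (the intent), and collect the distinct pairs. This is only feasible for tiny tables; practical tools use dedicated algorithms such as Ganter's NextClosure. For the slide's table it yields the six concepts listed: top, P1, P4, P3 P4, P2 P4 and bottom.

```python
from itertools import combinations

# Feature table from the slide: field -> programs (features) that use it.
USES = {
    "NAME": {"P1"}, "TITLE": {"P1"}, "INITIAL": {"P1"}, "PREFIX": {"P1"},
    "NUMBER": {"P4"}, "NUMBER-EXT": {"P4"}, "ZIPCODE": {"P4"},
    "STREET": {"P3", "P4"}, "CITY": {"P2", "P4"},
}
FEATURES = frozenset().union(*USES.values())


def extent(features):
    """All items that have every feature in `features`."""
    return frozenset(item for item, feats in USES.items() if features <= feats)


def intent(items):
    """All features shared by every item in `items` (all features if empty)."""
    shared = set(FEATURES)
    for item in items:
        shared &= USES[item]
    return frozenset(shared)


def concepts():
    """Close every subset of features; each closure is one (extent, intent) concept."""
    found = set()
    for size in range(len(FEATURES) + 1):
        for subset in combinations(sorted(FEATURES), size):
            items = extent(frozenset(subset))
            found.add((items, intent(items)))
    return found


# Print from top (largest extent) to bottom (empty extent).
for items, feats in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(feats), sorted(items))
```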
47
System Views
• Grouping method based on feature table
  – metrics or subset based
• Find alternative system views:
  – Kruchten’s logical view
  – object-based view on procedural code
  – starting point for “objectification”
• Keep the “human in the loop”
48
Types
• A type describes a set of possible values
• A type groups variables
• A type encapsulates representation
• Parameter types provide interfaces
• Types provide component connectors
Types are architectural structures
49
But types are already available...
Not in a legacy language like Cobol:
• Data division declares variables + structure
• No separation between type and variable
• Repeated structure per variable
• No enumeration types, no ranges
• No parameters for sections
Similar problems with other legacy languages
50
Automatic Type Inference
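Group variables based on usage
• Initially: each variable has its own unique primitive type
• From statements, infer relations between types:
  – Assignment v := e (type of e becomes a subtype of type of v)
  – Comparison e1 > e2 (types become equivalent)
  – Computation e1 + e2 (types become equivalent)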
Example
DATA DIVISION.
01 PERSON.
   03 INITIALS        PIC X(05).
   03 NAME            PIC X(27).
   03 STREET          PIC X(18).
01 TAB000.
   03 A00-NAME-PART.
      05 A00-POS      PIC X(01) OCCURS 40.
   03 A00-MAX         PIC S9(03) COMP-3 VALUE 40.
   03 A00-FILLED      PIC S9(03) COMP-3 VALUE 0.
01 N000.
   03 N100            PIC S9(03) COMP-3 VALUE 0.
...
PROCEDURE DIVISION.
R210-INITIAL SECTION.
    MOVE INITIALS TO A00-NAME-PART.
    PERFORM R300-COMPOSE-NAME.
R300-COMPOSE-NAME SECTION.
    ...
    PERFORM UNTIL N100 > A00-MAX
       ...
       IF A00-FILLED = N100
       ...
• N100, A00-MAX and A00-FILLED are equivalent (compared with each other)
• INITIALS is a subtype of A00-NAME-PART (assigned via MOVE)
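The inference steps can be sketched with a union-find structure in Python. This is a simplified model: it assumes that comparisons and computations make types equivalent and that an assignment (MOVE) only records a subtype relation, matching the two findings for the example. The facts are read off the PROCEDURE DIVISION by hand rather than extracted by a parser, and the class and function names are hypothetical.

```python
class TypeClasses:
    """Union-find over variable names: each class is one inferred type."""

    def __init__(self):
        self.parent = {}

    def find(self, var):
        self.parent.setdefault(var, var)  # initially: its own primitive type
        while self.parent[var] != var:
            self.parent[var] = self.parent[self.parent[var]]  # path halving
            var = self.parent[var]
        return var

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def infer_types(facts):
    """facts: (kind, a, b) with kind in {"assign", "compare", "compute"}.

    Comparisons and computations merge the operand types; an assignment
    (MOVE a TO b) records that the type of a is a subtype of the type of b.
    """
    types, subtypes = TypeClasses(), set()
    for kind, a, b in facts:
        types.find(a)
        types.find(b)
        if kind in ("compare", "compute"):
            types.union(a, b)
        elif kind == "assign":
            subtypes.add((a, b))
    classes = {}
    for var in list(types.parent):
        classes.setdefault(types.find(var), set()).add(var)
    # Express subtype pairs in terms of class representatives.
    lifted = {(types.find(a), types.find(b)) for a, b in subtypes}
    return list(classes.values()), lifted


# Facts read off the example (source, target) order for MOVE.
FACTS = [
    ("assign", "INITIALS", "A00-NAME-PART"),  # MOVE INITIALS TO A00-NAME-PART
    ("compare", "N100", "A00-MAX"),           # PERFORM UNTIL N100 > A00-MAX
    ("compare", "A00-FILLED", "N100"),        # IF A00-FILLED = N100
]

if __name__ == "__main__":
    classes, subtypes = infer_types(FACTS)
    print(classes)
    print(subtypes)
```

Recording assignments as subtypes rather than merging classes keeps one widely used variable from collapsing many unrelated types into one.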
56
System Level Types
• Propagate types across modules via:
  – calls
  – database operations
  – file I/O
  – include files / copybooks
• Lift type dependencies to package level
57
Type Inference Case Study (I)
100,000-line Cobol / CICS system
• First parameter of all batch programs:
  – program-fields: information required for restart and error recovery
  – literals in subroutine field: all programs
• First parameter of all online programs:
  – DFHCOMMAREA, mapped to the appropriate record → type
58
Type Inference Case Study (II)
• Programs with an integer parameter
  – used as an enumeration type
  – value represents the function to be performed
  – program as package
• Parameter links
  – formal parameters of the same type
  – RA31.6 = RA36.4
• Relations between copybooks
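Parameter links such as RA31.6 = RA36.4 can result from propagating types across calls. The Python sketch below shows one hypothetical way to derive them: if a program passes its own i-th formal parameter as the j-th argument of a call, the two formal parameters get the same type. The program and parameter names are invented for illustration, and the case study's actual derivation may differ.

```python
# Hypothetical formal parameters (PROCEDURE DIVISION USING ...) per program.
FORMALS = {
    "PROGA": ["A-KEY", "A-CODE"],
    "PROGB": ["B-CODE", "B-AREA", "B-KEY"],
}
# Hypothetical calls: (caller, callee, actual arguments of CALL ... USING ...).
CALLS = [
    ("PROGA", "PROGB", ["A-CODE", "WS-AREA", "A-KEY"]),
]


def parameter_links(formals, calls):
    """Derive links "caller.i = callee.j" between formal parameter positions.

    Actual arguments that are local variables (not formals of the caller)
    are skipped here; a fuller analysis would equate their types with the
    callee's formal parameter as well.
    """
    links = set()
    for caller, callee, actuals in calls:
        caller_formals = formals.get(caller, [])
        callee_formals = formals.get(callee, [])
        for j, actual in enumerate(actuals[:len(callee_formals)], start=1):
            if actual in caller_formals:
                i = caller_formals.index(actual) + 1
                links.add(f"{caller}.{i} = {callee}.{j}")
    return links


print(sorted(parameter_links(FORMALS, CALLS)))
```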
60
Presentation Desiderata
• Show multiple structures
• Show relationships between structures
• Multiple levels of abstraction
• Zoom in, zoom out
• Visual as well as textual information
• Graph visualization
• Browsing and searching
61
Presenting Architectures Using Hypertext
• Hyperlinked pages for system elements
• Multiple structures, multiple views
• Backbone: system hierarchy, sources
• Abstractions become additional navigation structures
• Text & clickable graphs
62
Types of navigation
• Vertical browsing
  – supported by hierarchical structures
  – zoom into a more detailed level: system → subsystem → program → … → source
• Horizontal browsing
  – supported by graph-like structures
  – find related elements on the same abstraction level: called programs, variables of the same type, etc.
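Hyperlinked pages that support both kinds of navigation can be generated directly from extracted facts. The Python sketch below uses hypothetical program and subsystem names. It writes one page per subsystem (links down to its programs) and one per program (a link up to its subsystem plus links to the programs it calls). It only illustrates the idea and does not reproduce DocGen's actual output.

```python
from collections import defaultdict
from html import escape

# Hypothetical system facts.
SUBSYSTEM = {"ORD01": "ORDERS", "ORD02": "ORDERS", "STK01": "STOCK"}
CALLS = {("ORD01", "STK01"), ("ORD01", "ORD02")}


def link(name):
    """Hyperlink to the page of a system element."""
    return f'<a href="{escape(name)}.html">{escape(name)}</a>'


def generate_pages(subsystem_of, calls):
    """Return {file name: HTML} with one page per subsystem and per program."""
    members, callees = defaultdict(list), defaultdict(list)
    for program, subsystem in subsystem_of.items():
        members[subsystem].append(program)
    for caller, callee in calls:
        callees[caller].append(callee)
    pages = {}
    for subsystem, programs in members.items():
        # Vertical browsing: zoom in from a subsystem to its programs.
        items = "".join(f"<li>{link(p)}</li>" for p in sorted(programs))
        pages[f"{subsystem}.html"] = f"<h1>Subsystem {escape(subsystem)}</h1><ul>{items}</ul>"
    for program, subsystem in subsystem_of.items():
        # Vertical browsing upwards, plus horizontal browsing to called programs.
        calls_html = "".join(f"<li>{link(c)}</li>" for c in sorted(callees[program]))
        pages[f"{program}.html"] = (
            f"<h1>Program {escape(program)}</h1>"
            f"<p>Part of {link(subsystem)}</p>"
            f"<h2>Calls</h2><ul>{calls_html}</ul>"
        )
    return pages


if __name__ == "__main__":
    for name, html in sorted(generate_pages(SUBSYSTEM, CALLS).items()):
        print(name, html)
```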
63
Presentation Challenges
• Handling abstractions not visible in code
• Giving abstractions a meaningful name
  – e.g., a name for an inferred type
• Defining starting points for browsing
  – lists of types, programs, copybooks, words, literals
  – add cross-cutting hyperlinks on all levels
64
Advanced Documentation Generation
• DocGen
  – provides technical documentation
  – used for all ABN AMRO Cobol sources
  – customizable product line
• TypeExplorer
  – includes inferred types as navigation structure
  – advances the level of abstraction
65
Tool Sets
• Rigi (Victoria)
• Bauhaus (Stuttgart)
• Dali (SEI)
• Portable Bookshelf (Toronto)
• DocGen (Amsterdam)
Compared on: extract, query, abstract, present, visualize, browse, search
66
SWARM / WCRE 2001
• The UML
• Rationale recovery
• Pattern-oriented software architecture
• Architecture description languages
• Dynamic analysis
• Software product lines
• Software architecture “user’s guide”
67
Summary
• Extract, abstract, present
• Multiple structures
• Zoom in/out, switch abstraction levels
• Browse / hypertext
• Compiler construction technology
• Active area of research
• Experiment in your projects
68
Further Reading (I)
• A. van Deursen and T. Kuipers. Identifying Objects using Cluster and Concept Analysis. ICSE’99.
• A. van Deursen and T. Kuipers. Building Documentation Generators. ICSM’99.
• A. van Deursen and L. Moonen. Exploring Legacy Systems Using Types. WCRE’00.
• A. van Deursen. Software Architecture Recovery and Modeling. WCRE 2001 workshop report. Applied Computing Review, ACM, 2002.