Component Search and Retrieval Advanced Reuse Seminars Eduardo Cruz

Information Retrieval - 1948

Structured Documents

Unstructured Documents

No software documentation standard

Semi-Structured Documents

Calvin Northrup Mooers

Mooers' Law: “An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it,” 1959

Calvin Northrup Mooers

Mass Produced Software Components

[McIlroy, 1968]

“software industry is weakly founded, and that one aspect of this weakness is the absence of a software components subindustry”

[McIlroy, 1968]

“The storage and retrieval of software assets is nothing but a specialized form of information storage and retrieval”

[Mili, 1998]

Software Library

Browsing – Inspecting without a predefined criterion

Retrieval – Satisfying a predefined matching criterion

Classification Scheme

Facet-based: better than hierarchical classification

Manual classification of different facets, or automatic classification

Controlled vocabulary: semantic information

Uncontrolled vocabulary: big software libraries, little or no descriptors
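A facet-based scheme can be sketched as a set of facet/term pairs per component. In this minimal sketch the facet names (function, object, medium) and the components are hypothetical examples, not taken from any particular library. Exact retrieval requires every query facet to match; the relaxed variant ranks components by how many facets match, a crude stand-in for relaxed search.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    facets: dict = field(default_factory=dict)  # facet -> controlled-vocabulary term


def exact_retrieve(library, query):
    """Return components whose terms match every facet in the query."""
    return [c.name for c in library
            if all(c.facets.get(f) == t for f, t in query.items())]


def relaxed_retrieve(library, query):
    """Rank components by the number of matching facets (at least one)."""
    scored = [(sum(c.facets.get(f) == t for f, t in query.items()), c.name)
              for c in library]
    # sorted() is stable, so ties keep library order
    return [name for score, name in sorted(scored, key=lambda s: -s[0]) if score > 0]


# Hypothetical library classified with facets (function, object, medium).
library = [
    Component("QuickSort", {"function": "sort", "object": "array", "medium": "memory"}),
    Component("ExternalMergeSort", {"function": "sort", "object": "file", "medium": "disk"}),
    Component("BinarySearch", {"function": "search", "object": "array", "medium": "memory"}),
]

print(exact_retrieve(library, {"function": "sort"}))
# prints ['QuickSort', 'ExternalMergeSort']
print(exact_retrieve(library, {"function": "sort", "object": "array", "medium": "disk"}))
# prints []
print(relaxed_retrieve(library, {"function": "sort", "object": "array", "medium": "disk"}))
# prints ['QuickSort', 'ExternalMergeSort', 'BinarySearch']
```

The last query shows why relaxed search matters: no component matches all three facets, yet the two sorting components are still reasonable starting points.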

Recall and Precision

High Precision – Most retrieved elements are relevant

High Recall – Few relevant elements are left behind

Spreading Activation (Relaxed Search) – Related matches are also retrieved

Coverage – The average number of assets that are visited over the total size of the library
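The three measures above reduce to simple ratios. This minimal sketch computes them for a hypothetical query result over a hypothetical 100-asset library; asset names and counts are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved assets that are relevant.
    Recall: share of relevant assets that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def coverage(visited_per_query, library_size):
    """Average number of assets visited per query, divided by the library size."""
    return sum(visited_per_query) / len(visited_per_query) / library_size


# Hypothetical result: 4 assets retrieved, 3 relevant, 2 of them found.
p, r = precision_recall(["A", "B", "C", "D"], ["A", "B", "E"])
print(p, round(r, 3))           # prints 0.5 0.667
print(coverage([10, 30], 100))  # prints 0.2
```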

Asset Representation

Library representation is made in full knowledge of the artifact; user representation is made in ignorance of the artifact

Asset representation is purposefully abstract, capturing important features while overlooking minor or irrelevant details

In the retrieval literature, this representation is called the asset's surrogate

Asset Retrieval Goals

Exact retrieval – Black box reuse

Approximate retrieval – White box reuse

Generative modification – Reusing the design

Compositional modification – Using building blocks of the retrieved asset

Information usually not included

Interface description

Non-functional requirements

Interoperability

Situational Model vs. System Model

Component retrieval model [Lucrédio et al., 2004]

“Repository representation is made in full knowledge of the artifact at hand”

“User representation is made in ignorance of the artifact”

[Mili, 1998]

Scott Henninger

Tools

Component Search Tools

Web: Delphi Search Engine, Ispey, CSourceSearch.net (2004), Gonzui, SourceBank, Koders (2004), Codase (2005)

Applications: Agora (1998), CodeBroker (2002), Koders Enterprise (2004), Maracatu (2005)

Delphi Search Engine

Ispey.com

SPARS-J – (2003)

Filter

SourceBank

Filter

CSourceSearch.Net – (2004)

Koders.com – (2004)

CODASE – Launched Sep 9, 2005

Example Searches

Browsing

Multiple Search Options

“…based on the number of people in your company, starting from $5,000 USD”

CODASE - Browsing

Other Tools

AGORA - Location and Indexing (1998)

[Figure: Agora architecture with JavaBeans agents and introspectors on the Internet, an AltaVista Search index server and filter, the index, the AltaVista Query Server, and a Web server]
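Agora indexed components by introspecting their interfaces and feeding the result to a search index. The sketch below only mirrors that idea: it swaps in Python's `inspect` module for the JavaBeans introspector and a plain in-memory inverted index for AltaVista, and the two component classes are hypothetical.

```python
import inspect
from collections import defaultdict


# Hypothetical components standing in for JavaBeans found on the Web.
class Thermometer:
    def get_temperature(self): ...
    def set_unit(self, unit): ...


class AlarmClock:
    def get_time(self): ...
    def set_alarm(self, when): ...


def introspect_terms(cls):
    """Terms from the class name and its public method names (split on '_')."""
    methods = [n for n, _ in inspect.getmembers(cls, inspect.isfunction)
               if not n.startswith("_")]
    terms = {cls.__name__.lower()}
    for name in methods:
        terms.update(name.lower().split("_"))
    return terms


def build_index(components):
    """Inverted index: term -> names of components whose interface mentions it."""
    index = defaultdict(set)
    for cls in components:
        for term in introspect_terms(cls):
            index[term].add(cls.__name__)
    return index


def search(index, query):
    """Return components whose interface contains every query term."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index([Thermometer, AlarmClock])
print(search(index, "get"))        # prints ['AlarmClock', 'Thermometer']
print(search(index, "set alarm"))  # prints ['AlarmClock']
```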

Component Rank (2003)

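[Figure: example component graph G with nodes V1, V2, V3 and distribution ratios D12 = 0.5, D13 = 0.5, D23 = 1, D31 = 1; node and edge weights of 0.2 and 0.4 are shown. Legend: nodes v, edges e, graph G, weight w, distribution ratio d]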
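The weights in the example graph can be reproduced by repeated propagation: each node passes its weight along its outgoing edges in proportion to the distribution ratios, and the process repeats until the weights stop changing. This is a minimal sketch, assuming that Dij labels the edge from Vi to Vj and that the ratios leaving each node sum to 1; it is not Inoue et al.'s implementation. For these ratios it converges to w(V1) = 0.4, w(V2) = 0.2, w(V3) = 0.4, giving edge weights of 0.2 on V1→V2, V1→V3 and V2→V3 and 0.4 on V3→V1.

```python
def component_rank(ratios, tol=1e-12, max_iter=10_000):
    """Propagate weights until w[j] == sum_i w[i] * d[i][j], with sum(w) == 1.

    ratios maps each source node to {target: distribution ratio}; the ratios
    leaving each node are assumed to sum to 1.
    """
    nodes = sorted(set(ratios) | {t for out in ratios.values() for t in out})
    weights = {v: 1.0 / len(nodes) for v in nodes}  # uniform starting guess
    for _ in range(max_iter):
        new = {v: 0.0 for v in nodes}
        for src, out in ratios.items():
            for dst, d in out.items():
                new[dst] += weights[src] * d  # edge weight = w(src) * d(edge)
        total = sum(new.values())
        new = {v: x / total for v, x in new.items()}  # renormalize
        if max(abs(new[v] - weights[v]) for v in nodes) < tol:
            return new
        weights = new
    return weights


# Distribution ratios from the example graph (Dij = ratio on edge Vi -> Vj).
ratios = {
    "V1": {"V2": 0.5, "V3": 0.5},
    "V2": {"V3": 1.0},
    "V3": {"V1": 1.0},
}
w = component_rank(ratios)
edges = {(s, t): w[s] * d for s, out in ratios.items() for t, d in out.items()}
print({v: round(x, 3) for v, x in w.items()})
# prints {'V1': 0.4, 'V2': 0.2, 'V3': 0.4}
```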

“Classes defining data structures and their containers are highly ranked”

Clustered Component Graph

[Figure: a component graph over V1 to V7 in which V1 ≡ V4 and V2 ≡ V6 is clustered into V’14, V’26, V’3, V’5 and V7]

NO MORE MULTIPLE DISCONNECTED COMPONENTS

[Figure: component graph over V1 to V7]

Component Rank System Architecture

Input: .java files (.java file ≡ component)

(1) Similarity measurement

(2) Clustering

(3) Use relation extraction

(4) Component graph construction

(5) Component Rank computation by repetition

(6) De-clustering to the original component graph

Output: order of weights ≡ Component Rank of the .java files
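Steps (1) and (2) can be sketched as follows. This minimal sketch assumes that the similarity between two .java files is the Jaccard similarity of their token sets and that files at or above a threshold belong to the same cluster; both the measure and the 0.8 threshold are assumptions of this example, not the metric used by Inoue et al. Union-find merges similar pairs transitively, so copies of copies end up in one cluster.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sequences (1.0 for two empty ones)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster(files, threshold=0.8):
    """Group files whose token sets are at least `threshold` similar (union-find)."""
    parent = {f: f for f in files}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for f, g in combinations(files, 2):
        if jaccard(files[f], files[g]) >= threshold:
            parent[find(f)] = find(g)

    groups = {}
    for f in files:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical repository: A_copy.java is a verbatim copy of A.java.
files = {
    "A.java": "class A { int x ; void f ( ) { } }".split(),
    "A_copy.java": "class A { int x ; void f ( ) { } }".split(),
    "B.java": "class B { String s ; }".split(),
}
print(cluster(files))  # prints [['A.java', 'A_copy.java'], ['B.java']]
```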

Simple Copied Components

[Figure: components A and B appear twice (copied) alongside other components X and Y in the non-clustered component graph. Clustering before weight computation gives A’, B’, X’ and Y’ a weight of 1/4 each; clustering after weight computation gives weights of 1/3, 1/3, 1/6 and 1/6]

DO NOT COUNT SIMPLY DUPLICATED COMPONENTS

Copied AND MODIFIED Components

[Figure: copied and modified components: original components A and B, copied and modified components A and C, and other components X and Y; the non-clustered component graph and the clustered graphs over A’, B’, C’, X’ and Y’ with their computed weights]

Beyond Searching and Browsing

Searching and browsing require users to initiate the information-seeking process

Information access and information delivery

CodeBroker – (2001)

Component repositories are often so large that software developers cannot learn about all of the components

Component repositories are not static: new components are added and old components are updated

Context-aware browsing

May not have sufficient knowledge about the reuse repository

May perceive that reuse costs more than developing from scratch

May not be able to use the repository by formulating a proper query

May not be able to understand the found components

[Figure: information islands within L4, the entire information space: well known, vaguely known, belief, and unknown components]

[Figure: knowledge levels L1 (well known), L2 (vaguely known), L3 (belief) and L4 (entire information space), with CodeBroker, already known components, irrelevant components and task-relevant information marked]

Information use:

L1 – Use by memory

L2 – Use by recall

L3 – Use by anticipation

L4 – Use by delivery

Program Aspects

Concept Formal Informal

Indentation, comments, identifier names (semantic) Executability

Code Constraint environment

Signature

Information delivery

Feedback – After the execution of the action

Feedforward – Affects the execution of the action

Information delivery

Interruptive vs. noninterruptive

Latent Semantic Analysis (LSA)

Problems addressed: synonymy and polysemy

“Text documents and queries are represented as vectors in the semantic space, based on the words contained and the similarity between a query and a document is determined by the distance of their respective vectors”
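The vector comparison described in the quote can be sketched with plain term-frequency vectors and cosine similarity. This is deliberately not full LSA: LSA would first project the term-document matrix into a lower-dimensional semantic space with a truncated singular value decomposition, a step this sketch omits. The component names and comments are hypothetical.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Term-frequency vector: lowercase word tokens mapped to counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


# Hypothetical component comments standing in for repository documents.
docs = {
    "randomInt": "return a random integer between two bounds",
    "shuffle": "randomly permute the elements of a list",
    "readFile": "read the whole contents of a text file",
}
query = to_vector("generate a random integer")
ranked = sorted(docs, key=lambda name: cosine(query, to_vector(docs[name])), reverse=True)
print(ranked[0])  # prints randomInt
```

Because the vectors here are raw terms, "randomly" in the shuffle comment does not match "random" in the query; bringing related words close together is what the LSA projection is meant to add.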

Comments

Signature

Discourse model

User model
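CodeBroker uses these two models to avoid delivering components that would only be noise. A minimal sketch, assuming the discourse model holds components the developer has dismissed as irrelevant during the current session and the user model holds components the developer already knows; the candidate names, their ranking, and the limit of three are hypothetical.

```python
def deliver(candidates, discourse_model, user_model, limit=3):
    """Filter ranked candidates before delivering them to the developer.

    candidates      -- component names ranked by similarity to the current task
    discourse_model -- components dismissed as irrelevant in this session
    user_model      -- components the developer already knows well
    """
    fresh = [c for c in candidates
             if c not in discourse_model and c not in user_model]
    return fresh[:limit]


ranked = ["Random.nextInt", "Math.random", "Collections.shuffle", "Random.nextDouble"]
print(deliver(ranked, discourse_model={"Collections.shuffle"}, user_model={"Math.random"}))
# prints ['Random.nextInt', 'Random.nextDouble']
```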

Koders Enterprise – (2004)

M.A.R.A.C.A.T.U. – Modern Architecture for Retrieving All Components At The Universe (2005)

Using Structural Context to Recommend Source Code Examples

Reid Holmes and Gail C. Murphy

University of British Columbia

Software Practices Lab

The Problem: A Concrete Example

Frameworks can improve developer productivity, but developers can become stuck trying to use their APIs

Imagine trying to use the Eclipse APIs to place text in the status line of the Eclipse IDE

Eclipse has 38,000 public methods

Structural Context

Project

Repository

Development Environment

Examples

Using Structural Context to Recommend Source Code Examples - Reid Holmes and Gail C. Murphy

Strathcona: Extract Structural Context

[Figure: structural context extracted from the query: ViewPart, SampleView, setMessage(String), IStatusLineManager.setMessage(String)]
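Matching on structural context can be sketched as set overlap between facts extracted from the developer's code and facts extracted from each repository example. This is a simplified stand-in, assuming structural facts are (relation, target) pairs and that examples are ranked by the number of shared facts; Strathcona itself combines several matching heuristics, and the example names and facts below are hypothetical.

```python
# Hypothetical structural facts: (relation, target) pairs.
query = {
    ("extends", "ViewPart"),
    ("calls", "IStatusLineManager.setMessage(String)"),
}
examples = {
    "ExampleA": {("extends", "ViewPart"),
                 ("calls", "IStatusLineManager.setMessage(String)"),
                 ("calls", "IViewSite.getActionBars()")},
    "ExampleB": {("extends", "ViewPart")},
    "ExampleC": {("calls", "Display.getCurrent()")},
}


def rank_examples(query, examples):
    """Rank examples by shared structural facts, dropping those that share none."""
    scored = sorted(((len(query & facts), name) for name, facts in examples.items()),
                    key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored if score > 0]


print(rank_examples(query, examples))  # prints ['ExampleA', 'ExampleB']
```

As on the following slides, the top example need not match the query perfectly; sharing the most context is enough to surface it first.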

Visual representation highlights key relationships between example and query

Multiple examples can be quickly viewed

Strathcona: Example Navigation

Strathcona: Viewing Example Source

Code view

The example shows how to get a status line manager

The example is not a perfect match, but good enough to help

Conclusion

Information delivery

Similarity analyser

Ranking – metrics

Context

Automatic facet classification

Uncontrolled vocabulary + additional terms

References

[McIlroy, 1968] M. D. McIlroy, "Mass Produced Software Components", NATO Software Engineering Conference Report, Garmisch, Germany, October 1968, pp. 79-85.

[Mili, 1998] A. Mili, R. Mili, R. T. Mittermeir, A survey of software reuse libraries, Annals of Software Engineering, Vol. 5, 1998, pp. 349-414

[Seacord, 1998] Robert C. Seacord, Scott A. Hissam, Kurt C. Wallnau. "Agora: A Search Engine for Software Components," IEEE Internet Computing, vol. 2, no. 6, pp. 62-70, November/December 1998

[Szyperski, 1999] Szyperski C., “Component Software: Beyond Object-Oriented Programming”. Addison Wesley, 1999

[Dey, 2001] Dey, A. Understanding and Using Context. Personal Ubiquitous Comput. 5, 1 (Jan. 2001)

[Greengrass, 2001] Greengrass, Ed. Information retrieval: A survey. DOD Technical Report TR-R52-008-001, 2001

[Ye, 2001] Ye, Y. and Fischer, G. Context-Aware Browsing of Large Component Repositories. In Proceedings of the 16th IEEE International Conference on Automated Software Engineering (ASE), November 26-29, 2001. IEEE Computer Society, Washington, DC, p. 99.

[Ye, 2002a] Ye, Y. and Fischer, G. Information Delivery in Support of Learning Reusable Software Components on Demand. In Proceedings of the 7th International Conference on Intelligent User Interfaces, California, USA, 2002

[Ye, 2002b] Ye, Y. and Fischer, G. Supporting Reuse by Delivering Task-Relevant and Personalized Information. In Proceedings of the 24th International Conference on Software Engineering, pp. 513-523, Orlando, Florida, May 2002

Bibliography

[Inoue, 2003] K. Inoue et al. "Component Rank: Relative Significance Rank for Software Component Search", Proceedings of ICSE 2003

[Maxville, 2003] Valerie Maxville, Chiou Peng Lam, Jocelyn Armarego. "Selecting Components: a Process for Context-Driven Evaluation", 10th Asia-Pacific Software Engineering Conference (APSEC'03), p. 456, 2003

[Maxville, 2004] Valerie Maxville, Jocelyn Armarego, Chiou Peng Lam. "Intelligent Component Selection", 28th Annual International Computer Software and Applications Conference (COMPSAC'04), pp. 244-249, 2004

[Lucrédio, 2004] Lucrédio, D.; Almeida, E. S.; Prado, A. F. A Survey on Software Components Search and Retrieval. In the 30th IEEE EUROMICRO Conference, Component-Based Software Engineering Track, Rennes, France. IEEE Press, 2004

[Holmes, 2005] Holmes, R. and Murphy, G. C. 2005. Using structural context to recommend source code examples. In Proceedings of the 27th international Conference on Software Engineering (St. Louis, MO, USA, May 15 - 21, 2005). ICSE '05

“Imperfect technology in a working market is sustainable; perfect technology without any market will vanish”

[Szyperski, 1999]