
2

Motivation

• Distributed Systems
  • Notoriously difficult to build without appropriate assistance.
  • First ones were based on low-level message-passing mechanisms.
    • Fewer abstractions meant simpler debuggers.
    • Not much difference between the code manipulated by the user and what actually got executed.
• Nowadays: Object-Oriented Middleware
  • Abstraction frameworks introduce code
    • not meant to be seen by the user at development time;
    • little care is taken at runtime.
  • Result: large discrepancies
    • Middleware: development made easier; or
    • Middleware: debugging made harder?

3

Motivation (cont.)

• Related tasks
  • Testing
    • Debuggers help fix errors.
    • Testing is paramount for finding them.
    • Easier to do if support is provided by the environment.
  • Setting up test scenarios is also a part of the debugging process.
    • Launching remote processes.
    • Simulating failure and observing system behavior.
  • A basis for automated testing.
    • Remote interaction & data collection.

4

Our approach

• Simplest idea possible
  • OO Middleware allows developers to think of their Distributed Systems as if they were multithreaded, Object-Oriented systems (abstraction framework).
  • We want to:
    • preserve that illusion at debug time;
    • match the interactivity provided by today’s source-level debuggers.
  • Involves capturing key points of system execution and displaying them to the user, hiding middleware-related code execution (debug time complexity-hiding).
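
The complexity-hiding idea above can be illustrated with a minimal sketch. It assumes that middleware code can be recognized by package prefix alone; the class `MiddlewareFrameFilter`, the demo class, and the prefixes used are hypothetical and are not taken from the tool itself. The sketch only shows how a call stack might be presented with middleware frames removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the presented tool): hides stack frames
// whose declaring class lives in a package assumed to belong to the middleware.
final class MiddlewareFrameFilter {
    private final List<String> hiddenPrefixes;

    MiddlewareFrameFilter(List<String> hiddenPrefixes) {
        this.hiddenPrefixes = new ArrayList<>(hiddenPrefixes);
    }

    // True if the class name starts with one of the hidden package prefixes.
    boolean isMiddleware(String className) {
        for (String prefix : hiddenPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Returns only the frames the user wrote, preserving their original order.
    List<StackTraceElement> userFrames(List<StackTraceElement> frames) {
        List<StackTraceElement> visible = new ArrayList<>();
        for (StackTraceElement frame : frames) {
            if (!isMiddleware(frame.getClassName())) {
                visible.add(frame);
            }
        }
        return visible;
    }
}

final class FrameFilterDemo {
    public static void main(String[] args) {
        // Assumed prefixes for illustration only.
        MiddlewareFrameFilter filter =
                new MiddlewareFrameFilter(Arrays.asList("org.omg.", "com.sun.corba."));
        List<StackTraceElement> frames = Arrays.asList(
                new StackTraceElement("app.Client", "run", "Client.java", 10),
                new StackTraceElement("org.omg.CORBA.portable.ObjectImpl", "_invoke", "ObjectImpl.java", 475),
                new StackTraceElement("app.ServerImpl", "handle", "ServerImpl.java", 42));
        for (StackTraceElement frame : filter.userFrames(frames)) {
            System.out.println(frame);
        }
    }
}
```

A real debugger would apply a comparable filter to the frames and step events it receives from the target VM rather than to `StackTraceElement` values built by hand.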

5

Our approach (cont.)

• Also involves…
  • Implementing the distributed thread abstraction
    • Threads that span multiple machines.
    • Core element for causal relation tracking in synchronous-call, Distributed Object Systems (DOSes) [Li, 2003], [Netzer, 1993].
• Distributed thread visualization
  • Our hypothesis
    • Expected extension for symbolic debuggers;
    • could lead to valuable insight;
    • possible source of information for other algorithms.
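
One way to picture the distributed thread abstraction is as an identifier that travels with every synchronous request, so that the local threads serving a call chain can be grouped into one logical thread. The sketch below is a simulation under stated assumptions: "nodes" are just names, a remote call is modelled as a worker thread that the caller joins, and `DistributedThreadContext` and `SimulatedMiddleware` are hypothetical names, not the tool's API. In a real CORBA setting the identifier would have to be carried inside the request by the middleware.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

// Hypothetical per-thread holder of the distributed thread ID.
final class DistributedThreadContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Client side: the ID to attach to an outgoing request.
    // A local thread with no ID yet becomes the root of a new distributed thread.
    static String idForOutgoingCall() {
        String id = CURRENT.get();
        if (id == null) {
            id = UUID.randomUUID().toString();
            CURRENT.set(id);
        }
        return id;
    }

    // Server side: bind the received ID while the request is handled,
    // then restore whatever the serving thread had before.
    static void runAs(String id, Runnable handler) {
        String previous = CURRENT.get();
        CURRENT.set(id);
        try {
            handler.run();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }

    static String current() {
        return CURRENT.get();
    }
}

// Simulates synchronous remote calls: the caller blocks until the handler finishes.
final class SimulatedMiddleware {
    // Records "<node>:<distributed thread ID>" for every handled request.
    static final List<String> TRACE = Collections.synchronizedList(new ArrayList<>());

    static void syncCall(String targetNode, Runnable body) {
        String id = DistributedThreadContext.idForOutgoingCall();
        Thread worker = new Thread(() -> DistributedThreadContext.runAs(id, () -> {
            TRACE.add(targetNode + ":" + id);
            body.run();
        }), targetNode + "-worker");
        worker.start();
        try {
            worker.join(); // synchronous call: wait for the "reply"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        }
    }
}

final class DistributedThreadDemo {
    public static void main(String[] args) {
        // The caller invokes nodeB, which in turn invokes nodeC.
        SimulatedMiddleware.syncCall("nodeB",
                () -> SimulatedMiddleware.syncCall("nodeC", () -> { }));
        // Both entries carry the same ID, i.e. they belong to one distributed thread.
        System.out.println(SimulatedMiddleware.TRACE);
    }
}
```

Because every hop reuses the caller's ID, grouping trace entries by ID reconstructs the causal chain of a synchronous call, which is the property the visualization relies on.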

6

Our approach (cont.)

• However
  • Arguably useful [Burnett, 1997]
    • Maybe it doesn’t lead to valuable insight after all.
  • Scales poorly
    • Could be an issue for some applications.
  • Difficult to implement
    • Little support from runtime environments.
  • Could be unworkable in some situations
    • Too much interference.
    • The “timeout” issue.

7

Initial implementation

• Targeted at Java/CORBA environments.
• Allows (*will be available shortly):
  • dynamic distributed thread tracking and visualization;
  • fine-grained control over the flow of execution (stopping, resuming and inspecting distributed threads);
  • arbitrary state inspection;
  • launching/killing remote processes;
  • on-the-fly data visualization;
  • stable property detection*.
• Would be perfect if it could
  • replay execution; and
  • detect some unstable predicates [Garg, 1997; Tomlinson, 1993].

8

Eclipse

• Provides the model
  • Sophisticated interface and interaction set
• Perfect environment for such a tool
  • Extensive support from the framework;
  • open source implementation;
  • widely adopted.
• Phase 1:
  • Implement GUI extensions for accommodating distributed threads and controlling remote applications.
  • Other visual aids (call graphs, dynamic distributed thread visualization).

9

Architecture

10

Architecture (application)

11

Architecture (another view)

• Current state: part of the execution controller and a small part of the execution monitor.

12

Rough “screenshot”

13

Limitations/considerations

• Works only with synchronous-call models;
• timeouts must be disabled for state inspections to work;
• currently tied to Java/JDI;
• could produce too much overhead;
• could fail miserably with some middleware thread handling schemes;
• limited support for remote process management;
• still needs improvement before reaching acceptable performance levels.
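
The timeout limitation can be reproduced in miniature. The following sketch is an illustration, not the tool's code: the "server" is a single-thread executor, a `CountDownLatch` stands in for the debugger's resume command, and the class name `TimeoutInterferenceDemo` is hypothetical. It shows how a caller with a reply timeout gives up while the thread serving it is suspended for inspection.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutInterferenceDemo {
    // Returns true if the caller gave up while the "server" thread was suspended.
    static boolean callTimesOutWhileSuspended(long timeoutMillis) {
        ExecutorService server = Executors.newSingleThreadExecutor();
        CountDownLatch resume = new CountDownLatch(1); // stands in for the debugger's "resume"
        try {
            Future<String> reply = server.submit(() -> {
                resume.await(); // server thread is "suspended at a breakpoint"
                return "reply";
            });
            try {
                reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return false; // a reply arrived in time
            } catch (TimeoutException e) {
                return true;  // the caller's timeout fired during inspection
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for reply", e);
            }
        } finally {
            resume.countDown(); // let the simulated server finish
            server.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + callTimesOutWhileSuspended(50));
    }
}
```

With the timeout in place the call fails for a reason unrelated to the application's logic, which is why the limitation above requires timeouts to be disabled during state inspection.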

14

Availability

• More information (including source code) can be found at the project web site:

http://eclipse.ime.usp.br/projects/DistributedDebugging

• Contributions are more than welcome.

• Thank you!

15

References

[Burnett, 1997] M. Burnett et al. A Bug's Eye View of Immediate Visual Feedback in Direct-Manipulation Programming Systems. In Empirical Studies of Programmers, Washington, D.C., October 1997.

[Garg, 1997] V. K. Garg. Methods for Observing Global Properties in Distributed Systems. IEEE Concurrency, Vol. 5, pages 69-77, October 1997.

[Li, 2003] J. Li. Monitoring and Characterization of Component-Based Systems with Global Causality Capture. 23rd ICDCS, Providence, Rhode Island, May 19-22, 2003.

[Netzer, 1993] R. H. B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In Proc. Workshop on Parallel and Distributed Debugging, pages 1-10, 1993.

[Tomlinson, 1993] I. Tomlinson and V. K. Garg. Detecting Relational Global Predicates in Distributed Systems. In Proceedings of the 1993 ACM/ONR Workshop on Parallel and Distributed Debugging, pages 21-31, USA.

16

Questions?
