A Progressive Fault Tolerant Mechanism in Mobile Agent Systems Michael R. Lyu and Tsz Yeung Wong...

A Progressive Fault Tolerant Mechanism in Mobile Agent

Systems

Michael R. Lyu and Tsz Yeung Wong

July 27, 2003 SCI Conference

Computer Science Department

Chinese University of Hong Kong

Outline

Introduction of the Problem

Problem Solutions– Server failure detection and recovery– Agent failure detection and recovery

Reliability Evaluations– Using agent implementation– Using Stochastic Petri Net Simulation

Introduction of the problem

Mobile agents are autonomous software codes sent out in netwroks to perform services on behalf of their host.

We focus on designing a fault-tolerant mobile agent system The challenge is:

– Guarantee service availability in the presence of server failures.– Guarantee service availability in the presence of agent failures.– Preserve data consistency in both agents and servers.– Preserve the exactly-once property.– Guarantee the agent can eventually finish its tasks.

Introduction of the problem

Fault-tolerance is classified into levels

– Level 0: No tolerance to faults– Level 1: Server failure detection and recovery– Level 2: Agent failure detection and recovery

Level 0

No tolerance to faults

– Why agents die? because of server failure because of faults inside agent

– Application has to restart manually.

– Affected server may leave an inconsistent state after recovery.

Level 1

Server failure detection and recovery

– Incorporate a failure detection program (monitor, watchdog).

– When a server restarts, abort all uncommitted transactions in the server.

This preserves data consistency

– When the agent re-executes after the initial states Visited servers will be visited again Violates exactly-once execution property

Level 2

Agent failure detection and recovery

– When a server fails, its residing agents are lost. We aim at recovering such loss in this level

– By using checkpointing We checkpoint agent internal data We use checkpointed data to recover lost agents. Agent data consistency is preserved

– Recovery of agent happens on the failed server This preserves the exactly-once execution property.

Design of Level 1 FT

We have a global daemon which monitors all the servers.

Single point of failure problem

monitoring daemon

server pool

When the daemon recovers a server– It aborts all the uncommitted transactions performed

by those lost agents.– This preserves data consistency in the server.

This technique is– Easy to implement– Can be deployed on every existing mobile agent

platform, without modifying the platform.

We use cooperative agents.– Actual agent– Witness agent

Actual agent performs actual computation for the user.

Witness agent monitors the availability of actual agent.– It follows behind the actual agent.

In our protocol, actual agents are able to communicate with the witness agent

– the message is not a broadcast one, but a peer-to-peer one

– Actual agent assumes that the witness agent is in the previous server

– Actual agent must know the address of the previous server

Protocol of Level 2 FT

Server i-1 Server i Server i+1

Server logServer log Server log

Arrive at i

Arrive at iAgentmessagesboxLeave i

Leave i

arrive

Checkpointing happens!!

Protocol of Level 2 FT

Server i-1 Server i Server i+1

Server log Server log

Agentmessagesbox

Arrive at i+1

Leave i+1

Arrive at i+1

Leave i+1

spawnArrive at i

Leave i

Failure and Recovery Scenarios

We only cover stopping failures.

(I.e., assuming Byzantine failures do not exist)

We handle most kinds of failures:– Witness agents fail to receive “arrive at i” message– Witness agents fail to receive “leave i” message– Witness agent failures

Missing arrive message

The reason may be:

1. message is lost

2. message arrives after timeout period

3. actual agent dies when it is ready to leave server i-1

4. actual agent dies when it has just arrive at server i, without logging.

5. actual agent dies when it has just arrive at server i, with logging.

Arrive at i

It is simple for the 1st and 2nd case.

Server i-1 Server i

Server logServer log

Arrive at i

found log

retransmitmessage

For the 3rd and 4th cases, recovery takes place.

Server i-1 Server i

checkpointdata

no log

retransmitmessage

For the 5th case, it results in missing detection.– since log appears in the server– the consequence is that “leave i” message never

arrives.

Missing leave message

The reason may be:

1. message is lost.

2. message arrives after timeout period

3. actual agent dies when it has just sent the “arrive at i” message

4. actual agent dies when it has just logged the message “leave i” message.

leave i

The 3rd case is the same as the previous missing detection case.

Server i-1 Server i

checkpointdata

no log

retransmitmessage

Arrive at i

In this case, the recovery action is the same as the previous section.

– When failure happens, the agent should be performing computation.

– So, when server recovers, the agent’s computation has aborted.

This results in missing detection again.

– This can be compensated by the 3rd case in the previous discussion.

– It is because the witness will never receive “arrive i+1”.

Witness Failure Scenarios

There is a chain of witness agents leaving on the itinerary of the agent

– The latest witness monitors the actual agent.– Other witnesses monitor its preceding witness.

0 1 2 i-1 i

Witnessing dependency

Witness Failure Scenarios

Server i-2 Server i-1 Server i

recovery recovery

Simplification

Assume that 2-server failure would not happen– We can simplify our witnessing dependency

i-2 i-1 i

Simplification

If failure strikes server i-1– Witness on server i-2 can recover witness on server

If failure strikes server i-2– Will not recover it– Because within a short period, no failure would

happen

Reliability Evaluation

The results are obtained by– an agent system implementation using Concordia.– simulation using Stochastic Petri Net.– aim: to measure the percentage of successful round-trip-travel.

about 60%

about 5%

Foragentfailuredetectiononly

about 60%

about 140%

# of original agent

# of recovered agent

Conclusion

Categorized the fault-tolerance of mobile agent system.

Designed a scheme for both server and agent failure detection and recovery.

Analyzed most failure scenarios in mobile agent systems.

Conducted performance evaluations which show– Our scheme is a promising technique– Trade-off between cost and levels of reliability

A Progressive Fault Tolerant Mechanism in Mobile Agent Systems Michael R. Lyu and Tsz Yeung Wong...

Documents

LYU9905 Security in Mobile Agent E- Commerce Systems Prepared by : Wong Ka Ming, Caris Wong Tsz Yeung, Ah Mole Supervisor : LYU Rung Tsong Michael

Tsz: 10 /2004

Carol yeung

Thermodynamics [AP-2013] Lecture 4B by Ling-Hsiao Lyu ...lyu/lecture_files_en/lyu_TD_Notes/TD...Thermodynamics [AP-2013] Lecture 4B by Ling-Hsiao Lyu 2015 p. 4B- 3 Exercise: Write

Internet avenue lyu

Lyu Martins 2014 JoA BWB

EE 02 TSZ i

1 P2P Digital TV Recorder Supervisor: Professor Michael R. Lyu Prepared by:Ho Tsz Wing, Andy Lau Wai Shun, Jack

Chung-Pan Tang , Tsz-Yeung Wong, Patrick P. C. Lee The Chinese University of Hong Kong NOMS’12

Curriculum Vitae of Longyun Lyu - Hong Kong Baptist …cmcs.hkbu.edu.hk/ueditor/php/upload/file/20150706/...Curriculum Vitae of Longyun Lyu Basic Information Name: Longyun Lyu Gender:

LYU 0004 Mobile Agent’s Community

Tai Po Sam Yuk Secondary School 2019 - 2020 Timetabletpsy.edu.hk/StudentLife/Timetable.pdf · 2:55 Ho Wing Yung Lam Tsz Yan Lam Tsz Yan Hsu Chun Ki/ Lam Yin Ting Yeung Kwok Ying

TSZ Sondernummer 2015

Lyu Yeonhwan

Supervised by Prof. LYU Rung Tsong Michael Student: Chan Wai Yeung (1155005589) Lai Tai Shing (1155005604)

Tsz Yeung Liu, Tom STUDIO AIR (Final Part A+B+C)

ME - hkfa.org.hkshih yu lun tpe zhang shengyu chn yeung lik hang hkg chan tsz long hkg kam chun ho hkg chan yee him caleb hkg strip 7 9:20 am referee(s): 辛克勇 (chn) yeung wai

THE COMPANIES ORDINANCE - HKU Alumni EF AoA.… · Web viewTSZ CHING ESTATE, TSZ WAN SHAN, KOWLOON, HONG KONG. (ENGINEER) (Sd.) YEUNG CHIN HO DANIEL (楊展豪) FLAT B, FLOOR

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE ...cslui/PUBLICATION/tdsc2008.pdfA Precise Termination Condition of the Probabilistic Packet Marking Algorithm Tsz-Yeung Wong, Man-Hon Wong,

Ken Yeung - resume