Niloy Ganguly (Zentrum für Hochleistungsrechnen (ZHR) – TU Dresden) <[email protected]>

Project funded by the Future and Emerging Technologies arm of the IST Programme

Immune System and Search Technology

Designing a Fast Search Algorithm for P2P Network using concepts from Immune Systems

Overview of the Presentation

● P2P Network

– Paradigm for Decentralised Computing

● Immune System Features

● Experimental Setup

● Simulation Results

Peer To Peer Network

● Most Direct Method of Connecting Computers

– Simple

– Inexpensive

– No Boss

– No Regulation

Peer To Peer Network

● PCs at the edge of the network are called “Peers”

● Peers can retrieve objects directly from each other

Advantages of a P2P Network

● A large collection of peers, sometimes millions, may be available for content distribution

● Users take advantage of the network’s currently available resources

Peer To Peer Network

● Problem of Hugeness

– Emergence of Protocol

● Network Structure

● Degree of Centralization

                        Unstructured Network         Loosely Structured Network   Structured Network
Hybrid Decentralized    Napster
Pure Decentralized      Gnutella                     Freenet                      CAN, CHORD
Partially Centralized   FastTrack, Kazaa, Morpheus

P2P: Hybrid Decentralized (Napster)

● When a peer connects, it informs the central server of its:

– IP address

– content

● Alice queries for "Das Wunder von Bern"

● Alice requests the file from Bob

[Figure: centralized directory server connected to peers, including Alice and Bob, with numbered steps]

While file transfer is decentralized, locating content is highly centralized
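The split between centralized lookup and decentralized transfer can be illustrated with a minimal sketch of a central index. This is not Napster's actual protocol; the class name `DirectoryServer`, its methods and the peer addresses are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class DirectoryServer:
    """Toy central index: maps a title to the addresses of peers sharing it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer_addr, titles):
        # A connecting peer reports its address and its content.
        for title in titles:
            self._index[title].add(peer_addr)

    def unregister(self, peer_addr):
        # Drop a departing peer from every entry it appears in.
        for peers in self._index.values():
            peers.discard(peer_addr)

    def query(self, title):
        # Locating content goes through the server only.
        return sorted(self._index.get(title, ()))


server = DirectoryServer()
server.register("bob:6699", ["Das Wunder von Bern"])        # hypothetical address
server.register("carol:6699", ["Das Wunder von Bern", "Other Film"])
holders = server.query("Das Wunder von Bern")
```

Alice would then contact one of the returned peers directly for the file. If the `DirectoryServer` object disappears, no lookup is possible, which is the single point of failure that a centralized directory introduces.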

P2P: Hybrid Decentralized (Napster)

● Fast

● Single point of failure

– Application crash

● Performance bottleneck

● Huge database to maintain

● Copyright infringement

– Legal proceedings may result in the company having to shut down the directory server

P2P: Intermediate Arrangement (Kazaa)

Features

Has a centralized server that

• maintains user registrations,

• logs users into the system to keep statistics,

• provides downloads of the client software.

Two client types are supported:

• Supernodes (fast CPUs and high-bandwidth connections)

• Nodes (slower CPUs and/or connections)

Supernode addresses are provided in the initial download. Supernodes also maintain searchable indexes and proxy search requests for users.

P2P: Pure Decentralized (Gnutella)

Basic Features

● No hierarchy; peers have similar responsibilities: no group leader

● No peer maintains directory info

● Highly decentralized

Joining Algorithm

● Use a bootstrap node to learn about others

● Join message

P2P: Pure Decentralized (Gnutella)

Query Message:

● Send the query to neighbors

● If a queried peer has the object, it sends a message back to the querying peer

● The queried peer forwards the query to its immediate neighbors

● The results are carried back to the user

● Message flooding occurs
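The query steps above can be sketched as a breadth-first flood. This is a simplified illustration, not the Gnutella wire protocol: the function name `flood_query`, the adjacency-list `graph` and the hop limit `ttl` are assumptions made for the example, and for simplicity a peer forwards to all of its neighbors, including the one the query came from.

```python
from collections import deque


def flood_query(graph, start, has_object, ttl):
    """Breadth-first flood from `start`; returns (hits, messages_sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # hop limit reached, do not forward further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy costs one message
            if neighbor in seen:
                continue  # duplicate copy is dropped, but it was still sent
            seen.add(neighbor)
            if has_object(neighbor):
                hits.append(neighbor)
            frontier.append((neighbor, hops_left - 1))
    return hits, messages


# Hypothetical five-peer overlay; only peer "E" holds the object.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits, messages = flood_query(graph, "A", lambda peer: peer == "E", ttl=3)
```

Counting `messages` separately from `hits` makes the cost of flooding visible: many of the copies sent reach peers that have already seen the query.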

P2P: Pure Decentralized (Gnutella)

Pros:

● Totally decentralized query

● Robust: a query does not stop when one of the nodes breaks down

● Fresh results: no outdated index

Cons:

● Query radius: the query radius can be long

● Excessive query traffic: 25% of the total traffic is query traffic

● Total traffic in the Gnutella network is 1.7 Gbps, 1.7% of the total traffic in the US Internet backbone

P2P: Pure Decentralized (Gnutella)

Challenges Ahead:

● Reduce query time

● Stop flooding; use an intelligent search method to prevent network congestion

Relation Between Data and Topology

Structured and Loosely Structured Topology

P2P: Structured Decentralized Network

Distributed Hash Table:

● Data or metadata is carefully placed across nodes in a deterministic fashion

● Every file and every node (IP) generates a unique hash address, which helps in the placement of data

● Each node has to keep information about a limited number of neighbors

● Search is very fast, typically of the order log(n)

● Extremely scalable
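Deterministic placement can be illustrated with a small hash ring, in the spirit of CHORD-style consistent hashing. This is only a sketch under stated assumptions: `HashRing`, `ring_hash` and the 16-bit ring size are hypothetical names and values, and the lookup uses a binary search over a global sorted list rather than the per-node routing tables that a real structured network uses to reach order log(n) hops.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumption: a small ring, chosen for readability


def ring_hash(text):
    """Map a node address or a file name to a position on the ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class HashRing:
    def __init__(self, node_addrs):
        self._ring = sorted((ring_hash(addr), addr) for addr in node_addrs)
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, file_name):
        # A file lives on the first node at or after its hash position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect_left(self._positions, ring_hash(file_name))
        return self._ring[idx % len(self._ring)][1]


addrs = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node addresses
ring = HashRing(addrs)
owner = ring.node_for("song.mp3")
```

Because the owner depends only on the hash values, any peer that knows the node set computes the same answer for the same file name, which is what makes exact-match lookup fast.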

P2P: Structured Decentralized Network

Disadvantages

● Locality is destroyed.

– Data items (i.e. files) from a single site are not usually co-located, meaning that opportunities for enhanced browsing, pre-fetching and efficient searching are lost.

● Useful application-level information is lost.

– The data used by many applications is naturally described using hierarchies, which expose relationships between items near to each other. The virtualization of the file namespace by generating keys discards this information.

● P2P networks are extremely transient

● Difficult to support keyword search, as opposed to exact-match queries

P2P: Loosely Structured Network

● Freenet is in between the two.

● File locations are affected by routing hints, but they are not completely specified, so not all searches succeed.

● It essentially pools unused disk space in peer computers to create a collaborative virtual file system.

● Files are replicated when they are searched.

Search

Search Mechanism

● Topology

● Placement of Data

● Message Routing

Search Criterion

● Expressiveness (Key-lookup, Keyword, Ranked Keyword)

● Efficiency (Bandwidth, Processing Power, Storage)

● Quality of Service (Number of Results, Response Time)

● Robustness (Stability in the presence of failures)

System Requirement

● Autonomy (Freedom to choose how much data to store and where to store it)

Artificial Immune System

● Relatively new branch of computer science

– Using the natural immune system as a metaphor for solving computational problems

– Not modelling the immune system

● Variety of applications so far …

– Fault diagnosis (Ishida)

– Computer security (Forrest, Kim)

– Novelty detection (Dasgupta)

– Robot behaviour (Lee)

– Machine learning (Hunt, Timmis, de Castro)

● AIS are computational systems, inspired by theoretical immunology and observed immune functions, which are applied to complex problem domains (Timmis, 2001)

Why the Immune System?

● Recognition

– Anomaly detection

– Noise tolerance

● Robustness

● Feature extraction

● Diversity

● Memory

● Distributed

● Adaptive

Role of the Immune System

● Protect our bodies from infection

● Primary immune response

– Launch a response to invading pathogens

● Secondary immune response

– Remember past encounters

– Faster response the second time around

[Figure: primary and secondary lymphoid organs, with labels: lymphatic vessels, lymph nodes, thymus, spleen, tonsils and adenoids, bone marrow, appendix, Peyer’s patches]

[Figure: steps (I) to (VII) of the immune response, with labels: antigen, APC, MHC protein, peptide, T-cell, activated T-cell, lymphokines, B-cell, activated B-cell (plasma cell)]

Role of the Immune System

Immune recognition is based on the complementarity between the binding region of the receptor and a portion of the antigen called the epitope.

[Figure: B-cell receptors binding to epitopes on an antigen]

Role of the Immune System

● Auto Immune Reaction (Self/Non-Self Discrimination)

● Self presented at the beginning

General Framework for AIS

● Application Domain: P2P Network Search

● Representation: Search Item - Antigen

● Affinity Measures: Similarity (message, search item)

● Immune Algorithms: ImmuneSearch Algorithm

● Solution

Reiterating the Perspective

Solution

P2P Network Search

Search Item - Antigen

Similarity (message,search item)

ImmuneSearch Algorithm

Search Mechanism

● Topology

● Message Routing

Search Criterion

● Efficiency (Stop Packet Flooding)

● Quality of Service (Number of Results)

● Robustness (Stability in the presence of failures)

System Requirement

● Autonomy (Freedom from storing data)

Modeling the Network

[Figure: a user node with Information Profile – Pop and Search Profile – Classical]

● A profile is thought to be continuous

● It is represented by a 10-bit binary string; that is, it is assumed there are 1024 categories

● Profiles close to each other (pop, rap) are close in terms of Hamming distance
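A short sketch of the profile encoding and its distance measure follows. The bit patterns assigned to pop, rap and classical are hypothetical and chosen only to illustrate that related categories differ in few bits; the presentation does not give the actual encodings.

```python
PROFILE_BITS = 10  # 2**10 = 1024 categories


def hamming_distance(a, b):
    """Number of bit positions in which two profiles differ."""
    return bin(a ^ b).count("1")


# Hypothetical encodings, for illustration only.
pop = 0b0000000011
rap = 0b0000000111
classical = 0b1111100000

near = hamming_distance(pop, rap)        # related genres: small distance
far = hamming_distance(pop, classical)   # unrelated genres: large distance
```

Storing a profile as an integer keeps the comparison to a single XOR followed by a bit count.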

Modeling the Network

Zipf Law (Information and Search Profile)

[Figure: a network of 16 nodes, each labelled with one of four tokens; token 1 appears on 7 nodes, token 0 on 4, token 3 on 3 and token 2 on 2]

Zipf Law: a power law used to calculate the probability of occurrence of a pattern r

● $P_r \propto 1/i^{a}$, where r is the i-th most frequent keyword and a is a constant close to 1

● $N_r = K/i^{a}$, with $\sum_r N_r = N$

● Example: N = 16, K = 7.68, a = 1 gives K/1 ≈ 7, K/2 ≈ 4, K/3 ≈ 3, K/4 ≈ 2
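The constant K follows from requiring the counts to sum to N. The sketch below reproduces the K = 7.68 of the example for four patterns; the function name `zipf_counts` is hypothetical. The slide then uses the whole numbers 7, 4, 3 and 2, but it does not say how the fractions are turned into integers, so the sketch stops at the fractional values.

```python
def zipf_counts(total, patterns, a=1.0):
    """Return (K, [K / i**a for each rank i]) so that the counts sum to `total`."""
    weights = [1.0 / i ** a for i in range(1, patterns + 1)]
    k = total / sum(weights)
    return k, [k * w for w in weights]


# Values from the example: N = 16 nodes, four patterns, a = 1.
k, counts = zipf_counts(total=16, patterns=4)
```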

Search the Network – Flooding

Flooding essentially implies sending the message packet to all the neighboring nodes

Search the Network – Random Walk

A message packet moves from node to node, choosing its next hop at random
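A single random walker can be sketched as follows. The function name `random_walk_search`, the adjacency-list `graph` and the injected random generator `rng` are assumptions made for the example, not details from the presentation.

```python
import random


def random_walk_search(graph, start, has_object, steps, rng):
    """Walk `steps` hops from `start`; return the nodes found holding the object."""
    found = []
    node = start
    for _ in range(steps):
        node = rng.choice(graph[node])  # one message per hop
        if has_object(node) and node not in found:
            found.append(node)
    return found


# Hypothetical three-node cycle in which each node has one neighbor,
# so the walk is fully determined and easy to follow by hand.
graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
found = random_walk_search(graph, "A", lambda n: n == "C", steps=2, rng=random.Random(0))
```

Unlike flooding, the walker sends exactly one message per time step, so its cost is bounded by `steps` regardless of how densely the nodes are connected.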

Search the Network – Immune Search

The algorithm consists of two parts

1. The movement of message packets

2. Rearrangement of topology

[Figure labels: Proliferation, Mutation, High Concentration of Packets]

Search the Network – Immune Search

Aim: Cluster similar nodes (similar in Information and Search Profile)

Algorithm: Move nodes similar to the user node closer to the user (change their neighborhood)

Search the Network – Immune Search

Movement depends on

1. The distance from the user node

2. Amount of matching

3. Age

[Figure: node movement in the network; one frame is labelled "No Movement"]

Experimental Results

Experiment:

• Run for 100 generations, without changing the participating nodes

• In each generation, 100 searches are made by users selected randomly

Efficiency:

• Number of search items found in 50 time steps

Comparison:

• Random Walk, Flooding

• Proliferation

[Figure: both dimensions labelled 100]

Experimental Results

Fairness Criteria

• The search criterion is the same: HD(Search, query)

• The number of query packets is the same

• The initial number of packets in Random Walk is higher than in Proliferation

• Flooding is not continued for 50 time steps

Experimental Results

Fairness Criteria

• Proliferation1 and ImmuneSearch have the same proliferation rate

– Proliferate if HD(Search, query) < 2

• Proliferation2 has a higher proliferation rate

– Proliferate if HD(Search, query) < 3

• Proliferation2 has almost the same number of packets as ImmuneSearch
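The threshold rule on the Hamming distance can be sketched as a forwarding decision. Only the comparison of HD against a threshold of 2 or 3 comes from the presentation; the function name `next_hops`, the number of `copies` made on a match and the single random hop used otherwise are assumptions made for this illustration, not the ImmuneSearch algorithm itself.

```python
import random


def hamming_distance(a, b):
    """Number of bit positions in which two profiles differ."""
    return bin(a ^ b).count("1")


def next_hops(neighbors, node_profile, query, threshold, copies, rng):
    """Pick the neighbors that receive the packet after it visits a node."""
    if hamming_distance(node_profile, query) < threshold:
        # Close match: the packet proliferates into several copies.
        return rng.sample(neighbors, min(copies, len(neighbors)))
    # Otherwise a single copy moves on (assumed fallback, not from the slides).
    return [rng.choice(neighbors)]


neighbors = ["n1", "n2", "n3", "n4"]  # hypothetical neighbor ids
rng = random.Random(1)
query = 0b0000000001
node_profile = 0b0000000111  # HD to the query is 2
strict = next_hops(neighbors, node_profile, query, threshold=2, copies=3, rng=rng)
loose = next_hops(neighbors, node_profile, query, threshold=3, copies=3, rng=rng)
```

With the same node and query, the stricter threshold forwards a single copy while the looser one forwards three, which is why a higher threshold leads to more packets in the network.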

Experimental Results (Cost)

Number of packets staying for 50 time steps

• Limited Flooding – 16

• ImmuneSearch – 2

• Proliferation1 – 2

Proliferation is self-regulatory

[Figure: performance of ImmuneSearch, Proliferation1, Proliferation2, Random Walk and Limited Flooding]

Clustering (Most Frequent Token)

● Clusters form very fast, within 24 generations

● Not one cluster but two or three clusters

● Information Profile and Search Profile intermingle

● So clusters are not very tight

● This allows proliferation to flourish without much waste

● Less frequent tokens can cluster

[Figure legend: Information Profile – Pop, Search Profile – Classical]

Clustering (Less Frequent Token)

Clustering of second, third and eleventh most frequent tokens

Experimental Results

Experiment: Change 5%, 10%, ... 50% of the nodes at each generation
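One way to simulate this kind of replacement is sketched below. The function name `replace_nodes` is hypothetical, and giving each newcomer a uniformly random 10-bit profile is an assumption made for simplicity; the presentation does not state how new nodes receive their profiles.

```python
import random

PROFILE_BITS = 10


def replace_nodes(profiles, fraction, rng):
    """Replace a random `fraction` of nodes in place; return the replaced indices."""
    count = round(len(profiles) * fraction)
    replaced = rng.sample(range(len(profiles)), count)
    for idx in replaced:
        # Assumption: a newcomer gets a uniformly random profile.
        profiles[idx] = rng.getrandbits(PROFILE_BITS)
    return replaced


profiles = [0] * 100  # hypothetical starting population
replaced = replace_nodes(profiles, fraction=0.05, rng=random.Random(0))
```

Calling `replace_nodes` once per generation with fractions from 0.05 to 0.50 would mirror the replacement levels listed in the experiment.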

Experimental Results

Observations

• ImmuneSearch performs better than simple proliferation up to 50% replacement

• 5% replacement is sometimes better than the scheme without replacement

Clustering (In Changing Condition)

Clustering of most frequent tokens with 5%, 10% and 20% replacement.

Experimental Results (Amount of Change in Neighborhood)

● Change of 20% of the nodes after 100 generations without replacement

● The neighborhood change rate drops after some time

● With 5% continuous replacement, the neighborhood keeps changing at a more or less constant rate

● The new nodes participate in this change

Discussion and Future Work

● The network works as a self-correcting, self-organizing system

● The proliferation-mutation combination is a good alternative to random walk and flooding

● Topology evolution helps in enhancing the performance of the network

● The design is robust

● Simulate it on other overlay topologies

Questions and Answers