RESOURCE DISCOVERY

Presenter: Cù Nguyễn Phương Hà

2

Overview

- Introduction of problems
- Several approaches
- Solution model

4

Introduction

The goal of the VN-Grid project is to connect the computational resources available at sites across the network and to use them together to solve large scientific problems.

Therefore, it is necessary to know which resources are available at all Grid sites and to find the Grid sites that currently have resources to offer => Resource Discovery services

5

Functions of Resource Discovery

The prospective Resource Discovery service at each Grid site must be able to know, find, and provide resource information about the other sites.

Its main function: when it receives a specific resource request from a client, the Resource Discovery service must find reliable information about the Grid sites in the network that possess available resources satisfying the query.

6

Resources in VN-Grid

There are three kinds of resources:

- Resources for executing jobs (computing resources): information about the resources used to execute a submitted job, for example computational power, data storage, and network bandwidth.
- Information about services: information about the services that a user wants to learn about, for example Information Services and Resource Discovery Services.
- Information about applications: information about special applications deployed on the Grid, such as MPI and POP C.

7

Resources in VN-Grid

Characteristics of resources in the VN-Grid environment:

- The resources are heterogeneous, not only across the network but also within each site.
- The resources have a variety of properties with different data types.
- The existing resources vary continuously, especially the computing resources, for example CPUs, memory, disk, and network bandwidth.
- New resources are continually being published.

8

Forwarding in VN-Grid

The proposed VN-Grid infrastructure simulates a Peer-to-Peer model, in which the clients rather than servers control the networking; that is, the peers can exchange information directly.

- Interaction is limited to known peers.
- All peers are considered equal.
- The number of peers participating in the Grid can grow enormously.

9

Summary

A good resource discovery service must:

- Provide the most exact, up-to-date, and sufficient information in a timely manner.
- Be flexible with respect to the features of resources, such as variety, heterogeneity, and newly added resources.
- Be scalable, so that it adapts as the number of peers in the Grid environment rises.
- Reduce the cost of transmitting information in the P2P environment.

11

Several Approaches

Resource description
- Global Grid Forum: JSDL
- EDAGrid: RDSResult

Matching method
- EDAGrid: Ranking, Set-matching

Forwarding method
- Napster: Centralized Indexing
- Gnutella: Flooding Query
- Chord: Indexing Using Distributed Hash Tables
- HyperCuP system: Interests-based
- Ant Colony Optimizing

13

JSDL

JSDL (Job Submission Description Language) is used to describe the requirements of computational jobs for submission to resources, particularly in Grid environments.

14

JSDL

    <JobDefinition>
        <JobDescription>
            <JobIdentification ... />
            <Application ... />
            <Resources ... />
            <DataStaging ... />
        </JobDescription>
        <xsd:any##other/>
    </JobDefinition>

Resources

Name of attribute | Type | Description
CandidateHosts | complex type | This element is a complex type specifying the set of named hosts which may be selected for running the job.
FileSystem | complex type | This element describes a filesystem that is required by the job.
ExclusiveExecution | xsd:boolean | This is a boolean that designates whether the job must have exclusive access to the resources allocated to it by the consuming system.
OperatingSystem | complex type | This is a complex type that defines the operating system required by the job.

Resources

Name of attribute | Type | Description
IndividualCPUSpeed | jsdl:RangeValue_Type | This element is a range value specifying the speed of each CPU required by the job in the execution environment.
IndividualCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required on each resource to execute the job.
IndividualCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the number of CPUs for each of the resources to be allocated to the job submission.
IndividualNetworkBandwidth | jsdl:RangeValue_Type | This element is a range value specifying the bandwidth requirements of each individual resource.
IndividualPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the amount of physical memory required on each individual resource.
IndividualVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of virtual memory for each of the resources to be allocated for this job submission.
IndividualDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space for each resource allocated to the job.

Resources

Name of attribute | Type | Description
TotalCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required across all CPUs used to execute the job.
TotalCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPUs required for this job submission.
TotalPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of physical memory for the entire job across all resources.
TotalVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required total amount of virtual memory for the job submission.
TotalDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required total amount of disk space that should be allocated to the job.
TotalResourceCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of resources required by the job.

Resources

CandidateHosts
Name of attribute | Type | Description
HostName | xsd:string | This element is a simple type containing a single name of a host.

Resources

FileSystem
Name of attribute | Type | Description
Description | xsd:string |
MountPoint | xsd:string | This is a string that describes a local location that MUST be made available in the allocated resources for the job.
MountSource | xsd:string | This is a string that describes a remote location that MUST be made available locally for the job.
DiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space on the containing FileSystem element for the job.
FileSystemType | jsdl:FileSystemTypeEnumeration | This is a token that describes the type of filesystem of the containing FileSystem element.

Resources

OperatingSystem
Name of attribute | Type | Description
OperatingSystemType | complex type | This is a complex type that contains the name of the operating system.
OperatingSystemVersion | xsd:string | This element is a string that defines the version of the operating system required by the job.
Description | xsd:string |

Resources

OperatingSystemType
Name of attribute | Type | Description
OperatingSystemName | jsdl:OperatingSystemTypeEnumeration | This is a token type that contains the name of the operating system.

CPUArchitecture
Name of attribute | Type | Description
CPUArchitectureName | jsdl:ProcessorArchitectureEnumeration | This element is a token specifying the CPU architecture required by the job in the execution environment.
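
As a rough illustration of how the Resources attributes fit into a JobDefinition, the following Python sketch builds a small JSDL document with the standard library. The namespace URI and the Exact and LowerBoundedRange range elements follow the JSDL 1.0 specification; the helper names q and build_job and all concrete values are hypothetical and were chosen only for this example.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace; the values used below are made up for illustration.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)


def q(tag):
    """Qualify a tag name with the JSDL namespace."""
    return f"{{{JSDL_NS}}}{tag}"


def build_job(cpu_count, min_disk_bytes, host_names):
    """Build a JobDefinition whose Resources element carries three requirements."""
    job_def = ET.Element(q("JobDefinition"))
    job_desc = ET.SubElement(job_def, q("JobDescription"))
    resources = ET.SubElement(job_desc, q("Resources"))

    # CandidateHosts: the set of named hosts that may run the job.
    hosts = ET.SubElement(resources, q("CandidateHosts"))
    for name in host_names:
        ET.SubElement(hosts, q("HostName")).text = name

    # Range values: an exact CPU count per resource, a lower bound on total disk.
    cpus = ET.SubElement(resources, q("IndividualCPUCount"))
    ET.SubElement(cpus, q("Exact")).text = str(float(cpu_count))
    disk = ET.SubElement(resources, q("TotalDiskSpace"))
    ET.SubElement(disk, q("LowerBoundedRange")).text = str(float(min_disk_bytes))
    return job_def


job = build_job(2, 1e9, ["node1.example.org"])
jsdl_text = ET.tostring(job, encoding="unicode")
print(jsdl_text)
```

A resource discovery service would read such a document on the receiving side and compare each range value with what a site offers.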

22

RDSResults EDAGrid

    <RDSResultSet count = <num> >
        <Discovery rank = "" id = "">
            <Resource rank = "" id = "" nodeCount = "" >
                <WSGRAM> <ReservationAddress>
                <OSName> <OSVersion> <Platform>
            </Resource> +
        </Discovery>
    </RDSResultSet>
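
To make the result format more concrete, the sketch below parses a small RDSResultSet document with the Python standard library. Only the element and attribute names come from the slide. The sample values, the example.org addresses, and the assumption that leaf elements such as OSName carry their value as text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the element and attribute names follow the slide.
SAMPLE = """
<RDSResultSet count="1">
  <Discovery rank="1" id="d1">
    <Resource rank="1" id="siteA" nodeCount="8">
      <WSGRAM>https://siteA.example.org/wsgram</WSGRAM>
      <ReservationAddress>https://siteA.example.org/reserve</ReservationAddress>
      <OSName>Linux</OSName>
      <OSVersion>2.6</OSVersion>
      <Platform>x86_64</Platform>
    </Resource>
    <Resource rank="2" id="siteB" nodeCount="4">
      <WSGRAM>https://siteB.example.org/wsgram</WSGRAM>
      <ReservationAddress>https://siteB.example.org/reserve</ReservationAddress>
      <OSName>Linux</OSName>
      <OSVersion>2.6</OSVersion>
      <Platform>x86</Platform>
    </Resource>
  </Discovery>
</RDSResultSet>
"""


def parse_result_set(xml_text):
    """Return one dict per Resource element, in document order."""
    root = ET.fromstring(xml_text)
    results = []
    for discovery in root.findall("Discovery"):
        for res in discovery.findall("Resource"):
            results.append({
                "discovery": discovery.get("id"),
                "id": res.get("id"),
                "rank": int(res.get("rank")),
                "node_count": int(res.get("nodeCount")),
                "os": res.findtext("OSName"),
                "platform": res.findtext("Platform"),
            })
    return results


resources = parse_result_set(SAMPLE)
for r in resources:
    print(r["rank"], r["id"], r["node_count"], r["os"], r["platform"])
```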

23

RDSResults

    <RDSResultSet count = <num> >
        <Discovery rank = "" id = "">
            <Individual rank = "" id = "" nodeCount = "" >
            </Individual>
            <Total>
            <InteractBandwidth>
        </Discovery>
    </RDSResultSet>

25

Ranking EDAGrid

    FUNCTION Ranking_Algorithm
        FOR each (Ri satisfying the individual condition)
            Rank(Ri) = 0
            FOR each (Aj, an attribute of the job)
                Rank(Ri) = Rank(Ri) + (w[j] * R[i,j] / A[j])
            Next A
        Next R
        Sort the resource set
        Return the list of resources in rank order
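
The ranking loop can be sketched in Python as follows. The scoring term, weight times offered value divided by requested value, is one reading of the formula on the slide, and the "individual condition" is assumed here to mean that every offered value meets the requested one. The function and parameter names and the sample sites are hypothetical.

```python
def rank_resources(resources, job, weights):
    """Rank resources that satisfy a job's per-attribute requirements.

    resources: {resource_id: {attribute: offered value}}
    job:       {attribute: requested value (> 0)}
    weights:   {attribute: weight}
    Returns a list of (resource_id, rank) sorted from highest rank to lowest.
    """
    ranked = []
    for rid, offered in resources.items():
        # Individual condition (assumed): every requested attribute is met.
        if any(offered.get(attr, 0) < need for attr, need in job.items()):
            continue
        rank = 0.0
        for attr, need in job.items():
            rank += weights[attr] * offered[attr] / need
        ranked.append((rid, rank))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


job = {"cpu_count": 2, "memory_gb": 4}
weights = {"cpu_count": 0.5, "memory_gb": 0.5}
resources = {
    "siteA": {"cpu_count": 4, "memory_gb": 8},   # meets both requirements
    "siteB": {"cpu_count": 2, "memory_gb": 16},  # meets both requirements
    "siteC": {"cpu_count": 1, "memory_gb": 32},  # too few CPUs, filtered out
}
ranking = rank_resources(resources, job, weights)
print(ranking)
```

With these weights a site is rewarded for exceeding the request on any attribute, so siteB's large memory outweighs siteA's extra CPUs.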

26

Set-matching EDAGrid

The set-matching algorithm:

- Create an empty set.
- Add the resources to the set one after another, from the highest rank to the lowest.
- Each time, check whether the Total condition is met and whether the InteractBandwidth condition is violated.
- Terminate the loop when these conditions are satisfied or when the number of grid nodes reaches the number the user requires.
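
A minimal sketch of this greedy loop is given below. It assumes that the Total condition is a minimum number of CPUs summed over the chosen set and that InteractBandwidth is a minimum bandwidth required between every pair of chosen sites. Both are interpretations, and the function name, parameters, and sample data are hypothetical.

```python
def set_matching(ranked_ids, cpus, bandwidth, total_cpus, min_bandwidth, max_nodes):
    """Greedily pick resources in rank order until the total requirement is met.

    ranked_ids:    resource ids, highest rank first
    cpus:          {resource_id: CPU count offered}
    bandwidth:     {frozenset({a, b}): bandwidth between a and b}
    total_cpus:    Total condition (assumed: sum of CPUs over the chosen set)
    min_bandwidth: InteractBandwidth condition (assumed: applies to every pair)
    max_nodes:     largest number of grid nodes the user accepts
    Returns (chosen ids, True if the Total condition was met).
    """
    chosen = []
    for rid in ranked_ids:
        # Skip a candidate that would violate the bandwidth condition.
        if any(bandwidth.get(frozenset((rid, other)), 0) < min_bandwidth
               for other in chosen):
            continue
        chosen.append(rid)
        if sum(cpus[c] for c in chosen) >= total_cpus:
            return chosen, True
        if len(chosen) >= max_nodes:
            break
    return chosen, False


cpus = {"siteB": 2, "siteA": 4, "siteD": 8}
bandwidth = {
    frozenset(("siteB", "siteA")): 10,   # too slow for this request
    frozenset(("siteB", "siteD")): 100,
    frozenset(("siteA", "siteD")): 100,
}
chosen, ok = set_matching(["siteB", "siteA", "siteD"], cpus, bandwidth,
                          total_cpus=10, min_bandwidth=50, max_nodes=3)
print(chosen, ok)
```

Because the loop is greedy, it returns the first acceptable set in rank order rather than searching for the best possible combination.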

28

Centralized Indexing

This approach proposes centralized management of all the resources of all the Grid sites.

A server machine holds the whole index of available resources on the network.

Users start the query process by sending a request to the index server.

The server sends answers to the users based on the information it has stored.

Advantage: queries are answered quickly.

Disadvantages:
- The server machine is a bottleneck.
- The information must be updated continuously.
- Not suitable for the P2P model.
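
The following sketch shows the idea of a central index in a few lines of Python. It is an in-memory toy, not Napster's actual protocol; the class name IndexServer, its methods, and the attribute names are hypothetical.

```python
class IndexServer:
    """Single server holding the index of resources published by all sites."""

    def __init__(self):
        self._index = {}  # site name -> {attribute: value}

    def publish(self, site, attributes):
        """Called by a site whenever its resource information changes."""
        self._index[site] = dict(attributes)

    def query(self, requirements):
        """Return the sites whose stored attributes meet every requirement."""
        return sorted(
            site for site, attrs in self._index.items()
            if all(attrs.get(k, 0) >= v for k, v in requirements.items())
        )


server = IndexServer()
server.publish("siteA", {"cpu_count": 4, "memory_gb": 8})
server.publish("siteB", {"cpu_count": 16, "memory_gb": 64})
matches = server.query({"cpu_count": 8})
print(matches)

# A stale entry stays in the index until the site publishes again,
# which is why the information has to be updated continuously.
server.publish("siteA", {"cpu_count": 12, "memory_gb": 8})
matches_after_update = server.query({"cpu_count": 8})
print(matches_after_update)
```

Every publish and every query goes through the same object, which mirrors the bottleneck at the server machine.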

29

Flooding

- The query is routed from one peer to all of its neighbors. In this way, the query spreads throughout the network.
- If a peer finds the requested resources in its local storage, it sends the answer to the original peer that made the request.
- A Time-To-Live (TTL) value limits the number of hops a request can travel, so that after a certain number of forwards the request automatically disappears from the network.
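
Flooding with a TTL can be simulated on a small overlay as sketched below. This is a simplified breadth-first model rather than the Gnutella wire protocol: it assumes each peer processes a given query only once, and the function name, the overlay, and the peer names are hypothetical.

```python
from collections import deque


def flood_query(neighbors, has_resource, origin, ttl):
    """Flood a query from `origin`; return (peers that answer, messages sent).

    neighbors:    {peer: list of neighbor peers}
    has_resource: set of peers holding the requested resource locally
    ttl:          maximum number of hops a query may travel
    """
    hits = set()
    seen = {origin}          # peers that already handled this query
    messages = 0
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if peer in has_resource:
            hits.add(peer)   # the answer goes back to the origin
        if remaining == 0:
            continue         # TTL exhausted: the query dies here
        for nxt in neighbors[peer]:
            messages += 1
            if nxt not in seen:   # duplicates are received but dropped
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits, messages


# A small line-shaped overlay: A - B - C - D
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
near, _ = flood_query(overlay, {"D"}, "A", ttl=2)
far, sent = flood_query(overlay, {"D"}, "A", ttl=3)
print(near, far, sent)
```

With a TTL of 2 the query never reaches D, so the resource is not found even though it exists. Raising the TTL finds it at the cost of more messages.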

30

Indexing Using Distributed Hash Tables

In this method, each peer in the network holds a partition of the hash table.

Each entry in the hash table maps part of the key space to the peer where the searched file can be found.

When a file is requested, the file name is hashed by a uniform hash function.

Based on the hash value and the hash table, the lookup value is found and returned to the requester.

The cost of this method consists of the cost of building and updating the hash table and the cost of routing the query to the location of the searched file.

Disadvantage: it does not apply to complex queries.
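
The key-to-peer mapping can be sketched with a small hash ring. This shows only the successor rule for assigning keys to peers, with full knowledge of the ring; it is not Chord's finger-table routing. The 16-bit identifier space, the class name Ring, and the peer and file names are assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # small identifier space, chosen only for readability


def ring_id(name):
    """Hash a name uniformly onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    """Each peer stores the keys that fall between its predecessor and itself."""

    def __init__(self, peers):
        self._ids = sorted((ring_id(p), p) for p in peers)
        self._store = {p: {} for p in peers}

    def responsible_peer(self, key):
        """Successor rule: first peer whose id is >= hash(key), wrapping around."""
        pos = bisect_left(self._ids, (ring_id(key), ""))
        return self._ids[pos % len(self._ids)][1]

    def put(self, key, location):
        self._store[self.responsible_peer(key)][key] = location

    def get(self, key):
        return self._store[self.responsible_peer(key)].get(key)


ring = Ring(["peer1", "peer2", "peer3", "peer4"])
ring.put("dataset.tar", "siteA")
print(ring.responsible_peer("dataset.tar"), ring.get("dataset.tar"))

# Only exact keys can be looked up: a partial name hashes to an unrelated
# value, so range or multi-attribute queries cannot be answered this way.
print(ring.get("dataset"))
```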

31

Interests-based Methods

- This method is based on the interests of users.
- The idea is to search the peers that seem likely to contain what users have asked for.
- To achieve this, peers are organized into groups of similar interest.
- Search queries are therefore forwarded to the matching interest group, to get a high hit rate and to avoid wasting time searching other peers.

Disadvantages:
- A peer's interests may change over time.
- Peers may have more than one interest.
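
A toy version of interest-based forwarding is sketched below. It is not the HyperCuP topology; it only shows queries being restricted to one interest group. The class name InterestOverlay, the interest labels, and the resource names are hypothetical.

```python
from collections import defaultdict


class InterestOverlay:
    """Peers are grouped by declared interest; queries go to one group only."""

    def __init__(self):
        self._groups = defaultdict(set)   # interest -> peers in that group
        self._content = defaultdict(set)  # peer -> resources it holds

    def join(self, peer, interests, resources=()):
        # A peer with several interests belongs to several groups.
        for interest in interests:
            self._groups[interest].add(peer)
        self._content[peer].update(resources)

    def search(self, interest, resource):
        """Return (peers holding the resource, number of peers contacted)."""
        group = self._groups.get(interest, set())
        hits = sorted(p for p in group if resource in self._content[p])
        return hits, len(group)


overlay = InterestOverlay()
overlay.join("p1", ["bioinformatics"], ["blast"])
overlay.join("p2", ["bioinformatics", "physics"], ["blast", "geant4"])
overlay.join("p3", ["physics"], ["geant4"])
overlay.join("p4", ["weather"], ["wrf"])
hits, contacted = overlay.search("bioinformatics", "blast")
print(hits, contacted)
```

Only two of the four peers are contacted here. The group membership must be maintained as interests change, which is the cost this method trades for the higher hit rate.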

32

Ant Colony Optimizing

- Ants start from their nests and wander randomly.
- The ants that find food return to their nests from memory and drop pheromone on their trails.
- Other ants that come across such a trail follow it to check for the food instead of wandering randomly. If they find the food, they return home and reinforce the pheromone on the trail.
- A key point is that the pheromone evaporates over time. The more time it takes an ant to travel back to its nest, the more pheromone evaporates.
- When an ant reaches an intersection, it has to decide which branch to take. The ants that take a short branch complete the trip sooner than those that take a long branch, so the pheromone density on the short branch remains higher.
- Other ants are more likely to choose a branch according to its pheromone density. Eventually, all the ants that go to get the food take the shortest branch.

33

Ant Colony Optimizing

Initially, the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E. After the initial stage, most of the ants will take the shortest path, A→B→E.
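
The convergence toward the shortest path can be reproduced with a small deterministic model. This is a sketch of the pheromone feedback only, not the resource discovery algorithm from the ant colony literature. It tracks the expected share of ants per path instead of simulating individual ants, and the evaporation rate, deposit amount, and equal edge lengths are assumptions.

```python
def simulate_pheromone(path_lengths, rounds, evaporation=0.1, deposit=1.0):
    """Expected-value model of ants choosing among alternative paths.

    Each round, ants split across paths in proportion to pheromone; every path
    then loses a fraction `evaporation` of its pheromone and gains a deposit
    proportional to its share of ants and inversely proportional to its length.
    Returns the final share of ants on each path.
    """
    pheromone = {path: 1.0 for path in path_lengths}  # no initial preference
    for _ in range(rounds):
        total = sum(pheromone.values())
        share = {p: pheromone[p] / total for p in pheromone}
        for p, length in path_lengths.items():
            pheromone[p] = ((1.0 - evaporation) * pheromone[p]
                            + deposit * share[p] / length)
    total = sum(pheromone.values())
    return {p: pheromone[p] / total for p in pheromone}


# Hop counts of the three paths from the example (edge lengths assumed equal).
paths = {"A-B-C-E": 3, "A-B-E": 2, "A-B-D-E": 3}
start = simulate_pheromone(paths, rounds=0)
later = simulate_pheromone(paths, rounds=200)
print(start)
print(later)
```

The shorter path receives more pheromone per round for the same share of ants, so its share grows until nearly all ants use A-B-E.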

37

Thank You

Page 3: 1 RESOURCE DISCOVERY Presenter: Cù Nguyễn Phương Hà

3

Overview

Introduction of problems Several approaches Solution model

4

Introduction

The goal of VN-Grid project is connecting the available computational resources on the network to utilize available resources from those sites to resolve big scientific problems

Therefore knowing resources available from all Grid sites and finding which Grid sites having available resources are necessary =gt Resource Discovery services

5

Functions of Resource Discovery

The prospective Resource Discovery services in each Gridsite must be able to know find and provide the resource information from others

The main function is that when receiving a specific request about resources form client Resource Discovery services must find out reliable information about Gridsites in the network that possess available resources satisfying the query

6

Resources in VN-Grid

There are three kinds of resources Resources for executing job or computing

resources It is information about the resources used to execute submitted job for example the computational power data storage network bandwidth

Information about services these are information about the services which user wants to learn about for example Information Services Resource Discovery Services

Information about applications these are information about special applications deployed on Grid such as MPI POP C

7

Resources in VN-Grid

Characteristics of Resources in VN-Grid environment The resources are heterogeneous not only in the

network but also in each The resources have variety of properties with

different data types The existing resources continuously vary especially

the computing resources for example CPUs memory disk network bandwidth

New resources are continually being published

8

Forwarding in VN-Grid

The proposed VN-Grid infrastructure simulates a Peer-to-Peer model in which clients control the networking instead of servers that means those peers could exchange information directly

Interacting is limit to known peers The peers are equally considered The number of peers participating in Grid can be

raised enormously

9

Summary

Good resource discovery services must Provide the most exact update and sufficient

information with timely solution Be flexible with features of resources such as

variety heterogeneity and newly added resources Be scalable to adapt with the number of peers in

Grid environment rising Reduce the expense of transmitting information in

P2P environment

10

Overview

Introduction of problems Several approaches Solution model

11

Several Approaches

Resources descriptionndash Global Grid Forum JSDL ndash EDAGrid RDSResult

Matching methodndash EDAGrid Ranking Set-matching

Forwarding methodndash Napster Centralized Indexingndash Gnutella Flooding Queryndash Chord Indexing Using Distributed Hash Tablesndash HyperCuP system Interests-basedndash Ant Colony Optimizing

12

Several Approaches

Resources descriptionndash Global Grid Forum JSDL ndash EDAGrid RDSResult

Matching methodndash EDAGrid Ranking Set-matching

Forwarding methodndash Napster Centralized Indexingndash Gnutella Flooding Queryndash Chord Indexing Using Distributed Hash Tablesndash HyperCuP system Interests-basedndash Ant Colony Optimizing

13

JSDL

JSDL is used to describe the requirements of computational jobs for submission to resources particularly in Grid environments

14

JSDL

ltJobDefinitiongtltJobDescriptiongt

ltJobIdentification gtltApplication gtltResources gtltDataStaging gt

ltJobDescriptiongtltxsdanyothergt

ltJobDefinitiongt

Resources

Name of attribute | Type | Description
CandidateHosts | complex type | This element is a complex type specifying the set of named hosts which may be selected for running the job
FileSystem | complex type | This element describes a filesystem that is required by the job
ExclusiveExecution | xsd:boolean | This is a boolean that designates whether the job must have exclusive access to the resources allocated to it by the consuming system
OperatingSystem | complex type | This is a complex type that defines the operating system required by the job

Resources

Name of attribute | Type | Description
IndividualCPUSpeed | jsdl:RangeValue_Type | This element is a range value specifying the speed of each CPU required by the job in the execution environment
IndividualCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required on each resource to execute the job
IndividualCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the number of CPUs for each of the resources to be allocated to the job submission
IndividualNetworkBandwidth | jsdl:RangeValue_Type | This element is a range value specifying the bandwidth requirements of each individual resource
IndividualPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the amount of physical memory required on each individual resource
IndividualVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of virtual memory for each of the resources to be allocated for this job submission
IndividualDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space for each resource allocated to the job

Resources

Name of attribute | Type | Description
TotalCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required across all CPUs used to execute the job
TotalCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPUs required for this job submission
TotalPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of physical memory for the entire job across all resources
TotalVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required total amount of virtual memory for the job submission
TotalDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required total amount of disk space that should be allocated to the job
TotalResourceCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of resources required by the job

Resources

CandidateHosts

Name of attribute | Type | Description
HostName | xsd:string | This element is a simple type containing a single name of a host

Resources

FileSystem

Name of attribute | Type | Description
Description | xsd:string |
MountPoint | xsd:string | This is a string that describes a local location that MUST be made available in the allocated resources for the job
MountSource | xsd:string | This is a string that describes a remote location that MUST be made available locally for the job
DiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space on the containing FileSystem element for the job
FileSystemType | jsdl:FileSystemTypeEnumeration | This is a token that describes the type of filesystem of the containing FileSystem element

Resources

OperatingSystem

Name of attribute | Type | Description
OperatingSystemType | complex type | This is a complex type that contains the name of the operating system
OperatingSystemVersion | xsd:string | This element is a string that defines the version of the operating system required by the job
Description | xsd:string |

Resources

OperatingSystemType

Name of attribute | Type | Description
OperatingSystemName | jsdl:OperatingSystemTypeEnumeration | This is a token type that contains the name of the operating system

CPUArchitecture

Name of attribute | Type | Description
CPUArchitectureName | jsdl:ProcessorArchitectureEnumeration | This element is a token specifying the CPU architecture required by the job in the execution environment
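To make the element tables above concrete, the following Python sketch builds a small JSDL document with the standard library. It is only an illustration: the helper name `build_job` is hypothetical, and the namespace URI and the `Exact` child used to carry a range value are taken from the JSDL 1.0 specification as recalled, not from these slides, so they should be checked against the specification before use.

```python
import xml.etree.ElementTree as ET

# Assumed JSDL 1.0 namespace; verify against the specification.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"


def q(tag):
    """Return the namespace-qualified tag name."""
    return f"{{{JSDL_NS}}}{tag}"


def build_job(total_cpus, os_name):
    """Build a JobDefinition requesting an OS and a total CPU count."""
    ET.register_namespace("jsdl", JSDL_NS)
    job = ET.Element(q("JobDefinition"))
    description = ET.SubElement(job, q("JobDescription"))
    resources = ET.SubElement(description, q("Resources"))

    # OperatingSystem > OperatingSystemType > OperatingSystemName
    os_el = ET.SubElement(resources, q("OperatingSystem"))
    os_type = ET.SubElement(os_el, q("OperatingSystemType"))
    ET.SubElement(os_type, q("OperatingSystemName")).text = os_name

    # TotalCPUCount is a range value; an exact value is assumed here.
    cpus = ET.SubElement(resources, q("TotalCPUCount"))
    ET.SubElement(cpus, q("Exact")).text = str(total_cpus)

    return ET.tostring(job, encoding="unicode")


document = build_job(4, "LINUX")
```

A Resource Discovery service would read such a document and compare each requested value with what a Grid site advertises.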

22

RDSResults EDAGrid

<RDSResultSet count="<num>">
  <Discovery rank="" id="">
    <Resource rank="" id="" nodeCount="">
      <WSGRAM/>
      <ReservationAddress/>
      <OSName/> <OSVersion/> <Platform/>
    </Resource> +
  </Discovery>
</RDSResultSet>

23

RDSResults

<RDSResultSet count="<num>">
  <Discovery rank="" id="">
    <Individual rank="" id="" nodeCount="">
    </Individual>
    <Total/>
    <InteractBandwidth/>
  </Discovery>
</RDSResultSet>

25

Ranking EDAGrid

FUNCTION Ranking_Algorithm
  FOR each (Ri satisfying the individual condition)
    Rank(Ri) = 0
    FOR each (Aj, an attribute of the job)
      Rank(Ri) = Rank(Ri) + (w[j] * R[i,j] / A[j])
    Next A
  Next R
  Sort the resource set
  Return the list of resources in rank order
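The ranking loop above can be sketched in Python. This is a minimal illustration, not the EDAGrid implementation. The function name `rank_resources`, the dictionary layout, and the example sites are hypothetical. The operators in the rank term were lost in the slide text, so reading it as w[j] * R[i][j] / A[j] is an assumption, as is sorting from highest to lowest rank. The input is assumed to be already filtered by the individual condition.

```python
def rank_resources(resources, requested, weights):
    """Score each resource against the job's requested attributes.

    resources: {resource_id: {attribute: offered_value}}
    requested: {attribute: value_requested_by_job}, values must be non-zero
    weights:   {attribute: weight}
    Returns resource ids sorted from highest to lowest rank.
    """
    ranks = {}
    for rid, offered in resources.items():
        rank = 0.0
        for attr, wanted in requested.items():
            # Assumed term: weight * (offered value / requested value).
            rank += weights[attr] * offered.get(attr, 0.0) / wanted
        ranks[rid] = rank
    return sorted(ranks, key=ranks.get, reverse=True)


resources = {
    "siteA": {"cpu": 8, "memory": 16},
    "siteB": {"cpu": 4, "memory": 64},
}
requested = {"cpu": 4, "memory": 16}
weights = {"cpu": 0.7, "memory": 0.3}
ordered = rank_resources(resources, requested, weights)
```

With these weights siteB ranks first: its memory surplus outweighs siteA's CPU surplus.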

26

Set-matching EDAGrid

The set-matching algorithm:

- Create an empty set
- Add the resources to the set one after another, from higher rank to lower
- Each time, check whether the Total condition is met and whether the InteractBandwidth is violated
- Terminate the loop if these conditions are satisfied or the number of grid nodes reaches the number the user required
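The steps above can be sketched as a greedy loop in Python. The slides do not define the conditions precisely, so several points here are assumptions: the Total condition is modelled as a total CPU count, InteractBandwidth as a minimum bandwidth between any two chosen resources, and a candidate that would violate that bandwidth is skipped. The name `set_match` and all parameters are hypothetical.

```python
def set_match(ranked, total_needed, min_bandwidth, max_nodes, bandwidth):
    """Greedily grow a resource set from highest to lowest rank.

    ranked:        list of (resource_id, cpu_count), best rank first
    total_needed:  total CPU count the job asks for (assumed Total condition)
    min_bandwidth: lowest acceptable bandwidth between two chosen resources
    max_nodes:     maximum number of grid nodes the user allows
    bandwidth:     function (id_a, id_b) -> bandwidth between two resources
    Returns (chosen_ids, satisfied).
    """
    chosen, total = [], 0
    for rid, cpus in ranked:
        # Assumed handling: skip a candidate that would break the
        # InteractBandwidth constraint with an already chosen resource.
        if any(bandwidth(rid, other) < min_bandwidth for other in chosen):
            continue
        chosen.append(rid)
        total += cpus
        if total >= total_needed:
            return chosen, True
        if len(chosen) >= max_nodes:
            break
    return chosen, False


links = {
    frozenset(("a", "b")): 10,
    frozenset(("a", "c")): 100,
    frozenset(("b", "c")): 100,
}
result = set_match(
    [("a", 4), ("b", 4), ("c", 4)],
    total_needed=8,
    min_bandwidth=50,
    max_nodes=3,
    bandwidth=lambda x, y: links[frozenset((x, y))],
)
```

In this example "b" is skipped because its link to "a" is too slow, and the set is completed with "c".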

28

Centralized Indexing

Proposes centralized management of all the resources of all the grid sites

A server machine holds the whole index of available resources on the network

Users start the query process by sending the request to the index server

The server sends answers to the users based on the stored information

Advantage: fast

Disadvantages:
- Bottleneck at the server machine
- The information must be updated continuously
- Not suitable for P2P
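A minimal Python sketch of the centralized scheme, under stated assumptions: the class `IndexServer`, its methods, and the attribute names are hypothetical, and a query is modelled as a set of minimum values that a site must meet.

```python
class IndexServer:
    """Single server holding the index of all sites' resources."""

    def __init__(self):
        self._index = {}  # site name -> {attribute: value}

    def publish(self, site, attributes):
        # Every change at a site must be pushed here, which is the
        # continuous-update cost listed among the disadvantages.
        self._index[site] = dict(attributes)

    def query(self, **minimums):
        """Return the sites whose attributes meet all requested minimums."""
        return sorted(
            site
            for site, attrs in self._index.items()
            if all(attrs.get(name, 0) >= value for name, value in minimums.items())
        )


server = IndexServer()
server.publish("siteA", {"cpu": 8, "memory": 16})
server.publish("siteB", {"cpu": 2, "memory": 64})
matches = server.query(cpu=4)
```

Every query and every update goes through the same object, which is why the server becomes the bottleneck as the number of sites grows.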

29

Flooding

- The query is routed from one peer to all of its neighbors. In this way the query is sent throughout the network
- If a peer finds the resources in its local storage, it sends the answer to the original peer that made the request
- A Time-To-Live (TTL) limits the number of hops a request can travel, so that after a certain number of hops the request automatically disappears from the network
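The flooding behaviour described above can be simulated in a few lines of Python. This is a sketch under assumptions: the network is an adjacency dictionary, the function name `flood_query` is hypothetical, and a single shared `seen` set stands in for the per-peer duplicate suppression a real protocol would need.

```python
from collections import deque


def flood_query(graph, local_resources, origin, wanted, ttl):
    """Flood a query from `origin`; return peers holding `wanted`.

    graph:           {peer: [neighbor, ...]}
    local_resources: {peer: set of resource names stored locally}
    ttl:             maximum number of hops the query may travel
    """
    hits = []
    seen = {origin}
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in local_resources.get(peer, ()):
            hits.append(peer)  # the peer would answer the origin directly
        if remaining == 0:
            continue  # TTL exhausted: the query is dropped here
        for neighbor in graph.get(peer, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
stored = {"C": {"mpi"}, "D": {"mpi"}}
near = flood_query(graph, stored, "A", "mpi", ttl=2)
far = flood_query(graph, stored, "A", "mpi", ttl=3)
```

With a TTL of 2 the query from A stops at C and never reaches D, which shows how the TTL trades coverage for traffic.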

30

Indexing Using Distributed Hash Tables

In this method each peer in the network holds a partition of the hash table

Each entry in the hash table covers part of the key space and points to the peer where the searched file can be found

When a file is requested, the file name is hashed by a uniform hash function

Based on the hash value and the hash table, the lookup value is found and returned to the requester

The cost of this method consists of the cost of building and updating the hash table and of routing the query to the location of the searched file

Disadvantage: does not apply to complex queries
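A simplified Python sketch of the idea: peers and file names are hashed onto the same ring, and each name is stored at the first peer at or after its hash value. The class `Dht` and its methods are hypothetical, and the hop-by-hop routing that a system such as Chord performs is left out; only the key-to-peer mapping is shown.

```python
import bisect
import hashlib


def key_hash(text, bits=16):
    """Map a string uniformly onto a ring of 2**bits positions."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Dht:
    def __init__(self, peers):
        self._points = sorted((key_hash(p), p) for p in peers)
        self._keys = [h for h, _ in self._points]
        # Each peer holds only its own partition of the table.
        self.tables = {p: {} for p in peers}

    def owner(self, name):
        """First peer at or after the name's hash, wrapping around the ring."""
        i = bisect.bisect_left(self._keys, key_hash(name)) % len(self._points)
        return self._points[i][1]

    def put(self, name, location):
        self.tables[self.owner(name)][name] = location

    def get(self, name):
        return self.tables[self.owner(name)].get(name)


dht = Dht(["peer1", "peer2", "peer3"])
dht.put("data.txt", "siteA")
found = dht.get("data.txt")
```

The lookup works only for an exact name, which illustrates the stated disadvantage: a query such as "any site with at least 4 CPUs" has no single key to hash.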

31

Interests-based Methods

- This method is based on the interests of users
- The idea is to search the peers that seem to contain what users have requested
- To achieve this, peers are organized into groups of similar interest
- Search queries are therefore forwarded to the interest group, to get a high hit rate and reduce the time wasted searching other peers

Disadvantages:
- A peer's interests may change over time
- Peers may have more than one interest
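A small Python sketch of interest-based forwarding, with hypothetical names throughout (`route_by_interest`, the group labels, and the peers): the query is sent only to the members of one interest group rather than to every peer.

```python
def route_by_interest(groups, local_resources, interest, wanted):
    """Forward a query only to the peers of one interest group.

    groups:          {interest: [peer, ...]}
    local_resources: {peer: set of resource names stored locally}
    Returns (peers holding `wanted`, number of peers contacted).
    """
    contacted = groups.get(interest, [])
    hits = [p for p in contacted if wanted in local_resources.get(p, ())]
    return hits, len(contacted)


groups = {"bioinformatics": ["p1", "p2"], "physics": ["p3", "p4", "p5"]}
stored = {"p2": {"blast"}, "p4": {"blast"}}
hits, contacted = route_by_interest(groups, stored, "bioinformatics", "blast")
```

Only two peers are contacted here, but the copy held by p4 in the other group is missed, which reflects the weakness when peers have more than one interest.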

32

Ant Colony Optimizing

- Ants start from their nests and wander randomly
- Ants that find food return to their nests from memory and drop pheromone on their trails
- Other ants that come across such a trail follow it to check for food instead of wandering randomly
- If they find the food, they return home and reinforce the pheromone on the trail
- A key point is that the pheromone evaporates over time: the longer it takes an ant to travel back to its nest, the more pheromone evaporates
- When an ant reaches an intersection, it has to decide which branch to take
- Ants that take a short branch finish the trip sooner than those that take a long branch, so the pheromone density on the short branch remains higher
- Other ants are more likely to choose a branch with a higher pheromone density
- Eventually all the ants that go to get the food take the shortest branch

33

Ant Colony Optimizing

Initially the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E

After the initial stage, most of the ants will take the shortest path A→B→E
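The convergence on A→B→E can be reproduced with a small deterministic model in Python. This is a sketch, not the algorithm from the cited work: instead of simulating individual random ants, it tracks the expected share of ants on each path. The path lengths, the evaporation rate, and the deposit rule (share divided by path length) are all assumptions.

```python
def simulate_pheromone(lengths, evaporation=0.5, iterations=50):
    """Mean-field pheromone update over a fixed set of paths.

    lengths: {path_name: path_length}
    Returns the final pheromone level of each path.
    """
    pheromone = {path: 1.0 for path in lengths}
    for _ in range(iterations):
        total = sum(pheromone.values())
        for path, length in lengths.items():
            share = pheromone[path] / total      # fraction of ants choosing it
            deposit = share / length             # shorter path, more deposit
            pheromone[path] = (1 - evaporation) * pheromone[path] + deposit
    return pheromone


# Assumed lengths: the direct path A-B-E has one edge fewer than the others.
lengths = {"A-B-C-E": 3, "A-B-E": 2, "A-B-D-E": 3}
levels = simulate_pheromone(lengths)
best = max(levels, key=levels.get)
best_share = levels[best] / sum(levels.values())
```

All paths start with equal pheromone, but the shorter path receives a larger deposit per trip, so its share grows at every iteration while evaporation erodes the others.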

37

Thank You


The cost of this method consists of the cost to build and update the hash table and route the query to the location search file

Disadvantages not apply to the complex query

31

Interests-based Methods

This method based on the interest of users The idea is to search on the peers that seem to contain what

users have required To reach this point peers are organized into groups of similar

interest Therefore the search queries will be forwarded to the interest

group to get the high hit rate and reduce the redundant time to search on other peers

Disadvantage ndash the peers interest may change over time ndash peers have more than one interest

32

Ant Colony Optimizing

Ants start from their nests and wander randomly The ants which found food will return to their nests in terms of their memory and drop

pheromone on trails Other ants which come across such a trail will follow the trail to check the food instead of

wandering randomly If they find the food they will return home and reinforce the pheromone on the trail A key point is that the pheromone evaporates over time The more time it takes for an ant to travel back to its nest the more pheromone will be

evaporated When an ant reaches an intersection the ant has to decide which branch to take The ants which take a short branch march faster than those which take a long branch Therefore the pheromone density on the short branch remains higher Other ants will more likely choose the branch in terms of the pheromone density Eventually all the ants which go to get the food will take the shortest branch

33

Ant Colony Optimizing

Initially the ants may take the

paths of ArarrBrarrCrarrE ArarrBrarrE or ArarrBrarrDrarrE After the initial stage most

of the ants will take the shortest path ArarrBrarrE

34

Overview

Introduction of problems Several approaches Solution model

35

36

Reference

[1]Thien-Nga Nguyen-Vu Information Service API Specification raquoVNGRID PROJECT [2]Tran Vu Pham Lydia MS Lau and Peter M Dew An Ontology-based Adaptive Approach

to P2P Resource Discovery in Distributed School of Computing University of Leeds Leeds UK

[3]Tuan Anh Nguyen VN-Grid_design-Oct_1 VNGrid Project[4] Project Overview_VN-Grid Project wiki httpwwwcsehcmuteduvn~vngridwikiProject_Overview

[4] [JSDL]Job Submission Description Language (JSDL) Specification Version 10 httpforgegridforumorgprojectsjsdl-wg

Nguyễn Quang Hugraveng Nguyễn Thanh Sơn USER-DRIVEN GRID RESOURCE DISCOVERY Khoa Khoa Học amp Kỹ Thuật Maacutey Tiacutenh Nhagrave A3 Trường Đại học Baacutech Khoa ndash ĐHQG TpHCM

Yuhui Deng middot FrankWang middot Adrian Ciura Ant colony optimization inspired resource discovery in P2P Grid systems Springer Science+Business Media LLC 2008

Zenggang Xiong 12 Yang Yang1 Xuemin Zhang2 Fu Chen110485791048579104857910485791048579Li Liu1 Integrating Genetic and Ant Algorithm into P2P Grid Resource Discovery School of Information Engineering University of Science and Technology Beijing Beijing 100083 China

Tran Vu Pham A Collaborative e-Science Architecture for Distributed Scientific Communities The University of Leeds School of Computing October 2006

37

Thank You

Page 6: 1 RESOURCE DISCOVERY Presenter: Cù Nguyễn Phương Hà

6

Resources in VN-Grid

There are three kinds of resources:

• Resources for executing jobs (computing resources): information about the resources used to execute a submitted job, for example computational power, data storage, and network bandwidth.
• Information about services: information about the services that the user wants to learn about, for example Information Services and Resource Discovery Services.
• Information about applications: information about special applications deployed on the Grid, such as MPI and POP-C++.

7

Resources in VN-Grid

Characteristics of resources in the VN-Grid environment:

• The resources are heterogeneous, not only across the network but also within each Grid site.
• The resources have a variety of properties with different data types.
• The existing resources vary continuously, especially the computing resources, for example CPUs, memory, disk, and network bandwidth.
• New resources are continually being published.

8

Forwarding in VN-Grid

The proposed VN-Grid infrastructure simulates a Peer-to-Peer model in which clients, rather than servers, control the networking. This means that peers can exchange information directly.

• Interaction is limited to known peers.
• All peers are treated as equals.
• The number of peers participating in the Grid can grow enormously.

9

Summary

A good resource discovery service must:

• Provide the most accurate, up-to-date, and sufficient information in a timely manner.
• Be flexible with respect to the features of resources, such as variety, heterogeneity, and newly added resources.
• Scale as the number of peers in the Grid environment rises.
• Reduce the cost of transmitting information in the P2P environment.


11

Several Approaches

• Resource description
  – Global Grid Forum: JSDL
  – EDAGrid: RDSResult
• Matching method
  – EDAGrid: Ranking, Set-matching
• Forwarding method
  – Napster: Centralized Indexing
  – Gnutella: Flooding Query
  – Chord: Indexing Using Distributed Hash Tables
  – HyperCuP system: Interests-based
  – Ant Colony Optimizing


13

JSDL

JSDL (Job Submission Description Language) is used to describe the requirements of computational jobs for submission to resources, particularly in Grid environments.

14

JSDL

<JobDefinition>
    <JobDescription>
        <JobIdentification ... />?
        <Application ... />?
        <Resources ... />?
        <DataStaging ... />*
    </JobDescription>
    <xsd:any##other/>*
</JobDefinition>
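As a rough illustration of this layout, the following Python sketch assembles a minimal JSDL document with the standard library. The namespace URI is the one used by JSDL 1.0; the helper names (`q`, `build_job_definition`), the job name, and the CPU count are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the JSDL 1.0 schema.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified form of a JSDL tag name."""
    return f"{{{JSDL_NS}}}{tag}"


def build_job_definition(job_name: str, total_cpu_count: float) -> ET.Element:
    """Build JobDefinition > JobDescription with three of its children."""
    root = ET.Element(q("JobDefinition"))
    desc = ET.SubElement(root, q("JobDescription"))

    ident = ET.SubElement(desc, q("JobIdentification"))
    ET.SubElement(ident, q("JobName")).text = job_name

    ET.SubElement(desc, q("Application"))  # left empty in this sketch

    resources = ET.SubElement(desc, q("Resources"))
    cpus = ET.SubElement(resources, q("TotalCPUCount"))
    ET.SubElement(cpus, q("Exact")).text = str(total_cpu_count)
    return root


# Placeholder values, chosen only for the demonstration.
job = build_job_definition("demo-job", 4.0)
print(ET.tostring(job, encoding="unicode"))
```

The optional DataStaging element is omitted here; it would be appended to JobDescription in the same way.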

Resources

Name of attribute | Type | Description
--- | --- | ---
CandidateHosts | complex type | Specifies the set of named hosts which may be selected for running the job.
FileSystem | complex type | Describes a filesystem that is required by the job.
ExclusiveExecution | xsd:boolean | Designates whether the job must have exclusive access to the resources allocated to it by the consuming system.
OperatingSystem | complex type | Defines the operating system required by the job.

Resources

Name of attribute | Type | Description
--- | --- | ---
IndividualCPUSpeed | jsdl:RangeValue_Type | The speed of each CPU required by the job in the execution environment.
IndividualCPUTime | jsdl:RangeValue_Type | The total number of CPU seconds required on each resource to execute the job.
IndividualCPUCount | jsdl:RangeValue_Type | The number of CPUs for each of the resources to be allocated to the job submission.
IndividualNetworkBandwidth | jsdl:RangeValue_Type | The bandwidth requirements of each individual resource.
IndividualPhysicalMemory | jsdl:RangeValue_Type | The amount of physical memory required on each individual resource.
IndividualVirtualMemory | jsdl:RangeValue_Type | The required amount of virtual memory for each of the resources to be allocated for this job submission.
IndividualDiskSpace | jsdl:RangeValue_Type | The required amount of disk space for each resource allocated to the job.

Resources

Name of attribute | Type | Description
--- | --- | ---
TotalCPUTime | jsdl:RangeValue_Type | The total number of CPU seconds required across all CPUs used to execute the job.
TotalCPUCount | jsdl:RangeValue_Type | The total number of CPUs required for this job submission.
TotalPhysicalMemory | jsdl:RangeValue_Type | The required amount of physical memory for the entire job across all resources.
TotalVirtualMemory | jsdl:RangeValue_Type | The required total amount of virtual memory for the job submission.
TotalDiskSpace | jsdl:RangeValue_Type | The required total amount of disk space that should be allocated to the job.
TotalResourceCount | jsdl:RangeValue_Type | The total number of resources required by the job.
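The Individual* and Total* attributes are all range values. The sketch below models a range value as optional lower and upper bounds and checks an advertised resource against a job's requirements. It is a simplification: the real jsdl:RangeValue_Type also allows exact values and several ranges. The `RangeValue` class, the `satisfies` function, and the sample numbers are illustrative and are not part of JSDL.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RangeValue:
    """Simplified range: inclusive bounds, None meaning 'unbounded'."""
    lower: Optional[float] = None
    upper: Optional[float] = None

    def contains(self, value: float) -> bool:
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def satisfies(resource: Dict[str, float], required: Dict[str, RangeValue]) -> bool:
    """True if every required attribute is advertised and falls within its range."""
    return all(
        name in resource and rng.contains(resource[name])
        for name, rng in required.items()
    )


# Hypothetical job requirements: at least 2 CPUs and 1 GB of memory per node.
required = {
    "IndividualCPUCount": RangeValue(lower=2),
    "IndividualPhysicalMemory": RangeValue(lower=1e9),
}
node_a = {"IndividualCPUCount": 4, "IndividualPhysicalMemory": 2e9}
node_b = {"IndividualCPUCount": 1, "IndividualPhysicalMemory": 4e9}
print(satisfies(node_a, required), satisfies(node_b, required))
```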

Resources: CandidateHosts

Name of attribute | Type | Description
--- | --- | ---
HostName | xsd:string | A simple type containing a single name of a host.

Resources: FileSystem

Name of attribute | Type | Description
--- | --- | ---
Description | xsd:string |
MountPoint | xsd:string | A local location that MUST be made available in the allocated resources for the job.
MountSource | xsd:string | A remote location that MUST be made available locally for the job.
DiskSpace | jsdl:RangeValue_Type | The required amount of disk space on the containing FileSystem element for the job.
FileSystemType | jsdl:FileSystemTypeEnumeration | A token that describes the type of filesystem of the containing FileSystem element.

Resources: OperatingSystem

Name of attribute | Type | Description
--- | --- | ---
OperatingSystemType | complex type | Contains the name of the operating system.
OperatingSystemVersion | xsd:string | Defines the version of the operating system required by the job.
Description | xsd:string |

Resources: CPUArchitecture

Name of attribute | Type | Description
--- | --- | ---
CPUArchitectureName | jsdl:ProcessorArchitectureEnumeration | A token specifying the CPU architecture required by the job in the execution environment.

Resources: OperatingSystemType

Name of attribute | Type | Description
--- | --- | ---
OperatingSystemName | jsdl:OperatingSystemTypeEnumeration | A token type that contains the name of the operating system.

22

RDSResults (EDAGrid)

<RDSResultSet count = <num> >
    <Discovery rank = "" id = "">
        <Resource rank = "" id = "" nodeCount = "" >
            <WSGRAM/> <ReservationAddress/>
            <OSName/> <OSVersion/> <Platform/>
        </Resource> +
    </Discovery>
</RDSResultSet>
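To show how a client might consume such a result set, here is a small Python sketch that parses a hypothetical instance and lists its resources by rank. Every attribute value in the sample is invented, and treating a lower rank number as the better match is an assumption, since the slide does not state the rank convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the result layout; all values are invented.
SAMPLE = """
<RDSResultSet count="1">
  <Discovery rank="1" id="d1">
    <Resource rank="2" id="site-b" nodeCount="8">
      <OSName>Linux</OSName>
    </Resource>
    <Resource rank="1" id="site-a" nodeCount="16">
      <OSName>Linux</OSName>
    </Resource>
  </Discovery>
</RDSResultSet>
"""


def resources_by_rank(xml_text: str):
    """Return (rank, id, nodeCount) tuples for every Resource, ascending by rank."""
    root = ET.fromstring(xml_text)
    rows = [
        (int(r.get("rank")), r.get("id"), int(r.get("nodeCount")))
        for r in root.iter("Resource")
    ]
    return sorted(rows)


print(resources_by_rank(SAMPLE))
```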

23

RDSResults

<RDSResultSet count = <num> >
    <Discovery rank = "" id = "">
        <Individual rank = "" id = "" nodeCount = "" >
        </Individual>
        <Total/>
        <InteractBandwidth/>
    </Discovery>
</RDSResultSet>


25

Ranking (EDAGrid)

FUNCTION Ranking_Algorithm
    FOR each (Ri satisfying the individual condition)
        Rank(Ri) = 0
        FOR each (Aj, an attribute of the job)
            Rank(Ri) = Rank(Ri) + (w[j] * R[i,j] / A[j])
        Next A
    Next R
    Sort the resource set
    Return the list of resources in rank order
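A minimal Python sketch of this ranking follows. It reads the weighted term as w[j] × R[i,j] / A[j], which is an interpretation of the slide because the operators were lost in extraction. The attribute names, weights, and site names are invented for the example.

```python
from typing import Dict, List, Tuple


def rank_resources(
    resources: Dict[str, Dict[str, float]],
    job: Dict[str, float],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each resource as sum_j w[j] * R[i][j] / A[j], best first.

    Assumes `resources` already holds only candidates that met the
    individual condition, and that every job attribute value is non-zero.
    """
    scored = []
    for name, attrs in resources.items():
        score = 0.0
        for attr, needed in job.items():
            score += weights[attr] * attrs[attr] / needed
        scored.append((name, score))
    # Highest score first; ties broken by name for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical job requirements, weights, and advertised resources.
job = {"cpu_count": 4, "memory_gb": 8}
weights = {"cpu_count": 0.5, "memory_gb": 0.5}
resources = {
    "site-a": {"cpu_count": 8, "memory_gb": 8},    # 0.5*2 + 0.5*1 = 1.5
    "site-b": {"cpu_count": 4, "memory_gb": 32},   # 0.5*1 + 0.5*4 = 2.5
}
print(rank_resources(resources, job, weights))
```

With these numbers site-b ranks first, because its memory surplus outweighs site-a's CPU surplus under equal weights.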

26

Set-matching (EDAGrid)

The set-matching algorithm:

• Create an empty set.
• Add resources to the set one at a time, from the highest rank to the lowest.
• Each time, check whether the Total condition is met and whether the InteractBandwidth condition is violated.
• Terminate the loop if these conditions are satisfied or the number of grid nodes reaches the number the user required.
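The steps above can be sketched in Python as a greedy loop. Several details are assumptions, because the slide leaves them open: the Total condition is modelled as a total CPU count, a candidate is skipped when its link to any already chosen member falls below a minimum bandwidth, and all names and numbers are invented.

```python
from typing import Callable, Dict, List, Optional


def set_match(
    ranked: List[str],
    cpus: Dict[str, int],
    total_cpus_needed: int,
    max_nodes: int,
    bandwidth: Callable[[str, str], float],
    min_bandwidth: float,
) -> Optional[List[str]]:
    """Greedy set-matching over resources already sorted best-first.

    Returns the chosen set, or None if the Total condition was not met.
    """
    chosen: List[str] = []
    for candidate in ranked:
        if len(chosen) >= max_nodes:
            break
        # Skip a candidate whose link to any chosen member is too slow.
        if any(bandwidth(candidate, m) < min_bandwidth for m in chosen):
            continue
        chosen.append(candidate)
        if sum(cpus[m] for m in chosen) >= total_cpus_needed:
            return chosen
    return None


# Hypothetical link speeds between three sites.
links = {
    frozenset(("a", "b")): 10.0,
    frozenset(("a", "c")): 100.0,
    frozenset(("b", "c")): 100.0,
}


def link_speed(x: str, y: str) -> float:
    return links[frozenset((x, y))]


node_cpus = {"a": 4, "b": 4, "c": 4}
print(set_match(["a", "b", "c"], node_cpus, 8, 3, link_speed, 50.0))
```

Here site b is skipped because its link to site a is too slow, so the result is a and c.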


28

Centralized Indexing

• Proposes centralized management of all the resources of all the grid sites.
• A server machine holds the complete index of available resources on the network.
• Users start the query process by sending a request to the index server.
• The server answers users based on the stored information.
• Advantage: fast.
• Disadvantages:
  – The server machine becomes a bottleneck
  – The information must be updated continuously
  – Not suitable for P2P
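A minimal sketch of the idea, assuming a simple in-memory index keyed by resource name. The `IndexServer` class and its methods are hypothetical names and do not reflect Napster's actual protocol.

```python
from collections import defaultdict
from typing import Dict, List, Set


class IndexServer:
    """Single server holding the resource index for every grid site."""

    def __init__(self) -> None:
        self._sites_by_resource: Dict[str, Set[str]] = defaultdict(set)

    def publish(self, site: str, resources: List[str]) -> None:
        """A site registers (or re-registers) the resources it offers."""
        self.withdraw(site)
        for res in resources:
            self._sites_by_resource[res].add(site)

    def withdraw(self, site: str) -> None:
        """Remove every index entry belonging to a site."""
        for sites in self._sites_by_resource.values():
            sites.discard(site)

    def query(self, resource: str) -> List[str]:
        """Answer a user request from the stored index alone."""
        return sorted(self._sites_by_resource.get(resource, set()))


server = IndexServer()
server.publish("site-a", ["mpi", "linux"])
server.publish("site-b", ["mpi"])
print(server.query("mpi"))
server.publish("site-a", ["linux"])  # site-a no longer offers MPI
print(server.query("mpi"))
```

Every change at a site requires a new `publish` call, which reflects the continuous-update cost listed among the disadvantages.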

29

Flooding

• The query is routed from one peer to all of its neighbors. In this way the query is sent throughout the network.
• If a peer finds the requested resources in its local storage, it sends the answer to the original peer that made the request.
• A Time-To-Live (TTL) value limits the number of hops a request can travel, so that after being forwarded a certain number of times the request automatically disappears from the network.
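The following Python sketch simulates flooding with a TTL on a small peer graph and counts the messages sent. It is a simplified model and not the Gnutella wire protocol: peers forward to every neighbor, including the one they heard from, and duplicate copies are simply ignored. The topology and resource names are invented.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood_query(
    neighbors: Dict[str, List[str]],
    holdings: Dict[str, Set[str]],
    origin: str,
    resource: str,
    ttl: int,
) -> Tuple[List[str], int]:
    """Flood a query from `origin`; return (peers that answered, messages sent)."""
    seen = {origin}
    hits: List[str] = []
    messages = 0
    queue = deque([(origin, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if resource in holdings.get(peer, set()):
            hits.append(peer)  # the peer would reply to the origin
        if hops_left == 0:
            continue  # TTL exhausted: the query is dropped here
        for nxt in neighbors.get(peer, []):
            messages += 1
            if nxt not in seen:  # duplicate copies are received but ignored
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return hits, messages


# Hypothetical chain A - B - C - D in which only D holds the resource.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
holdings = {"D": {"gpu"}}
print(flood_query(neighbors, holdings, "A", "gpu", ttl=2))
print(flood_query(neighbors, holdings, "A", "gpu", ttl=3))
```

With a TTL of 2 the query dies at C and finds nothing; a TTL of 3 reaches D at the cost of more messages.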

30

Indexing Using Distributed Hash Tables

• In this method, each peer in the network holds a partition of the hash table.
• Each entry in the hash table maps part of the key space to the peer where the requested file can be found.
• When a file is requested, the file name is hashed by a uniform hash function.
• Based on the hash value and the hash table, the lookup value is found and returned to the requester.
• The cost of this method consists of building and updating the hash table and routing the query to the location of the requested file.
• Disadvantage: it does not support complex queries.
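As a hedged illustration, the sketch below partitions a hash table across peers placed on a hash ring, in the spirit of Chord but without its finger-table routing. The `HashRing` class, the 32-bit ring size, and the peer and file names are assumptions made for the example.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List, Optional


def ring_hash(text: str) -> int:
    """Uniform hash of a string onto a 32-bit ring."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:4], "big")


class HashRing:
    """Each peer owns the keys whose hash falls between its predecessor and itself."""

    def __init__(self, peers: List[str]) -> None:
        self._points = sorted((ring_hash(p), p) for p in peers)
        self._tables: Dict[str, Dict[str, str]] = {p: {} for p in peers}

    def owner(self, key: str) -> str:
        """First peer clockwise from the key's hash (wrapping around)."""
        hashes = [h for h, _ in self._points]
        idx = bisect_left(hashes, ring_hash(key)) % len(self._points)
        return self._points[idx][1]

    def publish(self, file_name: str, location: str) -> None:
        """Store the file's location in the partition of the owning peer."""
        self._tables[self.owner(file_name)][file_name] = location

    def lookup(self, file_name: str) -> Optional[str]:
        """Hash the name, go to the owning peer, and read its partition."""
        return self._tables[self.owner(file_name)].get(file_name)


peers = ["peer-1", "peer-2", "peer-3"]
ring = HashRing(peers)
ring.publish("data.csv", "peer-2:/shared/data.csv")
print(ring.lookup("data.csv"))
print(ring.lookup("data"))  # a partial name hashes elsewhere, so nothing is found
```

The second lookup illustrates the stated disadvantage: only exact keys can be resolved, so partial-name or range queries are not supported.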

31

Interests-based Methods

• This method is based on the interests of users.
• The idea is to search the peers that seem likely to contain what users have requested.
• To achieve this, peers are organized into groups of similar interest.
• Search queries are therefore forwarded to the matching interest group, which gives a high hit rate and avoids the time wasted searching other peers.
• Disadvantages:
  – Peers' interests may change over time
  – Peers may have more than one interest
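A small Python sketch of interest-based forwarding, assuming each peer is filed under exactly one interest group. The group names, peers, and holdings are invented, and this is not the HyperCuP algorithm itself.

```python
from typing import Dict, List, Set, Tuple


def query_interest_group(
    groups: Dict[str, List[str]],
    holdings: Dict[str, Set[str]],
    interest: str,
    resource: str,
) -> Tuple[List[str], int]:
    """Forward a query only to the peers grouped under `interest`.

    Returns (peers holding the resource, number of peers contacted).
    """
    members = groups.get(interest, [])
    hits = [peer for peer in members if resource in holdings.get(peer, set())]
    return hits, len(members)


groups = {"bioinformatics": ["p1", "p2"], "physics": ["p3", "p4", "p5"]}
holdings = {
    "p1": {"blast"},
    "p2": {"blast", "mpi"},
    "p3": {"mpi"},
    "p4": set(),
    "p5": {"blast"},  # interest drifted: holds "blast" but sits in another group
}
print(query_interest_group(groups, holdings, "bioinformatics", "blast"))
```

Only two of the five peers are contacted, but p5 is missed, which is the drawback of interests changing over time.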

32

Ant Colony Optimizing

• Ants start from their nests and wander randomly.
• The ants that find food return to their nests from memory and drop pheromone on their trails.
• Other ants that come across such a trail follow it to check for food instead of wandering randomly.
• If they find the food, they return home and reinforce the pheromone on the trail.
• A key point is that the pheromone evaporates over time. The more time it takes for an ant to travel back to its nest, the more pheromone will have evaporated.
• When an ant reaches an intersection, it has to decide which branch to take.
• The ants that take a short branch complete the trip faster than those that take a long branch. Therefore the pheromone density on the short branch remains higher.
• Other ants are more likely to choose a branch with a higher pheromone density.
• Eventually all the ants that go to get the food will take the shortest branch.

33

Ant Colony Optimizing

Initially the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E. After the initial stage, most of the ants will take the shortest path, A→B→E.
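The convergence on the shortest path can be reproduced with a small expected-value model, sketched below in Python. The update rule (deposit of 1/length per ant and a fixed evaporation fraction per round) and the use of hop counts as path lengths are assumptions chosen for illustration; the slides do not specify them.

```python
from typing import Dict


def simulate_trails(
    path_lengths: Dict[str, float],
    evaporation: float = 0.5,
    rounds: int = 50,
) -> Dict[str, float]:
    """Return the share of ants on each path after `rounds` iterations.

    Expected-value model: ants pick a path in proportion to its pheromone,
    each ant deposits 1/length on its path, and a fixed fraction evaporates.
    """
    pheromone = {path: 1.0 for path in path_lengths}
    for _ in range(rounds):
        total = sum(pheromone.values())
        share = {path: level / total for path, level in pheromone.items()}
        for path, length in path_lengths.items():
            pheromone[path] = (1.0 - evaporation) * pheromone[path] + share[path] / length
    total = sum(pheromone.values())
    return {path: level / total for path, level in pheromone.items()}


# Hop counts stand in for path lengths (an assumption).
shares = simulate_trails({"A-B-E": 2.0, "A-B-C-E": 3.0, "A-B-D-E": 3.0})
print(max(shares, key=shares.get))
```

All three paths start with equal pheromone, yet the shorter path receives a larger deposit per ant in every round, so its share grows until nearly all ants use it.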



37

Thank You


Page 8: 1 RESOURCE DISCOVERY Presenter: Cù Nguyễn Phương Hà

8

Forwarding in VN-Grid

The proposed VN-Grid infrastructure simulates a Peer-to-Peer model in which clients control the networking instead of servers that means those peers could exchange information directly

Interacting is limit to known peers The peers are equally considered The number of peers participating in Grid can be

raised enormously

9

Summary

Good resource discovery services must Provide the most exact update and sufficient

information with timely solution Be flexible with features of resources such as

variety heterogeneity and newly added resources Be scalable to adapt with the number of peers in

Grid environment rising Reduce the expense of transmitting information in

P2P environment

10

Overview

Introduction of problems Several approaches Solution model

11

Several Approaches

Resources descriptionndash Global Grid Forum JSDL ndash EDAGrid RDSResult

Matching methodndash EDAGrid Ranking Set-matching

Forwarding methodndash Napster Centralized Indexingndash Gnutella Flooding Queryndash Chord Indexing Using Distributed Hash Tablesndash HyperCuP system Interests-basedndash Ant Colony Optimizing

13

JSDL

JSDL (Job Submission Description Language) is used to describe the requirements of computational jobs for submission to resources, particularly in Grid environments.

14

JSDL

<JobDefinition>
  <JobDescription>
    <JobIdentification ... />
    <Application ... />
    <Resources ... />
    <DataStaging ... />
  </JobDescription>
  <xsd:any##other/>
</JobDefinition>

Resources

Name of attribute | Type | Description
CandidateHosts | complex type | This element is a complex type specifying the set of named hosts which may be selected for running the job.
FileSystem | complex type | This element describes a filesystem that is required by the job.
ExclusiveExecution | xsd:boolean | This is a boolean that designates whether the job must have exclusive access to the resources allocated to it by the consuming system.
OperatingSystem | complex type | This is a complex type that defines the operating system required by the job.

Resources

Name of attribute | Type | Description
IndividualCPUSpeed | jsdl:RangeValue_Type | This element is a range value specifying the speed of each CPU required by the job in the execution environment.
IndividualCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required on each resource to execute the job.
IndividualCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the number of CPUs for each of the resources to be allocated to the job submission.
IndividualNetworkBandwidth | jsdl:RangeValue_Type | This element is a range value specifying the bandwidth requirements of each individual resource.
IndividualPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the amount of physical memory required on each individual resource.
IndividualVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of virtual memory for each of the resources to be allocated for this job submission.
IndividualDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space for each resource allocated to the job.

Resources

Name of attribute | Type | Description
TotalCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required across all CPUs used to execute the job.
TotalCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPUs required for this job submission.
TotalPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of physical memory for the entire job across all resources.
TotalVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required total amount of virtual memory for the job submission.
TotalDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required total amount of disk space that should be allocated to the job.
TotalResourceCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of resources required by the job.

Resources

CandidateHosts
Name of attribute | Type | Description
HostName | xsd:string | This element is a simple type containing a single name of a host.

Resources

FileSystem
Name of attribute | Type | Description
Description | xsd:string |
MountPoint | xsd:string | This is a string that describes a local location that MUST be made available in the allocated resources for the job.
MountSource | xsd:string | This is a string that describes a remote location that MUST be made available locally for the job.
DiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space on the containing FileSystem element for the job.
FileSystemType | jsdl:FileSystemTypeEnumeration | This is a token that describes the type of filesystem of the containing FileSystem element.

Resources

OperatingSystem
Name of attribute | Type | Description
OperatingSystemType | complex type | This is a complex type that contains the name of the operating system.
OperatingSystemVersion | xsd:string | This element is a string that defines the version of the operating system required by the job.
Description | xsd:string |

Resources

OperatingSystemType
Name of attribute | Type | Description
OperatingSystemName | jsdl:OperatingSystemTypeEnumeration | This is a token type that contains the name of the operating system.

CPUArchitecture
Name of attribute | Type | Description
CPUArchitectureName | jsdl:ProcessorArchitectureEnumeration | This element is a token specifying the CPU architecture required by the job in the execution environment.
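
To make the JSDL structure concrete, here is a minimal Python sketch that builds a JobDefinition with a single Resources constraint. It is an illustration only. The namespace URI is the JSDL 1.0 namespace as I understand it from the specification. The job name, the CPU count, and the helper name build_job_definition are hypothetical. The TotalCPUCount range value is expressed with an Exact child, one of the forms a range value may take.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 namespace, as understood from the specification (treat as an assumption).
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"


def build_job_definition(job_name, total_cpu_count):
    """Build a JobDefinition skeleton with one Resources constraint."""
    ET.register_namespace("jsdl", JSDL_NS)

    def tag(name):
        # Qualified tag name in ElementTree's {namespace}local form.
        return "{%s}%s" % (JSDL_NS, name)

    job_definition = ET.Element(tag("JobDefinition"))
    job_description = ET.SubElement(job_definition, tag("JobDescription"))

    identification = ET.SubElement(job_description, tag("JobIdentification"))
    ET.SubElement(identification, tag("JobName")).text = job_name

    # Application and DataStaging are left out of this sketch.
    resources = ET.SubElement(job_description, tag("Resources"))
    cpu_count = ET.SubElement(resources, tag("TotalCPUCount"))
    ET.SubElement(cpu_count, tag("Exact")).text = str(total_cpu_count)
    return job_definition


doc = build_job_definition("demo-job", 4)
print(ET.tostring(doc, encoding="unicode"))
```

A discovery service receiving such a document would read the Resources children and compare them against what each Grid site advertises.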

22

RDSResults EDAGrid

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Resource rank = "" id = "" nodeCount = "" >
      <WSGRAM/> <ReservationAddress/>
      <OSName/> <OSVersion/> <Platform/>
    </Resource> +
  </Discovery>
</RDSResultSet>

23

RDSResults

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Individual rank = "" id = "" nodeCount = "" >
    </Individual>
    <Total/>
    <InteractBandwidth/>
  </Discovery>
</RDSResultSet>

25

Ranking EDAGrid

FUNCTION Ranking_Algorithm
  FOR each (Ri satisfying the individual condition)
    Rank(Ri) = 0
    FOR each (Aj, an attribute of the job)
      Rank(Ri) = Rank(Ri) + (w[j] * R[i,j] / A[j])
    Next A
  Next R
  Sort the Resource Set
  Return the list of resources in rank order
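
The ranking step can be sketched in Python as follows. This is an illustration under stated assumptions, not EDAGrid's actual code. The score is read as a weighted sum of offered-to-requested ratios, the "individual condition" is taken to mean that a resource offers at least the requested amount of every attribute, and the site names, attribute names, and weights are hypothetical.

```python
def rank_resources(resources, job_attributes, weights):
    """Return (resource_id, score) pairs sorted by descending score."""
    ranked = []
    for resource_id, offered in resources.items():
        # Assumed individual condition: every requested amount is covered.
        if any(offered[name] < requested
               for name, requested in job_attributes.items()):
            continue
        score = 0.0
        for name, requested in job_attributes.items():
            # Assumed reading of the slide: weight * offered / requested.
            score += weights[name] * offered[name] / requested
        ranked.append((resource_id, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


resources = {
    "site-a": {"cpu_count": 8, "memory_gb": 16},
    "site-b": {"cpu_count": 4, "memory_gb": 64},
    "site-c": {"cpu_count": 2, "memory_gb": 32},  # too few CPUs, filtered out
}
job = {"cpu_count": 4, "memory_gb": 8}
weights = {"cpu_count": 0.5, "memory_gb": 0.5}

ranking = rank_resources(resources, job, weights)
print(ranking)  # [('site-b', 4.5), ('site-a', 2.0)]
```

With equal weights, site-b ranks first because its memory surplus outweighs site-a's CPU surplus; changing the weights changes the order.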

26

Set-matching EDAGrid

The set-matching algorithm:
Create an empty set.
Add the resources to the set one at a time, from higher rank to lower.
Each time, check whether the Total condition is met and whether the InteractBandwidth condition is violated.
Terminate the loop if these conditions are satisfied or the number of grid nodes reaches the number the user requires.
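
A minimal Python sketch of the set-matching loop follows. The slide leaves several details open, so the following are assumptions: the Total condition is a single summed capacity, a candidate is skipped when its bandwidth to any already selected member falls below a minimum, and the resource names and link values are hypothetical.

```python
def set_matching(ranked, required_total, max_nodes, min_bandwidth, bandwidth):
    """Greedily pick resources in rank order until the total is met."""
    selected = []
    total = 0
    for resource_id, capacity in ranked:
        # Skip a candidate whose link to any chosen member is too slow.
        if any(bandwidth(resource_id, other) < min_bandwidth
               for other in selected):
            continue
        selected.append(resource_id)
        total += capacity
        # Stop once the Total condition holds or the node limit is reached.
        if total >= required_total or len(selected) >= max_nodes:
            break
    return selected, total >= required_total


# Hypothetical pairwise bandwidths between resources.
links = {
    frozenset(("a", "b")): 10,
    frozenset(("a", "c")): 100,
    frozenset(("b", "c")): 100,
}


def link_bandwidth(first, second):
    return links[frozenset((first, second))]


ranked = [("a", 8), ("b", 4), ("c", 6)]  # already sorted by rank
chosen, satisfied = set_matching(ranked, required_total=12, max_nodes=3,
                                 min_bandwidth=50, bandwidth=link_bandwidth)
print(chosen, satisfied)  # ['a', 'c'] True
```

Resource b is skipped because its link to a is too slow, so the set is completed with c instead.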

28

Centralized Indexing

Proposes centralized management of all the resources of all the grid sites.

A server machine holds the entire index of available resources on the network.

Users start the query process by sending a request to the index server.

The server sends answers to the users based on the stored information.

Advantage: fast.
Disadvantages:
– The server machine is a bottleneck
– The information must be updated continuously
– Not suitable for P2P
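
As a rough illustration of centralized indexing, the sketch below keeps every site's advertised resources in one in-memory dictionary. The class name IndexServer, its methods, and the site data are hypothetical; a real index server would also handle networking and stale entries.

```python
class IndexServer:
    """Single index of the resources advertised by every grid site."""

    def __init__(self):
        self._index = {}  # site name -> latest advertised resources

    def publish(self, site, resources):
        # Each site must re-publish whenever its resources change.
        self._index[site] = dict(resources)

    def query(self, requirements):
        # Return the sites that cover every requested amount.
        return sorted(
            site for site, res in self._index.items()
            if all(res.get(key, 0) >= value
                   for key, value in requirements.items())
        )


server = IndexServer()
server.publish("site-a", {"cpu_count": 8, "memory_gb": 16})
server.publish("site-b", {"cpu_count": 2, "memory_gb": 64})
matches = server.query({"cpu_count": 4})
print(matches)  # ['site-a']
```

Every query and every update goes through this one object, which is where the bottleneck and the continuous-update cost listed above come from.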

29

Flooding

The query is routed from one peer to all of its neighbors. In this way, the query is sent throughout the network.
If a peer finds the resources in its local storage, it sends the answer to the original peer that made the request.
A Time-To-Live (TTL) limits the number of hops a request can travel, so that after being forwarded a certain number of times the request automatically disappears from the network.
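
Flooding with a TTL can be sketched as a breadth-first traversal of the peer graph. This is a simplified single-process model, not Gnutella's wire protocol: the visited set stands in for duplicate-message suppression, and the peer names and topology are hypothetical.

```python
from collections import deque


def flood_query(neighbors, local_resources, origin, wanted, ttl):
    """Flood a query from `origin`; return the peers that hold `wanted`."""
    hits = []
    visited = {origin}
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in local_resources.get(peer, ()):
            hits.append(peer)  # this peer would answer the origin
        if remaining == 0:
            continue  # TTL exhausted: the request is not forwarded further
        for nxt in neighbors.get(peer, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits


neighbors = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p4"],
    "p3": ["p1"],
    "p4": ["p2", "p5"],
    "p5": ["p4"],
}
local_resources = {"p3": {"mpi"}, "p5": {"mpi"}}

near = flood_query(neighbors, local_resources, "p1", "mpi", ttl=2)
far = flood_query(neighbors, local_resources, "p1", "mpi", ttl=3)
print(near, far)  # ['p3'] ['p3', 'p5']
```

With a TTL of 2 the query never reaches p5, three hops away, which shows the trade-off between traffic and completeness of the answer.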

30

Indexing Using Distributed Hash Tables

In this method, each peer in the network holds a partition of the hash table.

Each entry in the hash table covers part of the key space and points to the peer where the searched file can be found.

When a file is requested, the file name is hashed by a uniform hash function.

Based on the hash value and the hash table, the lookup value is found and returned to the requester.

The cost of this method consists of the cost of building and updating the hash table and of routing the query to the location of the searched file.

Disadvantage: it does not apply to complex queries.
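
The key-to-peer mapping can be sketched with a simple hash ring, in which a key belongs to the first peer whose hash is at or after the key's hash. This is a deliberate simplification: it shows only the mapping and omits Chord's finger tables and hop-by-hop routing. The peer names, the file name, and the 16-bit key space are hypothetical choices.

```python
import bisect
import hashlib


def key_hash(text, bits=16):
    """Map a string uniformly into a key space of 2**bits positions."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Assign each key to the first peer at or after it on the ring."""

    def __init__(self, peers):
        self._ring = sorted((key_hash(peer), peer) for peer in peers)
        self._points = [point for point, _ in self._ring]

    def responsible_peer(self, key):
        index = bisect.bisect_left(self._points, key_hash(key))
        # Wrap around to the first peer when the key is past the last one.
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["peer-a", "peer-b", "peer-c"])
owner = ring.responsible_peer("dataset.dat")
print(owner)
```

Only the exact file name hashes to the right peer, which is why range or multi-attribute queries do not fit this scheme directly.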

31

Interests-based Methods

This method is based on the interests of users.
The idea is to search on the peers that seem likely to contain what users have requested.
To achieve this, peers are organized into groups of similar interest.
Search queries are therefore forwarded to the interest group, to get a high hit rate and reduce the time wasted searching on other peers.

Disadvantages:
– a peer's interest may change over time
– peers may have more than one interest

32

Ant Colony Optimizing

Ants start from their nests and wander randomly.
The ants that find food return to their nests from memory and drop pheromone on the trails.
Other ants that come across such a trail follow it to the food instead of wandering randomly.
If they find the food, they return home and reinforce the pheromone on the trail.
A key point is that the pheromone evaporates over time. The more time it takes for an ant to travel back to its nest, the more pheromone evaporates.
When an ant reaches an intersection, it has to decide which branch to take.
The ants that take a short branch complete the trip faster than those that take a long branch. Therefore, the pheromone density on the short branch remains higher.
Other ants are more likely to choose a branch according to its pheromone density.
Eventually, all the ants that go to get the food take the shortest branch.

33

Ant Colony Optimizing

Initially, the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E.
After the initial stage, most of the ants will take the shortest path, A→B→E.
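
The convergence on the shortest path can be illustrated with a small Python sketch. It is a deterministic, averaged simplification rather than a full ant colony algorithm: instead of simulating random individual ants, each round splits the colony across the three paths in proportion to their pheromone. The hop counts, colony size, evaporation rate, and number of rounds are assumed values.

```python
def simulate_pheromone(path_lengths, ants=10, evaporation=0.5, rounds=50):
    """Return each path's final share of the total pheromone."""
    pheromone = {path: 1.0 for path in path_lengths}
    for _ in range(rounds):
        total = sum(pheromone.values())
        for path, length in path_lengths.items():
            share = pheromone[path] / total  # expected fraction of ants
            deposit = ants * share / length  # shorter paths earn more per ant
            # Old pheromone partly evaporates, then the new deposit is added.
            pheromone[path] = (1 - evaporation) * pheromone[path] + deposit
    total = sum(pheromone.values())
    return {path: level / total for path, level in pheromone.items()}


# Hop counts for the three paths named on the slide.
paths = {"A->B->C->E": 3, "A->B->E": 2, "A->B->D->E": 3}
shares = simulate_pheromone(paths)
best = max(shares, key=shares.get)
print(best)  # A->B->E
```

All three paths start with equal pheromone, but the shorter path receives a larger deposit per ant each round, so its share grows until it dominates.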

34

Overview

Introduction of problems
Several approaches
Solution model

35


37

Thank You



Tran Vu Pham A Collaborative e-Science Architecture for Distributed Scientific Communities The University of Leeds School of Computing October 2006

37

Thank You

11

Several Approaches

Resources description
– Global Grid Forum: JSDL
– EDAGrid: RDSResult

Matching method
– EDAGrid: Ranking, Set-matching

Forwarding method
– Napster: Centralized Indexing
– Gnutella: Flooding Query
– Chord: Indexing Using Distributed Hash Tables
– HyperCuP system: Interests-based
– Ant Colony Optimizing

13

JSDL

JSDL (Job Submission Description Language) is used to describe the requirements of computational jobs for submission to resources, particularly in Grid environments.

14

JSDL

<JobDefinition>
  <JobDescription>
    <JobIdentification ... />
    <Application ... />
    <Resources ... />
    <DataStaging ... />
  </JobDescription>
  <xsd:any##other/>
</JobDefinition>

Resources

Resources
Name of attribute | Type | Description
CandidateHosts | complex type | This element is a complex type specifying the set of named hosts which may be selected for running the job
FileSystem | complex type | This element describes a filesystem that is required by the job
ExclusiveExecution | xsd:boolean | This is a boolean that designates whether the job must have exclusive access to the resources allocated to it by the consuming system
OperatingSystem | complex type | This is a complex type that defines the operating system required by the job

Resources

Resources
Name of attribute | Type | Description
IndividualCPUSpeed | jsdl:RangeValue_Type | This element is a range value specifying the speed of each CPU required by the job in the execution environment
IndividualCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required on each resource to execute the job
IndividualCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the number of CPUs for each of the resources to be allocated to the job submission
IndividualNetworkBandwidth | jsdl:RangeValue_Type | This element is a range value specifying the bandwidth requirements of each individual resource
IndividualPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the amount of physical memory required on each individual resource
IndividualVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of virtual memory for each of the resources to be allocated for this job submission
IndividualDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space for each resource allocated to the job

Resources

Resources
Name of attribute | Type | Description
TotalCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required across all CPUs used to execute the job
TotalCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPUs required for this job submission
TotalPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of physical memory for the entire job across all resources
TotalVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required total amount of virtual memory for the job submission
TotalDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required total amount of disk space that should be allocated to the job
TotalResourceCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of resources required by the job

Resources

CandidateHosts
Name of attribute | Type | Description
HostName | xsd:string | This element is a simple type containing a single name of a host

Resources

FileSystem
Name of attribute | Type | Description
Description | xsd:string |
MountPoint | xsd:string | This is a string that describes a local location that MUST be made available in the allocated resources for the job
MountSource | xsd:string | This is a string that describes a remote location that MUST be made available locally for the job
DiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space on the containing FileSystem element for the job
FileSystemType | jsdl:FileSystemTypeEnumeration | This is a token that describes the type of filesystem of the containing FileSystem element

Resources

OperatingSystem
Name of attribute | Type | Description
OperatingSystemType | complex type | This is a complex type that contains the name of the operating system
OperatingSystemVersion | xsd:string | This element is a string that defines the version of the operating system required by the job
Description | xsd:string |

Resources

OperatingSystemType
Name of attribute | Type | Description
OperatingSystemName | jsdl:OperatingSystemTypeEnumeration | This is a token type that contains the name of the operating system

CPUArchitecture
Name of attribute | Type | Description
CPUArchitectureName | jsdl:ProcessorArchitectureEnumeration | This element is a token specifying the CPU architecture required by the job in the execution environment

22

RDSResults (EDAGrid)

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Resource rank = "" id = "" nodeCount = "" >
      <WSGRAM> <ReservationAddress>
      <OSName> <OSVersion> <Platform>
    </Resource> +
  </Discovery>
</RDSResultSet>

23

RDSResults

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Individual rank = "" id = "" nodeCount = "" >
    </Individual>
    <Total>
    <InteractBandwidth>
  </Discovery>
</RDSResultSet>

25

Ranking EDAGrid

FUNCTION Ranking_Algorithm
  FOR each (Ri that satisfies the individual condition)
    Rank(Ri) = 0
    FOR each (Aj, an attribute of the job)
      Rank(Ri) = Rank(Ri) + (w[j] * R[i,j] / A[j])
    Next A
  Next R
  Sort the resource set
  Return the list of resources in rank order
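
The ranking loop can be sketched in Python as follows. This is a minimal illustration, not the EDAGrid implementation: the operators in the scoring formula were lost in the slide text, so the sketch assumes the reading w[j] * R[i][j] / A[j], and it assumes the "individual condition" means that a resource offers at least what the job requests. The function name `rank_resources`, the attribute names, and the sample sites are all hypothetical.

```python
def rank_resources(resources, job, weights):
    """Return (resource_id, score) pairs, best first.

    resources: {resource_id: {attribute: offered_value}}
    job:       {attribute: requested_value}  (all values must be > 0)
    weights:   {attribute: weight}
    """
    ranked = []
    for res_id, offered in resources.items():
        # Keep only resources that satisfy the individual condition,
        # read here as "offers at least what the job requests".
        if any(offered[name] < wanted for name, wanted in job.items()):
            continue
        score = 0.0
        for name, wanted in job.items():
            # Assumed reading of the slide's formula: w[j] * R[i][j] / A[j].
            score += weights[name] * offered[name] / wanted
        ranked.append((res_id, score))
    # Highest score first; Python's sort is stable, so ties keep input order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical job request and candidate sites.
job = {"cpu_count": 4, "memory_gb": 8}
weights = {"cpu_count": 0.5, "memory_gb": 0.5}
resources = {
    "site-a": {"cpu_count": 8, "memory_gb": 8},
    "site-b": {"cpu_count": 4, "memory_gb": 32},
    "site-c": {"cpu_count": 4, "memory_gb": 8},
    "site-d": {"cpu_count": 2, "memory_gb": 64},  # too few CPUs, filtered out
}
ranking = rank_resources(resources, job, weights)
```

Under these assumptions a resource that exactly matches the request scores the sum of the weights, and any surplus raises the score in proportion to the weight of that attribute.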

26

Set-matching EDAGrid

The set-matching algorithm:

– Create an empty set.
– Add the resources to the set one at a time, from the highest rank to the lowest.
– Each time, check whether the Total condition is met and whether the InteractBandwidth condition is violated.
– Terminate the loop if these conditions are satisfied or the number of grid nodes reaches the number the user required.
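
A possible Python sketch of this greedy loop is shown below. The slides do not define the conditions precisely, so several points are assumptions: the Total condition is read as "the summed attributes of the selected resources reach the requested totals", InteractBandwidth is read as a minimum bandwidth between every pair of selected resources, and a candidate that would violate it is skipped. The function `set_match` and the sample data are hypothetical.

```python
def set_match(ranked, total_required, min_bandwidth, max_nodes, bandwidth):
    """Greedily build a resource set from a best-first ranked list.

    ranked:         list of dicts, each with an "id" and numeric attributes
    total_required: {attribute: required_total}
    bandwidth:      function (id_a, id_b) -> bandwidth between two resources
    Returns (selected, satisfied).
    """
    selected = []
    totals = {name: 0 for name in total_required}
    for candidate in ranked:
        if len(selected) >= max_nodes:
            break  # node limit requested by the user reached
        # Assumption: a candidate whose link to any already-selected member
        # is slower than min_bandwidth would violate InteractBandwidth.
        if any(bandwidth(candidate["id"], member["id"]) < min_bandwidth
               for member in selected):
            continue
        selected.append(candidate)
        for name in totals:
            totals[name] += candidate[name]
        if all(totals[name] >= total_required[name] for name in totals):
            return selected, True  # Total condition met
    return selected, False


# Hypothetical ranked list (best first) and link bandwidths.
ranked = [
    {"id": "a", "cpu_count": 4},
    {"id": "b", "cpu_count": 4},
    {"id": "c", "cpu_count": 8},
]
links = {
    frozenset(("a", "b")): 10,
    frozenset(("a", "c")): 100,
    frozenset(("b", "c")): 100,
}
chosen, satisfied = set_match(
    ranked,
    total_required={"cpu_count": 12},
    min_bandwidth=50,
    max_nodes=3,
    bandwidth=lambda x, y: links[frozenset((x, y))],
)
```

Here resource b is skipped because its link to a is too slow, and the loop stops as soon as a and c together provide the 12 requested CPUs.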

28

Centralized Indexing

Proposes centralized management of the resources of all the grid sites.

A server machine holds the index of all available resources on the network.

Users start the query process by sending a request to the index server.

The server answers the users based on the information it has stored.

Advantage: fast. Disadvantages:

– The server machine is a bottleneck
– The information must be updated continuously
– Not suitable for P2P

29

Flooding

The query is routed from one peer to all of its neighbors. In this way the query is sent throughout the network.

If a peer finds the resources in its local storage, it sends the answer to the original peer that made the request.

A Time-To-Live (TTL) value limits the number of hops a request can travel, so that after a certain number of hops the request automatically disappears from the network.
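
As an illustration, TTL-limited flooding can be sketched as a breadth-first traversal of the peer graph. This is a simplified model under stated assumptions, not the Gnutella protocol: the whole network is held in one dictionary, and the `seen` set stands in for the duplicate suppression that a real network would do with message identifiers. The function `flood_query` and the sample topology are hypothetical.

```python
from collections import deque


def flood_query(graph, local_resources, origin, wanted, ttl):
    """Return the peers that hold `wanted`, reachable within `ttl` hops."""
    hits = []
    seen = {origin}                 # peers that already received the query
    queue = deque([(origin, ttl)])  # (peer, hops the query may still travel)
    while queue:
        peer, hops_left = queue.popleft()
        if wanted in local_resources.get(peer, ()):
            hits.append(peer)       # this peer would answer the origin
        if hops_left == 0:
            continue                # TTL expired: the query is dropped here
        for neighbor in graph[peer]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits


# Hypothetical chain of peers A - B - C - D.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
local_resources = {"C": {"mpi"}, "D": {"mpi"}}
near = flood_query(graph, local_resources, "A", "mpi", ttl=2)
far = flood_query(graph, local_resources, "A", "mpi", ttl=3)
```

With a TTL of 2 the query reaches C but not D, which shows the trade-off: a small TTL limits traffic but can miss resources that exist further away.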

30

Indexing Using Distributed Hash Tables

In this method, each peer in the network holds a partition of the hash table.

Each entry in the hash table is a key that points to the peer where the requested file can be found.

When a file is requested, the file name is hashed by a uniform hash function.

Based on the hash value and the hash table, the lookup value is found and returned to the requester.

The cost of this method consists of the cost of building and updating the hash table and of routing the query to the location of the requested file.

Disadvantage: it does not apply to complex queries.
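
The lookup idea can be sketched with a small hash ring in Python. This is a simplified model with a global view of all peers; a real DHT such as Chord routes the lookup between peers in several hops and has no central object. The class `HashRing`, the ring size, and the peer and file names are hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumed small identifier space, for illustration only


def ring_hash(text):
    """Map a string to a point on the ring with a uniform hash (SHA-1)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << RING_BITS)


class HashRing:
    def __init__(self, peers):
        # Each peer owns the arc of the ring that ends at its own point.
        self.points = sorted((ring_hash(p), p) for p in peers)
        self.keys = [point for point, _ in self.points]
        self.store = {p: {} for p in peers}  # each peer's table partition

    def responsible_peer(self, file_name):
        # First peer at or after the key's point, wrapping around the ring.
        idx = bisect_left(self.keys, ring_hash(file_name)) % len(self.points)
        return self.points[idx][1]

    def publish(self, file_name, holder):
        self.store[self.responsible_peer(file_name)][file_name] = holder

    def lookup(self, file_name):
        return self.store[self.responsible_peer(file_name)].get(file_name)


ring = HashRing(["peer-1", "peer-2", "peer-3"])
ring.publish("dataset.tar", "peer-9")
found = ring.lookup("dataset.tar")
```

Because the key is derived from the exact file name, only exact-match lookups work, which is why this method does not apply to complex queries such as range or multi-attribute requests.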

31

Interests-based Methods

This method is based on the interests of users.

The idea is to search the peers that seem to contain what users have requested.

To this end, peers are organized into groups of similar interest.

Search queries are therefore forwarded to the interest group, to achieve a high hit rate and reduce the time wasted searching other peers.

Disadvantages:

– a peer's interests may change over time
– peers may have more than one interest

32

Ant Colony Optimizing

Ants start from their nests and wander randomly.

The ants that find food return to their nests from memory and drop pheromone on their trails.

Other ants that come across such a trail follow it to check for food instead of wandering randomly. If they find the food, they return home and reinforce the pheromone on the trail.

A key point is that the pheromone evaporates over time. The more time it takes for an ant to travel back to its nest, the more pheromone will have evaporated.

When an ant reaches an intersection, it has to decide which branch to take. Ants that take a short branch complete the trip faster than those that take a long branch, so the pheromone density on the short branch remains higher. Other ants are more likely to choose the branch with the higher pheromone density. Eventually, all the ants that go to get the food take the shortest branch.

33

Ant Colony Optimizing

Initially the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E.

After the initial stage, most of the ants will take the shortest path, A→B→E.
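
The reinforcement and evaporation dynamics can be illustrated with a small deterministic model in Python. It is a sketch under stated assumptions rather than a full ant colony algorithm: instead of simulating individual random ants, each round a fraction of the colony proportional to a path's pheromone takes that path, the deposit is divided by the path length (shorter trips are completed more often), and every hop is assumed to have the same length. The function `simulate_pheromone` and its parameters are hypothetical.

```python
def simulate_pheromone(path_lengths, rounds=200, evaporation=0.1):
    """Return the pheromone level per path after `rounds` update steps."""
    pheromone = {path: 1.0 for path in path_lengths}  # no initial preference
    for _ in range(rounds):
        total = sum(pheromone.values())
        # Fraction of the ants choosing each path, by pheromone density.
        shares = {path: level / total for path, level in pheromone.items()}
        for path, length in path_lengths.items():
            # Evaporate first, then deposit: shorter paths gain more per round.
            pheromone[path] = ((1.0 - evaporation) * pheromone[path]
                               + shares[path] / length)
    return pheromone


# The three paths from the slide, with length counted in hops.
paths = {"A->B->C->E": 3, "A->B->E": 2, "A->B->D->E": 3}
levels = simulate_pheromone(paths)
best = max(levels, key=levels.get)
share_of_best = levels[best] / sum(levels.values())
```

Starting from equal pheromone on all three paths, the small per-round advantage of the two-hop path compounds until it attracts almost all of the traffic, which matches the behavior described on the slide.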

34

Overview

Introduction of problems
Several approaches
Solution model

35

37

Thank You

Page 13: 1 RESOURCE DISCOVERY Presenter: Cù Nguyễn Phương Hà

13

JSDL

JSDL is used to describe the requirements of computational jobs for submission to resources particularly in Grid environments

14

JSDL

ltJobDefinitiongtltJobDescriptiongt

ltJobIdentification gtltApplication gtltResources gtltDataStaging gt

ltJobDescriptiongtltxsdanyothergt

ltJobDefinitiongt

Resources

This is a complex type that defines the operating system required by the jobcomplex typeOperatingSystem

This is a boolean that designates whether the job must have exclusive access to the resources allocated to it by the consuming systemxsdbooleanExclusiveExecution

This element describes a filesystem that is required by the jobcomplex typeFileSystem

This element is a complex type specifying the set of named hosts which may be selected for running the jobcomplex typeCandidateHosts

DescriptionTypeName of attribute

Resources

Resources

This is a range value that describes the required amount of disk space for each resource allocated to the job

jsdlRangeValue_TypeIndividualDiskSpace

This element is a range value specifying the required amount of virtual memory for each of theresources to be allocated for this job submission

jsdlRangeValue_Type

IndividualVirtualMemory

This element is a range value specifying the amount of physical memory required on each indi-vidual resource

jsdlRangeValue_Type

IndividualPhysicalMemory

This element is a range value specifying the bandwidth requirements of each individual resource

jsdlRangeValue_Type

IndividualNetworkBandwidth

This element is a range value specifying the number of CPUs for each of the resources to be allocated to the job submission

jsdlRangeValue_TypeIndividualCPUCount

This element is a range value specifying the total number of CPU seconds required on each resource to execute the job

jsdlRangeValue_TypeIndividualCPUTime

This element is a range value specifying the speed of each CPU required by the job in the execution environment

jsdlRangeValue_TypeIndividualCPUSpeed

DescriptionTypeName of attribute

Resources

Resources

This element is a range value specifying the total number of resources required by the job jsdlRangeValue_TypeTotalResourceCount

This is a range value that describes the required total amount of disk space that should be allocated to the job jsdlRangeValue_TypeTotalDiskSpace

This element is a range value specifying the required total amount of virtual memory for the jobsubmission jsdlRangeValue_TypeTotalVirtualMemory

This element is a range value specifying the required amount of physical memory for the entirejob across all resources jsdlRangeValue_TypeTotalPhysicalMemory

This element is a range value specifying the total number of CPUs required for this job submission jsdlRangeValue_TypeTotalCPUCount

This element is a range value specifying total number of CPU seconds required across all CPUs used to execute the job jsdlRangeValue_TypeTotalCPUTime

DescriptionTypeName of attribute

Resources

Resources

This element is a simple type containing a single name of a hostxsdstringHostName

CandidateHosts

Resources

This is a token that describes the type of filesystem of the containing FileSystem element

jsdlFileSystemTypeEnumerationFileSystemType

This is a range value that describes the required amount of disk space on the containing FileSystem element for the job

jsdlRangeValue_TypeDiskSpace

This is a string that describes a remote location that MUST be made available locally for the jobxsdstringMountSource

This is a string that describes a local location that MUST be made available in the allocated resources for the jobxsdstringMountPoint

xsdstringDescription

FileSystem

Resources

xsdstringDescription

This element is a string that defines the version of the operating system required by the jobxsdstringOperatingSystemVersion

This is a complex type that contains the name of the operating systemcomplex typeOperatingSystemType

OperatingSystem

Resources

This element is a token specifying the CPU architecture required by the job in the execution environment

jsdlProcessorArchitectureEnumerationCPUArchitectureName

CPUArchitecture

This is a token type that contains the name of the operating system

jsdlOperatingSystemTypeEnumerationOperatingSystemName

OperatingSystemType

22

RDSResults EDAGrid

ltRDSResultSet count = ltnumgt gtltDiscovery rank = ldquordquo id = ldquordquogt

ltResource rank = ldquordquo id = ldquordquo nodeCount = ldquordquo gt

ltWSGRAMgtltReservationAddressgt

ltOSNamegt ltOSVersiongt ltPlatformgt

ltResourcegt +ltDiscoverygt

ltRDSResultSetgt

23

RDSResults

ltRDSResultSet count = ltnumgt gtltDiscovery rank = ldquordquo id = ldquordquogt

ltIndividual rank = ldquordquo id = ldquordquo nodeCount = ldquordquo gt

ltIndividualgtltTotalgtltInteractBandwidthgt

ltDiscoverygt ltRDSResultSetgt

24

Several Approaches

Resources descriptionndash Global Grid Forum JSDL ndash EDAGrid RDSResult

Matching methodndash EDAGrid Ranking Set-matching

Forwarding methodndash Napster Centralized Indexingndash Gnutella Flooding Queryndash Chord Indexing Using Distributed Hash Tablesndash HyperCuP system Interests-basedndash Ant Colony Optimizing

25

Ranking EDAGrid

FUN CTION Ranking_AlgorithmFOR each (Ri satisfy the individual condition)Rank(Ri) = 0FOR each (Aj is the attribute of job)Rank(Ri) = Rank(Ri) + (w[j] R[ij]A[j])Next ANext RSort the Resource SetReturn list of resource with order

26

Set-matching EDAGrid

The set-matching algorithm Create an empty set Add the resources into the set with the higher

rank one after one the lower Each time check if the Total condition is met

and if the InteractBandwidth is violated Terminate the loop if these conditions are

satisfied or the number of gridnodes reaches the number user required

27

Several Approaches

Resources descriptionndash Global Grid Forum JSDL ndash EDAGrid RDSResult

Matching methodndash EDAGrid Ranking Set-matching

Forwarding methodndash Napster Centralized Indexingndash Gnutella Flooding Queryndash Chord Indexing Using Distributed Hash Tablesndash HyperCuP system Interests-basedndash Ant Colony Optimizing

28

Centralized Indexing

Proposes a centralization management for the whole resources of all the grid sites

There is a server machine which holds all the index of available resources on the network

Users start the query process by sending the request to the index server

The server will send the answers to the users bases on the information stored

Advantages quickly Disadvantages

ndash The bottleneck at the server machinendash Update the information continuouslyndash Not suitable to the P2P

29

Flooding

The query will be routed from one peer to all of its neighbors By this way the query will be sent throughout the network If the peer finds out the resources in its local storage it will send the

answer to the original peer who makes the request Using Time-To-Live (TTL) to limit the number of hops a request could

be sent so that after a certain times to be sent the request will automatically disappear out of the network

30

Indexing Using Distributed Hash Tables

In this method, each peer in the network holds a partition of the hash table.
Each entry in the hash table maps part of the key space to the peer where the searched file can be found.
When a file is requested, the file name is hashed by a uniform hash function.
Based on the hash value and the hash table, the lookup value is found and returned to the requester.
The cost of this method consists of the cost of building and updating the hash table and of routing the query to the location of the searched file.
Disadvantage: it does not apply to complex queries, because hashing supports only exact-match lookups.
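The partitioning can be sketched as a small hash ring. This is not Chord itself: there are no finger tables, and the sketch assumes every node knows the whole ring. The class name, peer names, and file name are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List, Optional


def ring_position(text: str) -> int:
    """Uniform hash of a name onto a 32-bit ring."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:4], "big")


class TinyDHT:
    def __init__(self, peers: List[str]) -> None:
        self._ring = sorted((ring_position(p), p) for p in peers)
        # Each peer holds only its own partition of the table.
        self.tables: Dict[str, Dict[str, str]] = {p: {} for p in peers}

    def responsible_peer(self, key: str) -> str:
        positions = [pos for pos, _ in self._ring]
        # First peer at or after the key's position, wrapping around the ring.
        idx = bisect_left(positions, ring_position(key)) % len(self._ring)
        return self._ring[idx][1]

    def put(self, file_name: str, location: str) -> None:
        self.tables[self.responsible_peer(file_name)][file_name] = location

    def get(self, file_name: str) -> Optional[str]:
        return self.tables[self.responsible_peer(file_name)].get(file_name)


peers = ["peer-1", "peer-2", "peer-3", "peer-4"]
dht = TinyDHT(peers)
dht.put("dataset.tar", "site-b")
found = dht.get("dataset.tar")
```

The lookup works only when the exact file name is known; a query such as "any site with at least 4 CPUs" has no single hash value, which is the limitation noted above.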

31

Interests-based Methods

This method is based on the interests of users.
The idea is to search on the peers that seem likely to contain what users have requested.
To achieve this, peers are organized into groups of similar interest.
Search queries are therefore forwarded to the matching interest group, to get a high hit rate and to avoid wasting time searching on other peers.
Disadvantages:
– A peer's interests may change over time
– Peers may have more than one interest
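A rough sketch of interest-based forwarding follows. It assumes each peer is registered in exactly one group and that the query carries a topic label. The function name, group names, and peer contents are hypothetical and are not taken from HyperCuP.

```python
from typing import Dict, List, Set, Tuple


def forward_by_interest(
    groups: Dict[str, Set[str]],
    holdings: Dict[str, Set[str]],
    topic: str,
    wanted: str,
) -> Tuple[List[str], List[str]]:
    """Query only the peers in the group registered for `topic`."""
    contacted = sorted(groups.get(topic, set()))
    hits = [peer for peer in contacted if wanted in holdings.get(peer, set())]
    return contacted, hits


groups = {"bioinformatics": {"p1", "p4"}, "physics": {"p2", "p3"}}
holdings = {"p4": {"blast-db"}, "p3": {"blast-db"}}
contacted, hits = forward_by_interest(groups, holdings, "bioinformatics", "blast-db")
```

Only two of the four peers are contacted. Peer p3 also holds the item but sits in another group and is missed, which illustrates the problem of peers having more than one interest.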

32

Ant Colony Optimizing

Ants start from their nests and wander randomly.
The ants that find food return to their nests from memory and drop pheromone on their trails.
Other ants that come across such a trail follow it to check for food instead of wandering randomly.
If they find the food, they return home and reinforce the pheromone on the trail.
A key point is that the pheromone evaporates over time: the longer it takes an ant to travel back to its nest, the more pheromone evaporates.
When an ant reaches an intersection, it has to decide which branch to take.
Ants that take a short branch complete the trip faster than those that take a long branch, so the pheromone density on the short branch remains higher.
Other ants are more likely to choose the branch with the higher pheromone density.
Eventually, all the ants that go to get the food take the shortest branch.

33

Ant Colony Optimizing

Initially, the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E.
After the initial stage, most of the ants will take the shortest path, A→B→E.
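The convergence toward the shortest path can be reproduced with a small deterministic model. It is a simplified sketch rather than the algorithm from the cited papers: ants are split across paths in proportion to pheromone, each path receives a deposit inversely proportional to its length, and a fixed fraction evaporates every round. Path lengths are taken as hop counts, and the evaporation and deposit parameters are assumed values.

```python
from typing import Dict


def simulate_pheromone(
    path_lengths: Dict[str, int],
    rounds: int,
    evaporation: float = 0.5,
    deposit: float = 1.0,
) -> Dict[str, float]:
    """Return the share of ants on each path after `rounds` iterations."""
    pheromone = {path: 1.0 for path in path_lengths}
    for _ in range(rounds):
        total = sum(pheromone.values())
        shares = {path: level / total for path, level in pheromone.items()}
        for path, length in path_lengths.items():
            # Evaporate, then add new deposits; shorter trips deposit more per round.
            pheromone[path] = (1.0 - evaporation) * pheromone[path] + shares[path] * deposit / length
    total = sum(pheromone.values())
    return {path: level / total for path, level in pheromone.items()}


# Hop counts of the three paths from A to E.
paths = {"A-B-C-E": 3, "A-B-E": 2, "A-B-D-E": 3}
shares = simulate_pheromone(paths, rounds=60)
best_path = max(shares, key=shares.get)
```

All paths start with equal pheromone, yet the two-hop path gains a small advantage every round and ends up carrying almost all of the traffic.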

34

Overview

Introduction of problems
Several approaches
Solution model

35

37

Thank You

14

JSDL

<JobDefinition>
  <JobDescription>
    <JobIdentification ... />?
    <Application ... />?
    <Resources ... />?
    <DataStaging ... />*
  </JobDescription>
  <xsd:any##other/>*
</JobDefinition>

Resources

Name of attribute | Type | Description
CandidateHosts | complex type | This element is a complex type specifying the set of named hosts which may be selected for running the job.
FileSystem | complex type | This element describes a filesystem that is required by the job.
ExclusiveExecution | xsd:boolean | This is a boolean that designates whether the job must have exclusive access to the resources allocated to it by the consuming system.
OperatingSystem | complex type | This is a complex type that defines the operating system required by the job.

Resources

Name of attribute | Type | Description
IndividualCPUSpeed | jsdl:RangeValue_Type | This element is a range value specifying the speed of each CPU required by the job in the execution environment.
IndividualCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required on each resource to execute the job.
IndividualCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the number of CPUs for each of the resources to be allocated to the job submission.
IndividualNetworkBandwidth | jsdl:RangeValue_Type | This element is a range value specifying the bandwidth requirements of each individual resource.
IndividualPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the amount of physical memory required on each individual resource.
IndividualVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of virtual memory for each of the resources to be allocated for this job submission.
IndividualDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space for each resource allocated to the job.
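Checking a single resource against such per-resource ranges could look like the sketch below. The `RangeValue` class is a simplified stand-in for jsdl:RangeValue_Type, reduced to an optional lower and upper bound, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RangeValue:
    """Simplified range: either bound may be left open."""
    lower: Optional[float] = None
    upper: Optional[float] = None

    def contains(self, value: float) -> bool:
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def meets_individual(requirements: Dict[str, RangeValue], offered: Dict[str, float]) -> bool:
    """True when every requested attribute is present and inside its range."""
    return all(
        name in offered and rng.contains(offered[name])
        for name, rng in requirements.items()
    )


requirements = {
    "IndividualCPUCount": RangeValue(lower=2),
    "IndividualPhysicalMemory": RangeValue(lower=4e9),
    "IndividualCPUSpeed": RangeValue(lower=2e9, upper=4e9),
}
node_ok = {"IndividualCPUCount": 4, "IndividualPhysicalMemory": 8e9, "IndividualCPUSpeed": 3e9}
node_small = {"IndividualCPUCount": 1, "IndividualPhysicalMemory": 8e9, "IndividualCPUSpeed": 3e9}
ok_result = meets_individual(requirements, node_ok)
small_result = meets_individual(requirements, node_small)
```

A check of this kind is one way to decide which resources "satisfy the individual condition" before any ranking is applied.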

Resources

Name of attribute | Type | Description
TotalCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required across all CPUs used to execute the job.
TotalCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPUs required for this job submission.
TotalPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of physical memory for the entire job across all resources.
TotalVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required total amount of virtual memory for the job submission.
TotalDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required total amount of disk space that should be allocated to the job.
TotalResourceCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of resources required by the job.

Resources

CandidateHosts
Name of attribute | Type | Description
HostName | xsd:string | This element is a simple type containing a single name of a host.

Resources

FileSystem
Name of attribute | Type | Description
Description | xsd:string |
MountPoint | xsd:string | This is a string that describes a local location that MUST be made available in the allocated resources for the job.
MountSource | xsd:string | This is a string that describes a remote location that MUST be made available locally for the job.
DiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space on the containing FileSystem element for the job.
FileSystemType | jsdl:FileSystemTypeEnumeration | This is a token that describes the type of filesystem of the containing FileSystem element.

Resources

OperatingSystem
Name of attribute | Type | Description
OperatingSystemType | complex type | This is a complex type that contains the name of the operating system.
OperatingSystemVersion | xsd:string | This element is a string that defines the version of the operating system required by the job.
Description | xsd:string |

Resources

OperatingSystemType
Name of attribute | Type | Description
OperatingSystemName | jsdl:OperatingSystemTypeEnumeration | This is a token type that contains the name of the operating system.

CPUArchitecture
Name of attribute | Type | Description
CPUArchitectureName | jsdl:ProcessorArchitectureEnumeration | This element is a token specifying the CPU architecture required by the job in the execution environment.

22

RDSResults EDAGrid

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Resource rank = "" id = "" nodeCount = "" >
      <WSGRAM/>
      <ReservationAddress/>
      <OSName/> <OSVersion/> <Platform/>
    </Resource> +
  </Discovery>
</RDSResultSet>
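A client could read a result set of this shape with Python's standard `xml.etree.ElementTree` module, as sketched below. The element and attribute names follow the outline on the slide. The sample document, its values, and the example.org addresses are hypothetical, and the assumption that the child elements carry text content is ours.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical instance of the RDSResultSet outline.
SAMPLE = """
<RDSResultSet count="1">
  <Discovery rank="1" id="d1">
    <Resource rank="1.0" id="site-b" nodeCount="2">
      <WSGRAM>https://site-b.example.org/gram</WSGRAM>
      <ReservationAddress>https://site-b.example.org/reserve</ReservationAddress>
      <OSName>Linux</OSName>
      <OSVersion>2.6</OSVersion>
      <Platform>x86_64</Platform>
    </Resource>
    <Resource rank="2.0" id="site-a" nodeCount="4">
      <WSGRAM>https://site-a.example.org/gram</WSGRAM>
      <ReservationAddress>https://site-a.example.org/reserve</ReservationAddress>
      <OSName>Linux</OSName>
      <OSVersion>2.6</OSVersion>
      <Platform>x86_64</Platform>
    </Resource>
  </Discovery>
</RDSResultSet>
"""


def parse_result_set(xml_text: str) -> List[Dict[str, object]]:
    """Return the resources of every Discovery element, highest rank first."""
    root = ET.fromstring(xml_text.strip())
    results: List[Dict[str, object]] = []
    for discovery in root.findall("Discovery"):
        for res in discovery.findall("Resource"):
            results.append({
                "id": res.get("id"),
                "rank": float(res.get("rank")),
                "node_count": int(res.get("nodeCount")),
                "os_name": res.findtext("OSName"),
            })
    return sorted(results, key=lambda r: r["rank"], reverse=True)


resources_found = parse_result_set(SAMPLE)
```

Sorting on the `rank` attribute gives the client the same ordering that the ranking step produced on the discovery side.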

23

RDSResults

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Individual rank = "" id = "" nodeCount = "" >
    </Individual>
    <Total/>
    <InteractBandwidth/>
  </Discovery>
</RDSResultSet>

24

Several Approaches

Resources description
– Global Grid Forum: JSDL
– EDAGrid: RDSResult

Matching method
– EDAGrid: Ranking, Set-matching

Forwarding method
– Napster: Centralized Indexing
– Gnutella: Flooding Query
– Chord: Indexing Using Distributed Hash Tables
– HyperCuP system: Interests-based
– Ant Colony Optimizing

Page 16: 1 RESOURCE DISCOVERY Presenter: Cù Nguyễn Phương Hà

Resources

This is a range value that describes the required amount of disk space for each resource allocated to the job

jsdlRangeValue_TypeIndividualDiskSpace

This element is a range value specifying the required amount of virtual memory for each of theresources to be allocated for this job submission

jsdlRangeValue_Type

IndividualVirtualMemory

This element is a range value specifying the amount of physical memory required on each indi-vidual resource

jsdlRangeValue_Type

IndividualPhysicalMemory

This element is a range value specifying the bandwidth requirements of each individual resource

jsdlRangeValue_Type

IndividualNetworkBandwidth

This element is a range value specifying the number of CPUs for each of the resources to be allocated to the job submission

jsdlRangeValue_TypeIndividualCPUCount

This element is a range value specifying the total number of CPU seconds required on each resource to execute the job

jsdlRangeValue_TypeIndividualCPUTime

This element is a range value specifying the speed of each CPU required by the job in the execution environment

jsdlRangeValue_TypeIndividualCPUSpeed

DescriptionTypeName of attribute

Resources

Resources

This element is a range value specifying the total number of resources required by the job jsdlRangeValue_TypeTotalResourceCount

This is a range value that describes the required total amount of disk space that should be allocated to the job jsdlRangeValue_TypeTotalDiskSpace

This element is a range value specifying the required total amount of virtual memory for the jobsubmission jsdlRangeValue_TypeTotalVirtualMemory

This element is a range value specifying the required amount of physical memory for the entirejob across all resources jsdlRangeValue_TypeTotalPhysicalMemory

This element is a range value specifying the total number of CPUs required for this job submission jsdlRangeValue_TypeTotalCPUCount

This element is a range value specifying total number of CPU seconds required across all CPUs used to execute the job jsdlRangeValue_TypeTotalCPUTime

DescriptionTypeName of attribute

Resources

Resources

This element is a simple type containing a single name of a hostxsdstringHostName

CandidateHosts

Resources

This is a token that describes the type of filesystem of the containing FileSystem element

jsdlFileSystemTypeEnumerationFileSystemType

This is a range value that describes the required amount of disk space on the containing FileSystem element for the job

jsdlRangeValue_TypeDiskSpace

This is a string that describes a remote location that MUST be made available locally for the jobxsdstringMountSource

This is a string that describes a local location that MUST be made available in the allocated resources for the jobxsdstringMountPoint

xsdstringDescription

FileSystem

Resources

xsdstringDescription

This element is a string that defines the version of the operating system required by the jobxsdstringOperatingSystemVersion

This is a complex type that contains the name of the operating systemcomplex typeOperatingSystemType

OperatingSystem

Resources

This element is a token specifying the CPU architecture required by the job in the execution environment

jsdlProcessorArchitectureEnumerationCPUArchitectureName

CPUArchitecture

This is a token type that contains the name of the operating system

jsdlOperatingSystemTypeEnumerationOperatingSystemName

OperatingSystemType

22

RDSResults EDAGrid

ltRDSResultSet count = ltnumgt gtltDiscovery rank = ldquordquo id = ldquordquogt

ltResource rank = ldquordquo id = ldquordquo nodeCount = ldquordquo gt

ltWSGRAMgtltReservationAddressgt

ltOSNamegt ltOSVersiongt ltPlatformgt

ltResourcegt +ltDiscoverygt

ltRDSResultSetgt

23

RDSResults

ltRDSResultSet count = ltnumgt gtltDiscovery rank = ldquordquo id = ldquordquogt

ltIndividual rank = ldquordquo id = ldquordquo nodeCount = ldquordquo gt

ltIndividualgtltTotalgtltInteractBandwidthgt

ltDiscoverygt ltRDSResultSetgt

24

Several Approaches

Resources descriptionndash Global Grid Forum JSDL ndash EDAGrid RDSResult

Matching methodndash EDAGrid Ranking Set-matching

Forwarding methodndash Napster Centralized Indexingndash Gnutella Flooding Queryndash Chord Indexing Using Distributed Hash Tablesndash HyperCuP system Interests-basedndash Ant Colony Optimizing

25

Ranking EDAGrid

FUN CTION Ranking_AlgorithmFOR each (Ri satisfy the individual condition)Rank(Ri) = 0FOR each (Aj is the attribute of job)Rank(Ri) = Rank(Ri) + (w[j] R[ij]A[j])Next ANext RSort the Resource SetReturn list of resource with order

26

Set-matching EDAGrid

The set-matching algorithm Create an empty set Add the resources into the set with the higher

rank one after one the lower Each time check if the Total condition is met

and if the InteractBandwidth is violated Terminate the loop if these conditions are

satisfied or the number of gridnodes reaches the number user required

27

Several Approaches

Resources descriptionndash Global Grid Forum JSDL ndash EDAGrid RDSResult

Matching methodndash EDAGrid Ranking Set-matching

Forwarding methodndash Napster Centralized Indexingndash Gnutella Flooding Queryndash Chord Indexing Using Distributed Hash Tablesndash HyperCuP system Interests-basedndash Ant Colony Optimizing

28

Centralized Indexing

Proposes a centralization management for the whole resources of all the grid sites

There is a server machine which holds all the index of available resources on the network

Users start the query process by sending the request to the index server

The server will send the answers to the users bases on the information stored

Advantages quickly Disadvantages

ndash The bottleneck at the server machinendash Update the information continuouslyndash Not suitable to the P2P

29

Flooding

The query will be routed from one peer to all of its neighbors By this way the query will be sent throughout the network If the peer finds out the resources in its local storage it will send the

answer to the original peer who makes the request Using Time-To-Live (TTL) to limit the number of hops a request could

be sent so that after a certain times to be sent the request will automatically disappear out of the network

30

Indexing Using Distributed Hash Tables

In this method each peer in the network has a partition of the hash table

Each entry in the hash table is the key space which point to the peer where the search file can be found

When there is a request of a file the file name will be hash by a uniform hash function

Base on the hash value and the hash table the look up value will be found and return to the requester

The cost of this method consists of the cost to build and update the hash table and route the query to the location search file

Disadvantages not apply to the complex query

31

Interests-based Methods

This method based on the interest of users The idea is to search on the peers that seem to contain what

users have required To reach this point peers are organized into groups of similar

interest Therefore the search queries will be forwarded to the interest

group to get the high hit rate and reduce the redundant time to search on other peers

Disadvantage ndash the peers interest may change over time ndash peers have more than one interest

32

Ant Colony Optimizing

Ants start from their nest and wander randomly.

An ant that finds food returns to the nest from memory and drops pheromone on its trail.

Other ants that come across such a trail follow it to the food instead of wandering randomly.

If they find the food, they return home and reinforce the pheromone on the trail.

A key point is that pheromone evaporates over time: the longer an ant takes to travel back to its nest, the more pheromone evaporates.

When an ant reaches an intersection, it has to decide which branch to take.

Ants on a short branch complete their trips faster than those on a long branch, so the pheromone density on the short branch remains higher.

Other ants are more likely to choose a branch with a higher pheromone density.

Eventually, all the ants that go to get the food take the shortest branch.

33

Ant Colony Optimizing

Initially the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E.

After the initial stage, most of the ants will take the shortest path, A→B→E.
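The convergence on the shortest path can be illustrated with a small deterministic model. It is a sketch under stated assumptions, not the algorithm from the cited papers: path lengths are hop counts, ants choose a path in proportion to its pheromone, each path receives a deposit proportional to its share of ants divided by its length, and half of the pheromone evaporates per round.

```python
def simulate_pheromone(lengths, rounds, evaporation=0.5):
    """Return each path's final share of the total pheromone."""
    pheromone = {path: 1.0 for path in lengths}
    for _ in range(rounds):
        total = sum(pheromone.values())
        for path, length in lengths.items():
            share = pheromone[path] / total   # fraction of ants choosing this path
            deposit = share / length          # shorter trips reinforce more often
            pheromone[path] = (1.0 - evaporation) * pheromone[path] + deposit
    total = sum(pheromone.values())
    return {path: level / total for path, level in pheromone.items()}

# The three paths from the example, with hop counts as lengths (assumed).
paths = {"A-B-C-E": 3, "A-B-E": 2, "A-B-D-E": 3}
shares = simulate_pheromone(paths, rounds=50)
```

After 50 rounds the two-hop path A-B-E holds most of the pheromone, and the two three-hop paths hold equal, small shares. Evaporation is what lets the early, random choices fade.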

34

Overview

Introduction of problems

Several approaches

Solution model

35


37

Thank You


Resources

Name of attribute | Type | Description
TotalCPUTime | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPU seconds required across all CPUs used to execute the job
TotalCPUCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of CPUs required for this job submission
TotalPhysicalMemory | jsdl:RangeValue_Type | This element is a range value specifying the required amount of physical memory for the entire job across all resources
TotalVirtualMemory | jsdl:RangeValue_Type | This element is a range value specifying the required total amount of virtual memory for the job submission
TotalDiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required total amount of disk space that should be allocated to the job
TotalResourceCount | jsdl:RangeValue_Type | This element is a range value specifying the total number of resources required by the job

Resources

CandidateHosts

Name of attribute | Type | Description
HostName | xsd:string | This element is a simple type containing a single name of a host

Resources

FileSystem

Name of attribute | Type | Description
Description | xsd:string |
MountPoint | xsd:string | This is a string that describes a local location that MUST be made available in the allocated resources for the job
MountSource | xsd:string | This is a string that describes a remote location that MUST be made available locally for the job
DiskSpace | jsdl:RangeValue_Type | This is a range value that describes the required amount of disk space on the containing FileSystem element for the job
FileSystemType | jsdl:FileSystemTypeEnumeration | This is a token that describes the type of filesystem of the containing FileSystem element

Resources

OperatingSystem

Name of attribute | Type | Description
OperatingSystemType | complex type | This is a complex type that contains the name of the operating system
OperatingSystemVersion | xsd:string | This element is a string that defines the version of the operating system required by the job
Description | xsd:string |

Resources

OperatingSystemType

Name of attribute | Type | Description
OperatingSystemName | jsdl:OperatingSystemTypeEnumeration | This is a token type that contains the name of the operating system

CPUArchitecture

Name of attribute | Type | Description
CPUArchitectureName | jsdl:ProcessorArchitectureEnumeration | This element is a token specifying the CPU architecture required by the job in the execution environment
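To make the attribute tables concrete, the sketch below assembles a small JSDL Resources element with Python's standard `xml.etree.ElementTree`. The helper names `q` and `build_resources` are hypothetical. The namespace URI and the `Exact` and `LowerBoundedRange` children of a range value are given as recalled from the JSDL 1.0 specification, so they should be checked against the specification before use.

```python
import xml.etree.ElementTree as ET

# Assumed JSDL 1.0 namespace; verify against the specification.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
ET.register_namespace("jsdl", JSDL_NS)

def q(tag):
    """Qualify a tag name with the JSDL namespace."""
    return f"{{{JSDL_NS}}}{tag}"

def build_resources(cpu_count, min_memory_bytes, os_name, arch):
    """Build a Resources element requesting CPUs, memory, OS and architecture."""
    resources = ET.Element(q("Resources"))
    os_el = ET.SubElement(resources, q("OperatingSystem"))
    os_type = ET.SubElement(os_el, q("OperatingSystemType"))
    ET.SubElement(os_type, q("OperatingSystemName")).text = os_name
    cpu = ET.SubElement(resources, q("CPUArchitecture"))
    ET.SubElement(cpu, q("CPUArchitectureName")).text = arch
    count = ET.SubElement(resources, q("TotalCPUCount"))
    ET.SubElement(count, q("Exact")).text = str(float(cpu_count))
    memory = ET.SubElement(resources, q("TotalPhysicalMemory"))
    ET.SubElement(memory, q("LowerBoundedRange")).text = str(float(min_memory_bytes))
    return resources

request = build_resources(4, 2 * 1024**3, "LINUX", "x86_64")
xml_text = ET.tostring(request, encoding="unicode")
```

The range-valued attributes (`TotalCPUCount`, `TotalPhysicalMemory`) carry a bound rather than a bare number, which is what lets a discovery service match "at least" style requirements.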

22

RDSResults EDAGrid

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Resource rank = "" id = "" nodeCount = "" >
      <WSGRAM>
      <ReservationAddress>
      <OSName>
      <OSVersion>
      <Platform>
    </Resource> +
  </Discovery>
</RDSResultSet>

23

RDSResults

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Individual rank = "" id = "" nodeCount = "" >
    </Individual>
    <Total>
    <InteractBandwidth>
  </Discovery>
</RDSResultSet>

24

Several Approaches

Resources description
– Global Grid Forum: JSDL
– EDAGrid: RDSResult

Matching method
– EDAGrid: Ranking, Set-matching

Forwarding method
– Napster: Centralized Indexing
– Gnutella: Flooding Query
– Chord: Indexing Using Distributed Hash Tables
– HyperCuP system: Interests-based
– Ant Colony Optimizing

25

Ranking EDAGrid

FUNCTION Ranking_Algorithm
  FOR each (Ri satisfying the individual condition)
    Rank(Ri) = 0
    FOR each (Aj, an attribute of the job)
      Rank(Ri) = Rank(Ri) + (w[j] * R[i,j] / A[j])
    Next A
  Next R
  Sort the Resource Set
  Return the list of resources in ranked order
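The ranking pseudocode can be sketched in Python as below. Several points are assumptions, because the slide text lost its operators in extraction. The score is read as a weighted sum of the ratio between what a resource offers and what the job requires. The "individual condition" is taken to mean that every required attribute is met. The result is sorted from highest to lowest rank.

```python
def rank_resources(resources, job, weights):
    """Return the names of qualifying resources, best rank first."""
    ranked = []
    for name, offered in resources.items():
        # Assumed individual condition: every required attribute is satisfied.
        if any(offered.get(attr, 0) < need for attr, need in job.items()):
            continue
        score = sum(weights[attr] * offered[attr] / need for attr, need in job.items())
        ranked.append((score, name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked]

# Hypothetical sites and job requirements.
sites = {
    "siteA": {"cpu": 8, "mem": 16},
    "siteB": {"cpu": 4, "mem": 64},
    "siteC": {"cpu": 1, "mem": 8},
}
job = {"cpu": 2, "mem": 8}
weights = {"cpu": 0.5, "mem": 0.5}
```

Here `siteC` is filtered out for having too few CPUs, `siteB` scores 5.0 and `siteA` scores 3.0, so the result is `["siteB", "siteA"]`. Changing the weights shifts the order toward whichever attribute matters more to the job.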

26

Set-matching EDAGrid

The set-matching algorithm:

– Create an empty set
– Add the resources to the set one after another, from the highest rank to the lowest
– Each time, check whether the Total condition is met and whether the InteractBandwidth is violated
– Terminate the loop if these conditions are satisfied or the number of grid nodes reaches the number the user required
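The steps above can be sketched as a greedy loop. The slides do not define the Total and InteractBandwidth conditions precisely, so this example assumes that Total is a minimum summed capacity and that InteractBandwidth is a minimum bandwidth between every pair of selected resources. The `capacity` field and the `bandwidth` callback are hypothetical.

```python
def set_matching(ranked, total_required, max_nodes, min_bandwidth, bandwidth):
    """Greedily pick resources in rank order; return (chosen ids, total met?)."""
    chosen = []
    total = 0
    for res in ranked:  # highest rank first
        # Skip a candidate whose link to an already chosen resource is too slow.
        if any(bandwidth(res["id"], other) < min_bandwidth for other in chosen):
            continue
        chosen.append(res["id"])
        total += res["capacity"]
        if total >= total_required or len(chosen) >= max_nodes:
            break
    return chosen, total >= total_required

# Hypothetical ranked resources and pairwise link bandwidths.
ranked = [
    {"id": "a", "capacity": 4},
    {"id": "b", "capacity": 4},
    {"id": "c", "capacity": 4},
]
links = {frozenset("ab"): 10, frozenset("ac"): 100, frozenset("bc"): 100}
link_bandwidth = lambda x, y: links[frozenset((x, y))]
```

With a required total of 8 and a minimum bandwidth of 50, the loop takes `a`, skips `b` because the a-b link is too slow, then takes `c` and stops. A greedy pass like this is fast but is not guaranteed to find the best possible set.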


28

Centralized Indexing

This approach proposes centralized management of the resources of all the Grid sites.

A server machine holds the index of all available resources on the network.

Users start the query process by sending a request to the index server.

The server answers users based on the information it has stored.

Advantage: queries are answered quickly.

Disadvantages:

– The server machine is a bottleneck
– The information must be updated continuously
– Not suitable for P2P
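A minimal in-memory sketch of such an index server follows. The `IndexServer` class, its method names, and the attribute names are hypothetical; a real deployment would sit behind a network service.

```python
class IndexServer:
    """Single index of the resources advertised by every Grid site."""

    def __init__(self):
        self.index = {}  # site name -> advertised resource attributes

    def register(self, site, resources):
        # Sites must call this again whenever their resources change.
        self.index[site] = dict(resources)

    def query(self, **minimums):
        """Return the sites whose attributes meet every requested minimum."""
        return sorted(
            site
            for site, res in self.index.items()
            if all(res.get(attr, 0) >= need for attr, need in minimums.items())
        )

server = IndexServer()
server.register("hcm", {"cpus": 16, "memory_gb": 64})
server.register("hn", {"cpus": 4, "memory_gb": 8})
```

`server.query(cpus=8)` returns `["hcm"]` from a single lookup, which is why this approach is fast. Every query and every update goes through the same object, which is the bottleneck and the update burden listed above.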





Resources

CPUArchitecture
– CPUArchitectureName (type jsdl:ProcessorArchitectureEnumeration): a token specifying the CPU architecture required by the job in the execution environment

OperatingSystemType
– OperatingSystemName (type jsdl:OperatingSystemTypeEnumeration): a token that contains the name of the operating system

22

RDSResults EDAGrid

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Resource rank = "" id = "" nodeCount = "" >
      <WSGRAM/> <ReservationAddress/>
      <OSName/> <OSVersion/> <Platform/>
    </Resource> +
  </Discovery>
</RDSResultSet>

23

RDSResults

<RDSResultSet count = <num> >
  <Discovery rank = "" id = "">
    <Individual rank = "" id = "" nodeCount = "" >
    </Individual>
    <Total/>
    <InteractBandwidth/>
  </Discovery>
</RDSResultSet>

24

Several Approaches

Resources description
– Global Grid Forum: JSDL
– EDAGrid: RDSResult

Matching method
– EDAGrid: Ranking, Set-matching

Forwarding method
– Napster: Centralized Indexing
– Gnutella: Flooding Query
– Chord: Indexing Using Distributed Hash Tables
– HyperCuP system: Interests-based
– Ant Colony Optimizing

25

Ranking EDAGrid

FUNCTION Ranking_Algorithm
  FOR each (Ri satisfying the individual condition)
    Rank(Ri) = 0
    FOR each (Aj, an attribute of the job)
      Rank(Ri) = Rank(Ri) + (w[j] * R[i,j] / A[j])
    Next A
  Next R
  Sort the Resource Set
  Return the list of resources in rank order
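The ranking pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the EDAGrid implementation. The operators in the rank term are an assumption: the extracted slide text lost its punctuation, and the term is read here as w[j] * R[i,j] / A[j], the weighted ratio of what the resource offers to what the job asks for. The "individual condition" is likewise assumed to mean that every attribute meets the job's demand, and the attribute names and site names are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_resources(
    resources: Dict[str, Dict[str, float]],
    job: Dict[str, float],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Return (resource id, rank) pairs sorted from highest to lowest rank."""
    ranked = []
    for res_id, attrs in resources.items():
        # Assumed individual condition: each attribute covers the job's demand.
        if any(attrs.get(name, 0.0) < need for name, need in job.items()):
            continue
        # Rank(Ri) = sum over job attributes of w[j] * R[i,j] / A[j] (assumed form).
        score = sum(weights[name] * attrs[name] / need for name, need in job.items())
        ranked.append((res_id, score))
    # Sort the resource set so the best-ranked resource comes first.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical job demands (must be positive), weights, and grid sites.
job = {"cpu_ghz": 2.0, "memory_gb": 4.0}
weights = {"cpu_ghz": 0.5, "memory_gb": 0.5}
resources = {
    "site-a": {"cpu_ghz": 2.0, "memory_gb": 8.0},
    "site-b": {"cpu_ghz": 4.0, "memory_gb": 16.0},
    "site-c": {"cpu_ghz": 1.0, "memory_gb": 32.0},  # fails the CPU demand
}
ranking = rank_resources(resources, job, weights)
```

With these inputs, site-c is filtered out before ranking and site-b is placed ahead of site-a because it offers more of both attributes relative to the demand.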

26

Set-matching EDAGrid

The set-matching algorithm:
– Create an empty set
– Add the resources to the set one by one, from the highest rank to the lowest
– Each time, check whether the Total condition is met and whether the InteractBandwidth condition is violated
– Terminate the loop if these conditions are satisfied or the number of grid nodes reaches the number the user required
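One possible reading of the set-matching steps is sketched below. The slide does not define the two conditions precisely, so this sketch makes assumptions: the Total condition is taken to be a minimum total node count, the InteractBandwidth condition a minimum bandwidth between every pair of selected resources, and a candidate that would violate the bandwidth condition is skipped. All names and figures are hypothetical.

```python
from typing import Callable, List, Tuple


def set_match(
    ranked: List[Tuple[str, int]],
    required_total: int,
    max_sites: int,
    bandwidth: Callable[[str, str], float],
    min_bandwidth: float,
) -> Tuple[List[str], bool]:
    """Greedily add resources in rank order; return (selected, total_met)."""
    selected: List[str] = []
    total = 0
    for res_id, node_count in ranked:
        # Assumed InteractBandwidth check: skip a candidate whose link to
        # any already selected resource is too slow.
        if any(bandwidth(res_id, other) < min_bandwidth for other in selected):
            continue
        selected.append(res_id)
        total += node_count
        # Assumed Total check: stop once enough nodes have been collected.
        if total >= required_total:
            return selected, True
        # Stop when the user-imposed limit on the set size is reached.
        if len(selected) >= max_sites:
            break
    return selected, False


# Hypothetical ranked list (best first) with node counts, and link speeds in Mbit/s.
ranked_sites = [("site-b", 8), ("site-a", 4), ("site-d", 6)]
links = {
    frozenset(("site-a", "site-b")): 50.0,
    frozenset(("site-b", "site-d")): 200.0,
    frozenset(("site-a", "site-d")): 200.0,
}
chosen, satisfied = set_match(
    ranked_sites,
    required_total=12,
    max_sites=3,
    bandwidth=lambda a, b: links[frozenset((a, b))],
    min_bandwidth=100.0,
)
```

Here site-a is skipped because its link to site-b is below the threshold, and the loop ends once site-b and site-d together provide 14 nodes.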


28

Centralized Indexing

This approach proposes centralized management of all the resources of all the grid sites

A server machine holds the index of all available resources on the network

Users start the query process by sending a request to the index server

The server sends answers to the users based on the stored information

Advantage: queries are answered quickly

Disadvantages:
– The server machine is a bottleneck
– The information must be updated continuously
– Not suitable for P2P
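The centralized scheme can be illustrated with a small in-memory index. This is a sketch under stated assumptions, not Napster's protocol: the class name `IndexServer`, its methods, and the attribute names are hypothetical, and a real server would receive registrations and queries over the network.

```python
from typing import Dict, List


class IndexServer:
    """Single server that holds the index of every site's resources."""

    def __init__(self) -> None:
        self._index: Dict[str, Dict[str, float]] = {}

    def register(self, site: str, attrs: Dict[str, float]) -> None:
        # Sites must re-register whenever their resources change, which is
        # the continuous-update cost listed among the disadvantages.
        self._index[site] = dict(attrs)

    def query(self, requirements: Dict[str, float]) -> List[str]:
        # Every query is answered from the stored index alone, so answers
        # are fast but all traffic passes through this one machine.
        return sorted(
            site
            for site, attrs in self._index.items()
            if all(attrs.get(name, 0.0) >= need for name, need in requirements.items())
        )


server = IndexServer()
server.register("site-a", {"cpus": 16, "memory_gb": 64})
server.register("site-b", {"cpus": 4, "memory_gb": 8})
matches = server.query({"cpus": 8})
```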

29

Flooding

The query is routed from one peer to all of its neighbors. In this way the query is sent throughout the network

If a peer finds the requested resources in its local storage, it sends the answer to the original peer that made the request

A Time-To-Live (TTL) value limits the number of hops a request can travel, so that after a certain number of hops the request automatically disappears from the network
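The effect of the TTL can be seen in a small simulation of flooding over a peer graph. This is a minimal sketch, not the Gnutella wire protocol: the graph, the holdings, and the function name `flood_query` are hypothetical, and the `seen` set that suppresses duplicate deliveries is an added assumption (the slide only mentions the TTL).

```python
from collections import deque
from typing import Dict, List, Set


def flood_query(
    neighbors: Dict[str, List[str]],
    holdings: Dict[str, Set[str]],
    origin: str,
    wanted: str,
    ttl: int,
) -> Set[str]:
    """Return the peers that answer a query flooded from origin."""
    hits: Set[str] = set()
    seen = {origin}  # assumed duplicate suppression
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        # Each peer checks its local storage and answers the origin on a hit.
        if wanted in holdings.get(peer, set()):
            hits.add(peer)
        # A query whose TTL is used up is dropped instead of forwarded.
        if remaining == 0:
            continue
        for nxt in neighbors.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits


# Hypothetical overlay: A-B-C-D in a chain, with E attached to A.
overlay = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"], "E": ["A"]}
stored = {"C": {"cpu"}, "D": {"cpu"}}
near_hits = flood_query(overlay, stored, origin="A", wanted="cpu", ttl=2)
far_hits = flood_query(overlay, stored, origin="A", wanted="cpu", ttl=3)
```

With a TTL of 2 the query reaches C but dies before D; raising the TTL to 3 finds both, at the price of more messages.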

30

Indexing Using Distributed Hash Tables

In this method, each peer in the network holds a partition of the hash table

Each entry in the hash table covers part of the key space and points to the peer where the searched file can be found

When a file is requested, the file name is hashed by a uniform hash function

Based on the hash value and the hash table, the lookup value is found and returned to the requester

The cost of this method consists of the cost of building and updating the hash table and of routing the query to the location of the searched file

Disadvantage: does not apply to complex queries
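The partitioning idea can be sketched with a hash ring in which each key belongs to the first peer whose identifier follows the key's hash. This is a simplified illustration, not Chord itself: the class `HashRing` keeps a global view of all peers, whereas a real DHT routes the lookup hop by hop, and the peer names and the stored location are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List, Optional


def ring_hash(text: str) -> int:
    """Map a string uniformly onto a 32-bit identifier ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Each peer stores the entries whose hashes fall in its partition."""

    def __init__(self, peers: List[str]) -> None:
        self._points = sorted((ring_hash(peer), peer) for peer in peers)
        self._store: Dict[str, Dict[str, str]] = {peer: {} for peer in peers}

    def responsible_peer(self, key: str) -> str:
        ids = [point for point, _ in self._points]
        # First peer at or after the key's hash, wrapping around the ring.
        index = bisect_left(ids, ring_hash(key)) % len(ids)
        return self._points[index][1]

    def publish(self, file_name: str, location: str) -> None:
        self._store[self.responsible_peer(file_name)][file_name] = location

    def lookup(self, file_name: str) -> Optional[str]:
        # Only an exact file name can be hashed, which is why complex
        # queries (ranges, several attributes) do not fit this method.
        return self._store[self.responsible_peer(file_name)].get(file_name)


peer_names = ["peer-1", "peer-2", "peer-3"]
ring = HashRing(peer_names)
ring.publish("data.bin", "peer-2:/files/data.bin")
found = ring.lookup("data.bin")
```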

31

Interests-based Methods

This method is based on the interests of users

The idea is to search on the peers that seem to contain what users have requested

To achieve this, peers are organized into groups of similar interest

Search queries are therefore forwarded to the matching interest group, to get a high hit rate and avoid spending time searching other peers

Disadvantages:
– a peer's interests may change over time
– a peer may have more than one interest
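A short sketch shows how forwarding a query to one interest group narrows the search. It is an illustration under assumptions, not the HyperCuP design: the group names, peers, and the function `route_by_interest` are hypothetical, and each peer is placed in exactly one group.

```python
from typing import Dict, List, Set, Tuple


def route_by_interest(
    groups: Dict[str, List[str]],
    holdings: Dict[str, Set[str]],
    interest: str,
    wanted: str,
) -> Tuple[List[str], int]:
    """Search only the matching interest group; return (hits, peers contacted)."""
    contacted = groups.get(interest, [])
    hits = [peer for peer in contacted if wanted in holdings.get(peer, set())]
    return hits, len(contacted)


# Hypothetical interest groups covering five peers.
interest_groups = {
    "bioinformatics": ["p1", "p2"],
    "physics": ["p3", "p4", "p5"],
}
# p5 also holds the item but sits in another group.
peer_holdings = {"p2": {"blast-db"}, "p5": {"blast-db"}}
group_hits, contacted_count = route_by_interest(
    interest_groups, peer_holdings, interest="bioinformatics", wanted="blast-db"
)
```

Only two of the five peers are contacted and p2 answers. The copy on p5 is missed, which illustrates the disadvantage of peers having more than one interest.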

32

Ant Colony Optimizing

Ants start from their nests and wander randomly

The ants that find food return to their nests from memory and drop pheromone on the trail

Other ants that come across such a trail follow it to check for food instead of wandering randomly. If they find the food, they return home and reinforce the pheromone on the trail

A key point is that the pheromone evaporates over time. The longer it takes an ant to travel back to its nest, the more pheromone evaporates

When an ant reaches an intersection, it has to decide which branch to take. Ants that take a short branch complete the trip faster than those that take a long branch, so the pheromone density on the short branch remains higher. Other ants are more likely to choose the branch with the higher pheromone density. Eventually, all the ants going for the food take the shortest branch

33

Ant Colony Optimizing

Initially the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E

After the initial stage, most of the ants will take the shortest path, A→B→E
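The convergence on the shortest path can be reproduced with a small deterministic model. This is a sketch under stated assumptions rather than the algorithm from the cited work: instead of simulating individual random ants, it tracks the expected share of ants on each path, and the evaporation rate, the number of rounds, and the deposit rule (share of ants divided by hop count) are illustrative choices.

```python
from typing import Dict


def simulate_trails(
    hops: Dict[str, int], rounds: int = 50, evaporation: float = 0.5
) -> Dict[str, float]:
    """Return the share of ants on each path after the given rounds."""
    pheromone = {path: 1.0 for path in hops}  # no preference at the start
    for _ in range(rounds):
        total = sum(pheromone.values())
        # Ants choose a path in proportion to its pheromone density.
        share = {path: level / total for path, level in pheromone.items()}
        for path, length in hops.items():
            # Part of the old pheromone evaporates; the ants on the path
            # add new pheromone, and a longer trip leaves a weaker deposit.
            pheromone[path] = (1.0 - evaporation) * pheromone[path] + share[path] / length
    total = sum(pheromone.values())
    return {path: level / total for path, level in pheromone.items()}


# The three paths from the slide, with their hop counts.
paths = {"A->B->C->E": 3, "A->B->E": 2, "A->B->D->E": 3}
final_share = simulate_trails(paths)
best_path = max(final_share, key=final_share.get)
```

Because the two-hop path gains slightly more pheromone per ant in every round, its share grows steadily until nearly all ants use A->B->E.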

34

Overview

Introduction of problems
Several approaches
Solution model

35

36


37

Thank You

Page 26: 1 RESOURCE DISCOVERY Presenter: Cù Nguyễn Phương Hà

26

Set-matching EDAGrid

The set-matching algorithm:

– Create an empty set.

– Add resources to the set one at a time, in order of rank from highest to lowest.

– After each addition, check whether the Total condition is met and whether the InteractBandwidth condition is violated.

– Terminate the loop when these conditions are satisfied or when the number of grid nodes reaches the number the user requested.
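
The loop above can be sketched in Python. This is a minimal illustration, not the EDAGrid implementation: the GridNode fields, the reading of the Total condition as a CPU sum, the reading of InteractBandwidth as a per-node minimum, and the choice to skip a node that would violate it are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class GridNode:
    name: str
    rank: float       # hypothetical ranking score; higher is better
    cpus: int         # hypothetical quantity counted toward the Total condition
    bandwidth: float  # hypothetical bandwidth to the other nodes, in Mbps


def set_match(nodes, total_cpus, min_bandwidth, max_nodes):
    """Greedily pick nodes by descending rank.

    Returns (selected, satisfied): the chosen nodes and whether the
    assumed Total condition was met before the node limit was reached.
    """
    selected = []
    for node in sorted(nodes, key=lambda n: n.rank, reverse=True):
        if len(selected) >= max_nodes:
            break  # reached the number of grid nodes the user asked for
        if node.bandwidth < min_bandwidth:
            continue  # assumed handling: skip nodes that violate InteractBandwidth
        selected.append(node)
        if sum(n.cpus for n in selected) >= total_cpus:
            return selected, True  # Total condition met
    return selected, False


nodes = [
    GridNode("site-a", rank=3.0, cpus=4, bandwidth=100.0),
    GridNode("site-b", rank=2.0, cpus=4, bandwidth=10.0),
    GridNode("site-c", rank=1.0, cpus=8, bandwidth=100.0),
]
chosen, ok = set_match(nodes, total_cpus=10, min_bandwidth=50.0, max_nodes=3)
print([n.name for n in chosen], ok)
```

Here site-b is passed over because of its low bandwidth, so the set is completed with the lower-ranked site-c.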

27

Several Approaches

Resource description

– Global Grid Forum: JSDL

– EDAGrid: RDSResult

Matching method

– EDAGrid: Ranking, Set-matching

Forwarding method

– Napster: Centralized Indexing

– Gnutella: Flooding Query

– Chord: Indexing Using Distributed Hash Tables

– HyperCuP system: Interests-based

– Ant Colony Optimizing

28

Centralized Indexing

This approach manages the resources of all the grid sites centrally.

A server machine holds the index of all available resources on the network.

Users start the query process by sending the request to the index server.

The server answers the users based on the information it has stored.

Advantage: fast lookup.

Disadvantages:

– The server machine becomes a bottleneck.

– The index information must be updated continuously.

– Not suitable for P2P.
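
A minimal Python sketch of the idea follows. The CentralIndex class, its method names, and the use of a free CPU count as the only indexed property are hypothetical, chosen only to show that every update and every query passes through one machine.

```python
class CentralIndex:
    """In-memory stand-in for the single index server."""

    def __init__(self):
        self._free_cpus = {}  # site name -> free CPU count last reported

    def update(self, site, free_cpus):
        # Every site must report each change here (the continuous-update cost).
        self._free_cpus[site] = free_cpus

    def query(self, min_cpus):
        # A single lookup on one machine answers the whole request.
        return sorted(s for s, n in self._free_cpus.items() if n >= min_cpus)


index = CentralIndex()
index.update("site-a", 16)
index.update("site-b", 2)
index.update("site-b", 8)  # a later report overwrites the stale entry
print(index.query(min_cpus=8))
```

Because all sites and all users talk to the same object, the sketch also makes the bottleneck visible: nothing can be answered if this one index is unavailable.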

29

Flooding

A query is routed from a peer to all of its neighbors. In this way, the query spreads throughout the network.

If a peer finds the requested resources in its local storage, it sends the answer to the original peer that made the request.

A Time-To-Live (TTL) value limits the number of hops a request can travel, so that after a certain number of hops the request automatically disappears from the network.
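
The following Python sketch simulates TTL-limited flooding on a small overlay. The function name, the dictionary-based topology, and the "seen" set that suppresses duplicate deliveries are assumptions for the example; they are not taken from the Gnutella protocol.

```python
from collections import deque


def flood(neighbors, holdings, origin, wanted, ttl):
    """Return the peers that would answer a query flooded from `origin`.

    neighbors: peer -> list of neighboring peers
    holdings:  peer -> set of resource names in local storage
    ttl:       maximum number of hops the query may travel
    """
    hits = set()
    seen = {origin}  # assumed duplicate suppression
    queue = deque([(origin, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if wanted in holdings.get(peer, ()):
            hits.add(peer)  # this peer answers the original requester
        if hops_left == 0:
            continue  # TTL exhausted: the query is dropped here
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return hits


# A chain a - b - c - d where only d holds the resource.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
holdings = {"d": {"cpu-cluster"}}
print(flood(neighbors, holdings, "a", "cpu-cluster", ttl=2))
print(flood(neighbors, holdings, "a", "cpu-cluster", ttl=3))
```

With a TTL of 2 the query dies at peer c and finds nothing; with a TTL of 3 it reaches d. This shows the trade-off: a small TTL limits traffic but can miss resources that exist.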

30

Indexing Using Distributed Hash Tables

In this method, each peer in the network holds a partition of the hash table.

Each entry in the hash table maps a key in the key space to the peer where the requested file can be found.

When a file is requested, the file name is hashed with a uniform hash function.

Based on the hash value and the hash table, the lookup value is found and returned to the requester.

The cost of this method consists of the cost of building and updating the hash table and the cost of routing the query to the location of the requested file.

Disadvantage: it does not support complex queries.
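
As a rough illustration, the Python sketch below partitions a hash table across peers placed on an identifier ring. It is a simplification and not Chord itself: there are no finger tables or multi-hop routing, and the Ring class, the 16-bit identifier space, and the peer names are assumptions for the example.

```python
import bisect
import hashlib


def key_id(name, bits=16):
    """Hash a name uniformly into the identifier space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    def __init__(self, peers):
        self.ids = sorted((key_id(p), p) for p in peers)
        self.tables = {p: {} for p in peers}  # each peer's partition

    def owner(self, name):
        # The first peer whose id is >= the key id, wrapping around the ring.
        i = bisect.bisect_left(self.ids, (key_id(name), ""))
        return self.ids[i % len(self.ids)][1]

    def publish(self, name, location):
        self.tables[self.owner(name)][name] = location

    def lookup(self, name):
        return self.tables[self.owner(name)].get(name)


ring = Ring(["peer-1", "peer-2", "peer-3"])
ring.publish("results.dat", "site-b")
print(ring.lookup("results.dat"))
```

The lookup only works for the exact name that was hashed, which is why this family of methods handles exact-match queries well but not complex queries such as ranges or multi-attribute constraints.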

31

Interests-based Methods

This method is based on the interests of users.

The idea is to search the peers that seem likely to contain what users have requested.

To achieve this, peers are organized into groups with similar interests.

Search queries are therefore forwarded to the matching interest group, which gives a high hit rate and avoids spending time searching other peers.

Disadvantages:

– A peer's interests may change over time.

– A peer may have more than one interest.
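
The grouping step can be sketched in Python as follows. The InterestDirectory class and its method names are hypothetical and are not part of HyperCuP; the sketch only shows a query being forwarded to one interest group instead of to every peer.

```python
from collections import defaultdict


class InterestDirectory:
    """Maps each interest to the group of peers that declared it."""

    def __init__(self):
        self.groups = defaultdict(set)

    def join(self, peer, interest):
        # A peer may join several groups if it has several interests.
        self.groups[interest].add(peer)

    def leave(self, peer, interest):
        # Needed because a peer's interests may change over time.
        self.groups[interest].discard(peer)

    def targets(self, interest):
        # Only the members of the matching group receive the query.
        return sorted(self.groups.get(interest, ()))


directory = InterestDirectory()
directory.join("peer-1", "bioinformatics")
directory.join("peer-2", "bioinformatics")
directory.join("peer-2", "climate")
directory.join("peer-3", "climate")
print(directory.targets("climate"))
```

The join and leave calls correspond to the two disadvantages listed above: group membership has to be maintained as interests change, and one peer can appear in more than one group.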

32

Ant Colony Optimizing

Ants start from their nests and wander randomly.

Ants that find food return to their nests from memory and drop pheromone on the trail.

Other ants that come across such a trail follow it to the food instead of wandering randomly. If they find the food, they return home and reinforce the pheromone on the trail.

A key point is that the pheromone evaporates over time. The longer it takes an ant to travel back to its nest, the more pheromone evaporates.

When an ant reaches an intersection, it has to decide which branch to take. Ants on a short branch complete the trip sooner than those on a long branch, so the pheromone density on the short branch remains higher. Other ants are more likely to choose the branch with the higher pheromone density. Eventually, all the ants that go to get the food take the shortest branch.
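
The three mechanisms described above (evaporation, reinforcement, and a choice weighted by pheromone density) can be written as small Python functions. The function names, the linear evaporation rule, and the proportional choice rule are common simplifications assumed for this sketch, not the exact formulas of any cited system.

```python
import random


def evaporate(pheromone, rate):
    """Return pheromone levels after one time step of evaporation."""
    return {branch: level * (1.0 - rate) for branch, level in pheromone.items()}


def reinforce(pheromone, branch, amount):
    """Return pheromone levels after an ant deposits `amount` on `branch`."""
    updated = dict(pheromone)
    updated[branch] = updated.get(branch, 0.0) + amount
    return updated


def choose_branch(pheromone, rng):
    """Pick a branch with probability proportional to its pheromone level."""
    branches = list(pheromone)
    weights = [pheromone[b] for b in branches]
    return rng.choices(branches, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so that runs are repeatable
trail = {"short": 1.0, "long": 1.0}
trail = evaporate(trail, rate=0.1)
trail = reinforce(trail, "short", amount=0.5)
print(trail, choose_branch(trail, rng))
```

After one step the short branch carries more pheromone than the long one, so it is chosen more often, although the long branch can still be picked occasionally.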

33

Ant Colony Optimizing

Initially, the ants may take the paths A→B→C→E, A→B→E, or A→B→D→E. After the initial stage, most of the ants take the shortest path, A→B→E.
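
This convergence can be reproduced with a small deterministic Python model. The path lengths (hop counts of 3, 2 and 3), the evaporation rate, and the rule that a path receives pheromone in proportion to its share of ants divided by its length are assumptions made for the illustration.

```python
def simulate(paths, rounds, evaporation=0.5):
    """Mean-field pheromone update; returns the final share of ants per path.

    paths: path name -> assumed length (number of hops)
    """
    pheromone = {name: 1.0 for name in paths}  # all paths start equal
    for _ in range(rounds):
        total = sum(pheromone.values())
        for name, length in paths.items():
            share = pheromone[name] / total  # fraction of ants on this path
            deposit = share / length         # shorter trips deposit more per round
            pheromone[name] = (1.0 - evaporation) * pheromone[name] + deposit
    total = sum(pheromone.values())
    return {name: level / total for name, level in pheromone.items()}


paths = {"A→B→C→E": 3, "A→B→E": 2, "A→B→D→E": 3}
for name, share in simulate(paths, rounds=50).items():
    print(f"{name}: {share:.3f}")
```

All three paths start with the same pheromone level, yet after 50 rounds nearly all of the ant traffic is on A→B→E, because each round amplifies the small advantage of the shorter path.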

34

Overview

Introduction of problems

Several approaches

Solution model

35

37

Thank You
