RoboCup Standard Platform League: Strategies and Challenges

Aris Valtazanos

School of Informatics

Structure and Synthesis of Robot Motion

March 3, 2011

slide 1 of 33 www.inf.ed.ac.uk

Talk overview

• The RoboCup Standard Platform League

• Team EdInferno
  • Overall framework
  • Novel techniques and algorithms

• Future endeavours

Humanoid league

• Custom-made robots, focus on hardware and control

Middle Size league

• More standardised design

• Fully autonomous

• On-board sensing and omnidirectional vision

• Only ball is colour-coded - not even the goals!

Small Size league

• Very fast wheeled robots

• Can probably already beat humans!

• But not fully autonomous - off-board, overhead vision system

Simulation league

• Two categories: 2-D and 3-D league

• 2-D: focus on multi-agent coordination, team strategies, etc.

• 3-D: simulated matches between teams of NAO robots, (basic) modelling of dynamics

So why a Standard Platform League?

• A league that provides a testbed with realistic constraints . . .

• . . . without the need to invest too much effort in hardware and dynamic primitives

• Moreover, all teams use the same robot platform!

• So, success depends solely on algorithmic merit

• Various domains of interest:
  • Physical actions (locomotion, kicking)
  • Decision making algorithms
  • Multi-robot communication and cooperation
  • Vision-based localisation
  • Belief estimation
  • . . . and several more

Platform

• 2000(?)-2007: SONY Aibo (4-legged league)

• 2008-present: Aldebaran NAO (SPL replaces the 4-legged league)

History

• 2008: Total disaster (according to eyewitnesses)

• 2009-2010: Improvement and expansion

• Winners: B-Human (x2)

• But state-of-the-art still consists of:
  • Fast, robust locomotion
  • Good, strong kicks
  • Good enough vision-based localisation

• Very little in the way of:
  • Team cooperation (e.g. passing)
  • Team coordination (e.g. role assignment)
  • . . . and anything else you would label as “artificial intelligence”

Some technicalities

• Pitch size: 6x4m

• Team size: 4 robots (as of 2011 - previously 3)

• Visual cues:
  • Goalmouths: one blue, one yellow (localisation)
  • Localisation beacons
  • Ball: orange
  • Waistbands: pink for one team, light blue for the other (swap at half time)
  • Lines, boxes, penalty spots, etc.

Some more technicalities - NAO robot

• Height: ∼ 60cm

• Built-in closed loop walking engine - max speed: 9.5cm/s (some teams have their own, faster engines)

• Two cameras - top & bottom (normally only use the latter)
  • Field of view: 58° (diagonal)
  • To change their FoV, robots can either move or turn their head

• Two sonar sensors across the chest board - range up to 2m

• Various other sensors: touch, force-sensitive etc.

Team EdInferno

• Sep. 2009: team established (i.e. first robots arrived)

• Sep. 2009 - Mar. 2010: familiarisation with the platform - walking engine did not exist at that time, lots of frustrating problems

• Apr. 2010 - Aug. 2010: first serious attempt to create a complete framework, 2 more robots acquired

• Sep. 2010 - Dec. 2010: main development period, qualification for RoboCup

• Jan. 2011 - present: 5 more robots acquired, work towards full integration of all modules

Behaviour module

• Basically everything that doesn’t involve vision, localisation, or inter-robot communication

• Main functionalities:
  • Belief estimation and sensor fusion
  • Role assignment and decision making
  • Path planning and action execution

• Required inputs: locations of salient objects in field of view, communicated info from teammates, other sensor readings (sonar)

(Main) Behaviour module components

• Vision “helper”

• Belief estimator

• Decision maker

• Path target selector

• Path planner

• Action executor

Vision “helper”

• Basic trigonometric functions

• E.g. convert image coordinates to real world distances through the robot’s kinematics

• Ball tracker

• Field of view bounds calculation
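
The image-to-distance conversion mentioned above can be sketched as follows. This is a minimal illustration, not the team's code: it assumes a pinhole camera, flat ground, a 4:3 image, and that the camera height and downward pitch are already known from the robot's kinematics. All function and parameter names are hypothetical.

```python
import math

def focal_length_px(width_px, height_px, diagonal_fov_deg):
    """Focal length in pixels of a pinhole camera with the given diagonal FoV."""
    half_diagonal = math.hypot(width_px, height_px) / 2.0
    return half_diagonal / math.tan(math.radians(diagonal_fov_deg) / 2.0)

def ground_distance(pixel_row, image_height_px, focal_px,
                    camera_height_m, camera_pitch_rad):
    """Distance along flat ground to the point seen at pixel_row.

    camera_pitch_rad is the downward tilt of the optical axis from the
    horizontal (assumed to come from the robot's joint angles).
    Returns None when the pixel lies at or above the horizon.
    """
    # Image rows grow downward, so rows below the centre look further down.
    offset = math.atan((pixel_row - image_height_px / 2.0) / focal_px)
    angle_below_horizon = camera_pitch_rad + offset
    if angle_below_horizon <= 0.0:
        return None
    return camera_height_m / math.tan(angle_below_horizon)

# Example: 640x480 image with the 58 degree diagonal FoV quoted earlier.
f = focal_length_px(640, 480, 58.0)
d = ground_distance(240, 480, f, camera_height_m=0.5,
                    camera_pitch_rad=math.radians(45.0))
```

The same focal length also gives the field of view bounds: the top and bottom image rows map to the nearest and farthest visible ground distances.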

Belief estimator

• Preliminaries:
  • Observation: Anything the robot sees or senses, e.g. “a robot at location (0.5,0.5)”
  • Belief: A confidence-based deduction based on a history of observations, e.g. “I am 80% confident that the robot at (0.5, 0.5) is teammate #1”.

• Sensor fusion: combine vision and sonar readings into a single set of observations

• Information sharing: update these observations from corresponding teammate observations

• Observation assignment: for each current observation, find best matching past belief
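
As a rough illustration of the observation assignment step, the sketch below matches each observation to the nearest past belief within a gating distance. The nearest-neighbour rule, the `max_dist` gate and the data layout are assumptions made for the example; the slides do not say how the best match is scored.

```python
import math

def assign_observations(observations, beliefs, max_dist=0.5):
    """Match each observation to the closest past belief.

    observations: list of (x, y) positions seen in this frame.
    beliefs: dict mapping a label (e.g. "teammate_1") to its last (x, y).
    Returns a dict: observation index -> label, or None if nothing is
    within max_dist metres (hypothetical gating threshold).
    """
    assignment = {}
    for index, obs in enumerate(observations):
        best_label, best_dist = None, max_dist
        for label, position in beliefs.items():
            distance = math.dist(obs, position)
            if distance <= best_dist:
                best_label, best_dist = label, distance
        assignment[index] = best_label
    return assignment

beliefs = {"teammate_1": (0.5, 0.5), "opponent_1": (2.0, 1.0)}
matches = assign_observations([(0.6, 0.5), (5.0, 5.0)], beliefs)
```

An unmatched observation (mapped to `None`) would typically start a new belief rather than update an existing one.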

Belief estimator (cont.)

• Particle filtering: for each teammate/adversary, maintain a set of hypotheses (particles) over their possible states

• Two main steps:
  • Predict: Given a (probabilistic) motion model, estimate how each particle might next move
  • Update: Compute the likelihood of each particle based on the current sensor readings

• Subject to consistent observations, particles may converge

• Role assignment: egocentrically determine each team member’s role (e.g. who should go kick the ball)
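
The predict and update steps of the particle filter can be sketched as below. This is a generic textbook filter over 2-D positions, not the team's implementation: the constant-velocity motion model, the Gaussian sensor model, the resampling step and every numeric value are assumptions chosen for illustration.

```python
import math
import random

def predict(particles, velocity, dt, motion_std, rng):
    """Predict: move every particle with a noisy constant-velocity model."""
    return [
        (x + velocity[0] * dt + rng.gauss(0.0, motion_std),
         y + velocity[1] * dt + rng.gauss(0.0, motion_std))
        for x, y in particles
    ]

def update(particles, observation, sensor_std):
    """Update: weight each particle by a Gaussian likelihood of the observation."""
    weights = []
    for x, y in particles:
        sq_dist = (x - observation[0]) ** 2 + (y - observation[1]) ** 2
        weights.append(math.exp(-sq_dist / (2.0 * sensor_std ** 2)))
    total = sum(weights)
    if total == 0.0:  # observation far from every particle: fall back to uniform
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]

def resample(particles, weights, rng):
    """Draw a new particle set in proportion to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))

def mean_position(particles):
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)

def run_demo(seed=0, steps=10):
    """Track a stationary robot repeatedly observed at (1.0, 1.0)."""
    rng = random.Random(seed)
    # Start with particles spread uniformly over a 6x4m pitch.
    particles = [(rng.uniform(-3.0, 3.0), rng.uniform(-2.0, 2.0))
                 for _ in range(500)]
    for _ in range(steps):
        particles = predict(particles, (0.0, 0.0), 0.1, 0.05, rng)
        weights = update(particles, (1.0, 1.0), 0.3)
        particles = resample(particles, weights, rng)
    return mean_position(particles)
```

With consistent observations the particle cloud contracts around the observed position, which is the convergence behaviour noted above.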

Particle filter toy example

Decision maker

• Based on own inferred role and current beliefs, determine the appropriate action

• Possible actions: move(dx,dy,dθ), kick(type,speed), scan(dyaw,dpitch), getup(front/back)

• Choice of action should depend on belief confidence (e.g. if we’re not sure where the ball is, scanning should be the highest priority)

• Also requires fine-tuned thresholds, e.g. for kicking
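
A confidence-driven action choice of this kind might look like the sketch below. The role names, the `BallBelief` structure and both thresholds are hypothetical; the slides only state that such thresholds exist and need tuning.

```python
from dataclasses import dataclass

@dataclass
class BallBelief:
    distance_m: float   # estimated distance from the robot to the ball
    confidence: float   # belief confidence in [0, 1]

def choose_action(role, ball, fallen=False,
                  min_confidence=0.6, kick_range_m=0.2):
    """Pick one of the action names listed above.

    min_confidence and kick_range_m are illustrative values only.
    """
    if fallen:
        return "getup"
    if ball.confidence < min_confidence:
        # Not sure where the ball is: scanning takes priority.
        return "scan"
    if role == "kicker" and ball.distance_m <= kick_range_m:
        return "kick"
    return "move"

action = choose_action("kicker", BallBelief(distance_m=1.5, confidence=0.9))
```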

Path target selector

• Invoked if selected action == move

• Chooses an appropriate target for path planning

• More challenging than it sounds! E.g., for kickers:
  • Determine where we would like the ball to eventually be, from a list of candidate affordable locations, and subject to a set of constraints
  • Compute best kicking position and posture that will allow us to kick ball to this desired location

Path planner

• As with path target selector, invoked only if selected action == move

• As name suggests, plans a path that will lead robot to desired location

• Two cases:
  • If no objects (e.g. other robots, goal posts) in view, simply plan a straight path
  • Else, plan a path, every point of which is at least some safety distance from each obstacle
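
The two planning cases can be sketched with a simple clearance test. The detour strategy used here (a single via-point pushed sideways from the midpoint) is a stand-in chosen for brevity, not the planner described in the talk, and the safety distance and search parameters are assumed values.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def path_is_safe(path, obstacles, safety):
    """True if every point of the piecewise-linear path keeps its distance."""
    return all(point_segment_distance(obstacle, a, b) >= safety
               for a, b in zip(path, path[1:])
               for obstacle in obstacles)

def plan_path(start, goal, obstacles, safety=0.3, max_offset=2.0, step=0.1):
    """Return a list of waypoints, or None if no safe detour is found."""
    direct = [start, goal]
    if not obstacles or path_is_safe(direct, obstacles, safety):
        return direct  # case 1: straight path
    length = math.dist(start, goal)
    if length == 0.0:
        return None
    # Case 2: try via-points offset perpendicular to the straight line.
    mid = ((start[0] + goal[0]) / 2.0, (start[1] + goal[1]) / 2.0)
    normal = (-(goal[1] - start[1]) / length, (goal[0] - start[0]) / length)
    offset = step
    while offset <= max_offset:
        for sign in (1.0, -1.0):
            via = (mid[0] + sign * offset * normal[0],
                   mid[1] + sign * offset * normal[1])
            candidate = [start, via, goal]
            if path_is_safe(candidate, obstacles, safety):
                return candidate
        offset += step
    return None

path = plan_path((0.0, 0.0), (2.0, 0.0), obstacles=[(1.0, 0.0)])
```

Only the first step of the returned path would be executed before replanning, in line with the action executor described next.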

Action executor

• Simply executes the selected action!

• If selected action == move, executes first step of computed trajectory

• May also execute two actions at once, e.g. move and scan

Research contributions (in progress!)

• Reachable sets: improve particle filtering algorithm by accounting for the physical capabilities of the adversaries

• Intent inference, escape, deceit: synthesise more intelligent behaviours that exploit the observability constraints and strategic limitations of the adversaries

• Bringing the above together in a closed-loop sense

Reachable sets

Composable reachable sets

• Initial idea: composable reachable sets

• Compute different sets for each capability hypothesis for the adversary offline

• Online, always pick the one that most closely matches the adversary’s observed behaviour, based on particle filter estimates

• Result: more flexible decision making that adapts locally, in the face of noisy observations

Composable reachable sets

• Offers some performance improvement

• But sensory information is too noisy to allow accurate estimation of velocities

• Need a more flexible approach that adapts to adversary over time

• New approach: use the reachable set as a proposal distribution inside the particle filter

• State estimation is still probabilistic and data-driven, but with additional physical constraints

Intent inference, escape and deceit

• Very difficult for robots to execute complicated strategies (passing, attack formations etc.)

• But they can be strategic in different ways!

• Approach: form flexible probabilistic models of the adversaries, through which their capabilities may be exploited

Intent inference

• Decompose adversary’s behaviour into a set of coarse classes (intent templates), and define a probability distribution over them

• E.g. {Move towards ball, move towards me, move randomly, stand still}

• At time t:
  • Compute the expected moves for each template
  • Pick a template randomly (proportionally to its weight)

• At time t + 1, adjust intent template weights based on the actual move of the robot
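
The template loop above can be sketched as follows. The four template names come from the example set; everything else is an assumption made for illustration, in particular the exponential reweighting rule, the fixed error charged to the "random" template, and the `speed` and `scale` values.

```python
import math
import random

def unit_step(src, dst, speed):
    """Displacement of length `speed` from src towards dst."""
    distance = math.dist(src, dst)
    if distance == 0.0:
        return (0.0, 0.0)
    return (speed * (dst[0] - src[0]) / distance,
            speed * (dst[1] - src[1]) / distance)

def expected_moves(adversary, ball, me, speed=0.1):
    """Expected displacement of the adversary under each intent template."""
    return {
        "towards_ball": unit_step(adversary, ball, speed),
        "towards_me": unit_step(adversary, me, speed),
        "stand_still": (0.0, 0.0),
        "random": None,  # no single expected move
    }

def sample_template(weights, rng):
    """Time t: pick a template with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def update_weights(weights, expected, actual_move,
                   scale=0.05, random_error=0.1):
    """Time t + 1: shrink each weight by how badly it predicted the move."""
    updated = {}
    for name, weight in weights.items():
        move = expected[name]
        error = random_error if move is None else math.dist(move, actual_move)
        updated[name] = weight * math.exp(-error / scale)
    total = sum(updated.values())
    return {name: value / total for name, value in updated.items()}

weights = {name: 0.25 for name in
           ("towards_ball", "towards_me", "stand_still", "random")}
expected = expected_moves(adversary=(0.0, 0.0), ball=(1.0, 0.0), me=(0.0, 1.0))
guess = sample_template(weights, random.Random(0))
weights = update_weights(weights, expected, actual_move=(0.1, 0.0))
```

In this example the adversary actually moved towards the ball, so that template ends up with the largest weight after the update.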

Escape strategies

• Idea: Robots are faced with strong sensory limitations. . .

• . . . but this is also true of their opponents!

• Select actions and trajectories so they exploit these limitations and hide information from the adversary:

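$$\hat{\beta} = \arg\max_{\beta \in B_T} \frac{1}{|\beta|} \sum_{k=1}^{|\beta|} \mathrm{dist}(\beta_k, \mathit{vbs}_{ij}) \tag{1}$$

$$\hat{\rho} = \arg\max_{\rho \in R_T} \frac{1}{|\rho|} \sum_{k=1}^{|\rho|} \mathrm{dist}(\rho_k, \mathit{sbs}_{ij}) \tag{2}$$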
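
Equations (1) and (2) share one form: among a set of candidates, pick the one whose elements are on average farthest from a reference. A minimal sketch of that selection is given below. Treating each candidate as a list of 2-D points and the reference as a single 2-D point is an assumption, since the slides do not define the second argument of dist; the function names are hypothetical.

```python
import math

def mean_distance(candidate, reference):
    """(1/|c|) * sum_k dist(c_k, reference) for one candidate."""
    return sum(math.dist(point, reference) for point in candidate) / len(candidate)

def best_escape(candidates, reference):
    """argmax over candidates of the mean distance to the reference."""
    return max(candidates, key=lambda c: mean_distance(c, reference))

candidates = [
    [(0.0, 0.0), (1.0, 0.0)],    # heads towards the reference
    [(0.0, 0.0), (-1.0, 0.0)],   # heads away from it
]
chosen = best_escape(candidates, reference=(2.0, 0.0))
```

Here the second candidate is chosen, since its mean distance to the reference is 2.5 against 1.5 for the first.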

Deceit

• Escape strategies are one-step predictive

• Can we extend this to greater time horizons?

• Deceptive move: maximise deviation from the move the adversary expects you to do, while minimising the distance to your own goal:

$$\hat{d}m = \arg\min_{m \in DM} \; w_D D^t_\mu(m) + w_U U^t_\mu(m) \tag{3}$$

where

$$D^t_\mu(dm) = -\mathrm{dist}(dm, E^t_\mu), \tag{4}$$

$$U^t_\mu(dm) = \mathrm{dist}(dm, G^t_\mu) \tag{5}$$
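
Equations (3) to (5) amount to minimising a weighted sum of two distances over a set of candidate moves, which can be sketched as below. Representing moves, the expected move E and the goal G as 2-D points, and the default weights of 1.0, are assumptions made for the example.

```python
import math

def deceptive_move(candidates, expected_move, goal, w_d=1.0, w_u=1.0):
    """argmin over candidate moves of w_D * D(m) + w_U * U(m).

    D(m) = -dist(m, expected_move) rewards deviating from what the
    adversary expects; U(m) = dist(m, goal) penalises straying from
    the robot's own goal.
    """
    def cost(move):
        deviation_term = -math.dist(move, expected_move)
        goal_term = math.dist(move, goal)
        return w_d * deviation_term + w_u * goal_term
    return min(candidates, key=cost)

moves = [(1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
chosen = deceptive_move(moves, expected_move=(1.0, 0.0), goal=(0.0, 2.0))
```

Raising `w_d` relative to `w_u` makes the robot more willing to give up progress towards its goal in exchange for surprise.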

Regret minimisation

• Well-studied game-theoretic concept

• Aim: learn and adapt to adversary’s strategic model

• Our approach: adjust weight distributions for intent templates and deceit online, based on difference between expected and actual moves
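
To make the notion of regret concrete, the sketch below uses the standard Hedge (multiplicative weights) rule, which is a generic regret-minimising scheme and not necessarily the algorithm used by the team. Each "expert" stands for an intent template, and its per-round loss is assumed to be some measure of the gap between expected and actual moves, scaled to [0, 1]. The learning rate `eta` is an illustrative value.

```python
import math

def hedge_weights(cumulative_losses, eta=1.0):
    """Weights proportional to exp(-eta * cumulative loss)."""
    best = min(cumulative_losses.values())
    # Subtracting the minimum keeps the exponentials numerically safe.
    raw = {name: math.exp(-eta * (loss - best))
           for name, loss in cumulative_losses.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

def run_hedge(loss_rounds, eta=1.0):
    """Play Hedge over a sequence of per-round loss dicts.

    Returns the final weights and the regret: the learner's expected
    cumulative loss minus that of the best single expert in hindsight.
    """
    names = list(loss_rounds[0])
    cumulative = {name: 0.0 for name in names}
    incurred = 0.0
    for losses in loss_rounds:
        weights = hedge_weights(cumulative, eta)
        incurred += sum(weights[name] * losses[name] for name in names)
        for name in names:
            cumulative[name] += losses[name]
    regret = incurred - min(cumulative.values())
    return hedge_weights(cumulative, eta), regret

rounds = [{"towards_ball": 0.0, "stand_still": 1.0}] * 20
final_weights, regret = run_hedge(rounds)
```

In this example one template is always right, so nearly all weight shifts to it and the regret stays small even as the number of rounds grows.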

Regret minimisation algorithm

Complete decision making algorithm
