Markov Decision Processes for Path Planning in Unpredictable Environment


Timothy Boger and Mike Korostelev

Overview

• Problem
• Application
• MDPs
• A* Search & Manhattan Heuristic
• Current Implementation

Competition Overview

• Unpredictable Environment
• Semi-Known Environment

“Each team is faced with the mission of providing medical supplies to a victim trapped in a building following an earthquake.”

• Indoor Aerial Robotics Competition (IARC)

• IARC 2012: Urban Search and Rescue – Drexel Autonomous Systems Lab

• Have access to unknown shortcuts
• Shortest time to reach finish line
• Focus on vision and navigation algorithms, particularly maze solving and collision avoidance

• Fully autonomous systems

Application Platform and Motivation

Autonomous Unmanned Aerial Vehicles

[Platform diagram: ultrasonic sensors, inertial measurement, high-level and low-level control – maze solving, path planning, decision making, environment initialization, override controls]

• Addition of shortcuts and obstacles
• Knowledge of the affected area
• Costly detours or beneficial detours
• Need to evaluate whether an area should be avoided or whether the benefit outweighs the cost

Maze Solving and Problem Statement

Proposed: Markov Decision Processes

How to formulate the problem?
– Movement cost is no longer static
– A damaging event changed the area dynamics
– Semi-known environment
– Follow the established path, or deviate from it to explore safer ways to the goal

Markov Decision Processes

An MDP formalizes a reinforcement learning problem: an agent interacts with an environment through states, actions, and rewards.

Markov Decision Processes

[Figure: 5×5 grid world; the agent in state s_t takes action a_t, receives reward r_{t+1}, and moves to state s_{t+1}]

Markov Decision Processes

Agent and environment interact at discrete time steps t = 0, 1, 2, …
• Agent observes state at step t: s_t ∈ S
• produces action at step t: a_t ∈ A(s_t)
• gets resulting reward: r_{t+1} ∈ R
• gets the resulting state: s_{t+1}
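A minimal sketch of that interaction loop, assuming a toy 5×5 grid: the GridWorldLoop class, the goal cell, the -1-per-step reward, and the random policy are illustrative placeholders, not the competition implementation.

```java
import java.util.Random;

// Illustrative only: a toy 5x5 grid world showing the s_t, a_t, r_{t+1}, s_{t+1} loop.
public class GridWorldLoop {
    static final int SIZE = 5;                 // 5x5 grid, states indexed 0..24
    static final int GOAL = SIZE * SIZE - 1;   // assumed goal cell (placeholder)
    static final int[][] MOVES = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}}; // down, up, right, left

    // One-step dynamics: apply an action, staying in place if it would leave the grid.
    static int step(int state, int action) {
        int row = state / SIZE + MOVES[action][0];
        int col = state % SIZE + MOVES[action][1];
        if (row < 0 || row >= SIZE || col < 0 || col >= SIZE) return state;
        return row * SIZE + col;
    }

    // Placeholder reward: -1 per move, 0 on reaching the goal.
    static double reward(int nextState) {
        return nextState == GOAL ? 0.0 : -1.0;
    }

    public static void main(String[] args) {
        Random rng = new Random(0);
        int state = 0;                                   // s_0
        for (int t = 0; t < 100 && state != GOAL; t++) {
            int action = rng.nextInt(MOVES.length);      // a_t from a random policy
            int nextState = step(state, action);         // s_{t+1}
            double r = reward(nextState);                // r_{t+1}
            System.out.printf("t=%d s=%d a=%d r=%.1f s'=%d%n", t, state, action, r, nextState);
            state = nextState;
        }
    }
}
```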


Markov Decision Processes: Policy

A policy is a mapping from states to actions.
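As a sketch, a deterministic policy over the grid above can simply be an array indexed by state (the 0..24 encoding and the "always move right" choice are placeholders for illustration):

```java
// Illustrative only: a deterministic policy maps each state to one action.
public class GreedyRightPolicy {
    static final int NUM_STATES = 25;   // 5x5 grid as in the sketch above
    static final int RIGHT = 2;         // action index from the previous sketch

    // pi[s] = action chosen in state s; here every state maps to "move right".
    static int[] buildPolicy() {
        int[] pi = new int[NUM_STATES];
        java.util.Arrays.fill(pi, RIGHT);
        return pi;
    }

    public static void main(String[] args) {
        int[] pi = buildPolicy();
        System.out.println("Action in state 7: " + pi[7]);
    }
}
```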

Markov Decision Processes: Rewards

Suppose a sequence of rewards:

How to maximize?

T is time until terminal state is reached.

When a task requires a large number of state transitions, consider a discount factor γ, where 0 ≤ γ ≤ 1.

Nearsighted 0 ← γ → 1 Farsighted
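The return formulas these bullets refer to (the equation images are not preserved in this transcript) are presumably the standard ones:

```latex
% Return for an episodic task that ends at terminal time T
R_t = r_{t+1} + r_{t+2} + \dots + r_T

% Discounted return with discount factor \gamma, 0 \le \gamma \le 1
R_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}
```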

Markov Decision Processes

Markov property – memoryless. Ideally, a state should summarize past sensations so as to retain all essential information.

When a reinforcement learning task has the Markov property, it is a Markov Decision Process.

Define MDP:

one-step dynamics – transition probabilities

[Figure: transition probability diagram with values 0.8, 0.1, 0.1, and 0]
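One common reading of such a diagram (assumed here, since the figure itself is not preserved) is that the intended move succeeds with probability 0.8 and slips to each perpendicular direction with probability 0.1. A sketch of those one-step dynamics on the toy grid:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: stochastic one-step dynamics P(s' | s, a) on the 5x5 grid,
// assuming 0.8 for the intended move and 0.1 for each perpendicular slip.
public class GridDynamics {
    static final int SIZE = 5;
    static final int[][] MOVES = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}}; // down, up, right, left

    static int move(int state, int[] delta) {
        int row = state / SIZE + delta[0];
        int col = state % SIZE + delta[1];
        if (row < 0 || row >= SIZE || col < 0 || col >= SIZE) return state; // blocked: stay
        return row * SIZE + col;
    }

    // Returns a map from successor state s' to P(s' | s, a).
    static Map<Integer, Double> transition(int state, int action) {
        Map<Integer, Double> probs = new LinkedHashMap<>();
        int[][] outcomes = {
            MOVES[action],            // intended direction, prob 0.8
            perpendicular(action, 0), // slip one way,       prob 0.1
            perpendicular(action, 1)  // slip the other way, prob 0.1
        };
        double[] p = {0.8, 0.1, 0.1};
        for (int i = 0; i < outcomes.length; i++) {
            int next = move(state, outcomes[i]);
            probs.merge(next, p[i], Double::sum); // merge outcomes landing on the same cell
        }
        return probs;
    }

    // The two directions perpendicular to the chosen action.
    static int[] perpendicular(int action, int which) {
        boolean vertical = MOVES[action][0] != 0;
        int[][] sides = vertical ? new int[][]{{0, 1}, {0, -1}} : new int[][]{{1, 0}, {-1, 0}};
        return sides[which];
    }

    public static void main(String[] args) {
        System.out.println(transition(12, 2)); // dynamics for "move right" from the centre cell
    }
}
```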

Markov Decision Processes: Value

Why move to some states and not others?

Value of State: The expected return starting from that state.

State-value function for policy π (depends on the policy)


Action value: the value of taking an action in a given state under policy π
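The definitions behind these bullets (reconstructed here, since the equation images are not in the transcript) are the standard state-value and action-value functions:

```latex
% State-value function for policy \pi
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ R_t \mid s_t = s \right]
           = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right]

% Action-value function for policy \pi
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\, a_t = a \right]
```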

Markov Decision Processes: Bellman Equations

This is a set of equations (in fact, linear), one for each state. The value function for π is its unique solution.

The Bellman equation expresses a relationship between the value of a state s and the values of its successor states s′.
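For reference, the standard form of that Bellman equation for V^π (reconstructed, since the equation image is missing):

```latex
V^{\pi}(s) = \sum_{a} \pi(s, a) \sum_{s'} \mathcal{P}^{a}_{s s'}
             \left[ \mathcal{R}^{a}_{s s'} + \gamma\, V^{\pi}(s') \right]
```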

Markov Decision Processes: Optimization

Markov Decision Processes

5

4

3

2

1

1 2 3 4 5

State St

Action at
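The optimization step itself is not spelled out on the slide; one standard way to do it is value iteration over the Bellman optimality equation, sketched here on the same toy grid (the -1-per-step reward, the slip model, and the discount factor are assumptions for illustration, not the competition code):

```java
// Illustrative only: value iteration on the 5x5 grid with the slip dynamics above.
public class ValueIteration {
    static final int SIZE = 5, GOAL = SIZE * SIZE - 1;
    static final double GAMMA = 0.9, THETA = 1e-6;   // discount factor, convergence threshold
    static final int[][] MOVES = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    static int move(int s, int[] d) {
        int r = s / SIZE + d[0], c = s % SIZE + d[1];
        return (r < 0 || r >= SIZE || c < 0 || c >= SIZE) ? s : r * SIZE + c;
    }

    public static void main(String[] args) {
        double[] v = new double[SIZE * SIZE];
        double delta;
        do {
            delta = 0.0;
            for (int s = 0; s < v.length; s++) {
                if (s == GOAL) continue;              // terminal state keeps value 0
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < MOVES.length; a++) {
                    // Expected value of action a: 0.8 intended, 0.1 to each perpendicular side.
                    boolean vertical = MOVES[a][0] != 0;
                    int[][] outs = vertical ? new int[][]{MOVES[a], {0, 1}, {0, -1}}
                                            : new int[][]{MOVES[a], {1, 0}, {-1, 0}};
                    double[] p = {0.8, 0.1, 0.1};
                    double q = 0.0;
                    for (int i = 0; i < outs.length; i++) {
                        int next = move(s, outs[i]);
                        double reward = (next == GOAL) ? 0.0 : -1.0; // -1 per step, 0 at goal
                        q += p[i] * (reward + GAMMA * v[next]);
                    }
                    best = Math.max(best, q);
                }
                delta = Math.max(delta, Math.abs(best - v[s]));
                v[s] = best;
            }
        } while (delta > THETA);
        System.out.println("V(start) = " + v[0]);
    }
}
```

Taking the greedy action with respect to the converged values then gives the planning decision in each cell.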

• A* Search Algorithm, Manhattan Heuristic

• Affected region and probability of shortcuts and obstacles not taken into account

• Semi-known environment not taken into account

Current Implementation And Challenges

A* Search Algorithm and the Manhattan Heuristic

[Map example – Origin: 3rd Ave. & 26th St.; Destination: 31st St. & Park Ave.; 7 blocks + 7 blocks]

A* Search Algorithm and the Manhattan Heuristic

Need to score the path to decide which way to search

F = G + H

G = the movement cost to move from the starting point A to a given square on the grid, following the path generated to get there.

H = the estimated movement cost to move from that given square on the grid to the final destination, point B.

A* Search Algorithm and the Manhattan Heuristic

Horizontal movement cost – 10
Diagonal movement cost – 14

H – No diagonals allowed
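A sketch of that scoring under these conventions (the 10/14 step costs come from the slide; the class and method names are illustrative):

```java
// Illustrative only: A* node scoring F = G + H with a Manhattan-distance heuristic.
public class AStarScoring {
    static final int STRAIGHT_COST = 10;   // horizontal/vertical step cost from the slide
    static final int DIAGONAL_COST = 14;   // diagonal step cost from the slide (~10 * sqrt(2))

    // H: estimated cost to the goal, ignoring obstacles and disallowing diagonals.
    static int manhattanH(int row, int col, int goalRow, int goalCol) {
        return STRAIGHT_COST * (Math.abs(goalRow - row) + Math.abs(goalCol - col));
    }

    // F = G + H, where G is the accumulated movement cost along the path so far.
    static int score(int g, int row, int col, int goalRow, int goalCol) {
        return g + manhattanH(row, col, goalRow, goalCol);
    }

    public static void main(String[] args) {
        // 7 blocks in each direction, as in the Manhattan example: H = 10 * (7 + 7) = 140
        System.out.println(manhattanH(0, 0, 7, 7));
    }
}
```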

A* Search Algorithm and the Manhattan Heuristic

When the destination location is unknown – use Dijkstra's algorithm, i.e. H = 0.
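For completeness, a compact A* loop over a 4-connected grid using the Manhattan heuristic above; the maze layout, class name, and step cost are illustrative, not the competition environment.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Illustrative only: A* on a 4-connected grid; heuristic = 0 gives Dijkstra's algorithm.
public class AStarGrid {
    static final int[][] MOVES = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    // Returns the cost of the cheapest path from start to goal, or -1 if unreachable.
    // grid[r][c] == 1 marks an obstacle; each straight step costs 10, as on the slide.
    static int search(int[][] grid, int[] start, int[] goal, boolean useHeuristic) {
        int rows = grid.length, cols = grid[0].length;
        int[][] g = new int[rows][cols];
        for (int[] row : g) Arrays.fill(row, Integer.MAX_VALUE);
        g[start[0]][start[1]] = 0;

        // Open list entries: {f, row, col}, ordered by f = g + h.
        PriorityQueue<int[]> open = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        open.add(new int[]{h(start, goal, useHeuristic), start[0], start[1]});

        while (!open.isEmpty()) {
            int[] node = open.poll();
            int r = node[1], c = node[2];
            if (r == goal[0] && c == goal[1]) return g[r][c];
            for (int[] d : MOVES) {
                int nr = r + d[0], nc = c + d[1];
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || grid[nr][nc] == 1) continue;
                int tentative = g[r][c] + 10;    // straight-step cost
                if (tentative < g[nr][nc]) {
                    g[nr][nc] = tentative;
                    open.add(new int[]{tentative + h(new int[]{nr, nc}, goal, useHeuristic), nr, nc});
                }
            }
        }
        return -1;
    }

    static int h(int[] cell, int[] goal, boolean useHeuristic) {
        if (!useHeuristic) return 0;             // H = 0 -> Dijkstra
        return 10 * (Math.abs(goal[0] - cell[0]) + Math.abs(goal[1] - cell[1]));
    }

    public static void main(String[] args) {
        int[][] maze = {
            {0, 0, 0, 1, 0},
            {1, 1, 0, 1, 0},
            {0, 0, 0, 0, 0},
        };
        System.out.println(search(maze, new int[]{0, 0}, new int[]{2, 4}, true));  // A*
        System.out.println(search(maze, new int[]{0, 0}, new int[]{2, 4}, false)); // Dijkstra
    }
}
```

Calling search(…, false) reproduces the Dijkstra case mentioned above for an unknown destination-distance estimate.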

• A* Search Algorithm, Manhattan Heuristic

• Affected region and probability of shortcuts and obstacles not taken into account

• Semi-known environment not taken into account

• Java Android maze solver

Current Implementation And Challenges
