
Machine Learning for Control: Experiments with Agile Quadrupeds and Perching UAVs

Russ Tedrake, Assistant Professor

MIT Computer Science and Artificial Intelligence Lab


The State-of-the-Art in Bipedal Walking Robots

Honda’s ASIMO


Performance of Honda’s ASIMO Control

Works well on flat terrain, and even up stairs

Trajectories are constrained by an overly restrictive measure of dynamic balance

Cannot compete with humans in terms of:

Speed (0.83 m/s top speed)

Efficiency (uses roughly 20x the energy of a human, scaled)

Robustness (no examples on uneven or unmodelled terrain)


The Challenge: Underactuated Systems


Walking is underactuated

Consider a 7 link planar biped:

6 actuators (one at each joint)

Want to control 7+ degrees of freedom.

Note: “Fully”-actuated if we assume that one foot is bolted to the ground (walking robotic arms)
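To make the counting argument concrete, here is a minimal numpy sketch (the input matrix is an illustrative placeholder, not a model of a real biped):

```python
import numpy as np

# 7-link planar biped: 7 degrees of freedom, but only 6 actuators
# (one motor at each internal joint; nothing actuates the contact).
n_dof, n_act = 7, 6

# Illustrative input matrix B mapping the 6 joint torques into the
# 7 generalized forces; the first (unactuated) coordinate gets none.
B = np.zeros((n_dof, n_act))
B[1:, :] = np.eye(n_act)

# rank(B) < n_dof: no choice of torques commands an arbitrary
# instantaneous acceleration, i.e. the system is underactuated.
print(np.linalg.matrix_rank(B), "<", n_dof)  # prints: 6 < 7
```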


Honda’s ASIMO Control

Honda’s solution: constrain the dynamics

Keep the foot flat on the ground (fully actuated)

Estimate the danger of foot roll by measuring ground reaction forces

Carefully design desired trajectories

Keep knees bent (avoid singularities)

High-gain PD control

Same approach used by a large number of “ZMP walkers”
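For reference, the “high-gain PD control” above is just a stiff joint servo; a minimal sketch, with made-up gains rather than ASIMO’s actual values:

```python
import numpy as np

def pd_torques(q, qdot, q_des, qdot_des, kp=500.0, kd=50.0):
    """Joint-space PD servo: large kp and kd pin every joint to the
    carefully designed trajectory, overriding (and fighting) the
    machine's natural dynamics."""
    q, qdot = np.asarray(q, float), np.asarray(qdot, float)
    return kp * (np.asarray(q_des, float) - q) + kd * (np.asarray(qdot_des, float) - qdot)
```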


Passive Dynamic Walking

[Collins et al., 2001]


Robust and Efficient Bipeds

To achieve high performance, we must relinquish control!

High-gain feedback is not necessary

Passive walker only works downhill, initialized by a “skilled hand”

Challenge: How do you design a minimal control system to push and pull the natural dynamics?


A Machine Learning Approach

Reinforcement learning

Every time the robot runs, give it a score (cost function)

A “learning” algorithm on the robot associates control actions with rewards

Through trial and error, the robot can learn very advanced skills

Improved algorithms allow the robot to learn more skills in fewer trials

Reinforcement learning is also known as approximate optimal control
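As one concrete (hypothetical) instance of this loop, a weight-perturbation policy-gradient sketch; `run_robot` stands in for an actual trial returning the score, and all the constants are made up:

```python
import numpy as np

def learn(run_robot, w, n_trials=200, sigma=0.05, eta=0.01):
    """Trial-and-error policy improvement by weight perturbation:
    perturb the controller parameters, score the run, and step the
    parameters toward perturbations that beat the running baseline."""
    baseline = run_robot(w)                      # score of the initial policy
    for _ in range(n_trials):
        eps = sigma * np.random.randn(*w.shape)  # random exploration
        reward = run_robot(w + eps)              # one run of the robot
        w = w + eta * (reward - baseline) * eps / sigma**2  # gradient step
        baseline = 0.9 * baseline + 0.1 * reward # track the average reward
    return w
```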


Reinforcement learning example


Learning To Walk in 20 minutes

Science, 2005


Learning Results

Learning only on the robot (no simulation)

Algorithm comes with theoretical convergence guarantees

Stochastic gradient descent on the average reward, despite noise and disturbances

Converges to local minima

Very fast and robust convergence

Learns from a “blank slate” in < 20 minutes

After initial learning, adapts to changes in a few steps

Always learning, always adapting to the terrain as it walks
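One way to see the stochastic-gradient claim, under a Gaussian weight-perturbation model like the sketch above (this derivation is my addition, not from the talk): to first order, the expected update is proportional to the gradient of the average reward J.

```latex
% Perturb the parameters by eps ~ N(0, sigma^2 I) and scale the
% step by the resulting change in average reward J:
\Delta w = \eta\,\bigl(J(w+\varepsilon) - J(w)\bigr)\,\varepsilon .
% Expanding J to first order, J(w+eps) - J(w) ~ eps^T grad J, so
\mathbb{E}[\Delta w]
  \approx \eta\,\mathbb{E}\!\left[\varepsilon\,\varepsilon^{\top}\right]\nabla J(w)
  = \eta\,\sigma^{2}\,\nabla J(w),
% i.e. the noisy per-trial updates follow the gradient of the
% average reward in expectation, converging to a local optimum.
```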


How far can we take this learning idea?


Simple Bipeds on Rough Terrain


Quadrupeds on rough terrain (motion planning)


High-dimensional underactuated motion planning

Formulate underactuated control as a search problem (a.k.a. kinodynamic motion planning)

Current real-time methods are limited to relatively low-dimensional problems (simple robots)

Underactuated systems are notoriously difficult (tunnels and tubes)

Big idea: Exploit structure in the equations of motion

$H(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = B\tau$

Example: Dimensionality reduction for motion planning

Make a high-dimensional underactuated system act like a low-dimensional fully-actuated system

The method either produces the correct torques, or says “can’t do that” (see the sketch below)

Perform a low-dimensional search
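Here is a minimal numpy sketch of the “produce the torques or say can’t do that” test, built from the manipulator equation above (the projection P and the tolerance are my assumptions, not the actual planner):

```python
import numpy as np

def template_torques(H, Cqdot, G, B, P, vdot_des, tol=1e-6):
    """Try to make the projected coordinates v = P q track a desired
    acceleration vdot_des, given H(q) qddot + C(q,qdot) qdot + G(q) = B tau.

    Substituting qddot = H^{-1} (B tau - C qdot - G) into
    P qddot = vdot_des gives a linear system in tau."""
    Hinv = np.linalg.inv(H)
    A = P @ Hinv @ B                        # maps tau -> projected acceleration
    b = vdot_des + P @ Hinv @ (Cqdot + G)   # required projected acceleration
    tau, *_ = np.linalg.lstsq(A, b, rcond=None)
    feasible = np.linalg.norm(A @ tau - b) < tol  # else: "can't do that"
    return tau, feasible
```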


Little Dog highlights


Flapping-Winged Flight



Autonomous Flapping-Winged Flight



Control in unsteady fluid dynamics


Two Goals for Flapping Flight

Probably won’t beat a propeller for efficient forward flight

Two goals for outperforming fixed-wing aircraft:

1 Aggressive aerial maneuvers (e.g. landing on a perch)

2 “Harvesting” energy from the air


The Heaving Foil (work with Jun Zhang, NYU)

[Vandenberghe et al., 2006]

Rigid, symmetric wing

Driven vertically

Free to rotate horizontally


Symmetry breaking leads to forward flight


Flow visualization


Prospects for optimization

Previous work considered only sinusoidal trajectories

Control problem: optimize the stroke form to maximize efficiency

CFD model [Alben and Shelley, 2005]

Takes approximately 36 hours to simulate 30 flaps

Can we perform the optimization directly in the fluid?


Learning results

Parameterized spline trajectory (sketched after the figure below)

Policy gradient learning

Learning improves efficiency 3x in just 15 minutes

Dynamic explanation of optima

Implications:

Optimization in experimental fluid dynamics

Control for the birds

[Figure: reward vs. trial number (0–50 trials, reward 4–13) for two initial conditions: a sine wave and a smoothed square wave]
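The parameterized spline trajectory can be sketched as follows (the knot count and the scipy construction are my assumptions, not the experiment’s exact parameterization): the learnable policy parameters are the knot heights of one periodic flapping stroke, tuned by the same kind of policy-gradient update sketched earlier.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_stroke(knot_heights, period=1.0):
    """Build one period of the heaving stroke z(t) from knot heights;
    the knot heights are the parameters the learner adjusts."""
    z = np.append(knot_heights, knot_heights[0])  # close the loop
    t = np.linspace(0.0, period, len(z))
    return CubicSpline(t, z, bc_type="periodic")  # smooth periodic stroke

# Example: a 6-parameter stroke evaluated over one flap.
stroke = make_stroke([0.0, 0.8, 1.0, 0.0, -1.0, -0.8])
heights = stroke(np.linspace(0.0, 1.0, 200))
```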


An airplane that can land on a perch

Conventional aircraft use high-gain feedback and avoid complicated nonlinear dynamics regimes (just like ASIMO)

Higher performance in maneuverability and efficiency if you exploit the nonlinear, unsteady fluid dynamics (just like walking)

A benchmark problem: landing on a perch


Experiment Design

Glider (no propeller)

Dihedral (passive roll stability)

Offboard sensing and control


System Identification

Real flight data

Very high angle-of-attack regimes

Surprisingly good match to theory

Vortex shedding

[Figures: lift coefficient Cl and drag coefficient Cd vs. angle of attack (−20° to 160°), comparing glider data against flat plate theory]
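For reference, the flat plate theory compared against the glider data is presumably the standard high angle-of-attack flat-plate model; the formulas below are the textbook version, stated here as an assumption rather than taken from the slides.

```latex
% Flat-plate lift and drag coefficients as functions of the angle
% of attack alpha, valid across the post-stall range in the data:
C_L(\alpha) = 2 \sin\alpha \cos\alpha, \qquad
C_D(\alpha) = 2 \sin^{2}\alpha
```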


Glider Perching

Enters motion capture @ 6 m/s.

Perch is < 3.5 m away.

Entire trajectory @ 1 second.

Requires separation!


Conclusions

New tools from machine learning can help solve underactuated control problems, such as the control of locomotion, but we must take advantage of:

Mechanical design and plant dynamics

Classical control

Approximate optimal control solutions can exploit the dynamics of walking machines to produce efficient and robust gaits.

Initial evidence suggests that designing control policies for fluid systems (at intermediate Reynolds numbers) might be easier than completely describing their dynamics (birds don’t solve Navier–Stokes)


Alben, S. and Shelley, M. (2005). Coherent locomotion as an attracting state for a free flapping body. Proceedings of the National Academy of Sciences, 102(32):11163–11166.

Collins, S. H., Wisse, M., and Ruina, A. (2001). A three-dimensional passive-dynamic walking robot with two legs and knees. International Journal of Robotics Research, 20(7):607–615.

Vandenberghe, N., Childress, S., and Zhang, J. (2006). On unidirectional flight of a free flapping wing. Physics of Fluids, 18.
