Introduction

Dr. Khaled Wassif

Spring 2008-2009

Machine Learning

Outline

– Why Machine Learning?
– Relevant disciplines
– What is a well-defined learning problem?
– How to design a learning system?
– What issues arise in machine learning problems?
– Machine Learning Research

Machine Learning

By Dr. Khaled Wassif

Slide 1- 2

Why Machine Learning?

Adding new kinds of capability to computers:

– Data mining
  » E.g. extracting new information from medical records, maintenance records, etc.
– Self-customizing programs
  » A learning newsreader/browser that learns what you like and seeks it out.
– Applications we can't program by hand
  » E.g. speech recognition and autonomous driving

Why Machine Learning?

Understanding human learning and teaching:

– Mature mathematical models of learning might offer insight into biological ones.

The time is right:

– Recent progress in algorithms and theory
– Enormous amounts of data and applications
  » Increasing online data, increasing networking to share it
– Substantial computational power
– Promising industry

Typical Data Mining Task

Data: [figure]

Given:

– 9714 patient records, each describing a pregnancy and birth.
– Each patient record contains 215 features, some unspecified (we have little control over data).

Learn to predict: classes of future patients at high risk for emergency cesarean section.

Data Mining Result

Data: [figure]

One of 18 learned rules:

If   No previous vaginal delivery, and
     Abnormal 2nd Trimester Ultrasound, and
     Malpresentation at admission
Then Probability of Emergency C-Section is 0.6

Over training data: 26/41 = .63
Over test data: 12/20 = .60
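The two fractions quoted for the rule are the share of patients matching all three conditions who actually had an emergency C-section. The sketch below shows one way to compute that figure. The field names (`previous_vaginal_delivery` and so on) and the synthetic records are assumptions made for illustration: they are built only to reproduce the 26/41 and 12/20 counts from the slide and are not the original patient data.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, bool]


def rule_matches(record: Record) -> bool:
    """Antecedent of the learned rule (field names are hypothetical)."""
    return (
        not record["previous_vaginal_delivery"]
        and record["abnormal_2nd_trimester_ultrasound"]
        and record["malpresentation_at_admission"]
    )


def rule_accuracy(
    records: List[Record], matches: Callable[[Record], bool]
) -> Tuple[int, int, float]:
    """Return (positives among covered, covered, positives / covered)."""
    covered = [r for r in records if matches(r)]
    if not covered:
        raise ValueError("the rule covers no records")
    positives = sum(r["emergency_c_section"] for r in covered)
    return positives, len(covered), positives / len(covered)


def synthetic_records(positives: int, covered: int, uncovered: int = 5) -> List[Record]:
    """Build made-up records: `covered` match the rule, `positives` of those are C-sections."""
    records: List[Record] = []
    for i in range(covered):
        records.append({
            "previous_vaginal_delivery": False,
            "abnormal_2nd_trimester_ultrasound": True,
            "malpresentation_at_admission": True,
            "emergency_c_section": i < positives,
        })
    for _ in range(uncovered):
        # These records fail the first condition, so the rule ignores them.
        records.append({
            "previous_vaginal_delivery": True,
            "abnormal_2nd_trimester_ultrasound": True,
            "malpresentation_at_admission": True,
            "emergency_c_section": False,
        })
    return records


train_result = rule_accuracy(synthetic_records(26, 41), rule_matches)
test_result = rule_accuracy(synthetic_records(12, 20), rule_matches)
print(f"training: {train_result[0]}/{train_result[1]} = {train_result[2]:.2f}")
print(f"test:     {test_result[0]}/{test_result[1]} = {test_result[2]:.2f}")
```

Reporting the figure on held-out test data as well as on training data, as the slide does, guards against a rule that only fits the records it was learned from.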

Credit Risk Analysis

Data: [figure]

Rules learned:

If   Other-Delinquent-Accounts > 2, and
     Number-Delinquent-Billing-Cycles > 1
Then Profitable-Customer? = No
     [Deny Credit Card application]

If   Other-Delinquent-Accounts = 0, and
     (Income > $30k) OR (Years-of-Credit > 3)
Then Profitable-Customer? = Yes
     [Accept Credit Card application]
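The two learned rules can be read directly as a small decision function. The sketch below is one possible encoding, not the system that produced them. The function name and the snake_case parameter names are assumptions, and returning `None` when neither rule fires is a choice made here, because the slide does not say what happens in that case.

```python
from typing import Optional


def profitable_customer(
    other_delinquent_accounts: int,
    delinquent_billing_cycles: int,
    income: float,
    years_of_credit: float,
) -> Optional[str]:
    """Apply the two learned rules; return "No", "Yes", or None if neither applies."""
    # Rule 1: several delinquent accounts and billing cycles -> deny the application.
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return "No"
    # Rule 2: no delinquent accounts plus enough income or credit history -> accept.
    if other_delinquent_accounts == 0 and (income > 30_000 or years_of_credit > 3):
        return "Yes"
    # The slide gives no rule for the remaining applicants.
    return None


print(profitable_customer(3, 2, 50_000, 10))  # first rule fires
print(profitable_customer(0, 0, 45_000, 1))   # second rule fires via income
print(profitable_customer(1, 0, 50_000, 10))  # neither rule fires
```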

Software that Customizes to User

Adaptive Web Sites

Natural Language Processing and Speech Recognition

Most pocket speech recognizers and translators now run on some sort of learning device: the more you use them, the smarter they become!

Object Recognition

Behind a security camera, most likely there is a computer that is learning and/or checking!

Robotic Control

The best helicopter pilot is now a computer!

– It runs a program that learns by itself how to fly and perform acrobatic military maneuvers!
– No taped instructions, joysticks, or things like …

Robotic Control

Cars can now find their own way!

Text Mining

Reading, digesting, and categorizing a vast text database is too much for a human!

We want: [figure]

Understanding Brain Activities

Bioinformatics

Where is the gene (DNA)?

[Figure: a wall of raw DNA sequence text (gacatcgctgcgtttcggcagctaattgcc…) with the string "whereisthegene" buried inside it]

Evolution

[Figure labels: ancestor, T years, ?]

Relevant Disciplines

Artificial intelligence:
– Learning symbolic representations of concepts.
– Using prior knowledge with training data to guide learning.

Probability theory:
– Computing probabilities of hypotheses and functions.
– Algorithms for estimating values of unobserved variables.

Computational complexity theory:
– Bounds on inherent complexity of learning, e.g. time, data.

Control theory:
– Learning to control processes to optimize performance measures.

Page 18: Introduction Introduction Dr. Khaled Wassif Spring 2008-2009 Machine Learning

Relevant Disciplines (cont.)

Information theory:
– Measuring information entropy and content.

Philosophy:
– Analysis of the justification for generalizing beyond observed data.

Psychology and neurobiology:
– Practice improves performance.
– Motivating artificial neural network models of learning.

Statistics:
– Characterization of errors (e.g., bias and variance) that occur when estimating the accuracy of a hypothesis.

What is the Learning Problem?

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
--- Tom Mitchell

Machine Learning <T, P, E>:
– Computer program automatically improves
» at task T
» according to performance measure P
» through experience E

Machine Learning Examples

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver

T: Categorizing email messages as spam or acceptable
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels

Designing a Learning System
E.g. Learning to Play Checkers

Part of problem specification:
– T: Play checkers
– P: Percent of games won in world competition

Within our control:
– What experience?
» Choose the training experience
– What exactly should be learned?
» Choose the target function
– How shall it be represented?
» Choose the target function representation
– What specific algorithm to learn it?

Types of Training Experience

Direct or indirect feedback?
– Direct: Given board states + best move for that state
– Indirect: Given move sequences + outcome of game
» Infer the goodness of a move from whether the game was won or lost and from the move's contribution to that outcome.

Teacher or not?
– Teacher gives learner board states + best move
– Learner asks teacher for best move for a particular state
– Learner plays against itself and has no teacher, only feedback from environment
– If teacher exists, how nice is it?

Is training experience representative of performance goal?
– Will the data (games) seen in training reflect those seen in practice?

Choosing the Target Function

What target function (target concept) is to be learned, and how will it be used by the performance system?
– For checkers, assume we are given a function for generating the legal moves for a given board position and want to decide the best move.
» Could learn a function:
ChooseMove(board, legal-moves) → best-move
(might work with direct info, difficult with indirect)
» Or could learn an evaluation function,
V(board) → R,
that gives each board position a score for how favorable it is.
V can be used to pick a move by applying each legal move, scoring the resulting board position, and choosing the move that results in the highest scoring board position.
(better choice for indirect info: learn value of board states)
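
The move-selection rule described for V can be sketched in a few lines of Python. This is only an illustration: the names `choose_move`, `apply_move`, `toy_V` and `toy_apply` are assumptions made here and are not part of the slides, and the toy "board" is an integer rather than a checkers position.

```python
def choose_move(board, legal_moves, apply_move, V):
    """Return the legal move whose resulting board scores highest under V."""
    return max(legal_moves, key=lambda move: V(apply_move(board, move)))


# Hypothetical toy stand-in for checkers: a board is an integer,
# a move adds to it, and V prefers boards close to 10.
toy_V = lambda b: -abs(10 - b)
toy_apply = lambda b, m: b + m

best = choose_move(7, [1, 2, 3, 4], toy_apply, toy_V)
print(best)
```

Any evaluation function with the signature V(board) → number can be plugged in, which is why learning V is enough to obtain a move chooser.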

Possible Definition for V(b)

– If b is a final winning board, then V(b) = 100
– If b is a final losing board, then V(b) = -100
– If b is a final draw board, then V(b) = 0
– Otherwise, V(b) = V(b'), where b' is the highest scoring final board position that is achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well).
» Can be computed using complete mini-max search of the game tree.

Gives correct results, but is non-operational, i.e. not efficiently computable.
– It involves searching the complete exponential game tree.

Need to learn an operational approximation to the evaluation function.
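
A minimal sketch of this recursive definition, using complete mini-max search. Checkers is far too large to search this way, so the example assumes a hypothetical toy game instead (a pile of stones, each player removes 1 or 2, whoever takes the last stone wins); the function and parameter names are illustrative only.

```python
def minimax_value(board, our_turn, is_final, final_value, successors):
    """V(b): value of the final board reached when both sides play optimally."""
    if is_final(board):
        return final_value(board, our_turn)
    child_values = [
        minimax_value(nb, not our_turn, is_final, final_value, successors)
        for nb in successors(board)
    ]
    # We pick the best child on our turn; the opponent picks the worst for us.
    return max(child_values) if our_turn else min(child_values)


# Hypothetical toy game: the board is the number of stones left.
is_final = lambda pile: pile == 0
# If the pile is empty on our turn, the opponent took the last stone: we lost.
final_value = lambda pile, our_turn: -100 if our_turn else 100
successors = lambda pile: [pile - k for k in (1, 2) if k <= pile]

print(minimax_value(4, True, is_final, final_value, successors))
print(minimax_value(3, True, is_final, final_value, successors))
```

The recursion visits every node of the game tree, which illustrates why the definition is correct but non-operational for a game the size of checkers.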

Representing the Target Function

Target function can be represented in many ways:
– lookup table
– collection of rules
– numerical function
» Polynomial function of board features
– neural network.

There is a trade-off between the expressiveness of a representation and the ease of learning.

The more expressive a representation, the better it will be at approximating an arbitrary function; however, the more examples will be needed to learn an accurate function.
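
As one possible illustration of the "numerical function" option, the sketch below uses the simplest polynomial, a linear function of board features. The four features and the weight values are assumptions chosen for the example; the slides do not specify them.

```python
def v_hat(weights, features):
    """Linear evaluation: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical board features:
# (own pieces, opponent pieces, own kings, opponent kings)
features = (12, 11, 1, 0)
# Hypothetical weights: bias, then one weight per feature.
weights = [0.0, 1.0, -1.0, 3.0, -3.0]

score = v_hat(weights, features)
print(score)
```

Such a function has only five parameters to learn, so it needs few examples, but it cannot express interactions between features; a lookup table over all boards sits at the opposite end of the trade-off.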

Choosing a Learning Algorithm

For the learner to approximate V, we need to give it a set of training examples.
– Each training example is an ordered pair of the form:
< b, V_train(b) >.

The learner uses training values for the target function to induce a hypothesized definition that fits these examples and hopefully generalizes to unseen examples.

In statistics, learning to approximate a continuous function is called regression.

The learner attempts to minimize some measure of error (such as mean squared error).
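
One way to minimize squared error over pairs < b, V_train(b) > is a least-mean-squares (LMS) style gradient update on a linear hypothesis. The sketch below assumes that each board has already been turned into a feature tuple and uses made-up training values generated from V = 1 + 2*x1 - 3*x2; the learning rate and epoch count are also assumptions.

```python
def predict(weights, features):
    """Linear hypothesis: w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(weights, features, v_train, lr):
    """Move each weight a small step in the direction that reduces the error."""
    error = v_train - predict(weights, features)
    return [weights[0] + lr * error] + [
        w + lr * error * x for w, x in zip(weights[1:], features)
    ]


def mean_squared_error(weights, examples):
    return sum((v - predict(weights, f)) ** 2 for f, v in examples) / len(examples)


# Hypothetical training pairs (features of b, V_train(b)).
examples = [((1, 0), 3), ((0, 1), -2), ((1, 1), 0), ((2, 1), 2), ((0, 0), 1)]

weights = [0.0, 0.0, 0.0]
for _ in range(2000):
    for features, v_train in examples:
        weights = lms_update(weights, features, v_train, lr=0.05)

print(weights, mean_squared_error(weights, examples))
```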

Summary of Design Choices

Summary of Design Choices (cont.)

Evaluation of Learning Systems

Experimental
– Conduct controlled cross-validation experiments to compare various methods on a variety of benchmark datasets.
– Gather data on their performance, e.g. test accuracy, training-time, testing-time.
– Analyze differences for statistical significance.

Theoretical
– Analyze algorithms mathematically and prove theorems about their:
» Computational complexity
» Ability to fit training data
» Sample complexity (number of training examples needed to learn an accurate function)
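
The experimental side can be sketched as a small k-fold cross-validation loop that compares two methods on the same folds. Everything here is an assumption for illustration: the tiny one-dimensional dataset, the two toy learners, and the deterministic fold assignment by index. A real study would use benchmark datasets and a statistical significance test on the per-fold results.

```python
def k_fold_accuracy(train_fn, examples, k):
    """Mean test accuracy over k folds; example i goes to fold i % k."""
    accuracies = []
    for fold in range(k):
        test = [ex for i, ex in enumerate(examples) if i % k == fold]
        train = [ex for i, ex in enumerate(examples) if i % k != fold]
        predict = train_fn(train)
        correct = sum(predict(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


def train_majority(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def train_nearest_neighbor(train):
    """1-nearest-neighbor on a single numeric feature."""
    return lambda x: min(train, key=lambda ex: abs(ex[0] - x))[1]


# Hypothetical dataset: the label is True when x >= 5.
examples = [(x, x >= 5) for x in range(10)]

majority_acc = k_fold_accuracy(train_majority, examples, k=5)
nn_acc = k_fold_accuracy(train_nearest_neighbor, examples, k=5)
print(majority_acc, nn_acc)
```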

Some Issues in Machine Learning

What algorithms can approximate functions well (and when)?
How does number of training examples influence accuracy?
– [How well will it generalize given a training set of particular size?]
How does complexity of hypothesis representation impact it?
How does noisy data [noise in examples and labels] influence accuracy?
What are the theoretical limits of learnability?
How can prior knowledge of learner help?
– Choose set of candidate hypotheses and choose search procedure.
What evidence can we get from biological learning systems?
– E.g. artificial neural networks, evolutionary algorithms
How can systems alter their own representations?
– E.g. choosing from among different types of classifiers, learning to learn

Some Issues in Machine Learning

Learning can be viewed as using direct or indirect experience to approximate a chosen target function.

Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data.

Different learning methods assume different hypothesis spaces (representation languages) and/or employ different search techniques.
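
The "learning as search" view can be made concrete with a deliberately tiny case. In this sketch the hypothesis space is assumed to be the threshold rules "x >= t" for t in 0..10, and the search technique is exhaustive enumeration; both choices, and the training data, are illustrative assumptions.

```python
def training_errors(threshold, data):
    """Number of training examples the hypothesis 'x >= threshold' gets wrong."""
    return sum((x >= threshold) != y for x, y in data)


# Hypothetical training data: (feature value, label).
data = [(1, False), (3, False), (6, True), (8, True)]

# Hypothesis space: one threshold rule per integer t in 0..10.
thresholds = range(0, 11)

# Search: try every hypothesis and keep the first one with the fewest errors.
best_t = min(thresholds, key=lambda t: training_errors(t, data))
print(best_t, training_errors(best_t, data))
```

Several thresholds fit the data equally well here; which one the learner returns depends on the search procedure, not on the data.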

Various Function Representations

Numerical functions
– Linear regression
– Neural networks
– Support vector machines

Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic

Instance-based functions
– Nearest-neighbor
– Case-based

Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks

Various Search Algorithms

Gradient descent
– Perceptron
– Backpropagation

Dynamic Programming
– HMM Learning
– PCFG Learning

Divide and Conquer
– Decision tree induction
– Rule learning

Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution

History of Machine Learning

1950s:
– Samuel’s checker player
– Selfridge’s Pandemonium

1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron

1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM

History of Machine Learning (cont.)

1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology

1990s:
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning

History of Machine Learning (cont.)

2000s:
– Support vector machines
– Kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications
» Compilers, debugging, graphics
» Security (intrusion, virus, and worm detection)
– Email management
– Personalized assistants that learn
– Learning in robotics and vision

Technical Definition of M.L.

Machine Learning seeks to develop theories and computer systems for
– representing;
– classifying, clustering and recognizing;
– reasoning under uncertainty;
– predicting;
– and reacting to
– …
complex, real world data, based on the system's own experience with data, and (hopefully) under a unified model or mathematical framework, that
– can be formally characterized and analyzed
– can take into account human prior knowledge
– can generalize and adapt across data and domains
– can operate automatically and autonomously
– and can be interpreted and perceived by humans.

The Challenge

How can we build computer systems that automatically improve with experience, and what laws govern learning in general?

Statistics:
– What can be inferred from data plus a set of modeling assumptions, with what reliability?

Computer Science:
– How can we build computers to solve problems, and which problems are inherently tractable/intractable?

Animal & Human Learning:
– What mechanisms explain learning in animals, and what teaching strategies are most effective?

Machine Learning Research

Applications:
– Intelligent agents
– Text analysis
– Cell biology
– Marketing
– Brain imaging
– Robotics
– Counter-Terrorism
– Online tutoring systems
– Computer vision
– …

Core Issues:
– Transfer learning
– Learning from labeled and unlabeled data
– Graphical prob. models
– Privacy-preserving data mining
– Mixed-initiative learning
– Active learning
– Time series models
– Never-ending learning
– …

Major Technical Paradigms

Modern techniques
– Probabilistic graphical models
» Bayesian network, dynamic Bayesian network, Markov random fields …
– Kernel methods
» Support vector machines, kernel PCA, all kinds of kernel machines …
– Spectral graph analysis
» Normalized cuts, spectral clustering …
– Markov decision processes (MDPs) and POMDPs
» read Sutton & Barto's positioning book, check out Ng, Dietterich, Parr, Littman …
– Metric learning, manifold learning, embedding, source separation
» Too many to list
– Hierarchical Bayesian models, nonparametric Bayesian analysis
» Gaussian processes, Dirichlet processes
– Probabilistic relational models
» PRM, BLOG
– …

Resources: Datasets

UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/
WEKA:

Resources: Journals

Journal of Machine Learning Research (www.jmlr.org)
Machine Learning
Neural Computation
Neural Networks
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence
Annals of Statistics
Journal of the American Statistical Association
...

Resources: Conferences

International Conference on Machine Learning (ICML)
– ICML07: http://oregonstate.edu/conferences/icml2007/
European Conference on Machine Learning (ECML)
– ECML08: http://www.ecmlpkdd2008.org/
Neural Information Processing Systems (NIPS)
– NIPS05: http://nips.cc/
Uncertainty in Artificial Intelligence (UAI)
– UAI05: http://www.cs.toronto.edu/uai2005/
Computational Learning Theory (COLT)
– COLT05: http://learningtheory.org/colt2005/
International Joint Conference on Artificial Intelligence (IJCAI)
– IJCAI07: http://ijcai-07.org/
International Conference on Neural Networks (Europe)
– ICANN08: http://www.icann2008.org/
...
