General Information

Course Id: COSC6342 Machine Learning

Time: TU/TH 10a-11:30a

Instructor: Christoph F. Eick

Classroom: AH123

E-mail: [email protected]

Homepage: http://www2.cs.uh.edu/~ceick/


What is Machine Learning?

Machine Learning is the
• study of algorithms that
• improve their performance
• at some task
• with experience

Role of Statistics: Inference from a sample

Role of Computer Science: Efficient algorithms to
• Solve optimization problems
• Represent and evaluate the model for inference
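To make the definition concrete, here is a minimal sketch of "improving performance at some task with experience". The task is binary classification, performance is the number of mistakes per pass over the data, and experience is the number of passes. The perceptron update rule, the toy data set `DATA`, and the function name `train_perceptron` are illustrative assumptions chosen for this example; they are not taken from the course material.

```python
# Illustrative sketch only: a perceptron on a tiny, hand-made,
# linearly separable data set (hypothetical data, not from the course).

# Each example is ((x1, x2), label) with label +1 or -1.
DATA = [
    ((2, 2), +1),
    ((0, 0), -1),
    ((3, 1), +1),
    ((1, 0), -1),
    ((3, 3), +1),
    ((0, 1), -1),
]


def train_perceptron(data, epochs):
    """Train a perceptron and return the number of mistakes made in each epoch."""
    w = [0.0, 0.0]  # weights, start with no knowledge
    b = 0.0         # bias
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in data:
            score = w[0] * x1 + w[1] * x2 + b
            prediction = 1 if score > 0 else -1
            if prediction != y:
                # Learn from the mistake: move the boundary toward the example.
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
    return mistakes_per_epoch


# Performance (mistakes) improves with experience (epochs).
print(train_perceptron(DATA, epochs=3))  # prints [4, 0, 0]
```

On this toy data the learner makes four mistakes in the first pass and none afterwards, which is the sense of "improve with experience" in the definition above.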


Applications of Machine Learning

Supervised Learning
• Classification
• Prediction

Unsupervised Learning
• Association Analysis
• Clustering

Preprocessing and Summarization of Data

Reinforcement Learning

Activities Related to Models
• Learning parameters of models
• Choosing/Comparing models
• …

Prerequisites

Background

Probabilities
• Distributions, densities, marginalization…

Basic statistics
• Moments, typical distributions, regression

Basic knowledge of optimization techniques

Algorithms
• basic data structures, complexity…

Programming skills

We provide some background, but the class will be fast paced

Ability to deal with “abstract mathematical concepts”

Textbooks

Textbook: Ethem Alpaydin, Introduction to Machine Learning, MIT Press, 2004.

Recommended Textbooks:

1. Christopher M. Bishop, Pattern Recognition and Machine Learning, 2006.

2. Tom Mitchell, Machine Learning, McGraw-Hill, 1997.

Grading

3 Exams 67-70%
Project 18-24%
Homeworks 10-15%
Attendance 1-2%

NOTE: PLAGIARISM IS NOT TOLERATED.

Remark: Weights are subject to change

Topics Covered in 2009 (Based on Alpaydin)

Topic 1: Introduction
Topic 2: Supervised Learning
Topic 3: Bayesian Decision Theory (excluding Belief Networks)
Topic 4: Using Curve Fitting as an Example to Discuss Major Issues in ML
Topic 5: Parametric Model Selection
Topic 6: Dimensionality Reduction Centering on PCA
Topic 7: Clustering1: Mixture Models, K-Means and EM
Topic 8: Non-Parametric Methods Centering on kNN and Density Estimation
Topic 9: Clustering2: Density-based Approaches
Topic 10: Decision Trees
Topic 11: Comparing Classifiers
Topic 12: Combining Multiple Learners
Topic 13: Linear Discrimination
Topic 14: More on Kernel Methods
Topic 15: Naive Bayes' and Belief Networks
Topic 16: Hidden Markov Models
Topic 17: Sampling
Topic 18: Reinforcement Learning
Topic 19: Neural Networks
Topic 20: Computational Learning Theory

Remark: Topics 14, 16, 17, 19, and 20 will likely be covered only briefly or skipped due to lack of time.

Course Project

The project will center on the application of machine learning techniques to a challenging problem. It will be conducted in the window Feb. 12-April 11.

You can either conduct some novel experiments by applying machine learning algorithm(s) to a challenging machine learning task or attempt a theoretical analysis.

Findings of the project will be summarized in a report and in a brief presentation. The report must include a short survey of related work with the corresponding list of references.

Tentative ML Spring 2009 Schedule

Week Topic

Jan 20 Introduction

Jan 27 Supervised Learning/Bayesian Decision Theory

Feb. 3 Curve Fitting/Model Estimation: Parametric Approaches

Feb. 10 Model Estimation: Parametric Approaches

Feb. 17 Parametric Approaches/Clustering1

Feb. 24 Clustering1/Non-param Methods

March 3 Non-Param Methods/Exam1

March 10 Clustering2/Dim. Reduction, Decision Trees

March 24 Dim. Reduction; Decision Trees/Exam2

March 31 SVMs/Kernel Methods; Ensemble Methods

April 7 Comparing Classifiers/Group1 Presentations

April 14 Group2 Presentations/TBDL

April 21 Reinforcement Learning/possibly Belief Networks

April 28 Review/Exam3

March 31, 2009

Course Elements

Total: 25-26 classes
• 18 lectures
• 2-3 classes for review and discussing homework problems
• 2 classes will be allocated for student presentations
• 3 exams
• homeworks
    • individually graded
    • group graded
    • not graded (solutions will be discussed in lecture 7-9 days later).

Dates to Remember

Dates to remember Events

March 5, March 26, April 30 Exams

April 9 and 14 Student Project Presentations

March 17/19 No class (Spring Break)

April 13 (Group1)/April 15 (Group2), 11p Submit Project Report/Software/…

Exams

Will be open notes/textbook

Will get a review list before the exam

Exams will center (80% or more) on material that was covered in the lecture

There will be a review prior to the second and third exam; first exam will mostly center on basics.

Exam scores will be immediately converted into number grades

No sample exams; sorry I haven’t taught this course for a long time…

Other UH-CS Courses with Overlapping Contents

1. COSC 6368: Artificial Intelligence
Strong Overlap: Decision Trees, Bayesian Belief Networks
Medium Overlap: Reinforcement Learning

2. COSC 6335: Data Mining
Strong Overlap: Decision trees, SVM, kNN, Density-based Clustering
Medium Overlap: K-means, Decision Trees, Preprocessing/Exploratory DA, AdaBoost

3. COSC 6343: Pattern Classification
Medium Overlap: all classification algorithms, feature selection; discusses those topics from a different perspective.