

Computing & Information Sciences, Kansas State University

CIS 690: Data Mining in Mobile and Cloud Computing Environments

William H. Hsu, Computing and Information Sciences

Shih-Hsiung Chou, Industrial and Manufacturing Systems Engineering

Kansas State University

KSOL course page: http://bit.ly/a68KuL

Course web site: http://www.kddresearch.org/Courses/CIS690

Instructor home page: http://www.cis.ksu.edu/~bhsu

Reading for Next Class:

Syllabus and Introductory Handouts

Instructions for Labs 0 – 1

Han & Kamber 2e, Sections 1.1 – 1.4.3 (pp. 1 – 25), 6.1 (pp. 285 – 289)

Data Mining in Mobile and Cloud Computing Environments:

Course Organization and Survey

Lecture 0 of 27: Part A – Course Organization


Course Administration

Course Page (KSOL): http://bit.ly/a68KuL

Class Web Page: www.kddresearch.org/Courses/CIS690

Instructional E-Mail Addresses – Best Way to Reach Instructor
– [email protected] (always use this to reach instructor and TA)
– [email protected]

Instructor: William Hsu, Nichols 324C
– Office phone: +1 785 532 7905; home phone: +1 785 539 7180
– IM: AIM/MSN/YIM hsuwh/rizanabsith, ICQ 28651394/191317559, Google banazir
– Office hours: after class Mon/Wed/Fri; other times by appointment

Graduate Teaching Assistant: To Be Announced
– Office location: Nichols 124 (CIS Visualization Lab) & Nichols 218
– Office hours: to be announced on class web board

Grading Policy: Overview
– Midterm exam: 15%
– Homework: 15%
– Term project: 50%
– Labs: 20% (1% each; see calendar)


Course Policies

Letter Grades
– 15% graduations (85+%: A, 70+%: B, etc.)
– Cutoffs may be more lenient, but a) never higher and b) seldom much lower

Grading Policy
– Exams: midterm (in-class, open-book/notes): 15%
– Homework: 15% (2 written, 2 programming, 2 mixed; drop lowest 2, 3% each)
– Term project (including proposal, interim, final reports): 50%
– Labs (upload solutions to K-State On-Line file dropbox): 20%

Late Homework Policy
– Allowed only in case of medical excusal
– All other late homework: see drop policy

Attendance Policy
– Absence due to travel or personal reasons: e-mail CIS690TA-L in advance
– See instructor, Office of the Dean of Student Life as needed

Honor System Policy: http://www.ksu.edu/honor/
– On plagiarism: cite sources, use quotes if verbatim, includes textbooks
– OK to discuss work, but turn in your own work only
– When in doubt, ask instructor
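
As a rough illustration of how the grading weights and letter-grade cutoffs above combine, the following Python sketch computes a weighted course percentage and maps it to a letter. The weights and the A and B cutoffs come from the policy; the C and D cutoffs (55 and 40) are an assumption extrapolated from the 15% steps, and the function names and sample scores are hypothetical.

```python
# Weights taken from the grading policy above.
WEIGHTS = {"midterm": 0.15, "homework": 0.15, "project": 0.50, "labs": 0.20}

# A and B cutoffs are stated in the policy; C and D are ASSUMED by
# continuing the 15% steps ("etc." in the original).
CUTOFFS = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]


def final_percentage(scores):
    """Combine component scores (each 0-100) into a weighted course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a course percentage to a letter using the nominal cutoffs."""
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return "F"


# Hypothetical component scores for one student.
example = {"midterm": 80, "homework": 90, "project": 88, "labs": 95}
total = final_percentage(example)
print(round(total, 1), letter_grade(total))
```

Because the policy says actual cutoffs may be more lenient than the nominal ones, this gives only a lower bound on the letter grade.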


Class Resources

Course Content Management System (CMS)
– http://www.kddresearch.org/Courses/CIS690
– Lecture notes (MS PowerPoint 97-2010, PDF)
– Homeworks (MS Word 97-2010, PDF)
– Exam and homework solutions (MS PowerPoint 97-2010, PDF)
– Class announcements (students’ responsibility) and grade postings

Course Notes Online and at Copy Center (Required)

Mailing List (Automatic): [email protected]
– Homework/exams (before uploading to CMS, KSOL), sample data, solutions
– Class participation
– Project info, course calendar reminders
– Dated research announcements (seminars, conferences, calls for papers)

LISTSERV Web Archive
– http://listserv.ksu.edu/archives/cis690-l.html
– Stores e-mails to class mailing list as browsable/searchable posts


Textbook and Recommended References

Recommended Text

Witten, I. H. & Frank, E. (2006). Data Mining: Practical Machine Learning Tools and Techniques, second edition. San Francisco, CA, USA: Morgan Kaufmann.

Other References [on Reserve in Main or CIS Library]

Han, J. & Kamber, M. (2006). Data Mining: Concepts and Techniques, second edition. San Francisco, CA, USA: Morgan Kaufmann.

Mitchell, T. M. (1997). Machine Learning. New York, NY, USA: McGraw-Hill.

Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Reading, MA, USA: Addison-Wesley.

[Book cover images: Mitchell (1997); Witten & Frank 2e; Tan et al. (2006); 1st edition (outdated); Han & Kamber 2nd edition]


Background Expected

Both Courses
– Proficiency in high-level programming language (C++/C#, Java, Python, etc.)
– Required: course in data structures
– Recommended: discrete mathematics, probability
– At least 80 hours for semester (up to 120 depending on term project)
– Textbook: Data Mining: Concepts and Techniques, 2e, Han & Kamber (2006)
– Reserve texts: Mitchell’s Machine Learning, several other outside references

CIS 690 Data Mining in Mobile and Cloud Computing Environments
– Fresh background in symbolic logic, discrete math (sets, relations, counting)
– Some background assumed in linear algebra, calculus
– New topics: classification/regression, association, optimization, clustering
– “Mathematical maturity”: ready to learn more

CIS 798 Topics in Computer Science
– Recommended: two programming courses
– Read up on heuristic search, games, constraints, knowledge representation
– AI programming experience helps (background lectures as needed)
– Watch advanced topics lectures; see list before choosing project topic


Syllabus [1]: First Half of Course


Syllabus [2]: Second Half of Course


Math Background To Be Covered

Basics: First Two Weeks (Hours 2 – 9 of Course)
– Review of mathematical foundations: set theory, discrete math, probability
– Types of machine learning algorithms
– Combinatorial analysis: mappings and counting
– Bayesian classification

Bayesian Inference
– Hour 3: association rules, statistical evaluation
– Hours 6 – 10: Naïve Bayes, classification in R
– Hours 15 – 18: clustering, Expectation-Maximization (EM)

Other Math Topics to be Covered
– Information theory: decision tree induction, rule induction
– Basic statistical hypothesis testing
– Frequent itemsets: association rule mining
– Convex optimization: constraints, linear and quadratic programming (QP)
– Distance measures: clustering
– Logic: propositional, first-order, resolution
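
To give a flavor of the information-theory background listed for decision tree induction, here is a minimal Python sketch of Shannon entropy and information gain, the quantities commonly used to choose a split attribute. It is an illustration only, not the course's lab code; the function names and the toy data set are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical toy data: does a user open a notification?
rows = [
    {"network": "wifi", "time": "day"},
    {"network": "wifi", "time": "night"},
    {"network": "cell", "time": "day"},
    {"network": "cell", "time": "night"},
]
labels = ["yes", "yes", "no", "no"]

print(information_gain(rows, labels, "network"))  # attribute separates the classes
print(information_gain(rows, labels, "time"))     # attribute carries no information
```

A decision tree learner built on this criterion would split on `network` first, since it yields the larger gain on this toy data.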


Computing Platform: Mobile/Cloud Environments

Android

Operating system: modified Linux

For mobile devices (Motorola Droid, HTC Incredible, etc.)

Android, Inc. & Open Handset Alliance

Software development kit: download from http://developer.android.com/sdk/

Software Environment for the Advancement of Scholarly Research

Originally developed for compute clusters

Adapted for cloud computing environments

SEASR – overall environment: http://seasr.org

Meandre – data mining flows: http://seasr.org/meandre/

© 2005 – present, National Center for Supercomputing Applications (NCSA)


Computing Platform: Data Mining Software

Waikato Environment for Knowledge Analysis (WEKA)

Data mining package

Among the most popular machine learning and data mining packages at the time of this course

Download from http://www.cs.waikato.ac.nz/ml/weka/

R Interpreter

R: popular programming language for computational statistics

Used for data mining implementations

Comprehensive R Archive Network (CRAN): http://cran.r-project.org

Apache Hadoop

Java software framework

Data-intensive distributed applications

Inspired by Google MapReduce and Google File System (GFS)
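
The MapReduce model that inspired Hadoop can be sketched in a few lines of plain Python. The example below is a single-process illustration of the map, shuffle, and reduce phases for word counting; it does not use the Hadoop API, and all function names are hypothetical. A real Hadoop job would distribute these phases across a cluster and read its input from a distributed file system.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["data mining in the cloud", "mining mobile data"])
print(counts["data"], counts["mining"], counts["cloud"])  # prints: 2 2 1
```

Because each mapper call and each reducer call depends only on its own input, the framework is free to run them in parallel on different machines, which is what makes the model suit data-intensive distributed applications.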


About Project Proposals

Proposals

About 1-2 pages; due at end of second week of course, one revision allowed

Team projects: up to 2 people

Contents: at least one paragraph on each of

– 1. Problem statement: describe task, objectives, purpose

– 2. Background: survey related work and applicable approaches

– 3. Methodology: describe planned approach

– 4. Evaluation criteria: how will performance be assessed?

– 5. Milestones: what will be done, when?

Post Questions and Drafts to Class Mailing List