29
Algorithms for Data Science Anil Maheshwari [email protected] School of Computer Science Carleton University 1

Algorithms for Data Science - people.scs.carleton.ca

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Algorithms for Data Science - people.scs.carleton.ca

Algorithms for Data Science

Anil Maheshwari

[email protected] of Computer ScienceCarleton University

1

Page 2: Algorithms for Data Science - people.scs.carleton.ca

Outline

Topics

Required Background

Blended Teaching

Course Evaluation

Help Me

2

Page 3: Algorithms for Data Science - people.scs.carleton.ca

Topics

Page 4: Algorithms for Data Science - people.scs.carleton.ca

Course Title

Algorithms for Data Science

Input Algorithm Output01000011101000010111001001101010 ⇒ ?01100101011101000110111101101110

3

Page 5: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 6: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 7: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 8: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 9: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 10: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 11: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 12: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 13: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 14: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 15: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 16: Algorithms for Data Science - people.scs.carleton.ca

Topics

1. Probability Basics: Linearity of Expectation, Concentration Bounds, Ballsand Bins, Hashing.

2. Data Streaming - CMS, Estimating Frequency Moments, What isTrending?

3. Locality-Sensitive Hashing

4. Geometric Approximation (Nearest Neighbors, Core Sets)

5. Dimensionality Reduction

6. Online Algorithms: Bipartite Matching, Adwords

7. Regret Minimization (Multiplicative Weight Algorithm)

8. Graph Partitioning

9. Linear Algebra: Eigenvalues + SVD

10. Recommendation Systems

11. Randomized Linear Algebra

12. . . .

4

Page 17: Algorithms for Data Science - people.scs.carleton.ca

Our Focus

Warnings: First Offering - Online - Plans May Derail!Office may become inaccessible

We will try to understand

• Algorithmic Techniques

• Key Ideas

• Learn bit of Math

• High-level details

5

Page 18: Algorithms for Data Science - people.scs.carleton.ca

Required Background

Page 19: Algorithms for Data Science - people.scs.carleton.ca

Background

• Basic Data StructuresLists, Stacks, BST, Hashing, Searching and Sorting

See Pat Morin’s https://opendatastructures.org/

• Basic Probability TheoryExpectation, Variance, Concentration Bounds

Michiel Smid’s COMP 2804 Notes http://cglab.ca/~michiel/DiscreteStructures/

Joe Blitzstein https://projects.iq.harvard.edu/stat110/youtube

• Linear AlgebraEigenvalues and Eigenvectors, Null Space, Positive Definitive Matrices, QΛQT , SVD

Gilbert Strang

https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/

• Algorithmic TechniquesPseudocode, Correctness, Analysis (Recurrences + O,Ω-notation), Analysis of (randomized) Quicksort. Algorithms for

Connectivity, Strong Connectivity, MST, SSSP, Graph Traversal. Techniques: D& C, Greedy, Dynamic Programming.

6

Page 20: Algorithms for Data Science - people.scs.carleton.ca

Blended Teaching

Page 21: Algorithms for Data Science - people.scs.carleton.ca

Online Course

• Lectures (with recordings) on Tuesdays & Fridays from 06:30 AM EST.

• Timing may change when Day Light Saving Hours apply in March.

• E-mail: [email protected]

• Post queries during the lectures using the zoom chat.

7

Page 22: Algorithms for Data Science - people.scs.carleton.ca

References

Useful References

1. Foundations of Data Science by Blum, Hopcroft and Kannan.https://www.cs.cornell.edu/jeh/book.pdf

2. Mining of Massive Data Sets http://www.mmds.org/

3. Notes on Algorithm Design on my web-pagehttp://people.scs.carleton.ca/~maheshwa/

4. Numerous research articles.

5. Pointers to useful Videos.

8

Page 23: Algorithms for Data Science - people.scs.carleton.ca

Course Evaluation

Page 24: Algorithms for Data Science - people.scs.carleton.ca

Evaluation Parameters

Principles

• Open-ended course by design

• Contents evolve

• Enjoyable learning experience

• Use zoom chat effectively

Evaluation Components

• Assignments

• Programming Assignments

• Tests

• On the spot quizzes!

9

Page 25: Algorithms for Data Science - people.scs.carleton.ca

Help Me

Page 26: Algorithms for Data Science - people.scs.carleton.ca

PLEASE, PLEASE, PLEASE, PLEASE

1. Post your questions on Zoom

2. Post useful links/references/ideas

3. Take initiative to learn

4. E-mail now if you want to learn some specific algorithmic technique(s)

5. Seek Help, Have Fun & Smile

10

Page 27: Algorithms for Data Science - people.scs.carleton.ca

Season - SUMMER

11

Page 28: Algorithms for Data Science - people.scs.carleton.ca

Season - FALL

12

Page 29: Algorithms for Data Science - people.scs.carleton.ca

Season - WINTER

13