Large-Scale Machine Learning - New York University

Published on 23-Jun-2018

<ul><li><p>Yann LeCun</p><p>Large-Scale Machine Learning</p><p>John Langford (Microsoft Research), Yann LeCun (Courant Institute)</p></li>
<li><p>What is Data Science?</p><p>Data Science: automatically extracting knowledge from data, drawing on Mathematics &amp; Statistics, Machine Learning, and Domain Expertise</p><p>Applications in Business: lots and lots</p><p>Applications in the Sciences: Astronomy, Cosmology; High-energy Physics; Biology, Genomics; Neuroscience; the Social Sciences</p><p>Medicine</p><p>Government</p><p>[Figure: Venn diagram after Drew Conway's Data Science Venn Diagram. Circles: Mathematics &amp; Statistics, Computation, Domain Expertise. Region labels: Machine Learning, Conventional Research, Danger Zone!, Data Science.]</p></li>
<li><p>Large-Scale Machine Learning</p><p>Class website: http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:start</p><p>Forum, discussion, Q&amp;A on Piazza: https://piazza.com/class#spring2013/csciga3033002</p><p>Evaluation: programming assignments, project, final exam</p><p>Computing infrastructure: 100-node cluster, 8 CPUs/node, Hadoop (donated by Yahoo! Labs)</p><p>Software: Torch (http://www.torch.ch/), Vowpal Wabbit (https://github.com/JohnLangford/vowpal_wabbit/wiki)</p></li>
<li><p>Big Data?</p><p>Data often comes in the form of a table:</p><p>N: dimension of each vector (possibly very sparse)</p><p>T: number of training samples (possibly infinite)</p><p>Big Data is large T, or large N, or both:</p><p>Large T, small N: great!</p><p>Infinite T, small N: online / streaming</p><p>Small T, large N: hell!</p><p>Problems: (distributed) data storage and access; can't use algorithms that are super-linear in T; large N: overfitting; parallelizing; dealing with unbalanced sets; representing high-dimensional data</p><p>[Figure: the data table, with dimensions labeled N and T]</p></li>
<li><p>Syllabus</p><p>Intro</p><p>Online Linear learning</p><p>2nd-order optimization methods</p><p>L-BFGS</p><p>Online Non-linear learning</p><p>Boosted Decision Trees</p><p>Hadoop, Allreduce</p><p>Parallel learning, OpenMP, CUDA</p><p>Inverted Indices &amp; Predictive Indexing</p><p>Hashing, LSH, linear/non-linear dimensionality reduction</p><p>Feature Learning, deep learning</p><p>Many Classes</p><p>Active Learning</p><p>Exploration and Learning</p></li></ul>
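The "Big Data?" slide's streaming case (infinite T) combined with very sparse, high-dimensional vectors can be illustrated with a small sketch: logistic regression trained by single-pass stochastic gradient descent, reading one example at a time and folding sparse string features into a fixed-size weight vector with the hashing trick (the combination Vowpal Wabbit is known for). Time is linear in T, and memory is fixed by the table size rather than by the raw dimension N. Everything here is an illustrative assumption, not code from the course or from Vowpal Wabbit: the hypothetical names (`hash_features`, `OnlineLogisticRegression`, `train_stream`, `toy_stream`), the 2**18 buckets, the CRC32 hash, the learning rate of 0.5, and the toy data.

```python
import math
import zlib
from typing import Dict, Iterator, List, Tuple

# Hypothetical table size: memory stays fixed at 2**18 weights no matter how
# many distinct raw features (the raw dimension N) appear in the stream.
NUM_BUCKETS = 2 ** 18


def hash_features(features: Dict[str, float],
                  num_buckets: int = NUM_BUCKETS) -> Dict[int, float]:
    """Hashing trick: map named sparse features to indices of a fixed-size vector."""
    hashed: Dict[int, float] = {}
    for name, value in features.items():
        # CRC32 is deterministic across runs, unlike Python's salted hash().
        index = zlib.crc32(name.encode("utf-8")) % num_buckets
        # Colliding features simply add into the same bucket.
        hashed[index] = hashed.get(index, 0.0) + value
    return hashed


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticRegression:
    """Logistic regression trained by plain SGD, one example at a time."""

    def __init__(self, num_buckets: int = NUM_BUCKETS,
                 learning_rate: float = 0.5) -> None:
        self.num_buckets = num_buckets
        self.learning_rate = learning_rate
        self.weights: List[float] = [0.0] * num_buckets
        self.bias = 0.0

    def predict_proba(self, x: Dict[int, float]) -> float:
        # Cost is proportional to the number of non-zero features, not to N.
        z = self.bias + sum(self.weights[i] * v for i, v in x.items())
        return sigmoid(z)

    def update(self, x: Dict[int, float], y: int) -> float:
        """One SGD step on example (x, y), y in {0, 1}; returns the loss before the step."""
        p = self.predict_proba(x)
        p_clipped = min(max(p, 1e-12), 1.0 - 1e-12)
        loss = -math.log(p_clipped) if y == 1 else -math.log(1.0 - p_clipped)
        grad = p - y  # derivative of the log loss with respect to z
        for i, v in x.items():
            self.weights[i] -= self.learning_rate * grad * v
        self.bias -= self.learning_rate * grad
        return loss


def train_stream(model: OnlineLogisticRegression,
                 stream: Iterator[Tuple[Dict[str, float], int]]) -> float:
    """Single pass over the stream (time linear in T); returns the average progressive loss."""
    total, count = 0.0, 0
    for features, label in stream:
        total += model.update(hash_features(features, model.num_buckets), label)
        count += 1
    return total / max(count, 1)


def toy_stream(num_examples: int) -> Iterator[Tuple[Dict[str, float], int]]:
    """Hypothetical data: label 1 iff the token is "good"; every example also
    carries a never-seen-before id feature, so the raw dimension grows with T."""
    for t in range(num_examples):
        token, label = ("good", 1) if t % 2 == 0 else ("bad", 0)
        yield {f"word={token}": 1.0, f"id={t}": 1.0}, label


if __name__ == "__main__":
    model = OnlineLogisticRegression()
    avg_loss = train_stream(model, toy_stream(10_000))
    p_good = model.predict_proba(hash_features({"word=good": 1.0}))
    p_bad = model.predict_proba(hash_features({"word=bad": 1.0}))
    print(f"average progressive loss: {avg_loss:.4f}")
    print(f"P(y=1 | good) = {p_good:.3f}, P(y=1 | bad) = {p_bad:.3f}")
```

Each `id=...` feature in the toy stream is new, so the raw dimension keeps growing with T, yet the model never allocates more than `NUM_BUCKETS` weights; the price is that colliding features share a weight. The reported average is a progressive loss: each example is scored before the model updates on it, so no separate pass over the data is needed.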