Parallel Algorithms for Mining Large-Scale Data

Edward Chang, Director, Google Research, Beijing
http://infolab.stanford.edu/~echang/
EMMDS 2009




  • [Photos: statues of two bookkeepers at the British Museum; see "Story of the Photos" below]

  • Story of the Photos

    The photos on the first pages are statues of two bookkeepers displayed at the British Museum. One bookkeeper keeps a list of good people, and the other a list of bad. (Who is who, can you tell?)

    When I first visited the museum in 1998, I did not take a photo of them, to conserve film. During this trip (June 2009), capacity is no longer a concern or constraint. In fact, one can see kids and grandmas all taking photos, a lot of photos. Ten years apart, data volume explodes. Data complexity also grows. So, can these ancient bookkeepers still classify good from bad? Is their capacity scalable to the data dimension and data quantity?


  • Outline

    Motivating Applications: Q&A System, Social Ads

    Key Subroutines:
    - Frequent Itemset Mining [ACM RS 08]
    - Latent Dirichlet Allocation [WWW 09, AAIM 09]
    - Clustering [ECML 08]
    - UserRank [Google TR 09]
    - Support Vector Machines [NIPS 07]

    Distributed Computing Perspectives

  • Query: What are must-see attractions at Yellowstone

  • Query: What are must-see attractions at Yellowstone

  • Query: Must-see attractions at Yosemite

  • Query: Must-see attractions at Copenhagen

  • Query: Must-see attractions at Beijing (hotel ads are shown with the results)

  • Query: Who is Yao Ming

  • Q&A: Yao Ming

  • Who is Yao Ming: Yao Ming related Q&As

  • Application: Google Q&A (Confucius), launched in China, Thailand, and Russia

    [Diagram: Search (S) and Community (C) connected through Differentiated Content (DC) and users (U); CCS/Q&A triggers a discussion/question session during search]

    - Provide labels to a question (semi-automatically)
    - Given a question, find similar questions and their answers (automatically)
    - Evaluate the quality of an answer: relevance and originality
    - Evaluate user credentials in a topic-sensitive way
    - Route questions to experts
    - Provide the most relevant, high-quality content for Search to index

  • Q&A Uses Machine Learning

    - Label suggestion using ML algorithms
    - Real-time topic-to-topic (T2T) recommendation using ML algorithms: gives out related high-quality links to previous questions before human answers appear

  • Collaborative Filtering

    [Figure: a users x books/photos matrix; observed memberships are marked 1]

    Based on membership so far, and memberships of others, predict further membership.

  • Collaborative Filtering

    [Figure: the same users x books/photos matrix, now partially observed; unobserved cells are marked ?]

    Based on the partially observed matrix, predict the unobserved entries (a sketch follows this list):
    I. Will user i enjoy photo j?
    II. Will user i be interesting to user j?
    III. Will photo i be related to photo j?
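    One standard way to answer question I is to fit a low-rank factorization to the observed cells and read predictions off the reconstruction. The deck does not prescribe a particular method at this point, so the following is a minimal sketch under that assumption; the toy matrix R is hypothetical.

```python
import numpy as np

# Hypothetical toy matrix: rows are users, columns are books/photos.
# 1 = member, 0 = known non-member, np.nan = the "?" cells to predict.
R = np.array([[1., 1., np.nan, 0.],
              [1., np.nan, 1., 0.],
              [0., 1., 1., np.nan],
              [np.nan, 0., 1., 1.]])

def factorize(R, k=2, steps=2000, lr=0.05, reg=0.02, seed=0):
    """Fit a rank-k factorization R ~= U @ V.T on the observed entries
    by stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(0.0, 0.1, (n, k))
    V = rng.normal(0.0, 0.1, (m, k))
    observed = [(i, j) for i in range(n) for j in range(m)
                if not np.isnan(R[i, j])]
    for _ in range(steps):
        for i, j in observed:
            err = R[i, j] - U[i] @ V[j]
            ui = U[i].copy()                     # use the pre-update row
            U[i] += lr * (err * V[j] - reg * ui)
            V[j] += lr * (err * ui - reg * V[j])
    return U, V

U, V = factorize(R)
print(np.round(U @ V.T, 2))  # reconstructed scores fill in the "?" cells
```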

  • FIM-Based Recommendation

  • FIM Preliminaries

    Observation 1: If an item A is not frequent, any pattern containing A won't be frequent [R. Agrawal]. Use a threshold to eliminate infrequent items: {A} → {A, B}

    Observation 2: Patterns containing A are subsets of (or found from) transactions containing A [J. Han]. Divide and conquer: select the transactions containing A to form a conditional database (CDB), and find the patterns containing A from that conditional database. Example: {A, B}, {A, C}, {A} → CDB_A; {A, B}, {B, C} → CDB_B

    Observation 3: To prevent the same pattern from being found in multiple CDBs, all itemsets are sorted in the same manner (e.g., by descending support).

  • Preprocessing

    According to Observation 1, we count the support of each item by scanning the database, and eliminate the infrequent items from the transactions. According to Observation 3, we sort the items in each transaction by descending support value. (A code sketch follows the example.)

    Item supports: f: 4, c: 4, a: 3, b: 3, m: 3, p: 3, o: 2, d: 1, e: 1, g: 1, h: 1, i: 1, k: 1, l: 1, n: 1

    Transactions (original → sorted, with infrequent items removed):
    f a c d g i m p → f c a m p
    a b c f l m o → f c a b m
    b f h j o → f b
    b c k s p → c b p
    a f c e l p m n → f c a m p
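    A minimal single-machine sketch of this preprocessing step. The database and the minimum support of 3 come from the example above; note that ties here are broken lexicographically, so the f/c tie can come out in the opposite order from the deck's figure.

```python
from collections import Counter

# The deck's example database; a minimum support of 3 is implied by the counts.
transactions = [
    ["f", "a", "c", "d", "g", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
min_support = 3

# Observation 1: one database scan counts supports; infrequent items can
# never appear in a frequent pattern, so they are dropped.
support = Counter(item for t in transactions for item in t)
frequent = {item for item, c in support.items() if c >= min_support}

# Observation 3: every transaction is sorted the same way, by descending
# support (lexicographic tie-break).
key = lambda item: (-support[item], item)
sorted_db = [sorted((i for i in t if i in frequent), key=key)
             for t in transactions]
print(sorted_db)
# [['c','f','a','m','p'], ['c','f','a','b','m'], ['f','b'],
#  ['c','b','p'], ['c','f','a','m','p']]
```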

  • Parallel Projection

    According to Observation 2, we construct the CDB of item A; then, from this CDB, we find the patterns containing A.

    How to construct the CDB of A? If a transaction contains A, this transaction should appear in the CDB of A. Given a transaction {B, A, C}, it should appear in the CDB of A, the CDB of B, and the CDB of C.

    Dedup solution: use the order of items. Sort {B, A, C} by the order of items, then put the result into the CDB of A, the CDB of B, and the CDB of C. (A sketch follows.)
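    A minimal sketch of the projection rule: for each item in a sorted transaction, the conditional transaction is the prefix preceding that item, which is what prevents the same pattern from being found in two CDBs.

```python
def project(sorted_transaction):
    """Emit one (item, conditional transaction) pair per item: the value is
    the prefix preceding the item, so each pattern is discovered in exactly
    one CDB (the dedup-by-item-order trick from the slide)."""
    for k, item in enumerate(sorted_transaction):
        yield item, sorted_transaction[:k]

# The slide's {B, A, C} example, sorted, lands in the CDBs of A, B, and C:
print(list(project(["A", "B", "C"])))
# -> [('A', []), ('B', ['A']), ('C', ['A', 'B'])]
```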

  • Example of Projection

    Example of projection of a database into CDBs. Left: sorted transactions, in the order f, c, a, b, m, p; right: conditional databases of the frequent items.

    f c a m p        p: { f c a m / f c a m / c b }
    f c a b m        m: { f c a / f c a / f c a b }
    f b              b: { f c a / f / c }
    c b p            a: { f c / f c / f c }
    f c a m p        c: { f / f / f }


  • Recursive Projections [H. Li, et al., ACM RS 08]

    Recursive projection forms a search tree; each node is a CDB. Using the order of items prevents duplicated CDBs.

    [Diagram: the root D projects to D|a, D|b, D|c (MapReduce iteration 1); D|a projects to D|ab, D|ac and D|b to D|bc (iteration 2); D|ab projects to D|abc (iteration 3)]

    Each level of breadth-first search of the tree can be done by one MapReduce iteration. Once a CDB is small enough to fit in memory, we can invoke FP-growth to mine this CDB, and the subtree grows no further.

  • Projection using MapReduce

    Pipeline: Map inputs (transactions) → sorted transactions (with infrequent items eliminated) → Map outputs (key: item, value: conditional transaction) → Reduce inputs (key: item, value: conditional database) → Reduce outputs (patterns and supports). A single-process sketch follows the example.

    Map phase (per sorted transaction, emit each item with the prefix preceding it):
    f a c d g i m p → f c a m p → p: f c a m; m: f c a; a: f c; c: f
    a b c f l m o → f c a b m → m: f c a b; b: f c a; a: f c; c: f
    b f h j o → f b → b: f
    b c k s p → c b p → p: c b; b: c
    a f c e l p m n → f c a m p → p: f c a m; m: f c a; a: f c; c: f

    Reduce phase (per item, mine its conditional database):
    p: { f c a m / f c a m / c b } → p: 3, p c: 3
    m: { f c a / f c a / f c a b } → m: 3, m f: 3, m c: 3, m a: 3, m f c: 3, m f a: 3, m c a: 3, m f c a: 3
    b: { f c a / f / c } → b: 3
    a: { f c / f c / f c } → a: 3, a f: 3, a c: 3, a f c: 3
    c: { f / f / f } → c: 3, c f: 3
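    A minimal single-process sketch of the two phases. In the real system the map and reduce functions are sharded across machines by the MapReduce runtime; here the shuffle is simulated with a dictionary.

```python
from collections import defaultdict

def fim_map(sorted_db):
    """Map phase: for each sorted transaction, emit (item, prefix) pairs,
    e.g. 'f c a m p' emits p: f c a m; m: f c a; a: f c; c: f."""
    for t in sorted_db:
        for k, item in enumerate(t):
            if k > 0:  # the first item has an empty prefix, nothing to emit
                yield item, t[:k]

def fim_reduce(pairs):
    """Shuffle groups the pairs by item into conditional databases; each
    reducer would then run FP-growth on its CDB. Here we just materialize
    the CDBs."""
    cdbs = defaultdict(list)
    for item, prefix in pairs:
        cdbs[item].append(prefix)
    return dict(cdbs)

sorted_db = [["f","c","a","m","p"], ["f","c","a","b","m"], ["f","b"],
             ["c","b","p"], ["f","c","a","m","p"]]
cdbs = fim_reduce(fim_map(sorted_db))
print(cdbs["p"])  # [['f','c','a','m'], ['f','c','a','m'], ['c','b']]
```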

  • Collaborative Filtering

    [Figure: the same setting with individuals as rows and indicators/diseases as columns; observed memberships are marked 1]

    Based on membership so far, and memberships of others, predict further membership.

  • Latent Semantic Analysis

    Construct a latent layer for better semantic matching: documents and user queries are mapped to a topic distribution, and matching happens in that latent space.

    Example: the user queries "iPhone crack" and "Apple pie" should match different documents:
    - "1 recipe pastry for a 9-inch double crust, 9 apples, 1/2 cup brown sugar" (Apple pie)
    - "How to install apps on Apple mobile phones?" (iPhone crack)

    Other collaborative filtering applications: recommend users to users, music to users, ads to users, and answers to questions. Predict the ? in the light blue cells.

    [Figure: users/music/ads/answers and documents linked through a topic distribution]

  • Documents, Topics, Words

    A document consists of a number of topics: a document is a probabilistic mixture of topics.

    Each topic generates a number of words: a topic is a distribution over words.

    The probability of the ith word in a document: p(w_i) = Σ_j p(w_i | z_i = j) p(z_i = j)

  • Latent Dirichlet Allocation [M. Jordan 04]

    α: uniform Dirichlet prior for the per-document-d topic distribution (corpus-level parameter)
    β: uniform Dirichlet prior for the per-topic-z word distribution (corpus-level parameter)
    θ_d: the topic distribution of document d (document-level)
    z_dj: the topic of the jth word in d; w_dj: the specific word (word-level)

    [Plate diagram: α → θ → z → w, with β feeding w; plates over the N_m words per document, the M documents, and the K topics]

  • LDA Gibbs Sampling: Inputs and Outputs

    Inputs:
    1. Training data: documents as bags of words
    2. Parameter: the number of topics

    Outputs (a sampler sketch follows this list):
    1. By-product: a co-occurrence matrix of topics and documents (docs x topics)
    2. Model parameters: a co-occurrence matrix of topics and words (words x topics)
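    The deck does not spell the sampler out, so here is a minimal collapsed Gibbs sampler consistent with the inputs and outputs listed above; the hyperparameters alpha and beta are illustrative choices, not values from the deck.

```python
import numpy as np

def lda_gibbs(docs, V, K, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA. docs: list of word-id lists;
    V: vocabulary size; K: number of topics. Returns the two co-occurrence
    count matrices named on the slide (docs x topics and topics x words)."""
    rng = np.random.default_rng(seed)
    nd = np.zeros((len(docs), K))      # topic counts per document
    nw = np.zeros((K, V))              # word counts per topic
    nt = np.zeros(K)                   # total words per topic
    z = [[int(rng.integers(K)) for _ in d] for d in docs]
    for d, doc in enumerate(docs):     # initialize counts randomly
        for j, w in enumerate(doc):
            k = z[d][j]
            nd[d, k] += 1; nw[k, w] += 1; nt[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                k = z[d][j]            # remove the current assignment
                nd[d, k] -= 1; nw[k, w] -= 1; nt[k] -= 1
                p = (nd[d] + alpha) * (nw[:, w] + beta) / (nt + V * beta)
                k = int(rng.choice(K, p=p / p.sum()))
                z[d][j] = k            # resample, then restore counts
                nd[d, k] += 1; nw[k, w] += 1; nt[k] += 1
    return nd, nw

# Tiny usage example: three documents over a 5-word vocabulary, 2 topics.
nd, nw = lda_gibbs([[0, 1, 2, 0], [3, 4, 3], [0, 2, 4]], V=5, K=2)
```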

  • Parallel Gibbs Sampling [AAIM 09]

    Inputs and outputs are identical to the sequential sampler on the previous slide (a data-parallel sketch follows this list):

    Inputs:
    1. Training data: documents as bags of words
    2. Parameter: the number of topics

    Outputs:
    1. By-product: a co-occurrence matrix of topics and documents (docs x topics)
    2. Model parameters: a co-occurrence matrix of topics and words (words x topics)
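    The slide does not show the parallelization itself. A common scheme for distributed LDA inference [12] is to shard the documents, let each machine run a Gibbs sweep against a stale copy of the topic-word counts, and then merge the count deltas, which is exactly an AllReduce; the sketch below assumes that scheme, and `gibbs_sweep` is a hypothetical helper that updates its count matrix in place.

```python
import numpy as np

def parallel_gibbs_iteration(shards, gibbs_sweep, nw):
    """One data-parallel iteration over document shards.
    shards: per-machine document lists; nw: global topic-word counts.
    gibbs_sweep(shard, nw_local) is a hypothetical in-place Gibbs sweep."""
    deltas = []
    for shard in shards:             # conceptually runs on separate machines
        nw_local = nw.copy()         # stale local copy of the global counts
        gibbs_sweep(shard, nw_local)
        deltas.append(nw_local - nw)
    # Merging the deltas plays the role of an AllReduce across machines.
    return nw + np.stack(deltas).sum(axis=0)
```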


  • Outline

    Motivating Applications: Q&A System, Social Ads

    Key Subroutines:
    - Frequent Itemset Mining [ACM RS 08]
    - Latent Dirichlet Allocation [WWW 09, AAIM 09]
    - Clustering [ECML 08]
    - UserRank [Google TR 09]
    - Support Vector Machines [NIPS 07]

    Distributed Computing Perspectives


  • OpenSocial APIs

    1. Profiles (who I am)
    2. Friends (who I know)
    3. Stuff (what I have)
    4. Activities (what I do)

  • Activities, Recommendations, Applications


  • Task: Targeting Ads at SNS Users

    [Figure: matching users to ads]

  • Mining Profiles, Friends & Activities for Relevance

  • Consider Also User Influence

    Advertisers consider users who are both relevant and influential.

    SNS influence analysis: centrality, credential, activeness, etc.

  • Spectral Clustering [A. Ng, M. Jordan]

    An important subroutine in machine learning and data mining tasks. It exploits the pairwise similarity of data instances and is more effective than traditional methods, e.g., k-means.

    Key steps (a sketch follows this list):
    - Construct the pairwise similarity matrix, e.g., using geodesic distance
    - Compute the Laplacian matrix
    - Apply eigendecomposition
    - Perform k-means
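    A minimal sketch of the four steps in the Ng-Jordan normalized style. It assumes a Gaussian (RBF) similarity in place of the geodesic distance the slide mentions, and borrows scikit-learn's KMeans for the last step.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_cluster(X, k, sigma=1.0):
    """Cluster the rows of X into k groups via normalized spectral clustering."""
    # 1. Pairwise similarity matrix (RBF kernel as a simplifying assumption).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # 2. Normalized Laplacian-style matrix L = D^{-1/2} S D^{-1/2}.
    d = S.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = Dinv @ S @ Dinv
    # 3. Eigendecomposition: embed each point via the top-k eigenvectors.
    vals, vecs = eigh(L)
    U = vecs[:, -k:]
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # renormalize rows
    # 4. k-means on the embedded points.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```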

  • Scalability Problem

    Quadratic computation of the n x n similarity matrix.

    Approximation methods for the dense matrix (a t-NN sketch follows this list):
    - Sparsification: t-NN, ε-neighborhood
    - Nyström: random or greedy sampling
    - Others
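    A minimal sketch of t-NN sparsification, the first option above: each point keeps only its t largest similarities, and the result is symmetrized so downstream eigensolvers can use sparse arithmetic.

```python
import numpy as np

def tnn_sparsify(S, t):
    """Keep each row's t largest similarities from dense matrix S,
    then symmetrize (keep an edge if either endpoint keeps it)."""
    n = S.shape[0]
    A = np.zeros_like(S)
    nn = np.argsort(S, axis=1)[:, -t:]        # t nearest neighbors per row
    rows = np.repeat(np.arange(n), t)
    A[rows, nn.ravel()] = S[rows, nn.ravel()]
    return np.maximum(A, A.T)
```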

  • Sparsification vs. Sampling

    Sparsification: construct the dense similarity matrix S, then sparsify it.

    Nyström: randomly sample l points, where l ≪ n. (A sketch follows.)
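    A minimal sketch of the Nyström alternative: only the n x l similarities to the sampled landmarks are ever computed, and the full matrix is approximated as S ≈ C · pinv(W) · Cᵀ. The `similarity_to` callback is a hypothetical stand-in for whatever kernel the application uses.

```python
import numpy as np

def nystrom(similarity_to, n, l, rng):
    """Sample l of n points and return the pieces of the Nystrom
    approximation S ~= C @ pinv(W) @ C.T, without forming S itself.
    similarity_to(idx) returns the n x l similarity columns for idx."""
    landmarks = rng.choice(n, size=l, replace=False)
    C = similarity_to(landmarks)   # n x l: only n*l similarities computed
    W = C[landmarks, :]            # l x l block among the landmarks
    return C, np.linalg.pinv(W)
```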

  • Empirical Study [Song, et al., ECML 08]

    Dataset: RCV1 (Reuters Corpus Volume I), a filtered collection of 193,944 documents in 103 categories.

    Photo set: PicasaWeb, 637,137 photos.

    Experiments:
    - Clustering quality vs. computational time: measure the similarity between categories (CAT) and clusters (CLS) with Normalized Mutual Information (NMI)
    - Scalability

  • NMI Comparison (on RCV1)

    [Figure: NMI vs. computation time for the Nyström method and sparse matrix approximation]

  • Speedup Test on 637,137 Photos

    K = 1000 clusters. We achieve linear speedup when using up to 32 machines; after that, speedup is sublinear because of increasing communication and synchronization time.

  • Sparsification vs. Sampling

                     Sparsification                  Nyström (random sampling)
    Information      full n x n similarity scores    none
    Preprocessing    O(n²) worst case                O(nl), l ≪ n

  • Outline

    Motivating Applications: Q&A System, Social Ads

    Key Subroutines:
    - Frequent Itemset Mining [ACM RS 08]
    - Latent Dirichlet Allocation [WWW 09, AAIM 09]
    - Clustering [ECML 08]
    - UserRank [Google TR 09]
    - Support Vector Machines [NIPS 07]

    Distributed Computing Perspectives

  • Matrix Factorization Alternatives

    [Figure: a spectrum of factorization methods, from exact to approximate]

  • PSVM [E. Chang, et al., NIPS 07]

    Column-based ICF (incomplete Cholesky factorization), sketched below:
    - Slower than row-based ICF on a single machine
    - Parallelizable on multiple machines

    Changing the IPM (interior point method) computation order to achieve parallelization.
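    A minimal single-machine sketch of ICF with greedy pivoting, the kernel-matrix approximation PSVM builds on: K ≈ H Hᵀ with H of rank at most p. In PSVM the rows of H are sharded across machines while H is computed column by column; this sketch keeps everything in one array.

```python
import numpy as np

def icf(K, p, tol=1e-8):
    """Incomplete Cholesky factorization of a kernel matrix K: returns H
    with K ~= H @ H.T, picking at each step the pivot with the largest
    diagonal residual."""
    n = K.shape[0]
    H = np.zeros((n, p))
    d = np.diag(K).astype(float).copy()   # diagonal of the residual
    for j in range(p):
        i = int(np.argmax(d))             # greedy pivot selection
        if d[i] <= tol:
            return H[:, :j]               # residual is (numerically) zero
        H[:, j] = (K[:, i] - H[:, :j] @ H[i, :j]) / np.sqrt(d[i])
        d -= H[:, j] ** 2                 # update residual diagonal
    return H
```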

  • Speedup

    [Figure: PSVM speedup results]

  • Overheads

    [Figure: PSVM overhead results]

  • Comparison between Parallel Computing Frameworks

                                               MapReduce         Project B         MPI
    GFS/IO and task-rescheduling overhead      Yes               No (+1)           No (+1)
      between iterations
    Flexibility of computation model           AllReduce only    AllReduce only    Flexible (+1)
                                               (+0.5)            (+0.5)
    Efficient AllReduce                        Yes (+1)          Yes (+1)          Yes (+1)
    Recover from faults between iterations     Yes (+1)          Yes (+1)          Apps
    Recover from faults within each iteration  Yes (+1)          Yes (+1)          Apps
    Final score for scalable machine learning  3.5               4.5               5

    ("Apps" means recovery is left to the application.)

  • Concluding Remarks

    Applications demand scalable solutions. We have parallelized key subroutines for mining massive data sets:
    - Spectral Clustering [ECML 08]
    - Frequent Itemset Mining [ACM RS 08]
    - PLSA [KDD 08]
    - LDA [WWW 09]
    - UserRank
    - Support Vector Machines [NIPS 07]

    Relevant papers: http://infolab.stanford.edu/~echang/

    Open-source PSVM and PLDA:
    http://code.google.com/p/psvm/
    http://code.google.com/p/plda/

  • Collaborators

    Prof. Chih-Jen Lin (NTU), Hongjie Bai (Google), Wen-Yen Chen (UCSB), Jon Chu (MIT), Haoyuan Li (PKU), Yangqiu Song (Tsinghua), Matt Stanton (CMU), Yi Wang (Google), Dong Zhang (Google), Kaihua Zhu (Google)

  • References

    [1] Alexa Internet. http://www.alexa.com/.
    [2] D. M. Blei and M. I. Jordan. Variational methods for the Dirichlet process. In Proc. of the 21st International Conference on Machine Learning, pages 373-380, 2004.
    [3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
    [4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the 17th International Conference on Machine Learning, pages 167-174, 2000.
    [5] D. Cohn and T. Hofmann. The missing link: a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
    [6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391-407, 1990.
    [7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
    [8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
    [9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Artificial Intelligence, pages 289-296, 1999.
    [10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1):89-115, 2004.
    [11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004.
    [12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007.
    [13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.

  • References (cont.)

    [14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proc. of the 24th International Conference on Machine Learning, pages 791-798, 2007.
    [15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 678-684, 2005.
    [16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 306-315, 2004.
    [17] A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
    [18] T. Zhang and V. S. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2:313-334, 2002.
    [19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005.
    [20] L. Adamic and E. Adar. How to search a social network. 2004.
    [21] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004.
    [22] H. Kautz, B. Selman, and M. Shah. ReferralWeb: Combining social networks and collaborative filtering. Communications of the ACM, 3:63-65, 1997.
    [23] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Record, 22:207-216, 1993.
    [24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proc. of the 14th Conference on Uncertainty in Artificial Intelligence, 1998.
    [25] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.

  • References (cont.)

    [26] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proc. of the 10th International World Wide Web Conference, pages 285-295, 2001.
    [27] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.
    [28] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proc. of the 10th International World Wide Web Conference, pages 285-295, 2001.
    [29] M. Brand. Fast online SVD revisions for lightweight recommender systems. In Proc. of the 3rd SIAM International Conference on Data Mining, 2003.
    [30] D. Goldberg, D. Nichols, B. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, 1992.
    [31] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proc. of the ACM Conference on Computer Supported Cooperative Work, pages 175-186, 1994.
    [32] J. Konstan, et al. GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77-87, 1997.
    [33] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating word of mouth. In Proc. of ACM CHI, 1:210-217, 1995.
    [34] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7:76-80, 2003.
    [35] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177-196, 2001.
    [36] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In Proc. of the International Joint Conference on Artificial Intelligence, 1999.
    [37] http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/collaborativefiltering.html
    [38] E. Y. Chang, et al. Parallelizing Support Vector Machines on Distributed Computers. NIPS, 2007.
    [39] W.-Y. Chen, D. Zhang, and E. Y. Chang. Combinational collaborative filtering for personalized community recommendation. ACM KDD, 2008.
    [40] Y. Sun, W.-Y. Chen, H. Bai, C.-J. Lin, and E. Y. Chang. Parallel spectral clustering. ECML, 2008.