Parallel Algorithms for Mining Large-Scale Data

Edward Chang
Director, Google Research, Beijing
http://infolab.stanford.edu/~echang/
Story of the Photos

The photos on the first pages are statues of two bookkeepers displayed at the British Museum. One bookkeeper keeps a list of good people, and the other a list of bad. (Which is which, can you tell?)

When I first visited the museum in 1998, I did not take a photo of them, to conserve film. During this trip (June 2009), capacity is no longer a concern or constraint. In fact, one can see kids and grandmas all taking photos, a lot of photos. Ten years apart, data volume has exploded.

Data complexity also grows. So, can these ancient bookkeepers still classify good from bad? Is their capacity scalable to the data dimension and data quantity?
Outline

- Motivating Applications
  - Q&A System
  - Social Ads
- Key Subroutines
  - Frequent Itemset Mining [ACM RS 08]
  - Latent Dirichlet Allocation [WWW 09, AAIM 09]
  - Clustering [ECML 08]
  - UserRank [Google TR 09]
  - Support Vector Machines [NIPS 07]
- Distributed Computing Perspectives
Query: What are must-see attractions at Yellowstone
Query: Must-see attractions at Yosemite
Query: Must-see attractions at Copenhagen
Query: Must-see attractions at Beijing

Hotel ads
Who is Yao Ming
Q&A Yao MingQ&AYaoMing
Who is Yao Ming / Yao Ming Related Q&As
Application:GoogleQ&A(Confucius)launched in China Thailand and RussialaunchedinChina,Thailand,andRussia
DC
DifferentiatedContent
U
Search
SU
Community
C
CCS/Q&A
Trigger a discussion/question session during search
CCS/Q&A
Discussion/Question
Provide labels to a Q (semi-automatically) Given a Q, find similar Qs and their As (automatically) Evaluate quality of an answer, relevance and originality E l t d ti l i t i iti Evaluate user credentials in a topic sensitive way Route questions to experts Provide most relevant, high-quality content for Search to index
Q&AUsesMachineLearning
Label suggestion using ML algorithms.
Real Time topic-to-topic (T2T) recommendation using ML algorithms.
Gi t l t d hi h Gives out related high quality links to previous questions before human answer appear.
Collaborative Filtering

[Figure: a sparse users-by-books/photos membership matrix]

Based on the memberships observed so far, and the memberships of others, predict further memberships.
Collaborative Filtering

[Figure: a partially observed users-by-books/photos matrix, with unobserved entries marked "?"]

Based on the partially observed matrix, predict the unobserved entries:
I. Will user i enjoy photo j?
II. Will user i be interesting to user j?
III. Will photo i be related to photo j?
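To make the prediction task concrete, here is a minimal sketch that scores the unobserved entries of a toy 0/1 membership matrix by item-item cosine similarity. The matrix is made-up data, and cosine scoring is just one generic approach; it is not the FIM- and LSA-based methods developed in the rest of the talk.

```python
import numpy as np

# Minimal sketch: score unobserved entries of a 0/1 membership
# matrix by item-item cosine similarity (toy data, for illustration).
M = np.array([[1, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)   # rows: users, cols: photos

norms = np.linalg.norm(M, axis=0)
norms[norms == 0] = 1.0                      # guard against empty columns
sim = (M.T @ M) / np.outer(norms, norms)     # item-item cosine similarity

scores = M @ sim                             # affinity of each user for each item
scores[M == 1] = -np.inf                     # mask entries already observed
print(np.argmax(scores, axis=1))             # predicted next item per user (question I)
```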
FIM-Based Recommendation
FIM Preliminaries

- Observation 1: If an item A is not frequent, no pattern containing A will be frequent [R. Agrawal]. Use a threshold to eliminate infrequent items: {A} -> {A, B}
- Observation 2: Patterns containing A are subsets of (or found from) the transactions containing A [J. Han]. Divide and conquer: select the transactions containing A to form a conditional database (CDB), and find the patterns containing A from that conditional database: {A, B}, {A, C}, {A} -> CDB of A; {A, B}, {B, C} -> CDB of B
- Observation 3: To prevent the same pattern from being found in multiple CDBs, all itemsets are sorted in the same manner (e.g., by descending support)
Preprocessing

According to Observation 1, we count the support of each item by scanning the database and eliminate the infrequent items from the transactions. According to Observation 3, we sort the items in each transaction by descending support value.

Item supports: f: 4, c: 4, a: 3, b: 3, m: 3, p: 3, o: 2, d: 1, e: 1, g: 1, h: 1, i: 1, k: 1, l: 1, n: 1

Original transactions -> sorted transactions (infrequent items eliminated):
  f a c d g i m p  ->  f c a m p
  a b c f l m o    ->  f c a b m
  b f h j o        ->  f b
  b c k s p        ->  c b p
  a f c e l p m n  ->  f c a m p
Parallel Projection

- According to Observation 2, we construct the CDB of item A; then from this CDB, we find the patterns containing A.
- How to construct the CDB of A? If a transaction contains A, this transaction should appear in the CDB of A.
- Given a transaction {B, A, C}, it should appear in the CDB of A, the CDB of B, and the CDB of C.
- Dedup solution, using the order of items: sort {B, A, C} by the order of items, then put it into the CDB of A, the CDB of B, and the CDB of C.
Example of Projection

Left: sorted transactions; right: conditional databases of frequent items.

  f c a m p      p: { f c a m / f c a m / c b }
  f c a b m      m: { f c a / f c a / f c a b }
  f b            b: { f c a / f / c }
  c b p          a: { f c / f c / f c }
  f c a m p      c: { f / f / f }
Recursive Projections [H. Li, et al., ACM RS 08]

[Figure: search tree D -> D|a, D|b, D|c; D|a -> D|ab, D|ac; D|ab -> D|abc; each tree level computed by one MapReduce iteration]

- Recursive projection forms a search tree.
- Each node is a CDB.
- Use the order of items to prevent duplicated CDBs.
- Each level of breadth-first search of the tree can be done by one MapReduce iteration.
- Once a CDB is small enough to fit in memory, we can invoke FP-growth to mine this CDB, with no more growth of the subtree.
Projection using MapReduce

Pipeline: map inputs (transactions) -> sorted transactions (with infrequent items eliminated) -> map outputs (conditional transactions, keyed by item) -> reduce inputs (conditional databases) -> reduce outputs (patterns and supports).

Map inputs (key = "", value = transaction):
  f a c d g i m p / a b c f l m o / b f h j o / b c k s p / a f c e l p m n

Map outputs (key = item, value = conditional transaction), e.g. for the sorted transaction f c a m p:
  p: f c a m;  m: f c a;  a: f c;  c: f

Reduce inputs (conditional databases) -> reduce outputs (patterns : supports):
  p: { f c a m / f c a m / c b }  ->  p: 3, p c: 3
  m: { f c a / f c a / f c a b }  ->  m f: 3, m c: 3, m a: 3, m f c: 3, m f a: 3, m c a: 3, m f c a: 3
  b: { f c a / f / c }            ->  b: 3
  a: { f c / f c / f c }          ->  a: 3, a f: 3, a c: 3, a f c: 3
  c: { f / f / f }                ->  c: 3, c f: 3
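Below is a minimal single-process sketch of this projection step: the map function emits the prefix before each item as a conditional transaction, and grouping by key assembles the CDBs shown above. It is illustrative only; a real deployment runs map and reduce as distributed MapReduce tasks, as in [H. Li, et al., ACM RS 08]. The fixed item order is the descending-support order used on the slides.

```python
from collections import defaultdict

MIN_SUPPORT = 3

transactions = [t.split() for t in (
    "f a c d g i m p", "a b c f l m o", "b f h j o",
    "b c k s p", "a f c e l p m n")]

# Preprocessing (Observations 1 and 3): count item supports,
# drop infrequent items, sort each transaction by a fixed order.
support = defaultdict(int)
for t in transactions:
    for item in t:
        support[item] += 1

# Any fixed total order over frequent items works; here we pin
# the descending-support order used on the slides.
ITEM_ORDER = {item: r for r, item in enumerate("f c a b m p".split())}

def sort_transaction(t):
    return sorted((i for i in t if support[i] >= MIN_SUPPORT),
                  key=ITEM_ORDER.__getitem__)

# Map: for each item in a sorted transaction, emit the prefix
# before it as a conditional transaction keyed by that item.
def map_transaction(t):
    t = sort_transaction(t)
    for pos, item in enumerate(t):
        if pos > 0:                 # the first item has an empty prefix
            yield item, t[:pos]

# Shuffle (reduce inputs): group conditional transactions into CDBs;
# mining each CDB (e.g., with FP-growth) then yields the patterns above.
cdbs = defaultdict(list)
for t in transactions:
    for item, prefix in map_transaction(t):
        cdbs[item].append(prefix)

for item in sorted(cdbs, key=ITEM_ORDER.__getitem__):
    print(item, ":", cdbs[item])
# e.g. p : [['f','c','a','m'], ['c','b'], ['f','c','a','m']], as above
```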
Collaborative Filtering

[Figure: a sparse individuals-by-indicators/diseases membership matrix]

Based on the memberships observed so far, and the memberships of others, predict further memberships.
Latent Semantic Analysis

- Construct a latent layer for better semantic matching between users/music/ads/questions and documents/answers.
- Example: the queries "iPhone crack" and "apple pie" should match
  - "How to install apps on Apple mobile phones?" (iPhone crack)
  - "1 recipe pastry for a 9 inch double crust, 9 apples, 1/2 cup brown sugar" (apple pie)
- Other collaborative filtering apps:
  - Recommend users to users
  - Recommend music to users
  - Recommend ads to users
  - Recommend answers to questions

[Figure: user queries and documents connected through a topic-distribution layer; predict the "?" in the light-blue cells]
Documents, Topics, Words

- A document consists of a number of topics: a document is a probabilistic mixture of topics.
- Each topic generates a number of words: a topic is a distribution over words.
- The probability of the i-th word in a document is therefore a mixture over topics:

  P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j) \, P(z_i = j)
Latent Dirichlet Allocation [M. Jordan 04]

- α: uniform Dirichlet prior for the per-document-d topic distribution (corpus-level parameter)
- β: uniform Dirichlet prior for the per-topic-z word distribution (corpus-level parameter)
- θ_d is the topic distribution of document d (document level)
- z_dj is the topic of the j-th word in d; w_dj is the specific word (word level)

[Plate diagram: α -> θ -> z -> w <- β, with N words per document, M documents, K topics]
LDA Gibbs Sampling: Inputs and Outputs

Inputs:
1. Training data: documents as bags of words
2. Parameter: the number of topics

Outputs:
1. By-product: a co-occurrence matrix of topics and documents
2. Model parameters: a co-occurrence matrix of topics and words
Parallel Gibbs Sampling [AAIM 09]

Same inputs and outputs as sequential LDA Gibbs sampling:

Inputs:
1. Training data: documents as bags of words
2. Parameter: the number of topics

Outputs:
1. By-product: a co-occurrence matrix of topics and documents
2. Model parameters: a co-occurrence matrix of topics and words
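For concreteness, here is a minimal single-machine collapsed Gibbs sampler that produces exactly these two co-occurrence matrices. The toy corpus, priors (α, β), and sweep count are illustrative assumptions; the parallel version in [AAIM 09] distributes the documents across machines and reconciles the counts.

```python
import numpy as np

rng = np.random.default_rng(0)

docs = [[0, 1, 2, 1], [2, 3, 3, 0], [4, 4, 1, 2]]  # toy word ids
V, K = 5, 2                     # vocabulary size, number of topics
alpha, beta = 0.5, 0.1          # symmetric Dirichlet priors (assumed)

# The two co-occurrence matrices named above.
doc_topic = np.zeros((len(docs), K))     # topics x documents (by-product)
topic_word = np.zeros((K, V))            # topics x words (model parameters)
topic_sum = np.zeros(K)

# Random initialization of the topic assignments z.
z = [[rng.integers(K) for _ in d] for d in docs]
for d, doc in enumerate(docs):
    for n, w in enumerate(doc):
        k = z[d][n]
        doc_topic[d, k] += 1; topic_word[k, w] += 1; topic_sum[k] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]   # remove the current assignment from the counts
            doc_topic[d, k] -= 1; topic_word[k, w] -= 1; topic_sum[k] -= 1
            # P(z = k | rest) ∝ (n_dk + α) (n_kw + β) / (n_k + V β)
            p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                / (topic_sum + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][n] = k   # add the new assignment back
            doc_topic[d, k] += 1; topic_word[k, w] += 1; topic_sum[k] += 1

print(topic_word)  # topic-word co-occurrence counts (model parameters)
```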
Outline

- Motivating Applications
  - Q&A System
  - Social Ads
- Key Subroutines
  - Frequent Itemset Mining [ACM RS 08]
  - Latent Dirichlet Allocation [WWW 09, AAIM 09]
  - Clustering [ECML 08]
  - UserRank [Google TR 09]
  - Support Vector Machines [NIPS 07]
- Distributed Computing Perspectives
Open Social APIs

1. Profiles (who I am)
2. Friends (who I know)
3. Stuff (what I have)
4. Activities (what I do)
Activities
Recommendations
Applications
Task: Targeting Ads at SNS Users

[Figure: matching users to ads]
Mining Profiles, Friends & Activities for Relevance
Consider Also User Influence

Advertisers consider users who are both relevant and influential.

SNS influence analysis: centrality, credentials, activeness, etc.
Spectral Clustering [A. Ng, M. Jordan]

- An important subroutine in machine learning and data mining tasks
- Exploits pairwise similarity of data instances
- More effective than traditional methods, e.g., k-means

Key steps (see the sketch below):
1. Construct the pairwise similarity matrix (e.g., using geodesic distance)
2. Compute the Laplacian matrix
3. Apply eigendecomposition
4. Perform k-means
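A minimal single-machine sketch of these steps, in the style of Ng, Jordan & Weiss; the RBF similarity (used here in place of geodesic distance) and the sigma parameter are illustrative assumptions, and the parallel, approximated versions are discussed next.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Cluster the rows of X (an n x d array) into k groups."""
    # Step 1: pairwise similarity matrix (RBF kernel here).
    S = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)

    # Step 2: normalized Laplacian L = D^(-1/2) S D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = S * np.outer(d_inv_sqrt, d_inv_sqrt)

    # Step 3: eigendecomposition; keep the k leading eigenvectors
    # and normalize each embedded point to unit length.
    _, V = np.linalg.eigh(L)
    U = V[:, -k:]
    U /= np.linalg.norm(U, axis=1, keepdims=True)

    # Step 4: k-means on the embedded points.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```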
Scalability Problem

Quadratic computation of the n x n similarity matrix.

Approximation methods for the dense matrix:
- Sparsification: t-NN, ε-neighborhood
- Nystrom sampling: random, greedy
- Others
Sparsification vs. Sampling

- Sparsification: construct the dense similarity matrix S, then sparsify it.
- Sampling (Nystrom): randomly sample l points, where l << n.
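A small sketch contrasting the two strategies: t-NN sparsification keeps only each point's t largest similarities from the dense matrix, while Nystrom-style sampling computes similarities only against l << n sampled landmarks and never materializes the full n x n matrix. The RBF kernel and random landmark choice are illustrative assumptions.

```python
import numpy as np

# t-NN sparsification: zero out all but each row's t largest
# similarities in a dense matrix S, then symmetrize.
def sparsify_tnn(S, t):
    S = S.copy()
    idx = np.argsort(S, axis=1)[:, :-t]      # indices of the n-t smallest
    np.put_along_axis(S, idx, 0.0, axis=1)
    return np.maximum(S, S.T)                # keep the matrix symmetric

# Nystrom-style sampling: similarities against l << n landmarks only,
# producing an n x l block instead of the full n x n matrix.
def nystrom_columns(X, l, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), size=l, replace=False)]
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))
```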
Empirical Study [Song, et al., ECML 08]

- Dataset: RCV1 (Reuters Corpus Volume I), a filtered collection of 193,944 documents in 103 categories
- Photo set: PicasaWeb, 637,137 photos
- Experiments:
  - Clustering quality vs. computational time, measuring the similarity between CAT and CLS by Normalized Mutual Information (NMI)
  - Scalability
NMI Comparison (on RCV1)

[Figure: NMI of the Nystrom method vs. sparse matrix approximation]
Speedup Test on 637,137 Photos

- K = 1000 clusters
- Achieves linear speedup when using up to 32 machines; after that, sublinear speedup because of increasing communication and synchronization time
Sparsification vs. Sampling

                  Sparsification                  Nystrom (random sampling)
Information       Full n x n similarity scores    None
Pre-processing    O(n^2) worst case               O(nl), l << n
Outline

- Motivating Applications
  - Q&A System
  - Social Ads
- Key Subroutines
  - Frequent Itemset Mining [ACM RS 08]
  - Latent Dirichlet Allocation [WWW 09, AAIM 09]
  - Clustering [ECML 08]
  - UserRank [Google TR 09]
  - Support Vector Machines [NIPS 07]
- Distributed Computing Perspectives
Matrix Factorization Alternatives

[Figure: spectrum of matrix factorization methods, from exact to approximate]
PSVM [E. Chang, et al., NIPS 07]

- Column-based ICF (incomplete Cholesky factorization): slower than row-based on a single machine, but parallelizable on multiple machines
- Changes the IPM (interior-point method) computation order to achieve parallelization
Speedup
Overheads
Comparison between Parallel Computing Frameworks

                                             MapReduce              Project B              MPI
GFS/IO and task-rescheduling overhead
  between iterations                         Yes                    No (+1)                No (+1)
Flexibility of computation model             AllReduce only (+0.5)  AllReduce only (+0.5)  Flexible (+1)
Efficient AllReduce                          Yes (+1)               Yes (+1)               Yes (+1)
Recovery from faults between iterations      Yes (+1)               Yes (+1)               Apps
Recovery from faults within each iteration   Yes (+1)               Yes (+1)               Apps
Final score for scalable machine learning    3.5                    4.5                    5
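The AllReduce rows refer to the primitive in which every worker contributes a partial result and all workers receive the combined value. A minimal sketch, assuming mpi4py is available:

```python
from mpi4py import MPI

# Each worker computes a partial result on its data shard; AllReduce
# sums the partials and hands every worker the global total, as one
# iteration of a distributed learning algorithm would.
comm = MPI.COMM_WORLD
local_partial = float(comm.Get_rank() + 1)   # stand-in for local work
total = comm.allreduce(local_partial, op=MPI.SUM)
print(f"rank {comm.Get_rank()}: global sum = {total}")
```

Run with, e.g., `mpirun -np 4 python allreduce_demo.py` (the filename is arbitrary).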
Concluding Remarks

- Applications demand scalable solutions.
- We have parallelized key subroutines for mining massive data sets:
  - Spectral Clustering [ECML 08]
  - Frequent Itemset Mining [ACM RS 08]
  - PLSA [KDD 08]
  - LDA [WWW 09]
  - UserRank
  - Support Vector Machines [NIPS 07]
- Relevant papers: http://infolab.stanford.edu/~echang/
- Open source: PSVM, PLDA
  - http://code.google.com/p/psvm/
  - http://code.google.com/p/plda/
Collaborators

- Prof. Chih-Jen Lin (NTU)
- Hongjie Bai (Google)
- Wen-Yen Chen (UCSB)
- Jon Chu (MIT)
- Haoyuan Li (PKU)
- Yangqiu Song (Tsinghua)
- Matt Stanton (CMU)
- Yi Wang (Google)
- Dong Zhang (Google)
- Kaihua Zhu (Google)
References

[1] Alexa Internet. http://www.alexa.com/.
[2] D. M. Blei and M. I. Jordan. Variational methods for the Dirichlet process. In Proc. of the 21st International Conference on Machine Learning, pages 373-380, 2004.
[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[4] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proc. of the 17th International Conference on Machine Learning, pages 167-174, 2000.
[5] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, pages 430-436, 2001.
[6] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
[9] T. Hofmann. Probabilistic latent semantic indexing. In Proc. of Uncertainty in Artificial Intelligence, pages 289-296, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, 22(1):89-115, 2004.
[11] A. McCallum, A. Corrada-Emmanuel, and X. Wang. The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email. Technical report, Computer Science, University of Massachusetts Amherst, 2004.
[12] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 20, 2007.
[13] M. Ramoni, P. Sebastiani, and P. Cohen. Bayesian clustering by dynamics. Machine Learning, 47(1):91-121, 2002.
[14] R. Salakhutdinov, A. Mnih, and G. Hinton. Restricted Boltzmann machines for collaborative filtering. In Proc. of the 24th International Conference on Machine Learning, pages 791-798, 2007.
[15] E. Spertus, M. Sahami, and O. Buyukkokten. Evaluating similarity measures: a large-scale study in the Orkut social network. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 678-684, 2005.
[16] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 306-315, 2004.
[17] A. Strehl and J. Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research (JMLR), 3:583-617, 2002.
[18] T. Zhang and V. S. Iyengar. Recommender systems using linear classifiers. Journal of Machine Learning Research, 2:313-334, 2002.
[19] S. Zhong and J. Ghosh. Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS), 8:374-384, 2005.
[20] L. Adamic and E. Adar. How to search a social network. 2004.
[21] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pages 5228-5235, 2004.
[22] H. Kautz, B. Selman, and M. Shah. ReferralWeb: Combining social networks and collaborative filtering. Communications of the ACM, 3:63-65, 1997.
[23] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Record, 22:207-216, 1993.
[24] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proc. of the 14th Conference on Uncertainty in Artificial Intelligence, 1998.
[25] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.
[26] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proc. of the 10th International World Wide Web Conference, pages 285-295, 2001.
[27] M. Deshpande and G. Karypis. Item-based top-N recommendation algorithms. ACM Transactions on Information Systems, 22(1):143-177, 2004.
[28] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proc. of the 10th International World Wide Web Conference, pages 285-295, 2001.
[29] M. Brand. Fast online SVD revisions for lightweight recommender systems. In Proc. of the 3rd SIAM International Conference on Data Mining, 2003.
[30] D. Goldberg, D. Nichols, B. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, 1992.
[31] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proc. of the ACM Conference on Computer Supported Cooperative Work, pages 175-186, 1994.
[32] J. Konstan, et al. GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77-87, 1997.
[33] U. Shardanand and P. Maes. Social information filtering: Algorithms for automating "word of mouth". In Proc. of ACM CHI, 1:210-217, 1995.
[34] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7:76-80, 2003.
[35] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1):177-196, 2001.
[36] T. Hofmann and J. Puzicha. Latent class models for collaborative filtering. In Proc. of the International Joint Conference on Artificial Intelligence, 1999.
[37] http://www.cs.carleton.edu/cs_comps/0607/recommend/recommender/collaborativefiltering.html
[38] E. Y. Chang, et al. Parallelizing Support Vector Machines on Distributed Computers. NIPS, 2007.
[39] W.-Y. Chen, D. Zhang, and E. Y. Chang. Combinational Collaborative Filtering for personalized community recommendation. ACM KDD, 2008.
[40] Y. Sun, W.-Y. Chen, H. Bai, C.-J. Lin, and E. Y. Chang. Parallel Spectral Clustering. ECML, 2008.