Data Fusion Techniques and Application
Guangyu Zhou
Reference paper: Zheng Yu: Methodologies for Cross-Domain Data Fusion: An Overview

yunshengb.com/wp-content/uploads/2017/12/...



Agenda
§ Introduction
§ Related work
§ Data fusion techniques & applications
  § Stage-based methods
  § Feature level-based methods
  § Semantic meaning-based data fusion methods
§ Summary


What is data fusion?
§ Data fusion is the process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source. (Wikipedia)


Why data fusion?
§ In the big data era, we face a diversity of datasets from different sources in different domains, consisting of multiple modalities:
  § Representation, distribution, scale, and density.
§ How to unlock the power of knowledge from multiple disparate (but potentially connected) datasets?
  § Treating different datasets equally or simply concatenating the features from disparate datasets?


  § Use advanced data fusion techniques that can fuse knowledge from various datasets organically in a machine learning and data mining task


Related Work
§ Relation to Traditional Data Integration


Related Work
§ Relation to Heterogeneous Information Network
  § A heterogeneous information network only links the objects in a single domain:
    § Bibliographic network: authors, papers, and conferences.
    § Flickr information network: users, images, tags, and comments.
  § Cross-domain data fusion aims to fuse data across different domains:
    § Traffic data, social media, and air quality
  § A heterogeneous network may not be able to find explicit links with semantic meanings between objects of different domains.


Data fusion methodologies
§ Stage-based methods
§ Feature level-based methods
§ Semantic meaning-based data fusion methods
  § multi-view learning-based
  § similarity-based
  § probabilistic dependency-based
  § and transfer learning-based methods.


Stage-based data fusion methods
§ Use different datasets at different stages of a data mining task.
  § Datasets are loosely coupled, without any requirements on the consistency of their modalities.
§ Can be a meta-approach used together with other data fusion methods


Map partition and graph building for taxi trajectory


Friend recommendation
§ Stages
  § I. Detect stay points
  § II. Map to POI vector
  § III. Hierarchical clustering
  § IV. Partial tree
  § V. Hierarchical graph
§ -> comparable (from same tree)
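Stage I of the list above (detecting stay points) can be sketched as follows. This is a minimal illustration under assumptions not stated in the slides: a stay point is taken to be a run of GPS fixes that remain within `dist_m` metres of the first fix for at least `min_secs` seconds. The function names, thresholds, and sample track are hypothetical.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two points whose first two fields are (lat, lon)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))

def detect_stay_points(track, dist_m=200.0, min_secs=1200.0):
    """track: list of (lat, lon, unix_time) sorted by time.
    Returns a list of (mean_lat, mean_lon, arrive_time, leave_time)."""
    stays, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # Extend the window while fixes stay within dist_m of the anchor fix i.
        while j < n and haversine_m(track[i], track[j]) <= dist_m:
            j += 1
        # Fixes i..j-1 are close to the anchor; keep them if the user lingered long enough.
        if track[j - 1][2] - track[i][2] >= min_secs:
            window = track[i:j]
            lat = sum(p[0] for p in window) / len(window)
            lon = sum(p[1] for p in window) / len(window)
            stays.append((lat, lon, window[0][2], window[-1][2]))
            i = j
        else:
            i += 1
    return stays

# Hypothetical track: 20 minutes near one spot, then movement.
track = [
    (39.9000, 116.4000, 0),
    (39.9001, 116.4001, 600),
    (39.9000, 116.4002, 1200),
    (39.9100, 116.4200, 1500),
    (39.9200, 116.4400, 1800),
]
stays = detect_stay_points(track)
```

Each detected stay point would then feed stage II (mapping to a POI vector), which is what makes this a stage-based pipeline: later stages consume the output of earlier ones rather than the raw data.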


Feature-level-based data fusion
§ Direct Concatenation
  § Treat features extracted from different datasets equally, concatenating them sequentially into a feature vector
§ Limitations:
  § Over-fitting in the case of a small size training sample, and the specific statistical property of each view is ignored.
  § Difficult to discover highly non-linear relationships that exist between low-level features across different modalities.
  § Redundancies and dependencies between features extracted from different datasets which may be correlated.
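Direct concatenation as described above can be sketched in a few lines. The per-column standardization is an added assumption (a common precaution when views have different scales), not something the slides prescribe; the function names and toy data are hypothetical.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column of a list-of-rows matrix to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def concat_views(*views):
    """Concatenate the per-sample feature rows of several views into one vector per sample."""
    n = len(views[0])
    assert all(len(v) == n for v in views), "views must describe the same samples"
    scaled = [zscore_columns(v) for v in views]
    return [sum((view[i] for view in scaled), []) for i in range(n)]

# Hypothetical toy data: 3 regions, a traffic view (2 features) and a POI view (1 feature).
traffic = [[30.0, 1200.0], [45.0, 800.0], [60.0, 400.0]]
pois = [[5.0], [20.0], [35.0]]
fused = concat_views(traffic, pois)  # 3 samples x 3 features
```

In this toy data every column is a linear function of the region index, so the fused columns are perfectly correlated: a small-scale instance of the redundancy limitation listed above.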


Feature-level-based data fusion
§ Direct Concatenation + sparsity regularization:
  § handle the feature redundancy problem
§ Dual regularization (i.e., zero-mean Gaussian plus inverse-gamma)
  § Regularize most feature weights to be zero or close to zero via a Bayesian sparse prior
  § Allow for the possibility of a model learning large weights for significant features


Feature-level-based data fusion
§ DNN-Based Data Fusion
  § Using supervised, unsupervised, and semi-supervised approaches, Deep Learning learns multiple levels of representation and abstraction
  § Unified feature representation from disparate datasets


DNN-Based Data Fusion
§ Deep Autoencoder Models of feature representation between 2 modalities (audio + video)


Multimodal Deep Boltzmann Machine
§ The multimodal DBM is a generative and undirected graphical model.
§ Enables bi-directional search.
§ To learn


Limitations of DNN-based fusion model
§ Performance heavily depends on parameters
  § Finding optimal parameters is a labor-intensive and time-consuming process given a large number of parameters and a non-convex optimization setting.
§ Hard to explain what the middle-level feature representation stands for.
  § We do not really understand the way a DNN makes raw features a better representation either.


Semantic meaning-based data fusion
§ Unlike feature-based fusion, semantic meaning-based methods understand the insight of each dataset and relations between features across different datasets.
§ 4 groups of semantic meaning methods:
  § multi-view-based, similarity-based, probabilistic dependency-based, and transfer-learning-based methods.


Data fusion methodologies
§ Stage-based methods
§ Feature level-based methods
§ Semantic meaning-based data fusion methods
  § multi-view learning-based
    § co-training, multiple kernel learning (MKL), subspace learning
  § similarity-based
  § probabilistic dependency-based
  § and transfer learning-based methods.


Multi-View Based Data Fusion
§ Different datasets or different feature subsets about an object can be regarded as different views on the object.
  § Person: face, fingerprint, or signature
  § Image: color or texture features
§ Latent consensus & complementary knowledge
§ 3 subcategories:
  § 1) co-training
  § 2) multiple kernel learning (MKL)
  § 3) subspace learning


Multi-View Based Data Fusion: Co-training
§ Co-training considers a setting in which each example can be partitioned into two distinct views, making three main assumptions:
  § Sufficiency: each view is sufficient for classification on its own
  § Compatibility: the target functions in both views predict the same labels for co-occurring features with high probability
  § Conditional independence: the views are conditionally independent given the class label. (Too strong in practice)


Multi-View Based Data Fusion: Co-training
§ Original Co-training
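The co-training loop can be sketched roughly as below. This is a simplified illustration, not a reproduction of the slide's figure: each view is a single number, the per-view learner is a nearest-centroid classifier chosen only to keep the code short, and confidence is the margin between the two centroid distances. All names and data are hypothetical, and the sketch assumes exactly two classes.

```python
def fit_centroids(xs, ys):
    """Nearest-centroid classifier on one 1-D view: the mean of each class."""
    return {c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c) for c in set(ys)}

def predict(centroids, x):
    """Return (label, confidence); confidence is the gap between the two centroid distances."""
    d = sorted((abs(x - m), c) for c, m in centroids.items())
    return d[0][1], d[1][0] - d[0][0]

def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view1, view2), label); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            ys = [y for _, y in labeled]
            clf = fit_centroids([x[view] for x, _ in labeled], ys)
            # The classifier of this view labels the pool examples it is most confident
            # about; they join the shared labeled set used by the other view next.
            ranked = sorted(pool, key=lambda x: -predict(clf, x[view])[1])
            for x in ranked[:per_round]:
                labeled.append((x, predict(clf, x[view])[0]))
                pool.remove(x)
    return labeled

# Hypothetical data: two labeled seeds and four unlabeled examples.
seeds = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.3), (0.8, 0.7)]
result = co_train(seeds, unlabeled)
```

The point of the alternation is that each view's confident predictions become training labels for the other view, which is where the sufficiency and compatibility assumptions matter.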


Co-training-based air quality inference model


Multi-View Based Data Fusion: MKL
§ 2. Multi-Kernel Learning
  § A kernel is a hypothesis on the data
  § MKL refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm.
  § E.g.: Ensemble and boosting methods, such as Random Forest, are inspired by MKL.
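A linear combination of predefined kernels can be illustrated as follows. Note the substitution: instead of learning the weights jointly with a classifier, as full MKL solvers do, this sketch weights each kernel by its kernel-target alignment, a simple heuristic used here only to keep the example short. Function names and data are hypothetical, and labels are assumed to be in {-1, +1}.

```python
import math

def rbf(gamma):
    """Gaussian (RBF) kernel with width parameter gamma."""
    return lambda a, b: math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def linear(a, b):
    """Plain dot-product kernel."""
    return sum(x * y for x, y in zip(a, b))

def gram(kernel, X):
    """Gram matrix of a kernel over the samples in X."""
    return [[kernel(a, b) for b in X] for a in X]

def alignment(K, y):
    """Kernel-target alignment <K, yy^T> / (||K||_F * ||yy^T||_F) for labels in {-1, +1}."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return num / (norm_k * n)  # ||yy^T||_F equals n for +/-1 labels

def combine_kernels(kernels, X, y):
    """Weight each kernel by its non-negative alignment, normalized to sum to 1."""
    scores = [max(alignment(gram(k, X), y), 0.0) for k in kernels]
    total = sum(scores) or 1.0
    weights = [s / total for s in scores]
    combined = lambda a, b: sum(w * k(a, b) for w, k in zip(weights, kernels))
    return weights, combined

# Hypothetical two-cluster data.
X = [(0.0, 0.0), (0.1, 0.2), (2.0, 2.1), (2.2, 1.9)]
y = [-1, -1, 1, 1]
weights, k_comb = combine_kernels([linear, rbf(1.0)], X, y)
```

The combined kernel `k_comb` can be passed to any kernel method; a convex combination of valid kernels is itself a valid kernel.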


Multi-View Based Data Fusion: MKL
§ MKL-based framework for forecasting air quality.


Multi-View Based Data Fusion: MKL
§ The MKL-based framework outperforms a single kernel-based model in the air quality forecast example
§ Feature space:
  § The features used by the spatial and temporal predictors do not have any overlaps, providing different views on a station's air quality.
§ Model:
  § The spatial and temporal predictors model the local factors and global factors respectively, which have significantly different properties.
§ Parameter learning:
  § Decomposing a big model into 3 coupled small ones scales down the parameter spaces tremendously.


Multi-View Based Data Fusion: subspace learning
§ Obtain a latent subspace shared by multiple views by assuming that input views are generated from this latent subspace.
  § Subsequent tasks, such as classification and clustering
  § Lower dimensionality


Multi-View Based Data Fusion: subspace learning
§ E.g.: PCA ->
  § Linear case: Canonical correlation analysis (CCA)
    § maximizing the correlation between 2 views in the subspace
  § Non-linear: Kernel variant of CCA (KCCA)
    § map each (non-linear) data point to a higher-dimensional space in which linear CCA operates.
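The linear case (CCA) can be sketched with NumPy as below. This is one standard way to compute it (whitening each view, then taking an SVD of the whitened cross-covariance); the small ridge term `reg` is an added assumption for numerical stability, and the synthetic data and names are hypothetical.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-8):
    """Linear CCA via whitening + SVD. X: (n, p), Y: (n, q).
    Returns projection matrices Wx (p, k), Wy (q, k) and the top-k canonical correlations."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, :k], Ky @ Vt.T[:, :k], s[:k]

# Synthetic two-view data sharing one latent signal z (hypothetical).
rng = np.random.default_rng(0)
z = rng.standard_normal(200)
X = np.column_stack([z + 0.1 * rng.standard_normal(200), rng.standard_normal(200)])
Y = np.column_stack([rng.standard_normal(200), -z + 0.1 * rng.standard_normal(200)])
Wx, Wy, corrs = cca(X, Y)
proj_x, proj_y = X @ Wx[:, 0], Y @ Wy[:, 0]  # the shared 1-D subspace
```

The two projections are the views mapped into the shared subspace; downstream classification or clustering would operate on them instead of the raw features.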


Multi-View Based Data Fusion
§ Summary of Multi-View Based methods
  § 1) co-training: maximize the mutual agreement on two distinct views of the data.
  § 2) multiple kernel learning (MKL): exploit kernels that naturally correspond to different views and combine kernels either linearly or non-linearly to improve learning.
  § 3) subspace learning: obtain a latent subspace shared by multiple views, assuming that the input views are generated from this latent subspace


Data fusion methodologies
§ Stage-based methods
§ Feature level-based methods
§ Semantic meaning-based data fusion methods
  § multi-view learning-based
  § similarity-based
    § Coupled Matrix Factorization
    § Manifold Alignment
  § probabilistic dependency-based
  § and transfer learning-based methods.


§ Recall: Matrix decomposition by SVD
§ Problems of single matrix decomposition on different datasets:
  § Inaccurate complementation of missing values in the matrix.


Similarity-Based: Coupled Matrix Factorization
§ Solution by coupled (context-aware) matrix factorization:
  § To accommodate different datasets with different matrices (distribution, meaning), which share a common dimension between one another.
  § By decomposing these matrices collaboratively, we can transfer the similarity between different objects learned from a dataset to another one, therefore complementing the missing values more accurately.
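A generic form of this idea, not the specific objective used in the application slides, is to factorize a sparse matrix X and a dense context matrix Y that share their row dimension, forcing both to use the same row factors U:

    minimize  1/2 ||M * (X - U V^T)||^2 + lam/2 ||Y - U W^T||^2 + reg/2 (||U||^2 + ||V||^2 + ||W||^2)

where M is a 0/1 mask of observed entries and * is the element-wise product. The sketch below minimizes it by plain gradient descent; the symbols, hyperparameters, and data are all hypothetical.

```python
import numpy as np

def coupled_mf(X, mask, Y, rank=2, lam=1.0, reg=0.01, lr=0.01, iters=5000, seed=0):
    """Factorize X ~ U V^T (observed entries only) and Y ~ U W^T with a shared U.
    Missing entries of X must hold a finite placeholder (e.g. 0); mask marks observed ones.
    Returns U, V, W and the data-fit loss per iteration."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Y.shape[1]
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((p, rank))
    W = 0.1 * rng.standard_normal((q, rank))
    losses = []
    for _ in range(iters):
        Ex = mask * (U @ V.T - X)   # error on the observed entries of X
        Ey = U @ W.T - Y            # error on the fully observed context matrix Y
        losses.append(0.5 * (Ex ** 2).sum() + 0.5 * lam * (Ey ** 2).sum())
        gU = Ex @ V + lam * Ey @ W + reg * U   # U receives gradient from both matrices
        gV = Ex.T @ U + reg * V
        gW = lam * Ey.T @ U + reg * W
        U, V, W = U - lr * gU, V - lr * gV, W - lr * gW
    return U, V, W, losses

# Hypothetical low-rank ground truth with three hidden entries in X.
rng = np.random.default_rng(1)
U_true = rng.random((6, 2))
X_true = U_true @ rng.random((4, 2)).T
Y = U_true @ rng.random((3, 2)).T
mask = np.ones_like(X_true)
for i, j in [(0, 0), (2, 3), (5, 1)]:
    mask[i, j] = 0.0
X_obs = np.where(mask == 1.0, X_true, 0.0)
U, V, W, losses = coupled_mf(X_obs, mask, Y)
X_hat = U @ V.T  # includes estimates for the hidden entries
```

Because U is updated from the errors of both matrices, structure learned from the dense Y constrains the row factors used to fill the gaps in X.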


Coupled Matrix Factorization Application
§ Estimate the travel speed on each road segment in an entire city, based on the GPS trajectory of a sample of vehicles


Coupled Matrix Factorization Application
§ Coupled matrix factorization
§ Objective function:


Similarity-Based: Manifold Alignment
§ Utilizes the relationships of instances within each dataset to strengthen the knowledge of the relationships between the datasets, thereby ultimately mapping initially disparate datasets to a joint latent space
§ Maps two datasets (X, Y) to a new joint latent space (f(X); g(Y))


Similarity-Based: Manifold Alignment
§ Preserves 2 similarities:
  § The local similarity within a dataset,
  § The correspondences across different datasets.
§ C, cost function; F, embedding of data; W, similarity matrix; a, the a-th dataset


Similarity-Based: Manifold Alignment
§ Manifold alignment assumes the disparate datasets to be aligned have the same underlying manifold structure
§ The second loss function is simply the loss function for Laplacian Eigen-maps using the joint adjacency matrix: L = D - W
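The Laplacian Eigen-maps step on a joint adjacency matrix can be sketched as below. The construction is an assumption made for illustration: the joint matrix stacks the within-dataset similarities `Wa` and `Wb` on the diagonal and the cross-dataset correspondences `Wab` (scaled by a hypothetical weight `mu`) off the diagonal, and the embedding is taken from the smallest non-trivial solutions of L f = lambda D f with L = D - W.

```python
import numpy as np

def manifold_align(Wa, Wb, Wab, dim=1, mu=1.0):
    """Embed two datasets into a joint latent space of dimension `dim`.
    Wa, Wb: within-dataset similarity matrices; Wab: correspondence matrix (rows a, cols b).
    Assumes the joint graph is connected and every node has positive degree."""
    n_a = Wa.shape[0]
    W = np.block([[Wa, mu * Wab], [mu * Wab.T, Wb]])  # joint adjacency matrix
    d = W.sum(axis=1)
    L = np.diag(d) - W                                 # graph Laplacian L = D - W
    d_isqrt = 1.0 / np.sqrt(d)
    # Solve L f = lambda D f through the symmetric form D^{-1/2} L D^{-1/2}.
    L_sym = d_isqrt[:, None] * L * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)                 # eigenvalues in ascending order
    F = d_isqrt[:, None] * vecs[:, 1:dim + 1]          # skip the trivial constant solution
    return F[:n_a], F[n_a:]

# Hypothetical example: two 4-node chains with one-to-one correspondences.
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
Fa, Fb = manifold_align(chain, chain, np.eye(4))
```

In this symmetric toy case corresponding points land on the same coordinate, while the chain order within each dataset is preserved, which are the two similarities the method is meant to keep.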


Coupled Matrix Factorization + manifold
§ Example: Infer the fine-grained noise situation by using complaint data together with social media, road network data, and POIs


Probabilistic Dependency-Based Fusion
§ This category of approaches bridges the gap between different datasets by probabilistic dependency, which emphasizes the interaction rather than the similarity between two objects.
§ Two branches of graphical representations of distributions are commonly used:
  § Bayesian Networks
  § Markov Networks (a.k.a. Markov Random Field)
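A toy Bayesian network makes the idea of fusing sources through probabilistic dependency concrete. The structure (a hidden traffic volume and an observed weather both influencing an observed travel speed) and every probability below are invented for illustration; they are not taken from the slides or the reference paper.

```python
# Hypothetical conditional probability tables; the numbers are made up.
P_volume = {"high": 0.3, "low": 0.7}       # prior on the hidden variable
P_weather = {"rain": 0.2, "clear": 0.8}    # prior on an observed variable
# P(speed = "slow" | volume, weather)
P_slow = {("high", "rain"): 0.9, ("high", "clear"): 0.7,
          ("low", "rain"): 0.4, ("low", "clear"): 0.1}

def posterior_volume(speed, weather):
    """P(volume | speed, weather), computed by enumerating the hidden variable."""
    joint = {}
    for vol, p_vol in P_volume.items():
        p_slow = P_slow[(vol, weather)]
        p_speed = p_slow if speed == "slow" else 1.0 - p_slow
        joint[vol] = p_vol * P_weather[weather] * p_speed
    z = sum(joint.values())  # normalizing constant P(speed, weather)
    return {vol: p / z for vol, p in joint.items()}

post_clear = posterior_volume("slow", "clear")
post_rain = posterior_volume("slow", "rain")
```

Slow traffic on a clear day points strongly to high volume, whereas in rain the same observation is partly explained by the weather. That interaction between datasets is what the dependency structure encodes.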


Probabilistic Dependency-Based Fusion Model
§ The graphical structure of traffic volume inference model based on POIs, road networks, travel speed and weather.
  § A gray node denotes a hidden variable and white nodes are observations.
  § 𝜃: road hidden variable
  § 𝛼: POI hidden variable
  § 𝑁$: Traffic volume hidden variable


Transfer learning-based methods
§ An assumption in many machine learning algorithms is that the training and test data must be in the same feature space and have the same distribution.
§ Transfer learning, in contrast, allows the domains, tasks, and distributions used in training and testing to be different.
§ Examples:
  § A user's transaction records in Amazon -> application of travel recommendation.
  § The knowledge learned from one city's traffic data -> another city.
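One simple way to picture the city-to-city example is parameter transfer: fit a model where data is plentiful, then use its parameter as a prior where data is scarce. The sketch below does this for a one-parameter linear model; the approach, function name, and numbers are illustrative assumptions rather than a method described in the slides.

```python
def fit_slope(xs, ys, prior=0.0, lam=0.0):
    """Least-squares slope through the origin, shrunk toward `prior` with strength `lam`.
    Minimizes sum((y - w * x)^2) + lam * (w - prior)^2, which has the closed form below."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy + lam * prior) / (sxx + lam)

# Source city: plenty of data, true slope 2.0 (hypothetical).
src_x = [1.0, 2.0, 3.0, 4.0, 5.0]
src_y = [2.0, 4.0, 6.0, 8.0, 10.0]
w_source = fit_slope(src_x, src_y)

# Target city: a single noisy sample; its true slope is assumed to be 2.2.
tgt_x, tgt_y = [1.0], [3.0]
w_target_only = fit_slope(tgt_x, tgt_y)                        # trusts the noisy sample fully
w_transfer = fit_slope(tgt_x, tgt_y, prior=w_source, lam=1.0)  # shrinks toward the source slope
```

With these numbers the transferred estimate (2.5) is closer to the assumed target slope than the target-only estimate (3.0). The transfer helps only because the two cities are related; if they were not, the prior would bias the estimate instead.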


Taxonomy of Transfer learning


Transfer between the Same Type of Datasets
§ Examples of multi-task transfer learning


Transfer Learning among Multiple Datasets


Comparison of Different Data Fusion Methods

Filling Missing Values (of a sparse dataset), Predict Future, Causality Inference, Object Profiling, and Anomaly Detection.


Thank you!

Q&A