Multimedia Data Mining
Arvind Balasubramanian
[email protected] Lab (ECSS 4.416)
The University of Texas at Dallas
Me and My Research
• Research Interests:
  – Machine Learning
  – Data Mining
  – Statistical Analysis
  – Applications of the above in Multimedia
• I am currently working on
  – developing a clustering algorithm guided by statistical analysis
  – deriving a composite grading scale for speech and language disorders, in collaboration with the UTD Callier Center
Data Mining and Multimedia
• Uncovering hidden information from data.
• Exploiting data to obtain new knowledge and interpret results.
• Immense applications in Multimedia.
Data Mining Techniques
• Classification
• Prediction
• Cluster Analysis & Class Discovery
• Extraction and Retrieval
• Statistical Analysis
Ideas for Projects
Text Mining
• Information Extraction from Domain-specific documents
  – involves extracting data from pieces of free text and populating a database
  – serves to organize required information that is available only in unorganized form
  – not enough in itself; combine with class discovery
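The extraction step above can be sketched as follows. This is a minimal illustration, not a full extraction system: the free-text snippets, the field names (title, artist, year) and the regular expression are all hypothetical, and real domain-specific documents would need much richer patterns.

```python
import re
import sqlite3

# Hypothetical free-text snippets; field names and layout are assumptions.
NOTES = [
    "Title: Sample Song A, Artist: Artist One, Year: 2001",
    "some unrelated sentence with no usable fields",
    "Title: Sample Song B, Artist: Artist Two, Year: 2005",
]

PATTERN = re.compile(
    r"Title:\s*(?P<title>[^,]+),\s*Artist:\s*(?P<artist>[^,]+),\s*Year:\s*(?P<year>\d{4})"
)

def extract_records(texts):
    """Return one dict per text that matches the pattern; skip the rest."""
    records = []
    for text in texts:
        match = PATTERN.search(text)
        if match:
            record = match.groupdict()
            record["year"] = int(record["year"])
            records.append(record)
    return records

def populate_database(records):
    """Load the extracted records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracks (title TEXT, artist TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO tracks VALUES (:title, :artist, :year)", records
    )
    conn.commit()
    return conn
```

Calling `populate_database(extract_records(NOTES))` yields a table that can be queried with ordinary SQL, which is the "organized form" the slide refers to.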
Ideas for Projects
Text Mining
• New Class Discovery using Clustering techniques
  – identifying groups of keywords that do not fall into known categories
  – creating new categories and validating them
  – possibly employ clustering algorithms with proper similarity measures or distance functions
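One possible way to group keywords with a distance function is sketched below. It assumes each keyword is described by the set of documents it occurs in, uses Jaccard distance, and merges keywords by single-link clustering under a threshold. The keyword data, the threshold and the function names are illustrative assumptions; other similarity measures and clustering algorithms would fit the slide equally well.

```python
from itertools import combinations

# Hypothetical keyword -> set of document ids in which the keyword occurs.
KEYWORD_DOCS = {
    "codec": {1, 2, 3},
    "bitrate": {1, 2, 3, 4},
    "compression": {2, 3, 4},
    "melody": {7, 8},
    "tempo": {7, 8, 9},
}

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; 0 means identical sets, 1 means disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def single_link_clusters(items, threshold):
    """Merge keys whose pairwise distance is <= threshold (single linkage)."""
    parent = {key: key for key in items}

    def find(key):
        # Union-find lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for a, b in combinations(items, 2):
        if jaccard_distance(items[a], items[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for key in items:
        clusters.setdefault(find(key), set()).add(key)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

With a threshold of 0.4, `single_link_clusters(KEYWORD_DOCS, 0.4)` separates the encoding-related keywords from the music-related ones. Validating such groups as new categories would still require manual or statistical checks.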
Ideas for Projects
Text Mining (contd.)
• Query-based document retrieval system
  – employ one of several base models, such as a probabilistic model or a vector space model
  – design an efficient indexing system
  – include a relevance ranking feature
  – possibly make the system intelligent using machine learning techniques
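A minimal sketch of the vector space option, assuming whitespace tokenization, a smoothed TF-IDF weighting and cosine similarity for relevance ranking. The class name, the sample documents and the exact IDF formula are assumptions made for illustration; a real system would add stemming, stop-word handling and a persistent index.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class VectorSpaceIndex:
    """Tiny inverted index with TF-IDF weights and cosine-similarity ranking."""

    def __init__(self, docs):
        self.n = len(docs)
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        for doc_id, text in enumerate(docs):
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][doc_id] = tf
        # Precompute document vector lengths for cosine normalization.
        squared = [0.0] * self.n
        for term, plist in self.postings.items():
            idf = self._idf(term)
            for doc_id, tf in plist.items():
                squared[doc_id] += (tf * idf) ** 2
        self.norms = [math.sqrt(v) for v in squared]

    def _idf(self, term):
        df = len(self.postings.get(term, ()))
        return math.log((1 + self.n) / (1 + df)) + 1.0  # smoothed IDF

    def search(self, query, top_k=3):
        """Return up to top_k (doc_id, score) pairs, best match first."""
        scores = defaultdict(float)
        q_squared = 0.0
        for term, tf in Counter(tokenize(query)).items():
            if term not in self.postings:
                continue  # unknown terms cannot match any document
            idf = self._idf(term)
            q_weight = tf * idf
            q_squared += q_weight ** 2
            for doc_id, d_tf in self.postings[term].items():
                scores[doc_id] += q_weight * d_tf * idf
        q_norm = math.sqrt(q_squared)
        ranked = [
            (doc_id, score / (q_norm * self.norms[doc_id]))
            for doc_id, score in scores.items()
        ]
        ranked.sort(key=lambda pair: (-pair[1], pair[0]))
        return ranked[:top_k]

# Hypothetical document collection.
DOCS = [
    "jazz piano concert video",
    "rock guitar concert audio",
    "piano lesson audio for beginners",
]
```

For example, `VectorSpaceIndex(DOCS).search("piano audio")` ranks the third document first because it is the only one that contains both query terms.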
Ideas for Projects
Pattern Recognition in Multimedia Data
• Scope
  – analyze and identify interrelationships within Multimedia data sets
  – derive a composite score from several different sub-scores
• Methods
  – classic techniques like Principal Component Analysis (PCA) and Factor Analysis (FA)
  – statistical methods such as Regression analysis
Ideas for Projects
Pattern Recognition in Multimedia Data (contd.)
• Methods
  – Principal Component Analysis (PCA)
    (a) Dimensionality Reduction
    (b) Efficient Storage and Retrieval of Media data
    (c) Applications in any multi-dimensional media: Images (noise reduction), Video (content analysis), Audio (Voice Signature recognition)
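The dimensionality reduction idea can be sketched in plain Python as below. This is only an illustration under stated assumptions: it extracts just the first principal component, it finds that component by power iteration on the sample covariance matrix, and the two-feature data set is made up. A practical project would more likely use a library routine in Matlab or a similar tool.

```python
import math

# Hypothetical 2-feature measurements that lie close to the line y = 2x.
DATA = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_principal_component(cov, iterations=200):
    """Leading unit eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no direction of variance to find
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Reduce each row to a single coordinate along the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in centered]

centered = center(DATA)
component = first_principal_component(covariance(centered))
reduced = project(centered, component)  # one number per original row
```

Here each two-dimensional point is replaced by one coordinate along the direction of greatest variance, which is the sense in which PCA supports compact storage and retrieval.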
Ideas for Projects
Pattern Recognition in Multimedia Data (contd.)
• Methods
  – Factor Analysis (FA)
    (a) Minimize data redundancy
    (b) Reveal hidden patterns
    (c) Combine attributes into a single attribute by determining the importance and contribution of each attribute
    (d) Applications: Medical analysis, IQ tests, Personality tests, Software measurement, Multimedia content analysis, Motion Capture Data analysis
Ideas for Projects
Pattern Recognition in Multimedia Data (contd.)
• Methods
  – Statistical Analysis
    (a) Correlation analysis to bring out interrelationships between data attributes
    (b) Regression analysis to analyze the ability of a set of data attributes to predict other data attributes
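Both analyses can be sketched in a few lines for the simplest case of two attributes. The attribute names and values below are hypothetical, and the sketch covers only Pearson correlation and a one-predictor least-squares fit; regression on a set of attributes would need a multiple-regression routine from a statistical package.

```python
import math

# Hypothetical attributes of five media clips.
DURATION_S = [10.0, 20.0, 30.0, 40.0, 50.0]
SIZE_MB = [1.1, 2.0, 3.2, 3.9, 5.1]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """Share of the variance in y that the fitted line explains."""
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

r = pearson(DURATION_S, SIZE_MB)
slope, intercept = fit_line(DURATION_S, SIZE_MB)
r2 = r_squared(DURATION_S, SIZE_MB, slope, intercept)
```

The correlation coefficient `r` measures how strongly the two attributes move together, while `r2` indicates how well one attribute predicts the other under the fitted line.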
Ideas for Projects
Prediction and Suggestion Systems
• An intelligent media hosting application that
  – learns from user queries and requests, and accordingly suggests other media items
  – retrieves suggested items by querying on the media features and metadata
  – Examples: Esnips music hosting
  – many machine learning techniques could be employed: Bayesian reasoning and classification algorithms
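As one possible reading of the Bayesian option, the sketch below trains a multinomial naive Bayes classifier on past queries labelled with a genre and then suggests catalog items whose metadata matches the predicted genre. The query history, the catalog, the genre labels and all names are hypothetical; this is not a description of how any existing hosting service works.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (list_of_words, label) pairs."""
        for words, label in samples:
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        """Return the label with the highest log posterior for the words."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical past queries, each labelled with the genre the user picked.
HISTORY = [
    (["saxophone", "quartet", "live"], "jazz"),
    (["trumpet", "improvisation"], "jazz"),
    (["guitar", "riff", "live"], "rock"),
    (["drum", "solo", "guitar"], "rock"),
]

# Hypothetical catalog with genre metadata.
CATALOG = [
    {"title": "Track A", "genre": "jazz"},
    {"title": "Track B", "genre": "rock"},
    {"title": "Track C", "genre": "jazz"},
]

def suggest(model, query_words, catalog):
    """Suggest catalog titles whose genre matches the predicted genre."""
    genre = model.predict(query_words)
    return [item["title"] for item in catalog if item["genre"] == genre]
```

A call such as `suggest(NaiveBayes().fit(HISTORY), ["saxophone", "live"], CATALOG)` returns the jazz tracks. Learning continuously from new requests would amount to calling `fit` again as labelled queries arrive.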
Ideas for Projects
• Ideas for alternative projects having to do with applications of machine learning, data mining and statistical analysis in the domain of multimedia are welcome.
• Tools: Weka, Matlab, statistical software packages (even Excel helps a lot!).
Thank You