Theses
Turin, January 2015
Elena Baralis, Tania Cerquitelli, Silvia Chiusano, Paolo Garza
Luca Cagliero, Luigi Grimaudo
Daniele Apiletti, Giulia Bruno, Alessandro Fiori
2DBMG
General information
Duration: 6 months full time
equivalent overall duration if part time
Internal thesis
cooperation on active research topic or research project
good programming and analytical skills required
supervised by a group member
can work at home or in our lab (LAB5)
External thesis (stage)
supervised by external tutor
More info on topics: http://dbdmg.polito.it/wordpress/theses
Main Topics
Big data and cloud-based data mining services and algorithms
Database and data mining applications
Text and social network mining
Network traffic data analysis
Clinical and biological data management
Green/urban data mining
…
Big data and cloud-based data mining
Study of innovative, parallel, and distributed data mining approaches for
Pattern mining algorithms
Clustering techniques
Classification algorithms
Summarization algorithms
to efficiently gain interesting insights from huge data volumes
Design and development of novel cloud-based data mining services based on
HADOOP and Spark frameworks
MapReduce paradigm
Exploitation of the cloud-based services for novel big data analytics applications (e.g., network traffic data, fraud detection, social networks)
Analysis modules based on HADOOP and Spark Ecosystems
European research project ONTIC
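As a rough illustration of the MapReduce paradigm applied to pattern mining, the following single-machine sketch counts co-occurring item pairs. It only mimics the map, shuffle and reduce steps in plain Python; it does not use Hadoop or Spark, and the function names and toy transactions are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Emit (pair, 1) for every unordered pair of distinct items in one transaction."""
    for pair in combinations(sorted(set(transaction)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the partial counts collected for one key."""
    return key, sum(values)


def count_pairs(transactions):
    """Run the three steps in sequence and return {pair: support count}."""
    mapped = (kv for t in transactions for kv in map_phase(t))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Toy data, invented for illustration only.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
pair_counts = count_pairs(transactions)
```

In a real distributed setting the map and reduce functions would run on different nodes and the shuffle would be handled by the framework.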
Data Mining Algorithms
Itemset mining algorithms for time series data analysis
design and development of novel itemset mining algorithms targeted to time series data analysis
analysis of historical financial data (e.g., stock prices, stock indexes, credit card transactions)
to plan trading and investment strategies
to support fraud detection
Integration of data mining algorithms into Rapid Miner
Rapid Miner is an established Java-based machine learning tool
integration in Rapid Miner of state-of-the-art algorithms for
weighted itemset mining
document analysis and summarization
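To make the idea of itemset mining on time series more concrete, here is a minimal sketch under stated assumptions: each day becomes a transaction of items such as "X:up" or "X:down", and itemsets are counted by brute-force enumeration. The encoding, the function names and the toy prices are hypothetical; this is not one of the novel algorithms the thesis would develop.

```python
from collections import Counter
from itertools import combinations


def to_transactions(prices):
    """Turn per-stock price series into one transaction per day.

    Each transaction holds items such as "X:up" or "X:down" describing the
    sign of the day-over-day change; unchanged prices emit no item.
    """
    n_days = len(next(iter(prices.values())))
    transactions = []
    for day in range(1, n_days):
        items = set()
        for name, series in prices.items():
            delta = series[day] - series[day - 1]
            if delta > 0:
                items.add(f"{name}:up")
            elif delta < 0:
                items.add(f"{name}:down")
        transactions.append(items)
    return transactions


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset up to max_size and keep those meeting min_support."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(items), size):
                counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Invented toy prices, for illustration only.
prices = {
    "X": [10, 11, 12, 11, 12],
    "Y": [20, 21, 23, 22, 22],
}
transactions = to_transactions(prices)
frequent = frequent_itemsets(transactions, min_support=2)
```

Brute-force enumeration is only practical for tiny inputs; it is used here to keep the sketch short.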
Mining Advisor
Design and implementation of an automatic system (Mining Advisor) to select, for a given dataset and analysis task, an optimal mining algorithm, based on
innovative data characterization statistics
definition/design of mining algorithms (i.e., access methods and mining primitives), possibly disk-based
algorithm selection strategies exploiting a trade-off between accuracy and exploration time
Different instances of a Mining Advisor can be tailored to different data mining techniques (e.g., clustering algorithms, pattern discovery)
[Figure: Mining Advisor. Labels: dataset under analysis; data characterization through metrics computation; access methods; mining primitives; algorithm selection]
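One possible reading of an algorithm selection strategy that trades accuracy against exploration time is sketched below. The `Candidate` fields, the linear scoring rule, the time budget and all numbers are assumptions made for illustration; the slides do not specify how the Mining Advisor scores algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    expected_accuracy: float  # hypothetical estimate in [0, 1]
    expected_seconds: float   # hypothetical estimate of exploration time


def select_algorithm(candidates, time_budget, time_weight=0.0):
    """Pick the candidate with the best accuracy/time trade-off within the budget.

    Score = expected_accuracy - time_weight * expected_seconds.
    Returns None when no candidate fits the time budget.
    """
    feasible = [c for c in candidates if c.expected_seconds <= time_budget]
    if not feasible:
        return None
    return max(
        feasible,
        key=lambda c: c.expected_accuracy - time_weight * c.expected_seconds,
    )


# Invented candidates, for illustration only.
candidates = [
    Candidate("algo_fast", 0.80, 5.0),
    Candidate("algo_balanced", 0.88, 30.0),
    Candidate("algo_thorough", 0.91, 600.0),
]
choice = select_algorithm(candidates, time_budget=60.0, time_weight=0.001)
```

In practice the accuracy and time estimates would presumably come from the data characterization statistics computed on the dataset under analysis.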
Network data analysis
Analysis of huge wireless network traffic captures
wireless traffic monitoring by exploiting itemset mining algorithms
wireless traffic classification by exploiting association rules and probabilistic models
Analysis of very large wired network traffic captures
wired traffic monitoring and characterization by exploiting data mining techniques (e.g., association rules, clustering, classification)
distributed VLDB/cloud technologies to support efficient storage, retrieval and indexing of huge amounts of network data
European research projects: mPlane and ONTIC
Text mining
Text summarization
identification of salient knowledge from news, research articles, blogs
generation of sound and easy-to-read summaries of large document collections
development of multi-lingual and automatic summarization systems
development of cloud-based summarization systems targeted to the extraction of succinct summaries from big data collections
Social and educational text mining
Content curation systems allow users to build personalized and dynamically updated news reports
Integration of a summarization system into a content curation platform
Evaluation of system appreciation and feedback
E-learning refers to the use of ICTs in education
Development and integration of a summarization system in an e-learning context
Social network mining
Social network analysis
user behavior analysis by means of data mining techniques
topic extraction and correlation analysis
discovery of user communities, trends and deviations
classification of web objects using user-generated content
Social watching
analysis of social messages during specific TV programs
analysis of evolution of hashtags during program broadcast
characterization of user groups and social interactions
Clinical and biological data management
Physiological data analysis
analyze physiological data collected during incremental tests (e.g., cardiopulmonary exercise testing) commonly used in the clinical domain and in sport science
improve the effectiveness of the reliability/training sessions
predict the final values of crucial parameters
reduce test duration and the physical effort for patients/athletes
Clinical data analysis
analyze data collected by the healthcare network of an Italian Health Care Center
extract medical treatments (in terms of performed examinations, prescribed drugs) frequently undergone by patients
identify deviations from expected medical treatments according to medical guidelines
Genomic Computing
Next Generation Sequencing (NGS) is a new and high-throughput technology for DNA sequencing
There is a need for new and effective data mining approaches to discover knowledge from NGS data
Goals
analysis of complex NGS data and development of innovative semantics-aware algorithms
exploitation of bio-ontologies, gene/protein and genetic disorder libraries
smart indexing and mining of large-scale NGS datasets
compact data representation and efficient data access
mining algorithms based on disk-based structures
National research project [PRIN 2011]: GenData 2020
Green data mining
Joint analysis of
Energy consumption logs of residential and public building heating systems and indoor climate conditions
Data on users' thermal comfort perception of indoor climate conditions and user feedback
Goals
Suggest ready-to-implement energy efficient actions based on innovative and user-friendly indicators
Discover interesting correlations in the large and heterogeneous body of available data
Regional research project: EDEN project
Green data mining
Joint analysis of energy and water consumption data to efficiently support an intelligent building management system
localization of network losses and leaks
detection of abnormal consumption
characterization of user consumption
forecast of energy and water consumption
Analysis of available bikes in the stations of a public bike-sharing system
to forecast critical situations (e.g., empty or full stations) to reschedule the bike redistribution process on the fly
to characterize the cyclic mobility patterns to support human mobility in urban areas
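As a minimal sketch of what forecasting critical situations at a bike station could look like, the code below extrapolates the recent trend in available bikes and flags a station as likely empty or full. The naive trend forecast, the `margin` threshold and all names are assumptions for illustration; an actual thesis would likely rely on proper forecasting or data mining models.

```python
def forecast_next(counts, window=3):
    """Naive forecast: last count plus the mean change over the last `window` steps."""
    if len(counts) < 2:
        return float(counts[-1])
    deltas = [b - a for a, b in zip(counts, counts[1:])][-window:]
    return counts[-1] + sum(deltas) / len(deltas)


def critical_status(counts, capacity, margin=1, window=3):
    """Classify the forecast for one station as "empty", "full" or "ok"."""
    predicted = forecast_next(counts, window)
    if predicted <= margin:
        return "empty"
    if predicted >= capacity - margin:
        return "full"
    return "ok"


# Invented readings of available bikes at three stations with 10 docks each.
draining = critical_status([8, 6, 4, 2], capacity=10)
filling = critical_status([3, 5, 7, 9], capacity=10)
steady = critical_status([5, 5, 5, 5], capacity=10)
```

Stations flagged as "empty" or "full" would be the candidates for rescheduling the redistribution process.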
Data analysis for Smart Cities
Mining urban data to increase the well-being of citizens by improving the efficiency, accessibility and functionality of provided services
Analysis of data collected through sensor networks embedded in smart street furniture
National research project
Analysis of air pollution data in urban areas to detect possible critical conditions
National research project: MIE
Analysis of data on citizens' urban safety and security
S[M2]ART
Data analysis for Smart Cities
IOC (Intelligent Operations Center) - IBM Platform for data analytics
Study IOC architecture, data flow and programming model
Deploy IOC in a real application to efficiently support
an intelligent transportation system
an intelligent water management system
Biomedical Informatics @ IRCCS
Laboratory Assistant Suite
modular architecture to manage different kinds of raw experimental data, track several laboratory activities, integrate different resources and aid in performing a variety of analyses to extract knowledge related to tumors
design of model-driven automated GUI generation
development of infrastructural components (e.g., task scheduler, email notification system, dashboard)
Genome analysis
innovative approaches to analyzing NGS data from human genomes
analytical algorithms to identify genetic variants in tumors and in the blood of patients
implementation and optimization of analytical algorithms for identification and classification of genetic variants in paired comparative tests
development of data analysis pipelines in parallel and distributed environments
Microarray data analysis
study of class discovery algorithms (e.g., clustering, bi-clustering)
identify robust gene markers by means of the integration of several classification methods
analysis of gene expression values over time on data derived from xenopatients
Virtual Laboratory @ IRCCS
Biological laboratories need Next Generation LIMS
Virtual reality to improve graphical user interface usability in laboratory information management systems (LIMS)
Computer vision to improve the efficiency of laboratory data-tracking procedures
Realization of virtual representation of life-science working environments based on 3D interactive models
development of a prototype application for user-friendly management of complex and hierarchical storage systems, by means of 3D realistic representation of the physical containers and their interactions
Computer vision for sensor-based real-time tracking of laboratory activities
development of a prototype platform for automated monitoring of interactions between users, instruments and experimental materials
virtual representation of objects and activities is also foreseen to provide intuitive and user-friendly GUIs
External stages
Ooros
exploiting geo-located social interactions among people, places and businesses (e.g., checkins)
business intelligence for social recommendations
Analysis of a real-world dataset from hundreds of businesses in Turin, see www.desidoo.com
Narus Lab
Developing novel solutions to analyze network traffic data for security purposes (i.e., malware traffic detection, signature generation, anomaly detection), using machine learning and data mining techniques to find relevant patterns
The thesis will be conducted in collaboration with NARUS, Inc. – Sunnyvale, California in the context of the new laboratories that Narus is opening within the Politecnico di Torino Campus.