Upload
gfri
View
0
Download
0
Embed Size (px)
Citation preview
Fabrice Guillet, Gilbert Ritschard, and Djamel Abdelkader Zighed (Eds.)
Advances in Knowledge Discovery and Management
Studies in Computational Intelligence,Volume 398
Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447Warsaw
Poland
E-mail: [email protected]
Further volumes of this series can be found on our
homepage: springer.com
Vol. 378. János Fodor, Ryszard Klempous, and
Carmen Paz Suárez Araujo (Eds.)
Recent Advances in Intelligent Engineering Systems, 2011ISBN 978-3-642-23228-2
Vol. 379. Ferrante Neri, Carlos Cotta, and
Pablo Moscato (Eds.)
Handbook of Memetic Algorithms, 2011ISBN 978-3-642-23246-6
Vol. 380.Anthony Brabazon,Michael O’Neill, and
Dietmar Maringer (Eds.)
Natural Computing in Computational Finance, 2011ISBN 978-3-642-23335-7
Vol. 381. Rados law Katarzyniak, Tzu-Fu Chiu,
Chao-Fu Hong, and Ngoc Thanh Nguyen (Eds.)
Semantic Methods for Knowledge Management and
Communication, 2011ISBN 978-3-642-23417-0
Vol. 382. F.M.T. Brazier, Kees Nieuwenhuis, Gregor Pavlin,
Martijn Warnier, and Costin Badica (Eds.)
Intelligent Distributed Computing V, 2011
ISBN 978-3-642-24012-6
Vol. 383. Takayuki Ito,Minjie Zhang,Valentin Robu,
Shaheen Fatima, and Tokuro Matsuo (Eds.)
New Trends in Agent-Based Complex AutomatedNegotiations, 2012ISBN 978-3-642-24695-1
Vol. 384. DaphnaWeinshall, Jorn Anemuller,
and Luc van Gool (Eds.)
Detection and Identification of Rare Audiovisual Cues, 2012ISBN 978-3-642-24033-1
Vol. 385.Alex Graves
Supervised Sequence Labelling with Recurrent NeuralNetworks, 2012
ISBN 978-3-642-24796-5
Vol. 386.Marek R. Ogiela and Lakhmi C. Jain (Eds.)
Computational Intelligence Paradigms in Advanced Pattern
Classification, 2012ISBN 978-3-642-24048-5
Vol. 387. David Alejandro Pelta, Natalio Krasnogor,
Dan Dumitrescu, Camelia Chira, and Rodica Lung (Eds.)
Nature Inspired Cooperative Strategies for Optimization
(NICSO 2011), 2011ISBN 978-3-642-24093-5
Vol. 388. Tiansi Dong
Recognizing Variable Environments, 2012
ISBN 978-3-642-24057-7
Vol. 389. Patricia Melin
Modular Neural Networks and Type-2 Fuzzy Systems forPattern Recognition, 2012ISBN 978-3-642-24138-3
Vol. 390. Robert Bembenik, Lukasz Skonieczny,
Henryk Rybinski, and Marek Niezgodka (Eds.)
Intelligent Tools for Building a Scientific InformationPlatform, 2012
ISBN 978-3-642-24808-5
Vol. 391. Herwig Unger, Kyandoghere Kyamaky,
and Janusz Kacprzyk (Eds.)
Autonomous Systems: Developments and Trends, 2012ISBN 978-3-642-24805-4
Vol. 392. Narendra Chauhan,Machavaram Kartikeyan,
and Ankush Mittal
Soft Computing Methods for Microwave and Millimeter-Wave
Design Problems, 2012ISBN 978-3-642-25562-5
Vol. 393. Hung T. Nguyen,Vladik Kreinovich, BerlinWu,
and Gang Xiang
Computing Statistics under Interval and FuzzyUncertainty, 2012
ISBN 978-3-642-24904-4
Vol. 394. David A. Elizondo,Agusti Solanas,
and Antoni Martınez-Balleste (Eds.)
Computational Intelligence for Privacy
and Security, 2012ISBN 978-3-642-25236-5
Vol. 395. Srikanta Patnaik and Yeon-MoYang (Eds.)
Soft Computing Techniques in Vision Science, 2012ISBN 978-3-642-25506-9
Vol. 396.Marielba Zacarias and
Jose Valente de Oliveira (Eds.)
Human-Computer Interaction: The Agency Perspective, 2012ISBN 978-3-642-25690-5
Vol. 397. Elena Nikolaevskaya,Alexandr Khimich, and
Tamara Chistyakova
Programming with Multiple Precision, 2012ISBN 978-3-642-25672-1
Vol. 398. Fabrice Guillet, Gilbert Ritschard,
and Djamel Abdelkader Zighed (Eds.)
Advances in Knowledge Discovery and Management, 2012ISBN 978-3-642-25837-4
Fabrice Guillet, Gilbert Ritschard,and Djamel Abdelkader Zighed (Eds.)
Advances in KnowledgeDiscovery and Management
Volume 2
123
EditorsProf. Fabrice GuilletNantes UniversityFrance
Prof. Gilbert RitschardUniversite de GeneveSwitzerland
Prof. Djamel Abdelkader ZighedUniversite Lumiere Lyon 2
Bron
France
ISSN 1860-949X e-ISSN 1860-9503ISBN 978-3-642-25837-4 e-ISBN 978-3-642-25838-1DOI 10.1007/978-3-642-25838-1Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2010928588
c© Springer-Verlag Berlin Heidelberg 2012
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or partof the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,recitation, broadcasting, reproduction onmicrofilms or in any other physical way, and transmission orinformation storage and retrieval, electronic adaptation, computer software, or by similar or dissimilarmethodology now known or hereafter developed. Exempted from this legal reservation are brief ex-cerpts in connection with reviews or scholarly analysis or material supplied specifically for the purposeof being entered and executed on a computer system, for exclusive use by the purchaser of the work.Duplication of this publication or parts thereof is permitted only under the provisions of the CopyrightLaw of the Publisher’s location, in its current version, and permission for use must always be obtainedfrom Springer. Permissions for use may be obtained through RightsLink at the Copyright ClearanceCenter.Violations are liable to prosecution under the respective Copyright Law.The use of general descriptive names, registered names, trademarks, service marks, etc. in this publi-cation does not imply, even in the absence of a specific statement, that such names are exempt from therelevant protective laws and regulations and therefore free for general use.While the advice and information in this book are believed to be true and accurate at the date ofpublication, neither the authors nor the editors nor the publisher can accept any legal responsibility forany errors or omissions that may be made. The publisher makes no warranty, express or implied, withrespect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
During the last decade, the French-speaking scientific community developed a very
strong research activity in the field of Knowledge Discovery and Management
(KDM or EGC for “Extraction et Gestion des Connaissances” in French), which
is concerned with, among others, Data Mining, Knowledge Discovery, Business In-
telligence, Knowledge Engineering and Semantic Web.
This emerging research area has also been strongly stimulated by the rapid
growth of information systems and the web semantic issues. The success of the
first two French-speaking EGC Conferences, held in Nantes in 2001 and in Mont-
pellier in 2002, resulted naturally in 2003 in the foundation of the International
French speaking EGC Association1. Since, the Association yearly organizes con-
ferences and workshops with the aim of promoting exchanges between researchers
and companies concernedwith KDM and its application in business, administration,
industry or public organizations.
The recent and novel research contributions collected in this book are extended
and reworked versions of a selection of the best papers that were originally presented
in French at the EGC 2010 Conference held in Hammamet Tunisia, on January
2010. The 12 best papers that have been selected for this book are issued from
the 32 papers accepted in long format with a 23% acceptance ratio among the 139
papers submitted to the EGC2010 conference2.
Structure of the Book
The volume is organized in three parts.
1 Association “Extraction et Gestion des Connaissances” (EGC), see
http://www.egc.asso.fr2 For further details about EGC2010 conference see “10th International French-Speaking
Conference on Knowledge Discovery and Management (EGC2010): conference report”
by Sadok Ben Yahia and Jean-Marc Petit, in ACM SIGKDD Explorations, volume 12,
issue 1.
VI Preface
Part I – Various Aspects of Data Cube and Ontology-Based
Representations
In the first chapter, entitled Constrained Closed and Quotient Cubes, Rosine Cic-
chetti, Lotfi Lakhal and Sébastien Nedjar investigate reduced representations for the
Constrained Cube. They propose two representations which discard redundancies,
are information lossless and avoid to compute the whole data cubes: the Constrained
Closed and Quotient Cube. Provided with the former, the decision maker can per-
form OLAP classification and querying. The latter adds navigation within the cube
to the previous capabilities. Hence according to her/his needs, the user can choose
the more suitable representation.
In the second chapter, A New Concise and Exact Representation of Data Cubes,
Hanen Brahmi, Tarek Hamrouni, Riadh Ben Messaoud and Sadok Ben Yahia intro-
duce a new concise and exact representation called closed non derivable data cubes
(CND-Cube). It is based on the concept of non derivable minimal generators. They
also propose a novel algorithm dedicated to the mining of CND-Cube from mul-
tidimensional databases and investigate in detail the theoretical foundations of this
concise representation.Moreover, they discuss Rollup/Drilldown semantics over the
CND-Cube and validate their approach on a set of real and benchmark datasets.
Their experiment results show the effectiveness of the approach in comparison with
those fitting in the same trend.
Michel Buffa and Catherine Faron-Zucker are interested in Ontology-Based Ac-
cess Rights Management. They propose an approach to manage access rights in a
content management systems which relies on semantic web models and technolo-
gies. They developed the AMO ontology which consists in 1) a set of classes and
properties dedicated to the annotation of resources whose access should be con-
trolled and in 2) a base of inference rules modelling the access management strategy
to carry out. When applied to the annotations of the resources whose access should
be controlled, these rules enable to manage access according to a given strategy.
This modelisation is flexible, extendable and ensures the adaptability of the AMO
ontology to any access management strategy.
In the last chapter of this first part, Souhir Gahbiche, Nathalie Pernelle and Fatiha
Saïs addresse the problem of explanation of data reconciliation decisions that are ob-
tained by automatic numerical informed by ontology knowledge data reconciliation
methods. These methods may give rise to decision errors and to approximated re-
sults. They have developed an explanation model based on Coloured Petri Nets for-
malism. This model allows to generate a graphical and readable explanation, which
takes into account all the semantic knowledge that are involved in the similarity
computation of a reference pair.
Part II – Efficient Pattern Mining Issues
The first of chapter of this second part by Mehdi Khiari and Patrice Boizumault and
Bruno Crémilleux proposes to combine Constraint Programming and Constraint-
based Mining. By investigating the relationship between constraint-based mining
Preface VII
and constraint satisfaction problems, they propose a generic approach to model and
mine queries involving several local patterns (n-ary patterns). The resulting frame-
work for pattern set mining is very flexible and allows the user to easily express in
a declarative way a wide range of problems for a wide range of data mining tasks.
The next chapter written by Marc Boullé is concerned with Simultaneous Par-
titioning of Input and Class Variables for Supervised Classification Problems with
Many Classes. It extends discretization and value grouping methods, based on the
partitioning of both the input and class variables. The best joint partitioning is
searched by maximizing a Bayesian model selection criterion. This preprocessing
is exploited as a preparation for the naive Bayes classifier, and demonstrates high
performance in problems with hundreds of classes.
Lionel Martin and his co-authors describe an Interactive and progressive con-
straint definition for dimensionality reduction and visualization. They propose a
tool for semi-supervised, constraint-based projection of data. Starting from an ini-
tial linear projection, a user can specify constraints, in order to move away or closer
of some pairs of objects. Based on the Uzawa algorithm, a new projection is com-
puted that takes these constraints into account. The user can iteratively add new
constraints. The implementation is based on Explorer3D, the 3D projection tool de-
veloped at the LIFO (Computer Science Laboratory of Orleans, France).
The last chapter of this part by Anne Laurent and his co-authors deals with Ef-
ficient parallel mining of gradual patterns on multicore processors. Mining grad-
ual patterns plays a crucial role in many real world applications where huge vol-
umes of complex numerical data must be handled, e.g., biological databases, survey
databases, data streams or sensor readings. Gradual patterns highlight complex or-
der correlations of the form “The more/less X, the more/less Y”. They present an
efficient parallel algorithm to mine gradual patterns on multicore processors. They
experimentally show that this algorithm significantly improves the state of the art
and scales very well with the number of cores available.
Part III – Data Preprocessing and Information Retrieval
In the first one, entitled Analyzing Old Documents Using a Complex Approach:
application to lettrines indexing, byMickael Coustaty and his co-authors proposes a
methodology based on a complex approach to analyze and to characterize graphical
images extracted from old documents. Based on a comparison between historians
and computer science approaches, the authors propose a novel method to describe
images using a complex approach to globally describing an image with respect to
its structure, sense, and elements.
In Identifying relevant features of images from their 2-D topology, Marc Joliveau
introduces a new method for the visual perception problem that focuses on the in-
trinsic two dimensional topology of images to extract their principal features. Vali-
dations on three different datasets show that the few features extracted by the method
provide enough intelligible information to efficiently and fastly identify similarities
between the images of a database.
VIII Preface
The next chapter by Radja Messai et al. entitled Analyzing Health Consumer Ter-
minology for Query Reformulation Tasks describes the analysis and characterisation
of the health consumer terminology in the breast cancer field. The results have been
used in a pilot study to enhance the reformulation of health consumers’ queries. The
results have showed significant differences between the health consumer terminol-
ogy and the professional one. Such studies are important to provide health services
more adapted to the language and the level of knowledge of health consumers.
The last chapter, proposed by Patrick Bosc et al., addresses the plethoric answer
problem that often arises when end-users have an approximate idea of how to for-
mulate a query to retrieve what they are looking for from large-scale databases. A
possible approach to reduce the set of retrieved items and to make it more man-
ageable is to constrain the initial query with additional predicates. The approach
presented in this paper relies on the identication of correlation links between pre-
defined predicates related to attributes of the relation of interest. Thus, the initial
query is strengthened by additional predicates that are semantically close to the
user-specified ones.
Acknowledgments
The editors would like to thank the chapter authors for their insights and contribu-
tions to this book.
The editors would also like to acknowledge the members of the review commit-
tee and the associated referees for their involvement in the review process of the
book. Their in depth reviewing, criticisms and constructive remarks significantly
contributed to the high quality of the retained papers.
A special thank goes to Bruno Pinaud who has efficiently composed and laid out
the manuscript.
Finally, we thank Springer and the publishing team, and especially T. Ditzinger
and J. Kacprzyk, for their confidence in our project.
Nantes, Geneva, Lyon Fabrice Guillet
September 2011 Gilbert Ritschard
Djamel A. Zighed
Preface IX
Review Committee
All published chapters have been reviewed by at least 2 referees.
• Tomas Aluja (UPC, Barcelona, Spain)
• Nadir Belkhiter (Univ. Laval, Québec,
Canada)
• Sadok Ben Yahia (Univ. Tunis, Tunisia)
• Younès Bennani (Univ. Paris 13, France)
• Omar Boussaid (Univ. Lyon 2, France)
• Paula Brito (Univ. of Porto, Portugal)
• Francisco de A. T. De Carvalho (Univ.
Federal de Pernambuco, Brazil)
• Gilles Falquet (Univ. of Geneva, Switzer-
land)
• Jean-Gabriel Ganascia (Univ. Paris 6,
France)
• Pierre Gancarski (Univ. of Strasbourg,
France)
• Howard Hamilton (Univ. of Regina,
Canada)
• Robert Hilderman (Univ. of Regina,
Canada)
• Petra Kraemer (Univ. La Rochelle,
France)
• Philippe Lenca (Telecom Bretagne, Brest,
France)
• Philippe Leray (Univ. of Nantes, France)
• Stan Matwin (Univ. of Ottawa, Canada)
• Monique Noirhomme (FUNDP, Namur,
Belgium)
• Matthieu Perreira Da Silva (Univ. La
Rochelle, France)
• Vincent Pisetta (Univ. Lyon 2, France)
• Pascal Poncelet (LIRMM, Univ. of Mont-
pellier, France)
• Zbigniew Ras (Univ. of North Carolina,
USA)
• Jan Rauch (University of Prague, Czech
Republic)
• Chiara Renso (KDDLAB - ISTI CNR,
Italy)
• Lorenza Saitta (Univ. di Torino, Italy)
• Ansaf Salleb-Aouissi (Columbia Univ.,
New York, USA)
• Florence Sédes (Univ. of Toulouse 3,
France)
• Arno Siebes (Univ. Utrecht, The Nether-
lands)
• Dan Simovici (Univ. of Massachusetts
Boston, USA)
• Stefan Trausan-Matu (Univ. of Bucharest,
Romania)
• Rosanna Verde (Second Univ. of Naples,
Italy)
• Christel Vrain (Univ. of Orléans, France)
• Jef Wijsen (Univ. of Mons-Hainaut, Bel-
gium)
• Chongsheng Zhang (Univ. of Nice,
France)
Associated Reviewers
Yassine Benabbas,
Patrick Bosc,
Marc Boullé,
Hanen Brahmi,
Guillaume Cleuziou,
Mickaël Coustaty,
Bruno Cremilleux,
Catherine Faron Zucker,
Manfredotti,
Anne Laurent,
Lionel Martin,
Radja Messai,
Jonathan Weber
Manuscript Coordinator
Bruno Pinaud (Univ. of Bordeaux I, France)
Contents
Part I Data Cube and Ontology-Based Representations
Constrained Closed and Quotient Cubes . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Rosine Cicchetti, Lotfi Lakhal, and Sébastien Nedjar
A New Concise and Exact Representation of Data Cubes . . . . . . . . . . . . . 27
Hanen Brahmi, Tarek Hamrouni, Riadh Ben Messaoud,
and Sadok Ben Yahia
Ontology-Based Access Rights Management . . . . . . . . . . . . . . . . . . . . . . . . 49
Michel Buffa and Catherine Faron-Zucker
Explaining Reference Reconciliation Decisions: A Coloured Petri Nets
Based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Souhir Gahbiche, Nathalie Pernelle, and Fatiha Saïs
Part II Efficient Pattern Mining
Combining Constraint Programming and Constraint-Based Mining
for Pattern Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Mehdi Khiari, Patrice Boizumault, and Bruno Crémilleux
Simultaneous Partitioning of Input and Class Variables for Supervised
Classification Problems with Many Classes . . . . . . . . . . . . . . . . . . . . . . . . . 105
Marc Boullé
Interactive and Progressive Constraint Definition for Dimensionality
Reduction and Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Lionel Martin, Matthieu Exbrayat, Guillaume Cleuziou,
and Frédéric Moal
Efficient Parallel Mining of Gradual Patterns on Multicore Processors . . 137
Anne Laurent, Benjamin Négrevergne, Nicolas Sicard,
and Alexandre Termier
XII Contents
Part III Data Preprocessing and Information Retrieval
Analyzing Old Documents Using a Complex Approach: Application
to Lettrines Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Mickael Coustaty, Vincent Courboulay, and Jean-Marc Ogier
Identifying Relevant Features of Images from Their 2-D Topology . . . . . 173
Marc Joliveau
Analyzing Health Consumer Terminology for Query Reformulation
Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Radja Messai, Michel Simonet, Nathalie Bricon-Souf,
and Mireille Mousseau
An Approach Based on Predicate Correlation to the Reduction of
Plethoric Answer Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Patrick Bosc, Allel Hadjali, Olivier Pivert, and Grégory Smits
List of Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243