Lecture Notes in Artificial Intelligence 4701

Edited by J. G. Carbonell and J. Siekmann

Subseries of Lecture Notes in Computer Science

Joost N. Kok
Jacek Koronacki
Ramon Lopez de Mantaras
Stan Matwin
Dunja Mladenic
Andrzej Skowron (Eds.)

Machine Learning: ECML 2007

18th European Conference on Machine Learning
Warsaw, Poland, September 17-21, 2007
Proceedings

Series Editors

Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors

Joost N. Kok
Leiden University, The Netherlands
E-mail: [email protected]

Jacek Koronacki
Polish Academy of Sciences, Warsaw, Poland
E-mail: [email protected]

Ramon Lopez de Mantaras
Spanish National Research Council (CSIC), Bellaterra, Spain
E-mail: [email protected]

Stan Matwin
University of Ottawa, Canada
E-mail: [email protected]

Dunja Mladenic
Jožef Stefan Institute, Ljubljana, Slovenia
E-mail: [email protected]

Andrzej Skowron
Warsaw University, Poland
E-mail: [email protected]

Library of Congress Control Number: 2007934766

CR Subject Classification (1998): I.2, F.2.2, F.4.1, H.2.8

LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN 0302-9743
ISBN-10 3-540-74957-8 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-74957-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media

springer.com

© Springer-Verlag Berlin Heidelberg 2007
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12124169 06/3180 5 4 3 2 1 0

Preface

The two premier annual European conferences in the areas of machine learning and data mining have been collocated ever since the first joint conference in Freiburg, 2001. The European Conference on Machine Learning (ECML) traces its origins to 1986, when the first European Working Session on Learning was held in Orsay, France. The European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) was first held in 1997 in Trondheim, Norway. Over the years, the ECML/PKDD series has evolved into one of the largest and most selective international conferences in machine learning and data mining. In 2007, the seventh collocated ECML/PKDD took place during September 17–21 on the central campus of Warsaw University and in the nearby Staszic Palace of the Polish Academy of Sciences.

For the third time, the conference used a hierarchical reviewing process. We nominated 30 Area Chairs, each of them responsible for one sub-field or several closely related research topics. Suitable areas were selected on the basis of the submission statistics for ECML/PKDD 2006 and for last year’s International Conference on Machine Learning (ICML 2006) to ensure a proper load balance among the Area Chairs. A joint Program Committee (PC) was nominated for the two conferences, consisting of some 300 renowned researchers, mostly proposed by the Area Chairs. This joint PC, the largest of the series to date, allowed us to exploit synergies and deal competently with topic overlaps between ECML and PKDD.

ECML/PKDD 2007 received 592 abstract submissions. As in previous years, to assist the reviewers and the Area Chairs in their final recommendation, authors had the opportunity to communicate their feedback after the reviewing phase. For a small number of conditionally accepted papers, authors were asked to carry out minor revisions subject to the final acceptance by the Area Chair responsible for their submission. With very few exceptions, every full submission was reviewed by three PC members. Based on these reviews, on feedback from the authors, and on discussions among the reviewers, the Area Chairs provided a recommendation for each paper. The four Program Chairs made the final program decisions following a 2-day meeting in Warsaw in June 2007. Continuing the tradition of previous events in the series, we accepted full papers with an oral presentation and short papers with a poster presentation. We selected 41 full papers and 37 short papers for ECML, and 28 full papers and 35 short papers for PKDD. The acceptance rate for full papers is 11.6% and the overall acceptance rate is 23.8%, in accordance with the high-quality standards of the conference series. Besides the paper and poster sessions, ECML/PKDD 2007 also featured 12 workshops, seven tutorials, the ECML/PKDD Discovery Challenge, and the Industrial Track.

An excellent slate of Invited Speakers is another strong point of the conference program. We are grateful to Ricardo Baeza-Yates (Yahoo! Research Barcelona), Peter Flach (University of Bristol), Tom Mitchell (Carnegie Mellon University), and Barry Smyth (University College Dublin) for their participation in ECML/PKDD 2007. The abstracts of their presentations are included in this volume.

We distinguished four outstanding contributions; the awards were generously sponsored by the Machine Learning Journal and the KD-Ubiq network.

ECML Best Paper: Angelika Kimmig, Luc De Raedt, and Hannu Toivonen: “Probabilistic Explanation Based Learning”

PKDD Best Paper: Toon Calders and Szymon Jaroszewicz: “Efficient AUC-Optimization for Classification”

ECML Best Student Paper: Daria Sorokina, Rich Caruana, and Mirek Riedewald: “Additive Groves of Regression Trees”

PKDD Best Student Paper: Dikan Xing, Wenyuan Dai, Gui-Rong Xue, and Yong Yu: “Bridged Refinement for Transfer Learning”

This year we introduced the Industrial Track, chaired by Florence d’Alche-Buc (Universite d’Evry-Val d’Essonne) and Marko Grobelnik (Jozef Stefan Institute, Slovenia), consisting of selected talks with a strong industrial component presenting research from the area covered by the ECML/PKDD conference.

For the first time in the history of ECML/PKDD, the conference proceedings were available on-line to conference participants during the conference. We are grateful to Springer for accommodating this new access channel for the proceedings. Inspired by some related conferences (ICML, KDD, ISWC), we introduced videorecording, as we would like to save at least the invited talks and presentations of award papers for the community and make them accessible at http://videolectures.net/.

This year’s Discovery Challenge was devoted to three problems: user behavior prediction from Web traffic logs, HTTP traffic classification, and Sumerian literature understanding. The Challenge was co-organized by Piotr Ejdys (Gemius SA), Hung Son Nguyen (Warsaw University), Pascal Poncelet (EMA-LGI2P) and Jerzy Tyszkiewicz (Warsaw University); 122 teams participated. For the first task, the three finalists were:

Malik Tahir Hassan, Khurum Nazir Junejo and Asim Karim from Lahore University, Pakistan

Krzysztof Dembczynski and Wojciech Kotłowski from Poznan University of Technology, Poland and Marcin Sydow from Polish-Japanese Institute of Information Technology, Poland

Tung-Ying Lee from National Tsing Hua University, Taiwan

Results for the other Discovery Challenge tasks were not available at the time the proceedings were finalized, but were announced at the conference.

We are all indebted to the Area Chairs, Program Committee members and external reviewers for their commitment and hard work that resulted in a rich but selective scientific program for ECML/PKDD. We are particularly grateful to those reviewers who helped with additional reviews at very short notice to assist us in a small number of difficult decisions. We further thank our Workshop and Tutorial Chairs Marzena Kryszkiewicz (Warsaw University of Technology) and Jan Rauch (University of Economics, Prague) for selecting and coordinating the 12 workshops and seven tutorial events that accompanied the conference; the workshop organizers, tutorial presenters, and the organizers of the Discovery Challenge and the Industrial Track; Richard van de Stadt and CyberChairPRO for competent and flexible support; and Warsaw University and the Polish Academy of Sciences (Institute of Computer Science) for their local and organizational support. Special thanks are due to the Local Chair, Marcin Szczuka, Warsaw University (assisted by Michał Ciesiołka from the Polish Academy of Sciences) for the many hours spent making sure that all the details came together to ensure the success of the conference. Finally, we are grateful to the Steering Committee and the ECML/PKDD community that entrusted us with the organization of ECML/PKDD 2007.

Most of all, however, we would like to thank all the authors who trusted us with their submissions, thereby contributing to one of the main yearly events in the life of our vibrant research community.

September 2007

Joost Kok (PKDD Program co-Chair)
Jacek Koronacki (General Chair)
Ramon Lopez de Mantaras (General Chair)
Stan Matwin (ECML Program co-Chair)
Dunja Mladenic (ECML Program co-Chair)
Andrzej Skowron (PKDD Program co-Chair)

Organization

General Chairs

Ramon Lopez de Mantaras (Spanish Council for Scientific Research)
Jacek Koronacki (Polish Academy of Sciences)

Program Chairs

Joost N. Kok (Leiden University)
Stan Matwin (University of Ottawa and Polish Academy of Sciences)
Dunja Mladenic (Jozef Stefan Institute)
Andrzej Skowron (Warsaw University)

Local Chairs

Michał Ciesiołka (Polish Academy of Sciences)
Marcin Szczuka (Warsaw University)

Tutorial Chair

Jan Rauch (University of Economics, Prague)

Workshop Chair

Marzena Kryszkiewicz (Warsaw University of Technology)

Discovery Challenge Chair

Hung Son Nguyen (Warsaw University)

Industrial Track Chairs

Florence d’Alche-Buc (Universite d’Evry-Val d’Essonne)
Marko Grobelnik (Jozef Stefan Institute)

Steering Committee

Jean-Francois Boulicaut
Pavel Brazdil
Rui Camacho
Floriana Esposito
Johannes Furnkranz
Joao Gama
Fosca Gianotti
Alípio Jorge
Dino Pedreschi
Tobias Scheffer
Myra Spiliopoulou
Luís Torgo

Area Chairs

Michael R. Berthold
Hendrik Blockeel
Olivier Chapelle
James Cussens
Kurt Driessens
Peter Flach
Eibe Frank
Johannes Furnkranz
Thomas Gartner
Joao Gama
Rayid Ghani
Jerzy Grzymala-Busse
Eamonn Keogh
Kristian Kersting
Mieczysław A. Kłopotek
Stefan Kramer
Pedro Larranaga
Claire Nedellec
Andreas Nurnberger
George Paliouras
Bernhard Pfahringer
Enric Plaza
Luc De Raedt
Tobias Scheffer
Giovanni Semeraro
Władysław Skarbek
Myra Spiliopoulou
Hannu Toivonen
Luís Torgo
Paul Utgoff

Program Committee

Charu C. Aggarwal
Jesus Aguilar-Ruiz
David W. Aha
Nahla Ben Amor
Sarabjot Singh Anand
Annalisa Appice
Josep-Lluis Arcos
Walid G. Aref
Eva Armengol
Anthony J. Bagnall
Antonio Bahamonde
Sugato Basu
Bettina Berendt
Francesco Bergadano
Ralph Bergmann
Steffen Bickel
Concha Bielza
Mikhail Bilenko
Francesco Bonchi
Gianluca Bontempi
Christian Borgelt
Karsten M. Borgwardt
Daniel Borrajo
Antal van den Bosch
Henrik Bostrom
Marco Botta
Jean-Francois Boulicaut
Janez Brank
Thorsten Brants
Ulf Brefeld
Carla E. Brodley
Paul Buitelaar

Toon Calders
Luis M. de Campos
Nicola Cancedda
Claudio Carpineto
Jesus Cerquides
Kaushik Chakrabarti
Chien-Chung Chan
Amanda Clare
Ira Cohen
Fabrizio Costa
Susan Craw
Bruno Cremilleux
Tom Croonenborghs
Juan Carlos Cubero
Padraig Cunningham
Andrzej Czyzewski
Walter Daelemans
Ian Davidson
Marco Degemmis
Olivier Delalleau
Jitender S. Deogun
Marcin Detyniecki
Belen Diaz-Agudo
Chris H.Q. Ding
Carlotta Domeniconi
Marek J. Druzdzel
Saso Dzeroski
Tina Eliassi-Rad
Tapio Elomaa
Abolfazl Fazel Famili
Wei Fan
Ad Feelders
Alan Fern
George Forman
Linda C. van der Gaag
Patrick Gallinari
Jose A. Gamez
Alex Gammerman
Minos N. Garofalakis
Gemma C. Garriga
Eric Gaussier
Pierre Geurts
Fosca Gianotti
Attilio Giordana
Robert L. Givan
Bart Goethals
Elisabet Golobardes
Pedro A. Gonzalez-Calero
Marko Grobelnik
Dimitrios Gunopulos
Maria Halkidi
Mark Hall
Matthias Hein
Jose Hernandez-Orallo
Colin de la Higuera
Melanie Hilario
Shoji Hirano
Tu-Bao Ho
Jaakko Hollmen
Geoffrey Holmes
Frank Hoppner
Tamas Horvath
Andreas Hotho
Jiayuan Huang
Eyke Hullermeier
Masahiro Inuiguchi
Inaki Inza
Manfred Jaeger
Szymon Jaroszewicz
Rosie Jones
Edwin D. de Jong
Alípio Mario Jorge
Tamer Kahveci
Alexandros Kalousis
Hillol Kargupta
Andreas Karwath
George Karypis
Samuel Kaski
Dimitar Kazakov
Ross D. King
Frank Klawonn
Ralf Klinkenberg
George Kollios
Igor Kononenko
Bozena Kostek
Walter A. Kosters
Miroslav Kubat
Halina Kwasnicka
James T. Kwok
Nicolas Lachiche

Michail G. Lagoudakis
Niels Landwehr
Pedro Larranaga
Pavel Laskov
Mark Last
Dominique Laurent
Nada Lavrac
Quoc V. Le
Guy Lebanon
Ulf Leser
Jure Leskovec
Jessica Lin
Francesca A. Lisi
Pasquale Lops
Jose A. Lozano
Peter Lucas
Richard Maclin
Donato Malerba
Nikos Mamoulis
Suresh Manandhar
Stephane Marchand-Maillet
Elena Marchiori
Lluis Marquez
Yuji Matsumoto
Michael May
Mike Mayo
Thorsten Meinl
Prem Melville
Rosa Meo
Taneli Mielikainen
Bamshad Mobasher
Serafín Moral
Katharina Morik
Hiroshi Motoda
Toshinori Munakata
Ion Muslea
Olfa Nasraoui
Jennifer Neville
Siegfried Nijssen
Joakim Nivre
Ann Nowe
Arlindo L. Oliveira
Santi Ontanon
Miles Osborne
Martijn van Otterlo
David Page
Spiros Papadimitriou
Srinivasan Parthasarathy
Andrea Passerini
Jose M. Pena
Lourdes Pena Castillo
Jose M. Pena Sanchez
James F. Peters
Johann Petrak
Lech Polkowski
Han La Poutre
Philippe Preux
Katharina Probst
Tapani Raiko
Ashwin Ram
Sheela Ramanna
Jan Ramon
Zbigniew W. Ras
Chotirat Ann Ratanamahatana
Francesco Ricci
John Riedl
Christophe Rigotti
Celine Robardet
Victor Robles
Marko Robnik-Sikonja
Juho Rousu
Celine Rouveirol
Ulrich Ruckert (TU Munchen)
Ulrich Ruckert (Univ. Paderborn)
Stefan Ruping
Henryk Rybinski
Lorenza Saitta
Hiroshi Sakai
Roberto Santana
Martin Scholz
Matthias Schubert
Michele Sebag
Sandip Sen
Jouni K. Seppanen
Galit Shmueli
Arno Siebes
Alejandro Sierra
Vikas Sindhwani
Arul Siromoney
Dominik Slezak

Carlos Soares
Maarten van Someren
Alvaro Soto
Alessandro Sperduti
Jaideep Srivastava
Jerzy Stefanowski
David J. Stracuzzi
Jan Struyf
Gerd Stumme
Zbigniew Suraj
Einoshin Suzuki
Roman Swiniarski
Marcin Sydow
Piotr Synak
Marcin Szczuka
Luis Talavera
Matthew E. Taylor
Yannis Theodoridis
Kai Ming Ting
Ljupco Todorovski
Volker Tresp
Shusaku Tsumoto
Karl Tuyls
Michalis Vazirgiannis
Katja Verbeeck
Jean-Philippe Vert
Michail Vlachos
Haixun Wang
Jason Tsong-Li Wang
Takashi Washio
Gary M. Weiss
Sholom M. Weiss
Shimon Whiteson
Marco Wiering
Slawomir T. Wierzchon
Graham J. Williams
Stefan Wrobel
Ying Yang
JingTao Yao
Yiyu Yao
Francois Yvon
Bianca Zadrozny
Mohammed J. Zaki
Gerson Zaverucha
Filip Zelezny
ChengXiang Zhai
Yi Zhang
Zhi-Hua Zhou
Jerry Zhu
Wojciech Ziarko
Albrecht Zimmermann

Additional Reviewers

Rezwan Ahmed
Fabio Aiolli
Dima Alberg
Vassilis Athitsos
Maurizio Atzori
Anne Auger
Paulo Azevedo
Pierpaolo Basile
Margherita Berardi
Andre Bergholz
Michele Berlingerio
Kanishka Bhaduri
Konstantin Biatov
Jerzy Błaszczynski
Gianluca Bontempi
Yann-ael Le Borgne
Zoran Bosnic
Remco Bouckaert
Agnes Braud
Bjoern Bringmann
Emma Byrne
Olivier Caelen
Rossella Cancelliere
Giovanna Castellano
Michelangelo Ceci
Hyuk Cho
Kamalika Das
Souptik Datta
Uwe Dick
Laura Dietz
Marcos Domingues
Haimonti Dutta

Marc Dymetman
Stefan Eickeler
Timm Euler
Tanja Falkowski
Fernando Fernandez
Francisco J. Ferrer-Troyano
Cesar Ferri
Daan Fierens
Blaz Fortuna
Alexandre Francisco
Mingyan Gao
Fabian Guiza
Anna Lisa Gentile
Amol N. Ghoting
Arnaud Giacometti
Valentin Gjorgjioski
Robby Goetschalckx
Derek Greene
Perry Groot
Philip Groth
Daniele Gunetti
Bernd Gutmann
Sattar Hashemi
Yann-Michael De Hauwere
Vera Hollink
Yi Huang
Leo Iaquinta
Alexander Ilin
Tasadduq Imam
Tao-Yuan Jen
Felix Jungermann
Andrzej Kaczmarek
Benjamin Haibe Kains
Juha Karkkainen
Rohit Kate
Chris Kauffman
Arto Klami
Jiri Klema
Dragi Kocev
Christine Koerner
Kevin Kontos
Petra Kralj
Anita Krishnakumar
Matjaz Kukar
Brian Kulis
Arnd Christian Konig
Christine Korner
Fei Tony Liu
Antonio LaTorre
Anne Laurent
Baoli Li
Zi Lin
Bin Liu
Yan Liu
Corrado Loglisci
Rachel Lomasky
Carina Lopes
Chuan Lu
Pierre Mahe
Markus Maier
Giuseppe Manco
Irina Matveeva
Nicola Di Mauro
Dimitrios Mavroeidis
Stijn Meganck
Ingo Mierswa
Mirjam Minor
Abhilash Alexander Miranda
Joao Moreira
Sourav Mukherjee
Canh Hao Nguyen
Duc Dung Nguyen
Tuan Trung Nguyen
Janne Nikkila
Xia Ning
Blaz Novak
Irene Ntoutsi
Riccardo Ortale
Stanisław Osinski
Kivanc Ozonat
Aline Paes
Pance Panov
Thomas Brochmann Pedersen
Maarten Peeters
Ruggero Pensa
Xuan-Hieu Phan
Benjarath Phoophakdee
Aloisio Carlos de Pina
Christian Plagemann
Jose M. Puerta

Aritz Perez
Chedy Raissi
M. Jose Ramirez-Quintana
Umaa Rebbapragada
Stefan Reckow
Chiara Renso
Matthias Renz
Francois Rioult
Domingo Rodriguez-Baena
Sten Sagaert
Luka Sajn
Esin Saka
Saeed Salem
Antonio Salmeron
Eerika Savia
Anton Schaefer
Leander Schietgat
Gaetano Scioscia
Howard Scordio
Sven Van Segbroeck
Ivica Slavkov
Larisa Soldatova
Arnaud Soulet
Eduardo Spynosa
Volkmar Sterzing
Christof Stoermann
Jiang Su
Piotr Szczuko
Alexander Tartakovski
Olivier Teytaud
Marisa Thoma
Eufemia Tinelli
Ivan Titov
Roberto Trasarti
George Tsatsaronis
Katharina Tschumitschew
Duygu Ucar
Antonio Varlaro
Shankar Vembu
Celine Vens
Marcos Vieira
Peter Vrancx
Nikil Wale
Chao Wang
Dongrong Wen
Arkadiusz Wojna
Yuk Wah Wong
Adam Woznica
Michael Wurst
Wei Xu
Xintian Yang
Monika Zakova
Luke Zettlemoyer
Xueyuan Zhou
Albrecht Zimmermann

Sponsors

We wish to express our gratitude to the sponsors of ECML/PKDD 2007 for their essential contribution to the conference. We wish to thank Warsaw University, Faculty of Mathematics, Informatics and Mechanics, and Institute of Computer Science, Polish Academy of Sciences for providing financial and organizational means for the conference; the European Office of Aerospace Research and Development, Air Force Office of Scientific Research, United States Air Force Research Laboratory, for their generous financial support;¹ KDUbiq European Coordination Action for supporting the Poster Reception, Student Travel Awards, and the Best Paper Awards; Pascal European Network of Excellence for sponsoring the Invited Speaker Program, the Industrial Track and the videorecording of the invited talks and presentations of the four Award Papers; Jozef Stefan Institute, Slovenia, SEKT European Integrated project and Unilever R&D for their financial support; the Machine Learning Journal for supporting the Student Best Paper Awards; and Gemius S.A. for sponsoring and supporting the Discovery Challenge. We also wish to express our gratitude to the following companies and institutions that provided us with data and expertise which were essential components of the Discovery Challenge: Bee Ware, l’Ecole des Mines d’Ales, LIRMM - The Montpellier Laboratory of Computer Science, Robotics, and Microelectronics, and Warsaw University, Faculty of Mathematics, Informatics and Mechanics. We also acknowledge the support of LOT Polish Airlines.

¹ AFOSR/EOARD support is not intended to express or imply endorsement by the U.S. Federal Government.

Table of Contents

Invited Talks

Learning, Information Extraction and the Web . . . 1
Tom M. Mitchell

Putting Things in Order: On the Fundamental Role of Ranking in Classification and Probability Estimation . . . 2
Peter A. Flach

Mining Queries . . . 4
Ricardo Baeza-Yates

Adventures in Personalized Information Access . . . 5
Barry Smyth

Long Papers

Statistical Debugging Using Latent Topic Models . . . 6
David Andrzejewski, Anne Mulhern, Ben Liblit, and Xiaojin Zhu

Learning Balls of Strings with Correction Queries . . . 18
Leonor Becerra Bonache, Colin de la Higuera, Jean-Christophe Janodet, and Frederic Tantini

Neighborhood-Based Local Sensitivity . . . 30
Paul N. Bennett

Approximating Gaussian Processes with H2-Matrices . . . 42
Steffen Borm and Jochen Garcke

Learning Metrics Between Tree Structured Data: Application to Image Recognition . . . 54
Laurent Boyer, Amaury Habrard, and Marc Sebban

Shrinkage Estimator for Bayesian Network Parameters . . . 67
John Burge, Terran Lane

Level Learning Set: A Novel Classifier Based on Active Contour Models . . . 79
Xiongcai Cai and Arcot Sowmya

Learning Partially Observable Markov Models from First Passage Times . . . 91
Jerome Callut and Pierre Dupont

Context Sensitive Paraphrasing with a Global Unsupervised Classifier . . . 104
Michael Connor and Dan Roth

Dual Strategy Active Learning . . . 116
Pinar Donmez, Jaime G. Carbonell, and Paul N. Bennett

Decision Tree Instability and Active Learning . . . 128
Kenneth Dwyer and Robert Holte

Constraint Selection by Committee: An Ensemble Approach to Identifying Informative Constraints for Semi-supervised Clustering . . . 140
Derek Greene and Padraig Cunningham

The Cost of Learning Directed Cuts . . . 152
Thomas Gartner and Gemma C. Garriga

Spectral Clustering and Embedding with Hidden Markov Models . . . 164
Tony Jebara, Yingbo Song, and Kapil Thadani

Probabilistic Explanation Based Learning . . . 176
Angelika Kimmig, Luc De Raedt, and Hannu Toivonen

Graph-Based Domain Mapping for Transfer Learning in General Games . . . 188
Gregory Kuhlmann and Peter Stone

Learning to Classify Documents with Only a Small Positive Training Set . . . 201
Xiao-Li Li, Bing Liu, and See-Kiong Ng

Structure Learning of Probabilistic Relational Models from Incomplete Relational Data . . . 214
Xiao-Lin Li and Zhi-Hua Zhou

Stability Based Sparse LSI/PCA: Incorporating Feature Selection in LSI and PCA . . . 226
Dimitrios Mavroeidis and Michalis Vazirgiannis

Bayesian Substructure Learning - Approximate Learning of Very Large Network Structures . . . 238
Andreas Nagele, Mathaus Dejori, and Martin Stetter

Efficient Continuous-Time Reinforcement Learning with Adaptive State Graphs . . . 250
Gerhard Neumann, Michael Pfeiffer, and Wolfgang Maass

Source Separation with Gaussian Process Models . . . 262
Sunho Park and Seungjin Choi

Discriminative Sequence Labeling by Z-Score Optimization . . . 274
Elisa Ricci, Tijl de Bie, and Nello Cristianini

Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches . . . 286
Mark Schmidt, Glenn Fung, and Romer Rosales

Bayesian Inference for Sparse Generalized Linear Models . . . 298
Matthias Seeger, Sebastian Gerwinn, and Matthias Bethge

Classifier Loss Under Metric Uncertainty . . . 310
David B. Skalak, Alexandru Niculescu-Mizil, and Rich Caruana

Additive Groves of Regression Trees . . . 323
Daria Sorokina, Rich Caruana, and Mirek Riedewald

Efficient Computation of Recursive Principal Component Analysis for Structured Input . . . 335
Alessandro Sperduti

Hinge Rank Loss and the Area Under the ROC Curve . . . 347
Harald Steck

Clustering Trees with Instance Level Constraints . . . 359
Jan Struyf and Saso Dzeroski

On Pairwise Naive Bayes Classifiers . . . 371
Jan-Nikolas Sulzmann, Johannes Furnkranz, and Eyke Hullermeier

Separating Precision and Mean in Dirichlet-Enhanced High-Order Markov Models . . . 382
Rikiya Takahashi

Safe Q-Learning on Complete History Spaces . . . 394
Stephan Timmer and Martin Riedmiller

Random k-Labelsets: An Ensemble Method for Multilabel Classification . . . 406
Grigorios Tsoumakas and Ioannis Vlahavas

Seeing the Forest Through the Trees: Learning a Comprehensible Model from an Ensemble . . . 418
Anneleen Van Assche and Hendrik Blockeel

Avoiding Boosting Overfitting by Removing Confusing Samples . . . 430
Alexander Vezhnevets and Olga Barinova

Planning and Learning in Environments with Delayed Feedback . . . 442
Thomas J. Walsh, Ali Nouri, Lihong Li, and Michael L. Littman

Analyzing Co-training Style Algorithms . . . 454
Wei Wang and Zhi-Hua Zhou

Policy Gradient Critics . . . 466
Daan Wierstra and Jurgen Schmidhuber

An Improved Model Selection Heuristic for AUC . . . 478
Shaomin Wu, Peter Flach, and Cesar Ferri

Finding the Right Family: Parent and Child Selection for Averaged One-Dependence Estimators . . . 490
Fei Zheng and Geoffrey I. Webb

Short Papers

Stepwise Induction of Multi-target Model Trees . . . 502
Annalisa Appice and Saso Dzeroski

Comparing Rule Measures for Predictive Association Rules . . . 510
Paulo J. Azevedo and Alípio M. Jorge

User Oriented Hierarchical Information Organization and Retrieval . . . 518
Korinna Bade, Marcel Hermkes, and Andreas Nurnberger

Learning a Classifier with Very Few Examples: Analogy Based and Knowledge Based Generation of New Examples for Character Recognition . . . 527
S. Bayoudh, H. Mouchere, L. Miclet, and E. Anquetil

Weighted Kernel Regression for Predicting Changing Dependencies . . . 535
Steven Busuttil and Yuri Kalnishkan

Counter-Example Generation-Based One-Class Classification . . . 543
Andras Banhalmi, Andras Kocsor, and Robert Busa-Fekete

Test-Cost Sensitive Classification Based on Conditioned Loss Functions . . . 551
Mumin Cebe and Cigdem Gunduz-Demir

Probabilistic Models for Action-Based Chinese Dependency Parsing . . . 559
Xiangyu Duan, Jun Zhao, and Bo Xu

Learning Directed Probabilistic Logical Models: Ordering-Search Versus Structure-Search . . . 567
Daan Fierens, Jan Ramon, Maurice Bruynooghe, and Hendrik Blockeel

A Simple Lexicographic Ranker and Probability Estimator . . . 575
Peter Flach and Edson Takashi Matsubara

On Minimizing the Position Error in Label Ranking . . . 583
Eyke Hullermeier and Johannes Furnkranz

On Phase Transitions in Learning Sparse Networks . . . 591
Goele Hollanders, Geert Jan Bex, Marc Gyssens, Ronald L. Westra, and Karl Tuyls

Semi-supervised Collaborative Text Classification . . . 600
Rong Jin, Ming Wu, and Rahul Sukthankar

Learning from Relevant Tasks Only . . . 608
Samuel Kaski and Jaakko Peltonen

An Unsupervised Learning Algorithm for Rank Aggregation . . . 616
Alexandre Klementiev, Dan Roth, and Kevin Small

Ensembles of Multi-Objective Decision Trees . . . 624
Dragi Kocev, Celine Vens, Jan Struyf, and Saso Dzeroski

Kernel-Based Grouping of Histogram Data . . . 632
Tilman Lange and Joachim M. Buhmann

Active Class Selection . . . 640
R. Lomasky, C.E. Brodley, M. Aernecke, D. Walt, and M. Friedl

Sequence Labeling with Reinforcement Learning and Ranking Algorithms . . . 648
Francis Maes, Ludovic Denoyer, and Patrick Gallinari

Efficient Pairwise Classification . . . 658
Sang-Hyeun Park and Johannes Furnkranz

Scale-Space Based Weak Regressors for Boosting . . . 666
Jin-Hyeong Park and Chandan K. Reddy

K-Means with Large and Noisy Constraint Sets . . . 674
Dan Pelleg and Dorit Baras

Towards ‘Interactive’ Active Learning in Multi-view Feature Sets for Information Extraction . . . 683
Katharina Probst and Rayid Ghani

Principal Component Analysis for Large Scale Problems with Lots of Missing Values . . . 691
Tapani Raiko, Alexander Ilin, and Juha Karhunen

Transfer Learning in Reinforcement Learning Problems Through Partial Policy Recycling . . . 699
Jan Ramon, Kurt Driessens, and Tom Croonenborghs

Class Noise Mitigation Through Instance Weighting . . . 708
Umaa Rebbapragada and Carla E. Brodley

Optimizing Feature Sets for Structured Data . . . 716
Ulrich Ruckert and Stefan Kramer

Roulette Sampling for Cost-Sensitive Learning . . . 724
Victor S. Sheng and Charles X. Ling

Modeling Highway Traffic Volumes . . . 732
Tomas Singliar and Milos Hauskrecht

Undercomplete Blind Subspace Deconvolution Via Linear Prediction . . . 740
Zoltan Szabo, Barnabas Poczos, and Andras Lorincz

Learning an Outlier-Robust Kalman Filter . . . 748
Jo-Anne Ting, Evangelos Theodorou, and Stefan Schaal

Imitation Learning Using Graphical Models . . . 757
Deepak Verma and Rajesh P.N. Rao

Nondeterministic Discretization of Weights Improves Accuracy of Neural Networks . . . 765
Marcin Wojnarski

Semi-definite Manifold Alignment . . . 773
Liang Xiong, Fei Wang, and Changshui Zhang

General Solution for Supervised Graph Embedding . . . 782
Qubo You, Nanning Zheng, Shaoyi Du, and Yang Wu

Multi-objective Genetic Programming for Multiple Instance Learning . . . 790
Amelia Zafra and Sebastian Ventura

Exploiting Term, Predicate, and Feature Taxonomies in Propositionalization and Propositional Rule Learning . . . 798
Monika Zakova and Filip Zelezny

Author Index . . . 807