Challenges in Physically Inspired Machine Learning (PIALM Task Force)
Dymitr Ruta, PhD
BT Group
Bogdan Gabrys, PhD
Bournemouth University
© British Telecommunications plc
Agenda
• Motivation
• The links between physics and information theory
• Data fields methodology for classification, clustering, data condensation and visualisation
• Information theoretic learning (ITL)
• Hybrid large-scale optimisation (simulations)
• Concluding remarks
• Discussion, Q&A
Motivation (Business): Ability to Predict is the Key to Survival & Success
• DISCOVER the data-driven problem that can be improved using intelligent learning techniques
• DESCRIBE the problem and its characteristics, prior knowledge, input and output data
• MODEL the relationship between inputs and outputs, adopting existing algorithms or devising new ones
• LEARN to reproduce outputs for previously unseen inputs
• PREDICT future outputs and save/earn money
Extract → Clean → Pre-process → … → Tune → Implement → Productise → Deploy → Support → …
Motivation (Personal): Physical phenomena guide artificial learning mechanisms
• Deep analogies between information and matter, energy and uncertainty, complexity and entropy
• Any knowledge can only be conveyed using a certain amount of matter/energy
• Physics limits the ability to access, learn and know
• Convergence of matter and information at the quantum level: “It From Bit”
• Computational intelligence sciences are in chaos:
  – Lack of a unified theory of information and its processing
  – Vast amounts of data, yet mostly numerical data are used
  – Many models, too many assumptions, poor performance
• Guidance of well-established physical models
Key Analogies Between Physics and Information Theory
• Energy, Work → Uncertainty, Information
• Law of Conservation of Total Energy/Uncertainty
• Thermodynamic Entropy → Shannon Entropy
• Matter, Space → Information, Knowledge Space
• Heisenberg Principle of Uncertainty → Breiman Principle of Uncertainty
• Information exists only in a physical context
• Physics and information theory converge at the quantum level
Boundaries of information processing
• Physical constraints on information processing:
  – Mass/energy, speed of light, location, time
• Spatial bounds on information capacity:
  – Quantum mechanics at the elementary particle level
  – Gravitational collapse into a black hole at the macro scale
• Communication is a dynamic process and requires a certain energy, transmitted with power P
[Equation: bound on the communication rate (b/s) in terms of power P, area A, radius R, speed of light c and gravitational constant G]
[S. Lloyd et al. Phys. Rev. 93(10) 2004]
Quantum effects about to emerge
Turing / von Neumann: Universal Machine – We can make computers
Landauer: Information is Physical – Computers need cooling fans
Deutsch: Information is Quantum – Computers get weird
Where lies the problem: stop the atom
Quantum computing
State | Amplitude (α + iβ) | Probability (|α|² + |β|²)
000 | a = 0.37 + i0.04 | 0.14
001 | b = 0.35 + i0.43 | 0.31
010 | c = 0.09 + i0.31 | 0.10
011 | d = 0.30 + i0.30 | 0.18
100 | e = 0.11 + i0.18 | 0.04
101 | f = 0.40 + i0.01 | 0.16
110 | g = 0.09 + i0.12 | 0.02
111 | h = 0.15 + i0.16 | 0.05
Qubit: $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, with $|\alpha|^2 + |\beta|^2 = 1$
• Multitude of states, inherent parallelism
• Applications:
  – Search in an unsorted database
  – Factoring large numbers (cryptography)
  – Simulating quantum effects in complex systems
Bit: $b \in \{0, 1\}$
500-qubit system: $2^{500}$ states in a single pulse
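The probability column follows from the amplitude column by the Born rule, probability = |amplitude|². A minimal Python/NumPy sketch recomputes it; because the tabulated amplitudes are rounded, the probabilities sum to roughly, not exactly, 1. The helper names are illustrative.

```python
import numpy as np

# Amplitudes (alpha + i*beta) of the 3-qubit register, basis states 000 ... 111
amplitudes = np.array([
    0.37 + 0.04j, 0.35 + 0.43j, 0.09 + 0.31j, 0.30 + 0.30j,
    0.11 + 0.18j, 0.40 + 0.01j, 0.09 + 0.12j, 0.15 + 0.16j,
])

def measurement_probabilities(amps):
    """Born rule: P(state) = |alpha + i*beta|^2 = alpha^2 + beta^2."""
    return np.abs(amps) ** 2

def n_basis_states(n_qubits):
    """An n-qubit register is described by 2**n complex amplitudes."""
    return 2 ** n_qubits

probs = measurement_probabilities(amplitudes)
for k, p in enumerate(probs):
    print(f"|{k:03b}>  p = {p:.2f}")
print("total probability:", round(float(probs.sum()), 3))
```

The same counting explains the 500-qubit line: a classical description of that register needs $2^{500}$ complex amplitudes, a number with 151 decimal digits.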
Information Thermodynamics & Complexity
$$H(X) = \sum_{x \in \mathcal{A}_X} p(x) \log \frac{1}{p(x)}$$
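The entropy formula can be evaluated directly. The Python sketch below computes H(X) in bits, a plug-in estimate from observed symbols, and, as an illustration of the thermodynamic link, the Landauer lower bound of kT ln 2 joules per erased bit. Function names and the 300 K default temperature are assumptions for illustration.

```python
import math
from collections import Counter

BOLTZMANN_K = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def shannon_entropy(probabilities, base=2.0):
    """H(X) = sum_x p(x) log(1/p(x)); outcomes with p(x) = 0 contribute nothing."""
    return sum(p * math.log(1.0 / p, base) for p in probabilities if p > 0)

def empirical_entropy(symbols, base=2.0):
    """Plug-in estimate of H(X) from the observed symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return shannon_entropy([c / n for c in counts.values()], base)

def landauer_min_heat(bits, temperature_k=300.0):
    """Lower bound (J) on the heat dissipated when erasing `bits` bits: bits * k T ln 2."""
    return bits * BOLTZMANN_K * temperature_k * math.log(2)

print(shannon_entropy([0.5, 0.5]))    # a fair coin carries one bit
print(empirical_entropy("aabbccdd"))  # four equiprobable symbols
print(landauer_min_heat(1.0))         # heat bound for one bit at room temperature
```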
• Landauer: any logically irreversible operation (e.g. erasing a bit) must be accompanied by a corresponding entropy increase of the environment, i.e. waste heat of at least kT ln 2 per bit
• Equivalence between thermodynamic and information (Shannon) entropy
• Information complexity: size, dimensionality, structure
• Complexity measures the cost of obtaining information
• Kolmogorov algorithmic complexity: the length of the shortest program that outputs the requested information
• Information distance: $D_I(x, y) = \max\{K(x \mid y),\, K(y \mid x)\}$
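Kolmogorov complexity is uncomputable, so in practice the information distance is approximated with a real compressor. A common substitute, swapped in here and not taken from the slides, is the normalised compression distance (NCD) of Cilibrasi and Vitányi, which replaces K by compressed length and normalises the result to roughly [0, 1]. The sketch uses Python's standard `zlib` module as the compressor; the sample strings are arbitrary.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a computable stand-in for K(data)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance, approximating
    max{K(x|y), K(y|x)} / max{K(x), K(y)}."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

text = b"the quick brown fox jumps over the lazy dog " * 20
noise = bytes(random.Random(0).getrandbits(8) for _ in range(900))

print(ncd(text, text))   # small: each string fully "explains" the other
print(ncd(text, noise))  # close to 1: almost nothing is shared
```

Any real compressor only upper-bounds K, so NCD values are approximate and compressor-dependent.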
Data Particles – The Prime Inspiration
• Across different scales in physics, two particle interaction paradigms are dominant:
  – Dynamic particle models, where particles act upon each other and/or the environment and move accordingly
  – Static or statistical particle models, where scale and complexity force a statistical description of particles
• Both areas are now the field of our exploration towards a possible synergistic merger:
  – Dynamic data fields provide a complete methodology for dealing with mobile data particles, while…
  – Kernel machines and Information Theoretic Learning typically treat data statically
  – Can a unified methodology be proposed?
Data Field Models
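$$D(X_S, X_T) = \big((X_S \circ X_S)\,\mathbf{1}_d\big)\,\mathbf{1}_{N_T}^{\top} + \mathbf{1}_{N_S}\,\big((X_T \circ X_T)\,\mathbf{1}_d\big)^{\top} - 2\,X_S X_T^{\top}$$
where $X_S$ ($N_S \times d$) and $X_T$ ($N_T \times d$) hold data points in rows, $\circ$ is the elementwise product and $\mathbf{1}_n$ is a column of $n$ ones; $D$ holds all squared Euclidean distances.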
• Distance matrix calculation
• Charged data points
• Central, potential field
• Attracting/repelling force
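The matrix form of the squared distances, $D = s_S \mathbf{1}^{\top} + \mathbf{1} s_T^{\top} - 2 X_S X_T^{\top}$ with $s$ the vectors of squared row norms, avoids explicit loops over point pairs, which is what makes field models practical in a matrix language. The NumPy sketch below mirrors it and then derives Coulomb-like forces between charged data points. The inverse-square law, the unit force constant and the softening term `eps` are assumptions for illustration, not the presentation's exact field.

```python
import numpy as np

def sq_distance_matrix(XS, XT):
    """D[i, j] = ||XS[i] - XT[j]||^2 = |xs_i|^2 + |xt_j|^2 - 2 xs_i . xt_j."""
    sS = np.sum(XS ** 2, axis=1)[:, None]  # (N_S, 1) squared row norms
    sT = np.sum(XT ** 2, axis=1)[None, :]  # (1, N_T)
    D = sS + sT - 2.0 * XS @ XT.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off

def field_forces(X, q, eps=1e-9):
    """Coulomb-like force on each point: F_i = sum_j q_i q_j (x_i - x_j) / |x_i - x_j|^3.
    Like charges repel, opposite charges attract."""
    diff = X[:, None, :] - X[None, :, :]  # diff[i, j] = x_i - x_j
    d2 = sq_distance_matrix(X, X)
    np.fill_diagonal(d2, np.inf)          # no self-interaction
    inv_d3 = (d2 + eps) ** -1.5
    coef = q[:, None] * q[None, :] * inv_d3
    return np.sum(coef[:, :, None] * diff, axis=1)

# Two opposite charges on the x-axis pull towards each other
F = field_forces(np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([1.0, -1.0]))
```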
Electrostatic Field for Classification
Field generated clustering
• All the data points are let free to move and eventually merge into a single point
• Data hierarchies emerge as time passes
• Data trajectories form dynamic clustering dendrograms
[Figures: Gravity Field; Lennard-Jones Potential]
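A hypothetical NumPy sketch of gravity-field clustering follows (the Lennard-Jones variant is not shown). Each point starts with unit mass and is pulled by all others with an inverse-square force; overdamped motion (velocity proportional to force) is assumed, each step is capped at half the merge radius so two particles cannot jump past each other, and particles closer than `merge_radius` fuse at their centre of mass. The recorded merge order is exactly the information a dendrogram displays. `dt`, `merge_radius` and the step cap are illustrative choices.

```python
import numpy as np

def gravity_clustering(X, dt=0.01, merge_radius=0.05, max_steps=100_000):
    """Return merge events (step, members_a, members_b) in the order they occur."""
    pos = np.array(X, dtype=float)  # copy, so the caller's data are untouched
    mass = np.ones(len(pos))
    members = [[i] for i in range(len(pos))]
    history = []
    step = 0
    while len(pos) > 1 and step < max_steps:
        diff = pos[None, :, :] - pos[:, None, :]  # diff[i, j] = x_j - x_i
        d2 = np.sum(diff ** 2, axis=2)
        np.fill_diagonal(d2, np.inf)
        i, j = np.unravel_index(np.argmin(d2), d2.shape)
        if d2[i, j] < merge_radius ** 2:
            # Fuse j into i at the centre of mass and record the event
            pos[i] = (mass[i] * pos[i] + mass[j] * pos[j]) / (mass[i] + mass[j])
            mass[i] += mass[j]
            history.append((step, members[i], members[j]))
            members[i] = members[i] + members[j]
            pos = np.delete(pos, j, axis=0)
            mass = np.delete(mass, j)
            del members[j]
            continue
        # Pull on i from every j: m_j (x_j - x_i) / |x_j - x_i|^3
        force = np.sum(mass[None, :, None] * diff / d2[:, :, None] ** 1.5, axis=1)
        move = dt * force
        norms = np.linalg.norm(move, axis=1, keepdims=True)
        move *= np.minimum(1.0, 0.5 * merge_radius / np.maximum(norms, 1e-12))
        pos += move
        step += 1
    return history

# Two tight pairs far apart: each pair fuses first, then the two clusters
events = gravity_clustering([[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0]])
```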
Dynamic Data Condensation
• Terabytes of complex corporate data remain unexploited
• Many state-of-the-art machine learning techniques scale as O(n²)
• Real-time and adaptive models require frequent retraining
• Data are being condensed using:
  – Random sub-sampling
  – Parzen density based methods
  – Multi-resolution spatial analysis
  – Hierarchical clustering models
  – …
• …but dynamic data condensation has not been approached
• …and labelled data are not being condensed
Soft Fixed Field Electrostatic Condensation Process
• Builds a soft Parzen density estimate for each class of data
• Normalises and freezes the original class distribution
• Builds an electrostatic field with a Gaussian dependence on the distance
• Lets the data move freely and merge towards lower-energy states, while the original (frozen) field continuously guards the class distribution
• Fast matrix implementation in Matlab
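The process is described only in outline, so the following NumPy sketch is a hypothetical, simplified reading of it and not the authors' algorithm. Each class's Gaussian Parzen density is built from the original points and frozen; mobile copies of the points climb that frozen density by mean-shift steps (a standard way to ascend a Gaussian Parzen estimate, substituted here for the electrostatic dynamics) and merge when closer than `merge_radius`, accumulating weights so each prototype records how many original points it stands for. `sigma`, `merge_radius` and `n_iter` are assumed parameters.

```python
import numpy as np

def _merge(P, w, radius):
    """Greedily fuse particles closer than `radius`; the fused position is the weighted mean."""
    used = np.zeros(len(P), dtype=bool)
    new_P, new_w = [], []
    for i in range(len(P)):
        if used[i]:
            continue
        close = ~used & (np.sum((P - P[i]) ** 2, axis=1) < radius ** 2)
        used |= close
        new_P.append(w[close] @ P[close] / w[close].sum())
        new_w.append(w[close].sum())
    return np.array(new_P), np.array(new_w)

def condense(X, y, sigma=0.5, merge_radius=0.1, n_iter=50):
    """Per-class frozen Parzen field plus mobile, merging particles.
    Returns (prototypes, weights, labels)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    protos, weights, labels = [], [], []
    for c in np.unique(y):
        Xc = X[y == c]   # sources of the frozen field for class c
        P = Xc.copy()    # mobile particles start on the data
        w = np.ones(len(P))
        for _ in range(n_iter):
            # Mean-shift step: move uphill on the frozen Gaussian Parzen density
            d2 = np.sum((P[:, None, :] - Xc[None, :, :]) ** 2, axis=2)
            K = np.exp(-d2 / (2.0 * sigma ** 2))
            P = K @ Xc / K.sum(axis=1, keepdims=True)
            P, w = _merge(P, w, merge_radius)
        protos.append(P)
        weights.append(w)
        labels.append(np.full(len(P), c))
    return np.vstack(protos), np.concatenate(weights), np.concatenate(labels)
```

How much of each class's shape survives depends strongly on `sigma`: a large kernel collapses a class to a few weighted modes, a small one keeps more prototypes.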
99% data reduction, 99% performance retention
Discriminant Function Visualisation
[Figures: Quadratic; Decision Tree]
$$d(x) = \arg\max_{j=1,\dots,C} P_j(x), \qquad D(x) = \max_{j=1,\dots,C} P_j(x)$$
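Both maps come straight from the class supports $P_j(x)$ on a grid: the label map $d(x) = \arg\max_j P_j(x)$ and the confidence map $D(x) = \max_j P_j(x)$. In the NumPy sketch below, the classifier is a hypothetical stand-in (two equal-prior isotropic Gaussian classes); any model producing per-class supports could replace `gaussian_posteriors`.

```python
import numpy as np

def gaussian_posteriors(grid, means, var=1.0):
    """Hypothetical classifier: equal-prior isotropic Gaussian classes.
    grid: (..., 2) points; means: (C, 2). Returns P_j(x) with shape (..., C)."""
    d2 = np.sum((grid[..., None, :] - means) ** 2, axis=-1)
    lik = np.exp(-d2 / (2.0 * var))
    return lik / lik.sum(axis=-1, keepdims=True)

def decision_maps(posteriors):
    """d(x) = argmax_j P_j(x) (label map) and D(x) = max_j P_j(x) (confidence map)."""
    P = np.asarray(posteriors)
    return np.argmax(P, axis=-1), np.max(P, axis=-1)

xs = np.linspace(-3.0, 3.0, 61)
gx, gy = np.meshgrid(xs, xs)
grid = np.stack([gx, gy], axis=-1)  # (61, 61, 2)
means = np.array([[-1.0, 0.0], [1.0, 0.0]])
d, D = decision_maps(gaussian_posteriors(grid, means))
```

Plotting `d` as colours shaded by `D` gives this kind of discriminant map; with two classes, D(x) is exactly 0.5 on the decision boundary and rises towards 1 away from it.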
Visualisation of Classifier Fusion
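General fusion rule, with $P_{ij}$ the support given by classifier $i$ ($i = 1,\dots,N$) to class $j$ ($j = 1,\dots,C$) and $F$ the fusion operator:
$$d_{\mathrm{fus}} = \arg\max_{j=1,\dots,C} F_{i=1,\dots,N}\left(P_{ij}\right)$$
• Mean: $d_{\mathrm{mean}} = \arg\max_{j} \frac{1}{N}\sum_{i=1}^{N} P_{ij}$
• Max: $d_{\max} = \arg\max_{j} \max_{i} P_{ij}$
• Min: $d_{\min} = \arg\max_{j} \min_{i} P_{ij}$
• Product: $d_{\mathrm{prod}} = \arg\max_{j} \prod_{i=1}^{N} P_{ij}$
• Majority Vote: $d_{\mathrm{vote}} = \arg\max_{j} \sum_{i=1}^{N} P^{\mathrm{vote}}_{ij}$, where $P^{\mathrm{vote}}_{ij} = 1$ if classifier $i$ gives its highest support to class $j$, and 0 otherwise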
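All five rules share one pattern: combine the supports $P_{ij}$ of the N classifiers column by column, then take the argmax over the C classes. A NumPy sketch, with a hand-picked support matrix chosen to show that the rules can disagree:

```python
import numpy as np

def fuse(P, rule="mean"):
    """P: (N, C) array, P[i, j] = support of classifier i for class j.
    Returns the fused class index argmax_j F_i(P[i, j])."""
    P = np.asarray(P, dtype=float)
    if rule == "mean":
        scores = P.mean(axis=0)
    elif rule == "max":
        scores = P.max(axis=0)
    elif rule == "min":
        scores = P.min(axis=0)
    elif rule == "prod":
        scores = P.prod(axis=0)
    elif rule == "vote":
        # Each classifier casts one vote for its top-ranked class
        votes = np.zeros_like(P)
        votes[np.arange(P.shape[0]), P.argmax(axis=1)] = 1.0
        scores = votes.sum(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(scores))

P = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.0, 0.9, 0.1]])
results = {rule: fuse(P, rule) for rule in ("mean", "max", "min", "prod", "vote")}
```

Here the vote picks class 0 because two of the three classifiers rank it first, while every support-based rule picks class 1, since the third classifier is very confident and the first two give class 1 non-trivial support.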
Information Theoretic Learning for data transformation
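• Mutual Information:
$$I(C, Y) = H(C) - H(C \mid Y)$$
• Rényi’s Entropy:
$$H_{R\alpha}(X) = \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{A}_X} p(x)^{\alpha}$$
• Information potentials (quadratic mutual information):
$$I_T(C, Y) = \underbrace{\sum_c \int_y p(c, y)^2\,dy}_{V_{IN}} + \underbrace{\sum_c \int_y P(c)^2\,p(y)^2\,dy}_{V_{ALL}} - 2\underbrace{\sum_c \int_y p(c, y)\,P(c)\,p(y)\,dy}_{V_{BTW}}$$
Information Forces:
$$\frac{\partial I_T}{\partial y_i} = \frac{\partial V_{IN}}{\partial y_i} + \frac{\partial V_{ALL}}{\partial y_i} - 2\,\frac{\partial V_{BTW}}{\partial y_i}$$
Linear Feature Transformation [Torkkola NIPS 2001]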
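The information potentials become closed-form sums once p(c, y) and p(y) are Parzen estimates with Gaussian kernels of width σ, because the integral of a product of two Gaussians is again a Gaussian with variance 2σ²: $V_{IN} = \frac{1}{N^2}\sum_c \sum_{k,l \in c} G(y_k - y_l, 2\sigma^2 I)$, $V_{ALL} = \frac{1}{N^2}\sum_c P(c)^2 \sum_{k,l} G(\cdot)$ and $V_{BTW} = \frac{1}{N^2}\sum_c P(c) \sum_{k \in c}\sum_{l} G(\cdot)$, with $P(c) = N_c/N$. The NumPy sketch below follows this Torkkola-style estimator; the kernel width `sigma` is an assumed parameter and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel(d2, var, dim):
    """Isotropic Gaussian G(u, var * I) evaluated from squared norms d2 = ||u||^2."""
    return np.exp(-d2 / (2.0 * var)) / (2.0 * np.pi * var) ** (dim / 2.0)

def quadratic_mi(Y, labels, sigma=1.0):
    """Parzen estimate of I_T(C, Y) = V_IN + V_ALL - 2 V_BTW."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]  # treat a 1-D array as N scalar features
    labels = np.asarray(labels)
    N, dim = Y.shape
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    G = gaussian_kernel(d2, 2.0 * sigma ** 2, dim)  # G[k, l] = G(y_k - y_l, 2 sigma^2 I)
    total = G.sum()
    v_in = v_all = v_btw = 0.0
    for c in np.unique(labels):
        mask = labels == c
        prior = mask.mean()  # P(c) = N_c / N
        v_in += G[np.ix_(mask, mask)].sum()
        v_all += prior ** 2 * total
        v_btw += prior * G[mask, :].sum()
    return (v_in + v_all - 2.0 * v_btw) / N ** 2

labels = np.array([0] * 20 + [1] * 20)
y_sep = np.concatenate([np.linspace(-3, -2, 20), np.linspace(2, 3, 20)])  # informative feature
y_mix = np.concatenate([np.linspace(-3, 3, 20), np.linspace(-3, 3, 20)])  # same values per class
```

With Parzen estimates the sum equals $\sum_c \int (p(c,y) - P(c)p(y))^2 dy$, so it is never negative, and it vanishes when both classes have identical samples, as in `y_mix`. Gradients of this quantity with respect to the projected samples are the information forces used to train the linear transformation.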
Information Theoretic Learning for classification and clustering
• ITL framework [Principe et al 1999]
• Class label transmission [Archambeau et al 2004]: a new generic method for classification based on ITL and a Parzen density model
• Generalised information distances used for feature generation [Kaplan & Hafner 2006]
• Classification with unlabelled data using ITL-linked density divergence minimisation [Jeong et al 2005]
• Clustering by separating cell identities using MIM [Schneideman et al 2003]
• Unsupervised Clustering by MIM between data and parameters [Herschkowitz & Nadal 1999]
The Challenges to Tackle
• Data obesity and data quality issues
• Model and data complexity control
• Multidimensional information uncertainty and fusion
• Natural language processing
Zadeh’s Generalised Theory of Information Uncertainty:
Information is a generalised constraint
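Most Swedes are tall: $GC(h) = \mu_{\mathrm{likely}}\left(\int_a^b h(u)\,\mu_{\mathrm{tall}}(u)\,du\right)$, where $h(u)$ is the distribution of heights over $[a, b]$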
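The generalised constraint $GC(h) = \mu_{\mathrm{likely}}\left(\int_a^b h(u)\,\mu_{\mathrm{tall}}(u)\,du\right)$ can be evaluated numerically once the ingredients are fixed: the integral is the fuzzy proportion of tall people, and $\mu_{\mathrm{likely}}$ grades that proportion. In the NumPy sketch below, the height density, both membership ramps and all numbers are made up for illustration; they are not data about Swedes and not Zadeh's own parameters.

```python
import numpy as np

def ramp(u, lo, hi):
    """Piecewise-linear membership function rising from 0 at lo to 1 at hi."""
    return np.clip((np.asarray(u, dtype=float) - lo) / (hi - lo), 0.0, 1.0)

def generalized_constraint(h, mu_tall, mu_likely, a, b, n=2001):
    """Return (GC(h), proportion), where proportion = integral_a^b h(u) mu_tall(u) du
    (trapezoidal rule) and GC(h) = mu_likely(proportion)."""
    u = np.linspace(a, b, n)
    f = h(u) * mu_tall(u)
    proportion = float(np.sum((f[1:] + f[:-1]) / 2.0 * np.diff(u)))
    return float(mu_likely(proportion)), proportion

# Illustrative (made-up) ingredients: heights in cm, normal around 178 with sd 7
h = lambda u: np.exp(-((u - 178.0) / 7.0) ** 2 / 2.0) / (7.0 * np.sqrt(2.0 * np.pi))
gc, prop = generalized_constraint(h, lambda u: ramp(u, 170.0, 185.0),
                                  lambda v: ramp(v, 0.3, 0.6), a=140.0, b=220.0)
```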
Particle Dynamics based Exploration Models
• Simulated Annealing – random exploration of the input space by a particle in a cooling environment that gradually slows its velocity
• Stochastic Diffusion Search – random agent exploration with one-to-one communication
• Ant Colony Optimisation – spatial path optimisation inspired by ant-laid pheromone trails
• Particle Swarm Optimisation – swarm dynamically led by the best solution found so far (one-to-all communication)
• Particle filters – a flexible sequential predictor based on sampling from a sequence of probability distributions using a large number of particles
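Two of these models fit in a few lines of plain Python. The sketch below is textbook-style, not the task force's implementation. In simulated annealing, uphill moves are accepted with probability exp(-Δ/T), and both T and the step width shrink geometrically, so the particle "slows down" as it cools. In global-best particle swarm optimisation, each particle is pulled towards its own best position and towards the swarm's best, which is broadcast to all (the one-to-all communication). The test function, schedules and coefficients (w = 0.7, c1 = c2 = 1.5) are assumed, commonly used values.

```python
import math
import random

def sphere(x):
    """Test objective with its global minimum 0 at the origin."""
    return sum(v * v for v in x)

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, n_iter=5000, seed=0):
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(n_iter):
        width = step * max(t, 1e-3)  # proposal width shrinks with temperature
        cand = [v + rng.gauss(0.0, width) for v in x]
        fc = f(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t *= cooling
    return best, fbest

def pso(f, dim, n_particles=20, n_iter=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in X]
    pval = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pval[i])
    gbest, gval = list(pbest[g]), pval[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull to personal best + pull to swarm best
                V[i][d] = (w * V[i][d] + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            fx = f(X[i])
            if fx < pval[i]:
                pbest[i], pval[i] = list(X[i]), fx
                if fx < gval:
                    gbest, gval = list(X[i]), fx
    return gbest, gval

x_sa, f_sa = simulated_annealing(sphere, [3.0, -2.0])
x_pso, f_pso = pso(sphere, dim=2)
```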
Business Vision for the Future
• Distributed data mining
• Multimedia mining (voice, text, video)
• Open and flexible data structures
• Unified data processing framework
• Online secured predictive services
• Networked, evolving and adaptable software
• Automated knowledge discovery
• Artificial awareness: self-aware software (>2020)
Composite Optimisation Problem
• Each component treated separately
• Lack of coordination
• Modelling inconsistencies
• The challenge: Full Optimisation
Task Force Achievements
• Seminars given:
  – IDSIA, Switzerland, Prof. J. Schmidhuber
  – Birmingham University, Dr Peter Tino
  – Aston University, Prof. David Lowe
• Conferences:
  – ICCMSE’2007, with a paper published in the American Institute of Physics (AIP) Conference Proceedings Series
• Publications:
  – D. Ruta, B. Gabrys. Reducing Spatial Data Complexity for Classification Models. Accepted to the International Conference of Computational Methods in Sciences and Engineering ICCMSE 2007, American Institute of Physics Proceedings Series
  – D. Ruta, B. Gabrys. A Framework for Machine Learning based on Dynamic Physical Fields. Accepted to the Special Issue of Natural Computing Journal on Nature-inspired Learning and Adaptive Systems
• Establishing an active group of about 20 researchers networking around PIALM and related issues
PIALM Follow-up and Future Activities
• Proposal submitted to the EU 7th to merge different PIALM directions and follow-up research centred around Information Theoretic Learning and Dynamic Particle Models
• Transforming the PIALM contacts into a prospective project support group with a regular meeting agenda, newsletter and closer collaborative ventures
• Organisation of special sessions at related conferences
• Further applications for networking/travel grants
• Widening the scope of PIALM into several focus themes to strengthen the link with other NISIS projects and better address the changing needs of society
Conclusions
• Business analytics remains quite disparate from state-of-the-art research in machine learning, pattern recognition etc.
• Over-complex black-box models are unusable in business applications
• Customer analytics is gaining importance, as are modelling tools for customer-centric service providers
• Online predictive and adaptable services are soon to emerge
• Nature continues to provide inspirations for data-driven modelling and learning