Machine Learning – An Engineer’s Perspective
Robert C. Williamson
Data is the latest buzz
It’s All About the D
Central to all science ... and business and society
Scientists have been collecting and analysing data for a long time
Data = Fact?
Facts are constructed.
Theory-ladenness of observation.
‘Raw data is an oxymoron’
You say ‘the data’; I say ‘the capta’
• Data is usually conceived of as a given – “Data” comes from the Latin dare meaning “to give”
• But there is always an element of choice – one takes the data, actively selecting and gathering it in order to do something with it.
• Hence “capta”, from the Latin capere, meaning “to take, seize, obtain, get, enjoy or reap”.
Categories are in your head
Don’t take them (or data) for granted
Pervasive Uncertainty (even when “N=All”)
But as for certain truth, no man has known it,
Nor will he know it; neither of the gods
Nor yet of all the things of which I speak.
And even if by chance he were to utter
The perfect truth, he would himself not know it;
For all is but a woven web of guesses.
Machine Learning
New techniques for data problems
It is easy to be bedazzled by the latest hammer (e.g. deep learning)
The same (complex) problems remain
Risk-Sensitive ML
• Risk measures used in finance and insurance
• Related to loss functions
• Bizarrely simplified in practice
• Understand the problem first
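To make the link between risk measures and loss functions concrete, here is a minimal sketch contrasting the mean loss (the risk-neutral quantity averaged by ordinary empirical risk minimisation) with conditional value at risk (CVaR), a risk measure used in finance and insurance that averages only the worst tail of the losses. The function names, the `alpha` level and the two loss vectors are hypothetical; the estimator is the simple "average of the worst ⌈(1 − alpha)·n⌉ losses", not any particular library's definition.

```python
import math

def mean_loss(losses):
    """Average loss: what ordinary empirical risk minimisation optimises."""
    return sum(losses) / len(losses)

def empirical_cvar(losses, alpha=0.9):
    """Empirical CVaR at level alpha: mean of the worst (1 - alpha) fraction of losses.

    Sorts losses in decreasing order and averages the top k = ceil((1 - alpha) * n),
    with at least one loss. round() guards against floating-point noise in (1 - alpha) * n.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    n = len(losses)
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

# Two hypothetical models with the same mean loss but very different tails.
model_a = [1.0] * 10            # steady, moderate losses
model_b = [0.0] * 9 + [10.0]    # usually perfect, occasionally terrible

print(mean_loss(model_a), mean_loss(model_b))                      # 1.0 1.0
print(empirical_cvar(model_a, 0.9), empirical_cvar(model_b, 0.9))  # 1.0 10.0
```

Ranked by mean loss the two models are indistinguishable; ranked by CVaR, model B's tail dominates. Which measure is appropriate depends on the problem, not on the learning technique.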
The Importance of Architecture
• Modularity
  – Enables distributed creation; enables faster adaptation; decouples components
  – Composability: the only way to build large systems
• Interoperability
  – Makes for a healthy ecosystem
• Platforms
  – Positive value: generativity
  – Negative: lock-in and rent-seeking
• Stacks
  – More layers: a richer ecosystem
• Gateways
  – Break monopolies; enhance competition
• Standards
  – Many small standards rather than one massive one
  – Open standards and labor mobility?
http://psikit.net/
Subproblem: how to take computation to the data in a way that guarantees information does not leak
Technology solutions for data analytics with privacy and data control | Stephen Hardy
[Diagram: Organisation 1 and Organisation 2 each hold sensitive data (health, sensor, finance, government, location, etc.) in their own cloud / data center; a joint analysis across both yields insights.]
How can you learn jointly across multiple datasets without sharing the data?
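For simple statistics, one standard answer to this question is secure multi-party computation. The sketch below uses additive secret sharing over a prime modulus, a textbook construction and not necessarily what N1Analytics uses: each organisation splits its private value into random shares, each party only ever sees uniformly random shares from the others, yet the published partial sums add up to the joint total. It assumes honest-but-curious parties, simulates all parties in one process, and the modulus, organisation names and counts are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # hypothetical modulus; all arithmetic is mod PRIME, values must be < PRIME

def make_shares(value, n_parties):
    """Split value into n_parties random shares whose sum is value (mod PRIME)."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_values):
    """Simulated secure sum: party j collects the j-th share from every party.

    Each party publishes only the sum of the shares it received, so no raw
    value is ever revealed; only the combined total is.
    """
    n = len(private_values)
    all_shares = [make_shares(v, n) for v in private_values]
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME

# Hypothetical patient counts held by three organisations.
counts = {"org1": 120, "org2": 75, "org3": 301}
print(secure_sum(list(counts.values())))  # 496
```

Sums are the building block for means, counts and (with more machinery) regression; richer analyses need the heavier techniques listed in the diagram that follows.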
[Diagram: N1Analytics architecture
• Privacy technologies: partially homomorphic encryption; secure multi-party computation; irreversible aggregation
• Graph computation engine
• Analytics: statistics; regression; clustering
• Machine learning: learn; evaluate; deploy
• Data Auth; Config]
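Partially homomorphic encryption lets an untrusted party compute on data it cannot read. As an illustration, here is a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme: multiplying two ciphertexts gives an encryption of the sum of the plaintexts (mod n). The tiny hard-coded primes make it completely insecure; a real system would use a vetted library and keys of 2048 bits or more. Nothing here is claimed to reflect N1Analytics' implementation, and the values are hypothetical. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
import math
import secrets

# Toy parameters: tiny primes for illustration only. NOT secure.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                                          # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lambda = lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                               # lambda^-1 mod n (valid because G = N + 1)

def encrypt(m):
    """Paillier encryption: c = g^m * r^n mod n^2, with random r coprime to n."""
    if not 0 <= m < N:
        raise ValueError("plaintext must be in [0, n)")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    """Paillier decryption: m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N) * MU % N

def add_encrypted(c1, c2):
    """Homomorphic addition: the product of ciphertexts decrypts to the sum of plaintexts."""
    return (c1 * c2) % N_SQ

# Each organisation encrypts its own value; a third party combines the ciphertexts
# without ever seeing 120 or 301. Only the key holder can decrypt the total.
c_total = add_encrypted(encrypt(120), encrypt(301))
print(decrypt(c_total))  # 421
```

Because encryption is randomised by `r`, encrypting the same value twice normally gives different ciphertexts, which is what stops the combining party from matching values.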
Memory
Μνημοσυνη (Mnemosyne): the goddess of memory and remembrance, who invented language and words, discovered the uses of the power of reason, and gave a designation to every object.
Μνημοσυνη
• Provenance, trust and reliability – managing the provenance of capta and the transformations they undergo
• Management of legal rights (including confidentiality) – using formal methods, so that compositions of capta have the appropriate legal rights enforced
• Management of uncertainty – “big data” does not remove uncertainty; it just makes it worse
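A minimal sketch of the first point, provenance of capta and their transformations: each derived dataset records the step that produced it and a link to its parent, and a content hash chains through the parents so any upstream change alters every downstream hash. The `Capta` class, its fields and the example transformations are hypothetical, not part of any Mnemosyne API; a real system would also need signatures, access control and persistent storage.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass(frozen=True)
class Capta:
    """Hypothetical record: a dataset plus the transformation that produced it."""
    data: tuple
    transformation: str = "source"
    parent: Optional["Capta"] = None

    def digest(self) -> str:
        """SHA-256 over the data, the transformation name and the parent's digest,
        so a change anywhere upstream changes every downstream digest."""
        payload = json.dumps(
            {
                "data": list(self.data),
                "transformation": self.transformation,
                "parent": self.parent.digest() if self.parent is not None else None,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def derive(self, name: str, fn: Callable[[tuple], Iterable]) -> "Capta":
        """Apply fn to the data and record the step, linking back to this Capta."""
        return Capta(tuple(fn(self.data)), name, self)

    def lineage(self) -> list:
        """Transformation names from the original capta up to this one."""
        steps, node = [], self
        while node is not None:
            steps.append(node.transformation)
            node = node.parent
        return steps[::-1]

raw = Capta((3, None, 2, 4))
clean = raw.derive("drop_missing", lambda xs: [x for x in xs if x is not None])
scaled = clean.derive("scale_x10", lambda xs: [10 * x for x in xs])
print(scaled.data)       # (30, 20, 40)
print(scaled.lineage())  # ['source', 'drop_missing', 'scale_x10']
```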
Μνημοσυνη
• Management of complex workflows, including the ability to rewind what has already been done
• Late-binding ontologies – data is typically captured and organised for one purpose but later used for another; allow the user to rewind all the way back to the initial data categories (the capta “ontology”) for re-use
• The ability to cross organisational, jurisdictional and technical boundaries
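The late-binding idea can be illustrated by storing the raw capta once and applying a category scheme only when a question is asked, so a second purpose can re-categorise from the original values instead of inheriting the first purpose's bins. The ages, the two schemes and the helper names `scheme` and `view` are hypothetical.

```python
import bisect
from collections import Counter

# Raw capta stored once, without categories (hypothetical ages in years).
RAW_AGES = (4, 17, 23, 35, 41, 58, 66, 72, 80)

def scheme(edges, labels):
    """Build a categoriser: values below edges[0] get labels[0], values in
    [edges[i-1], edges[i]) get labels[i], values >= edges[-1] get labels[-1]."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return lambda x: labels[bisect.bisect_right(edges, x)]

def view(raw, categorise):
    """Late binding: categories are applied at query time; the raw capta are never overwritten."""
    return Counter(categorise(x) for x in raw)

# Two hypothetical purposes bind different category schemes to the same capta.
life_stage = scheme([18, 65], ["minor", "working_age", "retired"])
youth = scheme([13, 18], ["child", "teen", "adult"])

print(sorted(view(RAW_AGES, life_stage).items()))
# [('minor', 2), ('retired', 3), ('working_age', 4)]
print(sorted(view(RAW_AGES, youth).items()))
# [('adult', 7), ('child', 1), ('teen', 1)]
```

Had only the `life_stage` labels been stored, the second purpose could not split the two minors into child and teen; keeping the raw capta is what makes the rewind possible.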
Μνημοσυνη
• Decoupling technique from problem
• Generative infrastructure
• Avoidance of proprietary control
• Learning from the lessons of the past
• Delivery as a service
• Piecemeal development
Conclusion
• ML is a new technology for old problems
• It is indeed a technology!
  – We can see patterns of evolution similar to those of other technologies
  – It can learn from actuarial inference / finance (and conversely)
• Key open problems
  – Dealing with provenance, utility, risk, uncertainty and jurisdictional boundaries
  – Sustainable infrastructures for data