From Perception to Cognition: Towards Human-Understanding and Human-Centricity in AI
A*STAR AI Initiative
Kenneth Kwok, PhD
Principal Scientist, Institute of High Performance Computing
Programme Manager, A*STAR AI Initiative
Let there be AI…
• Simulate every aspect of human learning and intelligence
• Programming Computers to Use a Language
• Self Improvement (Learn)
• Randomness and Creativity
… We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
A Short History of AI
[Timeline figure, axis 1945 to 2020]
Themes: Reasoning as Search, Logic, Neural Networks; Expert Systems, Knowledge and Reasoning; Grand Challenges; Big Data, Deep Learning / Reinforcement Learning, Computational Power
Periods marked: 1st AI Winter; 2nd AI Winter; Return of NNs and Backpropagation
• McCulloch and Pitts (1943)
• ENIAC (1945)
• EDVAC (1949)
• Dartmouth Conference (1956)
• GPS (1957)
• Perceptron (1957)
• LISP (1958)
• ADALINE (1959)
• Logic based Q&A (1964)
• ELIZA (1965)
• Semantic Nets (1966)
• Minsky and Papert paper (1969)
• NLP at Stanford (1970)
• PROLOG (1972)
• MYCIN (1974)
• Backpropagation (1974)
• Frames (1975)
• PROSPECTOR (1979)
• Convolutional NN (1979)
• Commercial Expert Systems
• LISP machines (1980)
• Hopfield Nets (1982)
• SOAR (1983)
• SVM (1983)
• NetTalk (1985)
• Q‐learning (1989)
• LSTM (1997)
• IBM Deep Blue (1997)
• Robo‐Cup (1997)
• DARPA Grand Challenge (2005)
• DARPA Urban Challenge (2007)
• ImageNet (2009)
• IBM Watson (2011)
• Apple SIRI (2011)
• DeepMind Atari Games (2015)
• DeepMind AlphaGo (2016)
• CMU Libratus (2017)
Recent Successes (State of the Art)
• Object recognition (ImageNet Challenge)
• Voice assistants (Siri, Alexa/Echo, Google Home)
• Machine translation (Google, Nuance)
• Go, Chess and Poker (IBM Deep Blue, DeepMind AlphaGo, Libratus)
• Autonomous vehicles (Google, Uber, nuTonomy)
• Trivia/Q&A (IBM Watson for Jeopardy)
• Medical/legal assistance (DeepMind, Watson)
A*STAR AI Capabilities
English, Chinese and Southeast Asian Languages
Speaker Recognition (Voice Biometrics)
Speech Recognition
Language Understanding
Dialogue Management
Response Generation
Speech Synthesis
Spoken Dialogue System
Speech Recognition Machine Translation Speech Synthesis
A*STAR Speech and Language
I²R’s English speech recognition solution won the 2015 ASpIRE (Automatic Speech Recognition in Reverberant Environments) Challenge, organized by IARPA in the US, in which 169 teams from 32 countries participated.
Benchmarking of Capabilities
I²R’s engine performed >10% better than the acoustic feature extraction engines of Nuance and Google for Mandarin speech recognition. Benchmarking was conducted by a Japanese firm in several application scenarios, under both clean and noisy conditions.
Benchmarking of Capabilities
• Image Segmentation
• Image / Object Classification
• Action, Activity Recognition
A*STAR Computer Vision
Virtual Radiologist for CT Images
Task: Nodule Segmentation
Approach: Deep Convolutional Neural Networks (DCNN) Using Human Organ Medical Images
Lung nodule detection from CT Images: Classification Accuracy 80%
• Target: automatically classify a given tissue image into tumor or non‐tumor groups.
• SPIE paper: extract 43 features, colour + texture (Gabor & co‐occurrence matrix), then apply ELM/SVM for classification.
• Classification result: ~91% accuracy by ELM and ~89% accuracy by SVM; in contrast, a deep convolutional neural network achieves ~96% accuracy.
Description
• Deep convolutional neural network: 1 input layer + 2 convolutional layers + 2 max‐pooling layers + 1 fully connected layer + 1 output layer
Method / Approach
•Significantly higher classification accuracy •General approach applicable to other biomedical applications.
Expected Impact
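[Figure: Convolutional Neural Networks. Input images (tumor / non-tumor) pass through In → Conv → Pool → Conv → Pool → FC → Out]
Kernel: 3x3 (Conv), 2x2 (Pool), 3x3 (Conv), 2x2 (Pool)
Size: 32x32 (In), 32x32 (Conv), 16x16 (Pool), 16x16 (Conv), 8x8 (Pool), 128 (FC)
Depth: 16 (Conv), 16 (Pool), 32 (Conv), 32 (Pool), 2 (Out)
Weight: 3x3x3x16 (Conv), 3x3x16x32 (Conv), 8x8x32x128 (FC)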
Tumor Tissue Image Classification
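The layer listing on this slide can be checked with a small shape walk-through. The sketch below is illustrative only: it assumes "same" padding for the convolutions (the slide keeps 32x32 after the first convolution), non-overlapping 2x2 pooling, a 3-channel input (read off the 3x3x3x16 weight shape) and a 2-unit output for tumor vs non-tumor. The function name `trace` and the 128x2 output weights are assumptions, not taken from the slide.

```python
"""Shape and weight-count walk-through for the small CNN listed on the slide."""

# (name, kind, kernel size, output channels); sizes follow the slide's listing.
LAYERS = [
    ("conv1", "conv", 3, 16),
    ("pool1", "pool", 2, None),
    ("conv2", "conv", 3, 32),
    ("pool2", "pool", 2, None),
    ("fc", "fc", None, 128),
    ("out", "fc", None, 2),  # assumed: two classes, tumor vs non-tumor
]


def trace(height=32, width=32, channels=3):
    """Return (name, output shape, weight count) for each layer.

    Assumes 'same' padding for convolutions and non-overlapping pooling;
    bias terms are not counted.
    """
    rows = []
    h, w, c = height, width, channels
    for name, kind, k, out_c in LAYERS:
        if kind == "conv":
            weights = k * k * c * out_c
            c = out_c
        elif kind == "pool":
            weights = 0
            h, w = h // k, w // k
        else:  # fully connected: flatten whatever comes in
            weights = h * w * c * out_c
            h, w, c = 1, 1, out_c
        rows.append((name, (h, w, c), weights))
    return rows


if __name__ == "__main__":
    for name, shape, weights in trace():
        print(f"{name:6s} out={shape} weights={weights}")
```

Under these assumptions the weight counts reproduce the shapes on the slide: 3x3x3x16 for the first convolution, 3x3x16x32 for the second, and 8x8x32x128 for the fully connected layer.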
Automation for Medical Imaging: Human Anatomy Recognition Using Multi-Modal Data and Deep Learning
• A human operator has to configure imaging parameters depending on the diagnostic procedure (e.g. lung, heart, etc.), which is slow and inaccurate if the patient moves or is covered by a blanket, even if the operator has the medical knowledge to estimate organ position based on surface features.
•Non‐intrusive technology is needed (does not interfere with medical devices)
Motivation
• Develop vision‐based sensing technology and algorithms to estimate patient position
• Low‐cost depth & thermal sensors make multi‐modal data more accessible for higher recognition accuracy
• Human skeleton dataset of 40K frames collected
• Poisson surface reconstruction, human detection using depth image, pose estimation & recognition, skeleton recognition
•Deep learning (CNN) to automate alignment during medical imaging
Approach
•Ability to accurately predict feature points such as joints and internal organs, generalisable to other object detection applications
• Automate medical imaging and increase throughput
• 2 cm mean error of joint locations achieved, comparable with current gold standard of human operators
•Runs in real‐time on a laptop equipped with mid‐range GPU
Achievement
Coating Surface Defect Inspection
‐ Need regular checking
‐ Time consuming and labor intensive
‐ Exposure to dangerous environment
• Traditional methods: detect defect + feature extraction + pattern classification
• An integrated approach using deep learning
• High potential to reduce the time taken over traditional methods
• High potential to reduce the complexity of developing accurate models
Description
• Develop an automatic recognition system based on a deep neural network architecture.
Method / Approach
• 95% for defect classification
• 80% for detection of coating surface defect
Performance
[Figure: Coating Surface Image (In) → Deep Learning → Defect Patterns & Localization (Out)]
Virtual Defect Inspection Engineer
• Machine Health Analytics
• Biomedical Informatics
• Consumer Analytics
A*STAR Data Analytics
Machine Health Monitoring
Time to change bearings?
‐ Requires regular checking and maintenance
‐ Time consuming and labour intensive
Virtual Machine Health Doctor : AI-based Predictive Maintenance
Data Module: Raw Signals, Pre-processing
Feature Module:
‐ Extraction: Time domain, Frequency domain, Time-frequency domain
‐ Selection: Fisher’s ratio
Modelling Module:
‐ Techniques: Neural Networks (NN), Support Vector Regression
Evaluation Module:
‐ Metrics: Root-Mean Squared Error (RMSE), Mean Absolute Percentage Error, Precision (PR), Prognostics Horizon (PH), Confidence Interval (CI)
• Accurately predict the remaining useful life of a machine based on sensor data.
• Effectively reduce machine downtime.
• Effectively reduce labor cost on regular maintenance.
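A few of the named steps (time-domain feature extraction, Fisher's ratio for feature selection, and the RMSE and mean absolute percentage error metrics) can be sketched with the Python standard library. This is a minimal illustration, not the system described on the slide: the choice of RMS, peak and kurtosis as time-domain features, the two-class form of Fisher's ratio, and all function names are assumptions.

```python
import math
from statistics import mean, pvariance


def time_domain_features(signal):
    """RMS, peak and kurtosis of one window of raw sensor samples."""
    mu = mean(signal)
    var = pvariance(signal, mu)
    rms = math.sqrt(mean(x * x for x in signal))
    peak = max(abs(x) for x in signal)
    kurt = mean((x - mu) ** 4 for x in signal) / (var ** 2) if var > 0 else 0.0
    return {"rms": rms, "peak": peak, "kurtosis": kurt}


def fisher_ratio(values_a, values_b):
    """Two-class Fisher's ratio: squared mean gap over summed variances."""
    gap = mean(values_a) - mean(values_b)
    spread = pvariance(values_a) + pvariance(values_b)
    return gap * gap / spread if spread > 0 else float("inf")


def rmse(actual, predicted):
    """Root-mean squared error between actual and predicted values."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))
```

A feature whose Fisher's ratio between healthy and worn windows is high would be kept for the modelling module; RMSE and the percentage error would then score remaining-useful-life predictions against the observed values.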
Pipeline: Data → Data Clean → Feature Extraction and Engineering → Algorithm Selection → Hyper‐Parameter Tuning → Validation → Deployment
Key Features:
‐ Automatic Feature Extraction and Engineering
‐ Automatic Algorithm and Hyper‐Parameter Tuning
‐ Self‐adapting and learning for new dataset
Virtual Scientist : Automated Model Building System
Manufacturing | Healthcare | Urban Systems | Services & Digital Economy
Working with the Embraer team on this automated approach to predictive maintenance under the A*STAR Aerospace Programme.
Virtual Data Scientist: Use AI to Automate Model Building
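The algorithm selection and hyper-parameter tuning stages can be illustrated with a toy search: fit several candidate models, score each on held-out validation data, and keep the best. The sketch below is an assumption-laden stand-in, not the A*STAR system. The candidate models (nearest centroid and k-nearest neighbours), the grid of k values, and the names `auto_build`, `knn_factory` and `centroid_fit` are all hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_factory(k):
    """Return a fit function for a k-nearest-neighbour classifier."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda row: squared_distance(row[0], x))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return fit


def centroid_fit(train):
    """Fit a nearest-centroid classifier."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

    def predict(x):
        return min(centroids, key=lambda label: squared_distance(centroids[label], x))
    return predict


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def auto_build(train, valid, k_grid=(1, 3, 5)):
    """Try every candidate, score on validation data, return (best name, all scores)."""
    candidates = {"centroid": centroid_fit}
    candidates.update({f"knn(k={k})": knn_factory(k) for k in k_grid})
    scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `auto_build(train, valid)` with lists of `(features, label)` pairs returns the name of the best-scoring candidate and the validation accuracy of every candidate, which is the manual comparison loop this stage of the pipeline automates.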
Preliminary Results on Bearing Fault Detection
Task: Faulty or Normal?
Manual Model Building: Prediction Accuracy 0.65 (lots of parameters to be manually adjusted)
Auto Model Building: Prediction Accuracy 0.85
Rakuten-Viki Global TV Recommender Challenge
• To build a personalized TV Recommender system for world‐wide Rakuten‐Viki fans
• Recommend videos that a user is likely to watch (precision) and watch for long time (engagement)
• “Cold‐Start” problem: 20+% of users do not appear in the training data
• Data sparsity problem : most users viewed <= 5 videos in training data
Motivation / Objectives
• Typical recommendation algorithms do not work well here due to the sparsity and cold‐start problems
• Formulate the task as a classification problem instead of a typical recommendation problem: predict the probability that a user will watch a given video
Approach
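The classification framing can be sketched as a small logistic regression that scores each (user, video) pair with a watch probability and ranks videos by it. This is only an illustration of the idea, not the winning entry: the two features (video popularity and a language-match flag, chosen because they exist even for users with no viewing history), the training loop, and the function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            bias -= lr * error
            weights = [w - lr * error * v for w, v in zip(weights, x)]
    return weights, bias


def watch_probability(model, x):
    """Predicted probability that the user watches the video with features x."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def recommend(model, candidates, top_n=2):
    """Rank (video_id, features) pairs by predicted watch probability."""
    ranked = sorted(candidates, key=lambda item: watch_probability(model, item[1]),
                    reverse=True)
    return [video_id for video_id, _ in ranked[:top_n]]
```

Because the features do not depend on a user's own history, the same scoring applies to users who never appear in the training data, which is one way a classification formulation can sidestep the cold-start problem.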
• 1st Prize Winner
• Overcomes the “cold‐start” and data sparsity problems
• Robust and scalable approach for online recommendations
• Flexible to incorporate other general features
Achievement / Impact / Value Capture
[Figure: Distribution of No. of Videos viewed in the training data for Users tested in Feb 2015 (Left) and Mar 2015 (Right)]
[Figure: Leader Board Journey of IHPC; axes: Performance Score, Leader Board Ranking; final position: 1st Prize]
Gap between AI and True Intelligence
Essentially pattern matching: mostly PERCEPTION, no UNDERSTANDING, largely BLACK BOX approaches
Where is Understanding?
Cognition: Human-Level AI
• A computer program capable of acting intelligently in the world must have a general representation of the world in terms of which its inputs are interpreted.
• Designing such a program requires commitments about what knowledge is and how it is obtained...
• More specifically, we want a computer program that decides what to do by inferring... a certain strategy will achieve its assigned goal. This requires formalising concepts of causality, ability, and knowledge.
JOHN McCARTHY, 1927–2011
A*STAR Social Cognitive Computation
Research areas:
• Psychometrics & decision science
• Social intelligence & cognitive systems
• Integrative psychological modelling
• Collaborative thinking & technologies
Levels: Individual, Groups, Communities
Applications:
• Understand ground sentiments
• Behaviour motivations
• Enhance learning productivity
• Person-product matching
• Brand surveillance
• Cognitive ability assessment
• Consumer preferences
• Targeted marketing
• Optimized business transactions
• Strategic crowdsourcing
• People and behaviour profiling
• Improve consumer satisfaction
Fine-Grained Sentiment Analysis
Our Real‐World Sentiment Analysis Case Studies
Understanding brand perceptions across cities
Discovering consumer preferences across products
Ground sensing of day‐to‐day commuter sentiments
Quantifying positivity generated from public campaigns
http://imageanalysis.socialanalyticsplus.net
Design Features and Novelty
http://172.20.98.207:8080/sentimo‐webportal/sentimo_api.html
• Fine‐grained multi‐dimensional outputs (positive, negative, neutral, mixed, sadness, anger, happiness, excitement…)
• Comprehensive lexicons, fully in‐house developed (English, Internet slangs, local language and domain words collections)
• Linguistic processing units (decomposer, negation handler, amplifier handler…)
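How a lexicon, a negation handler and an amplifier handler can combine into fine-grained labels is sketched below. This is a toy illustration and not the SentiMo implementation: the four-word lexicon, the modifier lists, the whitespace tokenizer standing in for the decomposer, and the rule that a modifier applies to the next sentiment word are all assumptions.

```python
# Toy, hypothetical resources; a real system would use far larger lexicons.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "never", "no"}
AMPLIFIERS = {"very": 1.5, "really": 1.5, "slightly": 0.5}


def classify(text):
    """Return (label, net score), label in positive/negative/neutral/mixed."""
    positive = negative = 0.0
    scale = 1.0  # pending modifier, applied to the next sentiment word
    for raw in text.lower().split():
        token = raw.strip(".,!?")
        if token in NEGATIONS:
            scale *= -1.0
        elif token in AMPLIFIERS:
            scale *= AMPLIFIERS[token]
        elif token in LEXICON:
            value = scale * LEXICON[token]
            if value > 0:
                positive += value
            else:
                negative -= value
            scale = 1.0
    if positive and negative:
        label = "mixed"
    elif positive:
        label = "positive"
    elif negative:
        label = "negative"
    else:
        label = "neutral"
    return label, positive - negative
```

With these toy resources, "The service was not good" comes out as negative because the negation flips the polarity of "good", while a sentence containing both "very good" and "terrible" is labelled mixed rather than being averaged into a single polarity.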
Performance
[Figure: Average F1‐Score for Positivity, Negativity, Neutrality Recognition (axis 0.00 to 1.00), comparing SentiMoAPI v1.0, Tool A and Tool B]
“A method and system for sentiment classification and emotion classification”, Patent Cooperation Treaty (PCT) Application PCT/SG2015/050469
SentiMoAPI & SDK released to licensees
People Analytics
System and Platform
Application
Learning from Experience: Rapid Causal Learning
Build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems+
Causality from Temporal Correlation
Inspired by Contingency Model of Causal Learning from Psychology
Knowledge Level Causal Learning
Ground Level Causal Learning
(Temporal Correlation)
+ See also: Building Machines that Learn and Think Like People (Lake, Ullman, Tenenbaum, and Gershman, 2016) in arXiv:1604.00289v3
Strength(Cause(Event1, Event2)) = Prob(Cause(Event1, Event2)) - Wt * Uncert(Cause(Event1, Event2))
Lightning flash (L)
Headlight flashes of vehicles
Blinking beacon (B)
H G
Experiment: Learning Causality from Experience
Relationship between Lightning and Thunder
Lightning reliably predicts Thunder
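The Strength formula can be made concrete with a small event-log example. The slide does not define Prob or Uncert, so the sketch below assumes them: Prob is taken as the fraction of Event1 occurrences followed by Event2 within a time window, and Uncert as 1/sqrt(n) for n occurrences of Event1. The function name, the window length and the default Wt are likewise hypothetical.

```python
import math


def causal_strength(events, cause, effect, window=5.0, wt=0.5):
    """Strength = Prob - wt * Uncert for 'cause is followed by effect'.

    events: list of (timestamp, name) pairs.
    Prob (assumed): share of cause occurrences followed by the effect
    within `window` time units.
    Uncert (assumed): 1 / sqrt(number of cause occurrences).
    """
    cause_times = [t for t, name in events if name == cause]
    effect_times = [t for t, name in events if name == effect]
    if not cause_times:
        return 0.0
    followed = sum(
        any(0 < e - c <= window for e in effect_times) for c in cause_times
    )
    prob = followed / len(cause_times)
    uncert = 1.0 / math.sqrt(len(cause_times))
    return prob - wt * uncert


if __name__ == "__main__":
    log = [(0, "lightning"), (3, "thunder"), (5, "headlight"),
           (10, "lightning"), (13, "thunder"), (20, "lightning"),
           (23, "thunder"), (25, "headlight"), (30, "lightning"),
           (33, "thunder")]
    print(causal_strength(log, "lightning", "thunder"))
    print(causal_strength(log, "headlight", "thunder"))
```

In this toy log, every lightning flash is followed by thunder within the window, so its strength is high, while the headlight flashes are never followed by thunder in time and score below zero once the uncertainty penalty is applied.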
Learning to play Atari Games in a human-like way
• Google DeepMind published a Nature paper+ on learning to play 50 Atari games using deep reinforcement learning
• DeepMind achieved human‐level scores in 50% of the games. However, learning took a long time (hours!) with many iterations, and the system had to learn from scratch for each game.
...50+ games
That is NOT human performance
+ Mnih, V. et al.(2015) Human‐level control through deep reinforcement learning, Nature, 518, p. 529
• We are currently building a system using our causal learning method to learn to play the same games in a human‐like way
Learning Object Interactions in Game Environment
• Properties of barriers, gaps, missiles, shields etc.
• Predicting trajectories
• Effects of a successful shot and of being shot etc.
Learning Behavioural Scripts
• Behaviours such as shooting, dodging shots, hiding, chasing
Learning Game Play
• Rules of specific games
• Adapt behaviours to game
Goals: Human‐level scores, Fast on‐line learning, Transfer between games
Learning to play Space Invaders
Commonsense Knowledge Representation and Reasoning
Achievements:
• Codified a commonsense knowledge base (KB) using semantic graph representation
• 3.4 million concepts in about 10 million relational assertions
• Sourced from KBs such as ConceptNet, YAGO, DBpedia, augmented by an 8‐billion‐word text corpus represented as word embeddings in a vector space using Word2Vec
• Applied KB in tasks such as topic categorisation, sentiment analysis and commonsense reasoning
Current Work:
• Develop representations for narrative knowledge (modelling temporally extended events)
• Reasoning about events for event monitoring and prediction
Semantic Graph using Neo4J
Representation for Events and Scripts
Event Models for Event Monitoring
A*STAR AI Initiative
Human-Centric AI Programme
Speech & Language | Video & Image | Data Analytics | Social Cognitive Computing
Deep Learning / Machine Learning
Good Old-Fashioned AI (GOFAI)
Human‐Centric AI
Singaporean and Asian Culture
• Knowledge of human needs/motivations, social/cultural norms, and commonsense
• Personalised and Explainable
• Instructable through real‐time instruction and demonstration, or learn from experience with a small number of examples
Human‐Centric AI Research
AI that understands humans, reasons for humans and learns like humans.
Specifically, human‐centric AI that understands Singaporean and Asian cultures.
[Figure: interaction between Human and AI]
Exchanged between Human and AI: Explicit instructions; Explanations; Implicit signals (socio‐cultural behaviours, commonsense, mental state)
AI capabilities: Learns like humans (machine learning); Understands humans; Reasons for humans
Qualities: Cognitive, Human‐like, Empathetic, Explainable
Human‐Centric AI Research
Human-Centric AI Programme
Speech & Language | Video & Image | Data Analytics | Social Cognitive Computing
Deep Learning / Machine Learning
Good Old-Fashioned AI (GOFAI)
Understanding Language and Expressions
Human‐Centric AI
Singaporean and Asian Culture
Social‐Cultural Visual Intelligence
Towards Understanding Humans in Multi‐modal Content
“Understanding Humans” means
• Being able to extract and create representations of humans from multi‐modal data sources*
• In order to reason about
  • Human roles
  • Relationships with other Humans and Objects
  • Behaviours
  • Goals and intentions
  • Mental Models
* primarily images and videos, such as these
What is the woman thinking? (Mental Models)
How are these people related? (Roles and Relationships)
What are their intentions? (Goals and Intentions)
What could happen next? (Behaviours)
From Perception to Cognition: Still a long journey...
• Knowledge – the next frontier
• Understanding Humans
  – Goals, Intentions
  – Motivations
  – Mental models
  – etc...
• “Explainability”
Thank you
Contact us
A*STAR Artificial Intelligence Initiative
Programme Manager
Kenneth Kwok
A*STAR Artificial Intelligence Initiative
Programme Manager
Cheston Tan