From Perception to Cognition: Towards Human-Understanding and Human-Centricity in AI

A*STAR AI Initiative

Kenneth Kwok, PhD
Principal Scientist, Institute of High Performance Computing
Programme Manager, A*STAR AI Initiative

Let there be AI…

• Simulate every aspect of human learning and intelligence
• Programming Computers to Use a Language
• Self Improvement (Learn)
• Randomness and Creativity

… We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

A Short History of AI

Timeline: 1945 to 2020

Eras and themes shown on the timeline:
• Reasoning as Search, Logic, Neural Networks
• 1st AI Winter
• Expert Systems, Knowledge and Reasoning
• Commercial Expert Systems
• 2nd AI Winter
• Return of NNs and Backpropagation
• Big Data, Deep Learning / Reinforcement Learning, Computational Power
• Grand Challenges

Milestones:
• McCulloch and Pitts (1943)
• ENIAC (1945)
• EDVAC (1949)
• Dartmouth Conference (1956)
• GPS (1957)
• Perceptron (1957)
• LISP (1958)
• ADALINE (1959)
• Logic based Q&A (1964)
• ELIZA (1965)
• Semantic Nets (1966)
• Minsky and Papert paper (1969)
• NLP at Stanford (1970)
• PROLOG (1972)
• Backpropagation (1974)
• MYCIN (1974)
• Frames (1975)
• Convolutional NN (1979)
• PROSPECTOR (1979)
• LISP machines (1980)
• Hopfield Nets (1982)
• SOAR (1983)
• SVM (1983)
• NetTalk (1985)
• Q‐learning (1989)
• IBM Deep Blue (1997)
• LSTM (1997)
• Robo‐Cup (1997)
• DARPA Grand Challenge (2005)
• DARPA Urban Challenge (2007)
• ImageNet (2009)
• Apple SIRI (2011)
• IBM Watson (2011)
• DeepMind Atari Games (2015)
• DeepMind AlphaGo (2016)
• CMU Libratus (2017)

Recent Successes (State of the Art)

• Object recognition (ImageNet Challenge)

• Voice assistants (Siri, Alexa/Echo, Google Home)

• Machine translation (Google, Nuance)

• Go, Chess and Poker (IBM Deep Blue, DeepMind AlphaGo, Libratus)

• Autonomous vehicles (Google, Uber, nuTonomy)

• Trivia/Q&A (IBM Watson for Jeopardy)

• Medical/legal assistance (DeepMind, Watson)

A*STAR AI Capabilities

English, Chinese and Southeast Asian Languages

Speaker Recognition (Voice Biometrics)

Speech Recognition

Language Understanding

Dialogue Management

Response Generation

Speech Synthesis

Spoken Dialogue System

Speech Recognition, Machine Translation, Speech Synthesis

A*STAR Speech and Language

I²R’s English speech recognition solution won the 2015 ASpIRE (Automatic Speech Recognition in Reverberant Environments) Challenge, organized by IARPA (US), in which 169 teams from 32 countries participated.

Benchmarking of Capabilities

I²R’s engine performed >10% better than the acoustic feature extraction engines of Nuance and Google for Mandarin speech recognition. Benchmarking was conducted by a Japanese firm in several application scenarios, under both clean and noisy conditions.

Benchmarking of Capabilities

• Image Segmentation
• Image / Object Classification
• Action, Activity Recognition

A*STAR Computer Vision

Virtual Radiologist for CT Images
Task: Nodule Segmentation

Approach: Deep Convolutional Neural Networks (DCNN) Using Human Organ Medical Images

Lung nodule detection from CT Images: Classification Accuracy 80%

• Target: automatically classify a given tissue image into tumor or non‐tumor groups.
• SPIE paper: extract 43 features, colour + texture (Gabor & co‐occurrence matrix), then apply ELM/SVM for classification.
• Classification result: ~91% accuracy by ELM and ~89% accuracy by SVM; in contrast, deep convolutional neural networks achieve ~96% accuracy.

Description

• Deep convolutional neural network: 1 input layer + 2 convolutional layers + 2 max‐pooling layers + 1 fully connected layer + 1 output layer

Method / Approach

• Significantly higher classification accuracy
• General approach applicable to other biomedical applications

Expected Impact

Input Images: tumor / non‐tumor

Convolutional Neural Network

Layer   Size    Depth   Kernel   Weight
In      32x32   -       -        -
Conv    32x32   16      3x3      3x3x3x16
Pool    16x16   16      2x2      -
Conv    16x16   32      3x3      3x3x16x32
Pool    8x8     32      2x2      -
FC      128     -       -        8x8x32x128
Out     2       -       -        -

Tumor Tissue Image Classification
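
The layer sizes and weight shapes in the network diagram can be checked with a little arithmetic. The sketch below is only a bookkeeping aid, not the authors' implementation. It assumes a 3‐channel 32x32 input (read off the first weight shape, 3x3x3x16), size‐preserving 3x3 convolutions, 2x2 pooling that halves each side, and a 2‐class output; bias terms are ignored and all function names are hypothetical.

```python
def conv_same(shape, kernel, out_channels):
    """Size-preserving convolution: height/width unchanged, depth becomes out_channels."""
    h, w, c = shape
    weights = kernel * kernel * c * out_channels
    return (h, w, out_channels), weights


def max_pool(shape, size):
    """Pooling halves (for size=2) each spatial side and has no weights."""
    h, w, c = shape
    return (h // size, w // size, c), 0


def dense(shape, units):
    """Fully connected layer over the flattened input."""
    h, w, c = shape
    return (1, 1, units), h * w * c * units


def trace_network(input_shape=(32, 32, 3)):
    """Return (layer name, output shape, weight count) for each layer in order."""
    layers = [
        ("Conv", conv_same, (3, 16)),
        ("Pool", max_pool, (2,)),
        ("Conv", conv_same, (3, 32)),
        ("Pool", max_pool, (2,)),
        ("FC", dense, (128,)),
        ("Out", dense, (2,)),
    ]
    shape = input_shape
    rows = [("In", shape, 0)]
    for name, fn, args in layers:
        shape, weights = fn(shape, *args)
        rows.append((name, shape, weights))
    return rows


if __name__ == "__main__":
    for name, shape, weights in trace_network():
        print(f"{name:5s} {str(shape):14s} weights={weights}")
```

Under these assumptions the fully connected layer (8x8x32x128) dominates the weight count, which is typical for small networks of this shape.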

Automation for Medical Imaging: Human Anatomy Recognition Using Multi-Modal Data and Deep Learning

• A human operator has to configure imaging parameters depending on the diagnostic procedure (e.g. lung, heart, etc.). This is slow, and inaccurate if the patient moves or is covered by a blanket, even if the operator has the medical knowledge to estimate organ position from surface features.

•Non‐intrusive technology is needed (does not interfere with medical devices)

Motivation

• Develop vision‐based sensing technology and algorithms to estimate patient position
• Low‐cost depth & thermal sensors make multi‐modal data more accessible for higher recognition accuracy
• Human skeleton dataset of 40K frames collected
• Poisson surface reconstruction, human detection using depth image, pose estimation & recognition, skeleton recognition
• Deep learning (CNN) to automate alignment during medical imaging

Approach

• Ability to accurately predict feature points such as joints and internal organs, generalisable to other object detection applications
• Automate medical imaging and increase throughput
• 2 cm mean error of joint locations achieved, comparable with current gold standard of human operators
• Runs in real‐time on a laptop equipped with mid‐range GPU

Achievement
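
A figure such as "2 cm mean error of joint locations" is commonly computed as the average Euclidean distance between predicted and annotated joint positions. The sketch below illustrates that calculation only; the slides do not state the exact metric used, so the definition, the function name and the coordinates (in cm) are assumptions.

```python
import math


def mean_joint_error(predicted, ground_truth):
    """Average Euclidean distance between matching joints (same units as the input)."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty joint lists of equal length")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical 3-D joint positions in centimetres.
    predicted = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    annotated = [(3.0, 4.0, 0.0), (1.0, 1.0, 2.0)]
    print(mean_joint_error(predicted, annotated))
```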

Coating Surface Defect Inspection

‐ Need regular checking

‐ Time consuming and labor intensive

‐ Exposure to dangerous environment

• Traditional methods: detect defect + feature extraction + pattern classification

• An integrated approach using deep learning

• High potential to reduce the time taken over traditional methods

• High potential to reduce the complexity of developing accurate models

Description

• Develop an automatic recognition system based on a deep neural network architecture.

Method / Approach

• 95% for defect classification
• 80% for detection of coating surface defects

Performance

In: Coating Surface Image

Deep Learning

Out: Defect Patterns & Localization

Virtual Defect Inspection Engineer

• Machine Health Analytics
• Biomedical Informatics
• Consumer Analytics

A*STAR Data Analytics

Machine Health Monitoring

Time to change bearings?

‐ Requires regular checking and maintenance

‐ Time consuming and labour intensive

Virtual Machine Health Doctor : AI-based Predictive Maintenance

Data Module: Raw Signals, Pre-processing

Feature Module:
‐ Extraction: Time domain, Frequency domain, Time-frequency domain
‐ Selection: Fisher’s ratio

Modelling Module:
‐ Techniques: Neural Networks (NN), Support Vector Regression

Evaluation Module:
‐ Metrics: Root-Mean Squared Error (RMSE), Mean Absolute Percentage Error, Precision (PR), Prognostics Horizon (PH), Confidence Interval (CI)

• Accurately predict the remaining useful life of a machine based on sensor data.

• Effectively reduce machine downtime.

• Effectively reduce labor cost on regular maintenance.
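
Two pieces of the pipeline above lend themselves to a short worked example: feature selection by Fisher's ratio and evaluation by RMSE and mean absolute percentage error. The sketch below uses one common two‐class form of Fisher's ratio, (difference of class means)² divided by the sum of class variances; the slides do not give the exact formula, so that choice, the feature names and the sample values are all assumptions.

```python
import math
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Two-class Fisher's ratio for one feature: larger means better separation."""
    spread = pvariance(class_a) + pvariance(class_b)
    gap = (mean(class_a) - mean(class_b)) ** 2
    return math.inf if spread == 0 else gap / spread


def rank_features(healthy, faulty):
    """Rank feature names (keys of both dicts) from most to least discriminative."""
    scores = {name: fisher_ratio(healthy[name], faulty[name]) for name in healthy}
    return sorted(scores, key=scores.get, reverse=True)


def rmse(actual, predicted):
    """Root-mean squared error between two equal-length sequences."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Hypothetical per-window features from vibration and temperature sensors.
    healthy = {"vibration_rms": [1.0, 1.2, 0.8], "temperature": [20.0, 21.0, 19.0]}
    faulty = {"vibration_rms": [3.0, 3.2, 2.8], "temperature": [20.5, 21.5, 19.5]}
    print(rank_features(healthy, faulty))
    # Hypothetical remaining-useful-life values in hours.
    print(rmse([100, 80], [90, 90]), mape([100, 80], [90, 90]))
```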

Data

Data Cleaning

Feature Extraction and Engineering

Algorithm Selection

Hyper‐Parameter Tuning

Validation

Deployment

Key Features:
‐ Automatic Feature Extraction and Engineering
‐ Automatic Algorithm and Hyper‐Parameter Tuning
‐ Self‐adapting and learning for new datasets

Virtual Scientist : Automated Model Building System

Manufacturing, Healthcare, Urban Systems, Services & Digital Economy

Working with Embraer team on this automated approach for predictive maintenance under A*STAR Aerospace Programme.

Virtual Data Scientist: Use AI to Automate Model Building
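
The idea of automatic algorithm selection and hyper‐parameter tuning can be illustrated with a very small search loop: score every candidate model by cross‐validation and keep the best. This is a minimal sketch under stated assumptions, not the system described on the slide; the two candidate algorithms (k‐nearest neighbours and nearest centroid), the fold scheme and the toy bearing data are all hypothetical.

```python
import math
from collections import Counter


def knn_factory(k):
    """Return a fit function for a k-nearest-neighbour classifier."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return fit


def centroid_factory():
    """Return a fit function for a nearest-centroid classifier."""
    def fit(train):
        groups = {}
        for x, label in train:
            groups.setdefault(label, []).append(x)
        centroids = {
            label: tuple(sum(col) / len(xs) for col in zip(*xs))
            for label, xs in groups.items()
        }

        def predict(x):
            return min(centroids, key=lambda label: math.dist(centroids[label], x))
        return predict
    return fit


def cross_val_accuracy(fit, data, folds=4):
    """Accuracy averaged over all held-out points, using strided folds."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        predict = fit(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(data)


def auto_select(data, candidates):
    """Score every candidate and return the best name plus all scores."""
    scores = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-feature bearing data: (features, label).
DATA = [
    ((0.1, 0.2), "normal"), ((1.0, 1.1), "faulty"),
    ((0.2, 0.1), "normal"), ((1.2, 0.9), "faulty"),
    ((0.0, 0.3), "normal"), ((0.9, 1.0), "faulty"),
    ((0.3, 0.0), "normal"), ((1.1, 1.2), "faulty"),
]
CANDIDATES = {
    "knn_k1": knn_factory(1),
    "knn_k3": knn_factory(3),
    "knn_k5": knn_factory(5),
    "centroid": centroid_factory(),
}

if __name__ == "__main__":
    print(auto_select(DATA, CANDIDATES))
```

On this toy data a poorly chosen hyper‐parameter (k = 5 with only six training points per fold) scores worse than the other candidates, which is the kind of setting an automated search is meant to rule out without manual adjustment.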

Preliminary Results on Bearing Fault Detection

Task: Faulty or Normal?

Manual Model Building: Prediction Accuracy 0.65 (lots of parameters to be manually adjusted)

Auto Model Building: Prediction Accuracy 0.85

Rakuten-Viki Global TV Recommender Challenge

• To build a personalized TV Recommender system for world‐wide Rakuten‐Viki fans

• Recommend videos that a user is likely to watch (precision) and watch for long time (engagement)

• “Cold‐Start” problem: 20+% of users do not appear in the training data

• Data sparsity problem : most users viewed <= 5 videos in training data

Motivation / Objectives

• Typical recommendation algorithms do not work well here due to sparsity and cold‐start problems

• Formulate as a classification problem instead of a typical recommendation problem: predict the probability that a user will watch a given video

Approach
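
The classification formulation above can be sketched with a small logistic regression: each (user, video) pair becomes a feature vector, the label is whether the video was watched, and videos are ranked by predicted probability. This is an illustrative sketch only; the competition features and model are not described in the slides, so the two features used here (video popularity and a genre‐match flag), the training rows and all names are assumptions. A cold‐start user simply has no history‐based signal, so ranking falls back on features such as popularity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Probability of 'watched'; the last weight is the bias term."""
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return sigmoid(z)


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss, starting from zero weights."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def recommend(weights, candidates, top_n=2):
    """Rank (name, features) candidates by predicted watch probability."""
    ranked = sorted(candidates, key=lambda item: predict_proba(weights, item[1]), reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Hypothetical training pairs: [video popularity, genre matches user history].
ROWS = [[0.9, 1], [0.8, 0], [0.2, 1], [0.1, 0], [0.7, 1], [0.3, 0]]
LABELS = [1, 1, 1, 0, 1, 0]

if __name__ == "__main__":
    model = train_logistic(ROWS, LABELS)
    # Cold-start user: no history, so the genre-match feature is 0 for every video.
    print(recommend(model, [("popular_show", [0.9, 0]), ("niche_show", [0.1, 0])], top_n=1))
```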

• 1st Prize Winner
• Overcame “cold‐start” and data sparsity problems
• Robust and scalable approach for online recommendations

• Flexible to incorporate other general features

Achievement / Impact / Value Capture

Figure: Distribution of No. of Videos viewed in the training data for Users tested in Feb 2015 (Left) and Mar 2015 (Right)

Figure: Leader Board Journey (IHPC: 1st Prize). Axes: Performance Score, Leader Board Ranking

Gap between AI and True Intelligence

Essentially pattern matching ~ Mostly PERCEPTION
No UNDERSTANDING, largely BLACK BOX approaches

Where is Understanding?

Cognition: Human-Level AI

• A computer program capable of acting intelligently in the world must have a general representation of the world in terms of which its inputs are interpreted.

• Designing such a program requires commitments about what knowledge is and how it is obtained...

• More specifically, we want a computer program that decides what to do by inferring... a certain strategy will achieve its assigned goal. This requires formalising concepts of causality, ability, and knowledge.

JOHN McCARTHY, 1927–2011

A*STAR Social Cognitive Computation

Psychometrics & decision science

Social intelligence & Cognitive systems

Integrative psychological modelling

Collaborative thinking & technologies

Individual, Groups, Communities

Applications
• Understand ground sentiments
• Behaviour motivations
• Enhance learning productivity
• Person-product matching
• Brand surveillance
• Cognitive ability assessment
• Consumer preferences
• Targeted marketing
• Optimized business transactions
• Strategic crowdsourcing
• People and behaviour profiling
• Improve consumer satisfaction

Fine-Grained Sentiment Analysis

Our Real‐World Sentiment Analysis Case Studies

Understanding brand perceptions across cities

Discovering consumer preferences across products

Ground sensing of day‐to‐day commuter sentiments

Quantifying positivity generated from public campaigns

http://imageanalysis.socialanalyticsplus.net

Design Features and Novelty

http://172.20.98.207:8080/sentimo‐webportal/sentimo_api.html

• Fine‐grained multi‐dimensional outputs (positive, negative, neutral, mixed, sadness, anger, happiness, excitement…)

• Comprehensive lexicons, fully developed in‐house (English, Internet slang, local language and domain word collections)

• Linguistic processing units (decomposer, negation handler, amplifier handler…)
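
How a lexicon combines with negation and amplifier handling can be shown with a toy scorer. This is a generic sketch of the technique, not SentiMoAPI itself: the word lists, weights, the rule that a negation or amplifier applies to the next sentiment word, and the four output labels are all assumptions made for illustration.

```python
# Hypothetical miniature lexicon and modifier lists.
LEXICON = {"good": 1.0, "great": 2.0, "happy": 1.5, "bad": -1.0, "terrible": -2.0, "sad": -1.5}
NEGATIONS = {"not", "never", "no"}
AMPLIFIERS = {"very": 1.5, "extremely": 2.0}


def score_sentence(text):
    """Return (positive, negative) strength after negation and amplifier handling."""
    positive = negative = 0.0
    negate = False
    boost = 1.0
    for token in text.lower().replace(",", " ").replace(".", " ").split():
        if token in NEGATIONS:
            negate = True
        elif token in AMPLIFIERS:
            boost *= AMPLIFIERS[token]
        elif token in LEXICON:
            value = LEXICON[token] * boost
            if negate:
                value = -value  # negation flips the polarity of the next sentiment word
            if value > 0:
                positive += value
            else:
                negative += -value
            negate, boost = False, 1.0  # modifiers are consumed by the word they modify
    return positive, negative


def classify(text):
    """Map the two strengths to positive, negative, mixed or neutral."""
    positive, negative = score_sentence(text)
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["The service was very good.", "Not good.", "Great food but terrible service."]:
        print(sentence, "->", classify(sentence))
```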

Performance

Figure: Average F1‐score for positivity, negativity and neutrality recognition (scale 0.00 to 1.00), comparing SentiMoAPI v1.0, ToolA and ToolB

“A method and system for sentiment classification and emotion classification”, Patent Cooperation Treaty (PCT) Application PCT/SG2015/050469

SentiMoAPI & SDK released to licensees

People Analytics

System and Platform

Application

Learning from Experience: Rapid Causal Learning

Build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems+

Causality from Temporal Correlation

Inspired by Contingency Model of Causal Learning from Psychology

Knowledge Level Causal Learning

Ground Level Causal Learning

(Temporal Correlation)

+ See also: Building Machines that Learn and Think Like People (Lake, Ullman, Tenenbaum, and Gershman, 2016) in arXiv:1604.00289v3

Strength(Cause(Event1, Event2)) = Prob(Cause(Event1, Event2)) – Wt * Uncert(Cause(Event1, Event2))

Lightning flash (L)

Headlight flashes of vehicles

Blinking beacon (B)

H G

Experiment: Learning Causality from Experience
Relationship between Lightning and Thunder

Lightning reliably predicts Thunder
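
The strength formula, Strength = Prob – Wt * Uncert, can be tried on a toy event log. The slides do not define Prob or Uncert, so the sketch below makes two loud assumptions: Prob is the fraction of Event1 occurrences followed by Event2 within a time window, and Uncert is 1/sqrt(n) for n occurrences of Event1. The window, the weight and the event log are also hypothetical; this is not the authors' method.

```python
import math


def causal_strength(events, cause, effect, window=5.0, weight=0.5):
    """Strength = Prob - weight * Uncert, under the assumed definitions below."""
    cause_times = [t for t, name in events if name == cause]
    effect_times = [t for t, name in events if name == effect]
    if not cause_times:
        return 0.0
    # Assumed Prob: share of cause events followed by the effect within the window.
    followed = sum(any(0 < e - c <= window for e in effect_times) for c in cause_times)
    prob = followed / len(cause_times)
    # Assumed Uncert: shrinks as more cause events are observed.
    uncert = 1.0 / math.sqrt(len(cause_times))
    return prob - weight * uncert


# Hypothetical log of (time in seconds, event name).
EVENTS = (
    [(t, "lightning") for t in (0, 20, 40, 60)]
    + [(t, "thunder") for t in (3, 23, 43, 63)]
    + [(t, "beacon") for t in (1, 11, 21, 31, 41, 51, 61)]
)

if __name__ == "__main__":
    print("lightning -> thunder:", causal_strength(EVENTS, "lightning", "thunder"))
    print("beacon -> thunder:", causal_strength(EVENTS, "beacon", "thunder"))
```

With these assumptions every lightning flash is followed by thunder, while the blinking beacon is followed by thunder only some of the time, so lightning receives the higher strength.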

Learning to play Atari Games in a human-like way

• Google DeepMind published a Nature paper+ on learning to play 50 Atari games using deep reinforcement learning

• DeepMind achieved human‐level scores in 50% of the games. However, learning took a long time (hours!) with many iterations, and the system had to learn from scratch for each game.

...50+ games

That is NOT human performance

+ Mnih, V. et al. (2015) Human‐level control through deep reinforcement learning, Nature, 518, p. 529

• We are currently building a system using our causal learning method to learn to play the same games in a human‐like way

Learning Object Interactions in Game Environment
• Properties of barriers, gaps, missiles, shields etc.
• Predicting trajectories
• Effects of a successful shot and of being shot etc.

Learning Behavioural Scripts
• Behaviours such as shooting, dodging shots, hiding, chasing

Learning Game Play
• Rules of specific games
• Adapt behaviours to game

Human‐level scores
Fast on‐line learning
Transfer between games

Learning to play Space Invaders

Commonsense Knowledge Representation and Reasoning

Achievements:
• Codified a commonsense knowledge base (KB) using semantic graph representation

• 3.4 million concepts in about 10 million relational assertions

• Sourced from KBs such as ConceptNet, YAGO and DBpedia, augmented by an 8‐billion‐word text corpus represented as word embeddings in vector space using Word2Vec

• Applied KB in tasks such as topic categorisation, sentiment analysis and commonsense reasoning

Current Work:
• Develop representations for narrative knowledge (modelling temporally extended events)

• Reasoning about events for event monitoring and prediction

Semantic Graph using Neo4J

Representation for Events and Scripts

Event Models for Event Monitoring
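
A semantic graph of relational assertions, and a simple form of commonsense reasoning over it, can be sketched in a few lines. The example below is a plain in‐memory stand‐in, not the Neo4J graph or the knowledge base described above; the class, the handful of assertions and the inheritance rule (a concept inherits assertions from its IsA ancestors) are assumptions for illustration. The relation names follow the ConceptNet style.

```python
from collections import defaultdict


class SemanticGraph:
    """Tiny store of (subject, relation, object) assertions."""

    def __init__(self):
        self.edges = defaultdict(set)  # (subject, relation) -> set of objects

    def add(self, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)

    def ancestors(self, concept):
        """All concepts reachable from `concept` by following IsA edges."""
        seen, stack = set(), [concept]
        while stack:
            node = stack.pop()
            for parent in self.edges.get((node, "IsA"), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def query(self, concept, relation):
        """Objects asserted for the concept directly or inherited via IsA."""
        results = set(self.edges.get((concept, relation), ()))
        for parent in self.ancestors(concept):
            results |= self.edges.get((parent, relation), set())
        return results


def build_example():
    graph = SemanticGraph()
    graph.add("dog", "IsA", "mammal")
    graph.add("mammal", "IsA", "animal")
    graph.add("animal", "CapableOf", "breathe")
    graph.add("dog", "CapableOf", "bark")
    return graph


if __name__ == "__main__":
    print(build_example().query("dog", "CapableOf"))
```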

A*STAR AI Initiative

Human-Centric AI Programme


Speech & Language, Video & Image, Data Analytics, Social Cognitive Computing

Deep Learning / Machine Learning

Good Old‐Fashioned AI (GOFAI)

Human‐Centric AI
Singaporean and Asian Culture

• Knowledge of human needs/motivations, social/cultural norms, and commonsense

• Personalised and Explainable
• Instructable through real‐time instruction and demonstration, or learns from experience with a small number of examples

Human‐Centric AI Research
AI that understands humans, reasons for humans and learns like humans.

Specifically, human‐centric AI that understands Singaporean and Asian cultures.

Human, AI

Explicit instructions

Explanations

Implicit signals (Socio‐cultural behaviors, commonsense, mental state)

Learns like humans

Understands humans

Reasons for humans

Cognitive, Human‐like, Empathetic, Explainable

Machine‐learning

Human‐Centric AI Research


Speech & Language, Video & Image, Data Analytics, Social Cognitive Computing

Deep Learning / Machine Learning

Good Old‐Fashioned AI (GOFAI)

Understanding Language and Expressions

Human‐Centric AI
Singaporean and Asian Culture

Social‐Cultural Visual Intelligence

Towards Understanding Humans in Multi‐modal Content

“Understanding Humans” means
• Being able to extract and create representations of humans from multi‐modal data sources*
• In order to reason about
– Human roles
– Relationships with other Humans and Objects
– Behaviours
– Goals and intentions
– Mental Models

* primarily images and videos, such as these

What is the woman thinking? (Mental Models)

How are these people related? (Roles and Relationships)

What are their intentions? (Goals and Intentions)

What could happen next? (Behaviours)

From Perception to Cognition: Still a long journey...

• Knowledge – the next frontier
• Understanding Humans
– Goals, Intentions
– Motivations
– Mental models
– etc...
• “Explainability”

Thank you

Contact us

Kenneth Kwok
Programme Manager, A*STAR Artificial Intelligence Initiative
[email protected]

Cheston Tan
Programme Manager, A*STAR Artificial Intelligence Initiative
[email protected]
