Gesture Recognition & Machine Learning for Real-Time Musical Interaction
Rebecca Fiebrink, Assistant Professor of Computer Science (also Music), Princeton University
Nicholas Gillian, Postdoc in Responsive Environments, MIT Media Lab
Outline
• ~40 min: Machine learning fundamentals
• ~1 hour: Wekinator: Intro & hands-on
• ~1 hour: EyesWeb: Intro & hands-on
• Wrap-up
Models in gesture recognition & mapping

[Diagram: human + sensors produce a sensed action; the computer's model interprets it and generates a response (music, visuals, etc.) back to the human]

• What is the current state (e.g., pose)?
• Was a control motion performed? If so, which? How?
• What sound should result from this state, motion, motion quality, etc.?
Supervised learning

[Diagram: Training: labeled examples (“Gesture 1”, “Gesture 2”, “Gesture 3”) feed an algorithm that produces a model. Running: new inputs feed the model, which outputs a label, e.g., “Gesture 1”]
Why use supervised learning?
• Models capture complex relationships from the data. (feasible)
• Models can generalize to new inputs. (accurate)
• Supervised learning circumvents the need to explicitly define mapping functions or models. (efficient)
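As a minimal sketch of this train-then-run workflow: the feature values and gesture labels below are made up, and 1-nearest-neighbor stands in for whichever learner you actually choose.

```python
import math

# Hypothetical training data: (feature vector, gesture label); values are made up.
training_data = [
    ((0.1, 0.5, 0.6), "Gesture1"),
    ((0.2, 0.4, 0.1), "Gesture1"),
    ((0.9, 0.9, 0.1), "Gesture2"),
]

def train(examples):
    # "Training" a nearest-neighbor model just stores the labeled examples.
    return list(examples)

def predict(model, x):
    # Running: label a new input with its nearest training example's label.
    nearest = min(model, key=lambda ex: math.dist(ex[0], x))
    return nearest[1]

model = train(training_data)
print(predict(model, (0.15, 0.45, 0.3)))  # prints Gesture1
```

Note how the model generalizes: the query point never appears in the training data, yet it gets a sensible label from its neighbors.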
Features

• Each data point is represented as a feature vector

Example #  Red(pixel1)  Green(pixel1)  Blue(pixel1)  …  Label
1          84           120            34            …  Gesture1
2          43           25             85            …  Gesture1
3          12           128            4             …  Gesture2
Features

• Good features can make a problem easier to learn!

Example #  X(r_hand)  Y(r_hand)  Depth(r_hand)  Label
1          0.1        0.5        0.6            Gesture1
2          0.2        0.4        0.1            Gesture1
3          0.9        0.9        0.1            Gesture2
Classification

This model: a separating line or hyperplane (decision boundary)

[Plot: labeled points in a 2-D feature space (feature1 vs. feature2), split by a decision boundary]
Unsupervised learning

• Training set includes examples, but no labels
• Example: Infer clusters from data:

[Plot: unlabeled points in a 2-D feature space (feature1 vs. feature2), grouped into clusters]
Temporal modeling
• Examples and inputs are sequential data points in time
• Model used for following, identification, recognition
Image: Bevilacqua et al., NIME 2007
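Dynamic time warping is one classic way to compare such sequences: it finds the alignment between two time series that minimizes total cost, so the same gesture performed faster or slower still matches. A minimal sketch, using invented 1-D sequences rather than real sensor data:

```python
def dtw_distance(a, b):
    # Dynamic time warping distance between two 1-D sequences.
    # d[i][j] = min cost of aligning a[:i] with b[:j].
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a stretched
                                 d[i][j - 1],      # b stretched
                                 d[i - 1][j - 1])  # step together
    return d[n][m]

template = [0.0, 0.5, 1.0, 0.5, 0.0]             # stored gesture template
performed = [0.0, 0.0, 0.5, 1.0, 1.0, 0.5, 0.0]  # same shape, slower timing
other = [1.0, 1.0, 1.0, 1.0, 1.0]                # a different motion
print(dtw_distance(template, performed))  # 0.0: timing differences absorbed
print(dtw_distance(template, other))      # 3.0: genuinely different shape
```

Comparing an incoming window against each stored template and taking the closest one gives a simple timing-invariant gesture recognizer.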
The learning problem

• Goal: Build the best* model given the training data
  – *Definition of “best” depends on context, assumptions…
Which classifier is best?

[Figure from Andrew Ng: decision boundaries ranging from “underfit” to “overfit”]

Competing goals:
• Accurately model the training data
• Accurately classify unseen data points
Another simple classifier: Decision tree
Images: http://ai.cs.umbc.edu/~oates/classes/2009/ML/homework1.html, http://nghiaho.com/?p=1300
AdaBoost: Iteratively train a “weak” learner
Image from http://www.cc.gatech.edu/~kihwan23/imageCV/Final2005/FinalProject_KH.htm
Support vector machine
• Re-map the input space into a higher-dimensional space and find a separating hyperplane there
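To make the re-mapping idea concrete, here is a toy illustration (not an SVM itself, and the points are made up): a 1-D dataset that no single threshold can separate becomes linearly separable after lifting each point into 2-D via (x, x²).

```python
# Class A sits between the class B points on the 1-D number line,
# so no single threshold x = t separates the two classes.
class_a = [-1.0, 0.0, 1.0]
class_b = [-3.0, 3.0]

def lift(x):
    # Map a 1-D input into 2-D: (x, x^2).
    return (x, x * x)

# After lifting, the horizontal line x2 = 4 separates the classes:
# every class A point has x^2 <= 1, every class B point has x^2 = 9.
separable = (all(lift(x)[1] < 4 for x in class_a)
             and all(lift(x)[1] > 4 for x in class_b))
print(separable)  # True
```

A kernel SVM does this implicitly: the kernel function computes inner products in the lifted space without ever constructing the lifted coordinates.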
Choosing a classifier: Practical considerations

• k-Nearest Neighbor
  + Can tune k to adjust smoothness of decision boundaries
  – Sensitive to noisy, redundant, irrelevant features; prone to overfitting; weird in high dimensions
• Decision tree
  + Can prune to reduce overfitting; produces a human-understandable model
  – Can still overfit
• AdaBoost
  + Theoretical benefits; less prone to overfitting
  + Can tune by changing base learner, number of training rounds
• Support vector machine
  + Theoretical benefits similar to AdaBoost
  – Many parameters to tune; training can take a long time
How to evaluate which classifier is better?

• Compute a quality metric
  – Metrics on the training set (e.g., accuracy, RMS error)
  – Metrics on a test set
  – Cross-validation
• Use it

Image from http://blog.weisu.org/2011/05/cross-validation.html
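A minimal sketch of k-fold cross-validation, using a hand-rolled 1-nearest-neighbor classifier and made-up gesture data: each fold is held out in turn, the model is trained on the rest, and accuracy is averaged over all held-out points.

```python
import math

def knn_predict(train, x):
    # 1-nearest-neighbor prediction over (feature vector, label) pairs.
    return min(train, key=lambda ex: math.dist(ex[0], x))[1]

def cross_validate(data, k=3):
    # k-fold cross-validation: hold out each fold in turn, return mean accuracy.
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, fold in enumerate(folds):
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        correct += sum(knn_predict(train, x) == label for x, label in fold)
    return correct / len(data)

# Made-up, well-separated gesture examples.
data = [
    ((0.1, 0.5), "Gesture1"), ((0.2, 0.4), "Gesture1"), ((0.15, 0.45), "Gesture1"),
    ((0.9, 0.9), "Gesture2"), ((0.8, 0.95), "Gesture2"), ((0.85, 0.9), "Gesture2"),
]
print(cross_validate(data, k=3))  # 1.0 on this easy, separable data
```

Because every point is scored while held out of training, this estimates performance on unseen inputs rather than how well the model memorized the training set.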
Which learning method should you use?

• Classification (e.g., kNN, AdaBoost, SVM, decision tree):
  – Apply 1 of N labels to a static pose or state
  – Label a dynamic gesture, when segmentation & normalization are trivial
    • E.g., feature vector is a fixed-length window in time
• Regression (e.g., with neural networks):
  – Produce a real-valued output (or vector of real-valued outputs) for each feature vector
• Dynamic time warping, HMMs, other temporal models:
  – Identify when a gesture has occurred, identify probable location within a gesture, possibly also apply a label
  – Necessary when segmentation is non-trivial or online following is needed
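To illustrate the regression case: the point is a continuous mapping from features to control values rather than a discrete label. In this sketch, ordinary least-squares line fitting stands in for the neural networks named above, and the hand-height/volume values are invented.

```python
def fit_line(xs, ys):
    # Least-squares fit of y = a*x + b (a minimal stand-in for
    # neural-network regression).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x

# Made-up examples: hand-height feature -> desired synth volume.
heights = [0.0, 0.25, 0.5, 0.75, 1.0]
volumes = [0.0, 0.2, 0.4, 0.6, 0.8]
a, b = fit_line(heights, volumes)

def predict(x):
    return a * x + b

print(predict(0.6))  # a continuous output (about 0.48) for an unseen input
```

Unlike the classifier, this produces smoothly varying outputs between the training examples, which is exactly what continuous sound-control mappings need.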
Suggested ML reading
• Bishop, 2006: Pattern Recognition & Machine Learning. Science and Business Media, Springer
• Duda, 2001: Pattern Classification, Wiley-Interscience
• Witten, 2005: Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann
Suggested NIME-y reading
• Lee, Freed, & Wessel, 1992. Neural networks for simultaneous classification and parameter estimation in musical instrument control. Adaptive and Learning Systems, 1706:244–55. (early example of ML in music)
• Hunt, A. and Wanderley, M. M. 2002. Mapping performer parameters to synthesis engines. Organised Sound 7, 2, 97–108. (learning as a tool for generative mapping creation)
• Chapter 2 of Rebecca’s dissertation: http://www.cs.princeton.edu/~fiebrink/thesis/ (historical/topic overview)
• Recent publications by F. Bevilacqua & team @ IRCAM (HMMs, gesture follower)
The Wekinator: Running in real time

[Diagram: over time, feature extractor(s) stream feature vectors (e.g., .01, .59, .03, …) via OSC to the model(s), which send parameter vectors (e.g., 5, .01, 22.7, …) via OSC to a parameterizable process]

Inputs: from built-in feature extractors or OSC. Outputs: control a ChucK patch or go elsewhere using OSC.
Brief intro to OSC

• Messages are sent to a host (e.g., localhost) and port (e.g., 6448)
  – Listener must listen on the same port
• A message contains a message string (e.g., “/myOscMessage”) and optionally some data
  – Data can be int, float, string types
  – Listener code may listen for specific message strings & data formats
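Libraries such as python-osc handle this encoding for you; to make the wire format concrete, here is a hand-rolled sketch following the OSC 1.0 encoding rules (strings null-terminated and padded to 4-byte boundaries, a type-tag string such as “,if”, then big-endian 32-bit arguments).

```python
import struct

def osc_string(s):
    # OSC strings are null-terminated and padded to a 4-byte boundary.
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)

def osc_message(address, *args):
    # Encode an OSC message: address, type-tag string, then big-endian args.
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            tags += "s"
            payload += osc_string(arg)
        else:
            raise TypeError(f"unsupported OSC type: {type(arg)}")
    return osc_string(address) + osc_string(tags) + payload

msg = osc_message("/myOscMessage", 1, 0.5)
# The message travels as a single UDP datagram, e.g.:
#   import socket
#   socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(msg, ("localhost", 6448))
```

The listener on port 6448 parses the same layout in reverse, which is why both sides must agree on the message string and argument types.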
Wekinator: Under the hood

[Diagram: inputs such as joystick_x, joystick_y, and webcam_1 become Feature1, Feature2, Feature3 … FeatureN; each model Model1 … ModelM maps the features to one output, Parameter1 … ParameterM (example parameters: pitch, volume; example outputs: 3.3098, Class24)]
Under the hood

[Same diagram: Feature1 … FeatureN feed Model1 … ModelM, which output Parameter1 … ParameterM]

Learning algorithms:
• Classification: AdaBoost.M1, J48 decision tree, support vector machine, k-nearest neighbor
• Regression: multilayer perceptron NNs
Interactive ML with Wekinator

[Diagram, developed in stages across several slides: Training: training data (“Gesture 1”, “Gesture 2”, “Gesture 3”) feeds an algorithm that produces a model; Running: inputs feed the model, which outputs a label such as “Gesture 1”]

Creating training data, evaluating the trained model, then modifying the training data (and repeating): interactive machine learning.