Learning Musicianship for Automatic Accompaniment
Gus (Guangyu) Xia and Roger Dannenberg
School of Computer Science, Carnegie Mellon University


Page 1: Learning Musicianship for Automatic Accompaniment

Carnegie Mellon

Learning Musicianship for Automatic Accompaniment

Gus (Guangyu) Xia Roger Dannenberg School of Computer Science Carnegie Mellon University

Page 2: Learning Musicianship for Automatic Accompaniment

Introduction: Musical background

• Interaction
• Expression
• Rehearsal

Page 3: Learning Musicianship for Automatic Accompaniment

Introduction: Technical background

[Diagram: interaction (score following and automatic accompaniment) combined with musicianship (expressive performance) yields expressive interactive performance.]

Page 4: Learning Musicianship for Automatic Accompaniment

Introduction: Problem definition

• How to interpret the music based on the expression of human musicians?
• How to distill models from rehearsals?
• What are the limits of validity of the learned models?
• How many rehearsals are needed?

For interactive music performance, how can we build artificial performers that automatically improve their ability to sense and coordinate with human musicians' expression through rehearsal experience?

We start from piano duets, focusing on expressive timing and expressive dynamics.

Page 5: Learning Musicianship for Automatic Accompaniment

Outline

• Introduction
• Data Collection
• Methods
• Demos
• Conclusion & Future Work

Page 6: Learning Musicianship for Automatic Accompaniment

Current Data Collection

• Musicians:
  – 10 music master's students play duet pieces in 5 pairs.
• Music pieces:
  – 3 pieces are selected: Danny Boy, Serenade (by Schubert), and Ashokan Farewell.
  – Each pair performs every piece 7 times.
• Recording settings:
  – Recorded on electronic pianos with MIDI output.


Page 7: Learning Musicianship for Automatic Accompaniment

Outline

• Introduction
• Data Collection
• Methods
• Demos
• Conclusion & Future Work

Page 8: Learning Musicianship for Automatic Accompaniment


Method Overview

• From "local" to "general":
  – Local: low-dimensional feature space; applies only to certain notes.
  – General: high-dimensional feature space; applies to the whole piece of music.

• Baseline:

[Diagram: score following maps real (performance) time onto reference (score) time; automatic accompaniment then maps the next note's reference time to its predicted performance time.]
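As a sketch, the baseline's prediction step can be read as local tempo extrapolation: fit a line through the last few (reference time, performance time) pairs delivered by score following and evaluate it at the next note's reference time. This is an assumed, minimal reading of the baseline; `predict_next_onset` and its window size are illustrative, not the authors' implementation.

```python
import numpy as np

def predict_next_onset(ref_times, perf_times, next_ref_time, window=4):
    """Estimate the local tempo from the last few matched
    (reference time, performance time) onset pairs, then extrapolate
    to the next note's reference time."""
    r = np.asarray(ref_times[-window:], dtype=float)
    p = np.asarray(perf_times[-window:], dtype=float)
    slope, intercept = np.polyfit(r, p, 1)  # performance ≈ slope·reference + intercept
    return slope * next_ref_time + intercept

# A steady performance at half the score tempo:
ref = [0.0, 1.0, 2.0, 3.0]
perf = [0.0, 2.0, 4.0, 6.0]
next_onset = predict_next_onset(ref, perf, 4.0)  # ≈ 8.0 s
```

Such a tempo-only rule ignores the note-to-note expressive deviations that the learned models below try to capture.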

Page 9: Learning Musicianship for Automatic Accompaniment

Method (1): Note-specific approach

• Idea:
  – Expressive timings of the notes are linearly correlated.
  – Predict the expressive timing of the 2nd piano from the expressive timing of the 1st piano.


• Model:

  x = [x_1, x_2, …, x_n]^T   (1st piano's expressive timing)
  y = [y_1, y_2, …, y_n]^T   (2nd piano's expressive timing)

  y_i = a_i + b_i · x_{near(i)} + ε_i,   i = 1, …, n

where near(i) indexes the 1st-piano note nearest to note i in the score.
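A minimal sketch of this note-specific regression, assuming several rehearsals of the same piece; the array layout and the near(i) pairing (here precomputed into the columns of `X`) are illustrative assumptions, not the authors' data format.

```python
import numpy as np

def fit_note_specific(X, Y):
    """X[k, i]: 1st piano's expressive timing near note i in rehearsal k.
    Y[k, i]: 2nd piano's expressive timing of note i in rehearsal k.
    Fit a separate line y_i = a_i + b_i * x for every note i."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    params = []
    for i in range(X.shape[1]):
        b_i, a_i = np.polyfit(X[:, i], Y[:, i], 1)  # slope, intercept
        params.append((a_i, b_i))
    return params

def predict_note_specific(params, x_row):
    """Predict the 2nd piano's timing for one new performance."""
    return np.array([a + b * x for (a, b), x in zip(params, x_row)])

# Three rehearsals, two notes; the 2nd piano tracks the 1st linearly.
X = np.array([[0.10, 0.20], [0.30, 0.50], [0.20, 0.40]])
Y = 0.05 + 2.0 * X
params = fit_note_specific(X, Y)
pred = predict_note_specific(params, [0.25, 0.30])  # ≈ [0.55, 0.65]
```

Because every note carries its own (a_i, b_i), this model needs many rehearsals per piece, which motivates the parameter sharing in the next approaches.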

Page 10: Learning Musicianship for Automatic Accompaniment

Result: Note-specific approach

[Plot: time residual (sec) vs. score time (sec) for BL, Note-8, and Note-34.]

Mean Absolute Error: BL: 0.098, Note-8: 0.087, Note-34: 0.060

Page 11: Learning Musicianship for Automatic Accompaniment

Method (2): Rhythm-specific approach

• Idea:
  – Notes with the same score-rhythm context share parameters.
  – Introduce an extra dummy variable to encode the score-rhythm context of each note.


• Model:

  x = [x_1, x_2, …, x_n]^T,   y = [y_1, y_2, …, y_n]^T

  y_i = a_{rhy(i)} + b_{rhy(i)} · x_{near(i)} + ε_i,   i = 1, …, n

where rhy(i) is the score-rhythm class of note i, so all notes sharing a rhythm context share one (a, b).
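A sketch of the parameter tying, under the same assumed array layout as the note-specific sketch; `rhythm_class` (one label per note) is a hypothetical encoding of the score-rhythm context.

```python
import numpy as np

def fit_rhythm_specific(X, Y, rhythm_class):
    """Notes sharing a score-rhythm class share one (a, b); pooling their
    (x, y) pairs across rehearsals gives each class more training data
    than any single note has on its own."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    params = {}
    for c in set(rhythm_class):
        cols = [i for i, ci in enumerate(rhythm_class) if ci == c]
        b, a = np.polyfit(X[:, cols].ravel(), Y[:, cols].ravel(), 1)
        params[c] = (a, b)
    return params

# Four notes in two rhythm classes; each class has its own linear response.
rhythm_class = ["eighth", "quarter", "eighth", "quarter"]
X = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.2, 0.1, 0.4, 0.3]])
Y = X.copy()
Y[:, [0, 2]] = 0.02 + 1.5 * X[:, [0, 2]]   # "eighth" notes
Y[:, [1, 3]] = 0.05 + 0.8 * X[:, [1, 3]]   # "quarter" notes
params = fit_rhythm_specific(X, Y, rhythm_class)
```

With far fewer parameters than one pair per note, the model can be trained from fewer rehearsals and applied to any note whose rhythm context was seen in training.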

Page 12: Learning Musicianship for Automatic Accompaniment

Result: Rhythm-specific approach

Mean Absolute Error: BL: 0.098, Rhythm-4: 0.084, Rhythm-8: 0.067

[Plot: time residual (sec) vs. score time (sec) for BL, Rhythm-4, and Rhythm-8.]

Page 13: Learning Musicianship for Automatic Accompaniment

Method (3): General feature approach

• Idea:
  – Make the model more general.
  – Predict the expressive timing from more than the score-rhythm context.


• Model:

  y = w^T f

  w = [w_1, w_2, …, w_m]^T   (weights)
  f = [f_1, f_2, …, f_m]^T   (features)

Page 14: Learning Musicianship for Automatic Accompaniment

Regularization: Group Lasso

• Idea:
  – Reduce the training burden.
  – Discover the dominant features that predict the expressive timings.


• Solve:

  min_{w ∈ R^m}  ‖y − Fw‖_2^2 + λ Σ_g ‖w_g‖_2

where F stacks the feature vectors row-wise and each w_g is one group of weights.
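This objective can be minimized with proximal gradient descent, where the proximal operator of λ Σ_g ‖w_g‖_2 is a per-group soft-threshold that zeroes whole groups at once. The ISTA-style solver below is an assumed illustration, not necessarily the solver the authors used.

```python
import numpy as np

def group_lasso(F, y, groups, lam, iters=500):
    """Minimize ||y - F w||_2^2 + lam * sum_g ||w_g||_2 by proximal
    gradient descent; entire groups of weights are driven exactly to zero."""
    step = 1.0 / (2 * np.linalg.norm(F, 2) ** 2)  # 1/L for the smooth part
    w = np.zeros(F.shape[1])
    for _ in range(iters):
        w = w - step * 2 * F.T @ (F @ w - y)      # gradient step on ||y - Fw||^2
        for g in groups:                          # group soft-threshold (prox)
            norm = np.linalg.norm(w[g])
            w[g] = 0.0 if norm <= step * lam else w[g] * (1 - step * lam / norm)
    return w

# Only the first feature group actually drives y; its weights survive
# (slightly shrunk), and the irrelevant group is zeroed out entirely.
F = np.vstack([np.eye(4)] * 5)                    # 20 samples, 4 features
w_true = np.array([1.0, 1.0, 0.0, 0.0])
y = F @ w_true
w = group_lasso(F, y, groups=[[0, 1], [2, 3]], lam=0.5)
```

Zeroing whole groups is what makes the learned model interpretable: the surviving groups are the dominant features mentioned above.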

Page 15: Learning Musicianship for Automatic Accompaniment

Result: General feature approach

Mean Absolute Error (only 4 training pieces!): BL: 0.098, LR: 0.072, Glasso: 0.059

[Plot: time residual (sec) vs. score time (sec) for BL, LR, and Glasso.]

Page 16: Learning Musicianship for Automatic Accompaniment

Method (4): LDS approach

• Idea:
  – Add another regularization across adjacent notes.
  – Lower-dimensional hidden mental states control the expressive timings.


• Model:

[Graphical model: hidden states z_{t−1}, z_t, z_{t+1} form a chain, driven by inputs u_{t−1}, u_t, u_{t+1} and emitting observations y_{t−1}, y_t, y_{t+1}.]

  z_t = A z_{t−1} + B u_t + w_t,   w_t ~ N(0, Q)
  y_t = C z_t + D u_t + v_t,   v_t ~ N(0, R)
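The generative model can be sketched by rolling the system forward with the noise terms omitted; the matrix shapes and the reading of u_t as the 1st piano's timing features are assumptions for illustration.

```python
import numpy as np

def simulate_lds(A, B, C, D, u_seq, z0):
    """Roll the linear dynamical system forward without noise:
    z_t = A z_{t-1} + B u_t  (hidden 'mental state' update),
    y_t = C z_t + D u_t      (observed expressive timing)."""
    z = np.asarray(z0, float)
    ys = []
    for u in np.asarray(u_seq, float):
        z = A @ z + B @ u
        ys.append(C @ z + D @ u)
    return np.array(ys)

# One-dimensional example: the state remembers half its previous value,
# so the output smooths the input over adjacent time steps.
A, B = np.array([[0.5]]), np.array([[1.0]])
C, D = np.array([[1.0]]), np.array([[0.0]])
ys = simulate_lds(A, B, C, D, u_seq=[[1.0], [1.0]], z0=[0.0])  # ≈ [1.0, 1.5]
```

The chain through z_t is the "horizontal" regularization: each prediction is tied to its neighbors rather than made independently per note.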

Page 17: Learning Musicianship for Automatic Accompaniment

Result: LDS (horizontal regularization)

Mean Absolute Error (only 4 training pieces!): BL: 0.085, LR: 0.072, LDS: 0.067

[Plot: time residual (sec) vs. score time (sec) for BL, LR, and LDS.]

Page 18: Learning Musicianship for Automatic Accompaniment

A Global View

[Bar chart: mean time residual (sec) for each method and training size — BL, Note-4/8/34, Rhythm-4/8/34, Glasso-4/8/34 — on Serenade, Danny Boy, and Ashokan Farewell.]

Page 19: Learning Musicianship for Automatic Accompaniment

Outline

• Introduction
• Data Collection
• Methods
• Demos
• Conclusion & Future Work

Page 20: Learning Musicianship for Automatic Accompaniment


Some Initial Audio Demos

• Baseline
• Note-specific approach: 34 training examples
• General feature approach (group lasso): 4 training examples


Page 21: Learning Musicianship for Automatic Accompaniment


Future Work

• Cross-piece models
• Performer-specific models
• Online learning and decoding
• Integration with music robots

Page 22: Learning Musicianship for Automatic Accompaniment


Conclusion

• An artificial performer for interactive performance
• Learns musicianship from rehearsal experience
• A combination of expressive performance and automatic accompaniment
• Much better prediction based on only 4 rehearsals

Page 23: Learning Musicianship for Automatic Accompaniment


Q&A

Page 24: Learning Musicianship for Automatic Accompaniment

Spectral Learning (1): Oblique Projections

[Graphical model: hidden states z_{t−1}, z_t, z_{t+1}; inputs u_{t−1}, u_t, u_{t+1}; observations y_{t−1}, y_t, y_{t+1}.]

  Y_t ≝ [y_t^T  y_{t+1}^T  ⋯]^T   (stacked future observations)

• We don't know the future.
• Partially explain future observations based on the history: take the oblique projection Ξ_t of Y_t onto the past inputs and observations.

Page 25: Learning Musicianship for Automatic Accompaniment

Spectral Learning (2): State Estimation

[Graphical model: hidden states z; inputs u; observations y, as on the previous slide.]

  Ξ_t = Γ z_t,   Γ ≝ [C; CA; CA²; ⋮]   (extended observability matrix)

  Ξ = U Σ V^T = (U Σ^{1/2})(Σ^{1/2} V^T)

• State estimation by SVD.
• Moreover, enforce a bottleneck by throwing out near-zero singular values and the corresponding columns of U and V.
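The SVD step can be sketched directly; `Xi` stands in for the projection matrix built on the previous slide, and the rank truncation is the bottleneck described above.

```python
import numpy as np

def estimate_states(Xi, rank):
    """Factor Xi ≈ Gamma @ Z with a rank-truncated SVD: split
    Xi = U Sigma V^T into Gamma = U Sigma^{1/2} (extended observability
    matrix) and Z = Sigma^{1/2} V^T (hidden-state estimates), discarding
    near-zero singular values and their columns of U and V."""
    U, s, Vt = np.linalg.svd(Xi, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    Gamma = U * np.sqrt(s)           # right-multiply by diag(sqrt(s))
    Z = np.sqrt(s)[:, None] * Vt     # left-multiply by diag(sqrt(s))
    return Gamma, Z

# A rank-1 matrix factors exactly with rank=1.
Xi = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])
Gamma, Z = estimate_states(Xi, rank=1)
```

The chosen rank is the dimension of the hidden mental state: the smaller it is, the stronger the bottleneck.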

Page 26: Learning Musicianship for Automatic Accompaniment

Spectral Learning (3): Parameter Estimation

• Based on the estimated hidden states, the parameters can be estimated from the following equations:

[Graphical model: hidden states z; inputs u; observations y, as before.]

  z_{t+1} = A z_t + B u_t + w_t,   w_t ~ N(0, Q)
  y_t = C z_t + D u_t + v_t,   v_t ~ N(0, R)

  [z_{t+1}; y_t] = [A B; C D] [z_t; u_t] + [w_t; v_t]
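With estimated states in hand, the stacked equation is an ordinary least-squares problem. A sketch under assumed shapes (states Z of size d×T, inputs U and outputs Y with one column per time step); noise covariances are omitted:

```python
import numpy as np

def estimate_params(Z, U, Y):
    """Solve [z_{t+1}; y_t] = [A B; C D] [z_t; u_t] in the least-squares
    sense, then carve the block matrix back into A, B, C, D."""
    top = np.vstack([Z[:, 1:], Y[:, :-1]])      # [z_{t+1}; y_t]
    bottom = np.vstack([Z[:, :-1], U[:, :-1]])  # [z_t; u_t]
    M = np.linalg.lstsq(bottom.T, top.T, rcond=None)[0].T
    d = Z.shape[0]
    return M[:d, :d], M[:d, d:], M[d:, :d], M[d:, d:]  # A, B, C, D

# Noiseless data generated from known matrices is recovered exactly.
A0, B0 = np.array([[0.5]]), np.array([[1.0]])
C0, D0 = np.array([[2.0]]), np.array([[0.5]])
U = np.array([[1.0, -1.0, 0.5, 2.0, -0.3, 0.7]])
Z = np.zeros((1, 6)); Z[0, 0] = 0.3
for t in range(5):
    Z[:, t + 1] = A0 @ Z[:, t] + B0 @ U[:, t]
Y = C0 @ Z + D0 @ U
A, B, C, D = estimate_params(Z, U, Y)
```

Once A, B, C, D are known, the LDS from Method (4) can predict the accompanist's next expressive timing online from the soloist's input features.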