Non-linear speech processing: overview of COST-277 current research
Nonlinear speech processing (NOLISP)
Overview of COST-277 current research
Marcos Faúndez-Zanuy
COST-277 Chairman
OUTLINE
1. Overview: what does “nonlinear” mean?
2. Organization of COST-277
3. Activity report, June ’01 – June ’03
What does “non-linear” mean? (Strict sense)
A system f is non-linear when the superposition principle does not hold. Superposition requires that, given f(x1) = y1 and f(x2) = y2:
f(a·x1) = a·y1 and f(x1 + x2) = y1 + y2
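The strict-sense definition can be checked numerically. The sketch below is only an illustration: the function names, the two example systems (a two-tap moving average and a hard clipper) and the scale factor are assumptions chosen for the demonstration, not systems taken from the slides.

```python
import numpy as np

def obeys_superposition(f, x1, x2, a=2.5, tol=1e-9):
    """Numerically test f(a*x1) == a*f(x1) and f(x1 + x2) == f(x1) + f(x2)."""
    homogeneous = np.allclose(f(a * x1), a * f(x1), atol=tol)
    additive = np.allclose(f(x1 + x2), f(x1) + f(x2), atol=tol)
    return homogeneous and additive

def moving_average(x):
    # Linear FIR filter (hypothetical example): y[n] = (x[n] + x[n-1]) / 2
    return np.convolve(x, [0.5, 0.5])

def hard_clip(x):
    # Saturating amplifier (hypothetical example): a simple nonlinear system
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(0)
x1 = rng.normal(size=64)
x2 = rng.normal(size=64)

print("moving average obeys superposition:", obeys_superposition(moving_average, x1, x2))
print("hard clipper obeys superposition:", obeys_superposition(hard_clip, x1, x2))
```

A passing check on a few random inputs does not prove linearity, but a single failing pair is enough to show that a system is nonlinear.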
What does “non-linear” mean? Strict sense: almost “everything” is nonlinear
Acquisition: quantizer (linear, A-law, etc.)
Parameterization: cepstrum
Models: HMM, VQ
[Figure: input-output characteristic of a uniform 3-bit quantizer; input axis from -8 to 8, output levels from -4 to 3]
$\mathrm{cepstrum}(x[n]) = F^{-1}\{\log(F\{x[n]\})\}$
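Both blocks named on this slide fail the superposition test, which a few lines of code can show. This is a minimal sketch under stated assumptions: the quantizer uses an input step of 2 and integer output levels from -4 to 3, as read from the figure axes, and the cepstrum is taken as the real cepstrum (logarithm of the spectral magnitude), which the slide does not specify. The function names and the `eps` guard are illustrative.

```python
import numpy as np

def uniform_quantizer(x, n_bits=3, step=2.0):
    """Uniform quantizer: integer levels from -2**(n_bits-1) to 2**(n_bits-1) - 1."""
    levels = np.floor(np.asarray(x, dtype=float) / step)
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return np.clip(levels, lo, hi).astype(int)

def real_cepstrum(x, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum (eps avoids log(0))."""
    spectrum = np.fft.fft(x)
    return np.fft.ifft(np.log(np.abs(spectrum) + eps)).real

# Additivity fails for the quantizer: q(1) + q(1) differs from q(1 + 1).
print(uniform_quantizer(1.0) + uniform_quantizer(1.0), uniform_quantizer(2.0))

# Homogeneity fails for the cepstrum: doubling the input only shifts c[0] by log(2).
rng = np.random.default_rng(0)
frame = rng.normal(size=64)
c = real_cepstrum(frame)
c_scaled = real_cepstrum(2.0 * frame)
print(c_scaled[0] - c[0], np.log(2.0))
```

The cepstrum turns a gain change into an additive offset on a single coefficient, which is the homomorphic behaviour that makes it useful and also what makes it nonlinear in the strict sense.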
Non-linearities are always present
Nonlinearities of the systems that generate the signal and/or noise
Nonlinearities of the signal acquisition system
Nonlinearities of the transmission channel
Nonlinearities of the human perception mechanism
Classical approach
Wide sense: linear speech processing
The speech signal model consists of a pulse/noise source and a linear filter, both of which change their characteristics on a frame-by-frame basis.
This approach neglects structure known to be present in the speech signal.
Evidence of nonlinearities
Residue comparison
Correlation dimension
Higher order statistics
Probability density functions
Example: Linear vs NL
Drawbacks with NOLISP approaches
Lack of a unifying theory for the different nonlinear processing tools (neural nets, homomorphic, polynomial, morphological, order statistics filters, and so on)
High computational burden
Well-known analysis tools are not applicable
Usually, a closed-form formulation does not exist, and iterative methods (with local minima problems) must be used.
What are we mainly looking for?
The replacement of the linear filter (or parts thereof) with nonlinear operators (models) should enable us to obtain an accurate description of the speech signal with a lower number of parameters. This in turn should lead to better performance of practical speech processing applications.
2. Organization of COST-277
What is COST?
Intergovernmental Cooperation
– Created in 1971
– 17 Scientific and Technical Domains
Participation
– 33 COST Countries
– European Commission
– International Organisations
– Organizations from Non-COST Countries on Mutual Benefit Basis
COST Actions
– Concerted Actions of Nationally Funded R&D
COST TIST: Telecommunications, Information Science and Technologies
COST Countries
The fifteen EU Member States
The EFTA Member States
Iceland
Norway
Switzerland
Central and Eastern countries
Estonia
Latvia
Lithuania
Poland
Czech Republic
Slovakia
Slovenia
Croatia
Romania
Bulgaria
Other countries
Cyprus
Malta
Turkey
Hungary
Evolution of COST Actions
[Figure: total and starting COST Actions per year from 1980 to 2000; vertical axis from 0 to 250]
WHAT IS A COST ACTION?
Concerted Action
Pan-European
“Non-competitive” research
R&D financed nationally
Flexibility
Bottom-up
A la carte participation
Commission funds only coordination activities
COST Senior Officials (CSO)
Responsible for the overall strategy of COST
Decides on the launching of each individual COST Action
Approves participation of institutes from non-COST countries
Approves prolongation of COST Actions
COST Technical Committee (TC)
Selection of new COST Actions
Monitoring of ongoing COST Actions
Evaluation of completed COST Actions
Dissemination and Valorisation of COST activities
Provide Advice to EC on Budget Planning
Management Committee (MC)
Supervises and coordinates the implementation of the Action
Composed of:
– A maximum of two representatives of each signatory country; they ensure the scientific coordination at national level
– One representative of any non-COST institution admitted to participate
– The Scientific Secretary
– Representatives of the Commission services
Each signatory has one vote
Working Group (WG)
Small number of researchers per working group
Working group members may be:
– Management Committee members
– Other scientists from the signatory countries
COST TIST
~28 Actions, ~2000 Organisations
Covering Basic Research on
– Antennas and Radio Propagation
– Satellite Technologies and Services
– Mobile Technologies and Services
– Optical Networking Components and Services
– Internet & Multimedia Network Services
– Speech Technologies
– Information and Computer Science
Strong Relationship with IST Program
Evolution of COST TIST Actions
[Figure: total and starting COST TIST Actions for 1996, 1998 and 2000; vertical axis from 0 to 30]
COST TIST Research Domains & Actions
Special Needs & User Requirements: COST 219bis, 269
Antennas/Radio Propagation: COST 244bis, 255, 260, 261, 271
Mobile & Personal Comm.: COST 259, 273
Satellite Tech. & Services: COST 272
Optical Networking: COST 265, 266, 267, 268, 270
New Internet & Multimedia Services: COST 211 Quad, 256, 257, 263, 264, 269, 275, 279
Speech Technologies: COST 258, 277, 278
Information & Computer Science: COST 274, 276
Other COST Actions in Speech Technologies
COST 275: Biometrics-Based Recognition of People over the Internet
– Involves the use of both voice and face recognition for user authentication over the Internet
COST 278: Spoken Language Interaction in Telecommunications
– Improve knowledge regarding issues and problems related to spoken language interaction, including robustness and multi-lingual aspects
– Human-computer interaction using spoken language in multi-modal context, including dialogue theories and application evaluation
Relationship between COST Actions 275, 277 and 278
275: Biometrics-Based Recognition of People over the Internet
277: Non-linear Speech Processing
278: Spoken Language Interaction in Telecommunication
[Figure: diagram relating the three Actions across three layers (Application Fields, Interface Components, Generic Functions). Labelled blocks: Speaker Recognition, Speech Recognition, Natural Language Processing, Multi-Modality & Data Fusion, Speech Analysis & Coding, Image Analysis & Graphics, Speech Synthesis, Dialogue]
GRANT CONTRACTS
COST TIST support is provided through annual Grant Contracts with the coordinating organisation
Contract covers costs for:
– Secretariat (manpower to cover administration)
– Meetings (WG and MC)
– Seminars and workshops
– Short Term Scientific Missions
– Publications
SECRETARIAT
Contract Management, Payments
Reimbursement of Meetings
Rebuilding of WWW site
– Repository of Official Documents
– TC and Action Activities and Events
Enhancing Dissemination
– News Letter
– Central Index and Storage of Reports for Retrieval
Links with EC (IST) and National Programmes
Overview: COST-277
[Figure: diagram linking human speech, synthetic speech, coded speech and written speech through discrete models. Paths: TtS, StT, StC, CtS. Operations: analysis, synthesis, recognition, coding. © ukl 2002]
Organization
Chair: Marcos Faúndez
Vice-Chair: Gernot Kubin
Secretary: Stephen McLaughlin
– WG1: Bastiaan Kleijn
– WG2: Bojan Petek
– WG3: Stephen McLaughlin
– WG4: Gerard Chollet
Countries
Austria, Belgium, Czech Republic, France, Germany, Greece, Ireland, Italy, Lithuania, Portugal, Slovakia, Slovenia, Spain, Sweden, Switzerland, UK
Canada
Dissemination of info
e-mail distribution list:
Subscribe/unsubscribe [email protected]
Website:
http://www.ee.ed.ac.uk/cost277/
Future Meetings of the management committee
Publications and reports
International Journal of Control and Intelligent Systems, special issue on non-linear speech processing techniques and applications, ACTA Press. Invited editor: A. Hussain (COST-277 MC member)
Special sessions at EUSIPCO’02, IWANN’01, IWANN’03, EUSIPCO’04 (TBC)
COST Actions in Speech Technologies
COST 275: Biometrics-Based Recognition of People over the Internet
– Involves the use of both voice and face recognition for user authentication over the Internet
COST 277: Nonlinear speech processing
COST 278: Spoken Language Interaction in Telecommunications
– Improve knowledge regarding issues and problems related to spoken language interaction, including robustness and multi-lingual aspects
– Human-computer interaction using spoken language in multi-modal context, including dialogue theories and application evaluation
COST-277: A different approach
“The four classical areas of speech processing:
Speech Recognition (Speech-to-Text, StT)
Speech Synthesis (Text-to-Speech, TtS and Code-to-Speech, CtS)
Speech Coding (Speech-to-Code, StC with CtS) and
Speaker Verification and Identification (SV)
have all developed their own methodology almost independently from the neighboring areas. This has led to a plurality of tools and methods that are hard to integrate to any small multifunctional speech processing system (a mobile phone performing speaker verification and continuous speech recognition in addition to speech coding should have many separate processes running in parallel).”
Relations between different fields
[Figure: diagram linking human speech, synthetic speech, coded speech and written speech through discrete models. Paths: TtS, StT, StC, CtS. Operations: analysis, synthesis, recognition, coding. © ukl 2002]
COST-277: Non-linear speech processing
PROGRESS REPORT
Period: June 2001 to June 2003
Speech coding
LINEAR PREDICTION
Scalar linear prediction: AR modeling of order P:
$x[n] = \sum_{i=1}^{P} a_i\, x[n-i] + e[n]$
where $a_i$ are the scalar prediction coefficients, obtained with the Levinson-Durbin recursion.
Vectorial linear prediction: AR-vector modeling of order P:
$\vec{x}[n] = \sum_{i=1}^{P} A_i\, \vec{x}[n-i] + \vec{e}[n]$
where $\{A_i\}_{i=1,\dots,P}$ are $m \times m$ matrices.
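For the scalar case, the Levinson-Durbin recursion solves the normal equations for the coefficients a_i from the autocorrelation of a frame. The sketch below follows the sign convention x[n] = Σ a_i x[n-i] + e[n] used above. The helper names, the biased autocorrelation estimate and the toy AR(2) coefficients (1.3 and -0.4) are assumptions made only for the illustration.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a finite frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Return (a, err), where a[i-1] is a_i in x[n] = sum_i a_i x[n-i] + e[n]."""
    a = np.zeros(order)
    err = r[0]
    for m in range(order):
        # Reflection coefficient for the step from order m to order m + 1.
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / err
        previous = a[:m].copy()
        a[:m] = previous - k * previous[::-1]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

# Toy AR(2) signal with hypothetical coefficients, used only as test data.
rng = np.random.default_rng(0)
e = rng.normal(size=4000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 1.3 * x[n - 1] - 0.4 * x[n - 2] + e[n]

r = autocorrelation(x, max_lag=2)
a, err = levinson_durbin(r, order=2)
# Prediction residual e[n] = x[n] - a_1 x[n-1] - a_2 x[n-2]
residual = x[2:] - (a[0] * x[1:-1] + a[1] * x[:-2])
print("estimated coefficients:", a)
print("residual energy from the recursion:", err)
```

The recursion needs on the order of P² operations instead of the P³ of a general linear solver, which is why it is the usual choice for frame-by-frame analysis.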
NL SCALAR PREDICTION WITH NNET
[Figure: neural network with input, hidden and output layers; inputs x[n-p], x[n-p+1], ..., x[n-1]; output x[n]]
NL VECTORIAL PREDICTION WITH NNET
[Figure: neural network with input, hidden and output layers; inputs x[n-p], x[n-p+1], ..., x[n-1]; outputs x[n], x[n+1]]
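The two network layouts differ only in the number of outputs: one output predicts x[n], two outputs predict x[n] and x[n+1] together. The following is a minimal sketch of such a predictor, assuming one tanh hidden layer, a linear output layer and plain full-batch gradient descent on the mean squared error. The class name, layer sizes, learning rate and the sinusoidal test signal are all illustrative choices, not details taken from the slides.

```python
import numpy as np

def make_frames(x, p, n_out=1):
    """Inputs [x[n-p], ..., x[n-1]] and targets [x[n], ..., x[n+n_out-1]]."""
    rows = len(x) - p - n_out + 1
    inputs = np.array([x[i:i + p] for i in range(rows)])
    targets = np.array([x[i + p:i + p + n_out] for i in range(rows)])
    return inputs, targets

class MLPPredictor:
    """One tanh hidden layer and a linear output layer (hypothetical sizes)."""

    def __init__(self, p, n_hidden, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.5, size=(p, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def predict(self, inputs):
        hidden = np.tanh(inputs @ self.w1 + self.b1)
        return hidden @ self.w2 + self.b2

    def train_step(self, inputs, targets, lr=0.05):
        """One full-batch gradient descent step; returns the MSE before the update."""
        hidden = np.tanh(inputs @ self.w1 + self.b1)
        err = hidden @ self.w2 + self.b2 - targets
        grad_out = 2.0 * err / err.size                      # d(MSE)/d(output)
        grad_hidden = (grad_out @ self.w2.T) * (1.0 - hidden ** 2)
        self.w2 -= lr * hidden.T @ grad_out
        self.b2 -= lr * grad_out.sum(axis=0)
        self.w1 -= lr * inputs.T @ grad_hidden
        self.b1 -= lr * grad_hidden.sum(axis=0)
        return float(np.mean(err ** 2))

# Scalar prediction: p = 4 past samples in, x[n] out.
x = np.sin(0.3 * np.arange(400))
inputs, targets = make_frames(x, p=4)
net = MLPPredictor(p=4, n_hidden=8)
mse_before = float(np.mean((net.predict(inputs) - targets) ** 2))
for _ in range(2000):
    net.train_step(inputs, targets)
mse_after = float(np.mean((net.predict(inputs) - targets) ** 2))
print("MSE before and after training:", mse_before, mse_after)

# Vectorial prediction: same inputs, two outputs x[n] and x[n+1].
vec_inputs, vec_targets = make_frames(x, p=4, n_out=2)
vec_net = MLPPredictor(p=4, n_hidden=8, n_out=2)
print("vector prediction shape:", vec_net.predict(vec_inputs).shape)
```

Because training is iterative, the result depends on the initial weights and the learning rate, which is the local-minima caveat raised earlier for nonlinear methods.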
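ADPCM NNET PREDICTION
[Figure: ADPCM block diagram with neural-net predictors. Signals: input x[n], prediction x~[n], difference d[n], code c[n], dequantized difference d~[n], reconstruction x^[n]. Blocks: quantizer Q, inverse quantizer Q^-1, predictors MLP1, MLP2, ..., MLPN with outputs x~1[n], ..., x~N[n], and a combiner (COM.)]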
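The closed loop in this diagram can be sketched in a few lines. The code below is an illustration under several assumptions: the quantizer step is fixed (so this is strictly the DPCM core, without step-size adaptation), the two predictors are simple linear extrapolators standing in for MLP1 ... MLPN, and the combiner is a plain average. None of these choices come from the slides.

```python
import numpy as np

def quantize(d, step, n_bits):
    """Q: map the difference d[n] to an integer code c[n]."""
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return int(np.clip(np.floor(d / step), lo, hi))

def dequantize(c, step):
    """Q^-1: reconstruct the difference at the mid-point of the cell."""
    return (c + 0.5) * step

def combined_predictor(hist):
    """Stand-ins for MLP1 and MLP2 plus an averaging combiner (all hypothetical)."""
    predictions = [hist[-1], 2.0 * hist[-1] - hist[-2]]
    return float(np.mean(predictions))

def adpcm_encode(x, predict, step, n_bits, p):
    hist = np.zeros(p)                  # past reconstructed samples
    codes, recon = [], []
    for sample in x:
        x_tilde = predict(hist)         # prediction from past reconstructions
        c = quantize(sample - x_tilde, step, n_bits)
        x_hat = x_tilde + dequantize(c, step)
        codes.append(c)
        recon.append(x_hat)
        hist = np.append(hist[1:], x_hat)
    return np.array(codes), np.array(recon)

def adpcm_decode(codes, predict, step, p):
    hist = np.zeros(p)
    recon = []
    for c in codes:
        x_hat = predict(hist) + dequantize(c, step)
        recon.append(x_hat)
        hist = np.append(hist[1:], x_hat)
    return np.array(recon)

x = np.sin(0.05 * np.arange(500))
step, n_bits, p = 0.05, 4, 2
codes, recon = adpcm_encode(x, combined_predictor, step, n_bits, p)
decoded = adpcm_decode(codes, combined_predictor, step, p)
print("max reconstruction error:", np.max(np.abs(x - recon)))
```

The predictor works on reconstructed samples rather than on the original signal, so the decoder can run the same predictor from the codes alone and stay in step with the encoder.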
VECTORIAL NL-ADPCM RESULTS
[Figure: SEGSNR versus bits per sample (1 to 4) for the 1D, 2D, 3D, 4D and 5D cases; vertical axis from 6 to 26]
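The slides do not define SEGSNR. A common definition, assumed here, is the average over frames of the per-frame signal-to-noise ratio in dB. The frame length of 160 samples and the `eps` guard are illustrative choices, and some variants additionally clamp or discard very low-energy frames, which this sketch does not.

```python
import numpy as np

def segmental_snr(clean, decoded, frame_len=160, eps=1e-12):
    """Mean over frames of 10*log10(signal energy / error energy), in dB."""
    clean = np.asarray(clean, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    n_frames = len(clean) // frame_len
    snrs = []
    for k in range(n_frames):
        frame = slice(k * frame_len, (k + 1) * frame_len)
        signal_energy = np.sum(clean[frame] ** 2)
        error_energy = np.sum((clean[frame] - decoded[frame]) ** 2)
        snrs.append(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))
    return float(np.mean(snrs))

rng = np.random.default_rng(0)
clean = rng.normal(size=1600)
# A decoder that scales the signal by 0.9 leaves an error of 10% of the amplitude.
print(segmental_snr(clean, 0.9 * clean))
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which is why the measure is preferred over a global SNR for waveform coders.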
Very low bit rate speech coder
Demonstration
Broadcast news audio segmentation, classification, clustering and speech recognition
Demonstration
Available at http://193.126.86.80
SPEAKER RECOGNITION
Current systems rely on low-level information in speech.
– Short time extent analysis windows (20-30 ms)
– Spectral energy based (MFCC)
Another possibility: high-level information
– Speaking rate
– Pitch patterns
– Word/phrase usage
– Idiosyncratic pronunciation
SPEAKER RECOGNITION: Possibilities of NOLISP
Low-level information:
– Non-linear predictive models instead of LPCC
– Parameters: fractal, Lyapunov exponents, correlation dimension, etc.
High-level information:
– To take advantage of the other working groups. For instance, intonation is fundamental in speech synthesis and useful for speaker recognition.
Why use NL models?
Listening to the residual signal of an LPC analysis, it is possible to identify who is speaking.
– Usually the residual signal is discarded.
– NL models offer a better fit and a whiter residual signal.
NL models can offer an improvement in coding and synthesis, so there is room for speaker recognition improvement.
BANDWIDTH EXTENSION: An example of NL processing
A speech signal that has passed through the public switched telephone network (PSTN) generally has a limited frequency range, between 0.3 and 3.4 kHz.
Bandwidth extension algorithms aim at recovering the lost low (0 – 0.3 kHz) and/or high (3.4 – 8 kHz) frequency band, given the narrow-band speech signal.
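SPECTRAL BAND REPLICATION
[Figure: spectra at successive stages, from initial to final, on frequency axes marked 0, fs/8, fs/4 and fs/2, with a low-pass filter (LPF) stage; a further axis f [kHz] is marked at 5 and 10]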
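One simple way to populate an empty high band from the existing low band is spectral folding: inserting a zero between consecutive samples doubles the sampling rate and mirrors the low-band spectrum around the old Nyquist frequency. The sketch below shows only this replication step and is not necessarily the algorithm behind the figure; the sampling rate, the 1 kHz test tone and the function name are assumptions.

```python
import numpy as np

def zero_insert_upsample(x, factor=2):
    """Raise the sampling rate by inserting factor - 1 zeros after each sample."""
    y = np.zeros(len(x) * factor)
    y[::factor] = x
    return y

fs_narrow = 8000                               # assumed narrow-band sampling rate in Hz
n = np.arange(800)
tone = np.sin(2 * np.pi * 1000 * n / fs_narrow)

wide = zero_insert_upsample(tone)              # now sampled at 16 kHz
spectrum = np.abs(np.fft.rfft(wide))
freqs = np.fft.rfftfreq(len(wide), d=1.0 / (2 * fs_narrow))
peaks = freqs[spectrum > 0.5 * spectrum.max()]
print("spectral peaks in Hz:", peaks)
```

The 1 kHz tone reappears mirrored at 7 kHz (8 kHz minus 1 kHz). A practical system would then shape the replicated band, for example with a filter and a gain, before adding it to the original narrow-band signal.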
BANDWIDTH EXTENSION
Databases:– Original fullband: [0.3, 7] kHz
– Narrow band: [0.3, 3.4] kHz
– Bandwidth extended: [0.3, 7] kHz
[Figure: processing blocks LPF and Bandwidth extension]
MIC database: DCF for several MELCEPS-l
[Figure: DCF (vertical axis, 0.02 to 0.09) versus l (horizontal axis, 8 to 26) with MELCEPS parameterization, for three signals: [0, 8] kHz, [0.3, 3.4] kHz and [0.3, 8] kHz BWext]
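DCF is not defined on the slides. Assuming it is the detection cost function used in NIST speaker recognition evaluations, it weights the miss and false-alarm rates as sketched below. The default costs (10 and 1) and target prior (0.01) are the values commonly used in those evaluations and may differ from the ones behind this plot; the scores and threshold are made-up illustration data.

```python
import numpy as np

def error_rates(target_scores, impostor_scores, threshold):
    """Miss rate (targets rejected) and false-alarm rate (impostors accepted)."""
    p_miss = float(np.mean(np.asarray(target_scores) < threshold))
    p_fa = float(np.mean(np.asarray(impostor_scores) >= threshold))
    return p_miss, p_fa

def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """DCF = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Hypothetical verification scores, for illustration only.
target_scores = [2.0, 3.0, 4.0, 0.5]
impostor_scores = [0.0, 1.0, -1.0, 2.5]
p_miss, p_fa = error_rates(target_scores, impostor_scores, threshold=1.5)
print("P_miss, P_fa, DCF:", p_miss, p_fa, detection_cost(p_miss, p_fa))
```

Lower values are better, and with a small target prior the cost is dominated by false alarms, so systems are usually compared at the threshold that minimises it.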
Bandwidth extension
For human beings, recognition is easier with full band signals.
No new information is added
Experimental results reveal that:
– The bandwidth extension algorithm does not introduce any damaging artifacts
– With MELCEPS parameterization, the results are better than using the narrow band signal.