BIOEN 303 Final Project Report

Voice Recognition: A Discourse Between Man and Machine

    BIOEN 303 Final Project

    8 March 2007

    Andy Chang

Charlie Huang

Kwang Kim

    Jae Hyung Lee

    Ali Ziadloo

Contents:

Abstract ........................... p. 1
Introduction ....................... p. 2
Background Information ............. p. 2
Methods ............................ pp. 3-5
Results ............................ pp. 6-9
Discussion ......................... pp. 9-11
Conclusion ......................... p. 11
References ......................... p. 11
Appendix (MATLAB code) ............. pp. 12-20


    Abstract

The objective of this project was to develop a voice recognition program in MATLAB that accurately identifies speakers by their voice. The program is divided into three stages: (1) recording of the voice signal, (2) filtering of the signal, and (3) analysis and comparison of the signal with stored values in a pre-made database. In the analysis portion, formants were used to accurately represent the password phrase's power spectral density for each speaker. A technique using cepstrums was also implemented to determine the pitch of the speaker's voice and to differentiate between male and female speakers. The comparison section of the program differentiated between each member of this project group and also between group and non-group members. In comparing with the database of group member voices, points were awarded based on how many matches were found. A threshold score was set, depending on the desired security level, and speakers were identified based on the difference between their total score and the threshold score. Over several test trials, group members were successfully identified 80% of the time, and non-group members (unknown speakers) were identified 100% of the time. Based on these results, it was concluded that the methods employed in this project could be improved given more time, but are nonetheless a good stepping stone for the development of a more advanced speaker identification system based on voice printing.


Introduction

The development of voice recognition systems began as early as the 1960s. Voice printing is a biometric method that compares the spectral content of the voice, which is uniquely defined for each individual and therefore difficult for others to imitate. In the future, this technique may replace conventional access control means such as keys, locks, access cards, or password combinations that unlock doors to grant access to the bearer, regardless of whether or not the bearer is supposed to have access to the restricted area.

Speaker identification research continues today in the field of digital signal processing, where many advances have been made in recent years. The concept of a human-computer interface is gradually entering the mainstream as it has proven its usefulness for a variety of applications. Speech recognition plays an important role in an increasing number of our daily activities, such as speech-to-text programs and voice-activated household appliances. Through further development of this emerging technology, everyone may have the opportunity to participate in the discourse between man and machine. [1]

For our current design project, a digital speaker identification program was developed to differentiate each member of the project team by voice. In addition, the program detects when the speaker's voice does not belong to a member of the project team, and it also distinguishes between male and female speakers. Although many improvements can still be made, the methods employed in our program may be used as a stepping stone for the design of more flexible and accurate voice recognition software. Such software may then be integrated into larger structures such as voice-activated security systems or appliances.

    Background Information

Our voice recognition program was designed to use the acoustic features of speech and to further amplify the characteristics that each individual possesses. Comparative methods using energy spectral density, speech pattern analysis, and voiceprints of pitch were employed to distinguish each individual's voice traits.

A peak in the energy spectral density, known as a formant, [2] results from the resonant frequencies of an acoustical system. Formants are determined by the phonetic resonant frequencies of each individual's vocal tract, so they may be used to distinguish the energy spectrum of one voice signal from another. The information that humans require to distinguish between vowels can be represented purely by the energy content of the vowel sounds. The formant with the lowest frequency is called f1, the second f2, and the third f3. The first two formant frequencies are enough to disambiguate individual vowels. Vowels usually have four formant peaks, but sometimes may have up to six. [3]

Pitch analysis in our program was performed using a method involving cepstrums. A cepstrum is the result of taking the Fourier transform (FT) of the decibel spectrum:

    cepstrum = FT of the logarithm of the FT of the original signal.

The cepstrum technique operates in the quefrency domain. Quefrency is a measure of time, but not in the sense of a signal in the time domain. A peak in the quefrency domain indicates the presence of a harmonic pitch; this peak occurs because of the periodic harmonics in the spectrum. [4]
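As a small illustration of this definition (and of the rceps call used later in pitchfinder), the real cepstrum of a signal can be computed in MATLAB roughly as follows; the test tone and all variable names here are purely illustrative:

% Minimal sketch: real cepstrum of a test signal x sampled at Fs
Fs = 11025;                           % sampling frequency used in this project
t  = (0:Fs-1)'/Fs;
x  = sin(2*pi*220*t) + 0.5*sin(2*pi*440*t) + 0.01*randn(size(t));  % stand-in for a voiced signal
c  = real(ifft(log(abs(fft(x)))));    % equivalent to MATLAB's rceps(x)
% a strong peak in c at quefrency q (in samples) corresponds to a pitch of about Fs/q Hz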


Methods - Theory of Operation

    Filtering:

After studying different passwords and analyzing their vowels, we decided to make our password a three-word phrase: "Let Me In." Using MATLAB's wavrecord function, the speaker's input voice signal was recorded and stored as a one-dimensional sequence of data. The first step in the filtering process was to separate the vowels from each word of the password and to determine the indices of the signal where the vowels start and end. We designed a low-pass (smoothing) filter by convolving a triangular window with the voice signal to remove background noise and to smooth out the signal. The function that implements this filter, filterZ2, has two inputs: the voice signal and a variable that indicates the number of vowels the password has. Our chosen password consists of three vowels, although filterZ2 is flexible and the number of vowels can be changed if the password changes.

To distinguish between the vowels and consonants in each word of the password phrase, we normalized the signal by subtracting the mean of the signal, taking the absolute value of the result, and dividing the signal by its maximum value. Then, we defined a threshold cutoff value (0.2 on a scale from 0 to 1) so that the vowels would pass the threshold while the other, noisy parts would be removed from the signal by setting them to zero. After filtering the signal, the start and stop points of the vowels were found by identifying the indices of the first and last nonzero values in each vowel segment. These indices were returned as a matrix to be used by other functions in the next stages of our program.
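The essence of this segmentation step can be sketched as follows (a simplified version of the filterZ2 function listed in the Appendix; the synthetic signal stands in for a recorded password):

% Sketch of the vowel-isolation idea in filterZ2 (simplified; input signal is illustrative)
Fs  = 11025;
t   = (0:5*Fs-1)'/Fs;
sig = sin(2*pi*180*t) .* (mod(t,1) < 0.3);   % bursts of tone standing in for spoken words
sig = abs(sig - mean(sig));                  % remove the DC offset, keep magnitudes
sig = conv(sig, triang(512));                % smooth with a triangular window (low-pass)
sig = sig / max(sig);                        % normalize to the range [0, 1]
sig(sig <= 0.2) = 0;                         % zero out everything below the 0.2 threshold
% runs of nonzero samples longer than a minimum length are kept as vowel segments,
% and their first and last indices are collected into the output matrix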

    Formants:

Our program's main function, recordMain2, checks whether the password has three syllables, and if so, the signal data is passed to the comparison function comparor2. In comparor2, we used our formant-calculation function, formantgen, to get the spectra of different voices and to distinguish between speakers. In formantgen, the MATLAB function pyulear is called, which calculates the power spectral density (PSD) of the voice signal. The result of this step is the formant structure of the voice, which is the frequency spectrum shaped by the vocal tract and is used to differentiate between human voices. The order of the autoregressive model for the signal was set to 20 by trial and error to obtain the best results possible. We converted the results into decibels by taking the log of the power spectrum and multiplying by 10, and passed the formants back to comparor2 for further evaluation.
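Taken on its own, the formant-generation step amounts to something like the following (a condensed version of the formantgen function in the Appendix; the synthetic segment stands in for one isolated vowel):

% Sketch of formantgen: Yule-Walker PSD estimate of a vowel segment, in dB
Fs      = 11025;
t       = (0:Fs/2-1)'/Fs;
segment = sin(2*pi*150*t) + 0.4*sin(2*pi*600*t) + 0.01*randn(size(t));  % illustrative vowel
order   = 20;                       % AR model order, chosen by trial and error
psd     = pyulear(segment, order);  % Yule-Walker power spectral density estimate
formant = 10 * log10(psd);          % convert to a decibel scale
plot(formant)                       % peaks correspond to formant frequencies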

    Cepstrums:

Besides the formant approach, a method involving cepstrums was also used to expose any unusual pitches in the voice. In the pitchfinder function, we computed the cepstrum of the voice signal and smoothed it with a low-pass Butterworth filter. Next, the pitch of the voice was found by dividing the sampling frequency by the index of the first maximum (the indices of the cepstrum represent quefrency). This method was used specifically to determine the sex of the subject, since most women have a higher pitch than men. If the pitch was calculated to be higher than 185 Hz, the speaker was determined to be female.
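Condensed to its core, the pitch estimate works roughly as follows (a simplified version of the pitchfinder function in the Appendix; the Butterworth smoothing is omitted and the test signal is illustrative):

% Sketch of cepstrum-based pitch detection (simplified)
Fs  = 11025;
t   = (0:Fs-1)'/Fs;
sig = sin(2*pi*220*t) + 0.5*sin(2*pi*440*t) + 0.01*randn(size(t));  % stand-in voiced signal
c    = rceps(sig);               % real cepstrum of the signal
qmin = round(Fs/300);            % search only quefrencies corresponding to 70-300 Hz pitch
qmax = round(Fs/70);
[mx, k] = max(c(qmin:qmax));     % strongest peak in the allowed quefrency band
pitch   = Fs / (k + qmin - 1);   % estimated pitch in Hz
% in the project, a pitch above roughly 185 Hz was taken to indicate a female speaker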

    Data Bank and Comparison:

    At this point in our program, all necessary calculations have been made to compare the input

    voice signal with a pre-made data bank and to determine whether or not the speaker is a member

    of the group. The data bank stores several copies of the password audio of all group members.


Each word of the password phrase was recorded several times separately, and the formant of each recorded vowel and its peaks were gathered. The means of the indices and the magnitudes of the peaks of the formants of each vowel were also collected in this data bank for each group member.
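Although the exact field order is our own assumption (inferred from the comparison code in the Appendix), the data bank can be pictured as a cell array indexed by group member and vowel, where each row describes one averaged formant peak:

% Hypothetical layout of the data bank (field order assumed, numbers purely illustrative)
% databank{j, i} holds the trained peak data for group member j and vowel i:
%   column 1 - mean index (frequency-axis position) of the peak
%   column 2 - allowable deviation of that index
%   column 3 - mean magnitude of the peak (dB)
%   column 4 - allowable deviation of the magnitude (about one standard deviation)
databank = cell(4, 3);
databank{1, 1} = [ 35  2  -21.4  1.8 ;
                   78  3  -27.9  2.5 ];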

After finding the formants of the subject's voice, the peakfinder4 function was used to obtain the peaks of each vowel's formant and to compare them with the average peaks of the corresponding formant for each group member in the data bank. For each matching index, the subject gained one point. The magnitude of the first formant peak (the "ae" vowel in "Let") was then compared to the corresponding magnitude value in the database. If the magnitude was within one standard deviation of the corresponding magnitude data in the database, the speaker gained two additional points. This process was repeated for the other two vowels ("ee" for "Me" and "i" for "In") of the subject's voice signal. After the input voice was compared to all the entries in the database, the individual scores were summed to give a total score for each person in the database. After testing the program several times and analyzing the scores, we set the passing threshold to a score of 18 so that we could get the right match without making it too hard to pass the test.
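Schematically, and under the same assumed data bank layout as above, the scoring idea can be paraphrased as follows (a simplified sketch with illustrative numbers, not the full bonus and penalty logic of comparor2):

% Sketch of the peak-matching score for one candidate member and one vowel (data illustrative)
idx = [34; 77; 120];  mag = [-21.0; -28.3; -35.6];   % peak positions and magnitudes of the input vowel
ref = [ 35  2  -21.4  1.8 ;                          % trained peaks for this member and vowel
        78  3  -27.9  2.5 ];
total = 0;
for l = 1:size(ref, 1)
    if any(abs(idx - ref(l, 1)) <= ref(l, 2))        % a peak position matches within its range
        total = total + 1;                           % +1 point for the position match
        if any(abs(mag - ref(l, 3)) <= ref(l, 4))    % and a magnitude falls within one std. dev.
            total = total + 2;                       % +2 additional points
        end
    end
end
% summing such scores over all three vowels and comparing the total against the
% threshold of 18 decides whether the speaker is admitted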

To add more security to our program, we extended our analysis to cases where the final score is slightly below the threshold. If the input voice signal earned a total score just below the threshold, the program moved on to a second method in which the first two peaks of each vowel were compared in terms of indices and magnitudes and the percentage difference was calculated. If the second method agreed with the result of the first method, the person was admitted. Any voice input that did not pass the two tests was rejected.

Figure 1 shows a flowchart that summarizes our methods. All of our program's MATLAB functions are presented in the Appendix.

Methods - Test Protocol

    To create the database for our group, each member spoke the password at least ten times and the

averages and ranges of each vowel's formant index and peak were recorded and stored. Next,

    each group member went through ten test trials where recordMain2 determined the identity of

    the speaker each time. These tests demonstrated the voice recognition abilities of our program.

    Our program was also tested on seven classmates whose voices were not stored in our database.

    These tests demonstrated the security capabilities of our program; if the speaker was not a

    member of our group, our program indicated so. Our pitch detection method was concurrently

    tested; if the speaker was female, our program identified her as an unknown female subject.


    Figure 1: Summary of our voice recognition program.


Results

Figure 2 shows an example of the original input voice signal before filtering. Note that background noise was present in the signal.

    Figure 2: Original input voice signal before any filtering.

    The filtered signal is shown in Figure 3. Only the words of the password were passed; all other

    background noise was zeroed out. Also, as shown in the figure, the signal was normalized to

    show the difference in magnitude between the words of the voice signal.

    Figure 3: Filtered and normalized input voice signal.

    In Figure 4, the formants for two different subjects are shown. As one can see, the location and

    magnitude of the peaks were different and distinguishable. The formants of each word were

determined separately and compared with the data bank. The best matches were used to identify the speaker.


Figure 4: Formant comparison between two speakers for each vowel in the password phrase.

A sample result from our pitch detection method is shown in Figure 5. The first peak in this example was at index 50, and the fundamental frequency was calculated by dividing the

    sampling frequency by the index of the peak (pitch = 11025/50 = 220.5 Hz).

Figure 5: Example of the cepstrum of a speaker's voice for the whole password phrase.


Table 1 shows the results of running 10 test trials of our program for each group member. A successfully identified speaker is denoted by a one (1), and a non-identified or wrongly identified speaker is denoted by a zero (0).

Table 1: Results from running the voice recognition program for each group member.

Trial       Ali     Andy    Kwang   Charlie   Total
1           0       1       1       0
2           1       0       1       1
3           1       1       1       1
4           1       0       0       1
5           1       1       0       1
6           1       1       1       1
7           1       1       1       1
8           1       1       1       0
9           1       0       1       1
10          1       0       1       1
Average     90%     60%     80%     80%       80%
Std. Dev.   10%     16.3%   13.3%   13.3%     13.2%

    The average percentage of successful trials among group members is shown in Figure 6. The

    lowest average percentage was 60%.

[Figure 6 bar chart: average percentage of successful trials (0-100%) for Ali, Andy, Kwang, and Charlie]

Figure 6: Average percentage of successful trials among group members. The error bars show a range of one standard deviation from the average.

Table 2 shows the results of testing our program on students who were not in our group. For male speakers, a one (1) means that the program successfully identified the speaker as a non-group member. For female speakers, a one (1) means that the program identified the speaker as female in addition to being a non-group member.

Table 2: Results from running the voice recognition program on non-group members.

Speaker       Chun   Albert   Adrienne   Jason   Joshua   Kimberly   Christina
Successful?   1      1        1          1       1        1          1

    Our program successfully identified each non-group member on the first try, so we did not

    conduct more than one test trial for each speaker.

    Discussion

The first challenge we faced in writing our program was isolating the vowels in the spoken password so that each vowel could be processed accurately to create its formant. The filter function (filterZ2) that we designed to perform the isolation task was especially effective because its filtering method was simple yet powerful, as can be seen from its output in Figure 3. However, the designed filter was not capable of overcoming an excessive amount of noise. This drawback could have been addressed by processing the signal with more delicate and extensive filtering methods. Instead, this problem was avoided in our program by prompting the speaker to say the password again whenever too much background noise was present.

With the individual vowels separated, the formant of each vowel showed distinctive features that clearly characterized its sound. In addition, even the formants for the same vowel vary among individuals. This variability and specificity of the formants allowed for high-resolution comparison between different individuals. As shown in Figure 4, the x-axis positions and y-axis magnitudes were different for the two speakers even when they spoke the same vowels. Although it was obvious that the formants vary between individuals, it was not easy to find the peaks that uniquely describe the speaker because, to some degree, the peaks overlapped. This required an extensive analysis of the formant patterns of our members. By conducting a thorough statistical analysis on a large number of voice samples obtained within our group, we managed to construct a comprehensive database that contained the characteristic peaks associated with each individual and their variability compared with others.

The biggest challenge was designing the comparison method (comparor2) that determines the speaker's identity. This stage was especially difficult because the formants varied even for the same person, depending on the environment the speaker was in and their physical condition. However, we were confident that with 18 different peak positions (six from each vowel), we could accurately distinguish between individuals.

Finding the right threshold values was the key to the success of the comparison method. Two main factors worked against each other and made the fine-tuning process problematic: a higher threshold made it too hard for a speaker to pass the test even though he was a group member, while a threshold that was too low made the program vulnerable to falsely identifying a non-member. To overcome this dilemma, our program was designed to give bonus points when the number of matches exceeded a certain value for each vowel. Additionally, when there was no match for a vowel, a penalty was applied to the final score. Furthermore, by separating the comparison process into two sequential levels (comparing x-axis and then y-axis values), the program was able to produce a wider range of similarity scores. With the implementation of this bonus-penalty point system, the overall comparison method successfully determined the identity of the speaker.

Even with the method designed as described above, we still observed cases where the right person received a final score below the threshold value. We fixed this problem by giving the speaker a second chance without sacrificing the rigorous nature of the comparison method: an extra precautionary step was taken by adding an additional comparison method in series with the main one. Provided that the final score was just a few points below the threshold and that the second comparison method produced the same result as the first, the determined identity of the speaker was confirmed.

The test trials with the members of our group resulted in an average success rate of 80%. Test trials with non-member speakers were also conducted, and the result was 100% accurate; however, this test was done with only seven volunteers. The test trials and the scores obtained from each trial revealed that the accuracy was heavily influenced by the environment and the physical condition of the speaker, and as a result, there was a large difference between the success rates of our members. For example, as seen in Table 1, Andy was not recognized or was falsely recognized four out of ten times, whereas the other three subjects passed the test with 80% or 90% accuracy. This disparity in accuracy between group members shows that there is still much room for improvement. In general, the performance suffered when there was an excessive amount of noise or when the subject was tired and therefore had a lower than normal voice.

Although the formant comparison method was useful for distinguishing between vowels of different individuals, it did not reveal anything about the pitch of the sound. Therefore, we added one final method using cepstrums to our program, because the most conspicuous information that can be extracted from the cepstrum is the fundamental frequency. We used the cepstrum to find the overall pitch of the speaker's voice and to decide the gender of the speaker. However, since our group consisted of all male members, the cepstrum method was not used as a factor in determining the identity of the speaker.

There are many parts of our program where considerable improvement could have been made if more time had been allotted. First, in the filtering stage, the function could have been extended to remove noise components outside the normal frequency range of the human voice. We suspect that, to some extent, the unfiltered noise interfered with the calculation of the formants. This interference might have contributed largely to the undesirable variability in formant values that significantly reduced the accuracy of our program.

In addition, it was observed that the formants varied greatly between people for certain vowels, whereas the variability was minimal for other vowels. Choosing an appropriate password phrase that contains vowels that are easier to analyze seems to be the key to higher accuracy in identifying the speaker. In the comparison stage, a more in-depth and comprehensive statistical analysis of the formants is needed in order to make the method more reliable. In particular, identifying the peaks that are unique to each speaker seems to be the most essential part of the process. In any future voice recognition design project, we would consider each of the aforementioned problems more closely.


Overall, we strove to make our program code flexible so that it could be modified with ease later on. To make editing as convenient as possible, we used variables for the important recurring values and wrote sub-functions that handled smaller tasks whenever applicable. It was especially hard to manage the size of each function because many variables had to be passed on to other functions, so stepping back and dividing the whole voice recognition program into pieces was the most difficult task to complete. Once each sub-function's role was ironed out, the work was divided evenly among our group members, and the rest of the project came together smoothly as a result of careful planning.

    Conclusion

Given the variability and the uncertainty associated with our voices, it is most likely impossible to design an identification method based solely on a voice signal that is as accurate as fingerprinting. However, in this project we demonstrated that, to a certain degree, our voices do convey unique features that may enable us to accurately identify the speaker. Despite the time constraint and our limited knowledge of the vocal system, we managed to design a voice recognition program that can identify the speaker with 80% accuracy. This indicates that, with enough effort and time, the program could be improved into a viable application. The improvements needed mostly lie in the statistical analysis of the formants of each vowel. Understanding the characteristics of the formants is the most crucial part of the development of a voice recognition system. In the future, with more knowledge of our vocal systems and voices, much improved speaker recognition systems could be designed and used for many applications.

References

[1] Propper, Ryan. "Speech recognition: Enabling tomorrow's breakthroughs in human-computer interaction." Retrieved February 16, 2007.

[2] Pasich, Chris. "Introduction to Speaker Identification." Retrieved February 16, 2007.

[3] Neel, Amy T. "Formant detail needed for vowel identification." Acoustics Research Letters Online, Vol. 5, Issue 4 (2004): 125-131.

[4] Childers, D. G., D. P. Skinner, and R. C. Kemerait. "The cepstrum: a guide to processing." Proceedings of the IEEE, Vol. 65, Issue 10 (1977): 1428-1443.


    Appendix (MATLAB code)

% recordMain2 is the main function that runs our voice recognition program.
% This function also calls on filterZ2, comparor2, and pitchfinder.
% Output: identify who the speaker is and the score by comparing with the data bank.
load everything.mat

% passTheSignalTest is the variable that allows us to break out of the while loop
passTheSignalTest = -1;

% this while loop keeps on running until the speaker gives a noise-free input signal
while passTheSignalTest == -1

    % Prompt for password
    wavplay(promptPW, fsprompt);        % "Say the password"
    display('BEGIN IN');
    pause(1); display('3'); pause(1); display('2'); pause(1); display('1'); pause(1);
    display('GO!');
    Fs = 11025;
    orig_sig = wavrecord(5*Fs, Fs, 'double');
    display('STOP!');
    pause(1); display(' ');

    % Currently inspecting
    display('Currently inspecting...');
    wavplay(inspectingPW, fsinspecting); pause(0.5);
    wavplay(orig_sig, Fs);

    [filtered_sig, thre2, passTheSignal] = filterZ2(orig_sig, passTheSignalTest);

    if passTheSignal == 1
        break
    end

    % speaker did not speak with a clear voice or had significant background noise
    if passTheSignalTest == -1
        display('Please say it again... loud, clear and SLOWLY')
    end
end

% compare testBank with the stored bank.
[identity scores indice peak] = comparor2(orig_sig, thre2, kwang_data, dataBank2);

display('Finished inspecting.');
% plays the wav file "speaker is"
wavplay(speakerID, fsspeakerID); pause(0.5);

% calls pitchfinder to determine the pitch
pitch = pitchfinder(orig_sig, Fs);

if (pitch > 180)
    % identity = 6 means the pitch is higher than 180 Hz, so we can say the voice is from a girl
    identity = 6;
end

% Tells who the speaker is
if identity == 2        % Ali
    wavplay(aliID, fsaliID);         display('Welcome home!');
elseif identity == 3    % Andy
    wavplay(andyID, fscharlieID);    display('Welcome home!');
elseif identity == 4    % Charlie
    wavplay(charlieID, fskwangID);   display('Welcome home!');
elseif identity == 1    % Kwang
    wavplay(kwangID, fsaliID);       display('Welcome home!');
elseif identity == 5    % Jae
    wavplay(jaeID, fsjaeID);         display('Welcome home!');
elseif identity == 0    % Unknown speaker
    wavplay(unknownID, fsunknownID);
    display('Step away from the door. The police have been notified.');
elseif identity == 6    % Unknown female speaker
    wavplay(girl, girlID);           display('Sorry you are a girl...')
end


% filterZ2 filters and normalizes the input signal.

function [filtered_sig, thre2, passTheSignalTest] = filterZ2(password, passTheSignal)
%
% Inputs:  password      = the input signal, which is the voice signal
%          passTheSignal = the variable that can be 1 or -1 and checks whether
%                          the spoken phrase has the same number of words as the password
%
% Outputs: filtered_sig      = the filtered signal
%          thre2             = the matrix that contains the start and end indices of the words
%          passTheSignalTest = 1 when the voice signal has the same number of words, -1 when it doesn't
%
% Operation: this function filters the voice signal and returns the start
%            and end indices of the words

% setting the offset of the signal to zero
mean_orig_sig = mean(password);
password = abs(password - mean_orig_sig);

% convolving the signal with a triangle window to filter out the high frequencies
tri = triang(512);
filtered_sig = conv(password, tri);

% normalize the signal by dividing the signal by the maximum value
filtered_sig = filtered_sig / max(filtered_sig);

% more filtering by setting a threshold = 0.2
thre1 = filtered_sig > 0.2;

start = 0; finish = 0;
filtered_sig = filtered_sig .* thre1;

% the matrix that holds the start and end points of the vowels
thre2 = [];

% this variable makes sure that the lengths of the vowels are reasonable
l_limit = 1500;

% number of words
number = 3;

% returns the original flag
passTheSignalTest = passTheSignal;

% the following code finds the start and end points of the vowels and stores
% the indices in thre2
for i = 1:(length(filtered_sig) - 1)
    if (filtered_sig(i) == 0 && filtered_sig(i+1) >= 0)
        start = (i+1);
    elseif (filtered_sig(i) >= 0 && filtered_sig(i+1) == 0)
        finish = (i);
    end
    if ((finish - start) > l_limit)
        thre2 = [thre2; [start finish]];
    else
        filtered_sig(start:finish) = 0;
    end
end

% comparing the number of spoken words with the number of words in the password
if (length(thre2) == number)
    passTheSignalTest = 1;
end

% formantgen finds the formants of the password phrase.

function [formant] = formantgen(signal, order)
%
% Inputs:  signal  = the input signal
%          order   = order of the autoregressive (AR) model used to estimate the PSD
% Outputs: formant = the power spectral density (PSD) of the signal in dB scale

% finding the PSD of the signal using the Yule-Walker method with the specified order
formant = pyulear(signal, order);

% converting the PSD to dB scale
formant = 10*log10(formant);

% plot the PSD
plot(formant);

% comparor2 compares the formants from the input voice signal to the
% formants in the database and determines the identity of the speaker.
%
% signal    = input voice signal
% indices   = beginning and ending of each vowel segment
% databank  = database of index and magnitude of all the peaks of the formants
%             of each vowel obtained from the members of our group
% databank2 = database of index and magnitude of the first two peaks of the formants

function [identity scores indice peak] = comparor2(signal, indices, databank, databank2)

% Allowable range in peak position comparison. Determines the security level.
n2 = 1;

% Number of vowels in the voice signal
n = length(indices(:,1));

% Identity of the speaker. Set to be undetermined (denoted as -1)
identity = -1;

if n < 3 || n > 3
    identity = -1;                  % If more or fewer than 3 syllables, identity remains undetermined.
else
    [row column] = size(databank);  % Size of databank is obtained.
    scores = [];                    % scores for all three vowels and all members
    indice = [];                    % indices of the peaks in the formants of the input signal
    peak   = [];                    % magnitudes of the peaks in the formants of the input signal

    % Repeat the process as many times as the number of vowels in the input signal.
    for i = 1:n
        % Gets the segment that contains a single vowel and its formant.
        signal_seg = signal(indices(i,1):indices(i,2));
        formant = formantgen(signal_seg, 20);

        % Gets indices and magnitudes of all peaks of the formant; save them for viewing.
        [index peaks] = peakmaster3(formant, 2);
        indice = [indice index'];
        peak   = [peak peaks'];

        if length(index) < 3        % If there are less than 3 peaks in one formant,
            identity = -1;          % set the identity as undetermined and
            break;                  % break
        end

        % scores obtained by comparing the formants of the input signal
        % with the formants of different people for the selected vowel.
        score = [];

        % repeat as many times as the number of rows in databank
        for j = 1:row
            s = 0;                  % saves points from the x-axis comparison
            p = 0;                  % saves points from the y-axis comparison

            % repeat as many times as the number of peaks in the formant
            for k = 1:length(index)
                % repeat as many times as the number of peaks stored for this vowel and person
                for l = 1:length(databank{j,i}(:,1))
                    % if the x-axis positions match (comparison operator illegible in the
                    % scanned listing; '>=' is assumed here)
                    if (index(k) >= (databank{j,i}(l,1) - ceil(databank{j,i}(l,2))))
                        s = s + 1;  % Add 1 to the total score for the x-axis comparison

                        % If the magnitude of the peak that passed the x-axis comparison
                        % falls in the y-axis range ('>=' assumed, as above)
                        if peaks(k) >= (databank{j,i}(l,3) - n2 - (databank{j,i}(l,4)))
                            p = p + 2;  % Add 2 to the total score for the y-axis comparison
                        end
                    end
                end
            end

            if (l - s) < 3          % If there are less than 3 mismatches in x-axis
                s = s + 1;          % bonus 1 point
            elseif (l - s) > 5      % (operator illegible in the source; '>' assumed)
                p = p + 3;          % bonus 3 points
            elseif p > 7            % if there are more than 4 matches in y-axis
                p = p + 4;          % bonus 4 points
            end

            s = s + p;              % Final score
            score = [score; s];     % Save the final score for each vowel
        end

        scores = [scores score];    % Save the final scores for all the vowels
    end
end

sums = [];      % sum for each person in the database

% By summing each row in scores, the total score for each person in the database is obtained.
for z = 1:length(scores(:,1))
    if length(find(scores(z,:) == 0)) > 0
        sums = [sums; (sum(scores(z,:)) - 3)];
    else
        sums = [sums; sum(scores(z,:))];
    end
end

% Find entries in the database that exceed the threshold.
absolute = find(sums >= 18);

% Find entries that might need a second chance
notsure = find(sums >= 10);

% Run the second comparison method
[identity2 points averages] = comparor(signal, indices, databank2);

% If any one of the entries in the database exceeds the threshold
if length(absolute) > 0
    if length(absolute) > 1
        % If there is more than one match, the one with the maximum score is the match.
        identity = find(max(sums) == sums);
    else
        % If there is only one, identity is determined.
        identity = absolute(1);
    end
% If no one passed the threshold, but some were close
elseif length(notsure) > 0
    % If the second comparison method agrees with any one of the candidates
    if length(find(notsure == identity2)) > 0
        identity = notsure(find(notsure == identity2));     % Identity is determined.
    else
        identity = 0;   % If the above criteria are not met, the speaker is an unknown person.
    end
else
    identity = 0;       % If there's no one above 10, the speaker is unknown.
end

% pitchfinder is called by recordMain2 to find the pitch of the signal.
% This function finds the average pitch of the speaker.
% Input  = signal (original wave signal), fs (sampling frequency)
% Output = pitch (in Hz) and cepstrum spectrum

function [pitch, cepstrum] = pitchfinder(signal, fs)

% rceps is a MATLAB function. It does the Fourier transform of the
% log of the Fourier transform of the original signal
signal_ceps = rceps(signal);

% Setting the limit threshold for the human voice pitch
upperlimit = 300;
lowerlimit = 70;
threshold = round(fs/upperlimit);
limit = round(fs/lowerlimit);

% taking the signal from the specific section of the cepstrum-ed signal
cepstrum_orig = signal_ceps(threshold:limit);

% applying a Butterworth filter on the cepstrum-ed signal
[b a] = butter(10, 0.1);
cepstrum = filtfilt(b, a, cepstrum_orig);

% pitch is the sampling frequency divided by the maximum peak in the cepstrum
% domain. The max peak in the cepstrum domain is the fundamental frequency
pitch = fs/(find(max(cepstrum_orig) == cepstrum_orig) + threshold);


% peakmaster3 finds the peaks of the input signal.
% input:  signal of the wave that we want to find peaks for; Thold is the threshold value
% output: gives the index and the height of the peaks

function [Ta Tb] = peakmaster3(signal, Thold)

% starting out with threshold values
v = 0; x = 1; T = [];
prev = signal(1);
minV = min(signal);
max = [0, minV];
l = length(signal);

% loop that compares each value... if the next value is higher, then it is counted as a peak
while x < l
    if signal(x) > max(2) && signal(x) > prev
        max = [x, signal(x)];
        v = v + signal(x) - prev;
    else
        if (max(2) ~= minV) && (signal(x) < (max(2) - Thold))
            if v > Thold - 3.9
                T = [T; max];
            end
            max(2) = minV;
            v = 0;
        end
    end
    prev = signal(x);
    x = x + 1;
end

% return the peak indices and peak heights (this assignment is missing from the
% scanned listing; it is restored here based on how peakmaster3 is called)
if isempty(T)
    Ta = []; Tb = [];
else
    Ta = T(:,1); Tb = T(:,2);
end

% comparor takes an input voice signal and extracts segments that contain
% vowels. Then gets the formants for each vowel and compares the first two
% peaks with the data in databank.
%
% signal   = input voice signal
% indices  = beginnings and endings of each vowel
% databank = database of the first two peaks of the formant of each vowel
%            for all the members of our group.

function [identity scores averages] = comparor(signal, indices, databank)

scores = [];                 % Final scores for all three vowels for all the members

n = length(indices(:,1));    % number of vowels in the input signal
identity = -1;               % identity is initially set to be undetermined

% repeats the comparison process as many times as the number of vowels in the input signal
for i = 1:n
    signal_seg = signal(indices(i,1):indices(i,2));   % gets the nth segment
    formant = formantgen(signal_seg, 20);             % gets the formant of the segment
    [index peaks] = peakmaster3(formant, 2);          % finds the peaks in the formant
    index = index';
    peaks = peaks';

    % If there are less than three peaks, identity is undetermined and break out.
    if length(index) < 3
        identity = -1;
        break;
    end

    vowelscore = [];         % scores for all entries for the vowel being compared.
    for k = 1:length(databank(:,1))
        % percentage difference of the x-axis positions and the y-axis positions of the two peaks
        s = abs((index(1,1) - databank(k,(4*i-3))) / databank(k,(4*i-3)));
        s = s + abs((peaks(1,1) - databank(k,(4*i-2))) / databank(k,(4*i-2)));
        s = s + abs((index(1,2) - databank(k,(4*i-1))) / databank(k,(4*i-1)));
        s = s + abs((peaks(1,2) - databank(k,(4*i))) / databank(k,(4*i)));

        vowelscore = [vowelscore; s];   % sums the percentage differences
    end

    scores = [scores vowelscore];       % saves all the sums
end

% computes the average of the percentage differences in the three vowels for each person
averages = [];
for h = 1:length(scores(:,1))
    average1 = abs((scores(h,1) + scores(h,2) + 2*scores(h,3)) / 4);
    averages = [averages; average1];
end

% If the minimum average percentage difference is below 1 (threshold), the speaker is identified.
if min(averages) < 1
    identity = find(averages == min(averages));
else
    identity = 0;
end