BIOEN 303 Final Project Report

Voice Recognition: A Discourse Between Man and Machine

    BIOEN 303 Final Project

    8 March 2007

    Andy Chang

Charlie Huang

Kwang Kim

    Jae Hyung Lee

    Ali Ziadloo

Contents:

Abstract ........................... p. 1
Introduction ....................... p. 2
Background Information ............. p. 2
Methods ............................ pp. 3-5
Results ............................ pp. 6-9
Discussion ......................... pp. 9-11
Conclusion ......................... p. 11
References ......................... p. 11
Appendix (MATLAB code) ............. pp. 12-20


    Abstract

The objective of this project was to develop a voice recognition program in MATLAB that accurately identifies speakers by their voice. The program is divided into three stages: (1) recording of the voice signal, (2) filtering of the signal, and (3) analysis and comparison of the signal with stored values in a pre-made database. In the analysis portion, formants were used to accurately represent the password phrase's power spectral density for each speaker. A technique using cepstrums was also implemented to determine the pitch of the speaker's voice and to differentiate between male and female speakers. The comparison section of the program differentiated between each member of this project group and also between group and non-group members. In comparing with the database of group member voices, points were awarded based on how many matches were found. A threshold score was set, depending on the desired security level, and speakers were identified based on the difference between their total score and the threshold score. Over several test trials, group members were successfully identified 80% of the time, and non-group members (unknown speakers) were identified 100% of the time. Based on these results, it was concluded that the methods employed in this project could be improved given more time, but are nonetheless a good stepping stone for the development of a more advanced speaker identification system based on voice printing.


Introduction

The development of voice recognition systems began as early as the 1960s. Voice printing is a biometric method that compares the spectral content of the voice, which is uniquely defined for each individual and therefore difficult for others to imitate. In the future, this technique may replace conventional access control means such as keys, locks, access cards, or password combinations that unlock doors to grant access to the bearer, regardless of whether or not the bearer is supposed to have access to the restricted area.

Speaker identification research continues today in the field of digital signal processing, where many advances have been made in recent years. The concept of a human-computer interface is gradually entering the mainstream as it has proven its usefulness for a variety of applications. Speech recognition plays an important role in an increasing number of our daily activities, such as speech-to-text programs and voice-activated household appliances. Through further development of this emerging technology, everyone may have the opportunity to participate in the discourse between man and machine. [1]

For our current design project, a digital speaker identification program was developed to differentiate each member of the project team by voice. In addition, the program detects when the speaker's voice does not belong to a member of the project team, and it also distinguishes between male and female speakers. Although many improvements can still be made, the methods employed in our program may be used as a stepping stone for the design of more flexible and accurate voice recognition software. Such software may then be integrated into larger structures such as voice-activated security systems or appliances.

    Background Information

Our voice recognition program was designed to use the acoustic features of speech and to further amplify the characteristics that each individual possesses. Comparative methods using energy spectral density, speech pattern analysis, and voiceprints of pitch were employed to distinguish each individual's voice traits.

A peak in the energy spectral density, known as a formant, [2] results from the resonant frequencies of an acoustical system. Formants are determined by the phonetic resonant frequencies of each individual's vocal tract, so they may be used to distinguish the energy spectrum of one voice signal from another. The information that humans require to distinguish between vowels can be represented purely by the energy content of the vowel sounds. The formant with the lowest frequency is called f1, the second f2, and the third f3. The first two formant frequencies are enough to disambiguate individual vowels. Vowels usually have four formant peaks, but sometimes may have up to six. [3]

Pitch analysis in our program was performed using a method involving cepstrums. A cepstrum is the result of taking the Fourier transform (FT) of the decibel spectrum:

    cepstrum = FT of the logarithm of the FT of the original signal.

The cepstrum technique operates in the quefrency domain. Quefrency is a measure of time, but not in the sense of a signal in the time domain. A peak in the quefrency domain indicates the presence of a harmonic pitch; this peak occurs because of the periodic harmonics in the spectrum. [4]
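As a small illustration of this definition (and of the rceps call used later in pitchfinder), the real cepstrum of a signal can be computed in MATLAB roughly as follows; the test tone and all variable names here are purely illustrative:

% Minimal sketch: real cepstrum of a test signal x sampled at Fs
Fs = 11025;                           % sampling frequency used in this project
t  = (0:Fs-1)'/Fs;
x  = sin(2*pi*220*t) + 0.5*sin(2*pi*440*t) + 0.01*randn(size(t));  % stand-in for a voiced signal
c  = real(ifft(log(abs(fft(x)))));    % equivalent to MATLAB's rceps(x)
% a strong peak in c at quefrency q (in samples) corresponds to a pitch of about Fs/q Hz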


Methods - Theory of Operation

    Filtering:

After studying different passwords and analyzing their vowels, we decided to make our password a three-word phrase: "Let Me In." Using MATLAB's wavrecord function, the speaker's input voice signal was recorded and stored as a one-dimensional sequence of data. The first step in the filtering process was to separate the vowels from each word of the password and to determine the indices of the signal where the vowels start and end. We designed a low-pass (smoothing) filter by convolving a triangular window with the voice signal to remove background noise and to smooth out the signal. The function that implements this filter, filterZ2, has two inputs: the voice signal and a variable that indicates the number of vowels the password has. Our chosen password consists of three vowels, although filterZ2 is flexible and the number of vowels can be changed if the password changes.

To distinguish between the vowels and consonants in each word of the password phrase, we normalized the signal by subtracting the mean of the signal, taking the absolute value of the result, and dividing the signal by its maximum value. Then, we defined a threshold cutoff value (0.2 on a scale from 0 to 1) so that the vowels would pass the threshold while the other, noisy parts would be removed from the signal by setting them to zero. After filtering the signal, the start and stop points of the vowels were found by identifying the indices of the first and last nonzero values in each vowel segment. These indices were returned as a matrix to be used by other functions in the next stages of our program.
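The essence of this segmentation step can be sketched as follows (a simplified version of the filterZ2 function listed in the Appendix; the synthetic signal stands in for a recorded password):

% Sketch of the vowel-isolation idea in filterZ2 (simplified; input signal is illustrative)
Fs  = 11025;
t   = (0:5*Fs-1)'/Fs;
sig = sin(2*pi*180*t) .* (mod(t,1) < 0.3);   % bursts of tone standing in for spoken words
sig = abs(sig - mean(sig));                  % remove the DC offset, keep magnitudes
sig = conv(sig, triang(512));                % smooth with a triangular window (low-pass)
sig = sig / max(sig);                        % normalize to the range [0, 1]
sig(sig <= 0.2) = 0;                         % zero out everything below the 0.2 threshold
% runs of nonzero samples longer than a minimum length are kept as vowel segments,
% and their first and last indices are collected into the output matrix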

    Formants:

Our program's main function, recordMain2, checks whether the password has three syllables, and if so, the signal data is passed to the comparison function comparor2. In comparor2, we used our formant-calculation function, formantgen, to get the spectra of different voices and to distinguish between speakers. In formantgen, the MATLAB function pyulear is called, which calculates the power spectral density (PSD) of the voice signal. The result of this step is the formant structure of the voice, which is the frequency spectrum shaped by the vocal tract and is used to differentiate between human voices. The order of the autoregressive model for the signal was set to 20 by trial and error to obtain the best results possible. We converted the results into decibels by taking the log of the power spectrum and multiplying by 10, and passed the formants back to comparor2 for further evaluation.
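Taken on its own, the formant-generation step amounts to something like the following (a condensed version of the formantgen function in the Appendix; the synthetic segment stands in for one isolated vowel):

% Sketch of formantgen: Yule-Walker PSD estimate of a vowel segment, in dB
Fs      = 11025;
t       = (0:Fs/2-1)'/Fs;
segment = sin(2*pi*150*t) + 0.4*sin(2*pi*600*t) + 0.01*randn(size(t));  % illustrative vowel
order   = 20;                       % AR model order, chosen by trial and error
psd     = pyulear(segment, order);  % Yule-Walker power spectral density estimate
formant = 10 * log10(psd);          % convert to a decibel scale
plot(formant)                       % peaks correspond to formant frequencies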

    Cepstrums:

Besides the formant approach, a method involving cepstrums was also used to expose any unusual pitches in the voice. In the pitchfinder function, we computed the cepstrum of the voice signal and smoothed it with a low-pass Butterworth filter. Next, the pitch of the voice was found by dividing the sampling frequency by the index of the first maximum (the indices of the cepstrum represent quefrency). This method was used specifically to determine the sex of the subject, since most women have a higher pitch than men. If the pitch was calculated to be higher than 185 Hz, the speaker was determined to be female.
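Condensed to its core, the pitch estimate works roughly as follows (a simplified version of the pitchfinder function in the Appendix; the Butterworth smoothing is omitted and the test signal is illustrative):

% Sketch of cepstrum-based pitch detection (simplified)
Fs  = 11025;
t   = (0:Fs-1)'/Fs;
sig = sin(2*pi*220*t) + 0.5*sin(2*pi*440*t) + 0.01*randn(size(t));  % stand-in voiced signal
c    = rceps(sig);               % real cepstrum of the signal
qmin = round(Fs/300);            % search only quefrencies corresponding to 70-300 Hz pitch
qmax = round(Fs/70);
[mx, k] = max(c(qmin:qmax));     % strongest peak in the allowed quefrency band
pitch   = Fs / (k + qmin - 1);   % estimated pitch in Hz
% in the project, a pitch above roughly 185 Hz was taken to indicate a female speaker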

    Data Bank and Comparison:

    At this point in our program, all necessary calculations have been made to compare the input

    voice signal with a pre-made data bank and to determine whether or not the speaker is a member

    of the group. The data bank stores several copies of the password audio of all group members.


Each word of the password phrase was recorded several times separately, and the formant of each recorded vowel and its peaks were gathered. The means of the indices and the magnitudes of the peaks of the formants of each vowel were also collected in this data bank for each group member.
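Although the exact field order is our own assumption (inferred from the comparison code in the Appendix), the data bank can be pictured as a cell array indexed by group member and vowel, where each row describes one averaged formant peak:

% Hypothetical layout of the data bank (field order assumed, numbers purely illustrative)
% databank{j, i} holds the trained peak data for group member j and vowel i:
%   column 1 - mean index (frequency-axis position) of the peak
%   column 2 - allowable deviation of that index
%   column 3 - mean magnitude of the peak (dB)
%   column 4 - allowable deviation of the magnitude (about one standard deviation)
databank = cell(4, 3);
databank{1, 1} = [ 35  2  -21.4  1.8 ;
                   78  3  -27.9  2.5 ];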

After finding the formants of the subject's voice, the peakfinder4 function was used to obtain the peaks of each vowel's formant and to compare them with the average peaks of the corresponding formant for each group member in the data bank. For each matching index, the subject gained one point. The magnitude of the first formant peak (the "ae" vowel in "Let") was then compared to the corresponding magnitude value in the database. If the magnitude was within one standard deviation of the corresponding magnitude data in the database, the speaker gained two additional points. This process was repeated for the other two vowels ("ee" for "Me" and "i" for "In") of the subject's voice signal. After the input voice was compared to all the entries in the database, the individual scores were summed to give a total score for each person in the database. After testing the program several times and analyzing the scores, we set the passing threshold to a score of 18 so that we could get the right match without making it too hard to pass the test.
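Schematically, and under the same assumed data bank layout as above, the scoring idea can be paraphrased as follows (a simplified sketch with illustrative numbers, not the full bonus and penalty logic of comparor2):

% Sketch of the peak-matching score for one candidate member and one vowel (data illustrative)
idx = [34; 77; 120];  mag = [-21.0; -28.3; -35.6];   % peak positions and magnitudes of the input vowel
ref = [ 35  2  -21.4  1.8 ;                          % trained peaks for this member and vowel
        78  3  -27.9  2.5 ];
total = 0;
for l = 1:size(ref, 1)
    if any(abs(idx - ref(l, 1)) <= ref(l, 2))        % a peak position matches within its range
        total = total + 1;                           % +1 point for the position match
        if any(abs(mag - ref(l, 3)) <= ref(l, 4))    % and a magnitude falls within one std. dev.
            total = total + 2;                       % +2 additional points
        end
    end
end
% summing such scores over all three vowels and comparing the total against the
% threshold of 18 decides whether the speaker is admitted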

To add more security to our program, we extended our analysis to cases where the final score is slightly below the threshold. If the input voice signal earned a total score just below the threshold, the program moved on to a second method in which the first two peaks of each vowel were compared in terms of indices and magnitudes and the percentage difference was calculated. If the second method agreed with the result of the first method, the person was admitted. Any voice input that did not pass the two tests was rejected.

Figure 1 shows a flowchart that summarizes our methods. All of our program's MATLAB functions are presented in the Appendix.

Methods - Test Protocol

    To create the database for our group, each member spoke the password at least ten times and the

averages and ranges of each vowel's formant index and peak were recorded and stored. Next,

    each group member went through ten test trials where recordMain2 determined the identity of

    the speaker each time. These tests demonstrated the voice recognition abilities of our program.

    Our program was also tested on seven classmates whose voices were not stored in our database.

    These tests demonstrated the security capabilities of our program; if the speaker was not a

    member of our group, our program indicated so. Our pitch detection method was concurrently

    tested; if the speaker was female, our program identified her as an unknown female subject.


    Figure 1: Summary of our voice recognition program.


Results

Figure 2 shows an example of the original input voice signal before filtering. Note that background noise was present in the signal.

    Figure 2: Original input voice signal before any filtering.

    The filtered signal is shown in Figure 3. Only the words of the password were passed; all other

    background noise was zeroed out. Also, as shown in the figure, the signal was normalized to

    show the difference in magnitude between the words of the voice signal.

    Figure 3: Filtered and normalized input voice signal.

    In Figure 4, the formants for two different subjects are shown. As one can see, the location and

    magnitude of the peaks were different and distinguishable. The formants of each word were

determined separately and compared with the data bank. The best matches were used to identify the speaker.


Figure 4: Formant comparison between two speakers for each vowel in the password phrase.

A sample result from our pitch detection method is shown in Figure 5. The first peak in this example was at index 50, and the fundamental frequency was calculated by dividing the

    sampling frequency by the index of the peak (pitch = 11025/50 = 220.5 Hz).

Figure 5: Example of the cepstrum of a speaker's voice for the whole password phrase.


Table 1 shows the results of running 10 test trials of our program for each group member. A successfully identified speaker is denoted by a one (1), and a non-identified or wrongly identified speaker is denoted by a zero (0).

Table 1: Results from running the voice recognition program for each group member.

Trial       Ali     Andy    Kwang   Charlie   Total
1           0       1       1       0
2           1       0       1       1
3           1       1       1       1
4           1       0       0       1
5           1       1       0       1
6           1       1       1       1
7           1       1       1       1
8           1       1       1       0
9           1       0       1       1
10          1       0       1       1
Average     90%     60%     80%     80%       80%
Std. Dev.   10%     16.3%   13.3%   13.3%     13.2%

    The average percentage of successful trials among group members is shown in Figure 6. The

    lowest average percentage was 60%.

[Figure 6 bar chart: average percentage of successful trials (0-100%) for Ali, Andy, Kwang, and Charlie]

Figure 6: Average percentage of successful trials among group members. The error bars show a range of one standard deviation from the average.

Table 2 shows the results of testing our program on students who were not in our group. For male speakers, a one (1) means that the program successfully identified the speaker as a non-group member. For female speakers, a one (1) means that the program identified the speaker as female in addition to being a non-group member.

Table 2: Results from running the voice recognition program on non-group members.

Speaker       Chun   Albert   Adrienne   Jason   Joshua   Kimberly   Christina
Successful?   1      1        1          1       1        1          1

    Our program successfully identified each non-group member on the first try, so we did not

    conduct more than one test trial for each speaker.

    Discussion

The first challenge we faced in writing our program was isolating the vowels in the spoken password so that each vowel could be processed accurately to create its formant. The filter function (filterZ2) that we designed to perform the isolation task was especially effective because its filtering method was simple yet powerful, as can be seen from its output in Figure 3. However, the designed filter was not capable of overcoming an excessive amount of noise. This drawback could have been addressed by processing the signal with more delicate and extensive filtering methods. Instead, this problem was avoided in our program by prompting the speaker to say the password again whenever too much background noise was present.

With the individual vowels separated, the formant of each vowel showed distinctive features that clearly characterized its sound. In addition, even the formants for the same vowel vary among individuals. This variability and specificity of the formants allowed for high-resolution comparison between different individuals. As shown in Figure 4, the x-axis positions and y-axis magnitudes were different for the two speakers even when they spoke the same vowels. Although it was obvious that the formants vary between individuals, it was not easy to find the peaks that uniquely describe the speaker because, to some degree, the peaks overlapped. This required an extensive analysis of the formant patterns of our members. By conducting a thorough statistical analysis on a large number of voice samples obtained within our group, we managed to construct a comprehensive database that contained the characteristic peaks associated with each individual and their variability compared with others.

The biggest challenge was designing the comparison method (comparor2) that determines the speaker's identity. This stage was especially difficult because the formants varied even for the same person, depending on the environment the speaker was in and their physical condition. However, we were confident that with 18 different peak positions (six from each vowel), we could accurately distinguish between individuals.

Finding the right threshold values was the key to the success of the comparison method. Two main factors worked against each other and made the fine-tuning process problematic: a higher threshold made it too hard for a speaker to pass the test even though he was a group member, while a threshold that was too low made the program vulnerable to falsely identifying a non-member. To overcome this dilemma, our program was designed to give bonus points when the number of matches exceeded a certain value for each vowel. Additionally, when there was no match for a vowel, a penalty was applied to the final score. Furthermore, by separating the comparison process into two sequential levels (comparing x-axis and then y-axis values), the program was able to produce a wider range of similarity scores. With the implementation of this bonus-penalty point system, the overall comparison method successfully determined the identity of the speaker.

Even with the method designed as described above, we still observed cases where the right person received a final score below the threshold value. We fixed this problem by giving the speaker a second chance without sacrificing the rigorous nature of the comparison method: an extra precautionary step was taken by adding an additional comparison method in series with the main one. Provided that the final score was just a few points below the threshold and that the second comparison method produced the same result as the first, the determined identity of the speaker was confirmed.

The test trials with the members of our group resulted in an average success rate of 80%. Test trials with non-member speakers were also conducted, and the result was 100% accurate; however, this test was done with only seven volunteers. The test trials and the scores obtained from each trial revealed that the accuracy was heavily influenced by the environment and the physical condition of the speaker, and as a result, there was a large difference between the success rates of our members. For example, as seen in Table 1, Andy was not recognized or was falsely recognized four out of ten times, whereas the other three subjects passed the test with 80% or 90% accuracy. This disparity in accuracy between group members shows that there is still much room for improvement. In general, the performance suffered when there was an excessive amount of noise or when the subject was tired and therefore had a lower than normal voice.

Although the formant comparison method was useful for distinguishing between vowels of different individuals, it did not reveal anything about the pitch of the sound. Therefore, we added one final method using cepstrums to our program, because the most conspicuous information that can be extracted from the cepstrum is the fundamental frequency. We used the cepstrum to find the overall pitch of the speaker's voice and to decide the gender of the speaker. However, since our group consisted of all male members, the cepstrum method was not used as a factor in determining the identity of the speaker.

There are many parts of our program where considerable improvement could have been made if more time had been allotted. First, in the filtering stage, the function could have been extended to remove noise components outside the normal frequency range of the human voice. We suspect that, to some extent, the unfiltered noise interfered with the calculation of the formants. This interference might have contributed largely to the undesirable variability in formant values that significantly reduced the accuracy of our program.

In addition, it was observed that the formants varied greatly between people for certain vowels, whereas the variability was minimal for other vowels. Choosing an appropriate password phrase that contains vowels that are easier to analyze seems to be the key to higher accuracy in identifying the speaker. In the comparison stage, a more in-depth and comprehensive statistical analysis of the formants is needed in order to make the method more reliable. In particular, identifying the peaks that are unique to each speaker seems to be the most essential part of the process. In any future voice recognition design project, we would consider each of the aforementioned problems more closely.


Overall, we strove to make our program code flexible so that it could be modified with ease later on. To make editing as convenient as possible, we used variables for the important recurring values and wrote sub-functions that handled smaller tasks whenever applicable. It was especially hard to manage the size of each function because many variables had to be passed on to other functions, so stepping back and dividing the whole voice recognition program into pieces was the most difficult task to complete. Once each sub-function's role was ironed out, the work was divided evenly among our group members, and the rest of the project came together smoothly as a result of careful planning.

    Conclusion

Given the variability and the uncertainty associated with our voices, it is most likely impossible to design an identification method based solely on a voice signal that is as accurate as fingerprinting. However, in this project we demonstrated that, to a certain degree, our voices do convey unique features that may enable us to accurately identify the speaker. Despite the time constraint and our limited knowledge of the vocal system, we managed to design a voice recognition program that can identify the speaker with 80% accuracy. This indicates that, with enough effort and time, the program could be improved into a viable application. The improvements needed mostly lie in the statistical analysis of the formants of each vowel. Understanding the characteristics of the formants is the most crucial part of the development of a voice recognition system. In the future, with more knowledge of our vocal systems and voices, much improved speaker recognition systems could be designed and used for many applications.

References

[1] Propper, Ryan. "Speech recognition: Enabling tomorrow's breakthroughs in human-computer interaction." Retrieved February 16, 2007.

[2] Pasich, Chris. "Introduction to Speaker Identification." Retrieved February 16, 2007.

[3] Neel, Amy T. "Formant detail needed for vowel identification." Acoustics Research Letters Online, Vol. 5, Issue 4 (2004): 125-131.

[4] Childers, D. G., D. P. Skinner, and R. C. Kemerait. "The cepstrum: a guide to processing." Proceedings of the IEEE, Vol. 65, Issue 10 (1977): 1428-1443.


    Appendix (MATLAB code)

% recordMain2 is the main function that runs our voice recognition program.
% This function also calls on filterZ2, comparor2, and pitchfinder.
% Output: identify who the speaker is and the score by comparing with the data bank.
load everything.mat

% passTheSignalTest is the variable that allows us to break out of the while loop
passTheSignalTest = -1;

% this while loop keeps on running until the speaker gives a noise-free input signal
while passTheSignalTest == -1

    % Prompt for password
    wavplay(promptPW, fsprompt);        % "Say the password"
    display('BEGIN IN');
    pause(1); display('3'); pause(1); display('2'); pause(1); display('1'); pause(1);
    display('GO!');
    Fs = 11025;
    orig_sig = wavrecord(5*Fs, Fs, 'double');
    display('STOP!');
    pause(1); display(' ');

    % Currently inspecting
    display('Currently inspecting...');
    wavplay(inspectingPW, fsinspecting); pause(0.5);
    wavplay(orig_sig, Fs);

    [filtered_sig, thre2, passTheSignal] = filterZ2(orig_sig, passTheSignalTest);

    if passTheSignal == 1
        break
    end

    % speaker did not speak with a clear voice or had significant background noise
    if passTheSignalTest == -1
        display('Please say it again... loud, clear and SLOWLY')
    end
end

% compare testBank with the stored bank.
[identity scores indice peak] = comparor2(orig_sig, thre2, kwang_data, dataBank2);

display('Finished inspecting.');
% plays the wav file "speaker is"
wavplay(speakerID, fsspeakerID); pause(0.5);

% calls pitchfinder to determine the pitch
pitch = pitchfinder(orig_sig, Fs);

if (pitch > 180)
    % identity = 6 means the pitch is higher than 180 Hz, so we can say the voice is from a girl
    identity = 6;
end

% Tells who the speaker is
if identity == 2        % Ali
    wavplay(aliID, fsaliID);         display('Welcome home!');
elseif identity == 3    % Andy
    wavplay(andyID, fscharlieID);    display('Welcome home!');
elseif identity == 4    % Charlie
    wavplay(charlieID, fskwangID);   display('Welcome home!');
elseif identity == 1    % Kwang
    wavplay(kwangID, fsaliID);       display('Welcome home!');
elseif identity == 5    % Jae
    wavplay(jaeID, fsjaeID);         display('Welcome home!');
elseif identity == 0    % Unknown speaker
    wavplay(unknownID, fsunknownID);
    display('Step away from the door. The police have been notified.');
elseif identity == 6    % Unknown female speaker
    wavplay(girl, girlID);           display('Sorry you are a girl...')
end


% filterZ2 filters and normalizes the input signal.

function [filtered_sig, thre2, passTheSignalTest] = filterZ2(password, passTheSignal)
%
% Inputs:  password      = the input signal, which is the voice signal
%          passTheSignal = the variable that can be 1 or -1 and checks whether
%                          the spoken phrase has the same number of words as the password
%
% Outputs: filtered_sig      = the filtered signal
%          thre2             = the matrix that contains the start and end indices of the words
%          passTheSignalTest = 1 when the voice signal has the same number of words, -1 when it doesn't
%
% Operation: this function filters the voice signal and returns the start
%            and end indices of the words

% setting the offset of the signal to zero
mean_orig_sig = mean(password);
password = abs(password - mean_orig_sig);

% convolving the signal with a triangle window to filter out the high frequencies
tri = triang(512);
filtered_sig = conv(password, tri);

% normalize the signal by dividing the signal by the maximum value
filtered_sig = filtered_sig / max(filtered_sig);

% more filtering by setting a threshold = 0.2
thre1 = filtered_sig > 0.2;

start = 0; finish = 0;
filtered_sig = filtered_sig .* thre1;

% the matrix that holds the start and end points of the vowels
thre2 = [];

% this variable makes sure that the lengths of the vowels are reasonable
l_limit = 1500;

% number of words
number = 3;

% returns the original flag
passTheSignalTest = passTheSignal;

% the following code finds the start and end points of the vowels and stores
% the indices in thre2
for i = 1:(length(filtered_sig) - 1)
    if (filtered_sig(i) == 0 && filtered_sig(i+1) >= 0)
        start = (i+1);
    elseif (filtered_sig(i) >= 0 && filtered_sig(i+1) == 0)
        finish = (i);
    end
    if ((finish - start) > l_limit)
        thre2 = [thre2; [start finish]];
    else
        filtered_sig(start:finish) = 0;
    end
end

% comparing the number of spoken words with the number of words in the password
if (length(thre2) == number)
    passTheSignalTest = 1;
end

% formantgen finds the formants of the password phrase.

function [formant] = formantgen(signal, order)
%
% Inputs:  signal  = the input signal
%          order   = order of the autoregressive (AR) model used to estimate the PSD
% Outputs: formant = the power spectral density (PSD) of the signal in dB scale

% finding the PSD of the signal using the Yule-Walker method with the specified order
formant = pyulear(signal, order);

% converting the PSD to dB scale
formant = 10*log10(formant);

% plot the PSD
plot(formant);

% comparor2 compares the formants from the input voice signal to the
% formants in the database and determines the identity of the speaker.
%
% signal    = input voice signal
% indices   = beginning and ending of each vowel segment
% databank  = database of index and magnitude of all the peaks of the formants
%             of each vowel obtained from the members of our group
% databank2 = database of index and magnitude of the first two peaks of the formants

function [identity scores indice peak] = comparor2(signal, indices, databank, databank2)

% Allowable range in peak position comparison. Determines the security level.
n2 = 1;

% Number of vowels in the voice signal
n = length(indices(:,1));

% Identity of the speaker. Set to be undetermined (denoted as -1)
identity = -1;

if n < 3 || n > 3
    identity = -1;                  % If more or fewer than 3 syllables, identity remains undetermined.
else
    [row column] = size(databank);  % Size of databank is obtained.
    scores = [];                    % scores for all three vowels and all members
    indice = [];                    % indices of the peaks in the formants of the input signal
    peak   = [];                    % magnitudes of the peaks in the formants of the input signal

    % Repeat the process as many times as the number of vowels in the input signal.
    for i = 1:n
        % Gets the segment that contains a single vowel and its formant.
        signal_seg = signal(indices(i,1):indices(i,2));
        formant = formantgen(signal_seg, 20);

        % Gets indices and magnitudes of all peaks of the formant; save them for viewing.
        [index peaks] = peakmaster3(formant, 2);
        indice = [indice index'];
        peak   = [peak peaks'];

        if length(index) < 3        % If there are less than 3 peaks in one formant,
            identity = -1;          % set the identity as undetermined and
            break;                  % break
        end

        % scores obtained by comparing the formants of the input signal
        % with the formants of different people for the selected vowel.
        score = [];

        % repeat as many times as the number of rows in databank
        for j = 1:row
            s = 0;                  % saves points from the x-axis comparison
            p = 0;                  % saves points from the y-axis comparison

            % repeat as many times as the number of peaks in the formant
            for k = 1:length(index)
                % repeat as many times as the number of peaks stored for this vowel and person
                for l = 1:length(databank{j,i}(:,1))
                    % if the x-axis positions match (comparison operator illegible in the
                    % scanned listing; '>=' is assumed here)
                    if (index(k) >= (databank{j,i}(l,1) - ceil(databank{j,i}(l,2))))
                        s = s + 1;  % Add 1 to the total score for the x-axis comparison

                        % If the magnitude of the peak that passed the x-axis comparison
                        % falls in the y-axis range ('>=' assumed, as above)
                        if peaks(k) >= (databank{j,i}(l,3) - n2 - (databank{j,i}(l,4)))
                            p = p + 2;  % Add 2 to the total score for the y-axis comparison
                        end
                    end
                end
            end

            if (l - s) < 3          % If there are less than 3 mismatches in x-axis
                s = s + 1;          % bonus 1 point
            elseif (l - s) > 5      % (operator illegible in the source; '>' assumed)
                p = p + 3;          % bonus 3 points
            elseif p > 7            % if there are more than 4 matches in y-axis
                p = p + 4;          % bonus 4 points
            end

            s = s + p;              % Final score
            score = [score; s];     % Save the final score for each vowel
        end

        scores = [scores score];    % Save the final scores for all the vowels
    end
end

sums = [];      % sum for each person in the database

% By summing each row in scores, the total score for each person in the database is obtained.
for z = 1:length(scores(:,1))
    if length(find(scores(z,:) == 0)) > 0
        sums = [sums; (sum(scores(z,:)) - 3)];
    else
        sums = [sums; sum(scores(z,:))];
    end
end

% Find entries in the database that exceed the threshold.
absolute = find(sums >= 18);

% Find entries that might need a second chance
notsure = find(sums >= 10);

% Run the second comparison method
[identity2 points averages] = comparor(signal, indices, databank2);

% If any one of the entries in the database exceeds the threshold
if length(absolute) > 0
    if length(absolute) > 1
        % If there is more than one match, the one with the maximum score is the match.
        identity = find(max(sums) == sums);
    else
        % If there is only one, identity is determined.
        identity = absolute(1);
    end
% If no one passed the threshold, but some were close
elseif length(notsure) > 0
    % If the second comparison method agrees with any one of the candidates
    if length(find(notsure == identity2)) > 0
        identity = notsure(find(notsure == identity2));     % Identity is determined.
    else
        identity = 0;   % If the above criteria are not met, the speaker is an unknown person.
    end
else
    identity = 0;       % If there's no one above 10, the speaker is unknown.
end

% pitchfinder is called by recordMain2 to find the pitch of the signal.
% This function finds the average pitch of the speaker.
% Input  = signal (original wave signal), fs (sampling frequency)
% Output = pitch (in Hz) and cepstrum spectrum

function [pitch, cepstrum] = pitchfinder(signal, fs)

% rceps is a MATLAB function. It does the Fourier transform of the
% log of the Fourier transform of the original signal
signal_ceps = rceps(signal);

% Setting the limit threshold for the human voice pitch
upperlimit = 300;
lowerlimit = 70;
threshold = round(fs/upperlimit);
limit = round(fs/lowerlimit);

% taking the signal from the specific section of the cepstrum-ed signal
cepstrum_orig = signal_ceps(threshold:limit);

% applying a Butterworth filter on the cepstrum-ed signal
[b a] = butter(10, 0.1);
cepstrum = filtfilt(b, a, cepstrum_orig);

% pitch is the sampling frequency divided by the maximum peak in the cepstrum
% domain. The max peak in the cepstrum domain is the fundamental frequency
pitch = fs/(find(max(cepstrum_orig) == cepstrum_orig) + threshold);


% peakmaster3 finds the peaks of the input signal.
% input:  signal of the wave that we want to find peaks for; Thold is the threshold value
% output: gives the index and the height of the peaks

function [Ta Tb] = peakmaster3(signal, Thold)

% starting out with threshold values
v = 0; x = 1; T = [];
prev = signal(1);
minV = min(signal);
max = [0, minV];
l = length(signal);

% loop that compares each value... if the next value is higher, then it is counted as a peak
while x < l
    if signal(x) > max(2) && signal(x) > prev
        max = [x, signal(x)];
        v = v + signal(x) - prev;
    else
        if (max(2) ~= minV) && (signal(x) < (max(2) - Thold))
            if v > Thold - 3.9
                T = [T; max];
            end
            max(2) = minV;
            v = 0;
        end
    end
    prev = signal(x);
    x = x + 1;
end

% return the peak indices and peak heights (this assignment is missing from the
% scanned listing; it is restored here based on how peakmaster3 is called)
if isempty(T)
    Ta = []; Tb = [];
else
    Ta = T(:,1); Tb = T(:,2);
end

% comparor takes an input voice signal and extracts segments that contain
% vowels. Then gets the formants for each vowel and compares the first two
% peaks with the data in databank.
%
% signal   = input voice signal
% indices  = beginnings and endings of each vowel
% databank = database of the first two peaks of the formant of each vowel
%            for all the members of our group.

function [identity scores averages] = comparor(signal, indices, databank)

scores = [];                 % Final scores for all three vowels for all the members

n = length(indices(:,1));    % number of vowels in the input signal
identity = -1;               % identity is initially set to be undetermined

% repeats the comparison process as many times as the number of vowels in the input signal
for i = 1:n
    signal_seg = signal(indices(i,1):indices(i,2));   % gets the nth segment
    formant = formantgen(signal_seg, 20);             % gets the formant of the segment
    [index peaks] = peakmaster3(formant, 2);          % finds the peaks in the formant
    index = index';
    peaks = peaks';

    % If there are less than three peaks, identity is undetermined and break out.
    if length(index) < 3
        identity = -1;
        break;
    end

    vowelscore = [];         % scores for all entries for the vowel being compared.
    for k = 1:length(databank(:,1))
        % percentage difference of the x-axis positions and the y-axis positions of the two peaks
        s = abs((index(1,1) - databank(k,(4*i-3))) / databank(k,(4*i-3)));
        s = s + abs((peaks(1,1) - databank(k,(4*i-2))) / databank(k,(4*i-2)));
        s = s + abs((index(1,2) - databank(k,(4*i-1))) / databank(k,(4*i-1)));
        s = s + abs((peaks(1,2) - databank(k,(4*i))) / databank(k,(4*i)));

        vowelscore = [vowelscore; s];   % sums the percentage differences
    end

    scores = [scores vowelscore];       % saves all the sums
end

% computes the average of the percentage differences in the three vowels for each person
averages = [];
for h = 1:length(scores(:,1))
    average1 = abs((scores(h,1) + scores(h,2) + 2*scores(h,3)) / 4);
    averages = [averages; average1];
end

% If the minimum average percentage difference is below 1 (threshold), the speaker is identified.
if min(averages) < 1
    identity = find(averages == min(averages));
else
    identity = 0;
end