Deoxyribonucleic acid (DNA) Biometrics CPSC 4600 Biometrics and Cryptography


DNA

DNA analysis is no longer confined to genetic and medical research.

Criminal Forensics:
– Forensic science relies heavily on the ability of DNA to identify the source of biological substances and determine who is most likely to have committed a crime.
– This ability to identify an individual is enhanced by the variety of substances that contain DNA, including blood, hair, urine, bone, teeth, and tissues.

DNA

Criminal Forensics:
– Using saliva, the FBI was able to match DNA samples from letters Theodore Kaczynski mailed to relatives with DNA obtained from stamps on letters mailed by the Unabomber (University and Airline Bomber).
– Identification of specimens using DNA has had other benefits: in one third of the cases where this technique has been used, DNA analysis has exonerated people wrongly accused of crimes.

DNA

Establishing paternity
– DNA analysis is now a common tool for establishing paternity, and it has been called on to identify remains after tragedies such as airline accidents.

Investigating migration of human beings and genetic disease
– Anthropologists are using DNA analysis to study the migration of human beings across the oceans.
– Historians employ these techniques to identify genetic disease in famous individuals.

Tracking endangered species
– Wildlife biologists use the variation of DNA sequences between species to track endangered species.

Features of DNA

DNA is composed of FOUR different chemical building blocks called "bases". These four bases are:
– adenine (A)
– guanine (G)
– thymine (T)
– cytosine (C)

Along each strand, the bases are joined together by strong covalent bonds. Two such strands are held together in a double helix because bases with complementary shapes can pair with each other.

Features of DNA (cont’d)

Adenine pairs with thymine, and guanine pairs with cytosine.

Complementary base pairs are found along the entire length of the DNA duplex.

The complementary nature of the two strands provides a basis for copying genetic information and for passing this information on to offspring.
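Because A always pairs with T and G with C, one strand fully determines the other. A minimal sketch of that rule, assuming sequences are plain uppercase strings over A, C, G, T; `complement` and `reverse_complement` are hypothetical helper names, not something from the slides:

```python
# Base-pairing rules: adenine-thymine, guanine-cytosine.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a strand (KeyError on non-ACGT input)."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complement read in the opposite direction, the usual way the partner strand is written."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("AGGCCTC"))          # TCCGGAG
    print(reverse_complement("AGGCCTC"))  # GAGGCCT
```

Applying `complement` twice returns the original strand, which mirrors how each strand can serve as a template for rebuilding the other.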

Features of DNA (cont’d)

Information is stored in DNA in the sequence of bases, just as information can be stored in a book in the sequence of letters.

One set of human chromosomes contains approximately 3 billion base pairs of DNA; each cell carries two such sets, organized as 23 pairs of chromosomes.

Every person inherits one set of 23 chromosomes from the mother and one set of 23 chromosomes from the father.
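To put the book analogy in numbers, here is a rough back-of-the-envelope estimate, assuming each position is simply one of four equally possible bases (2 bits per base) and about 3 billion base pairs per set of chromosomes; real genomes are compressible, so this is only an upper-bound style sketch:

$$\log_2 4 = 2 \text{ bits per base}, \qquad 3 \times 10^{9} \times 2 \text{ bits} = 6 \times 10^{9} \text{ bits} = 7.5 \times 10^{8} \text{ bytes} \approx 750 \text{ MB}$$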

Techniques used for DNA fingerprinting
1. Isolating the DNA in question from the rest of the cellular material in the nucleus.
2. Cutting the DNA into several pieces of different sizes (using restriction enzymes).
3. Sorting the DNA pieces by size (by gel electrophoresis).
4. Denaturing the DNA, so that all of the DNA is rendered single-stranded. This can be done either by heating or by chemically treating the DNA in the gel.
5. Blotting the DNA.
6. Detecting the DNA sequence of interest, e.g., AGGCCTC.

More: http://protist.biology.washington.edu/fingerprint/dnaintro.html
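Steps 2 and 3 can be imitated on a string to show why fragment sizes carry identifying information. A toy sketch, assuming the DNA is one uppercase string and using the EcoRI recognition site GAATTC (cut between G and AATTC) purely as an illustrative choice; real digestion acts on double-stranded DNA, and size sorting happens physically in a gel rather than in a `sorted` call. `digest` and `sort_by_size` are hypothetical names:

```python
def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `dna` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # the piece after the last cut
    return fragments


def sort_by_size(fragments: list[str]) -> list[str]:
    """Order fragments from longest to shortest, loosely mimicking bands on a gel."""
    return sorted(fragments, key=len, reverse=True)


if __name__ == "__main__":
    pieces = digest("AAGAATTCTTTTGAATTCA")
    print(pieces)                                  # ['AAG', 'AATTCTTTTG', 'AATTCA']
    print([len(p) for p in sort_by_size(pieces)])  # [10, 6, 3]
```

If two people differ in where such sites occur, their fragment lengths differ, which is the idea behind restriction fragment length polymorphism (RFLP).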


Polymerase Chain Reaction (PCR) for DNA Fingerprinting

Often DNA samples obtained from crime scenes are too small in quantity or too degraded by sunlight or high temperature to be analyzed by the restriction fragment length polymorphism (RFLP) method.

These samples are subjected to a different fingerprinting technique known as PCR.

PCR is a valuable technique because it provides a method for producing millions of copies of small regions of DNA.
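The "millions of copies" figure follows from repeated doubling. An idealized sketch, assuming every cycle exactly doubles every target molecule (real reactions are less than 100% efficient and eventually plateau); `pcr_copies` is a hypothetical name:

```python
def pcr_copies(initial_copies: int, cycles: int) -> int:
    """Idealized PCR yield: each cycle doubles the number of target molecules."""
    return initial_copies * 2 ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n))
```

Under this assumption a single starting molecule passes one million copies after 20 cycles, since 2^20 = 1,048,576.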

DNA Matching -- Sequence Alignment

Definition: Given two strings $x = x_1 x_2 \dots x_M$ and $y = y_1 y_2 \dots y_N$, an alignment is an assignment of gaps to positions $0, \dots, M$ in $x$ and $0, \dots, N$ in $y$, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.

Example sequences:

AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

One alignment of them:

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
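The definition can be checked mechanically. A small sketch, assuming gaps are written as "-" and that a column pairing two gaps is not allowed (the usual convention, since such a column lines up no letter); `is_valid_alignment` is a hypothetical name:

```python
def is_valid_alignment(x: str, y: str, row_x: str, row_y: str) -> bool:
    """True if the gapped rows form an alignment of x and y as defined above."""
    if len(row_x) != len(row_y):
        return False  # both rows must have the same number of columns
    if any(a == "-" and b == "-" for a, b in zip(row_x, row_y)):
        return False  # a column of two gaps lines up no letter at all
    # Deleting the gaps must give back the original sequences.
    return row_x.replace("-", "") == x and row_y.replace("-", "") == y


if __name__ == "__main__":
    x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
    y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
    print(is_valid_alignment(x, y,
                             "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---",
                             "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"))  # True
```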

What is a good alignment?

Alignment: the "best" way to match the letters of one sequence with those of the other.

How do we define "best"?

Alignment: a hypothesis that the two sequences come from a common ancestor through sequence edits.

Parsimonious explanation: find the minimum number of edits that transform one sequence into the other.

Scoring Function

Sequence edits, starting from AGGCCTC:
– Mutations: AGGACTC
– Insertions: AGGGCCTC
– Deletions: AGG-CTC

Scoring function:
Match: +m
Mismatch: -s
Gap: -d

$$F = (\#\,\text{matches})\, m - (\#\,\text{mismatches})\, s - (\#\,\text{gaps})\, d$$
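The scoring function translates directly into code. A sketch, assuming the two rows are already aligned (equal length, "-" for gaps) and that m, s and d are the positive magnitudes used in the formula; `alignment_score` is a hypothetical helper:

```python
def alignment_score(row_x: str, row_y: str, m: int = 1, s: int = 1, d: int = 1) -> int:
    """F = (#matches)*m - (#mismatches)*s - (#gaps)*d for two aligned rows."""
    matches = mismatches = gaps = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches * m - mismatches * s - gaps * d


if __name__ == "__main__":
    print(alignment_score("AGTA", "A-TA"))  # 3 matches, 1 gap -> 2
```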

How do we compute the best alignment?

x (length M): AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
y (length N): AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC

Too many possible alignments: $O(2^{M+N})$


DNA Matching -- Dot matrix method

The dot matrix method (dot plot method) is a graphical way of comparing two sequences.

In a dot matrix, two sequences to be compared are represented as horizontal and vertical axes of a two-dimensional diagram.

The comparison is done by scanning each residue of one sequence for similarity with all residues in the other sequence.

Dot matrix method
• If a residue match is found, a dot is placed within the graph. Otherwise, the matrix position is left blank.
• When the two sequences have substantial regions of similarity, many dots line up to form contiguous diagonal lines, which reveal the sequence alignment.
• Interruptions in the middle of a diagonal line indicate insertions and deletions. Parallel diagonal lines represent repetition.

Basically: diagonal lines = alignment; non-diagonal lines = gaps.
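A text-mode dot plot makes the "dot or blank" rule concrete. A minimal sketch, assuming an exact single-residue match rule (practical dot-plot tools often add a sliding window and threshold to suppress noise, which is omitted here); `dot_matrix` and `render` are hypothetical names:

```python
def dot_matrix(a: str, b: str) -> list[list[bool]]:
    """Cell [r][c] is True when residue b[r] equals residue a[c] (a across, b down)."""
    return [[ca == cb for ca in a] for cb in b]


def render(a: str, b: str) -> str:
    """Draw the matrix as text: '*' for a match (a dot), '.' for a blank position."""
    lines = ["  " + " ".join(a)]
    for cb, row in zip(b, dot_matrix(a, b)):
        lines.append(cb + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Identical sequences give an unbroken main diagonal plus scattered off-diagonal hits.
    print(render("AGGCTC", "AGGCTC"))
```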

Dynamic Programming

Dynamic programming is a method that determines the optimal alignment between two sequences.

Suppose we wish to align $x_1 \dots x_M$ with $y_1 \dots y_N$.

Let $F(i, j)$ be the optimal score of aligning $x_1 \dots x_i$ with $y_1 \dots y_j$.

Dynamic Programming (cont’d)

Three steps:
1. Create a two-dimensional alignment grid, as in the dot matrix method.
2. Accumulate scores in the matrix for matches and mismatches between the sequences.
3. Trace back through the matrix in reverse order to identify the highest-scoring path.

Dynamic Programming (cont’d)

Notice three possible cases:

1. $x_i$ aligns to $y_j$:

$x_1 \dots x_{i-1}\; x_i$
$y_1 \dots y_{j-1}\; y_j$

$$F(i,j) = F(i-1,j-1) + \begin{cases} m, & \text{if } x_i = y_j \\ -s, & \text{if not} \end{cases}$$

2. $x_i$ aligns to a gap:

$x_1 \dots x_{i-1}\; x_i$
$y_1 \dots y_j\; -$

$$F(i,j) = F(i-1,j) - d$$

3. $y_j$ aligns to a gap:

$x_1 \dots x_i\; -$
$y_1 \dots y_{j-1}\; y_j$

$$F(i,j) = F(i,j-1) - d$$

Match: +m, Mismatch: -s, Gap: -d

[Figure: cell F(i, j) is reached from F(i-1, j-1) with +m/-s, from F(i-1, j) with -d, or from F(i, j-1) with -d]

Dynamic Programming (cont’d)

How do we know which case is correct?

Inductive assumption: $F(i, j-1)$, $F(i-1, j)$ and $F(i-1, j-1)$ are optimal.

Then

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$

where $s(x_i, y_j) = m$ if $x_i = y_j$, and $-s$ if not.

Match: +m, Mismatch: -s, Gap: -d

Intuitive understanding of the algorithm

$F(i, j)$ is the maximum score from one of the three directions.

Match: +m, Mismatch: -s, Gap: -d

[Figure: F(i, j) is reached diagonally from F(i-1, j-1) (+m/-s), from F(i-1, j) (-d), or from F(i, j-1) (-d)]

Example

x = AGTA, y = ATA, with m = 1, s = 1, d = 1 (a match scores +1, a mismatch -1 and a gap -1)

F(i,j)  i =      0    1    2    3    4
                      A    G    T    A
j = 0            0   -1   -2   -3   -4
j = 1  A        -1    1    0   -1   -2
j = 2  T        -2    0    0    1    0
j = 3  A        -3   -1   -1    0    2

Optimal alignment: F(4, 3) = 2

AGTA
A-TA

Score = 3 matches + 0 mismatches + 1 gap = 3×1 + 0×(-1) + 1×(-1) = 2

The Needleman-Wunsch Matrix

[Figure: alignment grid with $x_1 \dots x_M$ along the top and $y_1 \dots y_N$ down the side]

Every nondecreasing path from (0, 0) to (M, N) corresponds to an alignment of the two sequences.

It can be thought of as a divide-and-conquer algorithm: the optimal alignment of the full sequences is built from optimal alignments of their prefixes.

The Needleman-Wunsch Algorithm

1. Initialization.
   a. $F(0, 0) = 0$
   b. $F(0, j) = -j\,d$
   c. $F(i, 0) = -i\,d$

2. Main iteration. Filling in partial alignments:
   a. For each $i = 1 \dots M$, for each $j = 1 \dots N$:

$$F(i,j) = \max\begin{cases} F(i-1,j) - d & \text{[case 1]} \\ F(i,j-1) - d & \text{[case 2]} \\ F(i-1,j-1) + s(x_i, y_j) & \text{[case 3]} \end{cases}$$

$$\mathrm{Ptr}(i,j) = \begin{cases} \text{UP}, & \text{if [case 1]} \\ \text{LEFT}, & \text{if [case 2]} \\ \text{DIAG}, & \text{if [case 3]} \end{cases}$$

3. Termination. $F(M, N)$ is the optimal score, and from $\mathrm{Ptr}(M, N)$ we can trace back the optimal alignment.
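The three steps translate into a short program. A Python sketch, assuming the scoring parameters are positive magnitudes (match +m, mismatch -s, gap -d) and breaking ties in favour of DIAG, then UP, then LEFT; the slides do not fix a tie rule, so other co-optimal alignments are possible. As on the slides, UP means the move from F(i-1, j) and LEFT the move from F(i, j-1):

```python
def needleman_wunsch(x: str, y: str, m: int = 1, s: int = 1, d: int = 1):
    """Global alignment sketch; returns (score, aligned_x, aligned_y, F)."""
    M, N = len(x), len(y)
    # F[i][j] = optimal score of aligning x_1..x_i with y_1..y_j
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr = [[""] * (N + 1) for _ in range(M + 1)]

    # 1. Initialization: leading gaps cost d each.
    for i in range(1, M + 1):
        F[i][0], ptr[i][0] = -i * d, "UP"
    for j in range(1, N + 1):
        F[0][j], ptr[0][j] = -j * d, "LEFT"

    # 2. Main iteration: best of the three cases for every cell.
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i - 1] == y[j - 1] else -s
            options = [
                (F[i - 1][j - 1] + sub, "DIAG"),  # x_i aligned to y_j
                (F[i - 1][j] - d, "UP"),          # x_i aligned to a gap
                (F[i][j - 1] - d, "LEFT"),        # y_j aligned to a gap
            ]
            # max() keeps the first option among equal scores, so DIAG wins ties.
            F[i][j], ptr[i][j] = max(options, key=lambda opt: opt[0])

    # 3. Termination: follow the pointers back from (M, N) to (0, 0).
    top, bottom = [], []
    i, j = M, N
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "DIAG":
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "UP":
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return F[M][N], "".join(reversed(top)), "".join(reversed(bottom)), F


if __name__ == "__main__":
    print(needleman_wunsch("AGTA", "ATA")[:3])  # (2, 'AGTA', 'A-TA')
```

On the AGTA / ATA example with m = s = d = 1, this sketch reproduces F(4, 3) = 2 and the alignment AGTA / A-TA. Both time and memory grow with the (M+1)(N+1) table.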

Performance

Time: O(NM)

Space: O(NM)

The local alignment problem

Given two strings $x = x_1 \dots x_M$, $y = y_1 \dots y_N$,

find substrings $x'$, $y'$ whose similarity (optimal global alignment value) is maximum.

e.g., x = aaaacccccgggg, y = cccgggaaccaacc

DP for local alignment

The classic application of dynamic programming in local alignment is the Smith-Waterman algorithm. In this algorithm, a similar tracing-back procedure is used. However, the alignment path may begin and end internally along the main diagonal, rather than only at the corners of the matrix. Gaps can be inserted if necessary. One or several aligned segments with the best scores can be obtained. This approach may be suitable for aligning divergent sequences, or sequences that have multiple domains that may be of different origins.

The Smith-Waterman algorithm

Idea: ignore badly aligning regions.

Modifications to Needleman-Wunsch:

Initialization: $F(0, j) = F(i, 0) = 0$

Iteration:

$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j) - d \\ F(i,j-1) - d \\ F(i-1,j-1) + s(x_i, y_j) \end{cases}$$

The Smith-Waterman algorithm

Termination:

1. If we want the best local alignment:

$$F_{\mathrm{OPT}} = \max_{i,j} F(i,j)$$

2. If we want all local alignments scoring > t:

For all i, j find F(i, j) > t, and trace back.
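The modifications above (a floor of 0, a zero first row and column, and a traceback that starts at the best cell and stops at the first 0) can be sketched in Python as follows. This is an illustration, assuming the same positive-magnitude parameters m, s, d, returning only the single best local alignment (option 1 above) and breaking traceback ties toward the diagonal:

```python
def smith_waterman(x: str, y: str, m: int = 1, s: int = 1, d: int = 1):
    """Best local alignment sketch; returns (score, aligned_x, aligned_y)."""
    M, N = len(x), len(y)
    # Initialization: first row and column stay 0.
    F = [[0] * (N + 1) for _ in range(M + 1)]
    best, best_cell = 0, (0, 0)

    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i - 1] == y[j - 1] else -s
            # The extra 0 lets a new local alignment start anywhere.
            F[i][j] = max(0,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d,
                          F[i - 1][j - 1] + sub)
            if F[i][j] > best:
                best, best_cell = F[i][j], (i, j)

    # Termination: trace back from the maximum cell until a 0 is reached.
    top, bottom = [], []
    i, j = best_cell
    while F[i][j] > 0:
        sub = m if x[i - 1] == y[j - 1] else -s
        if F[i][j] == F[i - 1][j - 1] + sub:
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(smith_waterman("ATTGC", "AGGC", m=1, s=0, d=1))  # (2, 'GC', 'GC')
```

For the ATTGC / AGGC example (match +1, mismatch 0, gap -1) this sketch reports a best score of 2. Because its traceback stops at the first 0, it returns GC / GC, which ties with a longer 3-match, 1-mismatch, 1-gap alignment of the same score.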

Smith-Waterman Algorithm (Example)

Align S1 = ATCTCGTATGATG and S2 = GTCTATCAC, with gap penalty d = 1.

[Figure: Smith-Waterman scoring matrix, S1 along the top and S2 down the side]

$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j) - 1 \\ F(i,j-1) - 1 \\ F(i-1,j-1) + S(X_i, Y_j) \end{cases}$$

$$S(X_i, Y_j) = \begin{cases} m, & \text{if } X_i = Y_j \\ -s, & \text{if not} \end{cases}$$

An example of Smith Waterman

Align with DP:

A T T G C
A G G C

Match: +1
Mismatch: 0
Gap: -1

[Figure: Smith-Waterman matrix for ATTGC against AGGC]

Score = 3 matches + 1 mismatch + 1 gap = 3×1 + 1×0 + 1×(-1) = 2

Advantages and disadvantages of DP

Advantages:
1. DP performs an exhaustive search, so it is guaranteed to find an optimal global alignment.

Disadvantages:
1. It sometimes results in many different alignments having the same score.
2. Its exhaustive nature makes it difficult to apply to searching a large database.


Biometrics Issues and concerns

Issues and concerns

Excessive concern with the biometric itself may eclipse other weaknesses that affect the performance of the technology. One could:
– plant DNA at the scene of the crime;
– associate another person's identity with one's own biometrics, thereby impersonating that person without arousing suspicion;
– interfere with the interface between a biometric device and the host system, so that a "fail" message gets converted to a "pass".

Identity theft and privacy issues

Two types of privacy concerns:
– Informational privacy. Relates to the unauthorized collection, storage, and usage of biometric information. For example, if someone’s iris scan is stolen and lets someone else access personal information or financial accounts, the damage could be irreversible, because a biometric trait cannot be replaced the way a password can.
– Personal privacy. Relates to an inherent discomfort individuals may feel when encountering biometric technology.
– Of the two, informational privacy is the more critical.


Defining Application-Specific Privacy Risk: The BioPrivacy Impact Framework

Certain types of biometric deployments are more prone than others to lead to privacy-invasive uses, while other types of deployments have little or no bearing on privacy.

Biometrics, in and of themselves, are neither a protector nor an enemy of privacy.

The type of deployment determines the relation between biometrics and privacy.

Biometric Deployments

Overt versus Covert
– User awareness and consent
– Notices and signs
– A covert system cannot permanently store biometric info collected from individuals who do not match watch lists.

Opt-in versus Mandatory
– A mandatory system runs greater privacy risks than a voluntary or opt-in system.
– Choice over whether one wants to provide one’s personal info is a central privacy principle.

Biometric Deployments

Verification versus Identification
– Identification (1:N) is more susceptible to privacy-related abuse than a system only capable of 1:1 matching (verification).

Fixed Duration versus Indefinite Duration
– When deployed for an indefinite duration, the risk increases.

Public Sector versus Private Sector
– Data in the public sector are more likely to be misused.
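The difference between 1:1 and 1:N matching shows up directly in code: verification touches one stored template, while identification touches every enrolled record, which is why it carries more privacy risk. A toy sketch in which templates are equal-length strings; the `similarity` measure (fraction of agreeing positions), the 0.9 threshold and the enrolled data are hypothetical stand-ins for a real matcher:

```python
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Hypothetical toy matcher: fraction of positions where two templates agree."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(p == q for p, q in zip(a, b)) / len(a)


def verify(sample: str, claimed_id: str, templates: dict[str, str],
           threshold: float = 0.9) -> bool:
    """1:1 verification: compare the sample with the claimed identity's template only."""
    template = templates.get(claimed_id)
    return template is not None and similarity(sample, template) >= threshold


def identify(sample: str, templates: dict[str, str],
             threshold: float = 0.9) -> Optional[str]:
    """1:N identification: search every enrolled template for the best match."""
    best_id, best_score = None, 0.0
    for person, template in templates.items():
        score = similarity(sample, template)
        if score >= threshold and score > best_score:
            best_id, best_score = person, score
    return best_id


if __name__ == "__main__":
    enrolled = {"alice": "AGGCCTCA", "bob": "TTGACCAG"}  # hypothetical data
    print(verify("AGGCCTCA", "alice", enrolled))  # True
    print(identify("AGGCCTCA", enrolled))          # alice
```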


Biometric Deployments

Citizen, Employee, Traveler, Student, Customer, Individual

User ownership versus Institutional Ownership of Biometric Data

Personal Storage versus Storage in Template Database

Sociological concerns

Physical concerns:
– Concern that biometric technology can cause physical harm to an individual using it, or that the instruments are unsanitary.

Personal information concerns:
– Whether our personal information taken through biometric methods can be misused, tampered with, or sold, e.g., by criminals stealing, rearranging or copying the biometric data.
– The data obtained using biometrics can be used in unauthorized ways without the individual's consent.

Sociological concerns

Society's fears about biometrics will persist over time. As the public becomes more educated about these practices and the methods become more widely used, these concerns will become more and more evident.

Biometric technology is being used at border crossings equipped with electronic readers that can read the chip in identity cards and verify the information stored on the card and in the passport.

Biometric methods can increase the efficiency and accuracy of identifying people at border crossings. CANPASS, by Canada Customs, is currently used at some major airports, where kiosks take digital pictures of a person’s eye as a means of identification.

Conclusions

Despite these misgivings, biometric systems have the potential to identify individuals with a very high degree of certainty.

Forensic DNA evidence enjoys a particularly high degree of public trust at present.

Substantial claims are also being made for iris recognition technology, which can discriminate between individuals with identical DNA, such as monozygotic twins.