Page 1: Reconstructing Sibling Relationships from Genotyping Data

Reconstructing Sibling Relationships from Genotyping Data

Saad SheikhDepartment of Computer ScienceUniversity of Illinois at Chicago

[Figure: "Brothers!" with two question marks]

Page 2: Reconstructing Sibling Relationships from Genotyping Data

Biological Motivation

• Used in: conservation biology, animal management, molecular ecology, genetic epidemiology.

• Necessary for: estimating heritability of quantitative characters, characterizing mating systems and fitness.

• But: parent/offspring pairs are hard to sample, while sampling cohorts of juveniles is easier.

[Figures: lemon sharks, Negaprion brevirostris; two brown-headed cowbird (Molothrus ater) eggs in a Blue-winged Warbler's nest.]

Page 3: Reconstructing Sibling Relationships from Genotyping Data

Basic Genetics

Gene: unit of inheritance

Allele: actual genetic sequence

Locus: location of an allele in the entire genetic sequence

Diploid: 2 alleles at each locus

Page 4: Reconstructing Sibling Relationships from Genotyping Data

Diploid Siblings

Siblings: two children with the same parents.

Question: given a set of children, find the sibling groups.

Father: (.../...), (a/b), (.../...), (.../...)

Mother: (.../...), (c/d), (.../...), (.../...)

Child: (.../...), (e/f), (.../...), (.../...)

[Figure: each parenthesized pair is the allele1/allele2 genotype at one locus; at each locus the child receives one allele from the father and one from the mother (recombination).]

Page 5: Reconstructing Sibling Relationships from Genotyping Data

Microsatellites (STR)

Advantages:

• Codominant (easy inference of genotypes and allele frequencies)

• Many alleles per locus (highly heterozygous)

• Possible to estimate other population parameters

• Cheaper than SNPs

But:

• Few loci

And:

• Large families

• Self-mating…

[Figure: a microsatellite is a short tandem repeat such as CACACACA (5' end shown); alleles #1, #2 and #3 differ in the number of CA repeats, giving the genotypes 1/1, 2/2, 3/3, 1/2, 1/3, 2/3.]
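Because microsatellites are codominant, every genotype is observable, so k alleles at a locus yield k(k+1)/2 distinguishable genotypes (6 for the three alleles in the figure). The short Python sketch below, with a hypothetical helper name `genotypes`, simply enumerates them:

```python
from itertools import combinations_with_replacement


def genotypes(num_alleles):
    """All unordered diploid genotypes a/b (a <= b) over alleles 1..num_alleles."""
    alleles = range(1, num_alleles + 1)
    return [f"{a}/{b}" for a, b in combinations_with_replacement(alleles, 2)]


# Three alleles (#1, #2, #3) give 3 * 4 / 2 = 6 distinguishable genotypes.
print(genotypes(3))  # ['1/1', '1/2', '1/3', '2/2', '2/3', '3/3']
```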

Page 6: Reconstructing Sibling Relationships from Genotyping Data

Sibling Reconstruction Problem

Animal | Locus 1 | Locus 2
1 | 1/2 | 11/22
2 | 1/3 | 33/44
3 | 1/4 | 33/55
4 | 1/3 | 77/66
5 | 1/3 | 33/44
6 | 1/3 | 33/77
7 | 1/5 | 88/22
8 | 1/6 | 22/22

(Genotypes are written as allele1/allele2.)

Sibling groups: {2, 4, 5, 6}, {1, 3}, {7, 8}

S = {P1 = {2,4,5,6}, P2 = {1,3}, P3 = {7,8}}

Page 7: Reconstructing Sibling Relationships from Genotyping Data

Existing Methods

Method | Approach | Error detection | Assumptions
Almudevar & Field (1999, 2003) | Minimal sibling groups under likelihood | No | Minimal sibgroups, representative allele frequencies
KinGroup (2004) | Markov Chain Monte Carlo/ML | No | Allele frequencies etc. are representative
Family Finder (2003) | Partition population using likelihood graphs | No | Allele frequencies etc. are representative
Pedigree (2001) | Markov Chain Monte Carlo/ML | No | Allele frequencies etc. are representative
COLONY (2004) | Simulated annealing | Yes | Monogamy for one sex
Fernandez & Toro (2006) | Simulated annealing | No | Co-ancestry matrix is a good measure; parents can be reconstructed or are available

Page 8: Reconstructing Sibling Relationships from Genotyping Data

KINSHIP

David C. Queller and Keith F. Goodnight.

Computer software for performing likelihood tests of pedigree relationship using genetic markers.

Molecular Ecology, 8:1231–1234, 1999.

Page 9: Reconstructing Sibling Relationships from Genotyping Data

KINSHIP

First software and likelihood measure for sibling/kinship reconstruction

Estimates a ratio of two likelihoods: Primary vs. Null Hypothesis

Assumes population allele frequencies are known

Page 10: Reconstructing Sibling Relationships from Genotyping Data

Probability of sharing alleles

R: probability of alleles being identical by descent

Rp = Pr(Xp = Yp), Rm = Pr(Xm = Ym)

Relationship | Rp | Rm
Mother–offspring | 0.0 | 1.0
Father–offspring | 1.0 | 0.0
Full siblings | 0.5 | 0.5
Full sisters (haplodiploid) | 1.0 | 0.5
Half siblings (maternal) | 0.0 | 0.5
Cousins (maternal) | 0.0 | 0.3
Unrelated | 0.0 | 0.0

Page 11: Reconstructing Sibling Relationships from Genotyping Data

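Haploid Likelihood

Two individuals X = <X> and Y = <Y>, with population allele frequencies $P_X$ and $P_Y$.

If X = Y:

$L = \Pr(\text{draw } X) \cdot \Pr(Y = X) = P_X\,[R + (1 - R)\,P_X]$

Otherwise:

$L = \Pr(\text{draw } X) \cdot \Pr(\text{draw } Y \ne X) = P_X\,(1 - R)\,P_Y$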

Page 12: Reconstructing Sibling Relationships from Genotyping Data

Diploid Individuals

Diploid individuals X = <Xp/Xm>, Y = <Yp/Ym>

Assumptions:

• We know which alleles are the mother's and which are the father's

• No inbreeding

Likelihood = Likelihood_p × Likelihood_m

Loci are independent, so the total likelihood is the product of the likelihoods across loci.

Page 13: Reconstructing Sibling Relationships from Genotyping Data

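Calculating Likelihood

Population frequencies: $P_{X_m}, P_{X_p}, P_{Y_m}, P_{Y_p}$

Likelihoods:

Xm = Ym, Xp = Yp: $P_{X_m}[R_m + (1 - R_m)P_{X_m}] \times P_{X_p}[R_p + (1 - R_p)P_{X_p}]$

Xm = Ym, Xp ≠ Yp: $P_{X_m}[R_m + (1 - R_m)P_{X_m}] \times P_{X_p}(1 - R_p)P_{Y_p}$

Xm ≠ Ym, Xp = Yp: $P_{X_m}(1 - R_m)P_{Y_m} \times P_{X_p}[R_p + (1 - R_p)P_{X_p}]$

Xm ≠ Ym, Xp ≠ Yp: $P_{X_m}(1 - R_m)P_{Y_m} \times P_{X_p}(1 - R_p)P_{Y_p}$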
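The likelihood table above translates directly into code. This is a minimal Python sketch, not the KINSHIP implementation: it assumes phase is known (each genotype is a (maternal, paternal) tuple), that allele frequencies are given as dictionaries, and that loci are independent; the function names are hypothetical.

```python
import math

# (Rm, Rp) pairs taken from the relationship table above.
FULL_SIBS = (0.5, 0.5)
UNRELATED = (0.0, 0.0)


def locus_likelihood(x, y, freqs, r_m, r_p):
    """Likelihood of genotypes X and Y at one locus under coefficients (Rm, Rp).

    x, y: (maternal, paternal) alleles, i.e. phase is assumed known.
    freqs: population allele frequencies, assumed known and representative.
    """
    (xm, xp), (ym, yp) = x, y
    if xm == ym:
        maternal = freqs[xm] * (r_m + (1 - r_m) * freqs[xm])
    else:
        maternal = freqs[xm] * (1 - r_m) * freqs[ym]
    if xp == yp:
        paternal = freqs[xp] * (r_p + (1 - r_p) * freqs[xp])
    else:
        paternal = freqs[xp] * (1 - r_p) * freqs[yp]
    return maternal * paternal


def likelihood_ratio(x_loci, y_loci, freq_loci, primary, null):
    """Primary/null ratio; loci are independent, so each total is a product over loci."""
    def total(r):
        return math.prod(locus_likelihood(x, y, f, *r)
                         for x, y, f in zip(x_loci, y_loci, freq_loci))
    return total(primary) / total(null)


freqs = {1: 0.5, 2: 0.25, 3: 0.25}
print(likelihood_ratio([(1, 2)], [(1, 3)], [freqs], FULL_SIBS, UNRELATED))  # 0.75
```

For this single-locus pair the full-sib/unrelated ratio is 0.75, so the pair is slightly more likely to be unrelated; on its own, though, the ratio says nothing about significance.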

Page 14: Reconstructing Sibling Relationships from Genotyping Data

Likelihood Ratios

An individual likelihood on its own is not very reliable or meaningful.

Different loci give different ratios, and a ratio is not a measure of statistical significance: simulations are used to determine P-values.

Page 15: Reconstructing Sibling Relationships from Genotyping Data

Statistical Significance

1. Randomly generate an individual X using the allele frequencies.

2. Draw Y using Rm and Rp:
   • First allele: copy X's allele with probability Rm, otherwise draw it from the allele frequencies.
   • Second allele: copy X's allele with probability Rp, otherwise draw it from the allele frequencies.

3. Draw a large number of such <X, Y> pairs.

4. The value of the ratio that excludes 95% of such pairs is the threshold for significance at P = 0.05.
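The simulation procedure can be sketched as follows. This Python version is an illustration under stated assumptions: pairs are simulated under the null hypothesis, a non-copied allele is drawn from the population frequencies, and the threshold is taken as the empirical (1 - alpha) quantile of the simulated ratios; all names are hypothetical.

```python
import random


def pair_likelihood(x, y, freqs, r_m, r_p):
    """Product over loci of the per-locus likelihood table (phase assumed known)."""
    total = 1.0
    for (xm, xp), (ym, yp), f in zip(x, y, freqs):
        m = f[xm] * (r_m + (1 - r_m) * f[xm]) if xm == ym else f[xm] * (1 - r_m) * f[ym]
        p = f[xp] * (r_p + (1 - r_p) * f[xp]) if xp == yp else f[xp] * (1 - r_p) * f[yp]
        total *= m * p
    return total


def draw(f, rng):
    """Draw one allele according to the frequency table f."""
    return rng.choices(list(f), weights=list(f.values()))[0]


def simulate_pair(freqs, r_m, r_p, rng):
    """X from the allele frequencies; Y copies X's alleles with probability Rm / Rp."""
    x, y = [], []
    for f in freqs:
        xm, xp = draw(f, rng), draw(f, rng)
        ym = xm if rng.random() < r_m else draw(f, rng)
        yp = xp if rng.random() < r_p else draw(f, rng)
        x.append((xm, xp))
        y.append((ym, yp))
    return x, y


def critical_ratio(freqs, primary, null, n_pairs=10000, alpha=0.05, seed=0):
    """Ratio value exceeded by at most about `alpha` of pairs simulated under `null`."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_pairs):
        x, y = simulate_pair(freqs, *null, rng)
        ratios.append(pair_likelihood(x, y, freqs, *primary)
                      / pair_likelihood(x, y, freqs, *null))
    ratios.sort()
    return ratios[min(n_pairs - 1, int((1 - alpha) * n_pairs))]


# Hypothetical 4-locus panel with the same allele frequencies at every locus.
freqs = [{1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}] * 4
threshold = critical_ratio(freqs, primary=(0.5, 0.5), null=(0.0, 0.0))
```

A pair whose observed full-sib/unrelated ratio exceeds `threshold` would then be called significant at roughly P = 0.05.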

Page 16: Reconstructing Sibling Relationships from Genotyping Data

Family Finder

Jen Beyer and B. May.

A graph-theoretic approach to the partition of individuals into full-sib families.

Molecular Ecology, 12:2243–2250, 2003.

Page 17: Reconstructing Sibling Relationships from Genotyping Data

Graph Theory?

• Build a graph of all individuals.

• Connect individuals with edges representing relationships.

• Assign the full-sib/unrelated likelihood ratio as a distance measure.

• Filter edges using the likelihood ratio at the 0.05 significance level.

• Find a cut.

Page 18: Reconstructing Sibling Relationships from Genotyping Data

Algorithm

1. Calculate LFS/LUR likelihood ratios for all pairs.

2. Build a graph representing the full-sib relationships.

3. Find the connected components in the graph and store them in a queue.

4. While the queue is not empty:
   a. Remove a component from the queue and calculate its score.
   b. Build a Gomory-Hu (GH) cut tree for the component.
   c. For each cut with fewer than 1/3 of the total number of edges in the component: score the components that would result if the cut's edges were removed; if the scores are the best found so far, store them.
   d. If the best scores found are higher than the score of the original component, separate the families and put them in the queue for further analysis. Otherwise, save the original component as a result family.
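The first steps of the algorithm (filter pairs by their LFS/LUR ratio, then split the graph into connected components) can be sketched with a union-find in Python. The ratio values and threshold below are made up for illustration, the function names are hypothetical, and the Gomory-Hu cut-tree refinement that Family Finder applies afterwards is not shown.

```python
def significant_edges(ratios, threshold):
    """Keep the pairs whose LFS/LUR ratio exceeds the critical value."""
    return [pair for pair, r in ratios.items() if r > threshold]


def connected_components(individuals, edges):
    """Union-find over the filtered full-sib graph; returns a list of sets."""
    parent = {v: v for v in individuals}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    components = {}
    for v in individuals:
        components.setdefault(find(v), set()).add(v)
    return list(components.values())


# Hypothetical ratios for the eight animals of the earlier example.
ratios = {(1, 3): 12.0, (2, 4): 8.5, (4, 5): 0.7, (5, 6): 20.0, (7, 8): 15.0, (2, 6): 9.1}
edges = significant_edges(ratios, threshold=5.0)
print(connected_components(range(1, 9), edges))
```

Note that 4 and 5 land in the same component even though their own edge was filtered out, because they are connected through 2 and 6; splitting such components is exactly what the cut-tree step is for.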

Page 19: Reconstructing Sibling Relationships from Genotyping Data

Example

Score the components and keep the best cuts.

Page 20: Reconstructing Sibling Relationships from Genotyping Data

Conclusion – Family Finder

• Some theoretical basis

• Efficiently computable

• Produces reasonably good results for many loci

• A lot of assumptions because of the Queller & Goodnight measure

• Requires a significant number of loci (8+)

• Works well only when families are of almost equal size

Page 21: Reconstructing Sibling Relationships from Genotyping Data

Parsimony

Parsimony = Occam's Razor: "entities must not be multiplied beyond necessity"; "plurality should not be posited without necessity"

“Parsimony is a 'less is better' concept of frugality, economy or caution in arriving at a hypothesis or course of action. The word derives from Middle English parcimony, from Latin parsimonia, from parsus, past participle of parcere: to spare. It is a general principle that has applications from science to philosophy and all related fields. Parsimony is essentially the implementation of Occam's razor.” (Wikipedia)

Min sib groups = most parsimonious explanation

Page 22: Reconstructing Sibling Relationships from Genotyping Data

Mendelian Constraints

4-allele rule: siblings have at most 4 different alleles at a locus.

Yes: 3/3, 1/3, 1/5, 1/6
No: 3/3, 1/3, 1/5, 1/6, 3/2

2-allele rule: at a locus in a sibling group, a + R ≤ 4, where

a = number of distinct alleles
R = number of alleles that appear with 3 other alleles or are homozygous

Yes: 3/3, 1/3, 1/5
No: 3/3, 1/3, 1/5, 1/6
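Both Mendelian constraints can be checked mechanically. The Python sketch below computes a and R for one locus, reading "appear with 3 others" as "appear with at least 3 other alleles" (an allele seen with more than 3 partners already breaks the 4-allele rule); the function names are hypothetical.

```python
def allele_stats(genotypes):
    """Return (a, R) for one locus of a candidate sibling group.

    a: number of distinct alleles.
    R: number of alleles that are homozygous or appear with at least 3 other alleles.
    """
    partners, homozygous = {}, set()
    for x, y in genotypes:
        partners.setdefault(x, set())
        partners.setdefault(y, set())
        if x == y:
            homozygous.add(x)
        else:
            partners[x].add(y)
            partners[y].add(x)
    a = len(partners)
    r = sum(1 for allele, others in partners.items()
            if allele in homozygous or len(others) >= 3)
    return a, r


def four_allele_ok(genotypes):
    return allele_stats(genotypes)[0] <= 4


def two_allele_ok(genotypes):
    a, r = allele_stats(genotypes)
    return a + r <= 4


print(two_allele_ok([(3, 3), (1, 3), (1, 5)]))          # True  (a=3, R=1)
print(two_allele_ok([(3, 3), (1, 3), (1, 5), (1, 6)]))  # False (a=4, R=2)
```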

Page 23: Reconstructing Sibling Relationships from Genotyping Data

Min Sibgroups Reconstruction

Find the minimum number of sibling groups necessary to explain the given cohort.

Minimum Set Cover:

• Cohort as universe U

• Individuals as elements of U

• Covering groups C include all genetically feasible sibling groups

NP-complete even when sibsets have size at most 3. Hard to approximate (Ashley et al. 09). ILP formulation (Chaovalitwongse et al. 08).

Page 24: Reconstructing Sibling Relationships from Genotyping Data

Minimum Set Cover

Given: universe U = {1, 2, …, n} and a collection of sets S = {S1, S2, …, Sm}, where each Si ⊆ U.

Find: the smallest number of sets in S whose union is the universe U:

$\min_{I \subseteq [m]} |I| \quad \text{such that} \quad \bigcup_{i \in I} S_i = U$

Minimum Set Cover is NP-hard and (1 + ln n)-approximable (sharp).
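For illustration, the classic greedy heuristic that achieves the (1 + ln n) bound is only a few lines of Python. It is not the ILP-based exact method used in 2-Allele Min Set Cover, and the candidate groups in the usage example are hypothetical.

```python
def greedy_set_cover(universe, candidate_sets):
    """Greedy (1 + ln n)-approximation: repeatedly take the set covering most uncovered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(candidate_sets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("candidate sets do not cover the universe")
        cover.append(best)
        uncovered -= best
    return cover


# Hypothetical feasible sibling groups for the eight-animal example.
groups = [{2, 4, 5, 6}, {1, 3}, {7, 8}, {1, 2}, {5, 7}]
cover = greedy_set_cover(range(1, 9), groups)
print(len(cover))  # 3
```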

Page 25: Reconstructing Sibling Relationships from Genotyping Data

2-Allele Min Set Cover

1. Generate all maximal feasible sibling groups (sets) that satisfy the 2-allele property using the "2-Allele Algorithm" [ISMB 2007; Bioinformatics 23(13)].

2. Use Min Set Cover to find the minimum sibling groups, optimally using ILP (CPLEX).

Page 26: Reconstructing Sibling Relationships from Genotyping Data

2-Allele Algorithm Overview

Generate candidate sets from all pairs of individuals. Compare every set to every individual x:

• If x can be added to the set without affecting "accommodability" or violating the 2-allele property: add it.

• If "accommodability" is affected but the 2-allele property is still satisfied: create a new copy of the set and add x to it.

• Otherwise, ignore the individual and compare the next.

Page 27: Reconstructing Sibling Relationships from Genotyping Data

Canonical families

ID | alleles
1 | 1/2
2 | 2/3
3 | 2/1
4 | 1/3
5 | 3/2
6 | 1/4

Canonical genotypes: 1/1, 1/2, 1/3, 1/4, 2/2, 2/3, 2/4, 3/4, 3/3, 4/4

[Figure: enumeration of the canonical families over these genotypes.]

ID | alleles (original values)
1 | 55/43
2 | 43/114
3 | 43/55
4 | 55/114
5 | 114/43
6 | 55/78

(Relabeling 55 → 1, 43 → 2, 114 → 3, 78 → 4 gives the canonical table above.)

Page 28: Reconstructing Sibling Relationships from Genotyping Data

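Examples

[Figure: three cases of comparing an individual to a candidate group, labelled "Add", "New group, add (won't accommodate 2/2)" and "Can't add (a + R = 4)". Groups shown: 1/4, 1/2, 3/4 with candidate 3/2; 1/4, 1/2, 3/2 with candidate 3/2; 1/4, 1/2, 1/1 with candidate 1/5.]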

Page 29: Reconstructing Sibling Relationships from Genotyping Data

Testing and Validation: Protocol

1. Get a dataset with known sibgroups (real or simulated).

2. Find sibgroups using our algorithm.

3. Compare the solutions (partition distance, Gusfield '03).

4. Compare results to other sibship methods.
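Partition distance can be computed as n minus the maximum total overlap of a one-to-one matching between the groups of the two partitions. The Python sketch below (hypothetical function name) does the matching by brute force over permutations, which is fine for a handful of groups; a real evaluation would use an assignment-problem solver instead. It assumes both partitions cover the same individuals.

```python
from itertools import permutations


def partition_distance(p, q):
    """Minimum number of elements to remove so that the two partitions agree,
    computed as n minus the largest total overlap of a one-to-one matching
    between groups. Brute force, so only suitable for small inputs."""
    p, q = [set(g) for g in p], [set(g) for g in q]
    n = len(set().union(*p))
    if len(p) < len(q):
        p, q = q, p
    q = q + [set()] * (len(p) - len(q))  # pad with empty groups
    best = max(sum(len(a & b) for a, b in zip(p, perm)) for perm in permutations(q))
    return n - best


truth = [{2, 4, 5, 6}, {1, 3}, {7, 8}]
found = [{2, 4, 5}, {6, 1, 3}, {7, 8}]  # hypothetical reconstruction
print(partition_distance(truth, found))  # 1
```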

Page 31: Reconstructing Sibling Relationships from Genotyping Data

Random Data Generation

• Generate F females and M males (F = M = 5, 10, 15), each with l loci (l = 2, 4, 6).

• Each locus has a alleles: a[uniform] = 5, 10, 15; a[nonuniform] = 4 (12-4-1-1).

• Generate f families: f[uniform] = 2, 5, 10; f[nonuniform] = 5.

• For each family, select a female and a male uniformly at random.

• For each parent pair, generate o offspring: o[uniform] = 2, 5, 10; o[nonuniform] = 25-10-10-4-1.

• For each offspring, for each locus, choose the allele outcome uniformly at random.
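A sketch of the uniform-frequency setting of this generator in Python, assuming alleles are integers 1..a and that the same female/male pair may be drawn for more than one family (the slide does not say); function names are hypothetical and the non-uniform settings are not modelled.

```python
import random


def random_parents(n_females, n_males, n_loci, n_alleles, rng):
    """Each parent gets two alleles per locus, drawn uniformly from 1..n_alleles."""
    def individual():
        return [(rng.randint(1, n_alleles), rng.randint(1, n_alleles)) for _ in range(n_loci)]
    return ([individual() for _ in range(n_females)],
            [individual() for _ in range(n_males)])


def random_families(females, males, n_families, n_offspring, rng):
    """One uniformly random female and male per family; each offspring inherits,
    at every locus, one allele chosen uniformly from each parent."""
    families = []
    for _ in range(n_families):
        mother, father = rng.choice(females), rng.choice(males)
        offspring = [[(rng.choice(m), rng.choice(f)) for m, f in zip(mother, father)]
                     for _ in range(n_offspring)]
        families.append((mother, father, offspring))
    return families


rng = random.Random(42)
females, males = random_parents(5, 5, n_loci=4, n_alleles=10, rng=rng)
families = random_families(females, males, n_families=2, n_offspring=5, rng=rng)
```

Keeping the (mother, father, offspring) triples gives the known sibgroups needed for step 1 of the validation protocol.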

Page 32: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 33: Reconstructing Sibling Relationships from Genotyping Data

Summary (Min Sib Groups)

2-Allele Min Set Cover:

• First combinatorial approach

• Makes no assumptions other than parsimony

• Works consistently and comparatively well

Sibling reconstruction:

• Growing number of methods

• Biologists need (one) reliable reconstruction

• Genotyping errors

Answer: consensus

Page 34: Reconstructing Sibling Relationships from Genotyping Data

Consensus Methods

Combine multiple solutions to a problem to generate one unified solution, C: S* → S.

• Based on social choice theory

• Commonly used where the real solution is not known, e.g. phylogenetic trees

[Figure: solutions S1, S2, …, Sk are combined by consensus into a single solution S.]

Page 35: Reconstructing Sibling Relationships from Genotyping Data

Strict Consensus

• Only Pareto optimality and anti-Pareto optimality are enforced.

• All solutions must agree on equivalence: $x \equiv_S y \iff x \equiv_{S_i} y \text{ for all } i$

• All disputed individuals go to singletons.

S1 = {{1,2,3},{4,5},{6,7}}, S2 = {{1,2,3,4},{5,6,7}}, S3 = {{1,2},{3,4,5},{6,7}}

S = {{1,2},{3},{4},{5},{6,7}}

5 sibling groups? When 3 can do?
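Strict consensus keeps a pair together only if every input solution does. A minimal Python sketch (hypothetical helper names, assuming all solutions partition the same individuals) that reproduces the example:

```python
from itertools import combinations


def together(partition):
    """Unordered pairs of individuals that a partition puts in the same group."""
    return {frozenset(pair) for group in partition for pair in combinations(sorted(group), 2)}


def strict_consensus(partitions):
    """x and y stay together only if every input solution puts them together;
    disputed individuals end up in smaller groups, often singletons."""
    elements = sorted(set().union(*partitions[0]))
    agreed = set.intersection(*(together(p) for p in partitions))
    groups = []
    for x in elements:
        if not any(x in g for g in groups):
            groups.append({x} | {y for y in elements if frozenset((x, y)) in agreed})
    return groups


S1 = [{1, 2, 3}, {4, 5}, {6, 7}]
S2 = [{1, 2, 3, 4}, {5, 6, 7}]
S3 = [{1, 2}, {3, 4, 5}, {6, 7}]
print(strict_consensus([S1, S2, S3]))  # [{1, 2}, {3}, {4}, {5}, {6, 7}]
```

Because the intersection of equivalence relations is itself an equivalence relation, building each group from its first member is enough.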

Page 36: Reconstructing Sibling Relationships from Genotyping Data

Majority Consensus

The majority of solutions determine the final solution: two individuals are together if a majority of solutions vote in their favour.

Violates transitivity: A ≡ B ∧ B ≡ C ⇒ A ≡ C

S1 = {{1,2,3},{4,5},{6,7}}, S2 = {{1,2,3,4},{5,6,7}}, S3 = {{1,2},{3,4,5},{6,7}}

1 ≡ 3 AND 3 ≡ 4 BUT 1 ≢ 4
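Counting pairwise votes makes the transitivity failure concrete. The Python sketch below (hypothetical helper name) tallies, for every pair, how many solutions put it together and keeps the pairs with a strict majority:

```python
from itertools import combinations


def majority_pairs(partitions):
    """Pairs (x, y), x < y, placed together by a strict majority of the solutions."""
    votes = {}
    for partition in partitions:
        for group in partition:
            for pair in combinations(sorted(group), 2):
                votes[pair] = votes.get(pair, 0) + 1
    return {pair for pair, v in votes.items() if 2 * v > len(partitions)}


S1 = [{1, 2, 3}, {4, 5}, {6, 7}]
S2 = [{1, 2, 3, 4}, {5, 6, 7}]
S3 = [{1, 2}, {3, 4, 5}, {6, 7}]
pairs = majority_pairs([S1, S2, S3])
print((1, 3) in pairs, (3, 4) in pairs, (1, 4) in pairs)  # True True False
```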

Page 37: Reconstructing Sibling Relationships from Genotyping Data

Voting Consensus

Majority under closure; results in large monolithic groups.

S1 = {{1,2,3},{4,5},{6,7}}, S2 = {{1,2,3,4},{5,6,7}}, S3 = {{1,2},{3,4,5},{6,7}}

1 ≡ 5 ?

S = {{1,2,3,4,5},{6,7}}
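Voting consensus takes the majority pairs and closes them under transitivity. A Python sketch using union-find (hypothetical function name) reproduces S above; note that 1 and 5 end up together although no input solution groups them:

```python
from itertools import combinations


def voting_consensus(partitions):
    """Majority vote on pairs, then transitive closure via union-find."""
    votes = {}
    for partition in partitions:
        for group in partition:
            for pair in combinations(sorted(group), 2):
                votes[pair] = votes.get(pair, 0) + 1
    elements = sorted(set().union(*partitions[0]))
    parent = {x: x for x in elements}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for (x, y), v in votes.items():
        if 2 * v > len(partitions):  # strict majority links x and y
            parent[find(x)] = find(y)
    groups = {}
    for x in elements:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


S1 = [{1, 2, 3}, {4, 5}, {6, 7}]
S2 = [{1, 2, 3, 4}, {5, 6, 7}]
S3 = [{1, 2}, {3, 4, 5}, {6, 7}]
print(voting_consensus([S1, S2, S3]))  # [{1, 2, 3, 4, 5}, {6, 7}]
```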

Page 38: Reconstructing Sibling Relationships from Genotyping Data

Consensus Methods

Commonly used consensus methods don't work [AAAI-MPREF08]:

• Strict consensus produces too many singletons

• Majority violates transitivity AND doesn't work for error tolerance

Page 39: Reconstructing Sibling Relationships from Genotyping Data

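Distance-based Consensus

Algorithm:

1. Compute a consensus solution S = {g1, ..., gk}.

2. Search for a good solution near S.

[Figure: solutions S1, S2, …, Sk are combined by consensus into Ss; a search guided by the distance function fd and the quality function fq then produces S.]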

Page 40: Reconstructing Sibling Relationships from Genotyping Data

Distance-based Consensus

Needs:

• A distance function fd: S × S → R

• A quality function fq: S → R

What is the catch? [Sheikh et al. CSB 2008] Optimization of fd, fq or an arbitrary linear combination of them is NP-complete (reduction from the 2-Allele Min Set Cover problem).

Page 41: Reconstructing Sibling Relationships from Genotyping Data

A Greedy Approach

Algorithm: compute a strict consensus; while the distance is not too large, merge the two nearest sibgroups.

Quality: fq = n - |C|

Distance function: fd(C, C') = cost of merging groups in C to obtain C'

Page 42: Reconstructing Sibling Relationships from Genotyping Data

A Greedy Approach

S1 = {{1,2,3},{4,5},{6,7}}, S2 = {{1,2,3},{4},{5,6,7}}, S3 = {{1,2},{3,4,5},{6,7}}

Strict consensus: S = {{1,2},{3},{4},{5},{6,7}}

Merge costs:

  | {1,2} | {3} | {4} | {5} | {6,7}
{1,2} | - | 3.5 | 1.1 | 2.5 | 5.1
{3} | 0.5 | - | 0.3 | 0.5 | 0.1
{4} | 1.0 | 3.0 | - | 0.6 | 1.1
{5} | 2.0 | 1.2 | 3.5 | - | 4.9
{6,7} | 0.6 | 0.9 | 1.2 | 4.1 | -

After merging {3} and {6,7}:

  | {1,2} | {3,6,7} | {4} | {5}
{1,2} | - | 3.5 | 1.1 | 2.5
{3,6,7} | 1.7 | - | 3.1 | 2.2
{4} | 1.0 | 3.0 | - | 0.6
{5} | 2.0 | 1.2 | 3.5 | -

S = {{1,2},{3,6,7},{4},{5}}
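One greedy step on the first cost table can be reproduced in Python. This sketch assumes each row lists the costs to the other groups in column order with the diagonal omitted, and that the two directions of a pair are combined with min, in the spirit of the fd formula used for greedy consensus; the helper name is hypothetical.

```python
def best_merge(groups, directed_costs):
    """Cheapest pair of groups to merge.

    directed_costs[a] lists the costs from group a to every other group, in the
    order of `groups` with a itself skipped. The two directions are combined with min.
    """
    cost = {}
    for a in groups:
        others = [b for b in groups if b != a]
        for b, c in zip(others, directed_costs[a]):
            cost[(a, b)] = c
    best = None
    for i, a in enumerate(groups):
        for b in groups[i + 1:]:
            c = min(cost[(a, b)], cost[(b, a)])
            if best is None or c < best[0]:
                best = (c, a, b)
    return best


groups = ["{1,2}", "{3}", "{4}", "{5}", "{6,7}"]
directed_costs = {
    "{1,2}": [3.5, 1.1, 2.5, 5.1],
    "{3}": [0.5, 0.3, 0.5, 0.1],
    "{4}": [1.0, 3.0, 0.6, 1.1],
    "{5}": [2.0, 1.2, 3.5, 4.9],
    "{6,7}": [0.6, 0.9, 1.2, 4.1],
}
print(best_merge(groups, directed_costs))  # (0.1, '{3}', '{6,7}')
```

The chosen merge matches the slide's next state, {3,6,7}; a full run would recompute the costs after each merge and stop once the cheapest merge exceeds the distance budget.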

Page 43: Reconstructing Sibling Relationships from Genotyping Data

Greedy Consensus

Distance function (sibgroup, sibgroup): cost of assigning all individuals,

$f_d(P_i, P_j) = \min\left(\sum_{X \in P_i} f_{assign}(P_j, X),\ \sum_{X \in P_j} f_{assign}(P_i, X)\right)$

Distance function (sibgroup, individual):

• Benefit: alleles and allele pairs shared

• Cost: minimum edit distance

$f_{assign}(P_i, X) = \begin{cases} \text{benefit} & \text{if } X \text{ can be a member of } P_i \\ \text{cost} & \text{if } X \text{ cannot be a member of } P_i \end{cases}$

Page 44: Reconstructing Sibling Relationships from Genotyping Data

Greedy Consensus

Algorithm:

1. Compute a strict consensus.

2. While the distance is not too large:
   • Merge the two sibgroups that minimize the TOTAL merging cost.
   • Store the new merging cost in the merged set.

Page 45: Reconstructing Sibling Relationships from Genotyping Data

Error-Tolerant Approach

[Figure: Locus 1, Locus 2, Locus 3, …, Locus k feed into the sibling reconstruction algorithm, which produces solutions S1, S2, …, Sk; these are combined by consensus into S.]

Page 46: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 47: Reconstructing Sibling Relationships from Genotyping Data

>90% accuracy for all real data

Results

Page 48: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 49: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 50: Reconstructing Sibling Relationships from Genotyping Data

Impossibility Result

A consensus method CANNOT be all of these [Arrow 1963, Mirkin 1975]:

• Fair

• Independent

• Pareto optimal

Biologically [AAAI-MPREF 2008]: the subset of individuals chosen will impact the consensus considerably.

Page 51: Reconstructing Sibling Relationships from Genotyping Data

Problems

Parametric. Does NOT outperform other algorithms on:

• Biological data

• Smaller families

• High allele frequencies

Page 52: Reconstructing Sibling Relationships from Genotyping Data

Auto Greedy Consensus

• Change costs to average per-locus costs.

• Compare max group error on a per-locus basis.

• Treat cost and benefit independently. In order to qualify, a merge must satisfy:
  Cost <= maxcost
  Benefit >= minbenefit
  Benefit = max benefit among possible merges
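The qualification rule can be written as a small filter-then-argmax. In this Python sketch the candidate merges, thresholds and the `pick_merge` name are hypothetical, and how per-locus costs and benefits are computed is not shown.

```python
def pick_merge(candidates, max_cost, min_benefit):
    """candidates: (pair, cost, benefit) triples, with cost and benefit already
    averaged per locus. Returns the qualifying merge with the largest benefit, or None."""
    qualifying = [c for c in candidates if c[1] <= max_cost and c[2] >= min_benefit]
    return max(qualifying, key=lambda c: c[2], default=None)


# Hypothetical candidate merges and thresholds.
candidates = [(("A", "B"), 0.2, 3.0), (("A", "C"), 0.9, 5.0), (("B", "C"), 0.1, 4.0)]
print(pick_merge(candidates, max_cost=0.5, min_benefit=2.0))  # (('B', 'C'), 0.1, 4.0)
```

Returning None when nothing qualifies gives the greedy loop a natural stopping condition.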

Page 53: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 54: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 55: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 56: Reconstructing Sibling Relationships from Genotyping Data

Summary (Consensus)

• First consensus method for sibship reconstruction; majority won't work

• First combinatorial approach for error-tolerant sibship reconstruction: fewer assumptions, more efficient

• Distance-based consensus is NP-hard

• New non-parametric consensus

Page 57: Reconstructing Sibling Relationships from Genotyping Data

Parsimony: Alternate Objectives

Min number of sibgroups is just ONE way to interpret parsimony.

Alternate objectives:

• Sibship that minimizes the number of parents (very hard! connection to Raz's Parallel Repetition Theorem)

• Sibship that minimizes the number of matings

• Sibship that maximizes family size

• Sibship that tries to satisfy uniform allele distributions

Page 58: Reconstructing Sibling Relationships from Genotyping Data

Parsimony: Minimize Parents

Problem statement: given a population U of individuals, partition the individuals into groups G such that the number of parents (mothers + fathers) necessary for G is minimized.

Observations and challenges:

• MinParents is intractable and inapproximable (reduction from the Min-Rep problem, Raz's Parallel Repetition Theorem)

• There may be $O(2^{|loci|})$ potential parents for a sibgroup

• Self-mating (plants) may or may not be allowed

Page 59: Reconstructing Sibling Relationships from Genotyping Data

Is MinParents = MinSibgroups? Not necessarily…

Parent | Genotype at locus 1
P1 | 1/10
P2 | 2/20
P3 | 3/30
P4 | 4/40
P5 | 3/50

Child | Parents | Genotype
A | P1-P2 | 1/20
B | P1-P2 | 2/10
C | P1-P2 | 10/20
D | P1-P3 | 1/30
E | P1-P3 | 10/3
F | P4-P2 | 4/20
G | P4-P2 | 40/2
H | P4-P3 | 40/3
I | P4-P3 | 4/30
J | P4-P5 | 4/50
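The example can be checked mechanically: every child should take one allele from each of its two listed parents. The Python sketch below (hypothetical helper name) verifies that and counts the parent pairs (matings) and distinct parents used.

```python
def consistent(child, p1, p2):
    """True if the child's two alleles can be split one from each parent."""
    a, b = child
    return (a in p1 and b in p2) or (b in p1 and a in p2)


parents = {"P1": (1, 10), "P2": (2, 20), "P3": (3, 30), "P4": (4, 40), "P5": (3, 50)}
children = {
    "A": ("P1", "P2", (1, 20)), "B": ("P1", "P2", (2, 10)), "C": ("P1", "P2", (10, 20)),
    "D": ("P1", "P3", (1, 30)), "E": ("P1", "P3", (10, 3)),
    "F": ("P4", "P2", (4, 20)), "G": ("P4", "P2", (40, 2)),
    "H": ("P4", "P3", (40, 3)), "I": ("P4", "P3", (4, 30)),
    "J": ("P4", "P5", (4, 50)),
}

all_ok = all(consistent(g, parents[m], parents[f]) for m, f, g in children.values())
matings = {(m, f) for m, f, _ in children.values()}
used = {p for pair in matings for p in pair}
print(all_ok, len(matings), len(used))  # True 5 5
```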

Page 60: Reconstructing Sibling Relationships from Genotyping Data

Min Parents Meta Approach

1. Generate M, a set of covering groups.

2. Cover a subset S of covering groups.

3. For each group x in S:
   1. Generate parent pairs for x.
   2. Insert parent vertices into graph G (if needed).
   3. Connect the parents in each parent pair.

4. Cover the minimum vertices necessary to (doubly) cover all the individuals.

Example: M = {{1,2},{3,6,7},{3,5},{2,4},{1,6},{2,5},{6,7}}, S = {{1,2,4},{3,5},{6,7}}

For X = {3,5}, the parent pairs are {F = 5/10, M = 2/20} and {F = 5/20, M = 2/10}, giving the vertices 5/10, 2/20, 5/20 and 2/10 in G.

Page 61: Reconstructing Sibling Relationships from Genotyping Data

Covering Groups

Different approaches to selecting a subset of maximal feasible groups:

• Greedy Min Set Cover

• K-greedy Min Set Covers

• All sets! (nearing optimality)

Forget maximal feasible sibling groups: generate K random minimal feasible sibling reconstructions.

Page 62: Reconstructing Sibling Relationships from Genotyping Data

Generating Parents

The number of generated parents is just too many!

• Mine association rules across loci: {A,B}locus1 => {C,D}locus2

• Use association rules to filter parents: {A,B}locus1 => {C,D}locus2 OR {C',D'}locus2

• Polygamy => high-confidence association rules

• No polygamy => Min Parents = Min Groups

• If self-mating is not allowed, odd cycles must be disallowed

Page 63: Reconstructing Sibling Relationships from Genotyping Data

Covering Vertices

Heuristic: while not all vertices are covered, select the vertex that will cover the most uncovered individuals.

MIP formulation
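A sketch of the greedy heuristic in Python. It assumes each individual comes with a non-empty list of candidate parent pairs and treats an individual as covered once both parents of one of its pairs are selected, which is one reading of "(doubly) cover"; the function name and the example input (the children of the MinParents example) are illustrative.

```python
def greedy_parent_cover(parent_pairs):
    """Repeatedly select the parent vertex that touches the most still-uncovered
    individuals; an individual is covered once both parents of one of its
    candidate pairs have been selected."""
    chosen = set()
    uncovered = set(parent_pairs)
    while uncovered:
        score = {}
        for ind in sorted(uncovered):
            for v in {v for pair in parent_pairs[ind] for v in pair} - chosen:
                score[v] = score.get(v, 0) + 1
        best = max(sorted(score), key=score.get)  # ties broken by vertex name
        chosen.add(best)
        uncovered = {ind for ind in uncovered
                     if not any(set(pair) <= chosen for pair in parent_pairs[ind])}
    return chosen


pairs = {
    "A": [("P1", "P2")], "B": [("P1", "P2")], "C": [("P1", "P2")],
    "D": [("P1", "P3")], "E": [("P1", "P3")],
    "F": [("P4", "P2")], "G": [("P4", "P2")],
    "H": [("P4", "P3")], "I": [("P4", "P3")],
    "J": [("P4", "P5")],
}
print(sorted(greedy_parent_cover(pairs)))  # ['P1', 'P2', 'P3', 'P4', 'P5']
```

Like any greedy cover this carries no optimality guarantee, which is why the slide pairs it with an exact MIP formulation.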

Page 64: Reconstructing Sibling Relationships from Genotyping Data

Results

Legend:

M1: k-greedy cover with optimal graph cover

M2: greedy set cover with optimal graph cover

M3: randomized cover with optimal graph cover

M4: k-greedy with graph heuristics

M5: greedy set cover with graph heuristic

Page 65: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 66: Reconstructing Sibling Relationships from Genotyping Data

Results

Page 67: Reconstructing Sibling Relationships from Genotyping Data

Complexity Results

The reduction is from a version of the Parallel Repetition theorem, and the problem stays hard even if we know all the parents and just need to choose the minimum number of them!

But what is the parallel repetition theorem?

Page 68: Reconstructing Sibling Relationships from Genotyping Data

[Figure: a 2-prover 1-round proof system restricts to the label cover problem for bipartite graphs (small inapproximability). Boosting via Raz's parallel repetition theorem gives the parallel repetition of the 2-prover 1-round proof system, which restricts to the label cover problem for some kind of "graph product" of bipartite graphs (larger inapproximability). The unique games conjecture is also shown.]

Page 69: Reconstructing Sibling Relationships from Genotyping Data

We need some version of Raz’s parallel repetition theorem that is suitable for us

Fortunately, the following two papers helped:

U. Feige, A threshold of ln n for approximating set-cover, Journal of the ACM, 1998

G. Kortsarz, R. Krauthgamer and J. R. Lee, Hardness of Approximating Vertex-Connectivity Network Design Problems, SIAM J. of Computing, 2004

Page 70: Reconstructing Sibling Relationships from Genotyping Data

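Inapproximability for MINREP (Raz's parallel repetition theorem)

Let L ∈ NP and let x be an input instance of L. In $O(n^{\mathrm{polylog}(n)})$ time, x is reduced to an instance of MINREP such that:

• $x \in L \Rightarrow OPT \le \alpha + \beta$

• $x \notin L \Rightarrow OPT \ge (\alpha + \beta)\, 2^{\log^{1-\varepsilon}(|A| + |B|)}$

where 0 < ε < 1 is any constant.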

Page 71: Reconstructing Sibling Relationships from Genotyping Data

MINREP (minimum representative) problem

Input graph G: a bipartite graph with vertex sets A and B, where A is split into α partitions A1, A2, …, Aα of equal size and B into β partitions B1, B2, …, Bβ of equal size.

Associated "super"-graph H: the partitions A1, …, Aα are the A "super"-nodes and B1, …, Bβ are the B "super"-nodes. $(A_1, B_2) \in H$ if $\exists\, u \in A_1, v \in B_2$ such that $(u, v) \in G$. In this case, the edge $(u, v) \in G$ is a witness of the super-edge $(A_1, B_2) \in H$.

Page 72: Reconstructing Sibling Relationships from Genotyping Data

MINREP goal

Valid solution: A' ⊆ A and B' ⊆ B such that A' ∪ B' contains a witness for every super-edge.

Objective: minimize the size of the solution, |A' ∪ B'|.

Page 73: Reconstructing Sibling Relationships from Genotyping Data

Informally:

• given a set of children,

• given a candidate set of parents,

• assuming we believe in the Mendelian inheritance law,

• assuming that the parents tried to be as monogamous as possible,

can we partition the children into a set of full sibling groups (a full sibling group has the same pair of parents)?

We can reduce MINREP to this problem to show that it is hard.

Page 74: Reconstructing Sibling Relationships from Genotyping Data

Conclusions

• Parsimony-based combinatorial optimization works best with the least amount of information

• Parsimony-based combinatorial optimization is NP-hard and inapproximable

• First combinatorial approach for error-tolerant sibship reconstruction: fewer assumptions, more efficient

• Other parsimony-based optimization objectives are possible; Min Parents is interesting and hard!

Page 75: Reconstructing Sibling Relationships from Genotyping Data

Future Work

• Better heuristics for Min Parents?

• Other parsimony objectives

• Further analysis of when objectives give the same results

Page 76: Reconstructing Sibling Relationships from Genotyping Data

Sibship Reconstruction Project

Mary Ashley, UIC
Isabel Caballero, UIC
Ashfaq Khokhar, UIC
Tanya Berger-Wolf, UIC
Priya Govindan, UIC
Bhaskar DasGupta, UIC
W. Art Chaovalitwongse, Rutgers
Chun-An (Joe) Chou, Rutgers

Thank You!! Questions?