Identification and evaluation of causative genetic variants corresponding to a certain phenotype

Xidan Li

Outline

• SIT - identify and evaluate the causative genetic variants within a QTL/GWAS-defined region

• PASE - evaluate the effect of an amino acid substitution on the function of the host protein

• DIPT - identify causative genes underlying an expression phenotype

• Parallel computing

Genetic variant identification

Possible solutions?

Working process of SIT

• Input: VCF file

• SNP analysis in non-coding regions: splicing sites, CpG islands, UTR regions

• SNP analysis in coding regions: non-synonymous SNPs, evaluated with PASE

• Database: Ensembl

• Output: candidate genes with candidate SNPs; ranked list of non-synonymous SNPs

Sample results

Non-synonymous SNPs are ranked

Life is easy!

Prediction of amino acid substitution effects

Effect of amino acid substitutions

Seven selected physico-chemical properties of amino acids

• Transfer free energy from octanol to water

• Normalized van der Waals volume

• Isoelectric point

• Polarity

• Normalized frequency of alpha-helix

• Free energy of solution in water

• Normalized frequency of turn

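Formula for conservation calculation

$1 - 0.95^{N}$: probability of 20 different amino acids in a position for N random, equally frequent sequences.

$n_{\text{observed}} / N_{\text{total}}$

Conservation score: $(1 - 0.95^{N}) \times (n_{\text{observed}} / N_{\text{total}})$

Tools: BLAST search, ClustalW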
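The conservation formula above can be evaluated directly. The following is only a minimal sketch in C, not the PASE implementation: the function names `pow095` and `conservation_score` are hypothetical, and it assumes that N is the number of sequences behind the first term while n_observed and N_total are counts taken from one alignment column.

```c
/* 0.95^n by repeated multiplication, so the sketch needs no math library. */
static double pow095(unsigned n)
{
    double p = 1.0;
    for (unsigned i = 0; i < n; i++)
        p *= 0.95;
    return p;
}

/*
 * Conservation score for one alignment position, following the slide formula
 *     (1 - 0.95^N) * (n_observed / N_total)
 * Assumption (not stated on the slide): n_seqs is N, and n_observed / n_total
 * are counts from the same alignment column.
 * Returns -1.0 for invalid input (n_total == 0 or n_observed > n_total).
 */
double conservation_score(unsigned n_seqs, unsigned n_observed, unsigned n_total)
{
    if (n_total == 0 || n_observed > n_total)
        return -1.0;
    return (1.0 - pow095(n_seqs)) * ((double)n_observed / (double)n_total);
}
```

Under these assumptions the first factor grows toward 1 as more sequences are available, so a position supported by only a few sequences receives a low score even when every sequence agrees.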

Protein kinase AMP-activated gamma 3 (PRKAG3) gene

• R200Q in AMPKγ3 in purebred Hampshire pigs: the RN⁻ mutation

• V199I in AMPKγ3: participates together with R200Q in the effect

• RN⁻ causes excess glycogen content in pig skeletal muscle

• Milan, D., et al. (2000). A mutation in PRKAG3 associated with excess glycogen content in pig skeletal muscle. Science, 288(5469), 1248-1251.

• Ciobanu, D., et al. (2001). Evidence for New Alleles in the Protein Kinase Adenosine Monophosphate-Activated γ3-Subunit Gene Associated With Low Glycogen Content in Pig Skeletal Muscle and Improved Meat Quality. Genetics, 159, 1151-1162.

Gene ID   Coordinate   REF   ALT   Conservation score (MSAC)   PASE score   PASEC (combined) score
PRKAG_3   200          R     Q     0.93                        0.54         0.50
PRKAG_3   199          V     I     0.85                        0.14         0.12
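The slides do not state how the combined PASEC score is derived from the other two columns. For the two PRKAG3 rows, the reported values match the product of the conservation score and the PASE score after rounding to two decimals. The sketch below only checks that observation; the multiplication rule and the names `pasec_combined` and `round2` are assumptions for illustration.

```c
/* Assumed combination rule (inferred from the two table rows, not confirmed
 * by the slides): combined score = conservation score * PASE score. */
double pasec_combined(double msac, double pase)
{
    return msac * pase;
}

/* Round a non-negative value to two decimals, as printed in the table. */
double round2(double x)
{
    return (double)(long)(x * 100.0 + 0.5) / 100.0;
}
```

With this rule, R200Q gives 0.93 × 0.54 = 0.5022 and V199I gives 0.85 × 0.14 = 0.119, which round to the tabulated 0.50 and 0.12.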

• R200Q: causes a major increase in muscle glycogen content

• V199I: contributes a smaller effect

Ciobanu, D., et al. (2001). Evidence for New Alleles in the Protein Kinase Adenosine Monophosphate-Activated γ3-Subunit Gene Associated With Low Glycogen Content in Pig Skeletal Muscle and Improved Meat Quality. Genetics, 159, 1151-1162.

Testing with SIFT and PolyPhen

Tool       Prediction (count)         Conservation score (MSAC)   PASE score (physico-chemical property changes)   PASEC score (combined)
SIFT       Tolerated (1987)           0.47                        0.39                                             0.18
SIFT       Deleterious (1351)         0.60                        0.51                                             0.30
PolyPhen   Benign (1637)              0.44                        0.37                                             0.16
PolyPhen   Possibly damaging (539)    0.56                        0.43                                             0.24
PolyPhen   Probably damaging (1162)   0.63                        0.53                                             0.33

Features

• Other tools (SIFT, PolyPhen): rely mainly on sequence conservation scores, which requires finding homologous sequences.

• PASE: uses the physico-chemical property change score and combines it with the sequence conservation score. It is potentially able to analyze evolutionarily distant protein sequences.

From expression phenotype to associated genotype

Sample result of DIPT

www.computationalgenetics.se/DIPT/

Parallel computing

Principle of parallel computing

Multiple threads – efficient work

Single thread - tough job!

• Parallelization is usually applied to loops

• The data handled by each iteration must be independent of the other iterations
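The two conditions above can be illustrated with a pair of loops. This is a hedged sketch rather than code from the talk: the function names `square_all` and `prefix_sum` are hypothetical, and OpenMP is used only as one common way to split a loop across CPU threads. The pragma takes effect when the compiler is run with OpenMP support (for example `gcc -fopenmp`); without it the pragma is ignored and the loop runs serially with the same result.

```c
/*
 * Independent iterations: iteration i reads in[i] and writes out[i] only,
 * so the iterations can be distributed over several threads.
 */
void square_all(const double *in, double *out, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * in[i];
    }
}

/*
 * Dependent iterations: iteration i needs out[i - 1] from the previous
 * iteration, so this loop cannot simply be split across threads as written.
 */
void prefix_sum(const double *in, double *out, int n)
{
    for (int i = 0; i < n; i++) {
        out[i] = in[i] + (i > 0 ? out[i - 1] : 0.0);
    }
}
```

Calling `square_all(in, out, n)` on an array gives the same output whether it runs on one thread or many, which is exactly what the independence requirement guarantees.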

GPU vs. CPU

CUDA vs. C

CUDA:

#include <cuda.h>
#include <stdio.h>

// Prototypes
__global__ void helloWorld(char*);

// Host function
int main(int argc, char** argv)
{
    int i;

    // desired output
    char str[] = "Hello World!";

    // mangle contents of output; the null character is left intact for simplicity
    for (i = 0; i < 12; i++)
        str[i] -= i;

    // allocate memory on the device
    char *d_str;
    size_t size = sizeof(str);
    cudaMalloc((void**)&d_str, size);

    // copy the string to the device
    cudaMemcpy(d_str, str, size, cudaMemcpyHostToDevice);

    // set the grid and block sizes
    dim3 dimGrid(2);   // one block per word
    dim3 dimBlock(6);  // one thread per character

    // invoke the kernel
    helloWorld<<< dimGrid, dimBlock >>>(d_str);

    // retrieve the results from the device
    cudaMemcpy(str, d_str, size, cudaMemcpyDeviceToHost);

    // free up the allocated memory on the device
    cudaFree(d_str);

    // everyone's favorite part
    printf("%s\n", str);
    return 0;
}

// Device kernel
__global__ void helloWorld(char* str)
{
    // determine where in the thread grid we are
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // unmangle output
    str[idx] += idx;
}

C:

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
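To make the comparison more direct, the work done by the CUDA kernel can also be written as an ordinary serial loop in C. The sketch below is illustrative only, and the function names `mangle` and `unmangle_serial` are hypothetical. Each loop iteration in `unmangle_serial` corresponds to what one GPU thread does for its own index `idx` in the kernel above.

```c
#include <stddef.h>

/* Subtract each character's index from it, as the host code does
 * before copying the string to the device. */
void mangle(char *str, size_t n)
{
    for (size_t i = 0; i < n; i++)
        str[i] -= (char)i;
}

/* Serial counterpart of the helloWorld kernel: one loop iteration per
 * character instead of one GPU thread per character. */
void unmangle_serial(char *str, size_t n)
{
    for (size_t i = 0; i < n; i++)
        str[i] += (char)i;
}
```

Because every iteration touches only its own character, the loop meets the independence requirement, which is why the kernel can assign one thread per character.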

Thank You!