De novo RNA-seq for the study of ODAP synthesis pathway in Lathyrus sativus


DE NOVO RNA-SEQ FOR THE STUDY OF ODAP SYNTHESIS PATHWAY IN LATHYRUS SATIVUS

Calabuig Serna Tono, Martínez Rodero Iris & Segarra Martín Eva

Escola Tècnica Superior d’Enginyeria Agronòmica i del Medi Natural, Universitat Politècnica de València

May 2015

INDEX

1. INTRODUCTION

2. FIRST APPROACH TO THE PROJECT

3. EXPERIMENTAL DESIGN

a. SAMPLE RECOVERY

b. RNA-SEQ ASSAY

c. RAW DATA PROCESSING

i. DATA FILTERING

ii. TRANSCRIPTOME ASSEMBLY

d. DATA ANALYSIS

4. BUDGET ESTIMATE

5. CONCLUSIONS

1. INTRODUCTION

LATHYRUS SATIVUS

Commonly known as grass pea

• The ‘insurance crop’

Agronomic and biological advantages

Main food source when other crops fail

Areas prone to famine and drought: Asia and East Africa

• But ODAP synthesis

× Neuro-excitatory amino acid

× Associated with neurolathyrism: a neurodegenerative disease

1. INTRODUCTION

OUR AIM

• Converting L. sativus into a safe food: need for genetic improvement

• Discovering how ODAP synthesis is controlled: global gene expression assay

• Result: a future grass pea with low ODAP

2. FIRST APPROACH TO THE PROJECT

HOW L. SATIVUS & ODAP ARE RELATED

(>> means higher ODAP content)

Genotype effects: ‘Jamalpur’ variety >> ‘LS-5602’ variety

Environmental effects: drought conditions >> normal conditions

Developmental stage: seed >> vegetative tissues

2. FIRST APPROACH TO THE PROJECT

GLOBAL EXPRESSION STUDY: WHICH TECHNOLOGY?

1st) Microarray assay

• We contacted Agilent

• But the genome sequence was needed for probe design!

2nd) Including a sequencing & annotation step

3rd) ‘De novo’ RNA-seq

3. EXPERIMENTAL DESIGN

SAMPLE RECOVERY

• Taking advantage of previous knowledge:

Sample  Variety     Tissue  Environmental conditions
1       ‘Jamalpur’  Seed    Drought
2       ‘Jamalpur’  Seed    Control
3       ‘Jamalpur’  Stem    Drought
4       ‘Jamalpur’  Stem    Control
5       ‘LS-8603’   Seed    Drought
6       ‘LS-8603’   Seed    Control
7       ‘LS-8603’   Stem    Drought
8       ‘LS-8603’   Stem    Control

• 50 g of seeds

• 100 mg of stem

• No need for replicates

3. EXPERIMENTAL DESIGN

RNA-SEQ ASSAY

1) RNA extraction

2) cDNA library construction: random hexamer primers, dNTPs, RNase H and DNA polymerase I

3) Library qualification and quantification: Agilent 2100 Bioanalyzer, ABI StepOnePlus Real-Time PCR System

4) cDNA sequencing: HiSeq 2000

3. EXPERIMENTAL DESIGN

RAW DATA PROCESSING

1) Data filtering: avoid sequencing errors

Reads removed:

• Sequences with adapter

• Low quality at both ends

• Average quality score < 15 in Phred

• Too short (< 36 bp)
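
The filtering criteria above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual filter. The adapter sequence, the Phred+33 quality encoding, the decision to trim low-quality ends rather than discard the whole read, and the end-trimming threshold are all assumptions. Only the average-quality cutoff (15) and the minimum length (36 bp) come from the slide.

```python
from typing import Optional, Tuple

ADAPTER = "AGATCGGAAGAGC"  # assumption: placeholder adapter sequence
MIN_MEAN_PHRED = 15        # from the slide: average quality score < 15 is rejected
MIN_LENGTH = 36            # from the slide: reads shorter than 36 bp are rejected
END_MIN_PHRED = 15         # assumption: threshold for trimming low-quality ends


def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(char) - offset for char in quality]


def filter_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Return the (possibly trimmed) read, or None if the read is removed."""
    # Remove reads that contain adapter sequence.
    if ADAPTER in seq:
        return None
    scores = phred_scores(qual)
    # Trim low-quality bases at both ends.
    start, end = 0, len(seq)
    while start < end and scores[start] < END_MIN_PHRED:
        start += 1
    while end > start and scores[end - 1] < END_MIN_PHRED:
        end -= 1
    seq, qual, scores = seq[start:end], qual[start:end], scores[start:end]
    # Remove reads that are too short after trimming.
    if len(seq) < MIN_LENGTH:
        return None
    # Remove reads whose average quality is too low.
    if sum(scores) / len(scores) < MIN_MEAN_PHRED:
        return None
    return seq, qual
```

A read is passed in as its sequence and quality strings, for example `filter_read("ACGT" * 10, "I" * 40)`; a return value of `None` means the read was removed.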

3. EXPERIMENTAL DESIGN

RAW DATA PROCESSING

2) Transcriptome assembly

TRINITY: ‘de novo’ transcriptome assembler

• Recovers more full-length transcripts

• Sensitivity across a range of expression levels

• 3 modules, sequentially applied to large volumes of RNA-seq reads:

  • Inchworm

  • Chrysalis

  • Butterfly

A. INCHWORM: reconstructs linear transcript contigs

1. Builds a k-mer dictionary from all sequence reads

2. Removes error-containing k-mers

3. Selects the most frequent k-mer as the seed of a CONTIG

4. Extends the seed contig in each direction: the highest-occurring k-mer with a (k-1) overlap adds its terminal base to the growing contig

5. Extends the sequence in either direction until it cannot be further extended

6. Repeats steps 3–5 until the k-mer dictionary is exhausted
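
The greedy extension in steps 1–6 can be sketched in Python as below. This is a simplified illustration of the idea, not Trinity's implementation: it ignores reverse complements, and it stands in for error removal with a plain minimum-count cutoff. The function names and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Step 1: build the k-mer dictionary from all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend(contig, counts, k, forward):
    """Steps 4-5: grow the contig one base at a time in one direction."""
    while True:
        if forward:
            overlap = contig[-(k - 1):]
            candidates = [overlap + base for base in "ACGT"]
        else:
            overlap = contig[:k - 1]
            candidates = [base + overlap for base in "ACGT"]
        candidates = [c for c in candidates if c in counts]
        if not candidates:
            return contig  # cannot be extended further
        best = max(candidates, key=lambda c: counts[c])  # highest-occurring k-mer
        del counts[best]  # each k-mer is used only once
        contig = contig + best[-1] if forward else best[0] + contig


def greedy_assemble(reads, k, min_count=1):
    """Steps 1-6: repeat seed selection and extension until no k-mers remain."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counts = kmer_counts(reads, k)
    # Step 2 (simplified assumption): treat rare k-mers as error-containing.
    counts = Counter({kmer: n for kmer, n in counts.items() if n >= min_count})
    contigs = []
    while counts:
        seed, _ = counts.most_common(1)[0]  # step 3: most frequent k-mer
        del counts[seed]
        contig = extend(seed, counts, k, forward=True)
        contig = extend(contig, counts, k, forward=False)
        contigs.append(contig)
    return contigs
```

For two overlapping reads, `greedy_assemble(["ATGGCGTG", "GCGTGCA"], k=4)` rebuilds a single contig spanning both reads.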

B. CHRYSALIS: constructs complete de Bruijn graphs

1. Groups Inchworm contigs into connected components if they perfectly overlap by k-1 bases

2. Contigs in the same component are likely to be:

  • alternative splice forms

  • closely related paralogs

3. Builds a de Bruijn graph for each component

4. Assigns each read to the component with which the read shares the largest number of k-mers
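
The read-assignment rule in step 4 can be illustrated with a short Python sketch. It is a toy version under stated assumptions: components are given as a plain dictionary of contig lists, ties go to the first component encountered, and a read sharing no k-mer with any component is left unassigned. The names `kmers` and `assign_reads` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, components, k):
    """Map each read to the component sharing the largest number of k-mers.

    components: dict of component name -> list of contig sequences.
    Reads sharing no k-mer with any component map to None.
    """
    component_kmers = {
        name: set().union(*(kmers(contig, k) for contig in contigs))
        for name, contigs in components.items()
    }
    assignment = {}
    for read in reads:
        read_kmers = kmers(read, k)
        best_name, best_shared = None, 0
        for name, known in component_kmers.items():
            shared = len(read_kmers & known)
            if shared > best_shared:
                best_name, best_shared = name, shared
        assignment[read] = best_name
    return assignment
```

For example, with `components = {"c1": ["ATGGCGTGCA"], "c2": ["TTTTCCCCGGGG"]}` and `k=4`, the read `"GGCGTG"` is assigned to `"c1"`.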

C. BUTTERFLY: reconstructs full-length linear transcripts

• by reconciling the individual de Bruijn graphs generated by Chrysalis with:

  • original reads

  • paired ends

1. Graph simplification: merges consecutive nodes in linear paths of the de Bruijn graph into nodes representing longer sequences

2. Deletes edges that represent minor deviations: likely sequencing errors

3. EXPERIMENTAL DESIGN

DATA ANALYSIS

A) Transcriptome annotation

1. CDS assignment

BLASTx against:

• NCBI non-redundant protein database

• Swiss-Prot

• Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database

The best hits determine the transcription direction and coding region of transcripts

ESTScan:

• detects coding regions in DNA sequences, even low-quality ones

Transcripts with CDS > 100 bp were translated

3. EXPERIMENTAL DESIGN

DATA ANALYSIS

A) Transcriptome annotation

2. BLASTn

Transcripts against:

• NCBI non-redundant protein database

3. Blast2GO

GO annotations: transcripts annotated with GO terms

• molecular function

• biological process

• cellular component

3. EXPERIMENTAL DESIGN

DATA ANALYSIS

B) Differential expression

• Mapping: reads are aligned to the assembled transcriptome with the BWA aligner

• Transcript normalization:

$$\mathrm{FPKM} = \frac{\text{total fragments}}{\text{mapped reads (millions)} \times \text{exon length (kb)}}$$

Different FPKM → different expression
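
As a worked example of the FPKM normalization, the short Python function below applies the formula directly. The function name, argument names and input numbers are illustrative assumptions, not values from the project.

```python
def fpkm(fragments: float, total_mapped_reads: float, length_bp: float) -> float:
    """Fragments per kilobase of transcript per million mapped reads.

    fragments: fragments mapped to the transcript
    total_mapped_reads: all mapped reads in the sample
    length_bp: transcript (exon) length in base pairs
    """
    if total_mapped_reads <= 0 or length_bp <= 0:
        raise ValueError("total_mapped_reads and length_bp must be positive")
    millions_mapped = total_mapped_reads / 1e6
    length_kb = length_bp / 1e3
    return fragments / (millions_mapped * length_kb)


# Hypothetical transcript: 500 fragments, 20 million mapped reads, 2 kb long.
example_fpkm = fpkm(500, 20_000_000, 2_000)  # 500 / (20 * 2) = 12.5
```

Dividing by both sequencing depth and transcript length is what makes values comparable between samples and between transcripts of different lengths.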

3. EXPERIMENTAL DESIGN

DATA ANALYSIS

B) Differential expression

• Different selected variables → different ODAP content

• Different ODAP content → differently expressed genes

Sample  Variety     Tissue  Environmental conditions
1       ‘Jamalpur’  Seed    Drought
2       ‘Jamalpur’  Seed    Control
3       ‘Jamalpur’  Stem    Drought
4       ‘Jamalpur’  Stem    Control
5       ‘LS-8603’   Seed    Drought
6       ‘LS-8603’   Seed    Control
7       ‘LS-8603’   Stem    Drought
8       ‘LS-8603’   Stem    Control

Look for the more highly expressed genes: they are probably involved in synthesizing ODAP

3. EXPERIMENTAL DESIGN

DATA ANALYSIS

C) Statistical analysis

ANOVA detects differentially expressed genes: it considers the possible effect of all the level combinations over the response variable (the expression level of each gene)

• Factor: variable which takes values in the experiment

• Level: possible values for each factor

3. EXPERIMENTAL DESIGN

DATA ANALYSIS

C) Statistical analysis

Factors        Levels
Variety        Jamalpur (+)   LS-8603 (-)
Tissue         seed (+)       stem (-)
E. condition   drought (+)    control (-)

ANOVA: 2³ factorial design

• 2 levels

• 3 factors

3. EXPERIMENTAL DESIGN

DATA ANALYSIS

C) Statistical analysis

Sample  Variety  Tissue  E. condition
1       +        +       +
2       +        +       -
3       +        -       +
4       +        -       -
5       -        +       +
6       -        +       -
7       -        -       +
8       -        -       -

SAMPLE 1: more ODAP content

Genes with higher expression in sample 1 than in the other samples: candidate genes for ODAP synthesis

ANOVA detects just the genes involved in ODAP synthesis
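
The 2³ design and the "higher in sample 1" criterion can be illustrated with the Python sketch below. It enumerates the eight sample codings in the same order as the table and estimates each factor's main effect as the mean response at (+) minus the mean response at (-). This is only the effect-estimation step, not a full ANOVA with significance tests, and the expression values are invented for illustration.

```python
from itertools import product

FACTORS = ("variety", "tissue", "condition")

# Rows follow the table: sample 1 = (+, +, +), ..., sample 8 = (-, -, -).
DESIGN = list(product((1, -1), repeat=len(FACTORS)))


def main_effects(values):
    """Main effect of each factor: mean at level (+) minus mean at level (-)."""
    if len(values) != len(DESIGN):
        raise ValueError("one expression value per sample is required")
    effects = {}
    for j, factor in enumerate(FACTORS):
        high = [v for row, v in zip(DESIGN, values) if row[j] == 1]
        low = [v for row, v in zip(DESIGN, values) if row[j] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


def highest_in_sample_1(values):
    """Candidate rule from the slide: expression in sample 1 exceeds all others."""
    return values[0] > max(values[1:])


# Hypothetical FPKM values of one gene in samples 1..8 (not real data).
gene_fpkm = [80, 40, 30, 20, 35, 25, 15, 15]
effects = main_effects(gene_fpkm)
is_candidate = highest_in_sample_1(gene_fpkm)
```

With these made-up values all three main effects are positive and the gene peaks in sample 1, which is the pattern expected for a candidate ODAP synthesis gene.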

4. BUDGET ESTIMATE

• RNA extraction 150$ / sample

• RNAseq 3600$ / sample

• Trinity free

• ESTScan free

• BWA aligner free

TOTAL = 150·8 + 3,600·8 = 30,000 $
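
The total can be checked with a few lines of Python. The per-sample costs and the sample count are taken from the slide; the variable and function names are illustrative.

```python
N_SAMPLES = 8

# Per-sample costs in dollars; the software tools listed above are free.
COST_PER_SAMPLE = {"RNA extraction": 150, "RNA-seq": 3600}


def total_budget(n_samples: int, cost_per_sample: dict) -> int:
    """Total cost: every per-sample item is paid once for each sample."""
    return n_samples * sum(cost_per_sample.values())


budget = total_budget(N_SAMPLES, COST_PER_SAMPLE)  # 8 * (150 + 3600) = 30000
```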

5. CONCLUSIONS

• Contribution to the knowledge of Lathyrus sativus and ODAP biosynthesis

• Possible future modification: a variant with lower ODAP content

• Development of the project:

  Always support our ideas with references

  Need for collaboration

  Lot of work!

THANK YOU FOR YOUR ATTENTION

Any questions?
