Sequence variation Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for human genomics workshop, UNSW Day 2 – Friday 29th January 2016

Dr Jason Wong - University of New South Wales · –Catalog of Somatic Mutations in Cancer (COSMIC) ... African (AFR) 691 European (EUR) 514 Americas (AMR) 355 Total 2,577 1000 Genomes


Sequence variation

Prince of Wales Clinical School

Dr Jason Wong

Introductory bioinformatics for human genomics workshop, UNSW Day 2 – Friday 29th January 2016


Aims of the session

• Introduce major human genome variation databases.
  – 1000 Genomes Project
  – Catalogue of Somatic Mutations in Cancer (COSMIC)

• By the end of the session you will know how to access these datasets.

• We will look more into annotation of variants this afternoon.


Types of variation

• Cytological level:
  – Chromosome numbers
  – Segmental duplications, rearrangements, and deletions

• Sub-chromosomal level:
  – Transposable elements
  – Short deletions/insertions, tandem repeats

• Sequence level:
  – Single Nucleotide Polymorphisms (SNPs)
  – Small nucleotide insertions and deletions (indels)


Why study sequence variation?

• Determine disease risk

• Response to therapy

• Cancer biology

• Forensics

• Evolution


Single nucleotide polymorphisms (SNPs)

• Typically refers to single-base substitutions.

• There are ~40 M common SNPs in the human population.

• A given individual is expected to differ from the reference genome at ~0.1% of positions (i.e. ~3 million SNPs).
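As a quick check of the scale involved, the sketch below relates the ~3 million SNPs per individual to genome size. The genome size of 3 billion bases is a round figure assumed here for illustration; it is not stated on the slide.

```python
# Rough arithmetic only. GENOME_SIZE_BP is an assumed round figure for the
# human reference genome; SNPS_PER_INDIVIDUAL is the figure quoted on the slide.
GENOME_SIZE_BP = 3_000_000_000
SNPS_PER_INDIVIDUAL = 3_000_000


def fraction_differing(snps: int, genome_size: int) -> float:
    """Fraction of reference positions at which an individual carries a SNP."""
    return snps / genome_size


fraction = fraction_differing(SNPS_PER_INDIVIDUAL, GENOME_SIZE_BP)
print(f"{fraction:.1%} of positions differ from the reference")
```

Under that assumption, 3 million SNPs corresponds to roughly one difference per thousand bases.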


Types of SNPs

• Genic, coding SNPs
  – Frameshift
  – Splice site
  – Non-synonymous (missense, nonsense)
  – Synonymous (splice enhancer/suppressor?)

• Genic, non-coding SNPs
  – Untranslated region
  – Regulatory SNPs
  – Intronic SNPs

• Intergenic


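Coding SNPs

Examples:

• Cytochrome P450 2D6 (drug metabolism): CAC TCC TGA CGC, with a G substitution (position marked in the slide figure)

• Factor V (deep-vein thrombosis): GAC AGG CGA GGA, with an A substitution (position marked in the slide figure)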


Predicting effect of coding SNPs

• The functional importance of a SNP is usually assessed from:

– Sequence conservation.

– Frequency in the population.

– Whether it alters protein 2D/3D structure.

– Whether it lies within a protein motif.

• Many tools are now available for predicting the function of coding SNPs, but prediction is still far from perfect.


Non-coding SNPs

• Traditionally more difficult to annotate, as >98% of the genome is non-coding.

• We want to find SNPs that are associated with gene expression (eQTLs).

• With data from the ENCODE/Epigenome projects, it is easier (but still very difficult) to find potentially functional non-coding SNPs.


Insertions/deletions (indels)

• Typically defined as a gain or loss of 1-50 bp.

• Less frequent than SNPs (~10% of all sequence variation).

• Within coding sequence, indels can additionally cause frameshift mutations.
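The frameshift point can be made concrete with a small sketch: an indel that lies wholly within coding sequence shifts the reading frame when its net length change is not a multiple of three. The function names and example alleles below are hypothetical, and the check ignores splice sites and indels that span exon boundaries.

```python
def indel_length(ref: str, alt: str) -> int:
    """Net number of bases gained (positive) or lost (negative)."""
    return len(alt) - len(ref)


def causes_frameshift(ref: str, alt: str) -> bool:
    """True when the net length change is not a multiple of 3.

    Assumes the indel lies entirely within coding sequence.
    """
    return indel_length(ref, alt) % 3 != 0


# Hypothetical alleles, written VCF-style (REF, ALT).
print(causes_frameshift("A", "AT"))    # 1 bp insertion
print(causes_frameshift("ATCG", "A"))  # 3 bp deletion (in-frame)
```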


dbSNP

• Online database from NCBI for cataloguing all SNPs submitted by the scientific community.

www.ncbi.nlm.nih.gov/SNP/


Reliability of dbSNP

• A problem with dbSNP is that anyone can submit variants; as a result, the false positive rate is claimed to be perhaps > 10%.

• Furthermore, some somatic mutations have also found their way into dbSNP.

• Therefore, use dbSNP, but ideally only SNPs from the 1000 Genomes Project.


1000 Genomes Project

• The goal of the project is to find virtually all genetic variants with a frequency of at least 1% in the human population.

• The ultimate aim was to sequence ~2,500 humans at 4x whole-genome coverage.

www.1000genomes.org


1000 Genomes samples

Major populations    Total samples
East Asian (ASN)     523
South Asian (SAN)    494
African (AFR)        691
European (EUR)       514
Americas (AMR)       355
Total                2,577

26 sub-populations

Code  Super population  Population description
CHB   ASN               Han Chinese in Beijing, China
JPT   ASN               Japanese in Tokyo, Japan
CHS   ASN               Southern Han Chinese
CDX   ASN               Chinese Dai in Xishuangbanna, China
KHV   ASN               Kinh in Ho Chi Minh City, Vietnam
CEU   EUR               Utah Residents (CEPH) with Northern and Western European ancestry
TSI   EUR               Toscani in Italia
FIN   EUR               Finnish in Finland
GBR   EUR               British in England and Scotland
IBS   EUR               Iberian population in Spain
YRI   AFR               Yoruba in Ibadan, Nigeria
LWK   AFR               Luhya in Webuye, Kenya
GWD   AFR               Gambian in Western Divisions in The Gambia
MSL   AFR               Mende in Sierra Leone
ESN   AFR               Esan in Nigeria
ASW   AFR               Americans of African Ancestry in SW USA
ACB   AFR               African Caribbeans in Barbados
MXL   AMR               Mexican Ancestry from Los Angeles USA
PUR   AMR               Puerto Ricans from Puerto Rico
CLM   AMR               Colombians from Medellin, Colombia
PEL   AMR               Peruvians from Lima, Peru
GIH   SAN               Gujarati Indian from Houston, Texas
PJL   SAN               Punjabi from Lahore, Pakistan
BEB   SAN               Bengali from Bangladesh
STU   SAN               Sri Lankan Tamil from the UK
ITU   SAN               Indian Telugu from the UK
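As a small consistency check, the per-super-population sample counts and the population codes from this slide can be tallied in a few lines. The variable names are hypothetical; the numbers and codes are copied from the slide, which uses the labels ASN and SAN for the East and South Asian groups.

```python
from collections import Counter

# Sample counts per super-population, as listed on the slide.
SAMPLES = {"ASN": 523, "SAN": 494, "AFR": 691, "EUR": 514, "AMR": 355}

# Population code -> super-population, as listed on the slide.
POPULATIONS = {
    "CHB": "ASN", "JPT": "ASN", "CHS": "ASN", "CDX": "ASN", "KHV": "ASN",
    "CEU": "EUR", "TSI": "EUR", "FIN": "EUR", "GBR": "EUR", "IBS": "EUR",
    "YRI": "AFR", "LWK": "AFR", "GWD": "AFR", "MSL": "AFR", "ESN": "AFR",
    "ASW": "AFR", "ACB": "AFR",
    "MXL": "AMR", "PUR": "AMR", "CLM": "AMR", "PEL": "AMR",
    "GIH": "SAN", "PJL": "SAN", "BEB": "SAN", "STU": "SAN", "ITU": "SAN",
}

total_samples = sum(SAMPLES.values())
populations_per_group = Counter(POPULATIONS.values())

for group, n in SAMPLES.items():
    share = n / total_samples
    print(f"{group}: {n} samples ({share:.1%}), "
          f"{populations_per_group[group]} sub-populations")
print(f"Total: {total_samples} samples, {len(POPULATIONS)} sub-populations")
```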


Usage of variant data

• GWAS have typically only been able to find associations for variants with a frequency of > 5%.

• 1000 Genomes project enables screening of variants discovered in sequencing of patients with genetic disorders and in cancer genomes.


Per individual variant load (Nature 491, 56-65)


State of 1000 genomes project

• Phase 1

– Completed in 2012

– 1,092 humans.

– 14 populations

– 36.7 M SNPs

– 1.38 M Indels

• Phase 3

– Completed in 2014

– 2,535 humans.

– 26 populations.

– 78.1 M SNPs

– 3.1 M Indels

1.9 M SNPs are not shared between Phase 1 and Phase 3:

– Some samples not shared.

– Different sequencing platforms.

– Change in variant calling pipeline.


Accessing 1000 genomes data

• www.1000genomes.org provides variant calls (VCF format), aligned reads (BAM files) and raw sequence files.

• Almost all (if not all) SNPs from the 1000 Genomes Project are also catalogued in dbSNP.

• UCSC has tracks containing all 1000 Genomes Phase 1 and Phase 3 data.
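For orientation, a VCF file is tab-separated text with `#`-prefixed header lines followed by one variant per line. The sketch below reads the fixed leading columns (CHROM, POS, ID, REF, ALT) and labels each record as a SNP or an indel. The two records are invented for illustration and are not real 1000 Genomes calls; real files are large and usually compressed, so a dedicated VCF library is normally a better choice.

```python
# Hypothetical VCF content; the records are made up for illustration.
VCF_TEXT = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "19\t100\trsExampleA\tC\tT\t100\tPASS\tAF=0.08\n"
    "19\t200\trsExampleB\tAT\tA\t100\tPASS\tAF=0.01\n"
)


def parse_vcf(text: str) -> list:
    """Return one dict per variant line, skipping header lines."""
    variants = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # meta-information and column header lines
        chrom, pos, var_id, ref, alt = line.split("\t")[:5]
        alts = alt.split(",")  # ALT may list several alleles
        is_snp = len(ref) == 1 and all(len(a) == 1 for a in alts)
        variants.append({
            "chrom": chrom,
            "pos": int(pos),  # VCF positions are 1-based
            "id": var_id,
            "ref": ref,
            "alts": alts,
            "type": "SNP" if is_snp else "indel",
        })
    return variants


variants = parse_vcf(VCF_TEXT)
for v in variants:
    print(v["chrom"], v["pos"], v["id"], v["type"])
```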


Accessing 1000 Genomes from UCSC

Aim: Visualise all coding 1000 Genomes SNPs over the APOE gene and highlight non-synonymous SNPs.

To load the session, user: jasewong, session name: bioinf_workshop_SNP_2016


Accessibility tracks

• Shows regions of the genome where variant calls can be reliably made using whole-genome NGS data.

• Note that some regions outside the accessible mask still have variant calls, because the 1000 Genomes Project did not use only whole-genome NGS.

• Useful as a caution flag if your variants fall outside the accessible regions.

https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=tgpPhase3Accessibility


Good for downloading SNPs, but not good for visualisation – for this bring up dbSNP tracks


Bring up the dbSNP 144 All SNPs, Common SNPs, Flagged SNPs and Mult. SNPs tracks from the “Variation” section of tracks. Select “dense” for each one.


SNP definitions

• Common SNPs - SNPs with >= 1% minor allele frequency (MAF), mapping only once to reference assembly.

• Flagged SNPs - SNPs with < 1% (or unknown) minor allele frequency (MAF), mapping only once to reference assembly, flagged in dbSNP as "clinically associated" -- not necessarily a risk allele! (These are rare SNPs with known clinical function.)

• Mult. SNPs - SNPs mapping in more than one place on reference assembly.

• All SNPs - all SNPs from dbSNP mapping to reference assembly.

https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=snp144
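The four track definitions above can be read as a simple decision rule. The sketch below encodes them as written on this slide; the function and parameter names are hypothetical, and this is not how UCSC actually builds the tracks.

```python
from typing import List, Optional


def ucsc_snp_tracks(maf: Optional[float],
                    mapping_count: int,
                    clinically_associated: bool) -> List[str]:
    """Tracks a SNP would appear in, following the slide's definitions.

    maf: minor allele frequency as a fraction (None when unknown).
    mapping_count: number of places the SNP maps on the reference (>= 1).
    """
    tracks = ["All SNPs"]  # every mapped dbSNP entry
    if mapping_count > 1:
        tracks.append("Mult. SNPs")
    elif maf is not None and maf >= 0.01:
        tracks.append("Common SNPs")
    elif clinically_associated:
        # MAF < 1% or unknown, maps once, flagged as clinically associated
        tracks.append("Flagged SNPs")
    return tracks


# Hypothetical inputs.
print(ucsc_snp_tracks(0.08, 1, True))    # common, so not flagged
print(ucsc_snp_tracks(None, 1, True))    # unknown MAF, clinically associated
print(ucsc_snp_tracks(0.002, 1, False))  # rare with no clinical flag
```

Note that a clinically associated SNP with MAF >= 1% lands in Common SNPs rather than Flagged SNPs under these rules.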


Right-click anywhere on the “All SNPs” track and select “Configure All SNPs(144)”.

The track needs to be configured to show only 1000 Genomes coding SNPs.


1. Set Single Nucleotide Polymorphism only

2. Set 1000 Genomes Project only

3. Remove all except missense variant


Select only clinically associated variants


Why are there fewer clinically associated, missense, 1000 Genomes SNPs than Flagged SNPs?


APOE contains two well-known SNPs associated with Alzheimer’s disease risk: rs429358 and rs7412. Bring up the GWAS Catalog track (under Phenotype and Literature) to find where these are.


Note:

1. The two SNPs are NOT in the Flagged SNPs track because their MAF is >= 1%.

2. The two SNPs lie in regions inaccessible to short-read NGS.


Change All SNPs track to packed and click on rs7412 for more information


Click to link out to dbSNP


Exporting specific SNPs using the Table Browser

Aim: download all 1000 Genomes clinically associated missense SNPs over APOE.

Type APOE and click “go”.


Select “Table Browser” from the “Tools” menu

Select track

Select filter

Check “position”. The coordinates should be where the browser last was.


Filter

1. Check 1000genomes

2. Check missense

3. Check clinically-assoc


Select BED as the output format. Optionally, type in a name for the output file to download the results as a file.


BED file from UCSC – should be 8 SNPs in total
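To confirm the number of SNPs in a downloaded BED file, the records can be counted with a short script. The sketch below is a minimal reader that assumes tab-separated BED with at least three columns and skips any `track`, `browser` or comment lines. The two records are placeholders with made-up coordinates and names, not the actual Table Browser output.

```python
# Placeholder BED content; coordinates and names are invented for illustration.
BED_TEXT = (
    'track name="example" description="placeholder records"\n'
    "chr19\t1000\t1001\trsExample1\t0\t+\n"
    "chr19\t2000\t2001\trsExample2\t0\t+\n"
)


def parse_bed(text: str) -> list:
    """Return (chrom, start, end, name) tuples from BED text.

    BED coordinates are 0-based and half-open: a single-base SNP at
    1-based position p is written as start = p - 1, end = p.
    """
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


records = parse_bed(BED_TEXT)
print(f"{len(records)} SNPs in BED file")
```

For a file saved from the Table Browser, the same function could be applied to the text read from that file.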


Now we’ll have 5 minutes for everyone to catch up and questions.


Somatic mutations

• Somatic mutations arise in a subset of cells in the body.

• Commonly identified in cancer.

• Only a few are likely to be oncogenic. Most are just passenger mutations.


Somatic mutations

• Like SNPs, somatic mutations can occur anywhere in the genome.

• Due to the preference for exome-sequencing, currently most somatic mutations are defined in coding regions.


Catalogue of somatic mutations in cancer (COSMIC)

• Somatic mutations curated from literature and large scale sequencing projects.

• Maintained by the Wellcome Trust Sanger Institute, UK

• Started in 2004 with just 4 genes (HRAS, KRAS2, NRAS and BRAF)


Aim is to explore mutations in the APOE gene in COSMIC

Type in ‘APOE’

http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/


Mutation list (Data) Gene view


Tissue distribution Types of mutation


References


Note that UCSC COSMIC track is not currently the latest version


Click get results to download data

Attributes can be selected to populate the output table

Note that the coordinates are GRCh38 (i.e. hg38). LiftOver from UCSC can be used to convert them to hg19, but UCSC requires ‘chr’ in front of the chromosome number, i.e. 19:44908652-44908652 needs to be chr19:44908652-44908652.
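Adding the prefix by hand is fine for a few positions; for a longer list, a helper along these lines could be used. The function name is hypothetical, and the sketch only prepends ‘chr’; it does not handle other naming differences (for example mitochondrial contig names) and does not perform the LiftOver conversion itself.

```python
def to_ucsc_position(position: str) -> str:
    """Prefix a 'chrom:start-end' string with 'chr' unless already present."""
    position = position.strip()
    return position if position.startswith("chr") else "chr" + position


# The example from the slide.
print(to_ucsc_position("19:44908652-44908652"))
```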


Other COSMIC resources


Exercises

1. Retrieve nonsense SNPs from the 1000 Genomes project for MLH1 using the Table Browser.

2. Also visualise this in the Genome Browser and find out what substitution has caused the nonsense SNPs.

3. How many of these SNPs are also COSMIC mutations? (Hint: this requires the intersection function in the Table Browser, or visualise directly in the Genome Browser.)
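For those who prefer to check the last exercise outside the browser, intersecting two exported lists amounts to finding records whose intervals overlap on the same chromosome. The sketch below assumes both lists use BED-style 0-based, half-open coordinates on the same assembly; the records shown are placeholders, not real SNPs or COSMIC mutations.

```python
def overlaps(a: tuple, b: tuple) -> bool:
    """True when two (chrom, start, end, name) records share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(snps: list, mutations: list) -> list:
    """SNP records that overlap at least one mutation record."""
    return [s for s in snps if any(overlaps(s, m) for m in mutations)]


# Placeholder records: (chrom, start, end, name).
snps = [("chr3", 100, 101, "snpA"), ("chr3", 200, 201, "snpB")]
mutations = [("chr3", 200, 201, "mutX"), ("chr3", 300, 301, "mutY")]

shared = intersect(snps, mutations)
print(f"{len(shared)} SNP(s) overlap a mutation:", [s[3] for s in shared])
```

The nested comparison is quadratic, which is fine for a single gene; genome-wide intersections are better done with dedicated interval tools.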


Further reading/resources

• 1000 Genomes Project (www.1000genomes.org/)
  – Phase 1 paper (www.ncbi.nlm.nih.gov/pubmed/23128226)
  – Phase 3 paper (www.ncbi.nlm.nih.gov/pubmed/2643224)

• Somatic mutations
  – COSMIC (cancer.sanger.ac.uk/)
  – Cancer mutation papers
    • www.ncbi.nlm.nih.gov/pubmed/2353959
    • www.ncbi.nlm.nih.gov/pubmed/26404825

• Structural variants in the human genome
  – dbVar (www.ncbi.nlm.nih.gov/dbvar)

• SNP annotation databases
  – OMIM (www.omim.org)
  – ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/)
  – GWAS Catalog (https://www.ebi.ac.uk/gwas/)
  – SNPedia (www.snpedia.com/)