Jiwen (Kevin) Xin, Cyrus Afrasiabi, Sean D. Mooney, Andrew I. Su, Chunlei Wu [email protected] The Scripps Research Institute La Jolla, CA, USA NGS 2016 04/05/2016 MyVariant.info Community-aggregated Variant Annotations As a Service

MyVariant.info--Community Aggregated Variant Annotation as a Service (NGS2016, Barcelona)



So many variant annotation resources

dbNSFP

Schematic view of MyVariant.info architecture

Each data source is updated individually. Colors indicate their different updating schedules.

HGVS name examples

Table: Examples of HGVS (Human Genome Variation Society) nomenclature.

MyVariant.info for the end users:

http://MyVariant.info (currently v1 API, two endpoints)

http://MyVariant.info/v1/query?q=<query>

Input: any query term(s)

Output: matching variant hits

http://MyVariant.info/v1/variant/<variantid>

Input: HGVS id(s)

Output: matching variant object(s)

Both endpoints support batch mode via POST.

Simple API. No sign-up. No API key.

Try our live API and documentation.
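The two endpoints above can be sketched with the Python standard library alone. This is a minimal illustration, not the official client: the helper names are hypothetical, and the batch helper assumes the ids are sent as a comma-separated `ids` form parameter, which the slides do not spell out. Nothing is sent over the network; the functions only build URLs and request objects.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://myvariant.info/v1"


def variant_url(hgvs_id):
    """URL for the variant endpoint (GET, one HGVS id)."""
    # Keep ':' and '>' readable; other reserved characters are percent-encoded.
    return f"{BASE_URL}/variant/{quote(hgvs_id, safe=':>')}"


def query_url(q):
    """URL for the query endpoint (GET, any query term)."""
    return f"{BASE_URL}/query?{urlencode({'q': q})}"


def batch_variant_request(hgvs_ids):
    """Build, but do not send, a POST request for several HGVS ids.

    Assumption: ids travel as one comma-separated 'ids' form parameter.
    """
    body = urlencode({"ids": ",".join(hgvs_ids)}).encode("utf-8")
    return Request(f"{BASE_URL}/variant", data=body, method="POST")


if __name__ == "__main__":
    print(variant_url("chr1:g.31349647C>T"))
    print(query_url("dbnsfp.genename:BTK"))
    req = batch_variant_request(["chr1:g.31349647C>T", "chr12:g.111351981C>T"])
    print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual call; it is left out here so the sketch runs offline.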

Retrieving a single variant

http://myvariant.info/v1/variant/chr1:g.31349647C>T

Integrated annotations across resources, in a well-formatted data structure

Always up to date

Filtering returned fields

http://myvariant.info/v1/variant/chr1:g.31349647C>T

http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp

http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp.clinvar

http://myvariant.info/v1/variant/chr1:g.31349647C>T?fields=dbnsfp.clinvar,dbsnp.gmaf,clinvar.hgvs.coding
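As a rough sketch of how the `fields` parameter composes, the snippet below builds the filtered URLs shown on this slide and reads a dotted field name out of a nested result. The function names and the sample document are hypothetical; a real response may contain lists or extra keys that this simple lookup does not handle.

```python
from urllib.parse import quote, urlencode


def variant_url_with_fields(hgvs_id, fields=()):
    """Variant URL, optionally restricted to a set of dotted field names."""
    url = "http://myvariant.info/v1/variant/" + quote(hgvs_id, safe=":>")
    if fields:
        # Commas are kept literal so the URL matches the form used on the slide.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


def get_path(doc, dotted):
    """Follow a dotted field name through nested dicts; None if absent."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Hypothetical, heavily trimmed result used only to exercise get_path.
sample_doc = {"_id": "chr1:g.31349647C>T", "dbsnp": {"gmaf": 0.01}}

if __name__ == "__main__":
    print(variant_url_with_fields("chr1:g.31349647C>T", ["dbnsfp.clinvar", "dbsnp.gmaf"]))
    print(get_path(sample_doc, "dbsnp.gmaf"))
```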

Making flexible queries

• All variants with dbNSFP annotation: http://myvariant.info/v1/query?q=_exists_:dbnsfp

• All non-synonymous variants on gene "BTK": http://myvariant.info/v1/query?q=dbnsfp.genename:BTK

• All variants within a genomic range: http://myvariant.info/v1/query?q=chr1:69000-70000

• Query Wellderly variants together with other annotation sources: http://myvariant.info/v1/query?q=_exists_:wellderly AND cadd.polyphen.cat:possibly_damaging&fields=wellderly,cadd.polyphen

Many more ways of querying, across resources

• Full-text queries
• Wildcard queries
• Range queries
• Boolean queries
• Regex queries
• Field existing/missing
• Faceting
• Paging
• Sorting
• Batch queries
• JSONP and CORS support
• …
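The query strings on the previous slide follow a `field:value` pattern joined with boolean operators. A small sketch of composing and URL-encoding them follows; the helper names are hypothetical, and only the `q` and `fields` parameters that appear in the slides are used.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

QUERY_ENDPOINT = "http://myvariant.info/v1/query"


def exists(field):
    """Clause matching variants that carry the given field."""
    return f"_exists_:{field}"


def term(field, value):
    """Clause matching variants whose field equals the value."""
    return f"{field}:{value}"


def all_of(*clauses):
    """Join clauses with boolean AND."""
    return " AND ".join(clauses)


def query_url(q, fields=()):
    """Percent-encode the query (spaces, colons, commas) into a full URL."""
    params = {"q": q}
    if fields:
        params["fields"] = ",".join(fields)
    return QUERY_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    q = all_of(exists("wellderly"), term("cadd.polyphen.cat", "possibly_damaging"))
    url = query_url(q, fields=["wellderly", "cadd.polyphen"])
    print(url)
    # Decoding the URL again recovers the human-readable query string.
    print(parse_qs(urlsplit(url).query)["q"][0])
```

Encoding the query explicitly avoids relying on a browser or HTTP library to repair the raw spaces in `... AND ...`.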

MyVariant.info stats

• total (334,293,820)
• dbNSFP (82,030,830; v3.0)
• dbSNP (145,132,257; v144)
• ClinVar (131,383; 201602)
• EVS (1,977,300; v2)
• CADD (226,932,858; v1.3)
• MutDB (420,221)
• gwassnps (15,243; from UCSC)
• COSMIC (1,024,498; v68 from UCSC)
• DOCM (1,119)
• SNPedia (5,907)
• EMVClass (12,066)
• Wellderly (21,240,519)
• EXAC (10,195,872; v0.3)
• GRASP (2,212,148; v2.0.0.0)

As of April 2016

MyVariant.info official Python/R Clients

myvariant Python client hosted on PyPI (initial release in Aug 2015)

myvariant R client hosted on Bioconductor (initial release in Oct 2015)

Use case 1

An easy resource to retrieve well-structured variant annotations

Use case 2

Direct queries integrated in your analysis pipeline

Use case 2: An example workflow for variant prioritization

input variants

output variants

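# vars: a list of per-sample data frames of annotated variants
# Filter 1: keep variants whose CADD consequence is one of the listed categories
filter1 <- lapply(vars, function(i) subset(i, cadd.consequence %in% c("NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST", "CANONICAL_SPLICE", "SPLICE_SITE")))

# Filter 2: keep variants with an ExAC allele frequency below 1%
filter2 <- lapply(filter1, function(i) subset(i, exac.af < 0.01))

# Filter 3: keep variants with a 1000 Genomes (phase 1) allele frequency below 1%, as reported in dbNSFP
filter3 <- lapply(filter2, function(i) subset(i, sapply(dbnsfp.1000gp1.af, function(j) j < 0.01 )))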
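For readers working in Python, the same three filters can be sketched over plain dictionaries shaped like the JSON annotation objects. This is an illustrative translation, not the workflow from the talk: the function names and the sample records (including their ids and frequencies) are made up, and variants with a missing frequency are dropped, which mirrors how R's `subset` treats `NA` but may not suit every pipeline.

```python
KEPT_CONSEQUENCES = {
    "NON_SYNONYMOUS", "STOP_GAINED", "STOP_LOST",
    "CANONICAL_SPLICE", "SPLICE_SITE",
}


def get_path(doc, dotted):
    """Follow a dotted field name through nested dicts; None if absent."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def is_rare(value, cutoff=0.01):
    """True when a frequency is present and below the cutoff."""
    return value is not None and value < cutoff


def prioritize(variants, cutoff=0.01):
    """Apply the consequence, ExAC and 1000 Genomes filters in turn."""
    kept = []
    for v in variants:
        if get_path(v, "cadd.consequence") not in KEPT_CONSEQUENCES:
            continue
        if not is_rare(get_path(v, "exac.af"), cutoff):
            continue
        if not is_rare(get_path(v, "dbnsfp.1000gp1.af"), cutoff):
            continue
        kept.append(v)
    return kept


# Hypothetical records; ids and numbers are placeholders, not real data.
example_variants = [
    {"_id": "example-1", "cadd": {"consequence": "STOP_GAINED"},
     "exac": {"af": 0.0004}, "dbnsfp": {"1000gp1": {"af": 0.002}}},
    {"_id": "example-2", "cadd": {"consequence": "SYNONYMOUS"},
     "exac": {"af": 0.0001}, "dbnsfp": {"1000gp1": {"af": 0.001}}},
    {"_id": "example-3", "cadd": {"consequence": "NON_SYNONYMOUS"},
     "exac": {"af": 0.2}, "dbnsfp": {"1000gp1": {"af": 0.15}}},
    {"_id": "example-4", "cadd": {"consequence": "SPLICE_SITE"},
     "exac": {"af": 0.003}},
]

if __name__ == "__main__":
    print([v["_id"] for v in prioritize(example_variants)])
```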

Use case 3

For curators/data providers:

A platform for

integrating with other resources (saving repetitive effort)

distributing your valuable data (under your own source field)

Use case 4

For variant curation itself:

Identify discrepancies

Serve as the basis for a community-engaged curation process

Linked data

URI (Uniform Resource Identifier):

Provides a unique identifier for anything or any concept on the web

Connective: connecting data, concepts, applications and ultimately people.

URL (Uniform Resource Locator):

Provides a unique identifier for webpages

Text files, images, music, videos

Interactive: Twitter, Facebook, blogs

Why Linked Data?

Providing a unique identifier for a concept

Gene name

e.g. CDK2

genename (database1)

gene_name (database2)

{'gene': {'name': …}} (database3)

URI: http://identifiers.org/hgnc.symbol
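One way to picture the idea is a lookup table that ties each source-specific field name to the shared URI, so the same concept can be collected across sources regardless of how each one names it. The sketch below is an assumption about how such a mapping could look, not MyVariant.info's implementation; the database names come from the slide and the records are invented.

```python
GENE_SYMBOL_URI = "http://identifiers.org/hgnc.symbol"

# (source, dotted field name) -> URI of the concept that field represents.
FIELD_TO_URI = {
    ("database1", "genename"): GENE_SYMBOL_URI,
    ("database2", "gene_name"): GENE_SYMBOL_URI,
    ("database3", "gene.name"): GENE_SYMBOL_URI,
}


def get_path(doc, dotted):
    """Follow a dotted field name through nested dicts; None if absent."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def values_for_concept(records, uri):
    """Collect, per source, the value stored under any field mapped to uri."""
    found = {}
    for (source, field), field_uri in FIELD_TO_URI.items():
        if field_uri != uri or source not in records:
            continue
        value = get_path(records[source], field)
        if value is not None:
            found[source] = value
    return found


# Hypothetical records for the same gene as seen by three sources.
example_records = {
    "database1": {"genename": "CDK2"},
    "database2": {"gene_name": "CDK2"},
    "database3": {"gene": {"name": "CDK2"}},
}

if __name__ == "__main__":
    print(values_for_concept(example_records, GENE_SYMBOL_URI))
```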

Data Discrepancy ---- Example

http://myvariant.info/v1/variant/chr12:g.111351981C>T?fields=clinvar.rsid,dbsnp.rsid,evs.rsid
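A discrepancy check of this kind can be sketched as a comparison of the `rsid` reported under each source key of one variant object. The function name is hypothetical and the sample documents use placeholder rsIDs, not the values returned for the variant in the URL above.

```python
def rsid_discrepancy(doc, sources=("clinvar", "dbsnp", "evs")):
    """Return {source: rsid} when sources disagree, else an empty dict."""
    found = {}
    for source in sources:
        section = doc.get(source)
        # Skip sources that are absent or not shaped like a nested object.
        if isinstance(section, dict) and section.get("rsid") is not None:
            found[source] = section["rsid"]
    # A discrepancy means at least two distinct rsIDs were reported.
    return found if len(set(found.values())) > 1 else {}


# Placeholder documents; the rsIDs are invented for illustration.
consistent_doc = {
    "clinvar": {"rsid": "rs0000001"},
    "dbsnp": {"rsid": "rs0000001"},
}
conflicting_doc = {
    "clinvar": {"rsid": "rs0000001"},
    "dbsnp": {"rsid": "rs0000001"},
    "evs": {"rsid": "rs0000002"},
}

if __name__ == "__main__":
    print(rsid_discrepancy(consistent_doc))
    print(rsid_discrepancy(conflicting_doc))
```

Returning the per-source values rather than a boolean keeps enough context for a curator to see which resource disagrees.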

Data Discrepancy ---- Example 2

EVS web browser vs. EVS txt data file

Acknowledgement

Funding and Support: U54GM114833, U01HG008473

Washington U: Ben Ainscough, Obi Griffith

TSRI: Chunlei Wu, Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng, Adam Mark, Greg Stupp, Tim Putman

STSI: Eric Topol, Ali Torkamani, Galina Erikson

U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal

OICR: Robin Haw

UC Berkeley: Chris Mungall

UCSD: Trish Whetzel

MyVariant.info