Typical Mass-use Pipelines
NGS (Next Generation Sequencing)
1. Total-RNA Analysis (RNA-seq, Non-Coding RNA, Repeats)
2. Epigenetics (ChIP-seq and Bisulfite-seq)
3. Variant Calling
4. Microbiome (Metagenomics)
Mass Spec
1. Proteomics
2. Metabolomics
Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)
Machine Learning
1. Phenotypic Analysis and Modeling
2. Analysis of visual data
3. Standard Statistical methods
4. Integration of heterogeneous data sets
Complex Challenges and Workflows
CirSeq Mutation Analysis
1. Analysis of viral CirSeq data for precise mutation identification
2. Fitness of mutations reflecting viral adaptation
3. Identification of viral quasi-species
Mass Spec
1. Protein-protein interactions between host and viral proteins
2. Post-translational modifications of host proteins
Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)
NGS host data
1. Host gene expression variations in response to infectious quasi-species
T-BioInfo is a user-friendly computational platform that enables analysis and integration of big data. As sequencing becomes cheaper and more precise, mining -omics data for meaningful patterns that can be applied in biomedical and agricultural research becomes a growing challenge. At the same time, the complex networks of dependencies that define many conditions tend to require integration of huge heterogeneous data sets: SNPs, gene expression, epigenetic markers, proteomic and metabolomic profiles, even structural biology data. Our company has developed innovative and user-friendly workflows for analysis and integration of these different datasets. We are now looking to test and commercialize web access to the platform.
Simple, Flexible and Consistent Interface Across All Sections
Integration of analysis types
One environment for all types of data and analysis
“One-button” approach to most areas of analysis
• Flexible analysis pipelines in the platform sections and easy-to-perform data input
• The platform assists the user in constructing meaningful algorithmic pipelines for processing data: modules for pipeline continuation are highlighted by a black background and a yellow title.
Analysis of Total RNA
Concept: raw total transcriptome reads contain information not only about expressed splice variants (isoforms) of genes, but also about expressed transposons and regulatory non-coding RNAs. The complete analysis consists of three steps. First, the reads are mapped on isoforms in order to get isoform expression levels. Second, previously unmapped reads are mapped on known repetitive elements (RE) and non-coding RNAs in order to get their expression levels. Third, the rest of the reads are processed by special clustering (BiClustering) in order to get new expressed RE and non-coding RNAs as well as their expression levels under applied biological conditions. On the next stage, data integration can be performed: interplay between expressed isoforms, transposons, and regulatory RNAs.
1 Detection of expressed isoforms and their expression levels by mapping the reads on constructed transcripts √
2 For unmapped reads: √
3 Detection of most expressed repeats and regulatory RNA from databases √
4 BiClustering: associations of kmers and reads as a bicluster, and generation of K chains of biclusters √
5 Extensions of K chains ±
6 Mapping of NGS reads on found K chains: detection of most expressed novel transposons and regulatory RNAs √
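The three-step cascade described above can be sketched in a few lines. This is only an illustration under stated assumptions: exact substring matching stands in for a real read aligner, and the function names, reference names, and sequences are hypothetical toy values, not part of T-BioInfo.

```python
from collections import Counter


def map_reads(reads, references):
    """Assign each read to the first reference sequence that contains it.

    Exact substring matching is a toy stand-in for a real aligner.
    Returns (read counts per reference name, list of unmapped reads).
    """
    counts = Counter()
    unmapped = []
    for read in reads:
        for name, seq in references.items():
            if read in seq:
                counts[name] += 1
                break
        else:
            # No reference contained this read.
            unmapped.append(read)
    return counts, unmapped


def total_rna_cascade(reads, isoforms, known_elements):
    """Step 1: map on isoforms. Step 2: map leftovers on known repeats/ncRNAs.

    Reads still unmapped after step 2 are the input for the clustering step.
    """
    isoform_counts, rest = map_reads(reads, isoforms)
    element_counts, novel_candidates = map_reads(rest, known_elements)
    return isoform_counts, element_counts, novel_candidates


# Hypothetical toy data.
isoforms = {"geneA.1": "ATGGCCATTGTAATGGGCCGC"}
known_elements = {"Alu-like": "GGCCGGGCGCGGTGGCTCA"}
reads = ["GCCATTGT", "GGGCGCGG", "TTTTTTTT", "TAATGGGC"]

iso, elem, novel = total_rna_cascade(reads, isoforms, known_elements)
print(dict(iso), dict(elem), novel)
```

Raw counts would still need normalization (for example by reference length and library size) before being read as expression levels; that step is left out here.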
T-Bioinfo RNA-seq/chip section
Example: Expression of Repeats
Algorithmic Approaches:
Analysis of “Junk” RNA
Epigenetic Analysis: Bisulfite DNA Methylation and ChIP-Seq
Bisulfite Concept: bisulfite sequencing shows T instead of C in a read if the C at a genomic site (like CpG) is unmethylated, while methylated cytosines are protected from conversion and are still read as C. Thus, detection of methylated sites and genome fragments enriched/depleted by methylation is based on a special type of read mapping and on segmentation of the whole-genome methylation profile. The analysis objectives include special mapping algorithms with tolerance of the T-to-C mismatch, statistical estimation of the per-site methylation level, allele specificity of DNA methylation, as well as detection of the over-methylated and under-methylated genomic regions.
ChIP-Seq Concept: detection of epigenetic signals such as histone modifications of different types and DNA methylation events, as well as determination of protein/DNA binding sites (TF binding sites), is performed by ChIP-seq and ChIP-chip experiments. Analysis of profiles of these whole-genome signals is performed by the genome segmentation algorithms. The analysis objectives include identifying signal-enriched genome fragments as putative epigenetic events, and a combination of enriched fragments on positive and negative strands with a certain distance between them as the TF binding event. On the next analysis stage, the data integration can be performed: interplay between genome mutations and epigenetic signals on one side and expressed isoforms, transposons, and regulatory RNAs on the other side. The network of gene regulation by a transcription factor can be reconstructed from the whole-genome TF binding positions and expressions of the downstream genes. Microarray datasets are transformed into pseudo NGS reads and are analyzed by the same ChIP-seq pipelines.
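The strand-pairing idea, where an enriched fragment on the positive strand is combined with one on the negative strand at a certain distance, might look roughly like the sketch below. The function name, the peak positions, and the gap bounds are hypothetical; the source does not specify the distance criterion the platform uses.

```python
def pair_strand_peaks(plus_peaks, minus_peaks, min_gap, max_gap):
    """Pair plus-strand peaks with downstream minus-strand peaks.

    A plus-strand peak at position p is paired with the nearest minus-strand
    peak m such that min_gap <= m - p <= max_gap. The midpoint of each pair
    is reported as the putative binding position.
    Returns a list of (plus_pos, minus_pos, midpoint) tuples.
    """
    events = []
    minus_sorted = sorted(minus_peaks)
    for p in sorted(plus_peaks):
        candidates = [m for m in minus_sorted if min_gap <= m - p <= max_gap]
        if candidates:
            m = candidates[0]  # nearest downstream partner within the window
            events.append((p, m, (p + m) // 2))
    return events


# Hypothetical enriched-fragment summits on each strand.
events = pair_strand_peaks(
    plus_peaks=[1000, 5000, 9000],
    minus_peaks=[1150, 5600, 9120],
    min_gap=50,
    max_gap=300,
)
print(events)  # [(1000, 1150, 1075), (9000, 9120, 9060)]
```

The peak at 5000 has no partner within 300 bases and is therefore not reported as a binding event.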
T-Bioinfo ChIP-seq section
1 Preprocessing of raw data √
2 Mapping of NGS reads by bisulfite mapping algorithms: no penalty for T(read)-to-C(genome) mismatches √
3 Detection of the DNA methylated positions and their scores by the confidence interval method √
4 Allele specificity of the methylation in a position -
5 Detection of over-methylated and under-methylated genomic intervals by the segmentation algorithms ±
6 Detection of differential DNA methylations (individual positions and intervals) between contrasting conditions ±
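Steps 2 and 3 can be illustrated with a small sketch. The matching rule follows the "no penalty for T(read)-to-C(genome)" description; the Wilson score interval is an assumption chosen for illustration, since the source names a "confidence interval method" without specifying which one. Function names and counts are hypothetical.

```python
import math


def bisulfite_match(read, genome_window):
    """True if the read matches the genome window base by base,
    allowing a T in the read wherever the genome has a C."""
    if len(read) != len(genome_window):
        return False
    return all(
        r == g or (r == "T" and g == "C")
        for r, g in zip(read, genome_window)
    )


def wilson_interval(c_reads, total_reads, z=1.96):
    """Wilson score interval for the per-site methylation level.

    c_reads: reads showing C at the site; total_reads: reads showing C or T.
    z=1.96 corresponds to an approximate 95% interval.
    """
    if total_reads == 0:
        return (0.0, 1.0)  # no coverage: the level is undetermined
    p = c_reads / total_reads
    denom = 1 + z * z / total_reads
    centre = (p + z * z / (2 * total_reads)) / denom
    half = z * math.sqrt(
        p * (1 - p) / total_reads + z * z / (4 * total_reads ** 2)
    ) / denom
    return (centre - half, centre + half)


print(bisulfite_match("ATTG", "ACCG"))  # True: both mismatches are T-to-C
print(bisulfite_match("ACCG", "ATTG"))  # False: C(read)-to-T(genome) is penalized
low, high = wilson_interval(c_reads=8, total_reads=10)
print(round(low, 3), round(high, 3))
```

A site could then be called methylated when the lower bound exceeds a chosen threshold, which keeps low-coverage sites from being called on a handful of reads.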
Virology Pipeline
Mutation Fitness
Genome-wide fitness calculations enabled by CirSeq, combined with structural information, can provide high-definition, bias-free insights into structure-function relationships, potentially revealing novel functions for viral proteins and RNA structures, as well as nuanced insights into a viral genome’s phenotypic space. Such analyses have the power to reveal protein residues or domains that directly correspond to viral functional plasticity and may significantly inform our structural and mechanistic understanding of host–pathogen interactions.
Integration of Heterogeneous Data Sets
Concept: mutual association of features of biological datasets is the most substantial part of integrating several analyses of a biological project into one story. We suggest several techniques for such associations.
Matching of metabolite and SNP profiles according to LB’s selection of SNPs
Patent Pending Technology for Drug Discovery
Fast screening and clustering of small molecules based on physico-chemical similarity (70-100 times faster than industry standard)
SAMPLE STUDIES: DENGUE, POLIO
NS3 NS5
Analysis of Mass-Spec Proteomics Data:
Ank1 (Adenylate kinase isoenzyme 1)
Increased expression during early infection
Analysis of RNA sequences to reveal Mutation Fitness
Proteomic Mutation Fitness
Small Molecule Candidate
Proteins of Interest by Comparison:
Identifying a biologically active molecule (Polio)
Patent Pending: Ref. P-78368-US | App. No. 14/625,785 entitled SYSTEMS AND METHODS OF IMPROVED MOLECULE SCREENING
Computational analysis of small molecules can be roughly divided into three sections: pre-processing analysis, virtual screening methods, and clustering. The aim of the conformer generation process is to build a set of representative conformers that covers the conformational space of a given molecule. There are two main classes of virtual screening methods: similarity-based methods (descriptor-based screening; geometric querying; shape-based querying; fingerprints) and receptor-based methods (docking). One of the greatest challenges of docking software is to consider protein flexibility. These macromolecules are not static objects and conformational changes are often key elements in ligand binding. T-Bioinfo provides a number of proprietary methods that can be combined into pipelines for drug discovery.
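As a generic illustration of the fingerprint class of similarity-based screening mentioned above, the sketch below ranks library molecules by Tanimoto similarity to a query. It is not the patent-pending T-Bioinfo method: the fingerprints are hypothetical sets of "on" bit indices, and the function names and threshold are assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(query_fp, library, threshold):
    """Return (name, similarity) for library entries at or above the threshold,
    sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [item for item in scored if item[1] >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each set lists the indices of bits that are set.
library = {
    "mol_1": {1, 4, 7, 9},
    "mol_2": {1, 4, 8},
    "mol_3": {2, 3, 5},
}
hits = screen({1, 4, 7}, library, threshold=0.5)
print(hits)  # [('mol_1', 0.75), ('mol_2', 0.5)]
```

The same pairwise similarity can feed a clustering step, for example by grouping molecules whose mutual similarity stays above a cutoff.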
Tauber Bioinformatics Research Center
The Tauber Bioinformatics Research Center (TBRC) at the University of Haifa has a proven track record in bioinformatics, with scientific collaborations with hospitals and top US universities, involvement in government-funded projects, and multiple publications in leading journals such as Science and Nature.
Pine Biotech holds an exclusive license for commercialization of tools developed at the TBRC for research, industry applications
and education. The startup is located at the BioInnovation Center in New Orleans, LA. In collaboration with TBRC staff, Pine Biotech
is completing several pilot projects to validate our approach.
Early Adopters and Collaborators:
Aleph Therapeutics