Transcript
Page 1: Genomics and Big Data - MCBios State University, Starkville, MS Professional Characterization of the Gut Microbiome of channel catfish following florfenicol treatment Session – 8:

MCBIOS: XV

15th Annual Meeting

Genomics and Big Data

March 29 – 31, 2018

The Mill Conference Center at MSU

600 Russell St, Starkville, Mississippi

Hosted By: Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University

Hosts and Sponsors

Acknowledgement: This conference is partially supported by an NIGMS grant (P20 GM103429) from NIH and an FDA Scientific Conference Grant (5R13FD005931-03).

Disclaimer: The views presented at this meeting do not necessarily reflect the current or future opinion or policy of the U.S. Food and Drug Administration. Any mention of commercial products is for clarification and not intended as an endorsement.

List of Oral Presenters

No, Name, Affiliation, Membership Type, Title, Session Determination, Time, Date, Location

1 Adam Thrash

Mississippi State University, Starkville, MS

Student Keanu: An Interactive Tool for Exploring Sample Content

Session - 2: Next generation tools for environment and health research

9:35-9:50AM

3/30/2018 DELTA

2 Andrew Maxwell

University of Southern Mississippi, Hattiesburg, MS

Student Prediction of Chronic Diseases on Imbalanced Data with Deep Neural Networks

Session – 8: Genomics and Infectious Disease

11:05-11:20AM

3/31/2018 DELTA

3 Arif Rahman

University of Tennessee Health Science Center, Memphis, TN

Student Enzyme kinetics and toxicological evaluation of diallyl sulfide analogs for their novel role as CYP2E1 inhibitors using molecular modeling and in vitro methods

Session - 6: Big Data and Risk Assessment

12:05-12:20PM

3/30/2018 FOOTHILLS

4 AyoOluwa Aderibigbe

University of Mississippi, Oxford, MS

Student SIFTING THROUGH BIG DATA: THE SEARCH FOR PERIPHERALLY-RESTRICTED CB1 RECEPTOR ANTAGONISTS AND INVERSE AGONISTS

MCBIOS-JMP Young Scientist Award

04:00 – 05:15pm

3/29/2018 The Mississippian Ballroom C

5 Badri Adhikari

University of Missouri, St. Louis

Professional Two-level Deep Convolutional Neural Networks Applied to Protein Contact Prediction

Session - 7: Genomics and Proteomics application

10:20-10:35AM

3/31/2018 RIVERS

6 Binsheng Gong

National Center for Toxicological Research, US FDA, NCTR

Professional Next-Generation Sequencing using Galaxy

Workshop - 4: Next-Generation Sequencing using Galaxy

4:00-5:45PM

3/30/2018 FOOTHILLS

7 Bohu Pan

National Center for Toxicological Research, US FDA, NCTR

Professional Post-Doc

Assessment release effects of human genome on SNVs calling results from different pipelines based on GIAB data

Session - 3: Drug Discovery and Precision Medicine

10:20-10:35AM

3/30/2018 FOOTHILLS

8 Brian Counterman

Mississippi State University, Starkville, MS

Professional Patternize: an R package for color pattern variation

Session - 5: Transcriptomics and Genome Sequencing

11:00-11:20AM

3/30/2018 DELTA

9 Brian Walker

University of Arkansas at Little Rock, Little Rock, AR

Professional Post-Doc

SYNTHESIS OF XANTHINE DERIVATIVES FOR THE INHIBITION OF PARG

MCBIOS-JMP Young Scientist Award

02:30 – 03:45pm

3/29/2018 The Mississippian Ballroom C

10 Chathurani Ranathunge

Mississippi State University, Starkville, MS

Student Transcribed microsatellites as engines of adaptive evolution in common sunflower

MCBIOS-JMP Young Scientist Award

04:00 – 05:15pm

3/29/2018 The Mississippian Ballroom C

11 Dan Li

University of Arkansas at Little Rock, Little Rock, AR

Student Identification of cancer related gene subnetworks based on global optimum searching

MCBIOS-JMP Young Scientist Award

04:00 – 05:15pm

3/29/2018 The Mississippian Ballroom C

12 Daniel Himmelstein

University of Pennsylvania, Philadelphia, PA

Professional Post-Doc

Encoding biomedical knowledge using hetnets

Workshop - 3: No-Boundary Thinking Bioinformatics Research workshop

4:00-5:45PM

3/30/2018 DELTA

13 Darshan Mehta

National Center for Toxicological Research, US FDA, NCTR

Professional Post-Doc

Mining pharmacogenomic information from drug labeling using FDALabel database for advancing precision medicine

MCBIOS-JMP Young Scientist Award

02:30 – 03:45pm

3/29/2018 The Mississippian Ballroom C

14 David Ashbrook

University of Tennessee Health Science Center, Memphis, TN

Professional Post-Doc

Sequencing the BXD family, a cohort for experimental systems genetics and precision medicine

Session - 3: Drug Discovery and Precision Medicine

9:50-10:05AM

3/30/2018 FOOTHILLS

15 Dong Wang

National Center for Toxicological Research, US FDA, NCTR

Professional Infer the in vivo Point of Departure with ToxCast in vitro Assay Data Using a Robust Learning Approach

Session - 6: Big Data and Risk Assessment

11:35-11:50AM

3/30/2018 FOOTHILLS

16 Dongying Li

National Center for Toxicological Research, US FDA, NCTR

Professional Post-Doc

Identification of MiR-486-5p As A Novel Gene Regulator of Liver Detoxification Enzyme Sulfotransferase 2A1

MCBIOS-JMP Young Scientist Award

02:30 – 03:45pm

3/29/2018 The Mississippian Ballroom C

17 Evan McConnell

University of North Carolina Chapel Hill, Cary, NC

Student The phosphorylated redox proteome of Chlamydomonas reinhardtii: revealing novel means for enzymatic regulation.

Session - 4: Plant Omics II

11:35-11:50AM

3/30/2018 RIVERS

18 Gabriel Idakwo

University of Southern Mississippi, Hattiesburg, MS

Student SMOTEENNBagging: A novel ensemble resampling and learning approach to QSAR modeling with imbalanced data

MCBIOS-JMP Young Scientist Award

04:00 – 05:15pm

3/29/2018 The Mississippian Ballroom C

19 George Popescu

Mississippi State University, Starkville, MS

Professional Network-centric analysis of pathways for resistance and susceptibility in host pathogen interactions

Session - 4: Plant Omics II

11:20-11:35AM

3/30/2018 RIVERS

20 Gouri Mahajan

University of Mississippi Medical Center, Jackson, MS

Student Altered Neuro-inflammatory Gene Expression in Hippocampus in Major Depressive Disorder

MCBIOS-JMP Young Scientist Award

04:00 – 05:15pm

3/29/2018 The Mississippian Ballroom C

21 Grover P Miller

University of Arkansas for Medical Sciences, Little Rock, AR

Professional Chasing the Ghost of Lamisil (terbinafine) Toxicity

Session - 6: Big Data and Risk Assessment

11:20-11:35AM

3/30/2018 FOOTHILLS

22 Hamilton Wan

The Mississippi School for Mathematics and Science, Starkville, MS

High School Identification of host genes determining the pathogenesis of influenza A viruses in mice using machine learning

Session – 8: Genomics and Infectious Disease

10:50-11:05AM

3/31/2018 DELTA

23 Hongmei Jiang

Northwestern University, Evanston, IL

Professional Microbial interactions and microbe-host interactions

Workshop - 3: No-Boundary Thinking Bioinformatics Research workshop

4:00-5:45PM

3/30/2018 DELTA

24 Hossam Abdelhamed

Mississippi State University, Starkville, MS

Professional Characterization of the Gut Microbiome of channel catfish following florfenicol treatment

Session – 8: Genomics and Infectious Disease

10:35-10:50AM

3/31/2018 DELTA

25 Inimary Toby

University of Texas Southwestern, Dallas, TX

Professional Career Development Workshop for Young Scientist

Workshop - 2: Career Development Workshop for Young Scientist

1:30-3:15PM

3/30/2018 DELTA

26 Jake Chen

University of Alabama, Birmingham, AL

Professional “Super Gene Sets”: Toward Integrated Gene-set, Network, and Pathway Analysis

Session - 3: Drug Discovery and Precision Medicine

9:35-9:50AM

3/30/2018 FOOTHILLS

27 John Thomason

Mississippi State University, Starkville, MS

Professional PROTEOMIC ANALYSIS OF ERYTHROCYTE STORAGE LESIONS IN UNITS OF STORED CANINE PACKED RED BLOOD CELLS

Session - 7: Genomics and Proteomics application

10:35-10:50AM

3/31/2018 RIVERS

28 Joseph Luttrell IV

University of Southern Mississippi, Hattiesburg, MS

Student Benchmarking Protein Residue-Residue Contact Prediction Using Random Forests and Deep Networks

Session - 7: Genomics and Proteomics application

10:50-11:05AM

3/31/2018 RIVERS

29 Joshua Xu

National Center for Toxicological Research, US FDA, NCTR

Professional Sensitivity and reproducibility of onco-panel sequencing across multiple laboratories and technologies: An SEQC2 Component Study

Session - 9: MCBIOS Group Projects

10:00-10:25AM

3/31/2018 FOOTHILLS

30 Karyn Willyerd

Oklahoma State University, Stillwater, OK

Professional Post-Doc

Exploration and Exploitation of Agro-Genomic Variation in Hexaploid Wheat for Yield Stability and End-User Quality

Session - 4: Plant Omics II

11:50AM-12:05PM

3/30/2018 RIVERS

31 Kori Bohon

University of Arkansas at Little Rock, Little Rock, AR

Student Harnessing a Polyspecific Response to Tumor Associated Carbohydrate Antigens

Session - 2: Next generation tools for environment and health research

10:20-10:35AM

3/30/2018 DELTA

32 Kyler Holmes

Alcorn State University, Lorman, MS

Student Analysis of full-length infectious genomic cDNA clones of SPFMV and SPLCV and exploiting the approaches of biotechnology in sweetpotato for virus diseases resistance

Session - 1: Plant Omics I

10:20-10:35AM

3/30/2018 RIVERS

33 Lawrence J. Lesko

University of Florida, Orlando, FL

Professional Real World Data and Precision Medicine: Treatment Selection and Dose Optimization Strategies

Keynote Speaker

8:00-8:50AM

3/30/2018 The Mississippian Ballroom C

34 Ling Li

Mississippi State University, Starkville, MS

Professional From Arabidopsis to crops: the QQS orphan gene modulates carbon and nitrogen allocation across species

Session - 1: Plant Omics I

9:50-10:05AM

3/30/2018 RIVERS

35 Marilyn Warburton

United States Department of Agriculture, Agriculture Research Services, MS

Professional A pathway-based method to interpret GWAS results

Session - 1: Plant Omics I

9:15-9:35AM

3/30/2018 RIVERS

36 Martin Wubben

United States Department of Agriculture, Agriculture Research Services, MS

Professional Marker-assisted-selection coupled with recombinant inbred line genome sequencing identifies a root-knot nematode resistance gene in Upland cotton

Session - 1: Plant Omics I

9:35-9:50AM

3/30/2018 RIVERS

37 Minjun Chen

National Center for Toxicological Research, US FDA, NCTR

Professional Develop predictive model for assessing drug-induced liver injury in humans

Session - 6: Big Data and Risk Assessment

11:50AM-12:05PM

3/30/2018 FOOTHILLS

38 Natalia Reyero

Mississippi State University, Starkville, MS

Professional Tools for environmental and predictive biology

Session - 2: Next generation tools for environment and health research

9:15-9:35AM

3/30/2018 DELTA

39 Pankaj Pandey

University of Mississippi, Oxford, MS

Professional Post-Doc

Protein structure-based virtual screening: Identification of potent natural product-chemotypes as cannabinoid receptor 1 inverse agonists

MCBIOS-JMP Young Scientist Award

02:30 – 03:45pm

3/29/2018 The Mississippian Ballroom C

40 Ping Gong

US Army Engineer Research & Development Center, Vicksburg, MS

Professional Bioinformatic Identification and Phylogenetic Analysis of Putative Chemosensory Receptors in Adult Northern Leopard Frog (Lithobates pipiens) from RNA-Seq Data

Session - 5: Transcriptomics and Genome Sequencing

11:20-11:35AM

3/30/2018 DELTA

41 Rakesh Kaundal

Utah State University, Logan, UT

Professional Complete genome sequence of Pythium brassicum P1, an oomycete root pathogen: insights into its host specificity to brassicaceae

Session - 7: Genomics and Proteomics application

10:00-10:20AM

3/31/2018 RIVERS

42 Robert J. Doerksen

University of Mississippi, Oxford, MS

Professional Protein structure-based virtual screening: deep learning for precision medicine

Session - 3: Drug Discovery and Precision Medicine

9:15-9:35AM

3/30/2018 FOOTHILLS

43 Russ Wolfinger

JMP Life Sciences, SAS Institute, Cary, NC

Professional Next-Gen Data Science

Keynote Speaker

1:20-2:10PM

3/29/2018 The Mississippian Ballroom C

44 Saroj Sah

Mississippi State University, Starkville, MS

Student Transcriptome analysis of abscisic acid-activated protein kinase in abiotic stress in soybean (Glycine max)

Session - 4: Plant Omics II

12:05-12:20PM

3/30/2018 RIVERS

45 Scott M. Williams

Case Western Reserve University, Cleveland, Ohio

Professional Evolution as a metaphor for No Boundary Thinking

Workshop - 3: No-Boundary Thinking Bioinformatics Research workshop

4:00-5:45PM

3/30/2018 DELTA

46 Shraddha Thakkar

National Center for Toxicological Research, US FDA, NCTR

Professional Predicting Drug-Induced Liver Injury (DILI) – Comparing in silico, genomic and Tox21 screening methods

Session - 9: MCBIOS Group Projects

10:25-10:50AM

3/31/2018 FOOTHILLS

47 Siamak Yousefi

University of Tennessee Health Science Center, Memphis, TN

Professional Single-cell RNA-seq analysis of retinal ganglion cell subtypes of glaucoma DBA/2J mice

Session - 5: Transcriptomics and Genome Sequencing

11:50AM-12:05PM

3/30/2018 DELTA

48 Stephen Pruett

Mississippi State University, Starkville, MS

Professional Machine Learning Analysis of the Relationship between Changes in Immunological Parameters and Changes in Resistance to Listeria Monocytogenes: A New Approach for Risk Assessment and Systems Immunology

Session – 8: Genomics and Infectious Disease

10:00-10:20AM

3/31/2018 DELTA

49 Steve Jennings

University of Arkansas at Little Rock, Little Rock, AR

Professional No-Boundary Thinking: Defining Problems So Their Solutions Matter

Keynote Speaker

3:15-3:50PM

3/30/2018 The Mississippian Ballroom C

50 Sundar Thangapandian

University of Illinois Urbana-Champaign, Urbana, IL

Professional Post-Doc

Quantitative Target-specific Toxicity Prediction Model (QTTPM): A Novel Computational Toxicology Approach Integrating Molecular Dynamics Simulation and Machine Learning

MCBIOS-JMP Young Scientist Award

02:30 – 03:45pm

3/29/2018 The Mississippian Ballroom C

51 Tessa Burch-Smith

University of Tennessee, Knoxville, TN

Professional Focused ion beam-scanning electron microscopy for three-dimensional modelling of cellular ultrastructure

Session - 4: Plant Omics II

11:00-11:20AM

3/30/2018 RIVERS

52 Thanh Nguyen

University of Alabama, Birmingham, AL

Student Weighted in-network Node Expansion and Ranking (WINNER): a New Approach to Identify Potential Biomarkers

Session - 2: Next generation tools for environment and health research

9:50-10:05AM

3/30/2018 DELTA

53 Ujwani Nukala

University of Arkansas at Little Rock, Little Rock, AR

Student Development of novel vitamin E analogs as potent radioprotectors

Session - 3: Drug Discovery and Precision Medicine

10:05-10:20AM

3/30/2018 FOOTHILLS

54 Wei Tan

Mississippi State University, Starkville, MS

Professional Effects of Ethanol on Escherichia coli-mediated sepsis: Differences Between Changes in Gene Expression Early and Late in the Course of Infection.

Session – 8: Genomics and Infectious Disease

10:20-10:35AM

3/31/2018 DELTA

55 Wei Zhuang

National Center for Toxicological Research, US FDA, NCTR

Professional A Nonparametric Statistical Method for Detection of Serum microRNAs Biomarkers with Complete and Incomplete qPCR Data

Session - 5: Transcriptomics and Genome Sequencing

12:05-12:20PM

3/30/2018 DELTA

56 Weida Tong

National Center for Toxicological Research, US FDA, NCTR

Professional A decade of MAQC effort and its contribution to our understanding of high-throughput genomics technologies

Keynote Speaker

11:30-12:20PM

3/31/2018 The Mississippian Ballroom C

57 Wenjun Bao

JMP Life Sciences, SAS Institute, Cary, NC

Professional Advanced Data Analytics using JMP Genomics

Workshop - 1: Advanced Data Analytics using JMP Genomics

1:30-3:15PM

3/30/2018 FOOTHILLS

58 William J Welsh

Rutgers University, New Brunswick, NJ

Professional Informatics Tools for Big Biologicals and Small Drug Molecules

Keynote Speaker

9:00-9:50AM

3/31/2018 The Mississippian Ballroom C

59 William Mattes

National Center for Toxicological Research, US FDA, NCTR

Professional Systems Biology and Big Data: Little Mitochondria as a Big Example

Session - 6: Big Data and Risk Assessment

11:00-11:20AM

3/30/2018 FOOTHILLS

60 William Sanders

IT Research and Cyberinfrastructure, Hartford, Connecticut

Professional MCBIOS Timber Rattlesnake Genome Project: Current Status and Lessons Learned

Session - 9: MCBIOS Group Projects

10:50-11:15AM

3/31/2018 FOOTHILLS

61 Yan Li

Bennett Aerospace, Cary, NC

Professional Distinguishing Resistance-Conferring from Susceptible Mutations in Acetohydroxyacid Synthase by High Performance Computing-Enabled Computational Modeling

Session - 2: Next generation tools for environment and health research

10:05-10:20AM

3/30/2018 DELTA

62 Ying Wang

Mississippi State University, Starkville, MS

Professional Applying multiple transcriptome analyses to understand plant-viroid interactions

Session - 1: Plant Omics I

10:05-10:20AM

3/30/2018 RIVERS

63 Didi Ren

Mississippi State University, Starkville, MS

Student A new genome annotation, comprehensively compared and evaluated against a prior annotation for the filamentous fungus Cryphonectria parasitica

Session - 5: Transcriptomics and Genome Sequencing

11:35-11:50AM

3/30/2018 DELTA

64 Zongliang Yue

University of Alabama, Birmingham, AL

Student WIPER: Weighted in-path edge ranking for biomolecular association networks

Session - 7: Genomics and Proteomics application

11:05-11:20AM

3/31/2018 RIVERS

Keanu: An Interactive Tool for Exploring Sample Content

Adam Thrash1, Tony Arick1, Robyn Barbato2, Robert Jones2, Tom Douglas3, Ed Perkins4, Natalia Garcia-Reyero4

1 Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Starkville, MS

2 US Army Engineer Research and Development Center, Cold Regions Research and Engineering Laboratory, Hanover, NH

3 US Army Engineer Research and Development Center, Cold Regions Research and Engineering Laboratory, Fairbanks, AK

4 US Army Engineer Research and Development Center, Environmental laboratory, Vicksburg, MS

Background: The Holocene Epoch in interior Alaska marks the development of modern ecosystems. Understanding how ecosystems changed in the past in response to climate is crucial to predict how ecosystem function may change in response to the recent and projected continued warming of the local climate. Here, we generated shotgun metagenomics data from soil microflora using a combination of the HiSeq and MiSeq Illumina platforms in order to compare ancient paleosols representing the Holocene Epoch at an alpine site in interior Alaska. One of the main challenges when dealing with complex metagenomics data is the fact that large amounts of information need to be presented in a comprehensive and easy-to-navigate way. In the process of analyzing FASTQ data, especially metagenomics data or data suspected to be contaminated, visualizing the organisms present in the data can be useful.

Results: Keanu, a tool for exploring sample content, allows a user to understand what organisms are present in a sample and in what abundance by analyzing alignments against the BLAST NT database and displaying them in an interactive web page. The content of a sample is presented as a collapsible tree, with node sizes indicating abundance.

Conclusions: Here, we show how Keanu was used to analyze and visualize the sequence data generated from the Alaskan paleosols.
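
The tree-with-abundance idea described in the Results can be illustrated with a minimal sketch. This is not Keanu's implementation: the function names and the lineage tuples are hypothetical, and real input would come from BLAST alignments that have already been resolved to taxonomic lineages. Each read contributes one count to every node on its root-to-leaf path, so a node's count plays the role of the node size in the collapsible tree.

```python
from collections import defaultdict


def make_node():
    # A node stores the number of reads at or below it, plus its children.
    return {"count": 0, "children": defaultdict(make_node)}


def build_abundance_tree(lineages):
    """Aggregate per-read lineages (root-to-leaf tuples) into a nested tree."""
    root = make_node()
    for lineage in lineages:
        root["count"] += 1
        node = root
        for taxon in lineage:
            node = node["children"][taxon]
            node["count"] += 1
    return root


def render(node, name="root", depth=0):
    """Return indented text lines, most abundant children first."""
    lines = ["  " * depth + f"{name} ({node['count']})"]
    children = sorted(node["children"].items(), key=lambda kv: -kv[1]["count"])
    for child_name, child in children:
        lines.extend(render(child, child_name, depth + 1))
    return lines


# Hypothetical lineages, one per aligned read (illustrative values only).
hits = [
    ("Bacteria", "Proteobacteria", "Pseudomonas"),
    ("Bacteria", "Proteobacteria", "Pseudomonas"),
    ("Bacteria", "Actinobacteria", "Streptomyces"),
    ("Eukaryota", "Fungi"),
]
tree = build_abundance_tree(hits)
print("\n".join(render(tree)))
```

An interactive viewer would serialize such a nested structure (for example to JSON) and let the browser expand or collapse each node; that part is outside the scope of this sketch.
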

Abstract Identifying Number: 1006462

Prediction of Chronic Diseases on Imbalanced Data with Deep Neural Networks

Andrew Maxwell1, Runzhi Li2, Bei Yang2, Zhaoxian Zhou1, Ping Gong3, and Chaoyang Zhang*1

1* School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA

2 Cooperative Innovation Center of Internet Healthcare, School of Information & Engineering, Zhengzhou University, Zhengzhou, 450000, China

3 Environmental Lab, US Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA

Background: Deep Neural Networks have gained popularity as a useful machine learning technique for medical data. As a classification tool, Deep Neural Networks can handle various types of medical data and find correlations that might otherwise be missed. In this study, Deep Neural Networks are used to identify whether a patient's medical data can reveal what type of chronic disease the patient may have. For intelligent health risk prediction, Deep Neural Networks are found to be a viable option for physicians to improve their patients' quality of life through early identification of chronic diseases.

Results: Physical examination records of 110,300 anonymous patients were used to predict three types of chronic disease: hypertension, diabetes, and fatty liver. The dataset was split into training and testing sub-datasets. Ten-fold cross validation was used to evaluate prediction accuracy with metrics such as precision, recall, and F-score. Deep Learning (DL) architectures were compared with standard and state-of-the-art multi-label classification methods. Results suggest that Deep Neural Networks (DNN), when applied to multi-label classification of chronic diseases, produced accuracy comparable to that of common methods such as Support Vector Machines. We implemented DNNs for both problem transformation and algorithm adaptation multi-label methods and compared the two to see which is preferable.

Conclusions: Intelligent health risk prediction is an area in which machine learning techniques are used to better understand the risks to a patient. Methods utilizing deep neural networks can infer correlations about chronic diseases as well as or better than traditional classification techniques. In addition, these networks can expand their predictive power as new information about chronic diseases is introduced.
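
As an illustration of the metrics named in the Results, the sketch below computes precision, recall, and F-score for a multi-label problem. It assumes micro-averaging over all (patient, label) pairs, which the abstract does not specify, and the label matrices are toy values rather than study data. The function name is hypothetical.

```python
def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over all (sample, label) pairs.

    y_true and y_pred are lists of rows; each row holds one 0/1 flag per label.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example; columns stand for hypertension, diabetes, fatty liver.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

precision, recall, f1 = micro_prf(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

On imbalanced data, per-label (macro) averages can differ noticeably from the micro average, so reporting both may be worthwhile.
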
Innovation in research: Advances in computing technology have made Neural Network methods a viable approach again in machine learning for medical data. At the time of this study, few multi-label datasets were available for analysis, especially in medical data. This experience produced two lessons: strategies for considering multiple features in a dataset during feature selection, and a specific approach that ensures outcomes from the neural network result from the network's knowledge of the dataset rather than from an assumed predictive value. Because the medical data originates from patient examinations, it demonstrates that gathered data will not always be perfect, and that certain adjustments are necessary for accurate and reproducible results. Deep Learning is valuable in multi-label datasets when thoughtful consideration of the distribution and features is aligned with persistence.

Author contribution: The author has learned a great deal about the Deep Learning framework used to analyze the architecture for the medical data. The traditional classification techniques for multi-label datasets were run using the WEKA and Mulan packages, which required learning Java, WEKA, and the Mulan API in order to integrate these services and obtain classifier predictions. Additionally, the Mulan package was modified to parallelize tasks and speed up analysis, as the performance was exceptionally slow. The author collaborated with Runzhi Li, who originally obtained the data, both to apply her previous methods for comparison and to learn about the data for feature selection. Further guidance from the other authors ensured the validity of the analysis and helped convey the utility of Deep Learning methods to the audience.

Enzyme kinetics and toxicological evaluation of diallyl sulfide analogs for their novel role as CYP2E1 inhibitors using molecular modeling and in vitro methods

Mohammad A. Rahman, Narasimha M. Midde, Xiaoxin Wu, Wei Li, and Santosh Kumar*

Department of Pharmaceutical Sciences, College of Pharmacy, University of Tennessee Health Science Center, Memphis TN 38163 USA

Background: Cytochrome P450 2E1 (CYP2E1) in chronic ethanol (ETH) use and acetaminophen (APAP) abuse is associated with severe hepatic and extra-hepatic toxicity through the production of CYP2E1-mediated toxic metabolites. Diallyl sulfide (DAS), a selective inhibitor of CYP2E1, has shown protective effects against ETH- and APAP-induced toxicity in many studies. However, it is also a CYP2E1 substrate that upon metabolism produces toxic metabolites. The objective of this study was to find a potent and safer DAS analog as a CYP2E1 inhibitor. We analyzed seven compounds that resemble DAS but have slightly different chemical moieties. These compounds were selected based on a molecular docking study. The binding mode and binding energy obtained from this analysis suggested a strong potential for these analogs to act as CYP2E1 inhibitors. We also performed a comprehensive inhibition kinetics analysis of these analogs and determined their relative IC50, Ki, and types of inhibition compared to those of DAS. Further, we conducted an in vitro toxicity study using hepatic and extra-hepatic cell lines.

Results: Compared to DAS, diallyl ether (DE) and allyl methyl sulfide (AMS) had lower Ki values. These results were consistent with the molecular modeling data. Thiophene (TP) showed inhibitory capacity similar to that of DAS, and four other analogs showed lower potency than DAS. The in vitro study showed that DE, TP, and AMS were significantly less cytotoxic than DAS, even at high concentration. Most importantly, DE, TP, and AMS were able to prevent ethanol- and acetaminophen-mediated toxicity in primary hepatocytes.

Conclusions: This is the first report of a thorough analysis of the CYP2E1 inhibition kinetics of DAS and its seven structural analogs. These are significant findings in the search for a novel CYP2E1 inhibitor that can be used as a better tool than DAS for toxicity studies.

Projects funded by: NIH (RO1AA022063 to SK)
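
For readers less familiar with how IC50 and Ki relate, the sketch below applies the textbook Cheng-Prusoff relationship for a purely competitive inhibitor, Ki = IC50 / (1 + [S]/Km). The abstract does not say how the authors derived Ki or which inhibition type each analog showed, so the competitive assumption, the function name, and all numbers here are illustrative only.

```python
def ki_competitive(ic50, substrate_conc, km):
    """Cheng-Prusoff conversion for competitive inhibition.

    All three arguments must share the same concentration unit (e.g. uM).
    """
    if ic50 <= 0 or km <= 0 or substrate_conc < 0:
        raise ValueError("ic50 and km must be positive; substrate_conc non-negative")
    return ic50 / (1.0 + substrate_conc / km)


# Hypothetical values: IC50 = 10 uM measured at [S] = Km = 50 uM.
ki = ki_competitive(ic50=10.0, substrate_conc=50.0, km=50.0)
print(f"Ki = {ki:.1f} uM")
```

When the substrate concentration equals Km, the competitive Ki is half the IC50; other inhibition types follow different relationships, so the inhibition mode has to be established first.
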

Abstract Identifying Number: 1006913

SIFTING THROUGH BIG DATA: THE SEARCH FOR PERIPHERALLY-RESTRICTED CB1 RECEPTOR ANTAGONISTS AND INVERSE AGONISTS

AyoOluwa O. Aderibigbe1, Pankaj Pandey1 and Robert J. Doerksen1,2,*

1Division of Medicinal Chemistry, Department of BioMolecular Sciences; and 2National Center for Natural Products Research, School of Pharmacy, University of Mississippi, University, MS 38677, USA

Background: The development of cannabinoid receptor 1 (CB1) antagonists as pharmacotherapeutic agents for treating obesity has been limited by their CNS-mediated adverse effects. However, selective antagonism of the peripheral CB1 receptors has shown potential in body weight reduction and in avoiding CNS-mediated side effects. The purpose of this project is to identify novel peripherally-restricted CB1 antagonists/inverse agonists using 3D-QSAR analysis and a hybrid virtual screening protocol.

Results: We prepared a dataset of 77 molecules for which there was experimental evidence in the literature for peripheral restriction, strong CB1 binding and selectivity for CB1 over CB2 receptor subtype. To ensure structural diversity, 2D molecular fingerprints, including linear and ECFP fingerprints, of the dataset were computed. We clustered the dataset by a hierarchical method, using the Tanimoto coefficient as metric and the Kelley criterion to link clusters. We built 28 pharmacophore models and evaluated the models based on their survival and BEDROC scores. The highest-ranking pharmacophore model, a 4-point AHRR model, had a BEDROC score of 0.9977. The preparation of the 3D-QSAR training and test sets, consisting of 41 and 14 molecules, respectively, involved the selection of molecules from each cluster, while ensuring that the set of selected molecules had a wide spread of CB1 activity. Subsequently, we prepared 3D-QSAR models using all 28 pharmacophore models. Extensive analysis and validation of the 3D-QSAR models guided the selection of the most optimal model for hybrid virtual screening of public databases.
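
The Tanimoto coefficient used above as the clustering metric can be sketched in a few lines. This example assumes each fingerprint is represented as the set of its "on" bit positions; the molecule names and bit sets are hypothetical, and the sketch does not reproduce the software or the Kelley criterion used in the study.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Assumed convention: two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


# Hypothetical fingerprints given as sets of on-bit indices.
fingerprints = {
    "mol_A": {1, 2, 3, 4},
    "mol_B": {3, 4, 5, 6},
    "mol_C": {1, 2, 3, 4, 7},
}

# Pairwise distances (1 - similarity) are what a hierarchical method would cluster on.
distances = {
    (m, n): 1.0 - tanimoto(fingerprints[m], fingerprints[n])
    for m, n in combinations(sorted(fingerprints), 2)
}
closest_pair = min(distances, key=distances.get)
print(closest_pair, round(distances[closest_pair], 3))
```

A hierarchical clustering routine would start by merging the closest pair and then repeat on the updated distances.
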

Conclusions: Results obtained from the ligand-based and protein structure-based virtual screening processes were further refined by considering the predicted ADME properties of the hit molecules to predict novel peripherally-restricted CB1-selective antagonists. Significant findings from the present study will be presented, which will serve as good starting points for drug development of peripherally-restricted CB1-selective blockers with excellent pharmacokinetic profiles for the management of obesity.

Innovations in research: The study was enhanced using a quality-by-design system through the careful selection of molecules with the desired properties of peripheral restriction and high CB1 selectivity, in contrast to the traditional 3D-QSAR approaches in which centrally-active CB1 ligands were included in training sets. The virtual screening methods we used were also enriched by the addition of excluded volumes based on a comparison of the alignments of active with inactive molecules. Finally, a consideration of the predicted pharmacokinetic properties of the hit molecules was implemented to help minimize the attrition rate of molecules in drug development studies.

Author contributions: The project design and all the computational work were carried out by AyoOluwa O. Aderibigbe under the supervision of Dr. Pankaj Pandey and Dr. Robert J. Doerksen.

Two-level Deep Convolutional Neural Networks Applied to Protein Contact Prediction

Badri Adhikari1, Jie Hou2, and Jianlin Cheng2

1Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, Missouri, 63121, USA.

2Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, 65211, USA.

Background: Significant improvements in the prediction of protein residue-residue contacts have been observed in recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction.

Results: In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks: the first five predict contacts at 6, 7.5, 8, 8.5, and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in the CASP10, 11, and 12 experiments, DNCON2 achieves mean precisions of 35%, 50%, and 53.4%, respectively, when the top L/5 long-range contacts are evaluated. These are higher than the 30.6% achieved by MetaPSICOV on the CASP10 dataset, the 34% by MetaPSICOV on the CASP11 dataset, and the 46.3% by Raptor-X on the CASP12 dataset.

Conclusion: The improved performance of DNCON2 can be attributed to the inclusion of short- and medium-range contacts in training, the two-level approach to prediction, the use of state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length. DNCON2 is currently available as a web-server at http://sysbio.rnet.missouri.edu/dncon2/ and a downloadable version is also available at https://github.com/multicom-toolbox/DNCON2/.
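The "top L/5 long-range" precision quoted above can be sketched as follows. This is a minimal illustration, assuming the common convention that long-range means a sequence separation of at least 24 residues (the abstract does not state the cutoff). The function name and the toy predictions are hypothetical, and this is not the DNCON2 evaluation code.

```python
def top_l5_long_range_precision(predicted, true_contacts, seq_len, min_sep=24):
    """Precision of the top L/5 long-range predicted contacts.

    predicted: dict mapping (i, j) residue pairs to predicted probabilities
    true_contacts: set of (i, j) pairs that are real contacts
    min_sep: assumed minimum sequence separation for "long-range"
    """
    long_range = [(prob, pair) for pair, prob in predicted.items()
                  if abs(pair[0] - pair[1]) >= min_sep]
    long_range.sort(reverse=True)            # highest probability first
    top_n = max(1, seq_len // 5)             # L/5 predictions
    selected = [pair for _, pair in long_range[:top_n]]
    if not selected:
        return 0.0
    hits = sum(1 for pair in selected if pair in true_contacts)
    return hits / len(selected)


# Toy protein of length 30, so the top 30 // 5 = 6 long-range pairs are scored
predicted = {
    (0, 24): 0.95, (1, 26): 0.90, (2, 28): 0.85, (0, 29): 0.80,
    (3, 27): 0.70, (4, 29): 0.60, (5, 29): 0.20,
    (10, 12): 0.99,  # short-range pair, ignored despite its high score
}
true_contacts = {(0, 24), (1, 26), (0, 29), (4, 29), (5, 29), (10, 12)}

precision = top_l5_long_range_precision(predicted, true_contacts, seq_len=30)
```

In this toy case four of the six selected pairs are true contacts. The short-range pair is excluded before ranking, which is why its high probability does not affect the result.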

Assessment release effects of human genome on SNVs calling results from different pipelines based on GIAB data

Bohu Pan1, Wenming Xiao1, Zhichao Liu1, Weida Tong1, Huixiao Hong1*

1*Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079;

Background: Reference genome selection is the cornerstone of next generation sequencing (NGS) data analysis. HG19 and HG38 are the two major versions of the human genome used in the community for the last ten years. Coordinate changes between these two versions have been analyzed and released by UCSC. However, concordance between results based on HG19 and HG38 has not been assessed.

Results: We conducted a comparative analysis of SNVs called against HG19 and HG38 to assess their consistency between the two reference genomes, using the WGS data from the genome-in-a-bottle (GIAB) project. SNVs were called using 20 different pipelines based on both HG19 and HG38. Two conversion tools were used to convert SNVs between HG19 and HG38. We calculated the conversion rates, analyzed the discordance rates between HG19 and HG38, and characterized the discordant SNVs. The conversion rate from HG38 to HG19 (average 99%) was higher than that from HG19 to HG38 (average 95%). The conversion rates varied slightly among calling pipelines but were consistent between the two conversion tools. Around 1.5% of SNVs were discordant between the results from HG19 and HG38. SNVs in the low confidence regions defined by GIAB were more often discordant than SNVs in the high confidence regions. Discordant SNVs involving bases C and G were over-represented (52% observed versus 42% expected), whereas those involving bases A and T were under-represented (48% observed versus 58% expected).

Conclusions: Our results indicate that SNVs generated using HG19 and HG38 were not the same. The discordant SNVs could be caused by the genomes and the conversion tools. Our findings suggest that caution should be taken when transferring results between different reference genomes.
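The two quantities reported above, a conversion rate and a discordance rate, can be sketched on toy data. This is a rough illustration under stated assumptions: the coordinate map stands in for a real conversion tool, the SNVs are invented, and discordance is defined here as the share of SNVs found in only one of the two call sets, which may differ from the definition the authors used.

```python
def convert(snvs, coord_map):
    """Lift SNVs (chrom, pos, ref, alt) to new coordinates; unmapped SNVs are dropped."""
    converted = []
    for chrom, pos, ref, alt in snvs:
        target = coord_map.get((chrom, pos))
        if target is not None:
            converted.append((target[0], target[1], ref, alt))
    return converted


def discordance_rate(calls_a, calls_b):
    """Fraction of SNVs present in only one of the two call sets (assumed definition)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


# Hypothetical HG19 calls and a hypothetical HG19 -> HG38 coordinate map
hg19_calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr2", 300, "G", "A"), ("chr2", 400, "T", "C")]
coord_map = {("chr1", 100): ("chr1", 150),
             ("chr1", 200): ("chr1", 250),
             ("chr2", 300): ("chr2", 360)}

# Hypothetical calls made directly against HG38
hg38_calls = [("chr1", 150, "A", "G"), ("chr1", 250, "C", "T"),
              ("chr2", 360, "G", "C")]

lifted = convert(hg19_calls, coord_map)
conversion_rate = len(lifted) / len(hg19_calls)
discordant = discordance_rate(lifted, hg38_calls)
```

Here three of four SNVs convert, and the lifted and native call sets disagree at one position (same site, different alternate allele), which counts as two non-shared records out of four distinct ones.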

Patternize: an R package for color pattern variation

Brian Counterman, Steven Van Belleghem1

1Department of Biological Sciences, Mississippi State University, Starkville, MS, 39762

Background: The use of image data to quantify, study and compare variation in the colors and patterns of organisms requires the alignment of images to establish homology, followed by color-based segmentation of images. We developed patternize, an R package for image alignment and segmentation that can be used to quantify color patterns in a wide range of organisms. Patternize provides utilities to extract, transform and superimpose color patterns, as well as to run downstream statistical analyses.

Results: We demonstrate the utility of patternize for studying color patterning with Heliconius butterflies and more challenging examples from guppy fish, Galápagos wolf spiders and salamanders. With patternize we extracted color patterns from large numbers of images using an RGB threshold, k-means clustering or watershed transformation. Patternize then established homology between specimens either through manually placed homologous landmarks or automated image registration. Comparing samples with the superimposition and principal component analysis functions confirmed that automated image registration produced results comparable to traditional landmark-based methods.

Conclusions: Using patternize, we showed that automated image registration removed the need for labor-intensive landmark placement for individual samples. We also saw that variation in photographic conditions complicated color pattern extraction, but showed that iterative runs and/or reference images for RGB thresholds, watershed and k-means clustering improved color pattern extraction. Patternize outputs raster objects that allow for a wide range of downstream analyses. We demonstrated how patternize can be used to extract patterns, sum or subtract the patterns to plot heatmaps, calculate the relative pattern area and conduct principal component analysis (PCA) among groups of samples. These quantitative measures of variation in color patterns provided by patternize can be used for population comparisons, genetic association studies and investigating dominance and epigenetic interactions of color pattern variation in a wide range of organisms.
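The extraction and summary steps described above (RGB thresholding, summing patterns into a heatmap, relative pattern area) can be sketched in a few lines. This is a conceptual Python illustration, not the patternize R API. The function names, tolerance rule and tiny 2x2 images are all hypothetical, and the images are assumed to be already aligned.

```python
def extract_pattern(image, target_rgb, tolerance):
    """Binary mask: 1 where every channel is within `tolerance` of the target color."""
    return [[1 if all(abs(c - t) <= tolerance for c, t in zip(pixel, target_rgb)) else 0
             for pixel in row]
            for row in image]


def relative_area(mask):
    """Fraction of pixels assigned to the pattern."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total


def sum_masks(masks):
    """Element-wise sum of aligned masks, usable as a heatmap of pattern frequency."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*masks)]


# Two hypothetical, already-aligned 2x2 RGB images
image_a = [[(250, 10, 10), (20, 20, 20)],
           [(240, 30, 20), (10, 10, 10)]]
image_b = [[(255, 0, 0), (245, 5, 5)],
           [(0, 0, 0), (0, 0, 0)]]

red = (255, 0, 0)
mask_a = extract_pattern(image_a, red, tolerance=40)
mask_b = extract_pattern(image_b, red, tolerance=40)
heatmap = sum_masks([mask_a, mask_b])
area_a = relative_area(mask_a)
```

Each heatmap cell counts how many specimens show the pattern at that position, which is the idea behind summing aligned patterns. A fixed threshold like this is sensitive to lighting differences between photographs, as the abstract notes.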

SYNTHESIS OF XANTHINE DERIVATIVES FOR THE INHIBITION OF PARG

Brian L. Walker1, Lakshitha B. Modarage1, Zamal Ahmed2, John A. Tainer2, Darin E. Jones*1

1Department of Chemistry, University of Arkansas at Little Rock, Little Rock, AR 72204.

2Department of Molecular & Cellular Oncology, MD Anderson Cancer Center, University of Texas, Houston, TX 77030.

Background: PARG is a promising drug target for cancer and other disease states associated with oxidative stress or chronic inflammation. A genetic knock-down of PARG sensitizes a variety of cancer-derived cell lines, including BRCA-deficient cancer cells, to chemotherapeutic agents and radiation. A reduction of PARG activity also decreases cytotoxicity associated with inflammation, ischemia, and stroke. Despite the therapeutic potential of PARG, only a few inhibitors have been reported.

Results: The PARG inhibitors we have identified are small, xanthine-based compounds bearing resemblance to drugs with very favorable safety and pharmacokinetic properties. Currently, we are preparing several focused chemical libraries from commercially available starting materials using well-established and straightforward synthetic methods that can be executed in a parallel format, allowing the rapid development of rigorous structure-activity relationships for refinement of our pharmacophore model.

Conclusions: Preliminary results of structure elaboration to improve the selectivity and potency of xanthine-based inhibitors of poly(ADP-ribose) glycohydrolase (PARG) are presented.

Innovation: The synthesis and turnover of PAR by PARP1 and PARG, respectively, are required for normal responses to DNA damage, and both enzymes are being evaluated as targets for cancer therapy. PARP1 is enzymatically activated by binding to damaged DNA, causing auto-modification of PARP1 and recruitment of the DNA repair scaffold XRCC1, which lacks enzymatic activity but interacts with multiple repair factors. PARylated PARP1 binds to a conserved PAR-binding motif (PBM) located in one of two BRCA1 C-terminus (BRCT) domains of XRCC1. Following the recruitment of XRCC1 and associated repair enzymes, PARP1 dissociates from the damaged chromatin and is polyubiquitinylated and degraded by the proteasome. PARG activity is required for this repair process and may facilitate the exchange of XRCC1 from PARP1 to another binding partner that retains XRCC1 at sites of DNA damage. The role of PARG activity and the mechanism explaining the requirement for PAR degradation during PARP1-dependent repair of DNA strand breaks are unknown.

Author contributions: My main contributions to this project have been in the design and synthesis leading to the functionalization of a xanthine-based template. This template has several sites that allow for elaboration. I am systematically designing the synthesis of different analogs based on the electronics and the size of the target binding domain. I am also purifying and fully characterizing these analogs before sending samples to our collaborators. After the compounds are biologically screened, we can identify the motif that shows the greatest inhibition.

NCI R01 CA200231

Abstract

Transcribed microsatellites as engines of adaptive evolution in common sunflower

Chathurani Ranathunge1, Gregory L. Wheeler2, Andy D. Perkins3, and Mark E. Welch*1

1*Department of Biological Sciences, Mississippi State University, Starkville, MS, 39759

2Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, OH, 43210

3Department of Computer Science and Engineering, Mississippi State University, MS, 39759

Background: The mechanisms by which natural populations rapidly adapt to their local environments are not completely understood. One such proposed mechanism, the “tuning knob” model, predicts that stepwise changes in microsatellite allele length can lead to stepwise effects on phenotypes. To test the predictions of the “tuning knob” model, we estimated the effect of microsatellite allele length on heritable phenotypic variation at the level of gene expression in natural populations of the common sunflower (Helianthus annuus L.). Seeds collected from six populations at two distinct latitudes in Kansas and Oklahoma were planted and grown in a common garden. An RNA-Seq experiment was conducted with 95 of these individuals.

Results: Of the 3325 microsatellites genotyped using the RNA-Seq data, 479 showed significant correlation between allele length and gene expression (hereafter termed eSTRs). Further, when irregular allele sizes not conforming to the repeat motif were removed from the analysis, the number of eSTRs rose to 2379. The percentage of variation in gene expression explained by eSTRs ranged from 1% to 86% when controlling for population and allele-by-population interaction effects at the 479 eSTRs initially identified. The majority (70.4%) of the 479 eSTRs were located within the untranslated regions (UTRs), which suggests that they are well positioned to function as cis-regulatory elements. A Gene Ontology (GO) analysis revealed that eSTRs are significantly enriched for GO terms associated with cis- and trans-regulatory processes.

Conclusions: This study provides compelling evidence that a substantial number of transcribed microsatellites can rapidly generate heritable and potentially adaptive genetic variation.

MCBIOS-SAS Young Scientist Excellence Award 2018

Innovation in research: Microsatellites have long been regarded as non-functional, neutrally evolving regions of the genome. Studies now provide evidence consistent with a functional, adaptive role for microsatellites. However, many of these studies have been limited to studying these effects at one or a few microsatellites, and only a few studies have addressed these hypotheses at the genome level. In this study we demonstrate using RNA-Seq that a substantial number of microsatellites in the transcribed regions of sunflower are capable of regulating gene expression. This research has allowed seemingly anecdotal hypotheses regarding microsatellites to be tested across thousands of loci and presents compelling evidence for their role in adaptive evolution. This project also demonstrates novel ways to use RNA-Seq data obtained from large-scale studies to address questions regarding regulatory elements.
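The core test behind an eSTR, whether expression changes stepwise with allele length, can be sketched as a simple linear regression. This is a minimal illustration with invented numbers. It does not include the population and allele-by-population terms the study controlled for, and the variable names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y on x; returns slope, intercept and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of expression variance explained
    return slope, intercept, r_squared


# Hypothetical data: mean allele length (in repeat units) and expression per individual
allele_lengths = [10, 10, 12, 12, 14, 14]
expression = [5.0, 5.4, 6.1, 5.9, 7.2, 6.8]

slope, intercept, r_squared = linear_fit(allele_lengths, expression)
```

A positive slope here means expression rises with each added repeat unit, the pattern the "tuning knob" model predicts. In a real analysis the fit would be repeated per locus with multiple-testing correction.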

Author Contributions: Dr. Mark Welch and Dr. Andy Perkins conceived, designed, and oversaw this project. Gregory Wheeler and I handled the transcriptomic data from the 95 individuals, genotyped the microsatellites and estimated the effect of allele length on gene expression. I conducted the downstream analyses to identify specific microsatellite motif sizes and types within the group of microsatellites with significant allele length effects on gene expression (hereafter termed eSTRs). I mapped the eSTR-containing transcripts to the sunflower genome and assessed whether the eSTRs were located within the UTRs or the coding regions of the transcripts. Additionally, I identified differentially expressed (DE) genes between the two latitudinal populations from Kansas and Oklahoma and assessed the enrichment of microsatellites within the DE genes to identify a potential role for microsatellites in gene expression divergence in sunflower.

Identification of cancer related gene subnetworks based on global optimum searching

Dan Li1 and Mary Yang*1

1*MidSouth Bioinformatics Center and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and Univ. of Arkansas Medical Sciences. 2801 S. Univ. Ave., Little Rock, AR, 72204 USA

Background: A gene module that is differentially expressed (DE) between normal and cancer tissue samples provides more biological insight into cancer initiation and progression than individual DE genes. Computational methods such as network analysis have been used to generate cancer-related subnetworks. A global optimum search can yield more informative outcomes for this purpose.

Results: We developed a Genetic Algorithm (GA) based network approach for identifying cancer-related gene subnetworks. We first searched for the globally optimal gene modules that could distinguish normal and cancer samples of lung adenocarcinoma. Then, based on the protein-protein interactions (PPIs), protein-DNA interactions (transcription factor and target gene), predicted long non-coding RNA regulations, and putative driver mutations, we generated 19 core subnetworks. Further functional annotation and pathway analysis showed that the core gene subnetworks were highly involved in lung cancer invasion. Moreover, some regulatory relationships potentially associated with lung cancer development were discovered.

Conclusions: The network analysis of the genes and gene interactions revealed high-confidence lung cancer-related gene subnetworks. The Genetic Algorithm provided a solution for globally searching for optimal gene modules involved in lung cancer progression. Our GA based network approach is expected to be applicable to a wider range of cancer studies. This work was supported by a grant from NIGMS (P20 GM103429) at NIH.

Innovation: Differentially expressed genes have been widely used in cancer studies, usually as signatures. Significant changes in gene expression between normal and cancer tissue samples indicate their critical roles in cancer progression. In our previous study, we identified gene modules that can distinguish normal and cancer samples, defined as differential gene modules (DGMs). Our results showed that the DGMs can be used for accurate prediction of cancer cases and provide more insight into the mechanisms of cancer progression at the pathway level. Here, we developed a GA based network approach that can identify the DGMs more comprehensively based on a global optimum search. Moreover, by integrating multiple types of gene interaction information, we generated core gene subnetworks that revealed the underlying mechanisms of cancer development. Applying the approach to our lung cancer RNA-seq data, we identified high-confidence subnetworks related to lung cancer invasion.

Author Contributions: Dr. Mary Yang and I conceived the project and designed the experiments. I carried out the experiments. Dr. Mary Yang and I performed further analysis.

DANIEL HIMMELSTEIN

Postdoctoral Fellow

Department of Systems Pharmacology and Translational Therapeutics

University of Pennsylvania

Title

Encoding biomedical knowledge using hetnets

Abstract

Approaches in network medicine have traditionally focused on generating insights from graphs with a single type of node and relationship. However, biology's complexity demands a richer network structure capable of integrating diverse, multi-scale information. Towards this end, we develop hetnets — networks with multiple types of nodes and relationships. Specifically we created Hetionet — a network of biology, disease, and pharmacology. This resource encodes knowledge from millions of biomedical studies over the last half century. Version 1.0 contains 47,031 nodes of 11 types and 2,250,197 relationships of 24 types. We host a public Neo4j database instance at https://neo4j.het.io allowing users to interact with Hetionet.

In Project Rephetio, we applied Hetionet to predict new uses for existing drugs. Our approach learned the network patterns of connectivity that differentiate treatments from non-treatments, enabling us to predict the probability of treatment for 209,168 compound–disease pairs. These predictions prioritize treatments under investigation by clinical trial. Going forward, we're investigating more efficient algorithms for feature extraction on hetnets.
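To make the hetnet idea concrete, the sketch below stores typed nodes and typed edges and counts paths that follow a given sequence of edge types. It is a toy illustration only: the class, node names and edge type names are hypothetical, and it is neither the Hetionet schema nor the Rephetio feature extraction. Path counts along typed edge sequences are just one simple kind of connectivity pattern.

```python
from collections import defaultdict


class Hetnet:
    """Toy heterogeneous network: every node and every edge has a type."""

    def __init__(self):
        self.node_types = {}                 # node id -> node type
        self.edges = defaultdict(set)        # (source id, edge type) -> target ids

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, source, edge_type, target):
        self.edges[(source, edge_type)].add(target)

    def count_paths(self, start, edge_types):
        """Count paths from `start` that follow `edge_types` in order, per end node."""
        frontier = {start: 1}
        for edge_type in edge_types:
            next_frontier = defaultdict(int)
            for node, n_paths in frontier.items():
                for target in self.edges.get((node, edge_type), ()):
                    next_frontier[target] += n_paths
            frontier = next_frontier
        return dict(frontier)


# Hypothetical mini-network: one compound, two genes, two diseases
net = Hetnet()
for node_id, node_type in [("drug_X", "Compound"), ("gene_1", "Gene"), ("gene_2", "Gene"),
                           ("disease_Y", "Disease"), ("disease_Z", "Disease")]:
    net.add_node(node_id, node_type)
net.add_edge("drug_X", "binds", "gene_1")
net.add_edge("drug_X", "binds", "gene_2")
net.add_edge("gene_1", "associated_with", "disease_Y")
net.add_edge("gene_2", "associated_with", "disease_Y")
net.add_edge("gene_2", "associated_with", "disease_Z")

path_counts = net.count_paths("drug_X", ["binds", "associated_with"])
```

In this toy network the compound reaches `disease_Y` through two genes and `disease_Z` through one. A classifier could use counts like these, computed for many edge-type sequences, as features for each compound-disease pair.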

Bio

Dr. Himmelstein is a postdoctoral fellow in the Greene Lab at the University of Pennsylvania. Previously, he received his PhD from the University of California San Francisco. His research focuses on integrating biomedical knowledge using hetnets. Daniel is a relentless contributor to the open source and open data ecosystems, and advocates for the open licensing of publicly-funded research outputs.

Mining pharmacogenomic information from drug labeling using FDALabel database for advancing precision medicine

Darshan Mehta1, Ryley Uber1, Zhichao Liu1, Shraddha Thakkar1, Minjun Chen1, Joshua Xu1, Baitang Ning1, Shashi Amur2, Padmaja Mummaneni2, Steve Harris1, Guangxu Zhou1, Leihong Wu1, Taylor Ingle1, Junshuang Yang1, Weida Tong1, Hong Fang1

1National Center for Toxicological Research, US FDA, 3900 NCTR Road, Jefferson, AR 72079

2Center for Drug Evaluation and Research, US FDA, Silver Spring, MD 20993

Abstract

Background: It is known that drug response can have significant interpatient variability. The relationship between drug response and genetic makeup of an individual/population, studied in the field of pharmacogenomics, is rapidly accelerating advancements towards precision medicine. The US FDA has included pharmacogenomic information in the labeling of approved drug products to improve drug safety and efficacy. In this research, we used FDALabel database for mining pharmacogenomic information from drug labeling.

Results: FDALabel database is a web-based application that allows users to perform customizable searches of about 95,000 labeling documents that include human prescription drugs and human over-the-counter (OTC) drugs. Using a set of 62 biomarkers obtained from a public FDA resource website, we queried FDALabel database to identify drugs with biomarker information. Each new drug labeling identified was checked manually to validate the clinical relevance of biomarker information contained therein and the drug-biomarker pairs were then classified as either causing adverse reactions, having dose-related information, targeted for a genetic indication, or simply informative. Using this approach, we identified 225 unique drugs with pharmacogenomic information in their drug labeling and a total of 289 drug-biomarker pairs. An analysis of the relationship between drugs and biomarkers revealed that the most frequently observed biomarkers are CYP2D6, G6PD, and CYP2C19 and the most frequently observed therapeutic areas are oncology, psychiatry, and infectious diseases. The labeling sections with the most occurrences of biomarkers were found to be Clinical Pharmacology, Clinical Studies, and Indications and Usage.

Conclusion: The results presented in this research show the utility of FDALabel database in mining drug labeling for pharmacogenomic information. As new biomarkers are discovered and more information is added to drug labeling, FDALabel can help researchers to propose better hypotheses to guide the advancement of precision medicine.

Innovation in research

Before embarking on this project, my expertise was in the field of cheminformatics and QSAR modeling. Working on this project provided me with a unique opportunity to learn new skills and apply novel data analysis methods. To successfully accomplish my research objectives, I had to learn an entirely new area of research in the field of pharmacogenomics. I also had to become familiar with drug labeling documents and their different labeling sections. I made an extra effort to search for drug labels containing several new genetic biomarkers using the FDALabel database. This enabled us to compile an updated and comprehensive list of drug-biomarker pairs. I also helped to create a system for classifying drug-biomarker pairs as either causing adverse reactions, having dose-related information, targeted for a genetic indication, or simply informative. This is a novel classification system that can help researchers to distinguish between the different drug-biomarker pairs identified.

Author contributions

As mentioned above, one of my major contributions in this research was to help create a novel system for classifying drug-biomarker pairs. Along with this, I also queried the FDALabel database to search for drug labels containing several new genetic biomarkers. I manually evaluated drug labels retrieved by the search to validate the clinical relevance of biomarker information contained therein. I also conceived of innovative ways for analyzing data and made plots with better visualization. I am grateful to Ryley Uber in our group for having initiated the process of collecting information about drug-biomarker pairs from drug labeling. He also provided expert guidance when needed to increase our knowledge in pharmacogenomics. Others in the group helped to guide the overall direction of the research project, suggested meaningful analyses, and gave periodic feedback to improve the quality of research results.

Sequencing the BXD family, a cohort for experimental systems genetics and precision medicine

David G. Ashbrook1, Danny Arends2, Megan K. Mulligan1, Evan G. Williams3, Cathleen Lutz4, Alicia Valenzuela4, Casey Bohl1, Jesse Ingels1, Melinda McCarty1, Arthur Centeno1, Reinmar Hager5, Johan Auwerx6, Saunak Sen7, Lu Lu1, Kelley Harris8, Abraham Palmer9, Yu-yu Ren9, Jonathan K Pritchard10, Andrew G. Clark11, Robert W. Williams1

1. Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA

2. Lebenswissenschaftliche Fakultät, Albrecht Daniel Thaer-Institut, Humboldt-Universität zu Berlin, Invalidenstraße 42, Berlin, Germany

3. Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland

4. Mouse Repository and the Rare and Orphan Disease Center, The Jackson Laboratory, Bar Harbor, ME, USA

5. Division of Evolution & Genomic Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester, UK

6. Laboratory of Integrative and Systems Physiology, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

7. Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN, USA

8. Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA, USA

9. Institute for Genomic Medicine, Department of Psychiatry, University of California San Diego, La Jolla, CA, USA

10. Department of Genetics, Stanford University, Stanford, CA, USA

11. Department of Molecular Biology & Genetics, Cornell University, Ithaca, NY.

Background: The BXD mouse genetic reference population is the most deeply phenotyped mammalian model system, with ~6000 phenotypes in GeneNetwork.org (GN), the repository for BXD family data. GN enables analysis of complex interactions among gene variants, phenotypes at different biological levels, environmental factors, and the many cofactors and confounders that influence quantitative phenotypes. The BXD family now consists of 152 inbred lines, each of which is a unique mosaic of alleles from the C57BL/6J and DBA/2J inbred founders; together they segregate for ~5 million common sequence variants. Using the current genotype data from arrays and RNA-seq, it is possible to achieve mapping precision of under ±2.0 Mb over most of the genome. In this study we have sequenced the entire BXD family and the two parental strains.

Results: We have carried out 40X sequencing using a 10x Chromium linked-read barcoding strategy. This deep sequencing of ~40 kb DNA fragments has several uses, including: identification of structural variants that cannot be detected reliably using short-read shotgun sequencing; identification of variants unique to each of the ‘epochs’ of BXD, derived over the last four decades; identification of truly rare spontaneous mutations; and production of the first ‘infinite marker maps’ of this family, allowing even higher precision mapping of phenotypes.

At present we have confirmed ~4.5 million variants that differ between the C57BL/6J and DBA/2J parents. We have aligned sequences for ~50 samples and identified haplotype blocks with greater precision than had been possible with microarray-based genotyping. Several candidates have been identified for a phenome-wide association study.

Conclusions: This family is an excellent resource for testing networks of causal and mechanistic relations among clinical phenotypes and millions of molecular and organismal traits, including metabolic syndrome, infection, addiction, neurodegeneration, and longevity. Full sequencing will only increase its usefulness as a platform for experimental precision medicine.

A new genome annotation, comprehensively compared and evaluated against a prior annotation for the filamentous fungus Cryphonectria parasitica.

Di Ren and Angus Dawe

Department of Biological Sciences, Mississippi State University, Starkville, MS, 39762

Background: Next generation sequencing (NGS) technologies create great opportunities for genomics, transcriptomics and proteomics projects on non-model organisms. A well-annotated reference genome is critical for projects such as exploring the functions of target genes or characterizing gene expression profiles. One such non-model organism, the filamentous fungal plant pathogen Cryphonectria parasitica, has an outdated genome annotation from 2009 whose predictions have been shown experimentally to be incomplete and inaccurate. With recent RNA-seq data and upgraded protein homolog databases available, it became possible to perform a new genome annotation. To do this, a practical bioinformatics workflow was designed to perform the annotation and to compare it with the 2009 version.

Results: In the 2017 version, 11170 genes were predicted with additional gene structure features, a new quality metric system, and new gene function features. Furthermore, a newly developed Python script was used to compare the 11609 predicted genes from the 2009 version to the 2017 version and categorize them into four groups (Match, Similar, Different, Nogene) based on their differences in coding region (start/end coordinates) and internal domains (InterPro ID). Of the genes in the 2009 version, 33% were in the Match category, 35% were in the Similar category, 31% were in the Different category and 1% did not exist in the new annotation. From the last three groups, 22 out of 27 (81.5%) manually chosen genes were experimentally validated by PCR to support the new annotation.
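The four-way comparison described above can be sketched as follows. The abstract names the criteria (coding region coordinates and InterPro domains) but not the exact rules, so the rules below are assumptions made for illustration. The gene records and InterPro IDs are placeholders, and this is not the authors' script. Contig and strand are ignored for brevity.

```python
def categorize(old_gene, new_genes):
    """Assign an old-annotation gene to Match, Similar, Different or Nogene.

    Assumed rules (not taken from the source):
      Nogene    - no new gene overlaps the old coordinates
      Match     - a new gene has identical start/end and identical domains
      Similar   - an overlapping new gene has identical domains but other coordinates
      Different - overlapping new genes exist, but none has the same domains
    """
    overlapping = [g for g in new_genes
                   if g["start"] <= old_gene["end"] and old_gene["start"] <= g["end"]]
    if not overlapping:
        return "Nogene"
    for g in overlapping:
        same_coords = (g["start"], g["end"]) == (old_gene["start"], old_gene["end"])
        if same_coords and g["domains"] == old_gene["domains"]:
            return "Match"
    if any(g["domains"] == old_gene["domains"] for g in overlapping):
        return "Similar"
    return "Different"


# Placeholder gene models for the new annotation (InterPro IDs are illustrative)
new_annotation = [
    {"start": 100, "end": 500, "domains": {"IPR000001"}},
    {"start": 1000, "end": 1800, "domains": {"IPR000002"}},
    {"start": 3000, "end": 3600, "domains": {"IPR000003", "IPR000004"}},
]

# Placeholder gene models from the old annotation
old_annotation = [
    {"start": 100, "end": 500, "domains": {"IPR000001"}},    # unchanged
    {"start": 950, "end": 1800, "domains": {"IPR000002"}},   # start coordinate moved
    {"start": 3000, "end": 3600, "domains": {"IPR000003"}},  # domain content changed
    {"start": 5000, "end": 5500, "domains": {"IPR000005"}},  # absent from new annotation
]

categories = [categorize(gene, new_annotation) for gene in old_annotation]
```

Tallying `categories` over all old genes would give percentages of the kind reported above. Stricter or looser rules for Similar versus Different would shift those percentages, so the thresholds need to be stated alongside any such result.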

Conclusion: An integrated workflow for performing a new genome annotation, followed by a comprehensive comparison of the new and preexisting annotations, was developed to significantly improve the quality of the genome annotation. This method can be applied to research projects on other non-model organisms.

Information required for the awards competition:

Innovation: A new optimized genome annotation and curation pipeline, two-pass MAKER2, was used to annotate and mask repetitive elements in the genome, align protein and RNA evidence in a splice-aware manner, and identify genes accurately by comparing all evidence with gene models from the GeneMark-ES, SNAP and Augustus gene predictors. As prediction evidence, the newly assembled transcriptome and the UniProt/Swiss-Prot protein homolog databases were chosen rather than the FungiDB and OrthoMCL databases because they gave the best gene prediction yield and accuracy. To the best of our knowledge, no comprehensive comparison between two genome annotations has yet been presented that illustrates the structural differences at each gene locus and their accuracy as supported by RNA and protein evidence. A novel integrated comparison tool was developed that shows researchers which of four categories (Match, Similar, Different, Nogene) any gene of interest from the preexisting annotation falls into.

Author contributions: All programming skills and algorithm strategies were acquired through classes and self-training. All of the above work was completed by myself with the guidance of my advisor, Dr. Angus Dawe.


Infer the in vivo Point of Departure with ToxCast in vitro Assay Data Using a Robust Learning Approach

Dong Wang1

1*Division of Bioinformatics and Biostatistics, FDA National Center for Toxicological Research, Jefferson, Arkansas, 72079

Background: The development and application of high throughput in vitro assays is an important advance for risk assessment in the 21st century. However, significant challenges exist for statistical approaches that relate in vitro readouts to in vivo findings.

Results: We developed a high dimensional robust regression model to infer the in vivo point of departure (POD) with in vitro assay data from the ToxCast and Tox21 projects. The in vitro PODs were derived and combined with in vivo PODs from ToxRefDB for the rat and mouse liver to build the model. This approach separates the chemicals into a majority, well predicted set and a minority, outlier set. Salient relationships can then be learned from the data. It was used to demonstrate the predictive power of in vitro PODs for in vivo PODs. The accuracy is comparable with extrapolation between related species (mouse and rat). Chemicals in the outlier set also tend to have more biologically variable characteristics.

Conclusions: Available data from high throughput in vitro assays can be used to build useful predictive models to inform prioritization for toxicity screening regarding PODs. The emphasis on the robustness of the model is of significant importance. With the continued accumulation of high throughput data for a wide range of chemicals, predictive modeling can provide a valuable complement to adverse outcome pathway based approaches in risk assessment.
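The idea of splitting chemicals into a well predicted majority and an outlier minority can be sketched in a few lines of Python. This is not the high dimensional model of the abstract, whose details are not given: a one-predictor Theil-Sen fit is swapped in as a simple robust estimator, outliers are flagged with a median-absolute-deviation rule, and the data values are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return slope, intercept


def split_outliers(x, y, cutoff=3.0):
    """Return (well_predicted, outliers) as index lists (assumed MAD rule)."""
    slope, intercept = theil_sen(x, y)
    resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # 1.4826 * MAD estimates the standard deviation under normal errors;
    # the tiny floor avoids a zero scale when most residuals are exactly 0.
    scale = max(1.4826 * median([abs(r) for r in resid]), 1e-12)
    outliers = [i for i, r in enumerate(resid) if abs(r) > cutoff * scale]
    kept = [i for i in range(len(x)) if i not in outliers]
    return kept, outliers


# Hypothetical log-scale PODs: in vitro (x) versus in vivo (y).
in_vitro = [1, 2, 3, 4, 5, 6]
in_vivo = [1.1, 1.9, 3.1, 3.9, 5.1, 9.0]
print(split_outliers(in_vitro, in_vivo))
```

Because both the slope and the scale are medians, a single aberrant chemical barely moves the fit, which is what lets it stand out as an outlier.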


Abstract

Identification of MiR-486-5p As A Novel Gene Regulator of Liver Detoxification Enzyme Sulfotransferase 2A1

Dongying Li, Bridgett Knox, Leihong Wu, Gokhan Yavas, Wenming Xiao, Weida Tong, and Baitang Ning*

Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079

Background: Sulfotransferase 2A1 (SULT2A1) is an important liver detoxification enzyme for the metabolism of many drugs, hormones and neurotransmitters. SULT2A1 downregulation is associated with several liver diseases, including cholestasis and primary sclerosing cholangitis. While transcriptional regulation of SULT2A1 expression is well established, it remains unclear how SULT2A1 expression is modulated at the post-transcriptional level. MicroRNAs are small non-coding RNAs that downregulate genes involved in virtually all biological processes. However, little is known about the roles that microRNAs play in SULT2A1 expression. In this study, we aimed to identify novel microRNA regulators of SULT2A1.

Results: We first ran prediction analyses using multiple bioinformatics tools and discovered that miR-486-5p can potentially bind to SULT2A1 mRNA at multiple sites based on sequence complementarity. The minimum free energy (MFE) of the mRNA/microRNA hybridization at these sites, calculated via RNAhybrid, is less than -20 kcal/mol, indicating a high likelihood of these interactions in cells. We also extracted and analyzed RNA-seq and microRNA-seq data from The Cancer Genome Atlas and found that the expression of mir-486 is inversely correlated with that of SULT2A1 in normal human liver samples. Moreover, we validated the repressive effect of miR-486-5p on SULT2A1 expression in human hepatoma HepG2 cells with a series of wet-lab experiments.

Conclusions: Utilizing an integrative approach combining in silico and in vivo analyses, we identified miR-486-5p as a novel repressive regulator of SULT2A1 expression.
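The two computational filters described above (a hybridization MFE below -20 kcal/mol and an inverse expression correlation) can be sketched as follows. Pearson correlation is an assumption, since the abstract does not name the statistic; the candidate names and all numbers are hypothetical placeholders, and `shortlist` is not a function from any tool cited in the abstract.

```python
from math import sqrt


def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def shortlist(candidates, target_expr, mfe_cutoff=-20.0):
    """Keep microRNAs with a strong predicted site and negative correlation."""
    kept = []
    for name, info in candidates.items():
        strong_site = min(info["site_mfe"]) < mfe_cutoff   # kcal/mol
        r = pearson(info["expr"], target_expr)
        if strong_site and r < 0:
            kept.append((name, r))
    return sorted(kept, key=lambda item: item[1])           # most negative first


# Hypothetical expression values across five samples.
target = [10.0, 8.0, 6.0, 4.0, 2.0]
candidates = {
    "miR-A": {"site_mfe": [-25.1, -18.0], "expr": [1.0, 2.0, 3.0, 4.0, 5.0]},
    "miR-B": {"site_mfe": [-15.0], "expr": [1.0, 2.0, 3.0, 4.0, 5.0]},
    "miR-C": {"site_mfe": [-30.0], "expr": [5.0, 4.0, 3.0, 2.0, 1.0]},
}
print([name for name, _ in shortlist(candidates, target)])
```

Applying the cheap thermodynamic filter and the correlation filter together is what narrows a large candidate list before any wet-lab work.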

Innovation in Research

This project utilizes a unique, integrative, step-wise approach that combines bioinformatics analyses, statistical evaluation, and experimental validation to identify a novel microRNA regulator of an important drug metabolizing enzyme. With little known about microRNA regulation of SULT2A1, it is impractical to experimentally test each of nearly 2500 human microRNAs, considering the time, cost and labor required. Thus, I took advantage of bioinformatics tools to overcome that issue. I chose multiple highly cited prediction programs and adopted a chemist’s perspective in evaluating RNA binding strength, both of which increased the prediction confidence. Statistical correlation analysis of NGS data on human gene expression eliminated more putative candidates and left only two for the subsequent experimental validation, which consisted of cellular, molecular and biochemical assays. I integrated strategies from various disciplines in this project, which provides a useful framework for identifying novel microRNA regulators in a timely and cost-effective manner.

Author Contribution

Initially trained as a cell and molecular biologist, I started this project just 6 months ago with zero background in drug metabolism, microRNA regulation, bioinformatics and biostatistics. I conducted a thorough literature search to determine the significance of studying microRNA regulation of SULT2A1. Among numerous bioinformatics tools, I made an educated decision on selecting those used in this project by reviewing publications and researchers’ discussion on science forums. By teaching myself and learning from collaborating bioinformaticians, I have gained enough programming skills to extract and analyze NGS data from the comprehensive TCGA database. I actively sought advice and input from collaborators who have expertise in different fields so that I could develop the integrative approach used in this project. Besides the successful elimination of microRNA candidates, I carefully designed and carried out my wet-lab experiments and assisted our supporting scientist in budget management to minimize the cost of study.


Sample Abstract

The phosphorylated redox proteome of Chlamydomonas reinhardtii: revealing novel means for enzymatic regulation.

Evan W. McConnell, Emily G. Werth, and Leslie M. Hicks

Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599

Background: Chlamydomonas reinhardtii is a green unicellular alga that has become a premier system for investigation of oxidative stress response and the reversible oxidation of protein thiols. Identification of reversibly oxidized proteins serves as a crucial element in understanding stress adaptation in photosynthetic organisms and beyond. In addition, knowing the ensemble of modifications co-occurring with oxidation, notably phosphorylation, highlights novel mechanisms of protein regulation. Our platform uses a novel combination of protein-level redox enrichment and subsequent phosphopeptide analysis to (1) exhaustively define the reversibly oxidized proteome and (2) identify the subset of redox proteins also phosphorylated, thereby revealing co-modified proteins across diverse signal transduction and metabolic pathways.

Results: Proteins with in vivo reduced thiols were blocked with iodoacetamide and all reversibly oxidized Cys-residues were then reduced using dithiothreitol. Nascent thiols were enriched at the protein-level using Thiopropyl Sepharose 6B resin for thiol-disulfide exchange chromatography. On-resin trypsin digestion of Cys-bound proteins was performed and peptides collected in the eluate and flow-through were then equally combined before enriching for phosphopeptides and analyzing with liquid chromatography-mass spectrometry. Label-free quantification was used to measure the abundance of oxidized Cys-sites on 1457 proteins, where sequential phosphopeptide enrichment of combined eluate and flow-through identified 720 phosphoproteins with 23% (172 proteins) also identified as reversibly oxidized. The protein overlap was improved to 43% by fractionating eluate with strong cation exchange chromatography, suggesting that most phosphorylated redox proteins in our dataset are low abundance and preclude detection without phosphopeptide enrichment or exceptional depth of coverage.

Conclusions: Proteins identified with both modifications were involved in signal transduction, ribosome and translation-related machinery, and energy pathways. While this method broadly identified new targets of reversible oxidation and phosphorylation, the functional significance of these modifications is largely unknown and must be studied in more detail.


MCBIOS-SAS Young Scientist Excellence Award 2018

Innovation in research: This study is an extension of my primary research project (i.e., measuring protein reversible oxidation in photosynthetic organisms). It was with an appreciation for biological complexity that I wondered whether oxidized proteins were additionally modified by other post-translational modifications. With the enthusiastic permission of Dr. Hicks, I started generating preliminary results and found that there were indeed proteins with identifiable sites of reversible oxidation and phosphorylation. This project required that I develop skills in multiple new areas: (1) chemistry, via optimization of reproducible phosphopeptide enrichment using immobilized metal affinity chromatography, and (2) bioinformatics, because phosphopeptides have distinct fragmentation schemes in tandem mass spectrometry and therefore require different data processing from peptides enriched from reversibly oxidized proteins.

Author contributions: With the guidance of Dr. Hicks, I developed the enrichment technique for reversibly oxidized proteins and data analysis for site-specific quantification of oxidized Cys. I have also been directly involved with the successful development of phosphopeptide analysis in Dr. Hicks’ lab since starting graduate school, mostly with parameter optimization for analysis using liquid chromatography-mass spectrometry and bioinformatics. For the full-scale study presented in this abstract: I cultured the Chlamydomonas, extracted proteins, and enriched reversibly oxidized proteins from the lysate. The samples were then handed over to Emily G. Werth (another graduate student in Dr. Hicks’ laboratory) for phosphopeptide enrichment. I was then tasked with liquid chromatography-mass spectrometry of enriched samples and downstream data analysis.


SMOTEENNBagging: A novel ensemble resampling and learning approach to QSAR modeling with imbalanced data

Gabriel Idakwo1, Ping Gong2, Sundar Thangapandian2, Yan Li3, Nan Wang4, Zhaoxian Zhou1, Chaoyang Zhang1*

1 School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406
2 Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180
3 Bennett Aerospace Inc., Cary, NC 27518
4 Department of Computer Science, New Jersey City University, Jersey City, NJ 07305

Background: The specificity of toxicant-target biomolecule interactions leads to the highly imbalanced nature of many toxicity datasets, causing poor performance of Quantitative Structure-Activity Relationship (QSAR) modeling. Undersampling and oversampling are representative techniques for handling such imbalance. However, class boundaries are commonly not well delineated since inactive toxicants often appear in the minority class. Removing majority class instances can result in information loss, whereas increasing minority instances by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to over-fitting. Although these traditional resampling methods have been widely used for ensemble learning, where multiple learners are trained to solve the same problem, the performance may be affected by information loss or over-fitting. To improve prediction accuracy, we propose SMOTEENNBagging, a novel ensemble resampling and learning approach for QSAR modeling with imbalanced data. In SMOTEENNBagging, SMOTEENN replaces random undersampling in ensemble learning. SMOTEENN works by oversampling the minority class, followed by cleaning of the instances (undersampling) to create better defined class boundaries.

Results: The performance of SMOTEENNBagging was compared with state-of-the-art oversampling and undersampling ensembles using 12 highly imbalanced bioassay datasets. Evaluated by imbalance-sensitive metrics such as AUPRC and G-measure, SMOTEENNBagging outperformed oversampling ensembles in 83% of the cases tested. SMOTEENNBagging had better AUPRC scores in 90% of tested cases and G-measure scores comparable to those of undersampling ensembles. Experimental results also showed a significant improvement in the recall metric for SMOTEENNBagging over that of the other ensemble models.

Conclusions: The ability to separate a few active compounds from vast numbers of inactive ones is of great importance in computational toxicology. This work shows a significant improvement in the performance of QSAR modeling using imbalanced data with SMOTEENN as a resampling method for ensemble learning.
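The resampling-inside-bagging idea can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the base learner is a 1-nearest-neighbour vote, the neighbour counts (k = 3) and number of bags are arbitrary, all function names are hypothetical, and the two-descriptor data are made up.

```python
import math
import random
from collections import Counter


def smote(X, y, minority, k, rng):
    """Oversample: interpolate between a minority point and a near neighbour."""
    mins = [x for x, c in zip(X, y) if c == minority]
    need = len(X) - 2 * len(mins)            # points needed to balance two classes
    if len(mins) < 2 or need <= 0:
        return X, y
    synthetic = []
    for _ in range(need):
        a = rng.choice(mins)
        # Position 0 of the sorted list is `a` itself (distance 0), so skip it.
        b = rng.choice(sorted(mins, key=lambda m: math.dist(a, m))[1:k + 1])
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Clean: drop any point whose k nearest neighbours mostly disagree with it."""
    keep = []
    for i, x in enumerate(X):
        near = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: math.dist(x, X[j]))[:k]
        if Counter(y[j] for j in near).most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_bags(X, y, minority, n_bags=5, seed=0):
    """Bootstrap the data, then apply SMOTE followed by ENN inside every bag."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(X)) for _ in X]
        Xb, yb = smote([X[i] for i in idx], [y[i] for i in idx], minority, 3, rng)
        bags.append(enn(Xb, yb))
    return bags


def predict(bags, x):
    """Majority vote of one 1-nearest-neighbour learner per bag."""
    votes = Counter()
    for Xb, yb in bags:
        if Xb:
            votes[yb[min(range(len(Xb)), key=lambda i: math.dist(x, Xb[i]))]] += 1
    return votes.most_common(1)[0][0]


# Hypothetical descriptors: 20 inactive (0) and 6 active (1) compounds.
X = [(0.2 * i, 0.2 * j) for i in range(5) for j in range(4)]
X += [(5 + 0.2 * i, 5.0) for i in range(6)]
y = [0] * 20 + [1] * 6
bags = fit_bags(X, y, minority=1)
print(predict(bags, (5.5, 5.0)), predict(bags, (0.3, 0.3)))
```

Resampling inside each bootstrap bag, rather than once up front, means every base learner sees a different balanced and cleaned view of the data.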


ADDITIONAL INFO FOR YOUNG SCIENTIST EXCELLENCE AWARD (GRADUATE STUDENT)

Innovation in Research

The core innovation of my work involves replacing undersampling in ensemble learning with SMOTEENN as a resampling technique. Traditional bootstrap aggregation relies on random undersampling. Oversampling variations have also been reported. These two methods have their drawbacks, which can be alleviated in my innovative work through replacing these resampling techniques with SMOTEENN while harnessing their advantages. SMOTEENN had been previously reported as a hybrid resampling technique for individual classifiers but not for ensemble learners. With SMOTEENN as a resampling technique for bootstrap aggregation, we showed marked improvement in making predictions with imbalanced bioassay data.

Author’s Contribution

Under the supervision of my advisor (CZ) and a collaborator (PG), I participated in the study design of this work. I was also responsible for conducting computational experiments, implementing algorithms, drafting the abstract and preparing the PowerPoint presentation. ST and YL helped with chemical descriptors generation and bioassay data curation, respectively. NW and ZZ provided useful insights for machine learning implementation.


Network-centric analysis of pathways for resistance and susceptibility in host pathogen interactions

Elizabeth K. Brauer*1, George V. Popescu*2, Dharmendra K. Singh*3, Mauricio Calviño4, Kamala Gupta5, Bhaskar Gupta6, Suma Chakravarthy4, and Sorina Popescu7

1 Ottawa Research and Development Center, Agriculture and Agri-Food Canada, K1A 0C6, Ottawa, ON, Canada
2 Institute for Genomics, Biocomputing, and Biotechnology, Mississippi State University, Mississippi State, MS 39759
3 HM.Clause, 9241 Mace Blvd, Davis, CA 95618, USA
4 The Boyce Thompson Institute for Plant Research, New York, Ithaca, 14853 USA
5 Department of Botany, Government General Degree College, Singur, West Bengal, India
6 Department of Zoology, Government General Degree College, Singur, West Bengal, India
7 Dept. of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS
* These authors had equal contributions to this work.

The molecular basis and dynamics of the plant-pathogen interplay in the context of plant resistance or susceptibility to infection are poorly understood aspects of plant pathogenesis. The mostly unknown pathogen-activated signal processing network and the outcome of effector-induced perturbation of this network may be discovered by focusing the analysis on cross-species molecular interactions and the organization of the kinase signaling networks. Here we propose a network-centric approach to identify host-pathogen interactions using multiple types of molecular screens and a network identification methodology inspired by the analysis of computer internetworks. We have developed a novel network tomography method that is generalizable and could effectively address current challenges in the study of biological networks. To characterize the events at the plant signaling/pathogen effector interface, we assembled, using a high-throughput protein-protein interaction screen, a host-pathogen network between 139 tomato kinases and four effectors from the bacterial pathogen Pseudomonas syringae. A subset of 36 multi-effector interacting kinases belonging to various structural classes was selected for in-depth phenotypic analyses.

We developed a new method of pathway inference from phenotypic data using a co-occurrence pattern analysis algorithm. We used the co-occurrence of nodes with immune-related phenotypes in multiple network perturbation assays (silencing of effector-interacting kinases) as an indication of pathway co-occurrence. We evolved the network by adding edges between co-occurring nodes in adjacent layers, as well as cross-layer edges for strong co-occurrence patterns. Our analysis identified kinases acting as positive or negative regulators of innate immunity, effector-triggered immunity (ETI), and programmed cell death (PCD). Using clique analysis we identified “essential” network nodes for immune responses. We integrated the phenotypic data to construct hierarchical stimulus-specific networks and ranked essential kinases in these networks. Our results provide a framework for studying the host-pathogen interface and prioritizing targets for enhancing plant resistance.
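The edge-adding step can be sketched in Python. The abstract gives no thresholds, so `min_count` and `strong_count` are assumed values, "cross-layer" is read here as "more than one layer apart", and the kinase names, layers and assay contents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def infer_edges(assays, layer, min_count=2, strong_count=3):
    """Add edges between nodes that show phenotypes together across assays.

    assays: assay name -> set of nodes with an immune-related phenotype
    layer:  node -> integer layer in the hierarchical network
    """
    counts = Counter(pair
                     for nodes in assays.values()
                     for pair in combinations(sorted(nodes), 2))
    edges = set()
    for (a, b), n in counts.items():
        gap = abs(layer[a] - layer[b])
        adjacent = gap == 1 and n >= min_count        # ordinary co-occurrence
        cross = gap > 1 and n >= strong_count         # only strong patterns
        if adjacent or cross:
            edges.add((a, b))
    return edges


# Hypothetical perturbation assays and layer assignments.
assays = {
    "assay1": {"K1", "K2", "K3"},
    "assay2": {"K1", "K2"},
    "assay3": {"K1", "K3"},
}
layer = {"K1": 0, "K2": 1, "K3": 2}
print(infer_edges(assays, layer))
```

Requiring a higher count for cross-layer edges keeps long-range links sparse, so only the most consistent co-occurrence patterns bypass the layer hierarchy.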


Gouri Mahajan, graduate student in Clinical Health Sciences; University of Mississippi Medical Center

Innovation in research

My research examines genomic changes in the brain associated with major depressive disorder (MDD) with an understanding of neuroanatomy and of bioinformatic analyses. I have used next-generation sequencing for differential gene expression within the post-mortem human hippocampal dentate gyrus from 23 pairs of normal control and matched MDD subjects. I isolated the hippocampal neurogenic cell layer for deep-sequencing of RNA to include mRNA, long-noncoding RNA, and microRNA. Bioinformatic tools confirmed the hypothesis of altered neurogenesis and neuro-inflammatory gene expression in MDD. This is the first study in this brain region to examine neurogenesis and generate a compendium of the whole transcriptome in depression leading to identifying the connection between genomic information and a behavioral phenotype in well-characterized subjects. The discovery of differentially expressed genes within this unique brain region in depression reveals pathology in neurogenesis and glia, and may identify new targets for intervention in the medical treatment of depression.

Author Contributions

Drs. Craig Stockmeier, Ham Benghuzzi, Eric Vallender and I designed the study and performed it according to Dr. Stockmeier’s Institutional Review Board protocol. Hippocampal tissues were collected at autopsy and frozen at the Cuyahoga County Medical Examiner’s Office (Cleveland, Ohio). I selected the MDD and psychiatrically normal control subjects, prepared 60 μm thick sections of frozen hippocampal dentate gyrus with a cryostat, and isolated the dentate gyrus for sequencing. RNA was extracted and whole transcriptome RNA-sequencing was performed by the Molecular and Genomics Core at UMMC. I prepared all data files and performed the bioinformatic analyses using a Lumenogix software pipeline with assistance from Drs. Eric Vallender and Lavanya Challagundla. I used Ingenuity Pathway Analysis to determine biological significance. Statistical analyses were performed by Dr. Challagundla. I performed validating q-PCR as guided by Dr. Damian Romero and Maryam Syed. I wrote the first draft of a manuscript, including its tables and graphs.


Chasing the Ghost of Lamisil (terbinafine) Toxicity

Dustyn A. Barnette1, Anirudh S. Pidugu1, Mary A. Davis1, Lena Dang2, Tyler Hughes2, S. Joshua Swamidass2, Grover P. Miller1

1* Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, 72205
2 Department of Pathology and Immunology, Washington University, St. Louis, MO, 63130

Background: In rare cases, the widely prescribed, effective antifungal drug Lamisil (terbinafine) causes idiosyncratic liver toxicity, possibly due to a reactive metabolite, 6,6-dimethyl-2-hepten-4-ynal (TBF-A). Observation of TBF-A required trapping as a transiently stable glutathione adduct, yet the metabolic pathways responsible for TBF-A remain unknown. Through an interdisciplinary collaboration, we combined modeling and experimental approaches to predict and validate the preferential N-dealkylation pathway and the cytochromes P450 responsible for TBF-A.

Results: A deep learning N-dealkylation model predicted that the most probable pathway to TBF-A involved initial terbinafine N-demethylation to desmethyl-terbinafine and then N-dealkylation to TBF-A. The experimental studies required development of labeling methods coupled to LC-MS analysis to improve sensitivity and quantitation of metabolites like TBF-A. The reactive metabolite was shown to decay spontaneously. Subsequent steady-state experiments demonstrated that the most efficient N-dealkylation pathway (V/Km) was terbinafine N-demethylation, followed by direct N-dealkylation to TBF-A and very minor N-denaphthylation. Based on experimental studies, P450-specific models correctly identified N-demethylation by multiple P450s but were less accurate for alternate N-dealkylation pathways, including direct TBF-A formation. Kinetic studies with recombinant P450s revealed that the catalytic efficiency (V/Km) order for TBF-A was CYP3A4>2C19>2B6≈2D6. CYP2C19 N-demethylation was much more efficient than that by 3A4, and the N-denaphthylation efficiency order was CYP2C19≈3A4>>2B6≈2D6. Overall, CYP2C19 dominated terbinafine N-dealkylation by catalyzing all three pathways.

Conclusions: Experimental studies on terbinafine metabolism yielding TBF-A required development of novel methods and extensive kinetic experiments, while microsomal and P450-specific model predictions were much more accessible. Nevertheless, models performed well only for N-demethylation reactions and not for others, suggesting the need for a more diverse training set. Based on experiments, CYP3A4 generates most, but not all, TBF-A, yet decay and stabilization of the reactive metabolite may further impact exposure. Taken together, knowledge gained from these studies provides a foundation for investigating terbinafine pathways contributing to toxic risk and hence developing strategies to mitigate that risk.
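For readers less familiar with the V/Km notation used above, the relation below shows why this ratio serves as a measure of catalytic efficiency. It assumes the steady-state data follow standard Michaelis-Menten kinetics, which the abstract does not state explicitly; v is the reaction rate, [S] the substrate concentration, and V_max and K_m the usual fitted parameters.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]},
\qquad
v \approx \frac{V_{\max}}{K_m}\,[S] \quad \text{for } [S] \ll K_m
```

At low substrate concentration the rate is therefore proportional to V/Km, so comparing this ratio across enzymes or pathways ranks how efficiently each one turns over dilute substrate.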


Identification of host genes determining the pathogenesis of influenza A viruses in mice using machine learning

Hamilton J. Wan

Mississippi School for Mathematics and Science, 1100 College St, Columbus, MS 39701, USA

Mentor: Dr. Adrianus C. M. Boon, Department of Pathology and Immunology at Washington University School of Medicine, St Louis, MO, 63110, USA.

Influenza A virus infection varies in severity and pathogenesis from host to host. Genetic polymorphisms in the hosts may contribute significantly to the variance in influenza pathogenesis. However, the detailed genetic factors determining influenza A pathogenesis remain unknown. The objectives of this study were to use a machine learning program to identify murine host genes responsible for the pathogenesis of the influenza A virus in mice and to identify the associated human gene homolog(s). Two hypotheses were tested in this study: (1) a small set of genetic factors can be used to predict the pathogenesis of the influenza A virus in mice; (2) the human gene homologs for these genetic factors can be identified. The data set comprised the genotypes and phenotypes of 10 congenic mouse strains (male and female) that have identical genomes except for murine chromosome 4, in which eight inbred mice had a mix of their parent strains. LASSO and elastic net regressions were performed on all unique genetic features, including 24 DNA markers and 1,114 genes, with the survival rates as the dependent variable. Through the LASSO regressions, the DNA segments D4Mit204 and D4Mit334 were selected as important features in the female and male mice analyses, respectively. The LASSO regression also selected the murine gene Ptafr as a particularly important gene relative to the other genes. The elastic net regression failed to provide consistent feature selection. Further analyses were also conducted on the D4Mit204 DNA segment for the female mice survival data, and the LASSO regression selected the murine gene Ahdc1. In conclusion, several important genetic factors determining the pathogenesis of the influenza A virus in mice were identified using machine learning methods, especially the LASSO regression model.
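How LASSO performs feature selection can be shown with a small coordinate-descent sketch in plain Python. It is an illustration only: the study's software and penalty value are not given in the abstract, the genotype matrix and survival rates below are hypothetical toy values, and `lasso` is a hypothetical helper that minimizes (1/2n)·||y − Xβ||² + λ·||β||₁ on centered data.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO on mean-centered features and response."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            norm = sum(Xc[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / norm if norm else 0.0
    return beta


# Hypothetical genotypes (0/1 per marker) for eight strains and survival rates.
genotypes = [[0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1],
             [1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
survival = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
beta = lasso(genotypes, survival, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(selected, beta)
```

The soft-thresholding step is what sets uninformative coefficients to exactly zero, which is why the surviving nonzero coefficients can be read as the selected markers or genes.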


HONGMEI JIANG

Associate Professor

Department of Statistics

Northwestern University


Title

Microbial interactions and microbe-host interactions

Abstract

Metagenomics is a powerful tool to study the microbial organisms living in various environments. The relative abundance of a taxon is usually estimated using the proportion of total reads assigned to the corresponding taxon. Due to the constraint of the sum of the relative abundances being 1 or 100%, standard and conventional statistical methods cannot be directly applied to the metagenomic relative abundance data. In this talk we will discuss characterization of the association between microbial compositions and the host phenotype such as disease status, and interactions between the microbes. Current statistical and computational methods that are being developed to analyze the metagenomics data and the challenges will also be highlighted.
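The sum-to-one constraint mentioned in the abstract can be made concrete with a short Python sketch. The centered log-ratio (CLR) transform shown here is one commonly used way to move compositional data into an unconstrained space; it is offered as an illustration and is not claimed to be the method presented in the talk. The taxon names, read counts and pseudocount are hypothetical.

```python
import math


def relative_abundance(counts):
    """Proportion of total reads assigned to each taxon (sums to 1)."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount avoids log(0)."""
    logs = {taxon: math.log(c + pseudocount) for taxon, c in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: v - mean_log for taxon, v in logs.items()}


# Hypothetical read counts for three taxa in one sample.
reads = {"taxon_A": 80, "taxon_B": 15, "taxon_C": 5}
print(relative_abundance(reads))
print(clr(reads))
```

Because relative abundances must sum to 1, an increase in one taxon forces an apparent decrease in others; log-ratios compare taxa to each other and are unaffected by the total number of reads.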

Bio

Dr. Jiang is an Associate Professor of Statistics at Northwestern University. She received her Ph.D. in Statistics from Purdue University. Her research focuses on developing statistical methodologies and computational algorithms to analyze and understand the massive amount of data generated by high throughput biological technologies, especially genomics and metagenomics data analysis. Her work has been supported by NSF, Chicago Biomedical Consortium, and NIH.


Characterization of the Gut Microbiome of channel catfish following florfenicol treatment

Hossam Abdelhamed, Attila Karsi, Ozan Ozdemir, Mark L. Lawrence

Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi, 39762

Background: The intestinal microbiota is important in host health, and use of antibiotics is an important factor affecting intestinal microbial dynamics. Florfenicol is an antibiotic commonly used to treat bacterial fish diseases. The present study aimed to investigate the extent to which use of florfenicol modulates the intestinal populations of healthy channel catfish. To achieve this, specific pathogen-free channel catfish were fed commercial catfish feed with and without florfenicol for 10 days. At the end of the trial, intestinal content was collected, and microbial species were determined by Illumina sequencing of 16S rRNA gene amplicons. Data were analyzed using the QIIME software pipeline.

Results: A total of 353,841 quality-filtered sequences was obtained. They were sorted into 11,035 operational taxonomic units (OTUs) based on 97% sequence similarity. A Venn diagram analysis showed 4,396 unique OTUs in the medicated feed group and 1,682 OTUs shared between the two groups. The abundance of phylum Proteobacteria increased from 79.23% of the population in control fish to 98.82% of the population in the florfenicol group. Members of phyla Firmicutes and Bacteroidetes decreased with florfenicol feeding (0.77% and 0.28%) compared with the control group (18.09% and 2.56%, respectively). Specifically, Enterobacteriaceae and Escherichia populations increased in the florfenicol treatment compared to the control treatment. In contrast, Aeromonas, Plesiomonas, Clostridium, Romboutsia, Bacteroides, Klebsiella, Turicibacter, and Lactobacillus decreased in the florfenicol treatment compared to the control group.

Conclusions: Florfenicol feeding caused a reduction in microbial diversity in the catfish microbiome compared to the control treatment. This result will have implications for the management of disease, nutrition, and antimicrobial use in the catfish aquaculture industry.
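The Venn-style partition of OTUs into unique and shared sets amounts to simple set operations, sketched below in Python. The OTU identifiers and read counts are hypothetical, `venn_partition` is an illustrative helper rather than a QIIME function, and an OTU is assumed to be present in a group when it has at least one read there.

```python
def venn_partition(otu_table):
    """Split OTUs into control-only, shared and treated-only sets.

    otu_table: OTU id -> (reads in control group, reads in florfenicol group)
    """
    control = {otu for otu, (c, _) in otu_table.items() if c > 0}
    treated = {otu for otu, (_, t) in otu_table.items() if t > 0}
    return {
        "control_only": control - treated,
        "shared": control & treated,
        "treated_only": treated - control,
    }


# Hypothetical OTU table with read counts per group.
table = {
    "OTU_1": (120, 300),
    "OTU_2": (45, 0),
    "OTU_3": (0, 12),
    "OTU_4": (8, 1),
}
parts = venn_partition(table)
print({name: sorted(otus) for name, otus in parts.items()})
```

The three sets are disjoint and together cover every observed OTU, which is the property a two-group Venn diagram displays.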


“Super Gene Sets”: Toward Integrated Gene-set, Network, and Pathway Analysis

Jake Yue Chen1, Zongliang Yue1, Thanh Nguyen1,2, Michael Neylon3, Chayaporn Suphavilai3, Liugen Zhu3, Xiaogang Wu3, Sudhir Chowbina3

1* Informatics Institutes, School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35209

2 Department of Computer and Information Science, Indiana University Purdue University Indianapolis, IN 46202

3 Indiana Center for Systems Biology and Personalized Medicine, Indiana University Purdue University Indianapolis, IN 46202

Background: Current approaches to the downstream interpretation of high-throughput biological data sets, such as microarray data, mass spectrometry-based proteomic data, and next-generation sequencing genomic data, after the upstream “omics” characterization have been limited. For example, Gene Set Enrichment Analysis (GSEA) was introduced to analyze the significant enrichment of annotated gene sets or pathways from MSigDB. DAVID was developed to help map gene ontology term enrichment from the same set of statistically significant genes. Comprehensive network analysis methods have also been applied to the characterization of candidate genes. Until the recent development of “Super Gene Sets” called PAGs (pathways, annotated lists, and gene signatures) and of PAGER as the comprehensive repository of PAGs, the analysis of gene sets, networks, and pathways from candidate genes was performed ad hoc. Results: In this work, I will describe an integrated strategy for developing and applying PAGs, or “super gene sets”, to the problem of downstream interpretation of genomic and functional genomic results. The strategy spans work from the past few years and includes: 1) characterization of the presence of interconnected super gene sets; 2) development of a quality metric for super gene sets; 3) development of a strategy for developing and curating super gene sets; 4) defining interconnected relationships among super gene sets; and 5) defining causal/regulatory relationships between super gene sets. Conclusions: We expect the work to have an immediate and broad impact on how downstream analysis can be performed for future bioinformatics tasks.

References:

1. Zongliang Yue, Qi Zheng, Michael T Neylon, Minjae Yoo, Jimin Shin, Zhiying Zhao, Aik Choon Tan, and Jake Y. Chen* (2017) in Nucleic Acids Research.

2. Zongliang Yue, Madhura Kshirsagar, Thanh Nguyen, Chayaporn Suphavilai, Michael Neylon, Liugen Zhu, Timothy Ratliff, and Jake Y. Chen* (2015) in Bioinformatics, Vol. 31, No. 12, pp. i250-i257.

3. Zongliang Yue, Michael Neylon, Thanh Nguyen, Timothy Ratliff, and Jake Y. Chen (2018) in IEEE Transactions on Computational Biology and Bioinformatics (in press).

4. Michael Neylon, Zongliang Yue, Thanh Nguyen, Timothy Ratliff, and Jake Chen* (2016) in Proceedings of the 2016 International Conference on Bioinformatics & Biomedicine, Shenzhen, China, doi: 10.1109/BIBM.2016.7822534.



PROTEOMIC ANALYSIS OF ERYTHROCYTE STORAGE LESIONS IN UNITS OF STORED CANINE PACKED RED BLOOD CELLS

John Thomason1, Leslie A. Shack2, Bindu Nanduri2

1 Department of Clinical Sciences, 2 Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, Mississippi State, MS, 39762

Background: Stored blood products develop progressive red blood cell (RBC) damage, known as “storage lesions”. Storage duration and environment can significantly impact storage lesions, which influence RBC survival and function. The study objectives were to identify accumulation and alteration in RBC membrane protein expression and storage lesions during storage in units of packed red cells. Results: Blood was collected from five healthy dogs and separated into three units. Two units were stored for 14 and 28 days, respectively, while the third unit (day 0) acted as a control. The RBC membrane proteins were identified by 1D LC ESI MS/MS. All MS/MS spectra were searched against a Canis lupus familiaris protein database using the SEQUEST algorithm. The impact of storage on the canine RBC proteome was determined by comparing RBC protein expression data from day 0 with data from days 14 and 28. A total of 781 canine proteins were identified, of which 679, 568, and 597 were identified on days 0, 14, and 28, respectively. Protein networks identified in the expression profile at day 0 represented protein ubiquitination pathways, mitochondrial dysfunction, and the TCA cycle, while phagosome maturation was identified at day 14 and the sirtuin signaling pathway, which regulates energy homeostasis, at day 28. Compared to day 0, there were significant changes in protein expression at day 14 (57 proteins), which included the unfolded protein response and heme degradation, and at day 28 (237 proteins), which showed altered expression of proteins involved in immune response. Conclusion: Storage of canine RBCs causes degradation of proteins and alterations in the immune response and stress alleviation pathways. Our study identified specific proteins and pathways that change during storage and could provide mechanisms to improve erythrocyte health during storage.
This work was supported by grant #110000-182500-021000-CVM047 (Office of Research and Graduate Studies) from the Mississippi State University College of Veterinary Medicine.


Benchmarking Protein Residue-Residue Contact Prediction Using Random Forests and Deep Networks

Joseph Luttrell IV1, Tong Liu2, Chaoyang Zhang1, Zheng Wang2,*

1. School of Computing, University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS, 39406

2. Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124

Background: The ability to predict which pairs of amino acid residues in a protein are in contact with each other offers many advantages for various areas of research that focus on proteins. For example, contact prediction can be used to reduce the computational complexity of predicting the structure of proteins and even to help identify functionally important regions of proteins. These predictions are becoming especially important given the relatively low number of experimentally determined protein structures compared to the amount of available protein sequence data. Results: Here we have benchmarked a set of machine learning methods for performing residue-residue contact prediction, including random forests, direct-coupling analysis, support vector machines, and deep networks we developed based on stacked denoising autoencoders. These methods are able to predict contacting residue pairs given only the amino acid sequence of a protein. According to our own evaluations performed at a resolution of +/- two residues, the predictors we trained with the random forest algorithm are our top performing methods with average top 10 prediction accuracy scores of 85.13% (short range), 74.49% (medium range), and 54.49% (long range). These results suggest comparable performance to contact predictors developed by other groups in recent years. Moreover, we have provided our random forest contact predictor and C++ implementation of the direct-coupling analysis method as a web server that is freely available to the public. Conclusion: Due to the challenging nature of contact prediction and the large amount of protein data that exists, it is beneficial to benchmark a variety of different prediction methods. We believe that our work has produced a useful tool with a simple interface that can provide contact predictions to researchers and other potential users without the hassle of installing software on their own machines.
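To make the evaluation described above more concrete, here is a minimal sketch of a top-k accuracy calculation with a residue tolerance. It is not the authors' evaluation code. It assumes that "top 10" means the ten highest-scoring predicted pairs and that "+/- two residues" means a predicted pair counts as correct when both of its positions lie within two residues of a true contact; the function name and data layout are hypothetical.

```python
def top_k_accuracy(predictions, true_contacts, k=10, tolerance=2):
    """Fraction of the k highest-scoring predicted pairs that match a true contact.

    predictions   -- list of (i, j, score) tuples, one per predicted residue pair
    true_contacts -- set of (i, j) residue pairs that are in contact
    tolerance     -- a prediction (i, j) matches a true contact (a, b) when
                     |i - a| <= tolerance and |j - b| <= tolerance (assumed rule)
    """
    # Keep only the k predictions with the highest scores.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = 0
    for i, j, _score in ranked:
        if any(abs(i - a) <= tolerance and abs(j - b) <= tolerance
               for a, b in true_contacts):
            hits += 1
    return hits / len(ranked)


# Toy data, invented purely for illustration.
toy_predictions = [(1, 10, 0.9), (5, 30, 0.8), (2, 40, 0.1)]
toy_contacts = {(2, 12), (5, 50)}
print(top_k_accuracy(toy_predictions, toy_contacts, k=2))
```

Scoring short-, medium-, and long-range contacts separately would additionally require filtering pairs by sequence separation before ranking; the abstract does not give those cutoffs, so they are left out here.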


Sensitivity and reproducibility of onco-panel sequencing across multiple laboratories and technologies: An SEQC2 Component Study

Joshua Xu1 on behalf of the Oncopanel Sequencing Working Group of the SEQC2 Consortium

1Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Road, Jefferson, Arkansas, 72079

Background: Onco-panel sequencing targets sequencing reads to a few small regions of the genome and is thus better able to detect rare but clinically relevant sub-clonal mutations. Accurate diagnosis and subsequent tailoring of therapy depend on thorough characterization of tumor mutational spectra. The lack of tumor reference samples, of comprehensive approaches to assessing reproducibility and detection sensitivity, and of standard practice guidelines is hindering the development of onco-panel sequencing and the realization of its benefits in cancer diagnosis and treatment. Results: As a component of the FDA-led SEQC project phase 2 (SEQC2), the Onco-panel Working Group (WG#2) has designed and characterized a set of reference materials. These materials were created using cell line gDNA pooling and dilution, and will power such comprehensive assessments of targeted mutation detection approaches. To this end, a cross-lab evaluation of eight pan-cancer panels and four ctDNA liquid biopsy assays is currently underway. To further evaluate methods to boost reproducibility, synthetic spike-in controls and synthetic plasma were also incorporated into the testing samples. Contrived samples for liquid biopsy testing were prepared through enzymatic fragmentation and size selection to mimic cell-free DNA. The onco-panels tested represent two target enrichment approaches (amplicon-based and capture-based) and two sequencing technologies (multiple Illumina platforms and Ion Torrent). Of note, the liquid biopsy assays employ various molecular barcoding techniques to improve their sensitivity in detecting rare mutations. Conclusions: This comprehensive study will yield insights into the factors underpinning the sensitivity and reproducibility of onco-panel sequencing. Quantitative performance metrics and actionable data analysis recommendations will be presented to the targeted sequencing panel community.


Exploration and Exploitation of Agro-Genomic Variation in Hexaploid Wheat for Yield Stability and End-User Quality

Karyn Willyerd1, Shuzhen Sun1,2, Yuanwen Guo3, Xiaowei Hu4, Carol Powers5, Liuling Yan5, Lan Zhu4, Brett Carver5, and Charles Chen1,5

1Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078

2Department of Forest and Conservation Sciences, Forest Science Centre, University of British Columbia, Vancouver, B.C. Canada, V6T 1Z4

3Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506

4Department of Statistics, Oklahoma State University, Stillwater, OK, 74078

5Department of Plant and Soil Sciences, Oklahoma State University, Stillwater, OK, 74078

Background: Duster and Billings are leading winter wheat cultivars in both yield production and end-user qualities in the southern Great Plains. A genome-wide SNP profile derived from genotyping-by-sequencing and exome capture was established to mine beneficial genomic variants segregating in the Duster x Billings doubled-haploid (DH) population. The genomic basis of traits evaluated in field trials from 2014-2016 was assessed through the complementary approaches of association (GWAS) and QTL mapping, while responses to water stress were inspected using overlapping RNAseq data. Results: GWAS identified 251 significant SNP variants associated with yield production and 94 variants associated with four end-use quality measurements. Significant GWAS associations accounting for ~16% of phenotypic variation co-localize with the yield QTL previously identified on chromosome 1BS only in drought years, signifying the impact of genotype-environment interactions on the complex yield trait. In contrast, GWAS associations and QTLs for oligogenic grain hardness co-localize on chromosome 5DS and persist through field seasons. RNAseq analysis between drought-tolerant and drought-susceptible DH lines pinpointed 144 and 376 differentially expressed genes associated with all QTLs for yield and end-user qualities, respectively, revealing greater genetic control of quality traits compared to the omnigenic nature of yield stability under abiotic stress. Twelve differentially expressed genes correspond to the 1BS yield QTL, and nine genes are differentially expressed across quality-related QTLs, providing potential stress-responsive targets. Finally, improved predictability across field seasons can be achieved by harnessing functional genomic information. Conclusions: A comprehensive understanding of functional genomic variants that pinpoint specific gene targets provides powerful tools for molecular breeding. As further demonstrated in our study, the resulting genomic knowledge can be incorporated to improve genomic prediction model performance, lessening hesitation and building confidence for broad adoption of cost-effective, agro-genomics-based breeding approaches. The large genome complexity and non-tractable nature of winter wheat warrant such tools to sustain food security in worsening drought climates.



Harnessing a Polyspecific Response to Tumor Associated Carbohydrate Antigens

Kori Bohn1, Ramachandran Murali2, Anastas Pashov3, Thomas Kieber-Emmons4

1 UALR, Little Rock, Arkansas

2 Cedars-Sinai Medical Center, Los Angeles, CA

3 Stephan Angelov Institute of Microbiology, Academy of Sciences, Sofia, Bulgaria

4 UAMS, Little Rock, Arkansas

Background: Polyspecific antibodies might be a viable approach to target cancer cell heterogeneity. Carbohydrates are among the Tumor Associated Antigens, and among these the ganglioside GD2 and the neolactoseries antigen Lewis Y (LeY) stand out. We have focused on the rational design of a carbohydrate mimetic peptide (P10s) that displays reactivity to several anti-GD2 and LeY reactive monoclonal antibodies, as a means to develop a pan-immunogen that induces antibodies functionally similar to these glycan-reactive monoclonals.

Results: The current study relied on structural analyses of enhanced binding to the respective antibodies using molecular modeling, on carbohydrate reactivity patterns of column affinity eluents of human antibodies, and on assessment of the immune response to P10s in a phase I clinical trial. Molecular modeling suggests that P10s binding is reliant on hydrophobicity and is stabilized by anchor hydrogen bonding to key residues essential for glycan binding. Structural geometry was shown to be consistent between GD2 and LeY within the shared lacto-ceramide core (Galβ(1,4)Glcβ(1-1′)Cer); this effect was most pronounced with superimposition via Gal-β tethering.

Conclusion: We define the molecular characteristics for P10s mimicry by considering the minimum epitope for polyspecificity of LeY and GD2 reactive antibodies, wherein these antigens share the lacto-ceramide core (Galβ(1,4)Glcβ(1-1′)Cer). We further define the mimicry characteristics of P10s from consideration of consistent binding regions and ligand contacts compared to monoclonal antibodies with GD2 binding modalities, as well as geometrical consensuses and conserved structural motifs.

Innovation in Research: We deduced the minimal epitope for mimicry of a pan-immunogen that induces responses to GD2 and LeY. P10s is a first-in-man pan-immunogen.

Author contributions: Kori Bohn performed molecular modeling; Ramachandran Murali provided supportive analyses of GD2 binding; Anastas Pashov provided supportive analyses of P10s reactivity patterns; Thomas Kieber-Emmons provided guidance for the project.


Analysis of full-length infectious genomic cDNA clones of SPFMV and SPLCV and exploiting the approaches of biotechnology in sweetpotato for virus diseases resistance

Kyler Holmes1, Chunquan Zhang2, Yan Meng3

1Department of Biotechnology, Alcorn State University, Lorman, MS, 39096

There are two major viruses infecting sweetpotato worldwide: Sweet potato leaf curl virus (SPLCV), a begomovirus that is persistently transmitted by whitefly (Bemisia tabaci), and Sweet potato feathery mottle virus (SPFMV), a potyvirus that is non-persistently transmitted by aphids. SPLCV and SPFMV infections reduce both the yield and the quality of sweetpotato. Field isolates of SPLCV and SPFMV were collected from Alcorn State, MS, and full viral genomic DNAs and cDNAs were cloned and molecularly characterized. Whole genome sequencing of the SPFMV-AS isolate indicates that it has an 11.5 kb genome that encodes an open reading frame of 3481 amino acids. Sequence analysis showed that the SPLCV-AS isolates contain 6 gene products. In addition, results showed that there is significant genetic diversity within the same SPLCV-AS isolate, which is important for virus disease epidemics. Based on the viral genomic and genetic information, we explore novel biotechnological methods to develop transgenic sweetpotato plants with resistance to these two viruses. The SPFMV coat protein gene and SPLCV replication origin segments were engineered into a binary vector and used to induce gene silencing in transgenic sweetpotato. Plant transformation and regeneration protocols were optimized for the production of these pathogen-derived resistance (PDR) sweetpotato lines. Expression of foreign genes was achieved using Agrobacterium tumefaciens strain EHA105 harboring the expression cassette; plants induced from transformed leaves and petioles showed positive signs of foreign gene expression by PCR detection. Biotechnological approaches to gene delivery appear to have potential for generating transgenic sweetpotato with useful agronomic traits. These results warrant further investigation into phenotyping transgenic plant resistance to virus infection under various conditions.


Innovation in Research: This research is innovative in its concept and in its potential application. The use of biotechnology to engineer multiple virus resistances in sweetpotato, potentially resulting in new disease-resistant cultivars, is of importance to Mississippi agriculture and beyond. Specifically, the genetic engineering of the anti-virus constructs is based on the locally prevalent virus isolates in Mississippi. However, the unique design will also allow the induction of gene silencing against other viral isolates known in the US. Moreover, the concept developed should be applicable not only to sweetpotato but also to other crop plants, and may contribute to the development of new antiviral drugs as well. Its innovation lies in the fact that this will be the first exploration of the potential for engineering multiple resistances to both RNA and DNA viruses, as well as to a complex virus infection, in a crop plant.


Author Contributions: All authors contributed extensively to this project. I was responsible for ensuring that all laboratory experiments were performed. I conducted the observations, collected data, and optimized protocols. I was also responsible for the maintenance of plants and culture medium, DNA and RNA extraction, PCR, and gel electrophoresis, and I performed gene bombardment and Agrobacterium-mediated transformation. My collaborators' work included performing full genomic cloning and cDNA cloning, as well as designing, planning, conducting, and analyzing the project and ensuring that all of its requirements were met.


From Arabidopsis to crops: the QQS orphan gene modulates carbon and nitrogen allocation across species

Ling Li*1, and Seth O’Conner1

*1Department of Biological Sciences, Mississippi State University, Starkville, MS, 39762, USA

Deficiency in dietary protein is globally one of the most severe health problems; the ability to optimize protein productivity of plant-based foods has a far-ranging impact on world health and sustainability. The Arabidopsis thaliana QQS orphan gene modulates carbon and nitrogen allocation to protein and starch [1]. Ectopic expression of QQS increases protein content [2] in leaf and seed in soybean Williams 82 [3], in soybeans with different high-/low-protein levels, and in rice and corn [4]. The QQS protein binds to a transcriptional regulator in Arabidopsis and its soybean, rice, and corn homologs: Nuclear Factor Y subunit C4 (NF-YC4). Little is known about the functional significance of species-specific orphan genes [1, 4, 5]. QQS transcript levels are altered in plants under stress and in mutants of genes involved in various stress responses, indicating that QQS may integrate primary metabolism and environmental perturbations [6]. Specifically, NF-YC4 overexpression affects carbon and nitrogen allocation to protein in soybean and maize. Transcriptomics analyses of the QQS mutant materials have identified genes potentially involved in the regulation of carbon and nitrogen allocation. QQS and its related network may be utilized as a tool to increase crop protein content and to study the carbon and nitrogen allocation network. Our research reveals the core of a previously undefined network in which QQS participates [4], and opens a new non-transgenic strategy to create high-protein crops [7]. It presents QQS as a model plant orphan gene regulating plant metabolism and adaptation to the environment, and illustrates an example of basic research in Arabidopsis applied in agriculture.

References: 1. Li et al. PJ (2009). 2. Li et al. U.S. Patent 9157091 (2012). 3. Li et al. PBJ (2015). 4. Li et al. PNAS (2015). 5. Jones et al. Frontiers Plant Science (2016). 6. Arendsee et al. TIPS (2014). 7. Li et al. U.S. Patent Office 62/244,131 (2015).


A pathway-based method to interpret GWAS results

Marilyn Warburton1, Erika Womack1, Paul Williams1, Juliet Tang2

1*USDA ARS Corn Host Plant Resistance Research Unit, Mississippi State, MS, 39762

2USDA FS Forest Products Laboratory, Durability and Wood Protection, Starkville, MS 39759

Background: Genome-wide association studies (GWAS) tend to identify genomic regions significantly associated with a quantitative trait of interest, but few of these regions have a large effect on the trait, and jointly they tend not to explain the majority of the variation associated with the trait. Many real associations are missed because their effects are too small to register, or because the single nucleotide polymorphisms (SNPs) used to scan the genome are too far from the gene and the effect looks diminished. The CHPRRU has developed a methodology that combines GWAS results, linkage disequilibrium among SNPs, and a gene-set enrichment procedure for a pathway-based approach to interpreting association data. The method allows the cumulative effects of genes in a pathway to give insight into the genetic mechanisms of phenotypes, even if each SNP accounts for too small a proportion of the genetic variation to display association p values better than the significance threshold. Results: Several phenotypic traits were analyzed with the pathway method in one maize (Zea mays mays L.) association mapping panel. A study of grain color identified the carotenoid biosynthesis pathway. A study of a much more quantitative trait, resistance to Aspergillus flavus and aflatoxin accumulation, identified the jasmonic acid biosynthesis pathway as significant for resistance. Pathway analysis of resistance to corn earworm, Helicoverpa zea (Boddie), found metabolic pathways that modified cell wall components, especially homogalacturonan, wax esters, and fatty acids; those involved in antibiosis, especially DIMBOA, flavonoids, and phenolics; and those involved in plant growth, including nitrogen uptake and energy production. Conclusions: Each GWAS of the highly quantitative disease and insect resistance traits identified significantly associated SNPs, but these did not reveal the actual mechanisms the corn plants were using to resist the biotic stresses. The pathway analysis provides more information and lines of practical investigation to improve maize for the traits under study.


Marker-assisted-selection coupled with recombinant inbred line genome sequencing identifies a root-knot nematode resistance gene in Upland cotton

Martin J. Wubben1, Gregory N. Thyssen2, Ping Lee2, David Fang2, Franklin E. Callahan1, Dewayne D. Deng1, Jack C. McCarty1, and Johnie N. Jenkins1

1USDA-ARS, Crop Science Research Laboratory, Genetics and Sustainable Agriculture Research Unit, Mississippi State, MS, 39762, USA

2USDA-ARS, Southern Regional Research Center, Cotton Fiber Bioscience Research Unit, New Orleans, LA, 70124, USA

BACKGROUND: The southern root-knot nematode (RKN; Meloidogyne incognita) remains the primary yield-limiting biotic stress to Upland cotton (Gossypium hirsutum) throughout the southeastern United States. Simple-sequence-repeat (SSR) markers linked to RKN resistance quantitative trait loci (QTLs) on chromosomes 11 and 14 have been used to incorporate a high level of resistance into agronomically superior Upland germplasm; however, the genes responsible for resistance remain unidentified. The identification of these resistance genes would improve molecular breeding efficiency and provide insights into the molecular signaling cascades that mediate RKN resistance in Upland cotton. RESULTS: A random mated Upland cotton population (RMUP) comprised of 11 parents, including one parent having the chromosome 11 and 14 resistance QTLs, was developed through five cycles of random mating and used to create ~ 550 recombinant inbred lines (RILs). All RILs were subjected to SSR genotyping and 3-5X coverage genome sequencing. Based on SSR genotype and genome sequence, RILs were identified that showed recombinations within the known mapping interval of the chromosome 14 RKN resistance QTL. RKN resistance phenotyping of these RILs delimited a mapping interval on chromosome 14 to approximately 30 kb, within which resided four predicted genes. SNP analysis, qRT-PCR, and VIGS (virus induced gene silencing) identified a single gene within the mapping interval that was responsible for RKN resistance mediated by the chromosome 14 QTL. CONCLUSIONS: The mapping approach implemented in this study, i.e., SSR genotyping plus RIL genome sequencing, shows tremendous promise in the identification of specific genes underlying complex traits including pathogen resistance.


Developing predictive models for assessing drug-induced liver injury in humans

Minjun Chen

The Division of Bioinformatics and Biostatistics, US Food and Drug Administration, Jefferson, AR

Background: Drug-induced liver injury (DILI), although rare, is a frequent cause of adverse drug reaction (ADR) resulting in warnings and withdrawals of numerous medications. Despite best efforts, current testing strategies aimed at identifying hepatotoxic drugs prior to human trials are not sufficiently powered to predict the complex mechanisms leading to DILI. Recent advances in the field have discovered that several drug properties and toxicological properties, such as daily dose, lipophilicity and the capability to form reactive metabolites (RM), are strongly associated with serious DILI potential in humans.

Results: Here, we will introduce the Rule-of-Two model (i.e. daily dose ≥ 100mg/day and logP ≥ 3) and DILIScore model (i.e. a scoring model derived from daily dose, logP and formation of RM) developed by the NCTR research team. We will discuss the applications of these models in the context of regulatory processes and independent validations reported in literature.
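The Rule-of-Two thresholds quoted above translate directly into a two-condition check. The following is a minimal sketch and not the NCTR team's implementation: the function name is hypothetical, the example compounds are invented, and the reactive-metabolite term and the DILIScore weighting are left out because the abstract does not give them.

```python
def rule_of_two_positive(daily_dose_mg: float, logp: float) -> bool:
    """Return True when both Rule-of-Two criteria from the abstract are met:
    daily dose >= 100 mg/day AND logP >= 3."""
    return daily_dose_mg >= 100 and logp >= 3


# Invented example compounds: (name, daily dose in mg/day, logP).
examples = [
    ("compound A", 400.0, 4.1),   # high dose, lipophilic
    ("compound B", 400.0, 1.2),   # high dose, not lipophilic
    ("compound C", 10.0, 4.1),    # low dose, lipophilic
]

for name, dose, logp in examples:
    flag = rule_of_two_positive(dose, logp)
    print(f"{name}: Rule-of-Two positive = {flag}")
```

Only a compound that meets both thresholds is flagged; failing either one is enough to make the rule negative.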

Conclusion: Our studies suggest that Rule-of-Two + RM as predictors could help predict DILI risk in humans.



Next Generation Tools for Environmental Research

Natalia Reyero

Environmental research can encompass many disciplines, including environmental toxicology, paleogenomics, and metagenomics, among others. New and better tools are needed to understand the enormous amounts of data generated, as well as the different and challenging types of data. For instance, paleogenomics deals with highly degraded DNA; metagenomics deals with different extraction methods and sample types and with challenging bioinformatics tools to elucidate the different species present; and predictive toxicology deals with predictive tools to understand hazard. Here I will present several approaches used for predictive toxicology and environmental monitoring in order to promote discussion and novel ideas that can help the field.


Protein structure-based virtual screening: Identification of potent natural product-chemotypes as cannabinoid receptor 1 inverse agonists

Pankaj Pandey1, Kuldeep K. Roy1,#, Haining Liu1, Robert J. Doerksen*1,2

1Department of BioMolecular Sciences, Division of Medicinal Chemistry and 2National Center for Natural Products Research, School of Pharmacy, The University of Mississippi, University, MS 38677, USA

Background: Natural products are an abundant source of potential drugs and many of them are currently being used for several human disease treatments. Cannabinoid receptor 1 (CB1) antagonists are clinically established to be effective in treating obesity, obesity-related cardio-metabolic disorders, and substance abuse, but their potential CNS-mediated adverse effects hinder the potential of new drugs and no such drug is currently on the market. This limitation amplifies the need for new agents with reduced or null CNS-mediated side effects. In this project, we attempted to discover new classes of CB1 antagonists, utilizing a protein structure-based virtual screening (SBVS) approach. We prepared and filtered the ZINC natural products subset for VS.

Results: We docked prepared drug-like compounds of the ZINC database natural products subset on the CB1 protein model using Glide software. This screening initially afforded a total of 192 top-ranked hits that were further analyzed for structural diversity, and thereby a total of 18 structurally diverse compounds were selected for procurement and in vitro testing against CB receptors. This study has revealed some novel nonselective chemotypes from natural sources as a starting point for further analog exploration [with 1 ≤ Ki (µM) ≤ 15]. In order to understand the stability and interaction patterns of our best compound (PCB-2) with the CB1 and CB2 receptors, we performed 50 ns MD simulations with the NAMD suite using the CHARMM general force field. Based on the MD simulation results, we purchased compounds with >80% similarity to PCB-2, tested them for CB1 and CB2 activities and found two highly potent CB1 inverse agonists.
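The ">80% similarity" selection can be illustrated with a small sketch. The abstract does not say which similarity measure or fingerprint was used; the Tanimoto coefficient on sets of fingerprint bits is assumed here as one common choice, and the fingerprints and analog names below are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / len(union)


def similar_to(query_fp, library, threshold=0.80):
    """Return names of library compounds whose similarity to the query exceeds the threshold."""
    return [name for name, fp in library.items() if tanimoto(query_fp, fp) > threshold]


# Hypothetical fingerprints (sets of on-bit indices); not real data for PCB-2.
query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
library = {
    "analog_A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 11},   # 9 shared bits, 11 in the union
    "analog_B": {1, 2, 3, 4, 5, 20, 21, 22},       # 5 shared bits, 13 in the union
    "analog_C": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},   # identical to the query
}
hits = similar_to(query, library)
```

With these toy fingerprints, only the analogs that share nearly all bits with the query pass the 0.80 cutoff.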

Conclusions: We have successfully identified two new natural product chemotypes as CB1 inverse agonists with low nanomolar activities. Key findings and insights gained will be presented, which could be used for further identification of novel, high-affinity, and selective CB1 inverse agonists.

# Current address: National Institute of Pharmaceutical Education and Research, 4, Raja S. C. Mullick Road, Jadavpur, Kolkata 700 032, WB, India

Innovation in research: We used an in-house prepared and validated CB1 homology model to carry out this project. Considering properties such as absorption, distribution, metabolism, excretion, and toxicity (ADME/T) at the early stage of the drug development process is important to avoid downstream drug development issues; therefore, we downloaded the purchasable ZINC natural products subset (www.zinc.docking.org), and prepared and filtered it to retrieve only those natural products matching drug-like characteristics of MW < 700, LogP ≤ 5, HBA ≤ 10, HBD ≤ 5, PSA ≤ 140 and rotatable bonds ≤ 10. We also analyzed ligands for structural diversity, considering docking score and the two-dimensional fingerprint properties of ligands, before purchasing the final compounds. The exploration of PCB-2 analogs from the molecular dynamics studies guided us to modify the structure to maximize receptor-drug interactions and thereby enhance potency and efficacy. We determined that substituents at the N2 position of PCB-2 are critical for receptor interaction. Therefore, the procurement and testing of various PCB-2 analogs differing at the N2 position led to the discovery of two highly potent CB1 inverse agonists.
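The drug-likeness filter described above can be sketched as a simple predicate. The thresholds are the ones quoted in the abstract; the property keys, compound names, and property values are hypothetical, and in practice the properties would come from a cheminformatics toolkit rather than hand-entered dictionaries.

```python
def is_drug_like(props):
    """Apply the thresholds quoted in the abstract to one compound's property dictionary."""
    return (
        props["mw"] < 700            # molecular weight
        and props["logp"] <= 5       # lipophilicity
        and props["hba"] <= 10       # hydrogen-bond acceptors
        and props["hbd"] <= 5        # hydrogen-bond donors
        and props["psa"] <= 140      # polar surface area
        and props["rot_bonds"] <= 10 # rotatable bonds
    )


# Hypothetical compounds with made-up property values, for illustration only.
compounds = {
    "cmpd_1": {"mw": 412.5, "logp": 3.1, "hba": 6, "hbd": 2, "psa": 95.0, "rot_bonds": 5},
    "cmpd_2": {"mw": 742.9, "logp": 4.0, "hba": 9, "hbd": 3, "psa": 120.0, "rot_bonds": 8},   # too heavy
    "cmpd_3": {"mw": 388.4, "logp": 5.6, "hba": 4, "hbd": 1, "psa": 60.0, "rot_bonds": 4},    # too lipophilic
}
kept = [name for name, props in compounds.items() if is_drug_like(props)]
```

Keeping the thresholds in one predicate makes it easy to apply the same filter to every entry of a screening library before docking.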

Author Contributions: All the design and execution of computational work was carried out by Dr. Pankaj Pandey under the supervision of Dr. Robert J. Doerksen. The analysis of results was supported by Dr. Kuldeep Roy and Dr. Doerksen. The CB1 homology model was prepared by Dr. Liu and was further optimized by Dr. Pandey for this project. All the in vitro binding and functional data were provided by the In vitro Center of Research Excellence, part of The Center of Research Excellence in Natural Products Neuroscience (CORE-NPN).

Distinguishing Resistance-Conferring from Susceptible Mutations in Acetohydroxyacid Synthase by High Performance Computing-Enabled Computational Modeling

Yan Li1, Michael D. Netherland2, Chaoyang Zhang3, and Ping Gong2*

1 Bennett Aerospace Inc., Cary, NC 27518

2 Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS 39180

3 School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406

Background: Acetohydroxyacid synthase (AHAS) is a key enzyme catalyzing the biosynthesis of the essential branched-chain amino acids. AHAS-inhibiting herbicides are the largest site-of-action group on the market, comprising more than 50 active ingredients across 5 chemical classes. However, repeated intensive use of these herbicides has resulted in resistance evolution in a wide variety of weed species. The resistance is primarily conferred by alteration in the AHAS gene that attenuates sensitivity to the herbicides. On the other hand, not all target site mutations in AHAS confer resistance to a target-specific herbicide. Due to the prohibitive costs of population-level resistance testing, there is great demand for a reliable, cost-effective, and systematic approach to identify resistant AHAS mutations in field samples. Here we developed an approach based on homology modeling, molecular docking and molecular dynamics (MD) simulation and implemented it on high performance computing (HPC) systems. We selected Kochia scoparia as the test species because of the availability of both AHAS mutations and resistance testing data for two AHAS-inhibiting herbicides (tribenuron methyl and thifensulfuron methyl).

Results: We determined the binding affinity between AHASs (1 wild-type and 28 mutated) and two herbicides by analyzing MD simulation data using a series of molecular mechanics (MM) and hybrid quantum mechanics (QM)/MM methods. The performance of each method was evaluated by metrics such as enrichment, area under the curve (AUC) of the receiver operating characteristic, and accuracy. MM-PBSA (MM combined with Poisson-Boltzmann and surface area) showed superior performance over other methods in discerning between resistant (25) and susceptible (4) genotypes, with AUC and accuracy both over 0.9.

Conclusions: This study demonstrated that our HPC-enabled computational modeling approach has a high discriminating accuracy and predictive power, and is a promising tool for screening and early detection of target site mutation-conferred herbicide resistance.
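As an illustration of the evaluation metrics named above, the sketch below computes ROC AUC and accuracy for a binary resistant/susceptible call. It assumes that a less negative predicted binding free energy indicates weaker herbicide binding and therefore resistance; the scores, labels, and threshold are invented and are not the study's data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold):
    """Fraction of genotypes called correctly when score > threshold means 'resistant'."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical binding free energies (kcal/mol) and labels (1 = resistant, 0 = susceptible).
scores = [-4.2, -5.1, -5.8, -9.0, -8.8, -6.0]
labels = [1, 1, 1, 1, 0, 0]

auc = roc_auc(scores, labels)
acc = accuracy(scores, labels, threshold=-7.0)
```

AUC is threshold-free, whereas accuracy depends on the chosen cutoff, which is one reason to report both when comparing scoring methods.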

Complete genome sequence of Pythium brassicum P1, an oomycete root pathogen: insights into its host specificity to Brassicaceae

Mojtaba Mohammadi1$, Eric A. Smith2$, Michael Stanghellini1, and Rakesh Kaundal2*

1Department of Plant Pathology and Microbiology, University of California, Riverside, CA 92521

2Bioinformatics lab, Department of Plants, Soils, and Climate; Center for Integrated BioSystems, College of Agriculture and Applied Sciences, Utah State University, Logan, UT 84322

Background: Pythium brassicum P1 Stanghellini, Mohammadi, Förster and Adaskaveg is an oomycete root pathogen that has recently been characterized. It attacks only plant species belonging to the Brassicaceae family, causing root necrosis, stunting and yield loss. The limited host range of P. brassicum P1 prompted us to sequence the whole genome and compare it to those of broad-host-range Pythium spp. such as P. aphanidermatum and P. ultimum var. ultimum.

Results: A total of 374 million reads were generated, with 91.4% of bases > Q30, using Illumina HiSeq 2500. The sequencing data were assembled using SOAPdenovo, yielding a total genome size of 50.3 Mb contained in 5,434 scaffolds, N50 of 30.2 Kb, 61.2% G+C content, and 13,232 putative protein-coding genes. Pythium brassicum P1 has 175 species-specific gene families, which is slightly below average for the species used in this comparison. Like that of P. ultimum, the P. brassicum P1 genome does not encode any classical RXLR effectors or cutinases, suggesting a significant difference in virulence mechanisms from other oomycete species. Pythium brassicum P1 has a much smaller proportion of the YxSL sequence motif in both secreted and non-secreted proteins, relative to other Pythium species. As with the YxSL effector motifs, P. brassicum P1 also had the fewest Crinkler (CRN) effectors of all the Pythium species. There are 633 proteins that are predicted to be secreted in the P. brassicum P1 genome, which is, again, slightly below average among published Pythium genomes. Pythium ultimum has four cadherin genes with calcium ion-binding LDRE and DxND motifs (Levesque et al. 2010). In contrast, P. brassicum P1 contains only one cadherin gene in its genome. Pythium brassicum P1 has a reduced number of proteins in the carbohydrate-binding module and hydrolytic enzyme categories. As in P. ultimum, we did not detect any xylan-degrading enzymes in P. brassicum. We also show that P. brassicum has a reduced complement of cellulase and pectinase genes relative to P. ultimum. P. brassicum P1 shows increased numbers of CE 4, a family that includes chitin deacetylases, chitooligosaccharide deacetylases, and peptidoglycan GlcNAc deacetylases; GH 7, a family that includes reducing end-acting cellobiohydrolases and chitosanases; glycosyltransferase (GT) 48, a 1,3-β-glucan synthase; and GT 32, which includes α-1,6-mannosyltransferases and inositol-phosphorylceramide transferases. The contraction in ABC transporter families may be a result of the lack of host diversity, and therefore nutrient diversity, in P. brassicum P1.

Conclusion: We identified a new oomycete root pathogen that infects only Brassicaceae. Sequencing results, followed by a comprehensive annotation process and comparison with other Pythium species, show why this pathogen has a narrow host range. We identified several factors to support this hypothesis.

$ both authors contributed equally * Corresponding author
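For readers unfamiliar with the assembly statistics quoted in this abstract, the following sketch shows how N50 and G+C content are conventionally computed. The scaffold lengths and the sequence are toy values, not the P. brassicum P1 assembly.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L contain at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def gc_content(seq):
    """Fraction of G and C bases among A, C, G, T (ambiguous bases such as N are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


# Toy scaffold lengths: total 300, so the half-way point (150) is reached at the 80-long scaffold.
toy_n50 = n50([100, 80, 60, 40, 20])
toy_gc = gc_content("ATGCGC")
```

A higher N50 indicates a more contiguous assembly, since fewer and longer scaffolds are needed to cover half of the genome.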

Docking of Small Molecules to the Cannabinoid Receptors: Finding Value from Big Chemical Datasets

Robert J. Doerksen*1,2

1Department of BioMolecular Sciences, Division of Medicinal Chemistry and 2National Center for Natural Products Research, School of Pharmacy, The University of Mississippi, University, MS 38677, USA

Background: The cannabinoid (CB) receptors, CB1 and CB2, which belong to the membrane-bound rhodopsin-like GPCR family, are implicated in the pathophysiology of various diseases. CB1 agonists have been well studied as potential drugs for the attenuation of chemotherapy-induced vomiting, treatment of pain, and the management of multiple sclerosis and glaucoma, along with many other indications. CB2 has been studied as a target for reducing kidney damage after renal ischemia-reperfusion injury. Computational modeling of the three-dimensional structures accessible to the proteins, and of the stability and dynamics of those structures in both ligand-free and ligand-bound forms, can be helpful for understanding protein function and interactions (such as with G-proteins) and for the design of drugs with an activating or blocking phenotype. Docking methods, which position a ligand in various poses in a binding pocket and score the poses, can be used to help choose which protein model is best and to decide, from among millions of possibilities, which ligands will optimally interact with the protein.

Results: Examples will be presented to show the limitations of docking (failed cases, such as for Serine/threonine-protein kinase Chk1 using Glide), ways to choose a suitable docking method (self-docking and cross-docking with various docking protocols, such as for glycogen synthase kinase-3), and the uses of docking for protein structure validation (docking of known ligands, such as for CB1, compared to experiment), drug characterization (understanding the interactions of well-known CB ligands such as THC or newly identified CB2 agonists), and drug discovery (in which we have found CB ligands with outstanding functional activity).

Conclusions: Careful and systematic applications of docking can help to yield new and useful discoveries from among millions of possibilities.

Transcriptome analysis of abscisic acid-activated protein kinase in abiotic stress in soybean (Glycine max)

Saroj Kumar Sah1, George Popescu2, K. Raja Reddy3, Vincent Klink4, and Jiaxu Li1

1 Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS, 39762

2 Institute for Genomics, Biocomputing, and Biotechnology, Mississippi State University, Mississippi State, MS, 39762

3 Department of Plant and Soil Sciences, Mississippi State University, Mississippi State, MS, 39762

4 Department of Biological Sciences, Mississippi State University, Mississippi State, MS, 39762

Background: Glycine max (soybean) is an important crop cultivated worldwide and is known as a source of protein. The productivity of the soybean crop is severely limited by drought stress. To cope with environmental constraints, plants can sense changes in their environment and respond to those challenges using multiple defense mechanisms. Abscisic acid (ABA) is one of the most critical phytohormones, also called a vital messenger, and acts as the signaling mediator for the adaptive response of plants to different environmental stresses. Previously, we identified in Vicia faba the first abscisic acid-activated protein kinase (AAPK), a guard cell-specific kinase, and showed that it is a positive regulator of ABA signaling. Such protein kinases are well studied in Arabidopsis and rice; however, little is known about them in soybean. This is the first study to determine the role of Glycine max AAPK-like protein kinase in plant responses to drought stress.

Results: We studied the expression of GmAAPK genes in the root, especially in drought conditions. Analysis of GmAAPK overexpression lines shows that these transgenic plants exhibit enhanced drought tolerance, suggesting that GmAAPK is a positive regulator of drought tolerance. To further understand the molecular mechanisms of the GmAAPK gene, an RNA-seq approach was used. Several key genes involved in drought response were identified as differentially expressed genes (DEGs) between control and drought conditions: 6800 DEGs in control samples and 2813 DEGs in the GmAAPK RNAi-silenced samples under drought stress. Also, ABA signaling pathway components were identified that were unique to the drought response triggered by RNAi silencing.

Conclusions: The study reveals the dynamic transcriptome reconfiguration in soybean roots to acclimate to drought stress and highlights the gene expression changes associated with signaling through AAPK. These results will also provide genetic foundations for developing drought-tolerant soybean cultivars via manipulation of the kinase gene family.

SCOTT M. WILLIAMS

Professor

Department of Population and Quantitative Health

Case Western Reserve University

http://epbiwww.case.edu/wp-content/uploads/2016/10/swilliams_cropped.jpg

Title

Evolution as a metaphor for No Boundary Thinking

Abstract

TBD

Bio

Dr. Williams’ research focuses on studies of the distribution of genetic variation among human populations and the role that differences in patterns of variation play in disparity of disease among populations. He is especially interested in common, complex diseases that do not have genes of major effect, but are more likely to be due to genetic models involving interactions among risk factors. These interests have led to research dealing with diversity among African and African descent populations and studies of multiple diseases that are either more common in these populations, such as hypertension and preterm birth, or less common, such as gastric cancer.

Predicting Drug-Induced Liver Injury (DILI) – Comparing in silico, genomic and Tox21 screening methods

Shraddha Thakkar, Weida Tong

National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR

Drug safety assessment is one of the primary challenges for drug development as well as regulatory application, owing to the poor performance of existing preclinical models. This concern has led to significant efforts on evaluating alternative methods for predicting DILI. The era of 21st century toxicology relies heavily on higher throughput approaches such as computational and in silico models, genomics, and in vitro approaches such as high content biology screening assays. Some of these technologies have emerged as critical tools in regulatory decision-making in the EU under the REACH initiative, guided by the 3Rs principles (Replacement, Reduction and Refinement of animal use). In the US, both Tox21 and ToxCast have evaluated the potential of these methodologies in regulatory applications. Mathematically, a predictive model is a solution of Y=f(X), where Y is a toxicological endpoint and X is the data generated from these technologies, such as gene activities derived from genomic methods or parameters that can be readily obtained via computational means, such as descriptors. These are then passed through a mathematical function f to predict a toxicological endpoint (Y). The choice of X (descriptors from different technologies) and f (such as machine learning methods) affects the predictive performance for Y, along with the choice of drugs used in this equation. Many predictive toxicological models have been described, but their performances are difficult to compare due to minimal overlap in the choice of drugs, mathematical functions used (f) and measurement technologies (X). Thus, the true value of these technologies in terms of toxicological prediction is poorly understood. This study aims to compare, evaluate, and contrast the three main approaches (high throughput technology, genomics-based, and in silico computational approaches) by systematically assessing the factors important in Y=f(X).
In addition, benchmarking data will be generated via a direct comparison of the methodologies to understand their strengths individually and in combination. Crowdsourcing approaches will be taken by engaging research communities to leverage their experience in this field. The project will focus on drug-induced liver injury (DILI) since the dataset for this particular toxicity is rich and diverse. Overall, this systematic, crowdsourcing-based approach will generate evidence to support establishing realistic expectations from these technologies for toxicological prediction and will analyze readiness and gaps for application to regulatory decision making.
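The Y=f(X) framing can be made concrete with a small sketch in which the drug set and the function f are held fixed while the data source X is varied. Everything below is hypothetical: the nearest-centroid classifier stands in for f, leave-one-out accuracy stands in for the performance measure, and the two toy feature tables stand in for two measurement technologies. None of it reflects the methods or data of the study.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """f: assign x to the class whose mean feature vector is closest (squared Euclidean distance)."""
    groups = {}
    for xi, yi in zip(train_X, train_y):
        groups.setdefault(yi, []).append(xi)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y):
    """Leave-one-out accuracy of f on one choice of X for a fixed drug set."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


# Same six hypothetical drugs (1 = DILI-positive, 0 = DILI-negative), two hypothetical sources of X.
y = [1, 1, 1, 0, 0, 0]
X_source_a = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
X_source_b = [[0.5], [0.1], [0.9], [0.5], [0.2], [0.8]]

acc_a = loo_accuracy(X_source_a, y)
acc_b = loo_accuracy(X_source_b, y)
```

Because y and f are identical in both runs, any difference between `acc_a` and `acc_b` can be attributed to the choice of X, which is the kind of controlled comparison the abstract argues is missing from the literature.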

Single-cell RNA-seq analysis of retinal ganglion cell subtypes of glaucoma DBA/2J mice

Siamak Yousefi1,2, Hao Chen1, Jesse Ingels1, Sumana Chintalapudi, Megan K. Mulligan1, Bryan Jones3, Vanessa M. Morales-Tirado1, Pete A. Williams4, Simon WM John4, Felix L. Struebing4, Eldon E. Geisert5, Monica M. Jablonski2, Lu Lu1, Robert W. Williams1

1Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, 38163

2Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, 38163

3Department of Ophthalmology, University of Utah, Salt Lake City, UT, 84132

4The Jackson Laboratory, Bar Harbor, ME, 04609

5Department of Ophthalmology, Emory University, Atlanta, GA, 30322

Background: We are developing methods to define molecular signatures of cellular stress during early stages of glaucoma for major subtypes of retinal ganglion cells (RGCs). Our first aim is to develop reliable mRNA biomarkers for RGC subtypes in the DBA/2J (D2) mouse model of glaucoma prior to disease onset. Our second objective is to quantify cellular stress in RGC subtypes at early stages of disease using known stress-responsive transcripts (e.g. Struebing et al. 2016 PMID:27733864; Williams et al. 2017 PMID:28209901; Lu et al. 2018 ARVO abstract). Whole retinas from D2 or D2.Cg-Tg(Thy1-CFP)23Jrs/SjJ mice at 130 to 150 days of age were dissociated gently and size selected (>10 µm). RGCs were enriched using THY1 antibody-coated beads. Fluidigm HT microfluidics plates were used to isolate cells and generate scRNA-seq libraries of full-length polyA-positive mRNAs using SMART-Seq v4. Libraries were sequenced using HiSeq3000, PE151. Following alignment using STAR, expression was normalized to log2(FPKM+1) across ~25,000 unique transcript models. Cells with fewer than 900 detected genes, and genes expressed in fewer than 3 or in 2000+ cells, were excluded. Sets of 905 genes with high variance and/or high expression were used for principal component analysis (PCA). Eight principal components were used for density-based unsupervised clustering and visualized using t-distributed stochastic neighbor embedding (tSNE). Gene specificity was computed for all transcripts across all clusters. The top transcripts per cluster, with expression >1 in 1% or more of cells, were used to diagnose the cellular identity of clusters.

Results: A total of 2400 cells, of which well over half are RGCs, were analyzed. The scRNA-seq protocol generates 150,000–200,000 uniquely mapped mRNA reads/cell and ~5000 genes/cell. Roughly 75% of cells are positive for two or more of the following RGC markers: Thy1, Rbpms, Rbpms2, Jam2, G3bp1, and Ywhaz. We identified at least six clusters in the initial data sets using these protocols and are now linking clusters to major classes of RGCs.

Conclusions: Molecular signatures of cellular stress and RGC subtypes in early stages of glaucoma should now be identifiable using unsupervised learning techniques.
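The normalization and quality-control filters described in the Background can be sketched as follows. The defaults are the cutoffs quoted in the abstract; the abstract does not state how "detected" is defined or in which order the filters were applied, so FPKM > 0 and cells-before-genes are assumptions here, and the 4 × 4 matrix uses scaled-down cutoffs purely for illustration.

```python
import math


def log_normalize(fpkm):
    """Transform an FPKM matrix (rows = cells, columns = genes) to log2(FPKM + 1)."""
    return [[math.log2(v + 1.0) for v in row] for row in fpkm]


def filter_matrix(fpkm, min_genes_per_cell=900, min_cells_per_gene=3, max_cells_per_gene=2000):
    """Return indices of cells and genes that pass the filters.

    A gene counts as detected in a cell when FPKM > 0 (assumption). Cells are filtered
    first; genes are kept when expressed in at least min_cells_per_gene and in fewer
    than max_cells_per_gene of the remaining cells.
    """
    kept_cells = [i for i, row in enumerate(fpkm)
                  if sum(v > 0 for v in row) >= min_genes_per_cell]
    n_genes = len(fpkm[0]) if fpkm else 0
    kept_genes = []
    for j in range(n_genes):
        n_expr = sum(fpkm[i][j] > 0 for i in kept_cells)
        if min_cells_per_gene <= n_expr < max_cells_per_gene:
            kept_genes.append(j)
    return kept_cells, kept_genes


# Toy matrix: 4 cells x 4 genes, with made-up FPKM values.
toy = [
    [3.0, 0.0, 1.0, 7.0],
    [1.0, 0.0, 1.0, 15.0],
    [0.0, 0.0, 0.0, 1.0],   # only one detected gene, so this cell is dropped
    [0.0, 1.0, 3.0, 31.0],
]
cells, genes = filter_matrix(toy, min_genes_per_cell=2, min_cells_per_gene=2, max_cells_per_gene=3)
normalized = log_normalize([toy[i] for i in cells])
```

The upper bound on cells per gene removes near-ubiquitous transcripts, which carry little information for separating clusters.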

Effects of Ethanol on Escherichia coli-mediated sepsis: Differences Between Changes in Gene Expression Early and Late in the Course of Infection.

Xioamin Deng, Ruping Fan, Bindu Nanduri and Stephen B. Pruett

Excessive acute ethanol exposure (binge drinking) is a significant risk factor for mortality in patients with sepsis. Ethanol has similar effects in a mouse model of sepsis produced by intraperitoneal administration of Escherichia coli (E. coli). In the present study, ethanol administered shortly before challenge with bacteria decreased expression of numerous pro-inflammatory mediators involved in innate immunity, and this was associated with decreased survival time and percentage. Microarray analysis of peritoneal leukocytes was used to examine global gene expression profiles at 1 and 2 hr (early sepsis) and 18 hr (late sepsis) after E. coli challenge, with or without ethanol pre-treatment. Ethanol inhibited the E. coli-mediated induction of many genes that code for pro-inflammatory mediators, particularly at 2 hr. Key categories that were markedly inhibited were toll-like receptor signaling, cytokine/chemokine production, macrophage mobilization, and bactericidal activity. Changes in gene expression were generally consistent with functional effects in these same experiments (which have been reported previously). In addition, the gene expression results reported here indicate previously unrecognized molecular pathways that may contribute to decreased resistance to sepsis, such as inhibition of an interferon amplification loop, protein translation initiation, phospholipase signaling, and cell proliferation. At 18 hr, just 6 hr before lethal outcomes began, expression of surprisingly few immune parameters was altered by ethanol. This suggests a conceptual paradigm in which early inhibition of innate immune responses leads to decreased clearance of bacteria late in sepsis without concomitant upregulation of innate immune gene expression. This has implications regarding the failure of inhibitors of cytokines and chemokines to effectively treat sepsis.

This work was supported by grants from NIAAA (R01AA009505) and NIGMS (P20GM103646).

Quantitative Target-specific Toxicity Prediction Model (QTTPM): A Novel Computational Toxicology Approach Integrating Molecular Dynamics Simulation and Machine Learning

Sundar Thangapandian1, Gabriel Idakwo2, Nan Wang3, Chaoyang Zhang2 and Ping Gong1*

1* Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS 39180

2 School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406

3 Department of Computer Science, New Jersey City University, Jersey City, NJ 07305

Background: Quantitative structure-activity relationship (QSAR) modelling is a chemical descriptors-based approach for quantitative prediction of biological activity, potency or toxicity of a chemical. QSAR modelling may suffer from low prediction accuracy in the absence of information on chemical-biomacromolecule interactions. In order to mitigate this problem, we developed a novel Quantitative Target-specific Toxicity Prediction Model (QTTPM) approach that integrates molecular dynamics (MD) simulation and machine learning. As a proof-of-concept study, we chose androgen receptor (AR) as the toxicant-targeted biomacromolecule because AR is a nuclear receptor playing crucial roles in the development of the male reproductive system and of tumors in prostate, bladder, liver, kidney and lung. Molecular docking and MD simulations were employed to generate a new set of dynamic protein-ligand interaction descriptors (dyPLIDs) used for developing QTTPMs. We selected 274 chemicals (154 agonists/120 antagonists) with quantitative AR assay outcomes from Tox21 datasets. First, we performed five 100-ns MD simulations of AR crystal structures in its unbound (apo), two agonist-bound (testosterone and dihydrotestosterone), and two antagonist-bound (R-bicalutamide and cyproterone acetate) forms and identified key interaction patterns leading to >400 dyPLIDs. Second, 6-ns MD simulations of 274 AR-ligand docked complexes were performed to calculate dyPLIDs. Third, the Random Forest (RF) algorithm was deployed to identify key descriptors (including both conventional 1D/2D/3D descriptors and dyPLIDs). Fourth, QTTPMs were built using the key descriptors and AR assay data.

Results: QTTPMs demonstrated higher accuracy than QSAR models constructed with conventional chemical descriptors. In addition, QTTPMs provided insights into key protein structural changes upon ligand binding that modulated the activity of the AR.

Conclusions: The novel QTTPM approach was developed using a small dataset of 274 AR agonists/antagonists. Although more biomacromolecular targets and chemicals warrant further investigation, this study demonstrates the superiority of QTTPM over QSAR and that QTTPM is a promising new tool for computational predictive toxicology.

Innovations: The innovations of this study include (i) overcoming technical challenges in obtaining the dyPLIDs by performing an ensemble of MD simulations (~2.5 µs in total) for 274 AR-chemical complexes on a supercomputer system. To the best of our knowledge, this is the first attempt towards this goal. (ii) Building QTTPMs by integrating dyPLIDs, a new dimension of variables describing dynamic interactions between ligand and receptor, in comparison with the conventional ligand-based QSAR models, which only use the physico-chemical properties of the chemicals in model building. (iii) The QTTPM approach can be applied to other chemicals and toxicity targets, e.g., 10K chemicals and estrogen receptors, PPARγ and aryl hydrocarbon receptor from the Tox21 project.

Author contributions: PG conceived the QTTPM approach. ST developed the methodology to compute dyPLIDs (including molecular docking and MD simulations on supercomputers), performed machine learning studies to down-select key descriptors, and conducted the computational experiments to build QTTPMs and QSAR models. GI assisted with Tox21 bioassay data curation. NW and CZ helped with machine learning studies. ST drafted the abstract and prepared the PowerPoint presentation.

Focused ion beam-scanning electron microscopy for three-dimensional modelling of cellular ultrastructure

Brandon C. Reagan,1 Paul J.-Y. Kim,1 Preston D. Perry,1 John R. Dunlap2 and Tessa M. Burch-Smith1,3

1Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996

2Advanced Microscopy and Imaging Center, University of Tennessee, Knoxville, TN 37996

3School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996

Background: Over the last several decades transmission electron microscopy (TEM) has been indispensable for understanding cellular ultrastructure. In recent years, three-dimensional analysis of organelles has been advanced through the use of TEM tomography. TEM tomography generates a series of TEM images, taken over a range of angles, to construct a three-dimensional projection of the sample called a tomogram. TEM tomography has greatly advanced knowledge of cellular processes, but its implementation is limited to samples of 150-200 nm thickness when imaged on a typical transmission electron microscope. An alternative to TEM tomography is focused ion beam/scanning electron microscopy (FIB/SEM). This technique combines an ion beam for performing serial sectioning with an electron beam for collecting serial images of a sample. In theory, the thickness of a sample that can be imaged by FIB/SEM is limited only by the length of time over which images are collected, overcoming the sample-thickness limitations of traditional TEM tomography.

Results: We have used FIB/SEM to study the ultrastructures of several plant tissues in samples previously prepared for TEM via common fixation and embedding protocols. We have generated 3D images of complete Nicotiana benthamiana leaf mesophyll cells, and have also examined the structures of two types of nitrogen-fixing nodules: soybean root nodules and nodules formed on the roots of Medicago truncatula.

Conclusions: Our results demonstrate that FIB/SEM is a powerful tool for understanding the relationships between plant organelles and for interrogating plant-microbe interactions. In particular, the three-dimensional models generated have been very useful for studying the structures of cellular membranes in the tissues examined.

Weighted in-network Node Expansion and Ranking (WINNER): a New Approach to Identify Potential Biomarkers

Thanh Nguyen1,2, Zongliang Yue1, Min Gao1, Jake Yue Chen1,*

1* Informatics Institutes, School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35209

2 Department of Computer and Information Science, Indiana University Purdue University Indianapolis, IN 46202

Background: In network-based analysis, biologists and bioinformaticians regularly apply the hub paradigm to identify potential biomarkers. To increase the comprehensiveness and novelty of hub-based analysis, bioinformaticians apply the network expansion strategy: adding genes directly connected to the initial gene set of interest prior to hub analysis. Although this dual strategy has proven successful in many studies, two questions remain open. First, to what extent does network expansion achieve power-law characteristics, so that the hub genes highlight the network modularity? Second, how can the statistical significance of hub analysis in identifying novel biomarkers be assessed? Results: In this work, building on the dual strategy above, we propose the WINNER (weighted in-network node expansion and ranking) framework to prioritize and identify novel biomarkers. To answer the first question, WINNER ensures that the result of network expansion achieves strong power-law characteristics. In addition, WINNER applies the random-walk methodology to improve the hub analysis results. To answer the second question, we apply two network randomization strategies, node-degree-preserving and modularity-preserving randomization, to compute a p-value as the metric of statistical significance for each hub gene. Applying WINNER to an Alzheimer’s disease case study, we identify four significant hubs: MYC, PSEN1, SMAD2, and SMAD3. By ontology analysis, we show that these hubs have more significant modularity than the other hubs. Conclusions: The WINNER framework can answer two questions in applying network expansion and hub analysis to identify potential biomarkers. The WINNER model for statistical significance shows strong capability to remove potential falsely discovered genes resulting from hub analysis.
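The idea of a node-degree-preserving randomization test can be illustrated with a minimal sketch. This is not the WINNER implementation: the statistic used here (the number of a candidate hub's neighbors that fall in a seed gene set), the double-edge-swap rewiring, and all function names are assumptions chosen for illustration, and the abstract does not specify which statistic WINNER evaluates.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Rewire an undirected edge list with double-edge swaps.

    Each accepted swap turns a-b, c-d into a-d, c-b, so every node
    keeps its original degree.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop or change nothing
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def seed_neighbors(edges, node, seeds):
    """Hypothetical hub statistic: neighbors of `node` inside the seed set."""
    count = 0
    for u, v in edges:
        if u == node and v in seeds:
            count += 1
        elif v == node and u in seeds:
            count += 1
    return count


def hub_p_value(edges, node, seeds, n_random=200, seed=0):
    """Empirical p-value of the statistic under degree-preserving rewiring."""
    rng = random.Random(seed)
    observed = seed_neighbors(edges, node, seeds)
    hits = 0
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        if seed_neighbors(shuffled, node, seeds) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)
```

Because plain degree is unchanged by this kind of rewiring, a degree-preserving null model is only informative for statistics that depend on which nodes are connected, not on how many connections a node has; that is why the sketch counts seed-set neighbors rather than raw degree.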


Development of novel vitamin E analogs as potent radioprotectors

Ujwani Nukala1,3, Awantika Singh1,3, Shraddha Thakkar1,3, Nukhet Aykin-Burns1, Mahmoud Kiaei2, Rupak Pathak1, Philip J. Breen1, and Cesar M. Compadre1

1Department of Pharmaceutical Sciences, University of Arkansas for Medical Sciences, Little Rock, AR

2Department of Pharmacology and Toxicology, University of Arkansas for Medical Sciences, Little Rock, AR

3Joint Bioinformatics Graduate Program, University of Arkansas at Little Rock and University of Arkansas for Medical Sciences, Little Rock, AR

Background: There is still an unmet need for radioprotectors, compounds that protect against radiation injury in the event of radiation accidents or terrorism scenarios. In this context, vitamin E is a well-known antioxidant that scavenges the free radicals produced by radiation exposure. The vitamin E family comprises eight isoforms, four tocopherols (α, β, γ and δ) and four tocotrienols (α, β, γ and δ), which are collectively known as tocols. The standard vitamin E preparation sold on the market is α-tocopherol (AT) because AT has the slowest rate of elimination (t1/2 = 18 h) and can thus be used for once-a-day administration. However, the therapeutic efficacy of AT has been disappointing. On the other hand, a rapidly increasing number of studies show that the tocotrienols have much greater biological activity than the tocopherols but very low bioavailability.

Results: In this study, the cellular uptake of six vitamin E isoforms, three tocopherols (α, γ and δ) and three tocotrienols (α, γ and δ), was determined in endothelial cells (HUVEC), motor neuron-like cells (NSC-34) and hepatocytes (HepG2), and a TBARS assay was performed to compare the inhibition by tocols of TBHP-induced lipid peroxidation in Wistar rat liver microsomes. The results show that the tocols vary dramatically in their PK/PD profiles despite their minor structural differences. Based on these results, we suggest a paradigm in which the observed differences can be explained by a multifactorial function: “Tocol therapeutic efficacy = Fn (intrinsic bioactivity, elimination rate, cell-uptake)”. Using an in silico screening procedure, novel vitamin E analogs, tocoflexols, were designed according to this paradigm. Molecular dynamics (MD) simulations of tocoflexols with α-tocopherol transfer protein (ATTP), the protein responsible for maintaining the plasma levels of the tocols, showed that tocoflexols can bind better to ATTP than tocotrienols and therefore have better bioavailability.

Conclusions: The paradigm suggested above is used as a guide to develop novel vitamin E analogs, tocoflexols, which can be developed as potent radioprotectors.


INNOVATION IN RESEARCH

The process of drug development is an overwhelming task in which failure is much more likely than success. In this study, using in vitro PK/PD modeling results, we suggested a paradigm that can be used as a guide to develop novel vitamin E analogs as potent radioprotectors. We optimized the chances of success by using a molecular dynamics simulation approach to screen tocoflexols for their ability to bind to ATTP. These in silico studies will help identify potential candidates, thus saving time, money and effort. We will validate these results in the near future by testing the bioavailability and efficacy of tocoflexols as radioprotectants in animal models.

AUTHOR CONTRIBUTIONS

This project is part of U. Nukala’s Ph.D. research. She conducted all the computational and in vitro experiments, analyzed the results and drafted the paper. Other co-authors advised and assisted in conducting the in silico and in vitro experiments. C. Compadre supervised the project and assisted with the analysis of the results and the writing of the paper.


Effects of Ethanol on Escherichia coli-mediated sepsis: Differences Between Changes in Gene Expression Early and Late in the Course of Infection.

Wei Tan, Xioamin Deng, Ruping Fan, Bindu Nanduri and Stephen B. Pruett

Excessive acute ethanol exposure (binge drinking) is a significant risk factor for mortality in patients with sepsis. Ethanol has similar effects in a mouse model of sepsis produced by intraperitoneal administration of Escherichia coli (E. coli). In the present study, ethanol administered shortly before challenge with bacteria decreased expression of numerous pro-inflammatory mediators involved in innate immunity, and this was associated with decreased survival time and percentage. Microarray analysis of peritoneal leukocytes was used to examine global gene expression profiles at 1 and 2 hr (early sepsis) and 18 hr (late sepsis) after E. coli challenge, with or without ethanol pre-treatment. Ethanol inhibited the E. coli-mediated induction of many genes that code for pro-inflammatory mediators, particularly at 2 hr. Key categories that were markedly inhibited were toll-like receptor signaling, cytokine/chemokine production, macrophage mobilization, and bactericidal activity. Changes in gene expression were generally consistent with functional effects in these same experiments (which have been reported previously). In addition, the gene expression results reported here indicate previously unrecognized molecular pathways that may contribute to decreased resistance to sepsis, such as inhibition of an interferon amplification loop, protein translation initiation, phospholipase signaling, and cell proliferation. At 18 hr, just 6 hr before lethal outcomes began, expression of surprisingly few immune parameters was altered by ethanol. This suggests a conceptual paradigm in which early inhibition of innate immune responses leads to decreased clearance of bacteria late in sepsis without concomitant upregulation of innate immune gene expression. This has implications regarding the failure of inhibitors of cytokines and chemokines to effectively treat sepsis.

This work was supported by grants from NIAAA (R01AA009505) and NIGMS (P20GM103646).


A decade of MAQC effort and its contribution to our understanding of high-throughput genomics technologies

Weida Tong, NCTR/FDA

Emerging genomics methodologies contribute to our understanding of disease and health. However, their value in clinical and regulatory applications requires rigorous assessment and consensus among various stakeholders. The presentation gives an overview of the FDA efforts in this field, with a specific discussion of the FDA-led, community-wide Microarray/Sequencing Quality Control (MAQC/SEQC) consortium. The consortium promotes standardization and quality control to address alarming concerns about the lack of reproducibility in the generation, analysis, and interpretation of genomics data. Specifically, the presentation will discuss some of the advancements in this area based on the data generated from MAQC/SEQC and beyond. In addition, the fourth MAQC project, known as SEQC2, will be introduced; it is focused on assessing the power and limitations of whole genome sequencing and target gene sequencing in clinical applications and precision medicine. Finally, a set of lessons learned and general guidelines will be provided to explicitly consider reproducibility, a fundamental hallmark of good science, in the analysis of transcriptomics data.


Systems Biology and Big Data: Little Mitochondria as a Big Example

William B Mattes1

1*Division of Systems Biology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079

Background: Systems biology represents a continuation of historical efforts to model biological systems. It may be viewed as an extension of the principles of physiology to the molecular and cellular level. However, that extension hinges upon integrating the vast and diverse data representing molecular type and modification, cellular and tissue localization, molecular and cellular interactions, and temporal variations based upon development or stimulus response. Mitochondrial systems biology offers a challenging example, given the interplay between nuclear and mitochondrially encoded proteins, tissue and developmental type, and responses to external factors. An understanding of this interplay would be of great value in understanding the toxicity of many drugs. Accordingly, we have examined several data sources for information on both drug-induced mitochondrial injury (DIMI) and studies of mitochondrial components. Results: Drugs associated with DIMI are found in almost all therapeutic areas and result in a broad range of adverse events. At the same time, a proteomic survey of mitochondria from 14 tissues shows that there are minor, but potentially relevant, tissue differences, with some tissues (e.g. liver and kidney) clustering closer to each other than to other tissues on the basis of mitoproteome content. Conclusions: What is not clear is how these differences impact toxicity or function. Nonetheless, they provide a substrate for hypotheses and suggest the need for more comprehensive data collection. Overall, this talk will provide some examples of the promise of mitochondrial systems biology, but will mostly highlight our knowledge gaps and the work remaining to be done.


MCBIOS Timber Rattlesnake Genome Project: Current Status and Lessons Learned

William S. Sanders1

[email protected]

1 The Jackson Laboratory, 10 Discovery Drive, Farmington, CT 06032

The MCBIOS Timber Rattlesnake Genome Project was first presented and proposed to the MCBIOS community at the 2012 annual MCBIOS meeting. Initially proposed as a large-scale collaboration at the scientific society level, the project has undergone several status changes in the six years since its inception: starts and stops, bursts of activity, and periods of stagnation. As MCBIOS begins discussing a new collaborative project, this talk will provide the history and current state of the first large-scale MCBIOS collaboration, along with the next steps needed to successfully close out this project. Finally, the lessons learned and pitfalls to avoid will be described.


Distinguishing Resistance-Conferring from Susceptible Mutations in Acetohydroxyacid Synthase by High Performance Computing-Enabled Computational Modeling

Yan Li1, Michael D. Netherland2, Chaoyang Zhang3, and Ping Gong2*

1 Bennett Aerospace Inc., Cary, NC 27518

2 Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS 39180

3 School of Computing, University of Southern Mississippi, Hattiesburg, MS 39406

Background: Acetohydroxyacid synthase (AHAS) is a key enzyme catalyzing the biosynthesis of the essential branched-chain amino acids. AHAS-inhibiting herbicides are the largest site-of-action group on the market, comprising more than 50 active ingredients across 5 chemical classes. However, repeated intensive use of these herbicides has resulted in resistance evolution in a wide variety of weed species. The resistance is primarily conferred by alterations in the AHAS gene that attenuate sensitivity to the herbicides. However, not all target-site mutations in AHAS confer resistance to a target-specific herbicide. Because population-level resistance testing is prohibitively costly, there is great demand for a reliable, cost-effective, and systematic approach to identifying resistance-conferring AHAS mutations in field samples. Here we developed an approach based on homology modeling, molecular docking and molecular dynamics (MD) simulation and implemented it on high performance computing (HPC) systems. We selected Kochia scoparia as the test species because of the availability of both AHAS mutations and resistance testing data for two AHAS-inhibiting herbicides (tribenuron methyl and thifensulfuron methyl). Results: We determined the binding affinity between AHASs (1 wild-type and 28 mutated) and the two herbicides by analyzing MD simulation data using a series of molecular mechanics (MM) and hybrid quantum mechanics (QM)/MM methods. The performance of each method was evaluated by metrics such as enrichment, area under the curve (AUC) of the receiver operating characteristic, and accuracy. MM-PBSA (MM combined with Poisson-Boltzmann and surface area) outperformed the other methods in discriminating between resistant (25) and susceptible (4) genotypes, with AUC and accuracy both over 0.9.
Conclusions: This study demonstrated that our HPC-enabled computational modeling approach has high discriminating accuracy and predictive power, and is a promising tool for screening and early detection of target-site mutation-conferred herbicide resistance.
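For readers unfamiliar with the evaluation metrics named above, the following minimal sketch shows how AUC and accuracy could be computed from per-genotype scores. It is not the authors' pipeline: the function names, the convention that a higher score means "predicted resistant", and the threshold argument are assumptions made for illustration, and no values from the study are used.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (resistant)
    genotype scores higher than a randomly chosen negative (susceptible)
    one; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold):
    """Fraction of genotypes classified correctly at a fixed score threshold."""
    predicted = [s > threshold for s in scores]
    correct = sum(p == bool(y) for p, y in zip(predicted, labels))
    return correct / len(labels)
```

With a class split as uneven as 25 resistant to 4 susceptible genotypes, accuracy alone can look high for a classifier that labels almost everything resistant, which is one reason to report AUC alongside it.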


Applying multiple transcriptome analyses to understand plant-viroid interactions

Ying Wang1,*, Yi Zheng2, Zhangjun Fei2,*, Biao Ding3

1Department of Biological Sciences, Mississippi State University, Starkville, MS, 39759

2Boyce Thompson Institute, Cornell University, Ithaca, NY, 14853

3Department of Molecular Genetics, Ohio State University, Columbus, OH, 43210

Background: When challenged by pathogens, plants undergo extensive changes to shift from normal growth to defense mode. Next-generation deep sequencing technology and bioinformatics now provide a great opportunity to deepen the understanding of gene expression changes during plant-pathogen interactions. Results: We applied multiple transcriptome analyses to dissect the molecular basis underlying plant-viroid interactions. Viroids are circular noncoding RNAs that infect plants. Using a combination of RNA-seq, small RNA sequencing and degradome deep sequencing, we found that potato spindle tuber viroid (PSTVd) alone can trigger plant immune responses and alter gene splicing patterns. We also observed a new pattern of Dicer-like protein-mediated defense against viroids. Conclusion: Next-generation deep sequencing greatly deepens our understanding of plant-viroid interactions and provides a foundation for future mechanistic studies.


WIPER: Weighted in-path edge ranking for biomolecular association networks

Zongliang Yue1*, Thanh Nguyen1,2, Min Gao1, Jake Chen1§

1 Informatics Institute in School of Medicine, University of Alabama at Birmingham, Birmingham, AL 435233, USA

2 Department of Computer and Information Science, Purdue University School of Science, Indianapolis, IN 46202, USA

*Presenting Author (Student): Zongliang Yue ([email protected]) §Corresponding Author: Jake Chen ([email protected])

Abstract: Background:

In integrative high-throughput biological studies, bioinformaticians regularly need to interpret genomic and functional genomic data with condition-specific biomolecular association networks. Current network biology research has focused on network nodes, e.g., using networks to characterize unknown gene/protein functions or ranking genes/proteins based on network topology. Little research has been reported on the computational characterization and prioritization of network edges, which may encode key regulatory events significant to the sub-network. WIPER (Weighted in-Path Edge Ranking) is a new computational technique that we report here to help biomedical researchers prioritize and rank-order weighted edges from biomolecular association networks, and to explore novel edges in the critical topological paths of the network, with statistical significance assessed by our statistical model.

Result:

WIPER was applied to toy models to show how edge prioritization is influenced by edge weight, edge traversal path length, network topology, breaking of an edge, and WIPER’s iteration. The results showed that the usage fold change score (UFS) yielded by WIPER generates stable results, avoiding the biases caused by different edge weights and edge traversal path lengths. In a simulated typical biological network structure, WIPER could identify the bridge edge between the sub-networks as the most significant one. To explore novel edges, we set the novel-edge cutoff score to give a low false discovery rate. After applying the statistical model and the log fold change transformation method, a p-value is assigned to each ranked edge. We applied WIPER in an Alzheimer’s disease study and found HMGB1-TP53 (#1), TP53-SMAD3 (#3), and TP53-SMAD2 (#5) to be the top-ranked existing edges in the database. We also found ESR1-HMGB1 (#2) and TP53-PLG (#4) to be the top-ranked novel edges.

Conclusion:

We believe that WIPER can assist biologists and informaticians in discovering the important interactions in disease-specific biomolecular association networks. WIPER is publicly available from http://discovery.informatics.uab.edu/wiper/.

