SIB Profile 2015 SIB, a crucial link in the life science chain

SIB Profile 2015 - Swiss Institute of Bioinformatics · 2015-12-14 · platforms, bioinformatics allows the storage and analysis of “big data” produced by experimentation. These


Foreword 5

What is bioinformatics? 6

About SIB 8

Vision and missions 9

SIB organization: an efficient collaborative Swiss model 10

Governance 12

Bioinformatics resources for medicine and life sciences 14
  • Databases and software platforms 15
  • Core facilities and high-performance computing centres 18

Leading and coordinating bioinformatics in Switzerland 23
  • Clinical bioinformatics: paving the way to a finer kind of health 24
  • Legal and technology transfer office: enabling the community to benefit from SIB’s innovations 26
  • SIB technology: optimizing technology-related activities 27
  • Training and outreach: training the next generation of bioinformaticians 28

SIB – the Swiss node of ELIXIR 30

In the spotlight in 2014 32

Finance 34

Staff 35

Research and service activities 36
  • SIB collaborative network 36
  • A wide variety of activity domains 38
  • Genes and genomes 40
  • Proteins and proteomes 47
  • Medicine and health 51
  • Evolution and phylogeny 56
  • Structural biology 63
  • Systems biology 66
  • Bioinformatics infrastructure 73

Acknowledgements 77


It has become difficult to imagine a life science project without bioinformatics. From understanding the 3D structure of macromolecules to designing drugs and mapping molecular pathways, bioinformatics continues to be at the forefront of many fields of research.

Today Switzerland has the highest concentration of bioinformaticians worldwide. SIB currently counts 56 research and service groups across Switzerland – a 25% increase over last year – and more than 650 members, demonstrating the interest in, and need for, such an infrastructure. Collaboration is another important measure of success, and as shown on pp. 36-37, SIB groups have established an outstanding collaborative network, which we are very proud of.

With the flourishing development of personalized medicine, our Institute is increasingly taking part in this field (pp. 24-25). Personalized medicine proposes to tailor medical decisions to the individual patient’s data, which rely on DNA sequencing and other “omics” technologies, as well as imaging. The huge amount of data generated by these methods is then processed with specific bioinformatics tools to transform “big data” into meaningful “smart data”. This is the new field of clinical bioinformatics, whose importance was recognized by SIB very early on.

Science has never been confined to geographical boundaries, and the same goes for bioinformatics. Over the years, SIB has multiplied its international partnerships and, thanks to its intercantonal and interinstitutional model of collaboration, was a key player in the creation of ELIXIR – the European Life Science Infrastructure for Biological Information – and is today its largest national node (pp. 30-31). In 2014, the European Council identified ELIXIR as one of Europe’s three priority research infrastructures, further highlighting the key role of bioinformatics in life sciences and medicine.

Bioinformatics is at the heart of science and its long-term sustainability is crucial. We would like to thank the Swiss government, the Federal Assembly, the State Secretariat for Education, Research and Innovation, the Swiss National Science Foundation and all those in funding roles, as well as our partner institutions, for their unwavering and invaluable support.

Last but not least, our heartfelt gratitude goes to all SIB members, without whose talent and dedication Swiss bioinformatics would not be where it is today.

Felix Gutzwiller, President of the Foundation Council

Manuel Peitsch, Chairman of the Board of Directors

Ron Appel, Executive Director

Bioinformatics is at the heart of science and its long-term sustainability is crucial


Bioinformatics is the application of computer technology to the understanding and effective use of biological data. In other words, it helps to convert “big data” into “smart data” or knowledge.

Computing has become a central component of modern scientific research: large volumes of data (“big data”) are generated by increasingly automated measuring devices. These data need to be stored, organized and analysed to extract new insights and knowledge. Because new discoveries often reveal their relevance when they are compared with the already available knowledge, extensive comparison with large datasets is a great advantage. In addition, computational simulation has become a third pillar of science – along with experimentation and theory – allowing researchers to advance their understanding of complex systems in silico.

Bioinformatics provides:

• Biocuration and bioinformatics expertise enabling life scientists to have accurate and comprehensive representation of biological knowledge and take full advantage of bioinformatics technologies

• Databases and knowledgebases for giving life scientists access to curated biological data and information

• Software for analysing, visualizing, interpreting and comparing biological data and for modelling biological systems

• Computing and storage infrastructures for storing, analysing and processing biological data, including “big data”

Bioinformatics is thus an interdisciplinary field that brings major advances in many different life science and medical areas. Bioinformatics domains include (for more details, please see the corresponding sections on p. 38 and following pages):

• Genes and genomes
• Proteins and proteomes
• Medicine and health
• Evolution and phylogeny
• Structural biology
• Systems biology
• Bioinformatics infrastructure

Ensuring online access to bioinformatics resources, such as databases and software platforms, as well as training and support from skilled bioinformaticians, is therefore essential for medicine and life sciences.

This chart shows the crucial role of bioinformatics in medicine and life sciences. Thanks to resources such as databases and software platforms, bioinformatics allows the storage and analysis of “big data” produced by experimentation. These data are then converted into “smart data” or knowledge that can be used by life scientists and clinicians. Bioinformatics can also generate data and knowledge through in silico experiments.

[Chart: the role of bioinformatics in life sciences and medicine. Experimental data production (molecular structures, sequencing, gene expression, scientific literature, clinical data, and so on) generates “big data”. Bioinformatics turns these into “smart data” for life sciences and medicine through data storage, data analysis, and in silico data and knowledge production. Fields of activity: genes and genomes; proteins and proteomes; medicine and health; evolution and phylogeny; structural biology; systems biology; bioinformatics infrastructure. Resources: databases; algorithms; software tools; computing and storage infrastructure; people and expertise.]


The SIB Swiss Institute of Bioinformatics is a unique success story at the frontier of life sciences and computer science. When the Institute was founded in 1998, bioinformatics was still in its infancy, both in Switzerland and abroad. Today SIB is an independent, non-profit foundation, recognized as being of public utility, that provides world-class bioinformatics. The decentralized, federating organizational structure of SIB is a model for countries setting up their own bioinformatics infrastructure, as well as for the European bioinformatics programme ELIXIR, which has adopted the hub and nodes model.

By sharing its expertise in storage, analysis and dissemination of large biological datasets, and through education and collaborations with research institutes and industrial partners, SIB has created a true bioinformatics culture in Switzerland, which today has the highest concentration of bioinformaticians in the world. The Institute continues to lead developments in the field of bioinformatics. Well aware that the wealth of data produced by modern technologies and the growing self-awareness of patients will change the way medical data are considered, SIB is now also taking up the challenge of striving for bioinformatics excellence in personalized health.

SIB in figures

• More than 650 members who carry out outstanding science and services
• 56 research and service groups from the major Swiss schools of higher education and research institutes
• 16 institutional members spread across Switzerland
• More than 150 resources, including high-quality databases and software accessible through the SIB portal ExPASy; created in 1993, ExPASy was at that time the first website available in the biomedical field
• More than 900,000 requests per month on the UniProtKB/Swiss-Prot website
• More than 1,300 peer-reviewed articles published by SIB members over the last 15 years

SIB helps shape the future of life sciences through excellence in bioinformatics

Mission 1
To provide world-class core bioinformatics resources to the national and international life science research community

Mission 2
To lead and coordinate the field of bioinformatics in Switzerland

In order to achieve these two missions, SIB has established four strategic goals:

• Create, maintain and disseminate core bioinformatics databases, software and services worldwide
• Offer key competencies and research support in bioinformatics to the national life science community
• Federate bioinformatics research groups from Swiss universities and research institutes and foster collaboration and innovation at the highest level of scientific excellence
• Train first-rate researchers

SIB, a crucial link in the life science chain

For more information: www.isb-sib.ch


Modelled on Switzerland’s federal structure, SIB is a federation of bioinformatics research and service groups from the major Swiss schools of higher education. The relationship between the academic world and SIB represents an efficient form of symbiosis: both SIB and the host institutions gain from intergroup synergies and from the national and international recognition of the high quality of their science and services.

SIB’s institutional members are the Universities of Basel, Bern, Fribourg, Geneva, Lausanne and Zurich; the Università della Svizzera italiana (USI); the Swiss Federal Institutes of Technology of Lausanne (EPFL) and Zurich (ETH Zurich); the Geneva School of Business Administration (HEG); and the Zurich University of Applied Sciences (ZHAW); as well as the following research institutes: the Friedrich Miescher Institute for Biomedical Research (FMI), the Ludwig Institute for Cancer Research, the Swiss Tropical and Public Health Institute (Swiss TPH), Agroscope, and the Institute of Oncology Research (IOR) (see map on the facing page). SIB also works closely with industrial partners.

SIB has a unique organization: the group leaders are senior academic staff of partner institutions, while a number of the scientists are paid directly by SIB. Although each research group carries out its own research and teaching activities independently within its host institution, it can benefit from a wide range of resources and support provided by SIB. Moreover, SIB offers its partner institutions an efficient nationwide coordination of bioinformatics research, core resources and teaching activities. In return, the Swiss universities and research institutes provide SIB members with the infrastructure needed to perform their tasks.

SIB sets up offices at Campus Biotech
Members of the SIB groups Swiss-Prot and Vital-IT have moved into Campus Biotech to contribute their expertise to the new centre of excellence in biotechnology and life science research. Campus Biotech is located in the former Merck Serono headquarters in Geneva and hosts academic groups from the University of Geneva and EPFL, a new Wyss Centre for Bio- and Neuro-Engineering, and related organizations and businesses. SIB is closely associated with this endeavour, in particular through its involvement in personalized health research. www.campusbiotech.ch

[Map of Switzerland showing SIB’s institutional members: University of Basel; University of Zurich; Swiss Federal Institute of Technology Zurich; Zurich University of Applied Sciences; Friedrich Miescher Institute for Biomedical Research; Swiss Tropical and Public Health Institute; Agroscope; University of Bern; University of Fribourg; University of Geneva; University of Lausanne; Università della Svizzera italiana (USI); Swiss Federal Institute of Technology Lausanne; Geneva School of Business Administration; Institute of Oncology Research.]


The Foundation Council is the highest authority of the Institute, with supervisory power. Its responsibilities include changes to the SIB statutes, the nomination of Group Leaders, and the approval of the annual budget and financial report. The SIB partner institutions are represented in this Council.

The Board of Directors (BoD) takes all the decisions necessary to achieve the aims of the Institute: for example, it defines the scientific strategy and internal procedures, and allocates federal funds to service and infrastructure activities. It consists of two Group Leaders elected jointly by the Council of Group Leaders and the BoD; two external members elected by the Foundation Council on the recommendation of the BoD; and the Executive Director. The BoD members are appointed for a renewable five-year period.

The Scientific Advisory Board acts as a consultative body providing recommendations to the BoD and the Council of Group Leaders. Its main tasks consist in monitoring the service and infrastructure activities, as well as the core bioinformatics resources. It is made up of at least five members, who must be internationally renowned scientists from the Institute’s fields of activity.

The Council of Group Leaders discusses all matters relating to the SIB groups as a whole, and proposes the nomination of new Group Leaders. It consists of the Group Leaders, the affiliate Group Leaders and the Executive Director.

Foundation Council

President
Prof. Felix Gutzwiller, Senator

Founding Members
Prof. Ron Appel, Executive Director, SIB and University of Geneva
Prof. Amos Bairoch, Director, Department of Human Protein Sciences, University of Geneva; and SIB Group Leader
Dr Philipp Bucher, Group Leader, SIB and EPFL
Prof. Denis Hochstrasser, Vice-Rector, University of Geneva; Head of Genetic and Laboratory Medicine Department, Geneva University Hospital (HUG)
Prof. C. Victor Jongeneel, Director, Bioinformatics and Biomedical Informatics, University of Illinois at Urbana-Champaign
Prof. Manuel Peitsch, Chairman, SIB Board of Directors, and Vice-President for Biological Systems Research at Philip Morris International (PMI)

Ex officio Members
Prof. Karl Aberer, Vice-President for Information Systems, EPFL
Dr Claire Baribaud, Director, Geneva School of Business Administration
Prof. Henri Bounameaux, Dean, Faculty of Medicine, University of Geneva
Prof. Edwin Constable, Vice-Rector of Research and Talent Promotion, University of Basel
Prof. Carlo Catapano, Director, Institute of Oncology Research (IOR)
Mr Marc Filliettaz, General Manager, GeneBio
Prof. Susan Gasser, Director, Friedrich Miescher Institute for Biomedical Research (FMI)
Prof. Denis Hochstrasser, Vice-Rector, University of Geneva; Head of Genetic and Laboratory Medicine Department, Geneva University Hospital (HUG)
Prof. Piero Martinoli, President, Università della Svizzera italiana
Prof. Andreas Mayer, Vice-Dean, Faculty of Biology and Medicine, University of Lausanne
Prof. Philippe Moreillon, Vice-Rector, University of Lausanne
Prof. Jean-Marc Piveteau, President, Zurich University of Applied Sciences (ZHAW)
Prof. Alexandre Reymond, CIG Director, Faculty of Biology and Medicine, University of Lausanne
Prof. Roland Y. Siegwart, Vice-President Research and Corporate Relations, ETH Zurich
Prof. Pierre Spierer, Professor emeritus, Faculty of Science, University of Geneva
Dr Paul Steffen, Head of the Institute for Sustainability Sciences and Head of Corporate Research, Agroscope
Dr Robert L. Strausberg, Executive Director of Collaborative Sciences, Ludwig Institute for Cancer Research (LICR)
Mrs Turkan Tahtasakal, Hewlett Packard, Geneva
Prof. Marcel Tanner, Director, Swiss Tropical and Public Health Institute (Swiss TPH)
Prof. Martin Täuber, Rector, University of Bern
Prof. Guido Vergauwen, Rector, University of Fribourg
Prof. Daniel Wyler, Vice-Rector, Medicine and Natural Sciences, University of Zurich

Co-opted Members
Prof. Manolo Gouy, CNRS Research Director, Laboratory of Biometry and Evolutionary Biology, Claude Bernard-Lyon 1 University, France

Board of Directors
Prof. Manuel Peitsch (Chairman), Vice-President for Biological Systems Research at Philip Morris International (PMI)
Ms Martine Brunschwig Graf, Former National Councillor
Prof. Ron Appel, SIB Executive Director and University of Geneva
Prof. Christian von Mering, SIB Group Leader and University of Zurich
Prof. Torsten Schwede, SIB Group Leader and University of Basel

Scientific Advisory Board
Prof. Alfonso Valencia (Chairman), Director of the Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre, Madrid, Spain
Prof. Manolo Gouy, CNRS Research Director, Laboratory of Biometry and Evolutionary Biology, Claude Bernard-Lyon 1 University, France
Dr David de Graaf, President and CEO of Selventa, Cambridge, MA, USA
Prof. Alexey I. Nesvizhskii, Department of Pathology and Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, USA
Prof. Christine Orengo, Department of Structural and Molecular Biology, University College London, United Kingdom
Prof. Ron Shamir, Computational Genomics Group at the Blavatnik School of Computer Science, Tel Aviv University, Israel
Prof. Anna Tramontano, Computational Biology Laboratory, La Sapienza University, Rome, Italy

The SIB Group Leaders are located in SIB partner institutions (see pp. 36-37). In addition, OsiriX, led by Prof. Osman Ratib, is an Affiliate Group.

Honorary Members
Prof. Ernest Feytmans, Honorary Director
Ms Christiane Langenberger, Former Senator, Honorary President of the SIB Foundation Council
Dr Johannes R. Randegger, Former National Councillor, Honorary President of the SIB Foundation Council

[Organization chart showing the Foundation Council, the Board of Directors (political, industry and Executive Director members, and Group Leaders), the Scientific Advisory Board, the Council of Group Leaders, the Executive Director and Management Team, the SIB groups, and the SIB-wide activities: clinical bioinformatics; legal and technology transfer office; SIB Technology; training and outreach.]


Examples of SIB databases and software, by category and sub-category

Genes and genomes
• Sequence alignment. Software: CodonSuite, LALIGN, Newick Utilities
• Similarity search. Software: LALIGN, Phylogibbs
• Characterization/annotation. Databases: CLIPZ, EPD, miROrtho, OMA, OpenFlu, OrthoDB, smirnaDB, SwissRegulon. Software: CLIPZ, EPD, ISA, OMA, smirnaDB
• Transcriptomics. Databases: Bgee, CleanEx, CLIPZ, smirnaDB, SwissRegulon. Software: CLIPZ, ISMARA, MirZ, PPA, smirnaDB

Proteins and proteomes
• Protein sequence and identification. Databases: neXtProt, UniProtKB, UniProtKB/Swiss-Prot, ViralZone. Software: LALIGN, PeptideMass, Translate
• Mass spectrometry and 2-DE data. Databases: SWISS-2DPAGE, WORLD-2DPAGE Repository. Software: FindPept, GlycoMod, MSight
• Protein characterization and function. Databases: neXtProt, UniProtKB, UniProtKB/Swiss-Prot. Software: AACompSim, Biochemical Pathways, ProtScale
• Families, patterns and profiles. Databases: MyHits, PROSITE. Software: MyDomains, MyHits, PRATT
• Post-translational modification. Databases: UniCarbKB, SugarBind, UniProtKB/Swiss-Prot. Software: FindMod, GlycanMass, ISMARA, UniCarbKB
• Protein-protein interaction. Databases: STRING, UniProtKB/Swiss-Prot. Software: PredictProtein, ProtBud
• Similarity search/alignment. Databases: MyHits, UniProtKB. Software: BLAST, ClustalW, MyHits
• Imaging. Software: ImageMaster / Melanie, MSight

Medicine and health
• Databases: SwissSidechain. Software: SwissDock, SwissParam, SwissBioisostere, SwissTargetPrediction

Evolution and phylogeny
• Databases: Bgee, ImmunoDB, miROrtho, OMA, OrthoDB. Software: Arlequin, CT-CBN, Newick Utilities, OMA, TriFLe

Structural biology
• Databases: SWISS-MODEL Repository, SwissSideChain. Software: SwissDock, SWISS-MODEL Workspace, Swiss-PdbViewer

Systems biology
• Databases: Progenetix, SwissRegulon. Software: arrayMap, MetaNetX, The Systems Biology Research Tools

Bioinformatics infrastructure
• Software: nfswatch, Soaplab services

SIB is instrumental to good science in Switzerland and worldwide. The Institute provides the necessary bioinformatics services and resources for life scientists.

SIB develops, supplies and maintains more than 150 high-quality databases and software platforms for the global life science research community.

Most SIB resources are openly accessible on ExPASy, the SIB bioinformatics resource portal. The SIB resources cover different areas of life sciences, such as genomics, proteomics and evolution. www.expasy.org

SIB develops and maintains more than 150 internationally recognized databases and software, which life scientists worldwide use extensively.

Through eight bioinformatics core facilities and high-performance computing centres and more than 30 embedded bioinformaticians, SIB provides bioinformatics and statistical support as well as services, expertise and infrastructure to life scientists.

This table shows, for each bioinformatics domain, examples of SIB databases and software that are available on ExPASy.


UniProtKB/Swiss-Prot: Protein sequence database
UniProtKB/Swiss-Prot is the most widely used protein information resource in the world, with over 900,000 requests per month. It provides concise but thorough descriptions of a non-redundant set of over 547,000 proteins, including their function, domain structure, post-translational modifications and variants. Its high-quality annotation is the fruit of expert curation by biologists, who use information available in the scientific literature to provide an accurate description of each protein’s features (see also p. 50).

neXtProt: Human protein knowledge platform
neXtProt is an innovative knowledge platform dedicated to human proteins. neXtProt includes curated information on various aspects of human protein biology such as function, mRNA/protein expression, protein/protein interactions, post-translational modifications and protein variations. Its aim is to help life science researchers in their quest to unravel the complexity of human life processes (see also p. 48).

SWISS-MODEL: Structure homology-modelling
SWISS-MODEL is an automated protein structure homology-modelling server for generating 3D models of a protein. Comparative approaches are currently the most accurate and reliable computational methods for deriving 3D models of proteins for which experimental structures are not available. SWISS-MODEL automates the complex process of model building in an easy-to-use web-based system, thereby making model information available to non-specialists as well (see also p. 65).

STRING: Protein-protein interactions
STRING is a database of known and predicted protein-protein interactions, including direct (physical) and indirect (functional) associations. They are derived from different sources such as the genomic context, high-throughput experiments, (conserved) co-expression, and the literature. STRING quantitatively integrates the interaction data for a large number of organisms, and transfers information between these organisms where applicable. The database currently covers 5,214,234 proteins from 1,133 organisms (see also p. 49).
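To make the idea of quantitatively integrating several evidence sources more concrete, here is a minimal sketch in Python. It combines per-source confidence values with a "noisy-OR" rule, which treats the sources as independent. This rule, the function name `combine_evidence`, the optional `prior` parameter and the example values are all assumptions chosen for illustration; the text above does not specify how STRING actually computes its scores.

```python
from math import prod


def combine_evidence(scores, prior=0.0):
    """Combine per-source confidence scores (each in [0, 1]) with a noisy-OR rule.

    Illustrative assumption only: sources are treated as independent, and an
    optional prior (the chance of an association with no evidence at all) is
    removed from each score before combining and added back once at the end.
    """
    if not 0.0 <= prior < 1.0:
        raise ValueError("prior must be in [0, 1)")
    adjusted = []
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("each score must be in [0, 1]")
        # Remove the prior from each source, never going below zero.
        adjusted.append(max(s - prior, 0.0) / (1.0 - prior))
    # Probability that at least one independent source is correct.
    combined = 1.0 - prod(1.0 - a for a in adjusted)
    # Add the prior back exactly once.
    return combined * (1.0 - prior) + prior


# Hypothetical confidence values for one protein pair, keyed by the kinds of
# source named in the text above.
example = {
    "genomic_context": 0.30,
    "high_throughput_experiments": 0.60,
    "co_expression": 0.20,
    "literature": 0.50,
}
print(round(combine_evidence(example.values()), 3))
```

Under this rule, each additional supporting source can only raise the combined confidence, which is one simple way to reward agreement between independent lines of evidence.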

SwissRegulon: Annotations of regulatory sites
The SwissRegulon web portal provides information and tools for the analysis of genome-wide transcription regulatory networks in organisms ranging from E. coli to human. The database frontend offers an intuitive interface showing genomic information in a clear and comprehensible graphical form (see also p. 44).

SwissDrugDesign: Drug design
The SwissDrugDesign project is an ambitious initiative aiming to provide the first comprehensive, integrated and freely accessible web-based in silico drug design environment to the scientific community worldwide. It offers a large collection of tools covering all aspects of computer-aided drug design, from target prediction of small molecules (SwissTargetPrediction) to the provision of topology and parameters of drug-like molecules (SwissParam). Other tools include SwissDock, SwissSideChain and SwissBioisostere (see also p. 65).

EPD: Collection of eukaryotic promoters
Genes are first “transcribed” into messenger RNAs, which are used as templates to synthesize proteins. Transcription begins at “promoters”. The Eukaryotic Promoter Database (EPD) provides quality-controlled information on experimentally defined promoters of higher organisms as well as web-based tools for promoter analysis (see also p. 42).

Bgee: Gene expression evolution
Bgee is a database of gene expression evolution, which integrates all types of transcriptome and expression information for animals – including human, model organisms such as mouse or Drosophila, and diverse species of evolutionary or agronomical relevance. Bgee is the only resource which provides homologous expression between species (see also p. 60).

UniCarbKB: Glycan database
UniCarbKB is a knowledgebase that offers public access to a curated database of information on glycoproteins, which are proteins to which carbohydrates (or glycans) are attached. UniCarbKB provides comprehensive information on glycan structures, and published glycoprotein information including global and site-specific attachment information (see also p. 49).

SugarBind: Pathogen sugar-binding
Host-pathogen communication is known to be mediated by carbohydrate-protein interactions, which ensure adhesion at the cell surface. The SugarBind database covers knowledge of interactions between pathogens and carbohydrate ligands of mammalian hosts. Information in SugarBindDB is manually curated and supports the investigation of microbial or viral infections (see also p. 49).

OrthoDB: Hierarchical catalogue of orthologs
OrthoDB is a catalogue of “equivalent” genes among species, called orthologs. Resolving gene ancestry is the most accurate way to predict putative gene functions by association with genes studied in model organisms. OrthoDB is hence critical both for evolutionary studies and for interpreting gene content from newly sequenced genomes (see also p. 46).

OMA: Orthology prediction
The Orthology MAtrix (OMA) Browser provides orthology predictions among publicly available genomes. Started in 2004, it has undergone 17 releases and now elucidates orthology among 8.82 million genes from 1,706 species, making it one of the largest resources of its kind (see also p. 58).


Through eight core facilities and high-performance computing (HPC) centres, as well as embedded bioinformaticians, SIB groups provide expert data analysis services and computing power to life scientists in academia and industry, thus enabling them to perform world-class biomedical research.

The services provided include analysis of high-throughput data (genome/exome sequence, RNA sequencing, proteomics), scientific support of (bio)medical projects, development of algorithms, biostatistics training, as well as access to computational space, helpdesk and support.

The SIB Vital-IT group provides computational infrastructure, development support and bioinformatics expertise to the life science community. SIB also co-manages the Center for Scientific Computing (sciCORE) and the Bioinformatics Core Facility (BCF), and collaborates with the Service and Support for Science IT (S3IT) facility, the Scientific IT Services (SIS), the Bioinformatics and Biostatistics Core Facility (BBCF), the Bioinformatics Unravelling Group (BUGFri) and the Interfaculty Bioinformatics Unit (IBU) (see figure below).

In addition to its bioinformatics core facilities and HPC centres, SIB provides assistance to the Swiss universities and institutes by embedding bioinformaticians in laboratories. The presence of these embedded bioinformaticians in the research groups is a real advantage, as they can provide direct guidance on how to manage and analyse data for optimized use of the various bioinformatics tools.

[Figure: SIB core facilities and HPC centres and their host institutions: Vital-IT; sciCORE (University of Basel); BCF (University of Lausanne); S3IT (University of Zurich); SIS (ETH Zurich); IBU (University of Bern); BBCF (EPFL); BUGFri (University of Fribourg).]

Vital-IT
Ioannis Xenarios’ group
Vital-IT is a bioinformatics competence centre that supports and collaborates with life scientists in Switzerland and beyond. The multidisciplinary team provides expertise, training and support in high-performance computing (HPC) and storage resources and infrastructure, so as to help develop, maintain and extend life science and medical research. Additionally, Vital-IT performs research within the group and in collaboration with groups worldwide and serves as an interface between academia and industry. See also p. 75 and www.vital-it.ch.

Services
• Providing an HPC environment to support the research work of its partners in areas ranging from sequence analysis and molecular modelling to image processing
• Developing new computer algorithms to meet the users’ needs and remain up-to-date with the new generation of computers
• Maintaining and developing specialist software engineering techniques for parallelization, optimization and validation of complex algorithms
• Development activities to turn concepts derived from research into robust software solutions
• Consulting and educational activities geared towards the computational needs of life science companies
• Acting as an agent for new collaborations with industry, with potential for spinning off new companies in the field of life science informatics

Vital-IT in figures
• More than 10 million jobs run by users every year
• More than 5 million CPU hours consumed every year
• Author of, or acknowledged in, more than 90 publications in 2014, including 28 in Science or Nature over the last 5 years
• 760 users in 2014 versus 671 in 2013
• 5 petabytes of storage capacity
• +5,000 CPUs in 2014

[Figure: Vital-IT activities, linking education, research groups, infrastructure and expertise. Labels: databases and algorithms; integration of “omics” profiles; computational infrastructure and algorithm developments; training and advanced analysis techniques; interface for data mining; high-throughput experiments and literature biocuration (genome, transcriptome, proteome, metabolome, phenotypes; Swiss-Prot); experimental validation (gene function, biomarkers, mechanism of action, diagnostics tests); systems-level modelling and simulation (qualitative/quantitative simulations); cellular and multiscale network reconstruction.]


BCF: Bioinformatics Core Facility
Mauro Delorenzi’s group, University of Lausanne
The core competence and activities of BCF reside in the interface between biomedical sciences, statistics and computation, particularly in the application of high-throughput “omics” technologies, such as all sequence- and array-based methods, to problems of clinical importance including development of cancer biomarkers. BCF offers consulting, teaching and training, data analysis support and research collaborations for both academic and industrial partners. In particular, BCF provides a consulting service on biostatistics matters, with a focus on high-throughput technologies in genomics or proteomics. This service is aimed at all people active in life sciences in Switzerland. See also p. 53 and http://bcf.isb-sib.ch.

Services
• Statistical consulting
• Education and practical training
• Data analysis
• Personnel embedding
• Scientific collaboration

S3IT: Service and Support for Science IT
Peter Kunszt’s group, University of Zurich
S3IT proactively participates and partners in collaborative research projects, with a clear focus on supporting and enabling IT technologies for science. S3IT assists in resolving current and future challenges around data collection, data validation and analysis, as well as data publication and reproducibility. See also p. 74 and www.s3it.uzh.ch/index.en.html.

Services
• Partnership with research groups and projects in computational and data-intensive research
• Access to competitive infrastructure: HPC supercomputing resources, capacity cloud computing resources as well as optimized server infrastructures for research with access to standardized tools and software
• Data management, automation and data analysis capabilities
• Consultancy and development of software and methods for usage in scientific projects
• Re-engineering of existing software and tools, for contribution to community resources
• Training and education in the usage of the infrastructure and software

SIS
Scientific IT Services
Bernd Rinn’s group, ETH Zurich

SIS is an interdisciplinary service group enabling and supporting computing- and data-intensive research. To this end, SIS collaborates with researchers at ETH Zurich and beyond on solving scientific computing problems, and is involved in national and international scientific IT efforts, such as SystemsX SyBIT (http://sybit.net), FAIR-DOM (http://fair-dom.org), HPC-CH (http://www.hpc-ch.org) and Swissuniversities eSCT. See also p. 74 and http://sis.id.ethz.ch.

Services
• Big data and HPC infrastructure, and related support
• Flexible and robust data management and integration solutions for life science data
• Open research data portal and data publication services
• Custom data analysis pipelines
• Scientific software engineering services
• Consulting and training on resolution of scientific computing and software problems

sciCORE
Center for Scientific Computing
Torsten Schwede’s group, University of Basel

sciCORE is a centre of competence in scientific computing – providing high-performance computing (HPC) infrastructure, large-scale storage resources, scientific software and databases, server infrastructures, user support, training and teaching, as well as know-how and expertise to scientific research groups. sciCORE provides a professional environment for a broad spectrum of scientific applications, ranging from bioinformatics, computational chemistry, physics, systems biology, to medicine and economics. In direct collaboration with scientific research groups, sciCORE helps develop, deploy, operate and extend the computational tools required for performing modern life science and biomedical research. sciCORE operates the IT infrastructure for several of SIB’s services, such as SWISS-MODEL and SwissRegulon, which are widely used by the international research community. See also p. 65 and http://scicore.unibas.ch.

Services
• Operation of a professional storage infrastructure for scientific data, which is required for managing the vast amounts of data produced in modern life sciences
• Development and operation of a state-of-the-art HPC infrastructure environment to support large-scale data analysis, modelling and simulation applications
• Support to research groups and projects on complex data management questions, from requirement analysis to development, deployment and operation of productive software solutions
• Consulting for researchers on bioinformatics and computational biology aspects of their projects
• Support to users in computational aspects of project definition for applications to national and international grant programs
• Organization of training and teaching courses in scientific HPC, bioinformatics, biostatistics and data visualization
• Operation of computational scientific applications as professional and robust services for the life science community

[Infographic: HPC clusters · > 1,500 TB disk storage · > 6,000 CPU cores · Up-to-date software · Training & support]


SIB is developing at a rapid pace and its role on the bioinformatics scene is expanding. Efficient coordination of activities across SIB groups and with external stakeholders is more crucial than ever.

SIB has created four groups dedicated to the following activities:

• Clinical bioinformatics group, which aims at developing bioinformatics for personalized health, to facilitate the analysis and the use of “big data” by clinicians, in order to support diagnosis as well as preventive and therapeutic approaches

• Legal and technology transfer office, with the goal of enabling the scientific community to benefit from SIB’s many innovations in the field of bioinformatics

• Technology group, to optimize the coordination of all technology-related activities within SIB and across SIB groups

• Training and outreach group, to respond to the ever-growing need for qualified bioinformaticians, as well as to provide professional training in bioinformatics to the life science community

BBCF
Bioinformatics and Biostatistics Core Facility
Jacques Rougemont’s group, EPFL

BBCF supports research in life sciences by providing data analysis for large-scale and complex-design genomics experiments, by developing original software and methods, and by providing state-of-the-art bioinformatics applications and genomic data through web portals. The facility currently focuses on complex high-density DNA arrays and high-throughput sequencing. See also p. 75 and http://bbcf.epfl.ch.

Services
• Data analysis and statistical consulting
• Software and database development
• Design and implementation of data management solutions
• Bioinformatics training for life science researchers

BUGFri
Bioinformatics Unravelling Group
Laurent Falquet’s group, University of Fribourg

BUGFri supports life science researchers by providing expertise in data analysis of next-generation sequencing (NGS) experiments, or any large-scale biological experiment requiring bioinformatics resources. See also p. 43 and www.unifr.ch/bugfri.

Services
• Project planning and grant writing
• Software testing and development
• Data management and analysis
• Training and teaching

IBU
Interfaculty Bioinformatics Unit
Rémy Bruggmann’s group, University of Bern

IBU provides services and expertise to assist researchers in data analysis and project planning in the context of NGS. See also p. 41 and www.bioinformatics.unibe.ch.

Services
• Discussions with researchers on their projects prior to starting with IBU’s bioinformatics experts and – if required – experts from the wet lab (e.g. NGS lab)
• Provision of data analyses including short read mapping, de novo assembly, mutant screening, transcriptomics, epigenomics, whole exome and genome analysis
• Collaboration in large and complex projects


The medical world is about to undergo a revolution called personalized health. It all started with the development of technologies that produce gigantic quantities of data on almost all parts of a human being and tools to process them so as to sketch a very personal and specific portrait of an individual’s state of health. These developments are expected to greatly assist clinicians in the tailoring of preventive measures and treatments for their patients. Bringing the fruits of biotechnology into medical practice is one of the roles of clinical bioinformatics.

Clinical bioinformatics, SIB’s choice

As personalized health is gaining ground, a very particular field of bioinformatics, known as clinical bioinformatics, is emerging – it is bioinformatics dedicated to the storage, organization and analysis of data pertaining to an individual’s state of health, which can be used and understood by clinicians.

SIB saw early on the role it could play within the realm of personalized health, and the subsequent support it could offer to clinicians. Consequently, back in 2012 the Institute created a group dedicated to clinical bioinformatics, under the leadership of Jacques Beckmann. The initial aim was to identify areas where the analyses of “big data” are most likely to enter the clinical arena, and then prioritize these domains and motivate stakeholders to contribute to them.

SIB has the know-how to develop standardized tools for diagnostic and clinical applications. There is already, for instance, a growing demand by clinicians for the interpretation of a patient’s genome in the diagnosis of inherited disorders or the sequencing of tumour cells so as to offer customized treatment. It is also paramount to adopt common, cost-effective and inter-operable procedures that are both flexible and compatible with procedures used throughout the various Swiss diagnostics and medical centres.

The rigour and quality of SIB’s services, software and databases offer major advantages since they can be grafted onto existing procedures and provide enhanced power of interpretation. What is more, the Institute ensures the continuous maintenance of state-of-the-art performances and options for new developments.

Such a revolution also entails ethical, security and educational issues. Clinicians will have to learn how to deal with databases, how to interpret the data and where the boundaries of privacy lie.

Clinical bioinformatics, crossing boundaries

Hospitals, diagnostic centres and medical practitioners need to cooperate – both on the regional and national scale – to build a pool of data that can be shared, so that comparisons of all sorts can be made across the population.

At the regional level, a section of SIB moved into Geneva Campus Biotech – a new centre of excellence in life science research – which brings together other groups from academic institutions involved in personalized health, such as the Universities of Geneva, Lausanne and the EPFL. On the national level, SIB took an active part in elaborating a proposal for a federal personalized health programme.

On the international front, in 2014 SIB became one of the members of the Global Alliance for Genomics and Health, itself founded in 2013 by 50 scientists from eight different countries. Today, the Global Alliance is represented by over 220 leading institutions involved in healthcare, research, disease advocacy, life sciences and information technology with a single aim: to create a global framework of approaches that will enable the responsible, voluntary and secure sharing of genomic and clinical data.

Jacques Beckmann

Personalized health: a definition

The rapid development of technologies in the past decades and the huge volumes of data they produce on an individual level will give rise to a very different type of medical practice, also usually referred to as precision medicine, personalized medicine or personalized health.

Medical practitioners can already have access to data, which are unique to their patients – such as their genome sequence for instance – not to mention data supplied by various smartphone applications brought along by the patients themselves. Coupled with the individual’s lifestyle and eating habits, the information made available to a clinician on a patient’s state of health will be not only more and more personal but also unique.

As a result, in the not so distant future, such knowledge will assist clinicians in a patient’s diagnosis and in deciding on tailored treatment, besides tracking down the propensity for a disease before its actual onset, and delaying, if not preventing, diseases.

Several countries, such as the UK, Korea, China and the USA, have already launched extensive programs to speed up the process.

Bioinformatics and the medical field – the future of medicine

Bioinformatics is, or will be, at the very heart of this medical revolution. For several years now, SIB has been involved in research whose applications are directly linked to the medical field.

SIB’s high-performance computing centre Vital-IT, for example, has developed the algorithm for a non-invasive prenatal test which can detect the most frequent trisomies and chromosomal rearrangements from a pregnant woman’s blood sample, and it has been commercialized. Another SIB group developed a model which predicts the evolution of an aneurysm, and is intended to support clinicians in deciding upon the best treatment to offer a patient. Yet another group offered their support during last year’s outbreak of the Ebola virus in West Africa by estimating the infection’s dynamics – information of crucial importance for controlling an epidemic.

These are just three of the many projects developed by SIB in the field of human health – not to mention those that are involved in the diagnosis and treatments of different kinds of cancer.



With expertise that covers a broad spectrum of application fields, SIB occupies a pivotal hub position in bioinformatics innovation in Switzerland. The SIB Legal and technology transfer office (LTTO)’s mission is to showcase, manage and transfer SIB knowledge and know-how so that the scientific community can benefit from SIB’s many innovations.

The SIB Technology group, headed by the Chief Technology Officer (CTO) Heinz Stockinger, is in charge of optimizing the coordination of all SIB technology-related activities. The group works in close cooperation with technology and infrastructure providers and competence centres such as Vital-IT/Swiss-Prot and sciCORE to combine forces wherever necessary and practical.

The LTTO, headed by Marc Filliettaz, strives to enhance the scientific and industrial visibility of SIB innovation by assisting SIB members in their contacts with external partners. The LTTO works closely with SIB’s Group Leaders in order to offer innovative products and services.

Partnerships with industry

Companies involved in medicine and life sciences can collaborate with SIB to complement their internal capacity. Outsourcing certain activities and relying on a non-industrial, world-renowned network of scientists and bioinformaticians provides companies with additional flexibility and enhances their research.

SIB collaborates with industrial partners in the following domains:
• Scientific support and data analysis: SIB has developed in-house computational tools and acquired in-depth expertise in a broad range of fields and techniques, including cancer subtype discovery, biomarker selection, class discrimination, cross-platform analysis, and meta-analysis of multiple publicly available clinical and genomics datasets.
• Education and practical training: SIB provides scientists with educational support and practical training in the use of software and analysis methods. This includes the organization of seminars, workshops and training courses.
• Text mining/web monitoring: SIB develops text mining tools, e.g. enabling the creation of a patient cohort based on patient records in text format or the monitoring of social media platforms in the context of drug safety surveillance.

The SIB Technology group’s core competencies are:
• Design, development, testing and operation of scientific, technical and administrative software in bioinformatics, in cooperation with various SIB groups, with a strong focus on web and internet technologies – mainly with the SIB Web Team.
• Technical coordination of topics that require an SIB-wide approach: web application deployment, security and related guidelines, code repositories, etc.
• Support and operation of SIB-wide services developed and/or deployed by the group, such as:
  - ExPASy SIB Bioinformatics Resource Portal and some scientific resources available on the portal
  - Request tracking operations for user support
  - Web applications for SIB course registration and administration
  - Web applications for SIB personnel management/administration
• Coordination of SIB technical activities within the ELIXIR project and operation (more information about ELIXIR on p. 30).

Marc Filliettaz

Heinz Stockinger


It is essential to train the next generation of bioinformaticians and to ensure that life scientists make the best of bioinformatics and SIB resources. Under the leadership of Patricia Palagi, the Training and outreach team is in charge of promoting bioinformatics to the layman, as well as coordinating education and training in the field, both in Switzerland and internationally.

Training in 2014

SIB’s training strategy in 2014 followed the same trend as in past years, focusing on professional training, the PhD training network and international collaborations.

Professional training

The SIB professional training portfolio, which is continuously evolving to meet the scientific community’s needs, offers courses on several topics, such as bioinformatics techniques, computational biology methods, statistics and next-generation sequencing analysis. To respond to an increasing demand for basic training, a new “First Steps with” course series was started in 2014, with a one-day course on the UNIX computer operating system. Similar courses on R, the statistical package, and Python are in development. You can find more information on SIB courses on its training portal: www.isb-sib.ch/training.

PhD training network

The PhD training network specifically targets Swiss bioinformatics and computational biology PhD students. It fosters interactions among students and trains them on selected hot topics in bioinformatics. Special events of 2014 included the annual retreat and the summer school on “Systems Medicine and its applications” jointly organized with the Swiss Initiative in Systems Biology (SystemsX). For more information on the PhD training network, visit: www.isb-sib.ch/training/sib-phd-training-network.html.

International collaborations

SIB organized the Workshop in Education for Bioinformatics during the Intelligent Systems for Molecular Biology (ISMB) 2014 conference in Boston, in collaboration with the Global Organization for Bioinformatics Learning, Education & Training (GOBLET), of which SIB is a member. SIB has developed strong connections with the European bioinformatics training community as well, especially within ELIXIR, and various projects are underway.

SIB Fellowship programme

Through its Fellowship programme launched in 2012, SIB aims to create a pool of excellent young bioinformaticians. Thanks to the generous support of committed partners, the best students chosen worldwide have the opportunity to carry out their PhD research in one of the Institute’s groups. In the fall of 2014, a new call was launched. The selection committee, made up of internationally renowned professors and SIB Group Leaders, will choose the laureates in the first quarter of 2015.

Outreach in 2014

Another SIB mission is to bring bioinformatics to the layman, contributing to a better understanding of this relatively new science.

At the national level:
SIB took part in various community events, such as:
• Popular science fairs, including “La Nuit de la Science” in Geneva, “Les Mystères de l’UNIL” in Lausanne, and the first “Swiss Health Fair” at EPFL, to promote public awareness and understanding of bioinformatics;
• Workshops for high-school students in the French-speaking part of Switzerland, some within the framework of a special collaboration with the Bioscope and the “Ramène Ta Science” initiative (both of which are University of Geneva projects), and during the TechDays in Lausanne, Köniz and Locarno;
• Public scientific exhibitions, such as the LAB/LIFE exhibition of the Musée de la Main (Museum of the Hand) in Lausanne;
• “Music for a Gene”: In a project directed by Lydie Lane (co-director of the SIB group CALIPHO), the French composer Olivier Calmel was asked to write a string quartet that could transpose the complexity of the human genome into musical emotions. The concert Opus 23 – Music for a Gene was performed by the Ramses Quartet in Geneva and attracted significant interest from the Swiss media.

At the international level:
• “Chromosome Walk” exhibition: In 2014, this exhibition went beyond the Swiss borders. It was shown in the “Village des Sciences de GENOPOLYS” in Montpellier, France. The virtual version is available on www.chromosomewalk.ch.
• Workshops for high-school teachers: SIB developed two bioinformatics workshops for high-school teachers organized by GOBLET, one in Toronto and the other in Boston.

Patricia Palagi

SIB professional training in 2014 in figures:
• More than 650 participants trained
• More than 25 invited speakers
• 23 short workshops organized
• More than 50 SIB members involved as experts and teachers
• 11 long block-courses coordinated


The demand for bioinformatics support is exploding, and it is more necessary than ever to coordinate all efforts and improve technology development in this field. Europe’s awareness of the need for an international bioinformatics infrastructure is embodied by ELIXIR, the European Life Science Infrastructure for Biological information that was officially launched in Brussels in December 2013.

Vision

The amount of data produced by life science experiments is rapidly expanding. Some of these datasets are highly specialized and would previously only have been available to researchers within the country in which they were generated. Open access to the biological datasets will facilitate discoveries and support life science research and its translation to medicine, agriculture, bioindustry and society.

Mission

The goal of ELIXIR is to build a responsive and sustainable European infrastructure to orchestrate the collection, quality control and archiving of large biological datasets. ELIXIR integrates research data from different parts of Europe and ensures a service provision that is easily accessible to all.

SIB represents Swiss bioinformatics at ELIXIR

To date, 11 countries, including Switzerland, and the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) are members of the ELIXIR consortium. Switzerland was one of the first countries to sign the ELIXIR Consortium Agreement and has been on board since the very first proposals concerning this impressive undertaking. SIB acts as the Swiss node of ELIXIR and is the biggest national node within the organization. As a provider of various renowned resources to the international life science community, SIB plays an important role in the ELIXIR project.

For more information about ELIXIR, please visit www.elixir-europe.org.

Highlights ELIXIR 2014
• ELIXIR was identified by the European Council as one of Europe’s three priority research infrastructures that are pushing the boundaries of scientific excellence. As a result, ELIXIR was invited to respond to the Horizon 2020 call INFRADEV-3, which will fund development of new world-class research infrastructures. The project that has been submitted is called ELIXIR-EXCELERATE and SIB is closely involved.
• The scientific programme for 2014-2018, prepared in collaboration with more than 100 scientists from all of ELIXIR’s national nodes, was published.
• Torsten Schwede, leader of the SIB Computational Structural Biology Group and member of the SIB Board of Directors, was appointed Chair of the ELIXIR Board, starting from 1 January 2015.

ELIXIR provides the facilities necessary for life science researchers to make the most of the rapidly growing store of information about living systems, which is the foundation on which the understanding of life is built.


SIB and its director honoured with the BioAlps Award 2014

Ron Appel received the BioAlps Award 2014, recognizing his major contribution to the creation and successful development of SIB. The purpose of the BioAlps Award is to honour a person without whom western Switzerland would not enjoy its extraordinary international reputation in the field of life sciences.

Benoît Dubuis, President of BioAlps, presented the award to Ron Appel and SIB at the traditional BioAlps Networking Day, stating that “as co-founder and Executive Director of SIB since 2007, Ron Appel has been a key figure in enabling Swiss bioinformatics to rank among the world leaders in this discipline. He is leading one of the rare life science organizations, which is able to bring together so many entities with so much elegance”.

Cloëtta Prize 2014 awarded to Henrik Kaessmann

The Professor Max Cloëtta Foundation, based in Zurich, promotes medical research and related disciplines in the natural sciences in Switzerland. In 2014, two researchers were awarded the annual Cloëtta Prize, one of whom was Henrik Kaessmann, leader of the SIB group “Functional Evolutionary Genomics”. Henrik Kaessmann was recognized in the fundamental research area for his discoveries in the field of molecular genetics, in particular the nature of diversification in higher mammals.

Winners of SIB Awards 2014

Joshua Payne received the SIB Young Bioinformatician Award for his promising work on “The robustness and evolvability of transcription factor binding sites”, which he conducted at the University of Zurich, in the SIB group Evolutionary Systems Biology led by Andreas Wagner.

The SIB Best Graduate Paper Award went to Josephine Daub. She was honoured for her work on “Evidence for polygenic adaptation to pathogens in the human genome”, carried out at the University of Bern, in the SIB group Computational Population Genetics led by Laurent Excoffier.

Darwin’s theory of natural selection explains how useful adaptations are preserved over time. But the biggest mystery about evolution eluded him. In his book, SIB Group Leader Andreas Wagner draws on over 15 years of research to present the missing piece in Darwin’s theory. Using experimental and computational technologies that were heretofore unimagined, he found that adaptations are not just driven by chance, but by a set of laws that allow nature to discover new molecules and mechanisms in a fraction of the time that random variation would take.

The book was selected among the best popular science books of 2014 by The Guardian. More information on http://arrival-of-the-fittest.com

Christine Durinx joins SIB as Associate Director

To support Ron Appel in his role as Executive Director, SIB reinforced its leadership by appointing Christine Durinx as Associate Director. Christine Durinx holds a PhD in pharmacy and has ten years’ experience in the pharmaceutical industry. Before joining SIB, she was Director of International Medical Communication for a major firm.

Three SIB Group Leaders elected members of EMBO

The world-renowned European Molecular Biology Organization (EMBO) announced that 106 outstanding researchers in life sciences, including three SIB Group Leaders – Emmanouil Dermitzakis, Henrik Kaessmann and Andreas Wagner – were newly elected to its Membership. EMBO Members make invaluable contributions to the organization by providing suggestions and feedback on the activities of EMBO. They serve on selection committees for EMBO programmes and mentor young scientists. The EMBO Membership currently comprises more than 1600 life scientists.

Andreas Wagner’s book selected among the best popular science books of 2014 by The Guardian

I am proud to receive this award which honours SIB’s expertise and the contribution of Swiss bioinformaticians to the progress of life sciences.
Ron Appel, winner of BioAlps Award 2014


At a challenging time for research funding worldwide, SIB funds remained stable in 2014, thanks to the continued support of its funders.

SIB continues to grow in terms of the number of staff members.

In 2014, the total income of SIB groups reached CHF 67.9 million, of which CHF 25.1 million was managed by SIB.

The largest source of SIB funds is the Swiss government (CHF 11.0 million, 43.8%), followed by the Swiss National Science Foundation/European funds and industry (CHF 3.4 million, 13.5% each).

As of 31 December 2014, SIB staff was composed of 670 SIB members, of whom 202 had a contract with SIB, which represented 178 full-time equivalents (FTEs).

[Charts, 1998–2014: SIB employees, FTEs and total SIB members; total income and total charges (CHF million); number of FTEs, with series labelled Administration, Basel, Lausanne, Geneva and Zurich.]

Funds by source (CHF million)

Source                          Managed by SIB    %        Managed by the partner institutions    %        Total number of grants/contracts
Swiss government (article 16)   11.0              43.8%    -                                      -        -
SNSF and European funds         3.4               13.5%    9.7                                    22.7%    117
Industry                        3.4               13.5%    1.1                                    2.6%     23
NIH                             2.2               8.8%     0.1                                    0.2%     3
Universities and hospitals      2.1               8.4%     26.2                                   61.2%    59
SystemsX.ch                     1.1               4.4%     4.8                                    11.2%    53
Other                           1.9               7.6%     0.9                                    2.1%     17

Expenditures

Wages and expenses 86.1%

Facilities and functioning 13.9%

[Chart: funds managed by SIB (CHF million), by source; totals: CHF 25.1 million managed by SIB, CHF 42.8 million managed by the partner institutions.]


As of today, SIB counts 56 research and service groups and over 650 scientists from Swiss universities and research institutes located in the cantons of Basel, Bern, Geneva, Fribourg, Ticino, Vaud and Zurich.

One of SIB’s missions is to lead and coordinate the field of bioinformatics in Switzerland by fostering collaborations. Over time, an intense network of collaborative links has been established between the 56 SIB groups (see collaboration network on the facing page).

SIB also collaborates at the international level with renowned institutions:
• in Europe: e.g. the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI, UK), the Bioinformatics Services to Swedish Life Science (BILS), the Spanish National Bioinformatics Institute (INB), the Netherlands Bioinformatics Centre (NBIC)
• in the US: e.g. the National Institutes of Health (NIH), the National Center for Biotechnology Information (NCBI), the Protein Information Resource (PIR)
• elsewhere: e.g. SOKA University (Japan), Macquarie University (Australia), the University of Cape Town (South Africa), the Weizmann Institute of Science (Israel)

The SIB collaboration network was generated from a programme developed by Michael Baudis, SIB Group Leader: http://progenetix.org/collabplots/

Agroscope, Wädenswil

D-BSSE, ETH Zurich, Basel

Friedrich Miescher Institute, Basel

Geneva School of Business Administration

Institute of Oncology Research, Bellinzona

Swiss Federal Institute of Technology Lausanne

Swiss Federal Institute of Technology Zurich

Swiss Tropical and Public Health Institute, Basel

University of Basel

University of Bern

University of Fribourg

University of Geneva

University of Lausanne

University of Zurich

Università della Svizzera italiana, Lugano

Zurich University of Applied Sciences, Wädenswil

New SIB groups

From five groups at its inception, SIB has been growing exponentially ever since. Since 1 January 2014, 11 new groups have joined the Institute:

• Maria Anisimova, Zurich University of Applied Sciences, Wädenswil
• Christian Ahrens, Agroscope, Wädenswil
• Manfred Claassen, ETH Zurich
• David Gfeller, University of Lausanne
• Rudiyanto Gunawan, ETH Zurich
• Peter Kunszt, University of Zurich
• Ivo Kwee, Institute of Oncology Research, Bellinzona
• Michel Milinkovitch, University of Geneva
• Christoph Schmid, Swiss Tropical and Public Health Institute, Basel
• Tanja Stadler, D-BSSE, ETH Zurich, Basel
• Andrzej Stasiak, University of Lausanne


SIB groups by main research area

Genes and genomes (pp. 40-46): Sven Bergmann, Rémy Bruggmann, Philipp Bucher, Bart Deplancke, Emmanouil Dermitzakis, Laurent Falquet, Zoltán Kutalik, Erik van Nimwegen, Mark D. Robinson, Michael Stadler, Andrzej Stasiak, Evgeny Zdobnov

Proteins and proteomes (pp. 47-50): Christian Ahrens, Amos Bairoch & Lydie Lane, Frédérique Lisacek, Christian von Mering, Ioannis Xenarios & Lydie Bougueleret

Medicine and health (pp. 51-55): Michael Baudis, Niko Beerenwinkel, Mauro Delorenzi, Jacques Fellay, David Gfeller, Ivo Kwee, Patrick Ruch, Christoph Schmid

Evolution and phylogeny (pp. 56-62): Maria Anisimova, Laurent Excoffier, Gaston Gonnet, Jérôme Goudet, Jeffrey D. Jensen, Henrik Kaessmann, Bernard Moret, Marc Robinson-Rechavi, Nicolas Salamin, Tanja Stadler, Andreas Wagner, Daniel Wegmann

Structural biology (pp. 63-65): Simon Bernèche, Matteo Dal Peraro, Olivier Michielin & Vincent Zoete, Torsten Schwede

Systems biology (pp. 66-72): Bastien Chopard, Manfred Claassen, Rudiyanto Gunawan, Vassily Hatzimanikatis, Dagmar Iber, Christian Mazza, Michel Milinkovitch, Félix Naef, Igor V. Pivkin, Jörg Stelling, Mihaela Zavolan

Bioinformatics infrastructure (pp. 73-75): Peter Kunszt, Bernd Rinn, Jacques Rougemont, Ioannis Xenarios

Bioinformatics is the application of computer technology to the understanding and effective use of biological data. It is thus an interdisciplinary field, targeting different areas of medicine and life sciences. The vast majority of SIB groups are therefore involved in numerous domains.

The table on the facing page presents SIB’s main fields of activity. The SIB groups are classified according to their main research area (indicated by a full black square) and then in alphabetical order. The other domains, in which the groups are working, are indicated by empty squares.


Genome is the word used by life scientists to describe the sum of genetic material, including genes, inherited by a living being. A genome is like an open book on the processes of life, if you know how to read it.

Bioinformatics develops tools not only to read the genetic information, but also to store the resulting data, analyse and interpret them. Aberrations in genetic material can be at the heart of diseases such as cancer or Down syndrome.

What do they do?

Sven Bergmann and his group develop and apply algorithms for the analysis of large sets of biomedical data. The focus is on the integration of information on DNA variability (“genotypes”) and on molecular and organismal traits (“phenotypes”).

2014 highlights

During 2014, the group made very good progress in developing LAS-VEGAS, a powerful tool for computing gene and pathway scores from single nucleotide polymorphism (SNP)-phenotype association summary statistics. For gene score calculation, they use analytical and numerical solutions to derive various aggregate test statistics, such as the sum and the maximum of chi-square statistics, at high speed and precision. For pathway scoring, their method uses a modified Fisher method, which not only offers a significant power improvement over more traditional enrichment strategies but also eliminates the problem of arbitrary threshold selection inherent in any binary membership-based pathway enrichment approach. The group demonstrates the increase in power by analysing summary statistics from a large number of different meta-analyses.
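As a rough illustration of the two ingredients mentioned above, the following Python sketch computes the sum and the maximum of per-SNP chi-square statistics for one gene, and combines p-values with the classical (unmodified) Fisher method. It is not the group's implementation: all function names are hypothetical, the inputs are treated as independent, and neither the modified Fisher method nor the handling of correlated SNPs is reproduced.

```python
import math


def gene_aggregate_statistics(snp_chi2):
    """Return the sum and the maximum of per-SNP chi-square statistics."""
    if not snp_chi2:
        raise ValueError("need at least one SNP statistic")
    return sum(snp_chi2), max(snp_chi2)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Classical Fisher combination of independent p-values."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Under the null, -2 * sum(log p) follows a chi-square with 2k df.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))
```

Because every p-value contributes to the combined statistic, no significance threshold has to be chosen to decide which genes "belong" to the enriched set, which is the property the text contrasts with binary enrichment approaches.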

Sven Bergmann and his team also developed MetaboMatching, a simple method that uses nuclear magnetic resonance data from a cohort to link genetic variants to chemical compounds – if their concentration has a genetic component.

Selected publications

Rueedi R, et al. Genome-wide association study of metabolic traits reveals novel gene-metabolite-disease links. PLoS Genet 2014;10:e1004132.

Hohm T, et al. Plasma membrane H+-ATPase regulation is required for auxin gradient formation preceding phototropic growth. Mol Syst Biol 2014;10:751.

Hersch M, et al. Light intensity modulates the regulatory network of the shade avoidance response in Arabidopsis. Proc Natl Acad Sci USA 2014;111:6515-20.

What do they do?

IBU provides bioinformatics services, expertise and computational resources to assist researchers in “big data” analyses and project planning for large-scale experiments (see also p. 22). Rémy Bruggmann’s group also has its own research programme and develops methods to analyse high-throughput data.

2014 highlights

Resource: SetRank

Vital-IT and Rémy Bruggmann’s team implemented a web interface to provide access to the gene set enrichment analysis tool SetRank. The SetRank algorithm was developed at IBU and is useful for prioritizing the results of expression studies. Its key principle is to discard gene sets that were initially considered significant when their significance was based solely on genes that are also involved in other processes.

The web implementation allows the visualization of pathways in a network, as well as browsing and further prioritizing the results. SetRank is in principle not limited to model organisms but can be applied to any organism for which pathway annotation data exist.

Research: novel astrovirus

One of the research highlights in 2014 was the discovery of a novel virus probably associated with encephalitis in cattle. Viral encephalitis is diagnosed in approximately 10-15% of cattle in Europe with neurological clinical signs. A high proportion of viral encephalitis in cattle is of unknown etiology. Attempts to culture and to identify the infectious agents associated with these conditions have yielded largely inconclusive results. To overcome these limitations, Rémy Bruggmann’s group, together with Torsten Seuberlich’s team, successfully performed a viral metagenome study using next-generation sequencing and identified a novel astrovirus (BoAstV-CH13) in the brain of a cow with encephalitis. In a retrospective study, Seuberlich’s group found this virus in five out of 22 animals with encephalitis of unknown etiology.

Selected publications

Bouzalas IG, et al. Neurotropic astrovirus in European cattle with non-suppurative encephalitis. J Clin Microbiol 2014;52:3318-24.

Hilty M, et al. Global phylogenomic analysis of nonencapsulated Streptococcus pneumoniae reveals a deep-branching classic lineage that is distinct from multiple sporadic lineages. Genome Biol Evol 2014;6:3281-94.

Greminger MP, et al. Generation of SNP datasets for orangutan population genomics using improved reduced-representation sequencing and direct comparisons of SNP calling algorithms. BMC Genomics 2014;15:16.

Sven Bergmann
Computational Biology Group
University of Lausanne

Rémy Bruggmann
Interfaculty Bioinformatics Unit – IBU
University of Bern


What do they do?

Philipp Bucher and his team are set on cracking the regulatory code of the human genome by analysing DNA sequences and experimental data which reveal the function of these sequences. The current focus is on developing methods for exploiting so-called next-generation sequencing (NGS) data to this end.

2014 highlights

The group develops EPD, an annotated database of experimentally defined eukaryotic promoters (see also p. 17). Thanks to large amounts of newly available transcript mapping data, the team was able to more than double the promoter coverage for mouse – reaching 21,239 promoters from a previous 9,773. With this new increase, EPD has reached over 90% gene coverage for the two most important organisms: human and mouse. The team also introduced a comprehensive promoter collection for a new model organism, the worm C. elegans.

EPD is generated from data stored in Mass Genome Annotation (MGA), a publicly accessible directory that contains various kinds of NGS datasets. Reflecting the rapid accumulation of NGS data in public repositories, the contents of the MGA repository almost doubled during 2014 to exceed 8,000 samples. Among the newly added samples are ChIP-seq data from the Roadmap Epigenomics Project.

The web servers for programmatic access to eukaryotic promoters have been greatly improved and made more user-friendly. Relevant programs from the ChIP-Seq analysis and Signal Search Analysis servers can now be accessed via direct links from the EPD homepage. A new tool named PWMscan has been added to the EPD promoter analysis suite, allowing for rapid whole genome scans for promoter elements defined by a position weight matrix or consensus sequence.
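To make the idea of scanning with a position weight matrix concrete, here is a minimal Python sketch. It is a generic illustration and does not reflect how PWMscan itself is implemented: the function name, the matrix layout (one dictionary of per-base scores per motif position) and the toy scores are assumptions made for this example, and only the forward strand is scanned.

```python
def scan_pwm(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold.

    `pwm` is a list with one dict per motif position, mapping each base
    (A, C, G, T) to its score; a window's score is the sum over positions.
    """
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        try:
            score = sum(column[base] for column, base in zip(pwm, window))
        except KeyError:
            # Skip windows containing N or any other non-ACGT symbol.
            continue
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical 4-column matrix that favours the motif "TATA".
toy_pwm = [
    {"A": -2, "C": -2, "G": -2, "T": 2},
    {"A": 2, "C": -2, "G": -2, "T": -2},
    {"A": -2, "C": -2, "G": -2, "T": 2},
    {"A": 2, "C": -2, "G": -2, "T": -2},
]
toy_hits = scan_pwm("GGTATAAC", toy_pwm, threshold=8)
```

A consensus sequence can be treated as the special case in which each column gives a positive score to one base and a prohibitive penalty to the others.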

Philipp Bucher and his group also develop methods for the analysis of ChIP-Seq and related data, as well as methods for automatic classification and diagnosis of molecularly characterized clinical samples.

Selected publications

Nair NU, et al. Study of cell differentiation by phylogenetic analysis using histone modification data. BMC Bioinformatics 2014;15:269.

Polychronopoulos D, et al. Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers. Genomics 2014;104:79-86.

Nair NU, et al. Probabilistic partitioning methods to find significant patterns in ChIP-Seq data. Bioinformatics 2014;30:2406-13.

What do they do?

Bart Deplancke’s laboratory uses high-throughput sequencing, microfluidics, large-scale yeast screens and computational approaches to characterize the regulatory code in Drosophila melanogaster and mammals, as well as to examine how variations in this code affect molecular and organismal diversity.

2014 highlights

Bart Deplancke and his team participated in a consortium-driven effort to comprehensively characterize naturally occurring genetic variation in the D. melanogaster Genetic Reference Panel (DGRP), consisting of 205 sequenced inbred lines. This resource will be of great value for analysing the genetic and molecular basis of quantitative traits with relevance to human biology. In this consortium, the group employed an integrated genotyping strategy to identify almost 5 million single nucleotide polymorphisms (SNPs) and 1.3 million non-SNPs at high resolution. This level of naturally occurring genetic variation is about 10-fold larger than that found in humans, thus constituting a powerful resource to molecularly dissect quantitative trait loci down to the nucleotide level.

Interestingly, the group found that their own local assembly-based variant detection tool PrInSeS performed particularly well when compared to other variant mapping approaches – such as GATK and the ensemble of tools used for the 1000 (human) genomes project – in that it contributed the largest number of both uniquely called variants and validated variants. This probably reflects the fact that PrInSeS was specifically developed for the analysis of Drosophila genomes which, as indicated, are highly polymorphic. This in turn may lower the efficacy of tools that were developed for human genome analyses. The group is now testing the efficacy of PrInSeS in detecting variants in both human exome and whole genome data.

Selected publications

Gubelmann C, et al. Identification of ZEB1 as a central component of the adipogenic gene regulatory network. eLife 2014;3:e03346.

Huang W, et al. Natural variation in genome architecture among 205 Drosophila melanogaster genetic reference panel lines. Genome Res 2014;24:1193-208.

Waszak SM, et al. Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data. Bioinformatics 2014;30:165-71.

What do they do?

The team led by Emmanouil Dermitzakis is interested in the genomics of complex traits, and is using various methodologies to understand the role of genetic variation in phenotypic variation. The group’s focus is on the genetics and genomics of cellular phenotypes, genomic medicine and the development of methods to analyse the genetics of molecular phenotypes.

2014 highlights

In 2014, the group developed a number of algorithms to improve the discovery of regulatory variants in the human genome and link those involved in disease risk.

In addition, methodologies and pipelines developed in previous years were used to address, among other issues, two fundamental problems:

1) The contribution of non-coding regulatory variants in cancer progression. The team combined information from the genome and gene expression levels in tumours of colorectal cancer to implicate changes in gene regulation of 71 genes in cancer progression, and is extending this analysis to additional samples and cancer types.

2) The group quantified the contribution of interaction among genetic variants (GxG), and between genetic variants and the environment (GxE), in the variability of gene expression. This was a stepping stone to understanding the contribution of GxG and GxE in disease.

Dermitzakis’ group participated in a number of other projects, on themes such as the analysis of gene expression quantitative trait loci (eQTLs) in multiple tissues, the genetic and epigenetic contribution to variability in gene regulation, the contribution of regulatory variation to colorectal cancer risk, and the analysis of the pancreatic beta cell transcriptome.

Selected publications

Buil A, et al. Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins. Nat Genet 2015;47:88-91.

Ongen H, et al. Putative cis-regulatory drivers in colorectal cancer. Nature 2014;512:87-90.

Bryois J, et al. Cis and trans effects of human variants on gene expression. PLoS Genet 2014;10:e1004461.

What do they do?

BUGFri supports life science researchers by providing expertise in data analysis of next-generation sequencing experiments, or any large-scale biological experiment that requires bioinformatics resources (see also p. 22).

The group focuses on genome assembly, annotation and comparison as well as on mutant and structural variant identification by resequencing. It also concentrates on RNA sequencing data analysis, proteome clustering and ortholog/paralog classification, and gene set enrichment analysis.

2014 highlights

During the course of 2014, BUGFri expanded its activities and collaborations with lab researchers of the University. A first article was published reporting the discovery of genes involved in the mycorrhization process – a symbiotic collaboration between plants and fungi. This work was carried out completely by in silico mining of existing databases, and applying comparative proteomics among dozens of plant species.

The first edition of the BEFRI Genomics Day was organized in June – an event the group plans to organize every other year, in collaboration with the bioinformatics core facility of Bern.

Selected publications

Favre P, et al. A novel bioinformatics pipeline for gene discovery based on conservation of the protein coding sequence. BMC Plant Biol 2014;14:333.

Cannarozzi G, et al. Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC Genomics 2014;15:581.

Miyazaki R, et al. Comparative genome analysis of Pseudomonas knackmussii B13, the first bacterium known to degrade chloroaromatic compounds. Environ Microbiol 2014;17:91-104.

Laurent Falquet
Bioinformatics Unravelling Group – BUGFri
University of Fribourg

Emmanouil Dermitzakis
Genomics of Complex Traits Group
University of Geneva

Bart Deplancke
Laboratory of Systems Biology and Genetics
EPFL, Lausanne

Philipp Bucher
Computational Cancer Genomics Group
EPFL, Lausanne


What do they do?

Zoltán Kutalik and his team develop statistical methods to gain a greater understanding of the genetic mechanisms which influence complex human traits, such as obesity, or which trigger diseases, such as infectious or autoimmune diseases. The group is particularly interested in learning how the human environment modifies the impact of these genetic factors.

2014 highlights

Genome-wide association studies (GWAS) measure the correlation between genetic and phenotypic variation in large groups of individuals. Almost all previous studies assumed that the effect of a genetic variant is the same, regardless of whether it is inherited from the mother or the father. Kutalik’s team discovered that it is possible to detect genetic effects that are different according to their parental origin, even in samples of unrelated individuals where the parental origin is unknown. This is because, at the genetic markers producing these effects, an increase in the variability of body mass index (BMI) is observed among people who have inherited a different genetic code from their mother and father compared to those who have inherited the same genetic code from both parents.
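The observation above can be illustrated with a small Python sketch that compares the phenotypic variance of heterozygous individuals (different alleles from mother and father) with that of homozygous individuals at a single marker. This is only a descriptive toy calculation under stated assumptions, not the statistical test used by the group: the function name and the 0/1/2 genotype coding are hypothetical, and no significance is computed.

```python
from statistics import mean


def heterozygote_variance_ratio(genotypes, phenotypes):
    """Ratio of phenotypic variance in heterozygotes to that in homozygotes.

    Genotypes are coded as the number of copies of one allele (0, 1 or 2),
    so 1 means the two parental alleles differ. Each genotype group is
    centred on its own mean, so an ordinary mean shift between genotypes
    does not inflate the ratio.
    """
    groups = {0: [], 1: [], 2: []}
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)

    def centred_squares(values):
        centre = mean(values)
        return [(v - centre) ** 2 for v in values]

    het = centred_squares(groups[1]) if groups[1] else []
    hom = [sq for g in (0, 2) if groups[g] for sq in centred_squares(groups[g])]
    if not het or not hom:
        raise ValueError("need both heterozygous and homozygous individuals")
    return mean(het) / mean(hom)


# Toy data: the heterozygotes are far more spread out than the homozygotes.
toy_ratio = heterozygote_variance_ratio(
    [0, 0, 1, 1, 2, 2],
    [24, 26, 20, 30, 27, 29],
)
```

A ratio well above 1 at a marker is the kind of signal described in the text: among heterozygotes the allele may have come from either parent, so if its effect depends on parental origin their phenotypes form a mixture with a larger spread.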

The group applied this method to discover genetic markers with parent-of-origin effects (POE) on overweight. This resulted in six candidate markers showing strong POE association. In family-based studies (where the parental origin of the variants can be inferred), two of the candidates showed a significant association. Surprisingly, they found that the same genetic code at these markers may increase BMI when inherited from one parent, but decrease it when inherited from the other.

Kutalik’s group extended the QuickTest software developed for GWA scans to identify loci with POE based on the newly developed method. Thanks to this software, many collaborators were able to apply the method to their data, and the combined results yielded a major discovery.

Selected publications

Rueedi R, et al. Genome-wide association study of metabolic traits reveals novel gene-metabolite-disease links. PLoS Genet 2014;10:e1004132.

Hoggart CJ, et al. Novel approach identifies SNPs in SLC2A10 and KCNK9 with evidence for parent-of-origin effect on body mass index. PLoS Genet 2014;10:e1004508.

Rüeger S, et al. Impact of common risk factors of fibrosis progression in chronic hepatitis C. Gut; Epub ahead of print.

What do they do?

Erik van Nimwegen’s group uses both theoretical and experimental approaches to study the structure, function and evolution of the regulatory networks that cells use to control the expression of their genes. Another key interest of the group is the identification of general quantitative laws in genome evolution.

2014 highlights

SwissRegulon is a database of genome-wide annotations of regulatory sites (see also p. 16). In addition to these annotations, the group’s ISMARA web server allows any researcher to reconstruct genome-wide regulatory networks from their own microarray, RNA-sequencing, or ChIP-seq data in a completely automated fashion.

In 2014, there were three key developments and highlights related to SwissRegulon:

1) The most important current resource within SwissRegulon is the ISMARA web server, and the main article describing the methodologies underlying this resource was published in Genome Research.

2) With the continuing dramatic drop in costs of next-generation sequencing, many groups are now routinely sequencing large numbers of genomes of related microbial strains, but no easily useable methods for reconstructing phylogenies from such data are currently available. Erik van Nimwegen’s group recently developed a completely automated procedure called REALPHY for reconstructing phylogenies directly from raw sequencing data, and has made it available as a web server within SwissRegulon.

3) Over the last few years, the group has been developing a large collection of methodologies for the analysis of genome-wide binding data (ChIP-seq), including various methods for inferring regulatory motifs and predicting regulatory sites. They have now integrated all these procedures into a comprehensive pipeline called CRUNCH, also available as a web server, which takes raw ChIP-seq data as input and performs all analysis steps, from quality control, read mapping and peak calling up to comprehensive regulatory motif analysis, in an automated way.

Selected publications

Balwierz PJ, et al. ISMARA: automated modeling of genomic signals as a democracy of regulatory motifs. Genome Res 2014;24:869-84.

Bertels F, et al. Automated reconstruction of whole-genome phylogenies from short-sequence reads. Mol Biol Evol 2014;31:1077-88.

FANTOM5 Consortium, et al. A promoter-level mammalian expression atlas. Nature 2014;507:462-70.

What do they do?

Mark Robinson and his team develop robust solutions for the analysis of genome-scale data. They develop statistical methods for interpreting data from high-throughput sequencing and other technologies, primarily in the domain of gene expression and epigenetic regulation.

The group is largely data- and problem-driven, and the methods they develop are tailored to the characteristics of the technology platform that is generating the data. They develop open-source software tools via the Bioconductor project.

Where needed, the team also designs experiments and collects data to compare the performance of competing methods and platforms.

Selected publications

Zhou X, et al. Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Res 2014;42:e91.

Riebler A, et al. BayMeth: improved DNA methylation quantification for affinity capture sequencing data using a flexible Bayesian approach. Genome Biol 2014;15:R35.

Robinson MD, et al. Statistical methods for detecting differentially methylated loci and regions. Front Genet 2014;5:324.

What do they do?

Michael Stadler and his team study gene regulation through the analysis and modelling of genome-wide datasets. Most analysed data are generated by high-throughput sequencing. The group collaborates closely with experimental researchers on experimental models including cancer progression and cellular differentiation. Most of the projects measure various aspects of gene expression, including DNA methylation, single cell transcription, protein-binding to DNA, and translation.

2014 highlights

The group has been developing QuasR, an R/Bioconductor package with a collection of high-throughput sequencing analysis tools. It is fully documented and very easy to install, yet it contains powerful algorithms and is designed to support both beginners and experts. www.bioconductor.org

The QuasR package was published in the Bioinformatics journal, and has been widely used to analyse in-house and external data in over a dozen projects. Since its publication, the package has been downloaded over 4,000 times from the main Bioconductor server. QuasR supports various types of high-throughput sequencing experiments including RNA sequencing (spliced alignment and de novo splice site detection), bisulphite sequencing to measure DNA methylation and allele-specific analysis. With QuasR and R/Bioconductor, it is possible to perform a complex analysis in the form of a single R script, thus simplifying exchange between collaborators and improving reproducibility, even on different computer platforms (Windows, OS X and Unix/Linux).

Selected publications

Gaidatzis D, et al. QuasR: quantification and annotation of short reads in R. Bioinformatics; Epub ahead of print.

Tocchini C, et al. The TRIM-NHL protein LIN-41 controls the onset of developmental plasticity in Caenorhabditis elegans. PLoS Genet 2014;10:e1004533.

Gaidatzis D, et al. DNA sequence explains seemingly disordered methylation levels in partially methylated domains of Mammalian genomes. PLoS Genet 2014;10:e1004143.

Mark D. Robinson
Statistical Bioinformatics Group
University of Zurich

Zoltán Kutalik
Statistical Genetics Group
University of Lausanne

Michael Stadler
FMI Computational Biology Group
Friedrich Miescher Institute, Basel

Erik van Nimwegen
Genome Systems Biology Group
University of Basel


What do they do?

Andrzej Stasiak’s team uses numerical simulations to shed light on the organization of chromosomes in yeasts and higher eukaryotes such as humans. The group is especially interested in understanding chromosome structure and organization during interphase. Interphase is the longest stage of the cell cycle, and the stage during which genes are expressed.

2014 highlights

Thanks to the progress of high-resolution 3C (Chromosome Conformation Capture) methods, in 2012 scientists discovered that interphase chromosomes of higher eukaryotes are composed of sequential blocks with a high frequency of internal contacts. The average size of these blocks – also known as topological domains – is about 1 Mb. It is not yet known how chromatin fibres are arranged in these blocks, nor is the mechanism responsible for their formation known. Inspired by the fact that bacterial chromosomes are composed of supercoiled topological domains, Stasiak and his team performed Brownian dynamics simulations of supercoiled chromatin fibres that are found between borders of individual topological domains. Their simulations revealed that transcription-induced supercoiling explains the formation and all known properties of topological domains.

Another important development consisted in modifying the chromatin fibre model the group uses, by introducing a torsional resistance into it. Thanks to this modification, Stasiak’s group was able to simulate the effect of supercoiling on chromosome structure. Supercoiling arises during transcription and has important consequences on overall chromosome organization.

In a collaborative project involving researchers in Poland and the USA, the group worked on the database of proteins with polypeptide chains that form knots and slipknots.

Selected publications

Benedetti F, et al. Models that include supercoiling of topological domains reproduce several known features of interphase chromosomes. Nucleic Acids Res 2014;42:2848-55.

Benedetti F, et al. Effects of supercoiling on enhancer-promoter contacts. Nucleic Acids Res 2014;42:10425-32.

Jamroz M, et al. KnotProt: a database of proteins with knots and slipknots. Nucleic Acids Res 2015;43:D306-14.

What do they do?

Evgeny Zdobnov and his group use comparative genomics techniques to interpret genome sequencing data and learn about how genes and genomes evolve, as well as to investigate the possible functions of protein-coding genes, microRNA genes and conserved non-coding sequences.

2014 highlights

The group continues to develop the OrthoDB resource – a catalogue of “equivalent” genes among species, called orthologs (see also p. 17). Each updated release of OrthoDB includes more species with available sequenced genomes. The current release holds 61 vertebrates, 87 arthropods, 25 basal metazoans, 227 fungi and 2,627 bacteria. The website has been recently redesigned to display the information more clearly to users. Moreover, the underlying ortholog clustering software is now freely available to researchers who need to find orthologs in custom datasets. By identifying equivalent genes from many different organisms, OrthoDB helps biologists to find information on the evolutionary history and possible functions of a specific gene.

The team also collaborates actively with clinical colleagues to bring the sequencing revolution to medical practice, in particular in the field of viral metagenomics. The group is also contributing to the i5K initiative to analyse the genomes of 5,000 insects. This will provide an exceptional opportunity to explore the evolutionary processes that act on genes and genomes, since insects are the most diverse and successful terrestrial animals, having by far the largest number of species.

Selected publications

Kriventseva EV, et al. OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Res 2015;43:D250-6.

Neafsey DE, et al. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 2015;347:43.

Petty TJ, et al. Comprehensive human virus screening using high-throughput sequencing with a user-friendly representation of bioinformatics analysis: a pilot study. J Clin Microbiol 2014;52:3351-61.

Proteome describes the entire set of proteins expressed by a cell, a tissue or an organism at a given time. Proteins are the products of genes and are involved in nearly every task in the body – from shaping cells to defending the body against pathogens.

An altered protein, produced by a mutation in its gene, can be at the heart of diseases such as cystic fibrosis or Creutzfeldt-Jakob disease. Bioinformatics develops tools to understand how proteins exercise their role.

Andrzej Stasiak
DNA and Chromosome Modelling Group
University of Lausanne

Evgeny Zdobnov
Computational Evolutionary Genomics Group
University of Geneva


What do they do?

Christian Ahrens’s group uses and develops bioinformatics tools for the integration and analysis of datasets provided by experimental biologists (genome sequences, gene and protein expression data, metabolomics data). One focus is to exploit the unique advantages of protein expression data; another is on proteogenomics, i.e. the identification of all proteins encoded in a genome.

The group contributed to the development of Protter, a software tool that allows users to visualize the topology of membrane proteins and to integrate annotations and experimental evidence in the form of publication-ready plots.

Protter addresses an important and previously unmet need of the research community, namely to visualize the predicted topology of membrane proteins in the context of additional information on a protein’s function, while concomitantly highlighting parts of the protein that have been identified in proteomics experiments. http://wlab.ethz.ch/protter/start/

2014 highlights

Ulrich Omasits, a PhD student co-supervised by Christian Ahrens and Prof. Bernd Wollscheid (ETH Zurich), finalized the development of the resource PeptideRank, which allows users to select the best-suited peptides to quantitatively measure protein amounts in diverse organisms. This particular resource performs better than other currently available software solutions. http://wlab.ethz.ch/peptiderank/

Selected publications

Qeli E, et al. Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data. J Proteomics 2014;108:269-83.

Stekhoven DJ, et al. Proteome-wide identification of predominant subcellular protein localizations in a bacterial model organism. J Proteomics 2014;99:123-37.

Omasits U, et al. Protter: interactive protein feature visualization and integration with experimental proteomic data. Bioinformatics 2014;30:884-6.

What do they do?

Amos Bairoch, Lydie Lane and their team are set on broadening our knowledge and understanding of the function of the 20,000 or so protein-coding genes that exist in the human genome. Their main focus is on building neXtProt, a knowledge resource on human proteins, annotating the effects of human protein variations in the context of cancers and genetic diseases, and analysing results of high-throughput experiments to shed light on aspects of protein function.

2014 highlights

neXtProt includes curated information on various aspects of human protein biology such as function, mRNA/protein expression, protein/protein interactions, post-translational modifications and protein variations (see also p. 16). neXtProt is the reference resource for the international HUPO c-HPP consortium, whose aim is to experimentally validate all the proteins predicted from the analysis of the human genome. CALIPHO is involved in defining the guidelines to curate and integrate all the data generated within the framework of these projects. www.nextprot.org

In 2014, the CALIPHO team focused their efforts on the development of a powerful new search functionality for neXtProt, based on the SPARQL query language. SPARQL allows users to make very complex queries that use the wealth of data encapsulated in neXtProt. It also allows users to seamlessly merge information from neXtProt with data from other resources that also use SPARQL.

The team also pursued their laboratory activities targeted towards the functional characterization of human proteins assisted by bioinformatics data-mining activities. In 2014, they published two papers, one describing a protein involved in cilia formation, and another on stress granule formation.

Selected publications

Lane L, et al. Metrics for the human proteome project 2013-2014 and strategies for finding missing proteins. J Proteome Res 2014;13:15-20.

Bontems F, et al. C2orf62 and TTC17 are involved in actin organization and ciliogenesis in zebrafish and human. PLoS One 2014;9:e86476.

Salleron L, et al. DERA is the human deoxyribose phosphate aldolase and is involved in stress response. Biochim Biophys Acta 2014;1843:2913-25.

What do they do?

The group led by Frédérique Lisacek is involved in software and database development for the proteomics and glycomics communities. Their focus is on mass spectrometry data analysis, the discovery of post-translational modifications, and the functional study of glycoconjugates and carbohydrates.

2014 highlights

The SugarBind database covers knowledge of interactions between pathogens and carbohydrate ligands of mammalian hosts (see also p. 17 and http://sugarbind.expasy.org). The new redesigned version of SugarBindDB was released in 2014.

The framework was chosen to match that of UniCarbKB, the universal glycan structure and glycoprotein database co-developed by an international consortium of which PIG is a major player. http://unicarbkb.org

Information in SugarBindDB was extended to cover properties of pathogens including the diseases they cause, as well as to describe in detail the tissue source and the constitutive parts of the glycoconjugate carrying the ligand. Naming conventions of all entities involved were adopted. Standard representations of the glycan molecules were introduced. Cross-links were put in place to expand exploration of the functional role of cell-surface carbohydrates. Internal links support exploration of the database content. External links with bioinformatics resources provide background information on sugar-binding proteins. Graphic tools were also added to track the specificity of a particular ligand across a variety of strains.

Furthermore, PIG develops software for mass spectrometry analysis, which is structured and stored in a library renamed MzJava. http://mzjava.expasy.org

The group also develops software specifically for analysing glycan structural data and, in 2014, released GlycoDigest. http://glycodigest.org

Selected publications

Mariethoz J, et al. SugarBindDB, a resource of pathogen lectin-glycan interactions. In Glycoscience: Biology and Medicine 2014. Springer, Japan.

Gotz L, et al. GlycoDigest: a tool for the targeted use of exoglycosidase digestions in glycan structure determination. Bioinformatics 2014;30:3131-3.

Bilbao A, et al. Processing strategies and software solutions for proteomics studies using mass spectrometry data-independent acquisition. Proteomics; Epub ahead of print.

What do they do? Christian von Mering and his group are interested in biological networks and thequantitative,high-throughputdatathatdefinethem.Asnetworksevolve,their impact on the molecular players within the cell is studied at the levels of proteinexpression,sequenceevolutionandgenomearchitecture.

2014 highlights
The group maintains STRING, a popular database of protein-protein interactions (see also p. 16). In 2014, STRING was updated to version 10, a major new release that almost doubled the number of organisms. STRING now covers 2,031 organisms from all three domains of life, including many newly sequenced genomes.

This new version includes an important change with respect to the “transfer” of interaction knowledge from one organism to another. Interaction transfer is necessary because many molecular studies are conducted in model organisms, and their results can then be applied to human biology. This transfer can be done by applying interactions from model organisms to humans and vice versa, assuming that the proteins in question serve the same overall functions. This assumption holds best for so-called “orthologs”, i.e. pairs of proteins that have followed a similar evolutionary trajectory as they split from the last common ancestor of both organisms. STRING now executes orthology-informed interaction transfer systematically, in an all-against-all fashion.
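The idea of orthology-informed interaction transfer can be illustrated with a minimal Python sketch. It is not STRING's actual scoring scheme: the function name, the data layout, the protein identifiers and the rule of multiplying an interaction score by both orthology confidences are assumptions made for illustration only.

```python
from itertools import product


def transfer_interactions(source_interactions, orthologs):
    """Project scored protein pairs from a source organism onto a target organism.

    source_interactions: dict mapping (protein_a, protein_b) -> score in [0, 1]
    orthologs: dict mapping a source protein -> list of
               (target_protein, orthology_confidence) tuples
    Returns a dict mapping sorted target pairs -> transferred score.
    """
    transferred = {}
    for (a, b), score in source_interactions.items():
        # Try every combination of orthologs of the two interacting proteins.
        candidates = product(orthologs.get(a, []), orthologs.get(b, []))
        for (ta, conf_a), (tb, conf_b) in candidates:
            if ta == tb:
                continue  # skip self-pairs
            key = tuple(sorted((ta, tb)))
            # Assumption: evidence is down-weighted by both orthology confidences.
            new_score = score * conf_a * conf_b
            # Keep the strongest evidence when several source pairs map to one target pair.
            transferred[key] = max(transferred.get(key, 0.0), new_score)
    return transferred


# Hypothetical identifiers: yA/yB in a model organism, hA/hB in human.
model_pairs = {("yA", "yB"): 0.9}
model_to_human = {"yA": [("hA", 1.0)], "yB": [("hB", 0.5)]}
human_pairs = transfer_interactions(model_pairs, model_to_human)
```

Running the same function for every pair of organisms in both directions would correspond to the all-against-all fashion mentioned above; a pair is only transferred when both partners have an ortholog in the target organism.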

Other changes include improvements to the user interface. For example, users can upload their own interaction and protein data and view them privately in the context of all integrated protein network data in STRING.

The group continues the development of its PaxDB database dedicated to systematic protein abundance measurements in model organisms. It also initiated a number of collaborations focusing on microbial communities relevant in diseases, such as cystic fibrosis, salmonella infection and inflammatory bowel disease.

Selected publications
Szklarczyk D, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2014;43:D447-52.

Franceschini A, et al. Specific inhibition of diverse pathogens in human cells by synthetic microRNA-like oligonucleotides inferred from RNAi screens. Proc Natl Acad Sci USA 2014;111:4548-53.

Schmidt TS, et al. Ecological consistency of SSU rRNA-based operational taxonomic units at a global scale. PLoS Comput Biol 2014;10:e1003594.

Christian von Mering, Bioinformatics / Systems Biology Group, University of Zurich

Amos Bairoch & Lydie Lane, Computer and Laboratory Investigation of Proteins of Human Origin – CALIPHO, University of Geneva

Frédérique Lisacek, Proteome Informatics Group – PIG, University of Geneva

Christian Ahrens, Bioinformatics and Proteogenomics Group, Agroscope, Wädenswil

What do they do?
The Swiss-Prot group produces and maintains a number of internationally renowned databases for the life science community, namely UniProtKB/Swiss-Prot, HAMAP, PROSITE, Rhea, UniPathway and ENZYME. All these databases are manually curated and continuously updated.

2014 highlights
One of the group’s main activities is the production and development of the UniProtKB/Swiss-Prot knowledgebase. UniProtKB/Swiss-Prot provides comprehensive information on protein sequences and their function sourced from the scientific literature by expert curators. It is an internationally renowned resource with over 900,000 requests per month from all over the world (see also p. 16).

In 2014, the group released a completely redesigned version of the UniProt website – the culmination of an extensive user-centred design process carried out over two years. The new UniProt website features improved search functionalities and more intuitive navigation, with individual protein pages restructured to make links between related knowledge and data more obvious. The provenance of information is rendered transparent through improved “evidence tags” that link each annotation to its source – such as a scientific publication or homologous protein.

The first public version of the SwissLipids knowledgebase for lipid chemistry and biology was released. SwissLipids features a library of over 240,000 possible structures for common lipid classes with extensive curated links to proteins (UniProtKB) and metabolism (Rhea).

Selected publications
Morgat A, et al. Updates in Rhea-a manually curated resource of biochemical reactions. Nucleic Acids Res 2015;43:D459-64.

Pedruzzi I, et al. HAMAP in 2015: updates to the protein family classification and annotation system. Nucleic Acids Res 2015;43:D1064-70.

Poux S, et al. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database (Oxford) 2014:bau016.

Bioinformatics provides ever-growing support to the field of medicine and health by offering its expertise in many different ways. Drawing on patients’ data, bioinformaticians develop tools that help medical practitioners in their decision making.

As an illustration, SIB has developed the algorithm for a non-invasive prenatal test that can detect the most frequent trisomies and chromosomal rearrangements. It also developed models to predict the evolution of brain aneurysms and estimate the dynamics of the Ebola virus during the 2014 outbreak in West Africa.

Ioannis Xenarios & Lydie Bougueleret, Swiss-Prot Group, University of Geneva

What do they do?
The group focuses on the exploration of changes in the genomes of cancer cells. They use their continuously expanding data collections to detect both specific mutation events that affect individual tumours, as well as complex genomic aberration patterns and their relationships to cancer entities. Collaborative projects – e.g. in association with specialists in the fields of childhood brain tumours or skin diseases – are aimed at the biological characterization of these diseases, to facilitate targeted therapeutic approaches.

2014 highlights
The arrayMap database provides over 60,000 pre-processed copy number profiles from cancer genome studies to biomedical researchers. This resource, which is the largest of its kind, facilitates the identification of cancer type specific mutation patterns, and allows for a simplified association of potential target genes with affected tumour entities.

In 2014, Michael Baudis’ team launched a new edition of the arrayMap resource, which almost doubled its data content and introduced a range of new features, including an HTTP-based data API. This allows users to access directly the group’s pre-processed cancer genome data and to use them in their own downstream applications.

In 2014 the group also published an extensive study analysing “chromothripsis”-like genome patterns in cancer, based on datasets collected in arraymap.org. As part of the results, the study showed that the genomic aberrations, commonly referred to as “chromothripsis”, probably constitute a group of phenomena with variable pathophysiologies and not all refer to the core definition of synchronous genome shattering events.

The group continues to extend the Progenetix database (progenetix.org) and the DIPG repository for childhood glioma data (dipg.progenetix.org), and participates in a collaborative project to delineate inflammatory dermatologic diseases from cutaneous lymphomas.

Selected publications
Cai H, et al. Chromothripsis-like patterns are recurring but heterogeneously distributed features in a survey of 22,347 cancer genome screens. BMC Genomics 2014;15:82.

Cai H, et al. Progenetix: 12 years of oncogenomic data curation. Nucleic Acids Res 2014;42:D1055-62.

Baderca F, et al. Biopsying parapsoriasis: quo vadis? Are morphological stains enough or are ancillary tests needed? Rom J Morphol Embryol 2014;55:1085-92.

What do they do?
Niko Beerenwinkel and his team develop and apply statistical methods and algorithms to analyse high-throughput molecular data. Their goal is to predict the effect of genetic alterations and to support the diagnosis and treatment of diseases.

The group has four main focuses:
1) computational biology and bioinformatics;
2) medical systems biology and computational medicine;
3) statistical methods for high-throughput molecular profiling data;
4) evolutionary dynamics of cancer and pathogen populations.

2014 highlights
HaploClique is an efficient haplotype multi-assembly software for the analysis of within-host heterogeneous virus populations. It is essential for a reliable detection of all viral variants present within individual patients, and hence for improved diagnosis and prediction of response to antiviral therapy.

HaploClique has been optimized for both short-read and long-read sequencing technologies. It can handle paired-end data and is particularly useful for detecting long-range deletions in viral haplotype sequences.
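The group's publication on viral quasispecies assembly (listed below) refers to maximal clique enumeration. As a hedged illustration of that general technique, the following Python sketch applies the basic Bron-Kerbosch recursion to a toy graph in which nodes are reads and edges link reads assumed to be mutually compatible. The read names and the graph are hypothetical, and this is not HaploClique's implementation.

```python
def maximal_cliques(adjacency):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion.

    adjacency: dict mapping each node to the set of its neighbours.
    """
    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when no node can extend it and none was excluded.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            # The node has been fully explored: move it to the excluded set.
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical read-compatibility graph: r1, r2 and r3 overlap consistently,
# r4 is only compatible with r3, and r5 overlaps with nothing.
compatible = {
    "r1": {"r2", "r3"},
    "r2": {"r1", "r3"},
    "r3": {"r1", "r2", "r4"},
    "r4": {"r3"},
    "r5": set(),
}
read_groups = maximal_cliques(compatible)
```

Each maximal clique groups reads that could stem from the same haplotype. The basic recursion is exponential in the worst case, so real read sets call for pivoting or other pruning.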

Selected publications
Giallonardo FD, et al. Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations. Nucleic Acids Res 2014;42:e115.

Töpfer A, et al. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol 2014;10:e1003515.

Szczurek E, Beerenwinkel N. Modeling mutual exclusivity of cancer mutations. PLoS Comput Biol 2014;10:e1003503.

What do they do?
Mauro Delorenzi and his group provide bioinformatics-statistical expertise to the life science community and promote collaborations (see also p. 21). They perform the analysis of biomedical-genomics data with a focus on biomarker studies in cancer research and statistical methods for genomics. Recently, the group concentrated on molecular heterogeneity and pathway activation patterns in cancer subtypes.

2014 highlights
The importance of competences in data analysis continues to increase. Many researchers obtain results they consider as “big data”, which prove to be very challenging to analyse. Statisticians have been working with such data for a long time and Delorenzi and his team are well positioned to tackle the challenges.

In order to meet these challenges, the group developed their offer in two directions in 2014:

1) The group gave courses on state-of-the-art methods for statistical analysis of biological data. In particular, they organized several advanced courses on topics for which training has rarely been offered to the Swiss life science community.

2) The group continues to provide consulting services to the life science community, enabling researchers to gain access to cutting-edge statistical analysis methods in their work. Today, many research groups, both in the academic world and in industry, have projects that require statistical help. However, they are often unable to hire dedicated data analysts with the right competences, and rely on the Delorenzi group’s services to analyse data.

The Biostatistics Service of BCF provides consulting services, focusing on the analysis of data produced by modern high-throughput biological methods. The expertise they provided led to the publication of several scientific articles over the course of the year, in particular in the field of cancer biology.

Selected publications
Di Narzo AF, et al. Test of four colon cancer risk-scores in FFPE microarray gene expression data. J Natl Cancer Inst 2014;106(10).

Ragusa S, et al. PROX1 promotes metabolic adaptation and fuels outgrowth of Wnt(high) metastatic colon cancer cells. Cell Rep 2014;8:1957-73.

Soneson C, et al. Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation. PLoS One 2014;9:e100335.

What do they do?
Research in the Fellay lab focuses on human genomics of infection. Using genome-wide association analysis, exome sequencing and transcriptomics, the group explores the genetic roots of inter-individual differences in response to pathogens. Host genomics of HIV infection, joint analyses of interactions between human and viral genomes, and exome sequencing in patients with extreme infectious disease phenotypes are some of the important research directions.

2014 highlights
The group is leading large international projects to understand how human genetic variation impacts HIV disease progression. In collaboration with over 20 cohorts or centres studying HIV progression, the group has collected genome-wide genotyping data on about 11,000 HIV-infected individuals with clinical follow-up.

This project (the International Collaboration for the Genomics of HIV – ICGH) has three main goals:
1) the identification of common genetic markers (>1% frequency) associated with HIV disease progression;
2) the identification of likely causal variants underlying associated regions through fine-mapping;
3) the quantification of the proportion of variation in HIV disease progression that is heritable.

The group also investigates the influence of rare gene polymorphisms. This project uses exome sequencing to identify mutations that alter protein sequences and test them for an impact on HIV viral load individually, in combination within a gene and across relevant gene sets. So far the group has obtained exome sequence data on about 1,000 HIV-infected patients.

The main goal of both projects is to improve the understanding of human-HIV interactions at the genomic level, which may assist in the development of new therapeutics and/or vaccines.

In addition, the team developed an online compendium of host genomic data in HIV biology and disease. This intuitive web interface allows queries, and supports validation of the rapidly growing body of host genomic information pertinent to HIV research. www.guavah.org

Selected publications
Regoes RR, et al. Disentangling human tolerance and resistance against HIV. PLoS Biol 2014;12:e1001951.

Rausell A, et al. Analysis of stop-gain and frameshift variants in human innate immunity genes. PLoS Comput Biol 2014;10:e1003757.

Bartha I, et al. GuavaH: A compendium of host genomic data in HIV biology and disease. Retrovirology 2014;11:6.

Jacques Fellay, Host-Pathogen Genomics Group, EPFL, Lausanne

Niko Beerenwinkel, Computational Biology Group, D-BSSE, ETH Zurich, Basel

Mauro Delorenzi, Bioinformatics Core Facility – BCF, University of Lausanne

Michael Baudis, Computational Oncogenomics Group, University of Zurich

What do they do?
Recent developments in immunotherapy are revolutionizing cancer treatments. Using large genomics data, David Gfeller and his team hope to gain a better understanding of fundamental properties of tumours, such as how tumours are recognized by the immune system and how this recognition can be exploited towards clinical benefits. To reach this goal, the team develops computational tools from statistics, machine learning and modelling.

2014 highlights
Tumours are characterized by changes at the genetic and epigenetic level, which are often detrimental to patients. But these changes also provide a way of distinguishing cancer cells from normal cells. Among the proteins that are specifically expressed or mutated in cancer cells, only a given fraction is detected by the immune system. This is because certain proteins need to be displayed on the cell’s membrane so as to become visible to the immune system – a process that is highly regulated in cells.

David Gfeller’s group is interested in computational tools to predict which antigens have the greatest potential to elicit an immune reaction that – together with immunotherapy drugs – could help to redirect the immune system against cancer cells. The group is also working on optimizing tools based on genomics data to predict and characterize the presence of immune cells in tumour samples.

Proteins perform their function by interacting with each other. Experimentally mapping all possible interactions is challenging, and computational approaches are promising to narrow the list of interesting candidates. Therefore a second interest of the group is to develop protein-protein interaction prediction tools by combining statistical approaches with structural modelling. They are also interested in understanding better how cancer somatic mutations affect protein interactions.

Selected publications
Gfeller D, et al. SwissTargetPrediction: a web server for target prediction of bioactive small molecules. Nucleic Acids Res 2014;42:W32.

Gfeller D, et al. Prediction and experimental characterization of nsSNPs altering human PDZ-binding motifs. PLoS One 2014;9:e94507.

What do they do?
Ivo Kwee and his team are involved in cancer bioinformatics. Their unit supports the research groups at the Institute of Oncology Research with computational and statistical services for genomic profiling, sequencing analysis, functional genomics, pharmacogenomics, and clinical study support. They develop new computational methods and tools to complement and drive cancer research.

2014 highlights
Cancer is not caused by a single gene but through complex interactions of multiple genes. The team’s efforts are to go beyond single gene statistics and pursue functional analysis on gene sets and interaction networks. The group is using optimization on Prize Collecting Steiner Trees (PCST) to find functional correlated subnetworks that may point to driving oncogenetic pathways. In a large interaction network, PCST attempts to find a neighbourhood – or subnetwork – where genetic aberrations are most concentrated. For example, a connected neighbourhood where many genes are differentially expressed can be found; or, combined with survival data, a subnetwork that locates interacting genes that are differentially expressed and correlated with patient survival can be found. Finding the optimal solution is NP-hard. The group has developed a fast heuristic for PCST that can handle tens of thousands of nodes with hundreds of thousands of edges. Furthermore, the group aims to integrate different data types such as copy number, methylation and expression in a single network.
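To make the PCST objective concrete, here is a minimal Python sketch of a naive greedy growth procedure on a toy network. It is not the group's heuristic: the function name, the gene labels, the prizes (standing in for aberration scores) and the edge costs are all assumptions chosen for illustration. The objective used is the sum of collected node prizes minus the sum of edge costs paid.

```python
def greedy_pcst(prizes, edges):
    """Grow a tree from the highest-prize node, adding the neighbour with the
    best positive net gain (prize minus connecting edge cost) at each step.

    prizes: dict node -> prize
    edges:  dict (u, v) -> cost, treated as undirected
    """
    neighbours = {node: {} for node in prizes}
    for (u, v), cost in edges.items():
        neighbours[u][v] = cost
        neighbours[v][u] = cost

    root = max(prizes, key=prizes.get)
    tree_nodes, tree_edges = {root}, []
    score = prizes[root]
    while True:
        best = None
        for u in tree_nodes:
            for v, cost in neighbours[u].items():
                if v in tree_nodes:
                    continue
                gain = prizes[v] - cost
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, u, v)
        if best is None:
            break  # no remaining neighbour improves the objective
        gain, u, v = best
        tree_nodes.add(v)
        tree_edges.append((u, v))
        score += gain
    return tree_nodes, tree_edges, score


# Hypothetical network with four genes.
prizes = {"A": 5, "B": 3, "C": 0.5, "D": 4}
edges = {("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 1, ("A", "D"): 6}
nodes, tree, score = greedy_pcst(prizes, edges)
```

On this toy input the greedy procedure stops at the subnetwork {A, B} with a score of 7, although the tree A-B-C-D would score 8.5. The greedy step never crosses the low-prize node C, which shows why stronger heuristics are needed for an NP-hard problem of this kind.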

The group received a number of grants for bioinformatics projects in collaboration with the Artificial Intelligence institute IDSIA. As an example, they developed a Bayesian algorithm for copy number segmentation, and in a running project they are developing an optimization method for functional analysis of genetic networks using PCST.

Selected publications
Ahkmedov M, et al. A fast heuristic for the prize-collecting Steiner tree problem. In Lecture Notes in Management Science 2014;6:207-16.

Mian M, et al. Genome-wide DNA profiling identifies clonal heterogeneity in marginal zone lymphomas. Br J Haematol 2014;164:896-9.

Ronchetti D, et al. Distinct patterns of global promoter methylation in early stage chronic lymphocytic leukemia. Genes Chromosomes Cancer 2014;53:264-73.

What do they do?
The Computational PathoGenOmics group focuses on data analysis. Methods used in biomedical research increasingly include high-throughput assays to conduct millions of chemical, genetic and pharmacological tests on biological samples. The group leads corresponding projects and also supports collaborating research groups in the development and application of computational methods to address research questions in the fields of infection biology and public health.

2014 highlights
Chemical modifications in DNA have emerged as an additional superposed layer of information defining some of the properties of a living cell. The roles of such epigenetic modifications, including DNA methylation, are incompletely characterized. By applying a recently developed genome-wide sequencing assay, the group assessed the modifications in the genomes of Neisseria meningitidis from African epidemics. The applied PacBio sequencing method readily detected unexpected diversity in epigenetic modifications, and confirms the potential of epigenetic changes to contribute to rapidly adapting properties of bacterial cells.

The group is involved in several other projects to assess genome sequences of a variety of pathogens. Pathogenic organisms such as Plasmodium, the causative agent of malaria, have evolved specific mechanisms to evade the human defence systems. In collaboration with groups at the Swiss TPH, Schmid and his team have identified a specific portion in the DNA sequence that controls the production of proteins.

The group is also contributing to a European project assessing the potential effects of electromagnetic fields. Human cells were exposed in culture to electromagnetic fields and epigenetic modifications were assessed using high-throughput sequencing methods. Ongoing analysis approaches are comparing epigenetic profiles of exposed to unexposed control cells.

Selected publications
Brancucci NM, et al. A var gene upstream element controls protein synthesis at the level of translation initiation in Plasmodium falciparum. PLoS One 2014;9:e100183.

Begitt A, et al. STAT1-cooperative DNA binding distinguishes type 1 from type 2 interferon signaling. Nat Immunol 2014;15:168-76.

Seguín-Estévez Q, et al. Extensive remodeling of DC function by rapid maturation-induced transcriptional silencing. Nucleic Acids Res 2014;42:9641-55.

Ivo Kwee, Bioinformatics Core Unit, Institute of Oncology Research, Bellinzona

Christoph Schmid, Computational PathoGenOmics Group, Swiss TPH, Basel

David Gfeller, Computational Cancer Biology, University of Lausanne

What do they do?
Patrick Ruch and his team transform the wealth of implicit and hidden information buried in the biomedical literature into human-actionable and machine-readable knowledge items (e.g. answers to questions, functional annotation).

The group maintains an updated data infrastructure to store, access and analyse the bibliome, i.e. biomedical literature contents such as MEDLINE, biomedical patent libraries and other web contents, including clinical practice guidelines and patient communities. The bibliome is then normalized and enriched with unambiguous semantic descriptors, i.e. database accession numbers and onto-terminological descriptors.

2014 highlights
The Gene Ontology Categorizer (GOCat), a tool used to assign gene ontology functional descriptors to any input text (abstract, full-text article…) received about 1 million visits in 2014! http://eagl.unige.ch/GOCat/

GOCat was further developed for Full-Texts (GOCat4FT). http://eagl.unige.ch/GOCat4FT/

GOCat was customized to participate in the BioCreative competition, which it won. The system is now able to accept full-text papers. From this basic input, it generates a curation-driven summary, which is ultimately sent to GOCat to generate a ranked list of functional descriptors to support the annotator. The service is available for demonstration and is currently being integrated into the curation workflow of neXtProt within the context of the neXtPresso project.
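How a ranked list of descriptors can be derived from an input text may be illustrated with a small nearest-neighbour sketch in Python. This is only a generic instance-based categorizer under stated assumptions: the function names, the three toy training sentences and the similarity-weighted voting rule are hypothetical, and the actual GOCat pipeline is not described in this profile.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_descriptors(text, annotated_examples, k=3):
    """Rank descriptors by similarity-weighted votes of the k most similar examples.

    annotated_examples: list of (text, [descriptor, ...]) pairs.
    """
    query = Counter(text.lower().split())
    scored = [(cosine(query, Counter(doc.lower().split())), labels)
              for doc, labels in annotated_examples]
    scored.sort(key=lambda item: item[0], reverse=True)
    votes = Counter()
    for similarity, labels in scored[:k]:
        for label in labels:
            votes[label] += similarity
    return [label for label, _ in votes.most_common()]


# Toy, hand-written examples (not taken from any curated resource).
examples = [
    ("kinase phosphorylates substrate protein", ["protein kinase activity"]),
    ("transcription factor binds promoter dna", ["DNA binding"]),
    ("membrane transporter moves ions", ["ion transport"]),
]
ranking = rank_descriptors("the kinase phosphorylates its substrate", examples)
```

The annotator would then review the top of the ranking rather than the full vocabulary of descriptors.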

For Switzerland, the group maintains the biomedical terminological resources of the EXPAND project. In particular, they support the cross-border exchange of electronic health records between Switzerland and European countries. http://www.expandproject.eu/

In collaboration with Dr Rodolphe Meyer, from the Geneva University Hospital (HUG), Ruch and his team began the TransCoD (Translational Coding Database) project, whose aim is to build a holistic clinical knowledge repository to support clinical research.

Selected publications
Gobeill J, et al. Closing the loop: from paper to protein annotation using supervised gene ontology classification. Database (Oxford) 2014;2014:bau088.

Vishnyakova D, et al. Electronic processing of informed consents in a global pharmaceutical company environment. Stud Health Technol Inform 2014;205:995-9.

Gobeill J, et al. Instance-based learning for tweet categorization in CLEF REPLAB 2014. Proc CLEF 2014;1491-9.

Patrick Ruch, Text Mining Group, Geneva School of Business Administration (HEG)

A genome can inform life scientists on how a species has evolved over time. Phylogeny studies the phenotypic and genetic closeness of species, and is illustrated by phylogenetic trees.

Bioinformatics develops tools that are able to read a species’ genome, tell the story and build trees. In this way, life scientists have, for example, acquired a better understanding of human migration in the past, the formation of crocodile scales and the evolutionary history of grass.

What do they do?
Maria Anisimova and her team develop computational methods to study genome evolution and adaptation. Keeping pace with growing data size and complexity, they provide efficient solutions for the analysis of phylogenetic patterns and selection in genomic sequences. Their goal is to develop bioinformatics applications for real problems in biotechnology, medicine, ecology and agriculture.

2014 highlights
Molecular phylogenies provide a test-base for biological hypotheses or support downstream bioinformatics analyses. Selection acting on protein-coding DNA affects phylogenetic shapes and should be included during phylogeny inference. CodonPhyML allows for fast maximum likelihood phylogeny inference from protein-coding genes under codon substitution models.

The latest version of CodonPhyML includes multiple partition models that can be used for phylogeny inference from multiple loci or multiple domain proteins. Different models can share parameters, providing a convenient framework for hypothesis testing. Furthermore, based on fast approximation algorithms, the group aims to include the alignment uncertainty during phylogeny estimation. This will allow for more accurate phylogenetic inferences from vast high-throughput data. http://sourceforge.net/projects/codonphyml/

Their recent methods for analysing genomic tandem repeats are implemented in the Tandem Repeat Annotation Library (TRAL). https://github.com/elkeschaper/TandemRepeats/tree/master

ProGraphMSA implements fast probabilistic graph-based phylogeny-aware alignment with tandem repeats. http://sourceforge.net/projects/prographmsa

Selected publications
Schaper E, et al. The majority of human protein tandem repeats are highly conserved within the Eukaryotes. Mol Biol Evol 2014;31:1132-48.

Schaper E, Anisimova M. The evolution and roles of protein tandem repeats in plants. New Phytol 2015;206:397-410.

Dimitrieva S, Anisimova M. Unraveling patterns of site-to-site synonymous rates variation and associated gene properties in protein domains and families. PLoS One 2014;9:e102721.

What do they do?
Laurent Excoffier and his team use genomic data to understand how populations evolve under the joint effects of demography and selection. They also develop statistical methods to reconstruct and infer evolutionary processes from genomic data.

The team focuses on the effect of range expansions on genomic and functional diversity, and the detection of signatures of adaptation and selection at the molecular level.

2014 highlights
The group carried out:
1) empirical studies of hybrid zones in small rodents;
2) the modelling of functional genomic evolution during range expansions;
3) the development of statistical methods to evidence signals of convergent adaptations at the genomic level;
4) experimental evolution with bacteria;
5) human genome scans.

In addition, the team develops Arlequin, a resource which has become very popular. Arlequin allows empirical population geneticists to extract information on genetic and demographic features from a collection of population samples, using a large set of population genetics methods and statistical tests. http://cmpg.unibe.ch/software/arlequin35/

Selected publications
Foll M, et al. Widespread signals of convergent adaptation to high altitude in Asia and America. Am J Hum Genet 2014;95:394-407.

Sousa V, et al. Impact of range expansions on current human genomic diversity. Curr Opin Genet Dev 2014;29:22-30.

Lischer HEL, et al. Ignoring heterozygous sites biases phylogenomic estimates of divergence times: implications for the evolutionary history of Microtus voles. Mol Biol Evol 2014;31:817-31.

Maria Anisimova, Applied Computational Genomics Team, Zurich University of Applied Sciences, Wädenswil

Laurent Excoffier, Computational Population Genetics Group, University of Bern

What do they do?
Gaston Gonnet and his team are interested in bioinformatics problems, in particular the modelling and simulation of molecular sequence data. The group is pursuing large-scale computational problems, such as the Orthologous MAtrix Project (OMA). This particular project aims to produce, automatically, reliable orthologous groups of proteins that are derived from entire genomes.

2014 highlights
The OMA Browser provides orthology predictions among publicly available genomes (see also p. 17). The resource includes a web interface (“OMA Browser”), DAS and SOAP programmatic interfaces, and downloadable data and meta-data in various standard formats. Recently, OMA also provided an efficient stand-alone version that makes it easy to combine custom user data with pre-existing reference genomes.

In 2014, two new releases of the OMA database were made public, in which the number of covered genomes has steadily increased. Over the course of the year, the resource covered an additional 162 genomes, including 18 new plant genomes. A complete redesign of the website was also implemented, and several new functionalities were added to the service. Of note are the gene function predictions based on OMA orthologs and the homeolog relations for polyploid genomes. http://omabrowser.org

As part of the ongoing SIB infrastructure grant for OMA and with the OrthoDB group, Gonnet’s group develops the SIBLINGs database. This new service will allow computationally inferred homologs among complete genomes to be available to the public. This particular step is the most time consuming part of the OMA and OrthoDB pipeline. The development of this web service was again a focus over the year and a beta version of the service will be available in early 2015.

Selected publications
Altenhoff AM, et al. The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements. Nucleic Acids Res 2014;43:D240-9.

Wittwer LD, et al. Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology. PeerJ 2014;2:e607.

Coman D, et al. Distinct evolutionary strategies in the GGPPS family from plants. Front Plant Sci 2014;5:230.

What do they do?
The group led by Jérôme Goudet strives to understand how the interplay between population structure, trait architecture and selection can be disentangled. It focuses on the simulation of individuals and their genomes through time in a realistic landscape, inferring historical events and selection from genomic and phenotypic data, and developing statistical tools for population genomics.

2014 highlights
The focus in 2014 was along the following lines:
1) Vignettes were written to illustrate how to simulate metapopulations, neutral genetic traits, quantitative traits as well as how to estimate population parameters using Approximate Bayesian Computation.
2) Sex can now be determined using a genetic basis rather than being allocated at random.
3) The size of the genetic map can evolve – chromosomes can either grow or shrink.
4) Implementation of a two-stage dispersal, with a gametic dispersal phase and a zygotic dispersal phase. This is an important feature of plant mating systems where not only seeds but also pollen disperse.

Resources
Understanding how diversity among genomes has been shaped is a huge challenge for researchers, and will impact fields from evolutionary biology to medical genetics. quantiNEMO has been designed to tackle such questions. It is an individual-based, genetically explicit stochastic simulation program which allows the investigation of the effects of selection, mutation, recombination and drift on quantitative traits.

Besides quantiNEMO, the team is developing HierFstat, a collection of statistical functions for the analysis of population genomic data.
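As an example of the kind of statistic computed in population genomics, the Python sketch below evaluates a textbook fixation index, F_ST = (H_T - H_S) / H_T, for a single biallelic locus. It assumes equally sized subpopulations and known allele frequencies, and the function name is hypothetical; it does not reproduce HierFstat's estimators, which are not detailed in this profile.

```python
def fst_biallelic(subpop_freqs):
    """Textbook F_ST for one biallelic locus.

    subpop_freqs: allele frequency of the same allele in each subpopulation.
    Assumes equal subpopulation sizes and no sampling correction.
    """
    n = len(subpop_freqs)
    mean_freq = sum(subpop_freqs) / n
    # Expected heterozygosity of the pooled population.
    h_total = 2 * mean_freq * (1 - mean_freq)
    # Mean expected heterozygosity within subpopulations.
    h_within = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    if h_total == 0:
        return 0.0  # locus is monomorphic overall, no differentiation to measure
    return (h_total - h_within) / h_total


identical = fst_biallelic([0.5, 0.5])    # no differentiation
fixed = fst_biallelic([0.0, 1.0])        # alternative alleles fixed
partial = fst_biallelic([0.2, 0.8])      # intermediate differentiation
```

Values close to 0 indicate subpopulations with similar allele frequencies, while values close to 1 indicate strong differentiation.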

Selected publications
Antoniazza S, et al. Natural selection in a postglacial range expansion: the case of the colour cline in the European barn owl. Mol Ecol 2014;23:5508-23.

Pellissier L, et al. Soil fungal communities of grasslands are environmentally structured at a regional scale in the Alps. Mol Ecol 2014;23:4274-90.

Alcala N, et al. On the transition of genetic differentiation from isolation to panmixia: what we can learn from GST and D. Theor Popul Biol 2014;93:75-84.

Jérôme Goudet, Population Genetics and Genomics Group, University of Lausanne

Gaston Gonnet, Computational Biochemistry Research Group, ETH Zurich

What do they do?
The primary research theme of Jensen’s group is to draw statistical inference from DNA polymorphism data, and more specifically, to describe the processes which determine the amount and distribution of genetic variation within and between populations. Lab members work on both applied and theoretical problems, in fields ranging from population genomics to medical genetics.

The group focuses on developing statistical methodology to infer the parameters of positive selection for specific sites in the genome, as well as on characterizing the full distribution of fitness effects of all new, segregating and fixed mutations in the genome.

2014 highlights
In 2014, Jensen’s team had a particular focus on estimating selection parameters from time-sampled population data, be it via ancient DNA or clinically/experimentally sampled populations. This resulted in the program WFABC (hosted on their lab website: http://jensenlab.epfl.ch), which has been demonstrated to outperform other existing time-sampled based approaches. This method was further utilized to characterize the evolution of drug resistance (to the antiviral drug oseltamivir) in time-sampled data from influenza virus.
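The general principle of estimating a selection coefficient from time-sampled allele frequencies can be sketched with a plain rejection scheme of Approximate Bayesian Computation. The Python example below is a generic illustration under stated assumptions, not the WFABC program: the haploid Wright-Fisher model, the flat prior on s, the distance measure, the observed frequencies and all function names are hypothetical choices.

```python
import random


def simulate_frequencies(s, pop_size, p0, sample_generations, rng):
    """Wright-Fisher trajectory with genic selection, read at the sampled generations."""
    p, trajectory = p0, {}
    for generation in range(max(sample_generations) + 1):
        if generation in sample_generations:
            trajectory[generation] = p
        # Deterministic change due to selection (fitness 1 + s versus 1).
        expected = p * (1 + s) / (1 + s * p)
        # Genetic drift: binomial sampling of pop_size gene copies.
        copies = sum(rng.random() < expected for _ in range(pop_size))
        p = copies / pop_size
    return [trajectory[g] for g in sample_generations]


def abc_rejection(observed, sample_generations, pop_size,
                  n_sims=300, n_keep=30, seed=1):
    """Return the n_keep prior draws of s whose simulations lie closest to the data."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        s = rng.uniform(-0.2, 0.5)  # assumed flat prior on the selection coefficient
        simulated = simulate_frequencies(s, pop_size, observed[0],
                                         sample_generations, rng)
        distance = sum(abs(o - x) for o, x in zip(observed, simulated))
        draws.append((distance, s))
    draws.sort()
    return [s for _, s in draws[:n_keep]]


# Hypothetical allele frequencies sampled at four time points.
accepted = abc_rejection([0.1, 0.3, 0.6, 0.85], [0, 10, 20, 30], pop_size=100)
estimate = sum(accepted) / len(accepted)
```

The accepted draws approximate the posterior distribution of s; their mean is one possible point estimate. Dedicated tools typically rely on purpose-built summary statistics rather than the raw frequency distance used here.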

Selected publications

Foll M, et al. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet 2014;10:e1004185.

Bank C, et al. A Bayesian MCMC approach to assess the complete distribution of fitness effects of new mutations: uncovering the potential for adaptive walks in challenging environments. Genetics 2014;196:841-52.

Jensen JD. On the unfounded enthusiasm for soft selective sweeps. Nat Commun 2014;5:5281.

What do they do?

Henrik Kaessmann and his group pursue integrated bioinformatics projects regarding the functional evolution of mammalian genomes based on publicly available genomic data as well as various experimental data sets generated by the wet lab unit of the group.

The main focus is on the evolution of gene expression levels, the origin and evolution of long non-coding RNA genes, the birth and functional evolution of microRNA genes, the evolution of sex chromosomes (for example, the evolution of X dosage compensation and of the Y chromosome), the evolution of germ cell transcriptomes, the functional evolution of new protein-coding genes, and the evolution of alternative splicing.

2014 highlights

Changes in gene expression are thought to underlie many of the phenotypic differences between species, but large-scale analyses of gene expression were, until recently, prevented by technological limitations. As sequels to their major initial study published in 2011, Kaessmann’s team assessed the functional evolution of long non-coding RNAs, microRNA editing and Y chromosomes.

Other projects pursued in 2014, based on unique transcriptomes generated by their group, include the evolution of newly emerged coding genes, untranslated regions (UTRs) and lizard sex chromosomes.

Selected publications

Cortez D, et al. Origins and functional evolution of Y chromosomes across mammals. Nature 2014;508:488-93.

Necsulea A, et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 2014;505:635-40.

Necsulea A, Kaessmann H. Evolutionary dynamics of coding and noncoding transcriptomes. Nat Rev Genet 2014;15:734-48.

Henrik Kaessmann
Functional Evolutionary Genomics Group
University of Lausanne

Jeffrey D. Jensen
Population Genetics Group
EPFL, Lausanne


What do they do?

Bernard Moret and his group develop, implement, and assess models and algorithms for genome and network evolution. They conduct foundational research in optimization as well as in stochastic models and associated algorithms; software produced in the course of the research is available to the community in source form on the group’s website.

2014 highlights

In 2014, Moret’s team made substantial progress on the very difficult problem of computing optimal solutions for the evolutionary history of groups of large genomes under various models of rearrangements and duplication of genes. Their approaches now allow them to compute maximally parsimonious solutions for positional orthology assignments in groups of 100 or more genomes, with the number of genes ranging from 2,000 to 30,000 (to appear in RECOMB’15, the 19th Annual International Conference on Research in Computational Molecular Biology).

Selected publications

Shao M, et al. An exact algorithm to compute the DCJ distance for genomes with duplicate genes. J Comput Biol; Epub ahead of print.

Ghiurcuta CG, Moret BM. Evaluating synteny for improved comparative studies. Bioinformatics 2014;30:i9-18.

Nair NU, et al. Study of cell differentiation by phylogenetic analysis using histone modification data. BMC Bioinformatics 2014;15:269.

What do they do?

Research in Marc Robinson-Rechavi’s group is mainly focused on linking the evolution of animal development to genome evolution. The group develops databases for evolutionary biology, and studies genome evolution in vertebrates.

2014 highlights

The team develops Bgee, a database of gene expression evolution (see also p. 17 and http://bgee.org/). Bgee has grown from 5 to 17 species, with a wealth of RNA sequencing data that is quality controlled, mapped and precomputed for expression calls. The increase in species numbers was a result of the group’s collaboration with the Gene Ontology and other projects on the anatomical Uberon ontology, as well as the group’s new methods for the rapid development of new developmental and anatomical annotations. All resources and developments linked to Bgee are being made publicly available on GitHub. https://github.com/BgeeDB

With other biocuration groups, Robinson-Rechavi’s team produced an ontology of confidence information, for the better use of annotated data from all databases (CIO, available on Bgee GitHub).

The group also continues to develop high-performance computing tools for the detection of natural selection in protein coding genes, and uses them to enrich their database of positive selection.

Selectome: http://selectome.unil.ch/

Selected publications

Haendel MA, et al. Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. J Biomed Semantics 2014;5:21.

Rosikiewicz M, et al. IQRray, a new method for Affymetrix microarray quality control, and the homologous organ conservation score, a new benchmark method for quality control metrics. Bioinformatics 2014;30:1392-9.

Roux J, et al. Patterns of positive selection in seven ant genomes. Mol Biol Evol 2014;31:1661-85.

Bernard Moret
Laboratory for Computational Biology and Bioinformatics
EPFL, Lausanne

Marc Robinson-Rechavi
Evolutionary Bioinformatics Group
University of Lausanne

What do they do?

Nicolas Salamin and his group estimate and use the evolutionary relationships between species to investigate the processes affecting species evolution. In particular, they are looking at the ecological, genomic and morphological factors that limit and constrain speciation and adaptation.

The group focuses on phylogenetic reconstruction methods, clownfish and plant genomics, the estimation of positive selection on genes, modelling the evolution of DNA sequences and phenotypes, the mode and tempo of species evolution, and the spatially explicit evolution of diversity.

2014 highlights

The group is developing new approaches to estimate the rate of diversification of species. They use complex Bayesian approaches to incorporate fossil information and link it with phylogenetic methods to better estimate the tempo of species evolution through time. These developments are important to understand the factors that are influencing the appearance and extinction of species over time. The method was implemented in a Python software package called pyRate.

In 2014, the models implemented in pyRate were extended by developing a new Bayesian approach that can estimate a standard heterogeneous birth-death process from fossil data. The advantage of such an approach is that the birth-death process is directly comparable with estimates done on phylogenetic trees, and it could provide additional information to infer the divergence times of the nodes in a phylogenetic tree estimated from molecular data.

The Bayesian approach is very flexible and Salamin’s team incorporated several models to fully account for the heterogeneity in the tempo of species evolution. This includes allowing for shifts of speciation and extinction rates during species evolution, but also the association with external factors such as climate variation or competition from other clades.

The group’s models fully complement existing approaches and are currently used to estimate the evolution of large groups of mammals such as canids or the origins of flowering plants.
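As an illustration of what a birth-death process describes, the sketch below simulates the number of species in a clade through time, with each species giving rise to a new one at a constant speciation rate and disappearing at a constant extinction rate. It is a forward simulation under assumed constant rates, not the Bayesian estimation performed by pyRate, and the function name and parameters are hypothetical.

```python
import random


def simulate_birth_death(birth_rate, death_rate, n0, t_max, seed=0):
    """Simulate species richness under a constant-rate birth-death process.

    Illustrative assumption: every species speciates at rate birth_rate and
    goes extinct at rate death_rate, independently of all others.
    Returns a list of (time, number_of_species) pairs, one per event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    per_species_rate = birth_rate + death_rate
    if per_species_rate <= 0:
        return history  # nothing can happen without events
    while n > 0:
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(n * per_species_rate)
        if t > t_max:
            break
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth_rate / per_species_rate:
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


if __name__ == "__main__":
    run = simulate_birth_death(birth_rate=0.3, death_rate=0.2, n0=5, t_max=10.0, seed=2)
    print("events:", len(run) - 1, "final richness:", run[-1][1])
```

Estimation from fossil or phylogenetic data addresses the inverse problem: finding the speciation and extinction rates, and possible shifts in them, that best explain the observed record.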

Selected publications

Dib L, et al. Evolutionary footprint of coevolving positions in genes. Bioinformatics 2014;30:1241-9.

Silvestro D, et al. Bayesian estimation of speciation and extinction from incomplete fossil occurrence data. Syst Biol 2014;63:349-67.

Zaheri M, et al. A generalized mechanistic codon model. Mol Biol Evol 2014;31:2528-41.

What do they do?

The team led by Tanja Stadler develops phylogenetic tools to understand evolutionary, epidemiological and ecological processes from genetic sequencing data. They focus on defining and analysing stochastic models, implementing computational methods, analysing empirical data and discussing new insights with clinicians, public health policy makers, ecologists and palaeontologists.

The group addresses questions ranging from epidemiology, public health and medicine, to ecology, evolution and language evolution.

2014 highlights

Tanja Stadler’s group developed the add-on “phylodynamics” for the phylogenetic software platform BEAST. The main purpose of this add-on is to quantify epidemiological and evolutionary dynamics using genetic sequencing data from pathogens. For example, the group used it to quantify the early spread of Ebola in Sierra Leone. The results are published in PLoS Currents: Outbreaks (see Selected publications), and were widely covered by the media.

For more news: www.bsse.ethz.ch/cevo/research/west-africa-ebov-epidemic.html.

Selected publications

Heath T, et al. The fossilized birth-death process for coherent calibration of divergence-time estimates. Proc Natl Acad Sci USA 2014;111:E2957-66.

Stadler T, et al. Insights into the early epidemic spread of Ebola in Sierra Leone provided by viral sequence data. PLoS Curr 2014;6.

Bošková V, et al. Inference of epidemiological dynamics based on simulated phylogenies using birth-death and coalescent models. PLoS Comput Biol 2014;10:e1003913.

Nicolas Salamin
Computational Phylogenetics Group
University of Lausanne

Tanja Stadler
Computational Evolution Group
D-BSSE, ETH Zurich, Basel


What do they do?

Andreas Wagner and his team study the evolution and evolvability of biological systems at all levels of biological organization, from genes and genomes to biological networks and whole organisms. The group develops bioinformatics tools for the integration of data from a variety of sources, such as comparative whole-genome sequence data, microarray expression data and high-throughput protein interaction data.

2014 highlights

Gene regulation is central to the development of all organisms, and dysregulation of genes is behind many genetic diseases. The group studies the principles by which proteins called transcription factors (TF) bind DNA and regulate genes, in order to understand how changes in transcription factor binding sites create novel patterns of gene regulation.

The variation in genetic sequences which determine the binding of TF is an important facet of evolution. However, the degree to which a genome is robust, or in other words able to withstand changes, and how robustness affects evolution is unclear.

In a paper published in Science, the team investigated the empirical support for mutational robustness by examining TF binding in the mouse and yeast genomes. A network analysis of the degree of variation revealed that the sites with the highest affinity for TF binding exhibit the greatest tolerance for mutations, whereas low-affinity sites exhibit greater sensitivity to mutation. Thus, while mutational robustness and evolvability are antagonistic at the genotypic level, they are synergistic at the phenotypic level.

Selected publications

Payne JL, Wagner A. The robustness and evolvability of transcription factor binding sites. Science 2014;343:875-7.

Wagner A. A genotype network reveals homoplastic cycles of convergent evolution in influenza A (H3N2) evolution. Proc Biol Sci 2014;281:20132763.

Szövényi P, et al. Efficient purging of deleterious mutations in plants with haploid selfing. Genome Biol Evol 2014;6:1238-52.

What do they do?

Daniel Wegmann and his group characterize the evolutionary and ecological processes which shape the realm of biological diversity observed today. To achieve this, they design and evaluate new statistical and computational approaches to infer complex evolutionary histories, and apply them to the wealth of data that is currently generated.

They focus on four themes of research: numerical algorithms for inference problems, inferring evolutionary histories, the evolution of morphological characters, and human genetics and disease.

2014 highlights

The group develops and maintains ABCtoolbox, a versatile and user-friendly software package for Approximate Bayesian Computation (ABC). ABC is a numerical inference method that bypasses analytical evaluations of likelihood functions, and is hence particularly suited in situations in which the likelihood function is either unknown or cannot be solved numerically.

The group has just finished developing an ABC algorithm suitable for inference problems of very high dimensionality, and which is available through ABCtoolbox. This algorithm will be particularly useful in the field of genomics, where many parameters are inferred for each locus.

As an example, the group is currently interested in understanding the strength of selection that acts upon each individual position in the genome of viruses when exposed to an antiviral drug. Since the demographic history of the virus population is a confounding factor affecting all positions, Wegmann and his team would like to learn about that history while learning about the strength of selection at each individual locus. Thanks to the group’s new algorithm, such a joint inference is now possible and computationally efficient.
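The principle behind ABC can be shown with its simplest variant, rejection sampling: draw a parameter from the prior, simulate data with it, and keep the parameter only if the simulated data lie close enough to the observed data. The sketch below is a generic toy example under stated assumptions (a uniform prior, a normal model with known standard deviation, the sample mean as summary statistic); it does not use ABCtoolbox, it is not the group's high-dimensional algorithm, and all function names are hypothetical.

```python
import random


def abc_rejection(observed, simulate, prior_sample, distance, tolerance, n_draws, seed=0):
    """Return the prior draws whose simulated summary is within tolerance."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # 1. draw a candidate parameter
        simulated = simulate(theta, rng)   # 2. simulate data under it
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)         # 3. keep it if the data match
    return accepted


# Toy problem (assumption for illustration): infer the mean of a normal
# distribution with known standard deviation 1 from the mean of 50 values.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    values = [rng.gauss(theta, 1.0) for _ in range(50)]
    return sum(values) / len(values)


def distance(simulated, observed):
    return abs(simulated - observed)


observed_mean = 2.0
posterior = abc_rejection(
    observed_mean, simulate, prior_sample, distance,
    tolerance=0.1, n_draws=5000, seed=42,
)
posterior_mean = sum(posterior) / len(posterior)

if __name__ == "__main__":
    print("accepted draws:", len(posterior))
    print("approximate posterior mean:", round(posterior_mean, 2))
```

No likelihood is ever evaluated; the accepted draws approximate the posterior distribution. The acceptance rate drops quickly as the number of parameters grows, which is why high-dimensional problems call for more elaborate algorithms.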

Selected publications

Adrion JR, et al. Drosophila suzukii: the genetic footprint of a recent, world-wide invasion. Mol Biol Evol 2014;31:3148-63.

Zawistowski M, et al. Analysis of rare variant population structure in Europeans explains differential stratification of gene-based tests. Eur J Hum Genet 2014;22:1137-44.

Dussex N, et al. Postglacial expansion and not human influence best explains the population structure in the endangered kea (Nestor notabilis). Mol Ecol 2014;23:2193-209.

Andreas Wagner
Evolutionary Systems Biology Group
University of Zurich

Daniel Wegmann
Statistical and Computational Evolutionary Biology Group
University of Fribourg

Biological macromolecules such as DNA and proteins acquire a specific architecture in space. The 3D conformation they adopt is a direct consequence of their nucleic acid or amino acid sequence, respectively. A protein’s function is defined by its 3D structure.

Bioinformatics develops software that is able to model and predict a protein’s 3D structure, and hence deduce its probable function. Such tools are of great assistance in the field of drug design, for instance.


What do they do?

Simon Bernèche’s group uses molecular mechanics and statistical physics approaches to understand how the functions of proteins emerge from their 3D structure. They are especially interested in the mechanisms that regulate ion permeation in potassium channels and the resulting impact on neuron signalling.

The group focuses on elementary mechanisms of protein folding, permeation and gating mechanisms in ion channels, transport mechanisms in passive and active transporters, and stochastic simulation of membrane activity.

2014 highlights

The relation between the amino acid (aa) sequence of a protein and its 3D structure remains largely unknown. A lasting dream is to elucidate the aa-dependent driving forces that govern the protein folding process, and understand the structural properties of intrinsically disordered proteins, which play fundamental functional roles in human physiology.

By comparing molecular dynamics (MD) simulations and NMR residual dipolar coupling (RDC) data, the group studied the structure and dynamics of 9-residue-long peptides in which only the central residue was changed.

The originality of their approach was to use the RDC data to validate the all-atom MD simulations by clustering the sampled conformations according to their matching with the RDC data. This allowed them to extract from the simulations the relevant microscopic information that provides a detailed view of the folding mechanism.

At the atomic scale, the group showed that a single aa determines the local hydration level of the backbone. Bulkier aa prevent water molecules from interacting properly with the backbone polar groups, favouring the formation of hydrogen bonds between these polar groups and thus the folding of the peptide.

These findings are consistent with the general understanding that protein folding evolves from states in which the backbone polar groups are shielded from the solvent and thus preferably interact together. For the first time, such a mechanism was shown to take place at the level of a single aa. The calculations revealed how a short stretch of aa can act as a nucleus point from which folding is initiated. This local mechanism effectively reduces the conformational search space and explains how protein folding can be so efficient.

Selected publications

Thomson AS, et al. Initial steps of inactivation at the K+ channel selectivity filter. Proc Natl Acad Sci USA 2014;111:E1713-22.

What do they do?

The group led by Matteo Dal Peraro strives to understand the physical and chemical properties of complex biological systems, in particular their function with regard to structure. To this end, they use and develop multiscale molecular simulation methods and dynamic integrative modelling to investigate large molecular assemblies, mimicking conditions found in the cellular environment.

2014 highlights

In order to study the biophysics of biological membranes using models that closely mimic their real composition, the group has developed LipidBuilder, a web-based tool that allows the design of lipids of different nature and their assembly into models of cellular membrane portions. http://lipidbuilder.epfl.ch

LipidBuilder capabilities grew in 2014 with the possibility to:
1) design and store a large variety of lipid species;
2) construct asymmetric membrane bilayers of different composition, size and shape;
3) use several molecular force fields for molecular simulations of biological systems.

This framework enabled the lab to build more realistic models of membrane compartments, like those of the mitochondrion, the endoplasmic reticulum, or the bacterial membrane.

Another emerging line of research in the lab is the investigation of crowding effects on biomolecules. The cell is a crowded place, whose cytoplasm contains myriads of different molecules. Such a crowded environment affects molecular diffusion, and hence the molecular properties and events that depend on it, such as collision frequencies and subcellular localization. By using molecular simulations and NMR techniques, the group is investigating the impact of crowding agents of different nature on the structure, dynamics and function of proteins.

Selected publications

Spiga E, et al. Dissecting the effects of concentrated carbohydrate solutions on protein diffusion, hydration and internal dynamics. J Phys Chem B 2014;118:5310-21.

Lemmin T, et al. Perturbations of the straight transmembrane α-helical structure of the amyloid precursor protein affect its processing by γ-secretase. J Biol Chem 2014;289:6763-74.

Seitz P, et al. ComEA is essential for the transfer of external DNA into the periplasm in naturally transformable Vibrio cholerae cells. PLoS Genet 2014;10:e1004066.

Simon Bernèche
Computational Biophysics Group
University of Basel

Matteo Dal Peraro
Laboratory for Biomolecular Modelling
EPFL, Lausanne

What do they do?

The Molecular Modelling group develops and employs techniques for the computer-aided rational design of proteins or small molecules, for research and treatment of human diseases, mostly in the field of oncology.

2014 highlights

SwissDrugDesign is a large collection of tools covering all aspects of computer-aided drug design (see also p. 16). Among them:
1) SwissDock predicts the molecular interactions that may occur between a target protein and a small molecule. swissdock.ch
2) SwissParam provides topology and parameters for the molecular modelling of drug-like molecules. swissparam.ch
3) SwissSidechain gathers information on hundreds of commercially available non-natural sidechains for in silico peptide design. swisssidechain.ch
4) SwissBioisostere collects several million molecular substructural replacements extracted from the literature. swissbioisostere.ch

In 2014, the group released a new web service, SwissTargetPrediction, to predict the targets of bioactive small molecules. This is useful to understand the molecular mechanisms which underlie a given bioactivity, to rationalize possible side effects and to predict off-targets. swisstargetprediction.ch

The development of a new tool, SwissADME, was initiated to predict the physicochemical, PK and PD properties and drug likeness of real or virtual organic compounds.

The group is also active in the field of protein engineering and engineered several T-cell receptors with enhanced binding properties for two melanoma-related antigens. T-cells expressing these new receptors exhibit an increased ability to kill cancer cells. These T-cell receptors are now being tested in a mouse model.

The team is also developing inhibitors of IDO, a cancer target. They obtained several compounds showing a significant activity in a cancer mouse model. The group recently discovered new scaffolds of potential interest for the design of IDO inhibitors.

Selected publications

Gfeller D, et al. SwissTargetPrediction: a web server for target prediction of bioactive small molecules. Nucleic Acids Res 2014;42:W32-8.

Daina A, et al. iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. J Chem Inf Model 2014;54:3284-301.

Röhrig UF, et al. Detailed analysis and follow-up studies of a high-throughput screening for indoleamine 2,3-dioxygenase 1 (IDO1) inhibitors. Eur J Med Chem 2014;84:284-301.

What do they do?

Torsten Schwede and his team focus on the development of methods for modelling 3D protein structures and their applications in biomedical research. The main emphasis is on homology modelling approaches, using evolutionary information to model protein tertiary and quaternary structures, as well as protein-ligand interactions.

2014 highlights

SWISS-MODEL is a widely used server for generating 3D models of a protein (see also p. 16 and swissmodel.expasy.org). It uses information from homologous protein structures as templates to build models for target protein sequences. Over the course of evolution, the quaternary structure of proteins is less conserved than their tertiary structure. Therefore, even in the same protein family, a range of different oligomeric assemblies can frequently be observed, which poses a challenge for the modelling and prediction of protein structures. The team has developed an improved approach to predict the oligomeric structure of target proteins.

The accuracy of this new approach is being evaluated by CAMEO, a project developed by the group to benchmark the accuracy of protein structure prediction servers. cameo3d.org

Homology models find widespread applications in biomedical research where experimental structures are not available. The suitability of a model for a specific application depends on its accuracy, and model quality estimation is an essential step in structure prediction. To this end, the group is developing QMEANBrane as a statistical potential of mean force. Most current approaches for estimating model quality are limited to soluble proteins. However, as membrane proteins play crucial roles in many biological processes and are important drug targets, the group has extended their QMEAN approach to membrane protein structures.

Torsten Schwede’s team is developing the Protein Model Portal as part of the Structural Biology Knowledgebase. proteinmodelportal.org

Torsten Schwede is also responsible for the sciCORE center (see p. 20), and for organizing the SIB’s annual [BC]2 Basel Computational Biology Conference. bc2.ch

Selected publications

Biasini M, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res 2014;42:W252-8.

Studer G, et al. Assessing the local structural quality of transmembrane protein models using statistical potentials (QMEANBrane). Bioinformatics 2014;30:i505-11.

Schmidt T, et al. Modelling three-dimensional protein structures for applications in drug design. Drug Discov Today 2014;19:890-7.

Olivier Michielin & Vincent Zoete
Molecular Modelling Group
University of Lausanne

Torsten Schwede
Computational Structural Biology Group
University of Basel


No biological macromolecule, nor living being, works on its own but interacts with many others which, in turn, interact with others. This is the field of systems biology.

Bioinformatics develops mathematical models that can illustrate such systems and even address their evolution in time. Such tools can help to delineate metabolic pathways, for instance, or predict what could happen if a given species is introduced into a pre-existing ecological system.

What do they do?

Bastien Chopard and his team model and simulate complex systems and natural phenomena. They focus on multiscale modelling and computing, high performance computing, cellular automata, lattice Boltzmann methods, multi-agent systems, and optimization techniques and machine learning.

The group works on massively parallel computers, like the Scylla and Baobab clusters at the University of Geneva, or the CADMOS supercomputer. They also develop resources needed to perform simulations of large natural systems.

Finally, Bastien Chopard’s group develops the Lattice Boltzmann solver on massively parallel machines, as well as a multiscale modelling and computing infrastructure (MAPPER FP7 project).

2014 highlights

During the course of 2014, the group developed a numerical model of epithelium to study its elastic properties.

They also provided a physical description of the adhesion and aggregation of platelets in the presence of shear induced diffusion, and determined the adhesion and aggregation rate of platelets by comparing experimental results and numerical simulations.

Finally, they determined the threshold of wall shear stress in aneurysms in order to assess whether cerebral aneurysms in given patients will thrombose or not.

Selected publications

Montandon S, et al. Two waves of anisotropic growth generate enlarged follicles in the spiny mouse. EvoDevo 2014;5:33.

Florez-Valencia L, et al. A new model for the construction of virtual fully resolved flow-diverters and their effect in blood flow simulations for studying intracranial aneurysms. Submitted.

Anzai H, et al. Combinational optimization of strut placement for intracranial stent using a realistic aneurysm. J Flow Control 2014;2:66-76.

What do they do?

The group led by Manfred Claassen carries out research to understand the composition of heterogeneous cell populations and their function in the context of cancer and immune biology. Their research can then be used to pinpoint therapeutic targets with a view to designing drugs.

2014 highlights

Manfred Claassen’s lab develops statistics, machine learning and dynamic systems methods to describe the cellular dynamics and functionally relevant cell types in cancer and immune biology. These approaches are implemented as program prototypes and ideally further refined to widely applicable software.

In 2014, the group developed an approach to describe heterogeneous cell populations such as tissues by means of cell-transcending networks that make explicit the differentiation-induced relationships among related cell types. This approach, for instance, facilitates the detection of rare and yet disease-relevant cell types, such as stem cells.

Selected publications

Arvaniti E, Claassen M. Markov network structure learning via ensemble-of-forests models. Uncertain Artif Intell 2014;2014:42-51.

De Vargas Roditi L, Claassen M. Computational and experimental single cell biology techniques for the definition of cell type heterogeneity, interplay and intracellular dynamics. Curr Opin Biotechnol 2014;34C:9-15.

Bastien Chopard
Scientific and Parallel Computing Group
University of Geneva

Manfred Claassen
Computational Single Cell Biology Group
ETH Zurich


What do they do?

The group led by Rudi Gunawan develops mathematical tools for systems modelling and analysis of chemical and biological networks. Their research spans many aspects of biology, from gene and metabolic networks in single cells to the ageing process in humans and animals and to bioreactors of cell culture fermentation in the pharmaceutical industry.

2014 highlights

Gunawan and his team developed TRaCE (Transitive Reduction and Closure Estimation) as a novel inference method for gene regulatory networks (GRNs) from expression data. They designed TRaCE to specifically address underdetermined GRN inference by employing an ensemble inference strategy. TRaCE produces two networks, the most complicated (largest) and the least complicated (smallest), among the set of networks that agree with expression data from gene perturbation experiments. The ensemble of consistent networks can be derived from these bounds, and the difference between the largest and the smallest networks describes the uncertainty in the network inference. TRaCE outperformed top algorithms in DREAM4 and DREAM5 network inference challenges in inferring GRN from multi-gene knock-out data.
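The two graph operations named in TRaCE can be illustrated on a small directed network. The sketch below computes the transitive closure (every gene linked to all genes it can reach) and the transitive reduction (the fewest edges that preserve the same reachability) of an acyclic graph. Treating the closure as the "largest" and the reduction as the "smallest" network is an assumption made here for illustration; the sketch does not reproduce the TRaCE algorithm, and the gene names and function names are hypothetical.

```python
def transitive_closure(edges, nodes):
    """Return all pairs (u, v) such that v is reachable from u."""
    reach = {u: set() for u in nodes}
    for u, v in edges:
        reach[u].add(v)
    # Warshall-style propagation: if i reaches k, i reaches all that k reaches.
    for k in nodes:
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return {(u, v) for u in nodes for v in reach[u]}


def transitive_reduction(edges, nodes):
    """Return the minimal edge set with the same reachability (acyclic graphs only)."""
    closure = transitive_closure(edges, nodes)
    # Keep (u, v) only if no intermediate gene w gives a path u -> w -> v.
    return {
        (u, v)
        for (u, v) in closure
        if not any((u, w) in closure and (w, v) in closure for w in nodes)
    }


genes = ["g1", "g2", "g3", "g4"]
regulations = {("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4")}

largest = transitive_closure(regulations, genes)
smallest = transitive_reduction(regulations, genes)
uncertain = largest - smallest  # edges present in one bound but not the other

if __name__ == "__main__":
    print("largest network:", sorted(largest))
    print("smallest network:", sorted(smallest))
    print("uncertain edges:", sorted(uncertain))
```

In this toy network the edge from g1 to g3 cannot be told apart from the indirect path through g2 by reachability alone, which is the kind of ambiguity an ensemble of consistent networks makes explicit.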

The group also developed a new assessment strategy for comparing network inference performance based on an ensemble inference framework, by addressing the question: how to compare inference methods when the inference problem is underdetermined and, thus, does not have a (unique) solution? In the new assessment, methods are not penalized for errors (false positive/false negative) which are associated with gene regulatory interactions that are not inferable from gene perturbation data. The inferability of regulatory interactions from a given perturbation experiment was determined using TRaCE.

The group also developed a new tool, REDEMPTION, for the creation of kinetic models of biological networks.

Selected publications

Siegenthaler C, Gunawan R. Assessment of network inference methods: how to cope with an underdetermined problem. PLoS One 2014;9:e90481.

Ud-Dean SM, Gunawan R. Ensemble inference and inferability of gene regulatory networks. PLoS One 2014;9:e103812.

Liu Y, Gunawan R. Parameter estimation of dynamic biological network models using integrated fluxes. BMC Syst Biol 2014;8:127.

What do they do?

The team concentrates on the mathematical modelling of complex cellular processes and develops computational methods for the integration of experimental information from multiple levels. The aim is to provide experimentally testable hypotheses and targets for purposeful redesign and manipulation of these processes.

2014 highlights

BNICE is a computational framework for the identification of novel metabolic reactions and pathways. It uses a database of over 580 generalized enzyme rules, and employs a pathway generation algorithm for the generation of all possible routes from known metabolic intermediates to target natural and man-made chemicals. These routes are composed of generalized enzyme reactions. Hence, any of the novel reactions in these routes can be associated with a class of known enzymes, which catalyse similar reactions. BNICE enables researchers to explore the potential of nature’s chemical toolbox, as well as discover and exploit its diversity.

In 2014, using the principles from BNICE, the group introduced a computational framework for the atom-level reconstruction of metabolic networks from in silico labelled substrates, which allows the tracking of the atoms’ fate through the reconstructed metabolic network. The method was applied for the reconstruction of an atom-level representation of core metabolic networks of E. coli.

Starting from the known biochemistry of E. coli metabolism, the team used BNICE to search for novel reactions. Indeed, BNICE applies known biotransformation rules to generate a “super” network which captures all possible reactions, given a set of E. coli core metabolites and known biotransformation rules present in E. coli. This super network captures all the known E. coli reactions as well as novel pathways that can serve as potential novel biosynthesis pathways to valuable chemicals.

The group also develops ORACLE, a framework for the development of large-scale and towards-genome-scale kinetic models that account for uncertainty in available experimental data.

Selected publications

Hadadi N, et al. A computational framework for integration of lipidomics data into metabolic pathways. Metab Eng 2014;23:1-8.

Almquist J, et al. Kinetic models in industrial biotechnology – improving cell performance. Metab Eng 2014;24:38-60.

Soh KC, Hatzimanikatis V. Constraining the flux space using thermodynamics and integration of metabolomics data. Methods Mol Biol 2014;1191:49-63.

Rudiyanto Gunawan
Chemical and Biological Systems Engineering Group
ETH Zurich

Vassily Hatzimanikatis
Laboratory of Computational Systems Biotechnology
EPFL, Lausanne

What do they do? Dagmar Iber and her team focus on the development of data-based models of biological signalling networks to gain a greater understanding of the dynamics and evolution of cellular signalling. The experimentally validated models are used to investigate the biological system, as well as to address more general questions regarding the evolution and the design of cellular signalling networks. The group is particularly interested in the mechanisms that enable the spatiotemporal control of developmental processes such as organ development in the growing embryo.

2014 highlights
The Iber group focuses on key processes in developmental biology. Highlights from the last year include the delineation of a number of developmental mechanisms. These include:
1) a mechanism by which developmental patterns can scale according to the size of a (growing) embryonic domain;
2) the mechanism by which digit asymmetry is lost in bovine limbs, but not in murine or human limbs;
3) a receptor-ligand based mechanism that allows Turing patterns to emerge for virtually any physiological parameter value.
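To see why a mechanism that yields Turing patterns for almost any parameter value is noteworthy, it helps to recall how restrictive the classical conditions are. The sketch below checks the textbook conditions for diffusion-driven instability in a generic two-species reaction-diffusion system. It is not the group's receptor-ligand model; the function name and the example Jacobian are assumptions chosen for illustration.

```python
def turing_unstable(f_u, f_v, g_u, g_v, d_u, d_v):
    """Classical two-species conditions for diffusion-driven instability.

    f_u, f_v, g_u, g_v are the partial derivatives of the two reaction
    terms at the homogeneous steady state; d_u and d_v are the diffusion
    coefficients of the two species.
    """
    trace = f_u + g_v
    det = f_u * g_v - f_v * g_u
    # The steady state must be stable when diffusion is absent.
    stable_without_diffusion = trace < 0 and det > 0
    # Diffusion must destabilize at least one spatial wavenumber.
    mixed = d_v * f_u + d_u * g_v
    unstable_with_diffusion = mixed > 0 and mixed ** 2 > 4 * d_u * d_v * det
    return stable_without_diffusion and unstable_with_diffusion

# Hypothetical activator-inhibitor Jacobian, tested at three diffusion ratios.
results = [turing_unstable(1.0, -2.0, 3.0, -4.0, 1.0, d_v) for d_v in (1.0, 10.0, 20.0)]
```

In this example only the largest diffusion ratio satisfies all four inequalities, which illustrates how narrow the patterning region of a generic system can be.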

In addition, Dagmar Iber and her team developed a computational framework to simulate models of organogenesis on growing embryonic domains. To enable the coupled simulation of signalling and tissue dynamics, the group also developed LBIBCell, a tightly coupled cell-based mechano-regulatory simulation tool. A cellular resolution of the tissue domain is important to describe adequately the impact of cell-based events, such as cell division, cell-cell interactions and spatially restricted signalling events.

Selected publications
Menshykau D, et al. An interplay of geometry and signaling enables robust lung branching morphogenesis. Development 2014;141:4526-36.

Fried P, Iber D. Dynamic scaling of morphogen gradients on growing domains. Nat Commun 2014;5:5077.

Lopez-Rios J, et al. Attenuated sensing of SHH by Ptch1 underlies evolution of bovine limbs. Nature 2014;511:46-51.

What do they do? Christian Mazza and his team study biological networks by focusing both on their geometrical structure (graphs, patterns) and on their underlying dynamics (deterministic and stochastic). Typical examples are Lotka-Volterra dynamics on complex ecological networks and formation of the plant vascular system.
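As a minimal sketch of deterministic Lotka-Volterra dynamics on a network, the example below integrates the generalized Lotka-Volterra equations with a fourth-order Runge-Kutta step. The two-node predator-prey network, the growth rates and the interaction matrix are illustrative assumptions, not data from the group.

```python
import math

def glv_rhs(x, r, a):
    """Generalized Lotka-Volterra: dx_i/dt = x_i * (r_i + sum_j a[i][j] * x_j)."""
    n = len(x)
    return [x[i] * (r[i] + sum(a[i][j] * x[j] for j in range(n))) for i in range(n)]

def rk4_step(x, r, a, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = glv_rhs(x, r, a)
    k2 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], r, a)
    k3 = glv_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], r, a)
    k4 = glv_rhs([xi + dt * k for xi, k in zip(x, k3)], r, a)
    return [xi + dt / 6.0 * (p + 2 * q + 2 * s + t)
            for xi, p, q, s, t in zip(x, k1, k2, k3, k4)]

def simulate(x0, r, a, dt, steps):
    x = list(x0)
    for _ in range(steps):
        x = rk4_step(x, r, a, dt)
    return x

def invariant(x):
    """Quantity conserved by the two-species example below."""
    return sum(xi - math.log(xi) for xi in x)

# Illustrative network: node 0 is a prey species, node 1 its predator.
r = [1.0, -1.0]
a = [[0.0, -1.0],
     [1.0, 0.0]]
start = [1.5, 1.0]
final = simulate(start, r, a, dt=0.01, steps=1000)
```

The interaction matrix `a` plays the role of the network: a non-zero entry `a[i][j]` is an edge through which species `j` affects species `i`, so larger ecological networks only require a larger matrix.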

The group also provides services in statistics for the life science community in Fribourg and is working on several projects related to chemical reaction networks.

2014 highlights
Christian Mazza published a book with a colleague, Michel Benaïm, Professor of mathematics at the University of Neuchâtel: Stochastic Dynamics for Systems Biology (CRC Press). The book presents a systematic study of the many varied stochastic models used in systems biology today, and shows how mathematical models can lead to conceptual insights on cellular processes, such as signalling and metabolic pathways, phosphorylation processes, genetic switches and gene transcription.

Selected publications
Feller C, et al. Pattern formation in auxin flux. J Math Biol 2014;68:879-909.

Mazza C, Benaïm M. Stochastic dynamics for systems biology. CRC Press, Taylor & Francis Group, 2014.

Dagmar Iber Computational Biology Group / D-BSSE, ETH Zurich, Basel

Christian Mazza Biomathematics and Computational Biology Group / University of Fribourg


What do they do? The laboratory led by Michel Milinkovitch combines evolutionary developmental biology (EvoDevo) and the study of physical processes to understand the mechanisms which generate complexity and diversity in the living world.

The group specializes in non-classical model species in reptiles and mammals and integrates data and analyses from comparative genomics, molecular developmental genetics, physical experiments, as well as from computer modelling and numerical simulations.

Most of the group’s research requires intensive high-performance computing such as the comparative analysis of genome sequences among species, the reconstruction of super-high resolution coloured 3D geometry of biological objects, and the simulation of the interaction between skin cells and photons for the generation of structural colours.

2014 highlights
In 2014, Milinkovitch’s team finalized the development of R2OBBIE-3D, an integrated system that combines a robotic arm, a high-resolution digital colour camera, an illumination basket of high-intensity light-emitting diodes and state-of-the-art 3D reconstruction approaches.

They demonstrated that R2OBBIE generates accurate 3D models of biological objects between 1 and 100 cm, with colour-texture and geometric resolutions greater than 15 µm without the use of magnifying lenses. R2OBBIE has the potential to greatly improve quantitative analyses of pattern formation in living organisms in addition to providing multiple new applications in biomedical and forensic sciences. The results are to be published this year.

Selected publications
Montandon SA, et al. Two waves of anisotropic growth generate enlarged follicles in the spiny mouse. EvoDevo 2014;5:33.

Martins AF, et al. R2OBBIE-3D, a fast robotic high-resolution system for quantitative phenotyping of surface geometry and colour-texture. In press.

Teyssier J, et al. Photonic crystals cause active colour change in chameleons. Nat Commun 2015;6:6368.

What do they do? Félix Naef’s lab works in the field of computational and systems biology, in the areas of gene regulation, stochastic transcription and circadian rhythms. While their research is largely computational, they started an independent wet lab working on circadian gene expression and stochastic transcription using imaging in single cells.

2014 highlights
Circadian and cell cycles are two periodic, noisy, cell-autonomous processes with a period of about one day. Consequently, when these cycles run in parallel in the same cell, their coupling may lead to resonances or even synchronization. Observations of circadian variations in mitotic indices in mammalian cells and on the daytime-dependence of cell division in mouse liver have led to the hypothesis that the circadian cycle might gate cell-cycle progression. A better understanding of how the two systems mutually interact is currently of great interest, notably with regard to the role of circadian clocks in proliferating tissues, such as the epidermis, immune or stem cells.

As part of the group’s ongoing analyses of circadian cycles and transcriptional bursting in single mammalian cells, Félix Naef’s team has become very interested in the interactions of circadian and cell cycle oscillators. In particular, they hope to gain a better understanding of the dynamic consequences of possible mutual coupling between the two oscillators, which may lead to resonance or synchronization phenomena. The group developed a combined experimental and modelling approach to investigate the problem.

The group’s recent quantitative time-lapse imaging study of circadian cycles in dividing mammalian NIH3T3 cells clearly indicated that both oscillators tick in a tightly synchronized state. Contrary to their expectations, they showed that, in NIH3T3 cells, the cell cycle progression exerts a unilateral influence on the circadian clock, and not the opposite.
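How a one-way influence can produce a tightly synchronized state may be illustrated with two phase oscillators in which only the circadian phase feels the cell cycle. This is a toy model under stated assumptions, not the group's published model: the sinusoidal coupling, the periods of 20 h and 24 h, the coupling strength and the function name `phase_lag` are all hypothetical.

```python
import math

def phase_lag(period_cell_cycle, period_circadian, coupling, dt=0.01, hours=500.0):
    """Integrate phi = theta_cell_cycle - theta_circadian with forward Euler.

    Assumed toy model with one-way coupling (cell cycle -> circadian clock):
        d(theta_cc)/dt   = w_cc
        d(theta_circ)/dt = w_circ + coupling * sin(theta_cc - theta_circ)
    so that d(phi)/dt = (w_cc - w_circ) - coupling * sin(phi).
    """
    w_cc = 2.0 * math.pi / period_cell_cycle
    w_circ = 2.0 * math.pi / period_circadian
    phi = 0.0
    for _ in range(int(round(hours / dt))):
        phi += dt * ((w_cc - w_circ) - coupling * math.sin(phi))
    return phi

locked = phase_lag(20.0, 24.0, coupling=0.2)  # coupling exceeds the frequency mismatch
free = phase_lag(20.0, 24.0, coupling=0.0)    # uncoupled: the phases drift apart
```

With coupling, the phase difference settles at a constant value given by `asin((w_cc - w_circ) / coupling)`, so the circadian oscillator adopts the cell-cycle period; without coupling, the difference grows linearly in time.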

Selected publications
Bieler J, et al. Robust synchronization of coupled circadian and cell cycle oscillators in single mammalian cells. Mol Syst Biol 2014;10:739.

Mauvoisin D, et al. Circadian clock-dependent and -independent rhythmic proteomes implement distinct diurnal functions in mouse liver. Proc Natl Acad Sci USA 2014;111:167-72.

Quinodoz M, et al. Characteristic bimodal profiles of RNA polymerase II at thousands of active mammalian promoters. Genome Biol 2014;15:R85.

Michel Milinkovitch Artificial & Natural Evolutionary Development of Complexity Group / University of Geneva

Félix Naef Computational Systems Biology / EPFL, Lausanne

What do they do? Igor Pivkin and his team’s interests lie in the area of multiscale/multiphysics modelling, and parallel large-scale simulations of biological and physical systems. Their focus is on the development of new computational models and corresponding numerical methods suitable for the next generation of supercomputers.

The group’s current projects include stochastic multiscale modelling of motion, interaction, deformation and aggregation of cells under physiological flow conditions, biofilm growth, coarse grained molecular dynamics simulations, as well as modelling of transport processes in healthy and tumour-induced microcirculation.

2014 highlights
Electrostatic and polarization effects play an important role in many biological and physical systems, including membranes, vesicles and cells. The modelling of such effects on large length and time scales is of great interest to many scientific communities – from biophysics to bioengineering and applied mathematics to name but three.

Dissipative Particle Dynamics (DPD) is an efficient method for modelling mesoscale behaviour of systems at timescales larger than are currently accessible by atomistic methods. Polarizability has been introduced into some coarse-grained molecular dynamics simulation methods. However, it has not been done with DPD before.
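For readers unfamiliar with DPD, the sketch below computes the standard pairwise force between two particles as the sum of a soft conservative repulsion, a dissipative friction and a random kick, with the usual weight functions and fluctuation-dissipation relation. It covers plain DPD only; the electrostatics and Drude oscillators of a polarizable model are not included, and the function name and parameter values are assumptions for illustration.

```python
import math

def dpd_pair_force(r_i, r_j, v_i, v_j, a, gamma, kT, r_c, dt, theta):
    """Force on particle i due to particle j in standard DPD.

    theta is a zero-mean, unit-variance random number; the same value
    must be used for the pairs (i, j) and (j, i) so that the two forces
    cancel and momentum is conserved.
    """
    rij = [p - q for p, q in zip(r_i, r_j)]
    dist = math.sqrt(sum(c * c for c in rij))
    if dist >= r_c or dist == 0.0:
        return [0.0, 0.0, 0.0]  # no interaction beyond the cutoff
    unit = [c / dist for c in rij]
    vij = [p - q for p, q in zip(v_i, v_j)]
    w_r = 1.0 - dist / r_c               # weight of the random force
    w_d = w_r * w_r                      # weight of the dissipative force
    sigma = math.sqrt(2.0 * gamma * kT)  # fluctuation-dissipation relation
    conservative = a * w_r
    dissipative = -gamma * w_d * sum(u * v for u, v in zip(unit, vij))
    random_part = sigma * w_r * theta / math.sqrt(dt)
    magnitude = conservative + dissipative + random_part
    return [magnitude * u for u in unit]

# Illustrative parameter values in reduced units (hypothetical).
params = dict(a=25.0, gamma=4.5, kT=1.0, r_c=1.0, dt=0.01, theta=0.3)
f_ij = dpd_pair_force([0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], **params)
f_ji = dpd_pair_force([0.5, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], **params)
```

All three contributions act along the line joining the two particles and vanish at the cutoff `r_c`, which is what makes the interaction soft and allows the large time steps that give DPD its reach.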

The group developed a new polarizable coarse-grained water model for DPD, which employs long-range electrostatics and Drude oscillators. They expect their model to be used in a wide range of studies of physical and biological systems where polarization effects are important.

Selected publications
Peter EK, Pivkin IV. A polarizable coarse-grained water model for dissipative particle dynamics. J Chem Phys 2014;141:164506.

Peter EK, et al. How water layers on graphene affect folding and adsorption of TrpZip2. J Chem Phys 2014;141:22D511.

Cruz-Chu E, et al. Structure and response to flow of the glycocalyx layer. Biophys J 2014;106:232-43.

What do they do? Jörg Stelling’s group studies complex cellular networks to elucidate their design and operational principles. This involves the development of concepts and tools for network inference, system modelling and analysis, and the design of biological circuits with new functionalities. Bioinformatics methods are used for the large-scale analysis of cellular networks.

2014 highlights
With Vital-IT, the group developed the web resource MetaNetX.org for accessing, analysing and manipulating genome-scale metabolic networks (GSMs) and biochemical pathways. This resource integrates data from various public resources in a standardized format. The methods enable researchers to construct and analyse GSMs efficiently.

With regard to metabolic network construction and analysis methods, progress in 2014 included the development of a novel pattern-based approach for inferring features of metabolic reactions.

Identifying suitable patterns in complex biological interaction networks helps in understanding network functions and allows for predictions at the pattern level. By recognizing a known pattern, one can assign its previously established function. However, former approaches failed for previously unseen patterns, when patterns overlap, or when they are embedded into a new network context.

The team showed how to conceptually extend pattern-based approaches. A probabilistic framework decodes the implicit information in the networks’ metabolite patterns to predict metabolic functions. They demonstrated the predictive power by identifying “indicator patterns”; e.g. for enzyme classification, by predicting directions of novel reactions and known reactions in new network contexts, and by ranking candidate network extensions for filling gaps in the network.

Such methods can be used to improve genome annotations as well as metabolic network models. Other theoretical approaches for the modularization of biological networks, in view of experimental design for topology inference for example, will help to exploit structural network properties beyond metabolism.

Selected publications
Ausländer D, et al. A synthetic multifunctional mammalian pH sensor and CO2 transgene-control device. Mol Cell 2014;55:397-408.

Lang M, et al. Cutting the wires: modularization of cellular networks for experimental design. Biophys J 2014;106:321.

Ganter M, et al. Predicting network functions with nested patterns. Nat Commun 2014;5:3006.

Igor V. Pivkin Scientific Computing Group / Università della Svizzera italiana, Lugano

Jörg Stelling Computational Systems Biology Group / D-BSSE, ETH Zurich, Basel


What do they do? Mihaela Zavolan and her team combine experimental and computational approaches to study the regulation of gene expression in development, ageing and disease.

They develop:
1) experimental methods to isolate and quantify various types of RNAs;
2) computational methods to analyse data and make the results accessible to experimental biologists through web servers;
3) computational methods to predict the regulatory elements in genomes;
4) computational models to understand the dynamics of gene regulation in various conditions.

2014 highlights
Mihaela Zavolan’s group is one of the first to develop combined, experimental and computational approaches to capture regulatory interactions within cells, in particular between proteins and RNAs.

In this regard, they made substantial progress in characterizing the process of the 3’ end formation of mRNAs. Almost all cellular mRNAs rely on a specific protein complex for their termination. Most human genes can terminate transcription at multiple places, a process called alternative termination.

In 2014, the group described the sequence-specific component of the 3’ end processing complex, which had remained elusive for over 20 years. By carrying out high-throughput measurements of transcript and protein expression levels, the group further discovered that alternative polyadenylation does not serve to systematically increase or decrease gene expression, but rather that the effects are gene-specific. Finally, the group built a comprehensive catalogue of 3’ end processing sites in the human genome.

Another of the group’s main projects concerns gene regulation by the small regulatory RNAs called miRNAs. In 2014, the group developed a new method to predict miRNA target sites genome-wide and is currently extending the method to other small RNAs. They also contributed to the development of a novel experimental method for isolating miRNA binding sites in vivo.

Selected publications
Imig J, et al. miR-CLIP capture of a miRNA targetome uncovers a lincRNA H19-miR-106a interaction. Nat Chem Biol 2015;11:107-14.

Gruber AR, et al. Global 3’ UTR shortening has a limited effect on protein abundance in proliferating T cells. Nat Commun 2014;5:5465.

Hausser J, Zavolan M. Identification and consequences of miRNA-target interactions--beyond repression of gene expression. Nat Rev Genet 2014;15:599-612.

Mihaela Zavolan RNA Regulatory Networks Group / University of Basel

As technology develops, the quantity of data generated by researchers grows and needs to be not only stored but processed. Life scientists need the help of bioinformaticians to do this.

Consequently, academic institutions and research centres are gradually developing their own infrastructures that provide computational facilities for their researchers and develop software and databases, besides providing a link with industry and offering training.


What do they do? The S3IT unit led by Peter Kunszt provides IT infrastructure, software, tools and services to the research groups of the University of Zurich, including life sciences and medicine (see also p. 21). S3IT also takes part in national projects and cooperates with similar technology-oriented groups to ensure that its expertise is always up-to-date.

2014 highlights
The group is an infrastructure and service provider. Their “resource” is the local cluster and storage infrastructure, including scientists with bioinformatics expertise whom they provide to research groups as “embedded bioinformaticians”.

The resource is a specialized application and user support; a collaborative model, where IT experts from S3IT work closely with end users to address their computational and storage needs. To this end, S3IT provides access to a large-scale private cloud infrastructure for storing and processing research data.

In only one year, the team has been able to establish successful collaborations with over 130 end users in 45 research groups from 22 different departments at the University of Zurich. In particular, S3IT took part in two projects that are highly specialized in 3D imaging, making use of new Light Sheet Microscopy technology, and developed 3D segmentation and genealogy tracing algorithms in these projects.

With regard to infrastructure, Peter Kunszt and his team finished the public procurement process for their local Science Cloud infrastructure. The new infrastructure – to be installed during the course of 2015 – will have over 3,000 CPU cores and 1.5 PB of usable storage.

Selected publications
Wiewiórka MS, et al. SparkSeq: fast, scalable, cloud-ready tool for the interactive genomic data analysis with nucleotide precision. Bioinformatics 2014;30:2652-3.

Quandt A, et al. Using synthetic peptides to benchmark peptide identification software and search parameters for MS/MS data analysis. EuPA Open Proteom 2014;5:21-31.

Rost H, et al. pyOpenMS: A Python-based interface to the OpenMS mass-spectrometry algorithm library. Proteomics 2014;14:74-7.

What do they do? The SIS group provides services and resources for scientific computing in computing- and data-intensive fields of research, with a strong focus on providing services for life science research (see also p. 21).

SIS operates high-performance computing (HPC) clusters, provides solutions for research data management and develops data analysis workflows. The group also offers platforms for sharing information, and provides training and documentation for selected topics of scientific programming and software packages.

2014 highlights
SIS provides important HPC resources developed with a focus on flexibility and usability. Users have access to a large portfolio of compilers, libraries, open-source and commercial analysis applications.

In 2014, phase 1 of the new general purpose HPC cluster Euler with over 10,000 CPU cores, over 400 TB of fast, scalable disk space, and a peak performance of about 250 TFlops was successfully established and brought into production. It features a classical user interface for batch queuing, special support for very long-running jobs, and support for “computing clouds”. The cloud interface brings user flexibility to a new level as it allows running any platform and operating system users might need. The need for computing power continues to increase strongly, which is why SIS immediately started with the planning, design and procurement of phase 2.

The scientific workflow manager “BeeWM”, developed together with the University of Basel and the TargetInfectX project, reached a usable state in 2014. This system is “data-driven” by design and has advanced features to support selective re-computations of parts of the workflow when workflow nodes are changing, or when new data become available. It uses openBIS as a data management backbone and will be shipped with the next release version of openBIS. BeeWM is already being used in several projects which require support for heavy computational workflows in imaging and genomics, enabling both a high level of computational efficiency and reproducibility of results.
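The principle of selective re-computation can be sketched with a tiny workflow engine that fingerprints the inputs and version of each node and re-runs a node only when that fingerprint changes. This is not BeeWM's interface, which the text does not describe; the `Workflow` class, its methods and the example nodes are hypothetical.

```python
import hashlib

class Workflow:
    """Toy data-driven workflow: a node re-runs only if its inputs or version changed."""

    def __init__(self):
        self.nodes = {}     # name -> (function, upstream names, version)
        self.cache = {}     # name -> (fingerprint, result)
        self.executed = []  # names that actually ran during the last run()

    def add(self, name, func, deps=(), version=1):
        # Nodes must be registered after the nodes they depend on.
        self.nodes[name] = (func, list(deps), version)

    def run(self, sources):
        """sources maps the names of nodes without dependencies to raw input values."""
        self.executed = []
        results = {}
        for name, (func, deps, version) in self.nodes.items():
            inputs = [results[d] for d in deps] if deps else [sources[name]]
            fingerprint = hashlib.sha256(repr((version, inputs)).encode()).hexdigest()
            cached = self.cache.get(name)
            if cached is not None and cached[0] == fingerprint:
                results[name] = cached[1]  # reuse the stored result
                continue
            results[name] = func(*inputs)
            self.cache[name] = (fingerprint, results[name])
            self.executed.append(name)
        return results

wf = Workflow()
wf.add("images", lambda raw: raw)
wf.add("genome", lambda raw: raw)
wf.add("segment", lambda img: [2 * p for p in img], deps=["images"])
wf.add("align", lambda seq: seq.upper(), deps=["genome"])
wf.add("report", lambda seg, aln: (sum(seg), aln), deps=["segment", "align"])

wf.run({"images": [1, 2, 3], "genome": "acgt"})
first_run = list(wf.executed)
wf.run({"images": [1, 2, 4], "genome": "acgt"})  # only the image data changed
second_run = list(wf.executed)
```

In the second run only the image branch and the final report are recomputed, while the genome branch is served from the cache. Re-registering a node with a higher `version` has the same effect for a changed processing step.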

Selected publications
Rämö P, et al. Simultaneous analysis of large-scale RNAi screens for pathogen entry. BMC Genomics 2014;15:1162.

Osterwalder M, et al. HAND2 targets define a network of transcriptional regulators that compartmentalize the early limb bud mesenchyme. Dev Cell 2014;31:345-57.

Bernd Rinn Scientific IT Services – SIS / ETH Zurich

Peter Kunszt Service and Support for Science IT – S3IT / University of Zurich

What do they do? The BBCF offers support for the analysis of high-throughput genomic data for life science researchers in the Geneva-Lausanne area (see also p. 22). The group develops tools and methods to facilitate, automate and extend the analyses. In particular, it focuses on all aspects of gene regulation, functional genomics and the analysis of DNA sequencing data.

2014 highlights
The expansion of high-throughput sequencing technologies forces biologists and bioinformaticians to find efficient solutions for managing and storing their data. BioRepo (Biological data Repository) addresses those needs by allowing the storage, management and sharing of genomic data among collaborators, but also by facilitating visualization in publicly available genome browsers.

BioRepo is a web-based application developed in Python, SQL and JavaScript, and runs on the Vital-IT infrastructure. It manages the archiving and retrieval of large files from the Vital-IT hierarchical file system, and uses Shibboleth as a Swiss-wide authentication system to identify users and their host institution. Although there is a single back-end system, the user interface is personalized for every lab via simple configuration files that can be edited at any time.

In 2014, major usability improvements were made with the implementation of an Application Programming Interface, of a Dropbox-like loading system and bridging to the UCSC genome browser by interactively building track hubs. The system currently holds over 5 Tb of data from 6 different labs at EPFL and University of Geneva.
http://biorepo.epfl.ch/

In 2014, Rougemont’s team also upgraded and extended their high-throughput sequencing data analysis platform.
http://htsstation.epfl.ch

Selected publications
Woltering JM, et al. Conservation and divergence of regulatory strategies at Hox loci and the origin of tetrapod digits. PLoS Biol 2014;12:e1001773.

Dos Santos AX, et al. Systematic lipidomic analysis of yeast protein kinase and phosphatase mutants reveals novel insights into regulation of lipid homeostasis. Mol Biol Cell 2014;25:3234-46.

Knight B, et al. Two distinct promoter architectures centered on dynamic nucleosomes control ribosomal protein gene transcription. Genes Dev 2014;28:1695-709.

Jacques Rougemont Bioinformatics and Biostatistics Core Facility – BBCF / EPFL, Lausanne

What do they do? Vital-IT maintains a competence centre in bioinformatics and computational biology (see also p. 19). The group enables and supports life and medical science research in multiple domains, such as behaviour, ecology, genetics, genomics, metagenomics, pharmacodynamics, phylogeny, population genetics, oncology, proteomics, structural biology, systems biology.

2014 highlights
Scientific support and collaborations
The services and support were provided to various collaborators in Swiss universities and international research groups through joint research projects, such as:
1) the development of the OpenFlu database which, in collaboration with the Food and Agriculture Organization (FAO), enables the monitoring of the flu pandemic with genomic information. In 2015, SIB was designated as the bioinformatics reference centre for FAO.
2) the optimization of software enabling heavy computational calculations on the high-performance computer to search for new genes related to the symptoms of Alzheimer’s disease, in the context of the European project AgedBrainSysBio.

www.vital-it.ch/projects/project_list.php

Computational infrastructure
To meet the users’ needs, and remain up-to-date with the new generation of computers, Vital-IT experts maintain and develop new computer algorithms. For example, Vital-IT optimized the parallel computing of the Fastepistasis software that is used to search for biomarkers in Alzheimer’s disease.

Education
Given the growing need to train highly qualified personnel and to develop, coordinate and maintain high-performance level computational resources enabling (big) large-scale data analysis and management, Vital-IT’s pool of experts is deeply involved in the continued education of researchers, in particular via SIB’s training programme.

ELIXIR
Through its active participation in the European programme ELIXIR, Vital-IT along with all the SIB groups is contributing to the development of sustainable support for life and medical science research in Europe.

Selected publications
Genolet R, et al. Duality of the murine CD8 compartment. Proc Natl Acad Sci USA 2014;111:E1007-15.

Hauser PM, et al. Microbiota present in cystic fibrosis lungs as revealed by whole genome sequencing. PLoS One 2014;9:e90934.

Seguín-Estévez Q, et al. Extensive remodeling of DC function by rapid maturation-induced transcriptional silencing. Nucleic Acids Res 2014;1:9641-55.

Ioannis Xenarios Vital-IT Group


We gratefully acknowledge the following funders, sponsors and partners for their financial support and encouragement in helping us fulfil our mission in 2014:

• The Swiss government and in particular:
 - The State Secretariat for Education, Research and Innovation (SERI)
 - The Swiss National Science Foundation (SNSF)
 - The Commission for Technology and Innovation (CTI)
• Our institutional members (see list pp. 10-11)
• The National Institutes of Health (NIH)
• SystemsX.ch
• Oncosuisse
• The Cogito Foundation
• The Force Foundation
• The Solidar Immun Foundation
• Sophia Genetics SA
• GeneBio SA

The SIB Fellowship programme continues, thanks to the generous support of:
• The University of Geneva
• The University of Lausanne
• The University of Zurich
• The Leenaards Foundation
• The R. Geigy Foundation
• The Swiss Foundation for Excellence and Talent in Biomedical Research
• SystemsX.ch

We further thank all our industrial and academic partners who trust SIB’s expertise.


© 2015 – SIB Swiss Institute of Bioinformatics
SIB profile 2015 written by the SIB Communications Department, with contributions by SIB Members.

SIB Group Leaders’ pictures: Nicolas Righetti, rezo.ch
Cover image: Artist’s view of how DNA supercoiling stimulates enhancer-promoter interaction, by Julien Dorier, Vital-IT, SIB.

Design and layout: Atelier Valenthier, valenthier.ch


SIB | Swiss Institute of Bioinformatics
Quartier Sorge
Bâtiment Génopode
CH-1015 Lausanne
Switzerland
t +41 21 692 40 50
f +41 21 692 40 55

www.isb-sib.ch

SWISS WORLD-CLASS BIOINFORMATICS

FOR MEDICINE AND LIFE SCIENCES