    Cloud Computing in Life Sciences R&D

    Ken Rubenstein, PhD

    April 2010

  • Reproduction prohibited

    Published in April 2010 by Cambridge Healthtech Institute

    Insight Pharma Reports is a division of Cambridge Healthtech Institute, a world leader in life science information and analysis through conferences, research reports, and targeted publications. Insight Pharma Reports focus on pharmaceutical R&D: the technologies, the companies, the markets, and the strategic business impacts. They regularly feature interviews with key opinion leaders; surveys of the activities, views, and plans of individuals in industry and nonprofit research; and substantive assessments of technologies and markets. Managers at the top 50 pharma companies, the top 100 biopharma companies, and the top 50 vendors of tools and services rely on Insight Pharma Reports as a trusted source of balanced and timely information.

    Related Reports

    Next-Generation Sequencing: Solving the Genome, by Ken Rubenstein, PhD

    Bioinformatics and Computational Biology: Bottlenecks and Options, by K. John Morrow, Jr., PhD

    General Manager: Alfred R. Doig, Jr., 781-972-1348

    Editorial Operations Director: Laurie Sullivan

    Design Director: Tom Norton

    Production Director: Ann Handy

    Marketing Manager: James Prudhomme

    Customer Service: Rose LaRaia

    Global Licenses: Jack Valeri

    Corporate Subscriptions: David Cunningham

    Insight Pharma Reports, 250 First Ave., Suite 300, Needham, MA

    A Cambridge Healthtech Institute publication. © 2010 by Cambridge Healthtech Institute (CHI). This report cannot be duplicated without prior written permission from CHI.

    Every effort is made to ensure the accuracy of the information presented in Insight Pharma Reports. Much of this information comes from public sources or directly from company representatives. We do not assume any liability for the accuracy or completeness of this information or for the opinions presented.

    Cambridge Healthtech Institute, 250 First Ave., Suite 300, Needham, MA 02494 Phone: 781-972-5444 Fax: 781-972-5425

    About the Author

    Ken Rubenstein, PhD, a biochemist and molecular biologist, received his PhD at the University of Wisconsin and postdoctoral training at the University of Pennsylvania School of Medicine. He was a key innovator and research manager for Syva Company, the diagnostics branch of Syntex Corporation. During his 13 years with Syva, Dr. Rubenstein became vice president, scientific affairs, a function that included strategic planning. Since 1983, he has served as a technology and marketing consultant to biomedical companies and an industry analyst, with more than 40 published studies to his credit.

    For more information about published Insight Pharma Reports, visit or call Rose LaRaia at 781-972-5444.

    Executive Summary

    Although Web-hosted applications are not particularly new, during the past few years they have morphed into what is now called cloud computing, which can arguably be considered a major paradigm shift for informatics. Early "big iron" computation was highly centralized, with units in relatively few locations. As these early behemoths evolved into minicomputers and, later, personal computers, informatics became increasingly decentralized. The rise of cloud computing has migrated computation back toward infrastructure centralization, with large clusters of commodity hardware in relatively few physical locations. Early cloud-like applications centered on email, relatively simple productivity software, merchandizing, and social networking. In the past few years, several companies, led by Amazon Web Services, have made it possible to run more complex applications in the cloud, including some of great interest to life sciences R&D.

    This report was motivated by the rapidly growing importance of cloud computing in dealing with the deluge of data raining down on life science R&D organizations from several sources, notably next-generation DNA sequencing systems and -omics tools. At the same time, demand for computationally complex modeling and simulation studies continues to rise dramatically. Limited funding and budgets make it difficult for many organizations to build the infrastructure necessary to keep pace with these demands, and cloud computing offers what appears to many as an attractive alternative to in-house expansion.

    Following a brief introduction, Chapter 2 of this report covers the evolution of cloud computing and explores the underlying concepts that provide context for deeper understanding of the subject. Chapter 3 focuses on technological aspects of cloud computing as it exists today, and describes the activities of companies active in providing cloud services and related software. The fourth chapter turns to exploration of current and emerging applications of cloud computing. Chapter 5 focuses on market aspects of cloud computing, and includes results from an extensive survey of bioinformatics people concerning their practices and views on the subject. The sixth chapter contains transcripts of interviews with six individuals who have extensive knowledge in the field. Extracts from these interviews have been inserted into the body of the report in their proper context. The final chapter provides general observations and conclusions.

    Cloud computing is, arguably, less a technological advance than it is a new business model. The evolution of the subject can be traced back to the early days of computing, when time-sharing permitted a number of users to simultaneously tap into centralized hardware. Computer clustering, which came into vogue starting in the 1960s, involves groups of computers linked in networks to emulate a single computer. The clustering concept eventually evolved into the Internet and also morphed into grid computing, which links computers at multiple sites, enabling them to perform a common task. Yet another important underlying concept, virtual computing, enables creation of a simulated computer environment within a given computer or network (e.g., emulating a PC environment on an Apple computer). An important cloud-related development in the software realm came from Google, which developed MapReduce, a programming framework that permits large datasets to be broken into small segments. These can be spread among large numbers of computers without interfering with users' ability to query and receive cohesive answers. An open-source adaptation, Hadoop, is currently a key element in bringing cloud computing to the life science sector.
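
    The MapReduce idea described above can be illustrated with a minimal single-process sketch. It is not Google's implementation and does not use Hadoop's API; every function name below is hypothetical, and the k-mer counting task and the three short DNA reads are invented purely for illustration. The sketch splits a dataset into small segments, processes each segment independently (the step that a real system would spread across many machines), and then merges the partial results into one cohesive answer.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, chunk_size):
    """Break a dataset into small segments of at most chunk_size records."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in every read."""
    pairs = []
    for read in chunk:
        for i in range(len(read) - k + 1):
            pairs.append((read[i:i + k], 1))
    return pairs


def shuffle(pairs):
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def kmer_counts(reads, chunk_size=2, k=3):
    """Run the split, map, shuffle, and reduce steps in one process."""
    chunks = split_into_chunks(reads, chunk_size)
    # In a real cluster, each map_chunk call could run on a different machine.
    mapped = [map_chunk(chunk, k) for chunk in chunks]
    return reduce_counts(shuffle(chain.from_iterable(mapped)))


# Hypothetical toy input: three short DNA reads.
reads = ["ACGTAC", "GTACGT", "TACGTA"]
counts = kmer_counts(reads)
print(counts)
```

    Because each segment is mapped independently, the final counts do not depend on how the reads are divided: calling kmer_counts(reads, chunk_size=1) gives the same result as chunk_size=3. That property is what allows the work to be spread among many machines.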

    Cloud computing actually has diverse definitions, depending on who is doing the defining. For our purposes, it is sufficient to define the concept in terms of features that are commonly associated with the subject by users and observers.1 These features are resource outsourcing, utility computing, large collections of inexpensive machines, automated resource management, virtualization, and parallel computing.

    Public clouds offer utility computing in much the same sense that energy companies provide electricity: You pay for what you use. Anyone with Web access and a credit card can order the hardware and software needed to process or store their data, and release them back to the cloud when no longer needed. Given lingering concerns over data security, large companies may choose to implement a private cloud, one that provides many of the advantages of the cloud model via infrastructure contained within their firewall. A third model, the hybrid cloud, allows companies to keep key data within their firewall while extending selective activities out to public clouds.

    Cloud services divide into four main categories. IaaS (infrastructure-as-a-service), which embodies the essence of cloud computing, allows customers to fully outsource provision of servers, software, data center space, and/or network equipment. PaaS (platform-as-a-service), also known as cloudware, offers a hosted computing platform that allows customers to deploy applications without having to buy and manage the required hardware and underlying software layers. Typically, PaaS provides customers with everything needed to build and deliver cloud-based applications and services. SaaS (software-as-a-service), which originated around the turn of the century, refers to software licensed by a provider to customers on either a contractual or utility basis. The software may reside on the provider's network and be accessed via the Web, or be downloaded to the customer's sys