Cloud Computing in Life Sciences R&D


<ul><li><p>Cloud Computing in Life Sciences R&amp;D</p><p>Ken Rubenstein, PhD</p><p>April 2010</p></li><li><p>Reproduction prohibited</p><p>Published in April 2010 by Cambridge Healthtech Institute</p></li><li><p>Insight Pharma Reports is a division of Cambridge Healthtech Institute, a world leader in life science information and analysis through conferences, research reports, and targeted publications. Insight Pharma Reports focus on pharmaceutical R&amp;D: the technologies, the companies, the markets, and the strategic business impacts. They regularly feature interviews with key opinion leaders; surveys of the activities, views, and plans of individuals in industry and nonprofit research; and substantive assessments of technologies and markets. Managers at the top 50 pharma companies, the top 100 biopharma companies, and the top 50 vendors of tools and services rely on Insight Pharma Reports as a trusted source of balanced and timely information.</p><p>Related Reports</p><p>Next-Generation Sequencing: Solving the Genome, by Ken Rubenstein, PhD</p><p>Bioinformatics and Computational Biology: Bottlenecks and Options, by K. John Morrow, Jr., PhD</p><p>General Manager: Alfred R. Doig, Jr. 
781-972-1348</p><p>Editorial Operations Director: Laurie Sullivan</p><p>781-972-1353</p><p>Design Director: Tom Norton</p><p>781-972-5440</p><p>Production Director: Ann Handy</p><p>781-972-5493</p><p>Marketing Manager: James Prudhomme</p><p>781-972-5486</p><p>Customer Service: Rose LaRaia</p><p>781-972-5444</p><p>Global Licenses: Jack Valeri</p><p>781-972-1355</p><p>Corporate Subscriptions: David Cunningham</p><p>781-972-5472</p><p>Insight Pharma Reports, 250 First Ave., Suite 300, Needham, MA</p></li><li><p>A Cambridge Healthtech Institute publication. &copy; 2010 by Cambridge Healthtech Institute (CHI). This report cannot be duplicated without prior written permission from CHI.</p><p>Every effort is made to ensure the accuracy of the information presented in Insight Pharma Reports. Much of this information comes from public sources or directly from company representatives. We do not assume any liability for the accuracy or completeness of this information or for the opinions presented.</p><p>Cambridge Healthtech Institute, 250 First Ave., Suite 300, Needham, MA 02494 Phone: 781-972-5444 Fax: 781-972-5425</p><p>About the Author</p><p>Ken Rubenstein, PhD, a biochemist and molecular biologist, received his PhD at the University of Wisconsin and postdoctoral training at the University of Pennsylvania School of Medicine. He was a key innovator and research manager for Syva Company, the diagnostics branch of Syntex Corporation. During his 13 years with Syva, Dr. Rubenstein became vice president, scientific affairs, a function that included strategic planning. 
Since 1983, he has served as a technology and marketing consultant to biomedical companies and an industry analyst, with more than 40 published studies to his credit.</p><p>For more information about published Insight Pharma Reports, call Rose LaRaia at 781-972-5444.</p></li><li><p>Executive Summary</p><p>Although Web-hosted applications are not particularly new, during the past few years they have morphed into what is now called cloud computing, which can arguably be considered a major paradigm shift for informatics. Early "big iron" computation was highly centralized, with units in relatively few locations. As these early behemoths evolved into minicomputers and, later, personal computers, informatics became increasingly decentralized. The rise of cloud computing has moved computation back toward centralized infrastructure, with large clusters of commodity hardware in relatively few physical locations. Early cloud-like applications centered on email, relatively simple productivity software, merchandising, and social networking. In the past few years, several companies, led by Amazon Web Services, have made it possible to run more complex applications in the cloud, including some of great interest to life sciences R&amp;D.</p><p>This report was motivated by the rapidly growing importance of cloud computing in dealing with the deluge of data raining down on life science R&amp;D organizations from several sources, notably next-generation DNA sequencing systems and -omics tools. At the same time, demand for computationally complex modeling and simulation studies continues to rise dramatically. 
Limited funding and budgets make it difficult for many organizations to build the infrastructure necessary to keep pace with these demands, and cloud computing offers what many see as an attractive alternative to in-house expansion.</p><p>Following a brief introduction, Chapter 2 of this report covers the evolution of cloud computing and explores the underlying concepts that provide context for deeper understanding of the subject. Chapter 3 focuses on technological aspects of cloud computing as it exists today, and describes the activities of companies active in providing cloud services and related software. The fourth chapter turns to exploration of current and emerging applications of cloud computing. Chapter 5 focuses on market aspects of cloud computing, and includes results from an extensive survey of bioinformatics professionals concerning their practices and views on the subject. The sixth chapter contains transcripts of interviews with six individuals who have extensive knowledge in the field. Extracts from these interviews have been inserted into the body of the report in their proper context. The final chapter provides general observations and conclusions.</p></li><li><p>Technology</p><p>Cloud computing is, arguably, less a technological advance than it is a new business model. The evolution of the subject can be traced back to the early days of computing, when time-sharing permitted a number of users to simultaneously tap into centralized hardware. Computer clustering, which came into vogue starting in the 1960s, involves groups of computers linked in networks to emulate a single computer. The clustering concept eventually evolved into the Internet and also morphed into grid computing, which links computers at multiple sites, enabling them to perform a common task. 
Yet another important underlying concept, virtual computing, enables creation of a simulated computer environment within a given computer or network (e.g., emulating a PC environment on an Apple computer). An important cloud-related development in the software realm came from Google, which developed MapReduce, a programming framework that permits large datasets to be broken into small segments. These can be spread among large numbers of computers without interfering with users' ability to query and receive cohesive answers. An open-source adaptation, Hadoop, is currently a key element in bringing cloud computing to the life science sector.</p><p>Cloud computing actually has diverse definitions, depending on who is doing the defining. For our purposes, it is sufficient to define the concept in terms of features that are commonly associated with the subject by users and observers.<sup>1</sup> These features are resource outsourcing, utility computing, large collections of inexpensive machines, automated resource management, virtualization, and parallel computing.</p><p>Public clouds offer utility computing in much the same sense that energy companies provide electricity: You pay for what you use. Anyone with Web access and a credit card can order the hardware and software needed to process or store their data, and release them back to the cloud when no longer needed. Given lingering concerns over data security, large companies may choose to implement a private cloud, one that provides many of the advantages of the cloud model via infrastructure contained within their firewall. A third model, the hybrid cloud, allows companies to keep key data within their firewall while extending selective activities out to public clouds.</p><p>Cloud services divide into four main categories. IaaS (infrastructure-as-a-service), which embodies the essence of cloud computing, allows customers to fully outsource provision of servers, software, data center space, and/or network equipment. 
PaaS (platform-as-a-service), also known as cloudware, offers a hosted computing platform that allows customers to deploy applications without having to buy and manage the required hardware and underlying software layers. Typically, PaaS provides customers with everything needed to build and deliver cloud-based applications and services. SaaS (software-as-a-service), which originated around the turn of the century, refers to software licensed by a provider to customers on either a contractual or utility basis. The software may reside on the provider's network and be accessed via the Web, or be downloaded to the customer's system and disabled when the contracted use period expires. The fourth main service, cloud storage, employs commodity hardware linked by software to appear as a single storage device.</p></li><li><p>All major companies that provide computer hardware, software, or both are involved to some degree in cloud computing. Yet the pioneer and overwhelming market leader in the field is Amazon Web Services. The breadth of its service offerings and the attractiveness of its pricing structure have made it a prime cloud destination for life science organizations today. Amazon EC2 (Elastic Compute Cloud) allows customers to rent servers on which they can create virtual machines that run their own applications. Amazon offers persistent storage via the Simple Storage Service (S3) and the more elaborate Relational Database Service (RDS). A number of additional services extend the capabilities of these basic ones. An interesting entry, the Amazon Virtual Private Cloud (VPC), provides a bridge between an organization's existing IT infrastructure and the Amazon cloud. VPC allows enterprises to connect their infrastructure to a set of isolated Amazon computational resources via a Virtual Private Network (VPN) connection. 
Pfizer has opted to go this route.</p><p>Other large organizations currently compete with Amazon or have positioned themselves for future attempts to capture cloud market share. Google participates in cloud computing via its App Engine platform, which became available to customers in April 2008. App Engine provides an environment that permits developers to build new Web applications, generate code, access compute resources, and store data on virtual machines. In October 2008, Microsoft announced its cloud-based operating system, Windows Azure, along with Azure Services, which will permit developers to build and run applications hosted on Microsoft's rapidly growing server collections. In December 2009, Microsoft announced formation of a new internal organization, the Server and Cloud Division, which combines the former Windows Server and Solutions group with the Windows Azure unit.</p><p>Hewlett-Packard (HP) sells hardware to cloud services providers and offers varied cloud consulting services to customers, with heavy emphasis on security and risk management. HP Cloud Assure consists of HP services and software, including HP Application Security Center, HP Performance Center, and HP Business Availability Center. The services are delivered to customers via the HP Software-as-a-Service facility. IBM is focused mainly on providing the enterprise market with public cloud services specific to a company's workload, hardware for use at the customer site, and consulting/systems integration services to aid customers in building private and hybrid clouds. IBM also has 13 cloud computing centers to enable enterprises, government agencies, and researchers to design, develop, and test applications for use in cloud environments. 
A number of other large companies, such as AT&amp;T, Yahoo, Sun, and Verizon, are involved in developing and/or providing cloud computing services.</p><p>A number of smaller companies provide middleware and cloud services to augment and extend large company offerings. Cloud computing has generated a great deal of buzz in the venture capital community as having great upside potential. A number of new companies have formed recently, and the list looks like it will keep growing for the next few years. Following is a brief look at some of the smaller companies, especially those of interest to life science R&amp;D.</p></li><li><p>BioTeam and Cloudera provide extensive consulting services to assist organizations in entering cloud computing. Cycle Computing features CycleCloud, a scheduling service for cloud computing. Darkstrand addresses the high-speed networking needs of high-performance computing-based cloud computing. GenoLogics focuses its collaborative data-management software platform on biomedical and drug discovery/development applications in the cloud, with emphasis on translational medicine and systems biology in pharma, biotechnology companies, and academic organizations.</p><p>GenomeQuest provides a cloud computing environment that allows researchers to perform sequence alignment and data mining on next-generation sequencing data. Geospiza develops and sells enterprise-class software systems for workflow management of genetic analysis. Their GeneSifter Analysis Edition provides end-to-end capability for data-intensive genetic analysis applications including microarrays and next-generation sequencing-based transcription. In collaboration with Applied Biosystems, Geospiza now offers GeneSifter for next-generation sequencing in the cloud through Amazon Web Services.</p><p>Nirvanix is a cloud storage company. 
Their Storage Delivery Network is a fully managed, highly secure service powered by patent-pending, proprietary technology and infrastructure. ParaScale provides enterprise-level cloud storage resources under the names Hyper-Scale Storage Cloud and Hi-Performance Storage Cloud. The company provides software that can be downloaded and installed on commodity hardware running standard Linux to create a storage cloud. Peng...</p></li></ul>
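The Technology section describes MapReduce as breaking a large dataset into small segments that are spread across many computers while users still receive one cohesive answer. The following is a minimal single-process sketch of that idea in Python, not Google's or Hadoop's actual API. It assumes a hypothetical life science task, counting k-mers (short subsequences) in DNA reads, and all function names (split_into_segments, map_segment, shuffle, reduce_counts, kmer_counts) are illustrative only. In a real cluster each segment would be mapped on a different machine; here the segments are simply processed one after another.

```python
from collections import defaultdict
from itertools import chain


def split_into_segments(records, segment_size):
    """Break a large dataset into small segments, one per (simulated) worker."""
    return [records[i:i + segment_size]
            for i in range(0, len(records), segment_size)]


def map_segment(segment, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-length substring of each read."""
    pairs = []
    for read in segment:
        for i in range(len(read) - k + 1):
            pairs.append((read[i:i + k], 1))
    return pairs


def shuffle(mapped_pairs):
    """Group emitted values by key so each key can be reduced in one place."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: combine the values for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def kmer_counts(reads, segment_size=2, k=3):
    """Run the split, map, shuffle, and reduce steps in sequence."""
    segments = split_into_segments(reads, segment_size)
    mapped = chain.from_iterable(map_segment(s, k) for s in segments)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical toy reads; real sequencing runs produce millions of these.
    reads = ["ACGTAC", "GTACGT", "TACGTA"]
    print(kmer_counts(reads))
```

Because the map step treats every segment independently, the final counts do not depend on how the reads were split, which is the property that lets the work be distributed across many inexpensive machines.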

