Click here to load reader

Life Sciences Company Uses SQL Server 2005 to …download.microsoft.com/documents/customerevidence/25182... · Web view“SQL Server 2005 has given us the performance we needed to

Embed Size (px)

Citation preview

Life Sciences Company Uses SQL Server 2005 to Enhance Full-Text Searches

Microsoft SQL Server 2005

Customer Solution Case Study

Life Sciences Company Uses SQL Server 2005 to Enhance Full-Text Searches

Overview

Country or Region: Germany

Industry: Pharmaceuticals

Customer Profile

Based in Winden, Germany, Metalife provides tools and services to assist pharmaceutical companies and scientists explore genetic databases as they search for ways to cure and prevent disease.

Business Situation

Metalife needed to provide increased capacity for concurrent, full-text searches, and provide easier data access, data snapshots for versioning, and easy-to-use reporting.

Solution

Metalife upgraded to Microsoft SQL Server 2005 (64-bit) database running on the Microsoft Windows Server 2003, Enterprise x64 Edition, operating system.

Benefits

Easier data navigation

Better support for concurrent full-text searches

Versioning with Database Snapshot

Easy-to-use reporting

SQL Server 2005 has given us the performance we needed to give scientists a single doorway into the world of genetics.

Michael Schnemann, Founder and Chief Executive Officer, Metalife

Metalife is helping researchers who are seeking new treatments and cures for a range of diseases. The companys PhenomicDB database provides a central repository for genetic information from the Human Genome Project and similar efforts. As the success of Metalifes database has grown, so has demand for support for concurrent, full-text searches against its database. The company needed the ability to support more concurrent users. By upgrading to Microsoft SQL Server 2005 (64-bit), the company now supports 3,000 users as they perform concurrent, full-text searches. And Metalifes lab tests show SQL Server 2005 can handle more than 10,000. Metalife is also using the Database Snapshot feature of SQL Server 2005 to enable scientists to run queries against earlier versions of the database.

Situation

Ever since James Watson and Francis Crick solved the mystery of how genetic material was carried within the double helix structure of DNA, a discovery for which they won the 1962 Nobel Prize in Physiology, there has been a worldwide race to use the findings to benefit humankind. That race has intensified in recent years through efforts like the Human Genome Project, a multinational project that, after 13 years, succeeded in providing a complete mapping of the 20,000 genes that make up the human chromosomes that make our life possible.

But mapping the genome is only the beginning of a huge scientific inquiry. Each gene is essentially a cookbook, holding instructions for how to create the proteins that are the building blocks of life. Researchers around the world are searching for ways in which proteins can be used to treat and prevent a spectrum of diseases.

Metalife AG, a German life sciences company and a Microsoft Certified Gold Partner, is dedicated to solving a major problem scientists facehow to deal with the wealth of genetic data that has been generated in the decades since Watson and Cricks discovery. Genetic researchers are interested in genotype (what the gene is made of) and phenotype (what the gene does) databases.

As a service to the scientific community, Metalife created its Web-based PhenomicDB multispecies relational database (www.phenomicdb.de ). PhenomicDB provides a single point of access for a number of genotype and phenotype databases. PhenomicDB compiles this wealth of data into a single, integrated resource by mapping of the phenotypic data fields and other methodologies. To allow for direct comparison of phenotypes of orthologous genes, from yeast to humans, these mapped data are uploaded together with a gene index-like database. Examples include the NCBI Gene database (Pruitt et al. 2001) and the associated orthology data HomoloGene database (Wheeler et al. 2004). The result is that PhenomicDB helps scientists to browse and compare known phenotypes for a given gene or a set of genes from different organisms.

Metalife provides a much larger database for its commercial customers. Called the Metalife Knowledgebase, this 1.4 terabyte database includes the 20 gigabytes (GB) PhenomicDB, in addition to a vast collection of genetic information (including articles from scientific journals from around the world), and data from the European Bioinformatics Institute (EBI), the UniProt protein sequence database, and a wealth of other sources.

Metalife has created a set of applications, the Metalife 2005 Suite, designed to help pharmaceutical companies and other organizations reduce the cost of research by giving them the ability to better predict outcomes through use of in silico (computer) rather than in vitro (live) experimentation. The Metalife 2005 Suite works in conjunction with the Metalife Knowledgebase.

As Metalifes PhenomicDB database and the Metalife Knowledgebase grew in popularity, Metalife saw the need to enhance performance and features, including:

Providing easier access to the data.

Increasing capacity for concurrent full-text searches, especially on the public PhenomicDB.

Providing data snapshots for versioning.

Giving users an easy-to-use but powerful reporting tool.

Solution

Metalife upgraded its PhenomicDB database and its Metalife Knowledgebase to Microsoft SQL Server 2005 (64-bit) from SQL Server 2000 Enterprise Edition (64-bit). The databases, deployed as a single instance of SQL Server, continue to run on the Microsoft Windows Server 2003, Enterprise x64 Edition, hosted on an 8-way server with Itanium 2 processors and 32 GB of RAM, and direct-attached storage. Windows Server 2003 and SQL Server 2005 are part of Microsoft Windows Server System integrated server software.

The company is using the full-text search feature of SQL Server 2005 to replace a custom indexing application Metalife had created after finding it needed even better performance than it had seen from the Full-Text Search feature of SQL Server 2000.

The Metalife Knowledgebase is supplemented by a suite of client applications that Metalife created by using the Microsoft Visual Studio .NET 2003 development system and the Microsoft .NET Framework. The applications include:

Metalife Predictor. Predictor is an application for in silico prediction of functional characteristics of biological objects and automatic annotation of unannotated sequences. Predictor is a combination of approved, state-of-the-art algorithms optimized for highest accuracy of produced knowledge. Some of the algorithms have been already implemented by Metalife by using Microsoft SQL Server 2005 common language runtime (CLR) stored procedures. Enhancements were made for distributing the algorithms on multiple machines.

Metalife Analyzer. Analyzer technology dramatically enhances interpretation and understanding of complex structures underlying large sets of complex data. The application combines powerful algorithms to improve knowledge extraction by automatic classification of data and abstraction of common concepts, document clustering, and content similarity search. The Metalife analyzer module includes also a complete thesaurus management system with worldwide-and user-defined scientific ontologies.

Metalife Explorer. Explorer applications automatically compile complete dossiers on defined objectsincluding proteins, genes, and low molecular weight compounds. Compiled dossiers of defined objects in Metalife Explorer contain information collected from all integrated, publicly available data sources. Dossiers include information extracted from scientific publications by Metalifes text extraction procedure or generated by highly sophisticated machine-learning algorithms.

Metalife Validator. Validator helps eliminate false positives by combining the strength of Metalife Predictor algorithms with background information from life science literature in the Metalife Knowledgebase. This validation results in predictions that are more valuable and verifiable.

Metalife Array Analyzer. Array Analyzer provides a computational and graphical environment for storing, preprocessing, analyzing, clustering, and saving results from DNA microarray experiments. The program can organize and analyze the data in a number of different ways dedicated to helping researchers design and analyze expression profiling experiments to generate high-quality, statistically sound, and biologically interesting results.

Metalife deploys the Metalife Knowledgebase as a dedicated hosted server for each customer (typically global pharmaceutical companies). A copy of the Metalife Knowledgebase mirrors to a Dell PowerEdge 6600 4-way server with AMD 64-bit processors. Each customer, by hosting its own dedicated server, has the security of knowing that it is the only company that uses the database, and that all of its queries are confidential. The hosted solution also means that a pharmaceutical doesnt have to open a firewall port to receive daily updates. Metalife uses the merge replication feature of SQL Server 2005 to push updates to its dedicated servers. Metalife also supports some of its customers by providing on-site databases that can be updated across a virtual private network.

Extract, transform, and load (ETL) procedures are handled by a custom application Metalife created by using Visual Studio .NET. The company uses the Database Snapshot feature of SQL Server 2005 for versioning, and SQL Server 2005 Reporting Services for generating reports.

Benefits

Since Metalife upgraded to the 64-bit version of SQL Server 2005, the company has realized a number of benefits, including easier data navigation, better support for concurrent, full-text searches, versioning with Database Snapshot, and easy-to-use reporting.

Easier Data Navigation

The 64-bit version of SQL Server 2005 helps Dr. Michael Schnemann, founder and Chief Executive Officer of Metalife, realize his goal of providing ease of access to a world of genetic data so that scientists can pursue their passions without having to become computer programmers and database administrators. SQL Server 2005 has given us the performance we needed to give scientists a single doorway into the world of genetics, Schnemann says. Weve brought together the essential, top 100 databases that, together, provide nearly a complete record of what is known about genotypes and phenotypes across several species. And using Visual Studio .NET and the .NET Framework, weve given our users powerful tools for exploring this genetic realm.

Unifying databases and providing easy-to-use searching tools makes life easier for researchers. If someone is looking for a protein interaction to help treat a disease, they should not have to be an expert on what databases they need to search, what algorithm they need to use, nor should they have to spend time away from science to learn how to write searching scripts, says Schnemann. The easier we can make it for researchers to explore, the greater are their chances for discovering applicable values from the wealth of genetic information that is being generated around the globe. Our client applications make it as easy to explore protein interactions as working with any Microsoft Office application. You can just drag-and-drop to create your queries.

The Metalife applications also support new ways of looking at data, including rich visual models, which are especially helpful in dealing with proteins where the folding shapes of the proteins can play key roles in their interactions with proteins of other biological entities. We want to display data in ways that can speed the scientific process, says Schnemann. We want to help researchers more quickly identify the sequences and interactions they need.

The images generated by the Metalife applications can be deeply explored because they are drawn from the SQL Server 2005 database. Clickselecting a protein or the line joining two proteins or a range of other pointsand you can drill down into the journal articles and other resources that describe the research behind the relationships, and follow any supporting citations.

The Metalife 2005 Suite of tools, running against the 1.4 terabyte SQL Server 2005 database, could also save time and money for pharmaceutical companies as they research new treatments. A pharmaceutical company can spend U.S.$800 million, and more than a decade attempting to bring a new drug to market, says Schnemann. We believe we can help researchers substantially reduce the time and money involved by giving them the chance to explore potential protein interactions and other factors within the complete biological pathway in silicowithin our SQL Server 2005 databaserather than in vitro in the laboratory. If we did nothing more than help researchers identify dead ends more quickly, it would be a substantial contribution to science.

Better Support for Concurrent, Full-Text Searches

With the growth in popularity of Metalifes PhenomicDB came increasing performance demands. Much of the information within the database is in text documents, placing more pressure on search performance. With more concurrent users performing full-text searches against the database, we found performance slowed noticeably, says Nadia Pavlova, Director of Business and Software Development, at Metalife. SQL Server 2005 has increased our ability to support concurrent, full-text searches by orders of magnitude. We are now supporting between 2,000 and 3,000 concurrent users performing full-text searches. And in our laboratory stress testing, we find we can support more than 10,000 concurrent full-text searches.

In addition to being robust in its support of concurrent users, Metalife has found SQL Server 2005 full-text searches to be fast. SQL Server 2005 is at least as fast as our custom code was, says Pavlova. We created a custom algorithm to return all full-text searches against our 1.4 terabyte database in less than a second, and that is what weve seen with SQL Server 2005.

Before the company deployed SQL Server 2005, Metalife had to create its own custom indexing application and external indexes. We stopped using our custom solution as soon as we saw the enhanced, full-text search performance of SQL Server 2005, says Schnemann. We were also happy to see that SQL Server 2005 backs up the full-text indexes at the same time it backs up the database. It is a relief to no longer have to support the external indexes and custom indexing application that we once had to use.

Versioning with Database Snapshot

The company uses the merge replication feature of SQL Server 2005 to automatically update its mirrored servers that support Metalife Knowledgebase customers. The company also uses merge replication to synchronize the logs and the history between the different mirrors of the database.

Before doing a merge replication, Metalife uses the Database Snapshot feature, new to SQL Server 2005, to preserve an image of the database. The Database Snapshot feature gives Metalife the ability to generate and use a read-only, stable view of a database without encountering the overhead or storage requirements of creating a complete copy of the database.

The scientists who use our solution appreciate the ability to rerun queries against an earlier version of the database, should something change when the data is updated, says Schnemann. SQL Server 2005 Database Snapshot provides a validation tool.

Before Metalife had SQL Server 2005, such versioning options couldnt be offered. There was no such possibility in SQL Server 2000, says Pavlova. Without snapshots, we would have had to have backed up the entire database under a different name, says Pavlova. To do that for more than a terabyte of data just wouldnt have been practical.

Easy to Use Reporting

After upgrading to SQL Server 2005, Metalife decided to abandon its custom reporting utilities and standardize on SQL Server Reporting Services to provide ease of use. Reporting Services is so logical yet powerful. We want to let our users benefit from it, Schnemann says. We will benefit too because, instead of asking us to do custom reports, our users will be able to create their own with Reporting Services.

Microsoft Windows Server System

Microsoft Windows Server System is a line of integrated and manageable server software designed to reduce the complexity and cost of IT. Windows Server System enables you to spend less time and budget on managing your systems so that you can focus your resources on other priorities for you and your business.

For more information about Windows Server System, go to:

www.microsoft.com/windowsserversystem

Microsoft SQL Server 2005

Microsoft SQL Server 2005 is comprehensive, integrated data management and analysis software that enables organizations to reliably manage mission-critical information and confidently run todays increasingly complex business applications. By providing high availability, security enhancements, and embedded reporting and data analysis tools, SQL Server 2005 helps companies gain greater insight from their business information and achieve faster results for a competitive advantage. And, because its part of Windows Server System, SQL Server 2005 is designed to integrate seamlessly with your other server infrastructure investments.

For more information about SQL Server 2005, go to:www.microsoft.com/sqlserver

Software and Services

Microsoft Windows Server System

Microsoft Windows Server 2003 Enterprise x64 Edition

Microsoft SQL Server 2005

Microsoft Visual Studio .NET 2003

Technologies

Microsoft .NET Framework

Microsoft SQL Server 2005 Reporting Services

Hardware

Intel-based 8-way server with Itanium 2 processors and 32 GB of RAM

Dell PowerEdge 6600 4-way server with AMD 64-bit processors

SQL Server 2005

has increased our ability to support concurrent, full-text searches by orders of magnitude.

Nadia Pavlova, Director of Business and Software Development, Metalife

2005 Microsoft Corporation. All rights reserved. This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Visual Studio, the Windows logo, Windows Server, and Windows Server System are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are the property of their respective owners.

Document published March 2005

For More Information

For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234 in the United States or (905) 568-9641 in Canada. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to: HYPERLINK "http://www.microsoft.com" www.microsoft.com

For more information about Metalife products and services, call 0700-metalife or visit the Web site at HYPERLINK "http://www.metalife.de" www.metalife.de

Easier Navigation Rich images from the Metalife applications running against the 1.4 terabyte SQL Server 2005 database.

"If we did nothing more than help researchers identify dead ends more quickly, it would be a substantial contribution to science.

Michael Schnemann, founder and Chief Executive Officer, Metalife

The scientists who use our solution appreciate the ability to rerun queries against an earlier version of the database.

SQL Server 2005 database snapshots provide a validation tool.

Michael Schnemann, founder and Chief Executive Officer, Metalife