
Impact of microarray data quality on genomic data submissions to the FDA

Felix W Frueh

How can microarray data best be exploited and integrated into the regulatory decision-making process?

Five years ago, the completion of the sequencing of the human genome was announced1,2, triggering many comments about the value of this knowledge for new approaches and insights into drug development. However, although genomics is used in an increasing number of drug development programs, the genomics-led 'revolution' in drug development has not happened yet. This can be attributed to a variety of reasons; one reason is the lack of a thorough evaluation of the quality of novel technologies such as DNA microarrays, as well as of the manner in which the results of such experiments are analyzed and interpreted.

To investigate the challenges presented to regulators by microarray data, the US Food and Drug Administration (FDA) spearheaded the formation of the MicroArray Quality Control (MAQC) consortium, which brings together researchers from government, industry and academia to assess the key factors contributing to the variability and reproducibility of microarray data. Ultimately, the data from this initiative will help determine a new set of standards and guidelines for the use of DNA microarray data.

Genomic data matures
Several factors have encouraged the adoption and integration of genomic data in drug development and regulatory assessment, including a better understanding of disease pathophysiology and the targeting of drug molecules to their sites of action. However, there are challenges to further expansion of genomics use; one key issue frequently discussed is that genomic science has evolved more quickly than technologies suitable for generating consistent, high-quality genomic data. Before 2004, genomic information was largely absent from the investigational new drug submissions and new drug applications received by the FDA; today, that situation is changing (Fig. 1). This more than likely reflects the timelines of the drug development process overall and of the integration of genomics within that process. It is therefore logical that by this time we should be starting to see an increase in submissions to the FDA containing genomic information; indeed, the number of such submissions is increasing significantly (Fig. 1).

On the basis of the 20 voluntary genomic data submissions received by the FDA so far, it appears that the technologies for generating genomic data have only recently become a commodity of broader application. Recently, the integration of large-scale screening approaches (e.g., gene expression profiling or whole-genome single-nucleotide polymorphism (SNP) scans) has been observed in different stages of drug discovery and now also in drug development. Consequently, the generation and exploitation of genomic data from such large-scale efforts in modern drug development now require a regulatory environment adequately equipped to review such data.

Felix W. Frueh, US Food and Drug Administration, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, 10903 New Hampshire Avenue, Silver Spring, Maryland 20993, USA. e-mail: [email protected]

[Figure 1: Increase in formal requests (consults) for genomic data review (data submitted as part of regular INDs, NDAs or BLAs) to the Office of Clinical Pharmacology, and voluntary genomic data submissions (VGDS) to the FDA, since 2004. Axes: number of submissions per quarter, Q1 '04 through Q2 '06. IND, investigational new drug; NDA, new drug application; BLA, biologic license application.]

The agency responds


Shortly after the human genome sequence was announced, a seminal paper by the FDA's Lesko and Woodcock3 was published highlighting the importance of new guidance for regulatory submissions containing genomic information. This 'call to arms' was followed by a series of workshops on pharmacogenomics organized by the FDA, the Drug Information Association (Horsham, PA, USA) and the Pharmaceutical Research and Manufacturers of America (PhRMA; Washington, DC, USA), which led to the development of a guidance document and ultimately facilitated a new type of voluntary data submission process—the voluntary genomic data submission (VGDS). This process allowed for a new, informal interaction between sponsors of voluntary submissions and regulators to discuss the science of novel, exploratory uses of pharmacogenomics. The Guidance for Industry: Pharmacogenomic Data Submissions4, released as a final guidance document in 2005, was accompanied by two additional documents explaining the newly created VGDS path and the function and responsibilities of a newly created FDA-wide Interdisciplinary Pharmacogenomic Review Group (IPRG), respectively.

At the same time, the FDA launched a new website (http://www.fda.gov/cder/genomics), which serves as a portal for regulatory information in the area of genomics. Together, these new regulatory resources allow and promote the submission of exploratory, cutting-edge genomic data to the FDA. This exploratory information is not used by regulators or industry as part of regulatory decision making, which is a critical aspect, as it is understood that many of the data sets generated with this new technology are not yet sufficiently mature to contribute to critical regulatory decisions that have a wide-ranging impact on entire drug development programs. Nonetheless, these data are of value to regulators in understanding the changes underway in the processes, approaches and direction of drug research and development programs. It is also important to note that the Guidance for Industry: Pharmacogenomic Data Submissions4 is not a guidance about 'voluntary' submissions alone; instead, in very general terms, it explains what types of genomic data need to be submitted to the FDA and when, and what types of data can be submitted on a voluntary basis.

Voluntary genomic data submission
The VGDS program creates a forum for scientific data exchange and discussions with the FDA outside of the regular review process. The VGDS program is used for a variety of strategic purposes and continues to evolve. For example, sponsors submit data on a voluntary basis to discuss the potential impact of using this information in a drug development program: this leads to questions such as 'how can we test the hypothesis, and how can it be validated?' or 'will this approach provide us with a clinically useful answer?', but also to questions such as 'how do we best analyze the data?' or 'what is the most suitable approach for a biological (that is, mechanistic) interpretation of the data?'

To date, the FDA/IPRG has received and reviewed ~20 voluntary genomic data submissions. These submissions varied significantly in content and focus (Table 1), and a large number contained microarray gene expression data. Even though most of the microarray data were generated using a photolithographically synthesized oligonucleotide chip platform (Affymetrix; Santa Clara, CA, USA), the heterogeneity of the data submissions was surprising, illustrating to the agency two key problems: first, the need for standardization in data generation, normalization and submission; and second, the need for measures of data quality.
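To make the notion of 'measures of data quality' concrete, the sketch below computes a few simple per-array indicators from a matrix of probe intensities. It is a minimal illustration only: the metrics, the background cutoff and the function names are our own assumptions, not quantities prescribed by the FDA or the MAQC.

```python
# Illustrative sketch only: simple per-array quality indicators for a
# probe-intensity matrix (rows = probes, columns = arrays). The metric
# choices and the background cutoff are assumptions for illustration,
# not FDA- or MAQC-prescribed quality measures.
import numpy as np

def array_qc_summary(intensities: np.ndarray) -> list:
    """Return basic quality indicators for each array (column)."""
    # Reference "pseudo-array": the per-probe median across all arrays.
    reference = np.median(intensities, axis=1)
    summaries = []
    for j in range(intensities.shape[1]):
        col = intensities[:, j]
        summaries.append({
            "array": j,
            "median_intensity": float(np.median(col)),
            "iqr": float(np.percentile(col, 75) - np.percentile(col, 25)),
            # Fraction of probes near background (cutoff is arbitrary here).
            "fraction_low": float(np.mean(col < 50.0)),
            # Agreement of this array with the study-wide reference profile.
            "corr_to_reference": float(np.corrcoef(col, reference)[0, 1]),
        })
    return summaries

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    simulated = rng.lognormal(mean=5.0, sigma=1.0, size=(1000, 6))
    for row in array_qc_summary(simulated):
        print(row)
```

An array whose indicators sit far from those of its peers (for example, a low correlation to the reference profile) would be a candidate for closer inspection before any downstream analysis.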

Although VGDS data allow the FDA to gain insight into specific drug development programs and the genomic data used within them, the data often do not allow a systematic assessment of quality measures. Consequently, these VGDS data are ideal for creating snapshots of the state of the art in industry's generation and use of genomic data, but they may or may not be consistent with more general quality standards. This poses a challenge to the interpretation of the data themselves and of the conclusions drawn from such data sets. Even so, our experience with reviewing microarray data sets has already given us invaluable information that has helped us both to design the experiments and strategies needed to create such standards and to point to the most critical aspects of data analysis and interpretation.

The genesis of MAQC
Together with a variety of other motivating factors outlined elsewhere in this issue of Nature Biotechnology, reviewers in the IPRG created a list of issues that need to be addressed for microarray data to become acceptable for regulatory review. For example, data normalization was identified as a major source of differences when comparing results and data interpretations produced by sponsors and by FDA reviewers. The use of different data analysis protocols could also explain differences, as could the use of different data interpretation tools, such as software for pathway analyses. In other words, the VGDS process, and the data received in voluntary submissions, have helped to identify the impact of different data analysis strategies, but the data themselves cannot be used to address and solve these issues. To do so, a broader, well-defined and generalizable process needs to be used, such as the MAQC, which allows the systematic exploration of all sources of variability, the assessment of the importance of each factor (e.g., how much does a difference in data normalization contribute to the overall variability seen in the data?) and, ultimately, the determination of a set of standards and best practices to be followed.
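As an illustration of how much the normalization step alone can matter, the sketch below applies two common normalization strategies (per-array median scaling and quantile normalization) to the same simulated raw intensities and counts how many genes cross a fold-change threshold under each. The data, the threshold and the function names are assumptions for this example; neither method is endorsed here as a standard.

```python
# Illustrative sketch only: how the choice of normalization can change
# downstream fold-change calls. Data, thresholds and function names are
# assumptions for this example, not a recommended or endorsed protocol.
import numpy as np

def median_scale(x: np.ndarray) -> np.ndarray:
    # Rescale each array (column) so that all arrays share the same median.
    target = np.median(x)
    return x * (target / np.median(x, axis=0, keepdims=True))

def quantile_normalize(x: np.ndarray) -> np.ndarray:
    # Force every array onto the same empirical intensity distribution.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    mean_sorted = np.mean(np.sort(x, axis=0), axis=1)
    return mean_sorted[ranks]

rng = np.random.default_rng(1)
raw = rng.lognormal(mean=5.0, sigma=0.5, size=(500, 4))  # probes x arrays
raw[:25, 2:] *= 3.0   # 25 genes truly up-regulated in arrays 2-3
raw[:, 2:] *= 1.3     # plus a global intensity shift on arrays 2-3

def log2_fold_change(x: np.ndarray) -> np.ndarray:
    # Mean log2 ratio of arrays 2-3 ("treated") over arrays 0-1 ("control").
    logx = np.log2(x)
    return logx[:, 2:].mean(axis=1) - logx[:, :2].mean(axis=1)

for name, norm in [("median scaling", median_scale),
                   ("quantile normalization", quantile_normalize)]:
    n_called = int(np.sum(np.abs(log2_fold_change(norm(raw))) > 1.0))
    print(f"{name}: genes with |log2 fold change| > 1: {n_called}")
```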

It is reasonable to expect, however, that different parameters (technical as well as practical) may continue to hamper the implementation of 'best practices' under certain circumstances in the future. We are aware that the studies conducted in the MAQC occur in an 'optimized' environment that may not always be possible to reproduce because of limitations of infrastructure, slower turnaround times for sample processing or other restrictions that real-life settings bear. Regardless, it is critically important to identify, and to evaluate the importance of, each individual step in the generation, processing and interpretation of microarray data, to be able to assess the extent to which these steps may contribute to data variability.

For this reason—and because a regulatory agency must not only be able to understand the steps that led to the generation of the data that are submitted, but should also be able to set this information into the context of other, similar data to assess data consistency and overall quality—it is important to know what could be considered a 'gold standard.'

Table 1 Focus areas of voluntary genomic data submissions as of February 2006

Therapeutic area          Scientific field
Cancer (multiple types)   Biomarkers
Alzheimer disease         Genotyping devices
Hypertension              Microarrays
Hypoglycemia              Analysis software, databases
Depression                Metabolic pathways
Obesity                   Enrichment design
Rheumatoid arthritis      Registry design
All                       Toxicology
All                       Biostatistics


Ultimately, this knowledge will help the agency to better understand and interpret data that have been generated under less ideal conditions, because the largest sources of data variability or uncertainty in data interpretation are known and can be addressed more adequately. It has been suggested, for example, that GLP (Good Laboratory Practices, 21 CFR part 58)5 could be used to clarify some of the issues around microarray standards and data variability. The requirements of 21 CFR part 58 apply to nonclinical studies submitted to support safety findings, including nonclinical pharmacogenomic studies intended to support regulatory decision making. Given the exploratory nature of many microarray studies, however, it seems unreasonable to expect these types of studies to comply with the full rigor of part 58. At the same time, it may not be feasible to conduct separate, long-term, non-GLP preclinical studies: sampling of tissues from GLP studies is a valuable means of conducting additional exploratory, investigational studies. Although the removal of tissue samples and the reason for removal (e.g., exploratory or mechanistic study, tissue banking) should be specified in the protocol, the removal of specimens for investigational purposes from a study does not invalidate the GLP status of the main toxicology study, if the study is otherwise acceptable (see ref. 4, section IV.D, for more details).

The ultimate goal—standards
Data generated during modern drug development are becoming increasingly complex, and large data sets, such as microarray data, need to be handled and processed in an efficient and coherent fashion. The FDA has started implementing new data standards, such as the ones recommended by the Clinical Data Interchange Standards Consortium, for new regulatory submissions. To date, these standards are available for data sets such as pharmacokinetics/pharmacodynamics; they are not yet available for genomic data submissions. Although we feel that it is too early for definitive recommendations on how, and in what format, to submit genomic information, it is advisable to work toward such standards even at this early stage. Lessons learned from the VGDS program have already been helpful in explaining and recommending aspects of submitting genomic data, and the agency feels that efforts such as the MAQC will further help it create 'best practices' and recommendations detailing the preferred formats and extent of the genomic data submissions that should accompany regulatory filings with the FDA.

From the analysis of approximately ten voluntary data submissions that contained microarray data, the agency has found that the results depend heavily on the quality of the starting material used for a microarray experiment, the data analysis protocol and the biological pathway analysis tools available to interpret lists of statistically significant genes. Sample storage and preparation are critical for the reproducibility of these data. Poor sample quality can prevent data interpretation from being conclusive.

A second critical factor is the data analysis protocol. Different sets of gene expression signatures, with different biological contexts, can be generated from the same raw data by different data analysis protocols. Different biological contexts can also be derived from the same gene expression signature by different biological pathway analysis tools. The biological interpretation of the data is common currency for VGDS discussions and regulatory review between sponsors and the FDA. The uniqueness of a list of genes in a signature is not, in and of itself, the goal of exploratory biomarker investigations submitted as part of a VGDS. It could, however, be important in the selection of signatures for validation studies.
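A minimal sketch of this point: applying two plausible analysis protocols (ranking genes by fold change versus ranking by a Welch t-statistic) to the same simulated expression matrix yields two 50-gene signatures with only partial overlap. All names, cutoffs and the simulated data are illustrative assumptions, not a recommended analysis protocol.

```python
# Illustrative sketch: the same raw data, two analysis protocols, two
# different gene signatures. Simulated data; names and cutoffs are
# assumptions for illustration, not a recommended analysis protocol.
import numpy as np

rng = np.random.default_rng(2)
n_genes, per_group = 2000, 5
log_expr = rng.normal(8.0, 1.0, size=(n_genes, 2 * per_group))
# The first 100 genes are truly different between the two groups.
log_expr[:100, per_group:] += rng.normal(1.0, 0.5, size=(100, per_group))

a = log_expr[:, :per_group]   # group A arrays
b = log_expr[:, per_group:]   # group B arrays
fold = b.mean(axis=1) - a.mean(axis=1)

# Protocol 1: take the 50 genes with the largest absolute log fold change.
sig_fold = set(np.argsort(-np.abs(fold))[:50])

# Protocol 2: take the 50 genes with the largest absolute Welch t-statistic.
se = np.sqrt(a.var(axis=1, ddof=1) / per_group
             + b.var(axis=1, ddof=1) / per_group)
sig_t = set(np.argsort(-np.abs(fold / se))[:50])

print("genes shared by the two 50-gene signatures:", len(sig_fold & sig_t))
```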

Consequently, for the FDA to interpret microarray data that are submitted for regulatory purposes, it is critical for sponsors of genomic data submissions to include a precise description of the steps that precede the actual array experiment, including the methods of sample collection, storage, RNA extraction and labeling, as well as of the data analysis protocol and biological pathway interpretation tools applied to the data.
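One way a sponsor might capture this provenance in machine-readable form is sketched below as a simple structured record. The field names and example values are hypothetical assumptions for illustration; they do not represent an FDA-mandated or guidance-defined schema.

```python
# Sketch of the kind of pre-array provenance a submission might record in
# machine-readable form. Field names and values are hypothetical; this is
# not an FDA-mandated or guidance-defined schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class ArrayExperimentRecord:
    sample_collection: str   # how and when the specimen was obtained
    storage: str             # conditions and duration before extraction
    rna_extraction: str      # extraction method or kit, with lot/version
    labeling: str            # labeling chemistry and protocol version
    platform: str            # array platform and chip revision
    analysis_protocol: str   # normalization and statistics, with versions
    pathway_tools: str       # biological interpretation software used

record = ArrayExperimentRecord(
    sample_collection="whole blood, PAXgene tube, pre-dose visit",
    storage="-80 C for 21 days",
    rna_extraction="column-based kit, lot 1234",
    labeling="one round of in vitro transcription, biotin label",
    platform="high-density oligonucleotide array",
    analysis_protocol="quantile normalization; Welch t-test, BH FDR 0.05",
    pathway_tools="commercial pathway analysis suite, v3.1",
)
print(json.dumps(asdict(record), indent=2))
```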

Much has been published about the concordance, or lack thereof, of data generated on different gene expression analytical platforms6,7. Although the MAQC addresses this issue, with the goals of establishing quality parameters for microarray experimentation and of identifying sources of variability, the agreement (overlap) of different platforms in real-life settings might actually be less important, especially in situations where a gene signature is to be identified that might be narrowed down to a handful of key genes. In these cases, it is likely that the assay itself will be moved onto a different platform (e.g., from a high-density microarray platform onto low-density arrays or quantitative PCR), which would require new and independent validation. For this scenario, it is first important to identify the particular subset of genes that is predictive of a given state (e.g., disease or treatment effect), which in fact may or may not resemble the full set of genes altered in expression in that state. One particular (e.g., high-density) microarray platform can then be used to screen for and identify these 'predictive' genes, without the need to obtain a full 'representative' picture of the transcriptome, with the intention of producing a signature set of genes that can be used in downstream applications such as clinical trials and clinical practice.

Consequently, it is sufficient and reasonable to expect that only a subset of the transcriptome will be analyzed for this purpose. This approach is not unlike the use of haplotypes and 'tag-SNPs' when genotyping (rather than gene expression profiling) is performed to characterize a particular disease or disease state. Additional sources of discordance among platforms include, for example, the fact that different locations in the same gene are often used as probes on different platforms, which may either result in different reported intensities (and therefore in what are interpreted as different fold changes) or result in the analysis of particular splice variants of a gene.
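The sketch below illustrates this 'predictive subset' idea: genes are screened on simulated training arrays, the top ten by a simple t-like score form the signature, and a nearest-centroid rule classifies held-out samples using only those genes. The simulation, scoring rule and classifier are assumptions chosen for brevity, not a validated signature-selection method.

```python
# Sketch: a handful of "predictive" genes, screened on one platform, can
# classify samples without a full transcriptome readout. The simulation,
# scoring rule and nearest-centroid classifier are assumptions chosen for
# brevity, not a validated signature-selection method.
import numpy as np

rng = np.random.default_rng(3)
n_genes = 5000

def simulate(n_samples: int):
    # Simulated expression matrix (genes x samples) with two classes;
    # only the first 25 genes truly differ between the classes.
    x = rng.normal(0.0, 1.0, size=(n_genes, n_samples))
    labels = np.array([0, 1] * (n_samples // 2))
    x[:25, labels == 1] += 1.5
    return x, labels

x_tr, y_tr = simulate(20)
x_te, y_te = simulate(10)

# Screen on training data: top 10 genes by a simple t-like score.
diff = x_tr[:, y_tr == 1].mean(axis=1) - x_tr[:, y_tr == 0].mean(axis=1)
score = np.abs(diff) / (x_tr.std(axis=1, ddof=1) + 1e-9)
signature = np.argsort(-score)[:10]

# Nearest-centroid classification using only the signature genes.
c0 = x_tr[np.ix_(signature, np.where(y_tr == 0)[0])].mean(axis=1)
c1 = x_tr[np.ix_(signature, np.where(y_tr == 1)[0])].mean(axis=1)
pred = (np.linalg.norm(x_te[signature] - c1[:, None], axis=0)
        < np.linalg.norm(x_te[signature] - c0[:, None], axis=0)).astype(int)
print("held-out accuracy with the 10-gene signature:",
      float((pred == y_te).mean()))
```

Note that the selected ten genes need not coincide with the full set of truly altered genes; for prediction, a small discriminative subset suffices, which is the same logic that motivates moving a signature onto a low-density array or quantitative PCR assay.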

Conclusions
Despite the limitations outlined above, we believe that microarray platforms are suitable tools to produce high-quality and reliable data that will prove useful in drug development and regulatory decision making. Understanding the limitations and assessing the variability are imperative, however. As those in regulatory agencies are exposed to, and expected to adequately analyze, microarray data, we need a better understanding of the technology, as well as agreement on standards and formats for the submission of data and the interpretation of the results.

The MAQC provides an excellent and unprecedented resource for determining 'best microarray practices,' including the use of reference material, data assembly and formats. This and other efforts, such as the VGDS program at the FDA, are instrumental to the efficient and effective use of microarray data in the regulatory review process. Given the increasing number of genomic data submissions to the agency, these initiatives are happening at just the right time.

Disclaimer: The views expressed in this article are those of the author and not necessarily those of the US Food and Drug Administration.

1. International Human Genome Sequencing Consortium. Nature 409, 860–921 (2001).
2. Venter, J.C. et al. Science 291, 1304–1351 (2001).
3. Lesko, L.J. & Woodcock, J. Pharmacogenomics J. 2, 20–24 (2002).
4. http://www.fda.gov/cder/guidance/6400fnl.pdf
5. http://www.cfsan.fda.gov/~dms/opa-pt58.html
6. Shi, L. et al. Expert Rev. Mol. Diagn. 4, 761–777 (2004).
7. Shi, L. et al. BMC Bioinformatics 6 Suppl. 2, S12 (2005).
