Special Focus Issue on Biostatistics * Data Basics * Data Cleaning * Data Types * Parametric and Nonparametric Tests * Tables and Graphs

Postscripts v5 n36 _2015Aug


Monthly newsmagazine of the American Medical Writers Association (AMWA) Pacific Southwest Chapter


Official publication of the American Medical Writers Association Pacific Southwest Chapter

AMWA Pacific Southwest Chapter

POSTSCRIPTS

Volume V, Issue 36, August 2015

Special Focus Issue on Biostatistics

* Data Basics
* Data Cleaning
* Data Types
* Parametric and Nonparametric Tests
* Tables and Graphs

Also in This Issue:

• How Grantwriters Measure Success

• WHO Guidelines for Naming Diseases

• Summer (Work) Style — ASK APRIL


POSTSCRIPTS

AIMS AND SCOPE

Postscripts magazine is the official publication of the Pacific Southwest chapter of the American Medical Writers Association (AMWA). It publishes news, notices and authoritative articles of interest in all areas of medical and scientific writing and communications. The scope covers clinical and regulatory writing, scientific writing, publication planning, social media, current regulations, ethical issues, and good writing techniques.

MISSION STATEMENT

The mission of Postscripts is to facilitate the professional development of medical writers and serve as a tool to advance networking and mentoring opportunities among all members. Towards this mission, Postscripts publishes significant advances in issues, regulations and practice of medical writing and communications; skills and language; summaries and reports of meetings and symposia; and book and journal summaries. Additionally, to promote career and networking needs of the members, Postscripts includes news and event notices covering Chapter activities.

SUBSCRIPTION: Postscripts is published monthly except in January and July. The magazine is available as an open access publication and is currently distributed online only.

INSTRUCTION FOR CONTRIBUTORS: We consider articles on any topic of interest to medical writers and communicators. It is helpful to look at the past December issues for the year-end table of contents, and to browse past issues for the style and type of articles published. We welcome contributions from AMWA members and non-members alike. Please contact the editor.

ADVERTISING: Postscripts is an advertising-free magazine. However, articles describing products and services relevant to medical writers may be considered or solicited. As a service to our members, they may submit advertisements for their services or products for free. Please contact the editor.

WEBSITES:

Postscripts: http://issuu.com/postscripts
Chapter Website: http://www.amwa-pacsw.org

Copyright 2011-2015, American Medical Writers Association Pacific Southwest Chapter, San Diego, CA. All rights reserved. (Authors retain copyright to their articles. Please contact authors directly for permission to use or display their work in any form or medium.) Design by Ajay Malik.

EDITOR
Ajay K Malik, [email protected]

EDITOR-AT-LARGE
Donna Simcoe, MS, MS, MBA, CMPP
President, AMWA [email protected]

AMWA Pacific Southwest Chapter Leadership

President
Donna Simcoe, MS, MS, MBA, [email protected]

Immediate Past President
Jennifer Grodberg, PhD, [email protected]

President-Elect
Susan Vintilla-Friedman

Treasurer
Elise Sudbeck, [email protected]

Secretary
Andrew Hellman, [email protected] Midthune, PhD

Arizona Liaison
Kathy Boltz, [email protected]

Membership Coordinator
Gail Flores, [email protected]

Employment Coordinator
Sharyn Batey, [email protected]

Outreach Coordinator
Asoka Banno, PhD

Website Coordinator
Laura J Cobb, [email protected]

Newsletter Editor
Ajay K Malik, [email protected]

AMWA Pacific Southwest Conference Chairs
Jacqueline A Dyck-Jones, PhD, MSc
Jennifer Grodberg, PhD, RAC

Banner Photo Art by Chip Reuben, www.photoartwindows.com / Red phone booth by Petr Kratochvil, publicdomainpictures.net

© Chip Reuben 2008

108 POSTSCRIPTS | VOL 5, NO. 36 | AUGUST 2015

POSTSCRIPTS
August 2015 | Volume 5, No. 36

COVER: Jim Yuen, a member of our chapter, shares a picture from his recent trip to Antarctica.

Special Focus Issue on Biostatistics...

• Data Basics - Tanya L Hoskin, MS . . . 113
• Always Look at the Data - Tanya L Hoskin, MS . . . 115
• Data Types - Tanya L Hoskin, MS . . . 117
• More Good Reasons to Look at the Data - Tanya L Hoskin, MS . . . 121
• Parametric and Nonparametric: Demystifying the Terms - Tanya L Hoskin, MS . . . 124
• AMA-zing Style, the AMA Manual of Style Column - Dikran Toroser, PhD, CMPP
   Presentation of Data: Tables . . . 127
   Types of Figures . . . 129

Also in this issue...

• From the President's Desk - Donna Simcoe, MS, MS, MBA, CMPP . . . 110
• Editor's Desk: Life of Clinical Data: From Creation to Conclusions - Ajay K Malik, PhD . . . 111

FEATURES

• How Do Grantwriters Measure Their Success - Meg Bouvier, PhD . . . 132
• A Disease by Another Name - Rebecca J. Anderson, PhD . . . 134
• Ask APRIL: What is Summer (Work) Style? - April Reynolds, MS, ELS . . . 135

NETWORKING EVENTS

• Chapter Happy Hour in San Diego, June 18, 2015 (Picture Summary) . . . 137

DEPARTMENTS

• New Members - Compiled by Gail Flores, PhD . . . 138
• Chapter Events' Calendar . . . 139
• Medical Writing Open Positions - Compiled by Sharyn Batey, PharmD, MSPH . . . 142
• Backpage: Sir Ronald Fisher, Statistician, Geneticist, and a Farmer . . . 144


From the President's Desk

“August hangs at the very top of summer, the top of the live-long year, like the highest seat of a Ferris wheel when it pauses in its turning”
Natalie Babbitt, Tuck Everlasting

I hope you are enjoying the Summer!

Did you attend our happy hour in La Jolla in June? Or the 3 great medical device presentations in Thousand Oaks in July? Stay tuned for the July meeting summary in our next newsletter. We have more great events planned over the next few months. On August 15th, we are holding a joint meeting with the San Diego Regulatory Affairs Network (SDRAN) about how to write labels that can help improve patient safety. This should be a very informative meeting that also includes a tour of the CareFusion Patient Safety Center.

Are you interested in how medical writers practice their craft? If so, please join us on September 19th at our “Medical Writers’ Toolbox Decoded” Symposium. This is a free event hosted by our chapter and Amgen in Thousand Oaks. We thank Ajay Malik, the presenters and the representatives from Amgen for all of their wonderful behind-the-scenes efforts. Be sure to register by September 4th!

In this August newsletter, we send a big thank you to Tanya Hoskin for her very thorough articles about data basics, data types, parametric and nonparametric data and how/why we should look at our data. Dikran Toroser has provided another excellent summary, this time about how to present data in tables and figures.

Are you interested in learning more about grant writing? Please be sure to check out Meg Bouvier’s article on how grant writers measure their success. I am sure you will also enjoy the entertaining article by Rebecca Anderson about the different names of commonly known diseases. We thank April Reynolds for her fun column about what to wear/not to wear to work in the summer, and our employment coordinator, Sharyn Batey, for keeping us informed of jobs in the area.

President-Elect Announced!
I am pleased to announce Susan Vintilla-Friedman, Principal at Vintilla Communications, as the President-Elect of the AMWA Pacific Southwest chapter. Susan has been an active member of our chapter since 2003 and was a member of the Northern California chapter before then. Susan attends our chapter meetings, has presented on our chapter’s behalf, is actively involved in the DIA Medical Writing Community, and has been on the AMWA Medical Writing Certification Examination Development Committee since September 2013. Susan will become the chapter President in January 2016. Please join me in welcoming Susan to her new role in our chapter leadership!

We also would like to send a big thank you to Andrew Hellman for all of his wonderful support as Secretary of our chapter. Andrew has done a fantastic job in keeping our social media, such as LinkedIn, up to date and also in informing many contacts in industry, academia and sister organizations about our chapter activities. We are sad to see Andrew leave our area but we wish him well in his future. We are happy to announce that in September, Brea Midthune will become our Secretary and Asoka Banno will become our Outreach Coordinator. Thank you to both of them for joining our chapter leadership!

Would you also like to help our chapter? We are looking for an Outreach Coordinator in Thousand Oaks/Los Angeles to help us with planning events in those areas. We are always looking for informative and fun articles for our newsletter, and it’s a great way to get an online writing credit. Please contact Ajay ([email protected]) for more details.

We hope to see you soon!

Donna

Donna Simcoe, MS, MS, MBA, CMPP
President, AMWA Pacific Southwest Chapter

Background image by Sandrinja via morguefiles. Image URI: http://mrg.bz/qe6UMN

EDITOR'S DESK:

Life of Clinical Data: From Creation to Conclusions

Statistics is like taxes. Learning either has a pretty high psychological hurdle. However, once the basic principles are within grasp, they provide a sense of control and confidence, and may help avoid embarrassing pitfalls.

In this issue of Postscripts, we reprint a series of 5 articles first published a decade ago by Tanya Hoskin, Senior Statistician at Mayo Clinic. These articles may provide a roadmap from preparing clean, reliable datasets to obtaining results and conclusions supported by the strength of the data. We also include 2 articles by Dikran Toroser about presentation of data. Dikran summarizes guidance from the AMA Manual of Style on developing Tables and Figures.

The first step in conducting a clinical study or a research project is good planning for data collection, because all downstream activities depend on the completeness of data and logical selection of variables. However, Medical Writers are often invited into the room only after a study is complete and the first set of data tables becomes available. Nevertheless, this is a very important milestone. At this step, if the data is from a clinical trial, a Medical Writer may join the Clinical Scientist(s), Statistician(s) and the folks running the day-to-day operations of the trial, including Clinical Research/Trial Associates, to look at the data tables with the goal of cleaning and preparing the dataset for analysis.

Clean Reliable Dataset

In the first article of the series (Data Basics, page 113), Tanya emphasizes that data cleaning is tedious, time-consuming, and unglamorous, but is absolutely essential for obtaining quality data and, thus, robust results and strong conclusions. Using a Microsoft Excel spreadsheet as an example, she points to common red flags, including inconsistencies, text where numeric coding should have been used (or vice versa), and improper handling of missing values. These red flags generate queries to reconcile values with the source or raw data. An audit at this level helps clean and prepare the data for statistical analyses. The goal is to have data tables that are as clean and reliable as possible.

The next step in the data cleaning process is to plot the data and look at it critically, again. Why? Because, as the second article in the series (Always Look at the Data, page 115) reminds us, it is never safe to assume that your dataset is perfectly clean. Looking at bar charts and histograms may reveal obvious errors, improbable ranges of values, or logical inconsistencies that may have been missed during data cleaning Step 1. Besides helping clean the data further, the patterns help guide the choice of appropriate statistical tests. The third article (Data Types, page 117) is about knowing your data. Is the data quantitative or qualitative? If quantitative, is it continuous or discrete? If qualitative, is it nominal or ordinal? Why does this profiling of data matter? Classifying data into types not only helps flag illogical or inconsistent data, but also helps confirm the appropriate statistical test for that dataset.

Choosing the Right Statistical Test for the Job

We are not done looking at the data! Now take a step back, survey the landscape, and look at the distribution of data. The next article (More Reasons to Look at the Data, page 121) describes the kind of information that bubbles up when the distribution is assessed, namely, the center, the spread and the shape of the data. Shape helps choose the appropriate statistical procedure. The fifth article in the series (page 124) is a primer about these tests, which may be parametric or nonparametric. Choosing between parametric and nonparametric tests requires knowledge of subtle differences between these tests, not unlike choosing between you’ll and y'all: while both are contractions of you all, likely you will use you'll unless you want to give a Southern flavor to your fiction. In brief, know your choice and look smart.

Once the data has been analyzed, we need to summarize, present, and report the statistics properly to support the conclusions drawn from the data. Two articles by Dikran Toroser on pages 127 and 129 describe some of the types of tables and figures one can choose from. Next month, on Sept 19th, at a symposium organized by our Chapter, “Medical Writers' Toolbox Decoded”, you can also hear Annalise Nawrocki talk about making Figures and Illustrations (see page 140).

PICTURE: Seattle Daily Times news editor quarters - 1900. Via Wikipedia (http://en.wikipedia.org/wiki/Editing#mediaviewer/File:Seattle_Daily_Times_news_editor_quarters_-_1900.jpg)

Reporting Statistics: The SAMPL Guidelines

Finally, how the statistics are reported in manuscripts, posters and other documents strengthens (or weakens) the validity of conclusions drawn from the data. Several studies have concluded that statistical errors in biomedical literature are mainly due to poor statistical reporting and not due to a poor choice of statistical methods.1

Tom Lang and Douglas Altman, both senior AMWA members, have developed a set of statistical reporting guidelines, “Statistical Analyses and Methods in the Published Literature”, or the SAMPL Guidelines. These guidelines are available at the website of the EQUATOR Network.2 Further details about statistical reporting are published in a book by these 2 authors.3

Increasing the Statistics Literary Quotient

The series of articles on data and statistics presented in this issue of Postscripts may serve as a foundation or primer. However, there is much more to learn, including the concepts of inferential statistics, confidence limits, p-values, correlations, regression analysis, and more. These are the towns just coming up on the road taken. Towards the goal of Statistics learning phase 2, there are several textbooks to choose from (visit your local library), a free MOOC course,4 and then there is a classic book from 1954, “How to Lie With Statistics” by Darrell Huff (available at Amazon5 or as a free ebook from archive.org6).

“How to Lie with Statistics”

This little book falls in the same genre as Dr. Seuss books: entertaining and informative.7 It is a small book with 142 pages full of examples of fudged analysis or misleading presentation of data, mostly from news media, that will keep the reader smiling and will give no reason to put the book down. Each chapter has several cartoons driving home the statistical concepts and making the reader laugh at the same time! As Darrell Huff admits, the book is devoted to a “wilderness of fraud”, but in the concluding chapter, he provides tips on “how to recognize sound and usable data in that wilderness of fraud”. This book is a must-read and is a perfect companion to throw in the travel bag while you drive out of your hometown for your summer vacation.

In the end, Medical Writers have everything to gain by being proficient in data analysis and presentation of facts supported by robust statistical analysis. As Darrell Huff wrote:

“The fact is that, despite its mathematical base, statistics is as much an art as it is science. A great many manipulations and even distortions are possible within the bounds of propriety. Often the statistician must choose among methods, a subjective process, and find the one that he will use to represent the facts.”

1. Lang T, Altman D. Basic statistical reporting for articles published in clinical medical journals: the SAMPL Guidelines. In: Smart P, Maisonneuve H, Polderman A (eds). Science Editors' Handbook, European Association of Science Editors, 2013. • Lang TA, Altman DG. Int J Nurs Stud. 2015 Jan;52(1):5-9.
2. http://www.equator-network.org/reporting-guidelines/sampl/
3. Lang T, Altman D. How to Report Statistics in Medicine: Annotated Guidelines for Authors, Editors, and Reviewers. American College of Physicians. 2nd edition. 2006.
4. Statistics in Medicine. Stanford University. https://lagunita.stanford.edu/courses/Medicine/HRP258/Statistics_in_Medicine/about
5. Huff D. How to Lie With Statistics. WW Norton & Co, New York. 1954. http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728
6. Archive.org. https://archive.org/details/HowToLieWithStatistics
7. Steele JM. Darrell Huff and Fifty Years of How to Lie with Statistics. Statistical Science 2005, Vol. 20, No. 3, 205–209. DOI 10.1214/088342305000000205

Data Basics
By Tanya L Hoskin, MS, Senior Statistician
Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN

CREDITS: Article reprinted with permission of Mayo Foundation for Medical Education and Research. All Rights Reserved. © Mayo Foundation. This article was originally published at mayo.edu in 2009. Visit the Biostatistics, Epidemiology and Research Design (BERD) resource center of Mayo Clinic at: http://www.mayo.edu/ctsa/resources/consultative-resources/biostatistics-epidemiology-and-research-design-berd-resource (Editor)

Any statistician will tell you that a large percentage of our time is spent cleaning and preparing data for analysis. It is a tedious, time-consuming, and unglamorous part of the job, but it is absolutely essential. The quality of the data determines the quality of our results and the conclusions we draw from them. For the researcher who is collecting, entering, or manipulating data, mastery of fundamental concepts about data is important and can make your job, and your consulting statistician’s job, easier.

We start with the most fundamental concepts. These ideas are probably best conceptualized by thinking of a spreadsheet, such as Microsoft Excel. Two spreadsheets appear below. At first glance, we see that they contain essentially the same information. The structure and quality of the second spreadsheet are far superior to those of the first, however. From an analysis perspective, the first spreadsheet is unusable in its current form. These two spreadsheets will set the stage for an explanation of the basics of data.

Two fundamental concepts: variables and observational units
• We use the structure of the spreadsheet (columns and rows) to organize our data and enforce consistency. The columns of a spreadsheet represent variables. You could think of a variable as a single piece of information collected on every individual (e.g., height, blood type).

• The rows in the spreadsheet represent the individuals on whom we are collecting data. Typically, it is desirable to have the data for each unique experimental/observational unit (e.g., patient) in a single row of the spreadsheet.

Rules for spreadsheet cells
• Each cell of a spreadsheet (or each variable for an individual) should contain a single piece of information.

• This follows from the fundamental concepts that each column contains a variable (a single piece of information collected on individuals) and each row contains information for a single individual. A cell is the intersection of a particular column and row; therefore, a cell should contain the information for one variable for one individual.

• For example, don’t enter gender and date of birth in the same cell. It may seem like a good idea to combine demographic variables or other similar variables, but it’s not. Remember that although you can still see the information, software programs cannot easily separate those two pieces of information. You must keep each distinct piece of information in a distinct cell. If you use a comma at any time during your data entry, it’s likely a problem!

• See the last column of the “bad spreadsheet” for an illustration of this point. Two different diagnosis codes are entered in a single cell and separated by a comma. See the “good spreadsheet” for one possible solution to this type of situation.

Consistency
For data to be useful, it must be recorded in a consistent format. One of the most important tips related to this concept is to avoid using text (letters, symbols, etc.) in your spreadsheet whenever possible. Why? The use of text makes it very difficult to maintain consistency and can result in a spreadsheet that is very difficult to use for even the simplest analyses.

Whenever possible, you should use numeric coding. For example, if the variable is gender, it is better to develop a numeric coding system such as 1 = male, 2 = female rather than using text such as male/female or M/F. Although this method may seem more cumbersome, think of it this way: there is only one way to write a “1” but many ways to write a text description of gender (“M”, “m”, “male”, etc.). To the software program, “M” and “m” are two different character values. Look at the gender column of the “bad spreadsheet”. If you tried to use this spreadsheet for analysis, the software package would tell you that there are seven different categories for the gender variable. Make certain to carefully document what your numeric codes mean.

• Text added to an otherwise numeric column (for example, including “kg” and entering 50kg for mass) will also make that column very difficult to work with; enter the number only and document the units (in the column name, for example). Refer to the mass column of the “bad spreadsheet” for an example.

• If you need to capture a text description in addition to the variable of interest, feel free to add and use a notes column. Just remember that it is unlikely that you will be able to use this comment field for any type of analysis. Record only numeric values in the column that contains the variable you want to analyze.
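To make the numeric-coding advice concrete, here is a minimal Python sketch of how text entries like those described above might be converted after the fact. The sample values, the mapping table and the helper names `code_gender` and `mass_in_kg` are illustrative assumptions, not part of the original article or of any statistical package.

```python
import re

# Hypothetical mapping from free-text entries to the article's numeric codes
# (1 = male, 2 = female). Anything not listed is left for manual review.
GENDER_CODES = {"m": 1, "male": 1, "f": 2, "female": 2}


def code_gender(raw):
    """Return 1 or 2 for a recognizable entry, or None if it needs review."""
    if raw is None:
        return None
    key = str(raw).strip().lower()
    if key in ("1", "2"):
        return int(key)
    return GENDER_CODES.get(key)


def mass_in_kg(raw):
    """Extract the number from entries such as '50kg' or ' 72.5 kg '.

    The unit is assumed to be kilograms. Entries with no unit are accepted;
    any other trailing text is rejected (None) so that, for example,
    '110 lb' is never silently read as kilograms.
    """
    if raw is None:
        return None
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(?:kg)?\s*", str(raw), re.IGNORECASE)
    return float(match.group(1)) if match else None


# Made-up entries of the kind a "bad spreadsheet" might contain.
raw_gender = ["M", "m", "male", "Female", "F ", "2", "unknown"]
raw_mass = ["50kg", "72.5 kg", "64", "110 lb"]

print(len(set(raw_gender)), "distinct text values before coding")
print([code_gender(g) for g in raw_gender])
print([mass_in_kg(m) for m in raw_mass])
```

Entries that come back as `None` are deliberately not guessed at; they are the ones to query against the source records.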

Insider information: miscellaneous tips
• There is a difference between a missing value and a zero. Zero is a number and should only be used when a value of zero is observed for the variable of interest. Procedures for entering missing values differ among projects. However, if you are working with a spreadsheet for data entry, leaving the cell blank is often the best choice. Remember that using text such as “N/A” or “unknown” will make an otherwise numeric column difficult to use.

• Use four-digit years for any date variables. Enter dates using a consistent format. The format of date variables should be MM/DD/YYYY. In the “bad spreadsheet”, notice that the use of seven different date formats creates a mess.

• Keep extraneous information, such as data summaries, out of the spreadsheet. The ideal spreadsheet should contain only raw data with a single row of column headings, which contains the variable names. Notice that the last row of the “bad spreadsheet” contains the average height. This is not part of the raw data and should be recorded elsewhere.

• Statistical software packages recognize two variable types: numeric and character. For a variable to be considered numeric, which is desirable in most situations, the entire column must contain only numbers. Statistical software packages read the entire column, and any text in any cell of that column will cause that variable to be treated as character data. For example, the height column in the “bad spreadsheet” would be read in as a character variable. This would make it impossible to get even simple descriptive statistics, such as the mean and standard deviation, without going through special procedures to clean the data.
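The tips above on missing values, date formats and numeric columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column contents are made up, the helper names (`parse_number`, `parse_date`, `is_numeric_column`) are hypothetical, and real statistical packages apply their own import rules.

```python
from datetime import datetime
from statistics import mean


def parse_number(cell):
    """Return a float, or None for a blank (missing) cell.

    Raises ValueError for text such as 'N/A', the kind of entry that
    leads a statistical package to treat a column as character data.
    """
    if cell is None or str(cell).strip() == "":
        return None
    return float(cell)


def parse_date(cell):
    """Parse a date written as MM/DD/YYYY; blank cells are missing (None)."""
    if cell is None or str(cell).strip() == "":
        return None
    return datetime.strptime(str(cell).strip(), "%m/%d/%Y").date()


def is_numeric_column(cells):
    """True only if every non-blank cell in the column parses as a number."""
    try:
        for cell in cells:
            parse_number(cell)
    except ValueError:
        return False
    return True


# Hypothetical columns: a blank cell is missing, "0" is an observed zero.
pregnancies = ["0", "2", "", "1"]
height_cm = ["170", "165", "N/A", "180"]

values = [parse_number(c) for c in pregnancies]
observed = [v for v in values if v is not None]

print(is_numeric_column(pregnancies), is_numeric_column(height_cm))
print(len(observed), "observed values, mean", mean(observed))
print(parse_date("03/07/2009"))
```

Note that the blank cell is excluded from the mean while the observed zero is included, which is exactly the distinction the first tip draws.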

Although this list cannot cover everything you need to think about when using a spreadsheet to collect data, these concepts and tips should provide a good starting point. If you go back to the “bad spreadsheet” and “good spreadsheet” examples we started with, it should now be clear why the latter is superior.

TANYA L HOSKIN, MS, has worked as a statistician at the Mayo Clinic Department of Health Sciences Research since 2001. She is currently in a senior statistician role collaborating on research projects in disease areas such as fibromyalgia, benign breast disease, and breast cancer and has been a co-author on more than 75 peer-reviewed publications. Previously she worked at her institution’s statistical consultative resource, which included educating clinicians and junior investigators in the basics of statistics and provided the motivation for writing the series of articles reprinted here. Ms. Hoskin received her Master’s degree in Statistics from Iowa State University.

Always Look at the Data
By Tanya L Hoskin, MS, Senior Statistician

Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN

It is one of the most obvious and yet easily forgotten steps in data analysis – “look at the data.” Any consulting statistician will tell you that this is essential to an accurate and appropriate statistical analysis. Why? Well, it is never safe to assume that your data set is perfectly clean. Looking at the data can reveal obvious errors, particularly impossible or improbable values and logical inconsistencies. Furthermore, looking at the data gives you a clearer understanding of the variables and the values they are taking and can help you choose appropriate statistical analyses.

What I mean by “look at the data” is to look at numeric and graphical summaries to identify possible problems in the data and gain descriptive information. Here we will focus on data cleanup. I start by assuming you have a well-organized spreadsheet (see Data Basics; page 113 of this issue). All computer output shown here was generated using JMP 5.0.1 statistical software.

Rule 1: Look at the levels of a categorical variable

Rule 1 deals with variables that classify subjects based on a category. Examples are gender, blood type and histologic stage. For categorical variables, there are usually a small number of possible values. Thus, it makes sense to look at the distribution of these variables (i.e., summarize the categories that occur in the data set) to make certain that each category is valid.

We consider data from a study of 100 patients:

The bar chart on the left has one bar for each category or level of the variable. The fact that there are three bars rather than two tells us immediately that there is a problem, since gender can only have two possible values. The frequency table on the right also shows us that the levels appearing in the data set are 1, 2 and 23. We need to identify the subject with a 23 recorded, verify the correct gender, and edit the data set accordingly.

In a second example (Example 1(b)), the gender variable has the correct number of levels and the correct numeric codes. It would be easy to mistakenly report that 55% of study participants were male and 45% were female. The point of this example is that you should always look at the total sample size included in the statistical procedure. Here the total sample size used was 97 subjects, but there were 100 subjects in the data set. Three patients do not have a value filled in for the gender variable (i.e., missing data). Many times missing data is not an error because it simply reflects reality. For a variable such as gender, however, we should have complete data for all patients. We need to identify the three patients and fill in their gender.
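Rule 1 can also be applied without a dedicated statistics package. The sketch below is a rough Python illustration, not the JMP output the article describes; the data are invented, and the function name `summarize_levels` and the set of valid codes are assumptions made for the example.

```python
from collections import Counter

# Assumed coding, following the Data Basics article: 1 = male, 2 = female.
VALID_GENDER_CODES = {1, 2}


def summarize_levels(values, valid_levels):
    """Return (frequency table, invalid levels, number of missing values).

    Missing values are represented as None.
    """
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    invalid = sorted(level for level in counts if level not in valid_levels)
    n_missing = len(values) - len(observed)
    return counts, invalid, n_missing


# Made-up data for 100 patients: one mistyped code (23) and three blanks.
gender = [1] * 53 + [2] * 43 + [23] + [None] * 3

counts, invalid, n_missing = summarize_levels(gender, VALID_GENDER_CODES)
n_used = sum(counts.values())

for level in sorted(counts):
    print(level, counts[level], f"{counts[level] / n_used:.1%}")
print("invalid levels:", invalid)
print("used", n_used, "of", len(gender), "records;", n_missing, "missing")
```

Printing the number of records actually used next to the total is the point of the second example: percentages alone would hide the three blank entries.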

Rule 2 – Look at extreme observations of quantitative variables
Rule 2 deals with variables that can be measured or counted on a numeric scale. Examples of this type of variable are age, systolic blood pressure, total cholesterol, height, weight, and the number of days from hospital admission to discharge. Unlike categorical variables, quantitative variables often have a large number of possible values. Because there are so many possible values, it does not make sense to look at each individual value. We can, however, look at the highest values and the lowest values recorded for a quantitative variable to identify any obvious problems.

We consider the distribution of the age variable in a study of 25 heart attack patients:

Example 2 – Multiple problems (histogram and numeric summaries of the age variable; figure not reproduced)

Notice that this display looks different from the displays that we saw in Example 1. The figure on the left is called a histogram. Recall that a bar chart was used to display categorical variables in Example 1, with each bar representing a category. For a quantitative variable such as age or blood pressure, there are no inherent categories. A histogram displays quantitative data by grouping the numeric values into intervals. The bar height represents the number of observations falling into that interval. Also notice that the numeric summaries are different. In Example 1, because the variable was categorical, the software summarized the number (frequency) and proportion falling into each category. For a quantitative variable, the software summarizes the mean and standard deviation (under the heading “Moments”) as well as the median, minimum, maximum, and other percentiles of the distribution (under the heading “Quantiles”).

This histogram shows us that there is a subject with a recorded age above 250 years. Under the heading “Quantiles”, we see that the maximum value for age is 253 years, clearly an impossible value. We would need to check this patient’s record and correct the data set.

The histogram also shows that there is a patient with a recorded age in the interval from 0 years to 25 years. The quantile display shows us that the minimum age in the data set is 4 years. Although this is not an impossible value, it is an improbable value if this data set contains patients who had a heart attack. It is a good idea to check improbable values to make certain they were recorded correctly.

Finally, we look at the total number of observations included in the calculations. In the display above, in the last row under the heading “Moments”, you see that N is 24. Thus, although there were 25 patients in the data set, the age variable was only present for 24. As with gender in Example 1(b), age is a variable for which we should have complete data.
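The same three checks (minimum, maximum, and the number of non-missing values) can be sketched in a few lines of Python. This is only an illustration: the ages below are hypothetical, the function name `summarize_extremes` is made up for this sketch, and the plausible range of 18 to 110 years is an assumption you would set from your own subject matter knowledge.

```python
# Hypothetical ages for illustration only; None marks a missing value.
ages = [62, 58, 71, 4, 66, 253, None, 49, 77, 55]


def summarize_extremes(values, low, high):
    """Report N, the extremes, and any values outside the plausible range [low, high]."""
    present = [v for v in values if v is not None]
    return {
        "n_total": len(values),
        "n_present": len(present),
        "minimum": min(present),
        "maximum": max(present),
        "out_of_range": sorted(v for v in present if v < low or v > high),
    }


# The range limits are an assumption for an adult heart attack cohort.
report = summarize_extremes(ages, low=18, high=110)
print(report)
```

A value flagged here is not automatically wrong; it is a prompt to go back to the patient record, as described above.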

Rule 3 – No one could write down all the rules

Rules 1 and 2 are the basics. Depending on your data set, there could be many additional data checks to perform. Some other data checks to consider:

• Check for duplicate observations. Data cleaning frequently reveals that some patients have multiple records in a data set. In some cases, there is a good reason. In other cases, it is a simple mistake. In either event, you need to be aware so you can handle it appropriately. Consult a statistician if you want to analyze multiple records per patient since this type of analysis requires special methodology.

• Check the order of dates. Date of recurrence should be after date of diagnosis. Date of death cannot be before date of diagnosis. The statements sound obvious, but these types of errors are often missed unless you pay close attention and check. Using software to calculate the time interval between two dates that should have a specific order is an easy way to perform this type of data check. For example, if you calculate the interval between date of death and date of diagnosis, negative values indicate a problem.

• Check logical relationships among variables. If one variable indicates that a patient did not have a CT scan and a second variable has the date of CT scan for that patient as 11/18/2001, the two variables are not consistent. If a data set has gender-specific variables, such as number of pregnancies, these variables should only be filled in for patients of the appropriate gender. The list of logic checks depends on the particular data set. Always take time to think about what logical relationships could be verified in your data.

• Use your medical and subject matter knowledge to identify more subtle inconsistencies.
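The first three checks in this list lend themselves to simple automation. The sketch below uses only the Python standard library; the records and the field names (`id`, `sex`, `diagnosis`, `death`, `pregnancies`) are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical records; the field names are assumptions for this sketch.
records = [
    {"id": 1, "sex": "F", "diagnosis": date(2001, 3, 1), "death": date(2004, 6, 9), "pregnancies": 2},
    {"id": 2, "sex": "M", "diagnosis": date(2002, 5, 20), "death": date(2002, 1, 15), "pregnancies": None},
    {"id": 2, "sex": "M", "diagnosis": date(2002, 5, 20), "death": date(2002, 1, 15), "pregnancies": None},
    {"id": 3, "sex": "M", "diagnosis": date(2003, 8, 2), "death": None, "pregnancies": 1},
]

# Duplicate check: patient IDs that appear more than once.
id_counts = Counter(r["id"] for r in records)
duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)

# Date-order check: a negative interval from diagnosis to death signals a problem.
bad_date_ids = sorted({
    r["id"] for r in records
    if r["death"] is not None and (r["death"] - r["diagnosis"]).days < 0
})

# Logic check: a sex-specific variable filled in for the wrong sex.
inconsistent_ids = sorted({
    r["id"] for r in records
    if r["sex"] == "M" and r["pregnancies"] is not None
})

print("Duplicates:", duplicate_ids)
print("Death before diagnosis:", bad_date_ids)
print("Inconsistent sex-specific data:", inconsistent_ids)
```

Each list only identifies records to review; deciding whether a duplicate is legitimate still requires going back to the source.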

Collecting your data in a consistent manner with a well-organized spreadsheet is the first step to clean data but not the last. Rare indeed would be a data set that did not suffer from at least a few typos or other mistakes. These rules don’t help us find all errors but can help us find the obvious ones. Statisticians regularly check for these types of errors. If you are doing your own simple analyses, make sure you do not neglect this important step.

CREDITS:

Article reprinted with permission of Mayo Foundation for Medical Education and Research. All Rights Reserved. © Mayo Foundation.

This article was originally published at mayo.edu in 2009. Visit the Biostatistics, Epidemiology and Research Design (BERD) resource center of Mayo Clinic at: http://www.mayo.edu/ctsa/resources/consultative-resources/biostatistics-epidemiology-and-research-design-berd-resource – Editor


Data Types

By Tanya L Hoskin, MS, Senior Statistician

Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN

Don’t let the title scare you. I know it sounds like one of those topics that only statisticians care about – the kind of topic that makes the eyes of most non-statisticians glaze over. In many ways, data types are very intuitive. However, when you need to collect, record or analyze your data, you can only accomplish these tasks successfully by thinking carefully about what type of data you have.

Suppose an investigator wants to determine if a specific lab value or test result is associated with patient outcome. In most cases, my first question to this investigator would be “What does the variable look like?” In other words, what possible values can the variable take and how will the variable be recorded?

The basics

Typically, a variable can describe either a quantitative or qualitative characteristic of an individual. Examples of quantitative characteristics are age, BMI, creatinine, and time from birth to death. Examples of qualitative characteristics are gender, race, genotype and vital status. Qualitative variables are also called categorical variables. Unfortunately, it gets a little more complicated.

Quantitative and qualitative data types can each be divided into two main categories, as depicted in Figure 1. This means that there are four basic data types that we might need to analyze:

1. Continuous
2. Discrete quantitative
3. Ordinal
4. Nominal

Quantitative variables

You might think of a quantitative variable as one that can only be recorded using a number. These variables describe some quantity about the individual and are often measured (e.g., body mass is measured with a scale) or counted (e.g., the number of needle punctures required to obtain the biopsy specimen is counted).

A quantitative variable can be either continuous or discrete. A continuous variable is one that in theory could take any value in an interval. We say “in theory” simply because we are limited by the precision of the measuring instrument (e.g., a patient’s true creatinine value might be 1.21345615 but we might only be able to measure it as 1.213). Examples of continuous variables are body mass, height, blood pressure and cholesterol.

A discrete quantitative variable is one that can only take specific numeric values (rather than any value in an interval), but those numeric values have a clear quantitative interpretation. Examples of discrete quantitative variables are number of needle punctures, number of pregnancies and number of hospitalizations. For these examples, non-negative whole numbers are the only possible values (i.e., it is not possible to have 1.5 pregnancies).

Qualitative variables

Qualitative or categorical variables describe a quality or attribute of the individual. Categorical data can be either nominal or ordinal. Sex is an example of a nominal variable, and histologic stage is an example of an ordinal variable. What is the difference between these two variables? The values for one of these variables have a specific order; for the other variable, they do not. If one patient has histologic stage 4 and another patient has histologic stage 1, you know that the stage 4 patient has more severe disease. Although the histologic stages are categories, the categories have an inherent order. The same cannot be said for the variable sex. Qualitative data with unordered categories is referred to as nominal; qualitative data with ordered categories is referred to as ordinal.

Why do we care about data types during the data collection phase?

The answer here seems pretty obvious – you must understand the data type of each variable in order to record its values in a consistent manner. This probably won’t require much thought in most cases, but consider the following example.

Suppose you are interested in the variable creatinine but plan to analyze it as a binary variable by classifying patients as creatinine < 1.8 or creatinine ≥ 1.8. You could simply collect which of these categories each individual falls into, but this probably isn’t the best choice. If a categorical variable is based on the value of a continuous variable, it is generally a good idea to collect the continuous variable. A continuous variable provides more information than a binary variable, which usually translates into more statistical power to detect differences among patients. If, in the analysis phase, you decide that you really do want to use the binary version of the variable, you can easily use a formula in a spreadsheet or statistical software package to create the binary variable from the continuous one you collected. On the other hand, if you only collect the binary variable, you do not have the source measurement recorded to go back to if necessary.
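Deriving the binary variable from the collected continuous one takes a single line in most tools. A minimal Python sketch, using hypothetical creatinine values and the 1.8 cutoff from the example above (the variable names are made up for illustration):

```python
# Hypothetical creatinine measurements, recorded as collected (continuous).
creatinine = [0.9, 1.2, 1.8, 2.4, 1.79]

CUTOFF = 1.8  # threshold from the example: < 1.8 versus >= 1.8

# 1 = creatinine at or above the cutoff, 0 = below the cutoff.
high_creatinine = [1 if value >= CUTOFF else 0 for value in creatinine]

print(high_creatinine)
```

The reverse is not possible: from the list of 0s and 1s alone there is no way to recover the original measurements, or to try a different cutoff later.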

Why do we care about data types during the analysis phase?

You are probably frequently exposed to terms such as mean, median, frequency, proportion, two-sample t-test, chi-square test, regression, correlation, logistic regression, etc. These are all statistical calculations or procedures, but which ones do you use – and when? The appropriate statistical calculation or procedure is driven in large part by the data types.

The most basic example of data types driving statistical calculations is illustrated in Figure 2, which shows the distributions of the variables body temperature (°C) and diabetes (0 = No diabetes, 1 = Yes diabetes) among 1420 hospitalized cancer patients. Diabetes is a nominal variable with only two possible values. Thus, we want to know the number (frequency) of patients with diabetes and what proportion of the total sample they represent. Because body temperature is a continuous variable with many possible values, we summarize its distribution by reporting statistics such as the median, minimum, maximum, mean and standard deviation. Clearly it would not be feasible or helpful to summarize the number and proportion of patients who had each specific body temperature value, just as it would make no sense to calculate the mean of the diabetes variable.
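As a rough illustration of how the data type drives the summary, the sketch below summarizes a nominal variable with a frequency and a proportion, and a continuous variable with the median, minimum, maximum, mean and standard deviation. The eight patients are hypothetical, not the 1420 patients in Figure 2.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical data for eight patients (illustration only).
diabetes = [0, 0, 1, 0, 1, 0, 0, 0]  # nominal: 0 = no, 1 = yes
temperature_c = [36.8, 37.1, 36.9, 38.2, 36.6, 37.0, 36.7, 37.3]  # continuous

# Nominal variable: frequency and proportion in each category.
counts = Counter(diabetes)
proportion_with_diabetes = counts[1] / len(diabetes)

# Continuous variable: measures of center and spread.
temperature_summary = {
    "minimum": min(temperature_c),
    "median": median(temperature_c),
    "maximum": max(temperature_c),
    "mean": mean(temperature_c),
    "sd": stdev(temperature_c),  # sample standard deviation
}

print(counts[1], proportion_with_diabetes)
print(temperature_summary)
```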

An analysis example

When you begin to pursue analyses more complex than descriptive statistics, data types are just as important and will lead you to the appropriate statistical procedures. Consider two questions in a hypothetical study involving these 1420 patients:

1. Is gender associated with body temperature?
2. Is gender associated with diabetes?

First, we must consider the data types. We know that body temperature is quantitative and more specifically continuous; diabetes is qualitative and more specifically nominal. Okay, now what? What information might be helpful in addressing the first question? Would it be helpful to know the distribution of body temperature separately for males and females?

The distributions shown in Figure 3 summarize a continuous variable (body temperature) for each of two groups (females and males). A statistical quantity used to summarize the distribution of a continuous variable is the mean. We see that the mean body temperature for males was 36.90°, compared to 36.99° for females. Just as we compare means in the two groups in our descriptive statistical analysis, we need a procedure that will statistically compare the mean among males to the mean among females.

One statistical test for comparing means between two groups is a two-sample t-test.

To answer question 2, we might start by summarizing the distribution of the diabetes variable separately for males and females (Figure 4).

The statistical quantity used to summarize the distribution of a nominal variable such as diabetes is a proportion. From Figure 4 we see that 46/562 (8.2 percent) females have diabetes, compared to 76/858 (8.9 percent) males. Because of the data types, we know that we would need a statistical procedure to compare proportions. The appropriate procedure to statistically compare proportions between two groups is a chi-square or Fisher’s exact test.
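For readers curious what a chi-square comparison of these two proportions involves, here is a minimal sketch using the counts quoted above. It implements the Pearson chi-square statistic for a 2×2 table without a continuity correction; that choice, the function name `chi_square_2x2`, and the resulting numbers are assumptions of this sketch and are not results reported in the article. In practice you would rely on a statistical package.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Counts quoted in the text: 46 of 562 females and 76 of 858 males have diabetes.
female_yes, female_total = 46, 562
male_yes, male_total = 76, 858

female_pct = 100 * female_yes / female_total
male_pct = 100 * male_yes / male_total

statistic, p_value = chi_square_2x2(
    female_yes, female_total - female_yes,
    male_yes, male_total - male_yes,
)
print(f"{female_pct:.1f}% vs {male_pct:.1f}%; chi-square = {statistic:.3f}, p = {p_value:.3f}")
```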


Conclusion

It’s not always easy to classify the data type of a variable or to decide how it should be analyzed. Continuous and nominal variables are usually straightforward, but discrete quantitative and ordinal variables can be more challenging. For example, if you are interested in reporting the number of pregnancies among women in your study group, is it meaningful to treat this as a continuous variable and provide the mean number of pregnancies? Or would it be more meaningful to treat it as an ordinal variable and summarize the number of women with one pregnancy, two pregnancies, etc.? Or would it be more meaningful to report the number of women who had one or more pregnancies?

The answers to questions like these often depend on many factors such as the reason that you are summarizing a particular variable and what you believe will be the most meaningful and useful statistic for your audience.

CREDITS:

Article reprinted with permission of Mayo Foundation for Medical Education and Research. All Rights Reserved. © Mayo Foundation.

This article was originally published at mayo.edu in 2009. Visit the Biostatistics, Epidemiology and Research Design (BERD) resource center of Mayo Clinic at: http://www.mayo.edu/ctsa/resources/consultative-resources/biostatistics-epidemiology-and-research-design-berd-resource – Editor


More Good Reasons to Look at the Data

By Tanya L Hoskin, MS, Senior Statistician

Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN

The article Always Look at the Data (see page 115) suggested that you look at your data to find problems, such as invalid codes or impossible values. Another article, Data Types (see page 117), explained how the type of data helps determine what statistical procedures are appropriate. This statistical tip will be a combination of those two topics. Here, we will think about how the overall look of our data might also help us determine which statistical procedures to use. In other words, we are going to think about aspects of the data distribution. We will focus on quantitative, continuous data.

What is a distribution?

You might hear the word “distribution” all the time in discussions about data, but how often do you think about what it means? Basically, we have a clinical question we want to answer. In order to answer that question, we collect data from a sample of individuals from the population. It is not very helpful to look at each individual’s value separately, so we need a way to look at the values for the whole sample at once. The distribution of a variable shows us a summary of all the values in a single picture. In other words, the distribution shows us how the values are distributed or where they fall in the range of possible values.

The histogram in Figure 1 shows the distribution of the age variable (AGE1) for a sample of 500 individuals. The vertical bars represent the number of patients out of the total sample who have ages in each interval. Here the intervals span 5 years. The tallest bar shows us that approximately 120 of the individuals had an age between 50 and 55 years. This bar is approximately in the middle of the range of values. Without calculating the average age of our patients, we might guess that it is between 50 and 55 just by looking at this picture. We also know that most patients were between 30 and 70 years old. There were only a few patients younger than 30 or older than 70 based on the very short bars for those intervals. Clearly, we gain a lot of information quickly by looking at the distribution, and this information is certainly more helpful than a list containing each individual’s age.
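To make the idea of grouping values into intervals concrete, here is a small text-only sketch that bins ages into 5-year intervals and prints one row per interval. The 15 ages are hypothetical (not the 500 individuals in Figure 1), and the convention that each interval includes its lower bound but not its upper bound is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical ages (illustration only).
ages = [23, 31, 38, 44, 47, 50, 51, 52, 53, 54, 56, 58, 61, 67, 72]

WIDTH = 5  # each interval spans 5 years, e.g. 50 up to (but not including) 55

# Map each age to the lower bound of its interval and count the ages per interval.
bin_counts = Counter((age // WIDTH) * WIDTH for age in ages)

# Print a sideways histogram: one row per interval, one '#' per patient.
for start in sorted(bin_counts):
    print(f"{start:>3}-{start + WIDTH:<3} {'#' * bin_counts[start]}")

tallest_bin = max(bin_counts, key=bin_counts.get)
```

Here `tallest_bin` identifies the interval with the most patients, which plays the role of the tallest bar in the histogram.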

Center, spread and shape

The center, spread and shape of a data distribution are the three key pieces of information we can assess by looking at a histogram. “Center” simply refers to the middle of the distribution, or an estimate of what a typical value would be for these individuals. Based on Figure 1, we said that the center of the distribution seems to be somewhere between 50 and 55. “Spread” is simply how “spread out” or variable the data is. In other words, were a wide range of values observed or do patients generally have values near the center of the distribution?

Figure 2 contains the distribution of ages (AGE2) for a different sample of 500 patients. The distribution in Figure 2 has more spread than the distribution in Figure 1. Can you see why? Well, the center of the distribution is about the same, somewhere between 50 and 55, but we observed a larger range of values. The youngest patient in Figure 2 is between 5 and 10 years of age, while the oldest is between 85 and 90. Also, we see more patients in intervals that are further from the center (e.g., 25 to 30 years). These observations tell us that there is more variability or spread in age among the patients from Figure 2 compared to Figure 1.


If you were to describe the shape of the histograms in Figures 1 and 2, you might say that both are shaped like a mound. The general appearance of the histogram is one thing to note about the shape of the distribution; in many cases it will be a mound, but occasionally you might see a shape that looks like two mounds (called a bimodal distribution) or some other non-mound shape. A second very important property to assess with regard to the shape of a distribution is its symmetry. In other words, could you cut the histogram in half and have one side that looked roughly like the other side? If a distribution is not symmetric, we might describe it as either right or left skewed.

Figure 3 shows two examples of distributions with skewed shapes. The distribution of the variable ejection fraction (EF) could be described as left skewed; the distribution of the variable heart rate could be described as right skewed. The “right” and “left” parts of the description refer to the side of the histogram that trails out farther than the other side.

Shape helps determine appropriate statistical procedures: a simple example

Look at the distribution of creatinine in Figure 4. It is based on a sample of 20 individuals and clearly has a right skew. The sample mean and median are 1.5 and 1.2, respectively. In this example, these two measures of center are quite different. Which one should you report? Which value is more representative of a “typical value” for these individuals?


For highly skewed distributions or those with unusually large or small values (i.e., outliers), the median typically is a more appropriate summary statistic to describe the center of the distribution. Why? Since the mean is calculated by adding all of the individual values and dividing by the sample size, it is strongly influenced by extreme observations in the tails of the distribution, especially for small sample sizes. The median, on the other hand, is simply the value that has half of the data points falling below it and half above, so it is not affected by the magnitude of extreme observations. In a perfectly symmetric distribution, the mean and median are equal. In a skewed distribution, the mean is pulled in the direction of the skew. Thus, the mean is larger than the median if the distribution is right skewed and smaller than the median if the distribution is left skewed.

When one reports the median rather than the mean, the range (minimum and maximum) or interquartile range (25th and 75th percentiles) is the appropriate measure of spread. For the distribution of creatinine in Figure 4, we might report a median creatinine of 1.2 with a range from 0.9 to 3.8.
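The pull of a single extreme value on the mean can be seen in a small sketch. The ten creatinine values below are hypothetical (they are not the 20 values behind Figure 4), and the quartiles come from Python's `statistics.quantiles` with its default method; other software may use slightly different percentile definitions.

```python
from statistics import mean, median, quantiles

# Hypothetical right-skewed creatinine values with one large observation.
creatinine = [0.9, 1.0, 1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 1.4, 3.8]

sample_mean = mean(creatinine)
sample_median = median(creatinine)
q1, _, q3 = quantiles(creatinine, n=4)  # 25th, 50th and 75th percentiles

print(f"mean = {sample_mean:.2f}, median = {sample_median:.2f}")
print(f"range = {min(creatinine)} to {max(creatinine)}, IQR = {q1:.2f} to {q3:.2f}")

# Drop the largest value to see how much each measure of center moves.
without_extreme = creatinine[:-1]
mean_shift = sample_mean - mean(without_extreme)
median_shift = sample_median - median(without_extreme)
print(f"mean shifts by {mean_shift:.2f}, median shifts by {median_shift:.2f}")
```

Removing the one extreme value moves the mean far more than it moves the median, which is why the median with a range or interquartile range is the safer summary for skewed data.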

The normal distribution

In order to talk about more advanced examples of how the shape of the data distribution affects our choice of statistical procedures, we must discuss the normal probability distribution. There are assumptions underlying many common statistical tests and procedures, and the most common assumptions are those related to the normal probability distribution. Specifically, many statistical procedures assume your data is normally distributed.

The normal probability distribution is a theoretical, mathematically defined distribution that explicitly defines how likely certain ranges of values are to occur. It is perfectly symmetric and shaped like a bell. A normal probability distribution curve overlays the two histograms in Figure 5. You can see that the histogram more closely resembles the shape of the normal curve for the variable age than for the variable duration of hospital stay. We might say that the age distribution is approximately normal but that duration of hospital stay is not normal and is right skewed.

The normal distribution shows up in a surprising number of natural phenomena. For example, human hippocampal volumes are approximately normal. It also shows up in many of our derived statistical calculations. By exploiting the desirable properties of the normal distribution, statisticians have developed many of the statistical procedures that are so helpful for answering scientific questions.

If the normal distribution is at the foundation of so many of our statistical procedures, you might wonder how we deal with the fact that the histograms are rarely, if ever, perfectly normal. We will discuss this topic in the next quarterly statistical tip, but the short answer is that “approximately normal” is often good enough, and we have tools to help when distributions are clearly not normal.

CREDITS:

Article reprinted with permission of Mayo Foundation for Medical Education and Research. All Rights Reserved. © Mayo Foundation.

This article was originally published at mayo.edu in 2009. Visit the Biostatistics, Epidemiology and Research Design (BERD) resource center of Mayo Clinic at: http://www.mayo.edu/ctsa/resources/consultative-resources/biostatistics-epidemiology-and-research-design-berd-resource – Editor


Parametric and Nonparametric: Demystifying the Terms

By Tanya L Hoskin, MS, Senior Statistician

Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN

In More Good Reasons to Look at the Data (see page 121), we looked at data distributions to assess center, shape and spread and described how the validity of many statistical procedures relies on an assumption of approximate normality. But what do we do if our data are not normal? In this article, we’ll cover the difference between parametric and nonparametric procedures. Nonparametric procedures are one possible solution to handle non-normal data.

Definitions

If you’ve ever discussed an analysis plan with a statistician, you’ve probably heard the term “nonparametric” but may not have understood what it means. Parametric and nonparametric are two broad classifications of statistical procedures. The Handbook of Nonparametric Statistics [1] from 1962 (p. 2) says:

“A precise and universally acceptable definition of the term ‘nonparametric’ is not presently available. The viewpoint adopted in this handbook is that a statistical procedure is of a nonparametric type if it has properties which are satisfied to a reasonable approximation when some assumptions that are at least of a moderately general nature hold.”

That definition is not helpful in the least, but it underscores the fact that it is difficult to specifically define the term “nonparametric.” It is generally easier to list examples of each type of procedure (parametric and nonparametric) than to define the terms themselves. For most practical purposes, however, one might define nonparametric statistical procedures as a class of statistical procedures that do not rely on assumptions about the shape or form of the probability distribution from which the data were drawn.

The short explanation

Several fundamental statistical concepts are helpful prerequisite knowledge for fully understanding the terms “parametric” and “nonparametric.” These statistical fundamentals include random variables, probability distributions, parameters, population, sample, sampling distributions and the Central Limit Theorem. I cannot explain these topics in a few paragraphs, as they would usually comprise two or three chapters in a statistics textbook. Thus, I will limit my explanation to a few helpful (I hope) links among terms.

The field of statistics exists because it is usually impossible to collect data from all individuals of interest (population). Our only solution is to collect data from a subset (sample) of the individuals of interest, but our real desire is to know the “truth” about the population. Quantities such as means, standard deviations and proportions are all important values and are called “parameters” when we are talking about a population. Since we usually cannot get data from the whole population, we cannot know the values of the parameters for that population. We can, however, calculate estimates of these quantities for our sample. When they are calculated from sample data, these quantities are called “statistics.” A statistic estimates a parameter.
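A toy simulation can make the parameter/statistic distinction tangible. In the sketch below the "population" is simulated, so (unlike in real research) its mean can be computed directly; the population size, the mean of 50, the standard deviation of 10 and the sample size of 50 are all arbitrary assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated population: in practice we could never measure everyone.
population = [random.gauss(50, 10) for _ in range(100_000)]
parameter = mean(population)  # the population mean (a parameter)

# What we can actually do: draw a sample and compute the same quantity.
sample = random.sample(population, 50)
statistic = mean(sample)  # the sample mean (a statistic)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The sample mean will generally be close to, but not exactly equal to, the population mean; quantifying that gap is what much of statistics is about.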

Parametric statistical procedures rely on assumptions about the shape of the distribution (i.e., assume a normal distribution) in the underlying population and about the form or parameters (i.e., means and standard deviations) of the assumed distribution. Nonparametric statistical procedures rely on no or few assumptions about the shape or parameters of the population distribution from which the sample was drawn.

As I mentioned, it is sometimes easier to list examples of each type of procedure than to define the terms. Table 1 contains the names of several statistical procedures you might be familiar with and categorizes each one as parametric or nonparametric. All of the parametric procedures listed in Table 1 rely on an assumption of approximate normality.

An example

Suppose you have a sample of critically ill patients. The sample contains 20 female patients and 19 male patients. The variable of interest is hospital length of stay (LOS) in days, and you would like to compare females and males. The histograms of the LOS variable for males and females appear in Figure 1. We see that the distribution for females has a strong right skew. Notice that the mean for females is 60 days while the median is 31.5 days. For males, the distribution is more symmetric with a mean and median of 30.9 days and 30 days, respectively. Comparing the two groups, their medians are quite similar, but their means are very different. This is a case where the assumption of normality associated with a parametric test is probably not reasonable. A nonparametric procedure would be more appropriate.


This is the situation listed in the first row of Table 1 – comparing means between two distinct groups. Thus, the appropriate nonparametric procedure is a Wilcoxon rank-sum test. This test would give us a p-value of 0.63. Those of you familiar with p-values know that we typically compare our p-value to the value 0.05. We usually say that a p-value less than 0.05 is an indication of a statistically significant result. So, we would say that there is no significant difference between the genders with respect to length of stay based on the Wilcoxon rank-sum test. Incidentally, the p-value for the two-sample t-test, which is the parametric procedure that assumes approximate normality, is 0.04. You can see that in certain situations parametric procedures can give a misleading result.
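To show what "using ranks instead of the actual values" means in practice, here is a simplified sketch of a Wilcoxon rank-sum test. It uses the large-sample normal approximation with no tie or continuity correction, which is an assumption of this sketch and not necessarily what a statistical package does. The length-of-stay values are hypothetical, so the result will not match the p-values quoted above.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test by normal approximation (no tie or continuity correction)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Assign ranks 1..N, giving tied values the average of their ranks.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    # W is the sum of the ranks belonging to the first group.
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    z = (w - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value from the normal curve
    return w, z, p_value


# Hypothetical length-of-stay data in days (illustration only).
females = [12, 18, 25, 29, 31, 34, 40, 95, 150, 210]  # right skewed
males = [14, 20, 24, 27, 30, 33, 36, 41, 45, 52]

w, z, p_value = rank_sum_test(females, males)
print(f"W = {w}, z = {z:.3f}, p = {p_value:.3f}")
```

Because only the ranks enter the calculation, the three very long stays among the females count as ranks 18, 19 and 20 regardless of how many days they actually lasted.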

Why don’t we always use nonparametric tests?

Although nonparametric tests have the very desirable property of making fewer assumptions about the distribution of measurements in the population from which we drew our sample, they have two main drawbacks. The first is that they generally are less statistically powerful than the analogous parametric procedure when the data truly are approximately normal. “Less powerful” means that there is a smaller probability that the procedure will tell us that two variables are associated with each other when they in fact truly are associated. If you are planning a study and trying to determine how many patients to include, a nonparametric test will require a slightly larger sample size to have the same power as the corresponding parametric test.

The second drawback associated with nonparametric tests is that their results are often less easy to interpret than the results of parametric tests. Many nonparametric tests use rankings of the values in the data rather than using the actual data. Knowing that the difference in mean ranks between two groups is five does not really help our intuitive understanding of the data. On the other hand, knowing that the mean systolic blood pressure of patients taking the new drug was five mmHg lower than the mean systolic blood pressure of patients on the standard treatment is both intuitive and useful.

In short, nonparametric procedures are useful in many cases and necessary in some, but they are not a perfect solution.

Take-home points

Here is a summary of the major points and how they might affect statistical analyses you perform:

• Parametric and nonparametric are two broad classifications of statistical procedures.

• Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that data are approximately normally distributed.

Table 1. A listing of parametric tests and analogous nonparametric procedures

• Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution.

• If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions.

• You should be aware of the assumptions associated with a parametric procedure and should learn methods to evaluate the validity of those assumptions.

• If you determine that the assumptions of the parametric procedure are not valid, use an analogous nonparametric procedure instead.

• The parametric assumption of normality is particularly worrisome for small sample sizes (n < 30). Nonparametric tests are often a good option for these data.

• It can be difficult to decide whether to use a parametric or nonparametric procedure in some cases. Nonparametric procedures generally have less power for the same sample size than the corresponding parametric procedure if the data truly are normal. Interpretation of nonparametric procedures can also be more difficult than for parametric procedures.

• Visit with a statistician if you are in doubt about whether parametric or nonparametric procedures are more appropriate for your data.

• The book Practical Nonparametric Statistics [2] is an excellent resource for anyone interested in learning about this topic in great detail. More general texts such as Fundamentals of Biostatistics [3] and Intuitive Biostatistics [4] have chapters covering the topic of nonparametric procedures.

References

1. Walsh, J.E. (1962). Handbook of Nonparametric Statistics, New York: D.V. Nostrand.

2. Conover, W.J. (1980). Practical Nonparametric Statistics, New York: Wiley & Sons.

3. Rosner, B. (2000). Fundamentals of Biostatistics, California: Duxbury Press.

4. Motulsky, H. (1995). Intuitive Biostatistics, New York: Oxford University Press.


CREDITS:

Article reprinted with permission of Mayo Foundation for Medical Education and Research. All Rights Reserved. © Mayo Foundation.

This article was originally published at mayo.edu in 2009. Visit the Biostatistics, Epidemiology and Research Design (BERD) resource center of Mayo Clinic at: http://www.mayo.edu/ctsa/resources/consultative-resources/biostatistics-epidemiology-and-research-design-berd-resource – Editor


AMA-zing Style – the AMA Manual of Style Column

By Dikran Toroser, PhD, CMPP, Amgen Inc., Thousand Oaks, Calif.

In terms of space, a well-structured table is one of the most efficient ways to convey a large amount of data. As text, the information may take much more space; and as a figure, details and precise values may be less apparent. Priorities in the creation and publication of tables are to emphasize important information efficiently and to ensure that each table makes a clear point. In addition to presenting study results, tables can be used to explain or amplify the methods or highlight other key points in the article. Like a paragraph, each table should be cohesive and focused.

Although tabular presentation of data adds variety to the article layout, authors and editors should avoid using tables and figures simply to break up text or to impart visual interest. Below are some considerations when making the choice between text, table or figure to present your data.

Types of Tables:

Regular table. Displays information arranged in columns and rows and is used most commonly to present numerical data. Tabulating all collected study data is unnecessary and actually distracts and overwhelms the reader. Tables should be able to stand independently, without requiring explanation from the text.

Tabulation. This is a brief, in-text table that may be used to set material off from text. Tabulations require the text to explain their meaning.

Presentation of Data: Tables

When planning your publication, you wil l have to consider the best way to communicate information to youraudience. General ly, data summaries may take the form of text, tables or figures. Tables are a convenient wayto present l ists of numbers or text in columns and are ideal to explain variables. Tables can also make anarticle more readable by removing numeric or l isted data from the text. The AMA Manual of Style containsexcellent information on table design.

It is just as important to think about the organization of tables as it is to think about

the organization of paragraphs. A well-organized table allows readers to grasp the

meaning of the data presented with ease, while a disorganized one will leave the

reader confused about the data itself, or the significance of the data.


Matrix. This is a tabular structure that uses numbers, short words (eg, no, yes), or symbols (eg, bullets, check marks) to depict relationships among items in columns and rows.

Organizing Information in Tables. Tables should be organized into columns and rows by type and category, thereby simplifying access and display of data and information. During planning and creation, the writer should consider the primary comparisons of interest. The primary comparisons should be shown horizontally across the table.

Table Components:

Tables usually contain 5 major elements: title, column headings, stubs (row headings), body (data field) consisting of individual cells (data points), and footnotes.

Titles should be succinct, specific, and written as a phrase rather than as a sentence.

Column Headings. Main categories in the table should have separate columns, each with a brief heading. In tables for studies that have independent and dependent variables, the independent variables conventionally are displayed in the left-hand column (stub) and the dependent variables in the columns to the right.

Table Stubs (Row Headings). The left-most column of a table contains the table stubs (or row headings), which are used to label the rows of the table and apply to all items in that row.

Field. The field or body of the table presents the data. Each data entry point is contained in a cell, which is the intersection of a column and a row.

Totals. Any discrepancies in the totals (eg, because of rounding) should be explained in a footnote. Boldface should not be used to overemphasize data in the table (eg, significant odds ratios or P values).
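To see how a rounding discrepancy in totals can arise, here is a minimal Python sketch. The function name and the cell counts are hypothetical and chosen only for illustration; nothing here comes from the AMA Manual itself.

```python
def rounded_percentages(counts, digits=1):
    """Express each count as a percentage of the total, rounded to `digits` decimals."""
    total = sum(counts)
    return [round(100 * c / total, digits) for c in counts]


# Hypothetical cell counts for three equally sized groups.
counts = [1, 1, 1]
percents = rounded_percentages(counts)

# The rounded column no longer adds up to exactly 100%.
discrepancy = round(100 - sum(percents), 1)
needs_footnote = discrepancy != 0

print(percents, discrepancy, needs_footnote)
```

When `needs_footnote` is true, a table footnote such as "Percentages may not sum to 100 because of rounding" would cover the gap.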

Rules and Shading. For JAMA and the Archives Journals, tables should be submitted without rules drawn in (as opposed to table borders, which are appropriate) or shading. Many journals add rules and shading during the production process. For example, JAMA uses horizontal rules to separate rows of data. Other journals use shading for the same purpose.

Footnotes. The order of the footnotes is determined by the placement in the table of the item to which the footnote refers. The letter for a footnote that applies to the entire table (eg, one that explains the method used to gather the data or format of data presentation) should be placed after the table title. A footnote that applies to 1 or 2 columns or rows should be placed after the column heading(s) or stub(s) to which it refers. A footnote that applies to a single entry in the table or to several individual entries should be placed at the end of each entry to which it applies. For both tables and figures, footnotes are indicated with superscript lowercase letters in alphabetical order (a-z). The font size of the footnote letters should be large enough to see clearly without appearing to be part of the actual data.
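The footnote ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the (row, column) coordinate scheme, and the sample notes are all hypothetical, and "placement in the table" is interpreted here as reading order (title first, then column headings, then rows from top to bottom and left to right).

```python
from string import ascii_lowercase


def assign_footnote_letters(marks):
    """Assign letters a-z to footnotes in order of first appearance.

    `marks` is a list of (row, col, note_text) tuples. By assumption,
    row -1 is the table title, row 0 is the column-heading row, and
    rows 1 and up are body rows (col 0 being the stub).
    """
    letters = {}
    for _row, _col, text in sorted(marks, key=lambda m: (m[0], m[1])):
        if text not in letters:
            if len(letters) == len(ascii_lowercase):
                raise ValueError("more than 26 distinct footnotes")
            letters[text] = ascii_lowercase[len(letters)]
    return letters


# Hypothetical footnote marks, listed out of order on purpose.
marks = [
    (2, 1, "P < .05"),
    (-1, 0, "Data are from the study registry."),
    (0, 2, "Measured at baseline."),
    (3, 2, "P < .05"),
]
letters = assign_footnote_letters(marks)
print(letters)
```

A note that applies to several individual entries (here, "P < .05") keeps a single letter, which is repeated at each entry.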

Units of Measure. In tables, units of measure, including the variability of the measurement if reported, should follow a comma in the table column heading or stub. The following are examples of stub entries with units of measure:

Age, mean (SD), y
Body mass index, median (IQR)

Abbreviations. Within the table, units of measure may be abbreviated for space considerations. However, spelled-out words should not be combined with abbreviations for units of measure. "1st wk" or "Week 1" is acceptable, but not "First wk."

Numbers. Additional digits (including zeros) should not be added, eg, after the decimal point, to provide all data entries with the same number of digits. Doing so may indicate more precise results than actually were calculated or measured. Values for reporting statistical data, such as P values and confidence intervals, also should be presented and rounded appropriately.
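The point about added digits is easy to demonstrate. The sketch below uses Python's standard `decimal` module; the helper names and sample values are hypothetical. It contrasts a value printed as recorded with the same value padded to a fixed number of decimals, which implies precision that was never measured.

```python
from decimal import Decimal


def as_recorded(value: str) -> str:
    """Return the value with exactly the digits that were recorded."""
    return str(Decimal(value))


def padded(value: str, places: int) -> str:
    """Pad to a fixed number of decimals (the practice the column advises against)."""
    return f"{Decimal(value):.{places}f}"


def implied_decimals(value: str) -> int:
    """Count the decimal places a printed value implies."""
    return max(0, -Decimal(value).as_tuple().exponent)


measured = "12.5"  # hypothetical value recorded to 1 decimal place
print(as_recorded(measured), implied_decimals(as_recorded(measured)))
print(padded(measured, 2), implied_decimals(padded(measured, 2)))
```

The padded entry looks tidier in a column, but it claims a second decimal place that the measurement does not support.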

Guidelines for Preparing and Submitting Tables. Authors submitting tables in a scientific article should consult the publication's instructions for authors for specific requirements and preferences regarding table format.

See pages 81-98 in the AMA Manual of Style, 10th edition, for additional information.

DIKRAN TOROSER, PhD, CMPP, a member of the AMWA Pacific Southwest chapter, has been a regular contributor to Postscripts magazine since 2012. He developed the monthly AMAzing Style column, which covers topics from the AMA Manual of Style, and has also written on publication-related topics in these pages. Dikran is currently a Senior Medical Writing Manager at Amgen Inc. in Thousand Oaks, California. He earned his PhD in Biochemistry from Newcastle University (UK), and did his post-doctoral training in biochemical genetics at the John Innes Center of the Cambridge Laboratory (Norwich, UK) and in molecular biology with the USDA. Prior to Amgen, Dikran was on the faculty (research) at the School of Pharmacy at the University of Southern California. He can be reached at [email protected].


AMA-zing Style: the AMA Manual of Style Column
By Dikran Toroser, PhD, CMPP, Amgen Inc., Thousand Oaks, Calif.

Types of FIGURES

Communication of data requires figures in addition to words and equations. If a graph is appropriate, you need to make some deliberate choices. The AMA manual contains invaluable information on your available choices.

The term figure refers to any graphical display used to present information or data, including statistical graphs, maps, algorithms, illustrations, computer-generated images, and photographs. In scientific articles, selection of a particular type of figure depends on the purpose and type of information being displayed. Some of the most common types of figures in biomedical publications are discussed below.

Statistical Graphs

Line Graphs. These have 2 or 3 axes with continuous quantitative scales demonstrating the relationship between 2 or more variables, such as changes over time. Usually, the dependent variable is on the vertical axis (y-axis) and the independent variable on the horizontal axis (x-axis) (Figure 1).

Figure 1 Line Graph (adapted from Haber et al (2004) JAMA, 292(20): 2478-2481)

Survival Plots. These plots of time-to-event outcomes, such as from Kaplan-Meier analyses, display the proportion of individuals, represented on the y-axis as a proportion or percentage, remaining free of or experiencing a specific outcome over time, represented on the x-axis. When the outcome of interest is relatively frequent, event-free survival is plotted on the y-axis from 0 to 1.0 (or 0% to 100%), with the curve starting at 1.0 (100%). When the outcome is relatively infrequent (occurs in <30% of the study population), it is preferable to plot upward starting at 0 so that the curves can be seen without breaking or truncating the y-axis scale. The curve should be drawn as a step function (not smoothed). The number of individuals followed up for each time interval (number at risk) should be shown underneath the x-axis. Time-to-event estimates become less certain as the number of individuals diminishes, so consideration should be given to not displaying data when less than 20% of the study population is still in follow-up.

Figure 2 Survival Plot (adapted from Squadrone et al JAMA, 2005; 293(5): 589)

Scatterplots. In scatterplots, individual data points are plotted according to coordinate values with continuous, quantitative x- and y-axis scales. By convention, independent variables are plotted on the x-axis and dependent variables on the y-axis. Data markers are not connected by a curve, but a curve that is generated mathematically may be fitted to the data and summarize the relationship among the variables. The statistical method used to generate the curve and the statistic that summarizes the relationship between the dependent and independent variables, such as a correlation or regression coefficient, should be provided in the figure or legend (Figure 3).

Bar Graphs. Bar graphs have a single axis and are used to display frequencies (counts or percentages) on the axis according to categories shown on a baseline. A bar graph is typically vertical, with frequencies shown on a vertical y-axis, but may be horizontal. Data in each category are represented by a bar. Bars should have the same width, be separated by a space, and be wider than the space between them. Bar lengths are proportional to frequency, the scale on the frequency axis should begin at 0, and the axis should not be broken. All bars must have a common baseline to facilitate comparison. Categories of data should be presented in logical order and consistently with other figures and tables in the article. The baseline of a bar graph is not a coordinate axis and therefore should not have tick marks. Bar graphs may be used to compare frequencies between groups. In most cases, the number of bars in a grouped bar graph should not exceed 3. Colors or tones used to designate each group should be distinct. To ensure that bars in black-and-white figures are distinguishable, a contrast in shading of at least 30% for adjacent bars is suggested. Color or shades of gray should be used instead of patterns and crosshatching (eg, diagonal lines) on bars.
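Two of the bar-graph suggestions above (no more than 3 bars per group, and at least 30% contrast in shading between adjacent bars) lend themselves to a quick check before a figure is drawn. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes shades are expressed as percent black, so that "30% contrast" is read as a difference of 30 percentage points.

```python
def shading_is_distinct(percent_black, minimum_contrast=30):
    """True if every pair of adjacent bars differs by at least
    `minimum_contrast` percentage points of gray."""
    pairs = zip(percent_black, percent_black[1:])
    return all(abs(b - a) >= minimum_contrast for a, b in pairs)


def group_size_ok(percent_black, max_bars=3):
    """True if the group has no more than `max_bars` bars."""
    return len(percent_black) <= max_bars


# Hypothetical shades for one group of bars, as percent black.
proposed = [10, 50, 90]
too_close = [20, 40, 80]  # the first two bars differ by only 20 points

print(shading_is_distinct(proposed), shading_is_distinct(too_close))
print(group_size_ok(proposed))
```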

Pie Chart. Pie charts compare relationships among component parts. Categories are represented by sections, with the area of the section being proportional to the relative frequency of each category. Pie charts are used commonly in publications intended for lay audiences but should be avoided in scientific publications. The angular areas of the individual components of pie charts may be difficult to compare between pie charts. Usually, data depicted in pie charts can be summarized in the text or in a table.

Dot (Point) Graph. Dot or point graphs display quantitative data other than counts or frequencies on a single scaled axis according to categories on a baseline (the scaled axis may be horizontal or vertical). Like that in bar graphs, the baseline does not represent a scale and therefore does not contain tick marks. Point estimates are represented by discrete data markers, preferably with error bars to designate variability (Figure 5). Dot or point graphs may be used to compare data between study groups, including positive and negative data values relative to a centrally located 0 baseline ("derivation graph"), paired data from single individuals, or pooled data in meta-analyses and other analyses that combine data from individual studies.

See pages 98 to 106 in the AMA Manual of Style, 10th edition, for additional information.

Figure 3 (adapted from Schneider et al (2010) Deutsches Ärzteblatt International, 107 (44): 776-782)

Figure 4 (adapted from Alexander et al (2004) JAMA, 292 (14): 1696-1701)

Figure 5 (adapted from Bell et al (2004) JAMA, 292 (19): 2372-2378)
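As an illustration of the survival-plot guidance earlier in this column (a step function, the number at risk at each time point, and caution once few individuals remain in follow-up), here is a minimal Kaplan-Meier sketch in plain Python. The function names and the follow-up data are hypothetical, and a validated statistics package should be used for any real analysis.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of steps.

    `times` are follow-up times; `events` are 1 if the outcome occurred
    and 0 if the individual was censored. Returns one tuple
    (time, number_at_risk, survival) per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # The curve drops only at event times, giving a step function.
            survival *= 1 - n_events / n_at_risk
            steps.append((t, n_at_risk, survival))
        n_at_risk -= n_leaving
    return steps


def displayable(steps, n_total, min_fraction=0.2):
    """Keep only steps where at least `min_fraction` of the study
    population is still in follow-up."""
    return [s for s in steps if s[1] >= min_fraction * n_total]


# Hypothetical follow-up data for 5 individuals (1 = event, 0 = censored).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
steps = kaplan_meier(times, events)
for t, n, s in steps:
    print(f"time {t}: {n} at risk, event-free proportion {s:.2f}")
```

For an infrequent outcome, the same steps can be plotted upward from 0 by charting `1 - survival` instead of `survival`.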


How Do Grantwriters Measure Their Success
By Meg Bouvier, AMWA New England Chapter Member

I wrote my first grant application as a professional writer in about 2007. It was an NIH R01 resubmission, and I was lucky that it went from a score in the forty-odd percentile to 12%, and it was funded. Since that time, I have wrestled with the question of how to communicate to clients about my skill as a grantwriter. (Please note: I know writers occasionally post on the AMWA listservs stating that the word “grantwriter” is a misnomer. A grant is a monetary award and therefore cannot be written. Please forgive my expedient use of the word “grantwriter.”)

I have employed numerous strategies over the years to try to quantify my grantwriting skills, some shamelessly pilfered from other writers and others I devised myself. In the first year or two, before I had many successful submissions, I had a sort of ‘greatest hits’ of my wins on my company’s website, listing headings like “most improved score,” “funded on first submission,” and “least known funding mechanism.”

A later iteration of my website boasted recent wins. I diligently listed R01s, R21s, and other mechanisms that I had helped clients land within the past 12-24 months. I am a strong believer that the NIH review process is a fairly rapidly moving target; therefore, outdated grant experience is not terribly helpful.

By 2011, I had landed my first major center grant application. I was the lead writer on a $100 million HRSA grant application. It was the largest construction grant in university history, and my career was launched. This win was followed by a few more large-format wins on submissions for which I served as lead writer, so I simply listed the large center grants/contracts on my website and left it at that.

But I have begun to wonder what it all means. What should I count as a win? What if an R-series grantee only has me advise on a submission, or asks me for help only on the Aims, Significance, Innovation, and Intro? What if I only edit the submission? Do these count as my wins? I sometimes help a client write a first submission on which they might receive a percentile score in the high teens, then they land the grant on the resubmission, on which I did not work directly. Do I count that as my win, when the client credits me with the ultimate success of the submission? Nowadays, my marketing documents state that I have helped clients land over $200 million in federal funding. I figure I can justify that number based on the large-format wins for which I clearly wrote the applications. I can live with that number.

I have had clients ask me for my success rate: what percent of my clients land their grant? Seems like a benign and perfectly reasonable question, right? Nope. I would never suggest to a novice grantwriter that they maintain or advertise such a thing. Grantwriting is an iterative process. At NIH, few applications are funded on the first try, and it can take time to titrate a grantee’s submission strategy. Success in one agency, IC, or study section does not mean you will be successful at others. It takes time, patience, legwork, and usually multiple submissions to figure it out. Most of my clients understand this and are willing to invest time and energy in developing their relationship with a given agency over time. And I thoroughly enjoy helping them in this process.

While one should write an application as if it were the only shot at funding, the grantwriter and client must also understand that a first submission to a new agency, study section, or IC will likely wind up being a learning experience. I find it very rewarding to work with a client over time as they develop their understanding of a given IC and study section, and build a relationship with a Program Officer. It is gratifying to help that client grow in terms of their NIH grantsmanship, and hopefully to land their grant on a subsequent submission and launch their relationship with NIH. The same holds true for experienced grantees looking to make the leap into center grants. It usually takes patience, hard work, a ridiculously thick skin, and multiple submissions to succeed.

No matter how strong the science, NIH statistics show that few funded projects are successful on the A0, i.e., the first submission. Therefore, a grantwriter who boasts about their funding rate is not likely to accept inexperienced applicants as clients, yet these are the very clients who may benefit the most from our help! If grantwriters only accept projects they know have a strong chance of funding, then who will help inexperienced grantees learn the ropes?

Perhaps the better measure of success in our field is whether our clients feel we strengthened their application, educated them on the NIH grant process, and improved their overall approach to grantsmanship, skills they will carry with them throughout their career, whether they are successful on a given submission or not. The grantsmanship skills we teach them may also be passed on to the scientists our clients mentor, and to colleagues for whom they provide guidance on grant submissions.

I assume I do not need to state that you should never guarantee success on a grant submission. Great writing and grantsmanship savvy are necessary, but not sufficient, for funding success at NIH. A grantwriter cannot change the science, and naturally many projects are not funded because of the science or because of a poor fit with the funding priorities of the granting agency. In addition, as grantwriters we cannot ensure that our clients will follow our advice, use our writing, or incorporate our edits.

If you are a grantwriter, how do you measure success? How do you capture that information and communicate it to potential clients? Most importantly, what is your goal as a grantwriter? Goodness knows, I am an extremely competitive person by nature; just ask my family. But like me, perhaps after thinking about it, you will find that going for the win is not necessarily your primary goal as a professional grantwriter.

MEG BOUVIER, PhD, is the owner of Meg Bouvier Medical Writing (www.megbouvier.com), a company that assists clients in writing persuasively about their biomedical research. She has helped clients land more than $200 million in federal funds, but is more proud that she has helped hundreds of clients improve their overall grantsmanship and feel less terrified of the NIH submission process. She was a press, policy, and communications writer for Dr. Francis Collins at the National Institutes of Health after completing an IRTA fellowship at NINDS. She holds a PhD in Biomedical Sciences from the Mt. Sinai School of Medicine.

http://www.amwa.org/events_annual_conference


A Disease by Another Name
By Rebecca J. Anderson, PhD, AMWA Pacific Southwest Chapter Member

The World Health Organization recently issued “best practice” guidelines for naming human infectious diseases. The new guidelines were prompted by the widespread habit of scientists, the media, and the general public to ascribe unofficial names to diseases. Those catchy nicknames are easy to pronounce and remember, and as such they are often “officially” adopted. But they may be inadvertently offensive, especially when describing diseases that emerge without warning, are not well understood, and may be life-threatening (think: Black Plague). According to WHO, “certain disease names provoke a backlash against members of particular religious or ethnic communities, create unjustified barriers to travel, and trigger needless slaughtering of food animals.”

For example, swine flu isn’t transmitted by pigs, but during a recent outbreak, some countries banned pork imports. Also, you may remember the push-back from veterans who were not at all happy that a certain infection (discovered in a Philadelphia hotel during their convention and sickening many of them) was labeled Legionnaires disease. The name stuck. Clearly, as medical writers, we should be mindful of these unintended consequences.

WHO recommends that we avoid disease names that include a geographic location, people’s names, species of animal or food, and references to specific cultures, populations, industries, and occupations. We should also avoid words that incite undue fear (e.g., death, fatal, epidemic). That doesn’t leave much. In the words of one infectious disease expert, “The WHO document is laudable in its intent, but slightly daft.”

Fortunately, WHO does offer some politically correct alternatives. It’s OK to use words that characterize age group (pediatric, maternal), time course (acute, chronic, progressive, contagious), severity (severe, mild), and seasonality (summer, seasonal).

Also, (thank goodness!) the new WHO guidelines are not retroactive and cover only infectious diseases. But I couldn’t help wondering how we might have applied these rules to diseases that now have well-established nicknames.

Here are some examples (in alphabetical order), changed to comply with WHO’s recommendations. (WARNING: these examples are only for illustrative purposes and, thankfully, not sanctioned by any organization.)

Alzheimer’s disease: progressive old people’s syndrome (or Pops)

chickenpox: non-measles, non-rubella, non-smallpox, pox

Crohn’s disease: chronic alimentary propulsive syndrome (or CRAPs)

Ebola (from the Ebola River in the Congo): severe contagious hemorrhagic fever

Elephantiasis: parasitic leg lymph hypertrophy

Hantavirus (from the Hantan River in South Korea): gripping airway syndrome-pulmonary (or Gasp)

Japanese encephalitis: summer flaviviral encephalitis

Kaposi’s sarcoma: purple spots and bumps

Kawasaki disease: non-motorcycle idiopathic febrile syndrome

Lyme disease (from Old Lyme, Connecticut): tick vector radial red rash

mad cow disease: malicious zoonotic prion spongiform encephalopathy

miner’s lung: chronic environmental pneumoconiosis

Montezuma’s revenge: entero-toxigenic gastrointestinal distress

Rocky Mountain spotted fever: acute rickettsia rash and fever (or Arrf)

Sjögren’s syndrome: tear-less, spit-less syndrome

Stevens-Johnson syndrome: idiopathic progressive exfoliative necrosis

Toxic shock syndrome: tampon fever

Tularemia (for Tulare County, CA): severe infectious ulceroglandular fever

West Nile virus: arthropod-borne encephalitis

The WHO guideline is available at: http://apps.who.int/iris/bitstream/10665/163636/1/WHO_HSE_FOS_15.1_eng.pdf?ua=1

REBECCA J ANDERSON, PhD, is a freelance medical writer and the author of two books, Nevirapine and the Quest to End Pediatric AIDS and Career Opportunities in Clinical Drug Research. Prior to medical writing, Dr. Anderson managed research and development projects for twenty-five years in the pharmaceutical/biotech industry. She holds a Ph.D. in pharmacology from Georgetown University. She lives in Southern California, and when she is not writing, she absorbs the sights and sounds of the West Coast’s rich culture and heritage. She can be reached at [email protected].


Ask APRIL: What is Summer (Work) Style?
By April Reynolds, MS, ELS, AMWA Pacific Southwest Chapter Member

Career expert Lindsey Pollak says to dress for your profession.¹,² But what does that mean for medical writers and editors interviewing or attending a conference? And what does that mean when it’s hot outside?

Consider your audience

I polled my writers’ group about topics for this month’s column. Overwhelmingly, they wanted tips on how to dress professionally during the scorching summer months. I came up with some tips but wasn’t sure how to give realistic and useful advice, considering that I haven’t dressed up during summer for years. Then it happened: my computer died mid-July, and I had to go to my client’s corporate site for repairs.

Connect with your audience

Here’s what I wore:
a) White short-sleeve, button-up shirt with collar
b) Lightweight, dark-wash, trouser-cut jeans
c) Peep-toe (meaning a small opening, not a full open-toe) sling-back flats
d) Hair in a sleek ponytail

Here’s why I wore it:
a) A crisp white shirt means business, but a short-sleeve white shirt is more appropriate for summer.
b) Nice jeans are fine on a Friday (which it was).
c) Whereas an open-toe shoe may be too casual, a peep-toe is dressier while still allowing your feet to breathe.
d) It was humid, and I have frizzy hair. This was my safest bet.

Create an outline

Before any of this took place, however, I used an invaluable tactic: planning. Because I knew my morning would be hectic (getting my child ready, getting myself ready, and getting the right amount of coffee to enable all of this to happen), I had a pretty solid idea of what I was going to wear ahead of time. I only needed to interact with IT, so I could be somewhat casual; however, you just never know who you’ll run into. I stuck with the Girl Scouts on this one and was sure to “be prepared.”

Planning can also be helpful when you’re shopping for summer-friendly attire (or any attire, really). I try to have an idea of what I want before I go out to buy it. That way, I don’t fall victim to markdowns or get overwhelmed. (My motto: If you wouldn’t pay full price for it, you shouldn’t buy it on sale.)

Enlist the help of a good editor

Start by looking on Pinterest. (I created a Pinterest board to give you some ideas: https://www.pinterest.com/writecorrect/summer-work-styles/) Put together an outfit in your head and then go out and try to recreate it as closely as you can, and within your budget.

Admittedly, Pinterest tends to cater to young, thin, ultra-stylish women. But what about the rest of us? And what about men? Here are some additional suggestions:


Also, because I worked in retail during college, I always suggest engaging the help of salespeople. They work with the merchandise every day, and they can see you with an objective eye. Nordstrom has personal shoppers, and Levi’s has a great fit guide for men.³

Make a strong statement

Sure, clothes aren’t everything; it’s the quality of our work that’s important. But the idea is to dress like you’re worth the money you’re asking your client to pay, while never letting them see you sweat.

REFERENCES:

1. http://www.glamour.com/fashion/blogs/dressed/2013/06/outfit-idea-what-to-wear-on-a

2. http://www.lindseypollak.com/wear-this-not-that-a-millennials-guide-to-business-casual/

3. http://www.levis.com.au/men-fit-guide

APRIL REYNOLDS, MS, ELS, is a medical writer & editor and the president of Write/Correct, Inc. She has published works on topics that range from jeans (for fashion magazines) to genes (for medical publications). She lives in San Diego with her husband and son.

Medical writing’s own fashion experimenter and amateur decorator answers your style questions.

Email your questions to: [email protected].


Chapter Happy Hour in San Diego, June 18, 2015

About 20 members met for drinks and networking at Bella Vista Social Club and Caffe in La Jolla last month. The chapter would like to thank Asoka Banno and Brea Midthune for initiating and organizing the event. Thank you also to Real Life Sciences, a staffing consultancy, and Recruiting Consultant Conor Trombetta for sponsoring appetizers and drinks at the event. In the pictures: Jennifer Veevers, ??, Conor Trombetta (top left); Julian Kaye, ???, Noelle Demas (top right); Brea Midthune, Andrew Hellman, Anna Larocca, Valerie Breda (bottom left); and ?? (bottom right).


AMWA Pacific Southwest chapter warmly welcomes our new members

Carol Koprowski - Encino

Chip Reuben - Redondo Beach

C. Corrigan - Rancho Santa Margarita

Dora Saltzman - San Luis Obispo

Michelle Zakson - Westlake Village

Rachel Meyer - Mesa, AZ

Robin Alexander - Irvine

Sharon Kelly - Buellton

List courtesy of Gail Flores, PhD, AMWA Pacific Southwest chapter membership coordinator.


Chapter Events' Calendar

August 15, 2015. Joint meeting with SDRAN to discuss Writing for a Patient Audience.

September 19, 2015. Chapter-organized symposium: "Medical Writers’ Toolbox Decoded". Location: Thousand Oaks, CA (Amgen) -- SAVE THE DATE

REGISTRATION: https://s08.123signup.com/servlet/SignUpMember?PG=1534811182300&P=15348111911429654800

FLYER: http://media.wix.com/ugd/b7c3b3_a9d9fada904640168660d1d2780465f0.pdf?dn=August%2015%20SDRAN_AMWA%20Program%20Flye.pdf


September 19, 2015 (Saturday)

Amgen, Inc.

One Amgen Center Dr

Building 24 Conference Center Auditorium

Thousand Oaks, CA 91320

AMWA Pacific Southwest Chapter presents:

Medical Writers’ Toolbox Decoded

The skills and expertise needed for excelling as a medical writer are diverse and depend on the industry and job function. While there is no substitute for hands-on experience, this symposium is an opportunity to get a flavor of how medical writers practice their craft.

Join us for a day of interactive lectures, demonstrations, and discussions to learn about:

• Guidelines, style guides, and templates used by medical writers in clinical writing and publication development

• Illustrations and data presentation tools and tricks

• Reference library tools and information management strategies in a pharmaceutical environment.


About the Speakers:

DIKRAN TOROSER, PhD, CMPP, a member of the AMWA Pacific Southwest chapter, has been a regular contributor to Postscripts magazine since 2012. He developed the monthly AMAzing Style column, which covers topics from the AMA Manual of Style, and has also written on publication-related topics in these pages. Dikran is currently a Senior Medical Writing Manager at Amgen Inc. in Thousand Oaks, California. He earned his PhD in Biochemistry from Newcastle University (UK), and did his post-doctoral training in biochemical genetics at the John Innes Center of the Cambridge Laboratory (Norwich, UK) and in molecular biology with the USDA. Prior to Amgen, Dikran was on the faculty (research) at the School of Pharmacy at the University of Southern California. He can be reached at [email protected].

ANNALISE M NAWROCKI, PhD, is a member of the Pacific Southwest Chapter of the American Medical Writers Association. Annalise holds a BA in Molecular Biology and Genetics (with a minor in English Literature) from Northwestern University, and earned her PhD in Ecology and Evolutionary Biology from the University of Kansas. She is currently a Medical Writing Manager at Amgen Inc. in Thousand Oaks, California, where she develops publications and conference presentations in the cardiovascular therapeutic area. She can be reached at [email protected].

CHRISTOPHER MUNDY, MS, PMP, is Principal and Knowledge Strategy Consultant at CM Consulting, San Francisco Bay Area. He works with start-ups and with biotech and pharmaceutical companies to conduct information gap analyses, review the competitive intelligence landscape, and develop corporate library solutions. He has implemented integrated information systems that manage corporate, scientific, research, clinical, and regulatory records and link them to cloud systems providing on-demand access to functional groups within and outside the organization. He has several project management credentials under his belt, including the Six Sigma Green Belt. He is pursuing his Master's Degree in Information and Knowledge Strategy from Columbia University, New York. He can be reached via LinkedIn: https://www.linkedin.com/in/christophermmundy

Registration: Register at https://www.123signup.com/event?id=pxfjd

Register by September 4, 2015

Registration limited to 150 participants

Cost: Free (courtesy of Amgen)

Lunch: Provided by Amgen

AMWA Chapter Contacts (regarding symposium):

Ajay Malik, PhD, [email protected]

Donna Simcoe, MS Biotech, MS Med Writing, MBA, CMPP, [email protected]

Amgen Organizing Committee:

Jenilyn J. Virrey, PhD, Medical Writing Senior Manager, [email protected]

Albert Rhee, PhD, Medical Writing Manager, [email protected]

Laura Bruce, Administrative Coordinator, [email protected]

Career Corner

Medical Writing Open Positions

Compiled By: Sharyn Batey, PharmD, MSPH
Employment Coordinator, AMWA Pacific Southwest Chapter

*Note: Occasionally weblinks in the PDF document may not work if the web address is long and splits into 2 lines. You may copy and paste the complete link into a new browser tab or window to reach the correct website.

Technical Writer/Editor
Undisclosed Company in Phoenix, AZ
Recruiter: Sterling-Hoffman Executive Search
http://www.mybiotechcareer.com/JD/Medical-Writing-Arizona-Biotechnology-Jobs-Careers-8659

Freelance Technical Writer
Dermalogica, Carson, CA
http://chc.tbe.taleo.net/chc04/ats/careers/requisition.jsp?org=DERMA&cws=2&rid=2081&source=Indeed

Medical Writer - Pharmaceutical
Brandkarma, Irvine, CA
http://job-openings.monster.com/monster/8af005f9-c97d-4303-b0e9-a0ce8b5b4e77?mescoid=2700440001001&jobPosition=5

Medical Writing Specialist
Covidien, Irvine, CA
http://job-openings.monster.com/monster/65f17c9b-d7df-4976-8186-87f8568e8034?mescoid=2700440001001&jobPosition=7

Medical Writer
Sonendo, Inc., Laguna Hills, CA
https://www.smartrecruiters.com/SonendoInc/82527360-medical-writer

Technical Writer, Biotech
Sequoia, Oceanside, CA
https://www.smartrecruiters.com/Sequoia/83365813-technical-writer-biotech

Principal Medical Writer
Ardea Biosciences, Inc., San Diego, CA
http://job-openings.monster.com/monster/55747035-25f3-4ae9-9758-0174ef402a1c?mescoid=2700440001001&jobPosition=14

Senior/Principal Medical Writer
Intercept Pharmaceuticals, San Diego, CA
http://interceptpharma.submit4jobs.com/index.cfm?fuseaction=85416.viewjobdetail&CID=85416&JID=196172&source=Indeed

Clinical Document Specialist
Neurocrine Biosciences, Inc., San Diego, CA
http://www.biospace.com/jobs/job-listing/clinical-document-specialist-346338

Senior Medical Writer
Neurocrine Biosciences, Inc., San Diego, CA
http://www.biospace.com/jobs/job-listing/sr-medical-writer-346339

Senior Medical Writer
Nuvasive, San Diego, CA
http://job-openings.monster.com/monster/5f840a7a-a24b-4685-baaa-f9c46ef62d54?mescoid=2700440001001&jobPosition=17

Medical Writing Associate Director
Receptos, San Diego, CA
https://receptos.hyrell.com/UI/Views/Applicant/VirtualStepPositionDetails.aspx?TemplateId=162444&IsAutoRefresh=True&r=Indeed&tzi=Pacific%20Standard%20Time

Medical Writer
Undisclosed Company in South Region of California
Recruiter: Sterling-Hoffman Executive Search
http://www.mybiotechcareer.com/JD/Clinical-Research-Affairs-R-AND-D-Science-Medical-Affairs-California-South-Region-Biotechnology-Jobs-Careers-9911

Medical Writer / Diabetes - Outcomes Surveys
Undisclosed Company in San Diego, CA
Recruiter: Writing Assistance, Inc.
http://mh188.maxhire.net/cp/?E8556C361D43515B7E59126532571C69482D7C48&AspxAutoDetectCookieSupport=1

Senior Associate Regulatory Writing
Amgen, Thousand Oaks, CA
https://sjobs.brassring.com/TGWebHost/jobdetails.aspx?jobId=1145986&partnerid=25236&siteid=5308&codes=JB_Indeed

Regulatory Writing Manager
Amgen, Thousand Oaks, CA
http://job-openings.monster.com/monster/1611bcb9-cdc3-4878-9e73-33c7b19df8fa?mescoid=2700439001001&jobPosition=11

Regulatory Writing Senior Manager
Amgen, Thousand Oaks, CA
http://job-openings.monster.com/monster/ca774dcb-405a-40b2-b549-1ec0d5e5ef5a?mescoid=2700439001001&jobPosition=13

Medical Writer (Protocols and CSRs) - Remote
Lotus Clinical Research, LLC, Pasadena, CA
http://lotuscr.com

If you want to share job leads with the members of the Pacific Southwest Chapter, please contact Sharyn at [email protected].

Backpage

Sir Ronald Fisher: Statistician, Geneticist, and Farmer

Sir Ronald Aylmer Fisher, FRS, was a statistician and geneticist whose influence extends from population and evolutionary biology to psychology and agricultural research. He is widely regarded as a founder of the modern discipline of statistics, having introduced numerous concepts, including analysis of variance (ANOVA), Fisher's z-distribution (the basis of the F distribution), maximum likelihood estimation, and many more. He transformed the fields of psychology, agricultural research, and genetics by introducing new statistical methods and the use of mathematical models.

His first book, "Statistical Methods for Research Workers," first published in 1925, remains an influential text in the field.1 This book advanced the principles of the "design of experiments"; in 1935, he published a second book under that title.2 His work is reflected in the way we design and conduct clinical trials today: we build on the foundations he laid for designing experiments with placebo groups and a null hypothesis, and we rely on statistical tools such as analysis of variance and statistical inference. The time and effort invested at the front end of a clinical trial matter more than the data analysis after the study is over. He once said: "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of."3

Fun Facts: Because of poor eyesight, he learned mathematics by having it read aloud to him. After being told that he could not pursue a career in mathematics, he lived as a subsistence farmer for 2 years. Those 2 years were the launch pad for his contributions to the fields of agricultural research, evolutionary biology, and applied statistics.

References
1. Fisher RA. Statistical Methods for Research Workers (14th ed.). Oliver and Boyd, Edinburgh. (1970) [1925]. ISBN-10: 0-05-002170-2, 0-02-844730-1
2. Fisher RA. The Design of Experiments (9th ed.). Macmillan. (1971) [1935]. ISBN-10: 0-02-844690-9
3. Presidential Address to the First Indian Statistical Congress (1938)

Sources and Further Reading
Wikipedia. https://en.wikipedia.org/wiki/Ronald_Fisher
Mejdal A. Celebrating statisticians: Ronald A. Fisher. JMP Blog. 2014 Jan. http://blogs.sas.com/content/jmp/2013/01/07/celebrating-statistics-statisticians-ronald-a-fisher/
Savage LJ. On Rereading R. A. Fisher. Ann Statist. 1976;4(3):441-500. http://projecteuclid.org/euclid.aos/1176343456

- Editor

Stained glass window in the dining hall of Caius College, Cambridge, commemorating Ronald Fisher and representing a Latin square. CREDIT: Wikipedia