ORIGINAL ARTICLE

Interobserver agreement for the ATS/ERS/JRS/ALAT criteria for a UIP pattern on CT

Simon L F Walsh,1 Lucio Calandriello,2 Nicola Sverzellati,3 Athol U Wells,4 David M Hansell,5 on behalf of The UIP Observer Consort

1 Department of Radiology, Kings College Hospital Foundation Trust, London, UK
2 Department of Bioimaging and Radiological Sciences, Institute of Radiology, Catholic University, “A. Gemelli” Hospital, Rome, Italy
3 Department of Clinical Sciences, Section of Radiology, University of Parma, Parma, Italy
4 Interstitial Lung Diseases Unit, Royal Brompton Hospital, London, UK
5 Department of Radiology, Royal Brompton Hospital, London, UK

Correspondence to Dr Simon L F Walsh, Department of Radiology, Kings College Hospital Foundation Trust, Denmark Hill, London SE5 9RS, UK; [email protected]

Received 30 April 2015
Revised 25 September 2015
Accepted 15 October 2015
Published Online First 19 November 2015

▸ http://dx.doi.org/10.1136/thoraxjnl-2015-208148

To cite: Walsh SLF, Calandriello L, Sverzellati N, et al. Thorax 2016;71:45–51.

ABSTRACT
Objectives To establish the level of observer variation for the current ATS/ERS/JRS/ALAT criteria for a diagnosis of usual interstitial pneumonia (UIP) on CT among a large group of thoracic radiologists of varying levels of experience.
Materials and methods 112 observers (96 of whom were thoracic radiologists) categorised CTs of 150 consecutive patients with fibrotic lung disease using the ATS/ERS/JRS/ALAT CT criteria for a UIP pattern (3 categories: UIP, possible UIP and inconsistent with UIP). The presence of honeycombing, traction bronchiectasis and emphysema was also scored using a 3-point scale (definitely present, possibly present, absent). Observer agreement for the UIP categorisation and for the 3 CT patterns was evaluated in the entire observer group and in subgroups stratified by observer experience.
Results Interobserver agreement across the diagnosis category scores among the 112 observers was moderate, ranging from 0.48 (IQR 0.18) for general radiologists to 0.52 (IQR 0.20) for thoracic radiologists of 10–20 years’ experience. A binary score for UIP versus possible or inconsistent with UIP was examined. Observer agreement for this binary score was only moderate. No significant differences in agreement levels were identified when the CTs were stratified according to multidisciplinary team (MDT) diagnosis or patient age or when observers were categorised according to experience. Observer agreement for honeycombing, traction bronchiectasis and emphysema was 0.59±0.12, 0.42±0.15 and 0.43±0.18, respectively.
Conclusions Interobserver agreement for the current ATS/ERS/JRS/ALAT CT criteria for UIP is only moderate among thoracic radiologists, irrespective of their experience, and did not vary with patient age or the MDT diagnosis.

INTRODUCTION
Accurate diagnosis of idiopathic pulmonary fibrosis/usual interstitial pneumonia (IPF/UIP) is essential to ensure prompt initiation of appropriate treatment and enrolment in clinical trials. CT has a key role in making the diagnosis of IPF/UIP.1 2 The most recent guidelines published by American Thoracic Society (ATS)/European Respiratory Society (ERS)/Japanese Respiratory Society (JRS)/Latin American Thoracic Association (ALAT) specify the CT appearances of three diagnostic categories: UIP, possible UIP and inconsistent with UIP patterns.2 Imaging criteria for the diagnosis of UIP include the presence of honeycombing in a basal and subpleural distribution without features considered incompatible with a diagnosis of IPF/UIP. In the correct clinical context, these appearances are considered sufficient to diagnose IPF/UIP without surgical lung biopsy (SLB).2–4 If the CT appearances are not those of UIP, the diagnosis of IPF cannot be made on imaging alone. Therefore CT plays a critical role in the evaluation of patients with suspected IPF and, once performed, significantly influences subsequent management decisions.

Reasonable levels of observer agreement are a requisite for diagnostic criteria to be clinically useful. Several studies have reported on the observer agreement for a CT diagnosis of IPF/UIP based upon earlier guidelines, with conflicting results.5–7 However, all of these studies involved expert thoracic radiologists, selectively chosen for their experience in the interpretation of diffuse lung diseases on CT. In contrast, thoracic radiologists working in non-specialist centres may also be required to provide opinions on CTs of patients with suspected IPF/UIP. Therefore, the aim of this study was to evaluate interobserver agreement for the ATS/ERS/JRS/ALAT CT criteria for UIP among an international group of thoracic radiologists of varying levels of experience. As it has been reported that traction bronchiectasis and emphysema may confound the identification of honeycombing,8 observer agreement for the presence of honeycombing, traction bronchiectasis and emphysema was also evaluated in this study.

Key messages

What is the key question?
▸ What is the interobserver agreement for the current ATS/ERS/JRS/ALAT CT criteria for usual interstitial pneumonia (UIP) among radiologists?

What is the bottom line?
▸ Interobserver agreement among radiologists for the ATS/ERS/JRS/ALAT criteria for a UIP pattern on CT is moderate.

Why read on?
▸ CT plays a critical role in the evaluation of patients with suspected idiopathic pulmonary fibrosis and, once performed, significantly influences subsequent management decisions.

Walsh SLF, et al. Thorax 2016;71:45–51. doi:10.1136/thoraxjnl-2015-207252 45

Interstitial lung disease. Thorax: first published as 10.1136/thoraxjnl-2015-207252 on 19 November 2015. Downloaded from http://thorax.bmj.com/ on 4 May 2019 by guest. Protected by copyright.

METHODS
Case selection and CT protocol
CTs from consecutive patients with a multidisciplinary team (MDT) diagnosis of idiopathic fibrotic lung disease, chronic hypersensitivity pneumonitis (CHP) or fibrotic lung disease associated with a connective tissue disease, diagnosed at the interstitial lung disease unit of the Royal Brompton and Harefield NHS Foundation Trust, UK, between 1 January 2004 and 31 June 2010 were selected. CTs were clinically indicated in all cases and for the purposes of a retrospective examination of these data, informed consent was not required by the institutional review board. Cases with an MDT diagnosis of fibrotic sarcoidosis were excluded from the study as it was considered that these cases might disproportionately increase the number of cases fulfilling the ‘inconsistent with UIP’ CT diagnostic criteria. CTs were performed on a 64-slice multidetector computed tomography (MDCT) (Somatom Sensation 64, Siemens, Erlangen, Germany) or a 4-slice MDCT (Siemens Volume Zoom, Siemens, Erlangen, Germany) in all cases. Images were reconstructed at section thickness of 1.5 mm (4-slice) or 1 mm (64-slice) using a high spatial frequency algorithm. All patients were examined in the supine position from lung apices to lung bases at full-suspended inspiration using standard acquisition parameters: ∼90 mA, 120 kVp.

Participating observers
An invitation for observers to participate in the study was approved by the officers of the European Society of Thoracic Imaging (ESTI), Society of Thoracic Radiologists (STR), British Society of Thoracic Imaging (BSTI), the Italian Society of Chest Radiologists (ISCR) and the Korean Society of Thoracic Radiologists (KSTR), and was sent to each society’s membership. Observers were required to provide their specialty (chest radiologist, general radiologist or thoracic radiology fellow) and number of years’ experience practising in that specialty.

Case distribution and scoring
Scoring of cases was performed in two stages:
1. For the first stage, in order to distribute the cases to a large number of observers from different countries, a web-based image viewing application with 32-bit encrypted password access was used. Fifteen interspaced sections from each anonymised CT were selected. All images were loaded into the viewing application as TIFF (tagged image file format) images, uncompressed and with an image resolution of 600 pixels per inch. For each case, observers were asked to assign a diagnosis category score based upon the current ATS/ERS/JRS/ALAT CT criteria for UIP (3-point score: UIP, possible UIP, inconsistent with UIP). These guidelines were provided on the web application for reference.2 The identification of honeycombing is critical in assigning a diagnosis of ‘UIP’ and traction bronchiectasis and emphysema are known to potentially confound this determination. For this reason, observers were also required to score the presence of these three CT patterns (honeycombing, traction bronchiectasis and emphysema) using a 3-point score: definitely present, possibly present or absent (CT pattern score). No definitions for these patterns were given to the observers before participating in the study and no training was given to the observers.
2. For the second stage of the study, two subsets of observers were selected randomly from the participants of the first stage of the study: one made up of thoracic radiologists with greater than 20 years’ experience and one made up of thoracic radiologists with less than 10 years’ experience. These two groups were given the full volumetric thin-section CT data in digital imaging and communications in medicine (DICOM) format from a new cohort of patients with fibrotic lung disease. As in the first stage of the study, the observers scored the presence of the three CT patterns described above and assigned a diagnosis category based upon the current ATS/ERS/JRS/ALAT CT criteria for UIP.

Statistical analysis
Data are given as means with SDs, medians with IQR, or number of patients and percentage, where appropriate. Statistical analyses were performed using STATA V.12 (StataCorp, College Station, Texas). Cohen’s weighted κ coefficient (κw) was used to evaluate interobserver agreement for diagnosis category score and each of the three CT pattern scores. Weighting the κ coefficient allows the degree of disagreement to be quantified by assigning greater emphasis to large differences between scores.9 Weighted κ coefficients were categorised as follows: poor (0<κw≤0.20), fair (0.20<κw≤0.40), moderate (0.40<κw≤0.60), good (0.60<κw≤0.80) and excellent (0.80<κw≤1.00).9 As the aim of the study was to evaluate interobserver agreement rather than accuracy, a defined gold standard for the diagnosis and pattern scores was not required.
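As an illustration of the weighted κ and the agreement categories described above, the following Python sketch computes κw for a single pair of observers. It is not the STATA procedure used in the study. It assumes linear disagreement weights (the weighting scheme is not specified here), an integer coding of the 3-point scores as 0, 1 and 2, and a label of "no agreement beyond chance" for values at or below zero; the function names are hypothetical.

```python
def weighted_kappa(scores_a, scores_b, n_categories=3):
    """Cohen's weighted kappa for two observers.

    Assumption: linear disagreement weights |i - j| / (k - 1).
    Scores are integers 0..n_categories-1, one per case.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(scores_a)
    k = n_categories

    # Observed joint proportions of (observer A score, observer B score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        observed[a][b] += 1.0 / n

    # Marginal proportions for each observer.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return abs(i - j) / (k - 1)

    # Weighted disagreement actually observed, and expected by chance.
    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when both observers use one identical category")
    return 1.0 - d_obs / d_exp


def agreement_category(kw):
    """Map a weighted kappa onto the categories listed in the text."""
    if kw <= 0:
        return "no agreement beyond chance"  # assumption: not defined in the text
    if kw <= 0.20:
        return "poor"
    if kw <= 0.40:
        return "fair"
    if kw <= 0.60:
        return "moderate"
    if kw <= 0.80:
        return "good"
    return "excellent"
```

With quadratic rather than linear weights the same scores would generally give a different κw, so the weighting choice should be stated whenever values are compared across studies.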

For binary scores (eg, UIP vs possible UIP or inconsistent with UIP), interobserver agreement was expressed as an unweighted κ coefficient (κ), reported as a single figure calculated across multiple observers. For non-binary scores, weighted κ coefficients were calculated for each unique unordered pair of observers and expressed as means with SDs for each high-resolution computed tomography (HRCT) variable.
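The two summaries described above can be sketched in Python as follows. The first helper applies any two-observer statistic to every unique unordered pair of observers and summarises the results; the second computes a single multi-observer unweighted κ. The text does not state which multi-observer estimator was used, so Fleiss' κ is an assumption here, as are the function names and data layouts.

```python
from itertools import combinations
from math import comb
from statistics import mean, median, quantiles, stdev


def pairwise_summary(scores_by_observer, pair_statistic):
    """Summarise a two-observer statistic over all unordered observer pairs.

    scores_by_observer: dict mapping observer id -> list of scores (same case order).
    pair_statistic: function(scores_a, scores_b) -> float, e.g. a weighted kappa.
    Requires at least three observers so that an SD and IQR can be computed.
    """
    observers = sorted(scores_by_observer)
    values = [
        pair_statistic(scores_by_observer[p], scores_by_observer[q])
        for p, q in combinations(observers, 2)
    ]
    q1, _, q3 = quantiles(values, n=4)
    return {
        "pairs": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": q3 - q1,
    }


def fleiss_kappa(ratings, n_categories=2):
    """Fleiss' kappa (assumed estimator) for many observers rating every case.

    ratings: list of cases; each case is a list with one category index per
    observer. Every case must be scored by the same number of observers.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [[case.count(c) for c in range(n_categories)] for case in ratings]

    # Overall proportion of all assignments falling in each category.
    p_cat = [
        sum(row[c] for row in counts) / (n_cases * n_raters)
        for c in range(n_categories)
    ]
    # Per-case agreement: proportion of observer pairs that agree.
    p_case = [
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_case) / n_cases
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)


# Number of unique unordered pairs among the 112 observers of the first stage.
N_PAIRS_STAGE_ONE = comb(112, 2)
```

With 112 observers there are 6216 unique pairs, which is why the pairwise weighted κ values are reported as distributions (mean with SD, median with IQR) rather than as a single figure.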

RESULTS
Patient population and observer groups
For the first stage of the study (scoring of online cases), a total of 472 consecutive patients presenting to the interstitial lung disease unit were identified. From this cohort 92 patients with an MDT diagnosis of fibrotic sarcoidosis were excluded. From the remaining 380 patients, 150 cases were randomly selected. Of these 150 patients, 78 were female. Mean age at the time of CT was 61.5 years (SD=12.2 years). MDT diagnoses of the study group were as follows: idiopathic fibrotic lung disease (IPF n=34, biopsy proven n=3; fibrotic non-specific interstitial pneumonia (NSIP) n=21, biopsy proven n=3), connective tissue disease related fibrotic lung disease (n=51, biopsy proven n=4) and CHP (n=44, biopsy proven n=12). A total of 112 observers completed the first stage of the study (each scoring all 150 cases). Ninety-six were thoracic radiologists, 16 general radiologists. Thoracic imaging society representation for the 112 radiologists was STR (n=42), ESTI (n=39), Italian society of thoracic radiology (ISTR) (n=15), BSTI (n=12) and KSTR (n=4). Mean experience was 11.9 years (SD=8.5 years) (table 1).

For the second stage of the study (scoring of volumetric thin-section DICOM images), a new cohort of 75 cases of fibrotic lung disease was selected. MDT diagnoses for these cases were as follows: idiopathic fibrotic lung disease (n=25, biopsy proven n=5), connective tissue disease related fibrotic lung disease (n=25, biopsy proven n=4) and CHP (n=25, biopsy proven n=12). A total of 22 thoracic radiologists completed this stage of the study (<10 years’ experience n=10, >20 years’ experience n=12).


Interobserver agreement for scoring of online cases
Diagnosis category scores
For the first stage of the study, interobserver agreement across the diagnosis category scores (UIP, possible UIP, inconsistent with UIP) among the 112 observers was moderate, ranging from 0.48 (IQR 0.18) for general radiologists to 0.52 (IQR 0.20) for thoracic radiologists of 10–20 years’ experience (table 2, figures 1A–D and 2A–D). At the extremes of the experience spectrum, interobserver agreement among thoracic imaging fellows was 0.50 (IQR 0.10) and for thoracic radiologists of greater than 20 years’ experience, 0.51 (IQR 0.18). The diagnosis category scores were converted to a binary ‘UIP versus possible UIP or inconsistent with UIP’ score. Mean interobserver agreement for this binary score was moderate, ranging from 0.36 for thoracic imaging fellows to 0.42 for thoracic radiologists of less than 10 years’ experience (table 2). A second analysis was performed to investigate whether observer agreement for the binary diagnosis score varied with patient age or MDT diagnosis. No significant differences were identified for these subgroups (table 3). Emerging reports suggest that SLB might not be required in patients with a ‘possible UIP’ pattern on CT.10 Therefore, a second binary score (‘UIP or possible UIP’ vs inconsistent with UIP) was also evaluated. Interobserver agreement for this distinction was also moderate, ranging from 0.39 for thoracic imaging fellows to 0.45 for thoracic radiologists of greater than 20 years’ experience.
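The two binary scores described above are simple recodings of the 3-point diagnosis category. A minimal Python sketch follows, assuming a hypothetical integer coding of the categories (0 for UIP, 1 for possible UIP, 2 for inconsistent with UIP); the constant and function names are illustrative only.

```python
# Hypothetical integer coding of the three diagnosis categories.
UIP, POSSIBLE_UIP, INCONSISTENT_WITH_UIP = 0, 1, 2


def uip_vs_rest(score):
    """First binary score: 'UIP' (1) versus 'possible UIP or inconsistent with UIP' (0)."""
    return 1 if score == UIP else 0


def uip_or_possible_vs_inconsistent(score):
    """Second binary score: 'UIP or possible UIP' (1) versus 'inconsistent with UIP' (0)."""
    return 1 if score in (UIP, POSSIBLE_UIP) else 0


def collapse(scores, rule):
    """Apply one of the recoding rules to a whole list of category scores."""
    return [rule(s) for s in scores]
```

Collapsing discards the ordering between categories, which is why an unweighted κ is appropriate for these binary scores.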

CT pattern scores
Weighted κ values for the presence of honeycombing using the 3-point score (definitely present, possibly present or absent) ranged from 0.56 (IQR 0.12) (for thoracic radiologists with more than 20 years’ experience) to 0.65 (IQR 0.23) (for thoracic radiology fellows) (table 4). Weighted κ values for the 3-point traction bronchiectasis score ranged from 0.32 (IQR 0.25) (for thoracic radiology fellows) to 0.45 (IQR 0.18) (for thoracic radiologists of more than 20 years’ experience) (table 4). Weighted κ values for the 3-point emphysema score ranged from 0.40 (IQR 0.13) (for thoracic radiology fellows) to 0.55 (IQR 0.22) (for thoracic radiologists of less than 10 years’ experience) (table 4).

Interobserver agreement for scoring of volumetric thin-section CT
For the second stage of the study, in which cases were evaluated on volumetric thin-section CT, interobserver agreement across the diagnosis category scores (UIP, possible UIP, inconsistent with UIP) was 0.54 (IQR 0.17) for thoracic radiologists with less than 10 years’ experience and 0.40 (IQR 0.12) for thoracic radiologists with greater than 20 years’ experience. Interobserver agreement for each of the CT pattern scores is shown in table 5.

DISCUSSION
The key finding in the present study is that interobserver agreement among a large cohort of thoracic radiologists for the radiological diagnosis of UIP based upon the most recent ATS/ERS/JRS/ALAT guidelines is at best moderate, and is not significantly increased among thoracic radiologists with greater levels of experience.

IPF is a chronic progressive fibrosing interstitial pneumonia, which is characterised by a histopathological and/or a radiological pattern of UIP.1 11 The distinction of IPF/UIP from other chronic fibrosing lung diseases is important because IPF/UIP has a particularly poor prognosis. Diagnosing IPF, however, may be challenging because it requires an integrated multidisciplinary approach involving physicians, radiologists and pathologists, and indirect evidence suggests that early expert assessment is important.1 2 11 12 Prompt and accurate diagnosis allows commencement of treatment, as well as access to clinical trials and evaluation for lung transplantation. The most recent evidence-based guidelines for the diagnosis and management of IPF represent the collaborative effort of the American Thoracic Society, the European Respiratory Society, the Japanese Respiratory Society and the Latin American Thoracic Association and clearly specify the CT features which stratify patients into one of three radiologic categories, ‘UIP’, ‘possible UIP’ and ‘inconsistent with UIP’.2 In the correct clinical context, a radiological diagnosis of ‘UIP’ secures a diagnosis of IPF.13–16 In patients whose CT diagnosis is ‘possible UIP’ or ‘inconsistent with UIP’, SLB should be considered, although at least one study reports that a diagnosis of ‘possible UIP’ may be sufficient to diagnose IPF/UIP in the proper clinical setting.10 Therefore CT plays a critical role in the evaluation of patients with suspected IPF and, once performed, significantly influences subsequent management decisions.

Table 1 Observer demographics (n=112) for the first stage of the study (see Methods section)

Observer group                       n=112
Thoracic radiologist (n=91)
  >20 years experience               22 (19.6%)
  10–20 years experience             27 (24.1%)
  <10 years experience               42 (37.5%)
General radiologist (n=16)
  >20 years experience               4 (3.5%)
  10–20 years experience             3 (2.7%)
  <10 years experience               9 (8.0%)
Thoracic imaging fellow (n=5)
  1 year experience                  5 (4.4%)

Table 2 Scoring of online cases: Interobserver agreement for the diagnosis categories, ‘UIP’, ‘possible UIP’ and ‘inconsistent with UIP’ expressed as Cohen’s weighted κ coefficient stratified according to observer experience and specialty

Interobserver agreement                                   Mean±SD      Median (IQR)
UIP diagnosis categories (UIP, possible UIP, inconsistent with UIP)
  Thoracic radiology fellows (n=5)                        0.47±0.05    0.50 (0.10)
  Thoracic radiologists (experience <10 years, n=42)      0.50±0.12    0.51 (0.16)
  Thoracic radiologists (experience 10–20 years, n=27)    0.51±0.11    0.52 (0.20)
  Thoracic radiologists (experience >20 years, n=22)      0.48±0.14    0.51 (0.18)
  General radiologists (n=16)                             0.45±0.13    0.48 (0.18)
Binary diagnosis score (Typical UIP or Possible UIP/inconsistent with UIP)
  Thoracic radiology fellows (n=5)                        0.36*
  Thoracic radiologists (experience <10 years, n=42)      0.42*
  Thoracic radiologists (experience 10–20 years, n=27)    0.39*
  Thoracic radiologists (experience >20 years, n=22)      0.40*
  General radiologists (n=16)                             0.41*

The ‘possible UIP’ and ‘inconsistent with UIP’ categories were combined to generate a binary ‘typical UIP or possible UIP/inconsistent with UIP’ score. Interobserver agreement expressed as Cohen’s κ coefficient for this binary categorisation, stratified according to observer experience and specialty.
*Unweighted κ.
UIP, usual interstitial pneumonia.

Reasonable levels of observer agreement are a requisite for diagnostic criteria to be clinically useful. As up to two-thirds of patients with IPF/UIP are diagnosed based upon CT appearances alone, the issue of interobserver agreement between radiologists for this diagnosis is important.2 3 16 Despite this, the interobserver agreement for the most recent ATS/ERS/JRS/ALAT CT criteria for IPF/UIP is not known. In a study by Aziz et al, observer agreement between 11 thoracic radiologists was evaluated for diagnosis of 131 cases of different diffuse lung diseases. In this study, agreement for a diagnosis of IPF/UIP was reported as good (κw=0.63).7 In contrast, a study by Lynch et al,5 which involved 315 cases of IPF/UIP, reported low levels of observer agreement between two expert thoracic radiologists for a CT pattern considered ‘consistent with IPF’ (κw=0.33). Thomeer et al6 reported similar, low levels of agreement between three expert thoracic radiologists for a radiological diagnosis of typical UIP (κw=0.40). Most recently, in a paper by Assayag et al,17 CT appearances of 69 cases of biopsy proven rheumatoid-related interstitial lung disease were evaluated by two experienced thoracic radiologists applying a binary score of ‘definite UIP’ or ‘not’ (based upon current ATS/ERS/JRS/ALAT CT criteria), and a κ coefficient of 0.67 was reported for this score. A limitation of these studies is that all, with the exception of one,17 predate the most recent diagnostic guidelines and therefore may not easily be interpreted in the context of current recommendations. In addition, all employed small numbers of academic radiologists from tertiary referral centres with specific expertise in the evaluation of patients with IPF.5 6 In contrast, the results of our study demonstrate that, in a large diverse group of thoracic radiologists, the interobserver agreement for the current CT criteria for a diagnosis of IPF/UIP is only moderate. Furthermore, no significant differences in agreement were demonstrated between observer subgroups of different levels of experience. These results are reinforced by the finding that interobserver agreement was not improved among thoracic radiologists of greater than 20 years’ experience when the study was repeated using thin-section volumetric CT.

Figure 1 (A–D) Biopsy proven usual interstitial pneumonia (UIP) in a patient with a multidisciplinary diagnosis of rheumatoid arthritis related fibrotic lung disease. Assigned CT diagnoses expressed as a percentage of 116 observers were: definite UIP 20%, possible UIP 36.5%, inconsistent with UIP 42.6%. In this case 62.6% of observers assigned a grade of definitely present for honeycombing.

The rationale for converting the 3-point diagnosis score (UIP,possible UIP, inconsistent with UIP) to a binary score (UIP orpossible UIP/inconsistent with UIP) was that based upon currentguidelines, the key radiological determination is identifyingpatients whose CT allows a confident diagnosis of IPF/UIP to bemade, therefore avoiding the need for SLB. In addition, this dis-tinction has prognostic implications in the setting of IPF—atypical UIP pattern on CT may confer a worse prognosis thanthose cases of IPF who present with atypical CT appearances ofUIP.18 The overall level of interobserver agreement for the entirecohort of observers for this binary diagnosis score was moder-ate, regardless of observer experience, whether the observer wasa thoracic radiologist or general radiologist. Recently, Grudenet al10 reported that in the absence of honeycombing, a hetero-geneous pattern of fibrosis on CT maybe sufficient to secure adiagnosis of UIP. In our study, a second binary score of ‘UIP andpossible UIP’ versus ‘inconsistent with UIP’ was generated andinterobserver agreement for this categorisation was also

Figure 2 (A) Biopsy proven usual interstitial pneumonia (UIP) in a patient with a multidisciplinary diagnosis of usual interstitial pneumonia (IPF).Assigned CT diagnoses expressed as a percentage of 116 observers were: definite UIP 73.9%, possible UIP 21.7%, inconsistent with UIP (4.3%). Inthis case 91.3% of observers assigned a grade of definitely present for honeycombing. (B) Biopsy proven UIP in a patient with a multidisciplinarydiagnosis of IPF. Assigned CT diagnoses expressed as a percentage of 116 observers were: definite UIP 73.9%, possible UIP 21.7%, inconsistentwith UIP (4.3%). In this case 91.3% of observers assigned a grade of definitely present for honeycombing. (C) Biopsy proven UIP in a patient with amultidisciplinary diagnosis of IPF. Assigned CT diagnoses expressed as a percentage of 116 observers were: definite UIP 73.9%, possible UIP 21.7%,inconsistent with UIP (4.3%). In this case 91.3% of observers assigned a grade of definitely present for honeycombing. (D) Biopsy proven UIP in apatient with a multidisciplinary diagnosis of IPF. Assigned CT diagnoses expressed as a percentage of 116 observers were: definite UIP 73.9%,possible UIP 21.7%, inconsistent with UIP (4.3%). In this case 91.3% of observers assigned a grade of definitely present for honeycombing.

Table 3 Scoring of online cases: the ‘possible UIP’ and ‘inconsistent with UIP’ categories were combined to generate a binary ‘typical UIP or possible UIP/inconsistent with UIP’ score

Binary diagnosis score (typical UIP or possible UIP/inconsistent with UIP)    Interobserver agreement*

Disease subgroups
MDT diagnosis: CHP (n=44)                0.39
MDT diagnosis: CTD-ILD (n=51)            0.35
MDT diagnosis: idiopathic FLD (n=55)     0.38

Age subgroups
Patient age <50 years (n=25)             0.31
Patient age 50–60 years (n=40)           0.39
Patient age 60–70 years (n=48)           0.42
Patient age >70 years (n=37)             0.40

*Interobserver agreement expressed as Cohen’s κ coefficient for this binary categorisation, stratified according to multidisciplinary diagnosis and patient age.
CHP, chronic hypersensitivity pneumonitis; CTD-ILD, connective tissue disease related interstitial lung disease; FLD, fibrotic lung disease; MDT, multidisciplinary team; UIP, usual interstitial pneumonia.

Walsh SLF, et al. Thorax 2016;71:45–51. doi:10.1136/thoraxjnl-2015-207252 49


moderate. Given the results of the current study, the best use of HRCT in this setting may be as the starting point of a diagnostic process that is dynamic, includes disease behaviour and prognostic biomarkers, and focuses less stringently on specific HRCT patterns.
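The two binary recodings described above can be illustrated with a minimal sketch. It assumes unweighted Cohen's κ computed between a single pair of observers; the category labels, helper names and the six example ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter

# Hypothetical category labels; the study's actual data encoding is not given.
UIP, POSSIBLE, INCONSISTENT = "UIP", "possible UIP", "inconsistent with UIP"


def to_binary_uip_vs_rest(score):
    """First binary score: 'UIP' versus 'possible UIP/inconsistent with UIP'."""
    return 1 if score == UIP else 0


def to_binary_not_inconsistent(score):
    """Second binary score: 'UIP and possible UIP' versus 'inconsistent with UIP'."""
    return 0 if score == INCONSISTENT else 1


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two observers rating the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from the product of the two observers' marginals.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both observers used one identical category for every case; kappa is
        # formally undefined here, and returning 1.0 is a convention chosen
        # for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented ratings for six cases, purely for illustration.
observer_1 = [UIP, UIP, POSSIBLE, INCONSISTENT, POSSIBLE, UIP]
observer_2 = [UIP, POSSIBLE, POSSIBLE, INCONSISTENT, INCONSISTENT, UIP]

kappa_first = cohen_kappa(
    [to_binary_uip_vs_rest(s) for s in observer_1],
    [to_binary_uip_vs_rest(s) for s in observer_2],
)
kappa_second = cohen_kappa(
    [to_binary_not_inconsistent(s) for s in observer_1],
    [to_binary_not_inconsistent(s) for s in observer_2],
)
print(round(kappa_first, 3), round(kappa_second, 3))
```

In this invented example the two observers agree on five of six cases under either recoding, yet the κ values differ because the chance-expected agreement depends on how the cases are distributed between the two collapsed categories.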

The level of agreement for honeycombing might be an explanation for the source of disagreement between the UIP diagnosis categories. Interobserver agreement for honeycombing was moderately better than for the UIP diagnosis categories. This suggests that although some disagreement for the diagnosis of UIP may have been caused by discrepancies with regard to the presence of honeycombing (the difference between a ‘UIP’ and a ‘possible UIP’ pattern), a smaller proportion of disagreement may have related to the distribution of the honeycombing change (the difference between a ‘UIP’ and an ‘inconsistent with UIP’ CT pattern). For example, in cases where emphysema and fibrosis coexist, observers might agree that honeycombing was present, but might disagree on the nature of upper lobe cystic change, with some observers calling this honeycombing (and therefore regarding the distribution of honeycombing as atypical for UIP) but others calling these changes emphysema and deciding that the true honeycomb change predominated in the lower lobes (figure 1A–D). Recently, Watadani et al reported levels of observer agreement for honeycombing similar to those demonstrated in the current study.8 The current study extends the findings of Watadani et al by relating interobserver agreement for honeycombing to the evaluation of CT appearances in patients with suspected IPF/UIP. Although speculative, agreement on the individual features considered ‘inconsistent with a UIP pattern’ might also have impacted overall agreement on the diagnosis categories. For example, the presence of subtle mosaic attenuation or ground glass opacification (both of which may be seen in cardiac failure, a complication of IPF/UIP) on a HRCT which otherwise has typical UIP features might result in a diagnosis category of ‘inconsistent with UIP’. A limitation of the current study is that in cases considered inconsistent with UIP, we did not require observers to specify why.

A number of issues with regard to our methodology warrant discussion. First, the study was performed in two stages: the first stage involved the scoring of cases online and the second involved the scoring of full volumetric thin-section CT data. The decision to perform the second stage of the study was made after the results of the first stage were available, and the second stage was designed to test whether the level of interobserver agreement improved when observers had access to full volumetric thin-section CT. Second, we intentionally did not supply observers with clinical data. The primary aim of the study was to evaluate interobserver agreement for the ATS/ERS/JRS/ALAT CT criteria for IPF/UIP, which do not specify clinical criteria.2 Interobserver agreement for an MDT diagnosis (which is a related but separate issue and would include all available clinical information) in the setting of fibrotic lung disease might be a logical follow-up study. Third, in common with other studies of interobserver agreement, we did not use an independent ‘gold standard’ against which observers’ scores were evaluated.7 The primary goal of the study was to quantify levels of observer agreement for a CT diagnosis of UIP based upon current ATS/ERS/JRS/ALAT CT criteria rather than accuracy of CT diagnosis, which is a separate, albeit related, issue. Fourth, our patient population was a selection of consecutive cases of fibrotic lung disease referred to our interstitial lung disease unit. We excluded cases of fibrotic sarcoidosis on the basis that this diagnosis is in most cases straightforward and because inclusion of these patients might disproportionately increase the number of cases with an ‘inconsistent with UIP’ CT diagnosis. The usual diagnostic difficulty encountered on CT, in the context of fibrotic lung disease, is the separation of patients with IPF/UIP, fibrotic NSIP and CHP, which can be achieved based on CT appearances alone in only approximately 50% of cases.19 Furthermore, when IPF/UIP presents with non-classical CT appearances, the usual alternative diagnoses are fibrotic NSIP or CHP.20 Fifth, we did not confine cases to those with a biopsy proven diagnosis, as this would effectively eliminate patients with typical UIP features.

Table 4 Scoring of online cases: interobserver agreement for the 3-point HRCT pattern scores for honeycombing, traction bronchiectasis and emphysema, expressed as Cohen’s weighted κ coefficient and stratified according to observer experience and specialty

Interobserver agreement                                   Mean±SD      Median (IQR)

Honeycombing score (definite, possible, absent)
Thoracic radiology fellows (n=5)                          0.60±0.14    0.65 (0.23)
Thoracic radiologists (experience <10 years, n=42)        0.61±0.11    0.63 (0.14)
Thoracic radiologists (experience 10–20 years, n=27)      0.60±0.12    0.63 (0.16)
Thoracic radiologists (experience >20 years, n=22)        0.59±0.10    0.60 (0.13)
General radiologists (n=16)                               0.57±0.11    0.58 (0.14)

Traction bronchiectasis score (definite, possible, absent)
Thoracic radiology fellows (n=5)                          0.28±0.13    0.32 (0.25)
Thoracic radiologists (experience <10 years, n=42)        0.40±0.15    0.42 (0.22)
Thoracic radiologists (experience 10–20 years, n=27)      0.45±0.15    0.48 (0.23)
Thoracic radiologists (experience >20 years, n=22)        0.45±0.13    0.46 (0.18)
General radiologists (n=16)                               0.41±0.14    0.44 (0.17)

Emphysema score (definite, possible, absent)
Thoracic radiology fellows (n=5)                          0.41±0.08    0.40 (0.13)
Thoracic radiologists (experience <10 years, n=42)        0.54±0.15    0.55 (0.22)
Thoracic radiologists (experience 10–20 years, n=27)      0.53±0.13    0.54 (0.19)
Thoracic radiologists (experience >20 years, n=22)        0.42±0.17    0.43 (0.28)
General radiologists (n=16)                               0.43±0.15    0.40 (0.22)
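As an illustration of how summary values of the kind reported in table 4 could be produced, the following sketch computes a weighted κ for each pair of observers on a 3-point ordinal score and then summarises the pairwise values as mean, SD, median and IQR. Several points are assumptions rather than details confirmed by this section: the linear weighting scheme, the pairwise aggregation, the reporting of the IQR as a single width, the 0/1/2 coding (absent, possible, definite), the function names and the invented ratings.

```python
from itertools import combinations
from statistics import mean, median, quantiles, stdev


def weighted_kappa(a, b, n_categories=3):
    """Linearly weighted Cohen's kappa for ordinal scores coded 0..n_categories-1.

    Linear weights are an assumption of this sketch, not a stated study detail.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    k = n_categories
    if any(not 0 <= s < k for s in list(a) + list(b)):
        raise ValueError("scores must lie in 0..n_categories-1")
    n = len(a)
    # Observed joint proportions and each observer's marginal proportions.
    joint = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        joint[x][y] += 1 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weights: 0 on the diagonal, 1 for maximal disagreement.
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * joint[i][j]
            expected += w * row[i] * col[j]
    if expected == 0:
        return 1.0  # both observers used one identical category throughout
    return 1 - observed / expected


def summarise_pairwise(ratings):
    """Summarise weighted kappa over all observer pairs (needs >= 3 observers)."""
    kappas = [weighted_kappa(x, y) for x, y in combinations(ratings, 2)]
    q1, _, q3 = quantiles(kappas, n=4)
    return {
        "n_pairs": len(kappas),
        "mean": mean(kappas),
        "sd": stdev(kappas),
        "median": median(kappas),
        "iqr": q3 - q1,
    }


# Invented scores from four hypothetical observers on six cases
# (0 = absent, 1 = possibly present, 2 = definitely present).
ratings = [
    [0, 1, 2, 2, 1, 0],
    [0, 1, 1, 2, 1, 0],
    [0, 2, 2, 2, 0, 0],
    [1, 1, 2, 2, 1, 0],
]
summary = summarise_pairwise(ratings)
print(summary)
```

A weighted κ gives partial credit when two observers differ by one step on the ordinal scale (for example, ‘possibly present’ versus ‘definitely present’), which is why it suits the 3-point pattern scores better than the unweighted statistic.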

Table 5 Interobserver agreement for the diagnosis categories and CT pattern scores assessed on volumetric thin-section CT, expressed as Cohen’s weighted κ coefficient and stratified according to observer experience

Interobserver agreement            Mean±SD      Median (IQR)

UIP diagnosis categories (UIP, possible UIP, inconsistent with UIP)
Under 10 years experience          0.50±0.17    0.54 (0.17)
Over 20 years experience           0.39±0.15    0.40 (0.12)

Honeycombing score (definite, possible, absent)
Under 10 years experience          0.51±0.18    0.53 (0.11)
Over 20 years experience           0.47±0.13    0.48 (0.09)

Traction bronchiectasis score (definite, possible, absent)
Under 10 years experience          0.62±0.14    0.65 (0.09)
Over 20 years experience           0.47±0.16    0.45 (0.14)

Emphysema score (definite, possible, absent)
Under 10 years experience          0.58±0.11    0.57 (0.11)
Over 20 years experience           0.36±0.20    0.34 (0.18)

UIP, usual interstitial pneumonia.


We had no control over the observers who participated in the study. An open invitation was made to one national (BSTI) and four international (ESTI, STR, KSTR, ISCR) thoracic imaging societies without any exclusion criteria. However, in order to evaluate the performance of general thoracic radiologists who routinely provide opinions on CT studies, our approach for enrolling observers was necessarily broad. This is in contrast to most previous studies where observers are preselected because of their expertise, which conceivably may reduce their clinical applicability to the general thoracic radiologist population. As it has been suggested by some that the current ATS/ERS/JRS/ALAT guidelines require expertise that may not always be available, evaluating the performance of thoracic radiologists of varying levels of expertise is clinically important.

In conclusion, we have demonstrated in a large number of thoracic radiologists, of varying levels of experience, that interobserver agreement for the most recent ATS/ERS/JRS/ALAT CT criteria for a diagnosis of IPF/UIP is at best only moderate. As CT diagnosis plays an important role in influencing management decisions during the initial evaluation of patients with suspected IPF, accurate and consistent application of these guidelines among thoracic radiologists is clinically important. Based upon the results of this study, modification of these criteria may be necessary to improve observer agreement.

Collaborators The UIP Observer Consort are Joseph Jacob, Anand Devaraj, Giampaola Gavelli, Hrudaya Nath, Joseph Tashjian, Denis Tack, Zsuzsanna Monostori, Takeshi Johkoh, Anne Davies, Maurizio Zompatori, Roberta Polverosi, Ennio Vincenzo Sassani, Gilbert R Ferretti, Can Zafer Karaman, Gianluigi Sergiacomi, Tomas Franquet, Kavita Garg, Richard Mannion, Paula Campos, Robert Karl, Paul Burrowes, Franco Quagliarini, Louise Norlen, John T Murchison, Antoine Khalil, Sundeep M Nayak, Alain Nchimi, Nevzat Karabulut, Anna Rita Larici, Suzanne Matthews, Eva Castañer, Juan Arenas-Jimenez, Ralph Drosten, Ryoko Egashira, Hyun-Ju Lee, Santiago Rossi, Mosleh Al-Raddadi, Anastasia Oikonomou, Anagha P Parkar, Pierre Yves Brillet, Wagner Diniz de Paula, Laurent Medart, Florian Poschenrieder, Igor Pozek, Taiwo Senbanjo, Katharina Marten-Engelke, Juntima Euathrongchit, Cal Delaplain, Giancarlo Cortese, Gian Alberto Soardi, Aleksandar Grgic, Jeffrey Kanne, Michelle Muller, John Bruzzi, Adrien Jankowski, Gracijela Bozovic, Andrea Goncalves, Justus Roos, Daniel Vargas, Mark Elias, Djenaba Bradford-Kennedy, Danielle Seaman, Brett Memauri, Humera Chaudhary, Foong Wong, Julia Alegria, H Henry Guo, Helmut Prosch, Luce Cantin, Goffredo Serra, Elisa Baratella, Rosalba Silecchia, Aziza Icksan, Adam Wallis, Sidhdharth Damani, Rahul Renapurkar, Samanjit Hare, Annemilia del Ciello, Jennifer Jung, Mario Silva, Sushilkumar Sonavane, Sue Thomas, Anna Beattie, Benedikt Kislinger, Leo Ca, Girish Shroff, Hester Gietema, Yuranga Weerakkody, Stephen Hobbs, Shinyu Izumi, Tomoo Kishaba, Akira Shiraki, Yuko Waseda, Maria Chiara Castoldi, Vincent Herpels, Antonio Pais, Roger Matus, Enrico Lubin, Ben Wilson, Mathias Mueller, Shadha Ahmed Alzubaidi, Alain Ortiz, lara Sequeiros, Nicola Boscolo Bariga, Mark Wills, Sze Mun Mak, Giuseppe Aquaro, Esin Cakmakci Midia, Nicola Schembri, Joseph A M Sheehan, Alexia Farrugia, Jennifer Tomich, J M Fernandez Garcia-Hierro.

Funding This study was supported by the NIHR Respiratory Disease Biomedical Research Unit at the Royal Brompton and Harefield NHS Foundation Trust and Imperial College London. DMH is the recipient of an NIHR Senior Investigator award.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

REFERENCES

1 Travis WD, Costabel U, Hansell DM, et al. An official American Thoracic Society/European Respiratory Society statement: update of the international multidisciplinary classification of the idiopathic interstitial pneumonias. Am J Respir Crit Care Med 2013;188:733–48.

2 Raghu G, Collard HR, Egan JJ, et al. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. Am J Respir Crit Care Med 2011;183:788–824.

3 Hunninghake GW, Zimmerman MB, Schwartz DA, et al. Utility of a lung biopsy for the diagnosis of idiopathic pulmonary fibrosis. Am J Respir Crit Care Med 2001;164:193–6.

4 Raghu G, Mageto YN, Lockhart D, et al. The accuracy of the clinical diagnosis of new-onset idiopathic pulmonary fibrosis and other interstitial lung disease: a prospective study. Chest 1999;116:1168–74.

5 Lynch DA, Godwin JD, Safrin S, et al. High-resolution computed tomography in idiopathic pulmonary fibrosis: diagnosis and prognosis. Am J Respir Crit Care Med 2005;172:488–93.

6 Thomeer M, Demedts M, Behr J, et al. Multidisciplinary interobserver agreement in the diagnosis of idiopathic pulmonary fibrosis. Eur Respir J 2008;31:585–91.

7 Aziz ZA, Wells AU, Hansell DM, et al. HRCT diagnosis of diffuse parenchymal lung disease: inter-observer variation. Thorax 2004;59:506–11.

8 Watadani T, Sakai F, Johkoh T, et al. Interobserver variability in the CT assessment of honeycombing in the lungs. Radiology 2013;266:936–44.

9 Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ 1992;304:1491–4.

10 Gruden JF, Panse PM, Leslie KO, et al. UIP diagnosed at surgical lung biopsy, 2000–2009: HRCT patterns and proposed classification system. AJR Am J Roentgenol 2013;200:W458–67.

11 American Thoracic Society/European Respiratory Society International Multidisciplinary Consensus Classification of the Idiopathic Interstitial Pneumonias. This joint statement of the American Thoracic Society (ATS), and the European Respiratory Society (ERS) was adopted by the ATS board of directors, June 2001 and by the ERS Executive Committee, June 2001. Am J Respir Crit Care Med 2002;165:277–304.

12 Lamas DJ, Kawut SM, Bagiella E, et al. Delayed access and survival in idiopathic pulmonary fibrosis: a cohort study. Am J Respir Crit Care Med 2011;184:842–7.

13 Mathieson JR, Mayo JR, Staples CA, et al. Chronic diffuse infiltrative lung disease: comparison of diagnostic accuracy of CT and chest radiography. Radiology 1989;171:111–16.

14 Grenier P, Valeyre D, Cluzel P, et al. Chronic diffuse interstitial lung disease: diagnostic value of chest radiography and high-resolution CT. Radiology 1991;179:123–32.

15 Lee KS, Primack SL, Staples CA, et al. Chronic infiltrative lung disease: comparison of diagnostic accuracies of radiography and low- and conventional-dose thin-section CT. Radiology 1994;191:669–73.

16 Swensen SJ, Aughenbaugh GL, Myers JL. Diffuse lung disease: diagnostic accuracy of CT in patients undergoing surgical biopsy of the lung. Radiology 1997;205:229–34.

17 Assayag D, Elicker BM, Urbania TH, et al. Rheumatoid arthritis-associated interstitial lung disease: radiologic identification of usual interstitial pneumonia pattern. Radiology 2014;270:583–8.

18 Flaherty KR, Thwaite EL, Kazerooni EA, et al. Radiological versus histological diagnosis in UIP and NSIP: survival implications. Thorax 2003;58:143–8.

19 Silva CI, Müller NL, Lynch DA, et al. Chronic hypersensitivity pneumonitis: differentiation from idiopathic pulmonary fibrosis and nonspecific interstitial pneumonia by using thin-section CT. Radiology 2008;246:288–97.

20 Sverzellati N, Wells AU, Tomassetti S, et al. Biopsy-proved idiopathic pulmonary fibrosis: spectrum of nondiagnostic thin-section CT diagnoses. Radiology 2010;254:957–64.
