

The Laryngoscope
© 2011 The American Laryngological, Rhinological and Otological Society, Inc.

Comprehensive Assessment of Thyroidectomy Skills Development: A Pilot Project

David A. Diaz Voss Varela, MD; Mohammad U. Malik, MD; Carol B. Thompson, MS, MBA;

Charles W. Cummings, MD; Nasir I. Bhatti, MD; Ralph P. Tufano, MD

Objectives/Hypothesis: To test the validity, reliability, and feasibility of an evaluation tool designed to measure the development of trainees’ surgical skills in the operating room for thyroid surgery.

Study Design: Prospective validation study.

Methods: A modified Delphi technique was employed to develop a new Objective Structured Assessment of Technical Skills–based instrument for thyroid surgery. During a 1-year period, 16 otolaryngology–head and neck surgery residents (ranging from postgraduate year 2 to 6) and one endocrine surgery fellow were evaluated by one faculty member obtaining a total of 94 evaluations. Performance was rated using a task-based checklist (TBC) and a global rating scale (GRS). The TBC measured trainees’ thyroidectomy technical skills, and the GRS assessed their overall surgical performance.

Results: Based on four clinical levels (junior, intermediate, senior, and surgical fellow) our tool demonstrated construct validity for both components of the assessment instrument, specifically for the TBC showing a mean difference of 0.9 (95% confidence interval: 0.5-1.3, P < .001) between the contiguous clinical levels senior versus intermediate. Cronbach α, a measure of internal consistency, was 0.96 for both components of the instrument. The correlation between the TBC and GRS was also high within trainee (r = 0.62, n = 94, P < .001) and across trainees (r = 0.96, n = 17, P < .001).

Conclusions: Our tool proved to be a valid, reliable, and feasible instrument for assessing competency in thyroid surgery. It is effective in providing timely formative feedback during and upon the conclusion of the surgical procedure by identifying procedural tasks for which additional training is necessary. In addition, it enables longitudinal tracking of residents’ surgical performance, thus ensuring their appropriate development.

Key Words: Accreditation Council for Graduate Medical Education (ACGME), surgical competency, education, thyroidectomy, core competencies, surgical skills assessment, otolaryngology.

Level of Evidence: 1b.

Laryngoscope, 122:103–109, 2012

INTRODUCTION

In the past decade, graduate medical education has experienced major changes that are paving the way for a new outcome-based education. Residency programs are now expected to produce competent physicians because of increased pressure by the public and by their respective specialty boards.1 In 2001, the Accreditation Council for Graduate Medical Education (ACGME) implemented the Outcome Project, whose mission statement is to enhance residency education through outcomes.1 Currently, residency programs are required to objectively measure their trainees for six core competencies: patient care, medical knowledge, practice-based learning and improvement, interpersonal and communication skills, professionalism, and systems-based practice. Although most of these competencies can be relatively easy to assess, measuring trainees’ operative competence (an integral part of patient care) in surgical specialties may pose a challenge owing to the lack of standardized objective assessment tools.2

Work-hour limitations and other barriers, such as faculty workload and insufficient administrative support, may hinder the program directors’ ability to fully comply with the ACGME mandate.3 The addition of objective, quantifiable measures to the subjective end-of-rotation assessments that have traditionally guided trainee evaluations would help program directors meet the ACGME’s requirements. Therefore, the creation and implementation of objective, valid, and reliable assessment tools for surgical competency are deemed essential. In 1997, Martin et al.4 developed a performance-based evaluation for assessing trainees’ surgical competence, also known as Objective Structured Assessment of Technical Skills (OSATS). Subsequently, several surgical specialties, such as general surgery, ophthalmology, and obstetrics and gynecology, have successfully adopted this evaluation format to assess their residents’ surgical skills.5–7 In otolaryngology–head and neck surgery

From the Department of Otolaryngology–Head and Neck Surgery, Johns Hopkins University School of Medicine (D.A.D.V.V., M.U.M., C.W.C., N.I.B., R.P.T.); and the Biostatistics Center, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health (C.B.T.), Baltimore, Maryland, U.S.A.

Editor’s Note: This Manuscript was accepted for publication September 8, 2011.

Ralph P. Tufano, MD, uses the Medtronic NIM monitor while performing and assessing residents’ performance in thyroid surgery.

The authors have no funding, financial relationships, or conflicts of interest to disclose.

Send correspondence to Nasir I. Bhatti, MD, 601 N. Caroline St., Suite 6241, Johns Hopkins Outpatient Center, Baltimore, MD 21287. E-mail: [email protected]

DOI: 10.1002/lary.22381

Laryngoscope 122: January 2012 Diaz Voss Varela et al.: Thyroidectomy Skills Development


(OHNS), several groups have been able to develop, implement, and validate feasible and reliable instruments for various surgical procedures: endoscopic sinus surgery, mastoidectomy, and rigid bronchoscopy and direct laryngoscopy.2,8,9

Thyroidectomy, a core procedure in OHNS, is a good example of a surgical procedure for which residents need to be able to evaluate the patient in a comprehensive manner before surgery while integrating their knowledge of the complex detail of head and neck anatomy with their raw surgical skills.10 These characteristics make this procedure an ideal candidate for evaluating a trainee’s surgical competence with an OSATS-based assessment tool. Therefore, the purpose of this study was to develop, implement, and pilot-test a newly developed two-component (a task-based checklist [TBC] and a global rating scale [GRS]) thyroidectomy evaluation instrument for head and neck surgery trainees. In addition, we wanted to evaluate this tool for its feasibility and construct validity for the assessment of technical skills in primary thyroid surgery in the operating room (OR).

MATERIALS AND METHODS

Study Design and Participants

After obtaining approval from the institutional review board at the Johns Hopkins Hospital, we proceeded with this prospective pilot study of thyroidectomy skills development and assessment in head and neck surgery trainees during a period of 1 year. Sixteen residents, ranging from postgraduate year (PGY) 2 to 6, from the department of OHNS at the Johns Hopkins University and one endocrine surgery fellow (PGY-7) (in an American Association of Endocrine Surgeons accredited fellowship who finished an ACGME accredited general surgery residency) from the same institution were observed while performing thyroid surgery in the OR. All trainees were evaluated with a newly developed TBC and GRS for thyroid surgery at the end of each procedure by one faculty member of the division of Head and Neck Surgery whose practice primarily focuses on thyroid and parathyroid surgery.

Components of the Assessment Tool

A modified Delphi technique was used to develop the contents of a new OSATS-based instrument for thyroid surgery by a panel of head and neck surgeons with thyroid surgery experience in the OHNS department. With a 5-point Likert scale

Fig. 1. Task-based checklist component of the thyroid surgery assessment tool.


linked to descriptors at the middle and ends of each scale item, a thyroidectomy TBC was developed with detailed faculty input. After extensive review, 10 items, considered the critical, specific, and assessable tasks for achieving the goals of a thyroidectomy procedure, were included in this checklist (Fig. 1).

Based on a previously developed and validated tool to assess technical skills in the OR by Winckel et al.,11 a second component of the evaluation instrument (GRS) for the same procedure was created by the same panel of faculty members (Fig. 2). The purpose of the GRS is to evaluate a trainee’s overall surgical performance. It also aims to measure visual-motor

Fig. 2. Global rating scale component of the thyroid surgery assessment tool.


and cognitive performance required to perform a thyroidectomy in a safe and successful manner. A descriptive five-point Likert scale was also developed for each of its nine items. Two additional yes/no questions and a “procedure comments” section were included to help the evaluating faculty member provide structured and timely formative feedback to the trainee at the end of each case. The first yes/no question queried whether the rater had provided feedback during and after the procedure to the resident. The second one queried whether the faculty member thought the trainee was competent to perform the surgery independently. The “procedure comments” section was useful for the evaluator to rank each procedure as a standard or difficult case. Similar TBC and GRS components have been validated for other OHNS core procedures.2,8,12

Statistical Analysis

Construct validity was evaluated for each component of the tool by comparing trainees’ mean percentage scores across advancing PGY levels using general linear model analysis, adjusting for multiple evaluations per trainee. The inter-item reliability for both the TBC and the GRS was measured by assessing their internal consistency with Cronbach α. A value of .80 was considered acceptable. Correlations between TBC and GRS scores were performed within trainee13 and across trainees.14 All data analyses were performed using STATA version 11.1 (StataCorp LP, College Station, TX) software.
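For readers unfamiliar with the internal-consistency statistic named above, the following is a minimal sketch of the standard Cronbach α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is written in Python purely for illustration; the study itself used STATA, and the function name and the toy ratings in the usage note are hypothetical, not study data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Return Cronbach's alpha for a list of evaluations.

    ratings: list of rows, one row per evaluation, each row holding the
    k item scores (e.g. 10 checklist items scored 1-5). Hypothetical helper,
    not the authors' code.
    """
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least 2 items and 2 evaluations")
    # Sample variance of each item across evaluations.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of the per-evaluation total score.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

As a usage example, three made-up evaluations whose items move together perfectly, `cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])`, give α = 1.0. Note that this plain formula treats every evaluation as independent, which is why the Results also report α on one evaluation per trainee.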

RESULTS

A total of 94 evaluations were completed for 17 trainees across six PGY levels (levels 2–7). They were evaluated by a single faculty member from the division of Head and Neck Surgery as they performed thyroid surgery in the OR during a period of 1 year.

In this pilot study, the evaluation instrument was considered feasible to use based on the rater’s high rate of usage and the time taken to complete each evaluation (median time of 2 minutes; range, 1–7 minutes). In addition, the evaluator found this assessment tool to be understandable, simple, and practical. Moreover, trainees found the tool to be useful in providing timely formative feedback during and after each case, thus improving their performance in subsequent evaluations.

The tool’s internal consistency, used to determine its inter-item reliability, was evaluated with Cronbach α. The Cronbach α on the 94 observations was 0.96 for both the TBC and the GRS. Because Cronbach α does not adjust for the multiple evaluations per trainee, three additional evaluations were performed based on one evaluation for each of the 17 trainees: their earliest evaluation, their latest evaluation, and a randomly selected evaluation. The α values for the TBC were 0.96, 0.98, and 0.97, respectively. The α values for the GRS were 0.96, 0.96, and 0.96, respectively. All are consistent with the Cronbach α values obtained on the 94 evaluations.

Correlations between the two components of the assessment tool were also evaluated. Within the same trainee, an increase in the TBC score was shown to be associated with an increase in the GRS score, taking into account that there are multiple evaluations per trainee (r = 0.62, n = 94, P < .001). Trainees with a high value on the TBC score also tended to have a high value on the GRS score, taking into account multiple evaluations per trainee (r = 0.96, n = 17, P < .001).
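The distinction between the within-trainee and across-trainee correlations can be illustrated with a simplified sketch. It assumes that the within-trainee association is approximated by correlating scores after subtracting each trainee's own mean, and that the across-trainee association is approximated by an unweighted correlation of per-trainee means. This is an illustration only and is not claimed to be the exact procedure of the cited methods; the function names and record layout are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def within_and_across(records):
    """records: list of (trainee_id, tbc_score, grs_score) tuples.

    Returns (within_r, across_r) under the simplifying assumptions
    stated in the lead-in paragraph.
    """
    by_trainee = {}
    for trainee_id, tbc, grs in records:
        by_trainee.setdefault(trainee_id, []).append((tbc, grs))

    centered_tbc, centered_grs = [], []
    mean_tbc, mean_grs = [], []
    for pairs in by_trainee.values():
        m_tbc = mean([p[0] for p in pairs])
        m_grs = mean([p[1] for p in pairs])
        mean_tbc.append(m_tbc)
        mean_grs.append(m_grs)
        # Remove each trainee's own level so only within-trainee change remains.
        centered_tbc.extend(p[0] - m_tbc for p in pairs)
        centered_grs.extend(p[1] - m_grs for p in pairs)

    return pearson(centered_tbc, centered_grs), pearson(mean_tbc, mean_grs)
```

The within-trainee value uses every evaluation (n = 94 in the study), whereas the across-trainee value uses one summary point per trainee (n = 17), which is why the two reported coefficients carry different sample sizes.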

For this study, both the TBC and the GRS demonstrated construct validity. To capture major gradations of experience, rather than just by PGY level, we created four clinical groups to distinguish levels of trainees for this analysis: junior (PGY-2), intermediate (PGY-3 and 4), senior (PGY-5 and 6), and clinical surgical fellow (PGY-7). Our results show a trend of achieving a higher average score with advancing clinical groups for both the TBC (Fig. 3) and the GRS (Fig. 4).
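The grouping of PGY levels into four clinical groups described above can be sketched as a small lookup, followed by a naive per-group average. The group labels follow the text; the function names and the sample scores in the usage note are hypothetical. Unlike the general linear model used in the study, this simple average does not adjust for multiple evaluations per trainee.

```python
from collections import defaultdict
from statistics import mean


def clinical_group(pgy):
    """Map a postgraduate year (2-7) to the clinical group used in the text."""
    if pgy == 2:
        return "junior"
    if pgy in (3, 4):
        return "intermediate"
    if pgy in (5, 6):
        return "senior"
    if pgy == 7:
        return "fellow"
    raise ValueError(f"PGY level outside the studied range: {pgy}")


def mean_by_group(evaluations):
    """evaluations: list of (pgy, score) pairs; returns {group: mean score}.

    Unadjusted illustration only: repeated evaluations of the same trainee
    are treated as independent here.
    """
    scores = defaultdict(list)
    for pgy, score in evaluations:
        scores[clinical_group(pgy)].append(score)
    return {group: mean(values) for group, values in scores.items()}
```

For example, `mean_by_group([(2, 2.0), (2, 3.0), (5, 4.0)])` returns `{"junior": 2.5, "senior": 4.0}` for these made-up scores.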

Following a general linear model analysis adjusting for within trainee correlation of evaluations, we performed pairwise comparisons between contiguous clinical groups adjusting for an experiment-wise error rate. Table I shows comparisons made for the TBC. Similarly, Table II shows comparisons made for the GRS. The mean difference between intermediate and junior trainees was not significant for either the TBC (0.23, 95% confidence interval [CI]: −0.23 to 0.69, P > .303) or the GRS (0.23, 95% CI: −0.33 to 0.79, P > .397). However, the mean difference between senior and

Fig. 3. Mean and 95% confidence interval (CI) scores on the task-based checklist (TBC) for trainees at different levels of training. PGY = postgraduate year.

Fig. 4. Mean and 95% confidence interval (CI) scores on the global rating scale (GRS) for trainees at different levels of training. PGY = postgraduate year.


intermediate trainees was considered statistically significant for both the TBC (0.9, 95% CI: 0.5-1.3, P < .001) and the GRS (0.8, 95% CI: 0.31-1.3, P = .02). Similarly, between the clinical surgical fellow group and senior trainees, the mean difference was found to be statistically significant for both the TBC (0.28, 95% CI: 0.16-0.39, P < .001) and the GRS (0.56, 95% CI: 0.38-0.75, P < .001).

Figure 5 is an example of a trainee’s (PGY-5) mean scores for the TBC and the GRS during the year of evaluations. The evaluations show a mix of both standard and difficult procedures and increasing scores (learning curve) during that period. Even though a decrease in the mean score was seen in a case that was regarded as difficult, residents achieved higher scores in subsequent cases, thus showing the trainees’ development and improvement of technical skills.

DISCUSSION

Although the ACGME has been striving to change medical training into an outcome-based education, full compliance by residency programs will likely take several years. Most surgical training programs, as they have done for many years, assess trainees’ surgical skills with recall-based observations of surgical cases and with subjective end-of-rotation faculty evaluations. Although these methods are an important part of residents’ training assessments, they have been shown to have poor reliability and validity.15 Residency program directors are now responsible for graduating physicians that can objectively demonstrate competency in all core competencies, including technical skills. Therefore, the incorporation of valid objective assessment tools to the traditional evaluation process for the assessment of surgical skills is essential to guarantee the formation of competent otolaryngologists.

Since the introduction of the Outcome Project, the ACGME has followed a timeline composed of four phases that outline the necessary steps to transform current medical education into an outcome-based model.16 By the end of phase 3 (July 2006 to June 2011), residency programs needed to have a “full integration of the competencies and their assessment with learning and clinical care.”16 Therefore, objective evaluations are deemed essential for training programs to comply with such a competency mandate, hence our motivation to implement a newly developed OSATS-based assessment tool for thyroid surgery. With work-hour limitations in place and demands for increasing clinical productivity, our study goal was to create an evaluation tool that could be feasible, valid, and reliable and that would meet current ACGME terms.

After faculty input on the key procedural steps required to perform thyroid surgery, the TBC was created. The checklist tasks include those that faculty deemed necessary for a competent execution of this type of surgery. Being able to detect and then direct teaching and learning resources toward those specific areas (tasks) needed for improvement is the goal of the TBC.9 Moreover, the TBC lets the evaluator give timely formative feedback, which allows for the correction of errors.17 Not only does it provide structure to allow faculty to provide feedback, it can also identify residents in need of remediation. Program directors then have a chance for early identification of surgically challenged residents and an opportunity to remediate and retest their skills. Thus trainees are accountable for their own achievement and improvement of surgical skills. Moreover, all residents are aware of what is being evaluated, making the goals of the procedure explicit, consistent, and well defined. Although the purpose of creating such tools is to become as objective as possible in assessing surgical skills, subjectivity still ensues but not as heavily as with previous assessment methods.

Stack et al.18 have recently created and published a thyroid-specific tool for the assessment of surgical competence. Their efforts are greatly appreciated because

TABLE I. Pairwise Comparisons for the Task-Based Checklist Between Contiguous Clinical Groups.

Comparison                                                         Mean Difference   95% CI          P Value
Intermediate trainees (PGY-3 and 4) and junior trainees (PGY-2)    0.23              −0.23 to 0.69   >.303
Senior trainees (PGY-5 and 6) and intermediate trainees            0.9               0.5-1.3         <.001
Clinical surgical fellow (PGY-7) and senior trainees               0.28              0.16-0.39       <.001

CI = confidence interval; PGY = postgraduate year.

TABLE II. Pairwise Comparisons for the Global Rating Scale Between Contiguous Clinical Groups.

Comparison                                                         Mean Difference   95% CI          P Value
Intermediate trainees (PGY-3 and 4) and junior trainees (PGY-2)    0.23              −0.33 to 0.79   >.397
Senior trainees (PGY-5 and 6) and intermediate trainees            0.8               0.31-1.3        .02
Clinical surgical fellow (PGY-7) and senior trainees               0.56              0.38-0.75       <.001

CI = confidence interval; PGY = postgraduate year.


they made the first attempt at creating a tool that evaluates residents’ surgical skills in thyroid surgery. However, we believe that the larger number of items on their specific checklist, referred to as the Hemithyroidectomy-Specific Scale (18 items),18 as compared to our TBC (10 items), makes it less feasible for faculty to comply with the evaluation using their tool after each surgical case. Williams et al.19 previously reported that residency programs should focus on sampling (more observations per resident), rather than increasing the number of rating items. They concluded that the latter has actually little effect on reliability and is unlikely to assess core competencies adequately. In addition, Kim et al.20 previously discussed that before an evaluation tool designed to measure surgical competence is adopted by any program, it must be feasible for regular use by raters. Such feasibility, they say, is shown by the ease and speed at which ratings can be completed. We believe that the more items there are on an assessment tool, the less feasible the evaluation tool becomes. Even though at the beginning of our study it took between 5 and 7 minutes to complete the assessment, by the end it took around 1 minute. This was only possible because of continuous faculty development. Moreover, the relatively short amount of time needed to complete the evaluation gave the rater the chance to go over each task with the trainee and point out the necessary skills that need to be cultivated to become competent. With appropriate faculty development, our tool became feasible, thus ensuring timely formative feedback during and immediately after the procedure and minimization of the effect of recall bias.

One of our study goals was to assess whether our tool was capable of discriminating residents of different experience levels, hence demonstrating the assessment’s construct validity. Interestingly, the major score difference for both the TBC and GRS seen in our results was between intermediate and senior residents, which corresponds to the “steep slope” or rapid improvement phase of a learning curve (Figs. 3 and 4).21 The weaker difference seen between junior and intermediate residents could be due to the limited surgical responsibilities at junior levels, which corresponds to the earlier learning curve phase where the acquisition of skills is minimal. It is at the senior level where residents’ confidence has improved and their increased experience on tougher surgical cases (i.e., parotidectomy and neck dissections) has improved their thyroidectomy surgical skills, clearly differentiating them from junior residents. As residents advanced in their level of training, they were also shown to score higher on average TBC and GRS scores (Figs. 3 and 4). An interesting observation from our results is that at the clinical fellowship level, the average scores for both the TBC and the GRS are consistently higher than 4. This indicates that at the highest level of surgical training, trainees surpass competency (3/5 on our scale) and achieve proficiency (4/5 on our scale). It is at the residency level that trainees need to demonstrate competency, and our tool has shown that before graduating they have reached this level, making the tool useful for in-training assessment of surgical competency. Another important psychometric property of an evaluation tool that needs to be assessed before adopting it in any training program is its reliability. Both the TBC and the GRS achieved acceptable inter-item reliability, with a Cronbach α of 0.96 for each component of the assessment instrument. Therefore, based on our results, we believe our tool has demonstrated its feasibility, validity, and reliability as an assessment instrument to be implemented in OHNS training programs for the assessment of surgical competency in thyroid surgery.

One of the strengths in our study, as compared to the previous article by Stack et al.,18 is that we were able to show progress in the acquisition of surgical skills throughout the head and neck surgery rotation. As this was a pilot project, we decided to test our tool with one faculty member of the head and neck surgery clinic as the rater. This would ensure high-quality training for the residents, while setting the expert’s technical skills as the benchmark for each of the items on the tool. The evaluator had a chance to work with a single trainee on multiple occasions in a sequential manner, thus ensuring the development of each trainee’s surgical skills. The evaluator also knew when a difficult case was assigned to each trainee, and even though for that case the assessment score may have dropped, we saw an immediate increase in performance on the following cases (Fig. 5). This shows that complexity of the case is an important consideration, and it is the understanding and improvement of those skills that render the trainees competent. Our tool has demonstrated its capacity for discerning the complexity of the surgical procedure and showing improvement in the evaluation score after a difficult case. Having an evaluator whose practice primarily focuses on thyroid and parathyroid surgery enabled him to set benchmarks on the specific assessable tasks of the TBC, hence allowing him to understand when such benchmarks were tougher to achieve on difficult cases. Therefore, the evaluator could provide valuable feedback to the residents allowing them to

Fig. 5. Mean scores of a single trainee (postgraduate year 5) for both the task-based checklist (TBC) and the global rating scale (GRS), showing the acquisition of surgical skills throughout a head and neck surgery rotation despite complexity (standard or difficult) of the cases.


improve their skills in subsequent cases as demonstrated in Figure 5.

One limitation of our study is that the participants were evaluated by a single rater who already knew our residents, thus making it vulnerable to faculty bias. However, to minimize any halo or horn effect, this issue was controlled with constant professional development. Such development began by engaging experts in the field of head and neck surgery (faculty members) to give input to create the assessment instrument. Once the instrument was implemented for assessing surgical skills in the OR, the evaluator attended meetings where he or she provided feedback on the way the evaluation takes place. If needed, changes in the evaluation process were made to ensure the correct use of the assessment instrument, thus producing and developing a capable evaluator. Nevertheless, the use of a video assessment in a blinded fashion would help eliminate this type of faculty bias,22 thus ensuring a more objective evaluation. Moreover, further research that includes more faculty as raters would assess whether this tool demonstrates interrater reliability and would further validate our assessment instrument. Another limitation in our study was the inability to standardize each surgical case to every trainee, as each real-life patient depicts different pathology as opposed to a simulated, standardized scenario. However, Laeeq et al.23 have previously stated that such variability makes each case more challenging and provides meaningful learning and evaluation opportunities for the trainees and evaluators.

Even though the aim of graduating trainees is to demonstrate surgical competency in all core procedures for OHNS, they must continue to excel and aspire to become experts to ensure patient safety. Expertise only comes with actual deliberate practice and proper formative feedback. Ericsson et al.24 stated that performers become experts over a lifelong deliberate effort to improve performance in a specific domain. Having a standardized and objective evaluation for assessing surgical competency aids the trainee in achieving this type of practice, thus fostering the need to excel while improving their surgical skills. Being able to adopt such practice is built around timely formative feedback, which we believe is the single most important aspect of assessing residents’ surgical performance.

CONCLUSION

We have developed and successfully implemented a feasible, valid, and reliable evaluation instrument for the assessment of technical skills in thyroid surgery. The benefits of assessing trainees’ surgical skills with OSATS-based evaluation tools extend beyond just fulfilling the mandate of the ACGME. A timely and consistent assessment of technical skills would enable careful insight into the progression of a resident’s learning curve. Moreover, being able to longitudinally track residents’ surgical skills assessments would help ensure their appropriate development. The faculty evaluator was able to provide timely formative feedback to the trainees during and immediately after each procedure. Such feedback is thought to be the single most important aspect of objectively assessing surgical skills. Both faculty buy-in and faculty development are essential components in the success of any skills assessment program.9 It is important to remember that the acquisition of technical skills is critical to better patient care and that this acquisition can be validated only with objective methods, thus ensuring the development of competent physicians.

BIBLIOGRAPHY

1. Brown DJ, Thompson RE, Bhatti NI. Assessment of operative competency in otolaryngology residency: Survey of US Program Directors. Laryngoscope 2008;118:1761–1764.

2. Lin SY, Laeeq K, Ishii M, et al. Development and pilot-testing of a feasible, reliable, and valid operative competency assessment tool for endoscopic sinus surgery. Am J Rhinol Allergy 2009;23:354–359.

3. Laeeq K, Weatherly RA, Masood H, et al. Barriers to the implementation of competency-based education and assessment: a survey of otolaryngology program directors. Laryngoscope 2010;120:1152–1158.

4. Martin JA, Regehr G, Reznick R, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 1997;84:273–278.

5. Beard JD, Choksy S, Khan S. Assessment of operative competence during carotid endarterectomy. Br J Surg 2007;94:726–730.

6. Ezra DG, Aggarwal R, Michaelides M, et al. Skills acquisition and assessment after a microsurgical skills course for ophthalmology residents. Ophthalmology 2009;116:257–262.

7. Goff BA, Lentz GM, Lee D, Houmard B, Mandel LS. Development of an objective structured assessment of technical skills for obstetric and gynecology residents. Obstet Gynecol 2000;96:146–150.

8. Ishman SL, Brown DJ, Boss EF, et al. Development and pilot testing of an operative competency assessment tool for pediatric direct laryngoscopy and rigid bronchoscopy. Laryngoscope 2010;120:2294–2300.

9. Laeeq K, Bhatti NI, Carey JP, et al. Pilot testing of an assessment tool for competency in mastoidectomy. Laryngoscope 2009;119:2402–2410.

10. Osborn C, Parangi S. Partial thyroidectomy: illustrated reflections for surgical residents. Curr Surg 2006;63:39–43.

11. Winckel CP, Reznick RK, Cohen R, Taylor B. Reliability and construct validity of a structured technical skills assessment form. Am J Surg 1994;167:423–427.

12. Francis HW, Masood H, Chaudhry KN, et al. Objective assessment of mastoidectomy skills in the operating room. Otol Neurotol 2010;31:759–765.

13. Bland JM, Altman DG. Calculating correlation coefficients with repeated observations: Part 1–Correlation within subjects. BMJ 1995;310:446.

14. Bland JM, Altman DG. Calculating correlation coefficients with repeated observations: Part 2–Correlation between subjects. BMJ 1995;310:633.

15. Wanzel KR, Ward M, Reznick RK. Teaching the surgical craft: From selection to certification. Curr Probl Surg 2002;39:573–659.

16. ACGME Outcome Project. Enhancing residency education through outcomes assessment. August 1, 2003. Available at: http://www.acgme.org/outcome/. Accessed May 15, 2011.

17. Sanfey H, Dunnington G. Verification of proficiency: a prerequisite for clinical experience. Surg Clin North Am 2010;90:559–567.

18. Stack BC Jr, Siegel E, Bodenner D, Carr MM. A study of resident proficiency with thyroid surgery: creation of a thyroid-specific tool. Otolaryngol Head Neck Surg 2010;142:856–862.

19. Williams RG, Verhulst S, Colliver JA, Dunnington GL. Assuring the reliability of resident performance appraisals: more items or more observations? Surgery 2005;137:141–147.

20. Kim MJ, Williams RG, Boehler ML, Ketchum JK, Dunnington GL. Refining the evaluation of operating room performance. J Surg Educ 2009;66:352–356.

21. Fraser SA, Feldman LS, Stanbridge D, Fried GM. Characterizing the learning curve for a basic laparoscopic drill. Surg Endosc 2005;19:1572–1578.

22. Laeeq K, Infusino S, Lin SY, et al. Video-based assessment of operative competency in endoscopic sinus surgery. Am J Rhinol Allergy 2010;24:234–237.

23. Laeeq K, Waseem R, Weatherly RA, et al. In-training assessment and predictors of competency in endoscopic sinus surgery. Laryngoscope 2010;120:2540–2545.

24. Ericsson KA, Krampe RT, Tesch-Romer C. The role of deliberate practice in the acquisition of expert performance. Psychol Rev 1993;100:363–406.

