

EDUCATION

Relationship Between Objective Assessment of Technical Skills and Subjective In-Training Evaluations in Surgical Residents
Liane S Feldman, MD, FACS, Sarah E Hagarty, MD, Gabriela Ghitulescu, MD, Donna Stanbridge, RN, Gerald M Fried, MD, FACS

BACKGROUND: Technical skills of residents have traditionally been evaluated using subjective In-Training Evaluation Reports (ITERs). We have developed the McGill Inanimate System for Training and Evaluation of Laparoscopic Skills (MISTELS), an objective measure of laparoscopic technical ability. The purpose of the study was to assess the concurrent validity of the MISTELS by exploring the relationship between MISTELS score and ITER assessment.

STUDY DESIGN: Fifty surgery residents were assessed on the MISTELS system. Concurrent ITER assessments of technical skill were collected, and the proportion of superior ratings for the year was calculated. Statistical comparisons were performed by ANOVA and chi-square analysis. The Pearson correlation coefficient was used to compare the scores in the MISTELS with the ITER ratings.

RESULTS: The 50 residents received 277 ITERs for the year, of which 103 (37%) were "superior," 170 (61%) "satisfactory," 4 (1%) "borderline," and 0 "unsatisfactory." The MISTELS score correlated moderately well with the proportion of superior ITER scores (r = 0.51, p < 0.01). Residents who passed the MISTELS had a higher proportion of superior ITER assessments than those who failed the MISTELS (p = 0.02), but residents who performed below their expected level on the MISTELS still received mainly satisfactory ITERs (82 ± 18%).

CONCLUSIONS: The ITER assessment is poor at identifying residents with below-average technical skills. Residents who perform well in the MISTELS laparoscopic simulator also have better ITER evaluations, providing evidence for the concurrent validity of the MISTELS. Multiple assessment instruments are recommended for assessment of technical competency. (J Am Coll Surg 2004;198:105–110. © 2004 by the American College of Surgeons)

Complete care of the surgical patient requires knowledge, judgment, and technical skill. Although knowledge and judgment are evaluated using standardized written and oral examinations, assessment of technical skill remains a nascent science. This is despite the fact that technical ability is the hallmark of the surgeon and is, in large part, what separates surgical from nonsurgical specialties.

The advent of minimally invasive surgery highlighted deficiencies in the teaching and credentialing of technical skills. The increase in common bile duct injury seen with the introduction of laparoscopic cholecystectomy focused public and peer attention on the need for more structured means to learn and assess technical competence.1 Nevertheless, most residency programs use subjective assessments to evaluate technical skill. In general, these consist of In-Training Evaluation Reports (ITERs), a list of global rating scales completed by the faculty at various intervals of residency training. Although it is of potential use to the trainee, this method of assessment has been criticized for many reasons, including lack of validity and reliability.2

To teach and evaluate basic laparoscopic technical skill, we developed the McGill Inanimate System for Training and Evaluation of Laparoscopic Skills (MISTELS).3

MISTELS consists of five standardized laparoscopic tasks performed on a video trainer. Each task is scored for time and precision, and an overall score is calculated. Mean scores with 95% confidence intervals have been determined for various levels of training4 and an overall pass/fail score has been established.5 To be a useful training and evaluation tool, the validity and reliability of the MISTELS must be demonstrated over a series of experiments. Previous work has provided evidence for the construct validity3,4,6-9 and reliability (test-retest 0.89 and interrater 0.99, unpublished data) of the system. Studies by our group8,10 and others11-13 suggest that skills learned in a laparoscopic box or virtual reality trainer may be transferable to the operating room environment.

No competing interests declared.
This work was supported by an unrestricted educational grant from Tyco Healthcare.
Presented in part as a poster at Society of American Gastrointestinal Surgeons, St Louis, Missouri, April 2001.
Received March 21, 2003; Revised August 7, 2003; Accepted August 11, 2003.
From the Steinberg-Bernstein Centre for Minimally Invasive Surgery, McGill University, Montreal, Quebec, Canada.
Correspondence address: Liane S Feldman, MD, Montreal General Hospital, L9-316, 1650 Cedar Ave, Montreal, Quebec, Canada, H3G 1A4.
© 2004 by the American College of Surgeons. ISSN 1072-7515/04/$30.00. Published by Elsevier Inc. doi:10.1016/j.jamcollsurg.2003.08.020

When other scales of the same attribute exist, comparison of results from the two instruments yields information about the concurrent validity of the new instrument.14 The overall purpose of this study was to assess the concurrent validity of the MISTELS by examining whether performance in the MISTELS laparoscopic simulator, an objective assessment of technical skill, correlates with the In-Training Evaluation Report (ITER), a widely used subjective assessment of technical skill.

METHODS
Between 1997 and 2000, 116 MISTELS assessments of McGill surgical trainees were identified from a prospectively maintained database. Eliminating medical students, repeated evaluations of the same individual, and entries with missing demographic or scoring data left 60 residents. Complete ITERs for the year the resident was assessed in the MISTELS were identified for 50 residents (35 men, 15 women), who formed the study sample. Residents were grouped into junior (PGY 1–2, n = 20), intermediate (PGY 3–4, n = 21), and senior (PGY 5, n = 9) categories.

MISTELS
The MISTELS program consists of five standardized exercises, performed in a video trainer, and has been described in detail elsewhere.3 Briefly, the simulator consists of a trainer box (40 × 30 × 19.5 cm USSC Lap-trainer, United States Surgical Corporation) covered by an opaque membrane. Two 12-mm trocars (USSC Surgiport) are placed according to standard protocol through the membrane at convenient working angles on either side of a 10-mm zero-degree laparoscope (USSC Surgiview) connected to a video monitor. The five tasks range from basic to more advanced laparoscopic skills and include peg transfer, pattern cutting, endoloop placement, intracorporeal knot tying, and extracorporeal knot tying. A 20-minute introductory video demonstrating correct performance of each task is shown to subjects before testing.

The peg transfer involves lifting six pegs from one pegboard using the left hand, transferring each to the right hand, placing each peg on a second pegboard, then reversing the exercise from right to left. The pattern-cutting task involves cutting a premarked 4-cm diameter circle from a 10 × 10-cm gauze suspended between alligator clips. For the endoloop task, the subject places and tightens a commercially available pretied slipknot (USSC Surgitie) on a tubular foam appendage. In the knot-tying tasks, a simple suture is placed through premarked positions in a slit Penrose drain, and the suture is tied using an intracorporeal or extracorporeal knot.

Performance of each task was scored for speed and precision by an observer blinded to ITER assessment. For each task, a time score was calculated by subtracting the actual time to complete the task from a preset cut-off time. This gave higher scores to faster performers. If the time to complete the task exceeded the cut-off, the score was recorded as zero. No negative scores were assigned. Precision was factored in with a penalty score unique to each task. The final score was calculated for each exercise by subtracting the penalty score from the performance time score. Normalized scores were calculated by dividing each task score by a constant for the task and multiplying by 100. This resulted in scores falling in a similar range for each task. Higher scores reflected faster and more accurate performance in each task. The total MISTELS score was calculated by summing the normalized scores for each of the five tasks.
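The scoring scheme described above can be sketched in a few lines of code. This is an illustrative reconstruction only: the cut-off times, penalty values, and per-task normalization constants shown are hypothetical placeholders, not the published MISTELS parameters.

```python
# Illustrative sketch of the MISTELS scoring scheme described above.
# All numeric parameters are hypothetical placeholders.

def task_score(actual_time, cutoff_time, penalty):
    """Time score is cut-off minus actual time, floored at zero
    (no negative time scores); the precision penalty is then
    subtracted to give the raw task score."""
    time_score = max(cutoff_time - actual_time, 0)
    return time_score - penalty

def normalized_score(raw_score, task_constant):
    """Divide by a per-task constant and multiply by 100 so that
    all five tasks fall in a similar range."""
    return raw_score / task_constant * 100

def total_mistels_score(tasks):
    """Total score: sum of the five normalized task scores."""
    return sum(
        normalized_score(task_score(t, cutoff, pen), k)
        for (t, cutoff, pen, k) in tasks
    )

# (actual time in s, cut-off, penalty, normalization constant) per task
tasks = [
    (120, 300, 10, 250),  # peg transfer
    (95, 300, 0, 200),    # pattern cutting
    (200, 300, 25, 180),  # endoloop placement
    (240, 600, 15, 400),  # intracorporeal knot
    (180, 420, 5, 300),   # extracorporeal knot
]
print(total_mistels_score(tasks))
```

With suitable constants, each normalized task score lands in roughly the same range, which is what allows the five tasks to be summed into a single total.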

ITER scoring
Surgical residents are assessed after each 2- to 4-month rotation by attending surgeons with standardized In-Training Evaluation Reports (ITERs). ITERs include assessment of three broad categories of competency: knowledge, skills and attitudes, and a global evaluation of competence, with space for comments. Within these categories, 16 specific attributes, ranging from "knowledge of clinical sciences" to "interprofessional relationships," are rated as superior, satisfactory, borderline, unsatisfactory, or could not judge. Question 10 asks the evaluator to rate the resident's "technical skill" in this manner. A resident receiving a global evaluation of "borderline" or "unsatisfactory" receives no credit for the rotation. The ITERs are kept in the resident's permanent file in the program director's office.

The date of the MISTELS evaluation was noted for each resident. ITER evaluations of technical skill for each rotation of the PGY year in which the resident was tested with the MISTELS program were retrieved by an administrative assistant blinded to the purpose of the study. To maintain anonymity during data analysis, residents were coded by number, rather than by name, for both ITER grades and MISTELS scores. The proportion of "superior" ITER technical skills scores was calculated for each resident.

Statistical analysis
Data are expressed as mean ± standard deviation, unless otherwise stated. Statistical comparisons were performed by one-way analysis of variance (ANOVA) and chi-square analysis. The correlation between the MISTELS total score and the proportion of superior ITER technical skills scores was assessed using the Pearson coefficient. Because of the effect of PGY level on MISTELS score, a multivariable analysis was performed to assess the relationship between MISTELS and ITER scores, and p < 0.05 was considered statistically significant.
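As a concrete illustration of the correlation statistic used here, the Pearson coefficient between a set of MISTELS totals and proportions of superior ITERs can be computed as below. The data points are invented for the example; they are not study data.

```python
# Pearson product-moment correlation, the statistic used to relate
# MISTELS total scores to the proportion of superior ITER ratings.
# The sample values below are invented for illustration only.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

mistels_totals = [187, 242, 296, 346]      # hypothetical total scores
superior_iters = [0.10, 0.27, 0.55, 0.88]  # hypothetical proportions
print(round(pearson_r(mistels_totals, superior_iters), 2))
```

A coefficient near +1 indicates that residents with higher simulator totals also collect a higher share of superior ITER ratings; the study's observed value was r = 0.51.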

RESULTS
The 50 residents received a total of 277 ITERs for technical skill, of which 103 (37%) were "superior," 170 (61%) "satisfactory," and 4 (1%) "borderline." No resident received an "unsatisfactory" grade. The mean number of ITERs per resident per year was 5.5 ± 1.5 (range 3 to 8) and did not vary significantly by PGY year. Although the trend was for the more senior residents to receive more superior evaluations, this was not statistically significant (Table 1).

To investigate the relationship between resident performance in the MISTELS and the ITER, residents were divided into quartiles based on ITER scores (group 1: 0% to 19% superior ITERs; group 2: 20% to 34% superior ITERs; group 3: 35% to 74% superior ITERs; and group 4: 75% to 100% superior ITERs). Residents with poorer ITER grades also performed less well in four of the five MISTELS tasks, and had significantly lower total scores (Table 2).

There was a significant relationship between each MISTELS task and ITER assessment (Table 3). As the score on the MISTELS increased, so did the percentage of superior ITERs. Because of the known relationship between PGY and MISTELS scores, and because higher-level PGY residents tended to have better ITER grades, a multivariable analysis was performed, including PGY and MISTELS total score in a model predicting ITER score. In this model, MISTELS score remained a significant predictor of ITER grade (p = 0.0005); PGY did not (p = 0.35).

Table 1. In-Training Evaluation Reports per PGY level

Level     n    ITERs/y     Superior   Satisfactory  Borderline  Unsatisfactory
               (number)    ITER (%)   ITER (%)      ITER (%)    ITER (%)
PGY 1–2   20   6.0 ± 1.5   28 ± 28    71 ± 26       2 ± 5       0
PGY 3–4   21   5.5 ± 1.4   45 ± 29    54 ± 28       1 ± 4       0
PGY 5      9   4.7 ± 1.0   47 ± 36    51 ± 36       2 ± 7       0
p              0.09        0.2        0.18          0.85        1

Data expressed as mean ± SD.
ITERs, In-Training Evaluation Reports.

Table 2. McGill Inanimate System for Training and Evaluation of Laparoscopic Skills Scores Stratified by In-Training Evaluation Reports Quartile

ITER quartile  n    Peg       Cutting   Endoloop  IC knot   EC knot   Total score
1              15   32 ± 30   45 ± 25   38 ± 33   39 ± 31   33 ± 27   187 ± 125
2              13   33 ± 27   55 ± 25   63 ± 16   47 ± 28   44 ± 15   242 ± 68
3              13   52 ± 20   66 ± 15   61 ± 30   65 ± 28   53 ± 20   296 ± 69
4               9   62 ± 19   77 ± 8    73 ± 16   84 ± 15   50 ± 17   346 ± 43
p                   0.016     0.003     0.014     0.002     0.07      0.0004

Data expressed as mean ± SD.
Group 1: 0% to 19% superior; group 2: 20% to 34% superior; group 3: 35% to 74% superior; and group 4: 75% to 100% superior.
EC, extracorporeal; IC, intracorporeal; ITER, In-Training Evaluation Report.

The passing score for the MISTELS has previously been set at 270.9 Residents who passed the MISTELS had a higher proportion of superior ITERs than residents who did not pass (Table 4). Mean scores and 95% confidence intervals have previously been determined for the MISTELS at each level of training. In the study sample, residents scoring lower than expected for their level of training on the MISTELS also had lower ITER assessments than residents scoring at or above their expected MISTELS level (Table 5).

DISCUSSION
We found a relationship between objective performance on the MISTELS laparoscopic trainer and subjective assessment of technical skill in surgical residents. Scores in four of five MISTELS tasks (all except extracorporeal knot tying) and the MISTELS total score were found to correlate positively with ITER assessment. Residents who passed the MISTELS had higher ITER grades than residents who failed the MISTELS. Residents performing below expected levels on the MISTELS also performed less well on the ITERs. These results provide evidence for the concurrent validity of the MISTELS in evaluating technical skills.

The MISTELS total score correlated moderately well with the percent of superior ITERs (r = 0.51), falling within the range of correlation expected when comparing two measures of the same attribute (0.4 to 0.8).14 So the MISTELS total score accounts for about 25% of the variability in ITER assessment. Which other factors might affect ITER assessment? The ITER assessment of technical skill is a global evaluation and likely reflects both decision making and technical skill in the operating room; MISTELS reflects psychomotor performance alone. The discrepancy between MISTELS and ITER scoring is in line with the widely quoted estimation that ". . . about 75% of the important events in an operation are related to making decisions, and about 25% to dexterity."15
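The "about 25% of the variability" figure follows from squaring the reported correlation to obtain the coefficient of determination:

```latex
% Share of ITER variability explained by the MISTELS total score
r^2 = (0.51)^2 = 0.2601 \approx 0.26,
```

that is, roughly a quarter of the variance in ITER assessment, leaving the remainder to the other factors discussed above.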

Although widely used, the validity of the ITER as a measurement tool has been repeatedly challenged,2,16,17 and problems with the ITER system were apparent in this study. First, ITER scores were essentially all superior or satisfactory. The mean percentage of superior evaluations per year was 38.5 ± 30%. It was distinctly unusual for a resident to receive a "borderline" assessment, probably because this grade leads to automatic review by the program director, and may result in having to repeat the rotation. There were no unsatisfactory evaluations. Differences between residents were apparent only when the proportion of "superior" evaluations was calculated for the year, as would be done for a resident's year-end summary assessment by the program director. So residents scoring in the lowest group on the MISTELS still obtained mainly "satisfactory" ITERs throughout the year. If the resident and the training program are relying solely on ITERs, residents struggling with their technical skills are unlikely to be readily identified and the chance for timely remedial training may be missed. This failure of evaluators to use the entire rating scale has been termed the "central tendency error."16,17 In our program, the significant penalties resulting from rating a resident below average negatively impact the ITER's utility as a feedback mechanism.

Table 3. Correlation Between McGill Inanimate System for Training and Evaluation of Laparoscopic Skills Scores and In-Training Evaluation Reports Scores

MISTELS task          ITER scores
Peg transfer          0.45 (0.001)
Pattern cutting       0.45 (0.001)
Endoloop placement    0.35 (0.01)
Intracorporeal knot   0.45 (0.001)
Extracorporeal knot   0.29 (0.04)
Total score           0.51 (0.0001)

Values are Pearson correlation coefficients; p values are in parentheses; the percent of superior ITERs was used in the analysis.
ITERs, In-Training Evaluation Reports; MISTELS, McGill Inanimate System for Training and Evaluation of Laparoscopic Skills.

Table 4. Residents Who Passed the McGill Inanimate System for Training and Evaluation of Laparoscopic Skills Also Had Higher In-Training Evaluation Reports Scores

MISTELS outcome  n    Superior ITER (%, mean ± SD)   p Value
Fail             26   29 ± 25
Pass             24   49 ± 32                        0.02

ITER, In-Training Evaluation Report; MISTELS, McGill Inanimate System for Training and Evaluation of Laparoscopic Skills.

Table 5. Residents Performing below Expected Level on the McGill Inanimate System for Training and Evaluation of Laparoscopic Skills Also Had Poorer In-Training Evaluation Reports Assessments

MISTELS total score*   n    Superior   Satisfactory  Borderline
                            ITER (%)   ITER (%)      ITER (%)
< Expected for level   15   16 ± 20    82 ± 18       2 ± 5
= Expected for level   12   38 ± 27    60 ± 28       2 ± 6
> Expected for level   23   53 ± 30    46 ± 29       1 ± 4
p                           0.0006     0.0007        0.75

Data expressed as mean ± SD.
ITER, In-Training Evaluation Report; MISTELS, McGill Inanimate System for Training and Evaluation of Laparoscopic Skills.

Other rater errors decrease the validity of ITERs as measurements of technical skill. The "halo effect" refers to the fact that performance in one area can cloud judgment of other aspects of performance. For example, Warf and colleagues18 found that senior residents were more likely than interns to be judged "competent" in an Objective Structured Clinical Examination, even though there was no difference in the specific skill being tested. To address the possible impact of PGY level on ITER score, we stratified the MISTELS scores according to previously defined "expected" levels for each PGY level, and included PGY as a variable in a model predicting ITER. We did not find that PGY independently predicted ITER when the MISTELS score was taken into account.

Despite the drawbacks of the ITER, it remains in widespread use, so it was chosen as the measure of technical skill against which to judge the MISTELS. We have previously assessed concurrent validity of the MISTELS by comparing performance in the MISTELS with performance of analogous tasks in a pig model, demonstrating good correlation in five of seven tasks.8 Ideally, the concurrent validity of the MISTELS would be judged using intraoperative objective assessment scales, such as those described by Wanzel and associates.2 These studies are being planned.

Several weaknesses are identifiable in this study. First, residents were judged on their skill after only one MISTELS attempt. This would particularly penalize the junior residents, who improve quickly after their initial unfamiliarity with the system is overcome. In addition, the study sample may have had practice opportunities on the simulators that were not recorded in the database. For example, we found a much higher than anticipated number of residents performing above their expected level, suggesting the group had prior exposure to the MISTELS. Finally, MISTELS was designed to test laparoscopic skills; the ITER is a global assessment of all technical skill. We have not assessed how MISTELS correlates with other objective measures of nonlaparoscopic surgical skill.

Although the ITER is far from an ideal instrument for measuring technical skill, it is widely used, and the relationship between MISTELS and ITER scores supports the notion that some component of technical ability is measured by the ITER. A reconfigured ITER, including a more detailed rating scale that does not harshly penalize the weakest residents, may be more informative for trainees. But the era of the ITER as the sole arbiter of technical competence is probably ending. Objective measures, like MISTELS for laparoscopic skills, provide immediate feedback, an assessment of where the trainee stands in relation to his or her peers, and a definition of competence derived in a defensible manner. The use of multiple assessment tools to define competence and aid in training is suggested.

Author Contributions
Study conception and design: Feldman, Fried
Acquisition of data: Stanbridge, Hagarty, Ghitulescu
Analysis and interpretation of data: Feldman, Hagarty
Drafting of manuscript: Feldman, Hagarty
Critical revision: Fried
Statistical expertise: Feldman
Obtaining funding: Fried
Supervision: Fried, Feldman

REFERENCES

1. Rogers DA, Elstein AS, Bordage G. Improving continuing medical education for surgical techniques: applying the lessons learned in the first decade of minimal access surgery. Ann Surg 2001;233:159–166.
2. Wanzel KR, Ward M, Reznick RK. Teaching the surgical craft: from selection to certification. Curr Prob Surg 2002;39:573–660.
3. Derossis AM, Fried GM, Abrahamowicz M, et al. Development of a model for training and evaluation of laparoscopic skills. Am J Surg 1998;175:482–487.
4. Ghitulescu GA, Derossis AM, Feldman LS, et al. A model for evaluation of laparoscopic skills: is there correlation to level of training? Surg Endosc 2001;15[Suppl 1]:S127.
5. Fraser SA, Klassen DR, Feldman LS, et al. Setting the pass-fail score for the MISTELS system. Surg Endosc (in press).
6. Derossis AM, Antoniuk M, Fried GM. Evaluation of laparoscopic skills: a 2-year follow-up during residency training. Can J Surg 1999;42:293–296.
7. Derossis AM, Bothwell J, Sigman HH, Fried GM. The effect of practice on performance in a laparoscopic simulator. Surg Endosc 1998;12:1117–1120.
8. Fried GM, Derossis AM, Bothwell J, Sigman HH. Comparison of laparoscopic performance in vivo with performance measured in a laparoscopic simulator. Surg Endosc 1999;13:1077–1081.
9. Ghitulescu GA, Derossis AM, Stanbridge D, et al. A model for evaluation of laparoscopic skills: is there external validity? Surg Endosc 2001;15[Suppl 1]:S128.
10. Medeiros LE, Feldman LS, Antoniuk M, Fried GM. Does practice in vitro result in improved suturing skills in vivo? Surg Endosc 2000;14[Suppl 1]:S205.
11. Seymour NE, Gallagher AG, Roman SA, et al. Virtual reality training improves operating room performance: results of a randomized double-blinded study. Ann Surg 2002;236:458–464.
12. Scott DJ, Bergen PC, Rege RV, et al. Laparoscopic training on bench models: better and more cost effective than operating room experience? J Am Coll Surg 2000;191:272–283.
13. Hyltander A, Lilegren E, Rhodin PH, Lonroth H. The transfer of basic skills learned in a laparoscopic simulator to the operating room. Surg Endosc 2002;16:1324–1328.
14. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 2nd ed. New York: Oxford University Press; 1996:4–14.
15. Spencer FC. Observations on the teaching of operative technique. Bull Am Coll Surg 1983;3:3–6.
16. Turnbull J, Gray J, MacFayden J. Improving in-training evaluation programs. J Gen Int Med 1998;13:317–323.
17. Gray JD. Global rating scales in residency education. Acad Med 1996;71[Suppl 1]:S56–63.
18. Warf BC, Donnelly MB, Schwartz RW, Sloan DA. The relative contributions of interpersonal and specific clinical skills to the perception of global clinical competence. J Surg Res 1999;86:17–23.